In this paper, we propose a novel secure wireless transmission architecture that enables the co-existence of spatial field modulation (SFM) and digital bandpass modulation (DBM), utilizing multi-mode vortex waves and programmable meta-surfaces (PMS). Distinct from conventional joint modulation schemes, our approach establishes two logically independent transmission channels--SFM and DBM--thereby eliminating the need for joint signal design or time synchronization. Specifically, the orthogonality of vortex wave modes is exploited to construct a high-capacity multi-mode DBM channel, in which each mode carries modulated symbols independently. As the composite waveform passes through the PMS, energy from different vortex modes is spatially focused onto distinct positions, dynamically determined by the PMS configuration. This spatial mapping forms a unique lookup table that encodes additional information in the electromagnetic (EM) field distribution, effectively enabling a second, concurrent SFM channel. To enhance physical-layer security, the DBM channel transmits encrypted symbols transformed via dynamic symbol-domain mapping, while the corresponding mapping relations--or key information--are carried by the SFM channel. This lightweight dual-channel encryption strategy provides strong confidentiality without requiring complex joint decoding. To validate the feasibility of the proposed architecture, we design and implement a proof-of-concept prototype system, and conduct experimental demonstrations under real-world wireless communication conditions. The experimental results confirm the effectiveness of the co-existent DBM-SFM design in achieving reliable and secure transmission. The proposed architecture offers a scalable, low-complexity, and secure transmission solution for future IoT networks, especially in scenarios demanding both spectral efficiency and physical-layer confidentiality.
The dominant paradigm for Audio-Text Retrieval (ATR) relies on mini-batch-based contrastive learning. This process, however, is inherently limited by what we formalize as the Gradient Locality Bottleneck (GLB), which structurally prevents models from leveraging out-of-batch knowledge and thus impairs fine-grained and long-tail learning. While external knowledge-enhanced methods can alleviate the GLB, we identify a critical, unaddressed side effect: the Representation-Drift Mismatch (RDM), where a static knowledge base becomes progressively misaligned with the evolving model, turning guidance into noise. To address this dual challenge, we propose the Adaptive Self-improving Knowledge (ASK) framework, a model-agnostic, plug-and-play solution. ASK breaks the GLB via multi-grained knowledge injection, systematically mitigates RDM through dynamic knowledge refinement, and introduces a novel adaptive reliability weighting scheme to ensure consistent knowledge contributes to optimization. Experimental results on two benchmark datasets demonstrate superior, state-of-the-art performance, justifying the efficacy of the proposed ASK framework.
We introduce a new class of attitude control laws for rotational systems, which generalizes the use of the Euler axis-angle representation beyond quaternion-based formulations. Using basic Lyapunov stability theory and the notion of extended class-$K_{\infty}$ functions, we develop a method for determining and enforcing the global asymptotic stability of the single fixed point of the resulting closed-loop (CL) scheme. In contrast with traditional quaternion-based methods, the proposed generalized axis-angle approach enables greater flexibility in the design of the control law, which is of great utility when employed in combination with a switching scheme whose transition state depends on the angular velocity of the controlled rotational system. Through simulation and real-time experimental results, we demonstrate the effectiveness of the proposed approach. According to the recorded data, in the execution of high-speed tumble-recovery maneuvers, the new method consistently achieves shorter stabilization times and requires lower control effort relative to the quaternion-based and geometric-control methods used as benchmarks.
This paper investigates rate-splitting multiple access (RSMA) for an uplink pinching antenna system (PASS). Our objective is to maximize the uplink sum rate by jointly optimizing the continuous antenna positions and the users' transmit powers. The formulated problem is highly non-convex and difficult to solve directly; to address this challenge, we propose an alternating optimization (AO) framework that decomposes the original problem into two tractable sub-problems, namely (i) a power allocation sub-problem and (ii) an antenna position sub-problem. Both sub-problems are solved using successive convex approximation (SCA)-based algorithms and are iterated alternately until convergence. RSMA for PASS is compared with conventional non-orthogonal multiple access (NOMA) and space-division multiple access (SDMA) techniques. The performance of a discrete antenna activation strategy with PASS is also examined. Simulation results demonstrate that our proposed framework significantly enhances the achievable sum rate compared to other multiple access methods.
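To make the AO structure concrete, here is a minimal Python sketch of the alternating loop for a toy two-user uplink with a single pinching antenna on a waveguide. The path-gain model, the grid-search sub-solvers (standing in for the paper's SCA updates), and all constants are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Toy alternating-optimization (AO) skeleton: one pinching antenna at
# position x on a waveguide of length L serves two uplink users.
L, sigma2, Pmax = 10.0, 1e-3, 1.0
users = np.array([[2.0, 1.5], [8.0, 1.0]])   # (x-coordinate, perp. distance)

def gain(x):
    d = np.sqrt((users[:, 0] - x) ** 2 + users[:, 1] ** 2)
    return 1.0 / d**2                         # free-space-like path gain

def sum_rate(p, x):
    g = gain(x)
    return np.log2(1.0 + (p * g).sum() / sigma2)  # MAC sum rate (SIC at BS)

p, x = np.full(2, Pmax), L / 2
for it in range(20):
    # (i) power sub-problem: per-user grid search, others fixed.
    # (With a pure sum-rate objective full power is optimal; this grid
    # step merely stands in for the paper's SCA power update.)
    for k in range(2):
        grid = np.linspace(0, Pmax, 101)
        rates = [sum_rate(np.where(np.arange(2) == k, q, p), x) for q in grid]
        p[k] = grid[int(np.argmax(rates))]
    # (ii) position sub-problem: 1-D grid search along the waveguide
    xs = np.linspace(0, L, 201)
    x = xs[int(np.argmax([sum_rate(p, xi) for xi in xs]))]
print(f"AO result: x={x:.2f}, p={p}, sum rate={sum_rate(p, x):.3f} b/s/Hz")
```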
As integrated sensing and communication (ISAC) systems are deployed in next-generation wireless networks, a new security vulnerability emerges, particularly in terms of sensing privacy. Unauthorized sensing eavesdroppers (Eves) can potentially exploit the ISAC signal for their own independent passive sensing. However, solutions for sensing-secure ISAC remain largely unexplored to date. This work addresses sensing security for OFDM- and OTFS-based ISAC waveforms from a target-detection perspective, aiming to prevent Eves from exploiting the ISAC signal for unauthorized passive sensing. We develop ISAC system models for the base station (BS), communication user equipment, and the sensing Eve, and define a Kullback-Leibler-divergence-based detection metric that accounts for mainlobe, sidelobe, and noise components in the ambiguity function and the resulting range-Doppler maps of the legitimate BS's and Eve's sensing. Building on this analysis, we formulate a sensing-secure ISAC signaling design problem that tunes a perturbation matrix to jointly control signal amplitude and phase in the time-frequency domain, and solve it via simulated annealing. Simulation results show that the proposed scheme substantially degrades Eve's detection probability -- from 79.4% to 37.4% for OTFS and from 94.3% to 33.0% for OFDM -- while incurring only a small loss in BS sensing performance. In addition, it allows controllable trade-offs between sensing security and communication performance.
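As a rough illustration of the detection-metric ingredient, the sketch below evaluates the KL divergence between two Gaussian hypotheses on a range-Doppler cell. The Gaussian model and all numbers are assumptions for illustration; the paper's metric additionally decomposes mainlobe, sidelobe, and noise terms and is evaluated separately for the BS and Eve.

```python
import numpy as np

# Gaussian KL divergence D(N0 || N1), a plausible stand-in for a KL-based
# detection metric on range-Doppler cells (exact definitions differ).
def kl_gauss(mu0, S0, mu1, S1):
    k = len(mu0)
    S1inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + d @ S1inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# H0: noise only; H1: target mainlobe/sidelobe energy shifts the statistics.
mu0, S0 = np.zeros(2), np.eye(2) * 0.10
mu1, S1 = np.array([1.0, 0.4]), np.eye(2) * 0.15
print(f"KL detection metric: {kl_gauss(mu0, S0, mu1, S1):.3f} nats")
# A perturbation-matrix design would seek to shrink this KL at Eve's
# receiver while keeping the legitimate BS's KL (and the comm. rate) intact.
```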
The exponential proliferation of mobile devices and data-intensive applications in future wireless networks imposes substantial computational burdens on resource-constrained devices, thereby fostering the emergence of over-the-air computation (AirComp) as a transformative paradigm for edge intelligence. To enhance the efficiency and scalability of AirComp systems, this paper proposes a comprehensive dual-approach framework that systematically transitions from traditional mathematical optimization to deep reinforcement learning (DRL) for resource allocation under execution uncertainty. Specifically, we establish a rigorous system model capturing execution uncertainty via Gamma-distributed computational workloads, resulting in challenging nonlinear optimization problems involving complex Gamma functions. For single-user scenarios, we design advanced block coordinate descent (BCD) and majorization-maximization (MM) algorithms, which yield semi-closed-form solutions with provable performance guarantees. However, conventional optimization approaches become computationally intractable in dynamic multi-user environments due to inter-user interference and resource contention. To this end, we introduce a Deep Q-Network (DQN)-based DRL framework capable of adaptively learning optimal policies through environment interaction. Our dual methodology effectively bridges analytical tractability with adaptive intelligence, leveraging optimization for foundational insight and learning for real-time adaptability. Extensive numerical results corroborate the performance gains achieved via increased edge server density and validate the superiority of our optimization-to-learning paradigm in next-generation AirComp systems.
We propose and demonstrate a power-fading-aware noise-shaping technique for a C-band IMDD system with a low-resolution DAC, which shapes and concentrates quantization noise within the fading-induced notch areas, yielding a 94% improvement in data rate over the traditional counterpart.
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications, assisting with tasks including troubleshooting, standards interpretation, and network optimization. However, their deployment in practice must balance inference cost, latency, and reliability. In this work, we study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline. In it, an efficient edge model handles routine queries, a more capable cloud model addresses complex cases, and human experts are involved only when necessary. We formulate a misalignment-constrained cost-minimization problem, aiming to minimize the average processing cost while guaranteeing alignment of automated answers with expert judgments. We propose a statistically rigorous threshold selection method based on multiple hypothesis testing (MHT) for a query processing mechanism based on knowledge and confidence tests. The approach provides finite-sample guarantees on misalignment risk. Experiments on the TeleQnA dataset -- a telecom-specific benchmark -- demonstrate that the proposed method achieves superior cost-efficiency compared to conventional cascaded baselines, while ensuring reliability at prescribed confidence levels.
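A minimal sketch of finite-sample threshold calibration via MHT, in the spirit of Learn-then-Test-style procedures. The synthetic error model, the Bonferroni correction, and the `lams` candidate grid are assumptions; the paper's knowledge and confidence tests are more elaborate.

```python
import numpy as np
from scipy.stats import binom

# Finite-sample threshold selection via multiple hypothesis testing.
# errors[i, j] = 1 if, at confidence threshold lams[j], the cascade's
# answer to calibration query i is misaligned with the expert's judgment.
rng = np.random.default_rng(0)
lams = np.linspace(0.5, 0.95, 10)          # candidate confidence thresholds
n = 500
errors = rng.random((n, len(lams))) < (0.25 - 0.2 * lams)  # synthetic data

alpha, delta = 0.10, 0.05                   # target risk, failure probability
# One-sided p-value for H0: misalignment risk >= alpha, per candidate.
pvals = np.array([binom.cdf(errors[:, j].sum(), n, alpha)
                  for j in range(len(lams))])
valid = pvals <= delta / len(lams)          # Bonferroni over the candidates
# Among statistically valid thresholds, pick the one that escalates least
# (here: the smallest lambda, i.e., the cheapest query routing).
chosen = lams[np.argmax(valid)] if valid.any() else None
print("valid thresholds:", lams[valid], "-> chosen:", chosen)
```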
The Heyland circle diagram is a classical graphical tool for representing the steady-state behavior of induction machines using no-load and blocked-rotor test data. While widely used in alternating-current machinery texts, the diagram is typically presented as a hand-constructed aid and lacks a standardized computational formulation. This paper presents HeylandCircle, a computational framework that reconstructs the classical Heyland circle diagram directly from standard test parameters. The framework formalizes the traditional geometric construction as a deterministic, reproducible sequence of geometric operations, establishing a clear mapping between measured data, fixed geometric objects, and steady-state operating points. Quantities such as power factor, slip, output power, torque, and efficiency are obtained through explicit geometric relationships on the constructed diagram. Validation using a representative textbook example demonstrates close agreement with classical results. The framework provides a computational realization of the traditional Heyland diagram suitable for instruction, analysis, and systematic extension.
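A compact sketch of the core geometric step, assuming the classical construction in which the circle passes through the no-load and blocked-rotor points with its diameter parallel to the reactive axis through the no-load point. The test values are illustrative, not the paper's textbook example.

```python
import numpy as np

# Reconstruct the circle of the Heyland diagram from the two test points.
V = 400.0                                   # line voltage, V (illustrative)
I0, phi0 = 10.0, np.deg2rad(78.0)           # no-load test: current, pf angle
Isc, phisc = 50.0, np.deg2rad(62.0)         # blocked-rotor test (scaled)

# Phasors drawn with V along the y-axis: x = I*sin(phi), y = I*cos(phi).
P0 = np.array([I0 * np.sin(phi0), I0 * np.cos(phi0)])
Psc = np.array([Isc * np.sin(phisc), Isc * np.cos(phisc)])

# Centre lies on the horizontal line y = P0[1]; solve |C-P0| = |C-Psc| for cx.
y0 = P0[1]
cx = ((Psc**2).sum() - (P0**2).sum() - 2 * y0 * (Psc[1] - P0[1])) \
     / (2 * (Psc[0] - P0[0]))
C = np.array([cx, y0])
R = np.linalg.norm(C - P0)
print(f"circle centre = ({C[0]:.2f}, {C[1]:.2f}) A, radius = {R:.2f} A")
# Operating quantities (power factor, slip, torque, efficiency) then follow
# from explicit chord/perpendicular constructions on this circle.
```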
We propose and demonstrate an elastic digital-analog radio-over-fiber (RoF) modulation and demodulation architecture, seamlessly bridging A-RoF and D-RoF solutions and achieving quasi-linear SNR scaling with respect to $1/\eta$, as evidenced by $R^2 = 0.9908$.
Reconfigurable intelligent surfaces (RISs) have been extensively applied in integrated sensing and communication (ISAC) systems due to their capability to enhance physical-layer security (PLS). However, conventional static RIS architectures lack the flexibility required for adaptive beam control in multi-user and multifunctional scenarios. To address this issue without introducing additional hardware complexity or power consumption, in this paper we exploit a movable RIS (MRIS) architecture, which consists of a large fixed sub-surface and a smaller movable sub-surface that slides over the fixed sub-surface to achieve dynamic beam reconfiguration with static phase shifts. This paper investigates an MRIS-assisted ISAC system under imperfect sensing estimation, where dedicated radar signals serve as artificial noise to enhance secure transmission against potential eavesdroppers (Eves). The transmit beamforming vectors, MRIS phase shifts, and relative positions of the two sub-surfaces are jointly optimized to maximize the minimum secrecy rate, ensuring robust secrecy performance for the weakest user under the uncertainty of the Eves' channels. To handle the non-convexity, a convex bound is derived for the Eve channel uncertainty, and the S-procedure is employed to reformulate semi-infinite constraints as linear matrix inequalities. An efficient alternating optimization and penalty dual decomposition-based algorithm is developed. Simulation results demonstrate that the proposed MRIS architecture substantially improves secrecy performance, especially when only a small number of elements are allocated to the movable sub-surface.
In this work, we propose a data-driven divide-and-conquer strategy for the stability analysis of interconnected homogeneous nonlinear networks of degree one with unknown models and a fully unknown topology. The proposed scheme leverages joint dissipativity-type properties of subsystems described by storage functions, while providing a stability certificate over unknown interconnected networks. In our data-driven framework, we begin by formulating the required conditions for constructing storage functions as a robust convex program (RCP). Given that unknown models of subsystems are integrated into one of the constraints of the RCP, we collect data from trajectories of each unknown subsystem and provide a scenario convex program (SCP) that aligns with the original RCP. We solve the SCP as a linear program and construct a storage function for each subsystem with unknown dynamics. Under some newly developed data-driven compositionality conditions, we then construct a Lyapunov function for the fully unknown interconnected network utilizing storage functions derived from data of individual subsystems. We show that our data-driven divide-and-conquer strategy provides correctness guarantees (as opposed to probabilistic confidence) while significantly mitigating the sample complexity problem existing in data-driven approaches. To illustrate the effectiveness of our proposed results, we apply our approaches to three different case studies involving interconnected homogeneous (nonlinear) networks with unknown models. We collect data from trajectories of unknown subsystems and verify the global asymptotic stability (GAS) of the interconnected network with correctness guarantees.
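A toy instance of the SCP-as-LP step for a single scalar subsystem: fit a storage-function candidate V(x) = p x^2 from sampled trajectory data, with one linear constraint per scenario. The dynamics and all constants are invented for illustration; the paper's conditions are dissipativity-type and compositional.

```python
import numpy as np
from scipy.optimize import linprog

# Scenario convex program (SCP) solved as a linear program: the subsystem
# is *unknown* to the solver, which only sees sampled pairs (x_i, xdot_i).
rng = np.random.default_rng(1)
xs = rng.uniform(-2, 2, 200)
xdots = -1.5 * xs + 0.1 * np.sin(xs)        # hidden dynamics, only sampled

# Scenario constraints: d/dt V = 2*p*x*xdot <= -x^2 at every data point,
# i.e., (2*x_i*xdot_i) * p <= -x_i^2, plus p >= eps for positive definiteness.
eps = 1e-3
A_ub = np.column_stack([2 * xs * xdots])
b_ub = -xs**2
res = linprog(c=[1.0], A_ub=A_ub, b_ub=b_ub, bounds=[(eps, None)])
print("feasible:", res.success, "-> p =", res.x[0] if res.success else None)
# Compositionality conditions would then combine such storage functions
# into a Lyapunov function for the overall unknown interconnected network.
```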
This study proposes a practical approach for compressing 360-degree equirectangular videos using pretrained neural video compression (NVC) models. Without requiring additional training or changes in the model architectures, the proposed method extends quantization parameter adaptation techniques from traditional video codecs to NVC, utilizing the spatially varying sampling density in equirectangular projections. We introduce latitude-based adaptive quality parameters through rate-distortion optimization for NVC. The proposed method utilizes vector bank interpolation for latent modulation, enabling flexible adaptation with arbitrary quality parameters and mitigating the limitations caused by rounding errors in the adaptive quantization parameters. Experimental results demonstrate that applying this method to the DCVC-RT framework yields BD-Rate savings of 5.2% in terms of the weighted spherical peak signal-to-noise ratio for JVET class S1 test sequences, with only a 0.3% increase in processing time.
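One plausible (hypothetical) latitude-to-quality mapping, using the WS-PSNR weight cos(latitude) as the sampling-density proxy. The functional form and the constants `base_q` and `strength` are assumptions, not the paper's rate-distortion-optimized values.

```python
import numpy as np

# Rows near the poles of an equirectangular frame are oversampled on the
# sphere, so they can be coded at lower quality without hurting WS-PSNR.
def latitude_quality(height, base_q=0.5, strength=0.35):
    lat = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2
    w = np.cos(lat)                      # sampling-density weight per row
    return np.clip(base_q * w**strength, 0.0, 1.0)

q = latitude_quality(960)
print(f"equator q = {q[480]:.3f}, pole q = {q[0]:.3f}")
# Continuous q values like these would then index a vector bank with
# interpolation for latent modulation, sidestepping integer-QP rounding.
```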
Binaural reproduction is gaining increasing attention with the rise of devices such as virtual reality headsets, smart glasses, and head-tracked headphones. Achieving accurate binaural signals with these systems is challenging, as they often employ arbitrary microphone arrays with limited spatial resolution. The Binaural Signals Matching with Magnitude Least-Squares (BSM-MagLS) method was developed to address limitations of earlier BSM formulations, improving reproduction at high frequencies and under head rotation. However, its accuracy still degrades as head rotation increases, resulting in spatial and timbral artifacts, particularly when the virtual listener's ear moves farther from the nearest microphones. In this work, we propose the integration of deep learning with BSM-MagLS to mitigate these degradations. A post-processing framework based on the SpatialNet network is employed, leveraging its ability to process spatial information effectively and guided by both signal-level loss and a perceptually motivated binaural loss derived from a theoretical model of human binaural hearing. The effectiveness of the approach is investigated in a simulation study with a six-microphone semicircular array, showing its ability to perform robustly across head rotations. These findings are further studied in a listening experiment across different reverberant acoustic environments and head rotations, demonstrating that the proposed framework effectively mitigates BSM-MagLS degradations and provides robust correction across substantial head rotations.
We successfully transmitted a net 582-Gb/s probabilistically shaped PAM12 C-band signal over 11-km dispersion-shifted fibre and net 4$\times$526-Gb/s uniform PAM8 O-band signals over 2-km four-core fibre using a single-carrier 216-GBd IMDD system based on a 150-GHz bandwidth InP-DHBT electrical mixer and a thin-film lithium-niobate modulator.
Many existing audio processing and generation models rely on task-specific architectures, resulting in fragmented development efforts and limited extensibility. It is therefore promising to design a unified framework capable of handling multiple tasks, while providing robust instruction and audio understanding and high-quality audio generation. This requires a compatible paradigm design, a powerful backbone, and a high-fidelity audio reconstruction module. To meet these requirements, this technical report introduces QuarkAudio, a decoder-only autoregressive (AR) LM-based generative framework that unifies multiple tasks. The framework includes a unified discrete audio tokenizer, H-Codec, which incorporates self-supervised learning (SSL) representations into the tokenization and reconstruction process. We further propose several improvements to H-Codec, such as a dynamic frame-rate mechanism and extending the audio sampling rate to 48 kHz. QuarkAudio unifies tasks by using task-specific conditional information as the conditioning sequence of the decoder-only LM, and predicting discrete target audio tokens in an AR manner. The framework supports a wide range of audio processing and generation tasks, including speech restoration (SR), target speaker extraction (TSE), speech separation (SS), voice conversion (VC), and language-queried audio source separation (LASS). In addition, we extend downstream tasks to universal free-form audio editing guided by natural language instructions (including speech semantic editing and audio event editing). Experimental results show that H-Codec achieves high-quality audio reconstruction with a low frame rate, improving both the efficiency and performance of downstream audio generation, and that QuarkAudio delivers competitive or comparable performance to state-of-the-art task-specific or multi-task systems across multiple tasks.
Integrated Sensing and Communication (ISAC) systems enable cellular networks to jointly operate as communication technology and sense the environment. While opportunities and potential performance have been largely investigated in simulations, few experimental works have showcased Automatic Target Recognition (ATR) effectiveness in a real-world deployment based on cellular radio units. To bridge this gap, this paper presents an initial study investigating the feasibility of ATR for ISAC. Our ATR solution uses a Deep Learning (DL)-based detector to infer the target class directly from the radar images generated by the ISAC system. The DL detector is evaluated with experimental data from an ISAC testbed based on commercially available mmWave radio units in the ARENA 2036 industrial research campus located in Stuttgart, Germany. Experimental results show accurate classification, demonstrating the feasibility of ATR for ISAC with cellular hardware in our setup. We finally provide insights into the open generalization challenges that will fuel future work on the topic.
In distributed computing systems, reducing the communication load during the data shuffling phase is a critical challenge, as excessive inter-node transmissions are a major performance bottleneck. One promising approach to alleviate this burden is Embedded Index Coding (EIC), which exploits cached data at user nodes to encode transmissions more efficiently. However, most prior work on EIC has focused on minimizing code length in wired, error-free environments -- an objective often suboptimal for wireless multiple-input multiple-output (MIMO) systems, where channel conditions and spatial multiplexing gains must be considered. This paper investigates the joint design of EIC and transmit beamforming in MIMO systems to minimize total transmission time, an NP-hard problem. We first present a conventional optimization method that determines the optimal EIC via exhaustive search. To address its prohibitive complexity and adapt to dynamic wireless environments, we propose a novel, low-complexity multi-agent reinforcement learning (MARL) framework. The proposed framework enables decentralized agents to act on local observations while effectively managing the hybrid action space of discrete EIC selection and continuous beamforming design. Simulation results demonstrate that the proposed MARL-based approach achieves near-optimal performance with significantly reduced complexity, underscoring its effectiveness and practicality for real-world wireless systems.
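The caching intuition behind index coding in its smallest form: one XOR-coded broadcast serves two nodes that each cache what the other wants, halving the shuffle load. The paper's EIC design generalizes this to many nodes and couples it with MIMO beamforming.

```python
import numpy as np

# Two-node toy: node 1 wants shard B but caches A; node 2 wants A but
# caches B. A single coded broadcast A XOR B satisfies both requests.
A = np.frombuffer(b"shard-A!", dtype=np.uint8)
B = np.frombuffer(b"shard-B?", dtype=np.uint8)
coded = A ^ B                                  # one broadcast transmission
assert bytes(coded ^ A) == b"shard-B?"         # node 1 decodes via cache A
assert bytes(coded ^ B) == b"shard-A!"         # node 2 decodes via cache B
print("both nodes decoded from a single coded transmission")
```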
This paper develops a robust safety-critical control method for nonlinear strict-feedback systems with mismatched disturbances. Using a state transformation and a linear time-varying disturbance observer, the system is converted into a form that enables safe control design. The approach ensures forward invariance of the safety set and also applies to disturbance-free systems. Safety is proven for all cases, and a numerical example illustrates the results.
A robust tracking control strategy is designed to empower wheeled mobile robots (WMRs) to track predetermined routes while operating in diverse fields and encountering disturbances like strong winds or uneven path conditions, which affect tracking performance. Ensuring the applicability of this tracking method in real-world scenarios is essential. To accomplish this, the WMR model is initially transformed into a linear canonical form by leveraging the differential flatness of its kinematic model, facilitating controller design. Subsequently, a novel integral nonlinear hyperplane-based sliding mode control (INH-SMC) technique is proposed for WMR under disturbances. The stability of the technique is analyzed and verified. Finally, its practical viability is demonstrated through a comparative real-world indoor experiment on a TurtleBot3 WMR subjected to disturbances, confirming the feasibility and efficacy of the proposed approach.
In this paper, we address the robustness of parabolic-elliptic systems under boundary control. A sliding mode control strategy is proposed to reject matched perturbations. The stability analysis establishes finite-time convergence of the sliding manifold and exponential stability of the closed-loop system. Since the closed-loop system is discontinuous, we also prove its well-posedness. A numerical example is provided to validate the effectiveness of the proposed approach.
A multiuser uplink transmission framework based on the segmented waveguide-enabled pinching-antenna system (SWAN) is proposed under two operating protocols: segment selection (SS) and segment aggregation (SA). For each protocol, the achievable uplink sum-rate is characterized for both time-division multiple access (TDMA) and non-orthogonal multiple access (NOMA). Low-complexity placement methods for the pinching antennas (PAs) are developed for both protocols and for both multiple-access schemes. Numerical results validate the effectiveness of the proposed methods and show that SWAN achieves higher sum-rate performance than conventional pinching-antenna systems, while SA provides additional performance gains over SS.
Latent force models, a class of hybrid modeling approaches, integrate physical knowledge of system dynamics with a latent force -- an unknown, unmeasurable input modeled as a Gaussian process. In this work, we introduce two optimal state estimation frameworks to reconstruct the latent forces and to estimate the states. In contrast to state-of-the-art approaches, the designed estimators enable the consideration of system-inherent constraints. Finally, the performance of the novel frameworks is investigated in several numerical examples. In particular, we demonstrate the performance of the new framework in a real-world biomedical example -- the hypothalamic-pituitary-thyroid axis -- using hormone measurements.
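A minimal augmented-state Kalman filter for a latent force model, with the unknown input modeled as a random walk appended to the state (a crude stand-in for the GP prior). The mass-damper plant and noise levels are illustrative; the paper's estimators additionally enforce system-inherent constraints.

```python
import numpy as np

# Plant: x' = v, v' = -d*v + u(t); the hidden input u is appended to the
# filter state [x, v, u] and driven only by process noise.
dt, d, T = 0.01, 0.5, 2000
A = np.array([[1, dt, 0], [0, 1 - d * dt, dt], [0, 0, 1]])
H = np.array([[1.0, 0.0, 0.0]])              # only the position is measured
Q = np.diag([0.0, 0.0, 1e-4])                # noise drives the latent force
R = 1e-3

rng = np.random.default_rng(2)
u_true = np.sin(0.01 * np.arange(T))         # hidden input to reconstruct
pos, vel, z = 0.0, 0.0, []
for k in range(T):
    pos, vel = pos + dt * vel, vel + dt * (-d * vel + u_true[k])
    z.append(pos + rng.normal(0, np.sqrt(R)))

m, P, u_hat = np.zeros(3), np.eye(3), []
for k in range(T):
    m, P = A @ m, A @ P @ A.T + Q            # predict
    S = (H @ P @ H.T)[0, 0] + R              # scalar innovation variance
    K = (P @ H.T / S).ravel()                # Kalman gain, shape (3,)
    m = m + K * (z[k] - m[0])                # update with position sample
    P = (np.eye(3) - np.outer(K, H[0])) @ P
    u_hat.append(m[2])
err = np.sqrt(np.mean((np.array(u_hat)[500:] - u_true[500:]) ** 2))
print(f"latent-force RMSE after burn-in: {err:.3f}")
```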
In this work, we introduce a sample- and data-based moving horizon estimation framework for linear systems. We perform state estimation in a sample-based fashion in the sense that only a few, irregularly spaced output measurements are assumed to be available. This setting is encountered in applications where measuring is expensive or time-consuming. Furthermore, the state estimation framework does not rely on a standard mathematical model, but on an implicit system representation based on measured data. We prove sample-based practical robust exponential stability of the proposed estimator under mild assumptions. Furthermore, we apply the proposed scheme to estimate the states of a gastrointestinal tract absorption system.
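A least-squares sketch of horizon estimation from few, irregular samples. Here a known (A, C) model propagates noise-free inside the estimator, whereas the paper's scheme replaces the explicit model with an implicit data-based representation and proves robust stability.

```python
import numpy as np

# Estimate the initial horizon state of x+ = A x + w, y = C x from output
# samples available only at a few irregular instants.
rng = np.random.default_rng(3)
A = np.array([[0.95, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
N, meas_times = 30, [0, 7, 19, 29]          # few, irregular sample times

xt, ys = np.array([1.0, -1.0]), {}
for k in range(N):
    if k in meas_times:
        ys[k] = (C @ xt)[0] + rng.normal(0, 0.01)
    xt = A @ xt + rng.normal(0, 0.005, 2)   # true plant with process noise

# Decision variable: the initial horizon state x0; the sketch propagates it
# noise-free, so each sample contributes one row y_k = C A^k x0.
rows = [(C @ np.linalg.matrix_power(A, k)).ravel() for k in ys]
rhs = [ys[k] for k in ys]
x0_hat, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print("estimated initial state:", x0_hat.round(3))
```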
The goal of this paper is to provide a new perspective on speech modeling by incorporating perceptual invariances such as amplitude scaling and temporal shifts. Conventional generative formulations often treat each dataset sample as a fixed representative of the target distribution. From a generative standpoint, however, such samples are only one among many perceptually equivalent variants within the true speech distribution. To address this, we propose Linear Projection Conditional Flow Matching (LP-CFM), which models targets as projection-aligned elongated Gaussians along perceptually equivalent variants. We further introduce Vector Calibrated Sampling (VCS) to keep the sampling process aligned with the line-projection path. In neural vocoding experiments across model sizes, data scales, and sampling steps, the proposed approach consistently improves over the conventional optimal transport CFM, with particularly strong gains in low-resource and few-step scenarios. These results highlight the potential of LP-CFM and VCS to provide more robust and perceptually grounded generative modeling of speech.
We investigate two complementary strategies for multi-contrast cardiac MR reconstruction: physics-consistent data-space augmentation (DualSpaceCMR) and parameter-efficient capacity scaling via VQPrompt and Moero. DualSpaceCMR couples image-level transforms with k-space noise and motion simulations while preserving forward-model consistency. VQPrompt adds a lightweight bottleneck prompt; Moero embeds a sparse mixture of experts within a deep unrolled network with histogram-based routing. On the multi-vendor, multi-site CMRxRecon25 benchmark, we evaluate few-shot and out-of-distribution generalization. On small datasets, k-space motion-plus-noise augmentation improves reconstruction; on the large benchmark it degrades performance, revealing sensitivity to the augmentation ratio and schedule. VQPrompt produces modest, consistent gains with negligible memory overhead. Moero continues to improve after early plateaus and maintains baseline-like few-shot and out-of-distribution behavior despite mild overfitting, but sparse routing lowers PyTorch throughput and makes wall-clock time the main bottleneck. These results motivate scale-aware augmentation and suggest prompt-based capacity scaling as a practical path, while efficiency improvements remain crucial for sparse expert models.
In many applications of biotechnology, measurements are available at different sampling rates, e.g., due to online sensors and offline lab analysis. Offline measurements typically involve time delays that may be unknown a priori due to the underlying laboratory procedures. This multirate (MR) setting poses a challenge to Kalman filtering, where conventionally measurement data is assumed to be available on an equidistant time grid and without delays. The present study derives the MR version of an extended Kalman filter (EKF) based on sample state augmentation, and applies it to the anaerobic digestion (AD) process in a simulative agricultural setting. The performance of the MR-EKF is investigated for various scenarios, i.e., varying delay lengths, measurement noise levels, plant-model mismatch (PMM), and initial state error. Provided with an adequate tuning, the MR-EKF was demonstrated to reliably estimate the process state, appropriately fuse delayed offline measurements, and smooth noisy online measurements. Because of the sample state augmentation approach, the delay length of offline measurements does not critically impair state estimation performance, provided observability is not lost during the delays. Poor state initialization and PMM affect convergence more than measurement noise levels. Further, selecting an appropriate tuning was found to be critically important for the successful application of the MR-EKF, for which a systematic approach is presented. This study provides implementation guidance for practitioners aiming at successfully applying state estimation to multirate systems. It thereby contributes to developing demand-driven operation of biogas plants, which may aid in stabilizing a renewable electricity grid.
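A scalar linear-Kalman sketch of the sample-state augmentation mechanism for delayed measurements: when an offline sample is drawn, the current state estimate is copied into an extra (frozen) entry; when the lab result arrives later, it updates the whole filter through that entry. The plant, noise levels, and delay are illustrative, not the anaerobic-digestion model.

```python
import numpy as np

a, q, r_on, r_off = 0.98, 1e-4, 1e-2, 1e-4
k0, D = 40, 25                      # sample drawn at k0, result arrives k0+D
rng = np.random.default_rng(4)

x, x_at_k0 = 1.0, None
m, P = np.zeros(1), np.eye(1)       # filter state (grows when sampling)

def update(m, P, y, H, r):
    S = H @ P @ H.T + r
    K = (P @ H.T) / S
    return m + (K @ (y - H @ m)), (np.eye(len(m)) - K @ H) @ P

for k in range(100):
    x = a * x + rng.normal(0, np.sqrt(q))            # true plant
    F = np.eye(len(m)); F[0, 0] = a                  # frozen copies persist
    m, P = F @ m, F @ P @ F.T
    P[0, 0] += q                                     # predict
    H = np.zeros((1, len(m))); H[0, 0] = 1.0         # fast online sensor
    m, P = update(m, P, np.array([x + rng.normal(0, np.sqrt(r_on))]), H, r_on)
    if k == k0:                                      # sample drawn: augment
        x_at_k0, n = x, len(m)
        m = np.append(m, m[0])
        Pn = np.zeros((n + 1, n + 1))
        Pn[:n, :n] = P
        Pn[n, :n], Pn[:n, n], Pn[n, n] = P[0, :], P[:, 0], P[0, 0]
        P = Pn
    if k == k0 + D:                                  # delayed lab value fused
        H = np.zeros((1, len(m))); H[0, -1] = 1.0
        y_off = np.array([x_at_k0 + rng.normal(0, np.sqrt(r_off))])
        m, P = update(m, P, y_off, H, r_off)
print(f"final estimate {m[0]:.4f} vs truth {x:.4f}")
```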
Accurate assessment of bowel cleanliness is essential for effective colonoscopy procedures. The Boston Bowel Preparation Scale (BBPS) offers a standardized scoring system but suffers from subjectivity and inter-observer variability when performed manually. In this paper, to support robust training and evaluation, we construct a high-quality colonoscopy dataset comprising 2,240 images from 517 subjects, annotated with expert-agreed BBPS scores. We propose a novel automated BBPS scoring framework that leverages the CLIP model with adapter-based transfer learning and a dedicated fecal-feature extraction branch. Our method fuses global visual features with stool-related textual priors to improve the accuracy of bowel cleanliness evaluation without requiring explicit segmentation. Extensive experiments on both our dataset and the public NERTHU dataset demonstrate the superiority of our approach over existing baselines, highlighting its potential for clinical deployment in computer-aided colonoscopy analysis.
In this paper, we investigate a movable antenna (MA) enabled anti-jamming optimization problem, where a legitimate uplink system is exposed to multiple jammers with unknown jamming channels. To enhance the anti-jamming capability of the considered system, an MA array is deployed at the receiver, and the antenna positions and the minimum-variance distortionless-response (MVDR) receive beamformer are jointly optimized to maximize the output signal-to-interference-plus-noise ratio (SINR). The main challenge arises from the fact that the interference covariance matrix is unknown and nonlinearly dependent on the antenna positions. To overcome these issues, we propose a surrogate objective by replacing the unknown covariance with the sample covariance evaluated at the current antenna position anchor. Under a two-timescale framework, the surrogate objective is updated once per block (containing multiple snapshots) at the current anchor position, while the MVDR beamformer is adapted on a per-snapshot basis. We establish a local bound on the discrepancy between the surrogate and the true objective by leveraging matrix concentration inequalities, and further prove that a natural historical-averaging surrogate suffers from a non-vanishing geometric bias. Building on these insights, we develop a low-complexity projected trust-region (TR) surrogate optimization (PTRSO) algorithm that maintains the locality of each iteration via TR constraints and enforces feasibility through projection, and is guaranteed to converge to a stationary point near the anchor. Numerical results verify the effectiveness and robustness of the proposed PTRSO algorithm, which consistently achieves higher output SINR than existing baselines.
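The per-snapshot beamforming step is standard MVDR with a sample covariance, sketched below. The uniform-linear-array geometry, jammer scenario, and signal-free training snapshots are illustrative assumptions.

```python
import numpy as np

# MVDR beamformer computed from a *sample* covariance -- the surrogate
# quantity used in place of the unknown interference covariance.
rng = np.random.default_rng(5)
N, snapshots = 8, 200
pos = 0.5 * np.arange(N)                      # element positions (wavelengths)

def steer(theta_deg):
    return np.exp(2j * np.pi * pos * np.sin(np.deg2rad(theta_deg)))

a_sig, a_jam = steer(10.0), steer(-35.0)      # user and (unknown) jammer

# Signal-free snapshots: jamming plus unit-power complex Gaussian noise.
s_jam = rng.standard_normal(snapshots) + 1j * rng.standard_normal(snapshots)
X = np.sqrt(10.0) * np.outer(a_jam, s_jam) \
    + (rng.standard_normal((N, snapshots))
       + 1j * rng.standard_normal((N, snapshots))) / np.sqrt(2)
R_hat = X @ X.conj().T / snapshots            # sample covariance

w = np.linalg.solve(R_hat, a_sig)
w /= a_sig.conj() @ w                         # MVDR distortionless scaling
sinr = 1.0 / np.real(w.conj() @ R_hat @ w)    # output SINR proxy
print(f"output SINR proxy: {10 * np.log10(sinr):.1f} dB")
# In the paper, the antenna positions (hence `pos` and the steering vectors)
# are re-optimized per block via the projected trust-region surrogate scheme.
```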
The classical Andronov-Vyshnegradsky problem, which deals with locating regions of stability and oscillations in control systems with a Watt regulator, is solved using a divergence method for studying the stability of dynamic systems. This system is studied both with and without the self-regulation effect. The exact value of the hidden boundary of the global stability region is obtained. The stability criteria for a system with a Watt regulator are also presented in the context of the solvability of a linear matrix inequality. Computer modelling shows that the system exhibits hidden oscillations when the self-regulation effect is present and when it is not. The conditions for computing the hidden boundary of global stability are determined by three parameters in the Watt regulator model.
Fast fixed-time convergence of the solutions of nonlinear dynamical systems is achieved under special requirements on the derivative of a quadratic function calculated along the solutions of the system. Conditions are obtained under which the system solutions converge to zero, or to a given region, within a fixed time. To achieve fast convergence, a negative power is applied to the derivative of the quadratic function within a specific time interval during the evolution of the system. The application of the proposed results to the design of control laws for linear plants of arbitrary order using the backstepping method is considered. All the main results are accompanied by numerical modelling and a comparison of the proposed solutions with existing ones.
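For orientation, a standard fixed-time stability condition from the literature (Polyakov-type); the paper's quadratic-function requirements are in this spirit, though its exact conditions may differ:

```latex
\dot{V}(x(t)) \le -\alpha\, V(x(t))^{p} - \beta\, V(x(t))^{q},
\qquad \alpha,\beta > 0,\quad p > 1,\quad 0 < q < 1
\;\;\Longrightarrow\;\;
T(x_0) \le \frac{1}{\alpha\,(p-1)} + \frac{1}{\beta\,(1-q)}
\quad \text{for all } x_0 .
```

The settling-time bound is uniform in the initial condition, which is what distinguishes fixed-time from merely finite-time convergence; loosely, the negative power applied to the derivative over a time interval strengthens the decay in the same way the exponent terms do here.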
Accurate segmentation of ischemic stroke lesions from diffusion magnetic resonance imaging (MRI) is essential for clinical decision-making and outcome assessment. Diffusion-Weighted Imaging (DWI) and Apparent Diffusion Coefficient (ADC) scans provide complementary information on acute and sub-acute ischemic changes; however, automated lesion delineation remains challenging due to variability in lesion appearance. In this work, we study ischemic stroke lesion segmentation using multimodal diffusion MRI from the ISLES 2022 dataset. Several state-of-the-art convolutional and transformer-based architectures, including U-Net variants, Swin-UNet, and TransUNet, are benchmarked. Based on performance, a dual-encoder TransUNet architecture is proposed to learn modality-specific representations from DWI and ADC inputs. To incorporate spatial context, adjacent slice information is integrated using a three-slice input configuration. All models are trained under a unified framework and evaluated using the Dice Similarity Coefficient (DSC). Results show that transformer-based models outperform convolutional baselines, and the proposed dual-encoder TransUNet achieves the best performance, reaching a Dice score of 85.4% on the test set. The proposed framework offers a robust solution for automated ischemic stroke lesion segmentation from diffusion MRI.
Goal-oriented communications offer an attractive alternative to the Shannon-based communication paradigm, where the data is never reconstructed at the Receiver (RX) side. Rather, focusing on the case of edge inference, the Transmitter (TX) and the RX cooperate to exchange features of the input data that will be used to predict an unseen attribute of the data, leveraging information from collected data sets. This chapter demonstrates that the wireless channel can be used to perform computations over the data when equipped with programmable metasurfaces. The end-to-end system of the TX, the RX, and the metasurface-based channel is treated as a single deep neural network which is trained through backpropagation to perform inference on unseen data. Using Stacked Intelligent Metasurfaces (SIM), it is shown that this Metasurfaces-Integrated Neural Network (MINN) can achieve performance comparable to fully digital neural networks under various system parameters and data sets. By offloading computations onto the channel itself, important benefits may be achieved in terms of energy consumption, arising from reduced computations at the transceivers and the smaller transmission power required for successful inference.
In this paper, we demonstrate how multiport network theory can be used as a powerful modeling tool in economics. The critical insight is using the port concept to pair the flow of goods (the electrical current) with the agent's incentive (the voltage) in an economic interaction. By building networks of agents interacting through ports, we create models with multiple levels of abstraction, from the macro level down to the micro level. We are thereby able to model complex macroeconomic systems whose dynamical behavior is emergent from the micro level. Using the LTSpice circuit simulator, we then design and analyze a series of example systems that range in complexity from the textbook Robinson Crusoe economy to a model of an entire economy.
Non-terrestrial networks (NTNs), particularly low Earth orbit (LEO) satellite systems, play a vital role in supporting future mission-critical applications such as disaster relief. Recent advances in artificial intelligence (AI)-native communications enable LEO satellites to act as intelligent edge nodes capable of on-board learning and task-oriented inference. However, the limited link budget, coupled with severe path loss and fading, significantly constrains reliable downlink transmission. This paper proposes a deep joint source-channel coding (DJSCC)-based downlink scheme for AI-native LEO networks, optimized for goal-oriented visual inference. In the DJSCC approach, only semantically meaningful features are extracted and transmitted, whereas conventional separate source-channel coding (SSCC) transmits the original image data. To evaluate information freshness and visual event detection performance, this work introduces the age of misclassified information (AoMI) metric and a threshold-based AoI analysis that measures the proportion of users meeting application-specific timeliness requirements. Simulation results show that the proposed DJSCC scheme provides higher inference accuracy, lower average AoMI, and greater threshold compliance than the conventional SSCC baseline, enabling semantic communication in AI-native LEO satellite networks for 6G and beyond.
This article presents an analysis of the practical challenges and implementation perspectives of point-to-point continuous-variable quantum key distribution (CV-QKD) systems over optical fiber. The study addresses the physical layer, including the design of transmitters, quantum channels, and receivers, with emphasis on impairments such as attenuation, chromatic dispersion, polarization fluctuations, and coexistence with classical channels. We further examine the role of digital signal processing (DSP) as the bridge between quantum state transmission and classical post-processing, highlighting its impact on excess noise mitigation, covariance matrix estimation, and reconciliation efficiency. The post-processing pipeline is detailed with a focus on parameter estimation in the finite-size regime, information reconciliation using LDPC-based codes optimized for low-SNR conditions, and privacy amplification employing large-block universal hashing. From a hardware perspective, we discuss modular digital architectures that integrate dedicated accelerators with programmable processors, supported by a reference software framework (CV-QKD-ModSim) for algorithm validation and hardware co-design. Finally, we outline perspectives for the deployment of CV-QKD in Brazil, starting from metropolitan testbeds and extending toward hybrid fiber/FSO and space-based infrastructures. The work establishes the foundations for the first point-to-point CV-QKD system in Brazil, while providing a roadmap for scalable and interoperable quantum communication networks.
Drone applications continue to expand across various domains, with flocking offering enhanced cooperative capabilities but introducing significant challenges during initial formation. Existing flocking algorithms often struggle with efficiency and scalability, particularly when potential collisions force drones into suboptimal trajectories. This paper presents a time-efficient prioritised scheduling algorithm that improves the initial formation process of drone flocks. The method assigns each drone a priority based on its number of potential collisions and its likelihood of reaching its target position without permanently obstructing other drones. Using this hierarchy, each drone computes an appropriate delay to ensure a collision-free path. Simulation results show that the proposed algorithm successfully generates collision-free trajectories for flocks of up to 5000 drones and outperforms the coupling-degree-based heuristic prioritised planning method (CDH-PP) in both performance and computational efficiency.
This study introduces a novel data-driven framework and the first county-scale application of Spatio-Temporal Graph Neural Networks (STGNN) to forecast composite sustainability indices from herd-level operational records. The methodology employs an end-to-end pipeline utilizing a Variational Autoencoder (VAE) to augment Irish Cattle Breeding Federation (ICBF) datasets, preserving joint distributions while mitigating sparsity. A pillar-based scoring formulation is derived via Principal Component Analysis, identifying four pillars -- Reproductive Efficiency, Genetic Management, Herd Health, and Herd Management -- to construct weighted composite indices. These indices are modelled using an STGNN architecture that explicitly encodes geographic dependencies and non-linear temporal dynamics to generate multi-year forecasts for 2026-2030.
An augmented system model provides an effective way to model non-Markovian quantum systems, which is useful in filtering and control for this class of systems. However, since a large number of ancillary quantum oscillators representing internal modes of a non-Markovian environment directly interact with the principal system in these models, the dimension of the augmented system may be very large, causing a significant computational burden in designing filters and controllers. In this context, this paper proposes an $\mathscr{H}_2$ model reduction method for the augmented model of linear non-Markovian quantum systems. We first establish necessary and sufficient conditions for the physical realizability of the augmented model of linear non-Markovian quantum systems, which are more stringent than those for Markovian quantum systems. However, these physical realizability conditions of the augmented system model pose non-convex constraints in the optimization problem of model reduction, which makes the problem different from the corresponding classical model reduction problem. To solve the problem, we derive necessary conditions for determining the input matrix in the reduced model, with which a theorem for designing the system matrix of the ancillary system in the reduced model is proved. Building on this, we convert the nonlinear equality constraints into inequality constraints so that a semidefinite programming algorithm can be developed to solve the optimization problem for model reduction. A numerical example of a two-mode linear quantum system driven by three internal modes of a non-Markovian environment validates the effectiveness of our method.
Controllable Markov chains describe the dynamics of sequential decision-making tasks and are the central component in optimal control and reinforcement learning. In this work, we give the general form of an optimal policy for learning controllable dynamics in an unknown environment by exploring over a limited time horizon. This policy is simple to implement and efficient to compute, and allows an agent to ``learn by exploring'' as it maximizes its information gain in a greedy fashion by selecting controls from a constraint set that changes over time during exploration. We give a simple parameterization for the set of controls, and present an algorithm for finding an optimal policy. This form of policy is necessary because of the existence of certain types of states that restrict control of the dynamics, such as transient states, absorbing states, and non-backtracking states. We show why the occurrence of these states makes a non-stationary policy essential for achieving optimal exploration. Six interesting examples of controllable dynamics are treated in detail. Policy optimality is demonstrated using counting arguments, comparing with suboptimal policies, and by making use of a sequential improvement property from dynamic programming.
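A count-based sketch of greedy exploration of an unknown controllable chain. The least-visited-action rule is a simple proxy for greedy information gain, and the admissible set is kept static here, whereas the paper's constraint set evolves during exploration; none of this is the paper's exact policy.

```python
import numpy as np

# Explore an unknown controllable Markov chain: maintain Dirichlet counts
# for every (state, action) transition row and pick the least-explored
# admissible action (a crude proxy for maximal expected information gain).
rng = np.random.default_rng(6)
S, A_n = 5, 3
P_true = rng.dirichlet(np.ones(S), size=(S, A_n))  # hidden from the agent
counts = np.ones((S, A_n, S))                      # uniform Dirichlet prior

s = 0
for t in range(2000):
    admissible = range(A_n)   # the paper's constraint set changes over time
    a = min(admissible, key=lambda a: counts[s, a].sum())
    s_next = rng.choice(S, p=P_true[s, a])
    counts[s, a, s_next] += 1
    s = s_next
err = np.abs(counts / counts.sum(-1, keepdims=True) - P_true).mean()
print(f"mean transition-estimate error after exploration: {err:.4f}")
```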
Recent advances in learned image codecs have been extended from human perception toward machine perception. However, progressive image compression with fine granular scalability (FGS)-which enables decoding a single bitstream at multiple quality levels-remains unexplored for machine-oriented codecs. In this work, we propose a novel progressive learned image compression codec for machine perception, PICM-Net, based on trit-plane coding. By analyzing the difference between human- and machine-oriented rate-distortion priorities, we systematically examine the latent prioritization strategies in terms of machine-oriented codecs. To further enhance real-world adaptability, we design an adaptive decoding controller, which dynamically determines the necessary decoding level during inference time to maintain the desired confidence of downstream machine prediction. Extensive experiments demonstrate that our approach enables efficient and adaptive progressive transmission while maintaining high performance in the downstream classification task, establishing a new paradigm for machine-aware progressive image compression.
High-fidelity spectrum cartography is pivotal for spectrum management and wireless situational awareness, yet it remains a challenging ill-posed inverse problem due to the sparsity and irregularity of observations. Furthermore, existing approaches often decouple reconstruction from sensing, lacking a principled mechanism for informative sampling. To address these limitations, this paper proposes a unified diffusion-based Bayesian framework that jointly addresses spectrum reconstruction and active sensing. We formulate the reconstruction task as a conditional generation process driven by a learned diffusion prior. Specifically, we derive tractable, closed-form posterior transition kernels for the reverse diffusion process, which enforce consistency with both linear Gaussian and non-linear quantized measurements. Leveraging the intrinsic probabilistic nature of diffusion models, we further develop an uncertainty-aware active sampling strategy. This strategy quantifies reconstruction uncertainty to adaptively guide sensing agents toward the most informative locations, thereby maximizing spectral efficiency. Extensive experiments demonstrate that the proposed framework significantly outperforms state-of-the-art interpolation, sparsity-based, and deep learning baselines in terms of reconstruction accuracy, sampling efficiency, and robustness to low-bit quantization.
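A minimal sketch of the uncertainty-aware sampling loop: treat an ensemble of reconstructions as posterior samples, score per-location variance, and sense where the variance peaks. The ensemble here is synthetic rather than diffusion-generated, and all sizes are illustrative.

```python
import numpy as np

# Uncertainty-guided active sensing on a toy spectrum map.
rng = np.random.default_rng(7)
H, W, n_samples = 32, 32, 16
observed = np.zeros((H, W), dtype=bool)
observed[rng.integers(0, H, 40), rng.integers(0, W, 40)] = True

# Stand-in posterior ensemble: samples agree where observed, spread elsewhere
# (a conditional diffusion model would supply these samples in the paper).
ensemble = rng.standard_normal((n_samples, H, W))
ensemble[:, observed] = rng.standard_normal(observed.sum())  # shared values

uncertainty = ensemble.var(axis=0)
uncertainty[observed] = -np.inf              # never re-sample known cells
next_loc = np.unravel_index(np.argmax(uncertainty), (H, W))
print("next sensing location:", next_loc)
```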
Deteriorating civil infrastructure requires automated inspection techniques that overcome the limitations of visual assessment. While Ground Penetrating Radar and Infrared Thermography enable subsurface defect detection, single-modal approaches face complementary constraints: radar struggles with moisture and shallow defects, while thermography exhibits weather dependency and limited depth. This paper presents a multi-modal attention network fusing radar temporal patterns with thermal spatial signatures for bridge deck delamination detection. Our architecture introduces temporal attention for radar processing, spatial attention for thermal features, and cross-modal fusion with learnable embeddings, discovering complementary defect patterns invisible to individual sensors. We incorporate uncertainty quantification through Monte Carlo dropout and learned variance estimation, decomposing uncertainty into epistemic and aleatoric components for safety-critical decisions. Experiments on five bridge datasets reveal that, on balanced to moderately imbalanced data, our approach substantially outperforms baselines in accuracy and AUC, representing meaningful improvements over single-modal and concatenation-based fusion. Ablation studies demonstrate that cross-modal attention provides critical gains beyond within-modality attention, while multi-head mechanisms achieve improved calibration. Uncertainty quantification reduces calibration error, enabling selective prediction by rejecting uncertain cases. However, under extreme class imbalance, attention mechanisms show vulnerability to majority-class collapse. These findings provide actionable guidance: the attention-based architecture performs well across typical scenarios, while extreme imbalance requires specialized techniques. Our system maintains deployment efficiency, enabling real-time inspection with characterized capabilities and limitations.
Audio-Visual Segmentation (AVS) aims to localize sound-producing objects at the pixel level by jointly leveraging auditory and visual information. However, existing methods often suffer from multi-source entanglement and audio-visual misalignment, which lead to biases toward louder or larger objects while overlooking weaker, smaller, or co-occurring sources. To address these challenges, we propose DDAVS, a Disentangled Audio Semantics and Delayed Bidirectional Alignment framework. To mitigate multi-source entanglement, DDAVS employs learnable queries to extract audio semantics and anchor them within a structured semantic space derived from an audio prototype memory bank. This is further optimized through contrastive learning to enhance discriminability and robustness. To alleviate audio-visual misalignment, DDAVS introduces dual cross-attention with delayed modality interaction, improving the robustness of multimodal alignment. Extensive experiments on the AVS-Objects and VPO benchmarks demonstrate that DDAVS consistently outperforms existing approaches, exhibiting strong performance across single-source, multi-source, and multi-instance scenarios. These results validate the effectiveness and generalization ability of our framework under challenging real-world audio-visual segmentation conditions. Project page: this https URL
Recent advancements in joint speech-text models show great potential for seamless voice interactions. However, existing models face critical challenges: temporal resolution mismatch between speech tokens (25 Hz) and text tokens (~3 Hz) dilutes semantic information, incurs high computational costs, and causes catastrophic forgetting of text LLM knowledge. We introduce Fun-Audio-Chat, a Large Audio Language Model addressing these limitations via two innovations from our previous work DrVoice. First, Dual-Resolution Speech Representations (DRSR): the Shared LLM processes audio at an efficient 5 Hz (via token grouping), while the Speech Refined Head generates high-quality tokens at 25 Hz, balancing efficiency (~50% GPU reduction) and quality. Second, Core-Cocktail Training, a two-stage fine-tuning with intermediate merging that mitigates catastrophic forgetting. We then apply Multi-Task DPO Training to enhance robustness, audio understanding, instruction-following and voice empathy. This multi-stage post-training enables Fun-Audio-Chat to retain text LLM knowledge while gaining powerful audio understanding, reasoning, and generation. Unlike recent LALMs requiring large-scale audio-text pre-training, Fun-Audio-Chat leverages pre-trained models and extensive post-training. Fun-Audio-Chat 8B and MoE 30B-A3B achieve competitive performance on Speech-to-Text and Speech-to-Speech tasks, ranking top among similar-scale models on Spoken QA benchmarks. They also achieve competitive to superior performance on Audio Understanding, Speech Function Calling, Instruction-Following and Voice Empathy. We develop Fun-Audio-Chat-Duplex, a full-duplex variant with strong performance on Spoken QA and full-duplex interactions. We open-source Fun-Audio-Chat-8B with training and inference code, and provide an interactive demo.
This paper presents a robust multi-channel speaker extraction algorithm designed to handle inaccuracies in reference information. While existing approaches often rely solely on either spatial or spectral cues to identify the target speaker, our method integrates both sources of information to enhance robustness. A key aspect of our approach is its emphasis on stability, ensuring reliable performance even when one of the features is degraded or misleading. Given a noisy mixture and two potentially unreliable cues, a dedicated network is trained to dynamically balance their contributions -- or disregard the less informative one when necessary. We evaluate the system under challenging conditions by simulating inference-time errors using a simple direction of arrival (DOA) estimator and a noisy spectral enrollment process. Experimental results demonstrate that the proposed model successfully extracts the desired speaker even in the presence of substantial reference inaccuracies.
Most existing image compression approaches perform transform coding in the pixel space to reduce its spatial redundancy. However, they encounter difficulties in achieving both high-realism and high-fidelity at low bitrate, as the pixel-space distortion may not align with human perception. To address this issue, we introduce a Generative Latent Coding (GLC) architecture, which performs transform coding in the latent space of a generative vector-quantized variational auto-encoder (VQ-VAE), instead of in the pixel space. The generative latent space is characterized by greater sparsity, richer semantic and better alignment with human perception, rendering it advantageous for achieving high-realism and high-fidelity compression. Additionally, we introduce a categorical hyper module to reduce the bit cost of hyper-information, and a code-prediction-based supervision to enhance the semantic consistency. Experiments demonstrate that our GLC maintains high visual quality with less than 0.04 bpp on natural images and less than 0.01 bpp on facial images. On the CLIC2020 test set, we achieve the same FID as MS-ILLM with 45% fewer bits. Furthermore, the powerful generative latent space enables various applications built on our GLC pipeline, such as image restoration and style transfer. The code is available at this https URL.
Large language models (LLMs) rely on self-attention for contextual understanding, demanding high-throughput inference and large-scale token parallelism (LTPP). Existing dynamic sparsity accelerators falter under LTPP scenarios due to stage-isolated optimizations. Revisiting the end-to-end sparsity acceleration flow, we identify an overlooked opportunity: cross-stage coordination can substantially reduce redundant computation and memory access. We propose STAR, a cross-stage compute- and memory-efficient algorithm-hardware co-design tailored for Transformer inference under LTPP. STAR introduces a leading-zero-based sparsity prediction using log-domain add-only operations to minimize prediction overhead. It further employs distributed sorting and a sorted updating FlashAttention mechanism, guided by a coordinated tiling strategy that enables fine-grained stage interaction for improved memory efficiency and latency. These optimizations are supported by a dedicated STAR accelerator architecture, achieving up to 9.2$\times$ speedup and 71.2$\times$ energy efficiency over A100, and surpassing SOTA accelerators by up to 16.1$\times$ energy and 27.1$\times$ area efficiency gains. Further, we deploy STAR onto a multi-core spatial architecture, optimizing dataflow and execution orchestration for ultra-long sequence processing. Architectural evaluation shows that, compared to the baseline design, Spatial-STAR achieves a 20.1$\times$ throughput improvement.
Neural vocoders and codecs reconstruct waveforms from acoustic representations, which directly impacts the audio quality. Among existing methods, upsampling-based time-domain models are superior in both inference speed and synthesis quality, achieving state-of-the-art performance. Still, despite their success in producing perceptually natural sound, their synthesis fidelity remains limited by aliasing artifacts introduced by inadequately designed model architectures. In particular, the unconstrained nonlinear activation generates an infinite number of harmonics that exceed the Nyquist frequency, resulting in ``folded-back'' aliasing artifacts. The widely used upsampling layer, ConvTranspose, copies mirrored low-frequency components to fill the empty high-frequency region, resulting in ``mirrored'' aliasing artifacts. Meanwhile, the combination of its inherent periodicity and the mirrored DC bias also brings the ``tonal artifact,'' resulting in constant-frequency ringing. This paper aims to solve these issues from a signal processing perspective. Specifically, we apply oversampling and anti-derivative anti-aliasing to the activation function to obtain its anti-aliased form, and replace the problematic ConvTranspose layer with resampling to avoid the ``tonal artifact'' and eliminate aliased components. Based on our proposed anti-aliased modules, we introduce Pupu-Vocoder and Pupu-Codec, and release high-quality pre-trained checkpoints to facilitate audio generation research. We build a test-signal benchmark to illustrate the effectiveness of the anti-aliased modules, and conduct experiments on speech, singing voice, music, and general audio to validate our proposed models. Experimental results confirm that our lightweight Pupu-Vocoder and Pupu-Codec models can easily outperform existing systems on singing voice, music, and general audio, while achieving comparable performance on speech.
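To make the resampling remedy concrete, here is a minimal sketch of the oversampling half of the recipe (the anti-derivative anti-aliasing step is not reproduced): the nonlinearity is applied at an assumed 4x rate, so harmonics above the original Nyquist are filtered out on the way back down rather than folding into the baseband. The function and signal below are illustrative stand-ins, not the Pupu-Vocoder implementation.

```python
import numpy as np
from scipy.signal import resample_poly

def antialiased_activation(x, act=np.tanh, os=4):
    """Apply `act` at `os` times the original rate, then band-limit back
    down, so harmonics beyond the original Nyquist are filtered out
    instead of folding back ("folded-back" aliasing)."""
    x_up = resample_poly(x, up=os, down=1)     # low-pass interpolation
    y_up = act(x_up)                           # nonlinearity at the high rate
    return resample_poly(y_up, up=1, down=os)  # anti-alias filter + decimate

# A sine near Nyquist is a worst case for activation-induced aliasing.
fs = 16000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 6000 * t)
y = antialiased_activation(x)
```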
Multimodal brain decoding aims to reconstruct semantic information consistent with visual stimuli from brain activity signals such as fMRI, and then generate readable natural language descriptions. However, multimodal brain decoding still faces key challenges in cross-subject generalization and interpretability. We propose the BrainROI model, which achieves leading results in brain-captioning evaluation on the NSD dataset; under the cross-subject setting, it shows clear improvements on metrics such as BLEU-4 and CIDEr compared with recent state-of-the-art methods and representative baselines. Firstly, to address the heterogeneity of functional brain topology across subjects, we design a new fMRI encoder. We use multi-atlas soft functional parcellations (soft-ROI) as a shared space, extend the discrete ROI-concatenation strategy in MINDLLM to a voxel-wise gated fusion mechanism (Voxel-gate), and ensure consistent ROI mapping through global label alignment, which enhances cross-subject transferability. Secondly, to overcome the limitations of manual and black-box prompting methods in stability and transparency, we introduce an interpretable prompt optimization process. In a small-sample closed loop, we use a locally deployed Qwen model to iteratively generate and select human-readable prompts. This process improves the stability of prompt design and preserves an auditable optimization trajectory. Finally, we impose parameterized decoding constraints during inference to further improve the stability and quality of the generated descriptions.
Unified hyperspectral image (HSI) restoration aims to recover various degraded HSIs using a single model, offering great practical value. However, existing methods often depend on explicit degradation priors (e.g., degradation labels) as prompts to guide restoration, which are difficult to obtain due to complex and mixed degradations in real-world scenarios. To address this challenge, we propose a Degradation-Aware Metric Prompting (DAMP) framework. Instead of relying on predefined degradation priors, we design spatial-spectral degradation metrics to continuously quantify multi-dimensional degradations, serving as Degradation Prompts (DP). These DP enable the model to capture cross-task similarities in degradation distributions and enhance shared feature learning. Furthermore, we introduce a Spatial-Spectral Adaptive Module (SSAM) that dynamically modulates spatial and spectral feature extraction through learnable parameters. By integrating SSAM as experts within a Mixture-of-Experts architecture, and using DP as the gating router, the framework enables adaptive, efficient, and robust restoration under diverse, mixed, or unseen degradations. Extensive experiments on natural and remote sensing HSI datasets show that DAMP achieves state-of-the-art performance and demonstrates exceptional generalization capability. Code is publicly available at this https URL.
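As a rough illustration of the gating idea, the sketch below wires a degradation-prompt vector into a softmax router over a few convolutional experts; the layer sizes, prompt dimensionality, and expert structure are assumptions for illustration, not the actual SSAM design.

```python
import torch
import torch.nn as nn

class PromptGatedMoE(nn.Module):
    """Mixture-of-experts block whose router is driven by a continuous
    degradation prompt rather than by the input features themselves."""
    def __init__(self, dim=64, prompt_dim=8, n_experts=4):
        super().__init__()
        # Each "expert" is a stand-in for a spatial-spectral adaptive module.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.GELU())
             for _ in range(n_experts)])
        # The degradation prompt acts as the gating router.
        self.router = nn.Sequential(nn.Linear(prompt_dim, n_experts),
                                    nn.Softmax(dim=-1))

    def forward(self, feat, prompt):
        w = self.router(prompt)                                    # (B, E)
        outs = torch.stack([e(feat) for e in self.experts], dim=1) # (B, E, C, H, W)
        return (w[:, :, None, None, None] * outs).sum(dim=1)

feat = torch.randn(2, 64, 32, 32)
prompt = torch.randn(2, 8)   # continuous spatial-spectral degradation metrics
out = PromptGatedMoE()(feat, prompt)
```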
The objective of this paper is to jointly synthesize interactive videos and conversational speech from text and reference images. With the ultimate goal of building human-like conversational systems, recent studies have explored talking or listening head generation as well as conversational speech generation. However, these works are typically studied in isolation, overlooking the multimodal nature of human conversation, which involves tightly coupled audio-visual interactions. In this paper, we introduce TAVID, a unified framework that generates both interactive faces and conversational speech in a synchronized manner. TAVID integrates face and speech generation pipelines through two cross-modal mappers (i.e., a motion mapper and a speaker mapper), which enable bidirectional exchange of complementary information between the audio and visual modalities. We evaluate our system across four dimensions: talking face realism, listening head responsiveness, dyadic interaction fluency, and speech quality. Extensive experiments demonstrate the effectiveness of our approach across all these aspects.
The parallel advances in language modeling and speech representation learning have raised the prospect of learning language directly from speech without textual intermediates. This requires extracting semantic representations directly from speech. Our contributions are threefold. First, we introduce SpidR, a self-supervised speech representation model that efficiently learns representations with highly accessible phonetic information, which makes it particularly suited for textless spoken language modeling. It is trained on raw waveforms using a masked prediction objective combined with self-distillation and online clustering. The intermediate layers of the student model learn to predict assignments derived from the teacher's intermediate layers. This learning objective stabilizes the online clustering procedure compared to previous approaches, resulting in higher quality codebooks. SpidR outperforms wav2vec 2.0, HuBERT, WavLM, and DinoSR on downstream language modeling benchmarks (sWUGGY, sBLIMP, tSC). Second, we systematically evaluate across models and layers the correlation between speech unit quality (ABX, PNMI) and language modeling performance, validating these metrics as reliable proxies. Finally, SpidR significantly reduces pretraining time compared to HuBERT, requiring only one day of pretraining on 16 GPUs, instead of a week. This speedup is enabled by the pretraining method and an efficient codebase, which allows faster iteration and easier experimentation. We open-source the training code and model checkpoints at this https URL.
High-mobility wireless communication systems suffer from severe Doppler spread and multi-path delay, which degrade the reliability and spectral efficiency of conventional modulation schemes. Orthogonal time frequency space (OTFS) modulation offers strong robustness in such environments by representing symbols in the delay-Doppler (DD) domain, while faster-than-Nyquist (FTN) signaling can further enhance spectral efficiency through intentional symbol packing. Meanwhile, reconfigurable intelligent surfaces (RIS) provide a promising means to improve link quality via passive beamforming. Motivated by these advantages, we propose a novel RIS-empowered OTFS modulation with FTN signaling (RIS-OTFS-FTN) scheme. First, we establish a unified DD-domain input-output relationship that jointly accounts for RIS passive beamforming, FTN-induced inter-symbol interference, and DD-domain channel characteristics. Based on this model, we provide a comprehensive analytical performance characterization covering the frame error rate, spectral efficiency, and peak-to-average power ratio (PAPR), among other metrics. Furthermore, a practical RIS phase adjustment strategy with quantized phase selection is designed to maximize the effective channel gain. Extensive Monte Carlo simulations under a standardized extended vehicular A (EVA) channel model validate the theoretical results and provide key insights into the trade-offs among spectral efficiency, PAPR, input back-off (IBO), and error performance, with some interesting findings. The proposed RIS-OTFS-FTN scheme demonstrates notable performance gains in both reliability and spectral efficiency, offering a viable solution for future high-mobility and spectrum-constrained wireless systems.
Recent advances in generative audio models have enabled high-fidelity environmental sound synthesis, raising serious concerns for audio security. The ESDD 2026 Challenge therefore addresses environmental sound deepfake detection under unseen generators (Track 1) and black-box low-resource detection (Track 2) conditions. We propose EnvSSLAM-FFN, which integrates a frozen SSLAM self-supervised encoder with a lightweight FFN back-end. To effectively capture spoofing artifacts under severe data imbalance, we fuse intermediate SSLAM representations from layers 4-9 and adopt a class-weighted training objective. Experimental results show that the proposed system consistently outperforms the official baselines on both tracks, achieving Test Equal Error Rates (EERs) of 1.20% and 1.05%, respectively.
Pinching antennas enable dynamic control of electromagnetic wave propagation through reconfigurable radiating structures, but selecting an optimal subset of antennas remains a combinatorial problem with exponential complexity. This letter considers antenna subset selection for a waveguide-fed pinching antenna array serving ground users under a time-division access scheme. The achievable rate depends on the coherent superposition of the effective complex channel gains and is therefore highly sensitive to the relative phase alignment of the activated antennas. To address the prohibitive complexity of exhaustive search, we propose a Viterbi state selection (VSS) algorithm that exploits the phase structure of the combined received signal. The trellis state is defined by a quantized representation of the phase of the accumulated complex gain, and a Viterbi-based survivor rule is used to prune dominated antenna subsets across stages. Numerical results demonstrate that the proposed method achieves the same antenna selection and rate as exhaustive search, while reducing the computational complexity from exponential to polynomial in the number of available antennas.
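The following toy sketch illustrates the trellis idea in Python: the state is the quantized phase of the accumulated complex gain, and per state only the survivor with the largest accumulated magnitude is kept while each antenna is either activated or skipped. The quantization level, objective (coherent gain magnitude), and tie-breaking rule are simplifying assumptions, not the paper's exact VSS specification.

```python
import numpy as np

def viterbi_subset(gains, n_states=32):
    """gains: complex effective channel gains of candidate antennas.
    State = quantized phase of the accumulated complex gain; per state,
    only the survivor with the largest accumulated magnitude is kept."""
    survivors = {0: (0j, [])}          # state -> (accumulated gain, subset)
    for k, g in enumerate(gains):      # one trellis stage per antenna
        new = {}
        for acc, subset in survivors.values():
            for use in (False, True):  # skip or activate antenna k
                acc2 = acc + g if use else acc
                s = int(np.angle(acc2) / (2 * np.pi) * n_states) % n_states
                cand = (acc2, subset + [k] if use else subset)
                if s not in new or abs(acc2) > abs(new[s][0]):
                    new[s] = cand      # prune dominated subsets per state
        survivors = new
    return max(survivors.values(), key=lambda v: abs(v[0]))

rng = np.random.default_rng(0)
gains = np.exp(1j * rng.uniform(0, 2 * np.pi, 10))  # unit-gain toy channel
best_gain, best_subset = viterbi_subset(gains)
print(abs(best_gain), best_subset)
```

The complexity is polynomial because the number of survivors per stage is capped by the number of phase states, rather than growing with the $2^K$ possible subsets.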
Cooperative collision avoidance between robots in swarm operations remains an open challenge. Assuming a decentralized architecture, each robot is responsible for making its own control decisions, including motion planning. To this end, most existing approaches rely on some form of (wireless) communication between the agents of the swarm. In reality, however, communication is brittle. It may be affected by latency, delays and packet losses, and transmission faults, and is subject to adversarial attacks, such as jamming or spoofing. This paper proposes Contingency Model-based Control (CMC) as a communicationless alternative. It follows the implicit cooperation paradigm, under which the design of the robots is based on consensual (offline) rules, similar to traffic rules. These include the definition of a contingency trajectory for each robot and a method for constructing mutual collision avoidance constraints. The setup is shown to guarantee recursive feasibility and collision avoidance between all swarm members in closed-loop operation. Moreover, CMC naturally satisfies the Plug \& Play paradigm, i.e., it accommodates new robots entering the swarm. Two numerical examples demonstrate that the collision avoidance guarantee remains intact and that the robot swarm operates smoothly under the CMC regime.
We demonstrate a dual optical frequency comb concept that down-converts arbitrary narrowband D-band (110-170 GHz) signals to baseband without any filter or optical/RF frequency tuning, using only low-frequency RF components.
Most current assessments use ex post proxies that miss uncertainty and fail to consistently capture the rapid change in bitcoin mining. We introduce a unified, ex ante statistical model that derives expected return, downside risk, and upside profit potential from the first principles of mining: each hash is a Bernoulli trial whose success probability is set by the Bitcoin block difficulty. The model yields closed-form expected revenue per hash-rate unit, risk metrics in different scenarios, and upside-profit probabilities for different fleet sizes. Empirical calibration closely matches previously reported observations, yielding a unified, faithful quantification across hardware, pools, and operating conditions. This foundation enables more reliable analysis of mining impacts and behavior.
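For intuition, a back-of-the-envelope version of this model is sketched below, using the standard relation that a single hash succeeds with probability $1/(D \cdot 2^{32})$ at difficulty $D$, so block counts over a window are approximately Poisson. The difficulty, hash-rate, and reward values are illustrative placeholders, not calibrated inputs from the paper.

```python
import math

def mining_stats(hashrate, hours, difficulty, reward_btc):
    p = 1.0 / (difficulty * 2**32)    # success probability per hash (Bernoulli trial)
    n = hashrate * hours * 3600       # number of trials in the window
    lam = n * p                       # expected blocks (Poisson intensity)
    return {
        "expected_blocks": lam,
        "expected_revenue_btc": lam * reward_btc,
        "p_at_least_one_block": 1 - math.exp(-lam),  # upside probability
    }

# e.g. a 1 PH/s fleet over 30 days (placeholder difficulty/reward values)
print(mining_stats(hashrate=1e15, hours=24 * 30,
                   difficulty=100e12, reward_btc=3.125))
```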
As systems engineering (SE) objectives evolve from the design and operation of monolithic systems to complex Systems of Systems (SoS), the discipline of Mission Engineering (ME) has emerged and is increasingly accepted as a new line of thinking for the SE community. Moreover, mission environments are uncertain and dynamic, and mission outcomes are a direct function of how the mission assets interact with this environment. This renders static architectures brittle and calls for analytically rigorous approaches to ME. To that end, this paper proposes an intelligent mission coordination methodology that integrates digital mission models with Reinforcement Learning (RL), specifically addressing the need for adaptive task allocation and reconfiguration. More specifically, we leverage a Digital Engineering (DE) based infrastructure composed of a high-fidelity digital mission model and agent-based simulation; we then formulate the mission tactics management problem as a Markov Decision Process (MDP) and employ an RL agent trained via Proximal Policy Optimization. By leveraging the simulation as a sandbox, we map system states to actions, refining the policy based on realized mission outcomes. The utility of the RL-based intelligent mission coordinator is demonstrated through an aerial firefighting case study. Our findings indicate that the RL-based intelligent mission coordinator not only surpasses baseline performance but also significantly reduces the variability in mission performance. Thus, this study serves as a proof of concept demonstrating that DE-enabled mission simulations combined with advanced analytical tools offer a mission-agnostic framework for improving ME practice, one that can be extended to more complicated fleet design and selection problems from a mission-first perspective.
Modeling of multiple-scattering channels in atmospheric turbulence is essential for the performance analysis of long-distance non-line-of-sight (NLOS) ultraviolet (UV) communications. Existing works on turbulent channel modeling for NLOS UV communications either focus on single-scattering cases or estimate the turbulent fluctuation effect unreliably using the Monte-Carlo simulation (MCS) approach. In this paper, we establish a comprehensive turbulent multiple-scattering channel model for NLOS UV communications using a more efficient Monte-Carlo integration (MCI) approach, in which scattering, absorption, and turbulence effects are all considered. Compared with the MCS approach, the MCI approach is more interpretable for estimating the turbulent fluctuation. To achieve this, we first introduce the scattering, absorption, and turbulence effects for NLOS UV communications in turbulent channels. We then propose MCI-based estimation methods for both the turbulent fluctuation and the distribution of the turbulent fading coefficient. Numerical results demonstrate that the turbulence-induced scattering effect can always be ignored in typical UV communication scenarios. Moreover, the turbulent fluctuation increases as either the communication distance increases or the zenith angle decreases, which is consistent with both existing experimental results and our own. Furthermore, we demonstrate numerically that the distribution of the turbulent fading coefficient for UV multiple-scattering channels under all turbulent conditions can be approximated as log-normal, and we demonstrate both numerically and experimentally that the turbulent fading can be approximated as Gaussian under weak turbulence.
Nonuniformly sampled signals are prevalent in real-world applications. However, estimating their power spectra from finite samples poses a significant challenge. The optimal solution, the Bronez Generalized Prolate Spheroidal Sequence (GPSS) obtained by solving the associated Generalized Eigenvalue Problem (GEP), is computationally intensive and thus impractical for large datasets. This paper describes a fast, nonparametric method: the Multiband-Multitaper Nonuniform Fast Fourier Transform (M$^2$NuFFT), which substantially reduces the computational burden while maintaining statistical efficiency. The algorithm partitions the signal frequency band into multiple sub-bands. Within each sub-band, optimal tapers are computed at a nominal analysis band and shifted to other analysis bands using the Nonuniform Fast Fourier Transform (NuFFT), avoiding repeated GEP computations. Spectral power within the analysis band is then estimated as the average power across the taper outputs. For the special case where the nominal band is centered at zero frequency, tapers can be approximated via cubic spline interpolation of the Discrete Prolate Spheroidal Sequence (DPSS), eliminating GEP computation entirely. This reduces the complexity from $O(N^4)$ to $O(N \log N + N \log(1/\epsilon))$. Statistical properties of the estimator, assessed using Bronez GPSS theory, reveal that the bias and variance bounds of the M$^2$NuFFT estimator are identical to those of the optimal estimator, while any degradation of the bias bound indicates the deviation from optimality. Finally, we propose an extension of the Thomson F-test for testing periodicity in nonuniform samples. The estimator's performance is validated on simulated and real-world data, demonstrating its practical applicability. The MATLAB code of the fast algorithm is available on GitHub (this https URL).
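As a point of reference for the taper-averaging principle, the sketch below computes a plain uniform-sampling multitaper estimate with SciPy's DPSS windows; it does not reproduce the GPSS/NuFFT machinery for nonuniform samples, and the bandwidth parameter and taper count follow common conventions rather than the paper's settings.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs, NW=4.0):
    """Average the periodograms of several orthogonal DPSS-tapered copies
    of x; averaging across tapers reduces estimator variance."""
    N = len(x)
    K = int(2 * NW) - 1                   # conventional taper count
    tapers = dpss(N, NW, Kmax=K)          # (K, N) Slepian tapers
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    psd = spectra.mean(axis=0) / fs       # average power across taper outputs
    freqs = np.fft.rfftfreq(N, d=1 / fs)
    return freqs, psd

fs = 1000.0
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 120 * t) + 0.5 * np.random.randn(t.size)
freqs, psd = multitaper_psd(x, fs)
```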
This paper introduces low-complexity frequency-dependent (memory) linearizers designed to suppress nonlinear distortion in analog-to-digital interfaces. Two different linearizers are considered, based on nonlinearity models which correspond to sampling before and after the nonlinearity operations, respectively. The proposed linearizers are inspired by convolutional neural networks but have an order-of-magnitude lower implementation complexity compared to existing neural-network-based linearizer schemes. The proposed linearizers can also outperform the traditional parallel Hammerstein (as well as Wiener) linearizers even when the nonlinearities have been generated through a Hammerstein model. Further, a design procedure is proposed in which the linearizer parameters are obtained through matrix inversion. This eliminates the need for costly and time-consuming iterative nonconvex optimization which is traditionally associated with neural network training. The design effectively handles a wide range of wideband multi-tone signals and filtered white noise. Examples demonstrate significant signal-to-noise-and-distortion ratio (SNDR) improvements of some $20$--$30$ dB, as well as a lower implementation complexity than the Hammerstein linearizers.
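To illustrate the closed-form, inversion-based fitting idea in the simplest possible setting, the sketch below fits a memory-polynomial post-linearizer by ordinary least squares; the basis, orders, and toy nonlinearity are assumptions for illustration and differ from the paper's convolutional-network-inspired structure.

```python
import numpy as np

def build_basis(y, memory=3, order=3):
    """Regression matrix whose columns are delayed copies of y raised to
    powers 1..order (a simple memory-polynomial basis)."""
    N = len(y)
    cols = []
    for m in range(memory):
        ym = np.concatenate([np.zeros(m), y[:N - m]])  # delay by m samples
        for p in range(1, order + 1):
            cols.append(ym ** p)
    return np.stack(cols, axis=1)        # (N, memory * order)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)            # clean reference signal
y = x + 0.1 * x**2 + 0.05 * x**3         # toy static nonlinearity

Phi = build_basis(y)
w, *_ = np.linalg.lstsq(Phi, x, rcond=None)  # closed-form fit, no iterations
x_hat = Phi @ w
print("residual RMS:", np.sqrt(np.mean((x_hat - x) ** 2)))
```

The point of the closed-form step is that the linearizer weights come from one linear solve rather than from iterative nonconvex training.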
Recent advances in automatic speech recognition (ASR) have combined speech encoders with large language models (LLMs) through projection, forming Speech LLMs with strong performance. However, adapting them to new domains remains challenging, especially in low-resource settings where paired speech-text data is scarce. We propose a text-only fine-tuning strategy for Speech LLMs using unpaired target-domain text without requiring additional audio. To preserve speech-text alignment, we introduce a real-time evaluation mechanism during fine-tuning. This enables effective domain adaptation while maintaining source-domain performance. Experiments on LibriSpeech, SlideSpeech, and Medical datasets show that our method achieves competitive recognition performance, with minimal degradation compared to full audio-text fine-tuning. It also improves generalization to new domains without catastrophic forgetting, highlighting the potential of text-only fine-tuning for low-resource domain adaptation of ASR.
Integrated sensing and communication (ISAC) networks strive to deliver both high-precision target localization and high-throughput data services across the entire coverage area. In this work, we examine the fundamental trade-off between sensing and communication from the perspective of base station (BS) deployment. Furthermore, we conceive a design that simultaneously maximizes the target localization coverage, while guaranteeing the desired communication performance. In contrast to existing schemes optimized for a single target, an effective network-level approach has to ensure consistent localization accuracy throughout the entire service area. While employing time-of-flight (ToF) based localization, we first analyze the deployment problem from a localization-performance coverage perspective, aiming to minimize the area Cramér-Rao Lower Bound (A-CRLB) to ensure uniformly high positioning accuracy across the service area. We prove that for a fixed number of BSs, uniformly scaling the service area by a factor $\kappa$ increases the optimal A-CRLB in proportion to $\kappa^{2\beta}$, where $\beta$ is the BS-to-target pathloss exponent. Based on this, we derive an approximate scaling law that links the achievable A-CRLB across the area of interest to the dimensionality of the sensing area. We also show that cooperative BSs extend the coverage but yield marginal A-CRLB improvement as the dimensionality of the sensing area grows.
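One way such a $\kappa^{2\beta}$ law can arise (a sketch under assumptions the abstract does not spell out: linear dimensions scale with $\kappa$, and the sensing link is two-way, so the one-way exponent $\beta$ applies twice): $\mathrm{SNR}(d) \propto d^{-2\beta}$ gives $\mathrm{CRLB}(d) \propto 1/\mathrm{SNR}(d) \propto d^{2\beta}$, and substituting $d \to \kappa d$ yields $\mathrm{CRLB} \to \kappa^{2\beta}\,\mathrm{CRLB}$.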
With the rapid development of Generative Artificial Intelligence (GAI) technology, Generative Diffusion Models (GDMs) have shown significant empowerment potential in the field of wireless networks due to advantages such as noise resistance, training stability, controllability, and multimodal generation. Although multiple studies have focused on GDMs for wireless networks, comprehensive reviews of their technological evolution are still lacking. Motivated by this, we systematically explore the application of GDMs in wireless networks. Firstly, we identify the core challenges of wireless networks and argue why GDMs are uniquely suited to address them. We then introduce the mathematical principles of GDMs and representative models. Furthermore, we organize our comprehensive review through a structured taxonomy that categorizes GDM-based schemes into sensing, transmission, and application planes, complemented by a security plane. For each representative scheme, we analyze its innovations, the role of GDMs, and its strengths and weaknesses. Finally, we distill the key challenges and provide potential solutions, with the aim of providing directional guidance for future research in this field.
This work identifies and attempts to address a fundamental limitation of implicit neural representations with sinusoidal activation: the fitting error of SIRENs is highly sensitive to the target frequency content and to the choice of initialization. In extreme cases, this sensitivity leads to a spectral bottleneck that can result in a zero-valued output. This phenomenon is characterized by analyzing the evolution of activation spectra and the empirical neural tangent kernel (NTK) during training. An unfavorable distribution of energy across frequency modes is found to give rise to this failure mode. Furthermore, the effect of Gaussian perturbations applied to the baseline uniformly initialized weights is examined, showing how these perturbations influence the activation spectra and the NTK eigenbasis of SIREN. Overall, initialization emerges as a central factor governing the evolution of SIRENs, indicating the need for adaptive, target-aware strategies as the target length increases and fine-scale detail becomes essential. The proposed weight initialization scheme (WINNER) is a simple ad hoc step in this direction, demonstrating that fitting accuracy can be significantly improved by shaping the spectral profile of network activations through target-aware initialization. The approach achieves state-of-the-art performance on audio fitting tasks and yields notable improvements on image fitting tasks.
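For context, the sketch below shows a SIREN sine layer with the frequency-scaled uniform initialization from the original SIREN paper, which is the baseline whose sensitivity is analyzed here; the WINNER scheme itself is not reproduced, and the layer sizes are arbitrary.

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """SIREN layer: sin(omega0 * (Wx + b)) with the standard init bounds."""
    def __init__(self, n_in, n_out, omega0=30.0, is_first=False):
        super().__init__()
        self.omega0 = omega0
        self.linear = nn.Linear(n_in, n_out)
        with torch.no_grad():
            if is_first:
                bound = 1.0 / n_in                      # first layer
            else:
                bound = math.sqrt(6.0 / n_in) / omega0  # hidden layers
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        # omega0 sets where the activation spectrum concentrates, which is
        # exactly why the fit is so sensitive to initialization.
        return torch.sin(self.omega0 * self.linear(x))

layer = SineLayer(1, 256, is_first=True)
t = torch.linspace(-1, 1, 1024).unsqueeze(-1)   # e.g. an audio time axis
h = layer(t)
```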
We propose DeepASA, a multi-purpose model for auditory scene analysis that performs multi-input multi-output (MIMO) source separation, dereverberation, sound event detection (SED), audio classification, and direction-of-arrival estimation (DoAE) within a unified framework. DeepASA is designed for complex auditory scenes where multiple, often similar, sound sources overlap in time and move dynamically in space. To achieve robust and consistent inference across tasks, we introduce an object-oriented processing (OOP) strategy. This approach encapsulates diverse auditory features into object-centric representations and refines them through a chain-of-inference (CoI) mechanism. The pipeline comprises a dynamic temporal kernel-based feature extractor, a transformer-based aggregator, and an object separator that yields per-object features. These features feed into multiple task-specific decoders. Our object-centric representations naturally resolve the parameter association ambiguity inherent in traditional track-wise processing. However, early-stage object separation can lead to failure in downstream ASA tasks. To address this, we implement temporal coherence matching (TCM) within the chain-of-inference, enabling multi-task fusion and iterative refinement of object features using estimated auditory parameters. We evaluate DeepASA on representative spatial audio benchmark datasets, including ASA2, MC-FUSS, and STARSS23. Experimental results show that our model achieves state-of-the-art performance across all evaluated tasks, demonstrating its effectiveness in both source separation and auditory parameter estimation under diverse spatial auditory scenes.
Dynamic line rating (DLR) enables greater utilization of existing transmission lines by leveraging real-time weather data. However, the elevated temperature operation (ETO) of conductors under DLR is often overlooked, despite its long-term impact on conductor health. This paper addresses this issue by 1) quantifying the depreciation costs associated with ETO and 2) proposing a Conductor Health-Aware Unit Commitment (CHA-UC) that internalizes these costs in operational decisions. The CHA-UC incorporates a robust linear approximation of conductor temperature and integrates the expected depreciation costs due to hourly ETO into the objective function. Case studies on the Texas 123-bus backbone test system using NOAA weather data demonstrate that the proposed CHA-UC model reduces the total cost by 0.8% and renewable curtailment by 84% compared to static line rating (SLR), while conventional DLR operation without risk consideration results in higher costs due to excessive ETO. Further analysis of the commitment decisions and the line temperature statistics confirms that the CHA-UC achieves safer line flows by shifting generator commitments. Finally, we examine the emergent correlation between wind generation and DLR forecast errors, and show that CHA-UC adaptively manages this effect by relaxing flows under risk-hedging conditions while tightening flows under risk-amplifying ones.
Accurate channel state information (CSI) is critical for current and next-generation multi-antenna systems, yet conventional pilot-based estimators incur prohibitive overhead as antenna counts grow. In this paper, we address this challenge by developing a novel framework based on Gaussian process regression (GPR) that predicts the full CSI from only a few observed entries, thereby reducing pilot overhead. The correlation between data points in GPR is defined by the covariance function, known as the kernel. In the proposed GPR-based CSI estimation framework, we incorporate three kernels, i.e., radial basis function, Matérn, and rational quadratic, to model smooth and multi-scale spatial correlations derived from the antenna array geometry. The proposed approach is evaluated across two channel models with three distinct pilot probing schemes. Results show that the proposed GPR with 50% pilot saving achieves the lowest prediction error, the highest empirical 95% credible-interval coverage, and the best preservation of spectral efficiency relative to the benchmarks.
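A minimal version of this pipeline can be sketched with scikit-learn, whose kernel classes match the three named above; the antenna positions and CSI below are synthetic stand-ins for the channel models and probing schemes in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

rng = np.random.default_rng(0)
pos = np.linspace(0, 1, 64)[:, None]          # antenna positions (toy ULA)
csi = np.sin(8 * np.pi * pos).ravel()         # toy spatially correlated CSI

obs = rng.choice(64, size=32, replace=False)  # observe half the entries
for kernel in (RBF(), Matern(nu=1.5), RationalQuadratic()):
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(pos[obs], csi[obs])
    mean, std = gpr.predict(pos, return_std=True)  # full CSI + uncertainty
    # 95% credible interval: mean +/- 1.96 * std
    rmse = np.sqrt(np.mean((mean - csi) ** 2))
    print(type(kernel).__name__, "RMSE:", rmse)
```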
State space models (SSMs) have recently emerged as a powerful framework for long sequence processing, outperforming traditional methods on diverse benchmarks. Fundamentally, SSMs can generalize both recurrent and convolutional networks and have been shown to even capture key functions of biological systems. Here we report an approach to implement SSMs in energy-efficient compute-in-memory (CIM) hardware to achieve real-time, event-driven processing. Our work re-parameterizes the model to function with real-valued coefficients and shared decay constants, reducing the complexity of model mapping onto practical hardware systems. By leveraging device dynamics and diagonalized state transition parameters, the state evolution can be natively implemented in crossbar-based CIM systems combined with memristors exhibiting short-term memory effects. Through this algorithm and hardware co-design, we show the proposed system offers both high accuracy and high energy efficiency while supporting fully asynchronous processing for event-based vision and audio tasks.
Single-channel audio separation aims to separate individual sources from a single-channel mixture. Most existing methods rely on supervised learning with synthetically generated paired data. However, obtaining high-quality paired data in real-world scenarios is often difficult. This data scarcity can degrade model performance under unseen conditions and limit generalization. In this work, we instead approach the problem from an unsupervised perspective, framing it as a probabilistic inverse problem. Our method requires only diffusion priors trained on individual sources. Separation is then achieved by iteratively guiding an initial state toward the solution through reconstruction guidance. Importantly, we introduce an advanced inverse-problem solver specifically designed for separation, which mitigates gradient conflicts caused by interference between the diffusion prior and the reconstruction guidance during inverse denoising. This design ensures high-quality and balanced separation performance across individual sources. Additionally, we find that initializing the denoising process with an augmented mixture instead of pure Gaussian noise provides an informative starting point that significantly improves the final performance. To further enhance audio prior modeling, we design a novel time-frequency attention-based network architecture with strong audio modeling capability. Collectively, these improvements lead to significant performance gains, as validated across speech-sound event, sound event, and speech separation tasks.
In recent years, the demand for image compression models for machine vision has increased dramatically. However, image compression training frameworks still focus on human vision, preserving excessive perceptual detail, and thus fall short of optimally reducing the bits per pixel when serving machine vision tasks. In this paper, we propose Semantic-based Low-bitrate Image compression for Machines by leveraging diffusion, termed SLIM. This is a new, effective training framework of image compression for machine vision that uses a pretrained latent diffusion model. The compressor model of our method focuses only on the Region-of-Interest (RoI) areas for machine vision in the image latent, compressing it compactly. The pretrained Unet model then enhances the decompressed latent, utilizing a RoI-focused text caption that contains semantic information of the image. SLIM is therefore able to focus on RoI areas of the image without any guide mask at the inference stage, achieving low bitrates when compressing. SLIM is also able to enhance a decompressed latent through denoising steps, so the final image reconstructed from the enhanced latent can be optimized for the machine vision task while still containing perceptual details for human vision. Experimental results show that SLIM achieves higher classification accuracy at the same bits-per-pixel level, compared to conventional image compression models for machines.
Network-based Global Navigation Satellite Systems (GNSS) underpin critical infrastructure and autonomous systems, yet typically rely on centralized processing hubs that limit scalability, resilience, and latency. Here we report a global-scale, decentralized GNSS architecture spanning hundreds of ground stations. By modeling the receiver network as a time-varying graph, we employ a deep linear neural network approach to learn topology-aware mixing schedules that optimize information exchange. This enables a gradient tracking diffusion strategy wherein stations execute local inference and exchange succinct messages to achieve two concurrent objectives: centimeter-level self-localization and network-wide consensus on satellite correction products. The consensus products are broadcast to user receivers as corrections, supporting precise point positioning (PPP) and precise point positioning-real-time kinematic (PPP-RTK). Numerical results demonstrate that our method matches the accuracy of centralized baselines while significantly outperforming existing decentralized methods in convergence speed and communication overhead. By reframing decentralized GNSS as a networked signal processing problem, our results pave the way for integrating decentralized optimization, consensus-based inference, and graph-aware learning as effective tools in operational satellite navigation.
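As a small illustration of the gradient-tracking ingredient, the sketch below runs gradient tracking on a fixed ring graph for a distributed least-squares problem; the Metropolis-style mixing weights, step size, and toy data are assumptions, and the learned topology-aware mixing schedules and GNSS measurement models are beyond this sketch.

```python
import numpy as np

n, d = 8, 3
rng = np.random.default_rng(1)
A = [rng.standard_normal((20, d)) for _ in range(n)]   # local regressors
theta = rng.standard_normal(d)                         # shared unknown
b = [Ai @ theta + 0.01 * rng.standard_normal(20) for Ai in A]

def grad(i, x):
    """Local least-squares gradient at station i."""
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# Doubly stochastic mixing matrix for a ring graph.
W = 0.5 * np.eye(n)
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

x = np.zeros((n, d))                                   # local estimates
g_old = np.array([grad(i, x[i]) for i in range(n)])
y = g_old.copy()                                       # gradient trackers
for _ in range(200):
    x = W @ x - 0.1 * y                                # mix, then descend
    g_new = np.array([grad(i, x[i]) for i in range(n)])
    y = W @ y + g_new - g_old                          # track the avg gradient
    g_old = g_new

print("max distance to truth:", np.linalg.norm(x - theta, axis=1).max())
```

Each station only exchanges its current estimate and tracker with its neighbors, which is the succinct-message, consensus-forming pattern the abstract describes.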
Finite element methods typically require a high resolution to satisfactorily approximate the micro and even macro patterns of an underlying physical model. This issue can be circumvented by appropriate multiscale strategies that obtain reasonable approximations on under-resolved scales. In this paper, we study implicit neural representations and propose a continuous super-resolution network as a correction strategy for multiscale effects. It takes coarse finite element data and learns both in-distribution and out-of-distribution high-resolution finite element predictions. Our highlight is the design of a local implicit transformer, which is able to learn multiscale features. We also propose Gabor wavelet-based coordinate encodings, which overcome the bias of neural networks toward learning low-frequency features. Finally, perception is often preferred over distortion, since scientists must recognize visual patterns for further investigation; implicit neural representations, however, are known to lack local pattern supervision. We therefore propose stochastic cosine similarities to compare local feature differences between prediction and ground truth, which yields better structural alignment. Our experiments show that the proposed strategy achieves superior performance as both an in-distribution and out-of-distribution super-resolution strategy.
Recent advances in deep learning-based point cloud registration have improved generalization, yet most methods still require retraining or manual parameter tuning for each new environment. In this paper, we identify three key factors limiting generalization: (a) reliance on environment-specific voxel size and search radius, (b) poor out-of-domain robustness of learning-based keypoint detectors, and (c) raw coordinate usage, which exacerbates scale discrepancies. To address these issues, we present a zero-shot registration pipeline called BUFFER-X by (a) adaptively determining voxel size/search radii, (b) using farthest point sampling to bypass learned detectors, and (c) leveraging patch-wise scale normalization for consistent coordinate bounds. In particular, we present a multi-scale patch-based descriptor generation and a hierarchical inlier search across scales to improve robustness in diverse scenes. We also propose a novel generalizability benchmark using 11 datasets that cover various indoor/outdoor scenarios and sensor modalities, demonstrating that BUFFER-X achieves substantial generalization without prior information or manual parameter tuning for the test datasets. Our code is available at this https URL.
Automatic Speech Recognition (ASR) error correction aims to correct recognition errors while preserving accurate text. Although traditional approaches demonstrate moderate effectiveness, LLMs offer a paradigm that eliminates the need for training and labeled data. However, directly using LLMs encounters the hallucination problem, which may lead to the modification of correct text. To address this problem, we propose the Reliable LLM Correction Framework (RLLM-CF), which consists of three stages: (1) error pre-detection, (2) chain-of-thought iterative correction over sub-tasks, and (3) reasoning process verification. The advantage of our method is that it requires no additional information or fine-tuning of the model, and it ensures the correctness of the LLM correction under multi-pass programming. Experiments on AISHELL-1, AISHELL-2, and Librispeech show that the GPT-4o model enhanced by our framework achieves relative CER/WER reductions of 21%, 11%, 9%, and 11.4%.
Accurate perception of the surrounding environment is essential for safe autonomous driving. 3D occupancy prediction, which estimates detailed 3D structures of roads, buildings, and other objects, is particularly important for vision-centric autonomous driving systems that do not rely on LiDAR sensors. However, in 3D semantic occupancy prediction -- where each voxel is assigned a semantic label -- annotated LiDAR point clouds are required, making data acquisition costly. In contrast, large-scale binary occupancy data, which only indicate occupied or free space without semantic labels, can be collected at a lower cost. Despite their availability, the potential of leveraging such data remains unexplored. In this study, we investigate the utilization of large-scale binary occupancy data from two perspectives: (1) pre-training and (2) learning-based auto-labeling. We propose a novel binary occupancy-based framework that decomposes the prediction process into binary and semantic occupancy modules, enabling effective use of binary occupancy data. Our experimental results demonstrate that the proposed framework outperforms existing methods in both pre-training and auto-labeling tasks, highlighting its effectiveness in enhancing 3D semantic occupancy prediction. The code will be available at this https URL
This paper presents a novel framework for automatically learning complete Stack-of-Tasks (SoT) controllers for redundant robotic systems, including task priorities, activation logic, and control parameters. Unlike classical SoT pipelines, where task hierarchies are manually defined and tuned, our approach optimizes the full SoT structure directly from a user-specified cost function encoding intuitive preferences such as safety, precision, manipulability, or execution speed. The method combines Genetic Programming with simulation-based evaluation to explore both discrete (priority order, task activation) and continuous (gains, trajectory durations) components of the controller. We validate the framework on a dual-arm mobile manipulator (the ABB mobile-YuMi research platform), demonstrating robust convergence across multiple cost definitions, automatic suppression of irrelevant tasks, and strong resilience to distractors. Learned SoTs exhibit expert-like hierarchical structure and adapt naturally to multi-objective trade-offs. Crucially, all controllers transfer from Gazebo simulation to the real robot, achieving safe and precise motion without additional tuning. Experiments in static and dynamic environments show reliable obstacle avoidance, high tracking accuracy, and predictable behavior in the presence of humans. The proposed method provides an interpretable and scalable alternative to manual SoT design, enabling rapid, user-driven generation of task execution hierarchies for complex robotic systems.
3D Gaussian Splatting (3DGS) has become a powerful representation for image-based object reconstruction, yet its performance drops sharply in sparse-view settings. Prior works address this limitation by employing diffusion models to repair corrupted renders, subsequently using them as pseudo ground truths for later optimization. While effective, such approaches incur heavy computation from the diffusion fine-tuning and repair steps. We present WaveletGaussian, a framework for more efficient sparse-view 3D Gaussian object reconstruction. Our key idea is to shift diffusion into the wavelet domain: diffusion is applied only to the low-resolution LL subband, while high-frequency subbands are refined with a lightweight network. We further propose an efficient online random masking strategy to curate training pairs for diffusion fine-tuning, replacing the commonly used, but inefficient, leave-one-out strategy. Experiments across two benchmark datasets, Mip-NeRF 360 and OmniObject3D, show WaveletGaussian achieves competitive rendering quality while substantially reducing training time.
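The wavelet split at the core of the framework can be sketched in a few lines with PyWavelets: only the low-resolution LL subband would be handed to the diffusion model, while the high-frequency subbands go to a lightweight refiner. Both processing steps are replaced by placeholders here.

```python
import numpy as np
import pywt

img = np.random.rand(256, 256).astype(np.float32)   # stand-in for a render

# One-level 2D DWT: LL is a half-resolution approximation; LH/HL/HH hold
# the horizontal/vertical/diagonal high-frequency detail.
LL, (LH, HL, HH) = pywt.dwt2(img, "haar")

repaired_LL = LL  # placeholder: diffusion repair would act on LL only
# (a lightweight network would refine LH, HL, HH in the actual framework)

recon = pywt.idwt2((repaired_LL, (LH, HL, HH)), "haar")
print(recon.shape, np.allclose(recon, img, atol=1e-5))
```

Running diffusion only on the quarter-size LL subband is what makes the repair step cheaper than operating on the full-resolution render.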
The trade-off between model fidelity and computational cost remains a central challenge in the computational modeling of extrusion-based 3D printing, particularly for real-time optimization and control. Although high-fidelity simulations have advanced considerably for offline analysis, dynamical modeling tailored for online, control-oriented applications is still significantly underdeveloped. In this study, we propose a reduced-order dynamical flow model that captures the transient behavior of extrusion-based 3D printing. The model is grounded in physics-based principles derived from the Navier-Stokes equations and further simplified through spatial averaging and input-dependent parameterization. To assess its performance, the model is identified via a nonlinear least squares approach using Computational Fluid Dynamics (CFD) simulation data spanning a range of printing conditions, and subsequently validated across multiple combinations of training and testing scenarios. The results demonstrate strong agreement with the CFD data within the nozzle, the nozzle-substrate gap, and the deposited-layer regions. Overall, the proposed reduced-order model successfully captures the dominant flow dynamics of the process while maintaining a level of simplicity compatible with real-time control and optimization.
This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system states and the lumped disturbance in real time, forming the foundation for effective disturbance compensation. To obtain near-optimal behavior without an accurate system description, a value-iteration-based Adaptive Dynamic Programming (ADP) method is adopted for policy approximation. The inclusion of the ETM ensures that parameter updates of the learning module are executed only when the state deviation surpasses a predefined bound, thereby preventing excessive learning activity and substantially reducing computational load. A Lyapunov-oriented analysis is used to characterize the stability properties of the resulting closed-loop system. Numerical experiments further confirm that the developed approach maintains strong control performance and disturbance tolerance, while achieving a significant reduction in sampling and processing effort compared with standard time-triggered ADP schemes.
Accurate reconstruction of cardiac anatomy from sparse clinical images remains a major challenge in patient-specific modeling. While neural implicit functions have previously been applied to this task, their application to mapping anatomical consistency across subjects has been limited. In this work, we introduce Neural Implicit Heart Coordinates (NIHCs), a standardized implicit coordinate system, based on universal ventricular coordinates, that provides a common anatomical reference frame for the human heart. Our method predicts NIHCs directly from a limited number of 2D segmentations (sparse acquisition) and subsequently decodes them into dense 3D segmentations and high-resolution meshes at arbitrary output resolution. Trained on a large dataset of 5,000 cardiac meshes, the model achieves high reconstruction accuracy on clinical contours, with mean Euclidean surface errors of 2.51$\pm$0.33 mm in a diseased cohort (n=4549) and 2.3$\pm$0.36 mm in a healthy cohort (n=5576). The NIHC representation enables anatomically coherent reconstruction even under severe slice sparsity and segmentation noise, faithfully recovering complex structures such as the valve planes. Compared with traditional pipelines, inference time is reduced from over 60 s to 5-15 s. These results demonstrate that NIHCs constitute a robust and efficient anatomical representation for patient-specific 3D cardiac reconstruction from minimal input data.