Lupus nephritis (LN) is a severe complication of systemic lupus erythematosus that affects pediatric patients with significantly greater severity and worse renal outcomes compared to adults. Despite the urgent clinical need, predicting pediatric LN prognosis remains unexplored in computational pathology. Furthermore, the only existing histopathology-based approach for LN relies on multiple costly staining protocols and fails to integrate complementary clinical data. To address these gaps, we propose the first multimodal computational pathology framework for three-class treatment response prediction (complete remission, partial response, and no response) in pediatric LN, utilizing only routine PAS-stained biopsies and structured clinical data. Our framework introduces two key methodological innovations. First, a Clinical-Injection Transformer (CIT) embeds clinical features as condition tokens into patch-level self-attention, facilitating implicit and bidirectional cross-modal interactions within a unified attention space. Second, we design a decoupled representation-knowledge adaptation strategy using a domain-adapted Masked Autoencoder (MAE). This strategy explicitly separates self-supervised morphological feature learning from pathological knowledge extraction. Additionally, we introduce a multi-granularity morphological type injection mechanism to bridge distilled classification knowledge with downstream prognostic predictions at both the instance and patient levels. Evaluated on a cohort of 71 pediatric LN patients with KDIGO-standardized labels, our method achieves a three-class accuracy of 90.1% and an AUC of 89.4%, demonstrating its potential as a highly accurate and cost-effective prognostic tool.
Reconfigurable intelligent surfaces (RISs) enable programmable control of wireless propagation. Beyond environmental deployments, integrating metasurfaces at the antenna front end allows direct manipulation of the radiated electromagnetic field and enables wave-domain signal processing. In this context, stacked intelligent metasurfaces (SIMs) have recently been proposed as an advanced architecture in which multiple programmable metasurface layers interact through wave propagation, enabling richer and more flexible electromagnetic transformations than conventional single-layer designs. By leveraging cascaded wave-matter interactions at the transmitter or receiver front end, SIMs substantially expand the design space of programmable wireless systems. This survey provides a comprehensive overview of SIM technologies from the electromagnetic processing perspective, covering their physical principles, modeling frameworks, hardware realizations, and emerging architectural designs. We review existing modeling approaches based on cascaded operators, multiport impedance formulations, and network parameter representations, and discuss their implications for scalable optimization and system design. The survey further examines key communication functionalities enabled by front-end metasurface processing, including communication performance optimization, near-field and wideband transmission, learning-driven control, integrated sensing and communications, and emerging architectures such as cell-free and non-terrestrial networks. Finally, we identify open research problems related to physical modeling, scalability, hardware-algorithm co-design, and network integration, and outline promising directions toward realizing SIM-based antenna front ends as fully programmable electromagnetic processors for future sixth-generation (6G) wireless systems.
Predicting the output of a dynamical system from streaming data is fundamental to real-time feedback control and decision-making. We first derive an autoregressive representation that relates future local outputs to asynchronous past outputs. Building on this structure, we propose an online least-squares algorithm to learn this autoregressive model for real-time prediction. We then establish a regret bound of O(log^3 N) relative to the optimal model-based predictor, which holds for marginally stable systems. Moreover, we provide a sufficient condition, characterized via a symplectic matrix, under which the proposed cooperative online learning method provably outperforms the optimal model-based predictor that relies solely on local observations. From a technical standpoint, our analysis exploits the orthogonality of the innovation process under the asynchronous data structure and the persistent excitation of the Gram matrix despite delay-induced asymmetries. Overall, these results offer both theoretical guarantees and practical algorithms for model-free cooperative prediction with asynchronous observations, thereby enriching the theory of online learning for dynamical systems.
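As a hedged, self-contained illustration of online least-squares autoregressive prediction (a plain recursive least-squares learner on synthetic data, not the paper's cooperative, asynchronous algorithm or its regret analysis):

```python
import numpy as np

# Hedged sketch: a recursive least-squares (RLS) learner for a scalar AR(2)
# predictor on synthetic streaming data. This only illustrates the generic
# online least-squares idea; the asynchronous data structure and the
# cooperative algorithm of the paper are not reproduced here.
rng = np.random.default_rng(0)
a_true = np.array([1.2, -0.36])          # stable AR(2) coefficients (roots 0.6, 0.6)
N = 3000
y = np.zeros(N)
for t in range(2, N):                    # y_t = a1*y_{t-1} + a2*y_{t-2} + e_t
    y[t] = a_true @ y[t-2:t][::-1] + 0.1 * rng.standard_normal()

theta = np.zeros(2)                      # online parameter estimate
P = 1e3 * np.eye(2)                      # regularized inverse Gram matrix
for t in range(2, N):
    phi = y[t-2:t][::-1]                 # regressor [y_{t-1}, y_{t-2}]
    k = P @ phi / (1.0 + phi @ P @ phi)  # gain vector
    theta += k * (y[t] - phi @ theta)    # prediction-error update
    P -= np.outer(k, phi @ P)            # rank-one covariance update

print(theta)  # should approach a_true as data streams in
```

Each step costs O(d^2) for d regressors, so the model can be refreshed at every new sample, which is the property the online setting relies on.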
Accelerated cardiac cine MRI requires reconstructing spatiotemporal images from highly undersampled k-space data. Implicit neural representations (INRs) enable scan-specific reconstruction without large training datasets, but encode content implicitly in network weights without physically interpretable parameters. Gaussian primitives provide an explicit and geometrically interpretable alternative, but their spectra are confined near the k-space origin, limiting high-frequency representation. We propose Gabor primitives for MRI reconstruction, modulating each Gaussian envelope with a complex exponential to place its spectral support at an arbitrary k-space location, enabling efficient representation of both smooth structures and sharp boundaries. To exploit spatiotemporal redundancy in cardiac cine, we decompose per-primitive temporal variation into a low-rank geometry basis capturing cardiac motion and a signal-intensity basis modeling contrast changes. Experiments on cardiac cine data with Cartesian and radial trajectories show that Gabor primitives consistently outperform compressed sensing, Gaussian primitives, and hash-grid INR baselines, while providing a compact, continuous-resolution representation with physically meaningful parameters.
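As an illustrative aside, the core construction above, a Gaussian envelope modulated by a complex exponential so that its spectral support moves to an arbitrary k-space location, can be sketched in a few lines of NumPy; the grid size, width, and modulation frequency below are arbitrary choices, not the paper's settings:

```python
import numpy as np

# Hedged sketch: a 1-D Gabor primitive is a Gaussian envelope times a complex
# exponential. Modulation translates the spectrum from the k-space origin to
# an arbitrary frequency k0. All parameters here are illustrative.
N = 256
x = np.arange(N) / N                             # spatial grid on [0, 1)
mu, sigma, k0 = 0.5, 0.05, 40.0                  # center, width, modulation freq

gaussian = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
gabor = gaussian * np.exp(2j * np.pi * k0 * x)   # complex-exponential modulation

freqs = np.fft.fftfreq(N, d=1.0 / N)             # integer cycles per unit length
peak_gaussian = freqs[np.argmax(np.abs(np.fft.fft(gaussian)))]
peak_gabor = freqs[np.argmax(np.abs(np.fft.fft(gabor)))]
print(peak_gaussian, peak_gabor)  # Gaussian peaks at 0; Gabor peaks near k0
```

Because the envelope stays Gaussian, the primitive remains compact and analytically differentiable, while the modulation frequency becomes an explicit, physically meaningful parameter.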
Accurate longitudinal analysis of brain MRI is often hindered by evolving lesions, which bias automated neuroimaging pipelines. While deep generative models have shown promise in inpainting these lesions, most existing methods operate cross-sectionally or lack 3D anatomical continuity. We present a novel pseudo-3D longitudinal inpainting framework based on Denoising Diffusion Probabilistic Models (DDPM). Our approach utilizes multi-channel conditioning to incorporate longitudinal context from distinct visits (t_1, t_2) and extends Region-Aware Diffusion (RAD) to the medical domain, focusing the generative process on pathological regions without altering surrounding healthy tissue. We evaluated our model against state-of-the-art baselines on longitudinal brain MRI from 93 patients. Our model significantly outperforms the leading baseline (FastSurfer-LIT) in terms of perceptual fidelity, reducing the Learned Perceptual Image Patch Similarity (LPIPS) distance from 0.07 to 0.03 while effectively eliminating inter-slice discontinuities. Furthermore, our model demonstrates high longitudinal stability with a Temporal Fidelity Index (TFI) of 1.024, closely approaching the ideal value of 1.0 and substantially narrowing the gap compared to LIT's TFI of 1.22. Notably, the RAD mechanism provides a substantial gain in efficiency; our framework achieves an average processing time of 2.53 min per volume, representing an approximately 10x speedup over the 24.30 min required by LIT. By leveraging longitudinal priors and region-specific denoising, our framework provides a highly reliable and efficient preprocessing step for the study of progressive neurodegenerative diseases. A derivative dataset consisting of 93 pre-processed scans used for testing will be available upon request after acceptance. Code will be released upon acceptance.
The increasing intensity and frequency of wildfires are causing significant economic and societal impacts on communities through direct effects on the built environment, particularly critical infrastructure. Electrical systems can both initiate wildfires (grid-to-fire) and be damaged by wildfire exposure (fire-to-grid). Therefore, resilient electric systems can both limit ignitions and be hardened to better withstand fire demands. Researchers have investigated wildfire mitigation strategies using traditional transmission and distribution electrical test-system models. However, these test cases may not accurately represent realistic electrical system configurations or fuel landscapes, nor capture community impacts, particularly the social and economic effects of mitigation strategies. A wildfire-aware modeling framework enables researchers to develop test cases that benchmark resilience and mitigation strategies while reducing reliance on overly simplistic assumptions about wildfire effects on electrical systems and communities. This study proposes a modeling framework for wildfire-electrical system research by analyzing recent literature and identifying key dimensions as well as gaps within these dimensions. In particular, the framework considers how fire in the wildland-urban interface propagates in space and time, how hazard-infrastructure interactions (e.g., wind and fire) cause system- and component-level damage, and how wildfire-related power outages affect communities.
Automated quality assessment of structural brain MRI is an important prerequisite for reliable neuroimaging analysis, yet it remains challenging due to motion artifacts and poor generalization across acquisition sites. Existing approaches based on image quality metrics (IQMs) or deep learning either require extensive preprocessing, which incurs high computational cost, or generalize poorly to unseen data. In this work, we propose a lightweight and interpretable framework for detecting motion-related artifacts in T1-weighted brain MRI by extending the Discriminative Histogram of Gradient Magnitude (DHoGM) to three dimensions. The proposed method integrates complementary slice-level (2D) and volume-level (3D) DHoGM features through a parallel decision strategy, capturing both localized and global motion-induced degradation. Volumetric analysis is performed using overlapping 3D cuboids to achieve comprehensive spatial coverage while maintaining computational efficiency. A simple threshold-based classifier and a low-parameter multilayer perceptron are used, resulting in a model with only 209 trainable parameters. Our method was evaluated on the MR-ART and ABIDE datasets under both seen-site and unseen-site conditions. Experimental results demonstrate strong performance, achieving up to 94.34\% accuracy in the in-domain evaluation and 89\% accuracy on unseen sites, while almost completely avoiding false acceptance of poor-quality scans. Ablation studies confirm the complementary benefits of combining 2D and 3D features. Overall, the proposed approach offers an effective, efficient, and robust solution for automated MRI quality assessment, with strong potential for integration into large-scale clinical and research workflows.
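As an illustrative sketch of the kind of gradient-magnitude histogram feature involved, extracted over overlapping 3D cuboids (synthetic volume, arbitrary cuboid and bin choices; this is not the DHoGM implementation itself, which adds a discriminative weighting step):

```python
import numpy as np

# Hedged sketch: a plain histogram-of-gradient-magnitude feature for a 3-D
# volume, in the spirit of a DHoGM-style descriptor. The synthetic volume,
# cuboid size, stride, and bin count are illustrative, not the paper's.
rng = np.random.default_rng(0)
volume = rng.random((64, 64, 64)).astype(np.float32)   # stand-in for a T1w scan

def hogm_3d(vol, bins=32):
    """Normalized histogram of 3-D gradient magnitudes."""
    gz, gy, gx = np.gradient(vol.astype(np.float64))
    mag = np.sqrt(gx**2 + gy**2 + gz**2)
    hist, _ = np.histogram(mag, bins=bins, range=(0.0, mag.max() + 1e-12))
    return hist / hist.sum()                           # each feature sums to 1

# Overlapping 3-D cuboids (stride 16 < cuboid side 32) for full coverage.
feats = [hogm_3d(volume[z:z+32, y:y+32, x:x+32])
         for z in (0, 16, 32) for y in (0, 16, 32) for x in (0, 16, 32)]
feature_matrix = np.stack(feats)                       # 27 cuboids x 32 bins
print(feature_matrix.shape)
```

Features of this form are cheap to compute on raw volumes, which is what makes a 209-parameter downstream classifier plausible.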
Accurate overland runoff and infiltration predictions are critical for effective water resources management, in particular for urban flood management. However, the inherent uncertainty in rainfall patterns, soil properties, and initial conditions makes reliable flood forecasting a challenging task. This paper presents a framework for quantifying the impact of these uncertainties on hydrologic and hydrodynamic simulations via a state-space approach based on a differential algebraic equation (DAE) formulation that couples surface and subsurface constraints with the governing dynamics. Under this formulation, the complex interactions between overland flow and infiltration dynamics are captured in real time. To account for uncertainty in inputs and parameters, the proposed framework quantifies and propagates these uncertainties through the DAE model formulation under partial measurements. The effectiveness of the approach is demonstrated through a series of numerical experiments on synthetic and real-world catchments, highlighting its ability to provide probabilistic estimates of watershed state conditions while accounting for uncertainty. An important aspect of the proposed methods is that they are distribution-agnostic, i.e., they only require covariances of uncertainty and not specific types of distributions. The proposed framework is further validated against Monte Carlo (MC) ensemble simulations while providing probabilistic state estimates for measured and unmeasured watershed states under partial gauging.
Phasor measurement units (PMUs) are widely used for sub-synchronous oscillation monitoring, yet the effect of windowed discrete Fourier transform (DFT)-based phasor estimation on oscillation observability is not fully characterized. This letter derives the complete complex-valued frequency response of the windowed DFT phasor estimator under both magnitude and phase modulation. The analysis shows that the estimation window introduces both frequency-dependent magnitude attenuation and phase shift to oscillation components, governed by the complex gain. A simple recovery method is also proposed to restore the true oscillation amplitude and phase from PMU data using the analytically known complex gain. The results are validated through time-domain simulations and provide guidance for industry practitioners on interpreting PMU-based oscillation measurements and selecting appropriate window lengths.
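A hedged numerical sketch of the recovery idea: simulate an amplitude-modulated nominal-frequency signal, form sliding Hann-windowed DFT phasor estimates, observe that the modulation seen in the phasor magnitude is attenuated and phase-shifted by the window's complex gain W(f_m)/W(0), and divide by that analytically known gain to recover the true modulation depth. All parameters below (sampling rate, window, modulation frequency and depth) are illustrative, not taken from the letter:

```python
import numpy as np

# Hedged sketch: windowed-DFT phasor estimation under magnitude modulation.
# The observed modulation in |phasor| is scaled by the window's complex gain
# G = W(fm)/W(0); dividing by G restores the true depth. Illustrative setup.
fs, f0, fm, m, phi = 1000, 50.0, 8.0, 0.10, 0.7
Nw = 100                                  # 100 ms Hann estimation window
w = np.hanning(Nw)
n = np.arange(Nw)

K = 2000                                  # 2 s of estimates (16 full fm periods)
t = np.arange(K + Nw) / fs
x = (1 + m * np.cos(2 * np.pi * fm * t + phi)) * np.cos(2 * np.pi * f0 * t)

# Sliding windowed-DFT phasor estimate, time-tagged at the window start.
p = np.array([(2.0 / w.sum()) *
              np.sum(w * x[k:k + Nw] * np.exp(-2j * np.pi * f0 * (k + n) / fs))
              for k in range(K)])

# Observed modulation: complex amplitude of the fm component of |p|.
k = np.arange(K)
c_obs = (2.0 / K) * np.sum((np.abs(p) - np.abs(p).mean()) *
                           np.exp(-2j * np.pi * fm * k / fs))

# Analytically known complex gain of the window at the modulation frequency.
G = np.sum(w * np.exp(2j * np.pi * fm * n / fs)) / w.sum()
m_rec = np.abs(c_obs / G)
print(np.abs(c_obs), m_rec)   # attenuated observed depth vs. recovered depth
```

The same division also corrects the modulation phase via the angle of G, which is the phase-shift component the letter's analysis attributes to the estimation window.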
As 6G communications advance, the demand for new services and capabilities, as defined by the International Telecommunication Union (ITU), is increasing. A crucial aspect of 6G advancement lies in the development of signal waveforms that can meet these demands while maintaining compatibility with existing standards. This paper explores sustainable physical layer waveform options, focusing on a balanced approach that integrates non-orthogonality with orthogonality to achieve both backward compatibility and forward innovation. Specifically, we investigate two key signal formats: single-carrier orthogonal frequency division multiplexing (SC-OFDM) (1D, 2D) and single-carrier non-orthogonal frequency shaping (SC-NOFS) (1D, 2D). Both can use 1D frequency and 2D time-frequency precoding, offering enhanced frequency and time diversity, simplified processing, and resilience to delay-Doppler effects. SC-NOFS(2D) further introduces advantages such as improved spectral efficiency and reduced latency, making it a strong candidate for future 6G applications. The comparative analysis highlights that SC-NOFS(2D) provides a broader range of capabilities, particularly for applications requiring high data rates, high mobility, low-latency communication, sustainability, and interoperability, positioning it as a versatile solution for next-generation 6G communications.
Recent advances in learned video compression (LVC) have led to significant performance gains, with codecs such as DCVC-RT surpassing the H.266/VVC low-delay mode in compression efficiency. However, existing LVCs still exhibit key limitations: they often require separate models for intra and inter coding modes, and their performance degrades when temporal references are unreliable. To address this, we introduce Uni-LVC, a unified LVC method that supports both intra and inter coding with low-delay and random-access in a single model. Building on a strong intra-codec, Uni-LVC formulates inter-coding as intra-coding conditioned on temporal information extracted from reference frames. We design an efficient cross-attention adaptation module that integrates temporal cues, enabling seamless support for both unidirectional (low-delay) and bidirectional (random-access) prediction modes. A reliability-aware classifier is proposed to selectively scale the temporal cues, making Uni-LVC behave closer to intra coding when references are unreliable. We further propose a multistage training strategy to facilitate adaptive learning across various coding modes. Extensive experiments demonstrate that Uni-LVC achieves superior rate-distortion performance in intra and inter configurations while maintaining comparable computational efficiency.
This paper presents a hybrid safety-critical coordination architecture for multi-agent systems operating in dense environments. While control barrier functions (CBFs) provide formal safety guarantees, decentralized implementations typically rely on ego-centric safety filtering and may lead to redundant constraint enforcement and conservative collective behavior. To address this limitation, we introduce a combinatorial coordination layer formulated as a mixed-integer linear program (MILP) that assigns collision-avoidance responsibilities among agents. By explicitly distributing enforcement tasks, redundant reactions are eliminated and computational complexity is reduced. Each agent subsequently solves a reduced local quadratic program enforcing only its assigned constraints.
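A toy sketch of such a responsibility-assignment MILP (three agents, a hypothetical per-agent enforcement cost, solved with SciPy's generic MILP interface rather than the paper's formulation or solver):

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hedged toy sketch of the coordination layer: assign each pairwise
# collision-avoidance constraint to exactly one of the two agents involved,
# minimizing a per-agent enforcement cost. The cost model and the use of
# scipy.optimize.milp are illustrative choices, not the paper's design.
pairs = [(0, 1), (0, 2), (1, 2)]          # agent pairs needing avoidance
load = np.array([1.0, 2.0, 3.0])          # hypothetical per-agent enforcement cost

# One binary variable per (pair, responsible agent):
# x[2k] -> first agent of pairs[k], x[2k+1] -> second agent.
c = np.array([load[i] for p in pairs for i in p])
A = np.zeros((len(pairs), 2 * len(pairs)))
for kk in range(len(pairs)):
    A[kk, 2 * kk] = A[kk, 2 * kk + 1] = 1.0   # exactly one enforcer per pair

res = milp(c=c, constraints=LinearConstraint(A, 1, 1),
           integrality=np.ones_like(c), bounds=Bounds(0, 1))
assignment = {pairs[kk]: pairs[kk][0] if res.x[2 * kk] > 0.5 else pairs[kk][1]
              for kk in range(len(pairs))}
print(assignment)   # each pair mapped to its lower-cost enforcer
```

Once responsibilities are fixed, each agent's local quadratic program only carries the constraints it was assigned, which is what removes the redundant, symmetric reactions of purely ego-centric filtering.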
This paper proposes a novel formulation of frequency response for nonlinear systems in the Koopman operator framework. This framework is a promising direction for the analysis and synthesis of systems with nonlinear dynamics based on (linear) Koopman operators. We show that the frequency response of a nonlinear plant is derived through the Laplace transform of the output of the plant, which is a generalization of the classical approach to LTI plants and is guided by the resolvent theory of Koopman operators. The response is a complex-valued function of the driving angular frequency, allowing one to draw the so-called Bode plots, which display the gain and phase characteristics. Sufficient conditions for the existence of the frequency response are presented for three classes of dynamics.
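For context, the classical LTI frequency response that this work generalizes, and the operator resolvent that replaces the matrix resolvent in the Koopman setting, can be sketched as follows (notation is ours, not the paper's; the precise conditions are the subject of the paper):

```latex
% Classical LTI case: Laplace transform of the output gives the frequency
% response as a matrix resolvent evaluated on the imaginary axis.
\dot{x} = Ax + Bu, \quad y = Cx
\;\Longrightarrow\;
Y(s) = C(sI - A)^{-1}x_0 + C(sI - A)^{-1}B\,U(s),
\qquad
G(j\omega) = C(j\omega I - A)^{-1}B .

% Koopman setting (sketch): for a nonlinear flow \varphi^t with Koopman
% semigroup (U^t g)(x) = g(\varphi^t(x)) and generator \mathcal{L}, the
% matrix resolvent is replaced by the operator resolvent
(sI - \mathcal{L})^{-1} g = \int_0^{\infty} e^{-st}\, U^t g \; dt,
\qquad \operatorname{Re} s \text{ sufficiently large},
% and evaluating along s = j\omega yields the complex-valued gain and phase
% displayed in Bode plots.
```

The sufficient conditions mentioned in the abstract govern when this resolvent, and hence the frequency response, is well defined for a given class of dynamics.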
Accent variability remains a major source of errors in automatic speech recognition, yet most adaptation methods rely on parameter fine-tuning without understanding where accent information is encoded. We treat accent variation as an interpretable subspace in hidden representations and investigate whether it can be identified and controlled directly in activation space. We extract layer-wise encoder activations and estimate mean-shift directions capturing accent-induced representation shifts. By injecting these directions into individual layers and measuring how they align accented and standard embeddings, we derive a layer-wise accent sensitivity profile, revealing that accent information concentrates in a narrow band of middle encoder layers. Leveraging this structure, we further introduce parameter-free accent steering that modifies representations during inference without updating model weights. Experiments across eight accents show consistent word error rate reductions.
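A hedged sketch of mean-shift direction estimation and parameter-free steering on synthetic activations (Gaussian clusters standing in for real encoder hidden states; the dimensionality, data, and steering strength are illustrative):

```python
import numpy as np

# Hedged sketch: estimate a mean-shift "accent direction" from hidden
# activations and apply parameter-free steering at inference time.
# Synthetic Gaussian clusters stand in for real encoder activations.
rng = np.random.default_rng(0)
d = 64
shift = rng.standard_normal(d)                    # ground-truth accent shift
standard = rng.standard_normal((500, d))          # "standard-accent" activations
accented = rng.standard_normal((500, d)) + shift  # shifted "accented" activations

# Layer-wise mean-shift direction: accented mean minus standard mean.
v = accented.mean(axis=0) - standard.mean(axis=0)

# Inference-time steering: subtract the direction to align accented
# activations with the standard-accent distribution, no weight updates.
alpha = 1.0
steered = accented - alpha * v

def dist_to_standard_mean(h):
    return np.linalg.norm(h.mean(axis=0) - standard.mean(axis=0))

print(dist_to_standard_mean(accented), dist_to_standard_mean(steered))
```

Because the correction is a single vector addition per layer, it adds essentially no inference cost, which is what makes layer-wise sensitivity profiling of this kind practical.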
Keyword spotting (KWS) identifies words for voice assistants, but environmental noise frequently reduces accuracy. Standard adaptation addresses this issue but strictly requires original or labeled audio. Test time adaptation (TTA) solves this data constraint using only unlabeled test audio. However, current methods fail to handle the severe imbalance between rare keywords and frequent background sounds. Consequently, standard entropy minimization (EM) becomes overconfident and heavily biased toward the frequent background class. To overcome this problem, we propose a TTA method named ImKWS. Our approach splits the entropy process into a reward branch and a penalty branch with separate update strengths. Furthermore, we enforce consistency across multiple audio transformations to ensure stable model updates. Experiments on the Google Speech Commands dataset indicate ImKWS achieves reliable adaptation in realistic imbalanced scenarios. The code is available on GitHub.
Polarimetric imaging aims to recover polarimetric parameters, including Total Intensity (TI), Degree of Polarization (DoP), and Angle of Polarization (AoP), from captured polarized measurements. In real-world scenarios, these measurements are frequently affected by diverse degradations such as low-light noise, motion blur, and mosaicing artifacts. Due to the nonlinear dependency of DoP and AoP on the measured intensities, accurately retrieving physically consistent polarimetric parameters from degraded observations remains highly challenging. Existing approaches typically adopt task-specific network architectures tailored to individual degradation types, limiting their adaptability across different restoration scenarios. Moreover, many methods rely on multi-stage processing pipelines that suffer from error accumulation, or operate solely in a single domain (either image or Stokes domain), failing to fully exploit the intrinsic physical relationships between them. In this work, we propose a unified architectural framework for polarimetric imaging that is structurally shared across multiple degradation scenarios. Rather than redesigning network structures for each task, our framework maintains a consistent architectural design while being trained separately for different degradations. The model performs single-stage joint image-Stokes processing, avoiding error accumulation and explicitly preserving physical consistency. Extensive experiments show that this unified architectural design, when trained for specific degradation types, consistently achieves state-of-the-art performance across low-light denoising, motion deblurring, and demosaicing tasks, establishing a versatile and physically grounded solution for degraded polarimetric imaging.
Neural audio codecs optimized for mel-spectrogram reconstruction often fail to preserve intelligibility. While semantic encoder distillation improves encoded representations, it does not guarantee content preservation in reconstructed speech. In this work, we demonstrate that self-supervised representation reconstruction (SSRR) loss fundamentally improves codec training and performance. First, SSRR significantly accelerates convergence, enabling competitive results using only a single GPU. Second, it enhances intelligibility by reconstructing distilled self-supervised representations from codec outputs. Third, SSRR enables high intelligibility without additional lookahead in streaming Transformer-based codecs, allowing a zero-lookahead architecture for real-time deployment. As a result, our JHCodec achieves state-of-the-art performance while maintaining minimal latency and reduced training cost. We open-source the full implementation, training pipeline, and demo on GitHub: this https URL
Stacked intelligent metasurfaces (SIMs) facilitate computation through cascaded programmable layers so that part of the signal processing can be performed in the wave domain during signal propagation, rather than solely after reception. This approach expands the controllable degrees of freedom and supports the joint design of communication, sensing, and computation with the potential for reduced energy usage, shorter end-to-end latency, and improved task execution. Despite these advances, research on the SIM concept is still at an early stage, with challenges in scalability, controllability, nonlinearity, and robustness. This article reviews the state-of-the-art of SIM research, including applications, functions, and characteristics. We also demonstrate their potential through case studies on neural-like analog inference and communication enhancement. Finally, the paper outlines open challenges and future directions toward establishing SIMs as a new signal processing paradigm for in-wave computation in next-generation (NG) networks.
Channel state information (CSI)-based electromagnetic inverse scattering for material reconstruction in integrated sensing and communication (ISAC) systems enables physics-grounded, material-aware digital twins (DTs). Yet the resulting CSI-induced scattering operator is often severely ill-conditioned. To understand the origin of this ill-posedness, this paper analyzes the mathematical properties of the electromagnetic inverse problem and investigates the operator structure of the ISAC scattering matrix, which is jointly shaped by in-domain scattering responses and Tx/Rx propagation channels. We show that background-related matrix columns are highly coherent and dominate the near rank deficiency, whereas scatterer-related columns are comparatively weakly correlated; their coherence decreases with the number of probing frequencies and thus contributes to the effective rank. Motivated by this analysis, we prove that restricting the region of interest (ROI) around the true scatterer yields a provable condition-number reduction and a tightened Cramér-Rao lower bound (CRLB), and we quantify the impact of ROI mismatch numerically. To operationalize these insights, an ROI-constrained quadratic programming (QP) framework is adopted, in which a linear sampling method delineates a coarse ROI and the QP update is performed in the reduced subspace. Full-wave finite-difference time-domain (FDTD) simulations over multiple geometries and signal-to-noise ratio (SNR) levels validate pronounced conditioning improvement, substantial complexity savings, and improved robustness relative to the full-domain formulation, consistent with the proposed analysis.
Zero-shot Text-to-Speech (TTS) models can generate speech that captures both the voice timbre and accent of a reference speaker. However, disentangling these attributes remains challenging, as the output often inherits both the accent and timbre from the reference. In this study, we introduce a novel, post-hoc, and training-free approach to neutralize accent while preserving the speaker's original timbre, utilizing inference-time activation steering. We first extract layer-specific "steering vectors" offline, which are derived from the internal activation differences within the TTS model between accented and native speech. During inference, the steering vectors are applied to guide the model to produce accent-neutralized, timbre-preserving speech. Empirical results demonstrate that the proposed steering vectors effectively mitigate the output accent and exhibit strong generalizability to unseen accented speakers, offering a practical solution for accent-free voice cloning.
The upper mid-band (UMB) spectrum is a key enabler for 6G systems, yet reconfigurable intelligent surface (RIS)-assisted UMB communications face severe channel estimation challenges due to near-field propagation and transitional scattering, which induce strong spatial correlation and ill-conditioned least-squares (LS) formulations. To overcome these challenges, we propose a conditioning-aware channel estimation framework that transforms the inherently ill-conditioned high-dimensional problem into multiple well-conditioned subproblems via greedy column grouping. By systematically separating highly correlated RIS elements into distinct sub-blocks via piecewise RIS phase design, the proposed method directly improves Gram matrix conditioning and stabilizes piecewise LS reconstruction without relying on sparsity assumptions. Simulation results demonstrate that the proposed method significantly outperforms conventional LS and orthogonal matching pursuit (OMP)-based estimators in pilot-limited and transitional UMB regimes, achieving robust performance with low computational complexity.
This paper explores secure communication in an underwater energy-harvesting (EH) relay network that supports hybrid optical-acoustic transmission. The optical hop is modeled using a Gamma-Gamma turbulence channel with pointing errors and may occasionally be blocked by underwater obstacles. At the same time, an eavesdropper is assumed to monitor the acoustic hop, creating a secrecy concern. To address this, we formulate the relay power allocation problem as an infinite-horizon Markov decision process (MDP). A model-based reinforcement learning (RL)-driven optimal power allocation (OPA) strategy is proposed to maximize long-term cumulative secrecy performance until the network stops functioning. To offer lower-complexity alternatives, we also develop a greedy algorithm (GA) and a naive algorithm (NA). Simulation results show that the RL-based OPA adapts effectively to battery dynamics, varying channel conditions, and optical link availability, achieving the highest secure data transmission, while the GA performs reasonably and the NA performs poorly due to its short-sighted decisions.
We address the challenge of preserving emotional content in streaming speaker anonymization (SA). Neural audio codec language models trained for audio continuation tend to degrade source emotion: content tokens discard emotional information, and the model defaults to dominant acoustic patterns rather than preserving paralinguistic attributes. We propose supervised finetuning with neutral-emotion utterance pairs from the same speaker, combined with frame-level emotion distillation on acoustic token hidden states. All modifications are confined to finetuning, which takes less than 2 hours on 4 GPUs and adds zero inference latency overhead, while maintaining a competitive 180ms streaming latency. On the VoicePrivacy 2024 protocol, our approach achieves a 49.2% UAR (emotion preservation) with 5.77% WER (intelligibility), a +24% relative UAR improvement over the baseline (39.7%->49.2%) and +10% over the emotion-prompt variant (44.6% UAR), while maintaining strong privacy (EER 49.0%). Demo and code are available: this https URL
Static scene videos, such as surveillance feeds and videotelephony streams, constitute a dominant share of storage consumption and network traffic. However, both traditional standardized codecs and neural video compression (NVC) methods struggle to encode these videos efficiently due to inadequate usage of temporal redundancy and severe distribution gaps between training and test data, respectively. While recent generative compression methods improve perceptual quality, they introduce hallucinated details that are unacceptable in authenticity-critical applications. To overcome these limitations, we propose to incorporate positive-incentive noise into NVC for static scene videos, where short-term temporal changes are reinterpreted as positive-incentive noise to facilitate model finetuning. By disentangling transient variations from the persistent background, structured prior information is internalized in the compression model. During inference, the invariant component requires minimal signaling, thus reducing data transmission while maintaining pixel-level fidelity. Preliminary experiments demonstrate a 73% Bjøntegaard delta (BD) rate saving compared to general NVC models. Our method provides an effective solution to trade computation for bandwidth, enabling robust video transmission under adverse network conditions and economic long-term retention of surveillance footage.
Accurate and robust wireless localization is a key enabler for a wide range of mobile computing applications. Fingerprint-based localization using channel state information (CSI) has attracted significant attention due to its high accuracy and compatibility with existing communication infrastructures. However, traditional similarity-based fingerprinting methods suffer from high computational complexity and limited scalability in high-dimensional CSI spaces, while purely learning-based approaches fail to explicitly exploit correlations among reference fingerprints during inference. To address these challenges, this paper proposes a unified retrieval-assisted fingerprinting localization framework that tightly integrates similarity-based and learning-based paradigms. Specifically, channel charting is employed to project high-dimensional CSI into a low-dimensional latent space, enabling efficient and scalable retrieval of locally correlated reference points (RPs). Building upon the retrieved RPs, a graph attention network (GAT) is designed to explicitly model inter-sample correlations between the query CSI and its associated references, allowing adaptive and geometry-aware feature aggregation for accurate position estimation. Extensive experiments conducted on both real-world indoor and ray-tracing simulated outdoor scenarios demonstrate that the proposed method consistently outperforms state-of-the-art similarity-based and learning-based localization approaches.
This work presents MAD (Multimodal Affection Dataset), a multimodal emotion dataset designed for affective computing and neurophysiological modeling. MAD is built upon synchronous collection of diverse physiological signals (EEG, ECG, EOG, EMG, PPG, and BCG) together with tri-view RGB-D facial videos, enabling the observation of emotional dynamics from neural, physiological, and behavioral perspectives. The dataset consists of synchronized recordings from 18 participants and introduces two key contributions. First, it provides temporally aligned multimodal data that jointly capture central neural activity, peripheral physiological responses, and overt facial expressions. Second, it incorporates a three-level emotion annotation framework spanning stimulus elicitation, subjective cognition, and behavioral expression, supporting joint modeling of the full emotion process. To validate the dataset, we conduct systematic benchmark experiments covering intra-subject EEG emotion recognition, cross-subject EEG transfer learning, consistency analysis and emotion classification with cardiac-related signals, multimodal physiological fusion, and multi-view facial emotion recognition. The experimental results demonstrate that MAD supports consistent and comparable performance across both unimodal and multimodal settings, establishing it as a reliable benchmark for emotion recognition and cross-modal affective analysis, and as a valuable resource for studying emotion mechanisms across multiple levels.
J-peak detection in ballistocardiography (BCG) is a key component of unobtrusive heart rate monitoring during sleep. Most existing approaches formulate this task as a dense time-point segmentation problem and rely on heuristic post-processing to convert continuous responses into discrete peak events, resulting in redundant model structures and sensitivity to parameter settings. In this work, we construct and publicly release a pillow-based BCG-ECG dataset consisting of multi-subject, multi-night natural sleep recordings with manually annotated BCG J-peaks. Based on this dataset, we propose a set-prediction-based J-peak detection framework that directly models peaks as discrete temporal events, eliminating the need for high-resolution segmentation heads and explicit peak suppression. Experimental results show that, under a shared convolutional backbone, the proposed method achieves superior detection performance compared to a U-Net-based segmentation baseline, while substantially reducing model parameters and computational complexity. These results indicate that event-level set prediction provides a concise and efficient modeling paradigm for BCG J-peak detection in sleep monitoring.
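Set-prediction detectors are typically evaluated by one-to-one matching of predicted and annotated event times within a tolerance. A minimal sketch of such a matching-based evaluation (an assumed protocol using the Hungarian algorithm, not the paper's code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_peaks(pred, true, tol=0.15):
    """One-to-one matching of predicted vs. annotated J-peak times (seconds);
    pairs farther apart than `tol` do not count as true positives."""
    cost = np.abs(np.subtract.outer(pred, true))  # pairwise time distances
    r, c = linear_sum_assignment(cost)            # minimum-cost matching
    tp = int((cost[r, c] <= tol).sum())
    precision = tp / max(len(pred), 1)
    recall = tp / max(len(true), 1)
    return tp, precision, recall
```

Because matching is one-to-one, a detector cannot inflate recall by emitting many candidate peaks near one annotation, which is the usual failure mode of threshold-based peak picking.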
Speech foundation models struggle with low-resource Pacific Indigenous languages because of severe data scarcity, and full fine-tuning risks catastrophic forgetting. To address this gap, we present an empirical study adapting speech foundation models to real-world Pacific datasets, investigating how data volume and linguistic features affect adaptation success. We evaluate adaptation strategies including full fine-tuning and Low-Rank Adaptation (LoRA), and analyze a continual learning framework for sequentially acquiring multiple languages. We demonstrate that adapting to these distant languages causes severe internal representational drift, so these models face a strict plasticity-stability dilemma: while LoRA adapts well initially, it suffers from catastrophic forgetting during sequential learning. This study highlights the urgent need for robust adaptation strategies tailored to underrepresented languages.
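For readers unfamiliar with LoRA, the core idea is to freeze the pretrained weight W and learn a low-rank update W + (alpha/r) * B A. A minimal numerical sketch (hypothetical shapes; B is zero-initialized so training starts exactly at the base model):

```python
import numpy as np

class LoRALinear:
    """Frozen base weight plus a trainable low-rank update (LoRA)."""

    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.W = weight                                      # frozen
        self.A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable
        self.B = np.zeros((d_out, rank))                     # trainable, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # base projection + scaled low-rank correction
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Only A and B (a tiny fraction of the parameters) are updated during adaptation, which is why LoRA is attractive in data-scarce settings but, as the study finds, does not by itself prevent forgetting under sequential multilingual training.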
We present a cross-linguistic study of speech in autistic and non-autistic children speaking Finnish, French, and Slovak. We combine supervised classification with within-language and cross-corpus transfer experiments to evaluate classification performance within and across languages and to probe which acoustic cues are language-specific versus language-general. Using a large set of acoustic-prosodic features, we implement speaker-level classification benchmarks as an analytical tool rather than to seek state-of-the-art performance. Within-language models, evaluated with speaker-level cross-validation, yielded heterogeneous results. The Finnish model performed best (Accuracy 0.84, F1 0.88), followed by Slovak (Accuracy 0.63, F1 0.68) and French (Accuracy 0.68, F1 0.56). We then tested cross-language generalization. A model trained on all pooled corpora reached an overall Accuracy of 0.61 and F1 0.68. Leave-one-corpus-out experiments, which test transfer to an unseen language, showed moderate success when testing on Slovak (F1 0.70) and Finnish (F1 0.78), but poor transfer to French (F1 0.42). Feature-importance analyses across languages highlighted partially shared, but not fully language-invariant, acoustic markers of autism. These findings suggest that some autism-related speech cues generalize across typologically distinct languages, but robust cross-linguistic classifiers will likely require language-aware modeling and more homogeneous recording conditions.
Prosodic differences in autism are well-documented, but cross-linguistic evidence remains limited. This study investigates prosody in autism across a multilingual corpus of Finnish, French, and Slovak speakers. 88 acoustic features from over 5,000 inter-pausal units were extracted, and data were reduced via Principal Component Analysis (PCA) and analyzed using Linear Mixed-Effects Models (LMMs). Cross-linguistically, autistic speakers exhibited increased general intensity variability and a clearer, less breathy voice quality (higher Harmonics-to-Noise Ratio and alpha ratio), alongside reduced temporal intensity dynamics and lower central f0. Monolingual analyses revealed language-specific nuances: Slovak results aligned with cross-linguistic f0 patterns but diverged on voice quality, while Finnish results mirrored the broader voice quality findings. These results emphasize including voice quality and intensity dynamics in the study of possible language-independent markers of autism, alongside traditional pitch measures. The findings challenge deficiency-based models, suggesting instead a complex, acoustically distinct prosodic profile across languages.
Artificial intelligence-based radiation therapy (RT) planning has the potential to reduce planning time and inter-planner variability, improving efficiency and consistency in clinical workflows. Most existing automated approaches rely on multiple dose evaluations and corrections, resulting in plan generation times of several minutes. We introduce AIRT (Artificial Intelligence-based Radiotherapy), an end-to-end deep-learning framework that directly infers deliverable treatment plans from CT images and structure contours. AIRT generates single-arc VMAT prostate plans, from imaging and anatomical inputs to leaf sequencing, in under one second on a single Nvidia A100 GPU. The framework includes differentiable dose feedback, adversarial fluence-map shaping, and plan-generation augmentation to improve plan quality and robustness. The model was trained on more than 10,000 intact prostate cases. Non-inferiority to RapidPlan Eclipse was demonstrated across target coverage and OAR sparing metrics. Target homogeneity (HI = 0.10 $\pm$ 0.01) and OAR sparing were similar to reference plans when evaluated using AcurosXB. These results represent a significant step toward ultra-fast standardized RT planning and a streamlined clinical workflow.
Extracting patient medical conditions from code-switched clinical spoken dialogues is challenging due to rapid turn-taking and highly overlapped speech. We present a robust system evaluated on the DISPLACE-M dataset of real-world Hinglish medical conversations. We propose an End-to-End Neural Diarization with Vector Clustering approach (EEND-VC) to accurately resolve dense speaker overlaps in Doctor-Patient Conversations (DoPaCo). For transcription, we adapt a Qwen3 ASR model via domain-specific fine-tuning, Devanagari script normalization, and dialogue-level LLM error correction, achieving an 18.59% tcpWER. We benchmark open and proprietary LLMs on medical condition extraction, comparing our text-based cascade system against a multimodal End-to-End (E2E) audio framework. While proprietary E2E models set the performance ceiling, our open cascaded architecture is highly competitive, as it achieved first place out of 25 participants in the DISPLACE-M challenge. All implementations are publicly released.
The upper 6 GHz (U6G) band with XL-MIMO is a key enabler for sixth-generation wireless systems, yet intelligent radiomap prediction for such systems remains challenging. Existing datasets support only small-scale arrays (up to 8x8) with predominantly isotropic antennas, far from the 1024-element directional arrays envisioned for 6G. Moreover, current methods encode array configurations as scalar parameters, forcing neural networks to extrapolate array-specific radiation patterns, which fails when predicting radiomaps for configurations absent from training data. To jointly address data scarcity and generalization limitations, this paper advances XL-MIMO radiomap prediction from three aspects. To overcome data limitations, we construct the first XL-MIMO radiomap dataset containing 78,400 radiomaps across 800 urban scenes, five frequency bands (1.8-6.7 GHz), and nine array configurations up to 32x32 uniform planar arrays with directional elements. To enable systematic evaluation, we establish a comprehensive benchmark framework covering practical scenarios from coverage estimation without field measurements to generalization across unseen configurations and environments. To enable generalization to arbitrary beam configurations without retraining, we propose the beam map, a physics-informed spatial feature that analytically computes array-specific coverage patterns. By decoupling deterministic array radiation from data-learned multipath propagation, beam maps shift generalization from neural network extrapolation to physics-based computation. Integrating beam maps into existing architectures reduces mean absolute error by up to 60.0% when generalizing to unseen configurations and up to 50.5% when transferring to unseen environments. The complete dataset and code are publicly available at this https URL.
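The beam-map idea rests on the fact that an array's radiation pattern can be computed analytically from its configuration. As a hedged illustration (an idealized half-wavelength uniform planar array with a simplified angle convention; not the paper's exact formulation), the normalized array-factor gain toward a direction can be computed as:

```python
import numpy as np

def upa_beam_gain(n_x, n_y, steer_az, steer_el, az, el, spacing=0.5):
    """Normalized array-factor gain of an n_x-by-n_y uniform planar array
    (element spacing in wavelengths), phase-steered to (steer_az, steer_el)
    and evaluated toward (az, el); all angles in radians."""
    def element_phases(az_, el_):
        # directional cosines for a UPA lying in the x-y plane
        u = np.cos(el_) * np.sin(az_)
        v = np.sin(el_)
        m, n = np.meshgrid(np.arange(n_x), np.arange(n_y), indexing="ij")
        return 2 * np.pi * spacing * (m * u + n * v)
    af = np.sum(np.exp(1j * (element_phases(az, el)
                             - element_phases(steer_az, steer_el))))
    return np.abs(af) ** 2 / (n_x * n_y) ** 2  # equals 1 in the steered direction
```

Evaluating such a gain on a spatial grid yields a deterministic, configuration-specific feature map, so a network only needs to learn the multipath component rather than extrapolate the radiation pattern itself.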
This paper introduces a unified regression framework based on the Lagrange formalism, demonstrating how polynomial and logistic regression can both be formulated within a common variational (Lagrangian) structure. Within this framework, the DCT-based (Discrete Cosine Transform) model naturally emerges as a novel and effective approach to regression, with the DCT basis serving as the constraints in the Lagrangian formulation. By leveraging the nearly orthogonal and bounded nature of the cosine basis, the DCT model offers computational advantages and improved convergence properties compared with traditional polynomial methods. The results further support the potential of the DCT-based neuron as a powerful tool for regression analysis and related learning tasks.
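The regression idea can be illustrated with ordinary least squares on a truncated cosine basis. This sketch (hypothetical function names; plain least squares rather than the paper's Lagrangian derivation) shows why the bounded, nearly orthogonal basis is convenient:

```python
import numpy as np

def fit_dct_regression(x, y, n_terms):
    """Least-squares regression on a truncated cosine basis cos(pi*k*x),
    with x scaled to [0, 1]; the basis is bounded and nearly orthogonal,
    unlike monomials, which explode and become ill-conditioned."""
    k = np.arange(n_terms)
    Phi = np.cos(np.pi * np.outer(x, k))          # design matrix
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coef

def predict_dct(x, coef):
    k = np.arange(len(coef))
    return np.cos(np.pi * np.outer(x, k)) @ coef
```

Because every basis function lies in [-1, 1], the normal equations stay well-conditioned as the number of terms grows, which is the practical advantage over high-degree polynomial fits.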
Next-generation wireless networks require enhanced flexibility, efficiency, and reliability in physical layer waveform design to address the challenges posed by heterogeneous channel conditions and stringent quality-of-service demands. To this end, this paper proposes a unified multicarrier waveform framework that provides a systematic characterization and practical implementation guidelines to facilitate waveform selection for the sixth-generation (6G) mobile networks and beyond. We commence by examining the design principles of the state-of-the-art waveforms, which are categorized into one-dimensional modulation waveforms (e.g., orthogonal frequency division multiplexing (OFDM) and affine frequency division multiplexing (AFDM)) and two-dimensional modulation waveforms (e.g., orthogonal time frequency space (OTFS)). Their inherent resilience against various channel-induced interference is further studied, revealing their distinct suitability in diverse channel conditions. Furthermore, an in-depth performance analysis is presented by comparing their key performance indicators (KPIs), followed by an extensive exploration of these advanced waveforms in various applications. Consequently, this work aims to serve as a pivotal reference for waveform adoption in future 6G standardization and network deployment.
In this paper, we propose an adaptive data-driven min-max model predictive control (MPC) scheme for discrete-time linear time-varying (LTV) systems. We assume that prior knowledge of the system dynamics and bounds on the variations are known, and that the states are measured online. Starting from an initial state-feedback gain derived from prior knowledge, the algorithm updates the state-feedback gain using online input-state data. To this end, a semidefinite program (SDP) is solved to minimize an upper bound on the infinite-horizon optimal cost and to derive a corresponding state-feedback gain. We prove that the resulting closed-loop system is exponentially stabilized and satisfies the constraints. Further, we extend the proposed scheme to LTV systems with process noise. The resulting closed-loop system is shown to be robustly stabilized to a robust positive invariant (RPI) set. Finally, the proposed methods are demonstrated by numerical simulations.
In our previous work [2], we introduced a hardware- and power-efficient architecture for hybrid digital-analog (HDA) multiuser MIMO (MU-MIMO) based on stacking identical basic modules. Each module consists of a small active multi-antenna feeder (AMAF) placed in the near field of a larger reflective intelligent surface (RIS). Each AMAF is driven by one RF chain and conveys one spatial stream, achieving a multiplexing gain of $K$ with $K$ stacked modules. While [2] focused on module design and efficiency compared to active arrays, performance was evaluated only under pure line-of-sight (LOS) conditions. This work extends our approach in several ways. First, we propose a simple, pragmatic method for designing phase-only flat-top beams for the AMAF-RIS module, enabling wide angular coverage with low ripple and sidelobes. This design supports hierarchical beamforming codebooks for efficient beam acquisition. Second, we evaluate MU-MIMO performance under realistic mmWave multipath channels including both LOS and non-LOS (NLOS) components modeled using a 3D von Mises-Fisher distribution. We propose a low-complexity HDA MU-MIMO framework with: user-beam association via standard beam acquisition; dynamic user grouping (one user per beam); effective baseband MIMO channel estimation using 3GPP-compliant pilots; and downlink transmission with zero-forcing precoding under per-antenna power constraints. Results show high spectral efficiency and multiplexing gain while preserving hardware simplicity and power efficiency. Crucially, the approach is fully compliant with 3GPP 5G NR beam acquisition and sounding reference signaling mechanisms.
We present LiveSense, a cross-platform system that transforms a commercial off-the-shelf (COTS) Wi-Fi Network Interface Card (NIC) on a laptop into a centimeter-level Range-Doppler sensor while preserving simultaneous communication capability. The laptops are equipped with COTS Intel AX211 (Wi-Fi 6E) or Intel BE201 (Wi-Fi 7) NICs. LiveSense can (i) extract fully synchronized channel state information (CSI) at >= 40 Hz, (ii) perform time-phase alignment and self-interference cancellation on-device, and (iii) provide a real-time stream of range, Doppler, subcarrier magnitude/phase, and annotated video frames to a Python/Qt graphical user interface (GUI). The demo will showcase the ability to detect (i) the distance and radial velocity of attendees within a few meters of the device, (ii) micro-motion (respiration), and (iii) hand-gesture ranging. To the best of our knowledge, this is the first demo to obtain accurate range information of targets from commercial Wi-Fi, despite the limited 160 MHz bandwidth.
The deployment of complex soft robots in multiphysics environments requires advanced simulation frameworks that not only capture interactions between different types of material, but also translate accurately to real-world performance. Soft robots pose unique modeling challenges due to their large nonlinear deformations, material incompressibility, and contact interactions, which complicate both numerical stability and physical accuracy. Despite recent progress, robotic simulators often struggle with modeling such phenomena in a scalable and application-relevant manner. We present SORS (Soft Over Rigid Simulator), a versatile, high-fidelity simulator designed to handle these complexities for soft robot applications. Our energy-based framework, built on the finite element method, allows modular extensions, enabling the inclusion of custom-designed material and actuation models. To ensure physically consistent contact handling, we integrate a constrained nonlinear optimization based on sequential quadratic programming, allowing for stable and accurate modeling of contact phenomena. We validate our simulator through a diverse set of real-world experiments, which include cantilever deflection, pressure-actuation of a soft robotic arm, and contact interactions from the PokeFlex dataset. In addition, we showcase the potential of our framework for control optimization of a soft robotic leg. These tests confirm that our simulator can capture both fundamental material behavior and complex actuation dynamics with high physical fidelity. By bridging the sim-to-real gap in these challenging domains, our approach provides a validated tool for prototyping next-generation soft robots, addressing the need for extensibility, fidelity, and usability in the soft robotics ecosystem.
Recent multimodal systems often rely on separate expert modality encoders, which causes complexity and computational overhead to scale linearly with the number of modalities. While unified Omni-models address this via Mixture-of-Experts (MoE) architectures with specialized experts and routing, they still inflate parameter counts and introduce routing overhead. In this paper, we propose Omni-C (Omni-Compress), a single dense Transformer-based encoder that learns competitive shared representations across heterogeneous modalities (images, audio, and text) through unimodal contrastive pretraining on large-scale unaligned data. By maximizing parameter sharing in the backbone and using lightweight modality-specific projection heads, Omni-C effectively mitigates inter-modality conflicts without requiring MoE, paired supervision, or routing. This design supports efficient deployment on memory-constrained systems via sequential modality processing and low-memory inference, eliminating the need for parallel expert loading or specialized hardware. Experiments show Omni-C achieves performance comparable to expert models in unimodal and cross-modal tasks, with modest zero-shot degradation on audio and text that is largely recovered through lightweight linear probing or parameter-efficient fine-tuning. The unified architecture substantially reduces inference memory usage compared to multi-encoder baselines, advancing efficient and scalable multimodal learning.
Confounding pathology with normal anatomical variation remains a significant challenge in unsupervised medical-image anomaly detection, resulting in numerous false positives. To enhance integration of healthy variation, we augment the latent representation of a CNN autoencoder with contextual similarities within a normal cohort through batch-wise hypergraph estimation and a shared-weights graph convolution layer, producing a population-aware embedding. On a heterogeneous brain-tumor dataset of 2D MRI scans, the method improves separability between healthy and pathological samples, achieving an AUC-ROC of 0.90 (95% CI 0.84-0.95, 5.7% absolute gain), and a 16% absolute improvement in average precision (0.78 AP, 95% CI 0.66-0.89), thereby lowering false-positive rates. Moreover, both anomaly detection and downstream tumor versus no-tumor classification performance improve with the size of the mini-batch context captured in the augmented representation, suggesting a tunable lever for integrating healthy variation.
Gait recognition is a non-intrusive biometric technique for security applications, yet existing studies are dominated by silhouette- and parsing-based representations. Silhouettes are sparse and miss internal structural details, limiting discriminability. Parsing enriches silhouettes with part-level structures, but relies heavily on upstream human parsers (e.g., label granularity and boundary precision), leading to unstable performance across datasets and sometimes even inferior results to silhouettes. We revisit gait representations from a structural perspective and describe a design space defined by edge density and supervision form: silhouettes use sparse boundary edges with weak single-label supervision, while parsing uses denser cues with strong semantic priors. In this space, we identify an underexplored paradigm: dense part-level structure without explicit semantic labels, and introduce Sketch as a new visual modality for gait recognition. Sketch extracts high-frequency structural cues (e.g., limb articulations and self-occlusion contours) directly from RGB images via edge-based detectors in a label-free manner. We further show that label-guided parsing and label-free sketch are semantically decoupled and structurally complementary. Based on this, we propose SketchGait, a hierarchically disentangled multi-modal framework with two independent streams for modality-specific learning and a lightweight early-stage fusion branch to capture structural complementarity. Extensive experiments on SUSTech1K and CCPG validate the proposed modality and framework: SketchGait achieves 92.9% Rank-1 on SUSTech1K and 93.1% mean Rank-1 on CCPG.
Collaborative training across multiple institutions is becoming essential for building reliable medical image segmentation models. However, privacy regulations, data silos, and uneven data availability prevent hospitals from sharing raw scans or annotations, limiting the ability to train generalizable models. Latent-space collaboration frameworks such as the privacy segmentation framework (SF) offer a promising alternative, but such methods still face challenges in segmentation accuracy and vulnerability to latent inversion and membership-inference attacks. This work introduces a privacy-preserving collaborative medical image segmentation framework (PPCMI-SF) designed for heterogeneous medical datasets. The approach combines skip-connected autoencoders for images and masks with a keyed latent transform that applies client-specific orthogonal mixing and permutation to protect latent features before they are shared. A unified mapping network on the server side performs multi-scale latent-to-latent translation, enabling segmentation inference without exposing raw data. Experiments on four datasets (PSFH ultrasound, ultrasound nerve segmentation, FUMPE CTA, and cardiac MRI) show that the proposed PPCMI-SF consistently achieves high Dice scores and improved boundary accuracy, as reflected by lower 95th percentile Hausdorff distance (HD95) and average symmetric surface distance (ASD), compared to the current state-of-the-art, and performs competitively with privacy-agnostic baselines. Privacy tests confirm strong resistance to inversion and membership attacks, and the overall system achieves real-time inference with low communication overhead. These results demonstrate that accurate and efficient medical image segmentation can be achieved without compromising data privacy in multi-institution settings.
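The keyed latent transform described above, client-specific orthogonal mixing followed by a permutation, can be sketched as follows (hypothetical names; the key seeds the random construction, and the transform is norm-preserving and invertible only with the key):

```python
import numpy as np

def keyed_transform(key, dim):
    """Derive a client-specific orthogonal mixing matrix and permutation
    from a secret integer key."""
    rng = np.random.default_rng(key)
    Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # orthogonal mixing
    perm = rng.permutation(dim)
    return Q, perm

def protect(z, Q, perm):
    """Mix then permute latent features before sharing."""
    return (z @ Q)[..., perm]

def recover(z_prot, Q, perm):
    """Invert the transform given the key-derived Q and permutation."""
    inv = np.empty_like(perm)
    inv[perm] = np.arange(len(perm))
    return z_prot[..., inv] @ Q.T
```

Because Q is orthogonal and the permutation merely reorders coordinates, the protected latents keep their geometry (norms and inner products are preserved), which is what lets a server-side mapping network operate on them without access to the key.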
Urban traffic flow is governed by the complex, nonlinear interaction between land use configuration and spatiotemporally heterogeneous mobility demand. Conventional global regression and time-series models cannot simultaneously capture these multi-scale dynamics across multiple travel modes. This study proposes a GeoAI Hybrid analytical framework that sequentially integrates Multiscale Geographically Weighted Regression (MGWR), Random Forest (RF), and Spatio-Temporal Graph Convolutional Networks (ST-GCN) to model the spatiotemporal heterogeneity of traffic flow patterns and their interaction with land use across three mobility modes: motor vehicle, public transit, and active transport. Applying the framework to an empirically calibrated dataset of 350 traffic analysis zones across six cities spanning two contrasting urban morphologies, four key findings emerge: (i) the GeoAI Hybrid achieves a root mean squared error (RMSE) of 0.119 and an R^2 of 0.891, outperforming all benchmarks by 23-62%; (ii) SHAP analysis identifies land use mix as the strongest predictor for motor vehicle flows and transit stop density as the strongest predictor for public transit; (iii) DBSCAN clustering identifies five functionally distinct urban traffic typologies with a silhouette score of 0.71, and GeoAI Hybrid residuals exhibit Moran's I=0.218 (p<0.001), a 72% reduction relative to OLS baselines; and (iv) cross-city transfer experiments reveal moderate within-cluster transferability (R^2>=0.78) and limited cross-cluster generalisability, underscoring the primacy of urban morphological context. The framework offers planners and transportation engineers an interpretable, scalable toolkit for evidence-based multimodal mobility management and land use policy design.
Transitional autonomous vehicles (tAVs), which operate beyond SAE Level 1-2 automation but short of full autonomy, are increasingly sharing the road with human-driven vehicles (HDVs). As these systems interact during complex maneuvers such as lane changes, new patterns may emerge with implications for traffic stability and safety. Assessing these dynamics, particularly during mandatory lane changes, requires high-resolution trajectory data, yet datasets capturing tAV lane-changing behavior are scarce. This study introduces the North Carolina Transitional Autonomous Vehicle Lane-Changing (NC-tALC) Dataset, a high-fidelity trajectory dataset designed to characterize tAV interactions during lane-changing maneuvers. The dataset includes two controlled experimental series. In the first, tAV lane-changing experiments, a tAV executes lane changes in the presence of adaptive cruise control (ACC) equipped target vehicles, enabling analysis of lane-changing execution. In the second, tAV responding experiments, two tAVs act as followers and respond to cut-in maneuvers initiated by another tAV, enabling analysis of follower response dynamics. The dataset contains 152 trials (72 lane-changing and 80 responding trials) sampled at 20 Hz with centimeter-level RTK-GPS accuracy. The NC-tALC dataset provides a rigorous empirical foundation for evaluating tAV decision-making and interaction dynamics in controlled mandatory lane-changing scenarios.
Hopping robots often lose balance on slopes because the tilted ground creates unwanted rotation at landing. This work analyzes that effect using a simple spring-mass model and identifies how slope-induced impulses destabilize the robot. To address this, we introduce two straightforward fixes: adjusting the body's touchdown angle based on the slope and applying a small corrective torque before takeoff. Together, these steps effectively cancel the unwanted rotation caused by inclined terrain, allowing the robot to land smoothly and maintain stable hopping even on steep slopes. Moreover, the proposed method remains simple enough to implement on low-cost robotic platforms without requiring complex sensing or computation. By combining this analytical model with minimal control actions, this approach provides a practical path toward reliable hopping on uneven terrain. The results from simulation confirm that even small slope-aware adjustments can dramatically improve landing stability, making the technique suitable for future autonomous field robots that must navigate natural environments such as hills, rubble, and irregular outdoor landscapes.
Rydberg-atom quantum receivers (RAQRs) enable electric-field sensing with quantum-noise-limited performance, yet their optical readout provides only magnitude measurements whose fluctuations follow Rician statistics governed by atomic projection noise, optical shot noise, reference-field injection, and short coherence times. These non-Gaussian, phase-blind measurements invalidate classical single-shot RF detectors and necessitate multi-shot quantum sensing strategies. This work develops a physically consistent multi-shot statistical model for RAQRs and derives both the optimal genie-aided likelihood-ratio test (LRT) and a practical phase-averaged LRT that removes dependence on the unknown RF-field phase. Closed-form test statistics and thresholds are obtained for both detectors, and the limits imposed by finite quantum shots, due to atomic dephasing and measurement backaction, are explicitly quantified. A fully non-coherent energy detector is also analysed, with exact detection probability derived using noncentral chi-square models. Monte Carlo results show that only 5-10 quantum shots yield major gains: the phase-averaged LRT closely approaches the genie bound and RAQR detection markedly outperforms classical RF energy detection under comparable received power. The proposed framework provides the first unified statistical basis for multi-shot Rydberg-based weak-field detection and underscores the potential of RAQRs for quantum-enhanced signal detection.
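The benefit of combining multiple quantum shots in a non-coherent energy detector can be reproduced with a small Monte Carlo experiment on Rician-magnitude readouts (a deliberately simplified model with hypothetical parameters, not the paper's exact statistics):

```python
import numpy as np

def detection_prob(amp, sigma, shots, trials=20000, pfa=0.05, seed=1):
    """Monte Carlo detection probability of a non-coherent energy detector
    that sums squared Rician magnitudes over `shots` quantum shots.
    The threshold is calibrated empirically for a target false-alarm rate."""
    rng = np.random.default_rng(seed)

    def energies(a):
        # complex readout: signal amplitude a plus circular Gaussian noise
        re = a + sigma * rng.normal(size=(trials, shots))
        im = sigma * rng.normal(size=(trials, shots))
        return np.sum(re**2 + im**2, axis=1)  # energy summed over shots

    thr = np.quantile(energies(0.0), 1 - pfa)  # threshold under H0
    return np.mean(energies(amp) > thr)        # P_D under H1
```

Under H1 the summed energy follows a noncentral chi-square law whose noncentrality grows linearly with the number of shots, so even a handful of shots sharply separates the hypotheses, mirroring the 5-10 shot gains reported above.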
In the dynamic landscape of modern healthcare, maintaining the highest standards in surgical instruments is critical for clinical success. This report explores the diverse realm of surgical instruments and their associated manufacturing defects, emphasizing their pivotal role in ensuring the safety of surgical procedures. With potentially fatal consequences arising from even minor defects, precision in manufacturing is paramount. This report addresses the identification and rectification of critical defects such as cracks, rust, and structural irregularities. Such scrutiny prevents substantial financial losses for manufacturers and, more crucially, safeguards patient lives. The collaboration with industry leaders Daddy D Pro and Dr. Frigz International, renowned trailblazers in the Sialkot surgical cluster, provides invaluable insights into the analysis of defects in Pakistani-made instruments. This partnership signifies a commitment to advancing automated defect detection methodologies, specifically through the integration of deep learning architectures including YOLOv8, ResNet-152, and EfficientNet-b4, thereby elevating quality standards in the manufacturing process. The scope of this report is to identify various surgical instruments manufactured in Pakistan and analyze their associated defects using a newly developed dataset of 4,414 high-resolution images. By focusing on quality assurance through Automated Optical Inspection (AOI) tools, this document serves as a resource for manufacturers, healthcare professionals, and regulatory bodies. The insights gained contribute to the enhancement of instrument standards, ensuring a more reliable healthcare environment through industry expertise and cutting-edge technology.
This paper develops a physically consistent signal model with hardware constraints for a simultaneous transmitting and reflecting beyond-diagonal RIS (STAR BD-RIS) endowed with per-element amplification and lossless power splitting. We explicitly decouple (i) amplification via a diagonal gain matrix, (ii) element-wise reflection/transmission splitting, and (iii) passive beyond-diagonal coupling on each branch, while enforcing practical feasibility through per-element emission caps and an aggregate RIS power budget under the operating covariance. Building on this model, we cast downlink sum-rate maximization as an equivalent weighted minimum mean-square error (WMMSE) problem and propose an alternating optimization framework with provable monotonic descent. The method admits closed-form updates for MMSE combiners and weights, waterfilling-like beamformer updates via a single dual variable, a per-element amplification update that satisfies emission constraints, and a STAR power-splitting update based on cyclic coordinate descent with a global acceptance test. For the beyond-diagonal coupling matrices, we derive Riemannian gradient steps on the complex Stiefel manifold with QR or polar retractions, preserving passivity at every iterate. Furthermore, the proposed approach decouples the optimization of the reflective and transmissive responses of the BD-RIS, enabling efficient distributed implementation. Numerical results demonstrate substantial sum-rate gains compared to the conventional passive BD-RIS.
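The Riemannian step keeps each coupling matrix on the complex Stiefel manifold (orthonormal columns, hence passive) after every update. A minimal sketch of the QR retraction used for this kind of manifold step (generic manifold-optimization code, not the paper's implementation):

```python
import numpy as np

def qr_retract(X, G, step):
    """Take a descent step from X on the complex Stiefel manifold
    St(n, p) = {X : X^H X = I} and retract back via QR decomposition."""
    Y = X - step * G
    Q, R = np.linalg.qr(Y)
    # normalize column phases (positive-real diag of R) so the
    # retraction is uniquely defined and continuous in Y
    d = np.diagonal(R).copy()
    d[d == 0] = 1
    return Q * (d / np.abs(d))
```

Whatever the (projected) gradient G and step size, the returned iterate satisfies X^H X = I exactly, which is why passivity holds at every iteration rather than only at convergence.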
In the Internet of Things (IoT), the freshness of system status information is crucial for real-time monitoring and decision-making. This paper studies the transmission scheduling problem in wireless monitoring systems, where information freshness -- typically quantified by the Age of Information (AoI) -- is heavily constrained by limited channel resources and influenced by factors such as the randomness of data arrivals and unreliable wireless channels. Such randomness leads to asynchronous AoI evolution at local sensors and the monitoring center, rendering conventional scheduling policies that rely solely on the monitoring center's AoI inefficient. To this end, we propose a dual-AoI model that captures asynchronous AoI dynamics and formulate the problem as minimizing a long-term time-average AoI function. We develop a scheduling policy based on Markov decision process (MDP) to solve the problem, and analyze the existence and monotonicity of a deterministic stationary optimal policy. Moreover, we derive a low-complexity scheduling policy which exhibits a channel-state-dependent threshold structure. In addition, we establish a necessary and sufficient condition for the stability of the AoI objective. Simulation results demonstrate that the proposed policy outperforms existing approaches.
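The dual-AoI idea above, with the sensor's local age resetting on arrivals and the monitor's age resetting on successful deliveries, can be illustrated with a toy threshold-triggered simulation. All parameters and the threshold rule here are invented for illustration; this is not the paper's MDP-derived policy.

```python
import random

def simulate(T=10_000, p_arrival=0.3, p_success=0.7, threshold=4, seed=0):
    """Toy dual-AoI simulation: the sensor AoI resets on new data arrivals,
    the monitor AoI resets on successful delivery, and a transmission is
    attempted when the AoI gap reaches a threshold. Returns average
    monitor-side AoI (illustrative numbers throughout)."""
    rng = random.Random(seed)
    aoi_sensor, aoi_monitor = 0, 0
    total = 0
    for _ in range(T):
        if rng.random() < p_arrival:                  # fresh sample arrives
            aoi_sensor = 0
        if aoi_monitor - aoi_sensor >= threshold and rng.random() < p_success:
            aoi_monitor = aoi_sensor                  # successful delivery
        aoi_sensor += 1
        aoi_monitor += 1
        total += aoi_monitor
    return total / T

avg_low = simulate(threshold=1)
avg_high = simulate(threshold=20)
```

A lower threshold transmits more aggressively and (with unconstrained channel use) yields lower average age; the interesting trade-offs appear once channel resources are limited, as studied in the paper.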
The secrecy performance of continuous-aperture array (CAPA)-based wiretap channels in terms of secrecy rate and secrecy outage probability (SOP) is analyzed. First, the system models of CAPA systems with maximum-ratio transmission under a Rayleigh fading channel are established, and approximate probability density functions for the legitimate user Bob's signal-to-noise ratio (SNR) and the eavesdropper Eve's SNR are derived using Mercer's theorem and Landau's eigenvalue theorem. Three scenarios are considered, including a single Eve, multiple independent Eves, and multiple collaborative Eves. Next, the expressions of the secrecy rate and SOP under these three scenarios are derived, and the high-SNR slope, high-SNR power offset, diversity order, and array gain in Bob's high-SNR region are obtained. It is then theoretically proven that, in all three scenarios, the CAPA system achieves the same high-SNR slope and the same diversity order, with the latter being equal to the spatial degrees of freedom. Moreover, the CAPA system with a single Eve has the smallest high-SNR offset and the highest array gain, whereas the CAPA system with multiple collaborative Eves exhibits the largest high-SNR offset and the lowest array gain. Finally, the theoretical analyses of secrecy rate, SOP, and high-SNR performance are validated by the simulation results, and a higher secrecy rate and a lower SOP are achieved by the CAPA systems compared to the spatially-discrete array systems with half-wavelength antenna spacing.
Long-form speech recognition with large encoder-decoder models such as Whisper often exhibits hallucinations, repetition loops, and content omissions. These errors can accumulate and be further amplified when the previous segment's transcription is used as decoding context. We propose Whisper-CD, a training-free contrastive decoding framework that contrasts clean-audio logits against negative logits computed from three acoustically motivated perturbations: Gaussian noise injection, silence signal, and audio temporal shift. We aggregate these negatives via the log-sum-exp operator, building a unified multi-negative objective for token-by-token decoding. Across five English long-form benchmarks, Whisper-CD reduces WER by up to 24.3pp on CORAAL and shows 48% faster token generation throughput than beam search. Because Whisper-CD operates purely at inference time, it can be applied as a drop-in replacement to already-deployed Whisper systems without retraining.
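The log-sum-exp aggregation of multiple negative logit vectors described above can be sketched in a few lines. The weighting scalar `alpha` and the logit values are made-up; the exact form of the paper's multi-negative objective may differ.

```python
import numpy as np

def contrastive_logits(clean_logits, negative_logits_list, alpha=1.0):
    """Combine clean-audio logits with K negative (perturbed-audio) logit
    vectors: aggregate the negatives per token with a numerically stable
    log-sum-exp, then penalize tokens the perturbed inputs also favor.
    Illustrative sketch, not Whisper-CD's exact implementation."""
    negs = np.stack(negative_logits_list)        # shape (K, vocab)
    m = negs.max(axis=0)
    neg_agg = m + np.log(np.exp(negs - m).sum(axis=0))   # log-sum-exp
    return clean_logits - alpha * neg_agg

clean = np.array([2.0, 0.5, -1.0])
negs = [np.array([1.5, 0.1, -0.5]), np.array([1.8, 0.0, -0.2])]
scores = contrastive_logits(clean, negs)
```

In this toy case the token preferred on clean audio (index 0) is also the one the perturbations push hardest for, so the contrastive score shifts the preference to the second token; in practice `alpha` and the perturbation set are tuning choices.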
Residential demand response depends on sustained prosumer participation, yet existing coordination is either fully automated, or limited to one-way dispatch signals and price alerts that offer little possibility for informed decision-making. This paper introduces Conversational Demand Response (CDR), a coordination mechanism where aggregators and prosumers interact through bidirectional natural language, enabled through agentic AI. A two-tier multi-agent architecture is developed in which an aggregator agent dispatches flexibility requests and a prosumer Home Energy Management System (HEMS) assesses deliverability and cost-benefit by calling an optimization-based tool. CDR also enables prosumer-initiated upstream communication, where changes in preferences can reach the aggregator directly. Proof-of-concept evaluation shows that interactions complete in under 12 seconds. The architecture illustrates how agentic AI can bridge the aggregator-prosumer coordination gap, providing the scalability of automated DR while preserving the transparency, explainability, and user agency necessary for sustained prosumer participation. All system components, including agent prompts, orchestration logic, and simulation interfaces, are released as open source to enable reproducibility and further development.
This paper investigates an autonomous navigation method for spacecraft operating in the outer solar system, up to 250 AU from the Sun, using the parallactic shifts of nearby stars. These measurements enable estimation of the spacecraft trajectory while distant stars provide attitude information through conventional star-pattern matching. Stellar observation models are developed, accounting for delta light-time, parallax, and aberration effects. Navigation performance is assessed using two approaches: (1) a least-squares estimator using simultaneous multi-star measurements, and (2) a Kalman filter processing sequential single-star observations along deep-space trajectories. Monte Carlo simulations on trajectories representative of Voyager 1, Voyager 2, Pioneer 10, Pioneer 11, and New Horizons missions show sub-AU position accuracies at 250 AU, and velocity accuracies better than 0.00004 AU/day, under realistic spacecraft and instrumentation uncertainties. These values correspond to relative errors below 0.4% in position and velocity with respect to the reference trajectories. Although less precise than radiometric tracking, this performance can support navigation in the outer solar system without reliance on Earth. When ground-based navigation remains necessary, this approach can be employed during long cruising phases, lowering the number of ground contacts. The method additionally shows potential for future missions venturing farther from the Sun.
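The least-squares estimator from simultaneous multi-star measurements rests on a simple geometric fact: the bearing to a nearby star depends on the observer's position through parallax. A tiny Gauss-Newton sketch of that position fix follows; the star coordinates and units are invented, and the light-time, parallax-model, and aberration corrections developed in the paper are omitted.

```python
import numpy as np

def solve_position(stars, dirs, r0, iters=20):
    """Toy least-squares position fix from measured unit directions to
    nearby stars: u(r) = (s - r)/|s - r|, solved by Gauss-Newton.
    Purely illustrative geometry."""
    r = np.asarray(r0, float)
    for _ in range(iters):
        J_rows, res = [], []
        for s, u_meas in zip(stars, dirs):
            d = s - r
            rho = np.linalg.norm(d)
            u = d / rho
            res.append(u_meas - u)
            J_rows.append((np.eye(3) - np.outer(u, u)) / rho)  # d(res)/dr
        J = np.vstack(J_rows)
        res = np.concatenate(res)
        r = r - np.linalg.solve(J.T @ J, J.T @ res)            # GN step
    return r

r_true = np.array([1.0, 2.0, 3.0])
stars = [np.array([100.0, 0.0, 0.0]),
         np.array([0.0, 100.0, 0.0]),
         np.array([0.0, 0.0, 100.0])]
dirs = [(s - r_true) / np.linalg.norm(s - r_true) for s in stars]
r_est = solve_position(stars, dirs, r0=np.zeros(3))
```

Each bearing constrains the two directions transverse to the line of sight (the rank-2 projector in the Jacobian), so two or more non-collinear stars suffice to pin down the 3D position.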
Generalizing across unknown targets is critical for open-world perception, yet existing 3D Multi-Object Tracking (3D MOT) pipelines remain limited by closed-set assumptions and ``semantic-blind'' heuristics. To address this, we propose Next-step Open-Vocabulary Autoregression (NOVA), an innovative paradigm that shifts 3D tracking from traditional fragmented distance-based matching toward generative spatio-temporal semantic modeling. NOVA reformulates 3D trajectories as structured spatio-temporal semantic sequences, enabling the simultaneous encoding of physical motion continuity and deep linguistic priors. By leveraging the autoregressive capabilities of Large Language Models (LLMs), we transform the tracking task into a principled process of next-step sequence completion. This mechanism allows the model to explicitly utilize the hierarchical structure of language space to resolve fine-grained semantic ambiguities and maintain identity consistency across complex long-range sequences through high-level commonsense reasoning. Extensive experiments on nuScenes, V2X-Seq-SPD, and KITTI demonstrate the superior performance of NOVA. Notably, on the nuScenes dataset, NOVA achieves an AMOTA of 22.41% for Novel categories, yielding a significant 20.21% absolute improvement over the baseline. These gains are realized through a compact 0.5B autoregressive model. Code will be available at this https URL.
3D semantic occupancy prediction is a cornerstone of robotic perception, yet real-world voxel annotations are inherently corrupted by structural artifacts and dynamic trailing effects. This raises a critical but underexplored question: can autonomous systems safely rely on such unreliable occupancy supervision? To systematically investigate this issue, we establish OccNL, the first benchmark dedicated to 3D occupancy under occupancy-asymmetric and dynamic trailing noise. Our analysis reveals a fundamental domain gap: state-of-the-art 2D label noise learning strategies collapse catastrophically in sparse 3D voxel spaces, exposing a critical vulnerability in existing paradigms. To address this challenge, we propose DPR-Occ, a principled label noise-robust framework that constructs reliable supervision through dual-source partial label reasoning. By synergizing temporal model memory with representation-level structural affinity, DPR-Occ dynamically expands and prunes candidate label sets to preserve true semantics while suppressing noise propagation. Extensive experiments on SemanticKITTI demonstrate that DPR-Occ prevents geometric and semantic collapse under extreme corruption. Notably, even at 90% label noise, our method achieves significant performance gains (up to 2.57% mIoU and 13.91% IoU) over existing label noise learning baselines adapted to the 3D occupancy prediction task. By bridging label noise learning and 3D perception, OccNL and DPR-Occ provide a reliable foundation for safety-critical robotic perception in dynamic environments. The benchmark and source code will be made publicly available at this https URL.
While Hamiltonian mechanics provides a powerful inductive bias for neural networks modeling dynamical systems, Hamiltonian Neural Networks and their variants often fail to capture complex temporal dynamics spanning multiple timescales. This limitation is commonly linked to the spectral bias of deep neural networks, which favors learning low-frequency, slow-varying dynamics. Prior approaches have sought to address this issue through symplectic integration schemes that enforce energy conservation or by incorporating geometric constraints to impose structure on the configuration space. However, such methods either remain limited in their ability to fully capture multiscale dynamics or require substantial domain specific assumptions. In this work, we exploit the observation that Hamiltonian functions admit decompositions into explicit fast and slow modes and can be reconstructed from these components. We introduce the Frequency-Separable Hamiltonian Neural Network (FS-HNN), which parameterizes the system Hamiltonian using multiple networks, each governed by Hamiltonian dynamics and trained on data sampled at distinct timescales. We further extend this framework to partial differential equations by learning state- and boundary-conditioned symplectic operators. Empirically, we show that FS-HNN improves long-horizon extrapolation performance on challenging dynamical systems and generalizes across a broad range of ODE and PDE problems.
Accurate fault detection in high-dimensional industrial environments remains a major challenge due to the inherent complexity, noise, and redundancy in sensor data. This paper introduces CLAIRE, a hybrid end-to-end learning framework that integrates unsupervised deep representation learning with supervised classification for intelligent quality control in smart manufacturing systems. It employs an optimized deep autoencoder to transform raw input into a compact latent space, effectively capturing the intrinsic data structure while suppressing irrelevant or noisy features. The learned representations are then fed into a downstream classifier to perform binary fault prediction. Experimental results on a high-dimensional dataset demonstrate that CLAIRE significantly outperforms conventional classifiers trained directly on raw features. Moreover, the framework incorporates a post hoc phase, using a game-theory-based interpretability technique, to analyze the latent space and identify the most informative input features contributing to fault predictions. The proposed framework highlights the potential of integrating explainable AI with feature-aware regularization for robust fault detection. The modular and interpretable nature of the proposed framework makes it highly adaptable, offering promising applications in other domains characterized by complex, high-dimensional data, such as healthcare, finance, and environmental monitoring.
Safe autonomy is a critical requirement and a key enabler for robots to operate safely in unstructured complex environments. Control barrier functions and safe motion corridors are two widely used but technically distinct safety methods, functional and geometric, respectively, for safe motion planning and control. Control barrier functions are applied to the safety filtering of control inputs to limit the decay rate of system safety, whereas safe motion corridors are geometrically constructed to define a local safe zone around the system state for use in motion optimization and reference-governor design. This paper introduces a new notion of control barrier corridors, which unifies these two approaches by converting control barrier functions into local safe goal regions for reference goal selection in feedback control systems. We show, with examples on fully actuated systems, kinematic unicycles, and linear output regulation systems, that individual state safety can be extended locally over control barrier corridors for convex barrier functions, provided the control convergence rate matches the barrier decay rate, highlighting a trade-off between safety and reactiveness. Such safe control barrier corridors enable safely reachable persistent goal selection over continuously changing barrier corridors during system motion, which we demonstrate for verifiably safe and persistent path following in autonomous exploration of unknown environments.
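The control-barrier-function side of the comparison above has a standard minimal form: filter a desired input so the barrier's decay-rate condition holds. For a single integrator with a disk-shaped safe set, the QP solves in closed form; the sketch below is this textbook special case, not the paper's control barrier corridor construction, and all numbers are illustrative.

```python
import numpy as np

def cbf_filter(u_des, x, c, r, alpha=1.0):
    """Minimal CBF safety filter for a single integrator x' = u with
    barrier h(x) = r^2 - ||x - c||^2 (safe inside a disk around c).
    Solves min ||u - u_des||^2 s.t. grad(h).u >= -alpha*h in closed form
    by projecting onto the half-space when the constraint is active."""
    h = r**2 - np.dot(x - c, x - c)
    a = -2.0 * (x - c)                      # gradient of h
    b = -alpha * h
    if a @ u_des >= b:                      # desired input already safe
        return u_des
    return u_des + (b - a @ u_des) / (a @ a) * a

u_near = cbf_filter(np.array([1.0, 0.0]), np.array([0.9, 0.0]),
                    c=np.zeros(2), r=1.0)   # near the boundary: slowed down
u_center = cbf_filter(np.array([1.0, 0.0]), np.zeros(2),
                      c=np.zeros(2), r=1.0) # at the center: unchanged
```

Near the boundary the filter attenuates the outward velocity so safety decays no faster than rate `alpha`, the same decay-rate limiting that the paper converts into local safe goal regions.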
Coded caching (CC) can transform cache memory at network devices into an active communication resource. Prior studies have shown that CC can significantly enhance the achievable Degrees of Freedom (DoF) in multi-input multi-output (MIMO) systems. To fully exploit MIMO-CC gains across all SNR regimes and enable practical linear receivers, flexible scheduling is required. Existing DoF analysis, scheduling, and linear receiver design, however, largely assume symmetric stream allocations across users. This paper extends the authors' recent work on DoF and linear decodability analysis for MIMO-CC systems by deriving a simple criterion, based on per-user stream allocation, that guarantees linear decodability for both symmetric and non-symmetric bit-level CC schemes. Building on this, we propose a heuristic MIMO-CC delivery and scheduling framework that enables asymmetric stream allocation while adhering to linear decodability, thereby expanding the feasibility region of achievable DoF compared to symmetric-constrained designs.
Accurate and adaptive dynamic models are critical for underwater vehicle-manipulator systems where hydrodynamic effects induce time-varying parameters. This paper introduces a novel uncertainty-aware adaptive dynamics model framework that remains linear in lumped vehicle and manipulator parameters, and embeds convex physical consistency constraints during online estimation. Moving horizon estimation is used to stack horizon regressors, enforce realizable inertia, damping, friction, and hydrostatics, and quantify uncertainty from parameter evolution. Experiments on a BlueROV2 Heavy with a 4-DOF manipulator demonstrate rapid convergence and calibrated predictions. Manipulator fits achieve R^2 values of 0.88 to 0.98 with slopes near unity, while vehicle surge, heave, and roll are reproduced with good fidelity under stronger coupling and noise. Median solver time is approximately 0.023 s per update, confirming online feasibility. A comparison against a fixed parameter model shows consistent reductions in MAE and RMSE across degrees of freedom. Results indicate physically plausible parameters and confidence intervals with near 100% coverage, enabling reliable feedforward control and simulation in underwater environments.
Seismic exploration remains the most critical method for characterizing subsurface structures in geophysics. However, complex surface conditions often cause a non-uniform distribution of seismic receivers along survey lines, leading to irregularly acquired seismic data, which affects subsequent processing and inversion. Prior deep learning-based seismic data reconstruction methods typically rely on datasets for supervised training. While some existing methods avoid extra data, they lack effective constraints on reconstructed data, leading to unstable performance. In this study, we propose a self-supervised self-consistency learning strategy with a lightweight network for seismic data reconstruction. Our method requires no extra datasets, and it leverages inter-component correlations in seismic data to design a loss function, optimizing a network with only 188,849 learnable parameters. Validated on two public seismic datasets, results demonstrate our approach yields high-quality reconstruction, providing significant value for large-scale and complex seismic exploration tasks.
While LLM-based TTS models exhibit zero-shot emotion and speaker cloning, their cloning fidelity and pronunciation clarity degrade on unseen domains. Fine-tuning is essential for adaptation, yet uniform approaches overlook specific parameter contributions. Uniform tuning on limited data causes slow training and catastrophic forgetting, leading to degraded pronunciation accuracy. To address this, we propose CSP-FT, a characteristic-specific partial fine-tuning strategy. By dynamically analyzing layer contributions via a weighted sum, we selectively fine-tune only the two layers capturing the most and least emotion and speaker information, maximizing the utility of the former while explicitly strengthening the capacity of the latter. Experiments on a combined corpus of 11 datasets show CSP-FT matches or exceeds the fidelity and intelligibility of full fine-tuning while updating only ~8% of parameters, accelerating training by ~2x, and significantly mitigating catastrophic forgetting.
This paper focuses on the problem of automatic link selection in multi-channel multiple access control using bandit feedback. In particular, a controller assigns multiple users to multiple channels in a time-slotted system, where in each time slot, at most one user can be assigned to a given channel, and at most one channel can be assigned to a given user. Given that user $i$ is assigned to channel $j$, the transmission fails with a fixed unknown probability $1-q_{i,j}$. The assignments are made dynamically using success/failure feedback. The goal is to maximize the time-average utility, where we consider an arbitrary (possibly nonsmooth) concave, entrywise nondecreasing utility function. The first proposed algorithm has fast $\mathcal{O}(\sqrt{\log(T)/T})$ convergence. However, this algorithm requires solving a convex optimization problem within each iteration, which can be computationally expensive. The second algorithm has slower $\mathcal{O}(\sqrt[3]{\log(T)/T})$ convergence, while avoiding the costly inner optimization. Both of these algorithms are adaptive. In particular, the convergence guarantee holds for any interval of $T$ consecutive slots during which the success probabilities do not change. We further study several special cases. In the single-channel setting, we obtain both fast $\mathcal{O}(\sqrt{\log(T)/T})$ convergence and efficient implementation via a simpler adaptive mechanism. We also consider a UCB-based non-adaptive algorithm with max-weight-type decisions. Simulations highlight intriguing performance trade-offs and demonstrate rapid adaptation of the proposed adaptive schemes.
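The UCB-based max-weight variant mentioned at the end of the abstract can be illustrated on a tiny instance: build optimistic success-rate estimates from success/failure feedback and pick the user-channel permutation maximizing their sum. Brute force over permutations and the 2x2 probabilities below are purely illustrative, not the paper's algorithms.

```python
import itertools
import math
import random

def ucb_assignment(successes, attempts, t):
    """One round of a UCB-flavored max-weight assignment: optimistic
    estimate per (user, channel) pair, then the permutation (one user per
    channel, one channel per user) with the largest index sum."""
    n = len(successes)
    ucb = [[successes[i][j] / max(1, attempts[i][j])
            + math.sqrt(2.0 * math.log(max(2, t)) / max(1, attempts[i][j]))
            for j in range(n)] for i in range(n)]
    return max(itertools.permutations(range(n)),
               key=lambda perm: sum(ucb[i][perm[i]] for i in range(n)))

rng = random.Random(0)
q = [[0.9, 0.1], [0.1, 0.9]]       # unknown success probabilities
S = [[0, 0], [0, 0]]               # observed successes
N = [[0, 0], [0, 0]]               # attempts
for t in range(1, 2001):
    perm = ucb_assignment(S, N, t)
    for i, j in enumerate(perm):   # bandit (success/failure) feedback
        N[i][j] += 1
        S[i][j] += rng.random() < q[i][j]
```

After enough rounds the schedule concentrates on the matching that assigns each user its good channel, while the exploration bonus guarantees every pair is still tried at least occasionally.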
This paper presents a Virtual Inertia Skyhook (VISKY) controller for magnetorheological (MR) dampers in semi-active suspensions. The proposed law is derived from a continuous sky-ground damping baseline augmented with acceleration feedback on the sprung and unsprung masses. In the closed-loop equations, these acceleration terms appear as a mass-like virtual inertia matrix rather than as a change in physical hardware. This interpretation motivates the VISKY name while making the underlying sky-ground hybrid structure explicit. Numerical evaluations under half-sine bump and representative ISO 8608 random-road excitations show improved ride- and wheel-acceleration metrics relative to conventional Skygroundhook control, with the largest gains appearing near the wheel-hop mode. The controller retains low computational overhead because it requires only algebraic force computation and bounded MR inversion.
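The structure described above, a sky-ground damping baseline plus acceleration feedback that acts as virtual inertia, can be written as a one-line force law. The sketch below is schematic: all gains are invented, and the exact VISKY law and MR inversion in the paper may differ.

```python
def visky_force(vs, vu, as_, au,
                c_sky=1200.0, c_gnd=800.0, m_vs=40.0, m_vu=15.0,
                f_min=-1500.0, f_max=1500.0):
    """Illustrative continuous sky-ground damping force with acceleration
    feedback: the m_vs / m_vu gains multiply accelerations, so in the
    closed-loop equations they appear as virtual inertia. vs, vu are
    sprung/unsprung velocities; as_, au the accelerations. Hypothetical
    gains; bounded output mimics MR damper saturation."""
    f = -c_sky * vs + c_gnd * vu - m_vs * as_ + m_vu * au
    return max(f_min, min(f_max, f))

f_rest = visky_force(0.0, 0.0, 0.0, 0.0)    # no motion, no force
f_sat = visky_force(10.0, 0.0, 0.0, 0.0)    # large velocity saturates
```

The clipping step is what keeps the law implementable on a semi-active device: the computed force is only a request, realized within the damper's bounded force envelope.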
Image and video quality metrics, such as SSIM, LPIPS, and VMAF, aim to predict perceived visual quality and are often assumed to reflect principles of human vision. However, relatively few metrics explicitly incorporate models of human perception, with most relying on hand-crafted formulas or data-driven training to approximate perceptual alignment. In this paper, we introduce a set of tests for full-reference quality metrics that evaluate their ability to capture key aspects of low-level human vision: contrast sensitivity, contrast masking, and contrast matching. These tests provide an additional framework for assessing both established and newly proposed metrics. We apply the tests to 34 existing quality metrics and highlight patterns in their behavior, including the ability of LPIPS and MS-SSIM to predict contrast masking, the tendency of SSIM to overemphasize high spatial frequencies (a tendency mitigated in MS-SSIM), and the general inability of metrics to model supra-threshold contrast constancy. Our results demonstrate how these tests can reveal properties of quality metrics that are not easily observed with standard evaluation protocols.
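A contrast sensitivity test of the kind described above can be run on any full-reference metric: probe gratings at several spatial frequencies and bisect for the contrast at which the metric's response crosses a criterion. The harness below is a generic sketch (grating parameters and the criterion are invented), with RMSE standing in for a metric.

```python
import numpy as np

def grating(freq_cpd, contrast, size=64, ppd=32):
    """Horizontal sinusoidal grating on a mid-gray background in [0, 1]."""
    x = np.arange(size) / ppd                  # degrees of visual angle
    row = 0.5 + 0.5 * contrast * np.sin(2 * np.pi * freq_cpd * x)
    return np.tile(row, (size, 1))

def detection_threshold(metric, freq, criterion, lo=1e-4, hi=1.0, iters=40):
    """Bisect for the contrast at which metric(test, reference) reaches a
    fixed criterion -- tracing out the contrast sensitivity the metric
    implies. 'metric' returns a distance (larger = more different)."""
    ref = grating(freq, 0.0)                   # uniform mid-gray reference
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if metric(grating(freq, mid), ref) < criterion:
            lo = mid                           # below criterion: more contrast
        else:
            hi = mid
    return hi

rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
t4 = detection_threshold(rmse, freq=4.0, criterion=0.01)
t8 = detection_threshold(rmse, freq=8.0, criterion=0.01)
```

RMSE produces identical thresholds at every frequency, i.e. no contrast sensitivity at all, which is precisely the kind of property such tests expose; a perceptually aligned metric would show frequency-dependent thresholds.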
Data availability is essential in the development of acoustic signal processing algorithms, especially when it comes to data-driven approaches that demand large and diverse training datasets. For this reason, an increasing number of databases have been published in recent years, including either room impulse responses (RIRs) or audio recordings during motion. In this paper we introduce the trajectoRIR database, an extensive, multi-array collection of both dynamic and stationary acoustic recordings along a controlled trajectory in a room. Specifically, the database contains moving-microphone recordings and stationary RIRs that spatially sample the room acoustics along an L-shaped trajectory. This combination makes trajectoRIR unique and applicable to a wide range of tasks, including sound source localization and tracking, spatially dynamic sound field reconstruction, auralization, and system identification. The recording room has a reverberation time of 0.5 s, and the three different microphone configurations employed include a dummy head, with additional reference microphones located next to the ears, 3 first-order Ambisonics microphones, two circular arrays of 16 and 4 channels, and a 12-channel linear array. The motion of the microphones was achieved using a robotic cart traversing a 4.62 m-long rail at three speeds: [0.2, 0.4, 0.8] m/s. Audio signals were reproduced using two stationary loudspeakers. The collected database features 8648 stationary RIRs, as well as perfect sweeps, speech, music, and stationary noise recorded during motion. Python functions are provided to access the recorded audio and retrieve the associated geometric information.
We introduce a unified framework for analyzing utility regions of wireless networks, with a focus on signal-to-interference-plus-noise-ratio (SINR) and achievable rate regions. The framework provides valuable insights into interference patterns of modern network architectures, including extremely large MIMO and cell-less networks. A central contribution is a simple characterization of feasible utility regions using the concept of spectral radius of nonlinear mappings. This characterization provides a powerful mathematical tool for wireless system design and analysis. For example, it allows us to generalize existing characterizations of the weak Pareto boundary using compact notation. It also allows us to derive tractable sufficient conditions for the identification of convex utility regions. This property is particularly important because, on the weak Pareto boundary, it guarantees that time sharing (or user grouping) cannot simultaneously improve the utilities of all users. Beyond geometrical insights, these sufficient conditions have two key implications. First, they identify a family of (weighted) sum-rate maximization problems that are inherently convex, thus paving the way for the development of efficient, provably optimal solvers for this family. Second, they provide justification for formulating sum-rate maximization problems directly in terms of achievable rates, rather than SINR levels. Our theoretical insights also motivate an alternative to the concept of favorable propagation in the massive MIMO literature -- one that explicitly accounts for self-interference and the beamforming strategy.
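The spectral-radius characterization above generalizes a classic linear special case: with linear interference functions, SINR targets are feasible with finite powers iff the spectral radius of a weighted cross-gain matrix is below one (a Perron-Frobenius condition). The sketch below implements only that textbook special case with made-up gains, as a gesture toward the nonlinear-mapping framework of the paper.

```python
import numpy as np

def sinr_targets_feasible(G, gamma, noise):
    """Linear Perron-Frobenius feasibility test: targets gamma are
    achievable iff rho(D F) < 1, with D = diag(gamma) and F the matrix of
    cross gains normalized by direct gains. G[i][j] is the gain from
    transmitter j to receiver i. Returns (feasible, minimal powers)."""
    G = np.asarray(G, float)
    n = len(gamma)
    F = G / np.diag(G)[:, None]          # normalize by direct gains
    np.fill_diagonal(F, 0.0)
    DF = np.diag(gamma) @ F
    rho = max(abs(np.linalg.eigvals(DF)))
    if rho >= 1.0:
        return False, None
    u = np.asarray(gamma) * np.asarray(noise) / np.diag(G)
    p = np.linalg.solve(np.eye(n) - DF, u)   # minimal feasible powers
    return True, p

ok, p = sinr_targets_feasible([[1.0, 0.1], [0.1, 1.0]],
                              gamma=[1.0, 1.0], noise=[0.1, 0.1])
ok2, _ = sinr_targets_feasible([[1.0, 0.1], [0.1, 1.0]],
                               gamma=[20.0, 20.0], noise=[0.1, 0.1])
```

For the symmetric two-user example the modest targets are feasible with equal powers, while the aggressive targets push the spectral radius past one and no power vector achieves them.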
We introduce the multivariate fields of experts, a new framework for the learning of image priors. Our model generalizes existing fields of experts methods by incorporating multivariate potential functions constructed via Moreau envelopes of the $\ell_\infty$-norm. We demonstrate the effectiveness of our proposal across a range of inverse problems that include image denoising, deblurring, compressed-sensing magnetic-resonance imaging, and computed tomography. The proposed approach outperforms comparable univariate models and achieves performance close to that of deep-learning-based regularizers while being significantly faster, requiring fewer parameters, and being trained on substantially fewer data. In addition, our model retains a high level of interpretability due to its structured design. It is supported by theoretical convergence guarantees which ensure reliability in sensitive reconstruction tasks.
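The basic building block named above, the Moreau envelope of the l-infinity norm, can be computed exactly: by Moreau decomposition, the prox of the l-infinity norm is the identity minus the projection onto an l1 ball (the dual ball). The sketch below uses a unit smoothing parameter and a standard sorting-based l1 projection; the paper's learned multivariate potentials build on, but are not identical to, this primitive.

```python
import numpy as np

def project_l1_ball(v, z):
    """Euclidean projection onto {w : ||w||_1 <= z} via sorting
    (Duchi et al.-style soft thresholding)."""
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - z)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def moreau_env_linf(x, lam=1.0):
    """Moreau envelope of f = lam*||.||_inf with unit smoothing parameter:
    prox_f(x) = x - P_{lam B1}(x) by Moreau decomposition, and
    env(x) = f(prox) + ||x - prox||^2 / 2."""
    p = x - project_l1_ball(x, lam)
    return lam * np.abs(p).max() + 0.5 * np.sum((x - p) ** 2)

e_small = moreau_env_linf(np.array([0.3, 0.2]))   # inside dual ball
e_big = moreau_env_linf(np.array([3.0, 0.0]))     # outside dual ball
```

Inside the dual l1 ball the envelope is purely quadratic (here 0.5 * 0.13 = 0.065), and outside it grows linearly like the l-infinity norm, which is exactly the smoothed-norm behavior that makes such potentials attractive as learnable priors.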
An important limitation of Inverter-Based Resources (IBR) is their reduced contribution to Short-Circuit Current (SCC), as compared to that of Synchronous Generators (SGs). With increasing penetration of IBR in most power systems, the resulting reduction in SCC poses challenges to secure system operation, as line protections may not trip when required. In order to address this issue, the SCC ancillary service could be procured via an economic mechanism, aiming at securing adequate SCC on all buses. However, the suitability of markets for SCC services is not well understood, given that these could be prone to market power issues: since the SCC contributions from various SGs to a certain bus are determined by the electrical topology of the grid, this is a highly local service. It is necessary to understand if SGs at advantageous electrical locations could exert market power and, if so, how it could be mitigated. In order to fill this gap, this paper, for the first time, adopts an SCC-constrained bilevel model to investigate strategic behaviors of SGs. To address the non-convexity due to unit commitment variables, the model is restructured through a primal-dual formulation. Based on a modified IEEE 30-bus system, cases with strategic SGs placed at different buses are analyzed. These studies demonstrate that strategic agents exerting market power by manipulating service prices and extending operating periods could achieve up to triple the revenue from SCC provision, which reduces market efficiency and would increase the financial burden on consumers. These findings highlight the need for careful market design, for which potential measures to mitigate these market power issues are also discussed.
Background and Objective: The electrocardiogram (ECG) plays a crucial role in the diagnosis and treatment of various cardiac diseases. ECG signals suffer from low-resolution (LR) due to the use of convenient acquisition devices, as well as internal and external noises and artifacts. Classical ECG super-resolution (ECGSR) methods adopt an open-loop architecture that converts LR ECG signals to super-resolution (SR) ones. According to the theory of automatic control, a closed-loop framework exhibits superior dynamic and static performance compared with its open-loop counterpart. Methods: This paper proposes a closed-loop approach, termed circular ECGSR (CECGSR), which models the degradation process from SR ECG signals to LR ones. The negative feedback mechanism of the closed-loop system is based on the differences between the LR ECG signals. A mathematical loop equation is constructed to characterize the closed-loop infrastructure. The Taylor series expansion is employed to demonstrate the near-zero steady-state error of the proposed method. A Plug-and-Play strategy is considered to establish the SR unit of the proposed architecture, leveraging any existing advanced open-loop ECGSR methods. This paper also presents Transformer model based open-loop ECGSR and closed-loop CECGSR algorithms. Results: Simulation experiments on both noiseless and noisy subsets of the Physikalisch-Technische Bundesanstalt-Extra Large (PTB-XL) datasets demonstrate that the proposed CECGSR outperforms state-of-the-art open-loop ECGSR algorithms in the reconstruction performance of ECG signals. Conclusions: The proposed method will efficiently enrich ECG signal details and remove ECG signal artifacts in clinical applications.
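The closed-loop principle described above, feed the difference between the observed LR signal and the degraded SR estimate back into the reconstruction, has a well-known generic instance in iterative back-projection. The 1-D sketch below illustrates only that negative-feedback idea with an invented "SR unit"; it is not the paper's CECGSR, and the degradation model, gain, and signals are all toy choices.

```python
import numpy as np

def downsample(x, k):
    return x[::k]

def upsample(x, k):
    n = (len(x) - 1) * k + 1
    return np.interp(np.arange(n), np.arange(len(x)) * k, x)

def crude_sr_unit(lr, k):
    """Stand-in open-loop SR unit: interpolation plus heavy smoothing,
    deliberately not consistent with the LR input."""
    kernel = np.ones(5) / 5.0
    return np.convolve(upsample(lr, k), kernel, mode="same")

def closed_loop_sr(lr, k=4, iters=60, lam=0.5):
    """Negative-feedback loop around the SR unit: degrade the current
    estimate, compare with the observed LR signal, and feed the residual
    back until the LR-domain error vanishes."""
    sr = crude_sr_unit(lr, k)                  # open-loop initial estimate
    for _ in range(iters):
        residual = lr - downsample(sr, k)      # LR-domain error signal
        sr = sr + lam * upsample(residual, k)  # feedback correction
    return sr

lr = np.sin(np.linspace(0.0, 3.0, 16))
sr = closed_loop_sr(lr)
err_open = np.abs(downsample(crude_sr_unit(lr, 4), 4) - lr).max()
err_closed = np.abs(downsample(sr, 4) - lr).max()
```

The open-loop estimate leaves a visible LR-domain inconsistency, while the feedback loop drives that steady-state error to numerical zero, mirroring the near-zero steady-state error the paper establishes via its loop equation.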
Modern data-driven applications that make real-time decisions increasingly depend on advanced sensors which use pre-stored calibration data. In such applications, accurate characterization of sensor output uncertainty is important for reliable data interpretation. Here, we present a method for real-time on-device dynamic uncertainty quantification for sensor outputs which depend on pre-stored calibration data. We show how sensor calibration compensation equations (essential in advanced sensing systems) propagate uncertainties resulting from the quantization of calibration parameters to the sensor output. We use a low-cost thermal sensor as a motivating example and show these ideas are practical and possible on actual embedded sensor systems by prototyping them on two commercially-available uncertainty tracking hardware platforms. One has an average power dissipation of 16.7 mW and achieves a 42.9x speedup compared to the equal-accuracy Monte Carlo computation (the status quo); the other dissipates 147.15 mW and achieves a 94.4x speedup. We present a proof-of-usefulness application using the quantified uncertainty in edge detection over ten test scenes where we show accuracy and precision average improvement by 4.97 and 40.25 percentage points, respectively, trading off sensitivity. Another application example examines uncertainty quantification for four different calibration-data storage scenarios and computes that a 48% increase in memory yields 75% smaller uncertainty metrics over the baseline.
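The core propagation step above has a compact first-order form: a parameter quantized with step LSB carries variance LSB^2/12 (uniform quantization noise), scaled by the squared sensitivity of the compensation equation. The sketch below applies this to a made-up quadratic calibration polynomial; the actual thermal-sensor equations and hardware tracking in the paper are more involved.

```python
import math

def output_uncertainty(grads, lsbs):
    """First-order propagation of calibration-parameter quantization
    uncertainty to the sensor output: each stored parameter c_i with
    quantization step LSB_i contributes (LSB_i^2 / 12) * (df/dc_i)^2
    to the output variance. Gradients and steps are illustrative."""
    var = sum((g ** 2) * (lsb ** 2) / 12.0 for g, lsb in zip(grads, lsbs))
    return math.sqrt(var)

# toy compensation equation T = c0 + c1*v + c2*v^2 at raw reading v
v = 2.0
grads = [1.0, v, v * v]            # dT/dc0, dT/dc1, dT/dc2
sigma = output_uncertainty(grads, lsbs=[0.01, 0.001, 0.0001])
```

Because each term scales with LSB^2, halving a parameter's quantization step cuts its variance contribution by four, which is the lever behind the memory-versus-uncertainty trade-off the abstract reports.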
A data-driven framework is proposed for online estimation of quadrotor motor efficiency via residual minimization. The problem is formulated as a constrained nonlinear optimization that minimizes trajectory residuals between measured flight data and predictions generated by a quadrotor dynamics model. A sliding-window strategy enables online estimation, and the optimization is efficiently solved using an iteratively reweighted least squares (IRLS) scheme combined with a primal-dual interior-point method, with inequality constraints enforced through a logarithmic barrier function. Robust z-score weighting is employed to reject outliers, which is particularly effective in motor clipping scenarios where the proposed estimator exhibits smaller spikes than an EKF baseline. Compared to traditional filter-based approaches, the batch-mode formulation allows selective inclusion of data segments via IRLS reweighting and hard-rejection. This structure is well-suited for online estimation and supports applications such as fault detection and isolation (FDI), health monitoring, and predictive maintenance in aerial robotic systems. Simulation results under various degradation scenarios demonstrate the accuracy and robustness of the proposed estimator.
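The IRLS-with-robust-z-score idea above, down-weight or hard-reject residual outliers and refit, can be shown on a linear toy problem. The MAD-based z-score and cutoff below are standard choices, standing in for the paper's trajectory-residual formulation and interior-point machinery.

```python
import numpy as np

def robust_irls(A, b, iters=10, z_cut=2.5):
    """Iteratively reweighted least squares with robust z-score hard
    rejection: residuals whose MAD-based z-score exceeds z_cut receive
    weight 0 on the next refit. Linear toy stand-in for the constrained
    trajectory-residual problem described above."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        r = b - A @ x
        med = np.median(r)
        mad = np.median(np.abs(r - med)) + 1e-12
        z = 0.6745 * np.abs(r - med) / mad     # robust z-scores
        w = (z < z_cut).astype(float)          # hard rejection of outliers
        x = np.linalg.lstsq(A * w[:, None], w * b, rcond=None)[0]
    return x

xs = np.arange(10.0)
A = np.column_stack([xs, np.ones(10)])
b = 2.0 * xs + 1.0                             # true line: slope 2, offset 1
b[5] += 50.0                                   # one gross outlier
coef = robust_irls(A, b)
```

An unweighted fit would be dragged by the corrupted sample; the robust reweighting isolates it after one pass and recovers the underlying line, the same mechanism that limits spikes in motor-clipping scenarios.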
This paper presents conservative probabilistic bounds for the spectrum of the admittance matrix and for classical linear power flow models under uncertain network parameters, such as probabilistic line contingencies. Our proposed approach imports tools from probability theory, such as concentration inequalities for random matrices. This provides a theoretical framework for understanding error bounds of common approximations of the AC power flow equations under parameter uncertainty, including the DC and LinDistFlow approximations. Additionally, we show that the upper bounds scale as functions of nodal criticality, a network-theoretic quantity that captures how uncertainty concentrates at critical nodes, for use in contingency analysis. We validate these bounds on IEEE test networks, demonstrating that they correctly capture the scaling behavior of spectral perturbations up to conservative constants.
Neural ordinary differential equations (neural ODEs) are powerful continuous-time machine learning models for describing the behavior of complex dynamical systems, but their verification remains challenging due to the limited reachability analysis tools adapted to them. We propose a novel interval-based reachability method that leverages continuous-time mixed-monotonicity techniques for dynamical systems to compute an over-approximation of the neural ODE reachable sets. By exploiting the geometric structure of full initial sets and their boundaries via the homeomorphism property, our approach ensures efficient bound propagation. By embedding the neural ODE dynamics into a mixed monotone system, our interval-based reachability approach, implemented in TIRA with single-step, incremental, and boundary-based variants, provides sound and computationally efficient over-approximations compared with CORA's zonotope and NNV2.0's star set representations, while trading tightness for efficiency. This trade-off makes our method particularly suited for high-dimensional, real-time, and safety-critical applications. Applying mixed monotonicity to neural ODE reachability analysis paves the way for lightweight formal analysis by leveraging the symmetric structure of monotone embeddings and the geometric simplicity of interval boxes, opening new avenues for scalable verification. This novel approach is illustrated on two numerical examples: a spiral system and a fixed-point attractor system, each modeled as a neural ODE.
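The mixed-monotonicity embedding behind such interval methods can be illustrated on a linear vector field standing in for the neural ODE (a simplifying assumption; no TIRA involved). The decomposition below is the standard sign-based one for f(x) = Ax: nonnegative entries act on the lower/upper bound itself, negative off-diagonals on the opposite bound, and Euler integration of the doubled system yields an interval box over-approximating the reachable set.

```python
def decomp_matvec(A, x, xhat):
    # Mixed-monotone decomposition function for the linear field f(x) = A x.
    n = len(x)
    out = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j or A[i][j] >= 0:
                out[i] += A[i][j] * x[j]      # cooperative terms use x
            else:
                out[i] += A[i][j] * xhat[j]   # competitive terms use xhat
    return out

def interval_reach(A, lo, hi, T=1.0, steps=1000):
    # Euler integration of the 2n-dimensional embedding system; returns an
    # interval box over-approximating the time-T reachable set.
    dt = T / steps
    for _ in range(steps):
        dlo = decomp_matvec(A, lo, hi)
        dhi = decomp_matvec(A, hi, lo)
        lo = [l + dt * d for l, d in zip(lo, dlo)]
        hi = [h + dt * d for h, d in zip(hi, dhi)]
    return lo, hi

A = [[-1.0, 1.0], [-0.5, -2.0]]   # spiral-like linear system (illustrative)
lo, hi = interval_reach(A, [0.9, -0.1], [1.1, 0.1])
```

Soundness of the box is easy to spot-check by simulating trajectories from the initial set and verifying containment, which is exactly the kind of cheap guarantee that makes interval boxes attractive for real-time use.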
This study presents a control strategy for coordinating multiple unmanned aerial vehicles (UAVs) to monitor unknown flood regions and estimate the extent of inundation. The proposed method adopts a density-driven coverage framework based on Centroidal Voronoi Tessellation (CVT), in which the density function is modeled using a Gaussian Mixture of Density Functions (GMDF). This formulation provides a more accurate characterization of inundated areas compared to conventional axis-aligned Gaussian models. The performance of the two density modeling approaches is systematically evaluated under different UAV fleet sizes (16, 20, and 24), with multiple simulation trials conducted in the ROS/Gazebo environment. The results show that the GMDF-based formulation consistently achieves higher coverage rates, demonstrating its effectiveness in enhancing flood monitoring and improving UAV spatial distribution.
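The density-driven coverage step can be sketched with a discrete Lloyd iteration: each agent repeatedly moves to the density-weighted centroid of its Voronoi cell. The isotropic Gaussian mixture below is a simplified stand-in for the GMDF, and the component weights, centers, and agent count are invented for illustration.

```python
import math, random

def gmm_density(p, comps):
    # comps: list of (weight, (mx, my), sigma) isotropic Gaussian components.
    x, y = p
    return sum(w * math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * s * s))
               for w, (mx, my), s in comps)

def lloyd_cvt(agents, comps, iters=20, grid=40):
    # Discretize the unit square; each cell belongs to its nearest agent.
    pts = [((i + 0.5) / grid, (j + 0.5) / grid)
           for i in range(grid) for j in range(grid)]
    for _ in range(iters):
        acc = [[0.0, 0.0, 0.0] for _ in agents]   # [sum w*x, sum w*y, sum w]
        for p in pts:
            k = min(range(len(agents)),
                    key=lambda a: (agents[a][0] - p[0]) ** 2 + (agents[a][1] - p[1]) ** 2)
            w = gmm_density(p, comps)
            acc[k][0] += w * p[0]; acc[k][1] += w * p[1]; acc[k][2] += w
        # Move each agent to the density-weighted centroid of its cell.
        agents = [(a[0] / a[2], a[1] / a[2]) if a[2] > 0 else ag
                  for a, ag in zip(acc, agents)]
    return agents

comps = [(1.0, (0.3, 0.3), 0.10), (0.7, (0.7, 0.6), 0.15)]  # two flooded areas
random.seed(1)
start = [(random.random(), random.random()) for _ in range(6)]
final = lloyd_cvt(start, comps)
```

After the iterations, the fleet concentrates where the density (i.e., the suspected inundation) is highest, which is the mechanism behind the coverage-rate gains reported for the GMDF formulation.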
Synthetic aperture radar (SAR) enables versatile, all-time, all-weather remote sensing. Coupled with automatic target recognition (ATR) leveraging machine learning (ML), SAR is empowering a wide range of Earth observation and surveillance applications. However, the surge of attacks based on adversarial perturbations against the ML algorithms underpinning SAR ATR is prompting the need for systematic research into adversarial perturbation mechanisms. Research in this area began in the digital (image) domain and evolved into the physical (signal) domain, resulting in physical adversarial attacks (PAAs) that strategically exploit corner reflectors as attack vectors to evade ML-based ATR. Existing PAAs assume that the attacker knows the SAR platform's aspect angles, restricting their applicability to idealized scenarios. We propose the SAR Aspect-Angles-Invariant Physical Adversarial Attack (SAAIPAA), a framework that determines the optimal positions and orientations of any given set of reflectors, regardless of their number or size, even when the attacker lacks knowledge of the SAR platform's aspect angles. This is enabled by rigorous physics-based modeling of the reflected signal and the SAR imaging process. To facilitate mapping between image and scene coordinates, we additionally propose a method for generating bounding boxes in densely sampled azimuthal SAR images, allowing the target object to serve as a spatial reference. The resultant physical evasion attacks are efficiently realizable and optimal over the considered range of aspect angles between a SAR platform and a target, achieving state-of-the-art fooling rates (80% for DenseNet-121 and ResNet50) in the white-box setting for a four-reflector configuration. When the aspect angles are known to the attacker, an average fooling rate of 99.2% is attainable. In black-box settings, SAAIPAA transfers well between some models.
Speech-to-Speech (S2S) models have shown promising dialogue capabilities, but their ability to handle paralinguistic cues - such as emotion, tone, and speaker attributes - and to respond appropriately in both content and style remains under-explored. Progress is further hindered by the scarcity of high-quality and expressive demonstrations. To address this, we introduce a new reinforcement learning (RL) framework for paralinguistic-aware S2S, ParaS2S, which evaluates and optimizes both response content and speaking style directly at the waveform level. We first construct ParaS2SBench, a benchmark that evaluates the naturalness of input-output pairs in terms of content and speaking style using expressive and challenging queries. For the automatic judge, we propose a PolyTone training strategy and a multi-stage framework, preventing style hallucination in end-to-end audio LLM judging. Our judge correlates well with human preferences and is scalable, enabling the model to interact and learn from unlabeled speech via RL. Experiments show that existing S2S models fail to respond appropriately to paralinguistic attributes, performing no better than pipeline-based baselines. Our RL approach (ParaS2SAlign) achieves a 10% relative improvement in the appropriateness of response content and speaking style on ParaS2SBench over supervised fine-tuning (SFT), surpassing all prior models while requiring substantially fewer paired demonstrations than pure SFT. Our findings highlight the need for a scalable and accurate automatic evaluator for speech-to-speech interaction.
Beyond the commonly recognized optical aberrations, the imaging performance of simplified optical systems--including single-lens and metalens designs--is often further degraded by veiling glare caused by stray-light scattering from non-ideal optical surfaces and coatings, particularly in complex real-world environments. This compound degradation undermines traditional lens aberration correction yet remains underexplored. A major challenge is that conventional scattering models (e.g., for dehazing) fail to fit veiling glare due to its spatially varying and depth-independent nature. Consequently, paired high-quality data are difficult to prepare via simulation, hindering the application of data-driven veiling glare removal models. To this end, we propose VeilGen, a generative model that learns to simulate veiling glare by estimating its underlying optical transmission and glare maps in an unsupervised manner from target images, regularized by Stable Diffusion (SD)-based priors. VeilGen enables paired dataset generation with realistic compound degradation of optical aberrations and veiling glare, while also providing the estimated latent optical transmission and glare maps to guide the veiling glare removal process. We further introduce DeVeiler, a restoration network trained with a reversibility constraint, which utilizes the predicted latent maps to guide an inverse process of the learned scattering model. Extensive experiments on challenging simplified optical systems demonstrate that our approach delivers superior restoration quality and physical fidelity compared with existing methods. These results suggest that VeilGen reliably synthesizes realistic veiling glare, and that its learned latent maps effectively guide the restoration process in DeVeiler. All code and datasets will be publicly released at this https URL.
This paper introduces a novel reachability problem for the scenario involving two agents, where one agent follows another agent using a feedback strategy. The geometry of the reachable set for an agent, termed the "dependent reachable set", is characterized using the constant bearing pursuit strategy as a case study. Key theoretical results are presented that provide geometric bounds for the associated dependent reachable set. Simulation results are presented to empirically establish the shape of the dependent reachable set. In the process, an original optimization problem is formulated and analyzed for the constant bearing pursuit strategy.
Millimeter-wave (mmWave) radar provides robust sensing under adverse conditions and can penetrate thin materials for non-visual perception in industrial and robotic settings. Recent work with MIMO mmWave radar has demonstrated its ability to penetrate cardboard packaging for occluded object classification. However, existing models leave room for extensions and improvements across different sensing frequencies. Building on recent work with MIMO radar for occluded object classification, we propose ACCOR, an attention-enhanced complex-valued contrastive learning approach for radar, enabling robust occluded object classification. ACCOR processes complex-valued IQ radar signals via a complex-valued CNN backbone, a multi-head attention layer and a hybrid loss. The hybrid loss combines a weighted cross-entropy term with a supervised contrastive term. We extend an existing 64 GHz dataset with a new 67 GHz subset and evaluate performance across both bands. ACCOR achieves 96.60% accuracy at 64 GHz and 93.59% at 67 GHz on 10 objects, surpassing prior radar-specific and adapted image models. Results demonstrate the benefits of integrating complex-valued deep learning, attention, and contrastive learning for mmWave radar-based occluded object classification.
This paper addresses precoder design for secure multiple-input multiple-output (MIMO) integrated sensing and communications (ISAC) systems. We introduce a MIMO channel with a multiple-antenna eavesdropper and a multiple-antenna sensing receiver (MIMO-ME-MS) and analyze the fundamental performance limits of this tripartite tradeoff. Using sensing mutual information, we formulate the precoder design as a nonconvex weighted sum rate maximization problem. A high signal-to-noise ratio analysis based on a subspace decomposition characterizes the maximum weighted degrees of freedom. This analysis reveals the structure of a quasi-optimal precoder that must span the "useful subspace" and demonstrates the inadequacy of extending known schemes from simpler wiretap or ISAC channels. To solve this nonconvex problem, we develop a practical two-stage iterative algorithm that alternates between a sequential basis construction stage and a power allocation stage that solves the resulting difference-of-convex program. We demonstrate that the proposed method captures the desirable precoder structure identified in our analysis and achieves substantial performance gains in the MIMO-ME-MS channel.
Audio-visual speech recognition (AVSR) typically improves recognition accuracy in noisy environments by integrating noise-immune visual cues with audio signals. Nevertheless, high-noise audio inputs are prone to introducing adverse interference into the feature fusion process. To mitigate this, recent AVSR methods often adopt mask-based strategies to filter audio noise during feature interaction and fusion, yet such methods risk discarding semantically relevant information alongside noise. In this work, we propose an end-to-end noise-robust AVSR framework coupled with speech enhancement, eliminating the need for explicit noise mask generation. This framework leverages a Conformer-based bottleneck fusion module to implicitly refine noisy audio features with video assistance. By reducing modality redundancy and enhancing inter-modal interactions, our method preserves speech semantic integrity to achieve robust recognition performance. Experimental evaluations on the public LRS3 benchmark suggest that our method outperforms prior advanced mask-based baselines under noisy conditions.
We present this http URL, an open-source Julia-based toolbox for generating Stochastic Barrier Functions (SBFs) for safety verification of discrete-time stochastic systems with additive Gaussian noise. this http URL certifies linear, polynomial, and piecewise affine (PWA) systems. The latter enables verification for a wide range of system dynamics, including general nonlinear types. The toolbox implements a Sum-of-Squares (SOS) optimization approach, as well as methods based on piecewise constant (PWC) functions. For SOS-based SBFs, this http URL leverages semi-definite programming solvers, while for PWC SBFs, it offers three engines: two using linear programming (LP) and one based on gradient descent (GD). Benchmarking this http URL against the state-of-the-art shows that the tool outperforms existing tools in computation time, safety probability bounds, and scalability across over 30 case studies. Compared to its closest competitor, this http URL is up to four orders of magnitude faster, achieves significant safety probability improvements, and supports higher-dimensional systems.
Decoding the causes of infant cries remains challenging for healthcare monitoring due to short, nonstationary signals, limited annotations, and strong domain shifts across infants and datasets. We propose a compact acoustic framework that fuses MFCC, STFT, and pitch features within a multi-branch CNN encoder and models temporal dynamics using an enhanced Legendre Memory Unit (LMU). Compared to LSTMs, the LMU backbone provides stable sequence modeling with substantially fewer recurrent parameters, supporting efficient deployment. To improve cross-dataset generalization, we introduce calibrated posterior ensemble fusion with entropy-gated weighting to preserve domain-specific expertise while mitigating dataset bias. Experiments on Baby2020 and Baby Crying demonstrate improved macro-F1 under cross-domain evaluation, along with leakage-aware splits and real-time feasibility for on-device monitoring.
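The entropy-gated fusion step admits a direct sketch: each (assumed pre-calibrated) domain expert's posterior is weighted by exp(-H/tau), so confident, low-entropy experts dominate while uncertain ones are attenuated. The class count, probabilities, and temperature below are invented for illustration.

```python
import math

def entropy(p):
    # Shannon entropy of a discrete posterior (natural log).
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_gated_fusion(posteriors, tau=1.0):
    # Weight each expert's posterior by exp(-H/tau), then mix and renormalize.
    ws = [math.exp(-entropy(p) / tau) for p in posteriors]
    z = sum(ws)
    return [sum(w * p[c] for w, p in zip(ws, posteriors)) / z
            for c in range(len(posteriors[0]))]

confident = [0.90, 0.05, 0.05]   # expert matching the input's domain
uncertain = [0.34, 0.33, 0.33]   # expert facing domain shift
fused = entropy_gated_fusion([confident, uncertain])
```

The fused posterior is pulled toward the confident expert, which is the intended behavior under dataset bias: the shifted expert still contributes, but its near-uniform vote carries less weight than a plain average would give it.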
We introduce Whisper-RIR-Mega, a benchmark dataset of paired clean and reverberant speech for evaluating automatic speech recognition (ASR) robustness to room acoustics. Each sample pairs a clean LibriSpeech utterance with the same utterance convolved with a real room impulse response from the RIR-Mega corpus, with stratified splits by reverberation time (RT60) and direct-to-reverberant ratio (DRR). We evaluate five Whisper models (tiny through large-v3) on 1600 test samples and report word error rate (WER) and character error rate (CER) under clean and reverberant conditions. Reverberation consistently degrades performance across all model sizes; the reverb penalty in WER ranges from 0.12 to 1.07 percentage points depending on the model. We release the dataset, evaluation code, and baseline results to support reproducible research on robust ASR.
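The evaluation loop above reduces to two primitives: convolving a clean waveform with an RIR to produce its reverberant pair, and scoring transcripts by WER. Both are sketched below with toy data (no Whisper, LibriSpeech, or RIR-Mega involved); WER is word-level Levenshtein distance over the reference length, and the "reverb penalty" is the WER difference in percentage points.

```python
def wer(ref, hyp):
    # Word error rate: word-level Levenshtein distance / reference length.
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1): d[i][0] = i
    for j in range(len(h) + 1): d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                        # deletion
                          d[i][j - 1] + 1,                        # insertion
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))  # substitution
    return d[len(r)][len(h)] / len(r)

def reverberate(x, rir):
    # Direct convolution: pairs a clean signal with its reverberant version.
    y = [0.0] * (len(x) + len(rir) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(rir):
            y[i + j] += xi * hj
    return y

ref = "the quick brown fox jumps"
clean_hyp, reverb_hyp = "the quick brown fox jumps", "the quick brown box"
penalty_pp = 100 * (wer(ref, reverb_hyp) - wer(ref, clean_hyp))
```

In the real benchmark the hypotheses come from running an ASR model on the clean and convolved audio; the per-condition WER gap, aggregated over stratified RT60/DRR splits, is the reported reverb penalty.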
Edema is a potential indicator of underlying pathological changes. However, its low-contrast signature is often masked in conventional B-mode imaging by strong scatterers, making reliable detection challenging. Ultrasound (US) provides a non-invasive, non-ionizing, and cost-efficient imaging option that is widely used. Conventional techniques, which rely on beamforming, often lack sufficient physical interpretability. Quantitative US (QUS) can estimate physical properties such as the speed of sound (SoS) and density by solving a physics-based inverse problem directly on the measured US wavefields, i.e., the raw per-element channel data (CD), to recover their spatial distribution. However, state-of-the-art physics-based inversion methods, including full waveform inversion (FWI) and model-based quantitative radar and US (MB-QRUS), are computationally intensive and susceptible to local minima, which constrains their clinical utility. We introduce deep unfolded FWI (DUFWI), a physics-faithful unfolded iterative inversion method that exhibits FWI-like refinement behavior while learning the update rule from data, requiring only a small number of iterations for real-time SoS reconstruction. Across both simulated datasets and hardware measurements acquired with a Verasonics US system, DUFWI significantly outperforms classical FWI and MB-QRUS in reconstruction quality while maintaining high computational efficiency. These results demonstrate real-time edema diagnosis in both simulation and hardware experiments, with phantom-based validation using cylindrical rods, supporting practical deployment under typical US imaging settings.
Large Audio Language Models (LALMs) are increasingly capable of reasoning over audio. However, existing benchmarks provide limited coverage of reasoning in polyphonic audio, where multiple sound events co-occur and induce compositional structure. In this work, we introduce PolyBench, a benchmark designed to evaluate compositional reasoning in polyphonic audio. PolyBench comprises five evaluation subsets covering counting, classification, detection, concurrency, and duration estimation, requiring reasoning over multiple concurrent events and their relations. Evaluation of state-of-the-art LALMs reveals consistent performance degradation in polyphonic audio, indicating a fundamental bottleneck in current LALMs.
Generative control policies have recently unlocked major progress in robotics. These methods produce action sequences via diffusion or flow matching, with training data provided by demonstrations. But existing methods come with two key limitations: they require expert demonstrations, which can be difficult to obtain, and they are limited to relatively slow, quasi-static tasks. In this paper, we leverage a tight connection between sampling-based predictive control and generative modeling to address each of these issues. In particular, we introduce generative predictive control, a supervised learning framework for tasks with fast dynamics that are easy to simulate but difficult to demonstrate. We then show how trained flow-matching policies can be warm-started at inference time, maintaining temporal consistency and enabling high-frequency feedback. We believe that generative predictive control offers a complementary approach to existing behavior cloning methods, and hope that it paves the way toward generalist policies that extend beyond quasi-static demonstration-oriented tasks.
We demonstrate the surprising real-world effectiveness of a very simple approach to whole-body model-predictive control (MPC) of quadruped and humanoid robots: the iterative LQR (iLQR) algorithm with MuJoCo dynamics and finite-difference approximated derivatives. Building upon the previous success of model-based behavior synthesis and control of locomotion and manipulation tasks with MuJoCo in simulation, we show that these policies can easily generalize to the real world with few sim-to-real considerations. Our baseline method achieves real-time whole-body MPC on a variety of hardware experiments, including dynamic quadruped locomotion, quadruped walking on two legs, and full-sized humanoid bipedal locomotion. We hope this easy-to-reproduce hardware baseline lowers the barrier to entry for real-world whole-body MPC research and contributes to accelerating research velocity in the community. Our code and experiment videos will be available online at: this https URL
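The "finite-difference approximated derivatives" ingredient is easy to isolate: treat the simulator step as a black-box map x' = f(x) and build its Jacobian column-by-column with central differences. Below, a one-step Euler pendulum stands in for MuJoCo dynamics (an illustrative assumption), and the numerical Jacobian is compared against the analytic one that iLQR would otherwise require by hand.

```python
import math

def pendulum_step(x, dt=0.01, g=9.81, l=1.0):
    # Black-box discrete dynamics: x = [theta, omega] -> next state.
    th, om = x
    return [th + dt * om, om - dt * (g / l) * math.sin(th)]

def fd_jacobian(f, x, eps=1e-6):
    # Central differences: J[:, j] ~ (f(x + eps*e_j) - f(x - eps*e_j)) / (2*eps).
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        cols.append([(fp[i] - fm[i]) / (2 * eps) for i in range(n)])
    # Transpose the column list into a row-major Jacobian.
    return [[cols[j][i] for j in range(n)] for i in range(n)]

x = [0.3, -0.5]
J = fd_jacobian(pendulum_step, x)
# Analytic Jacobian of the same Euler step, for comparison:
J_true = [[1.0, 0.01], [-0.01 * 9.81 * math.cos(0.3), 1.0]]
```

In the whole-body setting, f would be one MuJoCo step over the full state, and these Jacobians feed the iLQR backward pass; no hand-derived dynamics gradients are needed.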
In clinical imaging, magnetic resonance (MR) image volumes are often acquired as stacks of 2D slices with decreased scan times, improved signal-to-noise ratio, and image contrasts unique to 2D MR pulse sequences. While this is sufficient for clinical evaluation, automated algorithms designed for 3D analysis perform poorly on multi-slice 2D MR volumes, especially those with thick slices and gaps between slices. Super-resolution (SR) methods aim to address this problem, but previous methods do not address all of the following: slice profile shape estimation, slice gap, domain shift, and non-integer or arbitrary upsampling factors. In this paper, we propose ECLARE (Efficient Cross-planar Learning for Anisotropic Resolution Enhancement), a self-SR method that addresses each of these factors. ECLARE uses a slice profile estimated from the multi-slice 2D MR volume, trains a network to learn the mapping from low-resolution to high-resolution in-plane patches from the same volume, and performs SR with anti-aliasing. We compared ECLARE to cubic B-spline interpolation, SMORE, and other contemporary SR methods. We used realistic and representative simulations so that quantitative performance against ground truth can be computed, and ECLARE outperformed all other methods in both signal recovery and downstream tasks. Importantly, as ECLARE does not use external training data it cannot suffer from domain shift between training and testing. Our code is open-source and available at this https URL.
The paper presents a novel sample-based algorithm, called C*, for real-time coverage path planning (CPP) of unknown environments. C* is built upon the concept of a Rapidly Covering Graph (RCG), which is incrementally constructed during robot navigation via progressive sampling of the search space. By using efficient sampling and pruning techniques, the RCG is constructed to be a minimum-sufficient graph, where its nodes and edges form the potential waypoints and segments of the coverage trajectory, respectively. The RCG tracks the coverage progress, generates the coverage trajectory, and helps the robot escape from dead-end situations. To minimize coverage time, C* produces the desired back-and-forth coverage pattern, while adapting to the TSP-based optimal coverage of local isolated regions, called coverage holes, which are surrounded by obstacles and covered regions. It is analytically proven that C* provides complete coverage of unknown environments. The algorithmic simplicity and low computational complexity of C* make it easy to implement and suitable for real-time on-board applications. The performance of C* is validated by 1) extensive high-fidelity simulations and 2) laboratory experiments using an autonomous robot. C* yields near-optimal trajectories, and a comparative evaluation with seven existing CPP methods demonstrates significant improvements in performance in terms of coverage time, number of turns, trajectory length, and overlap ratio, while preventing the formation of coverage holes. Finally, C* is comparatively evaluated on two different CPP applications using 1) energy-constrained robots and 2) multi-robot teams.
A wireless wearable Electrical Impedance Tomography (EIT) system has been developed utilizing the AD5933 chip to achieve real-time imaging of lung respiration. The system employs a voltage excitation method tailored to human impedance characteristics, injecting current by applying a known voltage and measuring the resulting current through the body. Additionally, specific measures have been implemented to effectively suppress signal oscillations and leakage currents caused by parasitic capacitances. To enhance data acquisition speed, the system employs five parallel AD5933 units, with multiple techniques implemented to ensure high synchronization during simultaneous measurements. Performance testing shows that the system achieves a signal-to-noise ratio greater than 50 dB, a relative standard deviation below 0.3%, and a reciprocity error under 0.8%. Imaging experiments using a water tank phantom, human lungs during breathing, and a resting human calf further demonstrate that this portable EIT system can accurately measure biological tissues with high precision and low cost.
This paper investigates unequal error protection (UEP) in digital semantic communication, where semantically important bits require substantially higher reliability than less critical ones. To characterize this heterogeneity, we introduce a novel perspective that treats learned bit-flip probabilities of semantic bits as target error protection levels, thereby directly linking semantic importance to bit-level reliability. This formulation reveals that the required protection levels of the semantic bits may differ by several orders of magnitude, making short-block coding more advantageous than conventional long-block designs. Motivated by this, we develop two UEP frameworks that minimize total blocklength under heterogeneous reliability constraints. First, we propose a bit-level UEP framework based on repetition coding, providing an analytically tractable solution that precisely meets per-bit protection requirements. Second, to improve energy and blocklength efficiency, we design a block-level UEP framework in which the semantic bits are partitioned into short blocks with similar protection levels. Guided by finite blocklength capacity analysis, we derive a closed-form threshold condition for beneficial partitioning and develop a systematic algorithm for integrating modern channel codes. Simulation results on image transmission tasks demonstrate substantial gains in both task performance and transmission efficiency compared with conventional equal-protection schemes.
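The bit-level framework's logic can be illustrated with repetition coding: given a raw channel flip probability and a per-bit protection target, find the smallest odd repetition count whose majority-vote error meets the target. The channel and targets below are invented; the point is how protection levels differing by orders of magnitude translate into very different blocklengths, which is what motivates grouping bits with similar requirements.

```python
from math import comb

def majority_error(n, p):
    # Probability that a majority vote over n independently flipped copies
    # of a bit (flip probability p) decodes incorrectly.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def min_repetitions(p, target):
    # Smallest odd repetition count meeting a per-bit reliability target.
    n = 1
    while majority_error(n, p) > target:
        n += 2
    return n

p = 0.05  # assumed raw channel bit-flip probability
plan = {target: min_repetitions(p, target) for target in (1e-2, 1e-4, 1e-8)}
```

A semantically critical bit with a 1e-8 target costs several times the blocklength of a 1e-2 bit, so spending a uniform (equal-protection) budget on every bit wastes channel uses on the unimportant ones.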
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by starling murmuration intelligence. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, an LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on USGS data show that MARLIN improves uncertainty handling by 23%, cuts computation by 35%, and accelerates flood response by 68%, exhibiting super-linear coordination, with complexity scaling only 5.4x as the network grows from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
ROSflight is a lean, open-source autopilot ecosystem for unmanned aerial vehicles (UAVs). Designed by researchers for researchers, it is built to lower the barrier to entry to UAV research and accelerate the transition from simulation to hardware experiments by maintaining a lean (not full-featured), well-documented, and modular codebase. This publication builds on previous treatments and describes significant additions to the architecture that improve the modularity and usability of ROSflight, including the transition from ROS 1 to ROS 2, supported hardware, low-level actuator mixing, and the simulation environment. We believe that these changes improve the usability of ROSflight and enable ROSflight to accelerate research in areas like advanced-air mobility. Hardware results are provided, showing that ROSflight is able to control a multirotor over a serial connection at 400 Hz while closing all control loops on the companion computer.
Unmanned aerial vehicle (UAV) research requires the integration of cutting-edge technology into existing autopilot frameworks. This process can be arduous, requiring extensive resources, time, and detailed knowledge of the existing system. ROSplane is a lean, open-source fixed-wing autonomy stack built by researchers for researchers. It is designed to accelerate research by providing clearly defined interfaces with an easily modifiable framework. Built around ROS 2, ROSplane allows for rapid integration of low or high-level control, path planning, or estimation algorithms. A focus on lean, easily-understood code and extensive documentation lowers the barrier to entry for researchers. Recent developments to ROSplane improve its capacity to accelerate UAV research, including the transition from ROS 1 to ROS 2, enhanced estimation and control algorithms, increased modularity, and an improved aerodynamic modeling pipeline. This aerodynamic modeling pipeline significantly reduces the effort of transitioning from simulation to real-world testing without requiring costly system identification or computational fluid dynamics tools. ROSplane's architecture reduces the effort required to integrate new research tools and methods, expediting hardware experimentation.
In complex production lines, it is essential to have strict, fast-acting rules to determine whether the system is In Control (InC) or Out of Control (OutC). This study explores a bio-inspired method that digitally mimics ant colony behavior to classify InC/OutC states and forecast imminent transitions requiring maintenance. A case study on industrial potato chip frying provides the application context. During each two-minute frying cycle, sequences of eight temperature readings are collected. Each sequence is treated as a digital ant depositing virtual pheromones, generating a Base Score. New sequences, representing new ants, can either reinforce or weaken this score, leading to a Modified Base Score that reflects the system's evolving condition. Signals such as extreme temperatures, large variations within a sequence, or the detection of change-points contribute to a Threat Score, which is added to the Modified Base Score. Since pheromones naturally decay over time unless reinforced, an Environmental Score is incorporated to reflect recent system dynamics, imitating real ant behavior. This score is calculated from the Modified Base Scores collected over the past hour. The resulting Total Score, obtained as the sum of the Modified Base Score, Threat Score, and Environmental Score, is used as the main indicator for real-time system classification and forecasting of transitions from InC to OutC. This ant colony optimization-inspired approach provides an adaptive and interpretable framework for process monitoring and predictive maintenance in industrial environments.
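The abstract does not give the scoring formulas, so the sketch below invents simple ones purely to show the shape of the pipeline: a Base Score from in-band readings (pheromone deposit), a Threat Score from extremes and within-sequence variation, and an Environmental Score that decays over time like evaporating pheromone. The temperature band, penalty magnitudes, and decay rate are all illustrative assumptions, not values from the study.

```python
import statistics

LOW, HIGH = 165.0, 185.0          # assumed InC frying-temperature band

def base_score(seq):
    # "Pheromone deposit": fraction of the eight readings inside the InC band.
    return sum(LOW <= t <= HIGH for t in seq) / len(seq)

def threat_score(seq):
    s = 0.0
    if max(seq) > HIGH + 5 or min(seq) < LOW - 5:
        s -= 0.3                  # extreme-temperature signal
    if statistics.pstdev(seq) > 4.0:
        s -= 0.2                  # large variation within a sequence
    return s

def environmental_score(history, decay=0.9):
    # Pheromone evaporation: Modified Base Scores from the past hour,
    # most recent first, exponentially decayed unless reinforced.
    if not history:
        return 0.0
    w = [decay ** k for k in range(len(history))]
    return sum(wi * h for wi, h in zip(w, history)) / sum(w)

def total_score(seq, history):
    return base_score(seq) + threat_score(seq) + environmental_score(history)

in_ctrl  = [172, 173, 171, 174, 172, 173, 172, 171]   # steady frying cycle
out_ctrl = [172, 178, 185, 191, 188, 186, 190, 193]   # drifting upward
```

A threshold on the Total Score (say 1.5 under these invented scales) then yields the InC/OutC classification, and a downward trend in successive Total Scores can flag an imminent transition.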
We present the first direct comparison between gate-based quantum computing (GQC) and adiabatic quantum computing (AQC) paradigms for solving the AC power flow (PF) equations. The PF problem is reformulated as a combinatorial optimization problem. For the GQC approach, the Quantum Approximate Optimization Algorithm (QAOA) is employed, while for the AQC approach, the problem is formulated as an Ising model. Numerical experiments on a 4-bus test system evaluate solution accuracy and computational performance. Results obtained using QAOA are benchmarked against those produced by D-Wave's Advantage system and Fujitsu's latest-generation Digital Annealer, implemented through the Quantum-Inspired Integrated Optimization (QIIO) software. The findings provide quantitative insights into the performance trade-offs, scalability, and practical viability of GQC and AQC paradigms for PF analysis, highlighting the potential of quantum optimization algorithms to address the computational challenges associated with the operation of modern electricity grids in the fault-tolerant era.
Autonomous systems often must predict the motions of nearby agents from partial and noisy data. This paper asks and answers the question: "can we learn, in real time, a nonlinear predictive model of another agent's motions?" Our online framework denoises and forecasts such dynamics using a modified sliding-window Hankel Dynamic Mode Decomposition (Hankel-DMD). Partial, noisy measurements are embedded into a Hankel matrix, while an associated Page matrix enables singular-value hard thresholding (SVHT) to estimate the effective rank. A Cadzow projection enforces structured low-rank consistency, yielding a denoised trajectory and local noise-variance estimates. From this representation, a time-varying Hankel-DMD lifted linear predictor is constructed for multi-step forecasts. Residual analysis provides variance-tracking signals that can support downstream estimators and risk-aware planning. We validate the approach in simulation under Gaussian and heavy-tailed noise, and experimentally on a dynamic crane testbed. Results show that the method achieves stable variance-aware denoising and short-horizon prediction suitable for integration into real-time control frameworks.
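The Hankel-DMD forecasting step can be sketched as follows for a scalar signal with a fixed truncation rank. This is a bare-bones illustration under those assumptions; the paper's method additionally uses SVHT on a Page matrix for rank selection and a Cadzow projection for denoising, both omitted here:

```python
import numpy as np

def hankel(x, L):
    """Stack length-L sliding windows of signal x as Hankel-matrix columns."""
    n = len(x) - L + 1
    return np.column_stack([x[i:i + L] for i in range(n)])

def hankel_dmd_predict(x, L=8, r=2, steps=5):
    """Minimal Hankel-DMD multi-step forecaster (illustrative sketch)."""
    H = hankel(np.asarray(x, dtype=float), L)
    X, Y = H[:, :-1], H[:, 1:]                # time-shifted snapshot pairs
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    U, S, Vt = U[:, :r], S[:r], Vt[:r]        # rank-r truncation
    A = U.T @ Y @ Vt.T @ np.diag(1.0 / S)     # reduced linear operator
    z = U.T @ H[:, -1]                        # latest lifted (delay) state
    preds = []
    for _ in range(steps):
        z = A @ z
        preds.append((U @ z)[-1])             # newest entry of the window
    return np.array(preds)
```

For a signal governed by a low-order linear recurrence (e.g. a sinusoid), the rank-r lifted operator reproduces the dynamics exactly, which is the property the sliding-window variant exploits online.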
While the zero-drift first arrival position (FAP) channel exhibits a Cauchy-distributed lateral displacement, nonzero drift in practical systems introduces advective transport that regularizes this singular limit. This letter characterizes the drift-induced transition of FAP noise from heavy-tailed algebraic decay to exponential regularization. By asymptotically examining the exact FAP density, we identify a characteristic propagation distance (CPD) that serves as the fundamental boundary separating diffusion-dominated and drift-dominated regimes. Numerical evaluations demonstrate that in low-drift environments, variance-matched Gaussian approximations severely underestimate the true communication potential, whereas the zero-drift Cauchy law provides a robust, physically grounded performance baseline.
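For orientation, the zero-drift Cauchy baseline invoked above has a standard closed form in the planar setting; the letter's exact drifted density and CPD expression are not reproduced here:

```latex
% Zero-drift baseline (planar setting, stated for orientation only):
% a particle released at distance \lambda from an absorbing receiver line
% first arrives at lateral position $X$ with the Cauchy density
\[
  f_0(x) \;=\; \frac{1}{\pi}\,\frac{\lambda}{\lambda^{2} + x^{2}},
\]
% whose algebraic $x^{-2}$ tail has infinite variance -- which is why a
% variance-matched Gaussian surrogate becomes ill-defined as drift vanishes,
% consistent with the low-drift findings summarized above.
```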
As mobile robots increasingly operate alongside humans in shared workspaces, ensuring safe, efficient, and interpretable Human-Robot Interaction (HRI) has become a pressing challenge. While substantial progress has been devoted to human behavior prediction, limited attention has been paid to how humans perceive, interpret, and trust robots' inferences, and to how robots plan safe and efficient trajectories based on predicted human behaviors. To address these challenges, this paper presents XR-DT, an eXtended Reality-enhanced Digital Twin framework for mobile robots, which bridges physical and virtual spaces to enable bi-directional understanding between humans and robots. Our hierarchical XR-DT architecture integrates augmented-, virtual-, and mixed-reality layers, fusing real-time sensor data, simulated environments in the Unity game engine, and human feedback captured through wearable XR devices. Within this framework, we design a novel Human-Aware Model Predictive Path Integral (HA-MPPI) motion planner that incorporates ATLAS (Attention-based Trajectory Learning with Anticipatory Sensing), a multi-modal Transformer designed for egocentric human trajectory prediction via XR headsets. Extensive real-world experiments demonstrate accurate human trajectory prediction and safe, efficient robot navigation, validating the effectiveness of HA-MPPI within the XR-DT framework. By embedding human behavior, environmental dynamics, and robot navigation into the XR-DT framework, our system enables interpretable, trustworthy, and adaptive HRI.
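The MPPI core that a human-aware planner builds on reduces to a single importance-weighted control update per cycle. In this sketch the human-aware cost terms (e.g. penalties around predicted human trajectories) are abstracted into a generic `rollout_cost` callable, and all sampling parameters are illustrative assumptions:

```python
import numpy as np

def mppi_step(u_nom, rollout_cost, n_samples=256, sigma=0.3, lam=1.0, rng=None):
    """One iteration of vanilla MPPI (illustrative; a human-aware variant would
    fold trajectory-prediction penalties into rollout_cost).

    u_nom        : (T, m) nominal control sequence over horizon T
    rollout_cost : maps a (T, m) control sequence to a scalar trajectory cost
    """
    rng = rng or np.random.default_rng()
    eps = rng.normal(0.0, sigma, size=(n_samples,) + u_nom.shape)
    costs = np.array([rollout_cost(u_nom + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)      # information-theoretic weights
    w /= w.sum()
    return u_nom + np.tensordot(w, eps, axes=1)   # weighted perturbation update
```

Iterating this step shifts the nominal controls toward low-cost rollouts while remaining gradient-free, which is what makes it easy to combine with learned, non-differentiable human-trajectory predictors.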
Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term EEG recordings remains labor-intensive. We curate a large-scale clinical EEG dataset with $9{,}922$ reports paired with approximately $11{,}000$ hours of EEG recordings from $9{,}048$ patients. We therefore develop CELM, the first clinical EEG-to-Language foundation model capable of summarizing long-duration, variable-length EEG recordings and performing end-to-end clinical report generation at multiple scales, including recording description, background activity, epileptiform abnormalities, events/seizures, and impressions. Experimental results show that, with patient history supervision, our method achieves $70\%$-$95\%$ average relative improvements in standard generation metrics (e.g., ROUGE-1 and METEOR) from $0.2$-$0.3$ to $0.4$-$0.6$. In the zero-shot setting without patient history, CELM attains generation scores in the range of $0.43$-$0.52$, compared to baselines of $0.17$-$0.26$. CELM integrates pretrained EEG foundation models with language models to enable scalable multimodal learning. We release our model and benchmark construction pipeline at this https URL.
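To make the quoted score ranges concrete, a simplified ROUGE-1 F1 can be computed from clipped unigram overlap. This sketch uses bare whitespace tokenization and no stemming, unlike standard ROUGE toolkits, so its values only roughly track the reported metrics:

```python
from collections import Counter

def rouge1_f(hypothesis: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap with
    whitespace tokenization (illustrative, not the official scorer)."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())     # clipped unigram matches
    if overlap == 0:
        return 0.0
    p = overlap / sum(hyp.values())         # precision over hypothesis tokens
    r = overlap / sum(ref.values())         # recall over reference tokens
    return 2 * p * r / (p + r)
```

On this scale, moving from roughly 0.2-0.3 to 0.4-0.6 means the generated report shares about twice as many (clipped) unigrams with the clinician-written reference, relative to length.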