Modern environmental monitoring methods cannot measure energy flows: conventional readings capture parameters such as temperature, pressure, and humidity without considering their physical causes. This work describes Differential Temporal Derivative Soft-Sensing (DTDSS), a physics-based approach that enables an ordinary low-cost sensor array to estimate environmental energy exchange by modeling its radiative heat fluxes. The approach combines a novel paired-sensor configuration with an algorithmic component called Inertial Noise Reduction (INR), which models environmental energy flow by computing Global Horizontal Irradiance (GHI) and convective heat flux. Field experiments against calibrated reference pyranometers supplied by the Department of Meteorology of Sri Lanka show that an 8-bit embedded processor achieves R^2 of approximately 0.9 and RMSE of approximately 45 W/m^2 relative to the reference while using under 2 KB of microcontroller RAM.
This study presents a methodology for identifying the most informative frequencies and channels in electromyography (EMG) data to evaluate muscle recovery using Decision Tree classifiers. EMG signals, recorded from the vastus lateralis muscle during squat exercises, were analyzed across varying rest intervals to assess optimal recovery periods. By employing single Decision Tree classifiers, the study enhances interpretability, offering insights into feature importance, which is essential for applications in medical and sports settings where transparency is critical. The experimental protocol utilized a grid search for hyperparameter tuning and cross-validation to address class imbalance, ultimately achieving a reliable classification of rest intervals based on power spectral density features. The results indicate that a limited subset of highly informative features provides sufficient accuracy, suggesting that streamlined, interpretable models are effective for the evaluation of muscle recovery. This approach can guide future research in developing compact, robust models adapted to EMG-based diagnostics.
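As a rough illustration of how a single Decision Tree node ranks candidate features, here is a minimal stdlib-only sketch. The feature names (`psd_low_band`, `psd_mid_band`) and the tiny dataset are invented for illustration; they are not the paper's EMG data or protocol.

```python
# Toy illustration: rank candidate PSD features by the Gini gain of their
# best single split, as the root node of a Decision Tree would.
# Feature names and data are invented; they are not the paper's dataset.

def gini(labels):
    """Gini impurity of a 0/1 label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 2.0 * p1 * (1.0 - p1)

def best_split_gain(values, labels):
    """Best impurity reduction over all thresholds for one feature."""
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        best = max(best, parent - child)
    return best

# Rows: (PSD in a low band, PSD in a mid band); label 1 = "recovered".
X = [(0.9, 0.30), (0.8, 0.35), (0.85, 0.10), (0.2, 0.32), (0.25, 0.12), (0.3, 0.33)]
y = [1, 1, 1, 0, 0, 0]

gains = {name: best_split_gain([row[j] for row in X], y)
         for j, name in enumerate(["psd_low_band", "psd_mid_band"])}
ranked = sorted(gains, key=gains.get, reverse=True)
print(ranked[0])  # the single most informative feature: psd_low_band
```

A full pipeline would add cross-validation and hyperparameter search, but the ranking step above is what makes the resulting model interpretable: the chosen split directly names the informative band.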
Portable medical imaging (PMI) has emerged as an important solution for point-of-care diagnosis in emergency, rural, and resource-limited settings where conventional imaging infrastructure is not readily available. Modalities such as portable computed tomography, portable magnetic resonance imaging, portable ultrasound, and wireless capsule endoscopy improve access to timely diagnosis, but they remain highly vulnerable to image-quality degradation caused by motion artifacts, environmental interference, hardware limitations, and unstable acquisition conditions. This review provides a systematic and quality-centered synthesis of recent advances in PMI. It introduces a taxonomy of AI-based PMI methods spanning machine learning, deep learning, transfer learning, and Transformer-based approaches, and examines their roles in image enhancement, reconstruction, quality assessment, detection, and classification. The review also analyzes PMI devices, sensing pipelines, modality-specific distortions, evaluation metrics, and publicly available datasets. In contrast to existing surveys that are mainly modality-driven or application-focused, this work emphasizes the relationship between image quality, AI robustness, and clinical usability in portable settings. Finally, it identifies current research gaps and outlines future directions toward reliable, interpretable, and clinically deployable PMI systems.
Medical image denoising (MID) lacks absolutely clean images for supervision, leading to a noisy reference problem that fundamentally limits denoising performance. Existing simulated-supervised discriminative learning (SimSDL) and simulated-supervised generative learning (SimSGL) treat noisy references as clean targets, causing suboptimal convergence or reference-biased learning, while self-supervised learning (SSL) imposes restrictive noise assumptions that are seldom satisfied in realistic MID scenarios. We propose \textbf{RelativeFlow}, a flow matching framework that learns from heterogeneous noisy references and drives inputs from arbitrary quality levels toward a unified high-quality target. RelativeFlow reformulates flow matching by decomposing the absolute noise-to-clean mapping into relative noisier-to-noisy mappings, and realizes this formulation through two key components: 1) consistent transport (CoT), a displacement map that constrains relative flows to be components of and progressively compose a unified absolute flow, and 2) simulation-based velocity field (SVF), which constructs a learnable velocity field using modality-specific degradation operators to support different medical imaging modalities. Extensive experiments on Computed Tomography (CT) and Magnetic Resonance (MR) denoising demonstrate that RelativeFlow significantly outperforms existing methods, taming MID with noisy references.
This paper focuses on data-driven fault detection, identification, and recovery (FDIR) for nonlinear control-affine systems under actuator faults. We create a unified framework in the space of probability densities, rather than on individual trajectories, using fault-indexed Perron--Frobenius (PF) operators to predict the evolution of state distributions under different fault profiles. By leveraging the probability-flow representation of the Fokker--Planck equation, we construct deterministic PF operators that reproduce exact stochastic marginals, define forward reachable density families, and establish certifiable 2-Wasserstein bounds on the divergence between fault-driven and nominal density evolutions. These provide quantitative conditions for the detectability and identifiability of various faults. The fault-indexed operators are learned from trajectory data via flow map matching (FMM), and we demonstrate that the observable FMM residual directly bounds the approximation error of the operator in the 2-Wasserstein metric. Additionally, we co-train a contraction certificate that bounds the gap between the learned operator family, the actual fault-driven density flow, and the nominal dynamics. The operator library is then used online for continuous fault parameter fitting over a continuous parameter space to generalize the learned operators to out-of-distribution (OOD) scenarios. To carry out the recovery control, we employ reachable density propagation and Gaussian mixture covariance steering. The proposed framework is validated on a 10-state spacecraft attitude-control system with four reaction wheels.
Control barrier functions (CBFs) provide a rigorous framework for designing controllers enforcing safety constraints. While CBF theory is well-developed for a finite number of safety constraints, certain applications, e.g., backup CBFs, require an infinite number of constraints. Despite the practical success of CBFs, several fundamental questions remain unanswered when safe sets are defined with an infinite number of constraints, including: necessary and sufficient conditions for forward set invariance, the actual definition of CBFs associated with these sets, the regularity properties of the resulting controllers, and the ability to reduce a collection of infinite constraints to a finite number. This paper addresses these questions by extending CBF theory to the infinite constraint setting. We identify regularity conditions under which Nagumo's Theorem reduces to barrier-like inequalities and when the associated CBF controllers are at least continuous. We further connect these results to optimal-decay CBFs, bridging theoretical conditions for invariance and practical instantiations of the resulting controller. Finally, we illustrate how the developed theory addresses limitations of backup CBFs.
The distribution system restoration (DSR) problem has received considerable attention over the last decade or more. Solutions to the DSR problem identify the best set or sequence of actions to perform on a distribution circuit to restore service after a disruption. The problem is challenging from a computational perspective, with engineering constraints specific to distribution systems, such as radial operations, that are difficult to model effectively. In this paper, we revisit the model for how specific loads are shed, energized, and restored, and develop a formulation that more accurately models the requirements of load shedding, load energizing, and restoration in distribution systems.
This work considers a system where a dual-function radar transmitter (source) performs direct communication with a reader while simultaneously enabling ambient backscatter communication from a tag. The source embeds its message into a coded pulse repeatedly transmitted over a frame, whereas the tag exploits the resulting environmental reverberation (clutter) as an ambient carrier to convey its own message. By leveraging the structure induced by the radar waveforms, we develop two signaling schemes. In the pilot-free scheme, the source and tag messages are conveyed through nonlinear vector modulation; the induced subspace structure enables both joint decoding, where all unknown quantities are simultaneously estimated, and disjoint decoding, where the tag codeword is recovered first, followed by the estimation of the source codeword and the channel vectors. In the pilot-aided scheme, pilot symbols and linearly modulated data symbols are embedded within each frame, enabling both non-iterative decoding based on pilot-derived channel estimates and iterative decoding via alternating channel estimation and data detection. We establish sufficient conditions on the source and tag codebooks that guarantee noiseless identifiability of the involved messages and channels. Finally, performance is evaluated in terms of source/tag error probabilities and channel-estimation accuracy, and the resulting system-level tradeoffs are discussed.
This paper introduces a safety-critical optimization-based control strategy that leverages control Lyapunov and control barrier functions to guide the spatial density of robotic swarms governed by the Fokker-Planck equation to a predefined target distribution. In contrast to traditional open-loop state-constrained optimal control strategies, the proposed approach operates in closed-loop, and a Voronoi-based variant further enables distributed deployments. Theoretical guarantees of safety are derived, and numerical simulations demonstrate the performance of the proposed controllers. Finally, a multi-robot experiment showcases the real-world applicability of the proposed controllers under localization and motion noise, illustrating how it is much easier for a sparse swarm to satisfy safety specifications than for a densely packed one.
This paper presents a novel density control framework for multi-robot systems with spatial safety and energy sustainability guarantees. Stochastic robot motion is encoded through the Fokker-Planck Partial Differential Equation (PDE) at the density level. Control Lyapunov and control barrier functions are integrated with PDEs to enforce target density tracking, obstacle region avoidance, and energy sufficiency over multiple charging cycles. The resulting quadratic program enables fast in-the-loop implementation that adjusts commands in real time. A multi-robot experiment and extensive simulations were conducted to demonstrate the effectiveness of the controller under localization and motion uncertainties.
We present a dual-radio hierarchical mesh architecture for infrastructure-free emergency communication that exploits the complementary strengths of Bluetooth Low Energy (BLE) and LoRa. Nodes equipped with both an nRF52840 (BLE 5.0 Coded PHY) and an SX1262 (LoRa sub-GHz) form local clusters via BLE advertising-based AODV routing, while dynamically elected cluster heads bridge inter-cluster traffic over a LoRa backbone. We derive a formal traffic offloading model showing that with locality bias beta >= 0.76, validated against search-and-rescue communication patterns, the architecture keeps 82-90% of traffic on BLE, reducing LoRa energy consumption by 79% compared to LoRa-only mesh. Analytical evaluation demonstrates 10 km+ network diameter, 250-562 node scalability, and sub-50 ms intra-cluster latency on a 3.0 KB RAM footprint. To our knowledge, this is the first architecture combining BLE advertising-based mesh routing with a multi-hop LoRa backbone on commodity hardware.
Reported chest CT segmentation performance can be strongly inflated when train and test partitions mix slices from the same study. We present CTSCAN, a reproducible multi-source chest CT benchmark and research stack designed to measure what survives under patient-disjoint evaluation. The current four-class artifact aggregates 89 cases from PleThora, MedSeg SIRM, and LongCIU, and we show that the original slice-PNG workflow induces near-complete case reuse across train, validation, and test. Using the playground environment, we run a multi-seed protocol sweep with the same FPN plus EfficientNet-B0 control configuration under slice-mixed and case-disjoint evaluation. Across 3 seeds and 12 epochs per seed, the slice-mixed protocol reaches 0.6665 foreground Dice and 0.5031 foreground IoU, whereas the case-disjoint protocol reaches 0.2066 Dice and 0.1181 IoU. Removing patient reuse therefore reduces foreground Dice by 0.4599 absolute (69.00% relative) and foreground IoU by 0.3850 absolute (76.52% relative). CTSCAN packages the corrected benchmark with deterministic split manifests, explicit weak-supervision controls, a scripted multi-seed protocol sweep, and reproducible figure generation, providing a reusable basis for patient-disjoint chest CT evaluation.
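The core protocol change the abstract describes, removing case reuse between partitions, can be sketched in a few lines of plain Python. The case IDs and slice counts below are invented placeholders, not the benchmark's actual manifests:

```python
# Sketch of a patient/case-disjoint split: assign whole cases to train or
# test, so no case contributes slices to both partitions (unlike a
# slice-mixed split, which shuffles individual slices).
# Case IDs and slice counts are invented for illustration.

def case_disjoint_split(slices, held_out_cases):
    """slices: list of (case_id, slice_id); held_out_cases: set of case IDs."""
    train = [s for s in slices if s[0] not in held_out_cases]
    test = [s for s in slices if s[0] in held_out_cases]
    return train, test

slices = [(c, i) for c in ["caseA", "caseB", "caseC"] for i in range(4)]
train, test = case_disjoint_split(slices, {"caseC"})

train_cases = {c for c, _ in train}
test_cases = {c for c, _ in test}
print(train_cases & test_cases)  # set(): no case appears in both partitions
```

Splitting at the slice level instead would make this intersection nonempty, which is exactly the leakage that inflates the slice-mixed Dice scores reported above.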
The paper studies the optimal density steering problem for nonlinear continuous-time stochastic systems. To accurately capture nonlinear dynamics in high-uncertainty regions that deviate significantly from a nominal linearization point, we introduce the concept of Multiple Distribution-to-Distribution Linearization. The proposed approach first approximates the boundary distributions using Gaussian Mixture Models (GMMs), and decomposes the original nonlinear problem into a collection of Gaussian-to-Gaussian Optimal Covariance Steering (OCS) subproblems between pairs of mixture components. Each elementary OCS problem is solved via local linearization around the mean trajectory connecting the corresponding initial and terminal Gaussian components. The resulting elementary policies are then combined according to their associated conditional densities. We prove that the proposed multi-linearization approach yields tighter approximation error bounds than single-linearization for a broad class of problems. The effectiveness of the approach is demonstrated through numerical experiments on an Earth-to-Mars orbit transfer scenario.
Recovering a source signal from indirect measurements often requires estimating latent parameters, such as wireless channel states or MRI coil sensitivities, that cannot be directly observed. Here, we introduce Physics-Embedded Inverse Learning (PEIL), in which a learned estimator predicts these parameters and a fixed, physics-based inverse operator uses them to reconstruct the signal, so that training requires only the source signal as supervision. In systems where multiple parameter combinations can reconstruct the signal equally well, the estimator exploits this freedom to coordinate parameters that compensate for residual modelling errors rather than match ground-truth parameters. In high-mobility wireless communications, PEIL discovers task-optimal configurations that outperform baselines given access to ground-truth parameters, enabling zero-shot generalisation and over 20-fold reduction in training data relative to supervised baselines. To test whether these properties extend across physical domains, we apply PEIL to parallel MRI, where it discovers physically interpretable coil sensitivity maps without calibration scans, yielding reconstructions grounded purely in acquired measurements. These results demonstrate that non-identifiability, conventionally a liability, becomes a resource when the learning objective targets reconstruction quality rather than parameter accuracy.
We show that a common Lyapunov matrix exists for the convex combination of two Hurwitz matrices if and only if the intersection of the set of strict Lyapunov matrices for one matrix and the set of non-strict Lyapunov matrices for the other is nonempty. This simple relaxation is useful for the convergence analysis of the augmented primal-dual gradient flow for constrained optimization problems with affine inequality constraints, which can be viewed as a polytopic linear parameter-varying (LPV) system driven by the active-constraint selector. Under a relaxed strong convexity condition, exponential convergence is proved for the LPV system. The analysis can further be extended to the integral quadratic constraints (IQCs) framework for LPV systems to facilitate numerical search of the convergence rate.
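As a small numerical illustration of a common Lyapunov matrix certifying an entire convex combination, here is a stdlib-only check on an invented 2x2 example (where P = I happens to satisfy the strict inequality for both matrices; the paper's actual condition only requires strictness for one of the two):

```python
# Toy check: with P = I, verify A(l)^T P + P A(l) is negative definite for
# every convex combination A(l) = l*A1 + (1-l)*A2 on a grid of l in [0, 1].
# A1, A2 are invented Hurwitz examples, not from the paper.

A1 = [[-1.0, 0.0], [0.0, -1.0]]
A2 = [[-1.0, 1.0], [0.0, -1.0]]

def lyap_lhs(A):
    """Q = A^T P + P A with P = I, i.e. Q = A^T + A (2x2)."""
    return [[2 * A[0][0], A[0][1] + A[1][0]],
            [A[0][1] + A[1][0], 2 * A[1][1]]]

def is_neg_def(Q):
    """A 2x2 symmetric Q is negative definite iff trace < 0 and det > 0."""
    tr = Q[0][0] + Q[1][1]
    det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
    return tr < 0 and det > 0

ok = all(
    is_neg_def(lyap_lhs([[l * A1[i][j] + (1 - l) * A2[i][j] for j in range(2)]
                         for i in range(2)]))
    for l in [k / 20 for k in range(21)]
)
print(ok)  # True: one P works along the whole segment between A1 and A2
```

In practice the candidate P would be found by a semidefinite program rather than guessed; the grid check above only illustrates what the certificate asserts.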
This paper considers the problem of reachability analysis of control systems with optimal controllers, as a first step towards verifying the safety and correctness of such systems. Despite their appeal in guaranteeing task satisfaction through cost minimization, optimal controllers are often challenging to assure. In particular, as system dynamics grow in complexity, solving the resulting optimization problem may be difficult, especially given time and computation constraints on real platforms. Thus, it is essential to verify that, even if the optimal solution is not always found, such controllers still accomplish the high-level control objective. In this paper, we focus on gradient descent algorithms and design a reachability algorithm by treating gradient descent as a separate (digital) dynamical system, embedded in the original (physical) dynamical system, with controls as part of the state. We evaluate the feasibility of the proposed method on two control systems, a two-dimensional quadrotor and a cartpole.
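The idea of treating the optimizer as a separate digital dynamical system, embedded alongside the plant with the control as part of the state, can be sketched on a scalar toy problem. The plant, cost, gains, and iteration counts below are our own illustrative choices, not the paper's quadrotor or cartpole benchmarks:

```python
# Sketch: gradient descent on the control input runs as its own discrete
# dynamics coupled to the plant, and is stopped after a few steps, so the
# applied control may be suboptimal, yet the high-level objective is met.
# Toy plant x' = u; one-step lookahead cost c(u) = (x + dt*u)^2 + r*u^2.
# All models and gains are illustrative, not from the paper.

dt, r, eta, inner_steps = 0.1, 0.01, 0.5, 3  # few gradient steps per period

x, u = 1.0, 0.0  # augmented state: plant state x and control u
for _ in range(400):
    for _ in range(inner_steps):  # embedded (digital) optimizer dynamics
        grad = 2 * dt * (x + dt * u) + 2 * r * u
        u -= eta * grad
    x += dt * u  # physical dynamics driven by the possibly unconverged control

print(abs(x) < 1e-2)  # True: x is regulated to 0 despite truncated descent
```

Reachability analysis of such a system would propagate sets over the joint (x, u) state, which is precisely why the paper folds the optimizer iterates into the state vector.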
With the push toward 6G commercialization, Frequency Range 3 (FR3) bands, specifically 7.125-8.4 GHz and 14.8-15.3 GHz, have become focal points for achieving wide-area, high-capacity coverage. However, practical deployment is often limited by the physical aperture constraints of base station antennas. This study conducts comprehensive measurements in Urban Macro (UMa) scenarios using a unified dual-band sounding platform to evaluate channel characteristics and system performance under the strict constraint of "equal physical array aperture." The results indicate that higher frequency bands exhibit increased sparsity in both delay and spatial domains. Regarding coverage, while the 15 GHz band can theoretically accommodate four times the number of antenna elements (128 elements) within the same area to compensate for path loss, empirical data reveals a residual coverage deficit of approximately 3.0 dB at cell edges compared to the 8 GHz baseline. In contrast, the 15 GHz band excels in capacity; the increased element count effectively overcomes channel sparsity, resulting in spectral efficiency (SE) that significantly outperforms the 8 GHz band. Furthermore, the research demonstrates that for a fixed number of elements, system performance remains largely insensitive to specific array topologies (e.g., 1x32, 2x16, or 4x8). Ultimately, FR3 system performance is dictated by the trade-off between propagation characteristics and hardware-enabled gain. These findings provide a theoretical foundation for spatial-domain design and help address engineering challenges for 6G base station implementation.
This paper addresses the classic problem of parameter estimation (PE) in multimachine power system models. Such models are typically described by a set of nonlinear differential-algebraic equations (DAE), where generator physics and network power flow equations are coupled. DAE models are well established in classic power system textbooks, but parameter identification and estimation of generator inertia and damping together with network branch resistances and reactances for these models remain relatively underexplored. In contrast to prior approaches that rely on ODE approximations, this paper develops a joint Bayesian inference framework to perform PE of generator and network parameters while exploiting grid DAE models. It further combines physics-aware statistical modeling with computationally efficient posterior sampling to make joint Bayesian calibration practical. Results on the IEEE 9-bus system show accurate parameter recovery with well-behaved posterior uncertainty, while a short 39-bus study provides evidence that the framework remains effective on a materially larger joint-estimation problem. These results are obtained without requiring overly conservative priors.
This paper introduces a high-precision indoor positioning and tracking method that utilizes multi-site single-input single-output (SISO) radar systems. We propose a novel velocity synthesis-assisted (VSA) localization algorithm that iteratively refines target position estimates within range bins by fusing radial velocity measurements from multiple radars. This approach ensures enhanced accuracy in both velocity and position estimation. Moreover, the inherent geometric constraints introduced by velocity synthesis enable the proposed algorithm to remain robust under low signal-to-noise ratio (SNR), severe multipath propagation, and large synchronization latency. Notably, our method eliminates the use of multiple-input-multiple-output (MIMO) configurations and stringent phase synchronization requirements, substantially reducing hardware complexity while maintaining high positioning accuracy. We define standardized reference trajectories to facilitate a comprehensive and reproducible performance evaluation. Extensive simulations and experimental validations demonstrate that our multi-site radar systems achieve centimeter-level tracking accuracy for human subjects, outperforming existing methods in complex trajectory tracking.
Artificial Intelligence (AI), especially cloud platforms and large language models (LLMs), is changing how engineering is taught by making learning more interactive and flexible. However, in electrical engineering and energy systems, students often find power system dynamics difficult to understand because the concepts are abstract, math-heavy, and there are limited opportunities for hands-on practice. This paper presents an AI-based interactive learning framework that combines simulation with intelligent feedback to improve understanding and student engagement. The framework has three connected parts: an AI layer that provides explanations and guidance, a simulation layer that models system behavior, and a user layer that allows students to interact with the system in real time. These parts work together in a continuous loop where students explore how the system behaves, change parameters, and receive feedback based on the results. The paper also provides a step-by-step process to help educators design and apply AI-supported learning environments, including breaking down concepts, using simulations, and assessing performance. This method helps students learn through practice and better understand how ideas from class apply to real power systems. It also provides a practical way to improve electrical engineering education and helps students get ready to use AI tools carefully and responsibly in engineering.
Quantitative speed-of-sound (SoS) and attenuation of tissues are closely related to pathology; however, conventional B-mode images are limited to qualitative visualization. Existing ultrasound full-waveform inversion (FWI) methods for quantitative SoS reconstruction are primarily developed under double-sided or ring-shaped arrays, which limits their applicability to widely adopted routine clinical acquisitions. In this work, we develop a frequency-domain, total variation (TV)-regularized FWI framework tailored for single-sided linear ultrasound arrays, which enables quantitative reconstruction of SoS maps using standard clinical probes. To address the severe ill-posedness and computational challenges in this setup, efficient forward modeling, fast gradient evaluation, ADMM-based optimization, and multi-GPU parallelization are integrated into the inversion framework. Numerical experiments in a thyroid cyst imaging scenario demonstrate that the proposed method reconstructs the SoS of both simple (fluid-filled) and solid cysts with improved visual and quantitative performance compared to conventional FWI. Additional 2D and 3D simulations across different target and array apertures further elucidate the capabilities and limitations of single-sided ultrasound FWI.
Gas infrastructure datasets are essential inputs for energy system planning to support strategic decision-making toward decarbonization. However, relevant data are typically scattered across heterogeneous sources, including geospatial datasets, image-based infrastructure plans, and tabular data, making it complex, time-consuming, and error-prone to create topology-consistent network representations with existing tools. This paper presents QGas, an interactive toolkit for visualizing, creating, and collaboratively extending georeferenced gas infrastructure datasets. QGas integrates GIS-based geometry editing with topology-preserving graph operations in a unified web-based environment, enabling users to digitize infrastructure plans, edit network elements, manage attributes, and perform topology-consistent modifications while maintaining a georeferenced representation of the system. The toolkit is implemented using a modular architecture based on Python, JavaScript, and the Leaflet mapping library. An illustrative example demonstrates its application in extending a natural gas dataset to include hydrogen and CO2 infrastructure, highlighting QGas's capability to support the preparation of consistent multi-carrier gas infrastructure datasets for energy system planning.
Mobile molecular communication (MC) links with counting receivers are sensitive to transmitter--receiver geometry, especially when nodes are mobile. We study binary detection from within-symbol count observations with unknown finite-memory inter-symbol interference (ISI) and a block-constant multiplicative geometry gain. Under a mixed-Poisson view, mobility and geometry uncertainty randomize the latent received intensity and create extra-Poisson dispersion. We propose a profiled dispersion-domain statistic $T_k^{(\Delta)}$ formed after profiling out the deterministic mean shape. The statistic subtracts the intrinsic Poisson component and normalizes by the squared profiled mean to target threshold stability under the stated multiplicative-gain model. Activity gating makes conditional and gate-integrated false-alarm probabilities explicit. We characterize $T_k^{(\Delta)}$ using a Gaussian working approximation motivated by a time-series central limit theorem (CLT), with a long-run-variance dependence correction, yielding Gaussian-approximate receiver operating characteristic (ROC) and bit-error-rate (BER) formulas along with separability design metrics. Simulations with symbol-dependent active-Brownian mobility and finite-memory ISI support the proposed mechanism, show empirical threshold stability over the tested gain range, and indicate usefulness when mean-domain differences are weak, unreliable, or intentionally suppressed.
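The dispersion-domain mechanism, subtracting the intrinsic Poisson variance and normalizing by the squared mean, can be illustrated with a stdlib-only simulation. The sampler, the gamma gain law with unit mean, and all constants below are our own assumptions for illustration, not the paper's exact mobility model (which uses a block-constant gain; i.i.d. gains suffice to show the extra dispersion):

```python
import math, random

random.seed(0)

def poisson(lam):
    """Knuth's product-of-uniforms sampler; fine for the small rates here."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def dispersion_stat(counts):
    """(sample variance - mean) / mean^2: ~0 for pure Poisson counts,
    positive under multiplicative-gain randomness (mixed Poisson)."""
    n = len(counts)
    m = sum(counts) / n
    v = sum((c - m) ** 2 for c in counts) / (n - 1)
    return (v - m) / (m * m)

lam, n = 5.0, 20000
pure = [poisson(lam) for _ in range(n)]
# Random multiplicative gain g ~ Gamma(shape=4, scale=0.25): E[g]=1, Var[g]=0.25.
mixed = [poisson(random.gammavariate(4.0, 0.25) * lam) for _ in range(n)]

print(dispersion_stat(pure), dispersion_stat(mixed))  # ~0 versus ~Var[g]=0.25
```

Because the statistic estimates the gain variance rather than the mean intensity, its detection threshold does not have to track the unknown gain level, which is the threshold-stability property the abstract claims.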
As energy communities move from policy design to implementation in Switzerland, understanding their performance in practice has become increasingly important. This study presents a techno-economic assessment of a regulation-compliant LEC under the new Swiss legal framework. A reference case without local electricity exchange is compared to a LEC scenario with internal electricity sharing. Results show that LEC participation increases local renewable utilization, reduces grid exports, and delivers economic benefits to both consumers and prosumers. A sensitivity analysis further indicates that internal electricity pricing plays a critical role in shaping trade-offs between overall efficiency and fairness in benefit distribution. This exploratory study provides practical insights to support informed decision-making and the future development of LECs in Switzerland.
How difficult can it be to implement a PID controller? The answer is twofold. Implementing the PID control law is simple and computationally inexpensive. However, this basic form will not work in practical applications. The primary reason for this is the various physical limitations of the actuator. Measurement noise, different implementations depending on the various structures (P, PI, PD or PID), bumpless transfer, and varying sampling time also result in problems rendering the basic form inoperable. PID implementation is therefore more difficult than meets the eye. This paper introduces a reference implementation of the PID controller which considers these practical issues. It includes pseudo-code, discussion of the implementation choices and simulation of carefully selected, important test cases.
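As a hedged sketch of the practical issues the paper treats, here is a minimal discrete PID with output clamping, conditional anti-windup, and a first-order filter on the derivative term. The gains, limits, and the toy test plant are our illustrative choices, not the paper's reference implementation:

```python
class PID:
    """Discrete PID with output clamping, conditional anti-windup,
    and a first-order low-pass filter on the derivative term."""

    def __init__(self, kp, ki, kd, dt, umin=-5.0, umax=5.0, tf=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.umin, self.umax, self.tf = umin, umax, tf
        self.i = 0.0       # integrator state
        self.d = 0.0       # filtered derivative state
        self.prev_e = 0.0

    def update(self, setpoint, y):
        e = setpoint - y
        # Filtered derivative: low-pass the raw difference quotient so
        # measurement noise is not amplified by 1/dt.
        raw_d = (e - self.prev_e) / self.dt
        a = self.tf / (self.tf + self.dt)
        self.d = a * self.d + (1 - a) * raw_d
        u = self.kp * e + self.i + self.kd * self.d
        # Conditional anti-windup: integrate only while not pushing
        # further into saturation.
        if self.umin < u < self.umax or (u >= self.umax and e < 0) \
                or (u <= self.umin and e > 0):
            self.i += self.ki * e * self.dt
        self.prev_e = e
        return min(max(u, self.umin), self.umax)

# Toy closed-loop test: first-order plant x' = -x + u, PI gains.
pid = PID(kp=2.0, ki=1.0, kd=0.0, dt=0.01)
x = 0.0
for _ in range(2000):
    u = pid.update(1.0, x)
    x += 0.01 * (-x + u)
print(abs(x - 1.0) < 0.02)  # True: settles at the setpoint without windup
```

A production version would also cover bumpless transfer between P/PI/PD/PID structures and varying sampling time, which the sketch above deliberately omits.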
Data-based adaptive optimization methods hold great promise for the performance optimization of uncertain, time-varying processes. However, current methods are often based on continuous perturbation, which is generally undesirable in real-life (e.g., industrial) applications. In this paper, a new uncertainty-based perturb-and-observe method is developed that addresses this limitation and reduces the required number of perturbations, while retaining the capability to track time-varying optima. The method is based on the philosophy of "only perturbing when needed," and is shown to converge to the optimum under mild conditions. A simulation-based case study on a photovoltaic solar array demonstrates that it can outperform the standard perturb-and-observe approach as well as three other data-based optimization methods.
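For contrast, the standard perturb-and-observe baseline, which perturbs at every step, fits in a few lines. The concave power curve below is a stand-in for a photovoltaic characteristic, not the paper's solar-array model:

```python
# Standard perturb-and-observe: always perturb, and reverse direction
# whenever the last perturbation decreased the measured objective.
# The power curve is an invented concave stand-in with its maximum at 0.7.

def power(v):
    return 1.0 - (v - 0.7) ** 2  # toy PV power curve

v, step = 0.2, 0.01
p_prev = power(v)
direction = 1.0
for _ in range(200):
    v += direction * step
    p = power(v)
    if p < p_prev:           # last perturbation hurt: reverse direction
        direction = -direction
    p_prev = p

print(abs(v - 0.7) <= 0.02)  # True: oscillates within a step of the optimum
```

The persistent oscillation around the optimum is exactly the continuous perturbation the proposed method avoids by perturbing only when its uncertainty estimate calls for it.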
Ultra-massive multiple-input multiple-output (UM-MIMO) is a key technology for enabling terahertz (THz) communications in 6G networks, offering high beamforming gain to combat severe path loss. However, the large antenna array expands the near-field region, resulting in a hybrid near- and far-field communication environment. This makes channel estimation significantly more challenging than in conventional networks. To address this issue, we propose a novel attention-augmented channel estimator named the fixed-point attention network (FP-ANet), which integrates fixed-point theory with a dual-attention mechanism. By combining, in each iteration, a linear estimator with a non-linear estimator built from dual-attention residual blocks, this model-driven approach effectively exploits the sparsity of THz channels in the angular-distance domain, enabling more precise and physically grounded channel estimation. Simulation results show that FP-ANet achieves superior channel estimation accuracy compared to state-of-the-art methods while maintaining comparable computational complexity.
Accurate automatic brain tumor segmentation in Low- and Middle-Income Countries (LMICs) is challenging due to the lack of defined national imaging protocols, diverse imaging data, extensive use of low-field Magnetic Resonance Imaging (MRI) scanners, and limited health-care resources. As part of the Brain Tumor Segmentation (BraTS) Africa 2025 Challenge, we applied topology refinement to state-of-the-art segmentation models such as nnU-Net, MedNeXt, and a combination of both. Since the BraTS-Africa dataset has low MRI image quality, we incorporated the BraTS 2025 challenge data of pre-treatment adult glioma (Task 1) to pre-train the segmentation model and fine-tuned it on the BraTS-Africa dataset. We added an extra topology refinement module to address prediction deformations arising from topological errors. With the introduction of this module, we achieved improved Normalized Surface Distances (NSD) of 0.810, 0.829, and 0.895 on Surrounding Non-Enhancing FLAIR Hyperintensity (SNFH), Non-Enhancing Tumor Core (NETC), and Enhancing Tumor (ET), respectively.
Robust stabilization conditions for uncertain switched affine systems subject to a unitary input delay are presented. They are obtained through the Lyapunov framework and a min-switching state-feedback predictive control law. The result relies on a prediction scheme considering nominal system parameters. By constructing a Lyapunov function that considers the prediction error, we demonstrate the exponential convergence of the system trajectories and system prediction to a robust limit cycle. An example is provided to validate the obtained result.
This paper studies the bicycle model of vehicle dynamics under three classes of stealthy cyber-attacks: replay attacks, zero dynamics attacks, and covert attacks. Using a system-theoretic framework, we analyze the feasibility and impact of these attacks on vehicle lateral dynamics. The investigation considers different measurement configurations, including yaw rate, lateral acceleration, and longitudinal acceleration outputs, to evaluate how sensor selection influences attack detectability and system vulnerability. Each attack class is characterized in terms of required system knowledge, communication access, and impact. The analysis shows that replay attacks remain largely model-agnostic, while zero dynamics attacks are fundamentally constrained by control-oriented design choices, particularly output selection, which can eliminate unstable zero dynamics and limit the attack impact. In contrast, covert attacks, enabled by coordinated actuator and sensor manipulation, allow sustained and stealthy deviation of lateral states when sufficient access and system knowledge are available. The effects of actuator and tire saturation are also examined, revealing attack-dependent impacts on stealthiness and effectiveness. Finally, simulation case studies are conducted using CarSim-Simulink co-simulation to validate the theoretical results.
High-precision indoor sensing using monostatic multiple-input multiple-output (MIMO) radar typically relies on increasing the physical aperture size of antennas, leading to high hardware complexity and cost. To overcome this bottleneck, this paper establishes a unified framework for multi-site radar sensing based on equivalent angular resolution, together with a design methodology that uses this metric to optimize distributed Single-Input Single-Output (SISO) configurations. By mapping spatial diversity into the angular domain, the proposed metric enables a direct and physically interpretable comparison with monostatic MIMO beamwidth. The associated methodology provides a principled way to select node placement and geometry to synthesize an effective virtual aperture that suppresses angular glint and multipath. Experiments with commercial 60-GHz radars in cluttered indoor environments validate the superiority of the multi-site SISO configuration over monostatic MIMO, demonstrating a reduction in maximum localization error from 0.58 m to 0.20 m and mean error from 0.35 m to 0.12 m.
This paper presents a comprehensive link budget analysis for millimeter wave (mm-Wave) and sub-Terahertz (sub-THz) communication systems with primary focus on transmitter (TX) noise propagation, an often overlooked impairment that can dominate in scenarios where path loss is insufficient to suppress TX noise below receiver thermal and atmospheric molecular noise levels. Unlike traditional thermal noise limited analyses, this work demonstrates that TX noise is amplified by component noise figures that degrade significantly with frequency, rising from single digits to more than $15\,\mathrm{dB}$ in the sub-THz range. In the scenarios analyzed, this propagated TX noise reduces the achievable Signal-to-Noise Ratio (SNR) by approximately $15$ to $25\,\mathrm{dB}$ at short distances, creating fundamental SNR ceilings at ranges below about $10\,\mathrm{cm}$. We develop a systematic framework quantifying TX noise dominance conditions as functions of distance, frequency, and component parameters, revealing fundamental performance constraints for short-range next generation wireless systems. Our findings indicate that the TX noise figure should be as low as possible for short-range communication, and both TX noise and atmospheric molecular noise should be considered for medium- and long-range links.
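The frequency-dependent growth of component noise figures described above compounds along the RF chain. As a quick illustration (not the paper's own framework; the stage noise figures and gains below are hypothetical), the classical Friis formula shows how a lossy sub-THz mixer inflates the cascaded noise figure:

```python
import math

def friis_noise_factor(stages):
    """Cascaded noise figure (dB) via the Friis formula.

    stages: list of (noise_figure_dB, gain_dB) per component,
    ordered from the antenna toward the detector.
    """
    total_f = 0.0
    cum_gain = 1.0
    for i, (nf_db, g_db) in enumerate(stages):
        f = 10 ** (nf_db / 10)      # noise factor (linear)
        g = 10 ** (g_db / 10)       # gain (linear)
        total_f += (f - 1) / cum_gain if i else f
        cum_gain *= g
    return 10 * math.log10(total_f)  # back to dB

# Hypothetical sub-THz front end: LNA, lossy mixer, IF amplifier.
chain = [(8.0, 15.0), (12.0, -6.0), (5.0, 20.0)]
print(f"cascaded NF = {friis_noise_factor(chain):.2f} dB")
```

Because the first stage dominates, lowering the TX/RX front-end noise figure, as the abstract recommends for short-range links, has an outsized effect on the cascade.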
Buildings account for approximately 40% of global energy consumption, and with the growing share of intermittent renewable energy sources, enabling demand-side flexibility, particularly in heating, ventilation and air conditioning systems, is essential for grid stability and energy efficiency. This paper presents a safe deep reinforcement learning-based control framework to optimize building space heating while enabling demand-side flexibility provision for power system operators. A deep deterministic policy gradient algorithm is used as the core deep reinforcement learning method, enabling the controller to learn an optimal heating strategy through interaction with the building thermal model while maintaining occupant comfort, minimizing energy cost, and providing flexibility. To address safety concerns with reinforcement learning, particularly regarding compliance with flexibility requests, we propose a real-time adaptive safety filter to ensure that the system operates within predefined constraints during demand-side flexibility provision. The proposed real-time adaptive safety filter guarantees full compliance with flexibility requests from system operators and improves energy and cost efficiency -- achieving up to 50% savings compared to a rule-based controller -- while outperforming a standalone deep reinforcement learning-based controller in energy and cost metrics, with only a slight increase in comfort temperature violations.
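A safety filter of this kind can be understood as projecting the learned policy's action onto a constraint-admissible set. A minimal sketch, assuming a hypothetical one-step first-order thermal model (the paper's filter and building model are more elaborate):

```python
def safety_filter(action, T, T_min, T_max, a=0.9, b=0.05, T_out=5.0, c=0.1):
    """Project a proposed heating power onto the set of actions that keep
    the one-step-ahead room temperature inside [T_min, T_max].

    Toy first-order thermal model (all parameters hypothetical):
        T_next = a*T + b*u + c*T_out
    Solve for the admissible range of u and clip the RL action into it.
    """
    u_lo = (T_min - a * T - c * T_out) / b
    u_hi = (T_max - a * T - c * T_out) / b
    return min(max(action, max(u_lo, 0.0)), u_hi)  # heating power is non-negative

# The policy proposes 0 kW while the room would drop below comfort:
# the filter raises the action to the minimum admissible heating power.
safe_u = safety_filter(0.0, T=20.0, T_min=19.0, T_max=24.0)
```

Actions already inside the admissible range pass through unchanged, so the filter only intervenes when a constraint would otherwise be violated.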
Integrated sensing and communication (ISAC) systems introduce new privacy risks because an unintended sensing node may exploit the shared radio waveform to infer transmitter-related information even when the communication payload remains secure. This paper investigates transmitter privacy, defined as limiting unauthorized inference of transmitter-related information through channel estimation, in a reconfigurable intelligent surface (RIS)-aided multi-antenna wireless system with a transmitter, a legitimate receiver, a malicious sensor, and a RIS. The malicious sensor is assumed to estimate the transmitter--sensor channel, and the resulting channel state information can then support unauthorized sensing, inference, or related signal processing. To mitigate this threat, we consider a privacy-oriented design in which the transmitter adopts superposition-based signaling with a message signal and transmit-side artificial noise, while the RIS shapes the propagation environment in a privacy-aware manner. The channel-estimation performance at the malicious sensor is first analyzed under imperfect prior knowledge, and both the true and predicted mean-square-error expressions are derived. Based on this analysis, we formulate a joint active--passive beamforming design problem that maximizes the malicious sensor's predicted channel-estimation error subject to a communication quality-of-service constraint, a transmit-power budget, and the unit-modulus constraints of the RIS. The resulting non-convex problem is handled through a numerically efficient alternating-optimization framework based on an augmented Lagrangian reformulation. Numerical results show that RIS-assisted propagation shaping can substantially degrade unauthorized channel estimation relative to the non-RIS case while preserving reliable communication, and further show that the privacy gains also improve a more direct sensing metric, namely the malicious sensor's angle-of-arrival estimation accuracy.
This paper proposes the LiFE-CD algorithm for convergence time analysis of the max-consensus algorithm in multi-agent systems under Bernoulli-distributed link failures. Unlike existing approaches, which either assume ideal communication or provide asymptotic upper bounds on the expected convergence time, LiFE-CD deterministically computes the full probability distribution of the convergence time from network topology and individual link failure probabilities, without simulation. The full probability distribution enables deadline-aware protocol design with specified reliability guarantees. Based on geometrically distributed link delays, the proposed algorithm iteratively reduces the given network topology considering both unicast and broadcast transmissions. LiFE-CD yields exact results for acyclic networks and, for cyclic networks, tight upper bounds on the convergence time via shortest-path spanning tree construction. Numerical results confirm analytical exactness for acyclic networks, validate tightness for cyclic networks, and demonstrate improvement over existing approaches. Our complexity analysis shows reduced computational cost compared to Monte Carlo simulations, while eliminating stochastic variability and enhancing reproducibility. All results extend directly to min-consensus by structural equivalence.
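To make the distribution computation concrete, consider the special case of a simple path: each link's delay is geometric in the per-slot success probability, and the end-to-end convergence time is the convolution of the per-link delay PMFs. A minimal sketch under that assumption (LiFE-CD's actual topology reduction additionally handles general networks, unicast/broadcast modes, and cyclic-network bounds):

```python
def geometric_pmf(p, t_max):
    """PMF of a geometric delay: first success at slot t with prob p*(1-p)^(t-1)."""
    return [0.0] + [p * (1 - p) ** (t - 1) for t in range(1, t_max + 1)]

def convolve(a, b, t_max):
    """Discrete convolution of two PMFs, truncated at t_max slots."""
    out = [0.0] * (t_max + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= t_max:
                out[i + j] += ai * bj
    return out

def path_convergence_pmf(link_probs, t_max):
    """Exact PMF of the time for the maximum to traverse a path of
    independent Bernoulli links (per-link delay is geometric)."""
    pmf = [1.0] + [0.0] * t_max   # unit mass at 0 slots
    for p in link_probs:
        pmf = convolve(pmf, geometric_pmf(p, t_max), t_max)
    return pmf

# 3-hop path, each link succeeding with probability 0.8 per slot:
# P(T = 3) = 0.8^3, and the tail decays geometrically.
pmf = path_convergence_pmf([0.8, 0.8, 0.8], t_max=20)
```

Having the full PMF, rather than only an expected value, is what enables deadline-aware statements such as "convergence within t slots with probability at least 0.999".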
Lung cancer remains one of the leading causes of cancer-related mortality worldwide. Conventional computed tomography (CT) imaging, while essential for detection and staging, has limitations in distinguishing benign from malignant lesions and providing interpretable diagnostic insights. To address this challenge, this study proposes a dual-modal artificial intelligence framework that integrates CT radiology with hematoxylin and eosin (H&E) histopathology for lung cancer diagnosis and subtype classification. The system employs convolutional neural networks to extract radiologic and histopathologic features and incorporates clinical metadata to improve robustness. Predictions from both modalities are fused using a weighted decision-level integration mechanism to classify adenocarcinoma, squamous cell carcinoma, large cell carcinoma, small cell lung cancer, and normal tissue. Explainable AI techniques including Grad-CAM, Grad-CAM++, Integrated Gradients, Occlusion, Saliency Maps, and SmoothGrad are applied to provide visual interpretability. Experimental results show strong performance with accuracy up to 0.87, AUROC above 0.97, and macro F1-score of 0.88. Grad-CAM++ achieved the highest faithfulness and localization accuracy, demonstrating strong correspondence with expert-annotated tumor regions. These results indicate that multimodal fusion of radiology and histopathology can improve diagnostic performance while maintaining model transparency, suggesting potential for future clinical decision support systems in precision oncology.
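Decision-level fusion of this kind reduces to a weighted average of per-modality class posteriors. A minimal sketch (the weight below is hypothetical; the abstract does not specify the exact weighting):

```python
CLASSES = ["adenocarcinoma", "squamous", "large_cell", "small_cell", "normal"]

def fuse(p_ct, p_hist, w_ct=0.4):
    """Weighted decision-level fusion of CT and histopathology class
    probabilities; returns the fused label and distribution."""
    fused = [w_ct * a + (1 - w_ct) * b for a, b in zip(p_ct, p_hist)]
    s = sum(fused)
    fused = [x / s for x in fused]        # renormalize
    idx = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[idx], fused

# CT is torn between adenocarcinoma and squamous; histology settles it.
label, probs = fuse([0.40, 0.38, 0.10, 0.07, 0.05],
                    [0.75, 0.10, 0.05, 0.05, 0.05])
```

A benefit of fusing at the decision level rather than the feature level is that either modality can still produce a prediction alone when the other is unavailable.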
This paper addresses the numerical optimization of proportional-integral-derivative (PID) controllers for linear time-invariant systems with delays, where the derivative action is implemented using a low-pass filter. While performance assessment is often based on the spectral abscissa of the ideal PID-controlled system, the inclusion of a derivative filter fundamentally alters the closed-loop spectral properties and cannot be treated as a post-processing step. In particular, the spectral abscissa of the filtered closed-loop system may differ significantly from that of its unfiltered counterpart, potentially affecting both stability and performance. We propose a systematic numerical design framework in which the PID gains and the filter constant are optimized simultaneously by directly minimizing the spectral abscissa of the filtered closed-loop system. Treating the filter as an integral part of the control design allows us to reconcile robustness at high frequencies, in the sense of mitigating fragility issues due to approximate identities, with performance at low frequencies, in addition to countering measurement-noise amplification. Numerical examples illustrate the proposed approach and highlight the benefits of controller-filter co-design. The results apply to general linear systems with input and/or state delays and are valid for both single-input single-output (SISO) and multi-input multi-output (MIMO) configurations.
Computational complexity has been a major challenge in game-theoretic model predictive control (GT-MPC), as real-time solutions to a game (e.g., Nash equilibria (NEs)) have to be computed at each sampling instant of an MPC. This challenge is especially critical in autonomous driving, where interactions may involve many agents, and decisions must be made at fast sampling rates. We show that this challenge can be addressed through time-distributed solution-seeking iterations designed based on, e.g., Newton and Newton--Kantorovich methods. Specifically, the autonomous vehicle decision-making problem is first formulated as a GT-MPC problem. To ensure solution attainability, a potential game framework is adopted. Within this framework, both potential-function optimization and best-response dynamics are used to seek the NE. To enable real-time implementation, Newton and Newton--Kantorovich methods are employed to solve the optimization problems arising in the NE-seeking algorithms, with their iterations distributed over time. Numerical experiments on an intersection-crossing scenario demonstrate that the proposed methods achieve effective real-time performance.
This paper presents a unified optimization framework for phase change material (PCM) based cooling systems. Thermal management is critical in applications such as photovoltaic (PV) modules, battery packs, and power electronics, where excessive heat reduces performance and lifespan. Designing such systems is challenging because energy dynamics, capacity, heat rejection, and structural constraints must all be considered. Although prior studies have investigated PCM applications and heat transfer enhancement, there are limited efforts that unify such diverse performance objectives through formalized design methods. This paper develops a framework that formulates the PCM design problem using critical energy-based terms, with static and dynamic objectives capturing the PCM physical design and control aspects. Two case studies are used to validate the approach: the first explores passive cooling, and the second implements an active cooling configuration. The results compare the design and control of these systems, showing improvement in individual performance metrics between the two options.
This article proposes a data-driven framework to verify the distributed conditions that guarantee the system-wide stability for interconnected power systems. To guarantee system-wide stability, the dynamics of each bus are required to satisfy an output differential passivity (ODP) condition with a sufficient index. These ODP indices uniformly quantify the impacts on the system-wide stability of individual bus dynamics and the coupling strength from the power network. To obtain these indices without explicit physical models, we derive a data-driven linear matrix inequality (LMI) criterion based exclusively on measured input-state trajectories. Furthermore, extracting the optimal ODP index is formulated as a convex semi-definite programming (SDP) problem. Simulations verify the effectiveness of the proposed method under both single-device offline evaluation and system-wide online certification scenarios.
With the steady increase of inverter-based technology integrated into the grid, the frequency response of large interconnected systems becomes more unpredictable. This leads to significant changes in the boundaries of coherent regions, which depend strongly on disturbance locations and operating conditions. While most existing coherency identification is based on a single large generator outage, it is important to identify these boundaries under a wide range of disturbances. With a large number of inverters in the system, the dynamic interactions among the various grid components increase, creating a need for such boundary identification. This paper presents a multi-view consensus algorithm to identify coherency under variable grid operating conditions and a wide range of disturbances. The proposed approach is demonstrated by identifying the coherent regions in the miniWECC 240-bus test system.
We examine market outcomes in energy transport networks with a focus on gas-fired generators, which are producers in a wholesale electricity market and consumers in the natural gas market. Market administrators monitor bids to determine whether a participant wields market power to manipulate the price of energy, reserves, or financial transmission rights. If economic or physical withholding of generation from the market is detected, mitigation is imposed by replacing excessive bids with reference level bids to prevent artificial supply shortages. We review market monitoring processes in the power grid, and present scenarios in small interpretable test networks to show how gas-fired generators can bid in the gas market to alter outcomes in a power market. We develop a framework based on DC optimal power flow (OPF) and steady-state optimal gas flow (OGF) formulations to represent two interacting markets with structured exchange of price and quantity bids. We formulate optimization-based methods to identify market power in a power grid, as well as to identify market conditions that indicate market power being exerted by a generator using gas market bids.
This study presents a triadic analysis of energy storage operation under multi-stage model predictive control, investigating the interplay between data characteristics, forecast uncertainty, planning horizon, and battery C-rate. Synthetic datasets are generated to systematically explore variations in data profiles and uncertainty, enabling parametrization and the construction of relationships that map these characteristics to optimal horizon length. Results reveal the presence of an effective horizon, defined as the look-ahead length beyond which additional forecast information provides limited operational benefit. Accounting for this horizon can reduce computational costs while maintaining optimal performance. The study provides optimal horizon lengths across a broad range of combinations of battery types, uncertainty levels, and data profiles, offering practical guidance for industrial storage operation. It also quantifies revenue losses due to forecast uncertainty, showing that errors can impact performance even for fast batteries. Finally, the framework lays the groundwork for future machine learning approaches that map dataset parametrization to optimal horizons, supporting continuous optimization in industrial settings without heavy computation.
Recent progress in visual brain decoding from fMRI has been enabled by large-scale datasets such as the Natural Scenes Dataset (NSD) and powerful diffusion-based generative models. While current pipelines are primarily optimized for perception, their performance under mental imagery remains less well understood. In this work, we study how a state-of-the-art (SOTA) perception decoder (DynaDiff) can be adapted to reconstruct imagined content from the Imagery-NSD benchmark. We propose a latent functional alignment approach that maps imagery-evoked activity into the pretrained model's conditioning space, while keeping the remaining components frozen. To mitigate the limited amount of matched imagery-perception supervision, we further introduce a retrieval-based augmentation strategy that selects semantically related NSD perception trials. Across four subjects, latent functional alignment consistently improves high-level semantic reconstruction metrics relative to the frozen pretrained baseline and a voxel-space ridge alignment baseline, and enables above-chance decoding from multiple cortical regions. These results suggest that semantic structure learned from perception can be leveraged to stabilize and improve visual imagery decoding under out-of-distribution conditions.
Large-scale AI model training workloads use thousands of GPUs operating in tightly synchronized loops. During synchronous communication, start-up, shut-down, and checkpointing, GPU power consumption can swing from peak to idle within milliseconds. These large and rapid load swings endanger grid infrastructure as they induce steep power ramp rates, voltage and frequency shifts, and reactive power transients that can damage transformers, converters, and protection equipment. To solve this problem, we introduce EasyRider, a power architecture to mitigate power fluctuations at the rack level. EasyRider uses passive components and actively-controlled auxiliary energy storage to attenuate rack power swings. A software system continually monitors the energy storage system to maximize its lifetime in the presence of frequent charge/discharge cycles. EasyRider filters rack power variations to be within grid safety requirements without requiring software modifications to AI training frameworks or wasting energy. We evaluate EasyRider on a 400VDC-rated prototype system against published workload traces and our own GPU testbed, demonstrating its effectiveness across heterogeneous power levels and workload power profiles.
Continuum robots are well suited for navigating confined and fragile environments, such as vascular or endoluminal anatomy, where contact with surrounding structures is often unavoidable. While controlled contact can assist motion, unfavorable contact can degrade controllability, induce kinematic singularities, or introduce safety risks. We present a contact-aware planning approach that evaluates contact quality, penalizing hazardous interactions, while permitting benign contact. The planner produces kinematically feasible trajectories and contact-aware Jacobians which can be used for closed-loop control in hardware experiments. We validate the approach by testing the integrated system (planning, control, and mechanical design) on anatomical models from patient scans. The planner generates effective plans for three common anatomical environments, and, in all hardware trials, the continuum robot was able to reach the target while avoiding dangerous tip contact (100% success). Mean tracking errors were 1.9 +/- 0.5 mm, 1.2 +/- 0.1 mm, and 1.7 +/- 0.2 mm across the three different environments. Ablation studies showed that penalizing end-of-continuum-segment (ECS) contact improved manipulability and prevented hardware failures. Overall, this work enables reliable, contact-aware navigation in highly constrained environments.
Always-on converter health monitoring demands sub-mW edge inference, a regime inaccessible to GPU-based physics-informed neural networks. This work separates spiking temporal processing from physics enforcement: a three-layer leaky integrate-and-fire SNN estimates passive component parameters while a differentiable ODE solver provides physics-consistent training by decoupling the ODE physics loss from the unrolled spiking loop. On an EMI-corrupted synchronous buck converter benchmark, the SNN reduces lumped resistance error from $25.8\%$ to $10.2\%$ versus a feedforward baseline, within the $\pm 10\%$ manufacturing tolerance of passive components, at a projected ${\sim}270\times$ energy reduction on neuromorphic hardware. Persistent membrane states further enable degradation tracking and event-driven fault detection via a $+5.5$ percentage-point spike-rate jump at abrupt faults. With $93\%$ spike sparsity, the architecture is suited for always-on deployment on Intel Loihi 2 or BrainChip Akida.
Low Earth orbit (LEO) satellites are a key technology to enable connectivity for rural and remote users. Communication satellites in LEO can provide coverage to much larger areas than terrestrial or aerial systems, while offering improved data rates when compared with geostationary systems. However, a major challenge with LEO satellite communications is the high mobility of the satellite, which results in a rapidly changing communication channel. Due to this, it is challenging to fairly allocate communication resources to multiple users in the system. This work proposes an Adaptive Power Allocation and Scheduling Scheme (APASS) to ensure user fairness in the downlink of a LEO satellite system serving mobile ground users. First, a novel channel and transmission model is introduced to capture the variability in channel statistics due to the satellite's trajectory. Then, a non-convex optimization problem is formulated to maximize the minimum rate across all ground users over a fixed set of time slots. To solve this problem, the proposed APASS dynamically allocates power and schedules transmissions based on predicted future channel gains. Numerical results show that APASS achieves strong performance even with substantial prediction errors, performing close to an upper bound that assumes perfect future channel knowledge. Furthermore, it improves the minimum user rate by a factor of 2.98 compared to equal-power allocation and maintains user fairness with a Jain's fairness index well above 0.99.
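For reference, Jain's fairness index used above is straightforward to compute: it equals 1.0 when all per-user rates are equal and 1/n when a single user takes everything.

```python
def jains_index(rates):
    """Jain's fairness index: (sum r)^2 / (n * sum r^2)."""
    n = len(rates)
    return sum(rates) ** 2 / (n * sum(r * r for r in rates))

# Near-equal per-user rates yield an index close to 1.
print(jains_index([10.0, 10.2, 9.9, 10.1]))
```

An index above 0.99, as reported for APASS, therefore indicates per-user rates that are nearly indistinguishable in aggregate.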
In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor, Qwen3.5-Omni scales to hundreds of billions of parameters and supports a 256k context length. By leveraging a massive dataset comprising heterogeneous text-vision pairs and over 100 million hours of audio-visual content, the model demonstrates robust omni-modality capabilities. Qwen3.5-Omni-plus achieves SOTA results across 215 audio and audio-visual understanding, reasoning, and interaction subtasks and benchmarks, surpassing Gemini-3.1 Pro in key audio tasks and matching it in comprehensive audio-visual understanding. Architecturally, Qwen3.5-Omni employs a Hybrid Attention Mixture-of-Experts (MoE) framework for both Thinker and Talker, enabling efficient long-sequence inference. The model facilitates sophisticated interaction, supporting over 10 hours of audio understanding and 400 seconds of 720P video (at 1 FPS). To address the inherent instability and unnaturalness in streaming speech synthesis, often caused by encoding efficiency discrepancies between text and speech tokenizers, we introduce ARIA. ARIA dynamically aligns text and speech units, significantly enhancing the stability and prosody of conversational speech with minimal latency impact. Furthermore, Qwen3.5-Omni expands linguistic boundaries, supporting multilingual understanding and speech generation across 10 languages with human-like emotional nuance. Finally, Qwen3.5-Omni exhibits superior audio-visual grounding capabilities, generating script-level structured captions with precise temporal synchronization and automated scene segmentation. Remarkably, we observed the emergence of a new capability in omnimodal models: directly performing coding based on audio-visual instructions, which we call Audio-Visual Vibe Coding.
Automated classification of electrocardiogram (ECG) signals is a useful tool for diagnosing and monitoring cardiovascular diseases. This study compares three traditional machine learning algorithms (Decision Tree Classifier, Random Forest Classifier, and Logistic Regression) and three deep learning models (Simple Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Complex CNN (ECGLens)) for the classification of ECG signals from the PTB-XL dataset, which contains 12-lead recordings from normal patients and patients with various cardiac conditions. The DL models were trained on raw ECG signals, allowing them to automatically extract discriminative features. Data augmentation using the Stationary Wavelet Transform (SWT) was applied to enhance model performance, increase the diversity of training samples, and preserve the essential characteristics of the ECG signals. The models were evaluated using multiple metrics, including accuracy, precision, recall, F1-score, and ROC-AUC. The ECGLens model achieved the highest performance, with 80% classification accuracy and a 90% ROC-AUC. These findings demonstrate that deep learning architectures, particularly complex CNNs, substantially outperform traditional ML methods on raw 12-lead ECG data, and provide a practical benchmark for selecting automated ECG classification models and identifying directions for condition-specific model development.
We present ArtifactNet, a lightweight framework that detects AI-generated music by reframing the problem as forensic physics -- extracting and analyzing the physical artifacts that neural audio codecs inevitably imprint on generated audio. A bounded-mask UNet (ArtifactUNet, 3.6M parameters) extracts codec residuals from magnitude spectrograms, which are then decomposed via HPSS into 7-channel forensic features for classification by a compact CNN (0.4M parameters; 4.0M total). We introduce ArtifactBench, a multi-generator evaluation benchmark comprising 6,183 tracks (4,383 AI from 22 generators and 1,800 real from 6 diverse sources). Each track is tagged with bench_origin for fair zero-shot evaluation. On the unseen test partition (n=2,263), ArtifactNet achieves F1 = 0.9829 with FPR = 1.49%, compared to CLAM (F1 = 0.7576, FPR = 69.26%) and SpecTTTra (F1 = 0.7713, FPR = 19.43%) evaluated under identical conditions with published checkpoints. Codec-aware training (4-way WAV/MP3/AAC/Opus augmentation) further reduces cross-codec probability drift by 83% (Delta = 0.95 -> 0.16), resolving the primary codec-invariance failure mode. These results establish forensic physics -- direct extraction of codec-level artifacts -- as a more generalizable and parameter-efficient paradigm for AI music detection than representation learning, using 49x fewer parameters than CLAM and 4.8x fewer than SpecTTTra.
Spatial-temporal estimation of signals on graph edges is challenging because most conventional Graph Signal Processing (GSP) techniques are defined on the graph nodes. The Line Graph Least Mean Square (LGLMS) algorithm unifies the line graph transform with classical adaptive filters, reinterpreting online estimation techniques for time-varying signals on graph edges. By embedding edge signals into node representations, LGLMS brings the full power of existing node-domain GSP techniques to bear on edge signals, eliminating the need to redefine edge-specific techniques. Experiments on transportation and meteorological graphs, with noisy and missing signal observations, confirm that LGLMS is suitable for the online prediction of time-varying edge signals.
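The core construction is simple: each edge of the original graph becomes a node of the line graph, adjacent to every edge with which it shares a vertex. A minimal sketch of this edge-to-node embedding (LGLMS itself additionally runs an LMS adaptive filter on the resulting node signals):

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph L(G) of an undirected graph G given as an
    edge list. Each edge of G becomes a node of L(G); two edge-nodes are
    adjacent iff the original edges share an endpoint. Edge signals on G
    thus become node signals on L(G), where node-domain GSP tools apply."""
    adj = {e: set() for e in edges}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):          # edges share a vertex
            adj[e].add(f)
            adj[f].add(e)
    return adj

# Triangle graph: every pair of edges shares a vertex,
# so its line graph is again a triangle.
adj = line_graph([(0, 1), (1, 2), (0, 2)])
```

Once the adjacency of L(G) is available, any shift operator built from it (adjacency or Laplacian) can drive standard node-domain adaptive estimation of the edge signals.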
This paper proposes a composite learning backstepping control (CLBC) strategy based on modular backstepping and high-order tuners to achieve closed-loop exponential stability without high-gain feedback and persistent excitation (PE). A novel composite learning mechanism that maximizes the staged exciting strength is designed for parameter estimation, enabling parameter convergence under interval excitation (IE) or even partial IE, which is strictly weaker than PE. An extra prediction error is employed in the adaptive law to ensure the transient performance without high-gain feedback. Simulations have demonstrated the effectiveness and superiority of the proposed method in both parameter estimation and control compared to state-of-the-art methods.
High-throughput biological imaging is often constrained by a trade-off between acquisition speed and image quality. Fast imaging modalities, such as wide-field fluorescence microscopy, enable large-scale data acquisition but suffer from reduced contrast and resolution, whereas high-resolution techniques, including confocal microscopy or single-molecule localization microscopy-based super-resolution techniques, provide superior image quality at the cost of throughput and instrument time. Here, we present a deep learning-based approach for modality transfer across independent microscopes, enabling the transformation of low-quality images acquired on fast systems into high-quality representations comparable to those obtained using advanced imaging platforms. To achieve this, we employ a generative adversarial network (GAN)-based model trained on paired datasets acquired on physically separate wide-field and confocal microscopes, demonstrating that image quality can be reliably transferred between independent instruments. Quantitative evaluation shows substantial improvement in structural similarity and signal fidelity, with median SSIM and PSNR of 0.94 and 31.87, respectively, compared to 0.83 and 21.48 for the original wide-field images. These results indicate that key structural features can be recovered with high accuracy. Importantly, this approach enables a workflow in which high-throughput imaging can be performed on fast, accessible microscopy systems while preserving the ability to computationally recover high-quality structural information. High-resolution microscopy can then be reserved for targeted validation, reducing acquisition time and improving overall experimental efficiency. Together, our results establish deep learning-enabled modality transfer as a practical strategy for bridging independent microscopy systems and supporting scalable, high-content imaging workflows.
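The PSNR figures quoted above follow the standard definition over the mean squared pixel error; a minimal sketch (the 4-pixel patches below are toy values, not the paper's data):

```python
import math

def psnr(ref, img, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a reference and a restored
    image, given as flat sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, img)) / len(ref)
    if mse == 0:
        return float("inf")           # identical images
    return 10 * math.log10(max_val ** 2 / mse)

# A restored patch deviating by 5 gray levels per pixel.
ref = [100, 120, 140, 160]
out = [105, 115, 145, 155]
print(f"{psnr(ref, out):.2f} dB")
```

On this scale, the jump from 21.48 dB (raw wide-field) to 31.87 dB (transferred) reported above corresponds to roughly a tenfold reduction in mean squared error.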
Sizing a residential microgrid efficiently requires solving a coupled design-and-operation problem: photovoltaic (PV) and battery capacities should be chosen in a way that reflects how the system will actually be dispatched over time. This paper proposes BOOST, or Battery-solar Ordinal Optimization Sizing Technique, which combines ordinal optimization (OO) with mixed-integer linear programming (MILP). OO is used to screen a large set of candidate battery/PV designs with a simple linear model and then re-evaluate only the most promising designs with a more accurate MILP that captures diesel commitment logic. Relative to the original short paper, this expanded manuscript retains the full methodological narrative but refreshes the quantitative section using a new synthetic benchmark dataset suite generated from the released clean reimplementation. The suite contains five yearly synthetic datasets/configurations: base, cheap battery, cheap PV, expensive diesel, and high peak tariff. On the base synthetic dataset, the best accurate design is a 500 kWh battery with 1833.3 kW of PV, achieving 13.169 c/kWh, while BOOST improves upon dynamic programming and greedy baselines. Across the full 10 x 10 design grid, the LP and MILP rankings are effectively identical (rho = 1.000), the paper-style choice of N = 90 and s = 18 recovers the global accurate optimum, and the OO-based workflow reduces runtime by 51.8% relative to exhaustive accurate evaluation on the refreshed synthetic benchmark run. Because these added datasets are synthetic, they should be read as methodological stress tests rather than as direct empirical claims about any specific real-world site. Code is available at this https URL.
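The two-stage OO-plus-MILP workflow described above follows a generic "screen cheaply, re-evaluate accurately" pattern. The sketch below (hypothetical function names, not the released implementation) captures that pattern with stand-in cost callables for the LP surrogate and the accurate MILP:

```python
def screen_and_select(designs, cheap_cost, accurate_cost, s):
    """Two-stage ordinal-optimization selection.

    Stage 1: rank every candidate design with a cheap surrogate model.
    Stage 2: re-evaluate only the top-s candidates with the accurate model
    and return the best design from that shortlist.
    """
    shortlist = sorted(designs, key=cheap_cost)[:s]
    return min(shortlist, key=accurate_cost)
```

Because ordinal optimization only needs the surrogate to order candidates roughly correctly (and the LP/MILP rankings were observed above to be effectively identical), a small shortlist such as s = 18 out of N = 90 can recover the accurate optimum at a fraction of the exhaustive cost.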
In heterogeneous integration, different dies may employ distinct technologies, making floorplanning across multiple dies inherently coupled with technology assignment. Almost all prior floorplanning studies assume a fixed technology and therefore do not address the challenge of technology assignment. This work presents the first systematic study of multi-die floorplanning that treats technology choice as a variable. To address the challenge of variable block areas, we incorporate a recent machine learning technique for rapid PPA estimation. Our methods jointly optimize area, wirelength, performance, power, and cost, thereby highlighting the importance of technology assignment. Experimental evaluations, validated with a commercial tool for both 2.5D and 3D ICs, demonstrate that our systematic optimizations significantly outperform a greedy approach.
In the context of building electrification, operating distributed energy resources that integrate multiple energy carriers (electricity, heat, mobility) is challenging due to nonlinear device dynamics, uncertainty, and computational cost. Energy management systems therefore decide the power dispatch so as to balance operating costs (energy bills, asset degradation) against user requirements (mobility, heating, etc.). Current energy management schemes apply empirical battery ageing models outside their fitting conditions, resulting in inaccuracies and poor performance; the coupling to thermal systems is likewise overlooked. This paper presents an ageing-aware nonlinear economic model predictive controller for electrified buildings that incorporates physics-based battery ageing models. The models distinguish between energy storage systems (chemistry, ageing state, etc.) and make the trade-off between grid cost and battery degradation explicit. The proposed algorithm can either cut grid costs or extend battery lifetime (for electric-vehicle or stationary battery packs). Additionally, substituting NMC cells with LFP chemistries improves summer operation, yielding a 10% grid cost reduction and a 20% decrease in degradation. Finally, relative to the state of the art, the presented MPC improves grid cost and degradation with aged batteries by 10% and 5% respectively, in periods with high solar generation and low thermal loads such as summer.
This paper addresses the mathematical modeling and compensation of stochastic discrete-time clock jitter in analog-to-digital converters (ADCs). We model the stochastic clock jitter as a first-order autoregressive (AR(1)) process, and we propose two novel, computationally efficient, pilot-assisted dejittering algorithms for baseband signals: one based on solving a sequence of weighted least-squares problems, and another that exploits the correlated jitter structure via a Kalman filter-based routine. We also propose a conditional maximum-likelihood estimator for the autoregressive parameters, enabling near-optimal Kalman-filter performance even when such parameters vary over time. We further provide a mathematical analysis of the induced linearization errors, and we complement the theory with synthetic simulations to evaluate the proposed techniques across different scenarios. The proposed techniques are shown to yield a 1-15 dB improvement in signal-to-noise-and-distortion ratio (SINADR) and 0.02-1.6 dB in symbol error vector magnitude (EVM), depending on impairment severity and pilot density. The Kalman smoother generally provides superior performance by leveraging additional temporal information.
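As a minimal illustration of the Kalman-filter component, the scalar sketch below tracks an AR(1) process observed in noise; the parameter names and the plain filtering form (rather than the smoother mentioned above) are assumptions of this sketch, not the paper's algorithm:

```python
import numpy as np

def kalman_ar1(y, a, q, r, x0=0.0, p0=1.0):
    # Scalar Kalman filter for the AR(1) state model
    #   x_k = a * x_{k-1} + w_k,  w_k ~ N(0, q)   (jitter dynamics)
    #   y_k = x_k + v_k,          v_k ~ N(0, r)   (noisy observation)
    x, p, est = x0, p0, []
    for yk in y:
        x, p = a * x, a * a * p + q            # predict
        k = p / (p + r)                        # Kalman gain
        x, p = x + k * (yk - x), (1 - k) * p   # update
        est.append(x)
    return np.array(est)
```

With well-matched (a, q, r), the filtered estimate has a lower mean-squared error than the raw observations, which is the mechanism the dejittering routine exploits; the conditional maximum-likelihood step described above supplies these parameters when they are unknown or time-varying.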
A procedure for determining the optimal group delay of a Linearly-Constrained Minimum-Variance (LCMV) beamformer is proposed. Two ways of selecting the optimal delay are recommended: the first minimizes the noise power; the second minimizes the processing delay. The potential of this hitherto unexplored degree of design freedom is investigated using simulated Very-High-Frequency (VHF) communication and Ultra-High-Frequency (UHF) bistatic radar applications.
Scaled Relative Graphs (SRGs) provide a novel graphical frequency-domain method for the analysis of nonlinear systems. There have been recent efforts to generalize SRG analysis to Multiple-Input Multiple-Output (MIMO) systems. However, these attempts yielded results only for square systems, due to the inherent Hilbert space structure of the SRG. In this paper, we develop an SRG analysis method that accommodates non-square operators. The key element is the embedding of operators into a space of operators acting on a common Hilbert space, while restricting the input space to the original input dimension to avoid conservatism. We generalize SRG interconnection rules to restricted input spaces and develop stability theorems that guarantee causality, well-posedness, and (incremental) $L_2$-gain bounds for the overall interconnection. We demonstrate the proposed theoretical concepts on the analysis of nonlinear systems in Linear Fractional Representation (LFR) form, a rather general class of systems whose representation is directly usable for control. Moreover, we provide formulas for computing MIMO SRGs of stable LTI operators and of diagonal and non-square static nonlinear operators. Finally, we demonstrate the advantages of our embedding approach on several examples.
The rapid advancement of artificial intelligence (AI) in healthcare imaging has revolutionized diagnostic medicine and clinical decision-making processes. This work presents an intelligent multimodal framework for medical image analysis that leverages Vision-Language Models (VLMs) in healthcare diagnostics. The framework integrates Google Gemini 2.5 Flash for automated tumor detection and clinical report generation across multiple imaging modalities including CT, MRI, X-ray, and Ultrasound. The system combines visual feature extraction with natural language processing to enable contextual image interpretation, incorporating coordinate verification mechanisms and probabilistic Gaussian modeling for anomaly distribution. Multi-layered visualization techniques generate detailed medical illustrations, overlay comparisons, and statistical representations to enhance clinical confidence, with anomaly localization achieving an average deviation of 80 pixels. Result processing utilizes precise prompt engineering and textual analysis to extract structured clinical information while maintaining interpretability. Experimental evaluations demonstrated high performance in anomaly detection across multiple modalities. The system features a user-friendly Gradio interface for clinical workflow integration and demonstrates zero-shot learning capabilities to reduce dependence on large datasets. This framework represents a significant advancement in automated diagnostic support and radiological workflow efficiency, though clinical validation and multi-center evaluation are necessary prior to widespread adoption.
Accurate source localization in Multi-Platform Radar Networks (MPRNs) benefits from exploiting both range and angle measurements under robust estimation. In this paper, we propose a robust Euclidean distance matrix (EDM) optimization model that simultaneously integrates range measurements, angle information, and the least absolute deviation ($\ell_1$-norm) criterion for the case of 3D single-source localization (3DSSL). A key theoretical contribution of this work is the rigorous reformulation of existing 3D angle measurements into simple box constraints on the Euclidean distances. Unlike previous approximations, we achieve this by reducing each of the 3D angle measurements to a two-dimensional nonlinear optimization problem, whose global minimum and maximum solutions can be characterized and used to obtain lower and upper bounds on the distances from the unknown source to the sensors. To solve the resulting rank-constrained EDM problem, we develop an efficient algorithm based on the majorization penalty method. Extensive numerical experiments confirm that the new EDM model significantly outperforms leading solvers in terms of localization accuracy and computational efficiency, particularly in low Signal-to-Noise Ratio (SNR) scenarios.
Modeling dynamical systems is crucial across science and engineering for accurate prediction, control, and decision-making. Recently, machine learning (ML) approaches, particularly neural ordinary differential equations (NODEs), have emerged as a powerful tool for data-driven modeling of continuous-time dynamics. Nevertheless, standard NODEs require a large number of data samples to remain consistent under varying control inputs, posing challenges in generating sufficient simulated data and in ensuring the safety of control designs. To address this gap, we propose trajectory-sensitivity-aware (TRASE-)NODEs, which construct an augmented system for both state and sensitivity, enabling simultaneous learning of their dynamics. This formulation allows the adjoint method to update gradients in a memory-efficient manner and ensures that time-invariant control set-point effects are captured in the learned dynamics. We evaluate TRASE-NODEs using a damped oscillator and inverter-based resources (IBRs). The results show that TRASE-NODEs generalize better from limited training data, yielding lower prediction errors than standard NODEs for both examples. The proposed framework offers a data-efficient, control-oriented modeling approach suitable for dynamic systems that require accurate trajectory sensitivity prediction.
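The state-plus-sensitivity augmentation can be illustrated on the damped-oscillator example: for a linear oscillator driven toward a set-point u, the sensitivity s = ∂x/∂u obeys a companion ODE with the same Jacobian, so both can be integrated jointly. The sketch below (forward Euler with an analytic Jacobian, hypothetical names) is a toy of the augmented-system idea, not the TRASE-NODE training code:

```python
import numpy as np

def simulate_with_sensitivity(x0, u, dt=0.01, steps=5000, k=1.0, c=0.2):
    # Damped oscillator x'' + c x' + k (x - u) = 0, written as a first-order
    # system dx/dt = A x + b u, augmented with the set-point sensitivity
    # s = dx/du, which satisfies ds/dt = A s + b; both integrate jointly.
    A = np.array([[0.0, 1.0], [-k, -c]])   # Jacobian df/dx (linear system)
    b = np.array([0.0, k])                 # df/du
    x = np.array(x0, dtype=float)
    s = np.zeros(2)                        # sensitivity starts at zero
    for _ in range(steps):
        x = x + dt * (A @ x + b * u)       # state step (forward Euler)
        s = s + dt * (A @ s + b)           # sensitivity step, same Jacobian
    return x, s
```

At steady state the position settles at the set-point (x → u) and the position sensitivity approaches 1, i.e., the learned dynamics would correctly predict how trajectories shift when the set-point changes.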
Reconfigurable Intelligent Surface (RIS) technology has emerged as a key enabler for future wireless communications. However, its potential is constrained by the difficulty of acquiring accurate user-to-RIS channel state information (CSI), due to the cascaded channel structure and the high pilot overhead of non-parametric methods. Unlike a passive RIS, where the reflected signal suffers from multiplicative path loss, an active RIS amplifies the signal, improving its practicality in real deployments. In this letter, we propose a parametric channel estimation method tailored for active RISs. The proposed approach integrates an active RIS model with an adaptive Maximum Likelihood Estimator (MLE) to recover the main channel parameters using a minimal number of pilots. To further enhance performance, an adaptive active RIS configuration strategy is employed, which refines the beam direction based on an initial user location estimate. Moreover, an orthogonal angle-pair codebook is used instead of the conventional Discrete Fourier Transform (DFT) codebook, significantly reducing the codebook size and ensuring reliable operation for both far-field and near-field users. Extensive simulations demonstrate that the proposed method achieves near-optimal performance with very few pilots compared to non-parametric approaches. Its performance is also benchmarked against that of a traditional passive RIS under the same total power budget to ensure fairness. Results show that the active RIS yields higher spectral efficiency (SE) by eliminating the multiplicative fading inherent in passive RISs and allocating more resources to data transmission.
Multi-sensory systems for embodied intelligence, from wearable body-sensor networks to instrumented robotic platforms, routinely face a sensor-asymmetry problem: the richest modality available during laboratory data collection is absent or impractical at deployment time due to cost, fragility, or interference with physical interaction. We introduce PULSE, a general framework for privileged knowledge transfer from an information-rich teacher sensor to a set of cheaper, deployment-ready student sensors. Each student encoder produces shared (modality-invariant) and private (modality-specific) embeddings; the shared subspace is aligned across modalities and then matched to representations of a frozen teacher via multi-layer hidden-state and pooled-embedding distillation. Private embeddings preserve modality-specific structure needed for self-supervised reconstruction, which we show is critical to prevent representational collapse. We instantiate PULSE on the wearable stress-monitoring task, using electrodermal activity (EDA) as the privileged teacher and ECG, BVP, accelerometry, and temperature as students. On the WESAD benchmark under leave-one-subject-out evaluation, PULSE achieves 0.994 AUROC and 0.988 AUPRC (0.965/0.955 on STRESS) without EDA at inference, exceeding all no-EDA baselines and matching the performance of a full-sensor model that retains EDA at test time. We further demonstrate modality-agnostic transfer with ECG as teacher, provide extensive ablations on hidden-state matching depth, shared-private capacity, hinge-loss margin, fusion strategy, and modality dropout, and discuss how the framework generalizes to broader embodied sensing scenarios involving tactile, inertial, and bioelectrical modalities.
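A simplified version of the shared-subspace distillation objective sketched above, pooled-embedding matching to the frozen teacher plus a reconstruction term that keeps the private pathway informative, can be written as follows (names and weights are illustrative, not the paper's exact loss):

```python
import numpy as np

def pulse_style_loss(shared_student, teacher_emb, recon, target,
                     w_distill=1.0, w_recon=0.5):
    # Pooled-embedding distillation: push the student's shared embedding
    # toward the frozen teacher's embedding via cosine alignment.
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    distill = 1.0 - cos(shared_student, teacher_emb)
    # Self-supervised reconstruction keeps modality-specific (private)
    # information alive and guards against representational collapse.
    recon_err = float(np.mean((recon - target) ** 2))
    return w_distill * distill + w_recon * recon_err
```

In the full framework this term would be applied per student modality, alongside the multi-layer hidden-state matching and cross-modal alignment of the shared subspace described above.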
As Electric Vehicle (EV) adoption accelerates in urban environments, optimizing charging infrastructure is vital for balancing user satisfaction, energy efficiency, and financial viability. This study advances beyond static models by proposing a digital twin framework that integrates agent-based decision support with embedded optimization to dynamically simulate EV charging behaviors, infrastructure layouts, and policy responses across scenarios. Applied to a localized urban site (a university campus) in Hanoi, Vietnam, the model evaluates operational policies, EV station configurations, and renewable energy sources. The interactive dashboard enables seasonal analysis, revealing a 20% drop in solar efficiency from October to March, with wind power contributing under 5% of demand, highlighting the need for adaptive energy management. Simulations show that dynamic notifications of newly available charging slots improve user satisfaction, while gasoline bans and idle fees enhance slot turnover with minimal added complexity. Embedded metaheuristic optimization identifies near-optimal mixes of fast (30kW) and standard (11kW) solar-powered chargers, balancing energy performance, profitability, and demand with high computational efficiency. This digital twin provides a flexible, computation-driven platform for EV infrastructure planning, with a transferable, modular design that enables seamless scaling from localized to city-wide urban contexts.
We study the contraction of the Hodgkin-Huxley model and its role in the reliability of spike timing. Without input, the model is contractive in the region of physiological interest. With impulsive synaptic inputs, contraction is retained provided the input events are sparse enough; it is lost when the input firing rate is too high. Spike timings are shown to be reliable in the contracting regime.
Transportation electrification introduces strong coupling between the power and transportation systems. In this paper, we generalize the classical notion of Braess' paradox to coupled power and transportation systems, and examine how the cross-system coupling induces new types of Braess' paradoxes. To this end, we model the power and transportation networks as graphs, coupled through charging points connecting to nodes in both graphs. The power system operation is characterized by the economic dispatch optimization, while the transportation system user equilibrium models travelers' route and charging choices. By analyzing simple coupled systems, we demonstrate that capacity expansion in either the transportation or the power system can deteriorate the performance of both systems, and uncover the fundamental mechanisms for such new Braess' paradoxes to occur. We also provide necessary and sufficient conditions for the occurrence of Braess' paradoxes in general coupled systems, leading to managerial insights for infrastructure planners. For general networks, through characterizing the generalized user equilibrium of the coupled systems, we develop novel charging pricing policies to mitigate these paradoxes.
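For readers unfamiliar with the classical single-network Braess' paradox that the above generalizes, the textbook four-node example makes the mechanism concrete: adding a free link raises every traveler's equilibrium travel time. The numbers below are the standard illustration (4000 travelers, two symmetric routes), not the paper's coupled model:

```python
def braess_times(demand=4000.0):
    # Classic Braess network: start S, end E, via nodes A and B.
    # Link travel times: S->A: x/100, A->E: 45, S->B: 45, B->E: x/100,
    # where x is the flow on the link.
    # Without the bypass, the symmetric split is the Wardrop equilibrium.
    x = demand / 2.0
    t_without = x / 100.0 + 45.0
    # With a zero-cost bypass A->B, the route S->A->B->E dominates for
    # every traveler, so at equilibrium all demand takes it.
    t_with = demand / 100.0 + demand / 100.0
    return t_without, t_with
```

With 4000 travelers this gives 65 minutes without the bypass and 80 with it; the paper shows analogous degradations can arise from capacity expansion in either the power or the transportation network once the two are coupled.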
Advanced speech synthesis technologies have enabled highly realistic speech generation, posing security risks that motivate research into audio deepfake detection (ADD). While state space models (SSMs) offer linear complexity, pure causal SSM architectures often struggle with the content-based retrieval required to capture global frequency-domain artifacts. To address this, we explore the scaling properties of hybrid architectures by proposing XLSR-MamBo, a modular framework integrating an XLSR front-end with synergistic Mamba-Attention backbones. We systematically evaluate four topological designs using advanced SSM variants: Mamba, Mamba2, Hydra, and Gated DeltaNet. Experimental results demonstrate that the MamBo-3-Hydra-N3 configuration achieves competitive performance compared to other state-of-the-art systems on the ASVspoof 2021 LA, DF, and In-the-Wild benchmarks. This performance benefits from Hydra's native bidirectional modeling, which captures holistic temporal dependencies more efficiently than the heuristic dual-branch strategies employed in prior work. Furthermore, evaluations on the DFADD dataset demonstrate robust generalization to unseen diffusion- and flow-matching-based synthesis methods. Crucially, our analysis reveals that scaling backbone depth effectively mitigates the performance variance and instability observed in shallower models. These results demonstrate the hybrid framework's ability to capture artifacts in spoofed speech signals, providing an effective method for ADD.
Unmanned aerial vehicles (UAVs) are pivotal for future 6G non-terrestrial networks, yet their high mobility creates a complex coupled optimization problem for beamforming and trajectory design. Existing numerical methods suffer from prohibitive latency, while standard deep learning often ignores dynamic interference topology, limiting scalability. To address these issues, this paper proposes a hierarchically decoupled framework synergizing graph neural networks (GNNs) with multi-agent reinforcement learning. Specifically, on the short timescale, we develop a topology-aware GNN beamformer by incorporating GraphNorm. By modeling the dynamic UAV-user association as a time-varying heterogeneous graph, this method explicitly extracts interference patterns to achieve sub-millisecond inference. On the long timescale, trajectory planning is modeled as a decentralized partially observable Markov decision process and solved via the multi-agent proximal policy optimization algorithm under the centralized training with decentralized execution paradigm, facilitating cooperative behaviors. Extensive simulation results demonstrate that the proposed framework significantly outperforms state-of-the-art optimization heuristics and deep learning baselines in terms of system sum rate, convergence speed, and generalization capability.
In this work, we construct an explicit, theoretically rigorous deconvolution method that relies entirely on iterative forward convolutions and can thus be implemented numerically. We first prove that convolution with an even Schwartz kernel acts as an automorphism on the vector space of finite-degree polynomials. Exploiting the parity of the kernel, we derive an exact algebraic inverse for this space, expressed uniquely as a finite linear combination of repeated convolutions. The core contribution of this work extends this algebraic inversion to infinite-dimensional function spaces, including $L^1(\mathbb{R})$, $L^2(\mathbb{R})$, the Schwartz space $\mathscr{S}(\mathbb{R})$, and the space of tempered distributions $\mathscr{S}'(\mathbb{R})$. By passing the finite-sum polynomial inversion formula to the limit, we demonstrate that an arbitrary function or distribution convolved with a Schwartz kernel can be exactly recovered in its respective topology. The resulting inverse is an explicitly computable limit of a sequence of linear combinations of recursive convolutions. As a primary application, this limit provides a fundamentally new, iterative numerical formula for the inverse of the Weierstrass Transform. By bypassing traditional numerically ill-posed inversion techniques, our method offers a mathematically exact and numerically robust algorithm for computational signal recovery.
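Inversion schemes built purely from forward convolutions have a classical precedent in Van Cittert iteration, f_{k+1} = f_k + (g − K f_k). The sketch below uses that simpler fixed-point form (an assumption of this illustration; the work above derives an exact algebraic limit, not this heuristic) to show how repeated forward convolutions can undo a blur:

```python
import numpy as np

def van_cittert(blurred, kernel, n_iter=200):
    # Deconvolution using only forward convolutions:
    #   f_{k+1} = f_k + (blurred - kernel * f_k),  starting from f_0 = blurred.
    # In the noiseless case this converges on modes where the kernel's
    # spectrum lies in (0, 2); a normalized smoothing kernel satisfies this.
    f = blurred.copy()
    for _ in range(n_iter):
        f = f + (blurred - np.convolve(f, kernel, mode="same"))
    return f
```

Blurring a box signal with the normalized kernel [0.25, 0.5, 0.25] and iterating recovers the sharp edges: the reconstruction error drops well below that of the blurred input.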
This work proposes a method for model-free synthesis of a state observer for nonlinear systems with manipulated inputs, where the observer is trained offline using a historical or simulation dataset of state measurements. We use the structure of the Kazantzis-Kravaris/Luenberger (KKL) observer, extended to nonautonomous systems by adding an additional input-affine term to the linear time-invariant (LTI) observer-state dynamics, which determines a nonlinear injective mapping of the true states. Both this input-affine term and the nonlinear mapping from the observer states to the system states are learned from data using fully connected feedforward multi-layer perceptron neural networks. Furthermore, we theoretically prove that trained neural networks, when given new input-output data, can be used to observe the states with a guaranteed error bound. To validate the proposed observer synthesis method, case studies are performed on a bioreactor and a Williams-Otto reactor.
Over the last decades, progress in modal analysis has enabled increasingly routine use of modal parameters for applications such as structural health monitoring and finite element model updating. For output-only identification, or Operational Modal Analysis (OMA), widely adopted approaches include Stochastic Subspace Identification (SSI) methods and the Natural Excitation Technique combined with the Eigensystem Realization Algorithm (NExT-ERA). Nevertheless, SSI-based techniques may become cumbersome on large systems, while NExT-ERA fitting can struggle when measurements are contaminated by noise. To alleviate these, this work investigates an OMA frequency-domain formulation for aeronautical structures by coupling the Loewner Framework (LF) with NExT, yielding the proposed NExT-LF method. The method exploits the computational efficiency of LF, due to the effectiveness of tangential interpolation, together with the impulse response function retrieval enabled by NExT. NExT-LF is assessed on two experimental benchmarks: the eXperimental BeaRDS 2 high-aspect-ratio wing main spar and an Airbus Helicopters H135 bearingless main rotor blade. The identified modal parameters are compared against available experimental references and results obtained via SSI with Canonical Variate Analysis and NExT-ERA. The results show that the modes identified by NExT-LF correlate well with benchmark data, particularly for high-amplitude tests and in the low-frequency range.
We propose a generative framework for multi-track music source separation (MSS) that reformulates the task as conditional discrete token generation. Unlike conventional approaches that directly estimate continuous signals in the time or frequency domain, our method combines a Conformer-based conditional encoder, a dual-path neural audio codec (HCodec), and a decoder-only language model to autoregressively generate audio tokens for four target tracks. The generated tokens are decoded back to waveforms through the codec decoder. Evaluation on the MUSDB18-HQ benchmark shows that our generative approach achieves perceptual quality approaching state-of-the-art discriminative methods, while attaining the highest NISQA score on the vocals track. Ablation studies confirm the effectiveness of the learnable Conformer encoder and the benefit of sequential cross-track generation.
In this work, we study angle-based localization and rigidity maintenance control for multi-robot networks. First, we establish the relationship between angle rigidity and bearing rigidity considering \textit{directed} sensing graphs and \textit{body-frame} bearing measurements in both $2$ and $3$-\textit{dimensional space}. In particular, we demonstrate that a framework in $\mathrm{SE}(d)$ is infinitesimally bearing rigid if and only if it is infinitesimally angle rigid and each robot obtains at least $d-1$ bearing measurements ($d \in \{2, 3\}$). Building on these findings, this paper proposes a distributed angle-based localization scheme and establishes local exponential stability under switching sensing graphs, requiring only infinitesimal angle rigidity across the visited topologies. Then, since the set of available angles strongly depends on the robots' spatial configuration due to sensing constraints, we investigate rigidity maintenance control. The \textit{angle rigidity eigenvalue} is presented as a metric for the degree of rigidity. A decentralized gradient-based controller capable of executing mission-specific commands while maintaining a sufficient level of angle rigidity is proposed. Simulations were conducted to evaluate the scheme's effectiveness and practicality.
Discrete audio tokens have recently gained considerable attention for their potential to bridge audio and language processing, enabling multimodal language models that can both generate and understand audio. However, preserving key information such as phonetic content, speaker identity, and paralinguistic cues remains a major challenge. Identifying the optimal tokenizer and configuration is further complicated by inconsistent evaluation settings across existing studies. To address this, we introduce the Discrete Audio and Speech Benchmark (DASB), a comprehensive framework for benchmarking discrete audio tokens across speech, general audio, and music domains on a range of discriminative and generative tasks. Our results show that discrete representations are less robust than continuous ones and require careful tuning of factors such as model architecture, data size, learning rate, and capacity. Semantic tokens generally outperform acoustic tokens, but a gap remains between discrete tokens and continuous features, highlighting the need for further research. DASB codes, evaluation setup, and leaderboards are publicly available at this https URL.
In recent years, live video streaming has gained widespread popularity across various social media platforms. Quality of experience (QoE), which reflects end-users' satisfaction and overall experience, plays a critical role for media service providers in optimizing large-scale live compression and transmission strategies to achieve a perceptually optimal rate-distortion trade-off. Although many QoE metrics for video-on-demand (VoD) have been proposed, there remain significant challenges in developing QoE metrics for live video streaming. To bridge this gap, we conduct a comprehensive study of subjective and objective QoE evaluations for live video streaming. For the subjective QoE study, we introduce the first live video streaming QoE dataset, TaoLive QoE, which consists of $42$ source videos collected from real live broadcasts and $1,155$ corresponding distorted ones degraded by a variety of streaming distortions, including conventional streaming distortions such as compression and stalling, as well as live-streaming-specific distortions like frame skipping and variable frame rate. Subsequently, a human study was conducted to derive subjective QoE scores for videos in the TaoLive QoE dataset. For the objective QoE study, we benchmark existing QoE models on the TaoLive QoE dataset as well as publicly available QoE datasets for VoD scenarios, highlighting that current models struggle to accurately assess video QoE, particularly for live content. Hence, we propose an end-to-end QoE evaluation model, Tao-QoE, which integrates multi-scale semantic features and optical flow-based motion features to predict a retrospective QoE score, eliminating reliance on statistical quality of service (QoS) features.
This article studies the time-optimal path planning problem for a convexified Reeds-Shepp (CRS) vehicle on a unit sphere, capable of both forward and backward motion, with speed bounded in magnitude by 1 and turning rate bounded in magnitude by a given constant. For the case in which the turning-rate bound is at least 1, using Pontryagin's Maximum Principle and a phase-portrait analysis, we show that the optimal path connecting a given initial configuration to a desired terminal configuration consists of at most six segments drawn from three motion primitives: tight turns, great circular arcs, and turn-in-place motions. A complete classification yields a finite sufficient list of 23 optimal path types with closed-form segment angles derived. The complementary case in which the turning-rate bound is less than 1 is addressed via an equivalent reformulation. The proposed formulation is applicable to underactuated satellite attitude control, spherical rolling robots, and mobile robots operating on spherical or gently curved surfaces. The source code for solving the time-optimal path problem and visualization is publicly available at this https URL.
Efficient workload scheduling is a critical challenge in modern heterogeneous computing environments, particularly in high-performance computing (HPC) systems. Traditional software-based schedulers struggle to efficiently balance workloads due to scheduling overhead, lack of adaptability to stochastic workloads, and suboptimal resource utilization. The scheduling problem is further compounded in shared HPC clusters, where job arrivals and processing times are inherently stochastic. Prediction of these elements is possible, but it introduces additional overhead. To perform this complex scheduling, we developed two FPGA-assisted hardware accelerator microarchitectures, Hercules and Stannic. Hercules adopts a task-centric abstraction of stochastic scheduling, whereas Stannic adopts a schedule-centric abstraction. These hardware-assisted solutions leverage parallelism, pre-calculation, and spatial memory access to significantly accelerate scheduling. We accelerate a non-preemptive stochastic online scheduling algorithm to produce heterogeneity-aware schedules in near real time. With Hercules, we achieved a speedup of up to 1060x over a baseline C/C++ implementation, demonstrating the efficacy of hardware-assisted acceleration for heterogeneity-aware stochastic scheduling. With Stannic, we further improved efficiency, achieving a 7.5x reduction in latency per computation iteration and a 14x increase in the target heterogeneous system size. Experimental results show that the resulting schedules achieve efficient machine utilization and low average job latency in stochastic contexts.
We introduce MMAudioSep, a generative model for video/text-queried sound separation that is founded on a pretrained video-to-audio model. By leveraging knowledge about the relationship between video/text and audio learned by a pretrained audio generative model, we can train the model more efficiently, i.e., it does not need to be trained from scratch. We evaluate the performance of MMAudioSep by comparing it to existing separation models, including both deterministic and generative approaches, and find that it is superior to the baseline models. Furthermore, we demonstrate that even after acquiring sound-separation functionality via fine-tuning, the model retains its original video-to-audio generation capability. This highlights the potential of foundational sound generation models to be adapted for sound-related downstream tasks. Our code is available at this https URL.
Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience compared to traditional half-duplex models. However, existing benchmarks primarily focus on evaluating single-round interactions, neglecting the complexities of multi-round communication. Evaluating FD-SLMs in multi-round settings poses significant challenges, including blurred turn boundaries in communication and context inconsistency during model inference. Also, existing benchmarks often focus solely on evaluating conversational features, neglecting other critical aspects. To address these gaps, we introduce MTR-DuplexBench, a novel benchmark designed for a comprehensive multi-round evaluation of FD-SLMs. MTR-DuplexBench not only segments continuous full-duplex dialogues into discrete turns for turn-by-turn assessment but also incorporates various evaluation aspects, including conversational features, dialogue quality, instruction following, and safety. Experimental results reveal that current FD-SLMs face difficulties in maintaining consistent performance across multiple rounds and evaluation dimensions, highlighting the necessity and effectiveness of our benchmark. Code and data are available at: this https URL
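The abstract does not specify how MTR-DuplexBench segments continuous full-duplex dialogue into discrete turns. As a hedged illustration of one plausible approach, the sketch below cuts timestamped speaker-activity events into turns whenever the speaker changes or a silence gap exceeds a threshold; the function, data layout, and threshold are hypothetical assumptions for illustration only.

```python
def segment_turns(events, gap=1.0):
    """Cut (speaker, start, end) activity events into discrete turns.

    A new turn begins when the active speaker changes or when a silence
    longer than `gap` seconds separates consecutive events from the same
    speaker. Events are assumed sorted by start time.
    """
    turns = []
    for speaker, start, end in events:
        if turns and turns[-1]["speaker"] == speaker and start - turns[-1]["end"] <= gap:
            turns[-1]["end"] = end                    # extend the current turn
        else:
            turns.append({"speaker": speaker, "start": start, "end": end})
    return turns

events = [("user", 0.0, 2.0), ("user", 2.3, 3.0),    # same speaker, short gap: merged
          ("model", 2.8, 5.0),                        # overlapping barge-in: new turn
          ("user", 7.0, 8.0)]                         # back to the user: new turn
turns = segment_turns(events)
```

Note the overlapping model event starting before the user finishes: this is precisely the blurred-turn-boundary problem the benchmark highlights, which a simple rule like this can only approximate.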
In many countries, declining demand in energy-intensive industries (EIIs) such as cement, steel, and aluminum is leading to industrial overcapacity. Although industrial overcapacity is traditionally viewed as problematic and resource-wasteful, it could unlock EIIs' flexibility in electricity use. Here, using China's aluminum smelting industry as a case study, we evaluate the system-level costs and benefits of retaining EII overcapacity for flexible electricity use in decarbonized energy systems. We find that overcapacity can enable aluminum smelters to adopt a seasonal operation paradigm, ceasing production during winter load peaks that are exacerbated by heating electrification and renewable seasonality. This seasonal operation paradigm could reduce the investment and operational costs of China's decarbonized electricity system by 23-32 billion CNY/year (11-15% of the aluminum smelting industry's product value), sufficient to offset the increased smelter maintenance and product storage costs associated with overcapacity. It may also create labor complementarities between the aluminum and thermal power sectors.
Speech-to-speech language models have recently emerged to enhance the naturalness of conversational AI. In particular, full-duplex models are distinguished by their real-time interactivity, including handling of pauses, interruptions, and backchannels. However, improving their factuality remains an open challenge. While scaling the model size could address this gap, it would make real-time inference prohibitively expensive. In this work, we propose MoshiRAG, a modular approach that combines a compact full-duplex interface with selective retrieval to access more powerful knowledge sources. Our asynchronous framework enables the model to identify knowledge-demanding queries and ground its responses in external information. By leveraging the natural temporal gap between response onset and the delivery of core information, the retrieval process can be completed while maintaining a natural conversation flow. With this approach, MoshiRAG achieves factuality comparable to the best publicly released non-duplex speech language models while preserving the interactivity inherent to full-duplex systems. Moreover, our flexible design supports plug-and-play retrieval methods without retraining and demonstrates strong performance on out-of-domain mathematical reasoning tasks.
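The timing idea behind this asynchronous design, launching retrieval as the response begins and completing it during the natural gap before the core information is delivered, can be sketched with a minimal asyncio skeleton. This is not MoshiRAG's implementation; all function names, the simulated latency, and the filler onset are illustrative assumptions.

```python
import asyncio

async def retrieve(query):
    """Stand-in for a retrieval call to an external knowledge source."""
    await asyncio.sleep(0.05)                   # simulated retrieval latency
    return f"retrieved facts about {query}"

async def respond(query, needs_knowledge):
    """Overlap retrieval with the conversational onset of the reply."""
    task = asyncio.create_task(retrieve(query)) if needs_knowledge else None
    onset = "Sure, let me think about that..."  # spoken while retrieval runs
    docs = await task if task else None         # ready before the core content is due
    core = docs if docs else "a direct answer"
    return onset, core

onset, core = asyncio.run(respond("the Eiffel Tower", True))
```

Because the retrieval task runs concurrently with the spoken onset, the user-perceived latency is hidden inside conversation time the model would spend anyway, which is the property that lets a compact full-duplex model stay interactive while grounding its answers.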