New articles on Quantitative Biology


[1] 2410.14683

Brain-Aware Readout Layers in GNNs: Advancing Alzheimer's Early Detection and Neuroimaging

Alzheimer's disease (AD) is a neurodegenerative disorder characterized by progressive memory and cognitive decline, affecting millions worldwide. Diagnosing AD is challenging due to its heterogeneous nature and variable progression. This study introduces a novel brain-aware readout layer (BA readout layer) for Graph Neural Networks (GNNs), designed to improve interpretability and predictive accuracy in neuroimaging for early AD diagnosis. By clustering brain regions based on functional connectivity and node embedding, this layer improves the GNN's capability to capture complex brain network characteristics. We analyzed neuroimaging data from 383 participants, including both cognitively normal and preclinical AD individuals, using T1-weighted MRI, resting-state fMRI, and FBB-PET to construct brain graphs. Our results show that GNNs with the BA readout layer significantly outperform traditional models in predicting the Preclinical Alzheimer's Cognitive Composite (PACC) score, demonstrating higher robustness and stability. The adaptive BA readout layer also offers enhanced interpretability by highlighting task-specific brain regions critical to cognitive functions impacted by AD. These findings suggest that our approach provides a valuable tool for the early diagnosis and analysis of Alzheimer's disease.


[2] 2410.14697

Learning Cortico-Muscular Dependence through Orthonormal Decomposition of Density Ratios

The cortico-spinal neural pathway is fundamental for motor control and movement execution, and in humans it is typically studied using concurrent electroencephalography (EEG) and electromyography (EMG) recordings. However, current approaches for capturing high-level and contextual connectivity between these recordings have important limitations. Here, we present a novel application of statistical dependence estimators based on orthonormal decomposition of density ratios to model the relationship between cortical and muscle oscillations. Our method extends traditional scalar-valued measures by learning eigenvalues, eigenfunctions, and projection spaces of density ratios from realizations of the signal, addressing the interpretability, scalability, and local temporal dependence of cortico-muscular connectivity. We experimentally demonstrate that eigenfunctions learned from cortico-muscular connectivity can accurately classify movements and subjects. Moreover, they reveal channel and temporal dependencies that confirm the activation of specific EEG channels during movement.


[3] 2410.14800

Unlocking the Full Potential of High-Density Surface EMG: Novel Non-Invasive High-Yield Motor Unit Decomposition

The decomposition of high-density surface electromyography (HD-sEMG) signals into motor unit discharge patterns has become a powerful tool for investigating the neural control of movement, providing insights into motor neuron recruitment and discharge behavior. However, current algorithms, while very effective under certain conditions, face significant challenges in complex scenarios, as their accuracy and motor unit yield are highly dependent on anatomical differences among individuals. This can limit the number of decomposed motor units, particularly in challenging conditions. To address this issue, we recently introduced Swarm-Contrastive Decomposition (SCD), which dynamically adjusts the separation function based on the distribution of the data and prevents convergence to the same source. Initially applied to intramuscular EMG signals, SCD is here adapted for HD-sEMG signals. We demonstrated its ability to address key challenges faced by existing methods, particularly in identifying low-amplitude motor unit action potentials and effectively handling complex decomposition scenarios, like high-interference signals. We extensively validated SCD using simulated and experimental HD-sEMG recordings and compared it with current state-of-the-art decomposition methods under varying conditions, including different excitation levels, noise intensities, force profiles, sexes, and muscle groups. The proposed method consistently outperformed existing techniques in both the quantity of decoded motor units and the precision of their firing time identification. For instance, under certain experimental conditions, SCD detected more than three times as many motor units compared to previous methods, while also significantly improving accuracy. These advancements represent a major step forward in non-invasive EMG technology for studying motor unit activity in complex scenarios.


[4] 2410.14883

Disease Incidence in a Stochastic SVIRS Model with Waning Immunity

This paper deals with the long-term behaviour and incidence of a vaccine-preventable contact disease, under the assumption that both vaccine protection and immunity after recovery are not lifelong. The mathematical model is developed in a stochastic Markovian framework. The evolution of the disease in a finite population is thus represented by a three-dimensional continuous-time Markov chain, which is versatile enough to compensate for the loss of protection by including vaccination before the onset of the outbreak and also during the course of the epidemic.


[5] 2410.14898

Proteins with alternative folds reveal blind spots in AlphaFold-based protein structure prediction

In recent years, advances in artificial intelligence (AI) have transformed structural biology, particularly protein structure prediction. Though AI-based methods, such as AlphaFold (AF), often predict single conformations of proteins with high accuracy and confidence, predictions of alternative folds are often inaccurate, low-confidence, or simply not predicted at all. Here, we review three blind spots that alternative conformations reveal about AF-based protein structure prediction. First, proteins that assume conformations distinct from their training-set homologs can be mispredicted. Second, AF overrelies on its training set to predict alternative conformations. Third, degeneracies in pairwise representations can lead to high-confidence predictions inconsistent with experiment. These weaknesses suggest approaches to predict alternative folds more reliably.


[6] 2410.14915

Enumeration of rooted binary perfect phylogenies

Rooted binary perfect phylogenies provide a generalization of rooted binary unlabeled trees in which each leaf is assigned a positive integer value that corresponds in a biological setting to the count of the number of indistinguishable lineages associated with the leaf. For the rooted binary unlabeled trees, these integers equal 1. We address a variety of enumerative problems concerning rooted binary perfect phylogenies with sample size $s$: the rooted binary unlabeled trees in which a sample of size $s$ lineages is distributed across the leaves of an unlabeled tree with $n$ leaves, $1 \leq n \leq s$. The enumerations further characterize the rooted binary perfect phylogenies, which include the rooted binary unlabeled trees, and which can provide a set of structures useful for various biological contexts.
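
As a point of reference for the base case mentioned above (all leaf values equal to 1), the number of rooted binary unlabeled trees with $n$ leaves can be computed with a standard recursion over the sizes of the two root subtrees (these counts are the Wedderburn-Etherington numbers). The sketch below covers only that base case; it does not enumerate perfect phylogenies with sample size $s$, and the function name is illustrative rather than taken from the paper.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def rooted_binary_unlabeled_trees(n: int) -> int:
    """Count rooted binary unlabeled trees with n leaves."""
    if n < 1:
        return 0
    if n == 1:
        return 1  # a single leaf
    total = 0
    # Unequal split: the root subtrees have k and n - k leaves with k < n - k.
    for k in range(1, (n + 1) // 2):
        total += rooted_binary_unlabeled_trees(k) * rooted_binary_unlabeled_trees(n - k)
    # Equal split (even n only): choose an unordered pair of subtrees,
    # repetition allowed, from the trees with n / 2 leaves.
    if n % 2 == 0:
        half = rooted_binary_unlabeled_trees(n // 2)
        total += half * (half + 1) // 2
    return total


if __name__ == "__main__":
    print([rooted_binary_unlabeled_trees(n) for n in range(1, 9)])
```

The equal-split term counts unordered pairs because the two subtrees of an unlabeled tree are interchangeable.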


[7] 2410.15018

Latency correction in sparse neuronal spike trains with overlapping global events

Background: In Kreuz et al., J Neurosci Methods 381, 109703 (2022) two methods were proposed that perform latency correction, i.e., optimize the spike time alignment of sparse neuronal spike trains with well-defined global spiking events. The first one, based on direct shifts, is fast but uses only partial latency information, while the other one makes use of the full information but relies on the computationally costly simulated annealing. Both methods reach their limits and can become unreliable when successive global events are not sufficiently separated or even overlap. New Method: Here we propose an iterative scheme that combines the advantages of the two original methods by using in each step as much of the latency information as possible and by employing a very fast extrapolation direct shift method instead of the much slower simulated annealing. Results: We illustrate the effectiveness and the improved performance, measured in terms of the relative shift error, of the new iterative scheme not only on simulated data with known ground truths but also on single-unit recordings from two medial superior olive neurons of a gerbil. Comparison with Existing Method(s): The iterative scheme outperforms the existing approaches on both the simulated and the experimental data. Due to its low computational demands, and in contrast to simulated annealing, it can also be applied to very large datasets. Conclusions: The new method generalizes and improves on the original method both in terms of accuracy and speed. Importantly, it is the only method that makes it possible to disentangle global events with overlap.


[8] 2410.15108

The shape of the brain's connections is predictive of cognitive performance: an explainable machine learning study

The shape of the brain's white matter connections is relatively unexplored in diffusion MRI tractography analysis. While it is known that tract shape varies in populations and across the human lifespan, it is unknown if the variability in dMRI tractography-derived shape may relate to the brain's functional variability across individuals. This work explores the potential of leveraging tractography fiber cluster shape measures to predict subject-specific cognitive performance. We implement machine learning models to predict individual cognitive performance scores. We study a large-scale database from the HCP-YA study. We apply an atlas-based fiber cluster parcellation to the dMRI tractography of each individual. We compute 15 shape, microstructure, and connectivity features for each fiber cluster. Using these features as input, we train a total of 210 models to predict 7 different NIH Toolbox cognitive performance assessments. We apply an explainable AI technique, SHAP, to assess the importance of each fiber cluster for prediction. Our results demonstrate that shape measures are predictive of individual cognitive performance. The studied shape measures, such as irregularity, diameter, total surface area, volume, and branch volume, are as effective for prediction as microstructure and connectivity measures. The overall best-performing feature is a shape feature, irregularity, which describes how different a cluster's shape is from an idealized cylinder. Further interpretation using SHAP values suggests that fiber clusters with features highly predictive of cognitive ability are widespread throughout the brain, including fiber clusters from the superficial association, deep association, cerebellar, striatal, and projection pathways. This study demonstrates the strong potential of shape descriptors to enhance the study of the brain's white matter and its relationship to cognitive function.


[9] 2410.15141

Mimicking the Gas-Phase to Transport Odorants through the Nasal Mucus: Functional Insights into Odorant Binding Proteins

Mammalian odorant binding proteins (OBPs) have long been suggested to transport hydrophobic odorant molecules through the aqueous environment of the nasal mucus. While the function of OBPs as odorant transporters is supported by their hydrophobic beta-barrel structure, no rationale has been provided on why and how these proteins facilitate the uptake of odorants from the gas phase. Here, a multi-scale computational approach validated through available high-resolution spectroscopy experiments reveals that the conformational space explored by carvone inside the binding cavity of porcine OBP (pOBP) is much closer to the gas than the aqueous phase, and that pOBP effectively manages to transport odorants by lowering the free energy barrier of odorant uptake. Understanding such perireceptor events is crucial to fully unravel the molecular processes underlying the olfactory sense, and move towards the development of protein-based biomimetic sensor units that can serve as artificial noses.


[10] 2410.15259

Optimizing adaptive sampling via Policy Ranking

Efficient sampling in biomolecular simulations is critical for accurately capturing the complex dynamical behaviors of biological systems. Adaptive sampling techniques aim to improve efficiency by focusing computational resources on the most relevant regions of phase space. In this work, we present a framework for identifying the optimal sampling policy through metric-driven ranking. Our approach systematically evaluates the policy ensemble and ranks the policies based on their ability to explore the conformational space effectively. Through a series of biomolecular simulation case studies, we demonstrate that choosing a different adaptive sampling policy at each round significantly outperforms single-policy sampling, leading to faster convergence and improved sampling performance. This approach takes an ensemble of adaptive sampling policies and identifies the optimal policy for the next round based on current data. Beyond presenting this ensemble view of adaptive sampling, we also propose two sampling algorithms that approximate this ranking framework on the fly. The modularity of this framework allows incorporation of any adaptive sampling policy, making it versatile and suitable as a comprehensive adaptive sampling scheme.


[11] 2410.15367

DNA Language Model and Interpretable Graph Neural Network Identify Genes and Pathways Involved in Rare Diseases

Identification of causal genes and pathways is a critical step for understanding the genetic underpinnings of rare diseases. We propose novel approaches to gene prioritization and pathway identification using a DNA language model, graph neural networks, and a genetic algorithm. Using HyenaDNA, a long-range genomic foundation model, we generated dynamic gene embeddings that reflect changes caused by deleterious variants. These gene embeddings were then utilized to identify candidate genes and pathways. We validated our method on a cohort of rare disease patients with partially known genetic diagnosis, demonstrating the re-identification of known causal genes and pathways and the detection of novel candidates. These findings have implications for the prevention and treatment of rare diseases by enabling targeted identification of new drug targets and therapeutic pathways.


[12] 2410.15433

Discriminating image representations with principal distortions

Image representations (artificial or biological) are often compared in terms of their global geometry; however, representations with similar global structure can have strikingly different local geometries. Here, we propose a framework for comparing a set of image representations in terms of their local geometries. We quantify the local geometry of a representation using the Fisher information matrix, a standard statistical tool for characterizing the sensitivity to local stimulus distortions, and use this as a substrate for a metric on the local geometry in the vicinity of a base image. This metric may then be used to optimally differentiate a set of models, by finding a pair of "principal distortions" that maximize the variance of the models under this metric. We use this framework to compare a set of simple models of the early visual system, identifying a novel set of image distortions that allow immediate comparison of the models by visual inspection. In a second example, we apply our method to a set of deep neural network models and reveal differences in the local geometry that arise due to architecture and training types. These examples highlight how our framework can be used to probe for informative differences in local sensitivities between complex computational models, and suggest how it could be used to compare model representations with human perception.


[13] 2410.15592

CPE-Pro: A Structure-Sensitive Deep Learning Model for Protein Representation and Origin Evaluation

Protein structures are important for understanding their functions and interactions. Currently, many protein structure prediction methods are enriching the structure database. Discriminating the origin of structures is crucial for distinguishing between experimentally resolved and computationally predicted structures, evaluating the reliability of prediction methods, and guiding downstream biological studies. Building on work in structure prediction, we developed a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro), to represent and discriminate the origin of protein structures. CPE-Pro learns the structural information of proteins and captures inter-structural differences to achieve accurate traceability on four data classes, and is expected to be extended to more. Simultaneously, we utilized Foldseek to encode protein structures into "structure-sequences" and trained a protein Structural Sequence Language Model, SSLM. Preliminary experiments demonstrated that, compared to large-scale protein language models pre-trained on vast amounts of amino acid sequences, the "structure-sequences" enable the language model to learn more informative protein features, enhancing and optimizing structural representations. We have provided the code, model weights, and all related materials at https://github.com/GouWenrui/CPE-Pro-main.git.


[14] 2410.15940

Machine learning methods to study disordered proteins

Recent years have seen tremendous developments in the use of machine learning models to link amino acid sequence, structure and function of folded proteins. These methods are, however, rarely applicable to the wide range of proteins and sequences that comprise intrinsically disordered regions. We here review developments in the study of disordered proteins that exploit or are used to train machine learning models. These include methods for generating conformational ensembles and designing new sequences, and for linking sequences to biophysical properties and biological functions. We highlight how these developments are built on a tight integration between experiment, theory and simulations, and account for evolutionary constraints, which operate on sequences of disordered regions differently than on those of folded domains.


[15] 2410.16005

Precision Adaptive Hormone Control for Personalized Metastatic Prostate Cancer Treatment

With the oncologist acting as the "game leader", we employ a Stackelberg game-theoretic model involving multiple populations to study prostate cancer. We refine the drug dosing schedule using an empirical Bayes feed-forward analysis, based on clinical data that reflects each patient's prostate-specific drug response. Our methodology aims for a quantitative grasp of the parameter landscape of this adaptive multi-population model, focusing on arresting the growth of drug-resistant prostate cancer by promoting competition across drug-sensitive cancer cell populations. Our findings indicate that it is feasible not only to considerably extend the duration of cancer suppression through careful optimization, but even to transform metastatic prostate cancer into a chronic condition instead of an acute one for most patients, with supporting clinical and analytical evidence.


[16] 2410.16064

Characterizing RNA oligomers using Stochastic Titration Constant-pH Metadynamics simulations

RNA molecules exhibit various biological functions intrinsically dependent on their diverse ecosystem of highly flexible structures. This flexibility arises from complex hydrogen-bonding networks defined by canonical and non-canonical base pairs that require protonation events to stabilize or perturb these interactions. Constant pH molecular dynamics (CpHMD) methods provide a reliable framework to explore the conformational and protonation space of dynamic structures and for robust calculations of pH-dependent properties, such as the pK$_\mathrm{a}$ of titrable sites. Despite growing biological evidence concerning pH regulation of certain motifs and in biotechnological applications, pH-sensitive in silico methods have rarely been applied to nucleic acids. In this work, we extended the stochastic titration CpHMD method to include RNA parameters from the standard $\chi$OL3 AMBER force field and highlighted its capability to depict titration events of nucleotides in single-stranded RNAs. We validated the method using trimers and pentamers with a single central titrable site while integrating a well-tempered metadynamics approach into the st-CpHMD methodology (CpH-MetaD) using PLUMED. This approach enhanced the convergence of the conformational landscape and enabled more efficient sampling of protonation-conformation coupling. Our pK$_\mathrm{a}$ estimates agree with experimental data, validating the method's ability to reproduce electrostatic changes around a titrable nucleobase in single-stranded RNA. These findings provided molecular insight into intramolecular phenomena, such as nucleobase stacking and phosphate interactions, that dictate the experimentally observed pK$_\mathrm{a}$ shifts between different strands. Overall, this work validates both the st-CpHMD and the metadynamics integration as reliable tools for studying biologically relevant RNA systems.


[17] 2410.16101

Muscle coactivation primes the nervous system for fast and task-dependent feedback control

Humans and other animals coactivate agonist and antagonist muscles in many motor actions. Increases in muscle coactivation are thought to leverage viscoelastic properties of skeletal muscles to provide resistance against limb motion. However, coactivation also emerges in scenarios where it seems paradoxical because the goal is not to resist limb motion but instead to rapidly mobilize the limb(s) or body to launch or correct movements. Here, we present a new perspective on muscle coactivation: to prime the nervous system for fast, task-dependent responses to sensory stimuli. We review distributed neural control mechanisms that may allow the healthy nervous system to leverage muscle coactivation to produce fast and flexible responses to sensory feedback.


[18] 2410.16123

The role of spike-timing-dependent plasticity and random inputs in neurodegenerative diseases and neuromorphic systems

Neuronal oscillations are related to symptoms of Parkinson's disease. Random inputs could affect such oscillations in brain states, which reflect the collective activity of neurons interconnected via synaptic connections. In this paper, we study the coupled effects of channels and synaptic dynamics under stochastic influence, together with spike-timing-dependent plasticity (STDP) of healthy brain cells, with applications to Parkinson's disease (PD). In particular, we investigate the effects of random inputs and input correlations in a subthalamic nucleus (STN) cell membrane potential model. Our numerical results show that the random inputs strongly affect the spiking activities of the STN neuron not only in the case of healthy cells but also in the case of PD cells in the presence of DBS treatment. The STDP increases the interspike interval (ISI) regularity of spike trains of the output neurons. However, the existence of a random refractory period and random input current in the system may substantially increase the irregularity of spike trains of the output neurons. Furthermore, the presence of the stochastic influence together with spike-timing-dependent plasticity could increase the correlation of the neurons. These effects would potentially contribute to the management of PD symptoms.
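
For readers unfamiliar with the two quantities named in this abstract, the sketch below shows a textbook pair-based STDP window and the coefficient of variation (CV) of interspike intervals as one common regularity measure. The abstract does not state which STDP rule or regularity measure the authors use, so the exponential window, the CV, and all parameter values and function names here are illustrative assumptions.

```python
import math
import statistics


def stdp_weight_change(delta_t_ms, a_plus=0.01, a_minus=0.012,
                       tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Pair-based STDP window (assumed form), with delta_t = t_post - t_pre.

    Pre-before-post (delta_t > 0) potentiates the synapse,
    post-before-pre (delta_t < 0) depresses it.
    All amplitudes and time constants are placeholder values.
    """
    if delta_t_ms > 0:
        return a_plus * math.exp(-delta_t_ms / tau_plus_ms)
    if delta_t_ms < 0:
        return -a_minus * math.exp(delta_t_ms / tau_minus_ms)
    return 0.0


def isi_cv(spike_times_ms):
    """Coefficient of variation of interspike intervals.

    0 means perfectly regular firing; larger values mean more irregular firing.
    """
    isis = [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]
    if len(isis) < 2:
        raise ValueError("need at least three spikes to estimate ISI variability")
    return statistics.pstdev(isis) / statistics.mean(isis)


if __name__ == "__main__":
    print(stdp_weight_change(10.0), stdp_weight_change(-10.0))
    print(isi_cv([0.0, 10.0, 20.0, 30.0]), isi_cv([0.0, 5.0, 20.0, 22.0]))
```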


[19] 2410.16136

Modeling dynamic neural activity by combining naturalistic video stimuli and stimulus-independent latent factors

Understanding how the brain processes dynamic natural stimuli remains a fundamental challenge in neuroscience. Current dynamic neural encoding models either take stimuli as input but ignore shared variability in neural responses, or they model this variability by deriving latent embeddings from neural responses or behavior while ignoring the visual input. To address this gap, we propose a probabilistic model that incorporates video inputs along with stimulus-independent latent factors to capture variability in neuronal responses, predicting a joint distribution for the entire population. After training and testing our model on mouse V1 neuronal responses, we found that it outperforms video-only models in terms of log-likelihood and achieves further improvements when conditioned on responses from other neurons. Furthermore, we find that the learned latent factors strongly correlate with mouse behavior, although the model was trained without behavior data.


[20] 2410.16169

The Interplay Between Physical Activity, Protein Consumption, and Sleep Quality in Muscle Protein Synthesis

This systematic review examines the synergistic and individual influences of resistance exercise, dietary protein supplementation, and sleep/recovery on muscle protein synthesis (MPS). Electronic databases such as Scopus, Google Scholar, and Web of Science were extensively used. Studies were selected for their relevance to the criteria and their direct applicability to the objectives. Research indicates that a protein dose of 20 to 25 grams maximally stimulates MPS post-resistance training. It is observed that physically frail individuals aged 76 to 92 and middle-aged adults aged 62 to 74 have lower mixed muscle protein synthetic rates than individuals aged 20 to 32. High-whey protein and leucine-enriched supplements enhance MPS more efficiently than standard dairy products in older adults engaged in resistance programs. Similarly, protein intake before sleep boosts overnight MPS rates, which helps prevent muscle loss associated with sleep debt, exercise-induced damage, and muscle-wasting conditions like sarcopenia and cachexia. Resistance exercise is a functional intervention to achieve muscular adaptation and improve function. Future research should focus on variables such as fluctuating fitness levels, age groups, genetics, and lifestyle factors to generate more accurate and beneficial results.


[21] 2410.14696

REBIND: Enhancing ground-state molecular conformation via force-based graph rewiring

Predicting the ground-state 3D molecular conformations from 2D molecular graphs is critical in computational chemistry due to its profound impact on molecular properties. Deep learning (DL) approaches have recently emerged as promising alternatives to computationally heavy classical methods such as density functional theory (DFT). However, we discover that existing DL methods inadequately model inter-atomic forces, particularly for non-bonded atomic pairs, due to their naive usage of bonds and pairwise distances. Consequently, significant prediction errors occur for atoms with low degree (i.e., low coordination numbers) whose conformations are primarily influenced by non-bonded interactions. To address this, we propose REBIND, a novel framework that rewires molecular graphs by adding edges based on the Lennard-Jones potential to capture non-bonded interactions for low-degree atoms. Experimental results demonstrate that REBIND significantly outperforms state-of-the-art methods across various molecular sizes, achieving up to a 20% reduction in prediction error.
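
To make the idea of Lennard-Jones-based rewiring concrete, the sketch below evaluates the standard 12-6 potential and adds edges between non-bonded atom pairs that involve a low-degree atom and have a sufficiently attractive energy. The abstract does not give REBIND's actual selection rule, so the degree cutoff, the energy threshold, the reduced units (epsilon = sigma = 1), and the function names are all assumptions made for illustration.

```python
import math
from itertools import combinations


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones potential: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def rewire_low_degree_atoms(coords, bonds, max_degree=1, energy_threshold=-0.1):
    """Return extra (i, j) edges for non-bonded pairs (hypothetical criterion).

    A pair is added when it is not already bonded, at least one of its atoms
    has degree <= max_degree, and its Lennard-Jones energy is at or below
    energy_threshold (i.e. attractive enough to matter).
    """
    bonded = {frozenset(b) for b in bonds}
    degree = [0] * len(coords)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1

    added = []
    for i, j in combinations(range(len(coords)), 2):
        if frozenset((i, j)) in bonded:
            continue  # keep existing bonds untouched
        if min(degree[i], degree[j]) > max_degree:
            continue  # neither atom is low-degree
        r = math.dist(coords[i], coords[j])
        if lennard_jones(r) <= energy_threshold:
            added.append((i, j))
    return added


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.12, 0.0), (10.0, 10.0, 10.0)]
    bonds = [(0, 1)]
    print(rewire_low_degree_atoms(coords, bonds))
```

In this toy input the distant fourth atom gains no edges because its interaction energies are negligibly small, while the unbonded third atom is connected to both of its near neighbours.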


[22] 2410.14719

A Transformer Based Generative Chemical Language AI Model for Structural Elucidation of Organic Compounds

For over half a century, computer-aided structural elucidation (CASE) systems for organic compounds have relied on complex expert systems with explicitly programmed algorithms. These systems are often computationally inefficient for complex compounds due to the vast chemical structural space that must be explored and filtered. In this study, we present a transformer-based generative chemical language artificial intelligence (AI) model, an innovative end-to-end architecture designed to replace the logic and workflow of the classic CASE framework for ultra-fast and accurate spectroscopic-based structural elucidation. Our model employs an encoder-decoder architecture and self-attention mechanisms, similar to those in large language models, to directly generate the most probable chemical structures that match the input spectroscopic data. This approach demonstrates the potential of transformer-based generative AI to accelerate traditional scientific problem-solving processes. The model's ability to iterate quickly based on new data highlights its potential for rapid advancements in structural elucidation.


[23] 2410.14946

DEL-Ranking: Ranking-Correction Denoising Framework for Elucidating Molecular Affinities in DNA-Encoded Libraries

DNA-encoded library (DEL) screening has revolutionized the detection of protein-ligand interactions through read counts, enabling rapid exploration of vast chemical spaces. However, noise in read counts, stemming from nonspecific interactions, can mislead this exploration process. We present DEL-Ranking, a novel distribution-correction denoising framework that addresses these challenges. Our approach introduces two key innovations: (1) a novel ranking loss that rectifies relative magnitude relationships between read counts, enabling the learning of causal features determining activity levels, and (2) an iterative algorithm employing self-training and consistency loss to establish model coherence between activity label and read count predictions. Furthermore, we contribute three new DEL screening datasets, the first to comprehensively include multi-dimensional molecular representations, protein-ligand enrichment values, and their activity labels. These datasets mitigate data scarcity issues in AI-driven DEL screening research. Rigorous evaluation on diverse DEL datasets demonstrates DEL-Ranking's superior performance across multiple correlation metrics, with significant improvements in binding affinity prediction accuracy. Our model exhibits zero-shot generalization ability across different protein targets and successfully identifies potential motifs determining compound binding affinity. This work advances DEL screening analysis and provides valuable resources for future research in this area.


[24] 2410.14956

Airborne Biomarker Localization Engine (ABLE) for Open Air Point-of-Care Detection

Unlike biomarkers in biofluids, airborne biomarkers are dilute and difficult to trace. Detecting diverse airborne biomarkers with sufficient sensitivity typically relies on bulky and expensive equipment like mass spectrometers that remain inaccessible to the general population. Here, we introduce Airborne Biomarker Localization Engine (ABLE), a simple, affordable, and portable platform that can detect volatile, non-volatile, molecular, and particulate biomarkers in about 15 minutes. ABLE significantly improves gas detection limits by converting dilute gases into droplets by water condensation, producing concentrated aqueous samples that are easy to test. Fundamental studies of multiphase condensation revealed unexpected stability in condensate-trapped biomarkers, making ABLE a reliable, accessible, and high-performance system for open-air-based biosensing applications such as non-contact infant healthcare, pathogen detection in public spaces, and food safety.


[25] 2410.15165

Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction

In recent years, Graph Neural Networks (GNNs) have become successful in molecular property prediction tasks such as toxicity analysis. However, due to the black-box nature of GNNs, their outputs can be concerning in high-stakes decision-making scenarios, e.g., drug discovery. To address this issue, Graph Counterfactual Explanation (GCE) has emerged as a promising approach to improve GNN transparency. However, current GCE methods usually fail to take domain-specific knowledge into consideration, which can result in outputs that are not easily comprehensible by humans. To address this challenge, we propose a novel GCE method, LLM-GCE, to unleash the power of large language models (LLMs) in explaining GNNs for molecular property prediction. Specifically, we utilize an autoencoder to generate the counterfactual graph topology from a set of counterfactual text pairs (CTPs) based on an input graph. Meanwhile, we also incorporate a CTP dynamic feedback module to mitigate LLM hallucination, which provides intermediate feedback derived from the generated counterfactuals as an attempt to give more faithful guidance. Extensive experiments demonstrate the superior performance of LLM-GCE. Our code is released at https://github.com/YinhanHe123/new_LLM4GNNExplanation.


[26] 2410.15500

Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example

Speech anonymisation aims to protect speaker identity by changing personal identifiers in speech while retaining linguistic content. Current methods fail to retain the prosody and unique speech patterns found in elderly and pathological speech domains, which are essential for remote health monitoring. To address this gap, we propose a voice conversion-based method (DDSP-QbE) using differentiable digital signal processing and query-by-example. The proposed method, trained with novel losses, aids in disentangling linguistic, prosodic, and domain representations, enabling the model to adapt to uncommon speech patterns. Objective and subjective evaluations show that DDSP-QbE significantly outperforms the voice conversion state-of-the-art concerning intelligibility, prosody, and domain preservation across diverse datasets, pathologies, and speakers while maintaining quality and speaker anonymity. Experts validate domain preservation by analysing twelve clinically pertinent domain attributes.


[27] 2410.15614

Topology-Aware Exploration of Circle of Willis for CTA and MRA: Segmentation, Detection, and Classification

The Circle of Willis (CoW) vessels are critical to connecting the major circulations of the brain. The topology of this vascular structure is clinically significant for evaluating the risk and severity of neurovascular diseases. The CoW has two representative angiographic imaging modalities, computed tomography angiography (CTA) and magnetic resonance angiography (MRA). TopCow24 provided a dataset of 125 paired CTA-MRA scans for the analysis of the CoW. To explore both CTA and MRA images in a unified framework and learn the inherent topology of the CoW, we construct a universal dataset via independent intensity preprocessing, followed by joint resampling and normalization. Then, we utilize a topology-aware loss to enhance the topological completeness of the CoW and the discrimination between different classes. A complementary topology-aware refinement is further conducted to enhance the connectivity within the same class. Our method was evaluated on all three tasks and both modalities, achieving competitive results. In the final test phase of the TopCow24 Challenge, we achieved second place in the CTA-Seg-Task, third place in the CTA-Box-Task, first place in the CTA-Edg-Task, second place in the MRA-Seg-Task, third place in the MRA-Box-Task, and second place in the MRA-Edg-Task.


[28] 2410.15896

Simulation-based inference of single-molecule experiments

Single-molecule experiments are a unique tool to characterize the structural dynamics of biomolecules. However, reconstructing molecular details from noisy single-molecule data is challenging. Simulation-based inference (SBI) integrates statistical inference, physics-based simulators, and machine learning and is emerging as a powerful framework for analysing complex experimental data. Recent advances in deep learning have accelerated the development of new SBI methods, enabling the application of Bayesian inference to an ever-increasing number of scientific problems. Here, we review the nascent application of SBI to the analysis of single-molecule experiments. We introduce parametric Bayesian inference and discuss its limitations. We then give an overview of emerging deep-learning-based SBI methods that perform Bayesian inference for complex models encoded in computer simulators. We illustrate the first applications of SBI to single-molecule force-spectroscopy and cryo-electron microscopy experiments. SBI allows us to leverage powerful computer algorithms modeling complex biomolecular phenomena to connect scientific models and experiments in a principled way.


[29] 2410.15901

Harnessing single polarization doppler weather radars for tracking Desert Locust Swarms

Desert locusts are notorious agricultural pests, causing billions in losses and raising global food scarcity concerns. With billions of these locusts invading agrarian lands, this is no longer a thing of the past. This study taps into the existing Doppler weather radar (DWR) infrastructure, which was originally deployed for meteorological applications, and demonstrates a systematic approach to distinctly identify and track concentrations of desert locust swarms in near real time using single polarization radars. Findings reveal the potential to establish early warning systems with lead times of around 7 hours and spatial coverage of approximately 100 kilometers. Embracing these technological advancements is crucial to safeguarding agricultural landscapes and upholding global food security.


[30] 2410.15943

Molecular Signal Reception in Complex Vessel Networks: The Role of the Network Topology

The notion of synthetic molecular communication (MC) refers to the transmission of information via molecules and is largely foreseen for use within the human body, where traditional electromagnetic wave (EM)-based communication is impractical. MC is anticipated to enable innovative medical applications, such as early-stage tumor detection, targeted drug delivery, and holistic approaches like the Internet of Bio-Nano Things (IoBNT). Many of these applications involve parts of the human cardiovascular system (CVS), here referred to as networks, posing challenges for MC due to their complex, highly branched vessel structures. To gain a better understanding of how the topology of such branched vessel networks affects the reception of a molecular signal at a target location, e.g., the network outlet, we present a generic analytical end-to-end model that characterizes molecule propagation and reception in linear branched vessel networks (LBVNs). We specialize this generic model to any MC system employing superparamagnetic iron-oxide nanoparticles (SPIONs) as signaling molecules and a planar coil as receiver (RX). By considering components that have been previously established in testbeds, we effectively isolate the impact of the network topology and validate our theoretical model with testbed data. Additionally, we propose two metrics, namely the molecule delay and the multi-path spread, that relate the LBVN topology to the molecule dispersion induced by the network, thereby linking the network structure to the signal-to-noise ratio (SNR) at the target location. This allows the characterization of the SNR at any point in the network solely based on the network topology. Consequently, our framework can, e.g., be exploited for optimal sensor placement in the CVS or identification of suitable testbed topologies for given SNR requirements.


[31] 2410.15955

The mutual arrangement of Wright-Fisher diffusion path measures and its impact on parameter estimation

The Wright-Fisher diffusion is a fundamentally important model of evolution encompassing genetic drift, mutation, and natural selection. Suppose you want to infer the parameters associated with these processes from an observed sample path. Then to write down the likelihood one first needs to know the mutual arrangement of two path measures under different parametrizations; that is, whether they are absolutely continuous, equivalent, singular, and so on. In this paper we give a complete answer to this question by finding the separating time for the diffusion, that is, the stopping time before which one measure is absolutely continuous with respect to the other and after which the pair is mutually singular. In one dimension this extends a classical result of Dawson on the local equivalence between neutral and non-neutral Wright-Fisher diffusion measures. Along the way we also develop new zero-one type laws for the diffusion on its approach to, and emergence from, the boundary. As an application we derive an explicit expression for the joint maximum likelihood estimator of the mutation and selection parameters and show that its convergence properties are closely related to the separating time.
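
For readers unfamiliar with the model, the kind of sample path on which such inference would be based can be sketched as follows. This is an illustration only and not the paper's method: it assumes one common parametrization of the one-dimensional diffusion, dX = [(theta1/2)(1 - X) - (theta2/2)X + sigma X(1 - X)] dt + sqrt(X(1 - X)) dW, and a plain Euler-Maruyama scheme with clamping to [0, 1]. The function name `simulate_wright_fisher` and the parameter names are hypothetical, and conventions for the mutation and selection parameters differ between authors.

```python
import math
import random


def simulate_wright_fisher(x0, theta1, theta2, sigma, t_end, n_steps, seed=0):
    """Euler-Maruyama sketch of a one-dimensional Wright-Fisher diffusion.

    Assumed (not taken from the paper) parametrization:
      drift     = 0.5 * (theta1 * (1 - x) - theta2 * x) + sigma * x * (1 - x)
      diffusion = sqrt(x * (1 - x))
    """
    rng = random.Random(seed)  # seeded generator for reproducible paths
    dt = t_end / n_steps
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        # Mutation pushes the frequency towards the interior, selection
        # favours one allele with strength sigma.
        drift = 0.5 * (theta1 * (1.0 - x) - theta2 * x) + sigma * x * (1.0 - x)
        # Genetic drift: Gaussian increment with variance x * (1 - x) * dt.
        noise = math.sqrt(x * (1.0 - x) * dt) * rng.gauss(0.0, 1.0)
        # Clamp to [0, 1]; the discretized scheme can otherwise overshoot.
        path.append(min(1.0, max(0.0, x + drift * dt + noise)))
    return path


if __name__ == "__main__":
    path = simulate_wright_fisher(0.3, 1.0, 1.0, 0.5, t_end=1.0, n_steps=1000)
    print(len(path), path[0], path[-1])
```

The clamping step is a crude discretization choice: behaviour near 0 and 1 is exactly where the boundary questions discussed in the abstract arise, so a simple scheme like this one should not be relied on there.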


[32] 2410.16109

Interpreting Microbiome Relative Abundance Data Using Symbolic Regression

Understanding the complex interactions within the microbiome is crucial for developing effective diagnostic and therapeutic strategies. Traditional machine learning models often lack interpretability, which is essential for clinical and biological insights. This paper explores the application of symbolic regression (SR) to microbiome relative abundance data, with a focus on colorectal cancer (CRC). SR, known for its high interpretability, is compared against traditional machine learning models, e.g., random forest and gradient boosting decision trees. These models are evaluated based on performance metrics such as F1 score and accuracy. We use 71 studies from various cohorts, encompassing over 10,000 samples and 749 species features. Our results indicate that SR not only competes reasonably well in terms of predictive performance, but also excels in model interpretability. SR provides explicit mathematical expressions that offer insights into the biological relationships within the microbiome, a crucial advantage for clinical and biological interpretation. Our experiments also show that SR can help understand complex models like XGBoost via knowledge distillation. To aid in reproducibility and further research, we have made the code openly available at https://github.com/swag2198/microbiome-symbolic-regression .


[33] 2410.16158

Networks: The Visual Language of Complexity

Understanding the origins of complexity is a fundamental challenge with implications for biological and technological systems. Network theory emerges as a powerful tool to model complex systems. Networks are an intuitive framework to represent inter-dependencies among many system components, facilitating the study of both local and global properties. However, it is unclear whether we can define a universal theoretical framework for evolving networks. While basic growth mechanisms, like preferential attachment, recapitulate common properties such as the power-law degree distribution, they fall short in capturing other system-specific properties. Tinkering, on the other hand, has been shown to be very successful in generating modular or nested structures "for free", highlighting the role of internal, non-adaptive mechanisms in the evolution of complexity. Different network extensions, like hypergraphs, have been recently developed to integrate exogenous factors in evolutionary models, as pairwise interactions are insufficient to capture environmentally-mediated species associations. As we confront global societal and climatic challenges, the study of networks and hypergraphs provides valuable insights, emphasizing the importance of scientific exploration in understanding and managing complexity.


[34] 2410.16212

Comprehensive benchmarking of large language models for RNA secondary structure prediction

Inspired by the success of large language models (LLMs) for DNA and proteins, several LLMs for RNA have been developed recently. RNA-LLMs use large datasets of RNA sequences to learn, in a self-supervised way, how to represent each RNA base with a semantically rich numerical vector. This is done under the hypothesis that obtaining high-quality RNA representations can enhance data-costly downstream tasks. Among them, predicting the secondary structure is a fundamental task for uncovering RNA functional mechanisms. In this work we present a comprehensive experimental analysis of several pre-trained RNA-LLMs, comparing them on the RNA secondary structure prediction task in a unified deep learning framework. The RNA-LLMs were assessed with increasing generalization difficulty on benchmark datasets. Results showed that two LLMs clearly outperform the other models, and revealed significant challenges for generalization in low-homology scenarios.