New articles on Quantitative Biology


[1] 2405.17433

ScAtt: an Attention based architecture to analyze Alzheimer's disease at cell type level from single-cell RNA-sequencing data

Alzheimer's disease (AD) is a pervasive neurodegenerative disorder that leads to memory and behavior impairment severe enough to interfere with daily life activities. Understanding the pathogenesis of this disease can drive the development of new targets and strategies to prevent and treat AD. Recent advances in high-throughput single-cell RNA sequencing technology (scRNA-seq) have enabled the generation of massive amounts of transcriptomic data at the single-cell level, providing remarkable insights into the molecular pathogenesis of Alzheimer's disease. In this study, we introduce ScAtt, an innovative Attention-based architecture, devised specifically for the concurrent identification of cell-type specific AD-related genes and their associated gene regulatory network. ScAtt incorporates a flexible model capable of capturing nonlinear effects, leading to the detection of AD-associated genes that might be overlooked by traditional differentially expressed gene (DEG) analyses. Moreover, ScAtt effectively infers a gene regulatory network depicting the combined influences of genes on the targeted disease, as opposed to examining correlations among genes in conventional gene co-expression networks. In an application to 95,186 single-nucleus transcriptomes from 17 hippocampus samples, ScAtt shows substantially better performance in modeling quantitative changes in expression levels between AD and healthy controls. Consequently, ScAtt performs better than existing methods in the identification of AD-related genes, with more unique discoveries and less overlap between cell types. Functional enrichment analysis of the corresponding gene modules detected from the gene regulatory network shows significant enrichment of biologically meaningful AD-related pathways across different cell types.


[2] 2405.17515

EPI-VALID: Validation of an algorithm for identifying patients with epilepsy in the SNDS using data from the CONSTANCES cohort

Introduction: The HAS conducted a study in 2018 using the French National Health Data System (SNDS) on the care pathway of patients with epilepsy. This study used 2 algorithms to identify patients with epilepsy, based on hospitalization for epilepsy, insurance for chronic severe epilepsy, and antiepileptic drug (AE) dispensing, with exclusion of AEs not specific to epilepsy offering a low population estimate, or exclusion of dispensing in a context other than epilepsy (migraine, neuropathic pain, bipolar disorder, alcohol dependence, anxiety) offering a high population estimate. To study the characteristics of these 2 algorithms (low and high populations), we used data from CONSTANCES, a French generalist cohort and its control sample, which are matched to SNDS data. Method: This is a validation study of a method for identifying patients with epilepsy in the SNDS. The study population was patients identified as epileptic according to the algorithms defined in the introduction, applied over the year 2018 and the reference population was all patients for whom there is a mention of epilepsy (free text field) in the inclusion and/or follow-up questionnaires of the CONSTANCES cohort for the years prior to 2019. Results: Among 156,819 patients present in the CONSTANCES cohort in 2018, 689 patients mentioned epilepsy in one of the questionnaires (0.4%). This number is lower than expected: in the literature, the prevalence of epilepsy in the general population is estimated at 0.6-0.7% in adults. This is due to under-reporting and lack of representativeness of epilepsy in the cohort, which we were able to estimate from the control sample data. Aware of these two limitations, we compared the characteristics of the algorithms with those of a control algorithm using insurance for chronic severe epilepsy and/or hospitalization for epilepsy as inclusion criteria in the SNDS. The accuracy of our SNDS algorithms is not very high (17.8% and 35% respectively in high and low populations) compared with the control algorithm (58.1%). However, they do improve sensitivity (+33 and +14 points respectively), with little loss of specificity (-1.2 and -0.3 points respectively). Discussion: The analysis of false-positive patients has enabled us to identify ways of improving the algorithms. Indeed, some AE drugs could be excluded from the inclusion criteria, as they are widely used in another indication. For example, eliminating pregabalin and gabapentin from the high-population algorithm would significantly improve its accuracy (45.3% vs. 17.8%).
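The quantities compared in this abstract can be illustrated with a minimal Python sketch. It computes sensitivity, specificity, and the share of algorithm-flagged patients who are true cases (positive predictive value) from a 2x2 confusion table. The abstract does not define how its "accuracy" figure is computed, so no mapping to these metrics is claimed here; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity and positive predictive value.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives against the reference population.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of reference cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases that are not flagged
        "ppv": tp / (tp + fp),          # share of flagged patients that are cases
    }

# Hypothetical counts, for illustration only (not from the study).
metrics = classification_metrics(tp=40, fp=60, fn=10, tn=890)
```

Widening the inclusion criteria typically moves patients from `fn` to `tp` (higher sensitivity) while also adding to `fp`, which is the trade-off the abstract describes.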


[3] 2405.17530

Universal deterministic patterns in stochastic count data

We report the existence of deterministic patterns in plots showing the relationship between the mean and the Fano factor (ratio of variance and mean) of stochastic count data. These patterns are found in a wide variety of datasets, including those from genomics, paper citations, commerce, ecology, disease outbreaks, and employment statistics. We develop a theory showing that the patterns naturally emerge when data sampled from discrete probability distributions is organised in matrix form. The theory precisely predicts the patterns and shows that they are a function of only one variable - the sample size.
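As a minimal sketch of the quantities plotted in such figures, the Python code below computes the mean and the Fano factor for each row of a count matrix. It assumes that rows are features (for example genes) and columns are samples, and it uses the unbiased sample variance; both choices, along with the function name and the simulated Poisson counts, are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def mean_and_fano(counts):
    """Per-row mean and Fano factor (variance / mean) of a count matrix."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)      # sample variance across columns
    fano = np.full_like(mean, np.nan)     # rows with zero mean stay NaN
    np.divide(var, mean, out=fano, where=mean > 0)
    return mean, fano

# Simulated data: 1000 features measured in 10 samples (assumed layout).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=0.5, size=(1000, 10))
mean, fano = mean_and_fano(counts)
```

With only 10 samples per row, each row total is a small integer, so the row means can take only a discrete set of values; a scatter plot of `fano` against `mean` is where patterns of the kind described would be looked for.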


[4] 2405.17745

Shaping the distribution of neural responses with interneurons in a recurrent circuit model

Efficient coding theory posits that sensory circuits transform natural signals into neural representations that maximize information transmission subject to resource constraints. Local interneurons are thought to play an important role in these transformations, shaping patterns of circuit activity to facilitate and direct information flow. However, the relationship between these coordinated, nonlinear, circuit-level transformations and the properties of interneurons (e.g., connectivity, activation functions, response dynamics) remains unknown. Here, we propose a normative computational model that establishes such a relationship. Our model is derived from an optimal transport objective that conceptualizes the circuit's input-response function as transforming the inputs to achieve a target response distribution. The circuit, which is composed of primary neurons that are recurrently connected to a set of local interneurons, continuously optimizes this objective by dynamically adjusting both the synaptic connections between neurons and the interneuron activation functions. In an application motivated by redundancy reduction theory, we demonstrate that when the inputs are natural image statistics and the target distribution is a spherical Gaussian, the circuit learns a nonlinear transformation that significantly reduces statistical dependencies in neural responses. Overall, our results provide a framework in which the distribution of circuit responses is systematically and nonlinearly controlled by adjustment of interneuron connectivity and activation functions.


[5] 2405.17833

Neutral phylogenetic models and their role in tree-based biodiversity measures

A wide variety of stochastic models of cladogenesis (based on speciation and extinction) lead to an identical distribution on phylogenetic tree shapes once the edge lengths are ignored. By contrast, the distribution of the tree's edge lengths is generally quite sensitive to the underlying model. In this paper, we review the impact of different model choices on tree shape and edge length distribution, and the consequences for studying the properties of phylogenetic diversity (PD) as a measure of biodiversity and the loss of PD as species become extinct at the present. We also compare PD with a stochastic model of feature diversity, and investigate some mathematical links and inequalities between these two measures, as well as their predictions concerning the loss of biodiversity under extinction at the present.


[6] 2405.17847

Sparsification of Phylogenetic Covariance Matrices of $k$-Regular Trees

Consider a tree $T=(V,E)$ with root $\circ$ and edge length function $\ell:E\to\mathbb{R}_+$. The phylogenetic covariance matrix of $T$ is the matrix $C$ with rows and columns indexed by $L$, the leaf set of $T$, with entries $C(i,j):=\sum_{e\in[i\wedge j,\circ]}\ell(e)$, for each $i,j\in L$. Recent work [15] has shown that the phylogenetic covariance matrix of a large, random binary tree $T$ is significantly sparsified with overwhelmingly high probability under a change-of-basis with respect to the so-called Haar-like wavelets of $T$. This finding notably enables manipulating the spectrum of covariance matrices of large binary trees without the necessity to store them in computer memory but instead performing two post-order traversals of the tree. Building on the methods of [15], this manuscript further advances their sparsification result to encompass the broader class of $k$-regular trees, for any given $k\ge2$. This extension is achieved by refining existing asymptotic formulas for the mean and variance of the internal path length of random $k$-regular trees, utilizing hypergeometric function properties and identities.
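To make the definition of $C$ concrete, here is a small Python sketch that builds the matrix naively for a toy tree: each entry is the total edge length between the root and the most recent common ancestor $i\wedge j$ of two leaves. The dictionary-based tree encoding, the function name, and the example tree are assumptions made for illustration; this dense construction is precisely what the wavelet approach is meant to avoid for large trees.

```python
def phylo_covariance(parent, length, leaves):
    """Return C as a list of lists, C[a][b] = root-to-MRCA distance of leaves a, b.

    parent[v] is the parent of node v (the root maps to None);
    length[v] is the length of the edge joining v to parent[v].
    """
    def root_distances(v):
        # Collect v and all its ancestors, then walk back down from the
        # root, accumulating edge lengths.
        path = []
        while v is not None:
            path.append(v)
            v = parent[v]
        dist, out = 0.0, {}
        for node in reversed(path):
            dist += length.get(node, 0.0)  # the root has no incoming edge
            out[node] = dist
        return out

    dists = {leaf: root_distances(leaf) for leaf in leaves}
    C = []
    for i in leaves:
        row = []
        for j in leaves:
            shared = dists[i].keys() & dists[j].keys()
            # With non-negative edge lengths, the deepest shared ancestor
            # is the most recent common ancestor of i and j.
            row.append(max(dists[i][v] for v in shared))
        C.append(row)
    return C

# Toy tree: root r has children a (edge 1.0) and leaf z (edge 4.0);
# a has leaf children x (edge 2.0) and y (edge 3.0).
parent = {"r": None, "a": "r", "x": "a", "y": "a", "z": "r"}
length = {"a": 1.0, "x": 2.0, "y": 3.0, "z": 4.0}
C = phylo_covariance(parent, length, ["x", "y", "z"])
```

The diagonal holds each leaf's distance from the root, and leaves whose only common ancestor is the root have zero covariance.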


[7] 2405.17960

Elementary Flux Modes as CRN Gears for Free Energy Transduction

We demonstrate that, for a chemical reaction network (CRN) engaged in energy transduction, its optimal operation from a thermodynamic efficiency standpoint is contingent upon its working conditions. Analogously to the bicycle gear system, CRNs have at their disposal several transducing mechanisms characterized by different yields. We highlight the critical role of the CRN's elementary flux modes in determining this "gearing" and their impact on maximizing energy transduction efficiency. Furthermore, we introduce an enzymatically regulated CRN, engineered to autonomously adjust its "gear", thereby optimizing its efficiency under different external conditions.


[8] 2405.18327

Histopathology Based AI Model Predicts Anti-Angiogenic Therapy Response in Renal Cancer Clinical Trial

Predictive biomarkers of treatment response are lacking for metastatic clear cell renal cell carcinoma (ccRCC), a tumor type that is treated with angiogenesis inhibitors, immune checkpoint inhibitors, mTOR inhibitors and a HIF2 inhibitor. The Angioscore, an RNA-based quantification of angiogenesis, is arguably the best candidate to predict anti-angiogenic (AA) response. However, the clinical adoption of transcriptomic assays faces several challenges including standardization, time delay, and high cost. Further, ccRCC tumors are highly heterogeneous, and sampling multiple areas for sequencing is impractical. Here we present a novel deep learning (DL) approach to predict the Angioscore from ubiquitous histopathology slides. To overcome the lack of interpretability, one of the biggest limitations of typical DL models, our model produces a visual vascular network which is the basis of the model's prediction. To test its reliability, we applied this model to multiple cohorts including a clinical trial dataset. Our model accurately predicts the RNA-based Angioscore on multiple independent cohorts (Spearman correlations of 0.77 and 0.73). Further, the predictions help unravel meaningful biology such as association of angiogenesis with grade, stage, and driver mutation status. Finally, we find our model can predict response to AA therapy, in both a real-world cohort and the IMmotion150 clinical trial. The predictive power of our model vastly exceeds that of CD31, a marker of vasculature, and nearly rivals the performance (c-index 0.66 vs 0.67) of the ground truth RNA-based Angioscore at a fraction of the cost. By providing a robust yet interpretable prediction of the Angioscore from histopathology slides alone, our approach offers insights into angiogenesis biology and AA treatment response.


[9] 2405.18329

Spatial-temporal analysis of neural desynchronization in sleep-like states reveals critical dynamics

Sleep is characterized by non-rapid eye movement (nREM) sleep, originating from widespread neuronal synchrony, and REM sleep, with neuronal desynchronization akin to waking behavior. While these were thought to be global brain states, recent research suggests otherwise. Using time-frequency analysis of mesoscopic voltage-sensitive dye recordings of mice in a urethane-anesthetized model of sleep, we find transient neural desynchronization occurring heterogeneously across the cortex within a background of synchronized neural activity, in a manner reminiscent of a critical spreading process and indicative of an "edge-of-synchronization" phase transition.


[10] 2405.18343

On in-silico estimation of left ventricular end-diastolic pressure from cardiac strains

Left ventricular diastolic dysfunction (LVDD) is a group of diseases that adversely affect the passive phase of the cardiac cycle and can lead to heart failure. While left ventricular end-diastolic pressure (LVEDP) is a valuable prognostic measure in LVDD patients, traditional invasive methods of measuring LVEDP present risks and limitations, highlighting the need for alternative approaches. This paper investigates the possibility of measuring LVEDP non-invasively using inverse in-silico modeling. We propose the adoption of patient-specific cardiac modeling and simulation to estimate LVEDP and myocardial stiffness from cardiac strains. We have developed a high-fidelity patient-specific computational model of the left ventricle. Through an inverse modeling approach, myocardial stiffness and LVEDP were accurately estimated from cardiac strains that can be acquired from in vivo imaging, indicating the feasibility of computational modeling to augment current approaches in the measurement of ventricular pressure. Integration of such computational platforms into clinical practice holds promise for early detection and comprehensive assessment of LVDD with reduced risk for patients.


[11] 2405.18402

Antigenic Cooperation in Viral Populations: Redistribution of Loads Among Altruistic Viruses and Maximal Load per Altruist

The paper continues the study of the phenomenon of local immunodeficiency (LI) in viral cross-immunoreactivity networks (CRNs), with a focus on the roles and interactions between altruistic and persistent viral variants. As always, only the state of stable (i.e. observable) LI is analysed. First, we show that a single altruistic viral variant has an upper limit for the number of persistent viral variants that it can support. Our findings reveal that in viral cross-immunoreactivity networks, altruistic viruses act essentially autonomously from each other. Namely, connections between altruistic viruses change neither their qualitative roles nor the quantitative values of the strengths of their connections in the CRNs. In other words, each altruistic virus performs exactly the same actions with the same strengths, with or without the presence of other altruistic viruses. However, having more altruistic viruses allows the population sizes of persistent viruses to be kept at higher levels. Likewise, the strength of the immune response against any altruistic virus remains at the same constant level regardless of how many persistent viruses this altruistic virus supports, i.e. shields from the immune response of the host's immune system. It is also shown that viruses strongly compete with each other in order to become persistent in the state of stable LI. We also present an example of a CRN with stable LI that consists only of persistent viral variants.


[12] 2405.18419

Exploring the Evolution of Altruistic Punishment with a PDE Model of Cultural Multilevel Selection

Two mechanisms that have been used to study the evolution of cooperative behavior are altruistic punishment, in which cooperative individuals pay additional costs to punish defection, and multilevel selection, in which competition between groups can help to counteract individual-level incentives to cheat. Boyd, Gintis, Bowles, and Richerson have used simulation models of cultural evolution to suggest that altruistic punishment and pairwise group-level competition can work in concert to promote cooperation, even when neither mechanism can do so on its own. In this paper, we formulate a PDE model for multilevel selection motivated by the approach of Boyd and coauthors, modeling individual-level birth-death competition with a replicator equation based on individual payoffs and describing group-level competition with pairwise conflicts based on differences in the average payoffs of the competing groups. Building off of existing PDE models for multilevel selection with frequency-independent group-level competition, we use analytical and numerical techniques to understand how the forms of individual and average payoffs can impact the long-time ability to sustain altruistic punishment in group-structured populations. We find several interesting differences between the behavior of our new PDE model with pairwise group-level competition and existing multilevel PDE models, including the observation that our new model can feature a non-monotonic dependence of the long-time collective payoff on the strength of altruistic punishment. Going forward, our PDE framework can serve as a way to connect and compare disparate approaches for understanding multilevel selection across the literature in evolutionary biology and anthropology.


[13] 2405.17578

Cell migration: Beyond Brownian motion

This brief 'New & Notable' (perspectives-type) article contains a mini-review on stochastic modelling of cell migration before elaborating on the article by Klimek et al., arXiv:2311.16753 [Biophys. J. 123, 1173-1183 (2024)].


[14] 2405.17656

Alignment is Key for Applying Diffusion Models to Retrosynthesis

Retrosynthesis, the task of identifying precursors for a given molecule, can be naturally framed as a conditional graph generation task. Diffusion models are a particularly promising modelling approach, enabling post-hoc conditioning and trading off quality for speed during generation. We show mathematically that permutation equivariant denoisers severely limit the expressiveness of graph diffusion models and thus their adaptation to retrosynthesis. To address this limitation, we relax the equivariance requirement such that it only applies to aligned permutations of the conditioning and the generated graphs obtained through atom mapping. Our new denoiser achieves the highest top-$1$ accuracy ($54.7$\%) across template-free and template-based methods on USPTO-50k. We also demonstrate the ability for flexible post-training conditioning and good sample quality with small diffusion step counts, highlighting the potential for interactive applications and additional controls for multi-step planning.


[15] 2405.17802

Multi-level Interaction Modeling for Protein Mutational Effect Prediction

Protein-protein interactions are central mediators in many biological processes. Accurately predicting the effects of mutations on interactions is crucial for guiding the modulation of these interactions, thereby playing a significant role in therapeutic development and drug discovery. Mutations generally affect interactions hierarchically across three levels: mutated residues exhibit different sidechain conformations, which lead to changes in the backbone conformation, eventually affecting the binding affinity between proteins. However, existing methods typically focus only on sidechain-level interaction modeling, resulting in suboptimal predictions. In this work, we propose a self-supervised multi-level pre-training framework, ProMIM, to fully capture all three levels of interactions with well-designed pretraining objectives. Experiments show ProMIM outperforms all the baselines on the standard benchmark, especially on mutations where significant changes in backbone conformations may occur. In addition, leading results from zero-shot evaluations for SARS-CoV-2 mutational effect prediction and antibody optimization underscore the potential of ProMIM as a powerful next-generation tool for developing novel therapeutic approaches and new drugs.


[16] 2405.17903

Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches usually integrate multimodal inputs through adaptive local feature interactions, which cannot leverage the full potential of visual cues, thus resulting in insufficient feature modeling. In this study, we propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking. The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different visual modalities and then uses a unified encoder to align the features across different domains. Moreover, we propose an enhanced transformer-based module to fuse multimodal features using attention mechanisms. With these methods, the MMHT model can effectively construct a multiscale and multidimensional visual feature space and achieve discriminative feature modeling. Extensive experiments demonstrate that the MMHT model exhibits competitive performance in comparison with that of other state-of-the-art methods. Overall, our results highlight the effectiveness of the MMHT model in terms of addressing the challenges faced in visual object tracking tasks.


[17] 2405.17992

fMRI predictors based on language models of increasing complexity recover brain left lateralization

Over the past decade, studies of naturalistic language processing where participants are scanned while listening to continuous text have flourished. Using word embeddings at first, then large language models, researchers have created encoding models to analyze the brain signals. Presenting these models with the same text as the participants makes it possible to identify brain areas where there is a significant correlation between the functional magnetic resonance imaging (fMRI) time series and the ones predicted by the models' artificial neurons. One intriguing finding from these studies is that they have revealed highly symmetric bilateral activation patterns, somewhat at odds with the well-known left lateralization of language processing. Here, we report analyses of an fMRI dataset where we manipulate the complexity of large language models, testing 28 pretrained models from 8 different families, ranging from 124M to 14.2B parameters. First, we observe that the performance of models in predicting brain responses follows a scaling law, where the fit with brain activity increases linearly with the logarithm of the number of parameters of the model (and its performance on natural language processing tasks). Second, we show that a left-right asymmetry gradually appears as model size increases, and that the difference in left-right brain correlations also follows a scaling law. Whereas the smallest models show no asymmetry, larger models fit left-hemispheric activations increasingly better than right-hemispheric ones. This finding reconciles computational analyses of brain activity using large language models with the classic observation from aphasic patients showing left hemisphere dominance for language.
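The scaling law described here, a brain-fit score that grows linearly with the logarithm of the parameter count, can be sketched as an ordinary least-squares fit. The function name, the choice of base-10 logarithm, and the synthetic scores below are illustrative assumptions; none of the numbers come from the paper apart from the 124M and 14.2B endpoints quoted in the abstract.

```python
import numpy as np

def fit_log_scaling(n_params, scores):
    """Least-squares fit of score = a + b * log10(n_params); returns (a, b)."""
    x = np.log10(np.asarray(n_params, dtype=float))
    y = np.asarray(scores, dtype=float)
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the slope first
    return a, b

# Synthetic example: scores generated exactly on a log-linear law.
n_params = np.array([1.24e8, 1.0e9, 7.0e9, 1.42e10])
scores = 0.02 + 0.01 * np.log10(n_params)
a, b = fit_log_scaling(n_params, scores)
```

The same fit could be applied to the left-minus-right difference in brain correlations, where a positive slope `b` would correspond to asymmetry emerging with model size.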


[18] 2405.18006

Mathematical models of the Arabidopsis circadian oscillator

We review the construction and evolution of mathematical models of the Arabidopsis circadian clock, structuring the discussion into two distinct historical phases of modeling strategies: extension and reduction. The extension phase explores the bottom-up assembly of regulatory networks introducing as many components and interactions as possible in order to capture the oscillatory nature of the clock. The reduction phase deals with functional decomposition, distilling complex models to their essential dynamical repertoire. Current challenges in this field, including the integration of spatial considerations and environmental influences like light and temperature, are also discussed. The review emphasizes the ongoing need for models that balance molecular detail with practical simplicity.


[19] 2405.18051

Predicting Progression Events in Multiple Myeloma from Routine Blood Work

The ability to accurately predict disease progression is paramount for optimizing multiple myeloma patient care. This study introduces a hybrid neural network architecture, combining Long Short-Term Memory networks with a Conditional Restricted Boltzmann Machine, to predict future blood work of affected patients from a series of historical laboratory results. We demonstrate that our model can replicate the statistical moments of the time series ($0.95~\pm~0.01~\geq~R^2~\geq~0.83~\pm~0.03$) and forecast future blood work features with high correlation to actual patient data ($0.92\pm0.02~\geq~r~\geq~0.52~\pm~0.09$). Subsequently, a second Long Short-Term Memory network is employed to detect and annotate disease progression events within the forecasted blood work time series. We show that these annotations enable the prediction of progression events with significant reliability (AUROC$~=~0.88~\pm~0.01$), up to 12 months in advance (AUROC($t+12~$mos)$~=0.65~\pm~0.01$). Our system is designed in a modular fashion, featuring separate entities for forecasting and progression event annotation. This structure not only enhances interpretability but also facilitates the integration of additional modules to perform subsequent operations on the generated outputs. Our approach utilizes a minimal set of routine blood work measurements, which avoids the need for expensive or resource-intensive tests and ensures accessibility of the system in clinical routine. This capability allows for individualized risk assessment and informed treatment decisions tailored to a patient's unique disease kinetics. The presented approach contributes to the development of a scalable and cost-effective virtual human twin system for optimized healthcare resource utilization and improved patient outcomes in multiple myeloma care.


[20] 2405.18190

Mutation-Bias Learning in Games

We present two variants of a multi-agent reinforcement learning algorithm based on evolutionary game theoretic considerations. The intentional simplicity of one variant enables us to prove results on its relationship to a system of ordinary differential equations of replicator-mutator dynamics type, allowing us to present proofs on the algorithm's convergence conditions in various settings via its ODE counterpart. The more complicated variant enables comparisons to Q-learning based algorithms. We compare both variants experimentally to WoLF-PHC and frequency-adjusted Q-learning on a range of settings, illustrating cases of increasing dimensionality where our variants preserve convergence in contrast to more complicated algorithms. The availability of analytic results provides a degree of transferability of results as compared to purely empirical case studies, illustrating the general utility of a dynamical systems perspective on multi-agent reinforcement learning when addressing questions of convergence and reliable generalisation.
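For readers unfamiliar with replicator-mutator dynamics, the following Python sketch takes one explicit Euler step of the textbook form $\dot{x}_i = \sum_j x_j f_j(x) Q_{ji} - \phi\, x_i$, with fitness $f = Ax$ and mean fitness $\phi = x \cdot f$. This is the standard equation, not necessarily the exact system analysed in the paper, and the payoff matrix, mutation matrix, and step size are hypothetical choices for illustration.

```python
import numpy as np

def replicator_mutator_step(x, A, Q, dt=0.01):
    """One Euler step of the standard replicator-mutator equation.

    x: strategy frequencies (sums to 1); A: payoff matrix;
    Q: row-stochastic mutation matrix, Q[j, i] = P(j mutates to i).
    """
    f = A @ x                    # fitness of each strategy
    phi = x @ f                  # population mean fitness
    dx = (x * f) @ Q - phi * x   # inflow through mutation minus dilution
    x_new = x + dt * dx
    return x_new / x_new.sum()   # guard against numerical drift

# Hypothetical example: shifted rock-paper-scissors with uniform mutation.
A = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0]])
mu = 0.05
Q = (1 - mu) * np.eye(3) + (mu / 3) * np.ones((3, 3))
x = np.array([0.5, 0.3, 0.2])
for _ in range(1000):
    x = replicator_mutator_step(x, A, Q)
```

Because `Q` is row-stochastic, the update conserves total frequency up to rounding; the final renormalisation only removes accumulated floating-point drift.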