New articles on Quantitative Biology

[1] 2402.18583

Binding-Adaptive Diffusion Models for Structure-Based Drug Design

Structure-based drug design (SBDD) aims to generate 3D ligand molecules that bind to specific protein targets. Existing 3D deep generative models, including diffusion models, have shown great promise for SBDD. However, it is difficult to capture the essential protein-ligand interactions exactly in 3D space for molecular generation. To address this problem, we propose a novel framework, namely Binding-Adaptive Diffusion Models (BindDM). In BindDM, we adaptively extract a subcomplex, the essential part of the binding sites responsible for protein-ligand interactions. The selected protein-ligand subcomplex is then processed with SE(3)-equivariant neural networks and transmitted back to each atom of the complex to augment the target-aware 3D molecule diffusion generation with binding interaction information. We iterate this hierarchical complex-subcomplex process with a cross-hierarchy interaction node to adequately fuse global binding context between the complex and its corresponding subcomplex. Empirical studies on the CrossDocked2020 dataset show that BindDM can generate molecules with more realistic 3D structures and higher binding affinities towards the protein targets, with up to -5.92 Avg. Vina Score, while maintaining proper molecular properties. Our code is available at

[2] 2402.18597

Mapping the breeding habitat of the Black Grouse (Lyrurus tetrix) in the French Alps

The Black Grouse (Lyrurus tetrix) is an emblematic alpine species of high conservation importance. The population size of this mountain bird tends to decline at the reference sites and varies with changes in local landscape characteristics. According to experts, habitat changes are at the centre of the identified pressures affecting part or all of its life cycle. Hence, one approach to monitoring population dynamics is through modelling the favourable habitats for Black Grouse breeding (nesting sites). Coupling this modelling with multi-source remote sensing data (medium and very high spatial resolution) allowed the implementation of a spatial distribution model of the species. Indeed, the extraction of variables from remote sensing helped to describe the study area at appropriate spatial and temporal scales: horizontal and vertical structure (heterogeneity), functioning (vegetation indices), phenology (seasonal or inter-annual dynamics) and biodiversity. An annual time series of radiometric indices (NDVI, NDWI, BI, ...) from Sentinel-2 made it possible to generate Dynamic Habitat Indices (DHIs), which give phenological indications of the nature and dynamics of natural habitats. In addition, very high resolution images (SPOT6) provided access to the fine structure of natural habitats, i.e. the vertical and horizontal organisation by states identified as elementary (mineral, herbaceous, low and high woody). Indeed, one of the essential limiting factors for brood rearing is the presence of a well-developed herbaceous or ericaceous stratum in the northern Alps and of larch forests in the southern region. A deep learning model was used to classify the elementary strata. Finally, the Biomod2 R platform, using an ensemble approach, was applied to model the favourable habitat for Black Grouse reproduction. Of all the models, Random Forest and Extreme Gradient Boosting performed best, with TSS and ROC scores close to 1. For the SDM, we selected only Random Forest models (ensemble modelling) because of their low susceptibility to overfitting and their coherent predictions (after comparing model predictions). In this ensemble model, the most important explanatory variables are altitude, the proportion of heathland, and the DHIs (NDVI Max and NDWI Max). Results from the habitat model can be used as an operational tool for monitoring forest landscape shifts and changes, and for delimiting potential areas in which to protect the species' habitat, making them a valuable decision-making tool for conservation management of mountain open forest.
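
The Dynamic Habitat Indices mentioned in this abstract are per-pixel summaries of a one-year index time series. A minimal sketch follows, assuming the commonly used summaries (cumulative, minimum, and coefficient of variation) plus the maximum named in the abstract; the function name, array layout, and toy values are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def dynamic_habitat_indices(series):
    """Summarise an annual index stack of shape (T, H, W) pixel by pixel.

    Hypothetical helper: the exact DHI definitions used in the paper are
    not given in the abstract, so these summaries are assumptions.
    """
    series = np.asarray(series, dtype=float)
    mean = series.mean(axis=0)
    std = series.std(axis=0)
    # Coefficient of variation; pixels with zero mean are set to 0.
    variation = np.divide(std, mean, out=np.zeros_like(mean), where=mean != 0)
    return {
        "cumulative": series.sum(axis=0),  # overall productivity
        "minimum": series.min(axis=0),     # lean-season value
        "maximum": series.max(axis=0),     # peak value (e.g. NDVI Max)
        "variation": variation,            # seasonality
    }

# Toy NDVI stack: 3 acquisition dates, 1 x 2 pixels (made-up values).
ndvi = np.array([[[0.2, 0.5]], [[0.6, 0.5]], [[0.4, 0.5]]])
dhi = dynamic_habitat_indices(ndvi)
```

Each returned array has the spatial shape of one input image, so it could be stacked with other predictors (altitude, strata proportions) before fitting a distribution model.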

[3] 2402.18602

Estimation of migration histories of the Japanese sardine in the Sea of Japan by combining the microscale stable isotope analysis of otoliths and a data assimilation model

The Japanese sardine (Sardinops melanostictus) is a small pelagic fish found in the Sea of Japan, a marginal sea of the western North Pacific. It is an important species for regional fisheries, but its transport and migration patterns during early life stages remain unclear. In this study, we analyzed the stable oxygen isotope ratios of otoliths of young-of-the-year (age 0) Japanese sardines collected from the northern offshore and southern coastal areas of the Sea of Japan in 2015 and 2016. The ontogenetic shifts of the geographic distribution were estimated by comparing the profiles of life-long isotope ratios with a temporally varying isoscape, which was calculated using the temperature and salinity fields produced by an ocean data assimilation model. Individuals that were collected in the northern and southern areas hatched and stayed in the southern areas (west offshore of Kyushu) until late June; thereafter, they could be distinguished into two groups: one that migrated northward in the shallow layer and one that stayed around the southern area in the deep layer. A comparison of the somatic growth trajectories of the two groups, which were reconstructed based on otolith microstructure analysis, suggested that individuals that migrated northward had significantly larger body lengths in late June than those that stayed in the southern area. These results indicate that young-of-the-year Japanese sardines that hatched in the southern area may have been forced to choose one of two strategies to avoid extremely high water temperatures within seasonal and geographical limits: migrating northward or moving to deeper layers. Our results indicate that environmental variability in the southern area could critically impact sardine population dynamics in the Sea of Japan.

[4] 2402.18611

HemaGraph: Breaking Barriers in Hematologic Single Cell Classification with Graph Attention

In the realm of hematologic cell population classification, the intricate patterns within flow cytometry data necessitate advanced analytical tools. This paper presents 'HemaGraph', a novel framework based on Graph Attention Networks (GATs) for single-cell multi-class classification of hematological cells from flow cytometry data. Harnessing the power of GATs, our method captures subtle cell relationships, offering highly accurate patient profiling. Based on an evaluation of data from 30 patients, HemaGraph demonstrates classification performance across five different cell classes, outperforming traditional methodologies and state-of-the-art methods. Moreover, the uniqueness of this framework lies in the training and testing phase of HemaGraph, where it has been applied to extremely large graphs, containing up to hundreds of thousands of nodes and two million edges, to detect low-frequency cell populations (e.g. 0.01% for one population), with accuracies reaching 98%. Our findings underscore the potential of HemaGraph in improving hematologic multi-class classification, paving the way for patient-personalized interventions. To the best of our knowledge, this is the first effort to use GATs, and Graph Neural Networks (GNNs) in general, to classify cell populations from single-cell flow cytometry data. We envision applying this method to single-cell data from a larger cohort of patients and to other hematologic diseases.

[5] 2402.18757

Difficult control is related to instability in biologically inspired Boolean networks

Previous work in Boolean dynamical networks has suggested that the number of components that must be controlled to select an existing attractor is typically set by the number of attractors admitted by the dynamics, with no dependence on the size of the network. Here we study the rare cases of networks that defy this expectation, with attractors that require controlling most nodes. We find empirically that unstable fixed points are the primary recurring characteristic of networks that prove more difficult to control. We describe an efficient way to identify unstable fixed points and show that, in both existing biological models and ensembles of random dynamics, we can better explain the variance of control kernel sizes by incorporating the prevalence of unstable fixed points. In the end, the fact that these exceptions are associated with dynamics that are unstable to small perturbations hints that they are likely an artifact of using deterministic models. These exceptions are likely to be biologically irrelevant, supporting the generality of easy controllability in biological networks.
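
To make the notion of an unstable fixed point concrete, here is a minimal brute-force sketch for very small networks. It is not the efficient identification method the abstract refers to; calling a fixed point "stable" when every single-node flip returns to it under synchronous updates is an assumption made for illustration, and all function names are hypothetical.

```python
from itertools import product

def step(state, rules):
    """Synchronous update: every node applies its rule to the current state."""
    return tuple(rule(state) for rule in rules)

def fixed_points(rules):
    """Enumerate all 2^n states and keep those mapped to themselves."""
    n = len(rules)
    return [s for s in product((0, 1), repeat=n) if step(s, rules) == s]

def returns_to(state, target, rules, max_steps):
    """Check whether iterating from `state` reaches `target` within max_steps."""
    for _ in range(max_steps):
        if state == target:
            return True
        state = step(state, rules)
    return state == target

def is_stable(fp, rules):
    """Assumed criterion: every single-node flip relaxes back to fp."""
    n = len(rules)
    for i in range(n):
        flipped = tuple(b ^ 1 if j == i else b for j, b in enumerate(fp))
        if not returns_to(flipped, fp, rules, max_steps=2 ** n):
            return False
    return True

# Toy two-node network in which both nodes compute x0 AND x1.
rules = [lambda s: s[0] & s[1], lambda s: s[0] & s[1]]
fps = fixed_points(rules)
stability = {fp: is_stable(fp, rules) for fp in fps}
```

In this toy network the all-off state absorbs every perturbation, while the all-on state is lost after any single flip, which is the kind of fixed point the abstract associates with harder control.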

[6] 2402.18808

Stimulation technology for brain and nerves, now and future

In individuals afflicted with conditions such as paralysis, the implementation of Brain-Computer-Interface (BCI) has begun to significantly impact their quality of life. Furthermore, even in healthy individuals, the anticipated advantages of brain-to-brain communication and brain-to-computer interaction hold considerable promise for the future. This is attributed to the liberation from bodily constraints and the transcendence of existing limitations inherent in contemporary brain-to-brain communication methods. To actualize a comprehensive BCI, the establishment of bidirectional communication between the brain and the external environment is imperative. While neural input technology spans diverse disciplines and is currently advancing rapidly, a notable absence exists in the form of review papers summarizing the technology from the standpoint of the latest or potential input methods. The challenges encountered encompass the requisite for bidirectional communication to achieve a holistic BCI, as well as obstacles related to information volume, precision, and invasiveness. The review section comprehensively addresses both invasive and non-invasive techniques, incorporating nanotech/micro-device technology and the integration of Artificial Intelligence (AI) in brain stimulation.

[7] 2402.19035

Lotka-Volterra Model with Mutations and Generative Adversarial Networks

A model of population genetics of the Lotka-Volterra type with mutations on a statistical manifold is introduced. Mutations in the model are described by diffusion on a statistical manifold with a generator in the form of a Laplace-Beltrami operator with a Fisher-Rao metric, that is, the model combines population genetics and information geometry. This model describes a generalization of the model of machine learning theory, the model of generative adversarial network (GAN), to the case of populations of generative adversarial networks. The introduced model describes the control of overfitting for generative adversarial networks.

[8] 2402.19045

Noise-induced survival resonances during fractional killing of cell populations

Fractional killing in response to drugs is a hallmark of non-genetic cellular heterogeneity. Yet how individual lineages evade drug treatment, as observed in bacteria and cancer cells, is not quantitatively understood. We analyse a stochastic population model with age-dependent division and death rates and characterise the emergence of fractional killing as a stochastic phenomenon under constant and periodic drug environments. In constant environments, increasing cell cycle noise induces a phase transition from complete to fractional killing, while increasing death noise can induce the reverse transition. In periodic drug environments, we discover survival resonance phenomena that give rise to peaks in the survival probabilities at division or death times that are multiples of the environment duration not seen in unstructured populations.

[9] 2402.19095

A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration

Proteins are essential for life, and their structure determines their function. The protein secondary structure is formed by the folding of the protein primary structure, and the protein tertiary structure is formed by the bending and folding of the secondary structure. Therefore, the study of protein secondary structure is very helpful to the overall understanding of protein structure. Although the accuracy of protein secondary structure prediction has continuously improved with the development of machine learning and deep learning, progress in the field of protein structure prediction, unfortunately, remains insufficient to meet the large demand for protein information. Therefore, based on the advantages of deep learning-based methods in feature extraction and learning ability, this paper adopts a two-dimensional fusion deep neural network model, DstruCCN, which uses Convolutional Neural Networks (CNN) and a supervised Transformer protein language model for single-sequence protein structure prediction. The training features of the two are combined to predict the protein Transformer binding site matrix, and then the three-dimensional structure is reconstructed using energy minimization.

[10] 2402.19177

Deep Mapper Graph and its Application to Visualize Plausible Pathways on High-Dimensional Distribution with Small Time-Complexity

Mapper is a topology-based data analysis method that extracts topological features from high-dimensional data. The Mapper algorithm requires a filter function that maps the dataset to a Euclidean space and a clustering method that is performed on the original dataset. This produces a graph which represents the shape of the original data. In this work, we use Mapper to uncover the conformational change of protein structures, and we choose the filter function from a parameterized family based on a deep neural network architecture. By optimizing its parameters with respect to an Energy-loss function derived from a theoretical background, we produce a Mapper graph that unveils the conformational pathways undertaken by the studied protein. Our method tackles conformational pathway detection in an unsupervised manner and therefore greatly reduces the manual and time costs necessary in this task.
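
The basic Mapper construction described above (filter, overlapping cover, clustering of each preimage, edges between clusters that share points) can be sketched as follows. This is a minimal sketch under stated assumptions: the filter values are supplied directly rather than learned by a neural network as in the paper, the cover uses equal-length overlapping intervals, and clustering is a simple fixed-radius connected-components routine. All names and parameters are hypothetical.

```python
import numpy as np
from itertools import combinations

def connected_components(points, eps):
    """Label points so that points linked by chains of distance <= eps share a label."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            dists = np.linalg.norm(points - points[i], axis=1)
            for j in np.where((dists <= eps) & (labels == -1))[0]:
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def mapper_graph(X, filter_values, n_intervals=4, overlap=0.3, eps=0.5):
    """Return Mapper nodes (sets of point indices) and edges (pairs of node ids)."""
    lo, hi = filter_values.min(), filter_values.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Each interval is widened on both sides so neighbours overlap.
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        idx = np.where((filter_values >= a) & (filter_values <= b))[0]
        if len(idx) == 0:
            continue
        labels = connected_components(X[idx], eps)  # cluster in the original space
        for c in range(labels.max() + 1):
            nodes.append(set(idx[labels == c].tolist()))
    # Two clusters are connected when they share at least one data point.
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges

# Toy data: four points on a line, filtered by their coordinate.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
nodes, edges = mapper_graph(X, X[:, 0], n_intervals=2, overlap=0.5, eps=1.0)
```

On the toy line the two overlapping intervals each yield one cluster, and the shared points produce a single edge, so the graph reflects the connected, unbranched shape of the data.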

[11] 2402.19185

Properties of Hagen-Poiseuille flows in channel networks

We derive the main properties of adaptive Hagen-Poiseuille flows in elastic microchannel networks akin to biological veins in organisms. We show that adaptive Hagen-Poiseuille flows successfully simulate key features of Physarum polycephalum networks, replicating physiological out-of-equilibrium phenomena like peristalsis and shuttle streaming, associated with the mechanism of nutrient transport in Physarum. A new topological steady state has been identified for asynchronous adaptation, supporting out-of-equilibrium laminar fluxes. Adaptive Hagen-Poiseuille flows show saturation effects on the fluxes in contractile veins, as observed in animal and artificial contractile veins.

[12] 2402.19190

Prediction of vaccination coverage level in the heterogeneous mixing population

Heterogeneity of the population is a key factor in modeling disease transmission and has a huge impact on the outcome of the transmission. To investigate the decision-making process in a heterogeneous mixing population regarding whether or not to be vaccinated, we propose a modeling framework that combines epidemic models with game-theoretical analysis. We consider two sources of heterogeneity in this paper: different activity levels and different relative vaccination costs. It is interesting to observe that, if both sources of heterogeneity are considered, there exist a finite number of Nash equilibria (evolutionarily stable strategies (ESS)) of the vaccination game, whereas if only the difference in activity levels is considered, there are infinitely many Nash equilibria. In the latter case, the decision-making process becomes highly sensitive to the initial condition. In public health management applications, the inclusion of population heterogeneity significantly complicates the prediction of the overall vaccine coverage level.

[13] 2402.19388

A model of pan-immunity maintenance by horizontal gene transfer in the ecological dynamics of bacteria and phages

Phages and their bacterial hosts are locked in an evolutionary competition which in small and closed systems typically results in the extinction of one or the other. To resist phages, bacteria have evolved numerous defense systems, which nevertheless are still overcome by specific phage counter-defense mechanisms. These defense/counter-defense systems are a major element of microbial genetic diversity and have been demonstrated to propagate between strains by horizontal gene transfer (HGT). It has been proposed that the totality of defense systems found in microbial communities collectively forms a distributed "pan-immune" system with individual elements moving between strains via ubiquitous HGT. Here, we formulate a Lotka-Volterra type model of a host/phage system interacting via a combinatorial variety of defense/counter-defense systems and show that HGT enables stable maintenance of diverse defense/counter-defense genes in the microbial pan-genome even when individual microbial strains inevitably undergo extinction. This stability requires the HGT rate to be sufficiently high to ensure that some descendant of a "dying" strain survives thanks to the immunity acquired through HGT from the community at large, thus establishing a new strain. This mechanism of persistence for the pan-immune gene pool is fundamentally similar to the "island migration" model of ecological diversity, with genes moving between genomes instead of species migrating between islands.

[14] 2402.18600

Artificial Intelligence and Diabetes Mellitus: An Inside Look Through the Retina

Diabetes mellitus (DM) predisposes patients to vascular complications. Retinal images and vasculature reflect the body's micro- and macrovascular health. They can be used to diagnose DM complications, including diabetic retinopathy (DR), neuropathy, nephropathy, and atherosclerotic cardiovascular disease, as well as to forecast the risk of cardiovascular events. Artificial intelligence (AI)-enabled systems developed for high-throughput detection of DR using digitized retinal images have become clinically adopted. Beyond DR screening, AI integration also holds immense potential to address challenges associated with the holistic care of the patient with DM. In this work, we aim to comprehensively review the literature for studies on AI applications based on retinal images related to DM diagnosis, prognostication, and management. We will describe the findings of holistic AI-assisted diabetes care, including but not limited to DR screening, and discuss barriers to implementing such systems, including issues concerning ethics, data privacy, equitable access, and explainability. With the ability to evaluate the patient's health status vis-à-vis DM complications as well as to prognosticate the risk of future cardiovascular complications, AI-assisted retinal image analysis has the potential to become a central tool for modern personalized medicine in patients with DM.

[15] 2402.18610

Why Attention Graphs Are All We Need: Pioneering Hierarchical Classification of Hematologic Cell Populations with LeukoGraph

In the complex landscape of hematologic samples such as peripheral blood or bone marrow, cell classification, delineating diverse populations into a hierarchical structure, presents profound challenges. This study presents LeukoGraph, a recently developed framework designed explicitly for this purpose employing graph attention networks (GATs) to navigate hierarchical classification (HC) complexities. Notably, LeukoGraph stands as a pioneering effort, marking the application of graph neural networks (GNNs) for hierarchical inference on graphs, accommodating up to one million nodes and millions of edges, all derived from flow cytometry data. LeukoGraph intricately addresses a classification paradigm where for example four different cell populations undergo flat categorization, while a fifth diverges into two distinct child branches, exemplifying the nuanced hierarchical structure inherent in complex datasets. The technique is more general than this example. A hallmark achievement of LeukoGraph is its F-score of 98%, significantly outclassing prevailing state-of-the-art methodologies. Crucially, LeukoGraph's prowess extends beyond theoretical innovation, showcasing remarkable precision in predicting both flat and hierarchical cell types across flow cytometry datasets from 30 distinct patients. This precision is further underscored by LeukoGraph's ability to maintain a correct label ratio, despite the inherent challenges posed by hierarchical classifications.

[16] 2402.18651

Quantifying Human Priors over Social and Navigation Networks

Human knowledge is largely implicit and relational -- do we have a friend in common? can I walk from here to there? In this work, we leverage the combinatorial structure of graphs to quantify human priors over such relational data. Our experiments focus on two domains that have been continuously relevant over evolutionary timescales: social interaction and spatial navigation. We find that some features of the inferred priors are remarkably consistent, such as the tendency for sparsity as a function of graph size. Other features are domain-specific, such as the propensity for triadic closure in social interactions. More broadly, our work demonstrates how nonclassical statistical analysis of indirect behavioral experiments can be used to efficiently model latent biases in the data.

[17] 2402.18784

Brain-inspired and Self-based Artificial Intelligence

The question "Can machines think?" and the Turing Test, which assesses whether machines could achieve human-level intelligence, are among the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenges the idea of a "thinking machine" supported by current AIs, since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information processing: it does not truly understand, is not subjectively aware of itself, and does not perceive the world with the self as human intelligence does. In this paper, we introduce a Brain-inspired and Self-based Artificial Intelligence (BriSe AI) paradigm. This BriSe AI paradigm is dedicated to coordinating various cognitive functions and learning strategies in a self-organized manner to build human-level AI models and robotic applications. Specifically, BriSe AI emphasizes the crucial role of the Self in shaping future AI, rooted in a practical hierarchical Self framework, including Perception and Learning, Bodily Self, Autonomous Self, Social Self, and Conceptual Self. The hierarchical framework of the Self highlights self-based environment perception, self-bodily modeling, autonomous interaction with the environment, social interaction and collaboration with others, and even more abstract understanding of the Self. Furthermore, the positive mutual promotion and support among multiple levels of Self, as well as between Self and learning, enhance BriSe AI's conscious understanding of information and flexible adaptation to complex environments, serving as a driving force propelling BriSe AI towards real Artificial General Intelligence.

[18] 2402.18826

The Machine Can't Replace the Human Heart

What is the true heart of mental healthcare -- innovation or humanity? Can virtual therapy ever replicate the profound human bonds where healing arises? As artificial intelligence and immersive technologies promise expanded access, safeguards must ensure technologies remain supplementary tools guided by providers' wisdom. Implementation requires nuance balancing efficiency and empathy. If conscious of ethical risks, perhaps AI could restore humanity by automating tasks, giving providers more time to listen. Yet no algorithm can replicate the seat of dignity within. We must ask ourselves: What future has people at its core? One where AI thoughtfully plays a collaborative role? Or where pursuit of progress leaves vulnerability behind? This commentary argues for a balanced approach thoughtfully integrating technology while retaining care's irreplaceable human essence, at the heart of this profoundly human profession. Ultimately, by nurturing innovation and humanity together, perhaps we reach new heights of empathy previously unimaginable.

[19] 2402.19157

Broken detailed balance and entropy production in directed networks

The structure of a complex network plays a crucial role in determining its dynamical properties. In this work, we show that the directed, hierarchical organisation of a network causes the system to break detailed balance and dictates the production of entropy through non-equilibrium dynamics. We consider a wide range of dynamical processes and show how different directed network features govern their thermodynamics. Next, we analyse a collection of 97 empirical networks and show that strong directedness and non-equilibrium dynamics are both ubiquitous in real-world systems. Finally, we present a simple method for inferring broken detailed balance and directed network structure from multivariate time-series and apply our method to identify non-equilibrium and hierarchical organisation in both human neuroimaging and financial time-series. Overall, our results shed light on the thermodynamic consequences of directed network structure and indicate the importance and ubiquity of hierarchical organisation and non-equilibrium dynamics in real-world systems.
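
As a concrete illustration of broken detailed balance, the sketch below computes the steady-state entropy production rate of a discrete-time Markov chain, sigma = sum_ij pi_i P_ij ln(pi_i P_ij / (pi_j P_ji)), which vanishes exactly when detailed balance holds. The choice of a biased random walk on a ring is an assumption for illustration only; the paper considers a wider range of dynamics, and the function names are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def entropy_production_rate(P):
    """Steady-state entropy production per step of the chain with matrix P.

    Simplification: pairs (i, j) with flux in only one direction would give
    an infinite rate and are skipped here.
    """
    pi = stationary_distribution(P)
    flux = pi[:, None] * P  # flux[i, j] = pi_i * P_ij
    mask = (flux > 0) & (flux.T > 0)
    return float(np.sum(flux[mask] * np.log(flux[mask] / flux.T[mask])))

def biased_cycle(n, p):
    """Random walk on a ring: step forward with probability p, backward with 1 - p."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, (i + 1) % n] = p
        P[i, (i - 1) % n] = 1.0 - p
    return P

sigma_balanced = entropy_production_rate(biased_cycle(3, 0.5))  # unbiased ring
sigma_directed = entropy_production_rate(biased_cycle(3, 0.9))  # strongly directed ring
```

For this ring the stationary distribution is uniform, so the rate reduces to (2p - 1) ln(p / (1 - p)): zero for the unbiased walk and positive once the walk has a preferred direction.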

[20] 2402.19207

Constrained hidden Markov models reveal further Hsp90 protein states

Time series of conformational dynamics in proteins are usually evaluated with hidden Markov models (HMMs). This approach works well if the number of states and their connectivity is known. But for the multi-domain protein Hsp90, a standard HMM analysis with optimization of the BIC (Bayesian information criterion) cannot explain long-lived states well. Therefore, here we employ constrained hidden Markov models, which neglect transitions between states by including assumptions. Gradually tuning a model with justified and focused changes allows us to improve its effectiveness and the score of the BIC. This became possible by analyzing time traces with several thousand observable transitions and, therefore, superb statistics. In this scheme, we also monitor the residences in the states reconstructed by the model, aiming to find exponentially distributed dwell times. We show how introducing new states can achieve these statistics but also point out limitations, e.g., for substantial similarity of two states connected to a common neighbor. One of the states displays the lowest free energy and is likely the idle open 'waiting state', in which Hsp90 waits for the binding of nucleotides, cochaperones, or clients.
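
The BIC trade-off behind constraining transitions can be sketched with the standard definition BIC = k ln(n) - 2 ln(L), where lower is better under this sign convention. The parameter-counting helper below is a hypothetical simplification (it ignores the initial-state distribution), and the example masks and numbers are illustrative, not taken from the paper.

```python
import math

def hmm_free_params(allowed, n_emission_params):
    """Count free parameters of an HMM with a mask of allowed transitions.

    `allowed[i][j]` is True if the transition i -> j (including i -> i) is
    permitted. Each row sums to 1, so a row with m allowed entries has
    m - 1 free probabilities. Assumption: the initial distribution is ignored.
    """
    n_transition = sum(sum(1 for a in row if a) - 1 for row in allowed)
    return n_transition + n_emission_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Three states: fully connected versus a linear chain 0 <-> 1 <-> 2.
full = [[True] * 3 for _ in range(3)]
chain = [[True, True, False],
         [True, True, True],
         [False, True, True]]
k_full = hmm_free_params(full, n_emission_params=6)
k_chain = hmm_free_params(chain, n_emission_params=6)
```

If both models reach a similar likelihood on the same trace, the constrained model obtains the lower BIC because forbidding transitions removes free parameters.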