New articles on Quantitative Biology


[1] 2412.14338

GREGoR: Accelerating Genomics for Rare Diseases

Rare diseases are collectively common, affecting approximately one in twenty individuals worldwide. In recent years, rapid progress has been made in rare disease diagnostics due to advances in DNA sequencing, development of new computational and experimental approaches to prioritize genes and genetic variants, and increased global exchange of clinical and genetic data. However, more than half of individuals suspected to have a rare disease lack a genetic diagnosis. The Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium was initiated to study thousands of challenging rare disease cases and families and apply, standardize, and evaluate emerging genomics technologies and analytics to accelerate their adoption in clinical practice. Further, all data generated, currently representing ~7500 individuals from ~3000 families, is rapidly made available to researchers worldwide via the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) to catalyze global efforts to develop approaches for genetic diagnoses in rare diseases (https://gregorconsortium.org/data). The majority of these families have undergone prior clinical genetic testing but remained unsolved, with most being exome-negative. Here, we describe the collaborative research framework, datasets, and discoveries comprising GREGoR that will provide foundational resources and substrates for the future of rare disease genomics.


[2] 2412.14341

How natural sequence variation modulates protein folding dynamics

Natural protein sequences serve as a record of the evolutionary constraints that shape their functional structures. We show that it is possible to use sequence information to go beyond predicting native structures and global stability to infer the folding dynamics of globular proteins. The one- and two-body evolutionary energy fields at the amino-acid level are mapped to a coarse-grained description of folding, where proteins are divided into contiguous folding elements, commonly referred to as foldons. For 15 diverse protein families, we calculated the folding dynamics of hundreds of proteins by simulating an Ising chain of foldons, with their energetics determined by the amino acid sequences. We show that protein topology imposes limits on the variability of folding cooperativity within a family. While most beta and alpha/beta structures exhibit only a few possible mechanisms despite high sequence diversity, alpha topologies allow for diverse folding scenarios among family members. We show that both the stability and cooperativity changes induced by mutations can be computed directly using sequence-based evolutionary models.
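
As a rough illustration of what an "Ising chain of foldons" can look like, the sketch below treats each foldon as a two-state unit (unfolded = 0, folded = 1) and computes equilibrium folded populations by exact enumeration. The energy function, the parameter values, and the function name `foldon_populations` are assumptions made for illustration; the abstract does not specify how the sequence-derived energetics enter the model.

```python
import itertools
import math

def foldon_populations(eps, coupling, temperature, k_B=1.0):
    """Boltzmann-averaged folded probability of each foldon (exact enumeration).

    eps[i]      -- assumed cost of folding foldon i on its own
    coupling[i] -- assumed interaction between folded neighbours i and i+1
    """
    n = len(eps)
    states = list(itertools.product((0, 1), repeat=n))
    weights = []
    for s in states:
        energy = sum(e * si for e, si in zip(eps, s))
        energy += sum(j * s[i] * s[i + 1] for i, j in enumerate(coupling))
        weights.append(math.exp(-energy / (k_B * temperature)))
    z = sum(weights)  # partition function
    return [sum(w * s[i] for w, s in zip(weights, states)) / z for i in range(n)]

# Hypothetical four-foldon protein: folding alone is unfavourable,
# but folded neighbours stabilize each other (cooperativity).
eps = [0.5, 0.5, 0.5, 0.5]
coupling = [-2.0, -2.0, -2.0]
cold = foldon_populations(eps, coupling, temperature=0.3)
hot = foldon_populations(eps, coupling, temperature=5.0)
```

Comparing `cold` and `hot` shows the chain going from an almost fully folded state to a mixed one. Exact enumeration scales as 2^n, so it is only practical for a small number of foldons.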


[3] 2412.14421

Comparing noisy neural population dynamics using optimal transport distances

Biological and artificial neural systems form high-dimensional neural representations that underpin their computational capabilities. Methods for quantifying geometric similarity in neural representations have become a popular tool for identifying computational principles that are potentially shared across neural systems. These methods generally assume that neural responses are deterministic and static. However, responses of biological systems, and some artificial systems, are noisy and dynamically unfold over time. Furthermore, these characteristics can have substantial influence on a system's computational capabilities. Here, we demonstrate that existing metrics can fail to capture key differences between neural systems with noisy dynamic responses. We then propose a metric for comparing the geometry of noisy neural trajectories, which can be derived as an optimal transport distance between Gaussian processes. We use the metric to compare models of neural responses in different regions of the motor system and to compare the dynamics of latent diffusion models for text-to-image synthesis.
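
The closed-form 2-Wasserstein distance between two multivariate Gaussians is a standard building block for optimal transport distances of this kind. The sketch below implements only that textbook formula for finite-dimensional Gaussians; it is not the metric proposed in the paper, which is defined for Gaussian processes and may involve additional steps (for example, alignment between systems) that the abstract does not describe. The function names are illustrative.

```python
import numpy as np

def psd_sqrt(mat):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    bures_sq = np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross)
    dist_sq = np.sum((mean_a - mean_b) ** 2) + max(bures_sq, 0.0)
    return float(np.sqrt(dist_sq))

# Toy comparison: same mean, different noise scale.
zero = np.zeros(2)
d_noise = gaussian_w2(zero, np.eye(2), zero, 4.0 * np.eye(2))
# Toy comparison: same noise, shifted mean.
d_shift = gaussian_w2(zero, np.eye(2), np.array([3.0, 4.0]), np.eye(2))
```

The distance separates into a mean term and a covariance (Bures) term, so two systems with identical mean responses can still be far apart if their noise structure differs.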


[4] 2412.14445

Diffusion and Discrete Temporal Models of the Growth of Free-Ranging Cats in Urban Areas

The survival of the domestic cat (Felis catus) in various ecosystems has become increasingly relevant due to its impact on wildlife, public health, and society. In countries like Mexico, social factors such as abandonment have led to the feralization of the species and an unexpected increase in its population in urban areas. To design and implement effective population control methods, a thorough analysis of the species' population dynamics, along with the social factors influencing it, is necessary. We propose a reaction-diffusion model to simulate the natural dispersal of the population within a bounded domain. After exploring the species' spreading ability, we construct a complex dynamical system based on the biological characteristics of cats and their intraspecific and interspecific interactions, which we explain and study in detail. Both deterministic and stochastic parameters are considered to enhance the realism of the simulations. Our results indicate that the population reaches equilibrium, highlighting the need for control methods combined with social regulations to achieve sustainability in the system.
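
A minimal sketch of a reaction-diffusion simulation on a bounded domain is given below. It assumes a one-dimensional Fisher-KPP equation (diffusion plus logistic growth) with no-flux boundaries and an explicit finite-difference scheme; the abstract does not state the reaction term, dimensionality, boundary conditions, or parameter values, so all of these are illustrative choices.

```python
import numpy as np

def simulate_fisher_kpp(n_cells=100, dx=1.0, dt=0.2, steps=2000,
                        diffusion=1.0, growth=0.5, capacity=1.0):
    """Explicit Euler integration of u_t = D u_xx + r u (1 - u/K), no-flux ends."""
    u = np.zeros(n_cells)
    u[n_cells // 2] = 0.1 * capacity  # small founding population in the centre
    for _ in range(steps):
        # Copying the edge values into ghost cells enforces zero flux at both ends.
        padded = np.pad(u, 1, mode="edge")
        laplacian = (padded[2:] - 2.0 * u + padded[:-2]) / dx**2
        u = u + dt * (diffusion * laplacian + growth * u * (1.0 - u / capacity))
    return u

density = simulate_fisher_kpp()
```

With these settings the explicit scheme is stable because `diffusion * dt / dx**2` is 0.2, below the usual limit of 0.5. The population spreads from the centre and settles at the carrying capacity across the whole domain, which is one simple way an equilibrium can arise in a model of this type.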


[5] 2412.14498

Diverging network architecture of the $\textit{C. elegans}$ connectome and signaling network

The connectome describes the complete set of synaptic contacts through which neurons communicate. While the architecture of the $\textit{C. elegans}$ connectome has been extensively characterized, much less is known about the organization of causal signaling networks arising from functional interactions between neurons. Understanding how effective communication pathways relate to or diverge from the underlying structure is a central question in neuroscience. Here, we analyze the modular architecture of the $\textit{C. elegans}$ signal propagation network, measured via calcium imaging and optogenetics, and compare it to the underlying anatomical wiring measured by electron microscopy. We find that signaling modules are not aligned with the modular boundaries of the anatomical network, highlighting an instance where function deviates from structure. An exception to this is the pharynx, which is delineated into a separate community in both anatomy and signaling. We analyze the cellular compositions of the signaling architecture and find that its modules are enriched for specific cell types and functions, suggesting that the network modules are neurobiologically relevant. Lastly, we identify a "rich club" of hub neurons in the signaling network. The membership of the signaling rich club differs from the rich club detected in the anatomical network, challenging the view that structural hubs occupy positions of influence in functional (signaling) networks. Our results provide new insight into the interplay between brain structure, in the form of a complete synaptic-level connectome, and brain function, in the form of a system-wide causal signal propagation atlas.
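
For readers unfamiliar with the "rich club" notion, the sketch below computes the standard unnormalized rich-club coefficient for an undirected, unweighted graph: the density of connections among nodes whose degree exceeds k. This is an assumption-laden simplification; the signaling network in the paper may well be directed and weighted, and rich-club analyses usually also normalize against randomized networks, which is omitted here.

```python
import numpy as np

def rich_club_coefficient(adj, k):
    """Edge density among nodes with degree > k (symmetric adjacency, no self-loops)."""
    adj = np.asarray(adj) != 0
    degree = adj.sum(axis=0)
    rich = degree > k
    n_rich = int(rich.sum())
    if n_rich < 2:
        return float("nan")  # coefficient is undefined for fewer than two nodes
    edges = adj[np.ix_(rich, rich)].sum() / 2  # each undirected edge is counted twice
    return float(2 * edges / (n_rich * (n_rich - 1)))

# Toy graph: four fully interconnected hubs (nodes 0-3), each with two leaf neighbours.
n = 12
toy = np.zeros((n, n), dtype=int)
for i in range(4):
    for j in range(i + 1, 4):
        toy[i, j] = toy[j, i] = 1
    for leaf in (4 + 2 * i, 5 + 2 * i):
        toy[i, leaf] = toy[leaf, i] = 1

phi_all = rich_club_coefficient(toy, 0)   # all nodes
phi_hubs = rich_club_coefficient(toy, 1)  # hubs only
```

In the toy graph the hubs form a complete subgraph, so the coefficient rises to 1 once the leaves are excluded, whereas the whole graph is sparse.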


[6] 2412.14536

Multi-Modal Latent Variables for Cross-Individual Primary Visual Cortex Modeling and Analysis

Elucidating the functional mechanisms of the primary visual cortex (V1) remains a fundamental challenge in systems neuroscience. Current computational models face two critical limitations, namely the challenge of cross-modal integration between partial neural recordings and complex visual stimuli, and the inherent variability in neural characteristics across individuals, including differences in neuron populations and firing patterns. To address these challenges, we present a multi-modal identifiable variational autoencoder (miVAE) that employs a two-level disentanglement strategy to map neural activity and visual stimuli into a unified latent space. This framework enables robust identification of cross-modal correlations through refined latent space modeling. We complement this with a novel score-based attribution analysis that traces latent variables back to their origins in the source data space. Evaluation on a large-scale mouse V1 dataset demonstrates that our method achieves state-of-the-art performance in cross-individual latent representation and alignment, without requiring subject-specific fine-tuning, and exhibits improved performance with increasing data size. Significantly, our attribution algorithm successfully identifies distinct neuronal subpopulations characterized by unique temporal patterns and stimulus discrimination properties, while simultaneously revealing stimulus regions that show specific sensitivity to edge features and luminance variations. This scalable framework offers promising applications not only for advancing V1 research but also for broader investigations in neuroscience.


[7] 2412.14726

Towards a mathematical framework for modelling cell fate dynamics

An adult human body is made up of some 30 to 40 trillion cells, all of which stem from a single fertilized egg cell. The process by which the right cells appear to arrive in their right numbers at the right time at the right place -- development -- is only understood in the roughest of outlines. This process does not happen in isolation: the egg, the embryo, the developing foetus, and the adult organism all interact intricately with their changing environments. Conceptual and, increasingly, mathematical approaches to modelling development have centred around Waddington's concept of an epigenetic landscape. This perspective enables us to talk about the molecular and cellular factors that contribute to cells reaching their terminally differentiated state: their fate. The landscape metaphor is however only a simplification of the complex process of development; it for instance does not consider environmental influences, a context which we argue needs to be explicitly taken into account and from the outset. When delving into the literature, it also quickly becomes clear that there is a lack of consistency and agreement on even fundamental concepts; for example, the precise meaning of what we refer to when talking about a `cell type' or `cell state.' Here we engage with previous theoretical and mathematical approaches to modelling cell fate -- focused on trees, networks, and landscape descriptions -- and argue that they require a level of simplification that can be problematic. We introduce random dynamical systems as one natural alternative. These provide a flexible conceptual and mathematical framework that is free of extraneous assumptions. We develop some of the basic concepts and discuss them in relation to now `classical' depictions of cell fate dynamics, in particular Waddington's landscape.


[8] 2412.14754

Milestones at the Origin of Life

Living organisms share common structures, chemical reactions, and molecular building blocks. They consist of cells that divide; their protein and carbohydrate units are homochiral; they have metabolism and genetics; and they are mortal. The molecular structures and chemical reactions underlying these features are common to all organisms, from the simplest bacteria to human beings. The origin of life is evolutionary, beginning with the emergence of a network of spontaneous biochemical reactions, and this evolution has taken place over a very long time. It contains, however, some "landmarks" and bottlenecks that directed evolution in a revolutionary manner, and the article tries to establish the order of these events. The article advocates a possible order for the emergence of life in which the first milestone in prebiotic evolution is the emergence of homochirality in proteins. The homochirality of peptides is, however, unstable, and racemization causes aging of the peptides and mortality. Metabolism and genetics were established through homochiral enzymes in the Earth's crust $\approx$ 4 Gyr ago. Finally, cells with cell division were established in a hot-spring environment at the interface between the crust and the Hadean Ocean.


[9] 2412.14804

A unified theory for the development of tinnitus and hyperacusis based on associative plasticity in the dorsal cochlear nucleus

Tinnitus and hyperacusis can occur together or in isolation, with hyperacusis being associated with tinnitus much more frequently than vice versa. This striking correlation between tinnitus and hyperacusis prevalence implies that there might be a common origin, such as a (hidden) hearing loss, and possibly interrelated neural mechanisms underlying the pathological development of the two conditions. In this theoretical paper, we propose such interrelated pathological mechanisms, localized in the dorsal cochlear nucleus (DCN) of the brainstem, that are based on mechanisms of Hebbian and associative plasticity known from classical conditioning. Specifically, our model proposes that hyperacusis results from synaptic enhancement of cochlear input to the DCN, whereas chronic tinnitus results from synaptic enhancement of somatosensory input to the DCN. The specific circumstances leading to one condition or the other are discussed. Our model predicts that hearing loss leads to chronic tinnitus, while noise exposure (which may also cause hearing loss) leads to hyperacusis.


[10] 2412.14892

Assessing the effectiveness of test-trace-isolate interventions using a multi-layered temporal network

In the early stage of an infectious disease outbreak, public health strategies tend to gravitate towards non-pharmaceutical interventions (NPIs) given the time required to develop targeted treatments and vaccines. One of the most common NPIs is Test-Trace-Isolate (TTI). One of the factors determining the effectiveness of TTI is the ability to identify contacts of infected individuals. In this study, we propose a multi-layer temporal contact network to model transmission dynamics and assess the impact of different TTI implementations, using SARS-CoV-2 as a case study. The model was used to evaluate TTI effectiveness both in containing an outbreak and in mitigating the impact of an epidemic. We estimated that a TTI strategy based on home isolation and testing of both primary and secondary contacts can contain outbreaks only when the reproduction number is up to 1.3, at which the epidemic prevention potential is 88.2% (95% CI: 87.9%-88.5%). On the other hand, for higher values of the reproduction number, TTI is estimated to noticeably mitigate disease burden but at high social costs (e.g., over a month in isolation/quarantine per person for reproduction numbers of 1.7 or higher). We estimated that strategies considering quarantine of contacts have a larger epidemic prevention potential than strategies that either avoid tracing contacts or require contacts to be tested before isolation. Combining TTI with other social distancing measures can improve the likelihood of successfully containing an outbreak, but the estimated epidemic prevention potential remains lower than 50% for reproduction numbers higher than 2.1.


[11] 2412.14999

Accessing the topological properties of human brain functional sub-circuits in Echo State Networks

Recent years have witnessed an emerging trend in neuromorphic computing that centers around the use of brain connectomics as a blueprint for artificial neural networks. Connectomics-based neuromorphic computing has primarily focused on embedding human brain large-scale structural connectomes (SCs), as estimated from diffusion Magnetic Resonance Imaging (dMRI), into echo-state networks (ESNs). A critical step in ESN embedding is the construction of pre-determined read-in and read-out layers from induced subgraphs of the embedded reservoir. As the \textit{a priori} set of functional sub-circuits is derived from functional MRI (fMRI), it has so far been unknown whether embedding fMRI-induced sub-circuits/networks onto SCs is well justified from a neuro-physiological perspective and in terms of ESN performance across a variety of tasks. This paper proposes a pipeline to implement and evaluate ESNs with various embedded topologies and processing/memorization tasks. We show that the performance optima depend strongly on the neuro-physiological characteristics of these pre-determined fMRI-induced sub-circuits. In general, ESNs embedded with fMRI-induced sub-circuits outperform simple bipartite models and various null models with feed-forward properties commonly seen in MLPs, across different tasks and reservoir criticality conditions. We provide a thorough analysis of the topological properties of the pre-determined fMRI-induced sub-circuits and highlight the graph-theoretical properties that play significant roles in determining ESN performance.


[12] 2412.15013

MitraClip Device Automated Localization in 3D Transesophageal Echocardiography via Deep Learning

The MitraClip is the most widely used percutaneous treatment for mitral regurgitation, typically performed under the real-time guidance of 3D transesophageal echocardiography (TEE). However, artifacts and low image contrast in echocardiography hinder accurate clip visualization. This study presents an automated pipeline for clip detection from 3D TEE images. An Attention UNet was employed to segment the device, while a DenseNet classifier predicted its configuration among ten possible states, ranging from fully closed to fully open. Based on the predicted configuration, a template model derived from computer-aided design (CAD) was automatically registered to refine the segmentation and enable quantitative characterization of the device. The pipeline was trained and validated on 196 3D TEE images acquired using a heart simulator, with ground-truth annotations refined through CAD-based templates. The Attention UNet achieved an average surface distance of 0.76 mm and 95% Hausdorff distance of 2.44 mm for segmentation, while the DenseNet achieved an average weighted F1-score of 0.75 for classification. Post-refinement, segmentation accuracy improved, with average surface distance and 95% Hausdorff distance reduced to 0.75 mm and 2.05 mm, respectively. This pipeline enhanced clip visualization, providing fast and accurate detection with quantitative feedback, potentially improving procedural efficiency and reducing adverse outcomes.
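
The two segmentation metrics quoted above can be sketched as follows for surfaces represented as point sets. Definitions of these metrics vary between toolkits (symmetric versus one-sided averages, and how the 95th percentile is taken), so this is one common convention and not necessarily the one used in the study; the function name and the toy data are illustrative.

```python
import numpy as np

def surface_distances(points_a, points_b):
    """Symmetric average surface distance and 95% Hausdorff distance.

    points_a, points_b -- arrays of shape (n, 3) and (m, 3) holding surface points.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = pairwise.min(axis=1)  # nearest point of B for every point of A
    b_to_a = pairwise.min(axis=0)  # nearest point of A for every point of B
    asd = (a_to_b.sum() + b_to_a.sum()) / (a_to_b.size + b_to_a.size)
    hd95 = max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95))
    return float(asd), float(hd95)

# Toy example: a flat 5 x 5 patch and a copy shifted by 0.5 along z.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
patch = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])
shifted = patch + np.array([0.0, 0.0, 0.5])
asd, hd95 = surface_distances(patch, shifted)
```

The dense pairwise matrix is fine for small point sets; for full-resolution surfaces a spatial index such as a k-d tree would be the usual replacement.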


[13] 2412.14188

CogSimulator: A Model for Simulating User Cognition & Behavior with Minimal Data for Tailored Cognitive Enhancement

The interplay between cognition and gaming, notably through educational games enhancing cognitive skills, has garnered significant attention in recent years. This research introduces the CogSimulator, a novel algorithm for simulating user cognition in small-group settings with minimal data, as exemplified by the educational game Wordle. The CogSimulator employs the Wasserstein-1 distance and coordinate search optimization for hyperparameter tuning, enabling precise few-shot predictions in new game scenarios. Comparative experiments with the Wordle dataset illustrate that our model surpasses most conventional machine learning models in mean Wasserstein-1 distance, mean squared error, and mean accuracy, showcasing its efficacy in cognitive enhancement through tailored game design.
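
For discrete distributions on an ordered, evenly spaced support (for example, the number of guesses needed to solve a puzzle), the Wasserstein-1 distance reduces to the summed absolute difference of the cumulative distributions. The sketch below shows that computation only; how the CogSimulator uses the distance internally is not described in the abstract, and the example distributions are made up.

```python
def wasserstein_1(p, q):
    """Wasserstein-1 distance between two histograms on the same unit-spaced support."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same support")
    total = 0.0
    cumulative_gap = 0.0
    for p_i, q_i in zip(p, q):
        cumulative_gap += p_i - q_i   # difference of the two CDFs at this bin
        total += abs(cumulative_gap)  # mass that must still be moved past this bin
    return total

# Hypothetical distributions over outcomes "solved in 1..6 guesses" and "failed".
observed = [0.00, 0.05, 0.25, 0.35, 0.20, 0.10, 0.05]
predicted = [0.00, 0.05, 0.20, 0.35, 0.25, 0.10, 0.05]
gap = wasserstein_1(observed, predicted)
```

Unlike a bin-by-bin error, this distance grows with how far probability mass has to be moved, so predicting "4 guesses" when the truth is "3 guesses" is penalized less than predicting "6 guesses".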


[14] 2412.14343

Revisiting the Nowosiółka skull with RMaCzek

One of the first fully quantitative distance matrix visualization methods was proposed by Jan Czekanowski at the beginning of the previous century. Recently, a software package, RMaCzek, was made available that allows for producing such diagrams in R. Here we reanalyze the original data that Czekanowski used for introducing his method, and in the accompanying code show how the user can specify their own custom distance functions in the package.


[15] 2412.14350

Gaussian-convolution-invariant shell approximation to spherically-symmetric functions

We develop a class of functions Omega_N(x; mu, nu) in N-dimensional space that are concentrated around a spherical shell of radius mu and whose convolution with an isotropic Gaussian function preserves their functional form, changing only the value of the 'width' parameter, nu. Isotropic Gaussian functions are the particular case of Omega_N(x; mu, nu) corresponding to mu = 0. Owing to these features, the functions are an efficient tool for building approximations to smooth, continuous, spherically symmetric functions, including oscillating ones. Atomic images in limited-resolution maps of the electron density, electrostatic scattering potential and other scalar fields studied in physics, chemistry, biology, and other natural sciences are examples of such functions. We give simple analytic expressions of Omega_N(x; mu, nu) for N = 1, 2, 3 and analyze properties of these functions. Representing oscillating functions by a sum of Omega_N(x; mu, nu) allows distorted maps to be calculated at the same cost as the respective theoretical fields. We give practical examples of such a representation for the interference functions of the uniform unit spheres for N = 1, 2, 3 that define the resolution of the respective images. Using the chain rule and analytic expressions for the derivatives of Omega_N(x; mu, nu) simplifies the refinement of the parameters of the models that describe these fields.
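
The invariance property is easiest to see in the Gaussian special case (mu = 0). The identity below is the standard result that convolving two normalized isotropic Gaussians yields another Gaussian whose width parameters add in quadrature. It assumes that nu plays the role of a standard deviation, which the abstract does not state, and it does not reproduce the general expression for Omega_N with mu > 0.

```latex
% Assumption: \nu is the standard deviation of a normalized isotropic Gaussian in N dimensions.
g_N(\mathbf{x}; \nu) = \frac{1}{(2\pi\nu^2)^{N/2}}
    \exp\!\left(-\frac{|\mathbf{x}|^2}{2\nu^2}\right),
\qquad
\bigl(g_N(\cdot\,; \nu_1) * g_N(\cdot\,; \nu_2)\bigr)(\mathbf{x})
    = g_N\!\left(\mathbf{x}; \sqrt{\nu_1^2 + \nu_2^2}\right).
```

The functional form is unchanged and only the width is updated, which is the behaviour the abstract describes for the full shell family.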


[16] 2412.14428

WildSAT: Learning Satellite Image Representations from Wildlife Observations

What does the presence of a species reveal about a geographic location? We posit that habitat, climate, and environmental preferences reflected in species distributions provide a rich source of supervision for learning satellite image representations. We introduce WildSAT, which pairs satellite images with millions of geo-tagged wildlife observations readily-available on citizen science platforms. WildSAT uses a contrastive learning framework to combine information from species distribution maps with text descriptions that capture habitat and range details, alongside satellite images, to train or fine-tune models. On a range of downstream satellite image recognition tasks, this significantly improves the performance of both randomly initialized models and pre-trained models from sources like ImageNet or specialized satellite image datasets. Additionally, the alignment with text enables zero-shot retrieval, allowing for search based on general descriptions of locations. We demonstrate that WildSAT achieves better representations than recent methods that utilize other forms of cross-modal supervision, such as aligning satellite images with ground images or wildlife photos. Finally, we analyze the impact of various design choices on downstream performance, highlighting the general applicability of our approach.


[17] 2412.14572

Accelerated Patient-Specific Calibration via Differentiable Hemodynamics Simulations

One of the goals of personalized medicine is to tailor diagnostics to individual patients. Diagnostics are performed in practice by measuring quantities, called biomarkers, that indicate the existence and progress of a disease. In common cardiovascular diseases, such as hypertension, biomarkers that are closely related to the clinical representation of a patient can be predicted using computational models. Personalizing computational models translates to considering patient-specific flow conditions, for example, the compliance of blood vessels that cannot be a priori known and quantities such as the patient geometry that can be measured using imaging. Therefore, a patient is identified by a set of measurable and nonmeasurable parameters needed to well-define a computational model; otherwise, the computational model is not personalized, meaning it is prone to large prediction errors. Hence, to personalize a computational model, sufficient information needs to be extracted from the data. The current methods by which this is done are either inefficient, due to relying on slow-converging optimization methods, or hard to interpret, due to using 'black-box' deep-learning algorithms. We propose a personalized diagnostic procedure based on a differentiable 0D-1D Navier-Stokes reduced order model solver and fast parameter inference methods that take advantage of gradients through the solver. By providing a faster method for performing parameter inference and sensitivity analysis through differentiability while maintaining the interpretability of well-understood mathematical models and numerical methods, the best of both worlds is combined. The performance of the proposed solver is validated against a well-established process on different geometries, and different parameter inference processes are successfully performed.


[18] 2412.14674

Paradoxical non-Gaussian behavior in fractional Laplace motion with drift

We study fractional Laplace motion (FLM) obtained from subordination of fractional Brownian motion to a gamma process, in the presence of an external drift that acts on the composite process or of an internal drift acting solely on the parental process. We derive the statistical properties of this FLM process and find that the external drift does not influence the mean-squared displacement (MSD), whereas the internal drift leads to normal diffusion, dominating at long times in the subdiffusive Hurst exponent regime. We also investigate the intricate properties of the probability density function (PDF), demonstrating that it possesses a central Gaussian region, whose expansion in time is influenced by FBM's Hurst exponent. Outside of this region the PDF follows a non-Gaussian pattern. The kurtosis of this FLM process converges toward the Gaussian limit at long times insensitive to the extreme non-Gaussian tails. Additionally, in the presence of the external drift, the PDF remains symmetric and centered at $x=vt$. In contrast, for the internal drift this symmetry is broken. The results of our computer simulations are fully consistent with the theoretical predictions. The FLM model is suitable for describing stochastic processes with a non-Gaussian PDF and long-ranged correlations of the motion.