New articles on Quantitative Biology


[1] 2411.05017

Time in a bottle. A psychophysics study of human time perception through aging

Time perception is crucial for a coherent human experience. As life progresses, our perception of the passage of time becomes increasingly non-uniform, often feeling as though it accelerates with age. While various causes for this phenomenon have been theorized, a comprehensive mathematical and theoretical framework remains underexplored. This study aims to elucidate the mechanisms behind perceived time dilation by integrating classical and revised psychophysical theorems with a novel mathematical approach. Utilizing Weber-Fechner laws as foundational elements, we develop a model that transitions from exponential to logarithmic functions to represent changes in time perception across the human lifespan. Our results indicate that the perception of time shifts significantly around the age of mental maturity, aligning with a proposed inversion point where sensitivity to temporal stimuli decreases, eventually plateauing out at a constant rate. This model not only explains the underlying causes of time perception changes but also provides analytical values to quantify this acceleration. These findings offer valuable insights into the cognitive and neurological processes influencing how we experience time as we go through life.
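
For reference, the classical Weber-Fechner relations that the abstract names as its foundation can be written as below. This is only the textbook form. The symbols $I$, $I_0$, $k$, and $c$ are notation chosen here, and the paper's own exponential-to-logarithmic lifespan model is not reproduced.

```latex
\frac{\Delta I}{I} = k \quad \text{(Weber's law)},
\qquad
S = c \,\ln\!\frac{I}{I_0} \quad \text{(Fechner's law)}
```

Here $\Delta I$ is the just-noticeable change in a stimulus of intensity $I$, $S$ is the perceived magnitude, and $I_0$ is the detection threshold.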


[2] 2411.05030

EAP4EMSIG -- Experiment Automation Pipeline for Event-Driven Microscopy to Smart Microfluidic Single-Cells Analysis

Microfluidic Live-Cell Imaging (MLCI) generates high-quality data that allows biotechnologists to study cellular growth dynamics in detail. However, obtaining these continuous data over extended periods is challenging, particularly in achieving accurate and consistent real-time event classification at the intersection of imaging and stochastic biology. To address this issue, we introduce the Experiment Automation Pipeline for Event-Driven Microscopy to Smart Microfluidic Single-Cells Analysis (EAP4EMSIG). In particular, we present initial zero-shot results from the real-time segmentation module of our approach. Our findings indicate that among four State-Of-The-Art (SOTA) segmentation methods evaluated, Omnipose delivers the highest Panoptic Quality (PQ) score of 0.9336, while Contour Proposal Network (CPN) achieves the fastest inference time of 185 ms with the second-highest PQ score of 0.8575. Furthermore, we observed that the vision foundation model Segment Anything is unsuitable for this particular use case.
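
The Panoptic Quality score quoted above is commonly defined as the summed IoU of matched segments divided by TP + 0.5 FP + 0.5 FN, where a predicted and a ground-truth segment match when their IoU exceeds 0.5. The following is a minimal sketch of that standard definition, assuming each instance mask is given as a set of pixel coordinates. The function names and the toy masks are hypothetical and are not taken from the EAP4EMSIG pipeline.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def panoptic_quality(pred_masks, gt_masks, threshold=0.5):
    """PQ = sum of matched IoUs / (TP + 0.5 * FP + 0.5 * FN)."""
    matched_ious = []
    matched_gt = set()
    for pred in pred_masks:
        for j, gt in enumerate(gt_masks):
            if j in matched_gt:
                continue  # each ground-truth segment is matched at most once
            value = iou(pred, gt)
            if value > threshold:
                matched_ious.append(value)
                matched_gt.add(j)
                break
    tp = len(matched_ious)
    fp = len(pred_masks) - tp  # predictions without a match
    fn = len(gt_masks) - tp    # ground-truth segments without a match
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom else 0.0


# Toy example: one good match (IoU 0.75), one spurious prediction, one missed cell.
gt = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
pred = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
pq = panoptic_quality(pred, gt)
print(pq)
```

In the toy example there is one true positive, one false positive, and one false negative, so the score is 0.75 / 2.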


[3] 2411.05055

Integrating Large Language Models for Genetic Variant Classification

The classification of genetic variants, particularly Variants of Uncertain Significance (VUS), poses a significant challenge in clinical genetics and precision medicine. Large Language Models (LLMs) have emerged as transformative tools in this realm. These models can uncover intricate patterns and predictive insights that traditional methods might miss, thus enhancing the predictive accuracy of genetic variant pathogenicity. This study investigates the integration of state-of-the-art LLMs, including GPN-MSA, ESM1b, and AlphaMissense, which leverage DNA and protein sequence data alongside structural insights to form a comprehensive analytical framework for variant classification. Our approach evaluates these integrated models using the well-annotated ProteinGym and ClinVar datasets, setting new benchmarks in classification performance. The models were rigorously tested on a set of challenging variants, demonstrating substantial improvements over existing state-of-the-art tools, especially in handling ambiguous and clinically uncertain variants. The results of this research underline the efficacy of combining multiple modeling approaches to significantly refine the accuracy and reliability of genetic variant classification systems. These findings support the deployment of these advanced computational models in clinical environments, where they can significantly enhance the diagnostic processes for genetic disorders, ultimately pushing the boundaries of personalized medicine by offering more detailed and actionable genetic insights.


[4] 2411.05213

A chemostat model with variable dilution rate due to biofilm growth

In many real-life applications, a continuous culture bioreactor may cease to function properly due to bioclogging, which is typically caused by microbial overgrowth. This problem has been largely overlooked in the chemostat modeling literature, even though a number of models explicitly account for biofilm development inside the bioreactor. In a typical chemostat model, the physical volume of the biofilm is considered negligible compared to the volume of the fluid. In this paper, we investigate the theoretical consequences of removing this assumption. Specifically, we formulate a novel mathematical model of a chemostat where the increase of the biofilm volume occurs at the expense of the fluid volume of the bioreactor, and as a result the corresponding dilution rate increases reciprocally. We show that our model is well-posed and describes a bioreactor that can operate in three distinct types of dynamic regimes: the washout equilibrium, the coexistence equilibrium, or a transient towards the clogged state, which is reached in finite time. We analyze the multiplicity and the stability of the corresponding equilibria. In particular, we delineate the parameter combinations for which the chemostat never clogs up and those for which it clogs up in finite time. We also derive criteria for microbial persistence and extinction. Finally, we present numerical evidence that multistable coexistence in the chemostat with variable dilution rate is feasible.
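
One way to read the reciprocal dependence described above is sketched below. It assumes a constant volumetric inflow $F$, a fixed total reactor volume $V_{\mathrm{tot}}$, and a biofilm volume $V_b(t)$. All three symbols are assumptions made here and are not the paper's notation.

```latex
D(t) = \frac{F}{V_{\mathrm{tot}} - V_b(t)},
\qquad
D(t) \to \infty \ \text{as}\ V_b(t) \to V_{\mathrm{tot}}
```

Under this reading, the clogged state corresponds to the fluid volume reaching zero.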


[5] 2411.05244

Nonperfused Retinal Capillaries -- A New Method Developed on OCT and OCTA

To develop a new method to quantify nonperfused retinal capillaries (NPCs) by using co-registered optical coherence tomography (OCT) and OCT angiography (OCTA), and to evaluate NPCs in eyes with age-related macular degeneration (AMD) and diabetic retinopathy (DR). Multiple consecutive 3x3-mm OCT/OCTA scans were obtained using a commercial device (Solix; Visionix/Optovue, Inc., California, USA). We averaged multiple registered OCT/OCTA scans to create high-definition volumes. The deep capillary plexus slab was defined and segmented. A novel deep learning denoising algorithm removed tissue background noise from capillaries in the en face OCT/OCTA. The algorithm segmented NPCs by identifying capillaries from OCT without corresponding flow signals in the OCTA. We then investigated the relationships between NPCs and known features in AMD and DR. The denoised en face OCT/OCTA revealed the structure and flow of the capillaries. The automatically segmented NPC achieved an accuracy of 88.2% compared to manual grading of DR. Compared to healthy controls, both the mean number and total length (mm) of NPCs were significantly increased in eyes with AMD and eyes with DR (P < 0.001). Compared to early and intermediate AMD, the number and total length of NPCs were significantly higher in advanced AMD (number: P < 0.001, P < 0.001; total length: P = 0.002, P = 0.003). Geographic atrophy, macular neovascularization, drusen volume, and extrafoveal avascular area (EAA) significantly correlated with increased NPCs (P < 0.05). In eyes with DR, NPCs correlated with the number of microaneurysms and EAA (P < 0.05). The presence of fluid did not significantly correlate with NPCs in AMD and DR. Conclusions: A deep learning-based algorithm can segment and quantify retinal capillaries that lack flow using colocalized OCT/OCTA. This novel biomarker may be useful in AMD and DR.


[6] 2411.05266

Use of 3D chaos game representation to quantify DNA sequence similarity with applications for hierarchical clustering

A 3D chaos game is shown to be a useful way to encode DNA sequences. Since matching subsequences in DNA converge in space in 3D chaos game encoding, a DNA sequence's 3D chaos game representation can be used to compare DNA sequences without prior alignment and without truncating or padding any of the sequences. Two proposed methods inspired by shape-similarity comparison techniques show that this form of encoding can perform as well as alignment-based techniques for building phylogenetic trees. The first method uses the volume overlap of intersecting spheres and the second uses shape signatures by summarizing the coordinates, oriented angles, and oriented distances of the 3D chaos game trajectory. The methods are tested using: (1) the first exon of the beta-globin gene for 11 species, (2) mitochondrial DNA from four groups of primates, and (3) a set of synthetic DNA sequences. Simulations show that the proposed methods produce distances that reflect the number of mutation events; additionally, on average, distances resulting from deletion mutations are comparable to those produced by substitution mutations.
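
The convergence property mentioned above can be illustrated with a minimal chaos game sketch. It assumes the four nucleotides sit at the vertices of a regular tetrahedron and that each step moves halfway from the current point toward the vertex of the next nucleotide. The vertex assignment, the starting point, and the function names are assumptions for illustration and may differ from the encoding used in the paper.

```python
import math

# Hypothetical vertex assignment: alternating corners of a cube form a regular tetrahedron.
VERTICES = {
    "A": (1.0, 1.0, 1.0),
    "C": (-1.0, -1.0, 1.0),
    "G": (-1.0, 1.0, -1.0),
    "T": (1.0, -1.0, -1.0),
}


def chaos_game_3d(sequence, start=(0.0, 0.0, 0.0)):
    """Return the list of 3D points visited while reading the sequence."""
    point = start
    trajectory = []
    for base in sequence.upper():
        vertex = VERTICES[base]  # raises KeyError for ambiguous bases such as N
        point = tuple((p + v) / 2.0 for p, v in zip(point, vertex))
        trajectory.append(point)
    return trajectory


def endpoint_distance(seq_a, seq_b):
    """Euclidean distance between the final chaos game points of two sequences."""
    return math.dist(chaos_game_3d(seq_a)[-1], chaos_game_3d(seq_b)[-1])


# Two sequences of equal length that differ only in their first four bases.
print(endpoint_distance("GGGGACGT", "TTTTACGT"))
# Extending the shared suffix halves the distance with every additional base.
print(endpoint_distance("GGGGACGTACGT", "TTTTACGTACGT"))
```

Because each step halves the gap between two trajectories that read the same base, sequences ending in a long common subsequence land close together regardless of what preceded it.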


[7] 2411.05329

Electro-diffusive modeling and the role of spine geometry on action potential propagation in neurons

Electrical signaling in the brain plays a vital role in our existence, yet the fundamental mechanism of its propagation remains undeciphered. Notable advances have been made in numerical modeling that supplements the related experimental findings. Cable-theory-based models provided a significant breakthrough in understanding the mechanism of electrical propagation in neuronal axons. Cable theory, however, fails for thin geometries such as a spine or a dendrite of a neuron, among its other limitations. Recently, the spatiotemporal propagation has been precisely modeled using the Poisson-Nernst-Planck (PNP) electro-diffusive theory in both neuronal axons and dendritic spines. Patch clamp and voltage imaging experiments have extensively aided the study of action potential propagation in neuronal axons but not in dendritic spines, because of the challenges linked with their thin geometry. With the aid of super-resolution microscopes and voltage dyeing experiments, it has become possible to precisely measure the voltage in the dendritic spines. This has created the need for a high-fidelity numerical framework that is capable of acting as a digital twin. Here, using the PNP theory, we integrate the dendritic spine, soma and axon regions to numerically model the propagation of excitatory synaptic potential in a complete neuronal geometry, with the synaptic input at the spines and the potential initiating at the axon hillock and propagating through the neuronal axon. The model outputs the forward propagation of the action potential along the neuronal axons as well as the back propagation into the spines. We point out the significance of the intricate geometry of the dendritic spines, namely the spine neck length and radius, and of the ion channel density in the axon hillock, to action potential initiation and propagation.


[8] 2411.05371

BayesianFitForecast: A User-Friendly R Toolbox for Parameter Estimation and Forecasting with Ordinary Differential Equations

Background: Mathematical models based on ordinary differential equations (ODEs) are essential tools across various scientific disciplines, including biology, ecology, and healthcare informatics. They are used to simulate complex dynamic systems and inform decision-making. In this paper, we introduce BayesianFitForecast, an R toolbox specifically developed to streamline Bayesian parameter estimation and forecasting in ODE models, making it particularly relevant to health informatics and public health decision-making. The toolbox is available at https://github.com/gchowell/BayesianFitForecast/. Results: This toolbox enables automatic generation of Stan files, allowing users to configure models, define priors, and analyze results with minimal programming expertise. To demonstrate the versatility and robustness of BayesianFitForecast, we apply it to the analysis of the 1918 influenza pandemic in San Francisco, comparing Poisson and negative binomial error structures within the SEIR model. We also test it by fitting multiple time series of state variables using simulated data. BayesianFitForecast provides robust tools for evaluating model performance, including convergence diagnostics, posterior distributions, credible intervals, and performance metrics. Conclusion: By improving the accessibility of advanced Bayesian methods, this toolbox significantly broadens the application of Bayesian inference methods to dynamical systems critical for healthcare and epidemiological forecasting. A tutorial video demonstrating the toolbox's functionality is available at https://youtu.be/jnxMjz3V3n8.
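
For readers unfamiliar with the SEIR model named above, the following minimal sketch integrates the standard SEIR equations forward in time with a simple Euler scheme. It is not the BayesianFitForecast toolbox (which is written in R and generates Stan code), it performs no Bayesian estimation, and the parameter values are hypothetical illustration values rather than estimates for the 1918 San Francisco data.

```python
def seir_derivatives(state, beta, kappa, gamma):
    """Right-hand side of the SEIR equations for state = (S, E, I, R)."""
    s, e, i, r = state
    n = s + e + i + r
    new_infections = beta * s * i / n  # flow from S to E
    return (
        -new_infections,
        new_infections - kappa * e,  # E progresses to I at rate kappa
        kappa * e - gamma * i,       # I recovers at rate gamma
        gamma * i,
    )


def simulate_seir(initial_state, beta, kappa, gamma, dt, n_steps):
    """Forward-Euler integration; returns all states, including the initial one."""
    trajectory = [tuple(initial_state)]
    for _ in range(n_steps):
        state = trajectory[-1]
        deriv = seir_derivatives(state, beta, kappa, gamma)
        trajectory.append(tuple(x + dt * dx for x, dx in zip(state, deriv)))
    return trajectory


# Hypothetical values: 1000 people, one initial case, 100 days in steps of 0.1 day.
trajectory = simulate_seir(
    (999.0, 0.0, 1.0, 0.0), beta=0.6, kappa=0.5, gamma=0.25, dt=0.1, n_steps=1000
)
final_s, final_e, final_i, final_r = trajectory[-1]
print(f"final susceptible: {final_s:.1f}, final recovered: {final_r:.1f}")
```

In a fitting workflow, a forward solver of this kind would be evaluated repeatedly inside the sampler, with an observation model such as Poisson or negative binomial linking the simulated incidence to the reported counts.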


[9] 2411.05502

Infection Pressure on Fish in Cages

We address the question of how to connect predictions by hydrodynamic models of how sea lice move in water to observable measures that count the number of lice on each fish in a cage in the water. This question is important for management and regulation of aquacultural practice that tries to maximise food production and minimise risk to the environment. We do this through a simple rule-based model of interaction between sea lice and caged fish. The model is simple: sea lice can attach and detach from a fish. The model has a novel feature, encoding what is known as a master equation producing a time-series of distributions of lice on fish that one might expect to find if a cage full of fish were placed at any given location. To demonstrate how this works, and to arrive at a rough estimate of the interaction rates, we fit a simplified version of the model with three free parameters to publicly available data about an experiment with sentinel cages in Loch Linnhe in Scotland. Our construction, coupled to the hydrodynamic models driven by surveillance data from industrial farms, quantifies the environmental impact as: what would the infection burden look like in a notional cage at any location and how does it change with time?
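
The attach-and-detach rule and the resulting master equation can be sketched as follows. This is a generic linear birth-death process, assuming a constant attachment rate per fish and an independent detachment rate per louse. It is not the authors' model or parameterization, and the rate values, the truncation at `n_max`, and the function names are hypothetical.

```python
def master_equation_step(p, attach_rate, detach_rate, dt):
    """One Euler step of dP_n/dt for the probability of n lice on a fish."""
    n_max = len(p) - 1
    new_p = []
    for n in range(n_max + 1):
        gain = 0.0
        loss = detach_rate * n * p[n]  # one of the n lice detaches
        if n > 0:
            gain += attach_rate * p[n - 1]  # a louse attaches to a fish carrying n - 1
        if n < n_max:
            gain += detach_rate * (n + 1) * p[n + 1]  # a louse detaches from n + 1
            loss += attach_rate * p[n]  # attachment is blocked at the truncation n_max
        new_p.append(p[n] + dt * (gain - loss))
    return new_p


def evolve(p0, attach_rate, detach_rate, dt, n_steps):
    """Repeatedly apply the Euler step to an initial distribution."""
    p = list(p0)
    for _ in range(n_steps):
        p = master_equation_step(p, attach_rate, detach_rate, dt)
    return p


def mean_lice(p):
    """Expected number of lice per fish under distribution p."""
    return sum(n * pn for n, pn in enumerate(p))


# Every fish starts clean; hypothetical rates give a long-run mean of attach / detach.
n_max = 40
p0 = [1.0] + [0.0] * n_max
p = evolve(p0, attach_rate=2.0, detach_rate=0.5, dt=0.01, n_steps=2000)
print(sum(p), mean_lice(p))
```

For this simple process the distribution relaxes toward a Poisson distribution with mean equal to the ratio of the attachment rate to the detachment rate, which gives a quick sanity check on the integration.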


[10] 2411.05543

Evolution of cooperation in a three-strategy game combining snowdrift and stag hunt games

This study aimed to investigate the evolutionary dynamics of a three-strategy game that combines snowdrift and stag hunt games. This game is motivated by an experimental study, which found that the individual solution lowers cooperation levels. Agents adopting this option aim to address a problem to the extent necessary to remove the negative impact on themselves, although they do not free ride on the cooperative effort provided by others. This property of the individual solution is similar to that of the defection option in the stag hunt. Thus, the role of the interplay of defection in the snowdrift game and the individual solution was examined in this study. The well-mixed population has two asymptotically stable rest points, one wherein the individual solution occupies the population, and the other wherein cooperation and defection coexist. The interactions on a square lattice enlarge the parameter region wherein cooperation survives, and the three strategies often coexist. Scrutiny of the evolutionary process shows that multiple mechanisms lead to the coexistence of the three strategies, depending on parameter values. Our analysis suggests that considering the individual solution adds complexity to the evolutionary process, which might contribute to our understanding of the evolution of cooperation.


[11] 2411.05673

Relationships between the degrees of freedom in the affine Gaussian derivative model for visual receptive fields and 2-D affine image transformations, with application to covariance properties of simple cells in the primary visual cortex

When observing the surface patterns of objects delimited by smooth surfaces, the projections of the surface patterns to the image domain will be subject to substantial variabilities, as induced by variabilities in the geometric viewing conditions, and as generated by either monocular or binocular imaging conditions, or by relative motions between the object and the observer over time. To first order of approximation, the image deformations of such projected surface patterns can be modelled as local linearizations in terms of local 2-D spatial affine transformations. This paper presents a theoretical analysis of relationships between the degrees of freedom in 2-D spatial affine image transformations and the degrees of freedom in the affine Gaussian derivative model for visual receptive fields. For this purpose, we first describe a canonical decomposition of 2-D affine transformations on a product form, closely related to a singular value decomposition, while in closed form, and which reveals the degrees of freedom in terms of (i) uniform scaling transformations, (ii) an overall amount of global rotation, (iii) a complementary non-uniform scaling transformation and (iv) a relative normalization to a preferred symmetry orientation in the image domain. Then, we show how these degrees of freedom relate to the degrees of freedom in the affine Gaussian derivative model. Finally, we use these theoretical results to consider whether we could regard the biological receptive fields in the primary visual cortex of higher mammals as being able to span the degrees of freedom of 2-D spatial affine transformations, based on interpretations of existing neurophysiological experimental results.


[12] 2411.03871

Safe Paths and Sequences for Scalable ILPs in RNA Transcript Assembly Problems

A common step at the core of many RNA transcript assembly tools is to find a set of weighted paths that best explain the weights of a DAG. While such problems easily become NP-hard, scalable solvers exist only for a basic error-free version of this problem, namely minimally decomposing a network flow into weighted paths. The main result of this paper is to show that we can achieve speedups of two orders of magnitude also for path-finding problems in the realistic setting (i.e., the weights do not induce a flow). We obtain these by employing the safety information that is encoded in the graph structure inside Integer Linear Programming (ILP) solvers for these problems. We first characterize the paths that appear in all path covers of the DAG, generalizing a graph reduction commonly used in the error-free setting (e.g. by Kloster et al. [ALENEX 2018]). Secondly, following the work of Ma, Zheng and Kingsford [RECOMB 2021], we characterize the sequences of arcs that appear in all path covers of the DAG. We experiment with a path-finding ILP model (least squares) and with a more recent and accurate one. We use a variety of datasets originally created by Shao and Kingsford [TCBB, 2017], as well as graphs built from sequencing reads by the state-of-the-art tool for long-read transcript discovery, IsoQuant [Prjibelski et al., Nat. Biotechnology 2023]. The ILPs armed with safe paths or sequences exhibit significant speed-ups over the original ones. On graphs with a large width, average speed-ups are in the range $50-160\times$ in the latter ILP model and in the range $100-1000\times$ in the least squares model. Our scaling techniques apply to any ILP whose solution paths are a path cover of the arcs of the DAG. As such, they can become a scalable building block of practical RNA transcript assembly tools, avoiding heuristic trade-offs currently needed on complex graphs.


[13] 2411.05028

Leveraging Transfer Learning and Multiple Instance Learning for HER2 Automatic Scoring of H&E Whole Slide Images

Expression of human epidermal growth factor receptor 2 (HER2) is an important biomarker in breast cancer patients who can benefit from cost-effective automatic Hematoxylin and Eosin (H&E) HER2 scoring. However, developing such scoring models requires large pixel-level annotated datasets. Transfer learning allows prior knowledge from different datasets to be reused while multiple-instance learning (MIL) allows the lack of detailed annotations to be mitigated. The aim of this work is to examine the potential of transfer learning on the performance of deep learning models pre-trained on (i) Immunohistochemistry (IHC) images, (ii) H&E images and (iii) non-medical images. A MIL framework with an attention mechanism is developed using pre-trained models as patch-embedding models. It was found that embedding models pre-trained on H&E images consistently outperformed the others, resulting in an average AUC-ROC value of $0.622$ across the 4 HER2 scores ($0.59-0.80$ per HER2 score). Furthermore, it was found that using multiple-instance learning with an attention layer not only allows for good classification results to be achieved, but it can also help with producing visual indication of HER2-positive areas in the H&E slide image by utilising the patch-wise attention weights.
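
The role of the patch-wise attention weights can be illustrated with a minimal attention-pooling sketch. It assumes each patch already has an embedding vector from a pre-trained model and scores each patch as the dot product of a weight vector with the element-wise tanh of the embedding, which is a simplification of common attention-based MIL layers. The scoring form, the function name, and the toy numbers are assumptions and do not reproduce the paper's architecture.

```python
import math


def attention_pool(embeddings, attention_vector):
    """Return (slide_embedding, weights) for a bag of patch embeddings."""
    # Score each patch; tanh keeps the scores bounded before weighting.
    scores = [
        sum(w * math.tanh(h) for w, h in zip(attention_vector, emb))
        for emb in embeddings
    ]
    # Numerically stable softmax over the patches in the slide.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Slide-level embedding is the attention-weighted sum of patch embeddings.
    dim = len(embeddings[0])
    slide = [sum(w * emb[k] for w, emb in zip(weights, embeddings)) for k in range(dim)]
    return slide, weights


# Toy bag of two patches with two-dimensional embeddings.
patches = [[1.0, 0.0], [0.0, 1.0]]
uniform_slide, uniform_weights = attention_pool(patches, [0.0, 0.0])
focused_slide, focused_weights = attention_pool(patches, [10.0, 0.0])
print(uniform_weights, focused_weights)
```

Mapping the weights back to patch locations is what yields a heatmap-style visual indication: patches with weights close to one dominate the slide-level prediction.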


[14] 2411.05188

AGE2HIE: Transfer Learning from Brain Age to Predicting Neurocognitive Outcome for Infant Brain Injury

Hypoxic-Ischemic Encephalopathy (HIE) affects 1 to 5 out of every 1,000 newborns, with 30% to 50% of cases resulting in adverse neurocognitive outcomes. However, these outcomes can only be reliably assessed as early as age 2. Therefore, early and accurate prediction of HIE-related neurocognitive outcomes using deep learning models is critical for improving clinical decision-making, guiding treatment decisions and assessing novel therapies. However, a major challenge in developing deep learning models for this purpose is the scarcity of large, annotated HIE datasets. We have assembled the first and largest public dataset; however, it contains only 156 cases with 2-year neurocognitive outcome labels. In contrast, we have collected 8,859 normal brain black Magnetic Resonance Imagings (MRIs) with 0-97 years of age that are available for brain age estimation using deep learning models. In this paper, we introduce AGE2HIE to transfer knowledge learned by deep learning models from healthy-control brain MRIs to a diseased cohort, from structural to diffusion MRIs, from regression of continuous age estimation to prediction of the binary neurocognitive outcomes, and from lifespan age (0-97 years) to infant (0-2 weeks). Compared to training from scratch, transfer learning from brain age estimation significantly improves not only the prediction accuracy (3% or 2% improvement in same or multi-site), but also the model generalization across different sites (5% improvement in cross-site validation).


[15] 2411.05237

Pruning the Path to Optimal Care: Identifying Systematically Suboptimal Medical Decision-Making with Inverse Reinforcement Learning

Aiming to uncover insights into medical decision-making embedded within observational data from clinical settings, we present a novel application of Inverse Reinforcement Learning (IRL) that identifies suboptimal clinician actions based on the actions of their peers. This approach centers on two stages of IRL with an intermediate step to prune trajectories displaying behavior that deviates significantly from the consensus. This enables us to effectively identify clinical priorities and values from ICU data containing both optimal and suboptimal clinician decisions. We observe that the benefits of removing suboptimal actions vary by disease and differentially impact certain demographic groups.


[16] 2411.05316

Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation

Latent representation alignment has become a foundational technique for constructing multimodal large language models (MLLM) by mapping embeddings from different modalities into a shared space, often aligned with the embedding space of large language models (LLMs) to enable effective cross-modal understanding. While preliminary protein-focused MLLMs have emerged, they have predominantly relied on heuristic approaches, lacking a fundamental understanding of optimal alignment practices across representations. In this study, we explore the alignment of multimodal representations between LLMs and Geometric Deep Models (GDMs) in the protein domain. We comprehensively evaluate three state-of-the-art LLMs (Gemma2-2B, LLaMa3.1-8B, and LLaMa3.1-70B) with four protein-specialized GDMs (GearNet, GVP, ScanNet, GAT). Our work examines alignment factors from both model and protein perspectives, identifying challenges in current alignment methodologies and proposing strategies to improve the alignment process. Our key findings reveal that GDMs incorporating both graph and 3D structural information align better with LLMs, larger LLMs demonstrate improved alignment capabilities, and protein rarity significantly impacts alignment performance. We also find that increasing GDM embedding dimensions, using two-layer projection heads, and fine-tuning LLMs on protein-specific data substantially enhance alignment quality. These strategies offer potential enhancements to the performance of protein-related multimodal models. Our code and data are available at https://github.com/Tizzzzy/LLM-GDM-alignment.


[17] 2411.05439

A note on the periodic orbits of Wolbachia spread dynamics in mosquito populations in periodic environments

We consider the periodic model introduced in [20] and disprove the conjectures on the number of periodic orbits the model can have. We rebuild the conjecture to prove that for periodic sequences of maps of any period, the number of non-zero periodic trajectories is bounded by two.


[18] 2411.05450

Analysing control-theoretic properties of nonlinear synthetic biology circuits

Synthetic biology is a recent area of biological engineering, whose aim is to provide cells with novel functionalities. A number of important results regarding the development of control circuits in synthetic biology have been achieved during the last decade. A differential geometry approach can be used for the analysis of said systems, which are often nonlinear. Here we demonstrate the application of such tools to analyse the structural identifiability, observability, accessibility, and controllability of several biomolecular systems. We focus on a set of synthetic circuits of current interest, which can perform several tasks, both in open loop and closed loop settings. We analyse their properties with our own methods and tools; further, we describe a new open-source implementation of the techniques.


[19] 2411.05712

Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream

When trained on large-scale object classification datasets, certain artificial neural network models begin to approximate core object recognition (COR) behaviors and neural response patterns in the primate visual ventral stream (VVS). While recent machine learning advances suggest that scaling model size, dataset size, and compute resources improve task performance, the impact of scaling on brain alignment remains unclear. In this study, we explore scaling laws for modeling the primate VVS by systematically evaluating over 600 models trained under controlled conditions on benchmarks spanning V1, V2, V4, IT and COR behaviors. We observe that while behavioral alignment continues to scale with larger models, neural alignment saturates. This observation remains true across model architectures and training datasets, even though models with stronger inductive bias and datasets with higher-quality images are more compute-efficient. Increased scaling is especially beneficial for higher-level visual areas, where small models trained on few samples exhibit only poor alignment. Finally, we develop a scaling recipe, indicating that a greater proportion of compute should be allocated to data samples over model size. Our results suggest that while scaling alone might suffice for alignment with human core object recognition behavior, it will not yield improved models of the brain's visual ventral stream with current architectures and datasets, highlighting the need for novel strategies in building brain-like models.