New articles on Quantitative Biology


[1] 2505.09624

Neurophysiologically Realistic Environment for Comparing Adaptive Deep Brain Stimulation Algorithms in Parkinson Disease

Adaptive deep brain stimulation (aDBS) has emerged as a promising treatment for Parkinson disease (PD). In aDBS, a surgically placed electrode sends dynamically altered stimuli to the brain based on neurophysiological feedback. Because the device is invasive, the amount of data that can be collected for optimizing the control offline is limited. As a consequence, a plethora of synthetic models of PD and of the control algorithms have been proposed. Herein, we introduce the first neurophysiologically realistic benchmark for comparing these models. Specifically, our methodology covers not only conventional basal ganglia circuit dynamics and pathological oscillations, but also captures 15 previously dismissed physiological attributes, such as signal instabilities and noise, neural drift, electrode conductance changes and individual variability, all modeled as spatially distributed and temporally registered features via beta-band activity in the brain and a feedback signal. Furthermore, we purposely built our framework as a structured environment for training and evaluating deep reinforcement learning (RL) algorithms, opening new possibilities for optimizing aDBS control strategies and inviting the machine learning community to contribute to the emerging field of intelligent neurostimulation interfaces.


[2] 2505.09630

Generative diffusion model surrogates for mechanistic agent-based biological models

Mechanistic, multicellular, agent-based models are commonly used to investigate tissue, organ, and organism-scale biology at single-cell resolution. The Cellular-Potts Model (CPM) is a powerful and popular framework for developing and interrogating these models. CPMs become computationally expensive at large spatial and temporal scales, making application and investigation of developed models difficult. Surrogate models may allow for the accelerated evaluation of CPMs of complex biological systems. However, the stochastic nature of these models means each set of parameters may give rise to different model configurations, complicating surrogate model development. In this work, we leverage denoising diffusion probabilistic models (DDPMs) to train a generative AI surrogate of a CPM used to investigate in vitro vasculogenesis. We describe the use of an image classifier to learn the characteristics that define unique areas of a 2-dimensional parameter space. We then apply this classifier to aid in surrogate model selection and verification. Our CPM surrogate generates model configurations 20,000 timesteps ahead of a reference configuration and demonstrates approximately a 22x reduction in computational time as compared to native code execution. Our work represents a step towards the implementation of DDPMs to develop digital twins of stochastic biological systems.


[3] 2505.09643

A Computational Approach to Epilepsy Treatment: An AI-optimized Global Natural Product Prescription System

Epilepsy is a prevalent neurological disease with millions of patients worldwide. Many patients have turned to alternative medicine due to the limited efficacy and side effects of conventional antiepileptic drugs. In this study, we developed a computational approach to optimize herbal epilepsy treatment through AI-driven analysis of global natural products and statistically validated randomized controlled trials (RCTs). Our intelligent prescription system combines machine learning (ML) algorithms for herb-efficacy characterization, Bayesian optimization for personalized dosing, and meta-analysis of RCTs for evidence-based recommendations. The system analyzed 1,872 natural compounds from traditional Chinese medicine (TCM), Ayurveda, and ethnopharmacological databases, integrating their bioactive properties with clinical outcomes from 48 RCTs covering 48 epilepsy conditions (n=5,216). Using LASSO regression and SHAP value analysis, we identified 17 high-efficacy herbs (e.g., Gastrodia elata, Withania somnifera), showing significant seizure reduction (p<0.01, Cohen's d=0.89) with statistical significance confirmed by multiple testing (p<0.001). A randomized double-blind validation trial (n=120) demonstrated 28.5% greater seizure frequency reduction with AI-optimized herbal prescriptions compared to conventional protocols (95% CI: 18.7-37.3%, p=0.003).


[4] 2505.09646

Temporal Interception and Present Reconstruction: A Cognitive-Signal Model for Human and AI Decision Making

This paper proposes a novel theoretical model to explain how the human mind and artificial intelligence can approach real-time awareness by reducing perceptual delays. By investigating cosmic signal delay, neurological reaction times, and the ancient cognitive state of stillness, we explore how one may shift from reactive perception to a conscious interface with the near future. This paper introduces both a physical and cognitive model for perceiving the present not as a linear timestamp, but as an interference zone where early-arriving cosmic signals and reactive human delays intersect. We propose experimental approaches to test these ideas using human neural observation and neuro-receptive extensions. Finally, we propose a mathematical framework to guide the evolution of AI systems toward temporally efficient, ethically sound, and internally conscious decision-making processes.


[5] 2505.09656

VIGIL: Vision-Language Guided Multiple Instance Learning Framework for Ulcerative Colitis Histological Healing Prediction

Objective: Ulcerative colitis (UC), characterized by chronic inflammation with alternating remission-relapse cycles, requires precise histological healing (HH) evaluation to improve clinical outcomes. To overcome the limitations of annotation-intensive deep learning methods and suboptimal multi-instance learning (MIL) in HH prediction, we propose VIGIL, the first vision-language guided MIL framework integrating white light endoscopy (WLE) and endocytoscopy (EC). Methods: VIGIL begins with a dual-branch MIL module KS-MIL based on top-K typical frames selection and similarity metric adaptive learning to learn relationships among frame features effectively. By integrating the diagnostic report text and specially designed multi-level alignment and supervision between image-text pairs, VIGIL establishes joint image-text guidance during training to capture richer disease-related semantic information. Furthermore, VIGIL employs a multi-modal masked relation fusion (MMRF) strategy to uncover the latent diagnostic correlations of two endoscopic image representations. Results: Comprehensive experiments on a real-world clinical dataset demonstrate VIGIL's superior performance, achieving 92.69% accuracy and 94.79% AUC, outperforming existing state-of-the-art methods. Conclusion: The proposed VIGIL framework successfully establishes an effective vision-language guided MIL paradigm for UC HH prediction, reducing annotation burdens while improving prediction reliability. Significance: The research outcomes provide new insights for non-invasive UC diagnosis and hold theoretical significance and clinical value for advancing intelligent healthcare development.


[6] 2505.09664

KINDLE: Knowledge-Guided Distillation for Prior-Free Gene Regulatory Network Inference

Gene regulatory network (GRN) inference serves as a cornerstone for deciphering cellular decision-making processes. Early approaches rely exclusively on gene expression data, so their predictive power remains fundamentally constrained by the vast combinatorial space of potential gene-gene interactions. Subsequent methods integrate prior knowledge to mitigate this challenge by restricting the solution space to biologically plausible interactions. However, we argue that the effectiveness of these approaches is contingent upon the precision of prior information, and that the reduction in the search space will circumscribe the models' potential for novel biological discoveries. To address these limitations, we introduce KINDLE, a three-stage framework that decouples GRN inference from prior knowledge dependencies. KINDLE trains a teacher model that integrates prior knowledge with temporal gene expression dynamics and subsequently distills this encoded knowledge to a student model, enabling accurate GRN inference solely from expression data without access to any prior. KINDLE achieves state-of-the-art performance across four benchmark datasets. Notably, it successfully identifies key transcription factors governing mouse embryonic development and precisely characterizes their functional roles. In mouse hematopoietic stem cell data, KINDLE accurately predicts fate transition outcomes following knockout of two critical regulators (Gata1 and Spi1). These biological validations demonstrate our framework's dual capability in maintaining topological inference precision while preserving discovery potential for novel biological mechanisms.


[7] 2505.09761

Sequential Monte Carlo Squared for online inference in stochastic epidemic models

Effective epidemic modeling and surveillance require computationally efficient methods that can continuously update estimates as new data becomes available. This paper explores the application of an online variant of Sequential Monte Carlo Squared (O-SMC$^2$) to the stochastic Susceptible-Exposed-Infectious-Removed (SEIR) model for real-time epidemic tracking. The particularity of O-SMC$^2$ lies in its ability to update the parameters using a particle Metropolis-Hastings kernel, ensuring that the target distribution remains invariant while only utilizing a fixed window of recent observations. This feature enables timely parameter updates and significantly enhances computational efficiency compared to the standard SMC$^2$, which processes the entire dataset. First, we demonstrate the efficiency of O-SMC$^2$ on simulated data, where both the parameters and the observation process are known. We then apply the method to a real-world COVID-19 dataset from Ireland, successfully tracking the epidemic trajectory and estimating the time-dependent reproduction number of the disease. Our results show that O-SMC$^2$ provides highly accurate online estimates of both static and dynamic epidemiological parameters while substantially reducing computational costs. These findings highlight the potential of O-SMC$^2$ for real-time epidemic monitoring and supporting adaptive public health interventions.
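For readers unfamiliar with the latent model, the following is a minimal sketch of a discrete-time stochastic SEIR simulator of the kind a particle-based method would propagate forward. It is not the O-SMC$^2$ algorithm itself, and the abstract does not state the discretization used: the chain-binomial transitions, the daily time step, and the names simulate_seir, beta, sigma and gamma are all assumptions made for illustration.

```python
import math
import random


def simulate_seir(n_days, population, beta, sigma, gamma,
                  initial_infectious, seed=0):
    """Chain-binomial SEIR simulation with a daily time step (illustrative).

    beta, sigma and gamma are assumed per-day rates for transmission,
    onset of infectiousness and removal respectively.
    """
    rng = random.Random(seed)

    def binomial(n, p):
        # Sum of n Bernoulli(p) draws; adequate for small populations.
        return sum(1 for _ in range(n) if rng.random() < p)

    s = population - initial_infectious
    e, i, r = 0, initial_infectious, 0
    trajectory = [(s, e, i, r)]
    for _ in range(n_days):
        # Convert rates to per-step transition probabilities.
        p_exposure = 1.0 - math.exp(-beta * i / population)
        p_onset = 1.0 - math.exp(-sigma)
        p_removal = 1.0 - math.exp(-gamma)
        new_exposed = binomial(s, p_exposure)
        new_infectious = binomial(e, p_onset)
        new_removed = binomial(i, p_removal)
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        trajectory.append((s, e, i, r))
    return trajectory
```

In a particle filter, each particle would carry one such (S, E, I, R) state and be advanced one step per observation; the parameter-level particles and the windowed Metropolis-Hastings move described in the abstract sit on top of this and are not shown here.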


[8] 2505.09805

Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models

Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 records with 28 numerical and 119 categorical variables. Patient records were serialized into text with and without a clustering objective. Embeddings were generated using quantized LLAMA 3.1 8B, DeepSeek-R1-Distill-Llama-8B with low-rank adaptation (LoRA), and Stella-En-400M-V5 models. K-means clustering was applied to these embeddings. Classical comparisons included K-Medoids clustering on UMAP and FAMD-reduced mixed data. Silhouette scores and statistical tests evaluated cluster quality and distinctiveness. Stella-En-400M-V5 achieved the highest Silhouette Score (0.86). LLAMA 3.1 8B with the clustering objective performed better with a higher number of clusters, identifying subgroups with distinct nutritional, clinical, and socioeconomic profiles. LLM-based methods outperformed classical techniques by capturing richer context and prioritizing key features. These results highlight the potential of LLMs for contextual phenotyping and informed decision-making in resource-limited settings.
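Since the Silhouette Score is the headline metric here, a small pure-Python sketch of how it is computed may help in reading the reported value of 0.86. This is the standard definition with Euclidean distance, not the authors' code; the function name silhouette_score, the choice of distance, and the toy points in the usage note are illustrative assumptions.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for idx, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # by convention a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[idx], points[j])
                for j in own if j != idx) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[idx], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        denom = max(a, b)
        if denom > 0.0:
            total += (b - a) / denom
    return total / len(points)
```

For example, silhouette_score([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1]) is close to 1 because the two groups are compact and far apart, while assigning the labels [0, 1, 0, 1] to the same points gives a negative score.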


[9] 2505.09816

Slow Transition to Low-Dimensional Chaos in Heavy-Tailed Recurrent Neural Networks

Growing evidence suggests that synaptic weights in the brain follow heavy-tailed distributions, yet most theoretical analyses of recurrent neural networks (RNNs) assume Gaussian connectivity. We systematically study the activity of RNNs with random weights drawn from biologically plausible Lévy alpha-stable distributions. While mean-field theory for the infinite system predicts that the quiescent state is always unstable -- implying ubiquitous chaos -- our finite-size analysis reveals a sharp transition between quiescent and chaotic dynamics. We theoretically predict the gain at which the system transitions from quiescent to chaotic dynamics, and validate it through simulations. Compared to Gaussian networks, heavy-tailed RNNs exhibit a broader parameter regime near the edge of chaos, namely a slow transition to chaos. However, this robustness comes with a tradeoff: heavier tails reduce the Lyapunov dimension of the attractor, indicating lower effective dimensionality. Our results reveal a biologically aligned tradeoff between the robustness of dynamics near the edge of chaos and the richness of high-dimensional neural activity. By analytically characterizing the transition point in finite-size networks -- where mean-field theory breaks down -- we provide a tractable framework for understanding dynamics in realistically sized, heavy-tailed neural circuits.
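As a rough illustration of the setup, the sketch below draws symmetric alpha-stable weights with the Chambers-Mallows-Stuck method and applies one discrete-time update of a tanh network. The abstract does not specify the update rule, the weight scaling, or the nonlinearity, so the N^(-1/alpha) scaling, the tanh map, and the names sample_symmetric_stable, heavy_tailed_weights and step are assumptions made for this example only.

```python
import math
import random


def sample_symmetric_stable(alpha, rng):
    """One draw from a standard symmetric alpha-stable law (0 < alpha <= 2)."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    if alpha == 1.0:
        return math.tan(u)  # the Cauchy special case
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a zero exponential draw
        w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def heavy_tailed_weights(n, alpha, rng):
    """n x n weight matrix with entries scaled by n**(-1/alpha) (assumed)."""
    scale = n ** (-1.0 / alpha)
    return [[scale * sample_symmetric_stable(alpha, rng) for _ in range(n)]
            for _ in range(n)]


def step(weights, state, gain):
    """One update x <- tanh(gain * W x) of the assumed discrete-time network."""
    return [math.tanh(gain * sum(w * x for w, x in zip(row, state)))
            for row in weights]
```

With alpha = 2 the sampler reduces to a Gaussian with variance 2, which gives a convenient sanity check; smaller alpha produces occasional very large weights, the regime the abstract contrasts with Gaussian connectivity.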


[10] 2505.09873

Deep Learning and Explainable AI: New Pathways to Genetic Insights

Deep learning-based AI models have been extensively applied in genomics, achieving remarkable success across diverse applications. As these models gain prominence, there exists an urgent need for interpretability methods to establish trustworthiness in model-driven decisions. For genetic researchers, interpretable insights derived from these models hold significant value in providing novel perspectives for understanding biological processes. Current interpretability analyses in genomics predominantly rely on intuition and experience rather than rigorous theoretical foundations. In this review, we systematically categorize interpretability methods into input-based and model-based approaches, while critically evaluating their limitations through concrete biological application scenarios. Furthermore, we establish theoretical underpinnings to elucidate the origins of these constraints through formal mathematical demonstrations, aiming to assist genetic researchers in better understanding and designing models in the future. Finally, we provide feasible suggestions for future research on interpretability in the field of genetics.


[11] 2505.09883

DeepPlantCRE: A Transformer-CNN Hybrid Framework for Plant Gene Expression Modeling and Cross-Species Generalization

The investigation of plant transcriptional regulation constitutes a fundamental basis for crop breeding, where cis-regulatory elements (CREs), as the key factor determining gene expression, have become the focus of crop genetic improvement research. Deep learning techniques, leveraging their exceptional capacity for high-dimensional feature extraction and nonlinear regulatory relationship modeling, have been extensively employed in this field. However, current methodologies present notable limitations: single CNN-based architectures struggle to capture long-range regulatory interactions, while existing CNN-Transformer hybrid models are prone to overfitting and generalize inadequately in cross-species prediction contexts. To address these challenges, this study proposes DeepPlantCRE, a deep-learning framework for plant gene expression prediction and CRE extraction. The model employs a Transformer-CNN hybrid architecture that achieves higher accuracy, AUC-ROC, and F1-score than existing baselines (DeepCRE and PhytoExpr), with improved generalization and reduced overfitting. Cross-species validation experiments conducted on gene expression datasets from Gossypium, Arabidopsis thaliana, Solanum lycopersicum, Sorghum bicolor, and Arabidopsis thaliana reveal that the model achieves peak prediction accuracy of 92.3%, particularly excelling in complex genomic data analysis. Furthermore, interpretability investigations using DeepLIFT and Transcription Factor Motif Discovery from the importance scores algorithm (TF-MoDISco) demonstrate that the motifs derived from our model exhibit high concordance with known transcription factor binding sites (TFBSs) such as MYR2 and TSO1 in the JASPAR plant database, substantiating the potential of biological interpretability and practical agricultural application of DeepPlantCRE.


[12] 2505.10017

Data mining of public genomic repositories: harnessing off-target reads to expand microbial pathogen genomic resources

As sequencing technologies become more affordable and genomic databases expand continuously, the reuse of publicly available sequencing data emerges as a powerful strategy for studying microbial pathogens. Indeed, raw sequencing reads generated for the study of a given organism often contain reads originating from the associated microbiota. This review explores how such off-target reads can be detected and used for the study of microbial pathogens. We present genomic data mining as a method to identify relevant sequencing runs from petabase-scale databases, highlighting recent methodological advances that allow efficient database querying. We then briefly outline methods designed to retrieve relevant data and associated metadata, and provide an overview of common downstream analysis pipelines. We discuss how such approaches have (i) expanded the known genetic diversity of microbial pathogens, (ii) enriched our understanding of their spatiotemporal distribution, and (iii) highlighted previously unrecognized ecological interactions involving microbial pathogens. However, these analyses often rely on the completeness and accuracy of accompanying metadata, which remain highly variable. We detail common pitfalls, including data contamination and metadata misannotations, and suggest strategies for result interpretation. Ultimately, while data mining cannot replace dedicated studies, it constitutes an essential and complementary tool for microbial pathogen research. Broader utility will depend on improved data standardization and systematic genomic monitoring across ecosystems.


[13] 2505.10156

Population dynamics of generalist/specialist strategies in the feast-famine cycle

Microbial populations exhibit a broad spectrum of nutrient utilization strategies, ranging from strategies utilizing diverse nutrients, called "generalists," to ones highly adapted to specific nutrients, called "specialists." The mathematical conditions for the diversification of nutrient utilization strategies are a central question in theoretical ecology. Previous studies have shown that trade-offs among different resource utilization functions, whereby cells cannot utilize a broad range of substrates at near-maximum speed, are crucial for the emergence of diverse strategies. However, in natural settings, nutrient availability often fluctuates over time, imposing additional trade-offs on cells. Cells that grow rapidly under nutrient-rich conditions will suffer a higher death rate under nutrient-poor conditions, creating a growth-death trade-off that intersects with the classical resource-use trade-off. Here, we introduce a unified mathematical model that simultaneously incorporates the resource-use trade-off and the growth-death trade-off. The nutrient supply is modeled as discrete stochastic events, capturing realistic temporal fluctuations. We show that the relative balance between growth and death rates critically influences the dominance of either generalist or specialist strategies. Specifically, under conditions of high average growth rates among different environments and a weak trade-off between growth and death rates, generalists prevail. In contrast, when the growth-death trade-off is intense, specialists emerge as the dominant strategy. Our findings reveal that accounting for the growth-death trade-off is crucial for understanding how microbial communities adapt and evolve in temporally varying environments.


[14] 2505.10517

A Tutorial on Structural Identifiability of Epidemic Models Using StructuralIdentifiability.jl

Structural identifiability -- the theoretical ability to uniquely recover model parameters from ideal, noise-free data -- is a prerequisite for reliable parameter estimation in epidemic modeling. Despite its importance, structural identifiability analysis remains underutilized in the infectious disease modeling literature. In this tutorial, we present a practical and reproducible workflow for conducting structural identifiability analysis of ordinary differential equation models using the Julia package StructuralIdentifiability.jl. We apply the tool to a range of epidemic models, including SEIR variants with asymptomatic and pre-symptomatic transmission, vector-borne systems, and models incorporating hospitalization and disease-induced mortality. We compare results from StructuralIdentifiability.jl with those obtained using DAISY, a widely used differential algebra tool, and highlight cases where the Julia package succeeds in analyzing models that DAISY cannot handle. Our findings underscore how identifiability depends on model structure, the availability of initial conditions, and the choice of observed states. All code and diagrams are publicly available, making this tutorial a valuable reference for researchers and educators working with dynamic disease models.


[15] 2505.10536

Real-World fNIRS-Based Brain-Computer Interfaces: Benchmarking Deep Learning and Classical Models in Interactive Gaming

Brain-Computer Interfaces enable direct communication between the brain and external systems, with functional Near-Infrared Spectroscopy emerging as a portable and non-invasive method for capturing cerebral hemodynamics. This study investigates the classification of rest and task states during a realistic, interactive tennis simulation using fNIRS signals and a range of machine learning approaches. We benchmarked traditional classifiers based on engineered features, Long Short-Term Memory networks on raw time-series data, and Convolutional Neural Networks applied to Gramian Angular Field-transformed images. Ensemble models like Extra Trees and Gradient Boosting achieved accuracies above 97 percent, while the ResNet-based CNN reached 95.0 percent accuracy with a near-perfect AUC of 99.2 percent, outperforming both LSTM and EfficientNet architectures. A novel data augmentation strategy was employed to equalize trial durations while preserving physiological integrity. Feature importance analyses revealed that both oxygenated and deoxygenated hemoglobin signals, particularly slope and RMS metrics, were key contributors to classification performance. These findings demonstrate the strong potential of fNIRS-based BCIs for deployment in dynamic, real-world environments and underscore the advantages of deep learning models in decoding complex neural signals.
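The Gramian Angular Field transform mentioned above turns a one-dimensional time series into an image that a CNN can consume. The sketch below shows the commonly used summation variant (GASF) in pure Python; the abstract does not say whether the summation or difference variant was used, nor how the fNIRS signals were rescaled, so the min-max rescaling and the function name gramian_angular_summation_field are assumptions.

```python
import math


def gramian_angular_summation_field(series):
    """Return the GASF matrix G[i][j] = cos(phi_i + phi_j) of a 1-D series."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Min-max rescale to [-1, 1], clamping to absorb rounding error.
    scaled = [max(-1.0, min(1.0, (2.0 * x - hi - lo) / (hi - lo)))
              for x in series]
    # Polar encoding: each value becomes an angle in [0, pi].
    angles = [math.acos(x) for x in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]
```

A series of length T yields a symmetric T x T matrix with entries in [-1, 1]; the original temporal order runs along the main diagonal, which is what lets an image model pick up temporal structure.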


[16] 2505.09658

Great Short History of Microbiology Development as a Science

The study of microorganisms, or microbiology, has developed significantly since its inception and is currently a key field of the biological sciences that has a huge impact on modern society and scientific research. Over the centuries, this discipline has undergone significant changes, from the simplest observations of microscopic organisms such as bacteria, viruses, fungi and protozoa to modern molecular and genomic research methods, shaping our understanding of infectious diseases and food safety. This article describes a brief historical path of microbiology development. The heuristic, morphological, physiological, immunological, and molecular genetic stages are the main periods into which the development of this science is traditionally divided, despite the lack of full-fledged and precise boundaries between them.


[17] 2505.10294

MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models

Histopathological analysis is a cornerstone of cancer diagnosis, with Hematoxylin and Eosin (H&E) staining routinely acquired for every patient to visualize cell morphology and tissue architecture. On the other hand, multiplex immunofluorescence (mIF) enables more precise cell type identification via proteomic markers, but has yet to achieve widespread clinical adoption due to cost and logistical constraints. To bridge this gap, we introduce MIPHEI (Multiplex Immunofluorescence Prediction from H&E), a U-Net-inspired architecture that integrates state-of-the-art ViT foundation models as encoders to predict mIF signals from H&E images. MIPHEI targets a comprehensive panel of markers spanning nuclear content, immune lineages (T cells, B cells, myeloid), epithelium, stroma, vasculature, and proliferation. We train our model using the publicly available ORION dataset of restained H&E and mIF images from colorectal cancer tissue, and validate it on two independent datasets. MIPHEI achieves accurate cell-type classification from H&E alone, with F1 scores of 0.88 for Pan-CK, 0.57 for CD3e, 0.56 for SMA, 0.36 for CD68, and 0.30 for CD20, substantially outperforming both a state-of-the-art baseline and a random classifier for most markers. Our results indicate that our model effectively captures the complex relationships between nuclear morphologies in their tissue context, as visible in H&E images, and the molecular markers defining specific cell types. MIPHEI offers a promising step toward enabling cell-type-aware analysis of large-scale H&E datasets, with a view to uncovering relationships between spatial cellular organization and patient outcomes.


[18] 2505.10444

Inferring entropy production in many-body systems using nonequilibrium MaxEnt

We propose a method for inferring entropy production (EP) in high-dimensional stochastic systems, including many-body systems and non-Markovian systems with long memory. Standard techniques for estimating EP become intractable in such systems due to computational and statistical limitations. We infer trajectory-level EP and lower bounds on average EP by exploiting a nonequilibrium analogue of the Maximum Entropy principle, along with convex duality. Our approach uses only samples of trajectory observables (such as spatiotemporal correlation functions). It does not require reconstruction of high-dimensional probability distributions or rate matrices, nor any special assumptions such as discrete states or multipartite dynamics. It may be used to compute a hierarchical decomposition of EP, reflecting contributions from different kinds of interactions, and it has an intuitive physical interpretation as a thermodynamic uncertainty relation. We demonstrate its numerical performance on a disordered nonequilibrium spin model with 1000 spins and a large neural spike-train dataset.