New articles on Quantitative Biology


[1] 2602.04901

Beyond Independent Genes: Learning Module-Inductive Representations for Gene Perturbation Prediction

Predicting transcriptional responses to genetic perturbations is a central problem in functional genomics. In practice, perturbation responses are rarely gene-independent but instead manifest as coordinated, program-level transcriptional changes among functionally related genes. However, most existing methods do not explicitly model such coordination, due to gene-wise modeling paradigms and reliance on static biological priors that cannot capture dynamic program reorganization. To address these limitations, we propose scBIG, a module-inductive perturbation prediction framework that explicitly models coordinated gene programs. scBIG induces coherent gene programs from data via Gene-Relation Clustering, captures inter-program interactions through a Gene-Cluster-Aware Encoder, and preserves modular coordination using structure-aware alignment objectives. These structured representations are then modeled using conditional flow matching to enable flexible and generalizable perturbation prediction. Extensive experiments on multiple single-cell perturbation benchmarks show that scBIG consistently outperforms state-of-the-art methods, particularly on unseen and combinatorial perturbation settings, achieving an average improvement of 6.7% over the strongest baselines.


[2] 2602.04916

AFD-INSTRUCTION: A Comprehensive Antibody Instruction Dataset with Functional Annotations for LLM-Based Understanding and Design

Large language models (LLMs) have significantly advanced protein representation learning. However, their capacity to interpret and design antibodies through natural language remains limited. To address this challenge, we present AFD-Instruction, the first large-scale instruction dataset with functional annotations tailored to antibodies. This dataset encompasses two key components: antibody understanding, which infers functional attributes directly from sequences, and antibody design, which enables de novo sequence generation under functional constraints. These components provide explicit sequence-function alignment and support antibody design guided by natural language instructions. Extensive instruction-tuning experiments on general-purpose LLMs demonstrate that AFD-Instruction consistently improves performance across diverse antibody-related tasks. By linking antibody sequences with textual descriptions of function, AFD-Instruction establishes a new foundation for advancing antibody modeling and accelerating therapeutic discovery.


[3] 2602.05118

Combination therapy for colorectal cancer with anti-PD-L1 and cancer vaccine: A multiscale mathematical model of tumor-immune interactions

The tumor-immune system plays a critical role in colorectal cancer progression. Recent preclinical and clinical studies showed that combination therapy with anti-PD-L1 and cancer vaccines improved treatment response. In this study, we developed a multiscale mathematical model of interactions among tumors, immune cells, and cytokines to investigate tumor evolutionary dynamics under different therapeutic strategies. Additionally, we established a computational framework based on approximate Bayesian computation to generate virtual tumor samples and capture inter-individual heterogeneity in treatment response. The results demonstrated that a multiple low-dose regimen significantly reduced advanced tumor burden compared to baseline treatment in anti-PD-L1 therapy. In contrast, the maximum dose therapy yielded superior tumor growth control in cancer vaccine therapy. Furthermore, cytotoxic T cells were identified as a consistent predictive biomarker both before and after treatment initiation. Notably, the cytotoxic T cells-to-regulatory T cells ratio specifically served as a robust pre-treatment predictive biomarker, offering potential clinical utility for patient stratification and therapy personalization.


[4] 2602.05196

Learning virulence-transmission relationships using causal inference

The relationship between traits that influence pathogen virulence and transmission is part of the central canon of the evolution and ecology of infectious disease. However, identifying directional and mechanistic relationships among traits remains a key challenge in various subfields of biology, as models often assume static, fixed links between characteristics. Here, we introduce learning evolutionary trait relationships (LETR), a data-driven framework that applies Granger-causality principles to determine which traits drive others and how these relationships change over time. LETR integrates causal discovery with generative mapping and transfer-operator analysis to link short-term predictability with long-term trait distributions. Using a synthetic myxomatosis virus-host data set, we show that LETR reliably recovers known directional influences, such as virulence driving transmission. Applying the framework to global pandemic (SARS-CoV-2) data, we find that past virulence improves future transmission prediction, while the reverse effect is weak. Invariant-density estimates reveal a long-term trend toward low virulence and transmission, with bimodality in virulence suggesting ecological influences or host heterogeneity. In summary, this study provides a blueprint for learning the relationship between how harmful a pathogen is and how well it spreads, which is highly idiosyncratic and context-dependent. This finding undermines simplistic models and encourages the development of new theory for the constraints underlying pathogen evolution. Further, by uniting causal inference with dynamical modeling, the LETR framework offers a general approach for uncovering mechanistic trait linkages in complex biological systems of various kinds.
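
As an illustration only (this is not the LETR implementation), the Granger-causality principle the abstract mentions can be sketched as a comparison of one-step prediction errors: does adding the past of one trait reduce the error in predicting another? The lag-1 linear model, the function names, and the toy data below are all assumptions made for this sketch.

```python
import numpy as np

def residual_variance(target, predictors):
    """Variance of the least-squares residual of target on predictors plus an intercept."""
    design = np.column_stack([np.ones(len(target)), *predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.var(target - design @ coef))

def granger_gain(cause, effect):
    """Fractional drop in lag-1 prediction error of `effect` when the past of `cause` is added."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[1:]
    restricted = residual_variance(target, [effect[:-1]])            # own past only
    full = residual_variance(target, [effect[:-1], cause[:-1]])      # own past + candidate cause
    return 1.0 - full / restricted

# Hypothetical toy series (not data from the paper): virulence drives transmission one step later.
rng = np.random.default_rng(0)
virulence = rng.normal(size=500)
transmission = np.zeros(500)
transmission[1:] = 0.8 * virulence[:-1] + 0.1 * rng.normal(size=499)

gain_v_to_t = granger_gain(virulence, transmission)   # large: past virulence helps
gain_t_to_v = granger_gain(transmission, virulence)   # near zero: the reverse does not
```

A large gain in one direction and a negligible gain in the other is the kind of asymmetry the abstract reports for virulence and transmission. A real analysis would also need lag selection and a significance test.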


[5] 2602.05274

Specieslike clusters based on identical ancestor points

We introduce several axioms which may or may not hold for any given subgraph of the directed graph of all organisms (past, present and future) where edges represent biological parenthood, with the simplifying background assumption that life does not go extinct. We argue these axioms are plausible for species: if one were to define species based purely on genealogical relationships, it would be reasonable to define them in such a way as to satisfy these axioms. The main axiom we introduce, which we call the identical ancestor point axiom, states that for any organism in any species, either the species contains at most finitely many descendants of that organism, or else the species contains at most finitely many non-descendants of that organism. We show that this (together with a convexity axiom) reduces the subjectivity of species, in a technical sense. We call connected sets satisfying these two axioms "specieslike clusters." We consider the question of identifying a set of biologically plausible constraints that would guarantee every organism inhabits a maximal specieslike cluster subject to those constraints. We provide one such set consisting of two constraints and show that no proper subset thereof suffices.
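
For readers who prefer symbols, the identical ancestor point axiom stated above can be written as follows. The notation is an assumption of this note, not taken from the paper: $S$ is the candidate species (a set of organisms) and $D(x)$ is the set of descendants of organism $x$. Whether $x$ counts as its own descendant is left open here.

```latex
\[
  \forall x \in S:\qquad
  \lvert S \cap D(x) \rvert < \infty
  \quad\text{or}\quad
  \lvert S \setminus D(x) \rvert < \infty .
\]
```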


[6] 2602.05451

Somatic Mechanotherapy Activates a Spatiotemporally Synchronized CPTC Wnt Axis for Gut Regeneration

Somatic mechanical stimulation (e.g., acupuncture) exerts systemic immunomodulatory effects, yet the cellular bridge translating peripheral physical force into visceral repair remains elusive. Here, employing a custom interpretable deep learning framework (CARSS) on single-cell RNA sequencing data, we identify CD34$^{+}$PDGFR$\alpha$$^{+}$ telocytes (CPTCs) as the primary mechanosensors in both fascia and colon during bacterial colitis. We show that somatic mechanotherapy triggers an AP-1/Hsp70-dependent transcriptional program in fascial CPTCs, inducing systemic Wnt elevation, which elicits a "transcriptional resonance" in colonic CPTCs, reprogramming their communication network from an inflammatory amplifier to a Wnt-driven regenerative hub. Mechanistically, this axis activates epithelial $\beta$-catenin/Myc signaling, suppressing apoptosis and restoring barrier integrity independent of immune cells. Our findings define a CPTC-Driven Mechano-Resonance Axis, where CPTCs serve as synchronized relay stations that convert local mechanical cues into systemic regenerative microenvironments.


[7] 2602.05583

Intermittent precipitation and spatial Allee effects drive irregular vegetation patterns in semiarid ecosystems

Vegetation in semi-arid ecosystems frequently organizes into spatially heterogeneous mosaics that regulate ecosystem functioning, productivity, and resilience. These patterns arise from local biological interactions, including facilitation among neighboring plants and competition for limiting resources. Classical theoretical approaches have attributed such organization to scale-dependent feedbacks, predicting regular spatial patterns and abrupt transitions to collapse. However, growing empirical and theoretical evidence reveals that environmental variability and demographic stochasticity can fundamentally reshape spatial organization, driving irregular clusters, dynamic mosaics, and gradual rather than catastrophic vegetation declines. In drylands, rainfall variability is a dominant source of environmental forcing: precipitation typically occurs in short, irregular pulses that transiently enhance survival and recruitment before competitive interactions again dominate. Near persistence thresholds, ecosystem dynamics are therefore governed not only by average climatic conditions but also by the timing and spatial coincidence of favorable events. Under these conditions, positive density dependence and local facilitation can critically determine whether vegetation patches persist, expand, or collapse. Here, we develop an individual-based model that integrates intermittent precipitation with local Allee effects to examine how stochastic rainfall shapes spatial organization and persistence. We show that the interaction between pulsed resource availability and density-dependent survival generates irregular cluster structures and strongly modulates extinction risk, with resilience emerging from local spatial covariance and neighborhood density rather than from total biomass alone. These results highlight the importance of individual-level, stochastic processes in determining ecosystem resilience.


[8] 2602.05017

A novel scalable high performance diffusion solver for multiscale cell simulations

Agent-based cellular models simulate tissue evolution by capturing the behavior of individual cells, their interactions with neighboring cells, and their responses to the surrounding microenvironment. An important challenge in the field is scaling cellular resolution models to real-scale tumor simulations, which is critical for the development of digital twin models of diseases and requires the use of High-Performance Computing (HPC) since every time step involves trillions of operations. We hereby present a scalable HPC solution for molecular diffusion modeling using an efficient implementation of state-of-the-art Finite Volume Method (FVM) frameworks. The paper systematically evaluates a novel scalable Biological Finite Volume Method (BioFVM) library and presents an extensive performance analysis of the available solutions. Results show that our HPC proposal reaches an almost 200x speedup and up to a 36% reduction in memory usage over the current state-of-the-art solutions, paving the way to efficiently compute the next generation of biological problems.
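
To make the finite volume idea concrete, here is a minimal one-dimensional sketch of a single diffusion update. It is an illustration under stated assumptions, not the solver evaluated in the paper: it uses an explicit time step, uniform cells, and no-flux walls, and the function name `diffuse_step` is hypothetical.

```python
import numpy as np

def diffuse_step(conc, diffusivity, dx, dt):
    """One explicit finite-volume update of cell-averaged concentrations in 1D.

    Assumes uniform cell width dx and no-flux (closed) boundaries.
    The explicit scheme is only stable for dt <= dx**2 / (2 * diffusivity).
    """
    c = np.asarray(conc, dtype=float)
    # Fick's law on each interior face between neighbouring cells.
    interior_flux = -diffusivity * np.diff(c) / dx
    # Zero flux through the two outer walls.
    flux = np.concatenate(([0.0], interior_flux, [0.0]))
    # Each cell changes by the net flux through its two faces.
    return c - dt / dx * np.diff(flux)

start = np.array([1.0, 0.0, 0.0])
after = diffuse_step(start, diffusivity=1.0, dx=1.0, dt=0.25)
```

Because every face flux leaves one cell and enters its neighbour, the total amount of substance is conserved by construction, which is the property that makes finite volume schemes attractive for this kind of simulation.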


[9] 2602.05071

Optimal Harvesting in Stream Networks: Maximizing Biomass and Yield

In this study, we develop a metapopulation model framework to identify optimal harvesting strategies for a population in a stream network. We consider two distinct optimization objectives: maximization of total biomass and maximization of total yield, under the constraint of a fixed total harvesting effort. We examine in detail the special case of a two-patch network and fully characterize the optimal strategies for each objective. We show that when the population growth rate exceeds a critical threshold, a single harvesting strategy can simultaneously maximize both objectives. For general $n$-patch networks with homogeneous growth rates across patches, we focus on the regime of large growth rates and demonstrate that the optimal harvesting strategy selects patches according to their intraspecific competition rates and an effective net flow metric determined by network connectivity parameters.


[10] 2602.05971

Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space

Semantic representations can be framed as a structured, dynamic knowledge space through which humans navigate to retrieve and manipulate meaning. To investigate how humans traverse this geometry, we introduce a framework that represents concept production as navigation through embedding space. Using different transformer text embedding models, we construct participant-specific semantic trajectories based on cumulative embeddings and extract geometric and dynamical metrics, including distance to next, distance to centroid, entropy, velocity, and acceleration. These measures capture both scalar and directional aspects of semantic navigation, providing a computationally grounded view of semantic representation search as movement in a geometric space. We evaluate the framework on four datasets across different languages, spanning different property generation tasks: Neurodegenerative, Swear verbal fluency, Property listing task in Italian, and in German. Across these contexts, our approach distinguishes between clinical groups and concept types, offering a mathematical framework that requires minimal human intervention compared to typical labor-intensive linguistic pre-processing methods. Comparison with a non-cumulative approach reveals that cumulative embeddings work best for longer trajectories, whereas shorter ones may provide too little context, favoring the non-cumulative alternative. Critically, different embedding models yielded similar results, highlighting similarities between different learned representations despite different training pipelines. By framing semantic navigation as a structured trajectory through embedding space, we bridge cognitive modeling with learned representation, thereby establishing a pipeline for quantifying semantic representation dynamics with applications in clinical research, cross-linguistic analysis, and the assessment of artificial cognition.
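
A few of the geometric metrics named above can be sketched as follows. This is a hedged illustration, not the authors' pipeline. In particular, treating the cumulative embedding as a running mean of item embeddings is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np

def cumulative_trajectory(item_embeddings):
    """Running mean of item embeddings; row t summarizes items 0..t (assumed definition)."""
    x = np.asarray(item_embeddings, dtype=float)
    counts = np.arange(1, len(x) + 1)[:, None]
    return np.cumsum(x, axis=0) / counts

def trajectory_metrics(trajectory):
    """Simple geometric and dynamical descriptors of a trajectory of shape (T, d)."""
    traj = np.asarray(trajectory, dtype=float)
    velocity = np.diff(traj, axis=0)            # step vectors, shape (T-1, d)
    acceleration = np.diff(velocity, axis=0)    # change in step vectors, shape (T-2, d)
    centroid = traj.mean(axis=0)
    return {
        "distance_to_next": np.linalg.norm(velocity, axis=1),
        "distance_to_centroid": np.linalg.norm(traj - centroid, axis=1),
        "velocity": velocity,
        "acceleration": acceleration,
    }

# Hypothetical 2-D "embeddings" for three produced items.
items = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
traj = cumulative_trajectory(items)
metrics = trajectory_metrics(traj)
```

In practice the rows of `items` would come from a text embedding model, and the non-cumulative variant mentioned in the abstract would pass the item embeddings to `trajectory_metrics` directly.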


[11] 2602.06020

Mechanisms of AI Protein Folding in ESMFold

How do protein structure prediction models fold proteins? We investigate this question by tracing how ESMFold folds a beta hairpin, a prevalent structural motif. Through counterfactual interventions on model latents, we identify two computational stages in the folding trunk. In the first stage, early blocks initialize pairwise biochemical signals: residue identities and associated biochemical features such as charge flow from sequence representations into pairwise representations. In the second stage, late blocks develop pairwise spatial features: distance and contact information accumulate in the pairwise representation. We demonstrate that the mechanisms underlying structural decisions of ESMFold can be localized, traced through interpretable representations, and manipulated with strong causal effects.


[12] 2505.17329

Transformer brain encoders explain human high-level visual responses

A major goal of neuroscience is to understand brain computations during visual processing in naturalistic settings. A dominant approach is to use image-computable deep neural networks trained with different task objectives as a basis for linear encoding models. However, in addition to requiring estimation of a large number of linear encoding parameters, this approach ignores the structure of the feature maps both in the brain and the models. Recently proposed alternatives factor the linear mapping into separate sets of spatial and feature weights, thus finding static receptive fields for units, which is appropriate only for early visual areas. In this work, we employ the attention mechanism used in the transformer architecture to study how retinotopic visual features can be dynamically routed to category-selective areas in high-level visual processing. We show that this computational motif is significantly more powerful than alternative methods in predicting brain activity during natural scene viewing, across different feature basis models and modalities. We also show that this approach is inherently more interpretable as the attention-routing signals for different high-level categorical areas can be easily visualized for any input image. Given its high performance at predicting brain responses to novel images, the model deserves consideration as a candidate mechanistic model of how visual information from retinotopic maps is routed in the human brain based on the relevance of the input content to different category-selective regions.


[13] 2509.25501

Load Transfer along Continuous Collagen Fibers Reduces the Importance of Wall Thickness Variations

The mechanical response of biological soft tissues is influenced by wall heterogeneity, including spatial variations in wall thickness. Traditional models for homogeneous soft tissues under uniaxial loading predict higher stretch and stress in thinner regions. In fact, large gradients in stretch and stress are predicted to be induced by spatial variations in wall thickness. In prior studies, the role of collagen fibers in regions of thickness transition has been largely neglected or only considered in terms of their effect on anisotropy. Here, we explore the role of collagen fibers as primary load-bearing components across regions of varying wall thickness, using a three-dimensional representative volume element (RVE) model incorporating explicit collagen fiber architecture and a gradual thickness gradient. We examined two distinct collagen fiber configurations across the thickness transition: one featuring abrupt fiber termination and another with fiber continuity. Finite element analysis (FEA) under uniaxial tension revealed that load transfer by continuous fibers across the specimen markedly reduced the importance of the change in wall thickness, with stretch differentials dropping from ~20% (fiber-termination network) to 0.68% (continuous fibers) and stress differentials dropping from ~65% (fiber-termination network) to 2.3% (continuous fibers). Fiber tortuosity delayed the point at which mechanical response was governed by fiber structure. These findings demonstrate the critical role of fiber continuity in reducing stretch and stress gradients across regions of varying wall thickness and clarify the importance of accurately representing fiber architecture when modeling soft tissues with heterogeneous wall thickness.


[14] 2503.03126

Controlling tissue size by active fracture

Groups of cells, including clusters of cancerous cells, multicellular organisms, and developing organs, may both grow and break apart. What physical factors control these fractures? In these processes, what sets the eventual size of clusters? We first develop a one-dimensional framework for understanding cell clusters that can fragment due to cell motility using an active particle model. We compute analytically how the break rate of cell-cell junctions depends on cell speed, cell persistence, and cell-cell junction properties. Next, we find the cluster size distributions, which differ depending on whether all cells can divide or only the cells on the edge of the cluster divide. Cluster size distributions depend solely on the ratio of the break rate to the growth rate - allowing us to predict how cluster size and variability depend on cell motility and cell-cell mechanics. Our results suggest that organisms can achieve better size control when cell division is restricted to the cluster boundaries or when fracture can be localized to the cluster center. Additionally, we derive a universal survival probability for an intact cluster $S(t)=\mathrm{e}^{-k_d t}$ at steady state if all cells can divide, which is independent of the rupture kinetics and depends solely on the cell division rate $k_d$. Finally, we further corroborate the one-dimensional analytics with two-dimensional simulations, finding quantitative agreement with some - but not all - elements of the theory across a wide range of cell motility. Our results link the general physics problem of a collective active escape over a barrier to size control, providing a quantitative measure of how motility can regulate organ or organism size.
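
The survival probability quoted above is a plain exponential, so a few consequences follow directly from its form. The sketch below evaluates $S(t)$ and the time at which half of the clusters are still intact. The median time $\ln 2 / k_d$ is a standard property of exponential survival and is not a result stated in the abstract. The function names are hypothetical.

```python
import math

def survival_probability(t, k_d):
    """S(t) = exp(-k_d * t): probability that a cluster is still intact at time t."""
    return math.exp(-k_d * t)

def median_intact_time(k_d):
    """Time at which S(t) = 1/2, which for an exponential is ln(2) / k_d."""
    return math.log(2) / k_d

k_d = 0.5                                  # hypothetical division rate, arbitrary units
t_half = median_intact_time(k_d)
s_half = survival_probability(t_half, k_d)
```

Note that only the division rate enters: changing motility or junction properties would not alter these numbers if the stated formula holds.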


[15] 2505.04672

Histo-Miner: Deep learning based tissue features extraction pipeline from H&E whole slide images of cutaneous squamous cell carcinoma

Recent advancements in digital pathology have enabled comprehensive analysis of Whole-Slide Images (WSI) from tissue samples, leveraging high-resolution microscopy and computational capabilities. Despite this progress, there is a lack of labeled datasets and open source pipelines specifically tailored for analysis of skin tissue. Here we propose Histo-Miner, a deep learning-based pipeline for analysis of skin WSIs, and generate two datasets with labeled nuclei and tumor regions. We develop our pipeline for the analysis of patient samples of cutaneous squamous cell carcinoma (cSCC), a frequent non-melanoma skin cancer. Utilizing the two datasets, comprising 47,392 annotated cell nuclei and 144 tumor-segmented WSIs respectively, both from cSCC patients, Histo-Miner employs convolutional neural networks and vision transformers for nucleus segmentation and classification as well as tumor region segmentation. Performance of the trained models compares favorably with the state of the art, with multi-class Panoptic Quality (mPQ) of 0.569 for nucleus segmentation, macro-averaged F1 of 0.832 for nucleus classification and mean Intersection over Union (mIoU) of 0.907 for tumor region segmentation. From these predictions we generate a compact feature vector summarizing tissue morphology and cellular interactions, which can be used for various downstream tasks. Here, we use Histo-Miner to predict cSCC patient response to immunotherapy based on pre-treatment WSIs from 45 patients. Histo-Miner identifies percentages of lymphocytes, the granulocyte to lymphocyte ratio in tumor vicinity and the distances between granulocytes and plasma cells in tumors as predictive features for therapy response. This highlights the applicability of Histo-Miner to clinically relevant scenarios, providing direct interpretation of the classification and insights into the underlying biology.
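
For readers unfamiliar with the segmentation metric quoted above, mean Intersection over Union can be sketched as below. This shows the common per-class definition on label maps. How the paper averages (per image, per class, or over the whole dataset) is not stated in the abstract, so treat the details as assumptions. The function name is hypothetical.

```python
import numpy as np

def mean_iou(predicted, reference, num_classes):
    """Average over classes of |pred AND ref| / |pred OR ref| for integer label maps."""
    pred = np.asarray(predicted)
    ref = np.asarray(reference)
    scores = []
    for label in range(num_classes):
        p, r = pred == label, ref == label
        union = np.logical_or(p, r).sum()
        if union == 0:
            continue  # class absent from both maps: skip rather than divide by zero
        scores.append(np.logical_and(p, r).sum() / union)
    return float(np.mean(scores))

# Hypothetical 2x2 label maps: 0 = background, 1 = tumor.
predicted = [[0, 1], [1, 1]]
reference = [[0, 1], [0, 1]]
score = mean_iou(predicted, reference, num_classes=2)
```

Here the background class overlaps on 1 of 2 pixels in the union and the tumor class on 2 of 3, so the score is the mean of 1/2 and 2/3.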


[16] 2508.02276

CellForge: Agentic Design of Virtual Cell Models

Virtual cell modeling aims to predict cellular responses to diverse perturbations but faces challenges from biological complexity, multimodal data heterogeneity, and the need for interdisciplinary expertise. We introduce CellForge, a multi-agent framework that autonomously designs and synthesizes neural network architectures tailored to specific single-cell datasets and perturbation tasks. Given raw multi-omics data and task descriptions, CellForge discovers candidate architectures through collaborative reasoning among specialized agents, then generates executable implementations. Our core contribution is the framework itself: showing that multi-agent collaboration mechanisms - rather than manual human design or single-LLM prompting - can autonomously produce executable, high-quality computational methods. This approach goes beyond conventional hyperparameter tuning by enabling entirely new architectural components such as trajectory-aware encoders and perturbation diffusion modules to emerge from agentic deliberation. We evaluate CellForge on six datasets spanning gene knockouts, drug treatments, and cytokine stimulations across multiple modalities (scRNA-seq, scATAC-seq, CITE-seq). The results demonstrate that the models generated by CellForge are highly competitive with established baselines, while revealing systematic patterns of architectural innovation. CellForge highlights the scientific value of multi-agent frameworks: collaboration among specialized agents enables genuine methodological innovation and executable solutions that single agents or human experts cannot achieve. This represents a paradigm shift toward autonomous scientific method development in computational biology. Code is available at this https URL.