New articles on Quantitative Biology


[1] 2601.09737

Absorption and fixation times for evolutionary processes on graphs

In this paper, we study the absorption and fixation times of evolutionary processes on graphs under different updating rules. While in the Moran process a single neighbour is randomly chosen to be replaced, in proliferation processes several neighbours can be replaced, using Bernoulli or binomial draws governed by a parameter $0 < p \leq 1$. There is a critical value $p_c$ such that proliferation is advantageous or disadvantageous in terms of fixation probability according to whether $p > p_c$ or $p < p_c$. We clarify the role of symmetries in computing the fixation time of the Moran process. We show that the Maruyama-Kimura symmetry depends on the graph structure induced in each state, implying asymmetry for all graphs except cliques and cycles. There is a fitness value, not necessarily $1$, beyond which the fixation time decreases monotonically. We apply Harris' graphical method to prove that the fixation time decreases monotonically in $p$. Thus there exists another value $p_t$ at which proliferation switches between being advantageous and disadvantageous in terms of time. However, at the critical level $p=p_c$, proliferation is highly advantageous as $r \to +\infty$.
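
To make the two update rules concrete, here is a minimal Python sketch (illustrative, not the authors' code) of the standard Moran birth-death step on a graph; the proliferation variant would replace the single-neighbour death step with independent Bernoulli($p$) draws over the neighbours. The fitness $r$, trial count, and cycle example are arbitrary choices.

import random

def moran_fixation(adj, r=1.5, trials=2000, seed=0):
    """adj: adjacency list {node: [neighbours]}; r: relative mutant fitness."""
    rng = random.Random(seed)
    n = len(adj)
    fixed, times = 0, []
    for _ in range(trials):
        state = [0] * n                      # 0 = resident, 1 = mutant
        state[rng.randrange(n)] = 1          # one mutant placed uniformly
        k, t = 1, 0
        while 0 < k < n:
            # birth: reproducer chosen proportionally to fitness
            weights = [r if s else 1.0 for s in state]
            v = rng.choices(range(n), weights=weights)[0]
            # Moran death step: the offspring replaces ONE random neighbour
            # (a proliferation process would instead replace each neighbour
            # independently with probability p)
            u = rng.choice(adj[v])
            k += state[v] - state[u]
            state[u] = state[v]
            t += 1
        times.append(t)
        fixed += (k == n)
    return fixed / trials, sum(times) / trials

# Example: fixation probability and mean absorption time on the cycle C_10
C10 = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}
print(moran_fixation(C10))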


[2] 2601.09738

From Ecological Connectivity to Outbreak Risk: A Heterogeneous Graph Network for Epidemiological Reasoning under Sparse Spatiotemporal Data

Estimating population-level prevalence and transmission dynamics of wildlife pathogens can be challenging, partly because surveillance data is sparse, detection-driven, and unevenly sequenced. Using highly pathogenic avian influenza A/H5 clade 2.3.4.4b as a case study, we develop zooNet, a graph-based epidemiological framework that integrates mechanistic transmission simulation, metadata-driven genetic distance imputation, and spatiotemporal graph learning to reconstruct outbreak dynamics from incomplete observations. Applied to wild bird surveillance data from the United States during 2022, zooNet recovered coherent spatiotemporal structure despite intermittent detections, revealing sustained regional circulation across multiple migratory flyways. The framework consistently identified counties with ongoing transmission weeks to months before confirmed detections, including persistent activity in northeastern regions prior to documented re-emergence. These signals were detectable even in areas with sparse sequencing and irregular reporting. These results show that explicitly representing ecological processes and inferred genomic connectivity within a unified graph structure allows persistence and spatial risk structure to be inferred from detection-driven wildlife surveillance data.


[3] 2601.09747

Topological Percolation in Urban Dengue Transmission: A Multi-Scale Analysis of Spatial Connectivity

We investigate the spatial organization of dengue cases in the city of Recife, Brazil, from 2015 to 2024, using tools from statistical physics and topological data analysis. Reported cases are modeled as point clouds in a metric space, and their spatial connectivity is studied through Vietoris-Rips filtrations and zero-dimensional persistent homology, which captures the emergence and collapse of connected components across spatial scales. By parametrizing the filtration using percentiles of the empirical distance distribution, we identify critical percolation thresholds associated with abrupt growth of the largest connected component. These thresholds define distinct geometric regimes, ranging from fragmented spatial patterns to highly concentrated, percolated structures. Remarkably, years with similar incidence levels exhibit qualitatively different percolation behavior, demonstrating that case counts alone do not determine the spatial organization of transmission. Our analysis further reveals pronounced temporal heterogeneity in the percolation properties of dengue spread, including a structural rupture in 2020 characterized by delayed or absent spatial percolation. These findings highlight percolation-based topological observables as physically interpretable and sensitive descriptors of urban epidemic structure, offering a complementary perspective to traditional spatial and epidemiological analyses.
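
Since zero-dimensional persistent homology of a Vietoris-Rips filtration coincides with single-linkage merging, the growth of the largest connected component across distance percentiles can be tracked with a union-find pass over sorted pairwise distances. The sketch below (an assumed workflow on synthetic points, not the authors' pipeline) illustrates the idea.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def largest_component_curve(points, percentiles):
    d = squareform(pdist(points))
    n = len(points)
    parent = list(range(n)); size = [1] * n
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]; a = parent[a]
        return a
    iu, ju = np.triu_indices(n, 1)
    order = np.argsort(d[iu, ju])                  # edges by increasing length
    edges = list(zip(iu[order], ju[order], d[iu, ju][order]))
    cutoffs = np.percentile(d[iu, ju], percentiles)
    curve, e, giant = [], 0, 1
    for c in cutoffs:
        while e < len(edges) and edges[e][2] <= c:
            a, b = find(edges[e][0]), find(edges[e][1])
            if a != b:                             # a 0-dim class dies: merge
                if size[a] < size[b]: a, b = b, a
                parent[b] = a; size[a] += size[b]
                giant = max(giant, size[a])
            e += 1
        curve.append(giant / n)                    # relative giant-component size
    return curve  # abrupt jumps in this curve mark percolation thresholds

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
print(largest_component_curve(pts, percentiles=range(0, 101, 10)))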


[4] 2601.09758

Detecting Batch Heterogeneity via Likelihood Clustering

Batch effects represent a major confounder in genomic diagnostics. In copy number variant (CNV) detection from next-generation sequencing (NGS), many algorithms compare read depth between test samples and a reference sample, assuming they are process-matched. When this assumption is violated, with causes ranging from reagent lot changes to multi-site processing, the reference becomes inappropriate, introducing false CNV calls or masking true pathogenic variants. Detecting such heterogeneity before downstream analysis is critical for reliable clinical interpretation. Existing batch effect detection methods either cluster samples based on raw features, risking conflation of biological signal with technical variation, or require known batch labels that are frequently unavailable. We introduce a method that addresses both limitations by clustering samples according to their Bayesian model evidence. The central insight is that evidence quantifies compatibility between data and model assumptions: technical artifacts violate those assumptions and reduce evidence, whereas biological variation, including CNV status, is anticipated by the model and yields high evidence. This asymmetry provides a discriminative signal that separates batch effects from biology. We formalize heterogeneity detection as a likelihood ratio test for mixture structure in evidence space, using parametric bootstrap calibration to ensure conservative false positive rates. We validate our approach on synthetic data demonstrating proper Type I error control, three clinical targeted sequencing panels (liquid biopsy, BRCA, and thalassemia) exhibiting distinct batch effect mechanisms, and mouse electrophysiology recordings demonstrating cross-modality generalization. Our method achieves superior clustering accuracy compared to standard correlation-based and dimensionality-reduction approaches while maintaining the conservativeness required for clinical usage.
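
A minimal sketch of the testing step under simplifying assumptions (one-dimensional log-evidence values and scikit-learn Gaussian mixtures; not the authors' implementation): fit one- and two-component mixtures in evidence space, and calibrate the likelihood ratio statistic by parametric bootstrap under the single-component null.

import numpy as np
from sklearn.mixture import GaussianMixture

def mixture_lrt(evidence, n_boot=200, seed=0):
    x = np.asarray(evidence).reshape(-1, 1)
    g1 = GaussianMixture(1, random_state=seed).fit(x)
    g2 = GaussianMixture(2, random_state=seed, n_init=5).fit(x)
    stat = 2 * (g2.score(x) - g1.score(x)) * len(x)    # 2*(ll2 - ll1)
    rng = np.random.default_rng(seed)
    null = []
    for _ in range(n_boot):                            # bootstrap under H0
        xb = rng.normal(g1.means_[0, 0], np.sqrt(g1.covariances_[0, 0, 0]),
                        size=len(x)).reshape(-1, 1)
        b1 = GaussianMixture(1, random_state=seed).fit(xb)
        b2 = GaussianMixture(2, random_state=seed, n_init=5).fit(xb)
        null.append(2 * (b2.score(xb) - b1.score(xb)) * len(xb))
    pval = (1 + sum(s >= stat for s in null)) / (n_boot + 1)
    return stat, pval

# Toy demo: two batches whose per-sample log-evidences separate
rng = np.random.default_rng(1)
ev = np.concatenate([rng.normal(-50, 2, 40), rng.normal(-65, 2, 20)])
print(mixture_lrt(ev, n_boot=50))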


[5] 2601.09767

From the Hallmarks of Cancer to the Survival System: Integration and Paradigmatic Reconstruction of Classical Oncology Theories by the Tumor Existential Crisis-Driven Survival (ECDS) Theory

Malignant tumors are defined by extraordinarily intricate pathogenic mechanisms, yet classical oncological theories remain fragmented in an archipelagic state, lacking a unifying framework to integrate the processes of tumor initiation, progression, metastasis, and therapeutic resistance. This theoretical disjuncture constrains the efficacy of the confront-and-eradicate paradigm, culminating in limited therapeutic breakthroughs, recurrent drug resistance, and substantial host toxicity. We herein propose the ECDS (Tumor Existential Crisis-Driven Survival) theory, anchored in a central mechanistic axis: impairment of Existential Stability elicits compensatory hyperactivation of Survival Capacity. The ECDS framework establishes three foundational constructs (Existential Stability, Survival Capacity, and Existence Threshold) alongside three core principles, integrating pivotal theories of tumorigenesis. It furnishes a systematic account of the dynamic coupling between augmented survival capacity and declining existential stability throughout tumor evolution; reinterprets the fundamental nature and temporal hierarchy of the fourteen cancer hallmarks; clarifies the redundancy of survival pathways in tumor heterogeneity; and elucidates the adaptive underpinnings of therapeutic resistance. As a holistic integrative model, ECDS offers transformative conceptual insights for oncology by reframing tumors not merely as abnormally proliferating cell aggregates but as biological subsystems driven into survival mode by eroding existential stability. This challenges the prevailing dogma of tumors as intrinsically aggressive entities, revealing their core traits of intrinsic vulnerability and passive adaptation, while laying a meta-theoretical foundation for reorienting cancer study and management toward threshold-regulated dynamic adaptation.


[6] 2601.09813

An agent-based modelling approach to investigate the impact of gender on tuberculosis transmission in Uganda

Tuberculosis (TB) is an airborne disease caused by the pathogen Mycobacterium tuberculosis. In 2023, it returned to being the leading cause of death from an infectious agent globally, replacing COVID-19; in the nineteenth century, one in seven of all humans died of tuberculosis. More than 10 million people are diagnosed with TB every year. The majority of cases in adults occur in males (62.5% of all global adult cases in 2023, compared to 37.5% in females). The higher global TB burden in males relative to females may be due in large part to population-scale factors, such as employment type, the quantity and type of social contacts they make, and their health-seeking behaviours (e.g. differences in diagnostic and treatment delays between genders). To investigate which population-scale factors are most important in determining this higher TB burden in males, we have developed an age- and gender-stratified, spatially heterogeneous epidemiological agent-based model. We focus specifically on Kampala, the capital of Uganda, a country with a high TB burden. We considered counterfactual scenarios to elucidate the impact of gender on the epidemiology of TB. Setting disease progression parameters equal between the genders leads to a reduction in both the male-to-female case ratio and total case numbers.


[7] 2601.09816

The multi-allelic Moran process as a multi-zealot voter model: exact results and consequences for diversity thresholds

The Moran process is a foundational model of genetic drift and mutation in finite populations. In its standard two-allele form with population size $n$, allele counts, and hence allele frequencies, change through stochastic replacement and mutation, yet converge to a stationary distribution. This distribution undergoes a qualitative transition at the \emph{critical mutation rate} $\mu_c=1/(2n)$: at $\mu=\mu_c$ it is exactly uniform, so that the probability of observing $k$ copies of allele~1 (and $n-k$ of allele~2) is $\pi(k)=1/(n+1)$ for $k=0,\dots,n$. For $\mu<\mu_c$ diversity is low: the stationary distribution places most of its mass near $k=0$ and $k=n$, and the population is therefore typically dominated by one allele. For $\mu>\mu_c$, on the other hand, diversity is high: the distribution concentrates around intermediate values, so that both alleles are commonly present at comparable frequencies. Recently, the two-allele Moran process was shown to be exactly equivalent to the voter model with two candidates and $\alpha_1$ and $\alpha_2$ committed voters (\emph{zealots}) in a population of size $n+\alpha_1+\alpha_2$, where the role of mutation is played by zealot influence. Here we extend this equivalence to multiple alleles and multiple candidates. Using the mapping, we derive the exact stationary distribution of allele counts for well-mixed populations with an arbitrary number $m$ of alleles, and obtain the critical mutation rate $\mu_c = 1/(m+2n-2)$, which depends explicitly on $m$. We then analyze the Moran process on randomly connected populations and show that both the stationary distribution and $\mu_c$ are invariant to network structure and coincide with the well-mixed results. Finally, simulations on general network topologies show that structural heterogeneity can substantially reshape the stationary allele distribution and, consequently, the level of genetic diversity.
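
For the two-candidate case on a complete graph, the zealot chain's stationary distribution follows from detailed balance, and in this parameterization $\alpha_1=\alpha_2=1$ reproduces the uniform distribution that marks the critical mutation rate. A short numerical check (an illustrative parameterization; the exact Moran-zealot mapping and the multi-allele case are in the paper):

import numpy as np

def stationary(n, a1, a2):
    """Stationary law of the opinion count k among n free voters, with a1
    and a2 zealots; a voter copies a uniformly chosen other individual."""
    denom = n - 1 + a1 + a2
    p = lambda k: (n - k) / n * (k + a1) / denom        # k -> k+1
    q = lambda k: k / n * (n - k + a2) / denom          # k -> k-1
    pi = np.ones(n + 1)
    for k in range(n):
        pi[k + 1] = pi[k] * p(k) / q(k + 1)             # detailed balance
    return pi / pi.sum()

n = 20
print(stationary(n, 1, 1))      # alpha = 1: exactly uniform, pi(k) = 1/(n+1)
print(stationary(n, 0.3, 0.3))  # weaker zealots: mass piles up at k = 0, n
print(stationary(n, 3.0, 3.0))  # stronger zealots: mass at intermediate k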


[8] 2601.09912

High-Density Multi-Depth Human Recordings Using 45 mm Long Neuropixels Probes

Neuropixels probes, initially developed for use in small animal models, have transformed basic neuroscience by enabling high-density, single-cell resolution recordings across multiple brain regions simultaneously. The recent development of Neuropixels 1.0 NHP Long, a longer probe designed for non-human primates, has expanded this capability, enabling unprecedented simultaneous access to multiple cortical layers and deep brain structures of large-brained animals. Here, we report the first use of these probes in humans, aiming to establish safe intraoperative use and assess feasibility for clinical and research applications. Nine patients undergoing neurosurgical procedures, including epilepsy or tumor resection and deep brain stimulation (DBS) implantation, were enrolled. Successful intraoperative recordings were obtained from surface and deep cortical structures without probe breakage or adverse events. Compared with conventional electrodes, the Neuropixels probe enabled dense sampling across multiple parenchymal depths with submillisecond temporal resolution. Recordings were obtained from deep targets including the hippocampus and cingulate cortex, as well as from regions that are challenging to access with single-unit precision, such as the superior frontal sulcus. Custom tools and refined workflows lowered technical barriers for operative use and improved recording stability. Neural activity was observed across all recordings. Neuropixels 1.0 NHP Long probes can be deployed in the human operating room, enabling simultaneous recordings from multiple brain structures at single-neuron resolution. These methods expand opportunities for studying human brain function and pathology in vivo, and may ultimately support the development of more precise neurosurgical interventions.


[9] 2601.10032

Macroscopic dynamics of quadratic integrate-and-fire neurons subject to correlated noise

The presence of correlated noise, arising from a mixture of independent fluctuations and a common noisy input shared across the neural population, is a ubiquitous feature of neural circuits, yet its impact on collective network dynamics remains poorly understood. We analyze a network of quadratic integrate-and-fire neurons driven by Gaussian noise with a tunable degree of correlation. Using the cumulant expansion method, we derive a reduced set of effective mean-field equations that accurately describe the evolution of the population's mean firing rate and membrane potential. Our analysis reveals a counterintuitive phenomenon: increasing the noise correlation strength suppresses the mean network activity, an effect we term correlated-noise-inhibited spiking. Furthermore, within a specific parameter regime, the network exhibits metastability, manifesting itself as spontaneous, noise-driven transitions between distinct high- and low-activity states. These results provide a theoretical framework for reducing the dynamics of complex stochastic networks and demonstrate how correlated noise can fundamentally regulate macroscopic neural activity, with implications for understanding state transitions in biological systems.
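
A brief Euler-Maruyama sketch (hypothetical parameters, not the paper's reduced mean-field equations) showing how one would probe the population mean firing rate as the noise-correlation strength $\rho$ is varied, with each neuron receiving a $\sqrt{1-\rho}$-weighted private term plus a $\sqrt{\rho}$-weighted shared term:

import numpy as np

def mean_rate(rho, N=200, T=50.0, dt=1e-3, eta=-1.0, sigma=1.0,
              v_peak=10.0, v_reset=-10.0, seed=0):
    """QIF population dV_i = (V_i^2 + eta) dt + sigma dW_i with pairwise
    noise correlation rho; returns the time- and population-averaged rate."""
    rng = np.random.default_rng(seed)
    v = np.zeros(N)
    spikes = 0
    for _ in range(int(T / dt)):
        common = rng.normal()                          # shared fluctuation
        noise = np.sqrt(1 - rho) * rng.normal(size=N) + np.sqrt(rho) * common
        v += (v * v + eta) * dt + sigma * np.sqrt(dt) * noise
        fired = v >= v_peak                            # crude spike/reset rule
        spikes += int(fired.sum())
        v[fired] = v_reset
    return spikes / (N * T)

for rho in (0.0, 0.5, 0.9):
    print(rho, mean_rate(rho))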


[10] 2601.10202

Robust and Generalizable Atrial Fibrillation Detection from ECG Using Time-Frequency Fusion and Supervised Contrastive Learning

Atrial fibrillation (AF) is a common cardiac arrhythmia that significantly increases the risk of stroke and heart failure, necessitating reliable and generalizable detection methods from electrocardiogram (ECG) recordings. Although deep learning has advanced automated AF diagnosis, existing approaches often struggle to exploit complementary time-frequency information effectively, limiting both intra-dataset robustness and generalization across diverse clinical datasets. To address these challenges, we propose a cross-modal deep learning framework comprising two key components: a Bidirectional Gating Module (BGM) and a Cross-modal Supervised Contrastive Learning (CSCL) strategy. The BGM facilitates dynamic, reciprocal refinement between time and frequency domain features, enhancing model robustness to signal variations within a dataset. Meanwhile, CSCL explicitly structures the joint embedding space by pulling together label-consistent samples and pushing apart those with different labels, thereby improving inter-class separability and enabling strong cross-dataset generalization. We evaluate our method through five-fold cross-validation on the AFDB and CPSC2021 datasets, as well as bidirectional cross-dataset experiments (training on one and testing on the other). Results show consistent improvements over state-of-the-art methods across multiple metrics, demonstrating that our approach achieves both high intra-dataset robustness and excellent cross-dataset generalization. We further demonstrate that our method achieves high computational efficiency and anti-interference capability, making it suitable for edge deployment.
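
For readers unfamiliar with the contrastive ingredient, here is a generic supervised contrastive loss of the kind CSCL builds on (a numpy sketch in the style of Khosla et al.; the paper's cross-modal variant and embedding networks are not reproduced here):

import numpy as np

def sup_con_loss(z, labels, tau=0.1):
    """z: (n, d) embeddings; labels: (n,) integer class labels."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                  # exclude self-similarity
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = labels[:, None] == labels[None, :]        # label-consistent pairs
    np.fill_diagonal(pos, False)
    n_pos = pos.sum(axis=1)
    keep = n_pos > 0                                # anchors with a positive
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1)[keep] / n_pos[keep]
    return -per_anchor.mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])
print(sup_con_loss(z, y))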


[11] 2601.10221

A Unified Dynamical Field Theory of Learning, Inference, and Emergence

Learning, inference, and emergence in biological and artificial systems are often studied within disparate theoretical frameworks, ranging from energy-based models to recurrent and attention-based architectures. Here we develop a unified dynamical field theory in which learning and inference are governed by a minimal stochastic dynamical equation admitting a Martin--Siggia--Rose--Janssen--de Dominicis formulation. Within this framework, inference corresponds to saddle-point trajectories of the associated action, while fluctuation-induced loop corrections render collective modes dynamically emergent and generate nontrivial dynamical time scales. A central result of this work is that cognitive function is controlled not by microscopic units or precise activity patterns, but by the collective organization of dynamical time scales. We introduce the \emph{time-scale density of states} (TDOS) as a compact diagnostic that characterizes the distribution of collective relaxation modes governing inference dynamics. Learning and homeostatic regulation are naturally interpreted as processes that reshape the TDOS, selectively generating slow collective modes that support stable inference, memory, and context-dependent computation despite stochasticity and structural irregularity. This framework unifies energy-based models, recurrent neural networks, transformer architectures, and biologically motivated homeostatic dynamics within a single physical description, and provides a principled route toward understanding cognition as an emergent dynamical phenomenon.
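
One concrete (and here assumed, not paper-specified) way to read off a time-scale density of states: linearize the dynamics, convert the stable eigenvalues of the Jacobian into relaxation times $\tau = -1/\mathrm{Re}\,\lambda$, and histogram them.

import numpy as np

def tdos(J, bins=30):
    """Histogram of log10 relaxation times of the stable linearised modes."""
    lam = np.linalg.eigvals(J)
    tau = -1.0 / lam.real[lam.real < 0]     # per-mode relaxation time
    return np.histogram(np.log10(tau), bins=bins, density=True)

rng = np.random.default_rng(0)
n = 300
J = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n)) - 1.5 * np.eye(n)  # stable toy
hist, edges = tdos(J)
print("slowest mode: tau =", round(-1.0 / np.linalg.eigvals(J).real.max(), 2))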


[12] 2601.10276

How Intrinsic Motivation Underlies Embodied Open-Ended Behavior

Although most theories posit that natural behavior can be explained as maximizing some form of extrinsic reward, often called utility, some behaviors appear to be reward independent. For instance, spontaneous motor babbling in human newborns and curiosity in little kids and other animals seem to elude a simple explanation in terms of extrinsic reward maximization. Rooted in these observations, intrinsic motivation has emerged as a potentially major driver of behavior. However, only recently have several quantitative and foundational theories of intrinsic motivation been put forward. We first provide a general framework to understand behavior as being organized hierarchically: objective--intrinsic reward, or motivation--drives, goals and extrinsic reward. We next review the main formalizations of intrinsic motivation, including empowerment, the free energy principle, information-gain maximization, and the maximum occupancy principle. These theories produce complex behavior by promoting, in various ways, entropic action-state paths. The presence of a single intrinsic motivation objective breaks infinite regress, as drives and goals act only temporarily to serve the objective. Extrinsic rewards, such as sugar or protein, are just a means to achieve the objective. Bounded cognition and embodiment impose constraints and boundary conditions for the intrinsic motivation objective. By virtue of their capability to generate complex behavior in a task-agnostic manner, theories of intrinsic motivation promise to become successful generative models of open-ended, embodied behavior.


[13] 2601.10364

Gene genealogies in diploid populations evolving according to sweepstakes reproduction

Recruitment dynamics, i.e. the distribution of the number of offspring among individuals, is central for understanding ecology and evolution; sweepstakes reproduction (a heavy, right-tailed offspring number distribution) is particularly central for highly fecund natural populations. Sweepstakes reproduction can induce jumps in type frequencies and multiple mergers in gene genealogies of sampled gene copies. We take sweepstakes reproduction to be a skewed offspring number distribution arising from mechanisms not involving natural selection, such as the chance matching of broadcast spawning with favourable environmental conditions. Here, we consider population genetic models of sweepstakes reproduction in diploid panmictic populations without selfing, evolving in a random environment. Our main results are {\it (i)} continuous-time Beta and Poisson-Dirichlet coalescents; combining the results, the skewness parameter $\alpha$ of the Beta-coalescent ranges from $0$ to $2$, and the Beta-coalescents may be incomplete due to an upper bound on the number of potential offspring produced by any pair of parents; {\it (ii)} in large populations, time is measured in units proportional to either $N/\log N$ or $N$ generations (where $2N$ is the population size when constant); {\it (iii)} it follows that incorporating population size changes leads to time-changed coalescents, with the time change independent of $\alpha$; {\it (iv)} using simulations, we show that the ancestral process is not well approximated by the corresponding coalescent (as measured through certain functionals of the processes); and {\it (v)} whenever the skewness of the offspring number distribution is increased, the conditional (conditioned on the population ancestry) and the unconditional ancestral processes are not in good agreement.
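
For orientation, the Beta$(2-\alpha,\alpha)$-coalescent's merger rates follow a standard closed form, $\lambda_{b,k} = B(k-\alpha,\, b-k+\alpha)/B(2-\alpha,\, \alpha)$: the rate at which any fixed set of $k$ of $b$ lineages merges. The sketch below (the generic formula, not the paper's simulations) shows how lowering $\alpha$ boosts large multiple mergers.

from scipy.special import beta as B

def merger_rate(b, k, alpha):
    """Rate of a given k-merger among b lineages, Beta(2-alpha, alpha) case."""
    return B(k - alpha, b - k + alpha) / B(2 - alpha, alpha)

b = 10
for alpha in (0.5, 1.0, 1.5, 1.9):      # alpha -> 2 approaches Kingman
    rates = [merger_rate(b, k, alpha) for k in range(2, b + 1)]
    print(alpha, [round(r, 4) for r in rates])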


[14] 2601.10397

Reshaping Neural Representation via Associative, Presynaptic Short-Term Plasticity

Short-term synaptic plasticity (STP) is traditionally viewed as a purely presynaptic filter of incoming spike trains, independent of postsynaptic activity. Recent experiments, however, reveal an associative form of STP in which presynaptic release probability changes alongside long-term potentiation, implying a richer computational role for presynaptic plasticity. Here we develop a normative theory of associative STP using an information-theoretic framework. Extending Fisher-information-based learning to Tsodyks-Markram synapses, we derive analytic update rules for baseline synaptic strength and release probability that maximize encoded stimulus information under resource constraints. The learning rules separate into a conventional postsynaptic term tracking local firing and a distinct presynaptic term with a phase-advanced structure that selectively detects stimulus onset; critically, differences between plasticity of baseline strength and release probability arise within this presynaptic component. For stimulus variations slower than the EPSP time constant, onset sensitivity biases optimal connectivity toward anti-causal associations, strengthening synapses from neurons activated later to those activated earlier. In recurrent circuits, these rules yield ramp-like sustained representations and reverse replay after drive removal. Linear-response analysis further shows that STP confers frequency-dependent phase selectivity on presynaptic drive and that constraints on total release probability systematically tune temporal asymmetry. Together, our results provide a principled account of associative STP and identify presynaptic plasticity of release probability as a substrate for rapidly reconfigurable temporal coding.
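
For context, here is a minimal Tsodyks-Markram synapse (one standard variant; the parameters are illustrative, and the paper's associative presynaptic learning rules are not reproduced), with facilitation variable $u$ and resource variable $x$ filtering a presynaptic spike train:

import numpy as np

def tm_synapse(spike_times, T=1.0, dt=1e-4, U=0.2, tau_f=0.6, tau_d=0.15):
    steps = int(T / dt)
    u, x = U, 1.0
    spikes = {round(t / dt) for t in spike_times}
    efficacy = np.zeros(steps)                # released fraction u*x per spike
    for i in range(steps):
        u += (U - u) / tau_f * dt             # facilitation decays back to U
        x += (1 - x) / tau_d * dt             # resources recover toward 1
        if i in spikes:
            u += U * (1 - u)                  # facilitation jump on a spike
            efficacy[i] = u * x               # synaptic efficacy of this spike
            x -= u * x                        # resource depletion
    return efficacy

# A 50 Hz burst: facilitation builds early, depression dominates later
eff = tm_synapse([0.05 + 0.02 * k for k in range(10)])
print([round(v, 3) for v in eff[eff > 0]])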


[15] 2601.10405

A Predictive Model for Synergistic Oncolytic Virotherapy: Unveiling the Ping-Pong Mechanism and Optimal Timing of Combined Vesicular Stomatitis and Vaccinia Viruses

We present a mathematical model that describes the synergistic mechanism of combined Vesicular Stomatitis Virus (VSV) and Vaccinia Virus (VV). The model captures the dynamic interplay between tumor cells, viral replication, and the interferon-mediated immune response, revealing a `ping-pong' synergy where VV-infected cells produce B18R protein that neutralizes interferon-$\alpha$, thereby enhancing VSV replication within the tumor. Numerical simulations demonstrate that this combination achieves complete tumor clearance in approximately 50 days, representing an 11\% acceleration compared to VV monotherapy (56 days), while VSV alone fails to eradicate tumors. Through bifurcation analysis, we identify critical thresholds for viral burst size and B18R inhibition, while sensitivity analysis highlights infection rates and burst sizes as the most influential parameters for treatment efficacy. Temporal optimization reveals that therapeutic outcomes are maximized through immediate VSV administration followed by delayed VV injection within a 1-19 day window, offering a strategic approach to overcome the timing and dosing challenges inherent in oncolytic virotherapy (OVT).
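
A deliberately simplified ODE caricature of the described feedback (all equations and parameter values below are hypothetical stand-ins, not the paper's model): VV-infected cells produce B18R, which neutralizes interferon and thereby lifts the innate block on VSV infection.

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, b1=0.02, b2=0.01, d=0.5, kI=2.0, pB=1.0, kN=5.0):
    T, I1, I2, F, B = y          # tumor, VSV-infected, VV-infected, IFN, B18R
    F_eff = F / (1 + kN * B)     # B18R neutralises interferon
    dT = 0.1 * T * (1 - T / 1e3) - b1 * T / (1 + kI * F_eff) - b2 * T
    dI1 = b1 * T / (1 + kI * F_eff) - d * I1   # IFN suppresses VSV infection
    dI2 = b2 * T - d * I2
    dF = I1 + I2 - 1.0 * F                     # infected cells induce IFN
    dB = pB * I2 - 0.5 * B                     # only VV makes B18R
    return [dT, dI1, dI2, dF, dB]

sol = solve_ivp(rhs, (0, 60), [500, 1, 1, 0, 0])
print("tumor burden at day 60:", sol.y[0, -1])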


[16] 2601.10482

Convex Efficient Coding

Why do neurons encode information the way they do? Normative answers to this question model neural activity as the solution to an optimisation problem; for example, the celebrated efficient coding hypothesis frames neural activity as the optimal encoding of information under efficiency constraints. Successful normative theories have varied dramatically in complexity, from simple linear models (Atick & Redlich '90), to complex deep neural networks (Lindsay '21). What complex models gain in flexibility, they lose in tractability and often understandability. Here, we split the difference by constructing a set of tractable but flexible normative representational theories. Instead of optimising the neural activities directly, following Sengupta et al. '18, we optimise the representational similarity, a matrix formed from the dot products of each pair of neural responses. Using this, we show that a large family of interesting optimisation problems are convex. This family includes problems corresponding to linear and some non-linear neural networks, and problems from the literature not previously recognised as convex, such as modified versions of semi-nonnegative matrix factorisation or nonnegative sparse coding. We put these findings to work in three ways. First, we provide the first necessary and sufficient identifiability result for a form of semi-nonnegative matrix factorisation. Second, we show that if neural tunings are `different enough' then they are uniquely linked to the optimal representational similarity, partially justifying the use of single neuron tuning analysis in neuroscience. Finally, we use the tractable nonlinearity of some of our problems to explain why dense retinal codes, but not sparse cortical codes, optimally split the coding of a single variable into ON & OFF channels. In sum, we identify a space of convex problems, and use them to derive neural coding results.
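
A generic instance of the approach (this sketch assumes the cvxpy package; the paper's specific objectives and constraints differ): optimize the representational similarity (Gram) matrix directly, where positive semidefiniteness, elementwise nonnegativity, and a trace budget all keep the problem convex.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
S = X @ X.T                                     # target stimulus similarity

G = cp.Variable((20, 20), PSD=True)             # representational similarity
objective = cp.Minimize(cp.norm(G - S, "fro"))  # match stimulus geometry
constraints = [cp.trace(G) <= 40,               # an "efficiency" budget
               G >= 0]                          # e.g. nonnegative responses
prob = cp.Problem(objective, constraints)
prob.solve()
print("optimal value:", round(prob.value, 3))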


[17] 2601.10515

Testing three models of cognitive stress effects: A psychopharmacological randomized controlled trial of acute stress and stress hormones across visual perception, response inhibition and cognitive flexibility

Acute stress alters cognitive performance, yet competing models make divergent predictions regarding the mechanisms, scope, and temporal dynamics of these effects. This large-scale randomized controlled trial tested predictions from three influential stress-effect models using a broad cognitive task battery embedded within a psychopharmacological stress paradigm. Across 606 testing sessions, 303 healthy male participants completed both the Maastricht Acute Stress Test (MAST) and its non-stress control condition. To independently manipulate acute stress and stress hormone availability, participants were additionally randomized to receive atomoxetine (40 mg; to prolong norepinephrine availability), hydrocortisone (10 mg; to increase cortisol availability), or placebo. Cognitive performance was assessed over 80 minutes post-stress using tasks targeting visual perception (rapid serial visual presentation), response inhibition (stop-signal), and cognitive flexibility (dual and switch tasks). MAST exposure selectively impaired response inhibition, reflected in shorter stop-signal delays, lower probabilities of successful stopping and prolonged stop-signal reaction times, particularly during later testing phases (40-80 minutes post-stress). MAST exposure did not affect visual perception or task-switching performance but buffered time-related declines in processing efficiency at the expense of task prioritization in the dual task. Pharmacological manipulation of norepinephrine or cortisol availability was effective but did not moderate cognitive stress effects. Overall, this pattern of task-specific impairment alongside stabilized processing efficiency cannot be fully explained by any tested model, highlighting the need to refine existing models and adopt more integrative approaches to advance our mechanistic understanding of cognitive stress effects in laboratory and real-world contexts.


[18] 2601.10663

Sporadic Creutzfeldt-Jakob disease presenting with cerebral atrophy following traumatic brain injury mimicking hydrocephalus: a case report and literature review

Introduction: Sporadic Creutzfeldt-Jakob disease (sCJD) is a rapidly progressive neurodegenerative disease without effective treatment that usually results in death within one year. Recently applied methods have improved the accuracy of diagnosis, and specific radiological findings provide the necessary information for differential diagnosis. Research question: This research aims to provide a different perspective on the development of CJD, together with an associated literature review. Materials and methods: The study presents the case of a patient with cognitive deficits, gait instability, and urinary and fecal incontinence, who had suffered a traumatic brain injury eight months before admission, with cerebral ventricle dilation on CT images. Studies describing relevant cases are also included. Results: The patient's symptoms deteriorated. Further examinations, including 14-3-3 and tau proteins in the cerebrospinal fluid (CSF), MRI, and EEG, confirmed the diagnosis of sCJD. He returned to the local hospital for conservative treatment without effective medical intervention. Conclusion: This case illustrates the diagnostic process of CJD and underscores the importance of distinguishing rare disorders from common conditions to achieve a comprehensive understanding of the disease.


[19] 2601.10070

Comparative Evaluation of Deep Learning-Based and WHO-Informed Approaches for Sperm Morphology Assessment

Assessment of sperm morphological quality remains a critical yet subjective component of male fertility evaluation, often limited by inter-observer variability and resource constraints. This study presents a comparative biomedical artificial intelligence framework evaluating an image-based deep learning model (HuSHeM) alongside a clinically grounded baseline derived from World Health Organization criteria augmented with the Systemic Inflammation Response Index (WHO(+SIRI)). The HuSHeM model was trained on high-resolution sperm morphology images and evaluated using an independent clinical cohort. Model performance was assessed using discrimination, calibration, and clinical utility analyses. The HuSHeM model demonstrated higher discriminative performance, as reflected by an increased area under the receiver operating characteristic curve with relatively narrow confidence intervals compared to WHO(+SIRI). Precision-recall analysis further indicated improved performance under class imbalance, with higher precision-recall area values across evaluated thresholds. Calibration analysis indicated closer agreement between predicted probabilities and observed outcomes for HuSHeM, while decision curve analysis suggested greater net clinical benefit across clinically relevant threshold probabilities. These findings suggest that image-based deep learning may offer improved predictive reliability and clinical utility compared with traditional rule-based and inflammation-augmented criteria. The proposed framework supports objective and reproducible assessment of sperm morphology and may serve as a decision-support tool within fertility screening and referral workflows. The proposed models are intended as decision-support or referral tools and are not designed to replace clinical judgment or laboratory assessment.
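
The clinical utility comparison rests on decision curve analysis; for reference, Vickers-and-Elkin-style net benefit at threshold probability $p_t$ is $\mathrm{TP}/N - (\mathrm{FP}/N)\,p_t/(1-p_t)$, sketched below on synthetic data (not the study's cohort):

import numpy as np

def net_benefit(y_true, y_prob, thresholds):
    n = len(y_true)
    out = []
    for pt in thresholds:
        pred = y_prob >= pt                      # treat/refer above threshold
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        out.append(tp / n - fp / n * pt / (1 - pt))
    return out

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)
prob = np.clip(0.6 * y + 0.2 * rng.random(500), 0, 1)   # a toy classifier
print(net_benefit(y, prob, thresholds=np.arange(0.1, 0.6, 0.1)))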


[20] 2601.10250

Cell Behavior Video Classification Challenge, a benchmark for computer vision methods in time-lapse microscopy

The classification of microscopy videos capturing complex cellular behaviors is crucial for understanding and quantifying the dynamics of biological processes over time. However, it remains a frontier in computer vision, requiring approaches that effectively model the shape and motion of objects without rigid boundaries, extract hierarchical spatiotemporal features from entire image sequences rather than static frames, and account for multiple objects within the field of view. To this end, we organized the Cell Behavior Video Classification Challenge (CBVCC), benchmarking 35 methods based on three approaches: classification of tracking-derived features, end-to-end deep learning architectures that directly learn spatiotemporal features from the entire video sequence without explicit cell tracking, or ensembling of tracking-derived and image-derived features. We discuss the results achieved by the participants and compare the potential and limitations of each approach, serving as a basis to foster the development of computer vision methods for studying cellular dynamics.


[21] 2601.10464

MitoFREQ: A Novel Approach for Mitogenome Frequency Estimation from Top-level Haplogroups and Single Nucleotide Variants

Lineage marker population frequencies can serve as one way to express evidential value in forensic genetics. However, for high-quality whole mitochondrial DNA genome sequences (mitogenomes), population data remain limited. In this paper, we offer a new method, MitoFREQ, for estimating the population frequencies of mitogenomes. MitoFREQ uses the mitogenome resources HelixMTdb and gnomAD, harbouring information from 195,983 and 56,406 mitogenomes, respectively. Neither HelixMTdb nor gnomAD can be queried directly for individual mitogenome frequencies, but both offer single nucleotide variant (SNV) allele frequencies for each of 30 "top-level" haplogroups (TLHGs). We propose using the HelixMTdb and gnomAD resources by classifying a given mitogenome within the TLHG scheme and subsequently using the frequency of its rarest SNV within that TLHG, weighted by the TLHG frequency. We show that this method is guaranteed to provide a higher population frequency estimate than if a refined haplogroup and its SNV frequencies were used. Further, we show that top-level haplogrouping can be achieved by using only 227 specific positions for 99.9% of the tested mitogenomes, potentially making the method available for low-quality samples. The method was tested on two types of datasets: high-quality forensic reference datasets and a diverse collection of scrutinised mitogenomes from GenBank. This dual evaluation demonstrated that the approach is robust across both curated forensic data and broader population-level sequences. The method produced likelihood ratios in the range of 100-100,000, demonstrating its potential to strengthen the statistical evaluation of forensic mtDNA evidence. We have developed an open-source R package `mitofreq` that implements our method, including a Shiny app where custom TLHG frequencies can be supplied.
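
Schematically, the estimator reduces to a one-liner once the TLHG call and the TLHG-stratified SNV frequencies are available (the numbers below are toy placeholders; real estimates query HelixMTdb/gnomAD, and the paper's package is in R):

def mitogenome_freq(snvs, tlhg, snv_freq_by_tlhg, tlhg_freq):
    """Frequency of the rarest SNV within the TLHG, weighted by TLHG frequency."""
    rarest = min(snv_freq_by_tlhg[tlhg][s] for s in snvs)
    return tlhg_freq[tlhg] * rarest

# Hypothetical toy data for illustration only:
snv_freq_by_tlhg = {"H": {"A263G": 0.98, "T152C": 0.21, "C16192T": 0.004}}
tlhg_freq = {"H": 0.41}
f = mitogenome_freq(["A263G", "T152C", "C16192T"], "H",
                    snv_freq_by_tlhg, tlhg_freq)
print("frequency estimate:", f, "-> likelihood ratio ~", round(1 / f))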


[22] 2407.06703

HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction

Predicting the stability and fitness effects of amino acid mutations in proteins is a cornerstone of biological discovery and engineering. Various experimental techniques have been developed to measure mutational effects, providing us with extensive datasets across a diverse range of proteins. By training on these data, traditional computational modeling and more recent machine learning approaches have advanced significantly in predicting mutational effects. Here, we introduce HERMES, a 3D rotationally equivariant structure-based neural network model for mutational effect and stability prediction. Pre-trained to predict amino acid propensity from its surrounding 3D structure, HERMES can be fine-tuned for mutational effects using our open-source code. We present a suite of HERMES models, pre-trained with different strategies, and fine-tuned to predict the stability effect of mutations. Benchmarking against other models shows that HERMES often outperforms or matches their performance in predicting mutational effect on stability, binding, and fitness. HERMES offers versatile tools for evaluating mutational effects and can be fine-tuned for specific predictive objectives.


[23] 2410.02111

Global dynamical structures from infinitesimal data

Scientists and engineers alike regard the modeling of complex, high-dimensional, nonlinear dynamical systems as a central goal. Machine learning breakthroughs, alongside mounting computation and data, have advanced the efficacy of learning from trajectory measurements. However, scientifically interpreting data-driven models, e.g., localizing attracting sets and their basins, remains elusive. Such limitations particularly afflict identification of system-level regulatory mechanisms characteristic of living systems, e.g., stabilizing control for whole-body locomotion, where discontinuous, transient, and multiscale phenomena are common and prior models are rare. As a next step towards theory-grounded discovery of behavioral mechanisms in biology and beyond, we introduce VERT, a framework for discovering attracting sets from trajectories without recourse to any global model. Our infinitesimal-local-global (ILG) pipeline estimates the proximity of any sampled state to an attracting set, if one exists, with formal accuracy guarantees. We demonstrate our approach on phenomenological and physical oscillators with hierarchical and impulsive dynamics, finding sensitivity to both global and intermediate attractors composed in sequence and parallel. Application of VERT to human running kinematics data reveals insight into control modules that stabilize task-level dynamics, supporting a longstanding neuromechanical control hypothesis. The VERT framework promotes rigorous inference of underlying dynamical structure even for systems where learning a global dynamics model is impractical or impossible.


[24] 2507.13253

Life Finds A Way: Emergence of Cooperative Structures in Adaptive Threshold Networks

There has been a long debate on how new levels of organization have evolved. It might seem unlikely, as cooperation must prevail over competition. One well-studied example is the emergence of autocatalytic sets, which seem to be a prerequisite for the evolution of life. Using a simple model, we investigate how varying bias toward cooperation versus antagonism shapes network dynamics, revealing that higher-order organization emerges even amid pervasive antagonistic interactions. In general, we observe that a quantitative increase in the number of elements in a system leads to a qualitative transition. We present a random threshold-directed network model that integrates node-specific traits with dynamic edge formation and node removal, simulating arbitrary levels of cooperation and competition. In our framework, intrinsic node values determine directed links through various threshold rules. Our model generates a multi-digraph with signed edges (reflecting support/antagonism, labeled ``help''/``harm''), which ultimately yields two parallel yet interdependent threshold graphs. Incorporating temporal growth and node turnover in our approach allows exploration of the evolution, adaptation, and potential collapse of communities and reveals phase transitions in both connectivity and resilience. Our findings extend classical random threshold and Erdős-Rényi models, offering new insights into adaptive systems in biological and economic contexts, with emphasis on the application to Collective Affordance Sets. This framework should also be useful for making predictions that will be tested by ongoing experiments of microbial communities in soil.
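
A toy generator in the spirit of the described model (the threshold rules below are invented for illustration and are not the paper's): intrinsic node values induce signed, directed "help"/"harm" edges, yielding two parallel threshold digraphs.

import numpy as np

def signed_threshold_net(n=50, theta_help=1.3, theta_harm=0.8, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.random(n)                            # intrinsic node traits
    help_edges, harm_edges = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if v[i] + v[j] > theta_help:         # cooperative link (help)
                help_edges.append((i, j))
            elif abs(v[i] - v[j]) > theta_harm:  # antagonistic link (harm)
                harm_edges.append((i, j))
    return help_edges, harm_edges

h, a = signed_threshold_net()
print(len(h), "help edges,", len(a), "harm edges")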


[25] 2511.12205

LCPan: efficient variation graph construction using Locally Consistent Parsing

Efficient and consistent string processing is critical in the exponentially growing genomic data era. Locally Consistent Parsing (LCP) addresses this need by partitioning an input genome string into short, exactly matching substrings (e.g., "cores"), ensuring consistency across partitions. Labeling the cores of an input string consistently not only provides a compact representation of the input but also enables the reapplication of LCP to refine the cores over multiple iterations, providing a progressively longer and more informative set of substrings for downstream analyses. We present the first iterative implementation of LCP with Lcptools and demonstrate its effectiveness in identifying cores with minimal collisions. Experimental results show that the number of cores at the i^th iteration is O(n/c^i) for c ~ 2.34, while the average length and the average distance between consecutive cores are O(c^i). Compared to popular sketching techniques, LCP produces significantly fewer cores, enabling a more compact representation and faster analyses. To demonstrate the advantages of LCP in genomic string processing in terms of computation and memory efficiency, we also introduce LCPan, an efficient variation graph constructor. We show that LCPan generates variation graphs >10x faster than vg, while using >13x less memory.


[26] 2512.18442

Markovian Promoter Models: A Mechanistic Alternative to Hill Functions in Gene Regulatory Networks

Gene regulatory networks are typically modeled using ordinary differential equations (ODEs) with phenomenological Hill functions to represent transcriptional regulation. While computationally efficient, Hill functions lack mechanistic grounding and cannot capture stochastic promoter dynamics. We present a hybrid Markovian-ODE framework that explicitly models discrete promoter states while maintaining computational tractability. Uniquely, we parameterize this model using fractional dwell times derived from ChEC-seq data, enabling the inference of in vivo kinetic rates from steady-state chromatin profiling. Our approach tracks individual transcription factor binding events as a continuous-time Markov chain, linked to deterministic molecular dynamics. We validate this framework on seven gene regulatory systems spanning basic to advanced complexity: the GAL system, repressilator, Goodwin oscillator, toggle switch, incoherent feed-forward loop, p53-Mdm2 oscillator, and NF-$\kappa$B pathway. Comparison with stochastic simulation algorithm (SSA) ground truth demonstrates that Markovian promoter models achieve similar accuracy to full stochastic simulations while being 10-100$\times$ faster. Our framework provides a mechanistic foundation for gene regulation modeling and enables investigation of promoter-level stochasticity in complex regulatory networks.
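
The hybrid scheme is easy to sketch for a two-state (telegraph) promoter: draw exponential CTMC holding times for the promoter, and between switches advance the mRNA ODE $\dot m = \beta s - \gamma m$ using its exact linear solution (illustrative rates below, not the ChEC-seq-derived ones):

import numpy as np

def hybrid_telegraph(T=200.0, k_on=0.2, k_off=0.5, beta=10.0, gamma=0.5,
                     seed=0):
    rng = np.random.default_rng(seed)
    t, s, m = 0.0, 0, 0.0                   # time, promoter state, mRNA level
    ts, ms = [0.0], [0.0]
    while t < T:
        dwell = rng.exponential(1 / (k_on if s == 0 else k_off))
        dwell = min(dwell, T - t)           # CTMC holding time, clipped at T
        m_inf = beta * s / gamma            # ODE fixed point in this state
        m = m_inf + (m - m_inf) * np.exp(-gamma * dwell)   # exact ODE update
        t += dwell
        s = 1 - s                           # promoter toggles on/off
        ts.append(t); ms.append(m)
    return np.array(ts), np.array(ms)

ts, ms = hybrid_telegraph()
# crude check against the stationary mean (beta/gamma) * k_on/(k_on + k_off):
print(ms[len(ms) // 2:].mean(), "vs", (10.0 / 0.5) * 0.2 / (0.2 + 0.5))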


[27] 2512.23877

High-fidelity robotic PCR amplification

Polymerase chain reaction (PCR) underpins modern molecular biology, yet its deployment in emerging domains such as DNA data storage and distributed diagnostics remains constrained by bulky thermocyclers, complex thermal hardware, and contamination-prone workflows. Here, we present an autonomous robotic PCR platform that redefines thermocycling as a motion-controlled process rather than a temperature-controlled device. The system employs a programmable robotic liquid handler to execute PCR entirely within sealed pipette tips, repeatedly immersing and withdrawing reaction volumes in a single temperature-stabilized oil bath to realize denaturation, annealing, and extension steps through precise spatiotemporal control. This architecture eliminates conventional thermocyclers and enables fully enclosed reactions with complete sample recovery. We demonstrate that the robotic system achieves amplification efficiency and sequencing fidelity comparable to high-performance commercial thermocyclers when applied to DNA-encoded datasets. Beyond performance parity, the platform minimizes reagent consumption, suppresses cross-contamination through physical isolation, and supports parallelization through robotic scheduling rather than hardware duplication. By abstracting PCR thermocycling into a robotically orchestrated manipulation task, this work establishes a generalizable framework for automated biochemical processing and positions robotic control as a central design axis for scalable, low-cost molecular workflows.


[28] 2601.05301

Is E. coli good at chemotaxis?

Bacteria seem masters of chemotaxis, yet recent work suggests otherwise. Henry Mattingly and colleagues (Nature Physics, 2026) argue that Escherichia coli uses only a small fraction of the sensory information available at its surface, challenging the long-held view that bacterial chemotaxis operates near physical sensing limits. This article offers a brief conceptual discussion of their findings, placing them in the context of classical chemotaxis models, robustness to noise, and broader perspectives drawn from physics, biology, and Greek mythology.


[29] 2501.13188

Topological constraints on self-organisation in locally interacting systems

All intelligence is collective intelligence, in the sense that it is made of parts which must align with respect to system-level goals. Understanding the dynamics which facilitate or limit navigation of problem spaces by aligned parts thus impacts many fields ranging across life sciences and engineering. To that end, consider a system on the vertices of a planar graph, with pairwise interactions prescribed by the edges of the graph. Such systems can sometimes exhibit long-range order, distinguishing one phase of macroscopic behaviour from another. In networks of interacting systems we may view spontaneous ordering as a form of self-organisation, modelling neural and basal forms of cognition. Here, we discuss necessary conditions on the topology of the graph for an ordered phase to exist, with an eye towards finding constraints on the ability of a system with local interactions to maintain an ordered target state. By studying the scaling of free energy under the formation of domain walls in three model systems -- the Potts model, autoregressive models, and hierarchical networks -- we show how the combinatorics of interactions on a graph prevent or allow spontaneous ordering. As an application we are able to analyse why multiscale systems like those prevalent in biology are capable of organising into complex patterns, whereas rudimentary language models are challenged by long sequences of outputs.


[30] 2510.03621

A flux-based approach for analyzing the disguised toric locus of reaction networks

Dynamical systems with polynomial right-hand sides are very important in various applications, e.g., in biochemistry and population dynamics. The mathematical study of these dynamical systems is challenging due to the possibility of multistability, oscillations, and chaotic dynamics. One important tool for this study is the concept of reaction systems, which are dynamical systems generated by reaction networks for some choices of parameter values. Among these, disguised toric systems are remarkably stable: they have a unique attracting fixed point, and cannot give rise to oscillations or chaotic dynamics. The computation of the set of parameter values for which a network gives rise to disguised toric systems (i.e., the disguised toric locus of the network) is an important but difficult task. We introduce new ideas based on network fluxes for studying the disguised toric locus. We prove that the disguised toric locus of any network $G$ is a contractible manifold with boundary, and introduce an associated graph $G^{\max}$ that characterizes its interior. These theoretical tools allow us, for the first time, to compute the full disguised toric locus for many networks of interest.


[31] 2510.10020

Calibrating Generative Models to Distributional Constraints

Generative models frequently suffer miscalibration, wherein statistics of the sampling distribution such as class probabilities deviate from desired values. We frame calibration as a constrained optimization problem and seek the closest model in Kullback-Leibler divergence satisfying calibration constraints. To address the intractability of imposing these constraints exactly, we introduce two surrogate objectives for fine-tuning: (1) the relax loss, which replaces the constraint with a miscalibration penalty, and (2) the reward loss, which converts calibration into a reward fine-tuning problem. We demonstrate that these approaches substantially reduce calibration error across hundreds of simultaneous constraints and models with up to one billion parameters, spanning applications in protein design, image generation, and language modeling.
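
A toy analogue of the relax loss (a from-scratch numpy sketch, not the paper's fine-tuning setup, and with one plausible choice of KL direction): for a categorical model, descend the gradient of $\mathrm{KL}(p_\theta \,\|\, p_{\mathrm{base}}) + \lambda\,\|p_\theta - p_{\mathrm{target}}\|^2$ with respect to the logits.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

base_logits = np.array([2.0, 1.0, 0.0, -1.0])
p_base = softmax(base_logits)
target = np.array([0.25, 0.25, 0.25, 0.25])     # calibration constraint
lam, lr = 5.0, 0.1
logits = base_logits.copy()
for step in range(2000):
    p = softmax(logits)
    # gradient of KL(p || p_base) w.r.t. logits
    log_ratio = np.log(p / p_base)
    grad_kl = p * (log_ratio - np.sum(p * log_ratio))
    # gradient of the miscalibration penalty, chained through softmax
    jac = np.diag(p) - np.outer(p, p)           # d p / d logits (symmetric)
    grad_pen = jac @ (2 * lam * (p - target))
    logits -= lr * (grad_kl + grad_pen)

print(softmax(logits).round(3), "target:", target)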