New articles on Quantitative Biology


[1] 2604.01252

A Data-Driven Measure of REM Sleep Propensity for Human and Rodent Sleep

Mammalian sleep is characterized by multiple alternations between episodes of rapid-eye-movement sleep (REMS) and non-REM sleep (NREMS). While the mechanisms governing the timing of these ultradian NREMS-REMS cycles remain poorly understood, the phenomenon of REMS pressure, namely a drive for REMS that builds up between REMS episodes, is thought to be a contributing factor. Prior analyses of NREMS-REMS cycles in mice have suggested that time in NREMS is a primary contributor to REMS pressure. Building on that finding, we previously introduced a REMS propensity measure defined as the probability to enter REMS before the accumulation of an additional amount of NREMS. Analyzing mouse ultradian cycle data, we showed that REMS propensity at REMS onset was positively correlated with REMS bout duration and with the probability of a REMS bout being followed by a short inter-REMS interval, a pattern called a sequential REMS cycle. In this paper, we extend our analyses of REMS propensity to human and rat ultradian NREMS-REMS cycle data. We show that, as in mice, human and rat sleep contain both short NREMS-REMS sequential cycles and longer single NREMS-REMS cycles, though there are some differences in the relative distributions of cycle durations. Although rodents exhibit polyphasic sleep in contrast with the consolidated sleep of humans, the calculated REMS propensity measures in all three species show similar profiles as functions of time spent in NREMS: specifically, REMS propensity increases with time spent in NREMS until it reaches a peak value, and then it decays with additional time in NREMS. Positive correlations of REMS propensity at REMS onset with REMS bout duration were present in both human and rat data as in mouse data, suggesting that time spent in NREMS also influences REMS duration in these species.
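The propensity measure described above admits a simple empirical estimator. The sketch below is an assumed reading of that definition (not the authors' code): from the NREMS time accumulated at each observed REMS onset, it estimates the probability of entering REMS before a further `delta` minutes of NREMS accumulate, given `s` minutes already accumulated.

```python
import numpy as np

def rems_propensity(nrems_at_onset, s_grid, delta):
    """Empirical REMS propensity sketch: P(enter REMS before accumulating an
    additional `delta` of NREMS | `s` of NREMS already accumulated since the
    last REMS episode).  `nrems_at_onset` holds, for each observed NREMS-REMS
    cycle, the total NREMS time accumulated at REMS onset."""
    n = np.asarray(nrems_at_onset, dtype=float)
    prop = []
    for s in s_grid:
        at_risk = n > s                       # cycles still in NREMS at time s
        entered = (n > s) & (n <= s + delta)  # of those, REMS onset within delta
        prop.append(entered.sum() / at_risk.sum() if at_risk.any() else np.nan)
    return np.array(prop)

# toy data: minutes of NREMS accumulated at each REMS onset
onsets = [12, 15, 20, 22, 30, 35, 50]
print(rems_propensity(onsets, s_grid=[10, 20, 30], delta=10))
```

A rising-then-falling profile of this estimator over `s_grid`, as reported for all three species, would appear directly in the returned array.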


[2] 2604.01295

Parallelized Hierarchical Connectome: A Spatiotemporal Recurrent Framework for Spiking State-Space Models

This work presents the Parallelized Hierarchical Connectome (PHC), a general framework that upgrades temporal-only State-Space Models (SSMs) into spatiotemporal recurrent networks. Conventional SSMs achieve high-speed sequence processing through parallel scans, yet are limited to temporal recurrence without lateral or feedback interactions within a single timestep. PHC maps the diagonal SSM core to a shared Neuron Layer and inter-neuronal communication to a shared Synapse Layer, where neurons are partitioned into hierarchical regions governed by the connectome topology. A Multi-Transmission Loop enables intra-slice spatial recurrence, allowing signals to propagate across the hierarchical connectome within each temporal window while preserving O(log T) parallelism. This framework enables integration of neuro-physical priors typically intractable for standard SSMs, including adaptive leaky integrate-and-fire dynamics, Dale's Law, short-term plasticity, and reward-modulated spike-timing-dependent plasticity. The framework is instantiated as PHCSSM, the first model to unify recurrent spiking neural network dynamics with diagonal SSM parallelism while enforcing all five biological constraints and learnable lateral connections within a fully parallelizable training pipeline. Empirical results on physiological benchmarks from the UEA multivariate time-series archive demonstrate that PHCSSM achieves performance competitive with state-of-the-art SSMs while reducing parameter complexity from Theta(D^2 L) for L-layer stacked architectures to Theta(D^2). These findings suggest that biologically grounded inductive biases offer a principled route to parameter-efficient sequence modeling, opening diagonal SSMs to spatiotemporal recurrence and enabling fully parallelizable recurrent spiking neural network training.
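The O(log T) parallelism of diagonal SSMs mentioned above rests on a standard trick: the linear recurrence h_t = a_t * h_{t-1} + b_t can be evaluated for all t at once with a prefix scan over an associative operator. A minimal NumPy sketch of that generic mechanism (not the PHC implementation) is:

```python
import numpy as np

def sequential_scan(a, b):
    """Reference: h_t = a_t * h_{t-1} + b_t with h_{-1} = 0, computed serially."""
    h = np.zeros_like(b)
    prev = 0.0
    for t in range(len(b)):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h

def parallel_scan(a, b):
    """Recursive-doubling (Hillis-Steele) inclusive scan over the associative
    operator (a1, b1) * (a2, b2) = (a1*a2, a2*b1 + b2).  Gives every prefix of
    the linear recurrence in O(log T) parallel steps (at O(T log T) work; a
    Blelloch scan would reduce the work to O(T))."""
    a = a.astype(float).copy()
    b = b.astype(float).copy()
    T = len(b)
    step = 1
    while step < T:
        a_prev = a[:-step].copy()  # copies avoid overlapping in-place views
        b_prev = b[:-step].copy()
        b[step:] = a[step:] * b_prev + b[step:]
        a[step:] = a[step:] * a_prev
        step *= 2
    return b
```

For a diagonal SSM the same scan runs independently per state dimension, which is what makes the diagonal core fully parallelizable over time.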


[3] 2604.01385

Strategies for tumor elimination and control under immune evasion and chemotherapy resistance

The evolutionary and ecological dynamics of tumors under immune responses and therapeutic interventions pose major challenges to long-term treatment success. Although treatment may initially achieve short-term disease control, resistant cancer cell subpopulations often arise, leading to relapse with more aggressive and treatment-resistant forms of the disease. Here, we develop and analyze mathematical models describing the interactions among effector cells, chemo-resistant tumor cells, and immuno-resistant tumor cells under distinct immune-evasion strategies. The models incorporate competition and cooperation between resistant and sensitive tumor subpopulations. We identify threshold conditions governing tumor persistence, elimination, and phenotype dominance under varying therapeutic intensities. These findings provide a theoretical framework for designing targeted and combination therapies and offer insights into strategies for mitigating treatment resistance.


[4] 2604.01475

Interpretable Electrophysiological Features of Resting-State EEG Capture Cortical Network Dynamics in Parkinson's Disease

Parkinson's disease (PD) alters cortical neural dynamics, yet reliable non-invasive electrophysiological biomarkers remain elusive. This study examined whether interpretable EEG features capturing complementary aspects of neural dynamics can discriminate Parkinsonian neural states. A comprehensive set of interpretable features was extracted and grouped into Standard descriptors (spectral power, phase synchronization, time-domain statistics) and Dynamical descriptors (aperiodic activity, cross-frequency coupling, scale-free dynamics, neuronal avalanche statistics, and instantaneous frequency measures). A multi-head attention transformer classifier was trained using strict leave-one-subject-out (LOSO) validation. Group-level comparisons were performed to identify electrophysiological differences associated with disease and medication state. Standard feature sets achieved the strongest performance in discriminating medication states (PDoff vs PDon), whereas Dynamical descriptors performed competitively in contrasts between PD patients and healthy controls. Random feature ablation analyses indicated that Dynamical descriptors provide complementary information distributed across features, while correlation analysis revealed low redundancy within both feature sets. Group-level comparisons revealed medication-sensitive reductions in delta power and voltage variance, modulation of neuronal avalanche statistics, persistent increases in theta phase synchronization in PD patients, and disease-related alterations in cross-frequency interactions. Traditional spectral and synchronization features primarily reflect medication-related neural modulation, whereas dynamical descriptors reveal broader alterations in cortical network organization associated with disease but also with medication. These findings support multivariate EEG representations as a promising framework for developing non-invasive biomarkers of PD.


[5] 2604.01734

A Novel Multi-view Mixture Model Framework for Longitudinal Clustering with Application to ANCA-Associated Vasculitis

Effectively modeling irregularly sampled longitudinal data is essential for understanding disease progression and improving risk prediction. We propose a two-view mixture model that integrates static baseline covariates and longitudinal biomarker trajectories within a unified probabilistic clustering framework. Temporal patterns are modeled using Neural Ordinary Differential Equations. Model training uses an EM algorithm with a sparsity-inducing log-penalty for interpretable subgroup discovery. Application of the model to an Irish cohort of ANCA-associated vasculitis patients reveals subgroups with heterogeneous serum creatinine trajectories and variation in end-stage kidney disease outcomes.
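As a generic illustration of the EM alternation that underlies such mixture-model clustering (the paper's two-view model, with Neural-ODE trajectories and a sparsity-inducing log-penalty, is far richer than this), a minimal two-component 1-D Gaussian mixture EM looks like:

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """Minimal EM for a two-component 1-D Gaussian mixture: a toy stand-in
    for the E-step/M-step alternation used in probabilistic clustering."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])          # crude initialization
    sd = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities (densities up to a constant that cancels)
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / sd
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sd
```

The paper's M-step additionally penalizes the mixture weights for interpretable subgroup discovery; this sketch omits that to keep the core alternation visible.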


[6] 2604.01990

Evaluating Deep Surrogate Models for Knee Joint Contact Mechanics Under Input-Limited Conditions

Background and Objective: Accurate surrogate modeling of knee joint contact mechanics is important for reconstructing stress distributions and identifying risk-relevant regions, yet the relative suitability of different modeling paradigms under practically relevant input-limited conditions remains unclear. Methods: Nine male soccer players performed 90° change-of-direction trials. Finite element simulations driven by subject-specific joint posture and reaction forces were converted into graph-structured samples. Five surrogate architectures representing local diffusion, history-context enhancement, hierarchical multi-scale modeling, explicit global interaction, and local-global hybridization were compared using three-fold cross-subject validation under full, pose-corrupted, load-corrupted, and minimal-input conditions. Performance was evaluated using full-field error, high-stress error, high-risk region overlap, and hotspot localization metrics. Results: The hybrid model achieved the best overall performance under full inputs and remained the most robust under pose- and load-corrupted conditions. Under minimal inputs, no single model dominated all metrics: the history-context model yielded lower overall and high-stress errors, the hybrid model better preserved high-risk region reconstruction, and the hierarchical model showed an advantage in hotspot localization. Conclusion: Evaluation of surrogate models for knee joint contact mechanics should shift from accuracy comparisons under ideal inputs to a comprehensive assessment of the preservation of risk-relevant information under realistic input constraints. Although the local-global hybrid model showed the best overall robustness, the optimal model under minimal-input conditions remained task-dependent.


[7] 2604.02057

Thermodynamic connectivity reveals functional specialization and multiplex organization of extrasynaptic signaling

Neural communication operates on both fast synaptic transmission and slower, diffusive extrasynaptic signaling, yet how these two modes jointly organize brain function remains unclear. Here, using the complete synaptic and neuropeptidergic connectomes of Caenorhabditis elegans, we develop a unified multiplex framework linking anatomical wiring to functional communication. We infer structure-derived functional connectivity from the synaptic connectome using equilibrium principles from statistical physics, yielding a probabilistic map of information flow across all synaptic pathways, and compare this functional layer directly with the extrasynaptic connectome. This reveals a principled functional specialization across four communication regimes: (i) a topology-dependent layer that reinforces and stabilizes synaptic motor circuits, (ii) a topology-resilient modulatory layer supporting global regulation and behavioral state control, (iii) a purely extrasynaptic network sustaining survival and homeostasis, and (iv) a purely synaptic regime mediating rapid, low-latency sensorimotor processing. Together, these findings reveal that synaptic and extrasynaptic signaling form complementary architectures optimized for speed, modulation, robustness, and survival, and provide a general strategy for integrating structural and modulatory connectomes to understand how distinct communication modes cooperate to sustain coherent brain function.


[8] 2604.02212

Phase estimation with autoregressive padding (PEAP): addressing inaccuracies and biases in EEG analysis

Accurate phase estimation at the edge of data segments is crucial for EEG applications such as EEG-TMS in offline and real-time data analysis. Our research evaluates the phase estimation performance of four commonly used methods (Phastimate, SSPE, ETP, and PhastPadding) for accuracy and systematic biases, using data from young and elderly healthy controls and chronic stroke participants. To address the identified limitations of the established methods, we introduce Phase Estimation with Autoregressive Padding (PEAP), a method that prevents strong bandpass filtering-induced artifacts. Unlike the established methods, PEAP does not show significant biases and improves accuracy by 3.2% to 9.2% for continuous phase estimation. Our offline analysis demonstrates how established methods are systematically biased towards some estimates and how they induce phase shifts. We also show that differences between methods do not vary between clinical and control populations, supporting their translatability. This work indicates that systematic biases in established phase estimation methods may compromise the validity and comparability of phase-dependent findings. PEAP addresses these limitations and thus offers a more reliable and more accurate alternative method.
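The core idea of autoregressive padding can be sketched generically: fit an AR model to the segment, forecast past its right edge, and only then apply the phase-extraction transform, so the transform never sees the hard edge. The code below is an illustrative reconstruction under that reading of the abstract (not the PEAP reference implementation; order and padding length are arbitrary choices here):

```python
import numpy as np

def ar_fit(x, order):
    """Least-squares AR(p) fit: x[t] ~ sum_k a[k] * x[t-1-k]."""
    X = np.column_stack([x[order - 1 - k:len(x) - 1 - k] for k in range(order)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def ar_pad(x, order=20, n_pad=64):
    """Extend the segment beyond its right edge with AR-model forecasts, so a
    subsequent filter / Hilbert transform is not distorted by the edge."""
    a = ar_fit(np.asarray(x, dtype=float), order)
    out = list(x)
    for _ in range(n_pad):
        out.append(np.dot(a, out[-1:-order - 1:-1]))  # newest lag first
    return np.array(out)

def hilbert_phase(x):
    """Instantaneous phase via the FFT analytic signal (same construction as
    scipy.signal.hilbert, written out with plain NumPy)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2
    if n % 2 == 0:
        h[n // 2] = 1
    return np.angle(np.fft.ifft(X * h))
```

In a real pipeline the bandpass filter would be applied between padding and phase extraction; the padding is what keeps the filter's edge artifacts out of the samples of interest.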


[9] 2604.01357

Cell Migration Boundary Motion in Drosophila Egg Chambers: A Combined Phase Field and Chemoattractant Model

In the Drosophila melanogaster egg chamber, the collective migration of border cells toward the oocyte is guided by spatial gradients of chemoattractants. While cellular responses to these cues are well characterized, the spatial distribution of chemoattractant within the tissue remains difficult to measure experimentally due to imaging limitations and extracellular complexity. In this study, we develop a spatially resolved mathematical framework to model local chemoattractant concentrations during border cell migration. We use a phase-field approach to represent the egg chamber geometry and define a diffusion-reaction system with spatially heterogeneous diffusivity that accounts for confinement by cellular domains. This framework allows chemoattractant diffusion to be restricted to extracellular space while remaining excluded from the interiors of nurse cells, the border cell cluster, and the oocyte, similar to what we observe in vivo. We simulate secretion from the oocyte and degradation throughout the domain, showing how geometry shapes the distribution of signaling molecules. We further couple this chemical field to a mechanical model of cluster migration that includes a tangential interface migration (TIM) force, allowing the cluster to respond to both chemoattractant gradients and cell-cell contact. Our results show that signal localization and tissue geometry jointly influence directional persistence and the speed of migration. Notably, geometric bottlenecks and intersections can flatten local gradients and slow migration, consistent with experimental observations. This modeling framework offers a tool to investigate how biophysical constraints shape signaling environments and guide collective cell movement in vivo.


[10] 2604.01362

Multipath Channel Metrics and Detection in Vascular Molecular Communication: A Wireless-Inspired Perspective

Motivated by classical communications engineering, early works in molecular communication (MC) largely adopted established modeling and signal processing concepts from wireless electromagnetic communication systems. In the context of the human cardiovascular system (CVS), MC channel models evolved from simple unbounded and single-duct environments mimicking individual blood vessels to complex vessel network (VN) topologies, generally at the expense of analytical tractability. Up until now, this has largely prohibited rigorous communication-theoretic analysis of large-scale VNs. In this work, we leverage a recently established closed-form analytical channel model for VNs, named mixture of inverse Gaussians for hemodynamic transport (MIGHT), to conduct the first systematic communication-theoretic study of MC in complex, large-scale VNs. Based on MIGHT, we derive a Poisson channel noise model and unveil structural analogies between multipath wireless communications (MWC) and advective-diffusive MC in VNs. In particular, we establish classical MWC metrics, namely the root mean squared (RMS) delay spread, the mean excess delay, and the coherence bandwidth, for MC in VNs and derive closed-form expressions for the channel frequency response and power delay profile (PDP). Building on this characterization, we propose a VN-adapted, coherent decision-feedback (DF) detector and show how the derived multipath metrics can inform the choice of critical system parameters like the symbol duration, the sampling time, and the memory length. Additionally, we evaluate the detector's performance in different VNs exhibiting inter-symbol interference (ISI). Together, these contributions open the door to a systematic, MWC-inspired MC system design for large-scale VNs.
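The classical multipath metrics named above have standard textbook definitions over a discrete power delay profile. A small NumPy helper makes them concrete (the coherence-bandwidth constant below is one common rule of thumb, for roughly 0.5 frequency correlation; conventions differ by a constant factor):

```python
import numpy as np

def multipath_metrics(taus, p):
    """Classical MWC metrics from a discrete power delay profile:
    taus -- path delays, p -- corresponding received powers.
    Returns (mean excess delay, RMS delay spread, coherence bandwidth),
    using the rule of thumb Bc ~ 1 / (5 * tau_rms)."""
    taus = np.asarray(taus, dtype=float)
    p = np.asarray(p, dtype=float)
    w = p / p.sum()                                        # normalized PDP
    tau_mean = np.sum(w * taus)                            # mean excess delay
    tau_rms = np.sqrt(np.sum(w * (taus - tau_mean) ** 2))  # RMS delay spread
    bc = 1.0 / (5.0 * tau_rms) if tau_rms > 0 else np.inf
    return tau_mean, tau_rms, bc
```

As in the wireless setting, comparing the symbol duration against tau_rms (or the signal bandwidth against Bc) indicates whether the vessel-network channel will exhibit significant inter-symbol interference, which is exactly how the paper proposes to choose system parameters.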


[11] 2604.01435

Osmotically Induced Shape Changes in Membrane Vesicles

We develop a self-consistent free-energy framework in which membrane shape and osmotic pressure are determined simultaneously in a finite reservoir by minimizing bending elasticity and solute entropy. Solute conservation makes osmotic pressure a thermodynamic variable rather than an externally prescribed parameter, producing a nonlinear coupling between membrane mechanics and solvent entropy. This coupling modifies the classical stability condition for spherical vesicles: instability emerges from global free-energy competition rather than the linear Helfrich stability criterion. The resulting critical pressures differ by orders of magnitude from Helfrich predictions and agree with simulations for small and large unilamellar vesicles. The framework is relevant to cellular environments involving biomolecular condensate confinement as well as synthetic vesicles and the development of osmotic-pressure-driven encapsulation platforms.
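For reference, the two standard ingredients the abstract alludes to are the Helfrich bending energy and a van 't Hoff osmotic term; with solute conservation in a finite reservoir, the concentrations (and hence the pressure) depend on the enclosed volume rather than being prescribed. The notation below is generic, not necessarily the paper's exact functional:

```latex
% Helfrich bending energy of the membrane surface S
% (H: mean curvature, c_0: spontaneous curvature, kappa: bending rigidity)
E_{\mathrm{bend}} = \frac{\kappa}{2} \oint_{S} \left( 2H - c_0 \right)^2 \, dA

% van 't Hoff osmotic pressure with conserved solute numbers N_in, N_out
% in a finite reservoir of total volume V_res; V is the vesicle volume
\Pi = k_B T \left( \frac{N_{\mathrm{in}}}{V} - \frac{N_{\mathrm{out}}}{V_{\mathrm{res}} - V} \right)
```

Because N_in and N_out are fixed while V changes with membrane shape, the pressure becomes a thermodynamic variable coupled to the bending energy, which is the nonlinear coupling the abstract describes.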


[12] 2604.01949

annbatch unlocks terabyte-scale training of biological data in anndata

The scale of biological datasets now routinely exceeds system memory, making data access rather than model computation the primary bottleneck in training machine-learning models. This bottleneck is particularly acute in biology, where widely used community data formats must support heterogeneous metadata, sparse and dense assays, and downstream analysis within established computational ecosystems. Here we present annbatch, a mini-batch loader native to anndata that enables out-of-core training directly on disk-backed datasets. Across single-cell transcriptomics, microscopy and whole-genome sequencing benchmarks, annbatch increases loading throughput by up to an order of magnitude and shortens training from days to hours, while remaining fully compatible with the scverse ecosystem. Annbatch establishes a practical data-loading infrastructure for scalable biological AI, allowing increasingly large and diverse datasets to be used without abandoning standard biological data formats. Github: this https URL


[13] 2604.02166

Data Sieving for Scalable Real-Time Multichannel Nanopore Sensing

High-throughput solid-state nanopore experiments generate continuous MHz-rate data streams in which only a small fraction of the data carries molecular information. This creates storage and processing bottlenecks that limit experimental scalability. We introduce Data Sieving, a GPU-accelerated acquisition framework that integrates real-time event detection directly into the measurement pipeline and selectively stores and allows real-time analysis of snapshots around molecular translocations. The system employs a lightweight rolling-average and min-max trigger to identify event candidates in parallel across channels. This architecture reduces stored data volume by up to 98% while preserving complete molecular signatures across a wide temporal range, from microsecond-scale protein dynamics to second-scale nucleic acid nanoparticle events. Continuous baseline monitoring enables autonomous closed-loop actuation; in high-concentration DNA experiments, automatic declogging restored pore conductance, reducing the time spent in a non-productive clogged state to near zero without interrupting parallel measurements. Validated across DNA, protein, and nucleic acid nanoparticle measurements, Data Sieving links data storage directly to molecular information content rather than experiment duration, enabling scalable, real-time operation of parallel nanopore sensors. The approach provides a hardware-agnostic foundation for long-duration, high-bandwidth single-molecule experiments and other event-driven sensing platforms. By using algorithms intrinsically compatible with low-latency digital architectures, this framework provides a clear path toward high-bandwidth, highly multiplexed recording across hundreds of individual nanopore channels in both solid-state and biological pores.
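A single-channel sketch of a rolling-baseline trigger of the kind described (thresholds and window length are illustrative assumptions; the paper's GPU multichannel pipeline is more elaborate) can be written in a few lines:

```python
import numpy as np

def detect_events(signal, win=200, k=5.0):
    """Flag samples deviating from a moving-average baseline by more than
    k robust standard deviations, and merge them into candidate events.
    Returns a list of (start, stop) index pairs."""
    kernel = np.ones(win) / win
    baseline = np.convolve(signal, kernel, mode="same")   # rolling average
    resid = signal - baseline
    # robust noise scale from the median absolute deviation
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    mask = np.abs(resid) > k * sigma
    # merge contiguous True runs into (start, stop) event candidates
    edges = np.flatnonzero(np.diff(np.concatenate(([0], mask.view(np.int8), [0]))))
    return list(zip(edges[::2], edges[1::2]))
```

Storing only short windows around each returned (start, stop) pair, rather than the full stream, is what ties stored volume to molecular event content instead of experiment duration.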


[14] 2604.02203

QuantumXCT: Learning Interaction-Induced State Transformation in Cell-Cell Communication via Quantum Entanglement and Generative Modeling

Inferring cell-cell communication (CCC) from single-cell transcriptomics remains fundamentally limited by reliance on curated ligand-receptor databases, which primarily capture co-expression rather than the system-level effects of signaling on cellular states. Here, we introduce QuantumXCT, a hybrid quantum-classical generative framework that reframes CCC as the problem of learning interaction-induced state transformations between cellular state distributions. By encoding transcriptomic profiles into a high-dimensional Hilbert space, QuantumXCT trains parameterized quantum circuits to learn a unitary transformation that maps a baseline non-interacting cellular state to an interacting state. This approach enables the discovery of communication-driven changes in cellular state distributions without requiring prior biological assumptions. We validate QuantumXCT using both synthetic data with known ground-truth interactions and single-cell RNA-seq data from ovarian cancer-fibroblast co-culture systems. The model accurately recovers complex regulatory dependencies, including feedback structures, and identifies dominant communication hubs such as the PDGFB-PDGFRB-STAT3 axis. Importantly, the learned quantum circuit is interpretable: its entangling topology can be translated into biologically meaningful interaction networks, while post hoc contribution analysis quantifies the relative influence of individual interactions on the observed state transitions. By shifting CCC inference from static interaction lookup to learning data-driven state transformations, QuantumXCT provides a generative framework for modeling intercellular communication. This work establishes a new paradigm for de novo discovery of communication programs in complex biological systems and highlights the potential of quantum machine learning in single-cell biology.


[15] 2410.11548

Bayesian inference of mixed Gaussian phylogenetic models

Background: The evolution of continuous traits in a group of taxa correlated through a phylogenetic tree is commonly modelled using parametric stochastic differential equations, which represent deterministic change of the trait through time while incorporating noise that represents unobservable evolutionary pressures. A heterogeneous Gaussian process consisting of multiple parametric sub-processes is often used when the observed data come from a very diverse set of taxa. In the maximum-likelihood setting, challenges arise in exploring the likelihood surface and in interpreting the uncertainty around the parameters. Results: We extend existing methods to tackle inference for mixed Gaussian phylogenetic models (MGPMs) by implementing a Bayesian scheme that can take biologically relevant priors into account. The posterior inference method is based on the Population Monte Carlo (PMC) algorithm, which is easily parallelized, and uses an efficient algorithm to calculate the likelihood of phylogenetically correlated observations. A model evaluation method based on the proximity of the posterior predictive distribution to the observed data is also implemented. A simulation study is performed to test the inference and evaluation capability of the method. Finally, we test our method on a real-world dataset. Conclusion: We implement the method in the R package bgphy, available at this http URL. The simulation study demonstrates that the method infers parameters and evaluates models properly, while its application to the real-world dataset indicates that a carefully selected model of evolution, based on naturally occurring classifications, results in a better fit to the observed data.


[16] 2412.04172

Activity-dependent neuromodulation and calcium homeostasis cooperate to produce robust and modulable neuronal function

Neurons rely on two interdependent mechanisms, homeostasis and neuromodulation, to maintain robust and adaptable functionality. Calcium homeostasis stabilizes neuronal activity by adjusting ionic conductances, whereas neuromodulation dynamically modifies ionic properties in response to external signals carried by neuromodulators. Combining these mechanisms in conductance-based models often produces unreliable outcomes, particularly when sharp neuromodulation interferes with calcium-homeostatic tuning. This study explores how a biologically inspired neuromodulation controller can harmonize with calcium homeostasis to ensure reliable neuronal function. Using computational models of stomatogastric ganglion and dopaminergic neurons, we demonstrate that controlled neuromodulation preserves neuronal firing patterns while calcium homeostasis simultaneously maintains target intracellular calcium levels. Unlike sharp neuromodulation, the neuromodulation controller integrates activity-dependent feedback through mechanisms mimicking G-protein-coupled receptor cascades. The interaction between these controllers critically depends on the existence of an intersection in conductance space, representing a balance between target calcium levels and neuromodulated firing patterns. Maximizing neuronal degeneracy enhances the likelihood of such intersections, enabling robust modulation and compensation for channel blockades. We further show that this controller pairing extends to network-level activity, reliably modulating the rhythmic activity of central pattern generators. This study highlights the complementary roles of calcium homeostasis and neuromodulation, proposing a unified control framework for maintaining robust and adaptive neural activity under physiological and pathological conditions.


[17] 2505.22680

Cardiac-Phase-Dependent Spin Coherence as a Probe of Boundary Covariance Geometry in Neural Tissue

A recently proposed geometric framework predicts that the transition from distributed belief to committed action involves a metric regime change, culminating in a boundary regime where cross-mode structure becomes algebraically necessary for continued state-space compression. This paper examines whether reported magnetic resonance measurements of proton spins in neural tissue provide an empirical probe of this regime. A companion analysis identifies the detected signal as the readout-converted signature of double-quantum SU(1,1) pair coherence, which correlates with short-term memory performance and cardiac-phase dynamics during wakefulness. We show that the mathematical bridge between the abstract transport framework and the physical spin system is the Bures metric, which natively governs both Gaussian Wasserstein geometry and quantum density matrices. We argue that the observed signal is best understood as a probe of entry into a deep boundary regime where single-mode compression is exhausted and collective cross-mode squeezing emerges. Because high-temperature bulk NMR obstructs strictly bipartite entanglement witnesses, we contextualize the signal within a macroscopic multiple-quantum-coherence (MQC) framework. Consequently, the current data provide evidence for the metric-driven onset of collective non-compact SU(1,1) structure, establishing the necessary physical foundation for future macroscopic many-body entanglement certification.


[18] 2509.12073

CEP-IP: An Explainable Framework for Cell Subpopulation Identification in Single-cell Transcriptomics

Single-cell RNA sequencing (scRNA-seq) frameworks lack explainable approaches for identifying cell subpopulations harboring strong pairwise monotonic gene-module relationships between a gene of interest (GOI) and its co-expressed genes. CEP-IP is introduced as a novel explainable machine learning framework to address this gap. In the primary dataset, TRPM4 served as the GOI and its co-expressed ribosomal genes (Ribo) were identified via Spearman-Kendall dual-filter (i.e., dual-filtered gene, DFG). Generalized additive modeling quantified TRPM4-Ribo relationship strength via deviance explained (DE), which was then mapped to individual cells via CEP classification to identify top-ranked explanatory power (TREP) cells. TRPM4-Ribo transcriptional space was then stratified into pre-IP and post-IP regions using inflection point (IP) analysis, producing four subpopulations per patient for pathway analysis. TRPM4-Ribo modeling outperformed alternative gene set modules (FDR<0.05). In each prostate cancer (PCa) patient, CEP-IP yielded four cell subpopulations, where pre-IP TREP cells showed enrichment of immune-related processes, and post-IP TREP cells were enriched for ribosomal, translation, and cell adhesion pathways. Validation was performed in the Allen middle temporal gyrus (MTG) and Neftel glioblastoma (GBM) datasets. In the MTG dataset (CARM1P1-DFG module), post-IP TREP cells showed enrichment of neuron projection ontologies. In the GBM dataset, FOXM1 was the sole GOI yielding mesenchymal-state DFGs, with FOXM1-DFG post-IP TREP cells enriched for cell division and microtubule pathways; 3D trajectory analysis demonstrated continuous trajectories of TREP cells that were obscured in 2D embeddings. CEP-IP identifies biologically distinct cell subpopulations in three independent scRNA-seq datasets, and it may be applicable to other pairwise GOI-DFG modules in single-cell transcriptomics.
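The Spearman-Kendall dual filter can be sketched as requiring a candidate gene to pass both rank-correlation tests against the gene of interest. The thresholds and gene names below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

def dual_filter(goi, expr, genes, rho_min=0.4, tau_min=0.3, alpha=0.05):
    """Retain genes whose expression passes BOTH monotonic-correlation tests
    against the gene of interest (GOI).
    goi   -- expression of the GOI across cells
    expr  -- list of per-gene expression vectors (same cell order)
    genes -- matching gene names."""
    kept = []
    for name, g in zip(genes, expr):
        rho, p_rho = spearmanr(goi, g)
        tau, p_tau = kendalltau(goi, g)
        if rho >= rho_min and tau >= tau_min and p_rho < alpha and p_tau < alpha:
            kept.append(name)
    return kept
```

Requiring agreement between the two rank statistics is a conservative screen: a gene surviving both is monotonically co-expressed with the GOI under two different notions of rank concordance.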


[19] 2510.16082

BIOGEN: Evidence-Grounded Multi-Agent Reasoning Framework for Transcriptomic Interpretation in Antimicrobial Resistance

Interpreting gene clusters from RNA sequencing (RNA-seq) remains challenging, especially in antimicrobial resistance studies where mechanistic insight is important for hypothesis generation. Existing pathway enrichment methods can summarize co-expressed modules, but they often provide limited cluster-specific explanations and weak connections to supporting literature. We present BIOGEN, an evidence-grounded multi-agent framework for post hoc interpretation of RNA-seq transcriptional modules. BIOGEN combines biomedical retrieval, structured reasoning, and multi-critic verification to generate traceable cluster-level explanations with explicit evidence and confidence labels. On a primary Salmonella enterica dataset, BIOGEN achieved strong biological grounding, including BERTScore 0.689, Semantic Alignment Score 0.715, KEGG Functional Similarity 0.342, and a hallucination rate of 0.000, compared with 0.100 for an LLM-only baseline. Across four additional bacterial RNA-seq datasets, BIOGEN also maintained zero hallucination under the same fixed pipeline. In comparisons with representative open-source agentic AI baselines, BIOGEN was the only framework that consistently preserved zero hallucination across all five datasets. These findings suggest that retrieval alone is not enough for reliable biological interpretation, and that evidence-grounded orchestration is important for transparent and source-traceable transcriptomic reasoning.


[20] 2604.01187

Competition at the front of expanding populations

When competing species grow into new territory, the population is dominated by descendants of successful ancestors at the expansion front. Successful ancestry depends both on reproductive advantage (fitness) and on the ability and opportunity to colonize new domains. We present a model that integrates both elements by coupling the classic description of one-dimensional competition (Fisher equation) to the minimal model of front shape (KPZ equation). Macroscopic manifestations of these equations are distinct growth morphologies controlled by expansion rates, competitive abilities, or spatial anisotropy. In some cases the ability to expand in space may overcome reproductive advantage in colonizing new territory. When new traits appear with accumulating mutations, we find that variations in fitness in range expansion may be described by the Tracy--Widom distribution.
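For orientation, the two building blocks named in the abstract are, in their standard one-dimensional forms (generic notation; the abstract does not specify the precise coupling terms):

```latex
% Fisher (logistic reaction-diffusion) equation for the competitor fraction u
\partial_t u = D \, \partial_x^2 u + r \, u (1 - u)

% KPZ equation for the front height h, with Gaussian white noise eta
\partial_t h = \nu \, \partial_x^2 h + \frac{\lambda}{2} (\partial_x h)^2 + \eta(x, t)
```

The model couples these so that the competition dynamics in u feel the local front geometry encoded in h, which is how spatial expansion ability can trade off against reproductive fitness.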


[21] 2503.03126

Controlling tissue size by active fracture

Groups of cells, including clusters of cancerous cells, multicellular organisms, and developing organs, may both grow and break apart. What physical factors control these fractures? In these processes, what sets the eventual size of clusters? We first develop a one-dimensional framework for understanding cell clusters that can fragment due to cell motility using an active particle model. We compute analytically how the break rate of cell-cell junctions depends on cell speed, cell persistence, and cell-cell junction properties. Next, we find the cluster size distributions, which differ depending on whether all cells can divide or only the cells on the edge of the cluster divide. Cluster size distributions depend solely on the ratio of the break rate to the growth rate - allowing us to predict how cluster size and variability depend on cell motility and cell-cell mechanics. Our results suggest that organisms can achieve better size control when cell division is restricted to the cluster boundaries or when fracture can be localized to the cluster center. Additionally, we derive a universal survival probability for an intact cluster $S(t)=\mathrm{e}^{-k_d t}$ at steady state if all cells can divide, which is independent of the rupture kinetics and depends solely on the cell division rate $k_d$. Finally, we further corroborate the one-dimensional analytics with two-dimensional simulations, finding quantitative agreement with some - but not all - elements of the theory across a wide range of cell motility. Our results link the general physics problem of a collective active escape over a barrier to size control, providing a quantitative measure of how motility can regulate organ or organism size.


[22] 2503.03485

TEDDY: A Family Of Foundation Models For Understanding Single Cell Biology

Understanding the biological mechanisms of disease is crucial for medicine, and in particular, for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving single-cell foundation models. First, we scaled the pre-training data to a diverse collection of 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the TEDDY family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on several downstream evaluation tasks, including identifying the underlying disease state of held-out donors not seen during training, distinguishing between diseased and healthy cells for disease conditions and donors not seen during training, and probing the learned representations for known biology. Our models showed substantial improvement over existing works, and scaling experiments showed that performance improved predictably with both data volume and parameter count.


[23] 2507.20598

Nullstrap-DE: A General Framework for Calibrating FDR and Preserving Power in DE Methods, with Applications to DESeq2 and edgeR

Differential expression (DE) analysis is a key task in RNA-seq studies, aiming to identify genes with expression differences across conditions. A central challenge is balancing false discovery rate (FDR) control with statistical power. Parametric methods such as DESeq2 and edgeR achieve high power by modeling gene-level counts using negative binomial distributions and applying empirical Bayes shrinkage. However, these methods may suffer from FDR inflation when model assumptions are mildly violated, especially in large-sample settings. In contrast, non-parametric tests like Wilcoxon offer more robust FDR control but often lack power and do not support covariate adjustment. We propose Nullstrap-DE, a general add-on framework that combines the strengths of both approaches. Designed to augment tools like DESeq2 and edgeR, Nullstrap-DE calibrates FDR while preserving power, without modifying the original method's implementation. It generates synthetic null data from a model fitted under the gene-specific null (no DE), applies the same test statistic to both observed and synthetic data, and derives a threshold that satisfies the target FDR level. We show theoretically that Nullstrap-DE asymptotically controls FDR while maintaining power consistency. Simulations confirm that it achieves reliable FDR control and high power across diverse settings, where DESeq2, edgeR, or Wilcoxon often show inflated FDR or low power. Applications to real datasets show that Nullstrap-DE enhances statistical rigor and identifies biologically meaningful genes.
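The thresholding step described in the abstract can be sketched as follows; `nullstrap_threshold` and the empirical FDR estimator are illustrative stand-ins, not the paper's exact implementation, which also involves fitting a negative binomial null model and regenerating synthetic counts from it.

```python
import numpy as np

def nullstrap_threshold(obs_stats, null_stats, q=0.05):
    """Smallest threshold whose estimated FDR is at most q.

    obs_stats: test statistics computed on the observed data.
    null_stats: the same statistic computed on synthetic data generated
        from a model fitted under the gene-specific null (no DE).
    The FDR at threshold t is estimated as the count of synthetic-null
    statistics exceeding t over the count of observed statistics
    exceeding t (an illustrative empirical estimator).
    """
    obs = np.asarray(obs_stats, dtype=float)
    null = np.asarray(null_stats, dtype=float)
    for t in np.sort(obs):  # ascending: try the least stringent cutoffs first
        fdr_hat = (null >= t).sum() / max(1, (obs >= t).sum())
        if fdr_hat <= q:
            return t
    return np.inf  # no threshold meets the target FDR

# Toy example: three genes with clearly elevated statistics.
obs = [0.1, 0.2, 0.3, 5.0, 6.0, 7.0]
null = [0.1, 0.2, 0.3, 0.4, 0.2, 0.1]
t = nullstrap_threshold(obs, null, q=0.1)
print(t, [s for s in obs if s >= t])  # only the three large statistics are called DE
```

Because the threshold is derived from the same statistic applied to observed and synthetic-null data, the calibration layers on top of DESeq2 or edgeR without modifying their internals, as the abstract describes.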


[24] 2509.07013

Generalized Machine Learning for Fast Calibration of Agent-Based Epidemic Models

Agent-based models (ABMs) are widely used to study infectious disease dynamics, but their calibration is often computationally intensive, limiting their applicability in time-sensitive public health settings. We propose DeepIMC (Deep Inverse Mapping Calibration), a machine learning-based calibration framework that directly learns the inverse mapping from epidemic time series to epidemiological parameters. DeepIMC trains a bidirectional Long Short-Term Memory (BiLSTM) neural network on synthetic epidemic trajectories generated from agent-based models such as the Susceptible-Infected-Recovered (SIR) model, enabling rapid parameter estimation without repeated simulation at inference time. We evaluate DeepIMC through an extensive simulation study comprising 5,000 heterogeneous epidemic scenarios and benchmark its performance against Approximate Bayesian Computation (ABC) using likelihood-free Markov Chain Monte Carlo. The results show that DeepIMC substantially improves parameter recovery accuracy, produces sharp and well-calibrated predictive intervals, and reduces computational time by more than an order of magnitude relative to ABC. Although structural parameter identifiability constraints limit the precise recovery of all model parameters simultaneously, the calibrated models reliably reproduce epidemic trajectories and support accurate forward prediction with their estimated parameters. DeepIMC is implemented in the open-source R package epiworldRCalibrate, facilitating practical adoption for real-time epidemic modeling and policy analysis. Overall, our findings demonstrate that DeepIMC provides a scalable, operationally effective alternative to traditional simulation-based calibration methods for agent-based epidemic models.
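The simulation-based construction of training pairs described above can be sketched as follows; the discrete-time SIR update, parameter ranges, and function names are illustrative assumptions, and the BiLSTM that would be trained on the resulting (trajectory, parameter) pairs is omitted.

```python
import numpy as np

def sir_trajectory(beta, gamma, days=60, n=1000, i0=10):
    """Deterministic discrete-time SIR; returns the daily infected count."""
    s, i, r = n - i0, i0, 0
    out = []
    for _ in range(days):
        new_inf = beta * s * i / n   # new infections this step
        new_rec = gamma * i          # new recoveries this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        out.append(i)
    return np.array(out)

def make_training_set(n_scenarios=5000, seed=0):
    """Sample parameters, simulate, and return (trajectories, parameters)."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n_scenarios):
        beta = rng.uniform(0.1, 0.6)    # illustrative transmission-rate range
        gamma = rng.uniform(0.05, 0.2)  # illustrative recovery-rate range
        X.append(sir_trajectory(beta, gamma))
        y.append((beta, gamma))
    return np.stack(X), np.array(y)
```

The key design point the abstract highlights is that all simulation cost is paid up front: once a network is fit to these pairs, calibration of new epidemic time series is a single forward pass rather than repeated ABM runs.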


[25] 2511.22828

Fast dynamical similarity analysis

Understanding how nonlinear dynamical systems (e.g., artificial neural networks and neural circuits) process information requires comparing their underlying dynamics at scale, across diverse architectures and large neural recordings. While many similarity metrics exist, current approaches fall short for large-scale comparisons. Geometric methods are computationally efficient but fail to capture governing dynamics, limiting their accuracy. In contrast, traditional dynamical similarity methods are faithful to system dynamics but are often computationally prohibitive. We bridge this gap by combining the efficiency of geometric approaches with the fidelity of dynamical methods. We introduce fast dynamical similarity analysis (fastDSA), a computationally efficient and accurate metric for measuring (dis)similarity between nonlinear dynamical systems. FastDSA leverages modern computational tools, including random matrix theory to determine optimal system rank, novel optimization pipelines for aligning system flow fields, and Koopman embeddings. Across benchmark nonlinear systems and recurrent network models, fastDSA is robust to arbitrary coordinate choices while remaining sensitive to meaningful dynamical differences, capturing variations in system evolution that geometric methods may miss and traditional methods detect only at high computational cost. To our knowledge, fastDSA is the fastest method that retains accuracy in comparing nonlinear dynamical systems. It enables scalable, statistical analyses across diverse systems, significantly expanding the practical applicability of dynamical similarity analysis.


[26] 2603.02491

What Capable Agents Must Know: Selection Theorems for Robust Decision-Making under Uncertainty

As artificial agents become increasingly capable, what internal structure is *necessary* for an agent to act competently under uncertainty? Classical results show that optimal control can be *implemented* using belief states or world models, but not that such representations are required. We prove quantitative "selection theorems" showing that strong task performance (low *average-case regret*) forces world models, belief-like memory and -- under task mixtures -- persistent variables resembling core primitives associated with emotion, along with informational modularity under block-structured tasks. Our results cover stochastic policies, partial observability, and evaluation under task distributions, without assuming optimality, determinism, or access to an explicit model. Technically, we reduce predictive modeling to binary "betting" decisions and show that regret bounds limit probability mass on suboptimal bets, enforcing the predictive distinctions needed to separate high-margin outcomes. In fully observed settings, this yields approximate recovery of the interventional transition kernel; under partial observability, it implies necessity of predictive state and belief-like memory, addressing an open question in prior world-model recovery work.
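The betting reduction in the last paragraph can be illustrated numerically; the payoff scheme and names below are toy assumptions, not the paper's construction. With outcome probability $q > 1/2$ and unit payoff for a correct bet, a policy that bets on the likelier outcome with probability $p$ incurs per-round regret $(1-p)(2q-1)$, so a regret bound $\varepsilon$ forces $p \ge 1 - \varepsilon/(2q-1)$: low regret leaves little probability mass on the suboptimal bet.

```python
def avg_regret(p_bet_one, q=0.8):
    """Expected per-round regret of betting '1' with probability p_bet_one
    when the outcome is 1 with probability q > 1/2 (payoff 1 if correct,
    0 otherwise). The optimal policy always bets '1' and earns q per round."""
    achieved = p_bet_one * q + (1 - p_bet_one) * (1 - q)
    return q - achieved

# A regret bound of 0.03 with q = 0.8 forces p >= 1 - 0.03 / 0.6 = 0.95:
print(avg_regret(0.95))  # ~0.03: exactly at the bound
```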