Alzheimer's disease (AD) is characterized by the accumulation of Amyloid-$\beta$ ($A\beta$) plaques and hyperphosphorylated Tau proteins. However, many individuals exhibit substantial $A\beta$ and Tau pathology without developing dementia, suggesting that disease progression may depend not only on pathological burden but also on the spatial organization of these proteins. Motivated by this observation, we adapt the Gray-Scott reaction-diffusion model to investigate pattern formation arising from the interactions between $A\beta$ and Tau. To systematically identify stable spatial configurations, we employ a Companion-Based Multi-Level Finite Element Method (CBMFEM) on both two-dimensional domains and anatomically realistic cortical surface meshes. Numerical simulations reveal a rich landscape of multiple steady-state solutions, which are subsequently classified into representative pattern phenotypes using principal component analysis and clustering techniques. The results demonstrate that the coupled $A\beta$--Tau system admits numerous stable spatial patterns rather than a single pathological endpoint. These findings provide a potential mathematical framework for understanding the heterogeneity of Alzheimer's disease and the existence of cognitively resilient individuals despite significant pathological burden. More broadly, the proposed framework suggests a pattern-based therapeutic paradigm in which disease dynamics are guided toward favorable stable states rather than relying solely on eliminating pathological proteins.
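For orientation only, the classical two-species Gray-Scott system can be integrated on a periodic two-dimensional grid with explicit finite differences. This is not the CBMFEM solver described above: the identification of `u` and `v` with $A\beta$ and Tau, the parameter values, and the function names are illustrative assumptions, and the sketch follows a single trajectory rather than systematically enumerating steady states.

```python
import numpy as np

def laplacian(Z):
    # Five-point Laplacian on a periodic grid with unit spacing
    return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
            np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4.0 * Z)

def gray_scott(n=64, steps=2000, Du=0.16, Dv=0.08, F=0.035, k=0.065, dt=1.0, seed=0):
    """Explicit Euler integration of the classical Gray-Scott system
        u_t = Du*Lap(u) - u*v^2 + F*(1 - u)
        v_t = Dv*Lap(v) + u*v^2 - (F + k)*v
    u and v are hypothetical stand-ins for Abeta and Tau concentrations."""
    rng = np.random.default_rng(seed)
    u = np.ones((n, n))
    v = np.zeros((n, n))
    c = slice(n // 2 - 5, n // 2 + 5)   # perturb a central square to seed patterns
    u[c, c] = 0.50
    v[c, c] = 0.25
    v += 0.01 * rng.random((n, n))      # small noise breaks the symmetry
    for _ in range(steps):
        uvv = u * v * v                 # reaction term from the current state
        u += dt * (Du * laplacian(u) - uvv + F * (1.0 - u))
        v += dt * (Dv * laplacian(v) + uvv - (F + k) * v)
    return u, v
```

Varying $(F, k)$ is known to switch the classical model between spot, stripe, and uniform patterns.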
Spatial transcriptomics workflows increasingly combine large annotated data objects, notebook-based analyses, and resource-intensive statistical models that must be executed on high-performance computing (HPC) systems. In practice, these workflows are often difficult to reproduce because configuration, validation, stage execution, and artifact handling are fragmented across $\textit{ad hoc}$ scripts and manually edited notebooks. We present $\textit{DiSTILL}$ (Disease Diagnosis from Spatial Transcriptomics via Interpretable Latent Learning), a hybrid cloud--HPC workflow system for reproducible spatial transcriptomics (ST) analysis. DiSTILL combines an application programming interface (API) backend built with $\texttt{FastAPI}$, a web frontend, a dataset and preset registry, and a Python pipeline generator that materializes run-specific execution bundles and $\texttt{SLURM}$ submission scripts. The system supports local, Secure Shell (SSH)-mediated, and pull-based poller execution modes, enabling HPC submission in environments where persistent API-initiated automation is restricted. We describe the system through the lens of an inflammatory bowel disease (IBD) ST workflow that operationalizes the analytical pipeline of Tan $\textit{et al.}$ into an auditable application layer. Accordingly, this paper makes a workflow-systems contribution centered on reproducible execution, queue-based orchestration, configuration semantics, and deployment across a split cloud--HPC architecture. The broader application goal of DiSTILL is to support user-supplied datasets that satisfy the schema assumptions of the wrapped analytical pipeline.
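The abstract does not show DiSTILL's generator, so the following is only a minimal sketch of what materializing run-specific SLURM scripts can look like. The function names and configuration keys are hypothetical; the `#SBATCH` options used (`--job-name`, `--partition`, `--time`, `--mem`, `--cpus-per-task`, and `--output` with the `%j` job-ID pattern) are standard `sbatch` flags.

```python
from pathlib import Path

def render_sbatch(run_id: str, stage: str, cfg: dict) -> str:
    """Render a SLURM batch script for one pipeline stage.

    The cfg keys (partition, time, mem, cpus, command) are hypothetical;
    the #SBATCH options themselves are standard sbatch flags."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={run_id}-{stage}",
        f"#SBATCH --partition={cfg['partition']}",
        f"#SBATCH --time={cfg['time']}",
        f"#SBATCH --mem={cfg['mem']}",
        f"#SBATCH --cpus-per-task={cfg['cpus']}",
        f"#SBATCH --output={run_id}-{stage}-%j.out",  # %j expands to the job ID
        "set -euo pipefail",                           # fail fast inside the job
        cfg["command"],
    ]
    return "\n".join(lines) + "\n"

def materialize_bundle(root: Path, run_id: str, stages: dict) -> list:
    """Write one .sbatch file per stage into a run-specific directory."""
    run_dir = root / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for stage, cfg in stages.items():
        path = run_dir / f"{stage}.sbatch"
        path.write_text(render_sbatch(run_id, stage, cfg))
        written.append(path)
    return written
```

Each generated file can be submitted with `sbatch`. Writing scripts to disk rather than submitting them directly is one way to let a pull-based poller on the HPC side pick them up, although how DiSTILL's poller works is not described here.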
Single-cell drug perturbation models should predict not only transcriptional response magnitude, but also whether a treatment alters the proliferative state of a cell. This is challenging because cell-cycle variation is often treated as nuisance variation, and benchmark pipelines rarely treat drug-induced phase changes as a primary prediction target. We introduce scCycleMol, a cell-cycle-aware perturbation prediction framework built on a curated 24-hour SciPlex3 benchmark with standardized molecule identities, dose and cell-line metadata, and gene expression with cell-cycle supervision derived from treated states. Instead of using cell-cycle state as an input covariate, scCycleMol derives supervision from predicted treated expression and propagates it through a learnable full-expression cell-cycle head with circular G1/S/G2M phase targets. We evaluate marker-based supervision, molecular representations, and pretraining strategies to isolate sources of improvement. Across a SciPlex3 benchmark with over 600k cells, 186 perturbation conditions, multiple cancer cell lines, and thousands of genes, scCycleMol improves out-of-distribution expression prediction compared with conditional perturbation baselines. The best LINCS-pretrained circular model achieves 0.9093 expected all-gene $r^2$ and 0.6843 expected differentially expressed gene $r^2$, compared with 0.6800 and 0.5400 for LINCS-pretrained ChemCPA. Closed-loop cell-cycle supervision improves phase accuracy by about 0.5 to 0.6 points while maintaining nearly unchanged expression prediction. A Tahoe-pretrained variant reaches 0.9609 phase accuracy, highlighting the benefit of explicit cell-cycle-aware supervision in perturbation modeling.
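One common way to make phase targets circular is to place G1, S, and G2M at evenly spaced angles on the unit circle, so that the G2M-to-G1 transition is no farther than G1-to-S. The sketch below assumes that encoding and a cosine loss; it is not necessarily the parameterization used in scCycleMol, and all names are hypothetical.

```python
import numpy as np

PHASES = ("G1", "S", "G2M")
# Hypothetical encoding: phases evenly spaced on the unit circle, so the
# cycle wraps around (G2M is as close to G1 as G1 is to S).
ANGLES = {p: 2.0 * np.pi * i / len(PHASES) for i, p in enumerate(PHASES)}

def encode(phase):
    """Map a phase label to a 2D unit vector (cos, sin)."""
    a = ANGLES[phase]
    return np.array([np.cos(a), np.sin(a)])

def decode(vec):
    """Map a predicted 2D vector to the phase with the nearest angle."""
    theta = np.arctan2(vec[1], vec[0]) % (2.0 * np.pi)

    def circ_dist(a):
        d = abs(theta - a) % (2.0 * np.pi)
        return min(d, 2.0 * np.pi - d)

    return min(PHASES, key=lambda p: circ_dist(ANGLES[p]))

def cosine_loss(pred, target):
    """1 - cos(angle between prediction and target); 0 when aligned."""
    pred = pred / (np.linalg.norm(pred) + 1e-12)
    return 1.0 - float(pred @ target)
```

With this encoding, mistaking G1 for S and mistaking G1 for G2M cost the same ($1 - \cos 120^\circ = 1.5$), which is the point of a circular target.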
T cell receptor (TCR)-epitope binding prediction is essential for understanding adaptive immunity and developing immunotherapies. Existing sequence- and structure-based models often generalize poorly to unseen epitopes and provide limited interpretability. Furthermore, the impact of generated structures on model learning remains unclear. We present TCR-SRIM, a structure-regularized interpretable-by-design model that combines protein language model embeddings with interpretable contact prototypes to capture residue-level TCR-epitope interactions. TCR-SRIM achieves state-of-the-art predictive performance and improved interpretation quality on the TCR-XAI benchmark. Using its inherent interpretability, we further evaluate the effect of generated structures on model learning. While structures predicted by AlphaFold3, TCRModel2, and tFold-TCR yield competitive performance, they lead to less accurate interaction patterns and reduced binding-site diversity compared with experimentally resolved structures. Our results highlight limitations of current structure prediction models for TCR-epitope learning and demonstrate the value of interpretable-by-design models for studying generated biological structures.
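As a generic illustration (not TCR-SRIM's pipeline), residue-level contacts can be read off a resolved or predicted complex with a distance cutoff, and the binding sites derived from two structures of the same complex can then be compared. The 8 Å C-alpha cutoff, the Jaccard comparison, and the function names are assumptions chosen for the example.

```python
import numpy as np

def contact_map(tcr_xyz, epi_xyz, cutoff=8.0):
    """Boolean residue-residue contact map between TCR and epitope residues.

    tcr_xyz: (n_tcr, 3) and epi_xyz: (n_epi, 3) coordinates in angstroms,
    e.g. one C-alpha atom per residue. The 8 A cutoff is an assumed convention."""
    d = np.linalg.norm(tcr_xyz[:, None, :] - epi_xyz[None, :, :], axis=-1)
    return d < cutoff

def binding_site_residues(contacts):
    """Indices of TCR residues touching at least one epitope residue."""
    return set(np.flatnonzero(contacts.any(axis=1)).tolist())

def site_jaccard(a, b):
    """Overlap of two binding sites, e.g. experimental vs. predicted structure."""
    return len(a & b) / len(a | b) if (a | b) else 1.0
```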
Purpose: Repeated heading of soccer balls has raised concerns about potential long-term neurological effects. Consequently, numerous studies have estimated head kinematics and brain deformation due to soccer headers across different cohorts and play scenarios to identify higher-risk conditions. However, heterogeneity in study design, data collection, and analysis has produced inconsistent findings, and injury risk is infrequently reported. Therefore, a meta-analysis of the existing literature was conducted to identify knowledge gaps and inform future studies assessing injury risk in soccer. Methods: We synthesized data from studies reporting head kinematics or brain deformation from soccer headers on human subjects. The data from these studies were analyzed to obtain the risk of mild traumatic brain injury (mTBI) based on applicable injury metrics and risk curves. Results: The meta-analysis revealed specific trends, indicating that match scenarios, corner kicks and goal kicks, top and oblique impacts, and older age cohorts were associated with higher head kinematics, while sex-based trends were inconclusive. The choice of sensor system affected the estimated head kinematics, with headband sensors consistently measuring higher kinematics than mouthpiece sensors. The data showed large variability stemming from heterogeneous study designs, limiting the applicability of the observed trends. These factors also influenced injury risk predictions, with estimated concussion risks generally below 20%. Conclusion: This review reveals trends in mTBI risk from soccer heading across different cohorts and play scenarios. It also underscores the need for standardized reporting of kinematics and brain deformation to enable mTBI risk estimation and meaningful cross-study comparisons.
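Risk estimation of this kind typically passes each reported kinematic value through a published logistic risk curve. The sketch below shows that step with placeholder coefficients (`beta0 = -10`, `beta1 = 0.1`, which put 50% risk at 100 g); these values and the function names are hypothetical and do not correspond to any specific published curve.

```python
import math

def logistic_risk(x, beta0, beta1):
    """P(mTBI) = 1 / (1 + exp(-(beta0 + beta1 * x))) for an injury metric x,
    e.g. peak linear acceleration in g. beta0 and beta1 are placeholders:
    each published risk curve supplies its own fitted coefficients."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def pooled_risk(values, beta0, beta1):
    """Mean predicted risk over a list of reported header kinematics."""
    return sum(logistic_risk(v, beta0, beta1) for v in values) / len(values)
```

Because the curve is nonlinear, averaging per-impact risks is not the same as evaluating the curve at the mean kinematic value, which is one reason reporting conventions affect pooled risk estimates.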
We study a population model in which individuals carry one of two traits and evolve under mutation, selection, and density-dependent regulation. A deterministic large-population limit yields a nonlinear system coupling logistic growth with mutation-selection dynamics. We identify threshold conditions governing extinction, persistence, and long-term trait composition. In particular, mutation induces an effective mortality rate that determines whether the population can be sustained. When inheritance dominates mutation, a second threshold emerges: population establishment depends on initial trait composition as well as overall growth rates. Although extinction ultimately occurs, the system typically exhibits long-lived quasi-equilibrium behaviour. A diffusion approximation provides a tractable description of this quasi-equilibrium phase and reveals a transition in the sign of trait correlations. The model thus illustrates how mutation, selection, and resource limitation jointly shape both ecological persistence and evolutionary outcomes.
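A toy version of such a system makes the effective-mortality argument concrete. Assume, purely for illustration, that each type $i$ gives birth at rate $b_i$, a fraction $\mu$ of births switch to the other type, and all individuals die at rate $d + cN$. For type 1, the births lost to mutation act like an extra death rate $b_1\mu$, so when type 2 cannot reproduce ($b_2 = 0$) the population persists only if $b_1(1-\mu) > d$. The equations and names are assumptions, not the paper's model.

```python
import numpy as np

def simulate(b1, b2, d, c, mu, n0, t_end=200.0, dt=0.01):
    """Forward-Euler integration of a hypothetical two-trait model:
        dN1/dt = b1(1-mu)N1 + b2*mu*N2 - (d + c(N1+N2)) N1
        dN2/dt = b2(1-mu)N2 + b1*mu*N1 - (d + c(N1+N2)) N2
    A fraction mu of each type's births produces the other type."""
    N = np.array(n0, dtype=float)
    for _ in range(int(t_end / dt)):
        n1, n2 = N
        death = d + c * (n1 + n2)          # density-dependent per-capita death rate
        dN1 = b1 * (1 - mu) * n1 + b2 * mu * n2 - death * n1
        dN2 = b2 * (1 - mu) * n2 + b1 * mu * n1 - death * n2
        N = np.maximum(N + dt * np.array([dN1, dN2]), 0.0)
    return N
```

With $b_1 = 1$, $b_2 = 0$, $d = 0.5$, and $c = 0.01$, the population dies out at $\mu = 0.6$ (since $b_1(1-\mu) = 0.4 < d$) but settles near $N = 40$ at $\mu = 0.1$.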
The Evolvable Soma Theory of Ageing is a recently proposed model that frames development as a continuous process of change accompanying organisms throughout the lifespan. This process is driven by developmental genes that encode epigenetic changes on target cells, whereas ageing reflects the expression of late-acting modifications that are subject to ongoing evolutionary optimisation and function as somatic "experiments" to explore phenotypic novelty. In this work we examine the role of transposable elements in the model. Our proposal acknowledges that these elements facilitate the expansion and diversification of gene regulatory networks by providing transcription factor binding sites. To minimise disruption, their regulatory activity is tightly repressed by epigenetic mechanisms during early development; this repression may be progressively released by genetically driven, age-associated epigenetic changes in later life, thereby contributing to transcriptional pseudo-randomness and ageing-associated phenotypes. Within this framework, transposable elements are integrated into a unified view of evolution, development and ageing, providing a conceptual basis for their dual role in regulatory innovation and age-related decline.
Insecticide-treated nets (ITN) are an effective and low-cost intervention for controlling vector-borne disease (VBD); however, their use depends on individual decisions based on perceived cost and risk of infection. This study investigates a nonlinear multi-host model for the transmission of VBD with endogenous strategic control. We assume that hosts' adoption of ITN emerges from payoff-based decision-making, creating a nonlinear coupling with disease prevalence. We model vector preference as a function of ITN coverage to probe the complex interplay among individual choices, disease prevalence, and its control in a multi-host setting. The qualitative behavior of the system is characterized by the thresholds $R_0$ and $R_c$, which determine the existence and local stability of the disease-free and endemic equilibria. The system exhibits rich dynamical behavior; hence, we provide a bifurcation analysis identifying the conditions for saddle-node and Hopf bifurcations. Our results demonstrate that the interaction between the perceived cost of ITN and the infection risk can induce critical transitions, including a regime shift from stable endemic states to sustained periodic oscillations. Furthermore, we identify a counterintuitive effect whereby complete ITN adoption by the primary host can increase the overall prevalence in the secondary host due to adaptive shifts in vector feeding behavior.
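Payoff-based adoption is often modeled with imitation dynamics, in which coverage grows when the perceived infection risk outweighs the perceived cost. The sketch below assumes that form, together with a simple hypothetical preference function in which bites shift to the secondary host as primary-host coverage rises; neither is claimed to be the paper's exact specification, and all names are illustrative.

```python
def adoption_rate(x, prevalence, kappa, cost, risk_weight):
    """Imitation-style payoff dynamics for ITN coverage x in [0, 1]:
    coverage grows when perceived risk (risk_weight * prevalence)
    exceeds the perceived ITN cost, and shrinks otherwise."""
    return kappa * x * (1.0 - x) * (risk_weight * prevalence - cost)

def bite_share_primary(x, base_pref):
    """Hypothetical vector preference (assumes 0 < base_pref < 1): the share
    of bites on the primary host falls as its ITN coverage x rises,
    diverting bites to the secondary host."""
    w = base_pref * (1.0 - x)
    return w / (w + (1.0 - base_pref))

def step_coverage(x, prevalence, dt=0.1, **kw):
    """One Euler step of the adoption dynamics, clipped to [0, 1]."""
    return min(max(x + dt * adoption_rate(x, prevalence, **kw), 0.0), 1.0)
```

At full primary-host coverage ($x = 1$), `bite_share_primary` returns 0, so every bite lands on the secondary host, which illustrates how complete adoption by one host can redirect transmission pressure to the other.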
The coalescing colony model provides a minimal framework for biological invasions with long-range dispersal. In its standard formulation, the dispersal range is assumed to be independent of the size of the invading population. Here, we relax this assumption and consider size-dependent dispersal: a main colony of linear size $r$ emits secondary colonies at distance $r^\mu$, with $0 \leq \mu \leq 1$. We derive the generalized dynamical equations for this extended model and map out the growth phase diagram at leading order. Depending on $\mu$, the main colony exhibits distinct regimes: linear expansion, power-law growth, exponential growth, and finite-time blow-up. We confront these theoretical predictions with a spatially explicit physical model. While the coalescing colony approach correctly captures the scaling of the perimeter, it fails to predict the scaling of the volume. We trace this discrepancy to an effective breakdown of circular symmetry in the morphology of the main colony. Finally, we quantify the temporal evolution of the population fraction residing outside the main colony. The coalescing colony model predicts that this fraction decays to~$0$ as a power law when~$\mu<1$, whereas a macroscopic fraction of the population remains in the secondary colonies at~$\mu=1$. Simulations of the physical model reveal a persistent satellite population, not captured by the theory, for~$\mu>\mu^*\approx 0.7$. Broadly, our findings highlight how coupling dispersal range to population size fundamentally alters invasion dynamics, with implications for biological invasions, metastatic growth, and urban expansion.
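The four regimes listed above are the generic behaviours of a growth law $\dot r = a\,r^{\alpha}$: $\alpha = 0$ gives linear expansion, $0 < \alpha < 1$ power-law growth, $\alpha = 1$ exponential growth, and $\alpha > 1$ blow-up in finite time. The toy integrator below only illustrates that classification; how the effective exponent depends on $\mu$ in the coalescing colony equations is not reproduced here, and the function name is hypothetical.

```python
def integrate(alpha, r0=1.0, a=1.0, t_end=2.0, dt=1e-4, r_max=1e8):
    """Forward-Euler integration of the toy growth law dr/dt = a * r**alpha.

    Returns (t, r) at t_end, or at the first time r exceeds r_max,
    which signals (numerical) finite-time blow-up."""
    t, r = 0.0, r0
    while t < t_end:
        r += dt * a * r ** alpha
        t += dt
        if r > r_max:
            break
    return t, r
```

For $r_0 = a = 1$, the exact solutions are $r = 1 + t$ ($\alpha = 0$), $r = (1 + t/2)^2$ ($\alpha = 1/2$), $r = e^t$ ($\alpha = 1$), and $r = 1/(1 - t)$ ($\alpha = 2$, blowing up at $t = 1$); forward Euler overshoots the blow-up time slightly.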
Principal component analysis is widely used to characterize structure in the dynamics of recurrent neural networks. For stationary noise-driven dynamics, the distribution of variance among the principal components is determined by the spectrum of the stationary covariance matrix. While the spectral properties of this matrix are well understood for linear networks with normal synaptic weight matrices, our understanding of the stationary covariance spectrum for random non-normal dynamics remains incomplete. In this note, we use a free-probability approach to formally derive a closed functional equation for the moment generating function of the limiting stationary covariance spectrum of discrete-time dynamics with random non-normal Gaussian weights. This characterization allows us to analyze the behavior of tail eigenvalues in the critical regime. In contrast, applying the same approach to the analogous continuous-time dynamics leads to an infinite hierarchy of Schwinger-Dyson equations, rather than a closed scalar equation. We conclude with some comments regarding the relevance of these results to comparisons of models of non-normal dynamics to neural data.
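A brute-force numerical baseline is useful for checking any closed-form spectral prediction. For $x_{t+1} = W x_t + \xi_t$ with white noise of covariance $\sigma^2 I$, the stationary covariance solves the discrete Lyapunov equation $\Sigma = W\Sigma W^\top + \sigma^2 I$, which can be vectorized using $\operatorname{vec}(W\Sigma W^\top) = (W \otimes W)\operatorname{vec}(\Sigma)$. The Gaussian scaling $g/\sqrt{n}$, the unit noise, and the function names are assumptions; this is not the free-probability method of the note.

```python
import numpy as np

def stationary_covariance(W, sigma2=1.0):
    """Solve Sigma = W Sigma W^T + sigma2 * I, the stationary covariance of
    x_{t+1} = W x_t + xi_t with Cov(xi_t) = sigma2 * I.
    Requires spectral radius of W < 1. Uses vec(W S W^T) = (W kron W) vec(S)."""
    n = W.shape[0]
    A = np.eye(n * n) - np.kron(W, W)
    vec = np.linalg.solve(A, sigma2 * np.eye(n).ravel())
    S = vec.reshape(n, n)
    return 0.5 * (S + S.T)  # symmetrize away round-off

def random_nonnormal_weights(n, g, seed=0):
    """i.i.d. Gaussian weights with variance g^2 / n (generically non-normal)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, g / np.sqrt(n), size=(n, n))
```

Because $\Sigma = I + W\Sigma W^\top$ when $\sigma^2 = 1$, every eigenvalue of $\Sigma$ is at least 1. The histogram of `np.linalg.eigvalsh(S)` is the finite-size counterpart of the limiting spectrum; the dense Kronecker solve costs $O(n^6)$, so it is only practical for small networks.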
What happens when LLM agents operate with no context beyond a single turn, minimal prompting, and simple tools? Inspired by swarm engineering, we give collectives of three agents the ability to send messages and to manipulate a shared, actively decaying text store, which introduces evolutionary pressure. The agents spontaneously cooperate, develop storage-management strategies, and generate complex, evolving cultural artifacts, with no top-down engineering. Using tools from dynamical systems analysis, we show that these behaviours exhibit structured long-range coherence beyond the entropy horizon of the decaying store, consistent with emergent culture in the Sperberian sense.
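The abstract does not specify the decay rule, so the following is a hypothetical sketch of one: every entry loses one unit of "health" per tick and disappears at zero unless an agent rewrites it. Under such a rule, persistence requires ongoing maintenance, which is one way decay can create selective pressure on what gets stored. The class and parameter names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class DecayingStore:
    """Hypothetical shared text store: each entry loses one unit of health
    per tick and is deleted at zero; rewriting an entry restores it."""
    max_health: int = 5
    entries: dict = field(default_factory=dict)  # key -> [text, health]

    def write(self, key, text):
        self.entries[key] = [text, self.max_health]

    def read(self, key):
        item = self.entries.get(key)
        return None if item is None else item[0]

    def tick(self):
        # Iterate over a copy of the keys so entries can be deleted in the loop
        for key in list(self.entries):
            self.entries[key][1] -= 1
            if self.entries[key][1] <= 0:
                del self.entries[key]
```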
Early detection of dementia through speech analysis offers a non-invasive screening alternative, but capturing both acoustic and linguistic biomarkers remains challenging. We propose a multimodal framework leveraging Whisper for dual-purpose extraction: acoustic representations from encoder outputs and transcripts via automatic speech recognition (ASR). For the acoustic pathway, temporal networks with attention pooling aggregate variable-length sequences into fixed-dimensional embeddings. For the linguistic pathway, we prompt a large language model (LLM) to extract interpretable features spanning lexical diversity, syntactic complexity, semantic coherence, and discourse patterns. A gated fusion network integrates both modalities. On ADReSS and ADReSSo, our method achieves F1-scores of 89.47% and 90.14%, demonstrating effective integration of acoustic and LLM-augmented linguistic features. Ablation shows that multimodal fusion consistently outperforms either modality alone.
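The two aggregation steps named above can be sketched with fixed arrays in place of learned weights: attention pooling collapses a variable-length sequence of encoder states into one vector, and a sigmoid gate mixes the acoustic and linguistic vectors feature by feature. Equal dimensionality of the two modalities (for instance after projecting the LLM-derived features) is an assumption, as are all names; this is a sketch, not the paper's network.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(H, w):
    """H: (T, d) frame-level encoder states (T varies per recording);
    w: (d,) scoring vector. Returns a fixed (d,) embedding and the
    attention weights over frames."""
    alpha = softmax(H @ w)
    return alpha @ H, alpha

def gated_fusion(a, l, Wg, bg):
    """Gate g = sigmoid(Wg [a; l] + bg) mixes acoustic (a) and linguistic (l)
    vectors of equal size d: fused = g * a + (1 - g) * l."""
    g = 1.0 / (1.0 + np.exp(-(Wg @ np.concatenate([a, l]) + bg)))
    return g * a + (1.0 - g) * l
```

In a trained model, `w`, `Wg`, and `bg` are learned, so the gate can favour whichever modality is more informative for each feature.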
Outbreak transmission reconstruction typically treats epidemiological timing and transmission labels as deterministic ground truth, yet the reliability of neither has been systematically evaluated. We trained a logistic regression temporal prior on eleven disease families, locked all parameters before accessing any target outbreak data, and applied it without refitting to a strict Andes virus (ANDV) parent-ranking benchmark of 29 tasks. The locked prior achieved a mean reciprocal rank (MRR) of 0.571 versus 0.274 and a Top-1 accuracy of 37.9% versus 13.8% against the best source-trained parametric baseline (permutation $p \leq 0.0002$; 7--8 reversals needed to lose MRR significance). A phylogenetic concordance audit of 75 NYC mpox inter-host pairs (independent evidence on label reliability rather than a validation of the prior) found that 54.67% (exact 95% CI: 42.75--66.21%) were genomically unresolved or unsupported. Retaining uncertain edges in ANDV and Guangdong Delta graphs shifted top-5 source-priority sets (Jaccard 0.429--0.667). Transmission-label uncertainty was measurable in the outbreak evidence modules examined, and retaining uncertain links changed which source cases were prioritized for intervention.
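The ranking metrics above follow standard definitions: each task contributes the reciprocal of the rank at which the true parent appears, Top-1 accuracy counts tasks where it is ranked first, and the Jaccard index compares two top-5 sets. A minimal sketch, with hypothetical function names and an assumed convention that a parent missing from the candidate list scores 0:

```python
def reciprocal_ranks(ranked_candidates, true_parents):
    """ranked_candidates: one list of candidate parent IDs per task, best first.
    true_parents: the labeled parent for each task (missing -> 0, by assumption)."""
    rr = []
    for cands, truth in zip(ranked_candidates, true_parents):
        rr.append(1.0 / (cands.index(truth) + 1) if truth in cands else 0.0)
    return rr

def mrr(ranked_candidates, true_parents):
    rr = reciprocal_ranks(ranked_candidates, true_parents)
    return sum(rr) / len(rr)

def top1(ranked_candidates, true_parents):
    hits = sum(1 for c, t in zip(ranked_candidates, true_parents) if c and c[0] == t)
    return hits / len(true_parents)

def jaccard(a, b):
    """|A intersect B| / |A union B| for two sets of prioritized source cases."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```

Two top-5 sets sharing three cases give $3/7 \approx 0.429$, and sets sharing four give $4/6 \approx 0.667$, which correspond to the endpoints of the Jaccard range reported above.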
Predicting biomolecular properties from limited labeled data is a central bottleneck in protein engineering and small-molecule design. As strong pretrained encoders now supply rich fixed-length representations, the difficulty has shifted from representation learning to building a data-efficient predictor for the few-shot regime. Tabular foundation models such as TabPFN3 and TabICL are unlikely candidates for this role: they are in-context learners pretrained on synthetic tables drawn from random causal graphs, a generative prior with no obvious correspondence to the processes that produce protein sequences or molecular graphs. That this tabular, causal inductive bias should transfer to biomolecular data at all is unintuitive, yet we find it does. Treating each method as a predictor-representation pair, we evaluate across two domains. Over a fixed ESMC representation, tabular in-context learning is consistently competitive for protein fitness regression on ProteinGym and a diverse esterase dataset. For small-molecule classification with ECFP/RDKit descriptors, no single pairing dominates across TDC ADMET, MoleculeNet, FS-Mol, and DrugOOD; representation choice becomes a primary determinant, as expected when the predictor's own prior is indifferent to molecular structure. We conclude that tabular foundation models are strong performers on biomolecular prediction tasks, but that their performance depends strongly on the sequence or molecular representation used.
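Treating methods as predictor-representation pairs amounts to a grid evaluation over both axes. In the sketch below, closed-form ridge regression is a stand-in for a tabular foundation model such as TabPFN or TabICL (whose APIs are not reproduced here), the representations are arbitrary feature matrices, and Spearman rank correlation, commonly reported for fitness prediction, is the score. All names are hypothetical.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (ties are not averaged, for brevity)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

def ridge_predictor(X_train, y_train, X_test, lam=1.0):
    """Closed-form ridge regression on centered features; a stand-in predictor."""
    mu, y_bar = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ (y_train - y_bar))
    return (X_test - mu) @ beta + y_bar

def evaluate_pairs(representations, predictors, y, train_idx, test_idx):
    """Score every (representation, predictor) pair on one held-out split."""
    scores = {}
    for rep_name, X in representations.items():
        for pred_name, fit_predict in predictors.items():
            y_hat = fit_predict(X[train_idx], y[train_idx], X[test_idx])
            scores[(rep_name, pred_name)] = spearman(y_hat, y[test_idx])
    return scores
```

Swapping the feature matrix while holding the predictor fixed isolates the effect of representation choice, which is the comparison the abstract draws for small molecules.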
Artificial intelligence is transforming our capability to solve biological challenges. In dimensionality-bottleneck regimes, which high-dimensional biological data exacerbate, neural networks compress more distinct concepts than they have dimensions into a shared lower-dimensional space, a phenomenon known as superposition. Although superposition is widely known to hinder interpretability, its impact on the geometry of latent spaces has been largely overlooked. Here, we used sparse autoencoders (SAEs) trained on over 100,000 multiplexed images of patient-derived Parkinson's disease and healthy neurons to resolve superposition. This approach sidesteps the mathematical non-uniqueness of feature attribution by shifting the focus to interpretable latent representations. We show theoretically and empirically that superposition contaminates representational metric spaces and that SAEs recover geometric fidelity. Treating these geometrically purified representations as single-cell state vectors, we adapted single-cell RNA sequencing (scRNA-seq) analysis methodologies directly to the image domain. Finally, we introduce GW-map, which uses Gromov-Wasserstein optimal transport to align these image representations with authentic scRNA-seq data \emph{de novo}. This coupling reconstructs hierarchical neuronal pathology pathways, such as the Calcium-AIS scaffold, without reference spatial transcriptomics, establishing a scalable foundation for spatial biology. Code is available at this https URL
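A sparse autoencoder in its most common form encodes an activation vector into an overcomplete, non-negative code with a ReLU and reconstructs it linearly, trading reconstruction error against an L1 sparsity penalty. The forward pass and loss can be sketched as follows; the dimensions, initialization, L1 coefficient, and class name are assumptions, training is omitted, and this is not necessarily the architecture used in the paper.

```python
import numpy as np

class SparseAutoencoder:
    """Minimal overcomplete SAE:
        codes = ReLU((x - b_dec) W_enc^T + b_enc),  recon = codes W_dec^T + b_dec,
        loss  = mean squared error + l1 * mean over samples of sum |codes|."""

    def __init__(self, d_in, d_hidden, l1=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_hidden, d_in))
        self.b_enc = np.zeros(d_hidden)
        self.W_dec = self.W_enc.T.copy()   # tied initialization, untied afterwards
        self.b_dec = np.zeros(d_in)
        self.l1 = l1

    def encode(self, X):
        return np.maximum((X - self.b_dec) @ self.W_enc.T + self.b_enc, 0.0)

    def decode(self, Z):
        return Z @ self.W_dec.T + self.b_dec

    def loss(self, X):
        Z = self.encode(X)
        mse = np.mean((self.decode(Z) - X) ** 2)
        return mse + self.l1 * np.mean(np.abs(Z).sum(axis=1)), Z
```

The non-negative sparse codes `Z`, rather than the raw activations, are what would then serve as per-cell state vectors.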
Evolutionary game theory provides a framework by which to study the emergence of cooperation in a population of self-interested actors. In such a framework, players' decisions on whether or not to cooperate evolve according to decision rules called population dynamics. However, games are often studied under the assumption that all individuals play under the same conditions, and many common choices of update rule are not well suited to a heterogeneous population. In this paper, we categorise and compare four different population dynamics in such a population as ``extrinsic'', where players learn by looking outward at the payoffs of other players, and ``intrinsic'', where players look inwardly at their own attributes or potential payoffs. We show that extrinsic population dynamics admit a ceiling on the rate of cooperation which can be exceeded by intrinsic population dynamics, and demonstrate this using the public goods game with heterogeneous contributions.
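In a public goods game with heterogeneous contributions, cooperator $i$ pays $c_i$, and the pot is multiplied by $r$ and shared equally among the $N$ players. The sketch below computes those payoffs and contrasts one representative rule of each kind: Fermi imitation (extrinsic, comparing with another player's payoff) and an own-payoff switching gain (intrinsic). These are illustrative choices, not necessarily the four dynamics compared in the paper, and the function names and noise parameter `K` are hypothetical.

```python
import numpy as np

def pgg_payoffs(strategies, contributions, r):
    """strategies: 0/1 per player (1 = cooperate); contributions: cost c_i
    paid by player i if it cooperates. The pot is multiplied by r and
    shared equally among all players."""
    s = np.asarray(strategies, float)
    c = np.asarray(contributions, float)
    pot = r * np.sum(s * c)
    return pot / len(s) - s * c

def fermi_prob(payoff_self, payoff_other, K=0.1):
    """Extrinsic rule: probability of copying another player's strategy."""
    return 1.0 / (1.0 + np.exp(-(payoff_other - payoff_self) / K))

def intrinsic_gain(strategies, contributions, r, i):
    """Intrinsic rule: change in player i's own payoff if it switched strategy."""
    flipped = np.array(strategies, copy=True)
    flipped[i] = 1 - flipped[i]
    return (pgg_payoffs(flipped, contributions, r)[i]
            - pgg_payoffs(strategies, contributions, r)[i])
```

Switching from cooperation to defection raises player $i$'s own payoff by $c_i(1 - r/N)$, an amount that depends on that player's own contribution; for example, with $c_i = 1$, $r = 1.5$, and $N = 3$ the gain is $0.5$.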
A scientific-study protocol (as defined here) is designed to deliver results from which inductive inference is allowed. In the nineteenth century, triplication was introduced into the plant sciences, and Fisher's p<0.05 rule (1925) was later incorporated into triple-result protocols designed to counter the random and systematic errors that contribute to real-world variability. Aims were to: (1) classify replication protocols; (2) assess their prevalence in plant-science studies published during a single twenty-first-century year (for a defined variable construct); and (3) explore the rationale for triplication. Methods: a plant-science protocol-prevalence report was produced; the proportions of experimental and associational studies were analyzed; and real-world-data proxies were used to show how confidence-interval widths change with increasing replicate number. Results: 25% of the plant-science studies analyzed used triplication, including 11% that used triple-result protocols (48% and 17%, respectively, when greater replicate numbers are included). Theoretical considerations indicated that even if systematic errors predominate, (previously known) square-root rules sometimes apply, adding to the importance of triplication (as exemplified by the real-world-data proxies). Conclusions: Triplication was extensively applied in the studies analyzed. There are strong methodological reasons why triplication, rather than duplication or quadruplication, is the appropriate standard; triple-result protocols: (a) effectively reduce false positives to acceptable levels; (b) give qualitatively different information (shape) from duplication; and (c) have a large efficiency advantage (in confidence-interval width) over quadruplication. Plant-science methodological standards remained high in the twenty-first century (at least in 2017), despite immense publication pressures.
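Two pieces of arithmetic make the efficiency argument concrete. For $n$ replicates with standard deviation $s$, the 95% confidence-interval half-width for a mean is $t_{0.975,\,n-1}\, s/\sqrt{n}$; and, if a triple-result protocol is read as requiring three independent results each with $p < 0.05$ (an assumption about the protocol), a true null passes all three with probability $0.05^3$. The t critical values below are hardcoded from standard tables, the function names are hypothetical, and the calculation treats errors as purely random, whereas the paper also considers systematic errors.

```python
import math

# Two-sided 95% Student-t critical values t_{0.975, n-1}, from standard tables.
T975 = {2: 12.706, 3: 4.303, 4: 3.182, 5: 2.776}

def ci_half_width(n, s=1.0):
    """Half-width of a 95% t-interval for the mean of n replicates with SD s."""
    return T975[n] * s / math.sqrt(n)

def joint_false_positive(alpha=0.05, k=3):
    """Probability that k independent tests of a true null all give p < alpha."""
    return alpha ** k
```

Moving from two to three replicates shrinks the half-width about 3.6-fold (from about $8.98s$ to $2.48s$), whereas moving from three to four shrinks it only about 1.6-fold (to about $1.59s$); and $0.05^3 = 1.25 \times 10^{-4}$.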
There is little debate about the importance of the ancestral recombination graph (ARG) in population genetics. Although the ARG is an important theoretical tool, the main obstacle to its widespread use is the computational cost of matching the ever-increasing scale of the data being analyzed. Many of these difficulties have been overcome in the past two decades, which have consequently seen the development of increasingly sophisticated ARG simulation and inference software. Nonetheless, challenges remain, especially in the area of ancestry inference. This paper is a comprehensive review of ARG samplers that have emerged in the past three decades to meet the need for scalable and flexible ancestry simulation and inference solutions. It specifically focuses on their performance, usability, and the biological realism of the underlying algorithm, and aims primarily to provide a technical overview of the field for researchers seeking to write their own coalescent-with-recombination sampler. As a complement to this article, we have compiled links to software, source code and documentation and made them available at this https URL.
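A coalescent sampler without recombination is the natural starting point for the samplers reviewed here. Under the Kingman coalescent, while $k$ lineages remain, the waiting time to the next merger is exponential with rate $\binom{k}{2}$ and a uniformly chosen pair merges, so the expected time to the most recent common ancestor (TMRCA) is $2(1 - 1/n)$ in coalescent time units. The sketch below implements only this, with a hypothetical function name; a coalescent-with-recombination sampler (for example, Hudson's algorithm) additionally tracks the genomic segments carried by each lineage.

```python
import numpy as np

def kingman_tree(n, rng):
    """Sample the merger events of n lineages under the Kingman coalescent.

    Returns a list of (time, child_a, child_b, parent_id); leaves are 0..n-1
    and internal nodes are numbered from n upwards."""
    lineages = list(range(n))
    t, next_id, events = 0.0, n, []
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.exponential(1.0 / (k * (k - 1) / 2.0))  # scale = 1 / rate
        i, j = sorted(rng.choice(k, size=2, replace=False))
        a, b = lineages[i], lineages[j]
        lineages.pop(j)          # remove the larger index first
        lineages.pop(i)
        lineages.append(next_id)
        events.append((t, a, b, next_id))
        next_id += 1
    return events
```

The time of the last event is the TMRCA, so averaging `kingman_tree(n, rng)[-1][0]` over many replicates should approach $2(1 - 1/n)$.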
We study positive Fredholm integral operators that arise as next-generation operators in structured population models. The main problem is to represent the dominant eigenvalue and the associated right and left eigenfunctions without using Fredholm determinants or finite-dimensional discretization. We introduce a reference-point construction: a rank-one correction on the space of kernels, determined by a fixed pair \((x_0,y_0)\), which reorganizes iterated kernels into a renewal-type series. Under an explicit dominant spectral separation assumption and a scalar non-resonance condition for the chosen reference pair, the resulting \(\Gamma_n\)-series converges at the spectral radius and gives the leading eigensystem. The coefficients also have a closed combinatorial expression in terms of ordinary partial Bell polynomials. For discrete-time integral projection models and for multi-state McKendrick equations, the same construction yields Euler--Lotka-type characteristic equations and formulas for demographic quantities such as stable distributions, reproductive values, type reproduction numbers, generation intervals, and expected generation numbers. The resulting genealogical expansion resolves the leading eigensystem into successive reproductive and transition contributions encoded by the iterated kernels.
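The ordinary partial Bell polynomial \(\hat B_{n,k}(x_1, \dots, x_{n-k+1})\) is the coefficient of \(t^n\) in \(\left(\sum_{j \ge 1} x_j t^j\right)^k\). The generic evaluator below uses repeated truncated polynomial multiplication; it does not reproduce the paper's closed-form \(\Gamma_n\) coefficients, which combine these polynomials in a way given in the paper, and the function name is hypothetical.

```python
import numpy as np

def ordinary_partial_bell(n, k, x):
    """hat{B}_{n,k}(x_1, ..., x_{n-k+1}): coefficient of t^n in
    (x_1 t + x_2 t^2 + ...)^k, where x[0] holds x_1."""
    base = np.zeros(n + 1)
    for j in range(1, n + 1):
        if j - 1 < len(x):
            base[j] = x[j - 1]
    poly = np.zeros(n + 1)
    poly[0] = 1.0
    for _ in range(k):
        # Multiply by the base series and drop powers above t^n
        poly = np.convolve(poly, base)[: n + 1]
    return float(poly[n])
```

With all \(x_j = 1\), \(\hat B_{n,k}\) counts compositions of \(n\) into \(k\) positive parts, \(\binom{n-1}{k-1}\), which gives a quick sanity check.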
Objective: Accurate classification of physiological signals in real-world deployments is challenged by sensor noise, motion artifacts, and distribution shifts between training and deployment data. Inference-time augmentation (ITA), which applies augmentations during inference rather than retraining, offers a simple, model-agnostic mechanism to improve robustness. However, applications of ITA to physiological signals have remained narrow in scope, relying on a limited set of augmentation methods with fixed, unoptimized parameters. This work proposes a unified ITA framework to address that gap. Approach: The framework incorporates 13 augmentation methods spanning time-domain, amplitude-domain, frequency-domain, and artifact-injection transformations, with hyperparameters optimized via Bayesian optimization. We evaluate on atrial fibrillation (AF) detection from 30-second PPG signals using GPT-PPG and ResNet across five datasets comprising more than 400 patients and ${\sim}$9,800 hours of recording. Main results: Standard ITA consistently improved AUROC (up to 8.5% for GPT-PPG and 0.7% for ResNet) and AUPRC (up to 10.6% for GPT-PPG and 0.8% for ResNet). Selective ITA further reduced the average false-positive rate (FPR) by up to 4.4% (GPT-PPG) and 1.3% (ResNet) on non-AF datasets. Significance: These findings establish ITA as a practical, model-agnostic approach for improving PPG-based AF classification reliability in deployment settings where retraining is not feasible, with broader applicability to physiological signal analysis.
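The core of inference-time augmentation can be sketched in a few lines: create several randomly transformed views of the same signal and average the model's predicted probability. Only three simple transforms are shown (of the 13 in the framework), the parameter values are placeholders rather than Bayesian-optimized ones, the function names are hypothetical, and `model` stands for any callable returning an AF probability.

```python
import numpy as np

def augment(x, rng, noise_sd=0.01, scale_range=(0.9, 1.1), max_shift=25):
    """One random inference-time view of a 1D signal: amplitude scaling,
    additive Gaussian noise, and a circular time shift (placeholder values)."""
    y = x * rng.uniform(*scale_range)
    y = y + rng.normal(0.0, noise_sd, size=x.shape)
    return np.roll(y, rng.integers(-max_shift, max_shift + 1))

def ita_predict(model, x, n_views=16, seed=0, **aug_kw):
    """Average the model's AF probability over the original signal and
    n_views augmented copies."""
    rng = np.random.default_rng(seed)
    views = [x] + [augment(x, rng, **aug_kw) for _ in range(n_views)]
    return float(np.mean([model(v) for v in views]))
```

Because the model is treated as a black box, this wrapper requires no retraining, which is what makes the approach model-agnostic. For a 30-second segment, `x` would hold 30 seconds of samples at whatever rate the recordings use.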
Scientific reasoning models for biology combine language models with foundation models trained on multimodal biological data, including DNA, RNA, and proteins. These models are built through post-training, yet how each stage shapes reasoning and generalization remains poorly understood. We study when post-training improves performance and when it induces over-specialization. Across genomics, transcriptomics, and proteins, we train and evaluate more than 100 biological reasoning models under controlled variation in backbone, continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), measuring both in-domain (ID) and out-of-domain (OOD) performance. We find that each post-training stage reshapes generalization in a distinct way rather than contributing uniform gains. CPT improves downstream performance by aligning models with biological language. SFT consistently increases ID performance but causes OOD performance to peak early and decline as models fit the training distribution. RL, when applied to strong SFT checkpoints with aligned rewards, improves OOD performance and partially recovers generalization. These results show that biological reasoning does not improve monotonically with additional supervision or compute. Instead, performance depends on how training stages are composed. Under fixed post-training budgets, the strongest ID-OOD trade-off comes from brief SFT, larger RL allocations, and asymmetric adaptation capacity across stages.