New articles on Quantitative Biology


[1] 2603.29116

Disentangling the interactive effects of anthropogenic disturbances on biodiversity

Anthropogenic activity threatens biodiversity through climate change, habitat fragmentation, and the increasing frequency and scale of disturbance. Various theoretical studies have sought to shed light on how these factors could promote or hinder the coexistence of species. However, our understanding of the relative importance of, and interactions between, these factors remains limited. In this study, we employ a theoretical approach integrating three commonly cited coexistence mechanisms -- the competition-colonisation trade-off, the intermediate disturbance hypothesis, and spatial heterogeneity -- into a unified model. We implement a novel method to integrate habitat autocorrelation into a system of differential equations, creating a simple and flexible model that can be used to investigate the coexistence of multiple species arranged in a competitive hierarchy under different disturbance and habitat structure scenarios. Using this model, we find that considering interactions between different mechanisms is crucial for explaining the coexistence of species. Biodiversity patterns that depart from the single-peaked curve predicted by the intermediate disturbance hypothesis (e.g., bimodal patterns) emerge along disturbance gradients as habitat fragmentation increases. Furthermore, habitat loss outweighs habitat autocorrelation effects in highly disturbed scenarios, yet autocorrelation can shape species coexistence under low disturbance. These findings underscore the need to integrate spatial and temporal mechanisms in biodiversity management.
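
As a concrete reference point for the competition-colonisation trade-off the model builds on, the classic hierarchical patch-occupancy equations (a Tilman-style textbook baseline, not necessarily the exact system used in this paper) read

$$\frac{dp_i}{dt} = c_i p_i \Big(1 - \sum_{j=1}^{i} p_j\Big) - m_i p_i - \sum_{j=1}^{i-1} c_j p_j p_i,$$

where $p_i$ is the fraction of patches occupied by species $i$ (indexed from best to worst competitor), $c_i$ its colonisation rate, and $m_i$ its disturbance-driven mortality; the paper's contribution is to embed habitat autocorrelation into such a system.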


[2] 2603.29176

Predicting Neuromodulation Outcome for Parkinson's Disease with Generative Virtual Brain Model

Parkinson's disease (PD) affects over ten million people worldwide. Although temporal interference (TI) and deep brain stimulation (DBS) are promising therapies, inter-individual variability limits empirical treatment selection, adding non-negligible surgical risk and cost. Previous explorations either resort to limited statistical biomarkers that are insufficient to characterize variability, or employ AI-driven methods that are prone to overfitting and opacity. We bridge this gap with a pretraining-finetuning framework to predict outcomes directly from resting-state fMRI. Critically, a generative virtual brain foundation model, pretrained on a collective dataset (2707 subjects, 5621 sessions) to capture universal disorder patterns, was finetuned on PD cohorts receiving TI (n=51) or DBS (n=55) to yield individualized virtual brains with high fidelity to empirical functional connectivity (r=0.935). By constructing counterfactual estimations between pathological and healthy neural states within these personalized models, we predicted clinical responses (TI: AUPR=0.853; DBS: AUPR=0.915), substantially outperforming baselines. External and prospective validations (n=14, n=11) highlight the feasibility of clinical translation. Moreover, our framework provides state-dependent regional patterns linked to response, offering hypothesis-generating mechanistic insights.


[3] 2603.29398

Pathogen diversity emerging from coevolutionary dynamics in interconnected systems

The spread of infectious disease and the evolution of antigenically distinct strains are often modeled separately, despite strong feedbacks mediated by host immune memory and heterogeneous contacts. To tackle this challenging problem, we introduce a coevolutionary framework in which transmission occurs on a metapopulation network while mutational exploration of strain space follows a mutation network. In this multiscale model, cross-immunity is encoded by similarity in the latent diffusion geometry of the strain network, so that nearby strains confer partial immune protection. We first identify an effective critical region that controls the transition between extinction, recurrent outbreak episodes, and long-lived endemic persistence, thus characterizing the resulting strain-turnover dynamics. We then derive a replicator-mutator-like equation for strain composition and an explicit dynamical evolutionary landscape induced by the coupling of mutation and transmission. Finally, allowing host heterogeneity to modulate the local mutation structure, we show that spreading across demes can effectively connect otherwise disconnected components of strain space, increasing long-term endemic diversity while producing a non-monotonic change in overall prevalence. Together, our results isolate minimal mechanisms by which immune-mediated competition and network structure can shape antigenic diversification.
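
For orientation, the canonical replicator-mutator equation (the general form that the derived strain-composition equation resembles; the paper's coupling terms are specific to its transmission model) is

$$\dot{x}_i = \sum_j x_j f_j(x)\, Q_{ji} - \bar{f}(x)\, x_i, \qquad \bar{f}(x) = \sum_j x_j f_j(x),$$

with $x_i$ the frequency of strain $i$, $f_j$ its (transmission) fitness, and $Q_{ji}$ the probability that replication of strain $j$ yields strain $i$ on the mutation network.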


[4] 2603.29546

Sampling from the Solution Space and Metabolic Environments of Genome-Scale Metabolic Models

Flux sampling is an analysis that randomly draws a sufficiently large set of points from the solution space of a metabolic model according to a chosen distribution. Unlike most constraint-based analyses, flux sampling does not require an objective function to optimize, allowing for the exploration of the whole spectrum of phenotypes a species can exhibit. However, sampling can also be restricted to a subspace where a chosen objective reaches at least a specified fraction of its optimum. This targeted approach adds value when investigating phenotypes that are optimal for a specific function. Contrary to Flux Balance Analysis, which returns a single solution, sampling leverages statistical power to uncover phenotypes that would otherwise be masked. This can be especially useful when changing the conditions (medium) in which a species lives. Here, we highlight some state-of-the-art methods for applying flux sampling to Genome-Scale Metabolic Models in different scenarios, and we showcase flux sampling applications.
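
A minimal sketch of both sampling modes using the cobrapy library (assuming its bundled "textbook" E. coli core model and the Biomass_Ecoli_core objective; the paper surveys more advanced samplers):

```python
from cobra.io import load_model
from cobra.sampling import sample

model = load_model("textbook")          # small E. coli core model

# Unrestricted sampling of the whole solution space (no objective needed)
flux_df = sample(model, 500)            # DataFrame: 500 points x reactions

# Targeted sampling: restrict to phenotypes reaching >= 90% of the optimum
opt = model.slim_optimize()
model.reactions.get_by_id("Biomass_Ecoli_core").lower_bound = 0.9 * opt
flux_opt = sample(model, 500)
```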


[5] 2603.29597

Structural and dynamical strategies to prevent runaway excitation in reservoir computing

Reservoirs, typically implemented as recurrent neural networks with fixed random connection weights, can be combined with a simple trained readout layer to perform a wide range of computational tasks. However, increasing the magnitude of reservoir connection weights to exploit nonlinear dynamics can cause the network to develop strong spontaneous activity that drives neurons into saturation, dramatically degrading performance. In this work, we investigate two distinct countermeasures against such runaway excitation. The first approach introduces a subtle non-homogeneous structure into the matrix of connection weights $w_{ij}$, without altering the overall probability distribution $p(w)$. We identify several favorable structuring principles, such as creating a small subset of neurons with weaker-than-average input connections. Even if the rest of the reservoir falls into runaway saturating behavior, this weakly coupled subset remains in a mildly nonlinear regime whose dynamics can still be exploited by the readout layer. The second approach implements a form of automatic gain control, in which a dedicated control unit dynamically regulates the reservoir's average global activation toward an optimal setpoint. Although the control unit modulates the excitability of the reservoir only via a global gain factor, this mechanism substantially enlarges the dynamical regime favorable for computation and renders performance largely independent of the underlying connection statistics.
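
A minimal sketch of the second countermeasure, automatic gain control (an illustrative integral controller in NumPy; the paper's control unit and setpoint choice may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
W = rng.normal(0.0, 1.5 / np.sqrt(N), (N, N))   # deliberately strong weights
w_in = rng.normal(0.0, 1.0, N)

x = np.zeros(N)          # reservoir state
g = 1.0                  # global gain: the only quantity the controller touches
target, eta = 0.3, 0.01  # activation setpoint and controller rate (assumed)

for t in range(1000):
    u = np.sin(0.1 * t)                          # input signal
    x = np.tanh(g * (W @ x) + w_in * u)
    # Integral controller: steer mean |activation| toward the setpoint,
    # counteracting the saturation caused by runaway excitation.
    g += eta * (target - np.abs(x).mean())

print(f"gain={g:.3f}, mean |x|={np.abs(x).mean():.3f}")
```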


[6] 2603.29617

Convergent Representations of Linguistic Constructions in Human and Artificial Neural Systems

Understanding how the brain processes linguistic constructions is a central challenge in cognitive neuroscience and linguistics. Recent computational studies show that artificial neural language models spontaneously develop differentiated representations of Argument Structure Constructions (ASCs), generating predictions about when and how construction-level information emerges during processing. The present study tests these predictions in human neural activity using electroencephalography (EEG). Ten native English speakers listened to 200 synthetically generated sentences across four construction types (transitive, ditransitive, caused-motion, resultative) while neural responses were recorded. Analyses using time-frequency methods, feature extraction, and machine learning classification revealed construction-specific neural signatures emerging primarily at sentence-final positions, where argument structure becomes fully disambiguated, and most prominently in the alpha band. Pairwise classification showed reliable differentiation, especially between ditransitive and resultative constructions, while other pairs overlapped. Crucially, the temporal emergence and similarity structure of these effects mirror patterns in recurrent and transformer-based language models, where constructional representations arise during integrative processing stages. These findings support the view that linguistic constructions are neurally encoded as distinct form-meaning mappings, in line with Construction Grammar, and suggest convergence between biological and artificial systems on similar representational solutions. More broadly, this convergence is consistent with the idea that learning systems discover stable regions within an underlying representational landscape - recently termed a Platonic representational space - that constrains the emergence of efficient linguistic abstractions.


[7] 2603.29684

FcsIT: An Open-Source, Cross-Platform Tool for Correlation and Analysis of Fluorescence Correlation Spectroscopy Data

FcsIT is a platform-independent, open-source tool for correlating and fitting fluorescence correlation spectroscopy (FCS) data. The software is written in Python and uses the Dear PyGui engine for its interface. It supports reading and correlating TTTR data, as well as TCSPC filtering of photon time-trace data. The circular-block bootstrap method, applied to the calculation of correlation data and its variance, yields data quality comparable to that obtained with commercially available software. An intuitive fitting interface enables efficient analysis of large datasets and includes nine predefined mathematical models for fitting correlation curves. Moreover, it allows users to add their own models in a user-friendly manner. Validation of the FcsIT tool against simulated FCS data and real FCS experiments confirms its usability and potential appeal to a wide variety of FCS users.


[8] 2603.29727

Latent-Y: A Lab-Validated Autonomous Agent for De Novo Drug Design

Drug discovery relies on iterative expert workflows that are slow to parallelize and difficult to scale. Here we introduce Latent-Y, an AI agent that autonomously executes complete antibody design campaigns from text prompts, covering literature review, target analysis, epitope identification, candidate design, computational validation, and selection of lab-ready sequences. Latent-Y is integrated into the Latent Labs Platform, where it operates in the same environment as drug-discovery experts with access to bioinformatics tools, biological databases, and scientific literature. The agent can run fully autonomously end-to-end, or collaboratively, where researchers review progress, provide feedback, and direct subsequent steps. Candidate antibodies are generated using Latent-X2, our frontier generative model for drug-like antibody design. We demonstrate the agent's capability across three distinct campaign types: epitope discovery guided by therapeutic specifications, cross-species binder design, and autonomous design from a scientific publication targeting human transferrin receptor for blood-brain barrier crossing. Across nine targets, Latent-Y produced lab-confirmed nanobody binders against six, achieving a 67% target-level success rate with binding affinities reaching the single-digit nanomolar range, without human filtering or intervention. In user studies, experts working with Latent-Y completed design campaigns 56 times faster than independent expert time estimates, compressing weeks of work into hours. Because Latent-X2 is a general-purpose atomic-level model for biologics design, the same agent architecture naturally extends to macrocyclic peptide and mini-binder design campaigns, broadening autonomous discovery across therapeutic modalities. Latent-Y is available to selected partners at this https URL.


[9] 2603.29833

Copy-Spread-Annihilate Dynamics in Degree-Assortative Networks

In many systems, communication proceeds by broadcasting rather than single source-target routing, but network structures that maximize signal lifetime are not well understood. Degree correlations are known to influence robustness and spreading, yet their effect on signal persistence has remained unclear. Here we introduce Copy-Spread-Annihilate dynamics, a minimal synchronous broadcasting model with annihilation. We show that signal lifetimes vary non-monotonically with assortativity and are maximized near neutral assortativity, where hub-driven amplification is strong but annihilation via short cycles is still limited. Applying this framework to the mouse connectome suggests assortativity as a structural control parameter for broadcast signal persistence in brain-like and other complex networks.
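
One plausible minimal implementation of such dynamics (assuming synchronous copying to all neighbors with pairwise annihilation of colliding copies; the paper's exact update rule may differ):

```python
import networkx as nx

def csa_lifetime(G, seed, t_max=10_000):
    """Steps until no signal copies remain, starting from one seed copy."""
    active = {seed}
    for t in range(1, t_max + 1):
        counts = {}
        for u in active:                 # every copy spreads to all neighbors
            for v in G[u]:
                counts[v] = counts.get(v, 0) + 1
        # Copies arriving at the same node annihilate in pairs (XOR rule)
        active = {v for v, c in counts.items() if c % 2 == 1}
        if not active:
            return t
    return t_max                          # still alive at the cutoff

G = nx.barabasi_albert_graph(500, 3, seed=1)
print(csa_lifetime(G, seed=0))
```

Rewiring G toward positive or negative degree assortativity while holding the degree sequence fixed would then expose the non-monotonic lifetime dependence the paper reports.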


[10] 2603.29843

Counterfactual Analysis of Brain Network Dynamics

Causal inference in brain networks has traditionally relied on regression-based models such as Granger causality, structural equation modeling, and dynamic causal modeling. While effective for identifying directed associations, these methods remain descriptive and acyclic, leaving open the fundamental question of intervention: what would the causal organization become if a pathway were disrupted or externally modulated? We introduce a unified framework for counterfactual causal analysis that models both pathological disruptions and therapeutic interventions as an energy-perturbation problem on network flows. Grounded in Hodge theory, directed communication is decomposed into dissipative and persistent (harmonic) components, enabling systematic analysis of how causal organization reconfigures under hypothetical perturbations. This formulation provides a principled foundation for quantifying network resilience, compensation, and control in complex brain systems.
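
For readers unfamiliar with the Hodge-theoretic decomposition invoked here: with $B_1$ the node-edge incidence matrix and $B_2$ the edge-triangle incidence matrix, any edge flow $f$ splits uniquely as

$$f = B_1^{\top}\phi + B_2\psi + f_H, \qquad B_1 f_H = 0, \quad B_2^{\top} f_H = 0,$$

i.e., a gradient part driven by node potentials $\phi$, a curl part $B_2\psi$, and a harmonic part $f_H$ annihilated by both operators. This is the standard discrete Hodge decomposition; in the paper's terminology, the non-harmonic parts are dissipative and $f_H$ is the persistent component reshaped by energy perturbations.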


[11] 2603.29903

Multimodal Higher-Order Brain Networks: A Topological Signal Processing Perspective

Brain connectomics is still largely dominated by pairwise-based models, such as graphs, which cannot represent circulatory or higher-order functional interactions. In this paper, we propose a multimodal framework based on Topological Signal Processing (TSP) that models the brain as a higher-order topological domain and treats functional interactions as discrete vector fields. We integrate diffusion MRI and resting-state fMRI to learn subject-specific brain cell complexes, where statistically validated structural connectivity defines a sparse scaffold and phase-coupling functional edge signals drive the inference of higher-order interactions (HOIs). Using Hodge-theoretic tools, spectral filtering, and sparse signal representations, our framework disentangles brain connectivity into divergence (source-sink organization), gradient (potential-driven coordination), and curl (circulatory HOIs), enabling the characterization of temporal dynamics through the lens of discrete vector calculus. Across 100 healthy young adults from Human Connectome Project, node-based HOIs are highly individualized, yet robust mesoscale structure emerges under functional-system aggregation. We identify a distributed default mode network-centered gradient backbone and limbic-centered rotational flows; divergence polarization and curl profiles defining circulation regimes with insightful occupancy and dwell-time statistics. These topological signatures yield significant brain-behavior associations, revealing a relevant higher-order organization intrinsic to edge-based models. By making divergence, circulation, and recurrent mesoscale coordination directly measurable, this work enables a principled and interpretable topological phenotyping of brain function.


[12] 2603.29986

ParetoEnsembles.jl: A Julia Package for Multiobjective Parameter Estimation Using Pareto Optimal Ensemble Techniques

Mathematical models of natural and man-made systems often have many adjustable parameters that must be estimated from multiple, potentially conflicting datasets. Rather than reporting a single best-fit parameter vector, it is often more informative to generate an ensemble of parameter sets that collectively map out the trade-offs among competing objectives. This paper presents ParetoEnsembles.jl, an open-source Julia package that generates such ensembles using Pareto Optimal Ensemble Techniques (POETs), a simulated-annealing-based algorithm that requires no gradient information. The implementation corrects the original dominance relation from weak to strict Pareto dominance, reduces the per-iteration ranking cost from $O(n^2 m)$ to $O(nm)$ through an incremental update scheme, and adds multi-chain parallel execution for improved front coverage. We demonstrate the package on a cell-free gene expression model fitted to experimental data and a blood coagulation cascade model with ten estimated rate constants and three objectives. A controlled synthetic-data study reveals parameter identifiability structure, with individual rate constants off by several-fold yet model predictions accurate to 7%. A five-replicate coverage analysis confirms that timing features are reliably covered while peak amplitude is systematically overconfident. Validation against published experimental thrombin generation data demonstrates that the ensemble predicts held-out conditions to within 10% despite inherent model approximation error. By making ensemble generation lightweight and accessible, ParetoEnsembles.jl aims to lower the barrier to routine uncertainty characterization in mechanistic modeling.
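
The dominance correction the package makes is easy to state in code; a minimal sketch (shown in Python for brevity, the package itself being Julia; the $O(nm)$ incremental ranking is not reproduced here):

```python
import numpy as np

def strictly_dominates(a, b):
    """a strictly Pareto-dominates b (minimization): no worse in every
    objective and strictly better in at least one."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def pareto_front(points):
    """Indices of non-dominated points; the naive O(n^2 m) scan."""
    return [i for i, p in enumerate(points)
            if not any(strictly_dominates(q, p)
                       for j, q in enumerate(points) if j != i)]

objs = [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (1.0, 2.0)]
print(pareto_front(objs))    # [0, 1, 3]: duplicates survive strict dominance
```

Under weak dominance, duplicate objective vectors would eliminate each other; switching to strict dominance keeps them on the front, which matters for ensemble coverage.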


[13] 2603.30004

From Patterns to Policy: A Scoping Review Based on Bibliometric Analysis (ScoRBA) of Intelligent and Secure Smart Hospital Ecosystems

This study examines the evolution of Intelligent and Secure Smart Hospital Ecosystems using a Scoping Review with Bibliometric Analysis (ScoRBA) to map research patterns, identify gaps, and derive policy implications. Analyzing 891 journal articles from Scopus (2006-2025) through co-occurrence analysis, network visualization, overlay analysis, and the Enhanced Strategic Diagram (ESD), the study applies the PAGER framework to link Patterns, Advances, Gaps, Research directions, and Evidence-based policy implications. Findings reveal three interrelated clusters: AI-driven intelligent healthcare systems, decentralized privacy-preserving digital health ecosystems, and scalable cloud-edge infrastructures, showing a convergence toward integrated ecosystem architectures where intelligence, trust, and infrastructure reinforce each other. Despite progress in AI, blockchain, and cloud computing, gaps remain in interoperability, real-world implementation, governance, and cross-layer integration. Emerging themes such as explainable AI, federated learning, and privacy mechanisms highlight areas needing further research. Policy-relevant recommendations focus on coordinated governance, scalable infrastructure, and secure data ecosystems, particularly for developing country contexts. The study bridges bibliometric evidence with actionable policies, supporting informed decision-making in smart hospital development.


[14] 2603.28930

Retrospective Economic Evaluation of Group Testing in the COVID-19 Pandemic

Surveillance of diseases in a pandemic is an important part of public health policy. Diagnostic testing at the individual level is often infeasible due to resource constraints. To circumvent these constraints, group testing can be applied. The economic cost evaluation from the payer's perspective typically focuses only on deterministic costs, which overlooks the substantial economic impact of productivity losses resulting from quarantine and workplace disruptions. The objective of this article is to develop a mathematical model for a retrospective economic evaluation of group testing that incorporates both deterministic costs and income-based economic loss. Group testing algorithms are revisited and simulated at optimized pool sizes to determine the required number of tests. Income data from the German Socio-Economic Panel are integrated into a mathematical model to capture the economic loss. Afterward, hybrid Monte Carlo experiments are conducted by evaluating the economic cost in the Coronavirus disease 2019 (COVID-19) pandemic in Germany. Monte Carlo experiments show that the optimal choice of group testing algorithms changes substantially when income-based economic losses are included. Evaluations considering only deterministic costs systematically underestimate the total economic cost. Algorithms with a longer quarantine duration are less attractive than those with a shorter quarantine duration once income-based economic loss is accounted for. The findings show that current evaluations underestimate the true economic cost. Group testing algorithms with shorter duration and fewer stages are preferred, even when they require a larger number of tests. These results underscore the importance of incorporating income-based economic loss into a mathematical model.
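
As a minimal illustration of the pool-size optimization step (classic two-stage Dorfman pooling with perfect tests; the paper evaluates several algorithms and layers income-based losses on top of this deterministic test count):

```python
import numpy as np

def dorfman_tests_per_person(p, k):
    """Expected tests per person: one pooled test per k people, plus k
    individual retests whenever the pool is positive."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k

p = 0.02                                    # assumed prevalence of 2%
ks = np.arange(2, 31)
costs = [dorfman_tests_per_person(p, k) for k in ks]
k_opt = int(ks[np.argmin(costs)])
print(k_opt, round(min(costs), 3))          # pool size ~8, ~0.27 tests/person
```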


[15] 2603.29529

Sampling at intermediate temperatures is optimal for training large language models in protein structure prediction

We investigate the parameter space of transformer models trained on protein sequence data using a statistical mechanics framework, sampling the loss landscape at varying temperatures by Langevin dynamics to characterize the low-loss manifold and understand the mechanisms underlying the superior performance of transformers in protein structure prediction. We find that, at variance with feedforward networks, the lack of a first-order-like transition in the loss of the transformer produces a range of intermediate temperatures with good learning properties. We show that the parameters of most layers are highly conserved at these temperatures if the dimension of the embedding is optimal, and we provide a practical way to find this dimension. Finally, we show that the attention matrix is more predictive of the contact maps of the protein at higher temperatures and for higher dimensions of the embedding than those optimal for learning.
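
A minimal sketch of temperature-controlled Langevin sampling of a loss landscape (generic PyTorch; the paper's transformer loss and schedule are not reproduced):

```python
import torch

def langevin_step(params, loss_fn, lr=1e-4, T=0.1):
    """Gradient step plus Gaussian noise of scale sqrt(2*lr*T), so the
    chain equilibrates to the Gibbs measure exp(-loss/T)."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(-lr * g + (2 * lr * T) ** 0.5 * torch.randn_like(p))
    return loss.item()

# Toy usage on a quadratic loss; higher T explores the landscape more widely.
theta = torch.randn(10, requires_grad=True)
for _ in range(1000):
    langevin_step([theta], lambda: (theta ** 2).sum())
```

Sweeping T then traces out the low-loss manifold; the paper's observation is that an intermediate range of T yields the best learning properties.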


[16] 2603.29793

Multimodal Machine Learning for Early Prediction of Metastasis in a Swedish Multi-Cancer Cohort

Multimodal Machine Learning offers a holistic view of a patient's status, integrating structured and unstructured data from electronic health records (EHR). We propose a framework to predict metastasis risk one month prior to diagnosis, using six months of clinical history from EHR data. Data from four cancer cohorts collected at Karolinska University Hospital (Stockholm, Sweden) were analyzed: breast (n = 743), colon (n = 387), lung (n = 870), and prostate (n = 1890). The dataset included demographics, comorbidities, laboratory results, medications, and clinical text. We compared traditional and deep learning classifiers across single modalities and multimodal combinations, using various fusion strategies and a Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) 2a design, with an 80-20 development-validation split to ensure a rigorous, repeatable evaluation. Performance was evaluated using AUROC, AUPRC, F1 score, sensitivity, and specificity. We then employed a multimodal adaptation of SHAP to analyze the classifiers' reasoning. Intermediate fusion achieved the highest F1 scores on breast (0.845), colon (0.786), and prostate cancer (0.845), demonstrating strong predictive performance. For lung cancer, the intermediate fusion achieved an F1 score of 0.819, while the text-only model achieved the highest, with an F1 score of 0.829. Deep learning classifiers consistently outperformed traditional models. Colon cancer, the smallest cohort, had the lowest performance, highlighting the importance of sufficient training data. SHAP analysis showed that the relative importance of modalities varied across cancer types. Fusion strategies offer distinct strengths and weaknesses. Intermediate fusion consistently delivered the best results, but strategy choices should align with data characteristics and organizational needs.


[17] 2603.29916

Growth-rate distributions at stationarity

We propose new analytical tools for describing growth-rate distributions generated by stationary time series. Our analysis shows that deviations from normality are not pathological behaviour, as some traditional views suggest, but can be accounted for by clean and general statistical considerations; strict normality is instead the effect of specific modelling choices. Systems characterized by stationary Gamma or heavy-tailed abundance distributions produce log-growth-rate distributions well described by a generalized logistic distribution, which can describe tent-shaped or nearly normal datasets and serves as a useful null model for these observables. These results prove that, for sufficiently large time lags, growth-rate distributions in practice cease to be time-dependent and exhibit finite variance. Based on this analysis, we identify some key stylized macroecological patterns and specific stochastic differential equations capable of reproducing them. A pragmatic workflow for heuristic selection between these models is then introduced. This approach is particularly useful for systems with limited data-tracking quality, where applying sophisticated inference methods is challenging.
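
A self-contained check of the simplest case (i.i.d. exponential abundances, the shape-1 Gamma, where the log-growth-rate is exactly standard logistic: $-\log$ of an exponential is Gumbel, and the difference of two i.i.d. Gumbels is logistic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=1.0, scale=1.0, size=50_000)   # stationary abundances
g = np.log(x[1::2] / x[0::2])       # non-overlapping pairs -> i.i.d. samples
print(stats.kstest(g, stats.logistic.cdf))          # no rejection expected
```

For other Gamma shapes or heavy tails, the fit target becomes the generalized-logistic family the paper proposes as a null model (e.g., scipy's genlogistic, though the paper's exact parameterization may differ).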


[18] 2603.29977

Quantifying Cross-Modal Interactions in Multimodal Glioma Survival Prediction via InterSHAP: Evidence for Additive Signal Integration

Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards models and applies it to quantify cross-modal interactions in glioma survival prediction. Using TCGA-GBM and TCGA-LGG data (n=575), we evaluate four fusion architectures combining whole-slide image (WSI) and RNA-seq features. Our central finding is an inverse relationship between predictive performance and measured interaction: architectures achieving superior discrimination (C-index 0.64 $\to$ 0.82) exhibit equivalent or lower cross-modal interaction (4.8% $\to$ 3.0%). Variance decomposition reveals stable additive contributions across all architectures (WSI $\approx$ 40%, RNA $\approx$ 55%, interaction $\approx$ 4%), indicating that performance gains arise from complementary signal aggregation rather than learned synergy. These findings provide a practical model-auditing tool for comparing fusion strategies, reframe the role of architectural complexity in multimodal fusion, and have implications for privacy-preserving federated deployment.


[19] 2412.11338

A spontaneously patterning reaction-diffusion network, containing an integrated activator-inhibitor and substrate-depletion mechanism, specifies trichoblast cell fate in Arabidopsis roots

Arabidopsis root hair patterning is controlled by a complex transcription factor network containing positive and negative feedback loops, epidermal cell-cell signalling, and positional signalling from underlying tissue. Recently, several long-accepted regulatory interactions within the network have been revised, and while there are extensive data regarding individual components, the complexity of the network has made it difficult to understand how these components combine to ensure correct and robust epidermal patterning. Here, mathematical modelling was used to integrate the wealth of experimental data into a single transcription factor network model. Current understanding of the epidermal patterning network was found to be insufficient to reproduce experimental data, and thus an additional negative feedback loop was hypothesized which enabled the model to reproduce both wildtype and mutant data. The negative feedback was supported by sequence analysis of candidate regulators. Modelling investigations uncovered interactions, mechanisms, and constraints essential for patterning, and revealed how a recently redefined reaction functions to produce mutant data while contributing to network robustness in wildtype. When analysed together, these results provide a holistic understanding of epidermal cell fate determination in Arabidopsis, shown here to be governed by a spontaneously patterning reaction-diffusion network containing combined activator-inhibitor and substrate-depletion mechanisms.


[20] 2501.07620

From seasons to decades: Solar radiation, cloud cover, and CO$_2$ shape young leaf phenology in a tropical forest over 26 years

1. Climate change is altering plant phenology globally with potential deleterious impacts on animal species and entire ecosystems, yet the long-term effects of climate change on tropical leaf production remain poorly understood. 2. We analyzed 26 years of young leaf phenology field data from Kibale National Park, Uganda, focusing on 12 tree species consumed by leaf-eating mammals. We examined seasonal and long-term patterns and how they are related to climatic variables using Bayesian hierarchical generalized additive mixed models (GAMMs). 3. The tree community and most species exhibited peaks in young leaf production during the two annual rain seasons, with seasonal changes primarily associated with diffuse light availability through solar radiation and cloud cover, as well as rainfall and minimum temperature. Long-term variations in leaf production were primarily linked to long-term changes in atmospheric CO$_2$, solar radiation, and cloud cover. 4. Our results support the role of CO$_2$ fertilization, though decreasing levels of solar radiation resulting from the ending of the recent solar cycle may be slowing this effect. 5. Synthesis: This study highlights the critical role of diffuse light, solar radiation, and the solar cycle in predicting tropical leaf production, emphasizing that interpretations of greening trends must consider solar radiation alongside atmospheric CO$_2$ levels. Furthermore, our findings emphasize the complex relationship between climate and young leaf phenology, highlighting the importance of integrating species-specific long-term data to better understand the effects of climate change on food availability for tropical folivores and tropical forest ecosystems in general.


[21] 2512.11164

Mixed updating in structured populations

Evolutionary graph theory (EGT) studies the effect of population structure on evolutionary dynamics. The vertices of the graph represent the $N$ individuals. The edges denote interactions for competitive replacement. Two standard update rules are death-Birth (dB) and Birth-death (Bd). Under dB, an individual is chosen uniformly at random to die, and its neighbors compete to fill the vacancy proportional to their fitness. Under Bd, an individual is chosen for reproduction proportional to fitness, and its offspring replaces a randomly chosen neighbor. Here we study mixed updating between those two scenarios. In each time step, with probability $\delta$ the update is dB and with remaining probability it is Bd. We study fixation probabilities and times as functions of $\delta$ under neutral evolution and constant selection. Despite the fact that fixation probabilities and times can be increasing, decreasing, or non-monotonic in $\delta$, we prove nearly all unweighted undirected graphs have short fixation times and provide an efficient algorithm to estimate their fixation probabilities. Finally, we prove exact formulas for fixation probabilities on cycles, stars, and more complex structures and classify their sensitivities to $\delta$.
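
A minimal sketch of one mixed update step under constant selection (NetworkX-style adjacency, assuming a connected unweighted graph; fixation statistics follow by iterating to absorption):

```python
import random

def mixed_update(G, types, fitness, delta):
    """With probability delta perform death-Birth, else Birth-death.
    `types[v]` is the type of vertex v; `fitness[t]` its constant fitness."""
    nodes = list(G)
    if random.random() < delta:                       # death-Birth (dB)
        v = random.choice(nodes)                      # uniform death
        nbrs = list(G[v])
        w = [fitness[types[u]] for u in nbrs]         # fitness-weighted birth
        types[v] = types[random.choices(nbrs, weights=w)[0]]
    else:                                             # Birth-death (Bd)
        w = [fitness[types[u]] for u in nodes]        # fitness-weighted birth
        u = random.choices(nodes, weights=w)[0]
        v = random.choice(list(G[u]))                 # offspring replaces
        types[v] = types[u]                           # a random neighbor
```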


[22] 2512.15534

Characterizing Open-Ended Evolution Through Undecidability Mechanisms in Random Boolean Networks

Discrete dynamical models underpin systems biology, but we still lack substrate-agnostic diagnostics for when such models can sustain genuinely open-ended evolution (OEE): the continual production of novel phenotypes rather than eventual settling. We introduce a simple, model-independent metric, $\Omega$, that quantifies OEE as the residence-time-weighted average of attractor cycle lengths across the sequence of attractors realized over time. $\Omega$ is zero for single-attractor dynamics and grows with the number and persistence of distinct cyclic phenotypes, separating enduring innovation from transient noise. Using Random Boolean Networks (RBNs) as a unifying testbed, we compare classical Boolean dynamics with biologically motivated non-classical mechanisms (probabilistic context switching, annealed rule mutation, paraconsistent logic, modal necessary/possible gating, and quantum-inspired superposition/paired-state coupling) under homogeneous and heterogeneous updating schemes. Our results support the view that undecidability-adjacent, state-dependent mechanisms -- implemented as probabilistic context switching, modal necessity/possibility gating, paraconsistent logic (controlled contradictions), or quantum-inspired superposition/paired-state coupling (correlated branching) -- are enabling conditions for sustained novelty. At the end of our manuscript we outline a practical extension of $\Omega$ to continuous/hybrid state spaces, positioning $\Omega$ as a portable benchmark for OEE in discrete biological modeling and a guide for engineering evolvable synthetic circuits.
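
A sketch of how $\Omega$ could be computed from a simulated trajectory, under one reading of the definition (the zero value for single-attractor dynamics is imposed explicitly here; the paper's formal definition should be consulted for edge cases):

```python
def omega(visits):
    """Residence-time-weighted average of attractor cycle lengths.
    `visits` is an ordered list of (attractor_id, cycle_length,
    residence_time) for the attractors realized over time."""
    if len({a for a, _, _ in visits}) <= 1:
        return 0.0                       # single-attractor dynamics score 0
    total = sum(t for _, _, t in visits)
    return sum(length * t for _, length, t in visits) / total

# Two persistent cycles (lengths 4 and 6) vs. settling into one fixed point
print(omega([("A", 4, 500), ("B", 6, 500)]))   # 5.0: sustained novelty
print(omega([("A", 1, 1000)]))                 # 0.0: no open-endedness
```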


[23] 2505.07638

Identifiability of SDEs for reaction networks

Biochemical reaction networks are widely applied across scientific disciplines to model complex dynamic systems. We investigate the diffusion approximation of reaction networks with mass-action kinetics, focusing on the identifiability of the stochastic differential equations associated with the reaction network. We derive conditions under which the law of the diffusion approximation is identifiable and provide theorems for verifying identifiability in practice. Notably, our results show that some reaction networks have non-identifiable reaction rates, even when the law of the corresponding stochastic process is completely known. Moreover, we show that reaction networks with distinct graphical structures can generate the same diffusion law under specific choices of reaction rates. Finally, we compare our framework with identifiability results in the deterministic ODE setting and the discrete continuous-time Markov chain models for reaction networks.
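
Concretely, for a network with stoichiometric vectors $\nu_k$ and mass-action intensities $\lambda_k(x)$, the diffusion approximation in question is the chemical Langevin equation

$$dX_t = \sum_k \nu_k\, \lambda_k(X_t)\, dt + \sum_k \nu_k \sqrt{\lambda_k(X_t)}\; dW_t^{(k)},$$

whose law is determined by the drift $b(x) = \sum_k \nu_k \lambda_k(x)$ and diffusion matrix $a(x) = \sum_k \lambda_k(x)\, \nu_k \nu_k^{\top}$; non-identifiability arises when distinct rate constants (or distinct networks) induce the same pair $(b, a)$.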


[24] 2508.20125

Improving Liver Disease Diagnosis with SNNDeep: A Custom Spiking Neural Network Using Diverse Learning Algorithms

Purpose: Spiking neural networks (SNNs) have recently gained attention as energy-efficient, biologically plausible alternatives to conventional deep learning models. Their application in high-stakes biomedical imaging remains almost entirely unexplored. Methods: This study introduces SNNDeep, the first tailored SNN specifically optimized for binary classification of liver health status from computed tomography (CT) features. To ensure clinical relevance and broad generalizability, the model was developed and evaluated using the Task03_Liver dataset from the Medical Segmentation Decathlon (MSD), a standardized benchmark widely used for assessing performance across diverse medical imaging tasks. We benchmark three fundamentally different learning algorithms, namely Surrogate Gradient Learning, the Tempotron rule, and Bio-Inspired Active Learning across three architectural variants: a fully customized low-level model built from scratch, and two implementations using leading SNN frameworks, i.e., snnTorch and SpikingJelly. Hyperparameter optimization was performed using Optuna. Results: Our results demonstrate that the custom-built SNNDeep consistently outperforms framework-based implementations, achieving a maximum validation accuracy of 98.35%, superior adaptability across learning rules, and significantly reduced training overhead. Conclusion: This study provides the first empirical evidence that low-level, highly tunable SNNs can surpass standard frameworks in medical imaging, especially in data-limited, temporally constrained diagnostic settings, thereby opening a new pathway for neuro-inspired AI in precision medicine.
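
A minimal sketch of the surrogate-gradient idea benchmarked here (generic PyTorch; the sigmoid surrogate and its slope constant are assumptions, not the paper's exact choice):

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; a smooth sigmoid derivative
    stands in for its zero-almost-everywhere gradient in the backward pass."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        s = torch.sigmoid(5.0 * v)             # slope 5.0: assumed value
        return grad_out * 5.0 * s * (1.0 - s)

spike = SurrogateSpike.apply
v = torch.randn(8, requires_grad=True)         # membrane potentials
spike(v).sum().backward()                      # gradients flow via surrogate
print(v.grad)
```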


[25] 2509.20702

Incorporating LLM Embeddings for Variation Across the Human Genome

Recent advances in large language model (LLM) embeddings have enabled powerful representations for biological data, but most applications to date focus on gene-level information. We present one of the first systematic frameworks to generate genetic variant-level embeddings across the entire human genome. Using curated annotations from FAVOR, ClinVar, and the GWAS Catalog, we construct functional text descriptions for 8.9 billion possible variants and generate embeddings at three scales: 1.5 million HapMap3/MEGA variants, 90 million imputed UK Biobank (UKB) variants, and 9 billion possible variants genome-wide. Embeddings were produced using general-purpose models, including both OpenAI's text-embedding-3-large and the open-source Qwen3-Embedding-0.6B. Baseline quality-control experiments demonstrate high predictive accuracy for variant-level properties, validating the embeddings as structured representations of genomic variation. We further apply them to embedding-augmented genetic risk prediction, demonstrating the performance of LLM embeddings in polygenic risk score (PRS)-style prediction on the UK Biobank cohort. These resources, publicly available on Hugging Face, provide a foundation for advancing large-scale genomic discovery and precision medicine.
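
A minimal sketch of the embedding step with the OpenAI SDK (the variant description below is illustrative, not the paper's exact annotation template):

```python
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

# Hypothetical functional text description for one variant (rs429358)
desc = ("chr19:44908684 T>C (GRCh38), missense variant in APOE; "
        "ClinVar: risk factor; GWAS Catalog: Alzheimer disease association")

resp = client.embeddings.create(model="text-embedding-3-large", input=desc)
vec = resp.data[0].embedding    # 3072-dimensional vector for downstream use
print(len(vec))
```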


[26] 2510.00027

Learning Inter-Atomic Potentials without Explicit Equivariance

Accurate and scalable machine-learned inter-atomic potentials (MLIPs) are essential for molecular simulations ranging from drug discovery to new material design. Current state-of-the-art models enforce roto-translational symmetries through equivariant neural network architectures, a hard-wired inductive bias that can often lead to reduced flexibility, computational efficiency, and scalability. In this work, we introduce TransIP: Transformer-based Inter-Atomic Potentials, a novel training paradigm for interatomic potentials achieving symmetry compliance without explicit architectural constraints. Our approach guides a generic non-equivariant Transformer-based model to learn SO(3)-equivariance by optimizing its representations in the embedding space. Trained on the recent Open Molecules (OMol25) collection, a large and diverse molecular dataset built specifically for MLIPs and covering different types of molecules (including small organics, biomolecular fragments, and electrolyte-like species), TransIP attains comparable performance in machine-learning force fields versus state-of-the-art equivariant baselines. Further, compared to a data augmentation baseline, TransIP achieves 40% to 60% improvement in performance across varying OMol25 dataset sizes. More broadly, our work shows that learned equivariance can be a powerful and efficient alternative to equivariant or augmentation-based MLIP models. Our code is available at: this https URL.


[27] 2511.03849

Which Similarity-Sensitive Entropy (Sentropy)?

Shannon entropy is not the only entropy that is relevant to machine-learning datasets, nor possibly even the most important one. Traditional entropies such as Shannon entropy capture information represented by elements' frequencies but not the richer information encoded by their similarities and differences. Capturing the latter requires similarity-sensitive entropy ("sentropy"). Sentropy can be measured using either the recently developed Leinster-Cobbold-Reeve framework (LCR) or the newer Vendi score (VS). This raises the practical question of which one to use: LCR or VS. Here we address this question theoretically and numerically, using 53 large and well-known imaging and tabular datasets. We find that LCR and VS values can differ by orders of magnitude and are complementary, except in limiting cases. We show that both LCR and VS results depend on how similarities are scaled, and introduce the notion of "half-distance" to parameterize this dependence. We prove that VS provides an upper bound on LCR for all non-negative values of the Rényi-Hill order parameter, as well as for negative values in the special case that the similarity matrix is full rank. We conclude that VS is preferable only when a dataset's elements can be usefully interpreted as linear combinations of a more fundamental set of "ur-elements" or when the system that the dataset describes has a quantum-mechanical character. In the broader case where one simply wishes to capture the rich information encoded by elements' similarities and differences as well as their frequencies, we propose that LCR should be favored; nevertheless, for certain half-distances the two methods can complement each other.
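
Minimal sketches of the two quantities being compared, in their diversity (exponentiated-entropy) form; sentropy is the logarithm of these (NumPy; parameter names are ours):

```python
import numpy as np

def lcr_diversity(p, Z, q=1.0):
    """Leinster-Cobbold similarity-sensitive diversity of order q for
    frequencies p and similarity matrix Z."""
    Zp = Z @ p
    if np.isclose(q, 1.0):
        return float(np.exp(-(p * np.log(Zp)).sum()))
    return float((p @ Zp ** (q - 1.0)) ** (1.0 / (1.0 - q)))

def vendi_score(K):
    """Vendi score of an n x n PSD similarity matrix with unit diagonal:
    exp of the Shannon entropy of the eigenvalues of K/n."""
    lam = np.linalg.eigvalsh(K / K.shape[0])
    lam = lam[lam > 1e-12]
    return float(np.exp(-(lam * np.log(lam)).sum()))

K = np.eye(4)                     # four mutually dissimilar elements
p = np.full(4, 0.25)
print(lcr_diversity(p, K), vendi_score(K))   # both 4.0 in this limiting case
```

With nontrivial off-diagonal similarities the two quantities diverge, which is the regime the paper's order-of-magnitude comparisons probe.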


[28] 2601.11691

Explainable histomorphology-based survival prediction of glioblastoma, IDH-wildtype

Glioblastoma, IDH-wildtype (GBM-IDHwt) is the most common malignant brain tumor. While histomorphology is a crucial component of GBM-IDHwt diagnosis, it is not further considered for prognosis. Here, we present an explainable artificial intelligence (AI) framework to identify and interpret histomorphological features associated with patient survival. The framework combines an explainable multiple instance learning (MIL) architecture that directly identifies prognostically relevant image tiles with a sparse autoencoder (SAE) that maps these tiles to interpretable visual patterns. The MIL model was trained and evaluated on a new real-world dataset of 720 GBM-IDHwt cases from three hospitals and four cancer registries across Germany. The SAE was trained on 1,878 whole-slide images from five independent public glioblastoma collections. Despite the many factors influencing survival time, our method showed some ability to discriminate between patients living less than 180 days or more than 360 days solely based on histomorphology (AUC: 0.67; 95% CI: 0.63-0.72). Cox proportional hazards regression confirmed a significant survival difference between predicted groups after adjustment for established prognostic factors (hazard ratio: 1.47; 95% CI: 1.26-1.72). Three neuropathologists categorized the identified visual patterns into seven distinct histomorphological groups, revealing both established prognostic features and unexpected associations, the latter being potentially attributable to surgery-related confounders. The presented explainable AI framework facilitates prognostic biomarker discovery in GBM-IDHwt and beyond, highlighting promising histomorphological features for further analysis and exposing potential confounders that would be hidden in black-box models.