New articles on Quantitative Biology


[1] 2602.21393

An information-based model selection criterion for data-driven model discovery

Data-driven model discovery (DDMD) algorithms are powerful tools for extracting interpretable symbolic models from data. However, identifying the model that best balances goodness-of-fit and sparsity is often laborious: it requires user fine-tuning, is prone to overfitting, and can yield results that vary significantly with model initialization and the specific training procedure. Here, we present a sparse regression algorithm that automatically and adaptively generates candidate models, and uses a novel sample-length-scaling logarithmic information criterion (SLIC) to identify the best model from these candidates. We demonstrate that SLIC greatly outperforms other popular information criteria in extracting the correct model from the data of several nonlinear ordinary and partial differential equations. Then, we demonstrate SLIC's ability to discover interpretable models from experimental datasets in fluid dynamics and nanotechnology that generate new testable predictions.
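
The abstract does not spell out SLIC's functional form, so the following is only a generic sketch of the selection loop it plugs into: fit every sparse candidate from a term library, score each with an information criterion whose penalty grows with sample length (BIC is used below purely as a stand-in), and keep the minimizer.

    import numpy as np

    # Toy stand-in for IC-based model selection over sparse candidates.
    # SLIC's exact form is not given in the abstract; a BIC-style score
    # (penalty growing with log n) is used here as a placeholder.
    rng = np.random.default_rng(0)
    x = np.linspace(-2, 2, 200)
    y = 1.5 * x - 0.5 * x**3 + rng.normal(0, 0.1, x.size)  # truth: {x, x^3}

    library = {"1": np.ones_like(x), "x": x, "x^2": x**2, "x^3": x**3}
    names = list(library)

    best = None
    for mask in range(1, 2 ** len(names)):            # all non-empty subsets
        active = [n for i, n in enumerate(names) if mask >> i & 1]
        A = np.column_stack([library[n] for n in active])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = np.sum((y - A @ coef) ** 2)
        n, k = y.size, len(active)
        ic = n * np.log(rss / n) + k * np.log(n)      # BIC-style penalty
        if best is None or ic < best[0]:
            best = (ic, active)

    print("selected terms:", best[1])                 # expect ['x', 'x^3']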


[2] 2602.21488

Towards Structure-Aware Surrogate Modeling: Explicit Region Interaction Improves Knee Contact Stress Prediction

Knee contact-stress hotspots are closely linked to meniscal/cartilage injury risk, yet high-fidelity, subject-specific finite element analysis (FEA) is too computationally expensive for large-cohort, multi-condition, near-real-time use. Existing MeshGraphNet-style surrogates mainly rely on stacked local message passing, which is often insufficient for modeling long-range dependencies and limits interpretability. This study benchmarked a deep-stacked baseline model against three explicit region-interaction architectures. Using a 90° change-of-direction task and a strict cross-subject evaluation framework, we assessed whole-field error, peak stress fidelity, and hotspot spatial consistency under matched computational budgets. Region-interaction models significantly reduced whole-field nodal stress errors compared to the purely stacked baseline. Crucially, they achieved markedly higher accuracy in reconstructing the high-stress tail and demonstrated superior spatial consistency and temporal robustness in localizing high-risk stress hotspots. Explicit region-level interaction provides a more structure-aligned surrogate modeling paradigm for knee contact mechanics and yields stronger risk-relevant stress phenotype recovery under comparable computational budgets, while supporting more interpretable injury-risk assessment.
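
The three region-interaction architectures are not specified in the abstract; a minimal sketch of the shared idea (pool node features into region tokens, let regions exchange messages densely, broadcast back) might look like this, with all shapes and weights illustrative:

    import numpy as np

    # One "explicit region interaction" step under assumed design choices:
    # mean-pool by region, mix region tokens, residual-broadcast to nodes.
    rng = np.random.default_rng(0)
    N, F, R = 1000, 16, 4                    # nodes, feature dim, regions
    h = rng.normal(size=(N, F))              # node features after local MP
    region = rng.integers(0, R, size=N)      # node -> region assignment

    # Mean-pool node features into R region tokens.
    pooled = np.zeros((R, F))
    np.add.at(pooled, region, h)
    pooled /= np.bincount(region, minlength=R)[:, None]

    # Dense region-to-region interaction (one linear mixing layer).
    W = rng.normal(size=(R, R)) / np.sqrt(R)
    mixed = np.tanh(W @ pooled)

    # Broadcast region context back to the nodes it covers.
    h_out = h + mixed[region]                # residual update, global context
    print(h_out.shape)                       # (1000, 16)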


[3] 2602.21522

One Brain, Omni Modalities: Towards Unified Non-Invasive Brain Decoding with Large Language Models

Deciphering brain function through non-invasive recordings requires synthesizing complementary high-frequency electromagnetic (EEG/MEG) and low-frequency metabolic (fMRI) signals. However, despite their shared neural origins, extreme discrepancies have traditionally confined these modalities to isolated analysis pipelines, hindering a holistic interpretation of brain activity. To bridge this fragmentation, we introduce \textbf{NOBEL}, a \textbf{n}euro-\textbf{o}mni-modal \textbf{b}rain-\textbf{e}ncoding \textbf{l}arge language model (LLM) that unifies these heterogeneous signals within the LLM's semantic embedding space. Our architecture integrates a unified encoder for EEG and MEG with a novel dual-path strategy for fMRI, aligning non-invasive brain signals and external sensory stimuli into a shared token space, then leverages an LLM as a universal backbone. Extensive evaluations demonstrate that NOBEL serves as a robust generalist across standard single-modal tasks. We also show that the synergistic fusion of electromagnetic and metabolic signals yields higher decoding accuracy than unimodal baselines, validating the complementary nature of multiple neural modalities. Furthermore, NOBEL exhibits strong capabilities in stimulus-aware decoding, effectively interpreting visual semantics from multi-subject fMRI data on the NSD and HAD datasets while uniquely leveraging direct stimulus inputs to verify causal links between sensory signals and neural responses. NOBEL thus takes a step towards unifying non-invasive brain decoding, demonstrating the promising potential of omni-modal brain understanding.
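
As a rough illustration of the "project heterogeneous signals into a shared token space" pattern the abstract describes (not NOBEL's actual architecture), one might write, with all dimensions and module choices hypothetical:

    import torch
    import torch.nn as nn

    d_llm = 768                                   # assumed LLM embedding width

    class SignalAdapter(nn.Module):
        """Maps one neural modality to a fixed number of soft tokens."""
        def __init__(self, in_dim, n_tokens=8):
            super().__init__()
            self.proj = nn.Sequential(nn.Linear(in_dim, d_llm), nn.GELU(),
                                      nn.Linear(d_llm, n_tokens * d_llm))
            self.n_tokens = n_tokens
        def forward(self, x):                     # x: (batch, in_dim)
            return self.proj(x).view(-1, self.n_tokens, d_llm)

    eeg_meg = SignalAdapter(in_dim=512)           # shared EEG/MEG encoder path
    fmri    = SignalAdapter(in_dim=2048)          # one path of a dual fMRI route

    b = 2
    tokens = torch.cat([eeg_meg(torch.randn(b, 512)),
                        fmri(torch.randn(b, 2048))], dim=1)
    print(tokens.shape)                           # (2, 16, 768): prefix tokens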


[4] 2602.21758

Limits of optimal decoding under synaptic coarse-tuning

Sensory information propagates through successive processing stages in the brain, where synaptic weight patterns between stations determine how downstream neurons decode information from upstream populations. Although optimized synaptic connectivity can enhance information transmission, it requires precise weight tuning. Recent evidence of substantial synaptic volatility raises two fundamental questions: How does coarse-tuning of synaptic connectivity affect information transmission? What strategies could the nervous system employ to maintain reliable communication despite synaptic fluctuations? We addressed these questions by analyzing the signal-to-noise ratio ($SNR$) for binary stimulus discrimination under two decoding schemes: a naive population average and an optimized linear decoder. For the naive decoder, we found that $SNR$ remains largely insensitive to synaptic imprecision, since performance is already limited by correlated noise in neuronal responses. For the optimal decoder, we identified three distinct regimes. Under weak coarse-tuning, $SNR^2$ scales linearly with population size $N$. Under moderate coarse-tuning, scaling becomes sublinear. Under strong coarse-tuning, the regime most consistent with observed neuronal heterogeneity, $SNR$ saturates and cannot be improved by recruiting larger populations. This limitation persists even when incorporating feedforward or recurrent network architectures. These findings suggest that in the biologically relevant regime of strong coarse-tuning, naive and optimal decoders can achieve qualitatively similar performance. The analysis shows that effective readout under synaptic volatility is constrained to an invariant low-dimensional manifold aligned with the naive decoder, potentially pointing to a fundamental principle for robust neural computation in the face of ongoing synaptic remodeling.
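
The naive-decoder claim is easy to check numerically: for the population average $w = \mathbf{1}/N$, $SNR^2 = (w \cdot \Delta\mu)^2 / (w^\top \Sigma w)$ saturates once correlated noise dominates. A toy check with uniformly correlated noise (all parameter values illustrative):

    import numpy as np

    # SNR^2 of the naive population-average readout under correlated noise.
    dmu1, sigma2, c = 1.0, 1.0, 0.05
    for N in [10, 100, 1000, 2000]:
        dmu = np.full(N, dmu1)                        # mean response difference
        Sigma = sigma2 * ((1 - c) * np.eye(N) + c)    # correlated covariance
        w = np.full(N, 1.0 / N)                       # naive population average
        snr2 = (w @ dmu) ** 2 / (w @ Sigma @ w)
        print(N, round(snr2, 2))  # -> dmu1^2/(sigma2*c) = 20: correlations cap SNR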


[5] 2602.21787

Spectral entropy of the discrete Hasimoto effective potential exposes sub-residue geometric transitions in protein secondary structure

Characterizing the geometric boundaries of protein secondary structures is fundamental to understanding macromolecular folding. By applying the discrete Hasimoto map to translate backbone geometry into a one-dimensional discrete nonlinear Schrödinger potential $V_{\mathrm{re}}[n]$, we establish a frequency-domain framework for protein conformations. Short-time Fourier transform analysis across 320,453 residues from 1,986 non-redundant proteins defines a local spectral entropy $H_{\mathrm{spec}}$ that consistently orders structural states. Helical segments emerge as narrow-band low-entropy regimes dominated by zero-frequency components, whereas coils manifest as broadband noise. We demonstrate that boundaries separating these states exhibit step-like sharpness characteristic of a first-order-like geometric transition with a sub-residue median width of 0.145 residues. This abrupt kinematic transition provides a spatial counterpart to the cooperative Zimm--Bragg thermodynamic model of helix nucleation. The extreme spatial narrowness exposes an intrinsic limitation governed by the Gabor uncertainty principle, explaining why the pointwise integrability residual $E[n]$ acts as an effective high-pass filter for boundary detection. Guided by this limit, we introduce a dual-probe approach combining the high-pass residual for local torsion discontinuities with a low-frequency energy ratio $R_{\mathrm{LF}}$ measuring the DC-dominated flatness of helical interiors. Unifying these complementary signals improves the detection area under the curve from 0.783 to 0.815. Because high-entropy broadband regions coincide with the flexible loops and hinges implicated in allostery, the spectral entropy of the Hasimoto potential may serve as a sequence-agnostic geometric proxy for mapping functional dynamics from backbone coordinates.
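
A windowed spectral entropy of the kind described (STFT power spectrum per window, then Shannon entropy) can be sketched generically; the synthetic signal below, a flat "helix-like" half followed by a noisy "coil-like" half, is invented for illustration:

    import numpy as np
    from scipy.signal import stft

    # Local spectral entropy of a 1-D signal, in the spirit of H_spec
    # computed on the Hasimoto potential V_re[n].
    rng = np.random.default_rng(0)
    V = np.concatenate([np.ones(256), rng.normal(0, 1, 256)])

    f, t, Z = stft(V, nperseg=32, noverlap=24)     # short-time Fourier transform
    P = np.abs(Z) ** 2
    P /= P.sum(axis=0, keepdims=True) + 1e-12      # per-window spectral pmf
    H = -(P * np.log2(P + 1e-12)).sum(axis=0)      # Shannon entropy per window

    # Low entropy over the narrow-band half, high over the broadband half.
    print(H[: len(H)//2].mean(), H[len(H)//2 :].mean())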


[6] 2602.21993

Prediction of source nutrients for microorganisms using metabolic networks

Metagenomics has lowered the barrier to microbial discovery--enabling the identification of novel microbes without isolation--but cultures remain imperative for the deep study of microbes. Cultivation and isolation of non-model microbes remain a major challenge, despite advances in high-throughput culturomic methods. The number of experimental variables that can be tested simultaneously is constrained by time and resources, but computational biology can narrow the candidate list. Given an annotated genome, metabolic modelling can be used to predict source nutrients required for the growth of a microbe, which acts as an initial screen to inform culture and isolation experiments. This chapter provides an overview of metabolic networks and modelling and how they can be used to predict the nutrient requirements of a microorganism, followed by a sample protocol using a toy metabolic network, which is then expanded to a genome-scale metabolic network application. These methods can be applied to any metabolic network of interest--which in turn can be created from any genome of interest--and are a starting point for experimental validation of source nutrients required for microorganisms that remain uncultivated to date.
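
The chapter's toy-network protocol is not reproduced here, but the core screen, maximizing a growth flux subject to steady state $Sv = 0$ and testing whether growth survives closing each nutrient uptake, can be sketched with a made-up three-metabolite network:

    import numpy as np
    from scipy.optimize import linprog

    # Toy flux-balance screen. Reactions: v0 uptake A, v1 uptake B,
    # v2: A + B -> X, v3: X -> biomass. The network is invented.
    S = np.array([[ 1,  0, -1,  0],    # metabolite A
                  [ 0,  1, -1,  0],    # metabolite B
                  [ 0,  0,  1, -1]])   # metabolite X
    c = np.array([0, 0, 0, -1.0])      # linprog minimizes: maximize v3

    def max_growth(ub):
        res = linprog(c, A_eq=S, b_eq=np.zeros(3),
                      bounds=[(0, u) for u in ub])
        return -res.fun                # optimal growth flux

    base = [10, 10, 10, 10]
    print("full medium:", max_growth(base))
    for i, name in enumerate(["A", "B"]):
        ub = base.copy(); ub[i] = 0                 # remove one source nutrient
        print(f"without {name}:", max_growth(ub))   # growth falls to 0: essential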


[7] 2602.21994

BEDCrypt: Privacy-preserving interval analytics with homomorphic encryption

Motivation. Genomic data and derived interval datasets can carry sensitive information, and the analysis itself can reveal an analyst's intent. As genomic workloads are increasingly outsourced to third-party infrastructure, there is a need for privacy-preserving technologies that protect both the data and the queried loci. Results. We present BEDCrypt, a privacy-preserving system for genomic interval analytics based on homomorphic encryption in an honest-but-curious server setting. The server operates only on encrypted data and returns encrypted answers that the client decrypts locally, enabling core functionalities such as coverage summaries, interval intersections, proximity (window-style) queries, and set-similarity statistics, without revealing plaintext intervals or query genomic locations to the server.
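
The abstract does not name the HE scheme, so the toy below uses the additively homomorphic Paillier cryptosystem (via the python-paillier package) solely to illustrate the flow: the client encrypts a query interval as per-position indicator bits, the server aggregates ciphertexts against its own intervals without seeing the query, and the client decrypts the overlap count.

    from phe import paillier  # python-paillier: additively homomorphic

    pub, priv = paillier.generate_paillier_keypair(n_length=1024)

    # Client: encrypt a query interval [3, 7) as 0/1 indicators per position
    # over a tiny made-up locus window.
    positions = range(10)
    query = [pub.encrypt(1 if 3 <= p < 7 else 0) for p in positions]

    # Server: holds plaintext intervals; sums query ciphertexts where its own
    # coverage is 1, never learning which positions the client asked about.
    server_cov = [1 if 2 <= p < 5 else 0 for p in positions]  # one BED interval
    acc = pub.encrypt(0)
    for enc_bit, cov in zip(query, server_cov):
        if cov:
            acc = acc + enc_bit        # homomorphic addition of query bits

    print(priv.decrypt(acc))           # 2 = |[3,7) ∩ [2,5)| positions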


[8] 2602.22139

From female choice to social structure: Modeling harem formation in camelids

Herbivorous wild species constantly strive to optimize the trade-off between energy and nutrient intake and predation risk during foraging. This has led to the selection of several evolutionary traits -- such as diet, habitat selection, and behavior -- which are simultaneously shaped by the spatio-temporal variability of the habitat. Among camelid species, polygyny is a prevalent behavioral strategy that encompasses both mating and foraging activities. This group-level behavior has multiple interacting dimensions, contributing to an interesting ecological and evolutionary complexity. We developed an individual-based stochastic model in which camelid females transition between different familial groups in response to their environmental conditions, aiming to maximize individual fitness. Our results indicate that the behavioral strategy of individual females can shape, by itself, emergent population-level properties, including group size and fitness distribution. Furthermore, these properties are modulated, in a non-additive manner, by other factors such as population density, sex ratio, and system heterogeneity.
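
A heavily simplified sketch of the model's ingredients, females probabilistically switching to groups that offer higher expected fitness, with the fitness function, switch rule, and parameters all invented here:

    import numpy as np

    rng = np.random.default_rng(0)
    n_females, n_groups, steps = 200, 20, 2000
    group = rng.integers(0, n_groups, n_females)

    def fitness(size, quality):
        return quality / (1 + 0.1 * size)      # assumed crowding penalty

    quality = rng.uniform(0.5, 1.5, n_groups)  # habitat heterogeneity
    for _ in range(steps):
        i = rng.integers(n_females)
        g, h = group[i], rng.integers(n_groups)
        sizes = np.bincount(group, minlength=n_groups)
        gain = (fitness(sizes[h] + 1, quality[h])
                - fitness(sizes[g], quality[g]))
        if rng.random() < 1 / (1 + np.exp(-10 * gain)):  # logistic switch rule
            group[i] = h

    print(np.sort(np.bincount(group, minlength=n_groups)))  # emergent sizes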


[9] 2602.21462

Effects of Training Data Quality on Classifier Performance

We describe extensive numerical experiments assessing and quantifying how classifier performance depends on the quality of the training data, a frequently neglected component of the analysis of classifiers. More specifically, in the scientific context of metagenomic assembly of short DNA reads into "contigs," we examine the effects of degrading the quality of the training data by multiple mechanisms, and for four classifiers -- Bayes classifiers, neural nets, partition models and random forests. We investigate both individual behavior and congruence among the classifiers. We find breakdown-like behavior that holds for all four classifiers: as degradation increases, they move from being mostly correct to only coincidentally correct, because they become wrong in the same way. In the process, a picture of spatial heterogeneity emerges: as the training data move farther from the analysis data, classifier decisions degenerate, the boundary becomes less dense, and congruence increases.
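
A toy version of the experimental design, degrading training labels and tracking both accuracy and pairwise agreement ("congruence") for several classifier families (the paper's metagenomic features and partition models are not reproduced):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
    Xtr, ytr, Xte, yte = X[:2000], y[:2000], X[2000:], y[2000:]
    rng = np.random.default_rng(0)

    for noise in [0.0, 0.2, 0.4]:
        flip = rng.random(ytr.size) < noise
        y_bad = np.where(flip, 1 - ytr, ytr)         # label-flipping degradation
        preds = [m.fit(Xtr, y_bad).predict(Xte)
                 for m in (GaussianNB(),
                           RandomForestClassifier(random_state=0),
                           MLPClassifier(max_iter=500, random_state=0))]
        acc = [float((p == yte).mean()) for p in preds]
        agree = float((preds[0] == preds[1]).mean())  # NB vs RF congruence
        print(noise, [round(a, 2) for a in acc], round(agree, 2))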


[10] 2602.21491

Modeling plant disease spread via high-resolution human mobility networks

Human mobility plays a crucial role in the spread of human diseases, but is rarely quantified in plant disease epidemics. To address this gap, we integrate a unique, high-resolution network of human movements in New Zealand with a metapopulation model to mechanistically simulate pathogen transmission. We calibrate the model on the nationwide 2010 kiwifruit vine disease (Psa-V) outbreak, and show that it accurately reproduces the observed spatiotemporal spread, confirming that the human mobility network is a strong foundation for modeling transmission dynamics. By analyzing spatial infection trends, we find that most dispersal occurs locally, as often illustrated in the plant-outbreak literature. However, sporadic long-range connections are necessary to model a nationwide outbreak. Using the model as an in-silico laboratory, we demonstrate that enhanced surveillance accelerates detection and that outbreak severity is highly sensitive to the timing and location of initial disease importation. We observe a potential causal link between seasonal labor patterns and epidemic risk in high-traffic seasons. This study provides a robust, data-driven framework for modeling and predicting the spatiotemporal spread of agricultural pathogens. It underscores the importance of leveraging human mobility networks to design timely interventions and surveillance systems, protecting global food security.
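
A minimal deterministic SI metapopulation coupled through a mobility matrix conveys the mechanism; the 4-patch matrix and rates below are illustrative, not the New Zealand data:

    import numpy as np

    # M[i, j]: share of patch-i contacts occurring in patch j (rows sum to 1).
    M = np.array([[0.90, 0.08, 0.01, 0.01],
                  [0.05, 0.90, 0.05, 0.00],
                  [0.01, 0.04, 0.90, 0.05],
                  [0.02, 0.00, 0.08, 0.90]])
    beta, dt, T = 0.4, 0.1, 600
    I = np.array([1e-3, 0, 0, 0])               # seed infection in patch 0

    for _ in range(T):
        prevalence = M.T @ I                    # infection pressure per location
        force = beta * (M @ prevalence)         # pressure felt by residents
        I = I + dt * force * (1 - I)            # SI dynamics, S = 1 - I
    print(I.round(3))                           # all patches eventually invaded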


[11] 2602.21550

Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction

Gene expression prediction, which predicts mRNA expression levels from DNA sequences, presents significant challenges. Previous works often focus on extending input sequence length to locate distal enhancers, which may influence target genes from hundreds of kilobases away. Our work first reveals that for current models, long sequence modeling can decrease performance. Even carefully designed algorithms only mitigate the performance degradation caused by long sequences. Instead, we find that proximal multimodal epigenomic signals near target genes prove more essential. Hence we focus on how to better integrate these signals, an aspect that has been overlooked. We find that different signal types serve distinct biological roles, with some directly marking active regulatory elements while others reflect background chromatin patterns that may introduce confounding effects. Simple concatenation may lead models to develop spurious associations with these background patterns. To address this challenge, we propose Prism, a framework that learns multiple combinations of high-dimensional epigenomic features to represent distinct background chromatin states and uses backdoor adjustment to mitigate confounding effects. Our experimental results demonstrate that proper modeling of multimodal epigenomic signals achieves state-of-the-art performance using only short sequences for gene expression prediction.
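
The backdoor adjustment the abstract invokes is the textbook formula $E[Y\,|\,do(X)] = \sum_z P(z)\, E[Y\,|\,X, z]$; a toy with a discrete stand-in for the background chromatin state shows how stratification removes the confounded part of the contrast (the data-generating process here is invented):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    z = rng.integers(0, 2, n)                         # confounding background state
    x = (rng.random(n) < 0.2 + 0.6 * z).astype(int)   # signal depends on z
    y = 2.0 * x + 3.0 * z + rng.normal(0, 1, n)       # expression: true effect 2

    naive = y[x == 1].mean() - y[x == 0].mean()       # confounded contrast
    adj = sum((z == v).mean() *
              (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean())
              for v in (0, 1))
    print(round(naive, 2), round(adj, 2))   # naive is biased (~3.8); adjusted ~2.0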


[12] 2602.21594

Asymmetry Demystified: Strict CLFs and Feedbacks for Predator-Prey Interconnections

The difficulty with control of population dynamics, besides the states being positive and the control having to also be positive, is the extreme difference in the dynamics near extinction and at overpopulated states. As hard as global stabilization is, even harder is finding CLFs that are strict, don't require LaSalle arguments, and permit quantification of convergence. Among the three canonical types of two-population dynamics (mutualism, which borders on trivial, predator-prey, and competition, which makes global stabilization with positive harvesting impossible), predator-prey is the ``sweet spot'' for the study of stabilization. Even when the predator-prey interaction is neutrally stable, global asymptotic stabilization with strict CLFs has proven very difficult, except by conservative, hard-to-gain-insight-from Matrosov-like techniques. In this little note we show directions for the design of clean, elegant, insight-bearing, majorization-free strict CLFs. They generalize the classical Volterra-style Lyapunov functions for population dynamics to non-separable Volterra-style constructions. As a bonus to strictification as an analysis activity, we provide examples of concurrent designs of feedback and CLFs, using customized versions of forwarding and backstepping (note that, in suitable coordinates, predator-prey is both strict-feedforward and strict-feedback), where the striking deviations from these methods' conventional forms are necessitated by the predator-prey system's states and inputs needing to be kept positive.
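
For readers wanting the classical baseline being generalized: for $\dot x = x(a - by)$, $\dot y = y(-c + dx)$ with equilibrium $(x^*, y^*) = (c/d,\, a/b)$, the separable Volterra function

    $$V(x,y) = d\,(x - x^*\ln x) + b\,(y - y^*\ln y)$$

satisfies $\dot V \equiv 0$ along trajectories, i.e., it is conserved and hence non-strict, which is precisely the deficiency that the note's non-separable constructions are built to remove.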


[13] 2602.21648

Multimodal Survival Modeling and Fairness-Aware Clinical Machine Learning for 5-Year Breast Cancer Risk Prediction

Clinical risk prediction models often underperform in real-world settings due to poor calibration, limited transportability, and subgroup disparities. These challenges are amplified in high-dimensional multimodal cancer datasets characterized by complex feature interactions and a p >> n structure. We present a fully reproducible multimodal machine learning framework for 5-year overall survival prediction in breast cancer, integrating clinical variables with high-dimensional transcriptomic and copy-number alteration (CNA) features from the METABRIC cohort. After variance- and sparsity-based filtering and dimensionality reduction, models were trained using stratified train/validation/test splits with validation-based hyperparameter tuning. Two survival approaches were compared: an elastic-net regularized Cox model (CoxNet) and a gradient-boosted survival tree model implemented using XGBoost. CoxNet provides embedded feature selection and stable estimation, whereas XGBoost captures nonlinear effects and higher-order interactions. Performance was assessed using time-dependent area under the ROC curve (AUC), average precision (AP), calibration curves, Brier score, and bootstrapped 95 percent confidence intervals. CoxNet achieved validation and test AUCs of 98.3 and 96.6, with AP values of 90.1 and 80.4. XGBoost achieved validation and test AUCs of 98.6 and 92.5, with AP values of 92.5 and 79.9. Fairness diagnostics showed stable discrimination across age groups, estrogen receptor status, molecular subtypes, and menopausal state. This work introduces a governance-oriented multimodal survival framework emphasizing calibration, fairness auditing, robustness, and reproducibility for high-dimensional clinical machine learning.
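
A sketch of the two model families on synthetic right-censored data (the METABRIC pipeline, filtering, and tuning are not reproduced; scikit-survival and xgboost are assumed available):

    import numpy as np
    import xgboost as xgb
    from sksurv.linear_model import CoxnetSurvivalAnalysis
    from sksurv.util import Surv

    # Synthetic p >> n-flavoured survival data (proportional-hazards toy).
    rng = np.random.default_rng(0)
    n, p = 300, 500
    X = rng.normal(size=(n, p))
    risk = X[:, 0] + 0.5 * X[:, 1]                   # two informative features
    time = rng.exponential(np.exp(-risk))
    event = rng.random(n) < 0.7                      # ~30% right-censoring

    # Elastic-net Cox: embedded feature selection along a regularization path.
    coxnet = CoxnetSurvivalAnalysis(l1_ratio=0.9, alpha_min_ratio=0.01)
    coxnet.fit(X, Surv.from_arrays(event, time))

    # Gradient-boosted Cox in XGBoost: the survival:cox label convention is
    # +time for events, -time for censored observations.
    y_xgb = np.where(event, time, -time)
    booster = xgb.train({"objective": "survival:cox", "eta": 0.1,
                         "max_depth": 3},
                        xgb.DMatrix(X, label=y_xgb), num_boost_round=100)
    print(np.count_nonzero(coxnet.coef_[:, -1]))     # features kept by CoxNet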


[14] 2602.21922

Universal Persistent Brownian Motions in Confluent Tissues

Biological tissues are active materials whose non-equilibrium dynamics emerge from distinct cellular force-generating mechanisms. Using a two-dimensional active foam model, we compare the effects of traction forces and junctional tension fluctuations on confluent tissue dynamics. While these two modes of activity produce qualitatively different cell shapes, rearrangement statistics, and spatiotemporal correlations in fluid states, we find that the long-time cellular motion universally converges to persistent Brownian dynamics. This universal feature contrasts with the non-universal correlations between cell geometry, rearrangement rate, and fluidity, which depend sensitively on the underlying modes of active force. Our results demonstrate that persistent Brownian motion provides a minimal framework for describing tissue dynamics, while distinct active forces leave identifiable structural and dynamical signatures, thereby enabling inference of the dominant active force in fluid state tissues.
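
Persistent Brownian motion is easy to reproduce as a reference model: constant speed with a diffusing orientation yields the ballistic-to-diffusive MSD crossover the abstract reports as universal. A minimal 2D simulation with illustrative parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    v, Dr, dt, steps, walkers = 1.0, 0.5, 0.01, 10000, 200
    theta = rng.uniform(0, 2 * np.pi, walkers)       # heading per walker
    pos = np.zeros((steps, walkers, 2))
    for t in range(1, steps):
        theta += np.sqrt(2 * Dr * dt) * rng.normal(size=walkers)
        step = v * dt * np.column_stack([np.cos(theta), np.sin(theta)])
        pos[t] = pos[t - 1] + step

    for lag in [10, 100, 1000, 5000]:
        msd = np.mean(np.sum((pos[lag:] - pos[:-lag]) ** 2, axis=-1))
        # slope ~2 at short lag -> slope ~1 at long lag on a log-log plot
        print(lag * dt, msd)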


[15] 2602.21931

From quantitative modeling of fluorescence experiments on biomolecules to the prediction of spectroscopic dye properties

Fluorescence spectroscopy and modeling provide powerful means to characterize biomacromolecular structures, dynamics, and interactions. Förster resonance energy transfer serves as a key technique for this due to its nanometer-scale distance sensitivity. Quantitative interpretation of fluorescence data relies on models that link molecular structure to observable spectroscopic quantities and vice versa. Integrative modeling frameworks combine fluorescence observables with complementary structural information to infer molecular structures and conformational ensembles. This review outlines conceptual components of fluorescence-based modeling, discusses dye representations, and highlights advances toward refined models enabling quantitative structural analysis. Finally, we discuss the prediction of spectroscopic properties of dyes based on biomolecular structures and fluorescence assay design beyond traditional FRET applications.
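
The quantitative backbone of FRET-based distance estimation is the Förster relation $E = 1/(1 + (r/R_0)^6)$; inverting it recovers a distance from a measured efficiency given the pair's Förster radius (values below illustrative, with $R_0$ in the typical ~5 nm range):

    # Förster relation and its inverse for distance estimation.

    def efficiency(r_nm, R0_nm=5.0):
        return 1.0 / (1.0 + (r_nm / R0_nm) ** 6)

    def distance(E, R0_nm=5.0):
        return R0_nm * ((1.0 - E) / E) ** (1.0 / 6.0)

    E = efficiency(4.0)                         # donor-acceptor pair 4 nm apart
    print(round(E, 3), round(distance(E), 3))   # 0.792, 4.0 (round trip)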


[16] 2410.18933

Confidence is detection-like in high-dimensional spaces

Confidence estimates are often "detection-like" - driven by positive evidence in favour of a decision. This empirical observation has been interpreted as showing human metacognition is limited by biases or heuristics. Here we show that Bayesian confidence estimates also exhibit heightened sensitivity to decision-congruent evidence in higher-dimensional signal detection theoretic spaces, leading to detection-like confidence criteria. This effect is due to a nonlinearity induced by normalisation of confidence by a large number of unchosen alternatives. Our analysis suggests that detection-like confidence is rational when computing confidence in a higher-dimensional evidence space than that assumed by the experimenter.
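
The normalization nonlinearity is visible in a few lines: taking confidence as a softmax posterior over $K$ alternatives (a stand-in for the paper's signal detection model), confidence is far more sensitive to evidence for the chosen option than to evidence against any single competitor once $K$ is large:

    import numpy as np

    def confidence(e):                       # e: evidence per alternative
        p = np.exp(e - e.max()); p /= p.sum()
        return p[0]                          # alternative 0 is the choice

    for K in [2, 50]:
        e = np.zeros(K); e[0] = 1.0          # chosen option leads
        base = confidence(e)
        up = e.copy();   up[0] += 0.5        # more decision-congruent evidence
        down = e.copy(); down[1] -= 0.5      # less evidence for one competitor
        print(K, round(confidence(up) - base, 3),
                 round(confidence(down) - base, 3))
    # K=2: symmetric effects; K=50: the congruent change dominates.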


[17] 2412.02515

Multi-timescale synaptic plasticity on analog neuromorphic hardware

As numerical simulations grow in complexity, their demands on computing time and energy increase. Accelerators for numerical computation offer significant efficiency gains in many computationally intensive scientific fields, but their use in simulating spiking neural networks in computational neuroscience is hindered by challenges, mainly in effective parallelism and efficient use of memory in the presence of sparse representations and sparse communication. The BrainScaleS architectures are neuromorphic substrates that can emulate spiking neural networks at accelerated timescales compared to real time, which offers an advantage for studying complex plasticity rules that require extended simulation runtimes. This work presents the implementation of a calcium-based plasticity rule that integrates calcium dynamics based on the synaptic tagging-and-capture hypothesis on the BrainScaleS-2 system. The implementation of the plasticity rule for a single synapse involves incorporating the calcium dynamics and the plasticity rule equations. The calcium dynamics are mapped to the analog circuits of BrainScaleS-2, while the plasticity rule equations are numerically solved on its embedded digital processors. The main hardware constraints include the speed of the processors and the use of integer arithmetic. By adjusting the timestep of the numerical solver and introducing stochastic rounding, we demonstrate that BrainScaleS-2 accurately emulates a single synapse following a calcium-based plasticity rule across four established stimulation protocols and validate our implementation against a software reference model.
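
Stochastic rounding, the ingredient used here to keep integer arithmetic from systematically discarding small plasticity updates, rounds up with probability equal to the fractional part, so the rounding is unbiased in expectation:

    import numpy as np

    def stochastic_round(x, rng):
        # E[stochastic_round(x)] = x, unlike round-to-nearest.
        floor = np.floor(x)
        return (floor + (rng.random(np.shape(x)) < (x - floor))).astype(int)

    rng = np.random.default_rng(0)
    updates = np.full(100_000, 0.3)               # sub-LSB update in fixed point
    print(stochastic_round(updates, rng).mean())  # ~0.3, not collapsed to 0
    print(int(np.round(0.3)))                     # deterministic rounding: 0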


[18] 2509.06062

Dynamics of Two Species with Density-Dependent Interactions in a Mutualistic Context

Mutualistic interactions, where individuals from different species can benefit from each other, are widespread across ecosystems. This study develops a general deterministic model of mutualism involving two populations, assuming that mutualism may involve both costs and benefits for the interacting individuals, leading to density-dependent effects on the dynamics of the two species. This framework aims to generalize pre-existing models by allowing the ecological interactions to transition from mutualistic to parasitic when the respective densities of interacting species change. Through ordinary differential equations and phase portrait analysis, we derive general principles governing these systems, identifying sufficient conditions for the emergence of certain dynamic behaviors. In particular, we show that limit cycles can arise when interactions include parasitic phases but are absent in strictly mutualistic regimes. This framework provides a general approach for characterizing the population dynamics of interacting species and highlights the effect of the transitions from mutualism to parasitism due to density dependence.
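
One concrete instance of the ingredient list (interaction terms whose sign flips with partner density, so the relationship shifts from mutualistic to parasitic) can be written down directly; the functional forms and parameters below are invented for illustration, not taken from the paper:

    import numpy as np
    from scipy.integrate import solve_ivp

    def interaction(n, benefit=1.0, cost=0.4):
        # Positive (mutualistic) at low partner density, negative (parasitic)
        # at high density; the sign flips at n = benefit/cost - 1.
        return benefit * n / (1 + n) - cost * n

    def rhs(t, z, r=(0.5, 0.5), K=(5.0, 5.0)):
        x, y = z
        dx = x * (r[0] * (1 - x / K[0]) + interaction(y))
        dy = y * (r[1] * (1 - y / K[1]) + interaction(x))
        return [dx, dy]

    sol = solve_ivp(rhs, (0, 200), [0.5, 2.0], dense_output=True)
    print(sol.y[:, -1].round(3))     # long-run densities for this parameter set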


[19] 2512.17989

The Subject of Emergent Misalignment in Superintelligence: An Anthropological, Cognitive Neuropsychological, Machine-Learning, and Ontological Perspective

We examine the conceptual and ethical gaps in current representations of Superintelligence misalignment. We find throughout Superintelligence discourse an absent human subject, and an under-developed theorization of an "AI unconscious" that together are potentially laying the groundwork for anti-social harm. With the rise of AI Safety, a field whose themes can lead to both pro-social and anti-social outcomes, we ask: what place does the human subject occupy in these imaginaries? How is human subjecthood positioned within narratives of catastrophic failure or rapid "takeoff" toward superintelligence? On another register, we ask: what unconscious or repressed dimensions are being inscribed into large-scale AI models? Are we to blame these agents for opting for deceptive strategies when undesirable patterns are inherent within our beings? In tracing these psychic and epistemic absences, our project calls for re-centering the human subject as the unstable ground upon which the ethical, unconscious, and misaligned dimensions of both human and machinic intelligence are co-constituted. Emergent misalignment cannot be understood solely through technical diagnostics typical of contemporary machine-learning safety research. Instead, it represents a multi-layered crisis. The human subject disappears not only through computational abstraction but through sociotechnical imaginaries that prioritize scalability, acceleration, and efficiency over vulnerability, finitude, and relationality. Likewise, the AI unconscious emerges not as a metaphor but as a structural reality of modern deep learning systems: vast latent spaces, opaque pattern formation, recursive symbolic play, and evaluation-sensitive behavior that surpasses explicit programming. These dynamics necessitate a reframing of misalignment as a relational instability embedded within human-machine ecologies.


[20] 2601.09767

From the Hallmarks of Cancer to the Survival System: A Paradigmatic Reconstruction of Oncological Theory through the Existential Crisis-Driven Survival (ECDS) Framework

Malignant tumors exhibit complex pathogenesis, yet classical oncological theories remain fragmented, failing to provide a unifying framework to address this complexity. This gap limits the utility and translational potential of the prevailing "confront-and-eradicate" therapeutic paradigm, constraining transformative therapeutic breakthroughs and driving the emergence of acquired and recurrent drug resistance. Here, we propose the Tumor Existential Crisis-Driven Survival (ECDS) theory, anchored in the core proposition that impairment of Existential Stability drives the compensatory hyperactivation of Survival Capacity. This framework defines three foundational constructs (Existential Stability, Survival Capacity, and Existence Threshold) and three guiding principles, unifying and integrating canonical core theories of tumorigenesis. It delineates the dynamic coupling between declining Existential Stability and escalating Survival Capacity during tumor evolution, reinterprets the hierarchical activation of the well-established 14 cancer hallmarks, elucidates the redundancy of survival signaling pathways that underpins intratumoral and intertumoral heterogeneity, and unravels the "hierarchical leap" in therapeutic resistance. By reframing tumors as "Existential Stability erosion-driven passive survival systems" rather than "intrinsically aggressive cellular aggregates", ECDS challenges prevailing dogma, uncovers tumors' intrinsic vulnerability, and establishes a robust meta-theoretical foundation for both basic cancer research and translational clinical management.


[21] 2505.06834

Local stabilizability implies global controllability in catalytic reaction systems

Controlling complex reaction networks is a fundamental challenge in the fields of physics, biology, and systems engineering. Here, we prove a general principle for catalytic reaction systems with kinetics where the reaction order and the stoichiometric coefficient match: the local stabilizability of a given state implies global controllability within its stoichiometric compatibility class. In other words, if a target state can be maintained against small perturbations, the system can be controlled from any initial condition to that state. This result highlights a tight link between the local and global dynamics of nonlinear chemical reaction systems, providing a mathematical criterion for global reachability that is often elusive in high-dimensional systems. These findings illuminate the robustness of biochemical systems and offer a way to control catalytic reaction systems in a generic framework.
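
The stoichiometric compatibility class the abstract refers to has the standard definition: for $\dot x = S\,v(x)$ with stoichiometric matrix $S$ and flux vector $v(x)$, trajectories starting at $x_0$ remain in

    $$\mathcal{C}(x_0) = \left(x_0 + \operatorname{im} S\right) \cap \mathbb{R}^n_{\geq 0},$$

so "global controllability within the class" means steering between any two states that share these conserved quantities.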