New articles on Quantitative Biology


[1] 2512.22158

Pseudo-biodiversity effects across scales

Over the last decade several attempts have been made to extend biodiversity studies in ways that would allow researchers to explore how biodiversity-ecosystem functioning relationships may change across different spatial and temporal scales. Unfortunately, the studies based on these attempts often overlooked the serious issues that can arise when quantifying biodiversity effects at larger scales, specifically the fact that biodiversity effects measured across space and time can contain trivial effects that are unrelated to the role of biodiversity per se -- or even effects that are non-biological in nature due to being simple artefacts of how properties and entities are counted and quantified. Here we outline and describe three such pseudo-biodiversity effects: Population-level effects, Independence effects, and Arithmetic effects. Population-level effects are those related to temporal changes due to individual species population growth or development, and are thus independent of biodiversity. Independence and Arithmetic effects (which we explore here primarily in a spatial context) arise either as a simple consequence of the fact that not all species are present everywhere -- i.e., species turnover is inevitable at greater spatial scales (Independence effects); or they arise when the purported biodiversity effects measured are a simple byproduct of how mathematical functions behave (Arithmetic effects). Our study demonstrates the necessity of controlling for these trivial artefactual effects if one wishes to meaningfully measure how true biodiversity effects change across spatial and temporal scales.


[2] 2512.22225

Literature Mining System for Nutraceutical Biosynthesis: From AI Framework to Biological Insight

The extraction of structured knowledge from scientific literature remains a major bottleneck in nutraceutical research, particularly when identifying microbial strains involved in compound biosynthesis. This study presents a domain-adapted system powered by large language models (LLMs) and guided by advanced prompt engineering techniques to automate the identification of nutraceutical-producing microbes from unstructured scientific text. By leveraging few-shot prompting and tailored query designs, the system demonstrates robust performance across multiple configurations, with DeepSeekV3 outperforming LLaMA2 in accuracy, especially when domain-specific strain information is included. A structured and validated dataset comprising 35 nutraceutical-strain associations was generated, spanning amino acids, fibers, phytochemicals, and vitamins. The results reveal significant microbial diversity across monoculture and co-culture systems, with dominant contributions from Corynebacterium glutamicum, Escherichia coli, and Bacillus subtilis, alongside emerging synthetic consortia. This AI-driven framework not only enhances the scalability and interpretability of literature mining but also provides actionable insights for microbial strain selection, synthetic biology design, and precision fermentation strategies in the production of high-value nutraceuticals.


[3] 2512.22246

A nonconservative kinetic framework with logistic growth for modeling the coexistence in a multi-species ecological system

Kinetic theory frameworks are widely used for modeling stochastic interacting systems, where the evolution primarily depends on binary interactions. Recently, the action of an external force field has been introduced into this framework in order to gain a more realistic picture of some phenomena. In this paper, we introduce nonconservative kinetic equations where an external force field of a particular shape acts on the overall system. Then, this framework is used in an ecological context for modeling the evolution of a system composed of two species interacting with a prey-predator mechanism. The linear stability analysis of the coexistence equilibrium point is provided, and a case where a Hopf bifurcation occurs is discussed. Finally, some relevant scenarios are numerically simulated.
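
As a concrete illustration of the kind of coexistence and stability analysis mentioned above, the sketch below sets up an ODE-level prey-predator system with logistic prey growth and checks the eigenvalues of the Jacobian at the coexistence point. It is a minimal stand-in, not the paper's nonconservative kinetic framework, and every rate constant is an assumption chosen only for demonstration.

import numpy as np
from scipy.integrate import solve_ivp

# Illustrative ODE-level analogue of a prey-predator system with logistic prey
# growth; NOT the paper's kinetic equations, and all parameters are assumptions.
r, K, a, b, m = 1.0, 10.0, 0.4, 0.2, 0.5

def rhs(t, y):
    x, p = y                                  # prey and predator densities
    dx = r * x * (1.0 - x / K) - a * x * p    # logistic growth + predation loss
    dp = b * x * p - m * p                    # conversion of prey into predators
    return [dx, dp]

# Coexistence equilibrium: x* = m/b, p* = (r/a) * (1 - x*/K)
xs = m / b
ps = (r / a) * (1.0 - xs / K)

# Linear stability: eigenvalues of the Jacobian evaluated at (x*, p*)
J = np.array([[r - 2.0 * r * xs / K - a * ps, -a * xs],
              [b * ps,                         b * xs - m]])
print("coexistence point:", (xs, ps))
print("Jacobian eigenvalues:", np.linalg.eigvals(J))

# Trajectory starting away from the equilibrium
sol = solve_ivp(rhs, (0.0, 200.0), [1.0, 1.0], max_step=0.1)
print("final state:", sol.y[:, -1])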


[4] 2512.22257

LiveProteinBench: A Contamination-Free Benchmark for Assessing Models' Specialized Capabilities in Protein Science

In contrast to their remarkable performance on general knowledge QA, the true abilities of Large Language Models (LLMs) in tasks demanding deep, specialized reasoning, such as in protein biology, have yet to be thoroughly investigated. Current benchmarks suffer from critical deficiencies, such as data contamination due to outdated test sets, insufficient focus on essential protein-specific tasks, and a neglect of multimodal assessments. To resolve these issues, we introduce LiveProteinBench, a contamination-free, multimodal benchmark of 12 tasks for evaluating LLM performance on protein property and function prediction. Its central innovation lies in a test set composed exclusively of proteins validated after the start of 2025, guaranteeing that the data is novel to all tested models. We benchmarked a suite of prominent general-purpose LLMs and specialized biological LLMs using both unimodal and multimodal input schemes. Our results show that: 1) General-purpose proprietary large models demonstrate superior zero-shot performance when encountering new protein data, outperforming their open-source and domain-specific counterparts by over 20\% accuracy. 2) The effective use of multi-view structural information remains a significant challenge, as the inclusion of structural images often fails to provide a consistent benefit and can even degrade performance. This highlights the limitations of current models in effectively fusing information across different modalities. 3) Model performance scales more directly with computational cost during inference than with parameter count, underscoring the critical role of Chain-of-Thought reasoning capabilities for protein-specific tasks. LiveProteinBench delineates the current performance frontiers for LLMs in bioinformatics and presents new challenges for the development of future multimodal foundation models for biology.


[5] 2512.22262

INSIGHT: Spatially resolved survival modelling from routine histology crosslinked with molecular profiling reveals prognostic epithelial-immune axes in stage II/III colorectal cancer

Routine histology contains rich prognostic information in stage II/III colorectal cancer, much of which is embedded in complex spatial tissue organisation. We present INSIGHT, a graph neural network that predicts survival directly from routine histology images. Trained and cross-validated on TCGA (n=342) and SURGEN (n=336), INSIGHT produces patient-level spatially resolved risk scores. Large independent validation showed superior prognostic performance compared with pTNM staging (C-index 0.68-0.69 vs 0.44-0.58). INSIGHT spatial risk maps recapitulated canonical prognostic histopathology and identified nuclear solidity and circularity as quantitative risk correlates. Integrating spatial risk with data-driven spatial transcriptomic signatures, spatial proteomics, bulk RNA-seq, and single-cell references revealed an epithelium-immune risk manifold capturing epithelial dedifferentiation and fetal programs, myeloid-driven stromal states including $\mathrm{SPP1}^{+}$ macrophages and $\mathrm{LAMP3}^{+}$ dendritic cells, and adaptive immune dysfunction. This analysis exposed patient-specific epithelial heterogeneity, stratification within MSI-High tumours, and high-risk routes of CDX2/HNF4A loss and CEACAM5/6-associated proliferative programs, highlighting coordinated therapeutic vulnerabilities.


[6] 2512.22270

Measuring the time-scale-dependent information flow between maternal and fetal heartbeats during the third trimester

Prenatal maternal stress alters maternal-fetal heart rate coupling, as demonstrated by the Fetal Stress Index derived from bivariate phase-rectified signal averaging. Here, we extend this framework using information-theoretical measures to elucidate underlying mechanisms. In 120 third-trimester pregnancies (58 stressed, 62 control), we computed transfer entropy (TE), entropy rate (ER), and sample entropy (SE) under multiple conditioning paradigms, employing mixed linear models for repeated measures. We identify dual coupling mechanisms at the short-term (0.5 - 2.5 s), but not long-term (2.5 - 5 s), time scales: (1) stress-invariant state-dependent synchronization, with maternal decelerations exerting approximately 60% coupling strength on fetal heart rate complexity - a fundamental coordination conserved across demographics; and (2) stress-sensitive temporal information transfer (TE), showing exploratory associations with maternal cortisol that require replication. A robust sex-by-stress interaction emerged in TE from mixed models, with exploratory female-specific coupling patterns absent in males. Universal acceleration predominance was observed in both maternal and fetal heart rates, stronger in fetuses and independent of sex or stress. We provide insight into the dependence of these findings on the sampling rate of the underlying data, identifying 4 Hz, commonly used for ultrasound-derived fetal heart rate recordings, as the necessary and sufficient sampling rate regime to capture the information flow. Information-theoretical analysis reveals that maternal-fetal coupling operates through complementary pathways with differential stress sensitivity, extending the Fetal Stress Index by elucidating causal foundations. Future studies should explore additional information-theoretical conditional approaches to resolve stress-specific and time-scale-specific differences in information flow.
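
For readers unfamiliar with the central quantity, the following is a crude, self-contained sketch of a binned transfer-entropy estimator with order-1 histories, using the standard joint-entropy decomposition TE(X->Y) = H(Y_next, Y_past) - H(Y_past) - H(Y_next, Y_past, X_past) + H(Y_past, X_past). It does not reproduce the paper's conditioning paradigms, multiscale analysis, or heart-rate preprocessing, and the toy series below are assumptions.

import numpy as np

def transfer_entropy(x, y, bins=4):
    # Plug-in transfer entropy TE(X -> Y) with order-1 histories.
    # Crude binned estimator for illustration only.
    def symbolize(s):
        edges = np.quantile(s, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(s, edges)          # equal-frequency symbols 0..bins-1
    xs, ys = symbolize(np.asarray(x)), symbolize(np.asarray(y))
    y_next, y_past, x_past = ys[1:], ys[:-1], xs[:-1]

    def H(*cols):
        # joint Shannon entropy (bits) of the stacked symbol columns
        _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    return H(y_next, y_past) - H(y_past) - H(y_next, y_past, x_past) + H(y_past, x_past)

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.roll(x, 1) + 0.5 * rng.normal(size=5000)   # y depends on the past of x
print("TE(x->y):", transfer_entropy(x, y), " TE(y->x):", transfer_entropy(y, x))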


[7] 2512.22415

Hierarchical Preemption: A Novel Information-Theoretic Control Mechanism in Lambda Phage Decision-Making

Biological systems organize into hierarchies to manage complexity, yet the mechanisms governing hierarchical control remain incompletely understood. Using information theory and the Lambda phage lysis-lysogeny decision as a model system, we discover that hierarchical control operates through hierarchical preemption - higher layers collapse decision space rather than blocking lower-layer signals. Through mutual information (MI) analysis of 200 stochastic simulations, we demonstrate that the UV damage sensor (RecA) achieves 2.01x information advantage over environmental signals by preempting bistable outcomes into monostable attractors (98% lysogenic or 85% lytic). Conditional MI analysis reveals that the integrator signal (CII) carries lower information when RecA is absent (saturated, 0.06 bits) than when RecA is active (subsaturated, 0.38 bits). This saturation effect demonstrates that signals orchestrate compartment behaviors by removing decision space - achieving 85-98% outcome certainty while preserving 2-15% escape routes. These findings establish a quantitative framework for hierarchical information processing in cellular decision-making.


[8] 2512.22485

JParc: Joint cortical surface parcellation with registration

Cortical surface parcellation is a fundamental task in both basic neuroscience research and clinical applications, enabling more accurate mapping of brain regions. Model-based and learning-based approaches for automated parcellation alleviate the need for manual labeling. Despite advances in parcellation performance, learning-based methods shift away from registration and atlas propagation without exploring the reason for the improvement compared to traditional methods. In this study, we present JParc, a joint cortical registration and parcellation framework that outperforms existing state-of-the-art parcellation methods. In rigorous experiments, we demonstrate that the enhanced performance of JParc is primarily attributable to accurate cortical registration and a learned parcellation atlas. By leveraging a shallow subnetwork to fine-tune the propagated atlas labels, JParc achieves a Dice score greater than 90% on the Mindboggle dataset, using only basic geometric features (sulcal depth, curvature) that describe cortical folding patterns. The superior accuracy of JParc can significantly increase the statistical power in brain mapping studies as well as support applications in surgical planning and many other downstream neuroscientific and clinical tasks.


[9] 2512.22506

The effect of dispersal area on the extinction threshold

The survival of populations hinges on their ability to offset local extinctions through new colonizations. The dispersal area ($A$) plays a crucial role in this process, as it determines the probability of finding colonizable vacant sites. We investigated the spatial colonization-extinction dynamics in a lattice model (a contact process), exploring various finite dispersal areas and estimating the extinction threshold $\lambda_E(A)$. Our results revealed a consistent $\lambda_E(A)$ relationship, largely independent of lattice geometry (except for the smallest $A$). This $\lambda_E(A)$ relationship obeyed universal scaling laws within two broad ranges of $A$. The scaling relations suggest considerable selection upon the increase of dispersal area, particularly at low $A$ values. We discuss these findings in the broader context of the evolution of dispersal area.
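
A toy, discrete-time caricature of colonization-extinction dynamics with a finite dispersal window is sketched below; scanning the colonization parameter for different window radii shows the qualitative persistence/extinction transition the abstract studies. It is not the paper's contact process or its threshold estimator, and all rates and lattice sizes are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def persists(lam, r, L=64, steps=500, death=0.2):
    # Toy discrete-time colonization-extinction dynamics on an LxL lattice.
    # Each occupied site dies with probability `death` and, with probability
    # min(1, lam*death), seeds one random site inside its (2r+1)x(2r+1)
    # dispersal window (periodic boundaries). Illustrative caricature only.
    occ = np.zeros((L, L), dtype=bool)
    occ[L // 2 - 2:L // 2 + 2, L // 2 - 2:L // 2 + 2] = True
    for _ in range(steps):
        parents = np.argwhere(occ)
        if parents.size == 0:
            return False
        trials = parents[rng.random(len(parents)) < min(1.0, lam * death)]
        offsets = rng.integers(-r, r + 1, size=(len(trials), 2))
        targets = (trials + offsets) % L
        occ &= rng.random((L, L)) >= death          # local extinctions
        occ[targets[:, 0], targets[:, 1]] = True    # new colonizations
    return True

for r in (1, 4):
    for lam in (1.0, 1.5, 2.0, 3.0):
        alive = sum(persists(lam, r) for _ in range(5))
        print(f"radius={r} lambda={lam}: persisted in {alive}/5 runs")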


[10] 2512.22785

Nonlinear Dynamical Modeling of Human Intracranial Brain Activity with Flexible Inference

Dynamical modeling of multisite human intracranial neural recordings is essential for developing neurotechnologies such as brain-computer interfaces (BCIs). Linear dynamical models are widely used for this purpose due to their interpretability and their suitability for BCIs. In particular, these models enable flexible real-time inference, even in the presence of missing neural samples, which often occur in wireless BCIs. However, neural activity can exhibit nonlinear structure that is not captured by linear models. Furthermore, while recurrent neural network models can capture nonlinearity, their inference does not directly address handling missing observations. To address this gap, recent work introduced DFINE, a deep learning framework that integrates neural networks with linear state-space models to capture nonlinearities while enabling flexible inference. However, DFINE was developed for intracortical recordings that measure localized neuronal populations. Here we extend DFINE to modeling of multisite human intracranial electroencephalography (iEEG) recordings. We find that DFINE significantly outperforms linear state-space models (LSSMs) in forecasting future neural activity. Furthermore, DFINE matches or exceeds the accuracy of a gated recurrent unit (GRU) model in neural forecasting, indicating that a linear dynamical backbone, when paired and jointly trained with nonlinear neural networks, can effectively describe the dynamics of iEEG signals while also enabling flexible inference. Additionally, DFINE handles missing observations more robustly than the baselines, demonstrating its flexible inference and utility for BCIs. Finally, DFINE's advantage over LSSM is more pronounced in high gamma spectral bands. Taken together, these findings highlight DFINE as a strong and flexible framework for modeling human iEEG dynamics, with potential applications in next-generation BCIs.


[11] 2512.23144

An Inference-Based Architecture for Intent and Affordance Saturation in Decision-Making

Decision paralysis, i.e. hesitation, freezing, or failure to act despite full knowledge and motivation, poses a challenge for choice models that assume options are already specified and readily comparable. Drawing on qualitative reports in autism research that are especially salient, we propose a computational account in which paralysis arises from convergence failure in a hierarchical decision process. We separate intent selection (what to pursue) from affordance selection (how to pursue the goal) and formalize commitment as inference under a mixture of reverse- and forward-Kullback-Leibler (KL) objectives. Reverse KL is mode-seeking and promotes rapid commitment, whereas forward KL is mode-covering and preserves multiple plausible goals or actions. In static and dynamic (drift-diffusion) models, forward-KL-biased inference yields slow, heavy-tailed response times and two distinct failure modes, intent saturation and affordance saturation, when values are similar. Simulations in multi-option tasks reproduce key features of decision inertia and shutdown, treating autism as an extreme regime of a general, inference-based, decision-making continuum.
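
The mode-covering versus mode-seeking distinction invoked here can be made concrete with a small grid-based illustration: fitting the mean of a fixed-width Gaussian to a bimodal target under forward versus reverse KL. This is an assumption-laden toy, not the paper's hierarchical intent/affordance model.

import numpy as np

# Target p(x): two well-separated "goal values"; approximating family q(x):
# Gaussian with fixed, narrow width. Illustrative only.
x = np.linspace(-6, 6, 2001)
dx = x[1] - x[0]

def normalize(f):
    return f / (f.sum() * dx)

p = normalize(np.exp(-(x + 2.0) ** 2 / 0.5) + np.exp(-(x - 2.0) ** 2 / 0.5))

def gaussian(mu, sigma=0.5):
    return normalize(np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)))

def kl(a, b):
    mask = a > 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask])) * dx

mus = np.linspace(-4, 4, 401)
forward = [kl(p, gaussian(m)) for m in mus]   # KL(p||q): mode-covering
reverse = [kl(gaussian(m), p) for m in mus]   # KL(q||p): mode-seeking

print("forward-KL optimum mu:", mus[np.argmin(forward)])  # ~0, between the modes
print("reverse-KL optimum mu:", mus[np.argmin(reverse)])  # ~+/-2, committed to a mode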


[12] 2512.23146

A Network of Biologically Inspired Rectified Spectral Units (ReSUs) Learns Hierarchical Features Without Error Backpropagation

We introduce a biologically inspired, multilayer neural architecture composed of Rectified Spectral Units (ReSUs). Each ReSU projects a recent window of its input history onto a canonical direction obtained via canonical correlation analysis (CCA) of previously observed past-future input pairs, and then rectifies either its positive or negative component. By encoding canonical directions in synaptic weights and temporal filters, ReSUs implement a local, self-supervised algorithm for progressively constructing increasingly complex features. To evaluate both computational power and biological fidelity, we trained a two-layer ReSU network in a self-supervised regime on translating natural scenes. First-layer units, each driven by a single pixel, developed temporal filters resembling those of Drosophila post-photoreceptor neurons (L1/L2 and L3), including their empirically observed adaptation to signal-to-noise ratio (SNR). Second-layer units, which pooled spatially over the first layer, became direction-selective -- analogous to T4 motion-detecting cells -- with learned synaptic weight patterns approximating those derived from connectomic reconstructions. Together, these results suggest that ReSUs offer (i) a principled framework for modeling sensory circuits and (ii) a biologically grounded, backpropagation-free paradigm for constructing deep self-supervised neural networks.


[13] 2512.23301

Somatosensory prediction in premature neonates: iatrogenic pain experience increases repetition suppression and deviance detection of innocuous stimuli in a tactile oddball protocol

Sensory prediction (SP) is a fundamental mechanism of perception that supports cognitive development. Atypical SP has been reported across multiple neurodevelopmental disorders (ND), suggesting it may constitute an early cross-syndromic marker. Premature birth is a major risk factor for ND, with risk increasing as gestational age (GA) at birth decreases. However, how perinatal risk factors shape the development of SP remains poorly understood. We do not know if untimely birth itself, or exposure to iatrogenic pain during neonatal intensive care, cause neurodevelopmental impairments. In this study, we first assessed whether SP can be detected in the brains of premature neonates at 35 weeks corrected GA using a tactile oddballomission paradigm with EEG. We then examined the effects of the degree of prematurity and of the exposure to painful care procedures on neural indices of SP. Results demonstrate the presence of repetition suppression (RS) and a mismatch response (MMR) to deviant stimuli in the contralateral somatosensory cortex of premature neonates. The amplitude of these SP proxies was significantly affected by the number of painful procedures experienced since birth, independently of the effect of GA at birth. Contrary to our initial hypothesis that greater neurodevelopmental risk would be associated with less mature SP, infants with higher exposure to pain exhibited more robust indices of SP. These findings suggest that increased ex utero experience, even painful, is associated with accelerated maturation of predictive somatosensory processing. Longitudinal follow-up of participants at age 2 will explore how these early markers relate to developmental outcomes.


[14] 2512.23442

Bandwidth Selection of Density Estimators over Treespaces

A kernel density estimator (KDE) is one of the most popular non-parametric density estimators. In this paper we focus on a bandwidth selection method for an analogue of the classical KDE that uses the tropical symmetric distance, known as a tropical KDE, over the space of phylogenetic trees. We propose likelihood cross validation (LCV) for selecting the bandwidth parameter of the KDE over the space of phylogenetic trees. First, we show the explicit optimal solution of the best-fit bandwidth parameter via the LCV for a tropical KDE over the space of phylogenetic trees. Then, computational experiments with simulated datasets generated under the multi-species coalescent (MSC) model show that a tropical KDE with the best-fit bandwidth parameter via the LCV performs better than a tropical KDE with a bandwidth parameter estimated via nearest neighbors, in terms of accuracy and computational time. Lastly, we apply our method to empirical data from the Apicomplexa genome.
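
To illustrate the likelihood cross-validation criterion in a familiar setting, the sketch below selects the bandwidth of an ordinary Gaussian KDE on Euclidean data by maximizing the leave-one-out log-likelihood. The tropical symmetric distance, tree-space geometry, and the paper's closed-form solution are not reproduced here, and the sample is an assumption.

import numpy as np

rng = np.random.default_rng(0)
# Illustrative 1-D bimodal sample; not phylogenetic trees or a tropical metric.
data = rng.normal(loc=[0.0, 3.0], scale=[1.0, 0.5], size=(200, 2)).ravel()

def loo_log_likelihood(h, x):
    # Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h.
    n = len(x)
    d = x[:, None] - x[None, :]                       # pairwise differences
    k = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)                          # drop the held-out point
    dens = k.sum(axis=1) / (n - 1)
    return np.log(np.maximum(dens, 1e-300)).sum()

bandwidths = np.geomspace(0.05, 2.0, 60)
scores = [loo_log_likelihood(h, data) for h in bandwidths]
print("LCV-selected bandwidth:", bandwidths[int(np.argmax(scores))])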


[15] 2512.23554

An integrated quantitative single-objective light-sheet microscope for subcellular dynamics in embryos and cultured multicellular systems

Quantitative imaging of subcellular processes in living embryos, stem-cell systems, and organoid models requires microscopy platforms that combine high spatial resolution, fast volumetric acquisition, long-term stability, and minimal phototoxicity. Single-objective light-sheet approaches based on oblique plane microscopy (OPM) are well suited for live imaging in standard sample geometries, but most existing implementations lack the optical calibration, timing precision, and end-to-end integration required for reproducible quantitative measurements. Here we present a fully integrated and quantitatively characterized OPM platform engineered for dynamic studies of transcription and nuclear organization in embryos, embryonic stem cells, and three-dimensional culture systems. The system combines high numerical aperture remote refocusing with tilt-invariant light-sheet scanning and hardware-timed synchronization of laser excitation, galvo scanning, and camera readout. We provide a comprehensive characterization of the optical performance, including point spread function, sampling geometry, usable field of view, and system stability, establishing a well-defined framework for quantitative volumetric imaging. To support high-throughput operation, we developed a unified acquisition and reconstruction pipeline that enables real time volumetric imaging at hardware-limited rates while preserving deterministic timing and reproducible geometry. Using this platform, we demonstrate quantitative three-dimensional imaging of MS2-labeled transcription sites in living Drosophila embryos, cultured mouse embryonic stem cells, and mESC-derived gastruloids, enabling extraction of transcriptional intensity traces across diverse biological contexts. This work establishes OPM as a robust and quantitatively calibrated single-objective light-sheet platform for transcription imaging in complex living systems.


[16] 2512.23661

Dynamical incompatibilities in paced finger tapping experiments

The behavioral description of the sensorimotor synchronization phenomenon in humans is exhaustive, mostly by using variations of the traditional paced finger-tapping task. This task helps unveil the inner workings of the error-correction mechanism responsible for the resynchronization after a perturbation to the period of the stimuli sequence. Yet, fundamental contradictions still exist among different works in the literature. One of such contradictions only emerges after comparing the two most-common period perturbation types: step changes and phase shifts. The stimulus sequence is exactly the same in both perturbation types up to and including the (unexpected) perturbed stimulus. Why then would the timing of the next response be different between perturbation types, as observed? The explanation lies in the buildup of different temporal contexts during the experiments that recalibrate the error-correction mechanism. Here we show, both experimentally and theoretically, that responses to different perturbation types are dynamically incompatible when they occur in separate experiments. That is, they can't be represented by the same underlying dynamical system, thus explaining many contradictory results and the difficulty in reproducing both types of perturbations with a single mathematical model. On the other hand, if both perturbation types are presented at random during the same experiment then the responses are compatible with each other and can be construed as produced by a unique underlying mechanism. We conclude that a single underlying dynamical system can represent the response to all perturbation types, signs, and sizes, which is nevertheless recalibrated by temporal context. Our results offer a ground for performing better comparisons in paced finger tapping and extend the usable range of data beyond the perturbed stimulus and into the information-rich resynchronization phase.


[17] 2512.22160

On the comparison of models and experiments in the study of DNA open states: the problem of degrees of freedom

Simple mechanical models of DNA play an important role in studying the dynamics of its open states. The main requirement when developing a DNA model is the correct selection of its effective potentials and parameters based on experimental data. At the same time, various experiments allow us to "see" different types of DNA open states. Consideration of this feature is one of the most important conditions in the development, optimization, and parameterization of any mechanical model. Violation of this condition, i.e., the comparison of incomparable characteristics, leads to critical errors. The present investigation is devoted to the problem of degrees of freedom of DNA bases taken into account in mechanical models. Using the Peyrard-Bishop-Dauxois model as an example, two types of errors in interpreting experimental data when compared with the model are examined. The first one is a mismatch between the open state types in the model and experiment. The second one is an incorrect specification of the "threshold coordinate" of the open state. The concept of the effective total threshold coordinate of the radial separation of DNA strands for registration of opening is introduced. It is shown that correct interpretation of experimental data can actually eliminate discrepancies with theory.


[18] 2512.22568

Lessons from Neuroscience for AI: How integrating Actions, Compositional Structure and Episodic Memory could enable Safe, Interpretable and Human-Like AI

The phenomenal advances in large language models (LLMs) and other foundation models over the past few years have been based on optimizing large-scale transformer models on the surprisingly simple objective of minimizing next-token prediction loss, a form of predictive coding that is also the backbone of an increasingly popular model of brain function in neuroscience and cognitive science. However, current foundation models ignore three other important components of state-of-the-art predictive coding models: tight integration of actions with generative models, hierarchical compositional structure, and episodic memory. We propose that to achieve safe, interpretable, energy-efficient, and human-like AI, foundation models should integrate actions, at multiple scales of abstraction, with a compositional generative architecture and episodic memory. We present recent evidence from neuroscience and cognitive science on the importance of each of these components. We describe how the addition of these missing components to foundation models could help address some of their current deficiencies: hallucinations and superficial understanding of concepts due to lack of grounding, a missing sense of agency/responsibility due to lack of control, threats to safety and trustworthiness due to lack of interpretability, and energy inefficiency. We compare our proposal to current trends, such as adding chain-of-thought (CoT) reasoning and retrieval-augmented generation (RAG) to foundation models, and discuss new ways of augmenting these models with brain-inspired components. We conclude by arguing that a rekindling of the historically fruitful exchange of ideas between brain science and AI will help pave the way towards safe and interpretable human-centered AI.


[19] 2512.22651

Decoding the Architecture of Living Systems

The possibility that evolutionary forces -- together with a few fundamental factors such as thermodynamic constraints, specific computational features enabling information processing, and ecological processes -- might constrain the logic of living systems is tantalizing. However, it is often overlooked that any practical implementation of such a logic requires complementary circuitry that, in biological systems, happens through complex networks of genetic regulation, metabolic reactions, cellular signalling, communication, social and eusocial non-trivial organization. We review and discuss how circuitries are not merely passive structures, but active agents of change that, by means of hierarchical and modular organization, are able to enhance and catalyze the evolution of evolvability. Using statistical physics to analyze the role of non-trivial topologies in major evolutionary transitions, we show that biological innovations are related to deviation from trivial structures and (thermo)dynamic equilibria. We argue that sparse heterogeneous networks, such as hierarchical modular ones, which are ubiquitously observed in nature, are favored in terms of the trade-off between energetic costs for redundancy, error-correction and maintenance. We identify three main features -- namely, interconnectivity, plasticity and interdependency -- pointing towards a unifying framework for modeling the phenomenology, discussing them in terms of dynamical systems theory, non-equilibrium thermodynamics and evolutionary dynamics. Within this unified picture, we also show that slow evolutionary dynamics is an emergent phenomenon governed by the replicator-mutator equation as the direct consequence of a constrained variational nonequilibrium process. Overall, this work highlights how dynamical systems theory and nonequilibrium thermodynamics provide powerful analytical techniques to study biological complexity.


[20] 2512.22820

Epigenetic state encodes locus-specific chromatin mechanics

Chromatin is repeatedly deformed in vivo during transcription, nuclear remodeling, and confined migration - yet how mechanical response varies from locus to locus, and how it relates to epigenetic state, remains unclear. We develop a theory to infer locus-specific viscoelasticity from three-dimensional genome organization. Using chromatin structures derived from contact maps, we calculate frequency-dependent storage and loss moduli for individual loci and establish that the mechanical properties are determined both by chromatin epigenetic marks and organization. On large length scales, chromatin exhibits Rouse-like viscoelastic scaling, but this coarse behavior masks extensive heterogeneity at the single-locus level. Loci segregate into two mechanical subpopulations with distinct longest relaxation times: one characterized by single-timescale and another by multi-timescale relaxation. The multi-timescale loci are strongly enriched in active marks, and the longest relaxation time for individual loci correlates inversely with effective local stiffness. Pull-release simulations further predict a time-dependent susceptibility: H3K27ac-rich loci deform more under sustained forcing yet can resist brief, large impulses. At finer genomic scales, promoters, enhancers, and gene bodies emerge as "viscoelastic islands" aligned with their focal interactions. Together, these results suggest that chromatin viscoelasticity is an organized, epigenetically coupled property of the 3D genome, providing a mechanistic layer that may influence enhancer-promoter communication, condensate-mediated organization, and response to cellular mechanical stress. The prediction that locus-specific mechanics in chromatin are controlled by 3D structures as well as the epigenetic states is amenable to experimental test.


[21] 2512.22868

The body is not there to compute: Comment on "Informational embodiment: Computational role of information structure in codes and robots" by Pitti et al

Applying the lens of computation and information has been instrumental in driving the technological progress of our civilization as well as in empowering our understanding of the world around us. The digital computer was and for many still is the leading metaphor for how our mind operates. Information theory (IT) has also been important in our understanding of how nervous systems encode and process information. The target article deploys information and computation to bodies: to understand why they have evolved in particular ways (animal bodies) and to design optimal bodies (robots). In this commentary, I argue that the main role of bodies is not to compute.


[22] 2512.22946

Determining habitat anomalies in cross-diffusion predator-prey chemotaxis models

This paper addresses an open inverse problem at the interface of mathematical analysis and spatial ecology: the unique identification of unknown spatial anomalies -- interpreted as zones of habitat degradation -- and their associated ecological parameters in multi-species predator-prey systems with multiple chemical signals, using only boundary measurements. We formulate the problem as the simultaneous recovery of an unknown interior subdomain and discontinuous ecological interaction rules across its boundary. A unified theoretical framework is developed that uniquely determines both the anomaly's geometry and the discontinuous coefficients characterizing the altered interactions within the degraded region. Our results cover smooth anomalies in time-dependent systems and are extended to non-smooth polyhedral inclusions in stationary regimes. This work bridges a gap between ecological sensing and the quantitative inference of internal habitat heterogeneity, offering a mathematical basis for detecting and characterizing habitat degradation from limited external data.


[23] 2512.23080

QSAR-Guided Generative Framework for the Discovery of Synthetically Viable Odorants

The discovery of novel odorant molecules is key for the fragrance and flavor industries, yet efficiently navigating the vast chemical space to identify structures with desirable olfactory properties remains a significant challenge. Generative artificial intelligence offers a promising approach for \textit{de novo} molecular design but typically requires large sets of molecules to learn from. To address this problem, we present a framework combining a variational autoencoder (VAE) with a quantitative structure-activity relationship (QSAR) model to generate novel odorants from limited training sets of odor molecules. The self-supervised learning capabilities of the VAE allow it to learn SMILES grammar from the ChEMBL database, while its training objective is augmented with a loss term derived from an external QSAR model to structure the latent representation according to odor probability. While the VAE demonstrated high internal consistency in learning the QSAR supervision signal, validation against an external, unseen ground truth dataset (Unique Good Scents) confirms the model generates syntactically valid structures (100\% validity achieved via rejection sampling) and 94.8\% unique structures. The latent space is effectively structured by odor likelihood, evidenced by a Fréchet ChemNet Distance (FCD) of $\approx$ 6.96 between generated molecules and known odorants, compared to $\approx$ 21.6 for the ChEMBL baseline. Structural analysis via Bemis-Murcko scaffolds reveals that 74.4\% of candidates possess novel core frameworks distinct from the training data, indicating the model performs extensive chemical space exploration beyond simple derivatization of known odorants. Generated candidates display physicochemical properties ....


[24] 2512.23137

Graph Neural Networks with Transformer Fusion of Brain Connectivity Dynamics and Tabular Data for Forecasting Future Tobacco Use

Integrating non-Euclidean brain imaging data with Euclidean tabular data, such as clinical and demographic information, poses a substantial challenge for medical imaging analysis, particularly in forecasting future outcomes. While machine learning and deep learning techniques have been applied successfully to cross-sectional classification and prediction tasks, effectively forecasting outcomes in longitudinal imaging studies remains challenging. To address this challenge, we introduce a time-aware graph neural network model with transformer fusion (GNN-TF). This model flexibly integrates both tabular data and dynamic brain connectivity data, leveraging the temporal order of these variables within a coherent framework. By incorporating non-Euclidean and Euclidean sources of information from a longitudinal resting-state fMRI dataset from the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA), the GNN-TF enables a comprehensive analysis that captures critical aspects of longitudinal imaging data. Comparative analyses against a variety of established machine learning and deep learning models demonstrate that GNN-TF outperforms these state-of-the-art methods, delivering superior accuracy in predicting future tobacco use. The end-to-end, time-aware transformer fusion structure of the proposed GNN-TF model successfully integrates multiple data modalities and leverages temporal dynamics, making it a valuable analytic tool for functional brain imaging studies focused on clinical outcome prediction.


[25] 2512.23175

HELM-BERT: A Transformer for Medium-sized Peptide Property Prediction

Therapeutic peptides have emerged as a pivotal modality in modern drug discovery, occupying a chemically and topologically rich space. While accurate prediction of their physicochemical properties is essential for accelerating peptide development, existing molecular language models rely on representations that fail to capture this complexity. Atom-level SMILES notation generates long token sequences and obscures cyclic topology, whereas amino-acid-level representations cannot encode the diverse chemical modifications central to modern peptide design. To bridge this representational gap, the Hierarchical Editing Language for Macromolecules (HELM) offers a unified framework enabling precise description of both monomer composition and connectivity, making it a promising foundation for peptide language modeling. Here, we propose HELM-BERT, the first encoder-based peptide language model trained on HELM notation. Based on DeBERTa, HELM-BERT is specifically designed to capture hierarchical dependencies within HELM sequences. The model is pre-trained on a curated corpus of 39,079 chemically diverse peptides spanning linear and cyclic structures. HELM-BERT significantly outperforms state-of-the-art SMILES-based language models in downstream tasks, including cyclic peptide membrane permeability prediction and peptide-protein interaction prediction. These results demonstrate that HELM's explicit monomer- and topology-aware representations offer substantial data-efficiency advantages for modeling therapeutic peptides, bridging a long-standing gap between small-molecule and protein language models.


[26] 2512.23531

Ambiguous signals and efficient codes

In many biological networks the responses of individual elements are ambiguous. We consider a scenario in which many sensors respond to a shared signal, each with limited information capacity, and ask that the outputs together convey as much information as possible about an underlying relevant variable. In a low noise limit where we can make analytic progress, we show that individually ambiguous responses optimize overall information transmission.


[27] 2212.04195

A Paradigm Shift in Human Neuroscience Research: Progress, Prospects, and a Proof of Concept for Population Neuroscience

Recent advances and reflections on reproducible human neuroscience, especially brain-wide association studies (BWAS) leveraging large datasets, have led to divergent and sometimes opposing views on research practices and priorities. The debates span multiple dimensions. Shifts along these axes have fractured consensus and further fragmented an already heterogeneous field of cognitive neuroscience. Here, we sketch a holistic and integrative response grounded in population neuroscience, organized around a closed-loop "design-analysis-interpretation" research cycle that aims to build consensus while bridging these divides. Our central claim is that population neuroscience offers a unique population-level vantage point for identifying general principles, characterizing inter-individual variabilities, and benchmarking intra-individual changes, thereby providing a supportive framework for small-scale, mechanism-focused studies at the individual level and allowing them to co-evolve with population-level studies. Population neuroscience is not simply about providing larger N for BWAS; its deeper goal is to accumulate a family of cross-scale priors and shared infrastructures that can support design, analysis, and interpretation of human neuroscience for decades to come. In this sense, we outline a "third-generation" view of population neuroscience that reorients the field from amassing isolated associations toward building integrative reference frameworks for future mechanistic and translational work.


[28] 2401.15251

EM and XRM Connectomics Imaging and Experimental Metadata Standards

High resolution volumetric neuroimaging datasets from electron microscopy (EM) and x-ray micro and holographic-nano tomography (XRM/XHN) are being generated at an increasing rate and by a growing number of research teams. These datasets are derived from an increasing number of species, in an increasing number of brain regions, and with an increasing number of techniques. Each of these large-scale datasets, often surpassing petascale levels, is typically accompanied by a unique and varied set of metadata. These datasets can be used to derive connectomes, or neuron-synapse level connectivity diagrams, to investigate the fundamental organization of neural circuitry, neuronal development, and neurodegenerative disease. Standardization is essential to facilitate comparative connectomics analysis and enhance data utilization. Although the neuroinformatics community has successfully established and adopted data standards for many modalities, this effort has not yet encompassed EM and XRM/XHN connectomics data. This lack of standardization isolates these datasets, hindering their integration and comparison with other research performed in the field. Towards this end, our team formed a working group consisting of community stakeholders to develop Image and Experimental Metadata Standards for EM and XRM/XHN data to ensure the scientific impact and further motivate the generation and sharing of these data. This document addresses version 1.1 of these standards, aiming to support metadata services and future software designs for community collaboration. Standards for derived annotations are described in a companion document. Standards definitions are available on a community github page. We hope these standards will enable comparative analysis, improve interoperability between connectomics software tools, and continue to be refined and improved by the neuroinformatics community.


[29] 2408.00770

Linton Stereo Illusion

We present a new illusion that challenges our understanding of stereo vision. The illusion consists of a larger circle at 50cm, and a smaller circle in front of it at 40cm, with constant angular sizes throughout. We move the larger circle forward by 10cm (to 40cm) and then back again (to 50cm). The question is, what distance should we move the smaller circle forward and back to maintain a constant perceived separation in depth between the circles? Constant physical distance (10cm) or constant retinal disparity (6.7cm)? Observers choose constant disparity. We therefore argue that the 'Linton Stereo Illusion' suggests that perceived stereo depth reflects retinal disparities rather than 3D geometry.
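
The 6.7cm figure can be checked with the standard small-angle approximation in which relative disparity between two objects is proportional to the difference of their inverse viewing distances (the interocular separation cancels out of the comparison); the short calculation below is only this textbook approximation, not the authors' derivation.

# Small-angle stereo geometry check of the two candidate answers.
# Relative disparity between objects at distances d1 < d2 is approximately
# proportional to (1/d1 - 1/d2); the interocular distance cancels here.
d_near, d_far = 0.40, 0.50                    # metres: smaller circle, larger circle
rel_disparity = 1.0 / d_near - 1.0 / d_far    # = 0.5 (in units of 1/m)

# The larger circle moves forward to 0.40 m. Keeping the physical separation
# constant would put the smaller circle at 0.30 m (a 10 cm move).
# Keeping relative disparity constant instead requires:
d_new = 1.0 / (rel_disparity + 1.0 / 0.40)    # ~0.333 m
print("constant-disparity position:", round(d_new, 4), "m")
print("forward movement:", round((d_near - d_new) * 100, 1), "cm")  # ~6.7 cm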


[30] 2507.05960

Mono- and Polyauxic Growth Kinetics: A Semi-Mechanistic Framework for Complex Biological Dynamics

Kinetic modeling of microbial growth is essential for the design, optimization, and scale-up of industrial bioprocesses. Classical empirical models often lack biologically interpretable parameters or fail to capture complex multiphasic (polyauxic) behaviors, while fully mechanistic models are impractical for systems involving complex substrates and mixed cultures. This study proposes a unified mathematical framework that reformulates the canonical Boltzmann and Gompertz equations into semi-mechanistic forms, explicitly defining the maximum specific reaction rate and lag phase duration. Polyauxic growth is represented as a weighted sum of sigmoidal phases, subject to stringent constraints that ensure parameter identifiability, temporal consistency, and biological plausibility. The methodology integrates a workflow to address nonlinear regression in high-dimensional parameter spaces. A two-stage optimization strategy, using Differential Evolution for global search followed by L-BFGS-B for local refinement, avoids bias and the need for heuristic parameter initialization. A Charbonnier loss function and the Robust Regression and Outlier Removal procedure are employed to identify and mitigate experimental outliers. Model parsimony is enforced using Akaike (AIC, AICc) and Bayesian (BIC) information criteria to select the optimal number of growth phases and avoid overparameterization. The framework was evaluated using experimental anaerobic digestion datasets, demonstrating that conventional single-phase models can obscure relevant metabolic transitions in co-digestion systems.
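
A compact sketch of the fitting recipe described above: a sum of reparameterized Gompertz phases (maximum specific rate and lag as explicit parameters), a Charbonnier loss, and Differential Evolution followed by L-BFGS-B refinement via SciPy. The synthetic data, parameter bounds, and two-phase setup are assumptions, and the paper's identifiability constraints, outlier-removal procedure, and information-criterion model selection are omitted.

import numpy as np
from scipy.optimize import differential_evolution, minimize

def gompertz(t, A, mu, lam):
    # Zwietering-style Gompertz phase: asymptote A, max specific rate mu, lag lam.
    z = np.clip(mu * np.e / A * (lam - t) + 1.0, -50.0, 50.0)  # clip to avoid overflow
    return A * np.exp(-np.exp(z))

def polyauxic(t, params):
    # Sum of two Gompertz phases; params = (A1, mu1, lam1, A2, mu2, lam2).
    A1, mu1, lam1, A2, mu2, lam2 = params
    return gompertz(t, A1, mu1, lam1) + gompertz(t, A2, mu2, lam2)

def charbonnier_loss(params, t, y, eps=0.05):
    r = y - polyauxic(t, params)
    return np.sum(np.sqrt(r ** 2 + eps ** 2) - eps)

# Synthetic two-phase (diauxic-like) curve; values are illustrative only.
rng = np.random.default_rng(1)
t = np.linspace(0, 60, 120)
y = polyauxic(t, (1.0, 0.15, 5.0, 0.6, 0.08, 30.0)) + rng.normal(scale=0.02, size=t.size)

bounds = [(0.1, 2.0), (0.01, 0.5), (0.0, 20.0),
          (0.1, 2.0), (0.01, 0.5), (20.0, 50.0)]

# Stage 1: global search with Differential Evolution
de = differential_evolution(charbonnier_loss, bounds, args=(t, y), seed=0, tol=1e-7)
# Stage 2: local refinement with L-BFGS-B from the DE solution
fit = minimize(charbonnier_loss, de.x, args=(t, y), method="L-BFGS-B", bounds=bounds)
print("fitted parameters:", np.round(fit.x, 3))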


[31] 2509.00023

Towards a compleat theory of ecosystem size spectra

The regularity of ecosystem size spectra is one of the most intriguing and relevant phenomena on our planet. Aquatic size spectra generally show a log-linearly downtrending shape, following a power-law distribution. A constant log-linear slope has been reported for many marine pelagic ecosystems, often being approximately b = -1. Conversely, there are variable trophic-level-biomass relationships (trophic pyramids). The contrasting observations of a constant biomass (M) spectrum slope and highly variable biomass pyramids may be defined as the constant size spectra-variable trophic dynamics paradox. Here, a mass-specific 'predator-prey-efficiency theory of size spectra' (PETS) is presented and discussed. A thorough analysis of available data, literature and models results in the conclusion that most pelagic marine ecosystems are controlled by trophic processes such as resource-limit stress (bottom-up control) and top-down regulation, with a key role of the carrying capacity of large-sized organisms. This has relevant consequences for the prediction and interpretation of size spectra and in the context of fisheries, whaling, and the introduction of exotic predators (e.g., lionfish). The complete size spectrum obtained in situ, including living organisms and non-living particles (e.g., for data from LOPC and UVP) is discussed. This paper is intended as a plea for the integration of modeling approaches, to understand and integrate data and processes across communities including bacteria, phytoplankton, fish and mammals, considering the effect of non-organismic particles.


[32] 2509.09696

DCHO: A Decomposition-Composition Framework for Predicting Higher-Order Brain Connectivity to Enhance Diverse Downstream Applications

Higher-order brain connectivity (HOBC), which captures interactions among three or more brain regions, provides richer organizational information than traditional pairwise functional connectivity (FC). Recent studies have begun to infer latent HOBC from noninvasive imaging data, but they mainly focus on static analyses, limiting their applicability in dynamic prediction tasks. To address this gap, we propose DCHO, a unified approach for modeling and forecasting the temporal evolution of HOBC based on a Decomposition-Composition framework, which is applicable to both non-predictive tasks (state classification) and predictive tasks (brain dynamics forecasting). DCHO adopts a decomposition-composition strategy that reformulates the prediction task into two manageable subproblems: HOBC inference and latent trajectory prediction. In the inference stage, we propose a dual-view encoder to extract multiscale topological features and a latent combinatorial learner to capture high-level HOBC information. In the forecasting stage, we introduce a latent-space prediction loss to enhance the modeling of temporal trajectories. Extensive experiments on multiple neuroimaging datasets demonstrate that DCHO achieves superior performance in both non-predictive tasks (state classification) and predictive tasks (brain dynamics forecasting), significantly outperforming existing methods.


[33] 2510.13841

Identifying Autism-Related Neurobiomarkers Using Hybrid Deep Learning Models

Autism spectrum disorder (ASD) has been associated with structural alterations across cortical and subcortical regions. Quantitative neuroimaging enables large-scale analysis of these neuroanatomical patterns. This project used structural MRI (T1-weighted) data from the publicly available ABIDE I dataset (n = 1,112) to classify ASD and control participants using a hybrid model. A 3D convolutional neural network (CNN) was trained to learn neuroanatomical feature representations, which were then passed to a support vector machine (SVM) for final classification. Gradient-weighted class activation mapping (Grad-CAM) was applied to the CNN to visualize the brain regions that contributed most to the model predictions. The Grad-CAM difference maps showed strongest relevance along cortical boundary regions, with additional emphasis in midline frontal-temporal-parietal areas, which is broadly consistent with prior ASD neuroimaging findings.


[34] 2511.05546

Evolutionary Dynamics of Acid Resistance in Tumors: A Mathematical Model

Acidosis in tumors arises from reprogrammed metabolism and compromised vasculature, creating a harsh, acidic microenvironment that drives the evolutionary selection of acid-resistant cell phenotypes. A mathematical model is proposed to integrate phenotypic evolution, microenvironmental acidification, and tumor density dynamics. Three key mechanisms are incorporated in it: frequency-dependent selection favoring acid-resistant cells below a critical pH, stress-induced phenotypic switching, and a positive feedback loop where resistant cells produce excess acid that intensifies selection pressure. Well-posedness is established. Numerical simulations across biologically relevant parameter regimes lead to identifying two therapeutically targetable parameters as critical bifurcation parameters for resistance evolution: baseline acid clearance rate and a protection factor representing acid-resistance machinery effectiveness. In low-plasticity tumors, both parameters lead to sharp bifurcations with strong parameter interactions: clearance and protection effects are context-dependent, with therapeutic interventions effective only within specific parameter ranges. In high-plasticity tumors, both parameters produce continuous, monotonic responses with independent, additive effects. These regime-dependent dynamics suggest that treatment strategies should adapt to tumor plasticity: in the former, targeting perfusion alone is typically sufficient, though sequential therapy may be required if the perfusion rate approaches or exceeds the bifurcation threshold, whereas in the latter treatment might benefit from combination therapies addressing both parameters simultaneously. These findings suggest that a low-dimensional model can identify therapeutically targetable parameters governing resistance evolution, suggesting interventions to prevent or reverse the harmful effect of acid-resistant phenotypes.


[35] 2511.06153

Topologically Invariant Permutation Test

Functional brain networks exhibit topological structures that reflect neural organization; however, statistical comparison of these networks is challenging for several reasons. This paper introduces a topologically invariant permutation test for detecting topological inequivalence. Under topological equivalence, topological features can be permuted separately between groups without distorting individual network structures. The test statistic uses $2$-Wasserstein distances on persistent diagrams, computed in closed form. To reduce variability in brain connectivities while preserving topology, heat kernel expansion on the Hodge Laplacian is applied with bandwidth $t$ controlling diffusion intensity. Theoretical results guarantee variance reduction through optimal Hilbert space projection. Simulations across diverse network topologies show superior performance compared to conventional two-sample tests and alternative metrics. Applied to resting-state fMRI data from the Multimodal Treatment of ADHD study, the method detects significant topological differences between cannabis users and non-users.


[36] 2512.11136

Heuristic model on the origin of the homochirality of life

Life demonstrates remarkable homochirality of its major building blocks: nucleic acids, amino acids, sugars, and phospholipids. We propose a mechanism that places the root of life's homochirality in the formation of phospholipid bilayer vesicles (liposomes). These liposomes are formed at the water-air interface from Langmuir layers and contain ribose, presumably delivered to Early Earth by carbonaceous meteorites. Although the extraterrestrial ribose was initially racemic, life is homochiral, based on D-ribose and its derivatives. The phospholipid membrane's high permeability to D-ribose, combined with the ribose interaction with the bilayer's charged phosphate groups, leads to ribose phosphorylation, forming D-ribose-5-phosphate. Once inside, the D-ribose-5-phosphate molecules cannot cross the membrane. The catalytic action of Fe$^{3+}$ ions greatly enhances the phosphorylation rate. Overall, this process is enantioselective, substantially favoring the buildup of D-ribose over L-ribose. Through liposome fusion, fission, and self-replication, this eventually leads to the Darwinian evolution of these structures and to the conversion of D-ribose-5-phosphate into complex functional molecules, such as ribozymes and RNA, and eventually into DNA, all of which inherit D-ribose chirality.


[37] 2512.11693

Algorithms for Reconstructing B Cell Lineages in the Presence of Context-Dependent Somatic Hypermutation

We introduce a method for approximating posterior probabilities of phylogenetic trees and reconstructing ancestral sequences under models of sequence evolution with site-dependence, where standard phylogenetic likelihood computations (pruning) fail. Our approach uses a combined data-augmentation and importance sampling scheme. A key advantage of our approach is the ability to leverage existing highly optimized phylogenetic software. We apply our approach to the reconstruction of B cell receptor affinity maturation lineages from high-throughput repertoire sequencing data and evaluate the impact of incorporating site-dependence on the reconstruction accuracy of both trees and ancestral sequences. We show that accounting for context-dependence during inference always improves the estimates of both ancestral sequences and lineage trees on simulated datasets. We also examine the impact of incorporating priors based on VDJ recombination models, and find that they significantly improve ancestral sequence reconstruction in germline-encoded regions, but increase errors in non-templated nucleotides. We propose a modified, piecewise prior to address this and demonstrate that it improves empirical reconstruction accuracy. We apply our approach to the analysis of the HIV broadly neutralizing antibodies DH270 and CH235, which are important targets of current vaccine design efforts.


[38] 2512.18442

Markovian Promoter Models: A Mechanistic Alternative to Hill Functions in Gene Regulatory Networks

Gene regulatory networks are typically modeled using ordinary differential equations (ODEs) with phenomenological Hill functions to represent transcriptional regulation. While computationally efficient, Hill functions lack mechanistic grounding and cannot capture stochastic promoter dynamics. We present a hybrid Markovian-ODE framework that explicitly models discrete promoter states while maintaining computational tractability. Our approach tracks individual transcription factor binding events as a continuous-time Markov chain, coupled with deterministic ODEs for molecular concentrations. We validate this framework on seven gene regulatory systems spanning basic to advanced complexity: the GAL system, repressilator, Goodwin oscillator, toggle switch, incoherent feed-forward loop, p53-Mdm2 oscillator, and NF-$\kappa$B pathway. Comparison with stochastic simulation algorithm (SSA) ground truth demonstrates that Markovian promoter models achieve similar accuracy to full stochastic simulations while being 10-100$\times$ faster. Our framework provides a mechanistic foundation for gene regulation modeling and enables investigation of promoter-level stochasticity in complex regulatory networks.
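
In the spirit of the hybrid framework described, the sketch below couples a two-state promoter, simulated as a continuous-time Markov chain one switching event at a time, to a deterministic mRNA ODE that is integrated analytically between switches. The one-gene topology and all rate constants are assumptions, not any of the paper's seven benchmark systems.

import numpy as np

rng = np.random.default_rng(2)

# Two-state promoter (OFF=0, ON=1) as a continuous-time Markov chain, coupled
# to a deterministic ODE for mRNA:  dm/dt = alpha*state - delta*m.
# Illustrative toy, not one of the paper's benchmark networks.
k_on, k_off = 0.5, 1.0      # promoter switching rates (1/time)
alpha, delta = 10.0, 0.2    # transcription and degradation rates

def simulate(T=200.0):
    t, state, m = 0.0, 0, 0.0
    times, mrna = [t], [m]
    while t < T:
        rate = k_on if state == 0 else k_off
        dwell = min(rng.exponential(1.0 / rate), T - t)   # time to next promoter switch
        prod = alpha * state
        # Exact ODE solution over the dwell interval with constant production
        m = prod / delta + (m - prod / delta) * np.exp(-delta * dwell)
        t += dwell
        state = 1 - state                                  # promoter flips
        times.append(t)
        mrna.append(m)
    return np.array(times), np.array(mrna)

times, mrna = simulate()
print("mRNA sampled at switching events, mean:", mrna.mean())
print("deterministic steady-state mean for comparison:", alpha * k_on / (k_on + k_off) / delta)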


[39] 2105.03508

Cross-Population Amplitude Coupling in High-Dimensional Oscillatory Neural Time Series

Neural oscillations have long been considered important markers of interaction across brain regions, yet identifying coordinated oscillatory activity from high-dimensional multiple-electrode recordings remains challenging. We sought to quantify time-varying covariation of oscillatory amplitudes across two brain regions, during a memory task, based on local field potentials recorded from 96 electrodes in each region. We extended Canonical Correlation Analysis (CCA) to multiple time series through the cross-correlation of latent time series. This, however, introduces a large number of possible lead-lag cross-correlations across the two regions. To manage that high dimensionality we developed rigorous statistical procedures aimed at finding a small number of dominant lead-lag effects. The method correctly identified ground truth structure in realistic simulation-based settings. When we used it to analyze local field potentials recorded from prefrontal cortex and visual area V4 we obtained highly plausible results. The new statistical methodology could also be applied to other slowly-varying high-dimensional time series.
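
The two ingredients named here, canonical correlation between two multichannel recordings and lead-lag cross-correlation of the resulting latent series, can be sketched as follows on surrogate data; the statistical procedures the paper develops for selecting a small number of dominant lead-lag effects are not reproduced, and the data-generating setup is an illustrative assumption.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)

# Surrogate "two-region" recordings: a shared latent oscillation drives region Y
# with a 5-sample lag relative to region X (illustrative data, not real LFPs).
n_t, lag = 2000, 5
latent = np.sin(2 * np.pi * 0.01 * np.arange(n_t + lag)) + 0.3 * rng.normal(size=n_t + lag)
X = np.outer(latent[lag:], rng.normal(size=20)) + rng.normal(size=(n_t, 20))
Y = np.outer(latent[:n_t], rng.normal(size=20)) + rng.normal(size=(n_t, 20))

# One pair of canonical variates summarizing each region
cca = CCA(n_components=1)
x_c, y_c = cca.fit_transform(X, Y)
x_c, y_c = x_c.ravel(), y_c.ravel()

# Lead-lag cross-correlation of the latent time series
def xcorr(a, b, max_lag=20):
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return [(k, np.mean(a[max(0, k):len(a) + min(0, k)] *
                        b[max(0, -k):len(b) + min(0, -k)]))
            for k in range(-max_lag, max_lag + 1)]

best_lag, best_r = max(xcorr(x_c, y_c), key=lambda kr: abs(kr[1]))
print("dominant lead-lag (samples):", best_lag, "correlation:", round(best_r, 3))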


[40] 2507.03951

Structure from Noise: Confirmation Bias in Particle Picking in Structural Biology

The computational pipelines of single-particle cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET) include an early particle-picking stage, in which a micrograph or tomogram is scanned to extract candidate particles, typically via template matching or deep-learning-based techniques. The extracted particles are then passed to downstream tasks such as classification and 3D reconstruction. Although it is well understood empirically that particle picking can be sensitive to the choice of templates or learned priors, a quantitative theory of the bias introduced by this stage has been lacking. Here, we develop a mathematical framework for analyzing bias in template matching-based detection with concrete applications to cryo-EM and cryo-ET. We study this bias through two downstream tasks: (i) maximum-likelihood estimation of class means in a Gaussian mixture model (GMM) and (ii) 3D volume reconstruction from the extracted particle stack. We show that when template matching is applied to pure noise, then under broad noise models, the resulting maximum-likelihood estimates converge asymptotically to deterministic, noise-dependent transforms of the user-specified templates, yielding a structure from noise effect. We further characterize how the resulting bias depends on the noise statistics, sample size, dimension, and detection threshold. Finally, controlled experiments using standard cryo-EM software corroborate the theory, demonstrating reproducible structure from noise artifacts in low-SNR data.
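
A one-dimensional caricature of the effect being quantified: matching a template against pure-noise signals, extracting the best-scoring window from each, and averaging the picks yields an "average particle" that correlates strongly with the template even though no signal is present. This is a generic "structure from noise" demo under stated assumptions, not the paper's theory or its cryo-EM experiments.

import numpy as np

rng = np.random.default_rng(4)

# Bump-shaped template used for matching (zero-mean for correlation scoring).
template = np.exp(-0.5 * ((np.arange(32) - 15.5) / 4.0) ** 2)
template -= template.mean()

n_signals, sig_len, win = 2000, 256, 32
picked = []
for _ in range(n_signals):
    noise = rng.normal(size=sig_len)                       # pure noise, no particle present
    scores = np.correlate(noise, template, mode="valid")   # matching score at every offset
    best = int(np.argmax(scores))                          # "pick" the best-matching window
    picked.append(noise[best:best + win])

average_particle = np.mean(picked, axis=0)
corr = np.corrcoef(average_particle, template)[0, 1]
print("correlation of averaged picks with template:", round(corr, 3))  # close to 1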