New articles on Quantitative Biology


[1] 2604.21951

Supregraph: Enabling Information-Optimal Assembly Graph Representation of a Read Set

The first step in any genome assembly algorithm entails the conversion from the domain of strings and overlaps to the language of graphs and paths, typically using one of the two conventional methods: de Bruijn graphs or overlap graphs. However, both standard approaches are known to have limitations. De Bruijn graphs fail to represent complete information from reads, while the overlap graphs often produce artificial breaks in contigs due to the necessity to discard contained reads as a preliminary step. In this work we present a mathematical model for genome assembly that provides a formal framework to determine what constitutes a correct conversion of a read set into an assembly graph under the assumption of error-free reads. We prove that a correct representation of a read set exists in the form of a new class of assembly graphs, which we call supregraphs. We show that supregraphs can be constructed by iteratively transforming de Bruijn graphs using the multiplexing procedure, previously employed in the genome assemblers LJA and Verkko. Finally, we demonstrate that, under a set of natural assumptions, supregraphs provide a foundation for constructing theoretically optimal genome assemblies.
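As a point of reference for the abstract above, the following is a minimal sketch of a conventional de Bruijn graph built from error-free reads. It is not the supregraph construction or the multiplexing procedure; the function name `de_bruijn_graph`, the toy reads, and the choice `k=4` are illustrative assumptions. It shows one way read information is lost: only k-mers and their counts survive, not which k-mers came from the same read.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Returns a dict mapping (prefix, suffix) edges to their multiplicity.
    Assumes error-free reads over an arbitrary alphabet (hypothetical helper).
    """
    edges = defaultdict(int)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


# Toy read set (illustrative only).
reads = ["ACGTAC", "GTACGG"]
graph = de_bruijn_graph(reads, k=4)

for (src, dst), count in sorted(graph.items()):
    print(f"{src} -> {dst} (x{count})")
```

The edge for the shared k-mer `GTAC` has multiplicity 2, but the graph no longer records that `ACGT` and `GTAC` belonged to one read while `GTAC` and `ACGG` belonged to another.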


[2] 2604.21988

Local growth laws determine global shape of molluscan shells

Molluscan shells come in various shapes and sizes. Despite this diversity, each species produces a shell with a characteristic shape that is independent of environmental conditions. We seek to understand this robust complexity. We are guided by two principles in the spirit of D'Arcy Thompson. First, the growth is governed by the repeated and continuous application of a fixed growth law, even as the shell evolves in overall shape, without any complex biological machinery to monitor and control the growth. Second, the growth law depends solely on local geometry at the shell's growing edge. The first principle naturally leads to the mathematical statement that the shape of the shell is generated by the action of a Lie group on a protoconch. The second naturally leads to a particular representation of the Lie group. We use this representation to show that the shapes of nearly all known molluscan shells can be described by essentially three parameters: a scalar (scaling), a vector (orientation), and a curve (edge of the protoconch). We relate these parameters to the phylogenetic tree. In addition to the morphogenetic insight, our results potentially point to a new approach to engineering complex structures.


[3] 2604.22018

Foundation models for discovering robust biomarkers of neurological disorders from dynamic functional connectivity

Several brain foundation models (FM) have recently been proposed to predict brain disorders by modelling dynamic functional connectivity (FC). While they demonstrate remarkable model performance and zero- or few-shot generalization, the salient features identified as potential biomarkers are yet to be thoroughly evaluated. We propose RE-CONFIRM, a framework for evaluating the robustness of potential biomarker candidates elucidated by deep learning (DL) models including FMs. From experiments on five large datasets of Autism Spectrum Disorder (ASD), Attention-deficit Hyperactivity Disorder (ADHD), and Alzheimer's Disease (AD), we found that although commonly used performance metrics provide an intuitive assessment of model predictions, they are insufficient for evaluating the robustness of biomarkers identified by these models. RE-CONFIRM metrics revealed that simply finetuning FMs leads to models that fail to capture regional hubs effectively, even in disorders where hubs are known to be implicated, such as ASD and ADHD. In view of this, we propose Hub-LoRA (Low-Rank Adaptation) as a fine-tuning technique that enables FMs to not only outperform customised DL models but also produce neurobiologically faithful biomarkers supported by meta-analyses. RE-CONFIRM is generalizable and can be easily applied to ascertain the robustness of DL models trained on functional MRI datasets. Code is available at: this https URL.


[4] 2604.22116

Resting-State EEG Biomarkers of Tinnitus Robust to Cross-Subject and Cross-Platform Variation

Tinnitus is a prevalent auditory condition lacking objective biomarkers, motivating the search for reliable neural signatures. EEG, a noninvasive brain-imaging method with high temporal resolution, provides a way to investigate the neural dynamics that may be associated with tinnitus. However, the generalizability of EEG-based tinnitus biomarkers across different datasets remains a critical challenge. Microstate theory has allowed for the characterization of quasi-stable topographic configurations in EEG, with some studies reporting altered microstate dynamics in tinnitus patients. This work seeks to improve upon existing dynamical systems analyses and to assess their viability in identifying a robust biomarker. Dynamical features were extracted from two resting-state EEG datasets for the binary classification of tinnitus. Here, robustness is quantified as cross-dataset generalization, which is critical for clinical translation. We employ microstate analysis by identifying topographic states, from which transition probability and state duration features are derived. We also apply Koopman operator analysis through Dynamic Mode Decomposition (DMD) to dimensionality-reduced EEG to extract single-window features. A linear SVM is trained on each feature set and evaluated in a cross-dataset generalization paradigm. PCA-based Koopman features yield the strongest discrimination metrics across both transfer directions, outperforming microstate-derived features. A Wasserstein-distance consistency analysis further reveals that Koopman eigenvalue \emph{magnitude}, encoding oscillation stability, generalizes across datasets ($\bar{\rho} = 0.685$), whereas eigenvalue \emph{phase}, encoding oscillation frequency, does not ($\bar{\rho} = 1.583$), providing interpretable evidence that altered oscillatory decay rates, rather than frequency shifts, constitute the more robust tinnitus biomarker.
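The distinction between eigenvalue magnitude and phase can be made concrete with a small sketch. For a discrete-time DMD eigenvalue, the magnitude gives the per-step growth or decay of the mode and the phase gives its per-step rotation. The helper name `eigenvalue_features`, the sampling interval `dt`, and the example eigenvalue are assumptions for illustration, not values or code from the paper.

```python
import cmath
import math


def eigenvalue_features(lam, dt):
    """Split a discrete-time DMD eigenvalue into stability and frequency parts.

    lam: complex eigenvalue of the one-step linear operator.
    dt:  sampling interval in seconds (assumed value in the example below).
    """
    magnitude, phase = cmath.polar(lam)
    return {
        "magnitude": magnitude,                       # < 1 decays, > 1 grows
        "phase": phase,                               # radians per step
        "decay_rate": math.log(magnitude) / dt,       # continuous-time rate (1/s)
        "frequency_hz": phase / (2.0 * math.pi * dt)  # oscillation frequency
    }


# Hypothetical eigenvalue: slightly damped mode rotating pi/4 per sample.
lam = cmath.rect(0.9, math.pi / 4)
features = eigenvalue_features(lam, dt=0.01)
print(features)
```

With these assumed numbers the mode decays (magnitude 0.9, negative decay rate) while oscillating at 12.5 Hz, so the two feature families the abstract compares are read off the same eigenvalue.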


[5] 2604.22122

Global remote sensing reveals vegetation clustering as a physical footprint of shifting aridity trends in drylands

Due to climatic changes, excessive grazing, and deforestation, semi-arid and arid ecosystems are vulnerable to desertification and land degradation. As aridity increases, vegetation cover often self-organizes into spatial patterns before collapsing to bare soil. While recent theoretical work has established that spatially heterogeneous yet isotropic environments induce a smooth hysteresis loop -- yielding either periodic (hexagonal) patterns during degradation or disordered (clustered) patterns during recovery -- empirical validation of this physical footprint at a global scale has been lacking. Here, we present an extensive empirical validation using remote sensing across eight distinct global ecosystems, coupled with historical bio-climatic databases. We demonstrate that the spatial morphology of vegetation patches acts as a direct physical footprint of the ecosystem's historical aridity trend. Our results show that ecosystems experiencing increasing aridity display periodic arrays with a defined wavelength, whereas those recovering under decreasing aridity exhibit scale-free clustering. This framework provides a non-destructive, robust satellite-based indicator for diagnosing whether a dryland ecosystem is on a degradation or recovery pathway.


[6] 2604.22137

Earable Platform with Integrated Simultaneous EEG Sensing and Auditory Stimulation

Conventional scalp-based EEG systems are cumbersome to use, requiring extensive setup, restrictive wiring, and conductive gels that can dry out and limit long-term monitoring, while also carrying social stigma. As a result, there is increasing interest in in-ear EEG technology to improve comfort, convenience, and discretion for users. This work presents a personalized in-ear EEG monitor (IEEM) that simultaneously captures EEG signals from the outer ear while delivering audio playback through the same device. The earpiece is custom-molded to precisely match the user's ear anatomy, providing effective sound isolation from the environment and enabling direct audio transmission into the ear canal. Testing of the assembled earpiece shows successful detection of electrooculography (EOG), eye blinks, jaw clenches, auditory steady-state responses (ASSR), and alpha modulation. Electrochemical impedance spectroscopy (EIS) measurements confirm stable electrode-skin contact, with impedance values similar to those of traditional dry electrodes. The integrated approach enables potential closed-loop neuromodulation applications all in the ear where brain activity can be monitored in real-time and corresponding acoustic stimulation delivered adaptively.


[7] 2604.22275

Early Preconfiguration Failure: A Novel Predictor of the Repetitive Subconcussion

Early diagnosis and assessment of repetitive subconcussive (rSC) brain injuries are crucial for early clinical intervention. Conventional methods, largely relying on slow fMRI, fail to capture millisecond-level early cortical dynamics, particularly spatiotemporal features associated with pre-configuration dynamics. This study introduces a novel approach integrating dynamic hierarchical spatial features and cortical early behavioral time-domain sensitivity, utilizing EEG and visual attention tasks. We analyzed cortical early behaviors in 24 healthy controls (HC), 21 rSC patients, and a validation cohort of 25 cTBI patients from public datasets. Results reveal distinct temporal patterns in HC: elevated integration at 0-100 ms, rebound dynamics at 100-200 ms, and visual perception integration peaks at 200-600 ms. In contrast, rSC patients exhibited significantly impaired dynamic features, with reduced integration levels indicating a decline in pre-configuration dynamics. Signed center distance (SCD) analysis of separation-integration trajectories showed significantly lower early SCD values in rSC patients compared to HC, while cTBI patients displayed negative SCD values, reflecting irreversible damage. Machine learning classification achieved optimal performance in distinguishing between HC, rSC, and cTBI groups using early cortical features, highlighting the critical role of millisecond-level cortical dynamics in rSC diagnosis.


[8] 2604.22440

The Cathaya argyrophylla Genome Reveals the Evolutionary Trade-offs of a Living Fossil

Cathaya argyrophylla is an endangered paleoendemic gymnosperm characterized by restricted ecological adaptability and high pathogen susceptibility. To elucidate its genomic architecture and evolutionary history, a de novo chromosome-level genome assembly was constructed using PacBio High-Fidelity long reads and Hi-C scaffolding. The resulting 22.73 Gb assembly resolves into 12 pseudochromosomes, demonstrating genome gigantism driven primarily by a 72.92 percent repeat sequence content and extensive intron expansion. Phylogenomic analysis using single-copy orthologs identifies C. argyrophylla as a sister lineage to the Pinus clade, with an estimated divergence time of 102.8 million years ago. Analysis of gene family dynamics reveals significant expansions in pathways related to membrane lipid metabolism, transmembrane transport, and translation machinery, indicating specific molecular adaptations for cellular homeostasis in resource-limited environments. Conversely, the genome exhibits massive contractions in endogenous defense networks, including plant-pathogen interactions, brassinosteroid signaling, and DNA repair mechanisms. This distinct genomic reduction correlates directly with the slow growth rate and weak innate immunity observed in the species, while the expanded transmembrane transport networks suggest an obligate physiological reliance on symbiotic microbiomes for survival. Ultimately, this reference genome establishes a critical molecular resource for future conservation and breeding programs.


[9] 2604.22611

Simple sign epistasis and evolutionary detours in fitness landscapes

In epistatic fitness landscapes, the fitness effect of a mutation depends on the genetic background and may even switch between deleterious and beneficial depending on the presence of another mutation. Epistatic interactions may cause both mutations to change the sign of each other's fitness effects (reciprocal sign epistasis) or only one mutation to do so (simple sign epistasis). Both these forms of epistasis influence evolutionary trajectories. While reciprocal sign epistasis has been associated with multi-peaked landscapes and their ruggedness, the role and relative frequency of simple sign epistasis in fitness landscapes have not been systematically investigated. Here, we prove that the presence of simple sign epistasis is associated with evolutionary detours, i.e., indirect, longer fitness-increasing paths to fitness peaks that include back-mutations. We also show that in experimentally resolved, weakly epistatic landscapes, simple sign epistasis occurs much more frequently than reciprocal sign epistasis. This result is consistent with the theoretical predictions we derive for most landscape models, with the exception of the block model and of landscapes dominated by pairwise allelic incompatibilities, such as RNA stability landscapes. Our results suggest that detours represent a general feature of evolutionary trajectories in weakly epistatic landscapes.
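The two forms of sign epistasis defined above can be checked mechanically on a two-locus landscape. The sketch below assumes four fitness values indexed by genotype (`f00`, `f10`, `f01`, `f11`); the function name and the example numbers are hypothetical, and ties (a zero fitness effect) are treated as no sign change.

```python
def classify_sign_epistasis(f00, f10, f01, f11):
    """Classify a two-locus, two-allele fitness landscape.

    fAB is the fitness of the genotype with allele A at locus 1 and B at locus 2.
    A mutation shows sign epistasis if its fitness effect has opposite signs
    on the two backgrounds of the other locus.
    """
    # Effect of mutating locus 1 on background 0 vs. background 1 of locus 2.
    first_flips = (f10 - f00) * (f11 - f01) < 0
    # Effect of mutating locus 2 on background 0 vs. background 1 of locus 1.
    second_flips = (f01 - f00) * (f11 - f10) < 0

    if first_flips and second_flips:
        return "reciprocal"
    if first_flips or second_flips:
        return "simple"
    return "none"


# Illustrative fitness values (not taken from any dataset).
print(classify_sign_epistasis(1.0, 1.2, 0.8, 1.5))  # only locus 2 changes sign
print(classify_sign_epistasis(1.0, 0.8, 0.8, 1.2))  # both change sign
print(classify_sign_epistasis(1.0, 1.1, 1.2, 1.4))  # neither changes sign
```

In the first example the second mutation is deleterious on the wild-type background but beneficial once the first mutation is present, while the first mutation is beneficial on both backgrounds, which is the simple case.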


[10] 2604.22716

What are the functions of primary visual cortex (V1)?

Although Hubel and Wiesel established decades ago how individual V1 neurons transform retinal inputs, the functions of V1 as a whole have been discovered only recently. First, V1 acts as a motor cortex for exogenously guiding saccades by constructing a bottom-up saliency map of the visual field. Second, V1 initiates a processing bottleneck: a massive reduction of visual information begins at its output to downstream areas. Third, because downstream recognition is limited by impoverished information, V1 supports ongoing recognition by providing additional information queried by top-down feedback from downstream areas, directed predominantly to central visual field representations. These V1 functions underpin a framework in which vision is mainly looking and seeing through the bottleneck. Looking selects a fraction of visual information into the bottleneck, largely by saccades that center selected contents at gaze. Seeing recognizes the selected contents. Looking and seeing rely mainly on processing in the peripheral and central visual fields, respectively.


[11] 2604.22744

Multiplex Hypergraph Modeling of Higher Order Structures in Psychometric Networks

Psychiatric disorders have traditionally been conceptualized as latent conditions producing observable symptoms, but recent studies suggest that psychopathology may emerge from symptom interactions. Psychometric network models capture these relations by focusing on pairwise associations, but they overlook higher-order dependencies arising among groups of variables. These dependencies may reflect synergistic mechanisms, where joint symptom configurations convey more information than pairwise relations, or redundancy, where information overlaps. We introduce an information-theoretic multiplex hypergraph framework to identify and compare higher-order interactions in eating disorders data across diagnostic groups (e.g., anorexia nervosa). Higher-order structures are quantified using $\Omega$-information, a measure that captures the balance between redundancy and synergy. To address the combinatorial growth of candidate subsets, multiple testing, and estimation instability, we propose a structured pipeline comprising: (i) targeted candidate selection based on dyadic network topology and theory-driven subscale information; (ii) a three-stage inferential procedure combining null-model testing with bootstrap robustness assessment; and (iii) the construction and analysis of diagnosis-layered, synergistic and redundant multiplex hypergraphs. Results highlight how synergy captures the emergent, higher-order organization of diagnoses, revealing both a stable transdiagnostic core and diagnosis-specific ways in which these domains combine. By contrast, redundancy is confined to eating and body-image related content, marking reinforcement rather than broader symptom integration.
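To illustrate the redundancy-synergy balance, the sketch below computes a plug-in estimate on discrete samples, assuming that $\Omega$-information here refers to the standard O-information, $\Omega(X) = (n-2)H(X) + \sum_i [H(X_i) - H(X_{-i})]$, where positive values indicate redundancy-dominated and negative values synergy-dominated dependence. The function names and toy data are hypothetical, and this naive estimator is not the paper's inferential pipeline (no null models or bootstrap).

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def o_information(samples):
    """O-information of joint samples given as equal-length tuples."""
    n_vars = len(samples[0])
    total = (n_vars - 2) * entropy(samples)
    for i in range(n_vars):
        marginal = [(s[i],) for s in samples]          # X_i alone
        rest = [s[:i] + s[i + 1:] for s in samples]    # all variables but X_i
        total += entropy(marginal) - entropy(rest)
    return total


# Toy systems: XOR is purely synergistic, three copies are purely redundant.
xor_samples = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
copy_samples = [(a, a, a) for a in (0, 1)]

omega_xor = o_information(xor_samples)
omega_copy = o_information(copy_samples)
print(omega_xor, omega_copy)
```

Under this convention the XOR triplet gives a negative value and the copied triplet a positive one, matching the synergistic and redundant cases the abstract contrasts.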


[12] 2403.14046

Clarifying the conceptual dimensions of representation in neuroscience

Despite the centrality of the notion of representation in neuroscience, the field lacks a unified framework for the concepts used to characterize representation, leading to disparate use of both terminology and measures associated with it. To offer clarification, we propose a core set of conceptual dimensions that characterize representations in neuroscience. These dimensions describe relations between a neural response, features that may be represented, and downstream effects of the neural response. A neural response may be shown to be sensitive or specific to a feature, invariant to other features, or functional (it is used downstream in the brain). We use information-theoretic measures to illustrate these conceptual dimensions and explain how they relate to data analysis methods such as correlational analyses, decoding and encoding models, representational similarity analysis, and tests of statistical dependence or adaptation. We consider several canonical examples, including models of the representation of orientation, numerosity, and spatial location, which illustrate how the evidence put forth in support or criticism of these models is systematized by our framework. By offering a unified conceptual framework to characterize representation in neuroscience, we hope to aid the comparison and integration of results across studies and research groups and to help determine when evidence for a neural representation is strong.


[13] 2506.09520

How attention simplifies mental representations for planning

Human planning is efficient--it frugally deploys limited cognitive resources to accomplish difficult tasks--and flexible--adapting to novel problems and environments. Computational approaches suggest that people construct simplified mental representations of their environment, balancing the complexity of a task representation with its utility. These models imply a nested optimisation in which planning shapes perception, and perception shapes planning--but the perceptual and attentional mechanisms governing how this interaction unfolds remain unknown. Here, we harness virtual maze navigation to characterise how spatial attention controls which aspects of a task representation enter subjective awareness and are available for planning. We find that spatial proximity governs which aspects of a maze are available for planning, and that when task-relevant information follows natural (lateralized) contours of attention, people can more easily construct simplified and useful maze representations. This influence of attention varies considerably across individuals, explaining differences in people's task representations and behaviour. Inspired by the 'spotlight of attention' analogy, we incorporate the effects of visuospatial attention into existing computational accounts of value-guided construal. Together, our work bridges computational perspectives on perception and decision-making to better understand how individuals represent their environments in aid of planning.


[14] 2508.02061

A Bayesian approach to model uncertainty in single-cell genomic data

Network models provide a powerful framework for analysing single-cell count data, facilitating the characterisation of cellular identities, disease mechanisms, and developmental trajectories. However, uncertainty modeling in unsupervised learning with genomic data remains insufficiently explored. Conventional clustering methods assign a singular identity to each cell, potentially obscuring transitional states during differentiation or mutation. This study introduces a variational Bayesian framework for clustering and analysing single-cell genomic data, employing a Bayesian Gaussian mixture model to estimate the probabilistic association of cells with distinct clusters. This approach captures cellular transitions, yielding biologically coherent insights into neurogenesis and breast cancer progression. The inferred clustering probabilities enable further analyses, including Differential Expression Analysis and pseudotime analysis. Furthermore, we propose utilising the misclustering rate and Area Under the Curve in clustering scRNA-seq data as an innovative metric to quantitatively evaluate overall clustering performance. This methodological advancement enhances the resolution of single-cell data analysis, enabling a more nuanced characterisation of dynamic cellular identities in development and disease.


[15] 2511.23344

A theory for coexistence and selection of branched actin networks in a shared and finite pool of monomers

Cellular actin structures are continuously turned over while keeping similar sizes. Since they all compete for a shared pool of actin monomers, the question arises how they can coexist in these dynamic steady states. Recently, the coexistence of branched actin networks with different densities growing in a shared and finite pool of purified proteins has been demonstrated in a biomimetic bead assay. However, theoretical work in the context of organelle size regulation has mainly been focused on linear architectures, such as single filaments and bundles, and thus is not able to explain this observation. Here we show theoretically that the local depletion of actin monomers caused by the growth of a branched network naturally gives rise to a negative feedback loop between network density and growth rate, and that this competition is captured by one central ordinary differential equation. A comprehensive bifurcation analysis shows that the theory leads to well-defined steady states even in the case of multiple networks sharing the same pool of monomers, without any need for specific molecular processes. Under increasing competition strength, coexistence is replaced by selection. We also show that our theory is in excellent agreement with spatiotemporal simulations, implemented in a finite element framework, and that local depletion even occurs in the presence of a large pool of non-polymerizable actin. In summary, our work suggests that local monomer depletion is the decisive and universal factor controlling growth of branched actin networks.


[16] 2512.15737

Enzyme-Substrate Complex Formation Modulates Diffusion-Driven Patterning In Metabolic Pathways

Spatial organization in metabolic pathways can arise from the interplay between enzymatic reaction kinetics and diffusion-driven instabilities. In this work we investigate how reversible enzyme--substrate binding influences pattern formation in a two-step metabolic pathway. Starting from a mechanistic description in which the substrate reversibly binds to the first enzyme before catalytic conversion, we formulate a three-species reaction--diffusion system that explicitly incorporates the enzyme--substrate complex. We first analyse the homogeneous dynamics and determine the unique steady state of the kinetic system. Exploiting the separation of time scales between the rapid binding kinetics and the slower evolution of metabolite concentrations, we derive a reduced two-variable model using a quasi-steady-state approximation for the enzyme-substrate complex. This reduction preserves the essential nonlinear coupling between catalytic reactions and spatial transport. Linear stability and weakly nonlinear analysis reveal conditions for diffusion-driven (Turing) instability and show that reversible enzyme binding significantly modifies the location and extent of the instability region compared to models with effective kinetics. Numerical simulations confirm the analytical predictions and demonstrate how enzyme-substrate interactions reshape pattern selection and slow the emergence of spatial heterogeneity. These results provide a mechanistic link between enzyme binding kinetics, diffusion-driven pattern formation, and mesoscale metabolic organization. The proposed framework offers a tractable approach for studying spatial patterning in enzymatic networks and may help explain the emergence of structured biochemical domains such as those associated with liquid--liquid phase separation.
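For a reduced two-variable reaction-diffusion model, the linear stability step mentioned above reduces to the classical Turing conditions on the Jacobian at the homogeneous steady state. The sketch below checks those textbook conditions; the function name, the Jacobian entries, and the diffusion coefficients are illustrative assumptions and do not come from the paper's enzyme kinetics.

```python
def turing_unstable(fu, fv, gu, gv, du, dv):
    """Check the classical Turing conditions for a two-species system.

    fu, fv, gu, gv: Jacobian entries of the reaction terms at the steady state.
    du, dv:         diffusion coefficients of the two species.
    """
    trace = fu + gv
    det = fu * gv - fv * gu

    # The steady state must be stable in the absence of diffusion.
    stable_without_diffusion = trace < 0 and det > 0

    # With diffusion, det(J - k^2 D) must become negative for some k^2 > 0.
    h = dv * fu + du * gv
    diffusion_destabilises = h > 0 and h * h > 4 * du * dv * det

    return stable_without_diffusion and diffusion_destabilises


# Hypothetical activator-inhibitor Jacobian.
jacobian = dict(fu=1.0, fv=-2.0, gu=3.0, gv=-4.0)
print(turing_unstable(**jacobian, du=1.0, dv=20.0))  # fast inhibitor diffusion
print(turing_unstable(**jacobian, du=1.0, dv=1.0))   # equal diffusion
```

Changing the kinetics changes the Jacobian entries, which is the route by which reversible binding could shift the boundaries of the instability region in a model of this kind.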


[17] 2508.09160

Presenting DiaData for Research on Type 1 Diabetes

Type 1 diabetes (T1D) is an autoimmune disorder that leads to the destruction of insulin-producing cells, resulting in insulin deficiency, which is why affected individuals depend on external insulin injections. However, insulin decreases blood glucose levels and can cause hypoglycemia. Hypoglycemia is a severe event of low blood glucose levels ($\le$70 mg/dL) with dangerous consequences such as dizziness, coma, or death. Data analysis can significantly enhance diabetes care by identifying personal patterns and trends leading to adverse events. In particular, machine learning (ML) models can predict glucose levels and provide early alarms. However, diabetes and hypoglycemia research is limited by the unavailability of large datasets. Thus, this work systematically integrates 15 datasets to provide a large database of 2510 subjects with glucose measurements recorded every 5 minutes. In total, 149 million measurements are included, of which 4% represent values in the hypoglycemic range. Moreover, two sub-databases are extracted. Sub-database I includes demographics, and sub-database II includes heart rate data. The integrated dataset provides an equal distribution of sex and different age levels. As a further contribution, data quality is assessed, revealing that data imbalance and missing values present a significant challenge. Moreover, a correlation study on glucose levels and heart rate data is conducted, showing a relation between 15 and 55 minutes before hypoglycemia.
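As a small illustration of the hypoglycemic-range statistic quoted above, the sketch below flags readings at or below 70 mg/dL on a 5-minute grid and reports their share. The function name and the glucose trace are made up for the example, the trace is assumed to have no missing values, and none of this reflects the DiaData schema.

```python
HYPO_THRESHOLD_MG_DL = 70   # threshold stated in the abstract
SAMPLE_INTERVAL_MIN = 5     # sampling interval stated in the abstract


def hypoglycemia_summary(glucose_mg_dl):
    """Summarise how much of a gap-free glucose trace is in the hypoglycemic range."""
    flags = [g <= HYPO_THRESHOLD_MG_DL for g in glucose_mg_dl]
    n_hypo = sum(flags)
    return {
        "fraction": n_hypo / len(flags),
        "minutes": n_hypo * SAMPLE_INTERVAL_MIN,
    }


# Hypothetical 50-minute trace (mg/dL).
readings = [110, 95, 72, 70, 64, 68, 85, 120, 140, 130]
summary = hypoglycemia_summary(readings)
print(summary)
```

A low fraction such as the 4% reported for the full database is what makes the classification problem imbalanced.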


[18] 2512.05794

Mechanistic Interpretability of Antibody Language Models Using SAEs

Sparse autoencoders (SAEs) are a mechanistic interpretability technique that has been used to provide insight into learned concepts within large protein language models. Here, we employ TopK and Ordered SAEs to investigate autoregressive antibody language models and to steer their generation. We show that TopK SAEs can reveal biologically meaningful latent features, but high feature-concept correlation does not guarantee causal control over generation. In contrast, Ordered SAEs impose a hierarchical structure that reliably identifies steerable features, but at the expense of more complex and less interpretable activation patterns. These findings advance the mechanistic interpretability of domain-specific protein language models and suggest that, while TopK SAEs suffice for mapping latent features to concepts, Ordered SAEs are preferable when precise generative steering is required.
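For readers unfamiliar with the TopK variant, the sketch below shows the sparsity mechanism in isolation: the encoder computes pre-activations and only the k largest are kept. The function names, the tiny weight matrix, and the input vector are hypothetical, and the code is a plain-Python illustration rather than the authors' training or steering setup (Ordered SAEs are not shown).

```python
def topk_activation(pre_activations, k):
    """Keep the k largest pre-activations and set all others to zero."""
    order = sorted(range(len(pre_activations)),
                   key=lambda i: pre_activations[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_activations)]


def encode(x, w_enc, b_enc, k):
    """Linear encoder followed by TopK sparsification."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(w_enc, b_enc)]
    return topk_activation(pre, k)


# Hypothetical 2-dimensional input and 3 latent features.
w_enc = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
b_enc = [0.0, 0.0, 0.0]

latents = encode([0.5, 2.0], w_enc, b_enc, k=1)
print(latents)
```

Because exactly k latents are active per input, each active feature can be inspected against sequence annotations; whether intervening on it changes generation is the separate, causal question the abstract raises.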