New articles on Quantitative Biology


[1] 2606.23733

Germination capacity of pistachio (Pistacia vera L.) seeds related to genotypic variation and phytochemical contents

Genetic diversity and phytochemical components are endogenous factors that influence seed germination. The current study compared the seed germination capacity of 15 Pistacia vera genotypes after assessing their genotypic variation using 32 primers (16 ISSR and 16 RAPD) and their phytochemical contents. The ISSR primers classified the 15 P. vera genotypes into four groups, while the RAPD primers classified them into three groups. The genotypes G11, G5, G1, G9, G6, G14, and G10 had the highest germination percentages (98.89, 97.67, 96.67, 94.44, 93.33, 93.33, and 91.11%, respectively) and also the highest germination speeds. The lowest germination percentages (62.22 and 68.59%) were recorded in G8 and G4, respectively. The sets (G9, G10, and G11), (G1 and G14), and (G5 and G6) were each grouped together by both ISSR and RAPD primers, and G4 and G8 fell in the same subgroup based on RAPD primers. The highest protein contents (21.88, 21.88, and 20.78%) were measured in the seed kernels of G9, G11, and G1, respectively. Soluble sugar content was highest (798.9 µg/g) in G11, and oil content was highest (45.3%) in G5.


[2] 2606.23744

Performance and Interpretability of Convolutional, Transformer, and Hybrid Deep Learning Models in Colorectal Histology Classification

Deep learning has become an important tool in computational pathology, enabling automated analysis of histopathological images. While convolutional neural networks (CNNs) have traditionally dominated this field, transformer-based and hybrid architectures have recently demonstrated promising performance. However, comprehensive comparisons of these approaches for colorectal histopathology remain limited. This study evaluated twelve ImageNet-pretrained CNN, transformer, and hybrid architectures using the Kather colorectal histopathology dataset containing 5,000 image tiles from eight tissue classes. All models were trained using a standardized transfer-learning and fine-tuning protocol and assessed using multiple performance metrics, including accuracy, precision, sensitivity, specificity, F1-score, ROC-AUC, Cohen's kappa, and Matthews correlation coefficient. All evaluated models achieved high classification performance, with accuracies ranging from 93.2% to 97.1%. EVA-02 achieved the highest overall performance (97.1% accuracy, 97.0% F1-score), closely followed by ViT-B/16. Among CNNs, ResNet34 and ConvNeXt-Tiny demonstrated highly competitive performance, achieving accuracies of 96.4% and 96.3%, respectively. Transformer architectures generally produced the strongest results across evaluation metrics, although the performance gap between the best transformer and CNN models was relatively small. Per-class analysis showed consistently strong classification performance across all tissue categories, with Complex Stroma representing the most challenging class. Overall, transformer-based architectures achieved the highest predictive performance, whereas modern CNNs provided a favorable balance between accuracy and model complexity. These findings provide a comprehensive benchmark of major deep learning paradigms for colorectal histopathology classification.
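
Two of the metrics listed above, accuracy and Cohen's kappa, can be read directly off a confusion matrix. The following is a minimal sketch of that arithmetic, not the study's evaluation code; the function name and the 2x2 toy matrix are made up for illustration.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    num_classes = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(num_classes)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(num_classes))
                  for j in range(num_classes)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical toy matrix (not data from the paper).
toy = [[40, 10], [5, 45]]
acc, kappa = accuracy_and_kappa(toy)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Kappa discounts the agreement expected by chance, which is why it is lower than raw accuracy on the toy matrix.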


[3] 2606.23745

JEDEL: Zero-Shot DNA-Encoded Library Design for Early-Stage Drug Discovery

We present JEDEL, a framework for generating synthesis-ready DNA-encoded libraries (DELs) directly from three-dimensional pharmacophore representations of active ligands. JEDEL is the first model to map pharmacophore interaction patterns to actionable, scalable synthesis instructions, enabling the design of targeted libraries comprising potentially millions of molecules. Unlike existing generative approaches that produce virtual compounds requiring downstream synthesis planning, JEDEL operates within the space of purchasable building blocks and validated reactions, ensuring that every output is experimentally realizable by construction. JEDEL learns a predictive alignment between pharmacophore geometry and molecular structure and decodes this into combinatorial synthesis routes at scale. Across 18 protein targets, it generates focused libraries that outperform random and diversity-based baselines in predicted binding affinity, pharmacophore recovery, and sample efficiency, without target-specific retraining. JEDEL enables a shift from virtual molecule generation to experimentally deployable library design.


[4] 2606.23874

Identifying structural design principles shaping the computational abilities of recurrent neural networks

Understanding how the architecture of neural networks shapes the computations they carry out is a central challenge in neuroscience and machine learning. While specific circuit architectures have been linked to particular network computations and theoretical bounds on the expressivity of broad classes of networks have been found, we are still missing general principles connecting the structure of finite networks to their computational capabilities. Here, we characterize the computational abilities of recurrent neural networks as a function of their connectivity by training a large collection of different networks to compute a large set of Boolean functions. For small networks, we constructed complete ``catalogs'' of network-function performance, which revealed that computational capacity varies widely across architectures, that most networks perform poorly, and that most functions are hard to compute. However, we show that having local 2- and 3-cycles in a network strongly enhances its computational ability, and networks with such cycles are often the minimal architectures that can solve particular functions. We further show that a small set of structural statistics accurately predicts networks' performance. Extending our analysis to large networks showed that typical networks fail even to approximate a randomly selected function. Surprisingly, adding a small number of sparsely connected, biologically inspired interneurons to the network dramatically increases computational capacity. As in small networks, adding short cycles improved networks' capacity, outperforming acyclic or reachability-matched controls. Thus, our results identify local cycles as a design principle linking neural connectivity to computational power, and offer a general framework to explore structure-function relations in computing networks.


[5] 2606.23946

Mutant Fixation for a Stochastic Evolutionary Model in Fragmented Populations

Population fragmentation is a common feature of many biological systems. Understanding mutant fixation in such systems is challenging because the underlying stochastic dynamics are high-dimensional. In this work, we develop a general mathematical framework for analyzing stochastic evolution in fragmented populations connected by rare migration. The framework is sufficiently general to accommodate heterogeneous deme sizes, deme-dependent birth and death processes, and migration on arbitrary strongly connected directed networks with asymmetric migration rates. We show that, in the limit where migration occurs on a much slower timescale than within-deme dynamics, the full stochastic process can be reduced to a lower-dimensional Markov chain whose states correspond to configurations of fully mutant and fully wild-type demes. The reduction theorem establishes that fixation probabilities and absorption times of the original process are asymptotically determined by the corresponding quantities of a reduced chain. As an application, we derive explicit formulas for mutant fixation probabilities and fixation times in fragmented populations initiated by the introduction of a single mutant. The results provide a general and tractable approach for studying evolutionary dynamics in complex fragmented populations.
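
To illustrate how a reduced chain yields a fixation probability, consider the simplest special case, which is an assumption of this sketch and not the paper's general setting: the reduced chain only tracks the number i of fully mutant demes and moves one step at a time, so it is a birth-death chain on 0..D. The classical birth-death formula then gives the probability of reaching D before 0 from i = 1. The function name and the ratios `gammas` are hypothetical.

```python
def fixation_probability(gammas):
    """Probability of hitting state D before state 0, starting from state 1,
    for a birth-death chain on 0..D.

    gammas[i-1] = P(i -> i-1) / P(i -> i+1) for i = 1..D-1.
    Classical result: rho = 1 / (1 + sum_j prod_{i<=j} gamma_i).
    """
    total = 1.0
    product = 1.0
    for g in gammas:
        product *= g
        total += product
    return 1.0 / total


# Hypothetical example: D = 3 demes, mutant demes twice as likely to
# convert a wild-type deme as the reverse (gamma = 0.5 in every state).
D = 3
rho = fixation_probability([0.5] * (D - 1))
print(rho)
```

With all ratios equal to 1 (the neutral case) the same function returns 1/D, the expected value for a single mutant deme among D.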


[6] 2606.24037

The Morality Game: An online multiplayer platform to standardize, expedite, and expand research on cooperation

This paper presents the Morality Game, a platform designed to standardize and accelerate research on cooperation and morality through game theory-based experiments. The Morality Game functions as a video game for science, a hub for economic game research, an open-access data repository, and a tool for expediting the research process. It allows researchers to launch customized online multiplayer experiments with zero coding, using game trees to simulate moral dilemmas. The platform automates participant payments, data collection, and analysis, promoting replication and transparency. This paper details the platform's architecture, emphasizing its capabilities for standardizing research methods, unifying data, and enabling rapid aggregation and comparison of results. The Morality Game leverages dynamic, self-correcting game trees to generate well-controlled, abstract experiments that can represent any social scenario. Participants interact through a responsive user interface, making the experiments intuitive and engaging. Researchers can configure experiments through a user-friendly dashboard, specifying various parameters and utilizing pre-created or auto-generated game trees. The platform supports nested belief representation and incorporates artificial agents with customizable traits. Plans include integrating social networking features, enhancing emotional expression capabilities, and expanding the platform's reach to remote small-scale societies to test the ecological validity of findings. By evolving into an integrated ecosystem that supports the entire research lifecycle, the Morality Game aims to foster collaboration, enhance data accessibility, and ultimately increase cooperation. This paper outlines the platform's current features, architectural details, and future directions, demonstrating its potential to advance cooperation research.


[7] 2606.24246

Hierarchical models for large chemical reaction networks

The quest for the origin of life, especially in the metabolism-first scenario inspired by the celebrated Miller-Urey experiment, has triggered a research program dedicated to studying the emergence of complex dynamical behaviors in large chemical mixtures. Though autocatalysis, understood as the capacity of a reaction network to grow exponentially, has been recognized as a potential driver of instability and multistability, no quantitative theory has yet emerged, partly because of the lack of available kinetic data. We introduce a computational tool for large chemical reaction networks based on a scale-splitting algorithm inspired by Wilson's renormalization group. We focus on dilute regimes, where species of interest have low concentration, non-unimolecular reactions may be neglected, and the dynamics is close to linear. Depending on parameter thresholds, such networks can exhibit autocatalytic behavior. Our algorithm takes as input a network structure and outputs (1) a simplified effective graph containing the dominant reaction pathways, obtained through recursive coarse-graining; and (2) analytical formulas for the dynamics in terms of kinetic rates, called hierarchical formulas. These formulas are approximate but interpretable, accurate when scale separation is effective, and provide a reliable multiscale description of the dynamics. Their domains of validity define kinetic phases, each typically associated with a distinct pattern of chemical composition. We show on a simple example that this approach enables fast and reliable inference of kinetic rates from concentration time series. Hierarchical formulas have been implemented as a Python package and are illustrated on a simplified model of the formose reaction.


[8] 2606.24351

Graph-based analysis of inflammatory profiles in New Onset Refractory Status Epilepticus (NORSE)

Background and Objectives: Cryptogenic new-onset refractory status epilepticus (cNORSE) represents one of the most severe forms of status epilepticus, occurring in patients without prior neurological disease, and remaining of unknown aetiology despite extensive diagnostic evaluation. Emerging evidence supports a role for immune dysregulation in cNORSE; however, marked heterogeneity in inflammatory signatures has been reported, complicating the selection of targeted immunotherapies. Therefore, a critical need for tools facilitating the interpretation of cytokine panels exists. Methods: Building on the identification of distinct inflammatory groups of cNORSE patients using a graph clustering approach applied to a cohort of 62 patients with serum profiling of 96 cytokines, we tailored new models to quantify attribution probability to biologically validated clusters. Statistical assessment of the most informative model involved Monte-Carlo simulations and custom-developed parametric tests. Ultimately, we applied our framework to the implementation of a clinician-friendly interface for inflammatory profiling. Results: Our approach enables quick processing of several cytokine profiles, providing the most likely inflammatory cluster, associated attribution probability, and statistical confidence. For longitudinal assessments, the proposed method may also allow tracking the evolution of inflammatory trajectories over time. Conclusion: Systematic statistical characterization of the inflammatory heterogeneity in cNORSE requires the development of clinically actionable support tools. Our study offers a framework that may support personalized immunomodulatory strategies in cNORSE patients through clustering-based cytokine profiling.


[9] 2606.24372

Machine learning-based modeling to predict inhibitors for targets of Alzheimer's Disease

Alzheimer's Disease is a chronic neurodegenerative disorder projected to affect 115 million people by 2050, driven by mechanisms like the cholinergic and amyloid hypotheses and insulin signaling disruptions involving key targets such as BACE-1, AChE, and GSK-3 beta. Utilizing machine learning (ML), we developed predictive models for inhibitor screening, achieving AUC-ROC scores above 0.9 for all targets. BACE-1 models showed high accuracy (86.63%) but limited chemical diversity. AChE models exhibited greater chemical diversity and similar performance (AUC-ROC: 92.86%, Accuracy: 85.20%), while GSK-3 beta models achieved an AUC-ROC of 91.14% with the highest proportion of viable drug candidates. These findings highlight the potential of ML in Alzheimer's drug discovery, with AChE and GSK-3 beta emerging as promising targets.


[10] 2606.24406

EEG Interpretation Across Chant Listening: A Single-Subject Pilot Investigation Using Spectral and Functional Connectivity Analysis

This technical report presents an EEG-based investigation of neural activity across five auditory conditions: Resting State (RS), Shiv Tandav Stotra (STS), Mahasudarshan Mantra (MM), Aum Chant, and Tanpura Listening. EEG recordings acquired from a healthy 5-year-old participant were analyzed using spectral power estimation and functional connectivity measures based on the weighted Phase Lag Index (wPLI). Spectral analysis revealed condition-specific modulation of neural oscillatory activity, with STS listening producing the highest relative power across multiple frequency bands, particularly within the beta range. Functional connectivity analysis demonstrated distinct network organizations across conditions. STS listening exhibited the strongest and most widespread connectivity pattern, characterized by prominent long-range interactions among frontal, temporal, parietal, and occipital regions. Tanpura listening generated a dense yet balanced connectivity network, while Aum listening showed moderate distributed connectivity. In contrast, MM and resting-state conditions displayed comparatively weaker and more localized network organization. These preliminary findings suggest that different chant-listening conditions engage distinct neural mechanisms involving both cortical activation and large-scale neural synchronization. The study establishes a methodological framework for future investigations examining the role of culturally relevant auditory interventions in cognitive development, neuroeducation, and child-centered neuroscience research.


[11] 2606.24487

CABS-flex standalone 3: an open command-line platform for protein flexibility simulation, peptide structure modeling, and protein-peptide docking

Summary: CABS-flex standalone 3 is an open command-line platform for fast CABS-based coarse-grained modeling of protein flexibility, peptide structures, and global or information-guided protein-peptide docking, coupled with all-atom reconstruction and analysis. The package builds on the established CABS-flex and CABS-dock ecosystem, widely used in structural bioinformatics for protein flexibility simulations and flexible protein-peptide docking. It provides a Python 3 implementation that brings together previous standalone functionality with recent developments in protein flexibility simulation, linear and cyclic peptide modeling, extended reporting and visualization, and deep-learning-based all-atom reconstruction with cg2all. Availability and Implementation: CABS-flex standalone 3 is implemented in Python 3 and is freely available as an open-source command-line package. Documentation is available at this https URL. Source code is available at this https URL.


[12] 2606.24660

Extended pseudo-spectral physics-informed neural networks for phase-field models

Phase-field models play a central role in the continuum description of phase separation, in which the bulk free-energy density and the interfacial thickness parameter determine pattern formation and microstructural evolution. In practice, these constitutive quantities are rarely known a priori and must be inferred from limited dynamical observations. In this work, an extended pseudo-spectral physics-informed neural network (ESPINN) framework is developed for the inverse identification of phase-field models from transient snapshot data. It enables the simultaneous recovery of both the bulk chemical potential and unknown gradient coefficients. Numerical experiments on the one-dimensional Cahn-Hilliard equation demonstrate accurate and statistically stable reconstruction in the noiseless regime, with substantial constitutive information recoverable from even a single snapshot pair. In the presence of noise, reconstruction accuracy degrades gracefully, and increasing the number of snapshots improves robustness by reducing variance across runs. These results establish ESPINN as a data-efficient and physically consistent approach for learning free-energy structure in continuum models of phase separation.


[13] 2606.24668

A pilot study examining transcranial photobiomodulation therapy intervention in college students with insomnia

College students commonly report insufficient sleep and poor sleep quality, with ~30% meeting insomnia criteria, posing significant threats to their physical growth, cognitive development, and overall well-being, as well as imposing a substantial economic burden on society [1]. The hyperarousal model of insomnia [2] emphasizes that hyperarousal in the cognitive, emotional, and physiological domains is mutually reinforcing. Neuroimaging studies have further identified prefrontal hypoactivity as a key neural substrate underlying these dysfunctional cognitions and elevated arousal, reflecting a failure of top-down modulatory control over both limbic reactivity [3] and brainstem arousal nuclei [4]. Moreover, transcranial photobiomodulation (tPBM) therapy targeting the prefrontal cortex has demonstrated therapeutic efficacy across neuropsychiatric disorders with insomnia comorbidities [5,6], providing preliminary support for its application in insomnia. However, the neural mechanisms underlying tPBM's therapeutic effects on insomnia remain to be elucidated.


[14] 2606.24779

DeepBD: A Grounded Agentic Workflow for Variant Prioritization and Diagnosis of Genetic Birth Defects

Birth defects are a major cause of fetal loss, neonatal morbidity and long-term disability. In the subset with suspected genetic etiologies, exome and genome sequencing have moved many cases from variant detection to post-sequencing interpretation: clinicians must rank patient-specific candidate variants under incomplete fetal or infant phenotypes and heterogeneous evidence from population genetics, variant-effect prediction, gene-disease validity, phenotype ontologies, cellular and pathway context, protein structure and clinical literature. We present DeepBD, a grounded agentic workflow for variant prioritization and diagnostic interpretation of genetic birth defects. DeepBD organizes the workflow into LLM-assisted case structuring, a pretrained evidence engine, specialist evidence modules and a grounded diagnostic review layer. The evidence engine learns patient-specific variant scores from structured rule evidence, sequence and variant-effect representations and phenotype-conditioned biological context, whereas specialist modules and the agentic layer provide tool-based refinement, candidate-pool review and diagnosis-oriented synthesis from ranked candidates. Developed using an in-house fetal and infant cohort comprising 18,622 cases, DeepBD achieved Recall@1/3/5/10 of 0.658/0.882/0.912/0.929 on an internal held-out solved-case benchmark, outperforming standalone Exomiser, DeepRare and prompted LLM reranking baselines evaluated on Exomiser-derived top-20 candidate variants. Ablation and overlap analyses show that rule evidence, mechanistic context, and specialist refinement provide complementary signals. These findings support a grounded agentic workflow that separates evidence integration, tool-based refinement, and LLM-assisted diagnostic review for retrospective variant prioritization in genetic birth defects.


[15] 2606.23871

Federated Survival Analysis in Healthcare: A Multi-Model Evaluation on Cross-Institutional Heterogeneous Breast Cancer Data

Survival analysis is central to clinical decision-making, yet reliable time-to-event models require large, diverse cohorts that are rarely available at a single institution, while privacy regulations restrict the centralization of patient data. Federated learning (FL) offers a privacy-preserving alternative by training shared models without exchanging raw data, but its effectiveness for survival modeling under realistic, heterogeneous conditions remains insufficiently understood. This paper presents a systematic, multi-model evaluation of federated survival analysis on a cross-institutional breast cancer cohort with naturally heterogeneous distributed clients. Three representative survival models, the Cox Proportional Hazards model, DeepSurv, and Random Survival Forest (RSF), are compared across centralized, local, and federated training, and three federated optimization strategies (FedAvg, FedProx, and FedAdam) are assessed for the gradient-based models. Results show that FL consistently outperforms local training and approaches, and occasionally exceeds, centralized performance, while RSF offers the best overall balance of discrimination, calibration, and robustness across heterogeneous clients. We further find that performance depends on the diversity of client distributions, and that FedAvg and FedProx are stronger and more stable than FedAdam. Based on these findings, we derive practical, decision-oriented guidelines mapping data, privacy, interpretability, and resource constraints to recommended model and training-paradigm choices for federated survival modeling in healthcare.
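
For orientation, the aggregation step of FedAvg as it is usually described is a sample-size-weighted average of client parameters. The sketch below shows only that step, with plain lists standing in for model weights; it is not the paper's implementation, and the function name and numbers are illustrative assumptions.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighting each client by its
    number of local training samples."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[d] * size for params, size in zip(client_params, client_sizes)) / total
        for d in range(dim)
    ]


# Two hypothetical clients with 100 and 300 local samples.
global_params = fedavg([[1.0, 0.0], [3.0, 2.0]], [100, 300])
print(global_params)
```

The larger client pulls the average toward its own parameters, which is one reason heterogeneous client distributions matter for the federated results reported above.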


[16] 2606.23957

Learning the Koopman Operator using Attention Free Transformers

Learning Koopman operators with autoencoders enables linear prediction in a latent space, but long-horizon rollouts often drift off the learned manifold, leading to phase and amplitude errors on systems with switching, continuous spectra, or strong transients. We introduce two complementary components that make Koopman predictors more robust. First, we add an attention-free latent memory (AFT) block that aggregates a short window of past latents to produce a corrected latent before each Koopman update. Unlike multi-head attention, AFT operates in linear time and adds only $\approx$30k parameters ($3d^2 + T^2$, fewer than matched multi-head attention), yet captures the local temporal context needed to suppress error divergence. Second, we propose dynamic re-encoding: lightweight, online change-point triggers (EWMA, CUSUM, and sequential two-sample tests) that detect latent drift and project predictions back onto the autoencoder manifold. Across three benchmark systems -- Duffing oscillator, Repressilator, IRMA -- our model consistently reduces error accumulation compared to a Koopman autoencoder and matched-capacity multi-head attention. We also compare against GRU and Transformer autoencoders, evaluated both from initial conditions and with a 50-step context, and find that Koopman+AFT (with optional re-encoding) attains markedly lower long-horizon error while maintaining lower inference latency. We report improvements over horizons up to 1000 steps, together with ablations over trigger policies. The result is a fast, compact predictor that stays on the learned manifold over long horizons.
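
One of the change-point triggers named above, CUSUM, can be sketched in a few lines. This is a generic one-sided CUSUM on a stream of error values, assuming the monitored quantity is some per-step drift score; the `slack` and `threshold` values and the function name are hypothetical, not taken from the paper.

```python
def cusum_triggers(errors, slack, threshold):
    """One-sided CUSUM detector.

    Accumulates (error - slack), clipped at zero, and reports every index
    at which the running statistic exceeds `threshold`. The statistic is
    reset after each trigger, mimicking a re-encoding step.
    """
    triggers = []
    stat = 0.0
    for t, err in enumerate(errors):
        stat = max(0.0, stat + err - slack)
        if stat > threshold:
            triggers.append(t)
            stat = 0.0
    return triggers


# Hypothetical drift scores: small for 10 steps, then a sustained jump.
errors = [0.1] * 10 + [1.0] * 5
triggers = cusum_triggers(errors, slack=0.5, threshold=1.0)
print(triggers)
```

Because the statistic accumulates, a sustained moderate drift fires the trigger while isolated small errors below `slack` never do.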


[17] 2606.23964

3D Masked Autoencoders are Robust Learners of Volumetric and Multimodal Cellular Representations for Microscopy

Self-supervised learning in fluorescence microscopy often relies on 2D projections, despite the inherently three-dimensional nature of cells. We present a systematic comparison of 2D and 3D masked autoencoders (MAE-2D vs. MAE-3D) on volumetric microscopy data. Under matched architectures and training protocols, MAE-3D consistently outperforms 2D max-projection and slice-based variants on downstream single-cell tasks. We further align visual representations with a pretrained protein language model (ESM2) and show that cross-modal supervision yields larger gains for volumetric models. Channel cross-attention and frequency-domain regularization are critical for leveraging 3D spatial context. On a protein--protein interaction task, MAE-3D achieves a ROC--AUC of 0.865, outperforming prior methods by up to +0.025. For protein localization, our best 3D model attains state-of-the-art AUC$_{\text{micro}}$ (0.952) and F1$_{\text{micro}}$ (0.742), improving over previous approaches by +0.003 and +0.010 absolute, respectively. Overall, these results demonstrate the advantages of native 3D modeling and multimodal alignment for representation learning in single-cell microscopy.


[18] 2606.24325

Exact Enumeration of Phylogenetic Networks: The Tree-Child, Reticulation-Visible and Orchard Hierarchy

We develop a unified framework for the exact enumeration and asymptotic analysis of the three most studied classes of phylogenetic networks: tree-child (TC), reticulation-visible (RV) and orchard networks, whose cardinalities satisfy the strict ordering $|\mathrm{TC}_{\ell,k}|<|\mathrm{RV}_{\ell,k}|<|\mathrm{Orch}_{\ell,k}|$ for reticulation number $k\geq2$ (with $\mathrm{TC}\subsetneq\mathrm{RV}$ and $\mathrm{TC}\subsetneq\mathrm{Orch}$, while $\mathrm{RV}$ and $\mathrm{Orch}$ are incomparable as sets). Using the Chang--Fuchs structural theorem, we derive a two-level master functional equation for the RV bivariate generating function and obtain exact closed-form identities for the differences $\Delta_k(\ell):=|\mathrm{RV}_{\ell,k}|-|\mathrm{TC}_{\ell,k}|$ for $k=2,3$, with the asymptotic universality $\Delta_k(\ell)/|\mathrm{TC}_{\ell,k}|\sim k!/\ell$. For orchard networks, we prove a \emph{universal hypergeometric law} that resolves the exact enumeration problem for all $\ell$: the column generating function $F_\ell(v)$ is rational with denominator $D_\ell(v)=\prod_{j=2}^\ell X_j(v)$, where \[ X_\ell(v) = \sum_{k=0}^{\lfloor\ell/2\rfloor}(-1)^k\, \frac{\ell!}{(\ell-2k)!\,k!}\,v^k \] is the matching polynomial of the complete graph $K_\ell$ and a rescaled Jacobi polynomial. This immediately resolves the intractable $\ell=9$ case: $D_9$ has degree 20, dominant growth rate $\approx40.73$, and all spectral roots are positive real. A complete enumeration table is provided extending the published data of Cardona, Ribas and Pons.
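
The figures quoted for $\ell=9$ can be checked numerically from the formula for $X_\ell(v)$ alone. The sketch below builds the coefficients of each $X_j$, multiplies them into $D_9$, and locates the smallest positive root by a grid scan plus bisection. Reading the "dominant growth rate" as the reciprocal of the smallest positive root of $D_9$ is an assumption of this sketch, and all function names are illustrative.

```python
from math import factorial


def x_poly(l):
    """Coefficients [c_0, c_1, ...] of X_l(v) = sum_k (-1)^k l!/((l-2k)! k!) v^k."""
    return [(-1) ** k * factorial(l) // (factorial(l - 2 * k) * factorial(k))
            for k in range(l // 2 + 1)]


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def evaluate(coeffs, v):
    """Horner evaluation of a polynomial at v."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * v + c
    return acc


def smallest_positive_root(coeffs, step=1e-4, upper=1.0):
    """Scan from 0 (where the polynomial equals 1) to the first sign change,
    then refine by bisection. Assumes roots are further apart than `step`."""
    lo = 0.0
    while lo < upper:
        hi = lo + step
        if evaluate(coeffs, hi) <= 0.0:
            for _ in range(200):
                mid = 0.5 * (lo + hi)
                if evaluate(coeffs, mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return hi
        lo = hi
    raise ValueError("no sign change found below upper")


factors = [x_poly(j) for j in range(2, 10)]
d9 = [1]
for f in factors:
    d9 = poly_mul(d9, f)

degree = len(d9) - 1
growth_rate = 1.0 / min(smallest_positive_root(f) for f in factors)
print(degree, round(growth_rate, 2))
```

The degree follows from $\sum_{j=2}^{9}\lfloor j/2\rfloor = 20$, and the smallest root comes from the factor $X_9(v) = 1 - 72v + 1512v^2 - 10080v^3 + 15120v^4$.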


[19] 2606.24394

Average Rankings Mask Per-Subject Optimality: A Friedman-Nemenyi Benchmark of EEG Motor-Imagery BCI Decoders

Electroencephalography (EEG) is the dominant non-invasive modality for brain-computer interfaces (BCIs), yet reliable decoding of motor imagery is hampered by inter- and intra-individual variability. A recurring claim is that one decoding pipeline, most often a spatial or Riemannian method, is broadly preferable. We test the weakest version of that claim under the most favourable conditions. Using the Mother of All BCI Benchmarks (MOABB) framework, we evaluated 1,056 decoding configurations (feature extractor x scaler x classifier), >340,000 subject-level model fits, across three public left-versus-right motor-imagery datasets (PhysionetMI, 109 participants; Cho2017, 52; Zhou2016, 4) and two frequency bands (8-15 Hz, 8-30 Hz). Every model is fit and tested within a single session of a single participant, the easiest regime, giving every pipeline its best chance. We apply the statistics standard for multi-classifier comparison: Friedman omnibus tests, Nemenyi critical-difference analysis and Wilcoxon signed-rank tests with effect sizes. Covariance tangent-space projection (cov-tgsp) and Common Spatial Patterns (CSP) are the strongest families, but their ordering is dataset-dependent and, on the largest and most heterogeneous cohort (PhysionetMI), statistically indistinguishable (Nemenyi p = 0.27; Kendall's W = 0.11). At the individual level the single best pipeline is optimal for only 35% of PhysionetMI participants, and nonlinear descriptors are best for roughly one third; matching pipeline to participant adds about seven accuracy points over the best fixed choice. The ranking is not an artefact of dimensionality, and classifier and scaler choices are secondary to the feature representation. Even in the easiest regime, no single pipeline dominates: a lower bound on the personalization problem and a quantitative case for participant-aware model selection rather than a universal decoder.
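
The gap between the best fixed pipeline and per-participant selection is simple to compute from a subject-by-pipeline accuracy table. The sketch below shows that arithmetic on a made-up table; it is not the benchmark's code, the function name is hypothetical, and the numbers are not the paper's.

```python
def personalization_gain(acc):
    """acc[s][p] = accuracy (in percent) of pipeline p on subject s.

    Returns (best_fixed_index, best_fixed_mean, oracle_mean, optimal_share),
    where oracle_mean picks the best pipeline separately for each subject and
    optimal_share is the fraction of subjects for whom the best fixed
    pipeline is also their individually best one.
    """
    n_subjects = len(acc)
    n_pipelines = len(acc[0])
    means = [sum(row[p] for row in acc) / n_subjects for p in range(n_pipelines)]
    best_fixed = max(range(n_pipelines), key=lambda p: means[p])
    oracle_mean = sum(max(row) for row in acc) / n_subjects
    optimal_share = sum(row[best_fixed] == max(row) for row in acc) / n_subjects
    return best_fixed, means[best_fixed], oracle_mean, optimal_share


# Hypothetical accuracies for 4 subjects and 3 pipelines.
table = [
    [80, 70, 60],
    [60, 90, 70],
    [70, 60, 90],
    [90, 70, 60],
]
best, fixed_mean, oracle_mean, share = personalization_gain(table)
print(best, fixed_mean, oracle_mean, share)
```

In this toy table the best fixed pipeline wins on average yet is individually optimal for only half the subjects, the same pattern the abstract reports at a larger scale.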


[20] 2606.24411

The impact of population heterogeneity on the redundancy principle

Biological signaling is often governed by extreme value statistics, where a rapid response relies on the fastest few out of a large redundant group of searchers. While extreme first passage time (FPT) theory is well established for homogeneous ensembles, its sensitivity to population heterogeneity remains open. We show that averaging over a heterogeneous population of memoryless random walkers gives rise to ensemble self-reinforcement. This heterogeneity drastically changes both the FPT and minimum FPT densities relative to a homogeneous ensemble with identical mean rates. The modal and minimum FPTs are an order of magnitude smaller for heterogeneous populations relative to homogeneous ones. Our exact analytical predictions establish that population heterogeneity is a parameter that biology can exploit and not merely noise to be averaged away.


[21] 2606.24415

Novel Triple-Based Problems for the Construction of Phylogenetic Networks via Least Common Ancestors

Evolutionary histories are often represented by rooted phylogenetic networks, whose leaves correspond to extant taxa and whose internal vertices represent ancestral lineages. Since such histories must usually be inferred from incomplete data, in particular from genomic sequences of present-day taxa, one often obtains only local information about relative evolutionary proximity. For instance, sequence data may suggest that two taxa $x$ and $y$ are more closely related to each other than either is to a third taxon $z$. This information is classically encoded by a rooted triple $xy|z$. In this paper, we study rooted triples in phylogenetic networks under an ancestor-based interpretation: $xy|z$ is displayed if the unique least common ancestor (LCA) of $x$ and $y$ lies strictly below the unique LCA of $x$ and $z$, respectively of $y$ and $z$, and the latter two LCAs coincide. We also introduce anchored triples $\underline{x}y|z$, which retain only the asymmetric comparison that the LCA of $x$ and $y$ lies below the LCA of $x$ and $z$. This relaxation is natural in networks, where different pairwise ancestral relationships need not behave as they do in trees. We consider several variants of consistency problems for ordinary and anchored triples, both with and without forbidden triples. Somewhat surprisingly, these ancestor-based consistency questions for triples in phylogenetic networks do not appear to have been addressed before despite their direct biological interpretation and the fact that such constraints can be inferred naturally from genomic sequence data. By translating these questions into realization problems for required and forbidden LCA-constraints, we show that all resulting problems can be solved in polynomial time. Moreover, whenever a solution exists, a suitable realizing DAG and phylogenetic network can be constructed within the same time bound.
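
The two display conditions can be made concrete on a rooted tree, where every pair of leaves has a unique LCA. The sketch below is restricted to trees given as a child-to-parent map, which is a simplification; networks would need a DAG-based LCA computation and a uniqueness check. "Below" in the anchored case is read here as strictly below, which is an assumption, and all function names are illustrative.

```python
def ancestors(parent, v):
    """Path from v up to the root, including v itself."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def lca(parent, x, y):
    """Least common ancestor of x and y in a rooted tree."""
    anc_x = set(ancestors(parent, x))
    for v in ancestors(parent, y):
        if v in anc_x:
            return v
    raise ValueError("no common ancestor")


def strictly_below(parent, u, v):
    """True if u is a proper descendant of v."""
    return u != v and v in ancestors(parent, u)


def displays_triple(parent, x, y, z):
    """xy|z: lca(x, y) lies strictly below lca(x, z), and lca(x, z) == lca(y, z)."""
    lxy, lxz, lyz = lca(parent, x, y), lca(parent, x, z), lca(parent, y, z)
    return lxz == lyz and strictly_below(parent, lxy, lxz)


def displays_anchored_triple(parent, x, y, z):
    """Anchored triple with anchor x: lca(x, y) lies below lca(x, z)."""
    return strictly_below(parent, lca(parent, x, y), lca(parent, x, z))


# Toy tree ((a, b), c) with inner vertex u and root r.
tree = {"a": "u", "b": "u", "u": "r", "c": "r"}
print(displays_triple(tree, "a", "b", "c"))           # ab|c
print(displays_triple(tree, "a", "c", "b"))           # ac|b
print(displays_anchored_triple(tree, "a", "b", "c"))  # anchored at a
```

In a tree the anchored condition implies the ordinary one, so the two notions only separate in networks, which is the relaxation the abstract motivates.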


[22] 2606.24417

cuSBF: A Minimizer-Aware Bloom Filter for Genomic Sequence Data on Modern GPUs

Efficient genomic k-mer indexing depends on approximate membership query (AMQ) structures that must deliver high throughput, low false-positive rates (FPR), and modest memory footprints. The Super Bloom filter (SBF) is attractive for this scenario because minimizer-guided sharding and the Findere scheme exploit the redundancy of overlapping k-mers. However, those same features cause high per-k-mer compute cost, severe register pressure, and irregular memory accesses, which hinder an effective GPU implementation. We present cuSBF, an open-source, header-only CUDA library that implements SBF for sequence-native workloads. cuSBF's design merges sectorized shards, cooperative shared-memory tiling, warp-level shard sharing, and segmented warp reductions, turning super-k-mer locality into scalable GPU parallelism. Across real genomic workloads on RTX PRO 6000 Blackwell and GH200 systems, cuSBF achieves the highest throughput among all evaluated sequence-capable baselines. On the RTX PRO 6000, it surpasses the cuCollections blocked Bloom filter baseline by up to 9.1x for insertion and 7.7x for query, while reaching up to 92x and 234x speedups over the multi-threaded CPU Super Bloom reference implementation. It also outperforms GPU-based dynamic AMQs (Cuckoo, Two-Choice, Quotient filters) by 1.5-3400x depending on workload characteristics. A parameter sweep identifies (s = 28, m = 16, H = 4) as Pareto-optimal for k = 31, yielding significantly lower FPR than cuCollections at matched memory budgets. Crucially, cuSBF's architecture-aware design sustains 85% streaming multiprocessor utilization even for out-of-cache filters - proving that sequence locality, not raw bandwidth, is the key to GPU-accelerated genomic indexing.
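The idea of minimizer-guided sharding can be illustrated with a small CPU-only Python sketch. It is not cuSBF's CUDA implementation and omits the Findere scheme. It assumes a lexicographic minimizer (real tools usually hash m-mers), groups consecutive k-mers by minimizer string, and uses a hypothetical `ShardedBloom` class whose names and default sizes are chosen only for illustration.

```python
import hashlib


def minimizer(kmer, m):
    """Lexicographically smallest m-mer of a k-mer (an illustrative choice)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def super_kmers(seq, k, m):
    """Split seq into maximal runs of consecutive k-mers sharing a minimizer."""
    runs, start, current = [], 0, None
    for i in range(len(seq) - k + 1):
        mz = minimizer(seq[i:i + k], m)
        if mz != current:
            if current is not None:
                # The previous run ends with the k-mer starting at i - 1.
                runs.append((current, seq[start:i + k - 1]))
            current, start = mz, i
    if current is not None:
        runs.append((current, seq[start:]))
    return runs


class ShardedBloom:
    """Bloom filter split into shards; the minimizer selects the shard."""

    def __init__(self, n_shards=8, bits_per_shard=4096, n_hashes=4):
        self.n_shards = n_shards
        self.bits = bits_per_shard  # assumed to be a multiple of 8
        self.n_hashes = n_hashes
        self.shards = [bytearray(bits_per_shard // 8) for _ in range(n_shards)]

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

    def _bit_positions(self, kmer):
        return [self._hash(f"{j}:{kmer}") % self.bits for j in range(self.n_hashes)]

    def _shard(self, mz):
        return self.shards[self._hash(mz) % self.n_shards]

    def insert_sequence(self, seq, k, m):
        for mz, sk in super_kmers(seq, k, m):
            shard = self._shard(mz)  # one shard lookup per super-k-mer
            for i in range(len(sk) - k + 1):
                for pos in self._bit_positions(sk[i:i + k]):
                    shard[pos // 8] |= 1 << (pos % 8)

    def contains(self, kmer, m):
        shard = self._shard(minimizer(kmer, m))
        return all((shard[pos // 8] >> (pos % 8)) & 1
                   for pos in self._bit_positions(kmer))
```

Because all k-mers of a super-k-mer share a minimizer, they land in the same shard, which is the locality the abstract credits for the GPU speedups.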


[23] 2606.24562

A parameterized family of balance indices for phylogenetic networks

We introduce a new family of balance indices for phylogenetic networks: the $H_\alpha$ indices, where $\alpha$ is a positive real number. This family includes the $B_2$ index as a special case ($\alpha = 1$) and provides a natural extension of the Sackin index to phylogenetic networks. We show that the $H_\alpha$ indices share many structural properties with the $B_2$ index, most notably a "grafting property" that makes it possible to express the $H_\alpha$ index of a network in terms of the $H_\alpha$ indices of its biconnected components. These properties allow us to identify networks that minimize / maximize $H_\alpha$ for various classes of phylogenetic networks, and to study its distribution for several models of random trees and networks (in particular, Galton-Watson trees and binary Markov branching trees, with a focus on the Yule and PDA models). Finally, we show how local limits can be used to analyze the asymptotic behavior of $H_\alpha$ for large trees and networks, and we obtain general results for the moments of $H_\alpha$ for a broad class of random phylogenetic networks known as blowups of Galton-Watson trees.
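For orientation, the $B_2$ special case can be computed with a short Python sketch. It uses the usual definition of $B_2$ as the Shannon entropy (base 2) of the leaf distribution reached by a random walk that starts at the root and picks an outgoing edge uniformly at each vertex. The abstract does not state the formula for general $H_\alpha$, so only $B_2$ is shown; the function names and the child-dictionary representation are assumptions.

```python
import math


def leaf_probabilities(children, root):
    """Probability of ending at each leaf for a uniform random walk from the root."""
    order, seen = [], set()

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for c in children.get(v, ()):
            visit(c)
        order.append(v)  # postorder

    visit(root)
    prob = {v: 0.0 for v in order}
    prob[root] = 1.0
    # Reverse postorder is a topological order, so parents come before children.
    for v in reversed(order):
        kids = children.get(v, ())
        for c in kids:
            prob[c] += prob[v] / len(kids)
    return {v: p for v, p in prob.items() if not children.get(v)}


def b2_index(children, root):
    """B_2 = -sum over leaves of p * log2(p)."""
    probs = leaf_probabilities(children, root)
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


caterpillar = {"r": ["a", "l1"], "a": ["b", "l2"], "b": ["l3", "l4"]}
balanced = {"r": ["a", "b"], "a": ["l1", "l2"], "b": ["l3", "l4"]}
print(b2_index(caterpillar, "r"))  # 1.75
print(b2_index(balanced, "r"))     # 2.0
```

The same code accepts networks with reticulations, since a vertex with several parents simply accumulates probability from each of them.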


[24] 2606.24873

Data-Based Dynamical Systems Reconstruction: An Adequacy/Reliability Test

In this work, we address the problem of validating the reconstruction of a stochastic system from noisy data. We demonstrate the limitations of criteria based solely on the loss function or on standard metrics used for reconstructing deterministic dynamics. We also propose an exploratory approach, based on a two-step test, which allows for a general assessment of the reconstruction without relying on arbitrary error-tolerance thresholds. However, we discuss how system degeneracy and non-identifiability, together with features intrinsic to stochastic dynamics, impose certain constraints on the application of this test.


[25] 2507.15519

A Dynamical Blueprint for Brain State Organization

The brain is not static: neuronal networks shift between contrasting modes of activity, alternating between active and quiescent regimes known as up and down states. Together with rhythmic oscillations, such modes of activity are fundamental to perception, memory, and information processing. However, the dynamical principles underlying the diverse repertoire of activity patterns and their transitions remain poorly understood. Here, we identify a geometric structure that governs the emergence of dynamic states and organizes transitions in neuronal networks. We derive the conditions for its existence and demonstrate that it emerges robustly across canonical models of neuronal population dynamics. Near this organizing center, switches between oscillations, bistability and up and down states are orchestrated by the excitation-inhibition balance in the neuronal network. Thus, we show that excitation and inhibition do not simply modulate network activity but define the dynamical landscape from which distinct brain states emerge. We also consider neuron-astrocyte interactions and reveal how astrocytes can tune excitatory-inhibitory balance, thereby modulating the transitions between neuronal activity regimes. Overall, our results identify a general dynamical blueprint underlying the emergence, organization, and control of brain states.


[26] 2508.21490

Testing quantum-like markers in neural dynamics

We propose two experiments for identifying quantum markers in neural data based on quantum variants of well-known equations for neural activity that describe electrical signal propagation on axonal arbors and dendrites. These include (i) testing if power spectra from subthreshold oscillations in neuronal cultures follow the classical FitzHugh-Nagumo equations or a recently introduced quantum variant of them and (ii) testing if propagation statistics of electrical activity in axons follow the classical diffusive cable equation or a quantum variant of it.
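As a reference point for experiment (i), the classical model can be integrated in a few lines of Python. The sketch below uses a common textbook form of the equations, dv/dt = v - v^3/3 - w + I and dw/dt = eps (v + a - b w), with conventional parameter values and a forward Euler step. These choices, the function name `simulate_fhn`, and the sustained-oscillation regime are assumptions for illustration; the quantum variant and the subthreshold setting discussed in the abstract are not reproduced here.

```python
def simulate_fhn(current=0.5, a=0.7, b=0.8, eps=0.08, dt=0.01, steps=40000):
    """Forward-Euler integration of the classical two-variable model.

    Returns the voltage-like variable v at every time step.
    """
    v, w, trace = -1.0, 1.0, []
    for _ in range(steps):
        dv = v - v ** 3 / 3 - w + current  # fast, cubic voltage dynamics
        dw = eps * (v + a - b * w)         # slow recovery variable
        v, w = v + dt * dv, w + dt * dw
        trace.append(v)
    return trace


trace = simulate_fhn()
tail = trace[len(trace) // 2:]  # discard the initial transient
print(min(tail), max(tail))
```

A power spectrum of `trace` (for example via a discrete Fourier transform) would be the classical baseline against which an alternative model's spectrum could be compared.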


[27] 2605.23967

Sensing Intelligence as a Trainable Metamaterial Property

In biological systems, sensing is not performed by the brain alone: the body deforms, vibrates, and filters external stimuli before they are transduced into neural signals. In engineered systems, this processing burden is placed largely on electronics and computation, while the mechanical body is usually designed only for strength and stability. Here, we present sensing intelligence as a trainable property of the body. We show that the geometry of a metamaterial can be optimized to reshape external stimuli into internal signals that are easier for a neural network to interpret. Rather than hand-designing this physical preprocessing, we let the neural network train its own body for sensing by backpropagating the sensing loss to the body's design parameters through differentiable simulation. Across numerical and experimental sensing scenarios, the optimized body improves sensing accuracy by up to fivefold or reduces the number of required electronic sensors by nearly an order of magnitude.


[28] 2606.21785

Mostly-monocular responses and other visual functions in a multiscale network model of Macaque V1

Visual signals from the two eyes merge gradually as they pass through the primary visual cortex (V1). Here we use a computational model of Macaque V1 to study the first stage of this integration along the magnocellular pathway, in layer 4C$\alpha$, aiming to infer the neuroanatomical origins of binocular responses. It is known that neurons in layer 4C$\alpha$ are predominantly monocular, though some do exhibit varying degrees of binocularity. We find that (1) narrow binocular strips emerge along the borders of ocular dominance columns (ODC), a finding that aligns with experiments; (2) the data are most consistent with $10-30\%$ of interactions near ODC boundaries being cross-columnar; and (3) feedback from layer 6 is largely monocular. These results were obtained through systematic hypothesis testing using a multiscale model that is orders of magnitude faster than its biologically-detailed predecessors. We propose that multiscale modeling can be an effective tool for bridging anatomy and function.


[29] 2406.16465

Genealogical processes of sequential Monte Carlo methods and other non-neutral population models under rapid mutation

We show that genealogical trees arising from a broad class of non-neutral models of population evolution converge to the Kingman coalescent under a suitable rescaling of time. As well as non-neutral biological evolution, our results apply to genetic algorithms encompassing the prominent class of sequential Monte Carlo (SMC) methods. The time rescaling we need differs slightly from that used in classical results for convergence to the Kingman coalescent, which has implications for the performance of different resampling schemes in SMC algorithms. In addition, our work substantially simplifies earlier proofs of convergence to the Kingman coalescent, and corrects an error common to several earlier results.
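The limiting object named in the abstract, the Kingman coalescent, is easy to simulate, which may help in reading the result. In the minimal Python sketch below, while k lineages remain, the waiting time to the next merger is exponential with rate k(k-1)/2 and the merging pair is chosen uniformly. The function name `kingman_genealogy` is hypothetical, and the sketch does not implement the paper's non-neutral models or its time rescaling.

```python
import random


def kingman_genealogy(n, rng):
    """Simulate Kingman's coalescent for n lineages.

    Returns a list of (time, merged_block) events, one per pairwise merger.
    """
    blocks = [frozenset([i]) for i in range(n)]
    t, events = 0.0, []
    while len(blocks) > 1:
        k = len(blocks)
        t += rng.expovariate(k * (k - 1) / 2)  # each pair merges at rate 1
        i, j = rng.sample(range(k), 2)         # uniformly chosen pair
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        events.append((t, merged))
    return events


rng = random.Random(0)
events = kingman_genealogy(10, rng)
print(len(events), events[-1][0])  # number of mergers, time to the common ancestor
```

For n lineages the expected time to the most recent common ancestor is 2(1 - 1/n), which gives a quick sanity check on the simulation.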


[30] 2503.21450

CMADiff: Cross-Modal Aligned Diffusion for Controllable Protein Generation

AI-assisted protein design has emerged as a critical tool for advancing biotechnology, as deep generative models have demonstrated their reliability in this domain. However, most existing models primarily utilize protein sequence or structural data for training, neglecting the physicochemical properties of proteins. Moreover, they offer limited control over protein generation under intuitive conditions. To address these limitations, we propose CMADiff, a novel framework that enables controllable protein generation by aligning the physicochemical properties of protein sequences with text-based descriptions through a latent diffusion process. Specifically, CMADiff employs a Conditional Variational Autoencoder (CVAE) to integrate physicochemical features as conditional input, forming a robust latent space that captures biological traits. In this latent space, we apply a conditional diffusion process, which is guided by BioAligner, a contrastive learning-based module that aligns text descriptions with protein features, enabling text-driven control over protein sequence generation. Validated by a series of evaluations including AlphaFold3, the experimental results indicate that CMADiff outperforms protein sequence generation benchmarks and holds strong potential for future applications. The implementation and code are available at this https URL.


[31] 2508.16650

Predicting brain tumour enhancement from non-contrast MR imaging with artificial intelligence: a multi-cohort retrospective diagnostic accuracy study

Brain tumour MRI typically requires both pre- and post-contrast imaging, but gadolinium is not always desirable (frequent follow-up, renal impairment, allergy, paediatric patients). We developed and validated a deep learning model to predict tumour contrast enhancement from non-contrast MRI alone. We assembled 11,089 brain MRI studies (2006-2024) from 10 datasets across four countries and three continents, spanning adult and paediatric populations with glioma, meningioma, metastases, and post-resection appearances. Three architectures were trained to detect and segment enhancing tumour from T1w, T2w and FLAIR alone. Performance was assessed in a 1,109-study held-out test set (primary endpoint: patient-level enhancement detection; secondary: voxel-level Dice). Eleven expert radiologists attempted the same task on a 564-case subset (100 cases each), blinded to history, prior imaging, and referral. The best model, nnU-Net, achieved 83.0% balanced accuracy (95% CI 79.1-87.2; sensitivity 91.5%, specificity 74.4%) for detection, with R2 = 0.859 for enhancement volume. Of enhancing cases, 76.8% reached Dice >= 0.3, 67.5% >= 0.5, and 50.2% >= 0.7. Under blinded conditions, radiologists' majority vote was lower (71.7% balanced accuracy; sensitivity 77.6%, specificity 65.8%). The proportion reaching Dice >= 0.3 varied by pathology (meningioma 93%, presurgical glioma 76%, metastases 74%, postoperative glioma 74%) and was lowest for paediatric cases (45%). Deep learning can identify contrast-enhancing brain tumours from non-contrast MRI. These models show promise as a triage or decision-support adjunct, such as in flagging studies likely to enhance so that contrast can be added to a non-contrast protocol, and may reduce gadolinium dependence in neuro-oncology imaging. Future work should optimise these models with radiologists.
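The two headline metrics can be spelled out in a short Python sketch. Balanced accuracy is taken here as the mean of sensitivity and specificity, and Dice as twice the overlap divided by the total size of the two masks. The function names and the flat 0/1 mask representation are assumptions, and the study may have computed its figures from unrounded values.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (same units as the inputs)."""
    return (sensitivity + specificity) / 2


def dice(pred, truth):
    """Dice coefficient of two equally sized binary masks given as flat 0/1 sequences."""
    overlap = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly; this convention is an assumption.
    return 2 * overlap / total if total else 1.0


print(balanced_accuracy(91.5, 74.4))  # model: about 82.95
print(balanced_accuracy(77.6, 65.8))  # radiologists: about 71.7
print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

The first two values are consistent with the reported 83.0% and 71.7% balanced accuracy once rounding is taken into account.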


[32] 2512.10279

Computing Evolutionarily Stable Strategies in Imperfect-Information Games

We present an algorithm for computing evolutionarily stable strategies (ESSs) in symmetric perfect-recall extensive-form games of imperfect information. Our main algorithm is for two-player games, and we describe how it can be extended to multiplayer games. The algorithm is sound and computes all ESSs in nondegenerate games and a subset of them in degenerate games which contain an infinite continuum of symmetric Nash equilibria. The algorithm is anytime and can be stopped early to find one or more ESSs. We experiment on an imperfect-information cancer signaling game as well as random games to demonstrate scalability.