Complex change is often described as "evolutionary" in economics, policy, and technology, yet most system dynamics models remain constrained to fixed state spaces and equilibrium-seeking behavior. This paper argues that evolutionary dynamics should be treated as a core systems-thinking problem rather than as a biological metaphor. We introduce Stability-Driven Assembly (SDA) as a minimal, non-equilibrium framework in which stochastic interactions combined with differential persistence generate endogenous selection without genes, replication, or predefined fitness functions. In SDA, longer-lived patterns accumulate in the population, biasing future interactions and creating feedback between population composition and system dynamics. This feedback yields fitness-proportional sampling as an emergent property, realizing a natural genetic algorithm driven solely by stability. Using SDA, we demonstrate why equilibrium-constrained models, even when simulated numerically, cannot exhibit open-ended evolution: evolutionary systems require population-dependent, non-stationary dynamics in which structure and dynamics co-evolve. We conclude by discussing implications for system dynamics, economics, and policy modeling, and outline how agent-based and AI-enabled approaches may support evolutionary models capable of sustained novelty and structural emergence.
Understanding the dynamic nature of brain connectivity is critical for elucidating neural processing, behavior, and brain disorders. Traditional approaches such as sliding-window correlation (SWC) characterize time-varying undirected associations but do not resolve directional interactions, limiting inference about time-resolved information flow in brain networks. We introduce sliding-window prediction correlation (SWpC), which embeds a directional linear time-invariant (LTI) model within each sliding window to estimate time-varying directed functional connectivity (FC). SWpC yields two complementary descriptors of directed interactions: a strength measure (prediction correlation) and a duration measure (window-wise duration of information transfer). Using concurrent local field potential (LFP) and fMRI BOLD recordings from rat somatosensory cortices, we demonstrate stable directionality estimates in both LFP band-limited power and BOLD. Using Human Connectome Project (HCP) motor task fMRI, SWpC detects significant task-evoked changes in directed FC strength and duration and shows higher sensitivity than SWC for identifying task-evoked connectivity differences. Finally, in post-concussion vestibular dysfunction (PCVD), SWpC reveals reproducible vestibular-multisensory brain-state shifts and improves healthy-control vs subacute patient (HC-ST) discrimination using state-derived features. Together, these results show that SWpC provides biologically interpretable, time-resolved directed connectivity patterns across multimodal validation and clinical application settings, supporting both basic and translational neuroscience.
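To make the window-wise computation concrete, here is a minimal sketch of the prediction-correlation idea, assuming an order-p linear (LTI) predictor fit by least squares within each window; it illustrates the principle only, is not the authors' implementation, and omits the paper's duration measure. All names and parameter defaults are hypothetical.

```python
import numpy as np

def swpc(x, y, win=60, step=5, p=2):
    """Directed strength x -> y per sliding window via prediction correlation.

    Sketch: within each window, fit an order-p LTI predictor
    y[t] ~ sum_k b_k * x[t-k] by least squares, then score the directed
    influence as the correlation between predicted and observed y.
    """
    out = []
    for s in range(0, len(x) - win + 1, step):
        xs, ys = x[s:s + win], y[s:s + win]
        # Design matrix of lagged x values: row t is [x[t-1], ..., x[t-p]].
        X = np.column_stack([xs[p - k:win - k] for k in range(1, p + 1)])
        t = ys[p:]
        b, *_ = np.linalg.lstsq(X, t, rcond=None)
        out.append(np.corrcoef(X @ b, t)[0, 1])
    return np.array(out)
```

Running `swpc(x, y)` and `swpc(y, x)` on the same pair and comparing the two traces is the sense in which such a measure resolves directionality, unlike plain sliding-window correlation.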
The loss of biodiversity due to the likely widespread extinction of species in the near future is a focus of current concern in conservation biology. One approach to measure the impact of this extinction is based on the predicted loss of phylogenetic diversity. These predictions have become a focus of the Zoological Society of London's 'EDGE2' program for quantifying biodiversity loss and involve considering the HED (heightened evolutionary distinctiveness) and HEDGE (heightened evolutionary distinctiveness and globally endangered) indices. Here, we show how to generalise the HED(GE) indices by expanding their application to more general settings (to phylogenetic networks, to feature diversity on discrete traits, and to arbitrary biodiversity measures). We provide a simple and explicit description of the mean and variance of such measures, and illustrate our results by an application to the phylogeny of all 27 extant Crocodilians. We also derive various equalities for feature diversity, and an inequality if species extinction rates are correlated with feature types.
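For concreteness, the following toy computation sketches the standard HED/HEDGE indices under an independent-extinction ("field of bullets") model, which is the setting the paper generalises; the tree, edge lengths, and extinction probabilities below are hypothetical.

```python
from math import prod

# Toy tree ((A:2,B:2):1,C:3); each edge as (length, set of descendant leaves).
edges = [(2.0, {"A"}), (2.0, {"B"}), (1.0, {"A", "B"}), (3.0, {"C"})]
eps = {"A": 0.9, "B": 0.2, "C": 0.5}   # hypothetical extinction probabilities

def hed(i):
    # Expected phylogenetic diversity secured by making species i safe:
    # each edge above i contributes its length times the probability that
    # all the edge's other descendant species go extinct.
    return sum(length * prod(eps[j] for j in desc if j != i)
               for length, desc in edges if i in desc)

def hedge(i):
    return eps[i] * hed(i)   # weight HED by i's own extinction risk

for sp in eps:
    print(sp, round(hed(sp), 3), round(hedge(sp), 3))
```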
Gene Regulatory Networks (GRNs) with feedback are essential components of many cellular processes and may exhibit oscillatory behavior. Analyzing such systems becomes increasingly complex as the number of components increases. Since gene regulation often involves a small number of molecules, fluctuations are inevitable. It is therefore important to understand how fluctuations affect the oscillatory dynamics of cellular processes: this allows us to comprehend the mechanisms that keep cellular functions robust in the presence of fluctuations or, failing that, to determine the level of fluctuations that still permits those functions. In this study, we investigated the conditions under which GRNs with feedback and intrinsic fluctuations exhibit oscillatory behavior. Our focus was on developing a procedure that would be both manageable and practical, even for extensive regulatory networks, that is, those comprising numerous nodes. Using the second-moment approach, we described the stochastic dynamics through a set of ordinary differential equations for the mean concentration and its second central moment. The system can attain either a stable equilibrium or oscillatory behavior, depending on its scale and, consequently, the intensity of fluctuations. To illustrate the procedure, we analyzed two relevant systems: a repressilator with three nodes and a system with five nodes, both incorporating intrinsic fluctuations. In both cases, we observed that in very small systems, which therefore exhibit significant fluctuations, oscillatory behavior is inhibited. The procedure presented here for analyzing the stability of oscillations under fluctuations enables the determination of the critical minimum size of GRNs at which intrinsic fluctuations do not eliminate their cyclical behavior.
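As a schematic illustration of the second-moment approach (written here for a single species, whereas the paper treats networks of many nodes, so this is an assumption-laden caricature), a Gaussian closure of birth-death dynamics with drift $f$ and system size $\Omega$ couples the mean $m = \langle x \rangle$ and the central second moment $\Sigma = \langle (x-m)^2 \rangle$:

\[
\frac{dm}{dt} = f(m) + \tfrac{1}{2} f''(m)\,\Sigma, \qquad
\frac{d\Sigma}{dt} = 2 f'(m)\,\Sigma + \frac{1}{\Omega}\, D(m),
\]

where $D$ collects the reaction propensities feeding the noise. The fluctuation source scales as $1/\Omega$, consistent with the finding that sufficiently small systems have their oscillations suppressed by intrinsic noise.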
The representation of protein backbone geometry through the discrete nonlinear Schrödinger equation provides a theoretical connection between biological structure and integrable systems. Although the global application of this framework is constrained by chiral degeneracies and non-local interactions, we propose that helical peptides can be effectively modeled as piecewise integrable systems in which the discrete Hasimoto map remains applicable within specific geometric boundaries. We delineate these boundaries through an analytic characterization of the mapping between biochemical dihedral angles and Frenet frame parameters for a dataset of 50 helical peptide chains. We demonstrate that the transformation is information-preserving globally but ill-conditioned within the helical basin, characterized by a median Jacobian condition number of 31, which suggests that the loss of chiral information arises primarily from local coordinate compression rather than topological singularities. We define a local integrability error $E[n]$, derived from the discrete dispersion relation, to show that deviations from integrability are driven predominantly by torsion non-uniformity while curvature remains structurally rigid. This metric identifies integrable islands where the analytic dispersion relation predicts backbone coordinates with sub-angstrom accuracy, yielding a median root-mean-square deviation of 0.77\,Å, and enables a segmentation strategy that isolates structural defects. We further indicate that the inverse design of peptide backbones is feasible within a quantitatively defined integrability zone where the design constraint reduces essentially to the control of torsion uniformity. These findings advance the Hasimoto formalism from a qualitative descriptor toward a precise quantitative framework for analyzing and designing local protein geometry within the limits of piecewise integrability.
Natural birth and death are fundamental mechanisms shaping population dynamics in ecosystems. Nevertheless, in studies of cyclic competition systems governed by the rock-paper-scissors (RPS) game, these mechanisms have often been ignored in analyses of biodiversity. At the same time, given their prevalence and profound impact, understanding how higher-order interactions (HOIs) affect biodiversity is one of the most challenging open issues, and HOIs have accordingly been studied continuously in systems of cyclically competing populations, with a focus on neutral species. However, in real ecosystems species can evolve, die naturally, or be preyed upon by predators, whereas previous studies have considered only classic reaction rules among three species with a neutral, nonparticipant species. To identify how neutral species affect the biodiversity of the RPS system when species' natural birth and death are assumed, we consider a model of neutral species in higher-order interactions within the spatial RPS system, assuming birth-and-death processes. Extensive simulations show that when neutral species interfere positively, they dominate the available space, thereby reducing the proportion of other species. Conversely, when the interference is harmful, the density of competing species increases. In addition, unlike traditional RPS dynamics, biodiversity can be effectively maintained even in high-mobility regimes. Our study reaffirms the critical role of neutral species in preserving biodiversity.
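To show the kind of model being simulated, here is a minimal lattice Monte Carlo step for a spatial RPS system with a birth-death process. It is a hedged sketch of the standard model class only: the paper's neutral species and higher-order interaction rules are omitted, and all rates and the lattice size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
L, EMPTY = 100, 3
grid = rng.integers(0, 4, size=(L, L))       # species 0,1,2 plus empty sites

def mc_step(grid, sel=1.0, rep=1.0, mob=1.0, birth=0.1, death=0.1):
    """One random-sequential update of a spatial RPS model with natural
    birth and death (classic pairwise rules; HOIs/neutral species omitted)."""
    i, j = rng.integers(0, L, size=2)
    di, dj = [(0, 1), (0, -1), (1, 0), (-1, 0)][rng.integers(4)]
    ni, nj = (i + di) % L, (j + dj) % L       # periodic boundaries
    a, b = grid[i, j], grid[ni, nj]
    r = rng.random() * (sel + rep + mob + birth + death)
    if r < sel:
        if a != EMPTY and b == (a + 1) % 3:
            grid[ni, nj] = EMPTY              # a preys on its RPS successor
    elif r < sel + rep:
        if a != EMPTY and b == EMPTY:
            grid[ni, nj] = a                  # reproduction into empty space
    elif r < sel + rep + mob:
        grid[i, j], grid[ni, nj] = b, a       # pair exchange (mobility)
    elif r < sel + rep + mob + birth:
        if a == EMPTY:
            grid[i, j] = rng.integers(0, 3)   # natural birth
    else:
        if a != EMPTY:
            grid[i, j] = EMPTY                # natural death
```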
Diversity-Generating Retroelements (DGRs) create rapid, targeted variation within specific genomic regions in phages and bacteria. They operate through stochastic retro-transcription of a template region (TR) into a variable region (VR), which typically encodes ligand-binding proteins. Despite their prevalence, the evolutionary conditions that maintain such hypermutating systems remain unclear. Here we introduce a two-timescale framework separating fast VR diversification from slow TR evolution, allowing the dynamics of DGR-controlled loci to be analytically understood from the TR design point of view. We quantify the fitness gain that the DGR diversification mechanism provides over standard mutagenesis in the presence of environmental switching. Our framework accounts for observed patterns of DGR activity in human-gut \textit{Bacteroides} and clarifies when constitutive DGR activation is evolutionarily favored.
The vast majority of biological sequences encode unknown functions and bear little resemblance to experimentally characterized proteins, limiting both our understanding of biology and our ability to harness functional potential for the bioeconomy. Predicting enzyme function from sequence remains a central challenge in computational biology, complicated by low sequence diversity and imbalanced label support in publicly available datasets. Models trained on these data can overestimate performance and fail to generalize. To address this, we introduce GRIMM (Genetic stRatification for Inference in Molecular Modeling), a benchmark for enzyme function prediction that employs genetic stratification: sequences are clustered by similarity and clusters are assigned exclusively to training, validation, or test sets. This ensures that sequences from the same cluster do not appear in multiple partitions. GRIMM produces multiple test sets: a closed-set test with the same label distribution as training (Test-1) and an open-set test containing novel labels (Test-2), serving as a realistic out-of-distribution proxy for discovering novel enzyme functions. While demonstrated on enzymes, this approach is generalizable to any sequence-based classification task where inputs can be clustered by similarity. By formalizing a splitting strategy often used implicitly, GRIMM provides a unified and reproducible framework for closed- and open-set evaluation. The method is lightweight, requiring only sequence clustering and label annotations, and can be adapted to different similarity thresholds, data scales, and biological tasks. GRIMM enables more realistic evaluation of functional prediction models on both familiar and unseen classes and establishes a benchmark that more faithfully assesses model performance and generalizability.
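Since the splitting strategy is the core of the benchmark, here is a minimal sketch of genetic stratification: whole clusters, not individual sequences, are assigned to partitions. This is an illustration of the stated idea, not GRIMM's actual code; the clustering tool named in the comment and the split fractions are assumptions.

```python
import random
from collections import defaultdict

def genetic_split(cluster_of, frac=(0.8, 0.1, 0.1), seed=0):
    """Assign whole sequence clusters to train/val/test so that no cluster
    spans two partitions.

    cluster_of: dict mapping sequence id -> cluster id (e.g., produced by a
    similarity-clustering tool such as MMseqs2 at a chosen identity cutoff).
    """
    clusters = defaultdict(list)
    for seq, c in cluster_of.items():
        clusters[c].append(seq)
    ids = list(clusters)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    cut1, cut2 = int(frac[0] * n), int((frac[0] + frac[1]) * n)
    parts = {"train": ids[:cut1], "val": ids[cut1:cut2], "test": ids[cut2:]}
    return {name: [s for c in cs for s in clusters[c]]
            for name, cs in parts.items()}
```

An open-set test in the spirit of Test-2 would additionally hold out all clusters carrying certain labels, so those labels never appear in training.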
There is growing evidence that independently trained AI systems come to represent the world in the same way. In other words, independently trained embeddings from text, vision, audio, and neural signals share an underlying geometry. We call this the Representational Alignment Hypothesis (RAH) and investigate evidence for and consequences of this claim. The evidence is of two kinds: (i) internal structure comparison techniques, such as representational similarity analysis and topological data analysis, reveal matching relational patterns across modalities without explicit mapping; and (ii) methods based on cross-modal embedding alignment, which learn mappings between representation spaces, show that simple linear transformations can bring different embedding spaces into close correspondence, suggesting near-isomorphism. Taken together, the evidence suggests that, even after controlling for trivial commonalities inherent in standard data preprocessing and embedding procedures, a robust structural correspondence persists, hinting at an underlying organizational principle. Some have argued that this result shows that the shared structure is getting at a fundamental, Platonic level of reality. We argue that this conclusion is unjustified. Moreover, we aim to give the idea an alternative philosophical home, rooted in contemporary metasemantics (i.e., theories of what makes a representation and what makes something meaningful) and responses to the symbol grounding problem. We conclude by considering the scope of the RAH and proposing new ways of distinguishing semantic structures that are genuinely invariant from those that inevitably arise because all our data are generated under human-specific conditions on Earth.
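The second kind of evidence, cross-modal linear alignment, can be sketched in a few lines: fit one linear map on paired items and test it on held-out pairs. This is an illustrative protocol under assumed inputs (rows of X and Y are embeddings of the same items in two independently trained spaces), not any specific study's method.

```python
import numpy as np

def linear_alignment_score(X, Y):
    """Fit a single linear map W: X -> Y by least squares on the first half
    of the paired items, then report mean held-out cosine similarity
    between X @ W and Y. High scores support near-isomorphism."""
    n = len(X)
    tr, te = np.arange(n) < n // 2, np.arange(n) >= n // 2
    W, *_ = np.linalg.lstsq(X[tr], Y[tr], rcond=None)
    P = X[te] @ W
    cos = np.sum(P * Y[te], axis=1) / (
        np.linalg.norm(P, axis=1) * np.linalg.norm(Y[te], axis=1))
    return cos.mean()
```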
Background: Human behavior shapes infectious disease dynamics, yet its integration into transmission models remains fragmented. Recent epidemics, particularly COVID-19, highlight the need for models capturing adaptation to perceived risk, social influence, and policy signals. This review synthesizes post-2020 models incorporating behavioral adaptation, examines their theoretical grounding, and evaluates how behavioral constructs modify transmission, vaccination, and compliance. Methods: Following PRISMA guidelines, we searched Scopus and PubMed (2020-2025), screening 1,274 records with citation chaining. We extracted data on disease context, country, modeling framework, behavioral mechanisms (prevalence-dependent, policy/media, imitation/social learning), and psychosocial constructs (personal threat, coping appraisal, barriers, social norms, cues to action). A total of 216 studies met inclusion criteria. Results: COVID-19 accounted for 73% of studies. Most used compartmental ODE models (81%) and focused on theoretical or U.S. settings. Behavioral change was mainly reactive: 47% applied prevalence-dependent feedback, 25% included awareness/media dynamics, and 19% relied on exogenous policy triggers. Game-theoretic or social learning approaches were rare (5% or less). Behavioral effects primarily modified contact or transmission rates (91%). Psychosocial constructs were unevenly represented: cues to action (n=159) and personal threat (n=145) dominated, whereas coping appraisal (n=82), barriers (n=36), and social norms (n=25) were less common. Conclusions: We propose a taxonomy structured by behavioral drivers, social scale, and memory to clarify dominant paradigms and their empirical basis. Mapping models to psychosocial constructs provides guidance for more theory-informed and data-grounded integration of behavioral processes in epidemiological modeling.
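The dominant mechanism in the reviewed literature, prevalence-dependent feedback that modifies the transmission rate, can be illustrated with a minimal SIR model. The saturating incidence form below (Capasso-Serio type) is one common choice, not a model endorsed by the review; all parameter values are hypothetical.

```python
from scipy.integrate import solve_ivp

def sir_behavior(t, y, beta0=0.4, gamma=0.1, alpha=50.0):
    """SIR with reactive, prevalence-dependent contact reduction: the
    effective transmission rate falls as perceived risk (proxied by the
    current prevalence I) rises."""
    S, I, R = y
    beta = beta0 / (1.0 + alpha * I)     # behavioral feedback on transmission
    return [-beta * S * I, beta * S * I - gamma * I, gamma * I]

sol = solve_ivp(sir_behavior, (0, 300), [0.999, 0.001, 0.0], max_step=1.0)
print(f"final epidemic size = {sol.y[2, -1]:.3f}")
```

Setting alpha=0 recovers the classic SIR model, which makes the flattening effect of the feedback easy to demonstrate.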
Single-cell RNA sequencing (scRNA-seq) data exhibit strong and reproducible statistical structure. This has motivated the development of large-scale foundation models, such as TranscriptFormer, that use transformer-based architectures to learn a generative model for gene expression by embedding genes into a latent vector space. These embeddings have been used to obtain state-of-the-art (SOTA) performance on downstream tasks such as cell-type classification, disease-state prediction, and cross-species learning. Here, we ask whether similar performance can be achieved without utilizing computationally intensive deep learning-based representations. Using simple, interpretable pipelines that rely on careful normalization and linear methods, we obtain SOTA or near-SOTA performance across multiple benchmarks commonly used to evaluate single-cell foundation models, including outperforming foundation models on out-of-distribution tasks involving novel cell types and organisms absent from the training data. Our findings highlight the need for rigorous benchmarking and suggest that the biology of cell identity can be captured by simple linear representations of single-cell gene expression data.
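One plausible instance of "careful normalization and linear methods" (an assumption on our part, not the authors' exact pipeline) is library-size normalization and log transform, followed by PCA and a linear classifier. This sketch assumes a dense cell-by-gene count matrix.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def cpm_log1p(X):
    # Library-size normalization to counts-per-10k, then log1p.
    return np.log1p(1e4 * X / X.sum(axis=1, keepdims=True))

clf = make_pipeline(
    FunctionTransformer(cpm_log1p),
    StandardScaler(),                    # per-gene scaling
    PCA(n_components=50),                # linear representation
    LogisticRegression(max_iter=1000),   # linear cell-type classifier
)
# Usage: clf.fit(counts_train, cell_types_train); clf.predict(counts_test)
```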
The rates at which individuals memorize and forget environmental information strongly influence their movement paths and long-term space use. To understand how these cognitive time scales shape population-level patterns, we propose and analyze a nonlocal population model with a cognitive map. The population density moves by a Fokker--Planck type diffusion driven by a cognitive map that stores habitat-quality information nonlocally. The map is updated through local presence with learning and forgetting rates, and we consider both truncated and normalized perception kernels. We first study the movement-only system without growth. We show that finite perceptual range generates spatial heterogeneity in the cognitive map even in nearly homogeneous habitats, and we prove a lingering phenomenon on unimodal landscapes: for a fixed high learning rate, the peak density near the best location is maximized at an intermediate forgetting rate. We then couple cognitive diffusion to logistic growth. We establish local well-posedness and persistence by proving instability of the extinction equilibrium and the existence of a positive steady state, with uniqueness under an explicit condition on the motility function. Numerical simulations show that lingering persists under logistic growth and reveal a trade-off between the lingering and total population size, since near the strongest-lingering regime the total mass can fall below the total resource, in contrast to classical random diffusive--logistic models.
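One schematic system consistent with this description (the paper's exact equations may differ, so read this as an assumed caricature) couples the population density $u$ to the cognitive map $m$:

\[
\partial_t u = \Delta\big(\gamma(m)\,u\big) + r\,u\Big(1 - \frac{u}{K}\Big), \qquad
\partial_t m = \rho\,u\,(k \ast q) - \mu\,m,
\]

where $\gamma$ is a motility function decreasing in map quality (so the population lingers where the map is good), $k$ is a truncated or normalized perception kernel applied to the habitat quality $q$, and $\rho$ and $\mu$ are the learning and forgetting rates; dropping the logistic term gives the movement-only system studied first.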
Epilepsy affects over 50 million people worldwide, and one-third of patients suffer drug-resistant seizures where surgery offers the best chance of seizure freedom. Accurate localization of the epileptogenic zone (EZ) relies on intracranial EEG (iEEG). Clinical workflows, however, remain constrained by labor-intensive manual review. At the same time, existing data-driven approaches are typically developed on single-center datasets that are inconsistent in format and metadata, lack standardized benchmarks, and rarely release pathological event annotations, creating barriers to reproducibility, cross-center validation, and clinical relevance. With extensive efforts to reconcile heterogeneous iEEG formats, metadata, and recordings across publicly available sources, we present $\textbf{Omni-iEEG}$, a large-scale, pre-surgical iEEG resource comprising $\textbf{302 patients}$ and $\textbf{178 hours}$ of high-resolution recordings. The dataset includes harmonized clinical metadata such as seizure onset zones, resections, and surgical outcomes, all validated by board-certified epileptologists. In addition, Omni-iEEG provides over 36K expert-validated annotations of pathological events, enabling robust biomarker studies. Omni-iEEG serves as a bridge between machine learning and epilepsy research. It defines clinically meaningful tasks with unified evaluation metrics grounded in clinical priors, enabling systematic evaluation of models in clinically relevant settings. Beyond benchmarking, we demonstrate the potential of end-to-end modeling on long iEEG segments and highlight the transferability of representations pretrained on non-neurophysiological domains. Together, these contributions establish Omni-iEEG as a foundation for reproducible, generalizable, and clinically translatable epilepsy research. The project page with dataset and code links is available at this http URL.
Accurate estimation of the relative concentrations of chromophores in a spectroscopic photoacoustic (sPA) image can reveal immense structural, functional, and molecular information about physiological processes. However, due to nonlinearities and ill-posedness inherent to sPA imaging, concentration estimation is intractable. The Spectroscopic Photoacoustic Optical Inversion Autoencoder (SPOI-AE) aims to address the sPA optical inversion and spectral unmixing problems without assuming linearity. Herein, SPOI-AE was trained and tested on \textit{in vivo} mouse lymph node sPA images with unknown ground truth chromophore concentrations. SPOI-AE better reconstructs input sPA pixels than conventional algorithms while providing biologically coherent estimates for optical parameters, chromophore concentrations, and the percent oxygen saturation of tissue. SPOI-AE's unmixing accuracy was validated using a simulated mouse lymph node phantom ground truth.
Recent success in natural language processing has motivated growing interest in large-scale foundation models for neuroimaging data. Such models often require discretization of continuous neural time series data, a process referred to as 'tokenization'. However, the impact of different tokenization strategies for neural data is currently poorly understood. In this work, we present a systematic evaluation of sample-level tokenization strategies for transformer-based large neuroimaging models (LNMs) applied to magnetoencephalography (MEG) data. We compare learnable and non-learnable tokenizers by examining their signal reconstruction fidelity and their impact on subsequent foundation modeling performance (token prediction, biological plausibility of generated data, preservation of subject-specific information, and performance on downstream tasks). For the learnable tokenizer, we introduce a novel approach based on an autoencoder. Experiments were conducted on three publicly available MEG datasets spanning different acquisition sites, scanners, and experimental paradigms. Our results show that both learnable and non-learnable discretization schemes achieve high reconstruction accuracy and broadly comparable performance across most evaluation criteria, suggesting that simple fixed sample-level tokenization strategies can be used in the development of neural foundation models. The code is available at this https URL.
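To make "sample-level tokenization" concrete, here is a minimal non-learnable tokenizer of the kind the results suggest can suffice: per-channel standardization followed by uniform amplitude binning. This is an illustrative sketch under assumed conventions (x has shape channels x time; bin count and clip level are hypothetical), not the paper's implementation.

```python
import numpy as np

def tokenize(x, n_tokens=256, clip=3.0):
    """Non-learnable sample-level tokenizer: z-score each channel, clip
    outliers, and map every time sample to one of n_tokens uniform bins."""
    z = (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-8)
    z = np.clip(z, -clip, clip)
    return np.floor((z + clip) / (2 * clip) * (n_tokens - 1)).astype(int)

def detokenize(tokens, n_tokens=256, clip=3.0):
    """Approximate inverse: map tokens back to normalized amplitudes
    (exact up to the quantization error of one bin width)."""
    return tokens / (n_tokens - 1) * 2 * clip - clip
```

The learnable alternative in the paper replaces this fixed quantizer with an autoencoder whose latent codes serve as tokens.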
Aquaporins (AQPs) and aquaglyceroporins (AQGPs) play a crucial role in regulating water transport and solute selectivity across biological membranes. Besides their biological relevance, AQPs have attracted growing interest as models for the design of next-generation biomimetic membranes for water filtration. In this work, we present a pore-level Quantitative Structure-Activity Relationship (QSAR) approach that relates structural and physicochemical pore descriptors with experimentally reported water permeation rates across a set of AQ(G)Ps with high-resolution 3D structures. This data-driven methodology, presented here as a proof of concept, introduces a multi-feature framework for determining pore descriptors associated with water transport efficiency in AQ(G)Ps. Applied to two compiled permeation rate datasets, this framework recapitulates determinants previously reported in single-feature studies, while also highlighting additional pore descriptors that emerge as relevant in a multi-variable context. The insights gained through this approach may, in perspective, contribute to advancing the rational design of AQP-based filtration devices and to deepening the molecular understanding of the function of these valuable macromolecules in health and disease.
In both machine learning and computational neuroscience, plasticity in functional neural networks is frequently expressed as gradient descent on a cost. Often, this imposes symmetry constraints that are difficult to reconcile with local computation, as is required for biological networks or neuromorphic hardware. For example, wake-sleep learning in networks characterized by Boltzmann distributions assumes symmetric connectivity. Similarly, the error backpropagation algorithm is notoriously plagued by the weight transport problem between the representation and the error stream. Existing solutions such as feedback alignment circumvent the problem by deferring to the robustness of these algorithms to weight asymmetry. However, they scale poorly with network size and depth. We introduce spike-based alignment learning (SAL), a complementary learning rule for spiking neural networks, which uses spike timing statistics to extract and correct the asymmetry between effective reciprocal connections. Apart from being spike-based and fully local, our proposed mechanism takes advantage of noise. Based on an interplay between Hebbian and anti-Hebbian plasticity, synapses can thereby recover the true local gradient. This also alleviates discrepancies that arise from neuron and synapse variability -- an omnipresent property of physical neuronal networks. We demonstrate the efficacy of our mechanism using different spiking network models. First, SAL can significantly improve convergence to the target distribution in probabilistic spiking networks versus Hebbian plasticity alone. Second, in neuronal hierarchies based on cortical microcircuits, SAL effectively aligns feedback weights to the forward pathway, thus allowing the backpropagation of correct feedback errors. Third, our approach enables competitive performance in deep networks using only local plasticity for weight transport.
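The underlying principle, noise-driven estimation of reciprocal weights via Hebbian drive plus anti-Hebbian decay, can be caricatured in a rate-based setting. SAL itself operates on spike-timing statistics; the sketch below is instead in the spirit of weight mirrors (Akrout et al., 2019) and is offered only to show why noise correlations reveal the transpose of the forward weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 20, 10
W = rng.normal(size=(n_out, n_in))           # forward weights (held fixed)
B = rng.normal(size=(n_in, n_out))           # feedback weights to be aligned

eta, decay = 0.01, 0.01
for _ in range(5000):
    x = rng.normal(size=n_in)                # injected presynaptic noise
    y = W @ x                                # postsynaptic response
    # Hebbian term: E[outer(x, Wx)] = W.T for unit-variance noise, so the
    # anti-Hebbian leak makes B converge (in direction) to W.T.
    B += eta * (np.outer(x, y) - decay * B)

cos = np.sum(B * W.T) / (np.linalg.norm(B) * np.linalg.norm(W))
print(f"alignment cos(B, W.T) = {cos:.3f}")  # approaches 1 as B aligns
```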
Protein fitness landscapes frequently exhibit epistasis, where the effect of a mutation depends on the genetic context in which it occurs, i.e., the rest of the protein sequence. Epistasis increases landscape complexity, often resulting in multiple fitness peaks. In its simplest form, known as global epistasis, fitness is modeled as a non-linear function of an underlying additive trait. In contrast, more complex epistasis arises from a network of (pairwise or many-body) interactions between residues, which cannot be removed by a single non-linear transformation. Recent studies have explored how global and network epistasis contribute to the emergence of functional bottlenecks - fitness landscape topologies where two broad high-fitness basins, representing distinct phenotypes, are separated by a bottleneck that can only be crossed via one or a few mutational paths. Here, we introduce and analyze a stylized model of global epistasis with an additive underlying trait. We demonstrate that functional bottlenecks arise with high probability if the model is properly calibrated. Furthermore, our results underscore that a proper balance between neutral and non-neutral mutations is needed for the emergence of functional bottlenecks.
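A toy version of the global-epistasis setup makes the model class concrete: fitness is a fixed monotone non-linearity of an additive latent trait. This is an illustrative sketch only, not the paper's calibrated model; sequence length, effect sizes, and the sigmoid steepness are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
L_seq = 12
a = rng.normal(size=L_seq)        # additive effects defining the latent trait

def fitness(seqs, k=4.0):
    """Global-epistasis toy landscape: phi(seq) = seq . a is additive, and
    fitness is a fixed sigmoid of phi. Sequences are +/-1 spin vectors."""
    phi = seqs @ a
    return 1.0 / (1.0 + np.exp(-k * phi))

print(fitness(rng.choice([-1, 1], size=(5, L_seq))))
```

Because the non-linearity is shared across all sites, any epistasis here is "global" by construction; network epistasis would instead require interaction terms between sites that no single transformation of phi can remove.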
Phylogenetic inference, the task of reconstructing how related sequences evolved from common ancestors, is a central objective in evolutionary genomics. The current state-of-the-art methods exploit probabilistic models of sequence evolution along phylogenetic trees, by searching for the tree maximizing the likelihood of observed sequences, or by estimating the posterior of the tree given the sequences in a Bayesian framework. Both approaches typically require computing likelihoods, which is only feasible under simplifying assumptions such as independence of the evolution at the different positions of the sequence, and even then remains a costly operation. Here we present the first likelihood-free inference method for posterior distributions over phylogenies. It exploits a novel expressive encoding for pairs of sequences, and a parameterized probability distribution factorized over a succession of subtree merges. The resulting network provides well-calibrated estimates of the posterior distribution leading to more accurate tree topologies than existing methods, even under models amenable to likelihood computation. We further show that its advantage over likelihood-based methods increases dramatically under models of sequence evolution with intractable likelihoods.
A central challenge in cognitive neuroscience is to explain how semantic and episodic memory, two major forms of declarative memory, typically associated with cortical and hippocampal processing, interact to support learning, recall, and imagination. Despite significant advances, we still lack a unified computational framework that jointly accounts for core empirical phenomena across both semantic and episodic processing domains. Here, we introduce the Generative Episodic-Semantic Integration System (GENESIS), a computational model that formalizes memory as the interaction between two limited-capacity generative systems: a Cortical-VAE, supporting semantic learning and generalization, and a Hippocampal-VAE, supporting episodic encoding and retrieval within a retrieval-augmented generation (RAG) architecture. GENESIS reproduces hallmark behavioral findings, including generalization in semantic memory, recognition, serial recall effects and gist-based distortions in episodic memory, and constructive episodic simulation, while capturing their dynamic interactions. The model elucidates how capacity constraints shape the fidelity and memorability of experiences, how semantic processing introduces systematic distortions in episodic recall, and how episodic replay can recombine previous experiences. Together, these results provide a principled account of memory as an active, constructive, and resource-bounded process. GENESIS thus advances a unified theoretical framework that bridges semantic and episodic memory, offering new insights into the generative foundations of human cognition.
Background: Cognitive impairment in multiple sclerosis (MS) is driven by both focal inflammation and compartmentalized neurodegeneration, yet the relative effect of lesion-independent thalamic atrophy on information processing speed (IPS) remains unclear. Methods: This retrospective cohort study included 100 participants with MS. Automatic segmentation techniques quantified lesion load and delineated 26 thalamic regions of interest (ROIs). Linear models compared associations between ROI volumes and Symbol Digit Modalities Test (SDMT) performance in lesion-adjusted and unadjusted models. Results: Twenty-one of 26 ROIs showed significant SDMT associations before lesion adjustment; twelve remained significant after adjustment. Lesion-independent associations were observed in the global thalamus, sensory relay nuclei (ventral posterolateral, medial and lateral geniculate), and associative hubs (pulvinar and mediodorsal-parafascicular complex). These processing-associated ROIs exhibited significantly lower lesion-mediated effects (13.4%) than those losing significance after adjustment (34.2%, p < 0.001). Conclusion: Our findings suggest that IPS impairment reflects heterogeneous contributions from focal lesion-driven and chronic neurodegenerative pathology, with nucleus-specific phenotyping potentially informing the identification of higher-risk individuals.
Following limb amputation and targeted muscle reinnervation (TMR), nerves that originally innervated agonist and antagonist muscles are rerouted into one or more residual target muscles. This rerouting profoundly alters the natural mechanical coupling and afferent signalling that normally link muscle groups in intact limbs. Despite this disruption, in this study we demonstrate, using high-density intramuscular microelectrode arrays implanted in reinnervated muscles of three TMR participants, that motor units (MUs) associated with agonist and antagonist tasks remain functionally coupled. Specifically, over 40% of motor units active during agonist tasks were also recruited during the corresponding antagonist tasks, even though no visual feedback on antagonist neural activity was provided. These motor units exhibited significantly different firing rates depending on their functional role. These results provide the first motor-unit-level evidence that the central nervous system preserves coordinated agonist-antagonist control after TMR, and they inform restorative surgical strategies and prosthetic systems capable of regulating both limb kinematics and dynamics based on the interplay of agonist-antagonist commands.
During the development of an organism, cells must coordinate and organize to generate the correct shape, structure, and spatial patterns of tissues and organs, a process known as morphogenesis. The morphogenesis of embryonic tissues is supported by multiple processes that induce the precise physical deformations required for tissues to ultimately form organs with complex geometries. Among the most active players shaping the morphogenetic path are fine-tuned changes in cell adhesion. We review here recent advances showing that changes in a local, pair-wise property defined at the cell-cell contact level have important global consequences for embryonic tissue topology, being determinant in defining both the geometric and material properties of early embryonic tissues.
The coalescent is a foundational model of latent genealogical trees under neutral evolution, but suffers from intractable sampling probabilities. Methods for approximating these sampling probabilities either introduce bias or fail to scale to large sample sizes. We show that a class of cost functionals of the coalescent with recurrent mutation and a finite number of alleles converge to tractable processes in the infinite-sample limit. A particular choice of costs yields insight about importance sampling methods, which are a classical tool for coalescent sampling probability approximation. These insights reveal that the behaviour of coalescent importance sampling algorithms differs markedly from standard sequential importance samplers, with or without resampling. We conduct a simulation study to verify that our asymptotics are accurate for algorithms with finite (and moderate) sample sizes. Our results constitute the first theoretical description of large-sample importance sampling algorithms for the coalescent, provide heuristics for the a priori optimisation of computational effort, and identify settings where resampling is harmful for algorithm performance. We observe strikingly different behaviour for importance sampling methods under the infinite sites model of mutation, which is regarded in most respects as a good, more tractable approximation of finite-alleles mutation.
Background: Several studies show that large language models (LLMs) struggle with phenotype-driven gene prioritization for rare diseases. These studies typically use Human Phenotype Ontology (HPO) terms to prompt foundation models like GPT and LLaMA to predict candidate genes. However, in real-world settings, foundation models are not optimized for domain-specific tasks like clinical diagnosis, and inputs are unstructured clinical notes rather than standardized terms. How LLMs can be instructed to predict candidate genes or disease diagnoses from unstructured clinical notes remains a major challenge. Methods: We introduce RAG-driven CoT and CoT-driven RAG, two methods that combine Chain-of-Thought (CoT) and Retrieval Augmented Generation (RAG) to analyze clinical notes. A five-question CoT protocol mimics expert reasoning, while RAG retrieves data from sources like HPO and OMIM (Online Mendelian Inheritance in Man). We evaluated these approaches on rare disease datasets, including 5,980 Phenopacket-derived notes, 255 literature-based narratives, and 220 in-house clinical notes from Children's Hospital of Philadelphia. Results: We found that recent foundation models, including Llama 3.3-70B-Instruct and DeepSeek-R1-Distill-Llama-70B, outperformed earlier versions such as Llama 2 and GPT-3.5. We also showed that RAG-driven CoT and CoT-driven RAG both outperform foundation models in candidate gene prioritization from clinical notes; in particular, both methods with a DeepSeek backbone achieved a top-10 gene accuracy of over 40% on Phenopacket-derived clinical notes. RAG-driven CoT works better for high-quality notes, where early retrieval can anchor the subsequent reasoning steps in domain-specific evidence, while CoT-driven RAG has an advantage when processing lengthy and noisy notes.
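The ordering difference between the two methods is easiest to see as a pipeline skeleton. In this sketch, `retrieve` and `llm` are hypothetical callables (standing in for HPO/OMIM search and a chat model), and the five questions are our own placeholder wording, not the paper's actual protocol.

```python
def rag_driven_cot(note, retrieve, llm):
    """RAG-driven CoT (sketch): retrieve domain evidence first, then run a
    multi-step reasoning protocol over the note plus that evidence."""
    steps = [  # hypothetical five-question protocol
        "1. List the patient's phenotypic abnormalities.",
        "2. Map each abnormality to standardized HPO terms.",
        "3. Propose candidate diseases consistent with the terms.",
        "4. List genes associated with those diseases.",
        "5. Rank the top candidate genes.",
    ]
    context = f"Clinical note:\n{note}\n\nEvidence:\n{retrieve(note)}\n"
    for step in steps:
        context += llm(context + "\n" + step) + "\n"   # accumulate reasoning
    return context

def cot_driven_rag(note, retrieve, llm):
    """CoT-driven RAG (sketch): reason over the raw note first, and only
    then retrieve evidence conditioned on the distilled reasoning."""
    reasoning = llm(f"Summarize the key phenotypes in:\n{note}")
    evidence = retrieve(reasoning)
    return llm(f"{reasoning}\n\nEvidence:\n{evidence}\n\nRank candidate genes.")
```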
Biological organisms are adaptive, able to function in unpredictably changing environments. Drawing on recent nonequilibrium physics, we show that in adaptation, fitness has two components parameterized by observable coordinates: a static Generalism component characterized by state distributions, and a dynamic Tracking component sustained by nonequilibrium fluxes. Our findings: (1) General Theory: We prove that tracking gain scales strictly with environmental variability and switching time-scales; near-static or fast-switching environments are not worth tracking. (2) Optimal Strategies: We explain optimal bet-hedging and phenotypic memory as the interplay between these components. (3) Control: We demonstrate, with an example, how to suppress pathogens by independently attacking their Generalism robustness (via environmental time fractions) and Tracking capabilities (via environmental switching speed). This work provides a physical framework for understanding and controlling adaptivity.
The model organism Physarum polycephalum is known to perform decentralised problem solving despite the absence of a nervous system. Experimental evidence and modelling studies have linked these abilities, and in particular maze-solving, to some sort of memory and adaptation. However, despite compelling hypotheses, it is still not clear whether the tasks are solved optimally, and which key dynamical mechanisms enable Physarum's impressive abilities. Here, we employ a circuital network model for the foraging behaviour of Physarum polycephalum to prove that threshold sensing yields the emergence of unique and optimal paths that connect food sources and solve mazes. We also prove which conditions lead to alternative paths, thus elucidating how the organism achieves flexibility and adaptation in a self-organised manner. These findings are aligned with experimental evidence and provide insight into the evolution of primitive intelligence. Our results can also inspire the development of threshold-based algorithms for computing applications.
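A hedged sketch of this model class: current-reinforcement networks in the spirit of Tero et al.'s conductance-adaptation model, here with a threshold on flux (not the paper's exact circuital equations). The small graph, threshold, and rates below are hypothetical; edges carrying sub-threshold flux decay, so a single path between source and sink emerges.

```python
import numpy as np

# Two source-sink paths of different length plus a bridge edge.
edges = [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 1.5), (2, 3, 1.5), (1, 2, 0.5)]
n, src, snk = 4, 0, 3
D = np.ones(len(edges))                  # edge conductances (the 'veins')
length = np.array([e[2] for e in edges])

for _ in range(300):
    # Kirchhoff step: node potentials for a unit flow from src to snk.
    Lap = np.zeros((n, n))
    for (i, j, _), d, l in zip(edges, D, length):
        g = d / l
        Lap[[i, j], [i, j]] += g
        Lap[i, j] -= g
        Lap[j, i] -= g
    b = np.zeros(n)
    b[src] = 1.0
    Lap[snk, :] = 0.0
    Lap[snk, snk] = 1.0                  # ground the sink node
    p = np.linalg.solve(Lap, b)
    Q = D / length * np.array([p[i] - p[j] for i, j, _ in edges])
    # Threshold sensing: only edges whose flux exceeds the threshold are
    # reinforced; all others decay toward zero conductance.
    D += 0.5 * (np.where(np.abs(Q) > 0.1, np.abs(Q), 0.0) - 0.1 * D)

print(np.round(D, 2))   # conductance concentrates on the shorter path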
Predictive machine learning models generally excel on in-distribution data, but their performance degrades on out-of-distribution (OOD) inputs. Reliable deployment therefore requires robust OOD detection, yet this is particularly challenging for irregular 3D graphs that combine continuous geometry with categorical identities and are unordered by construction. Here, we present a probabilistic OOD detection framework for complex 3D graph data built on a diffusion model that learns a density of the training distribution in a fully unsupervised manner. A key ingredient we introduce is a unified continuous diffusion over both 3D coordinates and discrete features: categorical identities are embedded in a continuous space and trained with cross-entropy, while the corresponding diffusion score is obtained analytically via posterior-mean interpolation from predicted class probabilities. This yields a single self-consistent probability-flow ODE (PF-ODE) that produces per-sample log-likelihoods, providing a principled typicality score for distribution shift. We validate the approach on protein-ligand complexes and construct strict OOD datasets by withholding entire protein families from training. PF-ODE likelihoods identify held-out families as OOD and correlate strongly with prediction errors of an independent binding-affinity model (GEMS), enabling a priori reliability estimates on new complexes. Beyond scalar likelihoods, we show that multi-scale PF-ODE trajectory statistics - including path tortuosity, flow stiffness, and vector-field instability - provide complementary OOD information. Modeling the joint distribution of these trajectory features yields a practical, high-sensitivity detector that improves separation over likelihood-only baselines, offering a label-free OOD quantification workflow for geometric deep learning.
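The scalar score underlying this framework is the standard probability-flow construction (Chen et al., 2018; Song et al., 2021), which we restate here for orientation rather than as the paper's exact parameterization. Writing the learned score as $s_\theta$, the PF-ODE and its exact likelihood are

\[
\frac{dx}{dt} = \tilde f_\theta(x, t) = f(x, t) - \tfrac{1}{2}\, g(t)^2\, s_\theta(x, t), \qquad
\log p_0(x_0) = \log p_T(x_T) + \int_0^T \nabla_x \cdot \tilde f_\theta\big(x(t), t\big)\, dt,
\]

where the divergence is typically estimated with Hutchinson's trick, $\nabla_x \cdot \tilde f_\theta \approx \mathbb{E}_v\big[v^\top \partial_x \tilde f_\theta\, v\big]$ for random probe vectors $v$. The trajectory statistics the paper adds (tortuosity, stiffness, vector-field instability) are computed along the same integration path.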