Diabetic foot ulcers (DFUs) are a severe complication of diabetes, often resulting in significant morbidity. This paper presents a predictive analytics framework utilizing time-series data captured by wearable foot sensors -- specifically NTC thin-film thermistors for temperature measurement and FlexiForce pressure sensors for plantar load monitoring. Data were collected from healthy subjects walking on an instrumented pathway. Unsupervised machine learning algorithms, Isolation Forest and K-Nearest Neighbors (KNN), were applied to detect anomalies that may indicate early ulcer risk. Through rigorous data preprocessing and targeted feature engineering, physiologic patterns were extracted to identify subtle changes in foot temperature and pressure. Results demonstrate that Isolation Forest is sensitive to micro-anomalies, while KNN is effective in flagging extreme deviations, albeit at a higher false-positive rate. Strong correlations between temperature and pressure readings support combined sensor monitoring for improved predictive accuracy. These findings provide a basis for real-time diabetic foot health surveillance, aiming to facilitate earlier intervention and reduce DFU incidence.
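A minimal sketch of the two detectors named in the abstract, applied to synthetic temperature/pressure readings; the feature layout, injected "hotspot" anomalies, and thresholds are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch (not the paper's pipeline): unsupervised anomaly detection on
# synthetic foot-sensor readings with Isolation Forest and a kNN-distance
# detector. All data and thresholds below are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Simulated readings: columns = [skin temperature (deg C), plantar pressure (kPa)]
normal = rng.normal(loc=[30.0, 200.0], scale=[0.5, 15.0], size=(500, 2))
hotspot = rng.normal(loc=[34.0, 320.0], scale=[0.5, 15.0], size=(5, 2))
X = np.vstack([normal, hotspot])  # injected anomalies occupy indices 500-504

# Isolation Forest: flags points that random axis-aligned splits isolate quickly
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
iso_flags = iso.predict(X) == -1  # True = anomaly

# kNN detector: flag points whose mean distance to their k neighbors is extreme
k = 5
dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
knn_score = dists[:, 1:].mean(axis=1)  # drop column 0 (self-distance)
knn_flags = knn_score > np.quantile(knn_score, 0.99)

print("Isolation Forest flagged:", int(iso_flags.sum()))
print("kNN flagged:", int(knn_flags.sum()))
```

On data like this, both detectors recover most of the injected points; in practice the quantile threshold on the kNN score is what drives the higher false-positive rate the abstract reports.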
Intracranial language brain-computer interfaces (BCIs) are a promising route for restoring communication in people with severe motor and speech impairments, but clinical translation remains limited by fragmented evidence and unresolved design trade-offs across neuroscience, hardware, algorithm, evaluation, and clinical deployment. This review synthesizes progress in neural mechanisms of overt, mimed, and imagined speech; decision-oriented hardware comparisons of microelectrode array (MEA), electrocorticography (ECoG), and stereotactic electroencephalography (SEEG) recording modalities; experiment design for cross-subject and multilingual generalization; and neural decoding advances spanning sequence models, transformers, articulatory intermediates, and language-prior-assisted frameworks. We highlight persistent bottlenecks, including weak cross-subject transfer, long-term non-stationarity and recalibration burden, heterogeneous and non-comparable evaluation practices, limited naturalistic expressivity (especially for tonal/logosyllabic languages), and low signal-to-noise ratio (SNR) of neural activity in covert speech decoding. Our contributions are threefold: (1) an end-to-end, decision-oriented synthesis linking neural representations to recording choices, experimental design, decoding model architectures, and translational constraints; (2) a structured framework organized around five coupled design questions, together with a unified evaluation framework and a cross-language/cross-task benchmark template integrating objective, perceptual, expressive, conversational, and longitudinal metrics; and (3) user-centered translational guidance covering agency-preserving shared control, verifiable performance priorities, and scenario-specific minimum viable product (MVP) profiles for reliability-first home communication versus fidelity-first conversational speech restoration.
Parkinson's disease (PD) is projected to increase substantially due to population aging, making early diagnosis increasingly important, as timely detection may delay progression and reduce long-term complications. Retinal microvasculature has emerged as a promising anatomical biomarker of neurodegeneration, and when combined with artificial intelligence (AI), retinal imaging may provide an advanced, noninvasive, and cost-effective screening strategy for PD. This study evaluated the evidence from the past 35 years regarding the capability of AI to detect early PD-related changes in retinal vascular structure. Five electronic databases -- PubMed, Web of Science, Scopus, ScienceDirect, and ProQuest -- were systematically searched from January 1990 to January 2025. In addition, Annals of Neurology and Frontiers in Neuroscience were hand-searched, and the reference lists of included studies were screened for additional eligible publications. Nineteen studies met the inclusion criteria. Three principal diagnostic AI tasks were identified: disease classification, retinal vessel segmentation, and PD risk stratification. The best-performing models were ShAMBi-LSTM on the Drishti dataset, with 97.2 percent accuracy, 99.5 percent precision, 96.9 percent sensitivity, and an F1 score of 0.981 for classification; nnU-Net, with 99.7 percent accuracy, 98.7 percent precision, 98.9 percent sensitivity, 99.8 percent specificity, and a Dice score of 98.9 percent for segmentation; and AlexNet for risk prediction, with area under the curve values of 0.77, 0.68, and 0.73 across datasets. Overall, the reviewed applications of AI algorithms to retinal vasculature for detecting early signs of PD and predicting disease severity suggest that integrating AI with retinal biomarkers holds substantial potential for earlier and more accurate detection compared with traditional clinical evaluation alone.
Modern neuroscience has accumulated extensive evidence on perception, memory, prediction, valuation, and consciousness, yet still lacks an explicit operational architecture capable of integrating these phenomena within a unified computational framework. Existing theories address specific aspects of neural function: predictive coding and active inference emphasize hierarchical inference and prediction error minimization; engram theories explain memory through distributed cell assemblies; neuromodulatory accounts focus on value-dependent regulation of plasticity and behaviour; and global workspace or large-scale network models investigate mechanisms underlying conscious access. Despite their explanatory power, these approaches remain only partially integrated at the architectural level. This work introduces DIME (Detect-Integrate-Mark-Execute), a neural architecture organizing perception, memory, valuation, and conscious access within a common operational cycle. The framework includes four interacting components: engrams, distributed recurrent neural structures supporting multiple activation trajectories; execution threads, spatiotemporal trajectories implementing neural processes; marker systems, neuromodulatory and limbic mechanisms regulating gain, plasticity, and trajectory selection; and hyperengrams, large-scale integrative states associated with operational conscious access. The framework is consistent with empirical evidence from hippocampal indexing, recurrent cortical processing, replay phenomena, large-scale network integration, and neuromodulatory regulation. Formulated at an abstract computational level, DIME may also inform artificial intelligence and robotics by providing an architectural template in which representation, valuation, and temporal sequencing emerge from a unified mechanism. An extended theoretical exposition is available in a companion monograph on Zenodo.
In a context of growing agricultural demand and new challenges related to food security and accessibility, boosting agricultural productivity is more important than ever. Reducing the damage caused by invasive insect species is a crucial lever to achieve this objective. In support of these challenges, and in line with the principles of precision agriculture and Integrated Pest Management (IPM), an innovative simulation framework is presented, aiming to serve as a digital twin of a pest invasion. Through a flexible rule-based approach within the Agent-Based Modeling (ABM) paradigm, the framework supports the fine-tuning of the main ecological interactions of the pest with its crop host and the environment. Forecasting insect infestation in realistic scenarios, considering both spatial and temporal dimensions, is made possible by integrating heterogeneous data sources: pest biodata collected in the laboratory, environmental data from weather stations, and GIS data of a real crop field. In this study, an application to the global pest of soft fruit, the invasive fruit fly Drosophila suzukii, also known as Spotted Wing Drosophila (SWD), is presented.
Cryo-electron microscopy (cryo-EM) has emerged as a powerful technique for determining the three-dimensional structures of biological molecules at near-atomic resolution. However, reconstructing helical assemblies presents unique challenges due to their inherent symmetry and the need to determine unknown helical symmetry parameters. Traditional approaches require an accurate initial estimation of these parameters, which is often obtained through trial and error or prior knowledge. Errors in these initial estimates can lead to incorrect reconstructions, limiting the reliability of ab initio helical reconstruction. In this work, we present SHREC (Spectral Helical REConstruction), an algorithm that directly recovers the projection angles of helical segments from their two-dimensional cryo-EM images, without requiring prior knowledge of helical symmetry parameters. Our approach leverages the insight that projections of helical segments form a one-dimensional manifold, which can be recovered using spectral embedding techniques. Experimental validation on publicly available datasets demonstrates that SHREC achieves high-resolution reconstructions while accurately recovering helical parameters, requiring only knowledge of the specimen's axial symmetry group. By eliminating the need for initial symmetry estimates, SHREC offers a more robust and automated pathway for determining helical structures in cryo-EM.
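A toy illustration of the one-dimensional-manifold intuition (an assumed toy setup, not the SHREC algorithm itself): synthetic "projections" that vary smoothly with an unknown circular angle are passed to a spectral embedding, whose first two coordinates trace out the circle and expose the angular ordering.

```python
# Toy illustration of spectral recovery of a circular 1-D manifold.
# The periodic feature construction below is an assumption for the example.
import numpy as np
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(1)
n = 200
theta = np.sort(rng.uniform(0, 2 * np.pi, n))  # unknown "projection angles"

# Stand-in "projection images": smooth periodic features of theta plus noise
K = np.arange(1, 6)
X = np.hstack([np.cos(np.outer(theta, K)), np.sin(np.outer(theta, K))])
X += 0.05 * rng.normal(size=X.shape)

# Spectral embedding of a nearest-neighbor graph recovers the circular manifold
emb = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_transform(X)
angle_hat = np.unwrap(np.arctan2(emb[:, 1], emb[:, 0]))

# Since theta was sorted, the recovered angle should wind once around the circle
winding = abs(angle_hat[-1] - angle_hat[0])
print("recovered winding (rad):", round(winding, 2))
```

The recovered angles are determined only up to a global rotation and reflection of the circle, which is why the check looks at the total winding and the ordering rather than the raw values.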
As proposed by Hebb's theory, neural assemblies are groups of excitatory neurons that fire synchronously and exhibit high synaptic density, representing external stimuli and supporting cognitive functions such as language and decision-making. Recently, a model called Assembly Calculus (AC) was proposed, enabling the formation of artificial neural assemblies through the $k$-winners-take-all selection process and Hebbian learning. Although the model is capable of forming assemblies according to Hebb's theory, the adopted selection process does not incorporate essential aspects of biological neural computation, such as neural activity governed by statistical distributions consistent with power-law scaling. Given this limitation, the present work aimed to bring the model's dynamics closer to those observed in real cortical networks. To achieve this, a new selection mechanism inspired by the dynamics of gamma oscillation cycles, called E%-winners-take-all, was implemented, combined with an inhibition process based on the ratio between excitatory and inhibitory neurons observed in various regions of the cerebral cortex. The results obtained from our model (called E%-WTA model) were compared with those of the original model, and the analyses demonstrated that the introduced modifications allowed the network's own dynamics to determine the size of the formed assemblies. Furthermore, the recovery rate of these groups, through the evocation of the stimuli that generated them, became superior to that obtained in the original model.
Speech production and perception are the main ways humans communicate daily. Prior brain-to-text decoding studies have largely focused on a single modality and alphabetic languages. Here, we present a unified brain-to-sentence decoding framework for both speech production and perception in Mandarin Chinese. The framework exhibits strong generalization ability, enabling sentence-level decoding when trained only on single-character data and supporting characters and syllables unseen during training. In addition, it allows direct and controlled comparison of neural dynamics across modalities. Mandarin speech is decoded by first classifying syllable components in Hanyu Pinyin, namely initials and finals, from neural signals, followed by a post-trained large language model (LLM) that maps sequences of toneless Pinyin syllables to Chinese sentences. To enhance LLM decoding, we designed a three-stage post-training and two-stage inference framework based on a 7-billion-parameter LLM, achieving overall performance that exceeds larger commercial LLMs with hundreds of billions of parameters or more. In addition, several characteristics were observed in Mandarin speech production and perception: speech production involved neural responses across broader cortical regions than auditory perception; channels responsive to both modalities exhibited similar activity patterns, with speech perception showing a temporal delay relative to production; and decoding performance was broadly comparable across hemispheres. Our work not only establishes the feasibility of a unified decoding framework but also provides insights into the neural characteristics of Mandarin speech production and perception. These advances contribute to brain-to-text decoding in logosyllabic languages and pave the way toward neural language decoding systems supporting multiple modalities.
Objectively verifying the generative mechanism of consciousness is extremely difficult because of its subjective nature. As long as theories of consciousness focus solely on its generative mechanism, developing a theory remains challenging. We believe that broadening the theoretical scope and enhancing theoretical unification are necessary to establish a theory of consciousness. This study proposes seven questions that theories of consciousness should address: phenomena, self, causation, state, function, contents, and universality. The questions were designed to examine the functional aspects of consciousness and its applicability to system design. Next, we examine how our proposed Dual-Laws Model (DLM) can address these questions. Based on our theory, we anticipate two unique features of a conscious system: autonomy in constructing its own goals and cognitive decoupling from external stimuli. We contend that systems with these capabilities differ fundamentally from machines that merely follow human instructions. This makes a design theory that enables highly moral behavior indispensable.
The synchronized activity of neuronal populations can lead to pathological over-synchronization in conditions such as epilepsy and Parkinson's disease. Such states can be desynchronized by brief electrical pulses, but when the underlying oscillating system is not known, as in most practical applications, determining the specific times and intensities of the pulses used for desynchronization is a difficult inverse problem. Here we propose a desynchronization scheme for neuronal models of bi-variate neural activity, with possible applications in the medical setting. Our main argument is the existence of a peculiar point in the phase space of the system, the centroid, that is both easy to calculate and robust under changes in the coupling constant. This important target point can be used in a control procedure because it lies in the region of minimal return times of the system.
This study presents the development of the PsyCogMetrics AI Lab (this http URL), an integrated, cloud-based platform that operationalizes psychometric and cognitive-science methodologies for Large Language Model (LLM) evaluation. The work is framed as a three-cycle Action Design Science study: the Relevance Cycle identifies key limitations in current evaluation methods and unfulfilled stakeholder needs; the Rigor Cycle draws on kernel theories such as Popperian falsifiability, Classical Test Theory, and Cognitive Load Theory to derive deductive design objectives; and the Design Cycle operationalizes these objectives through nested Build-Intervene-Evaluate loops. The study contributes a novel IT artifact, a validated design for LLM evaluation, benefiting research at the intersection of AI, psychology, cognitive science, and the social and behavioral sciences.
Social media can reveal patient experiences with glucagon-like peptide-1 receptor agonists (GLP-1 RAs) that extend beyond clinical trial data. We analyzed 410,198 Reddit posts (May 2019-June 2025) mentioning semaglutide or tirzepatide. A total of 67,008 users self-reported using these medications, and 43.5% described at least one side effect. Gastrointestinal symptoms predominated, including nausea (36.9%), fatigue (16.7%), vomiting (16.3%), constipation (15.3%), and diarrhea (12.6%). Notably, reproductive symptoms (e.g., menstrual irregularities) and temperature-related complaints (e.g., chills, hot flashes) emerged as unrecognized potential effects. These findings highlight patient concerns not well captured in current labeling or trials. Large-scale social media analysis can complement traditional pharmacovigilance by detecting emerging safety signals and expanding understanding of the real-world safety profile of GLP-1 RAs.
Scientific discovery increasingly relies on AI systems to select candidates for expensive experimental validation, yet no principled, budget-aware evaluation framework exists for comparing selection strategies -- a gap intensified by large language models (LLMs), which generate plausible scientific proposals without reliable downstream evaluation. We introduce the Budget-Sensitive Discovery Score (BSDS), a formally verified metric -- 20 theorems machine-checked by the Lean 4 proof assistant -- that jointly penalizes false discoveries (lambda-weighted FDR) and excessive abstention (gamma-weighted coverage gap) at each budget level. Its budget-averaged form, the Discovery Quality Score (DQS), provides a single summary statistic that no proposer can inflate by performing well at a cherry-picked budget. As a case study, we apply BSDS/DQS to a single question: do LLMs add marginal value to an existing ML pipeline for drug-discovery candidate selection? We evaluate 39 proposers -- 11 mechanistic variants, 14 zero-shot LLM configurations, and 14 few-shot LLM configurations -- using SMILES representations on MoleculeNet HIV (41,127 compounds, 3.5% active, 1,000 bootstrap replicates) under both random and scaffold splits. Three findings emerge. First, the simple RF-based Greedy-ML proposer achieves the best DQS (-0.046), outperforming all MLP variants and LLM configurations. Second, no LLM surpasses the Greedy-ML baseline under zero-shot or few-shot evaluation on HIV or Tox21, establishing that LLMs provide no marginal value over an existing trained classifier. Third, the proposer hierarchy generalizes across five MoleculeNet benchmarks spanning 0.18%-46.2% prevalence, a non-drug AV safety domain, and a 9x7 grid of penalty parameters (tau >= 0.636, mean tau = 0.863). The framework applies to any setting where candidates are selected under budget constraints and asymmetric error costs.
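The abstract names the two penalty terms but not their exact combination, which lives in the paper's Lean development; the sketch below uses an assumed illustrative form (a simple weighted sum), and `bsds`, `dqs`, and the toy labels are hypothetical names for the example only.

```python
# Illustrative (assumed) form of a budget-sensitive discovery score: penalize
# false discoveries (lambda-weighted FDR) and abstention (gamma-weighted
# coverage gap), then average over budgets. Not the paper's verified definition.
import numpy as np

def bsds(selected, labels, budget, lam=1.0, gamma=1.0):
    """Score one proposal at one budget; 0 is best, more negative is worse."""
    selected = np.asarray(selected, dtype=int)
    fdr = 0.0 if len(selected) == 0 else 1.0 - labels[selected].mean()
    coverage_gap = max(0.0, (budget - len(selected)) / budget)
    return -(lam * fdr + gamma * coverage_gap)

def dqs(proposer, labels, budgets, **kw):
    """Budget-averaged score: a cherry-picked good budget cannot dominate."""
    return float(np.mean([bsds(proposer(b), labels, b, **kw) for b in budgets]))

labels = np.array([1, 0, 1, 1, 0, 0, 0, 1])   # toy ground-truth activity
oracle = lambda b: [0, 2, 3, 7][:b]           # selects only true actives
naive = lambda b: list(range(min(b, 2)))      # fills at most two slots

print(dqs(oracle, labels, budgets=[1, 2, 4]))
print(dqs(naive, labels, budgets=[1, 2, 4]))
```

Note how abstaining entirely still costs the full gamma penalty, so a proposer cannot game the score by refusing to propose at larger budgets.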
Collecting multiple types of data on the same set of subjects is common in modern scientific applications, including genomics, metabolomics, and neuroimaging. Joint and Individual Variation Explained (JIVE) seeks a low-rank approximation of the joint variation between two or more sets of features captured on common subjects and isolates this variation from that unique to each set of features. We develop an expectation-maximization (EM) algorithm, ProJIVE, to estimate a probabilistic model for the JIVE framework. The model extends probabilistic principal components analysis to multiple data sets. Our maximum likelihood approach simultaneously estimates joint and individual components, which can lead to greater accuracy compared to other methods. We apply ProJIVE to measures of brain morphometry and cognition in Alzheimer's disease. ProJIVE learns biologically meaningful modes of variation, and the joint morphometry and cognition subject scores are strongly related to more expensive existing biomarkers. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Code to reproduce the analysis is available on our GitHub page.
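Since the model extends probabilistic PCA, a hedged sketch of the single-data-set building block may help: the standard Tipping-Bishop EM updates for probabilistic PCA, which ProJIVE-style models extend with joint and individual components (this is the building block only, not the paper's estimator).

```python
# Minimal EM for probabilistic PCA (Tipping & Bishop) -- the single-data-set
# building block behind ProJIVE-style models. A sketch, not the paper's method.
import numpy as np

def ppca_em(X, q, n_iter=50, seed=0):
    """Fit x = W z + mu + eps, z ~ N(0, I_q), eps ~ N(0, sigma2 I_d) by EM.
    Returns W, sigma2, mu and the per-iteration marginal log-likelihoods."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W, sigma2 = rng.normal(size=(d, q)), 1.0
    lls = []
    for _ in range(n_iter):
        # E-step: posterior moments of the latent scores z_i
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ Minv                       # E[z_i], stacked as (n, q)
        Ezz = n * sigma2 * Minv + Ez.T @ Ez      # sum_i E[z_i z_i^T]
        # M-step: closed-form updates for loadings and noise variance
        W = Xc.T @ Ez @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc**2) - np.trace(Ezz @ W.T @ W)) / (n * d)
        # Marginal log-likelihood under C = W W^T + sigma2 I (never decreases)
        C = W @ W.T + sigma2 * np.eye(d)
        S = Xc.T @ Xc / n
        lls.append(-0.5 * n * (d * np.log(2 * np.pi)
                               + np.linalg.slogdet(C)[1]
                               + np.trace(np.linalg.solve(C, S))))
    return W, sigma2, mu, lls

# Synthetic check: 2 latent factors observed through 6 noisy variables
rng = np.random.default_rng(3)
Wtrue = rng.normal(size=(6, 2))
X = rng.normal(size=(300, 2)) @ Wtrue.T + 0.1 * rng.normal(size=(300, 6))
W, sigma2, mu, lls = ppca_em(X, q=2)
print("noise variance estimate:", round(sigma2, 4))  # true value is 0.01
```

The monotone log-likelihood is the standard EM sanity check; a multi-block extension would add shared joint scores across data sets alongside block-specific individual scores.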
A key challenge in enzyme annotation is identifying the biochemical reactions catalyzed by proteins. Most existing methods rely on Enzyme Commission (EC) numbers as intermediaries: they first predict an EC number and then retrieve the associated reactions. This indirect strategy introduces ambiguity due to the complex many-to-many mappings among proteins, EC numbers, and reactions, and is further complicated by frequent updates to EC numbers and inconsistencies across databases. To address these challenges, we present RXNRECer, a transformer-based ensemble framework that directly predicts enzyme-catalyzed reactions without relying on EC numbers. It integrates protein language modeling and active learning to capture both high-level sequence semantics and fine-grained transformation patterns. Evaluations on curated cross-validation and temporal test sets demonstrate consistent improvements over six EC-based baselines, with gains of 16.54% in F1 score and 15.43% in accuracy. Beyond accuracy gains, the framework offers clear advantages for downstream applications, including scalable proteome-wide reaction annotation, enhanced specificity in refining generic reaction schemas, systematic annotation of previously uncurated proteins, and reliable identification of enzyme promiscuity. By incorporating large language models, it also provides interpretable rationales for predictions. These capabilities make RXNRECer a robust and versatile solution for EC-free, fine-grained enzyme function prediction, with potential applications across enzyme research and industry.
Predicting the effects of chemical and genetic perturbations on quantitative cell states is a central challenge in computational biology, molecular medicine and drug discovery. Recent work has leveraged large-scale single-cell data and massive foundation models to address this task. However, such computational resources and extensive datasets are not always accessible in academic or clinical settings, limiting their utility. Here we propose a lightweight framework for perturbation effect prediction that exploits the structured nature of biological interventions and specific inductive biases/invariances. Our approach leverages available information concerning perturbation effects to allow generalization to novel contexts and requires only widely-available bulk molecular data. Extensive testing, comparing predictions of context-specific perturbation effects against real, large-scale interventional experiments, demonstrates accurate prediction in new contexts. The proposed approach is competitive with state-of-the-art foundation models but requires simpler data, much smaller model sizes and less time. Focusing on robust bulk signals and efficient architectures, we show that accurate prediction of perturbation effects is possible without proprietary hardware or very large models, hence opening up ways to leverage causal learning approaches in biomedicine generally.
Wildlife-vehicle collisions (WVC) threaten both biodiversity and human safety worldwide. Despite empirical efforts to characterize the major determinants of WVC risk and optimize mitigation strategies, we still lack a theoretical framework linking traffic, landscape, and individual movement features to collision risk. Here, we introduce such a framework by leveraging recent advances in movement ecology and reaction-diffusion stochastic processes with partially absorbing boundaries. Focusing on range-resident terrestrial mammals -- responsible for most fatal WVCs -- we model interactions with a single linear road and derive exact expressions for key survival statistics, including mean collision time and road-induced lifespan reduction. These quantities are expressed in terms of measurable parameters, such as traffic intensity or road width, and movement parameters that can be robustly estimated from relocation data, such as home-range crossing times, home-range sizes, or distance between home-range center and road. Therefore, our work provides an effective theoretical framework integrating movement and road ecology, laying the foundation for data-driven, evidence-based strategies to mitigate WVCs and promote safer, more sustainable transportation networks.
Predicting whether a molecule can cross the blood-brain barrier (BBB) is a key step in early-stage neuro-pharmaceutical design, directly influencing the efficiency and success rate of drug development. Traditional methods based on physicochemical properties are prone to systematic misjudgements due to their reliance on previous empirical evidence. Early machine learning (ML) models, although data-driven, often suffer from limited capacity, poor generalization, and insufficient interpretability. In recent years, more advanced models have become essential tools for predicting BBB permeability and guiding related drug design, owing to their ability to simulate molecular structures and capture complex biological mechanisms. This article systematically reviews the evolution of this field -- from deep neural networks to graph-based structural modelling -- highlighting the advantages of multi-task and multimodal learning strategies in identifying mechanism-related features. We further explore the emerging potential of generative models and causal inference methods for integrating permeability prediction with mechanism-aware drug design. ML-based BBB permeability prediction is now in a critical transition from mere discriminative classification toward mechanistic structure-function modelling. This paradigm shift provides a methodological progression and future roadmap for the integration of AI into neuropharmacological development.
Accurately characterizing higher-order interactions of brain regions and extracting interpretable organizational patterns from Functional Magnetic Resonance Imaging data is crucial for brain disease diagnosis. Current graph-based deep learning models primarily focus on pairwise or triadic patterns while neglecting signed higher-order interactions, limiting comprehensive understanding of brain-wide communication. We propose HOI-Brain, a novel computational framework leveraging signed higher-order interactions and organizational patterns in fMRI data for brain disease diagnosis. First, we introduce a co-fluctuation measure based on Multiplication of Temporal Derivatives to detect higher-order interactions with temporal resolution. We then distinguish positive and negative synergistic interactions, encoding them in signed weighted simplicial complexes to reveal brain communication insights. Using Persistent Homology theory, we apply two filtration processes to these complexes to extract signed higher-dimensional neural organizations spatiotemporally. Finally, we propose a multi-channel brain Transformer to integrate heterogeneous topological features. Experiments on Alzheimer's disease, Parkinson's syndrome, and autism spectrum disorder datasets demonstrate our framework's superiority, effectiveness, and interpretability. The identified key brain regions and higher-order patterns align with neuroscience literature, providing meaningful biological insights.
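A minimal sketch of one common Multiplication-of-Temporal-Derivatives formulation, with the positive/negative sign split the abstract describes; the smoothing window and the simplicial-complex construction of the full pipeline are omitted, and the synthetic coupling is an assumption for illustration.

```python
# Sketch of a Multiplication-of-Temporal-Derivatives co-fluctuation measure
# with signed separation; a simplified illustration, not the paper's pipeline.
import numpy as np

def mtd(ts):
    """ts: (T, n) region time series -> (T-1, n, n) signed co-fluctuations:
    products of z-scored first differences, resolved per time point."""
    dt = np.diff(ts, axis=0)                   # temporal derivatives, (T-1, n)
    dt = dt / dt.std(axis=0, ddof=1)           # normalize by derivative std
    return dt[:, :, None] * dt[:, None, :]     # outer product per time point

rng = np.random.default_rng(7)
T, n = 200, 4
ts = rng.normal(size=(T, n))
ts[:, 1] = ts[:, 0] + 0.3 * rng.normal(size=T)   # couples positively to region 0
ts[:, 2] = -ts[:, 0] + 0.3 * rng.normal(size=T)  # couples negatively to region 0

M = mtd(ts)
pos, neg = np.clip(M, 0, None), np.clip(-M, 0, None)  # signed interaction split
print("mean coupling 0-1:", round(M[:, 0, 1].mean(), 2))  # positive
print("mean coupling 0-2:", round(M[:, 0, 2].mean(), 2))  # negative
```

Keeping the time axis (rather than averaging it away) is what gives the measure the temporal resolution the abstract emphasizes, and the `pos`/`neg` tensors are the raw material a signed simplicial construction would consume.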
Recent experiments in neuroscience reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of neural population activity. These disentangled, or abstract, representations have been observed in multiple brain areas and across different species, and have been shown to support out-of-distribution generalization and rapid learning of novel tasks. The mechanisms by which these representations emerge remain poorly understood, especially in the case of supervised task behavior. Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These learned abstract representations reflect the semantics of the input stimuli. To show this, we reformulate the usual optimization over the network weights into a mean field optimization problem over the distribution of neural preactivations. We then apply this framework to finite-width ReLU networks and show that the hidden layer of these networks will exhibit an abstract representation at all global minima of the task objective. Finally, we extend our findings to two broad families of activation functions as well as deep feedforward architectures. Together, our results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks. In addition, the general framework that we develop here provides a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.
Public omics databases like the Gene Expression Omnibus and the Sequence Read Archive offer substantial opportunities for data reuse to address novel biomedical questions. However, it is still difficult to find samples and studies of interest since they are described by free-text metadata and lack standardized annotations. To address this issue, multiple research groups have undertaken curation efforts to add standardized annotations to large collections of these data, but these annotations are fragmented across online resources and are stored in different formats subject to varying standardization criteria, hindering the integration of annotations across sources. We developed MetaHQ to harmonize and distribute standardized metadata for public omics samples. MetaHQ comprises a database with nearly 200,000 annotations from 13 sources and a user-friendly command-line interface (CLI) to query the database and retrieve annotations. The MetaHQ CLI is deployed as a Python package on PyPI at this https URL that accesses the MetaHQ database available at this https URL. Project source code and documentation are available at this https URL.
The biomedical literature contains a vast collection of omics studies, yet most published data remain functionally inaccessible for computational reuse. When raw data are deposited in public repositories, essential information for reproducing reported results is dispersed across main text, supplementary files, and code repositories. In the rarer instances where intermediate data (e.g., protein abundance files) are made available, their location is inconsistent. In this article, we present an agentic framework that fetches omics-related articles and transforms the unstructured information into searchable research objects. Our system employs large language model (LLM) agents with access to tools for fetching omics studies, extracting article metadata, identifying and downloading published data, executing containerized quantification pipelines, and running analyses to address novel questions. We demonstrate automated metadata extraction from PubMed Central articles, achieving 80% precision for dataset identification from standard data repositories. Using model context protocol (MCP) servers to expose containerized analysis tools, our agents were able to identify a set of relevant articles, download the associated datasets, and re-quantify the proteomics data. The results had a 63% overlap in differentially expressed proteins when matching reported preprocessing methods. Furthermore, we show that agents can identify semantically similar studies, determine data compatibility, and perform cross-study comparisons, revealing consistent protein regulation patterns in liver fibrosis. This work establishes a foundation for converting the static biomedical literature into an executable, queryable resource that enables automated data reuse at scale.
We propose denoising diffusion variational inference (DDVI), a black-box variational inference algorithm for latent variable models which relies on diffusion models as flexible approximate posteriors. Specifically, our method introduces an expressive class of diffusion-based variational posteriors that perform iterative refinement in latent space; we train these posteriors with a novel regularized evidence lower bound (ELBO) on the marginal likelihood inspired by the wake-sleep algorithm. Our method is easy to implement (it fits a regularized extension of the ELBO), is compatible with black-box variational inference, and outperforms alternative classes of approximate posteriors based on normalizing flows or adversarial networks. We find that DDVI improves inference and learning in deep latent variable models across common benchmarks as well as on a motivating task in biology -- inferring latent ancestry from human genomes -- where it outperforms strong baselines on the Thousand Genomes dataset.
Consider a population that is expanding in two-dimensional space. Suppose we collect data from a sample of individuals taken at random either from the entire population, or from near the outer boundary of the population. A quantity of interest in population genetics is the site frequency spectrum, which is the number of mutations that appear on $k$ of the $n$ sampled individuals, for $k = 1, \dots, n-1$. As long as the mutation rate is constant, this number will be roughly proportional to the total length of all branches in the genealogical tree that are on the ancestral line of $k$ sampled individuals. While the rigorous literature has primarily focused on models without any spatial structure, in many natural settings, such as tumors or bacteria colonies, growth is dictated by spatial constraints. Many such two-dimensional growth models are expected to fall in the KPZ universality class. In this article we adopt the perspective that for population models in the KPZ universality class, the genealogical tree can be approximated by the tree formed by the infinite upward geodesics in the directed landscape, a universal scaling limit constructed in \cite{dov22}, starting from $n$ randomly chosen points. Relying on geodesic coalescence, we prove new asymptotic results for the lengths of the portions of these geodesics that are ancestral to $k$ of the $n$ sampled points and consequently obtain exponents driving the site frequency spectrum as predicted in \cite{fgkah16}. An important ingredient in the proof is a new tight estimate of the probability that three infinite upward geodesics stay disjoint up to time $t$, i.e., a sharp quantitative version of the well studied N3G problem, which is of independent interest.
This article frames the relation between biology and physics by characterizing the former as a subdiscipline rather than a special case of the latter. To do this, we posit biological physics as the science of living matter in contrast to classic biophysics, the study of organismal properties by physical techniques. At the scale of the individual cell, living matter is nonunitary, i.e., not composed of aggregated subunits, and has features (e.g., intracellular organizational arrangements and biomolecular condensates) that are unlike any materials of the nonliving world. In transiently or constitutively multicellular forms (social microorganisms, animals, plants), living matter sustains physical processes that are generic (shared with nonliving matter, e.g., subunit communication by molecular diffusion in cellular slime molds), biogeneric (analogous to nonliving matter but realized through cellular activities, e.g., subunit demixing in animal embryos) or nongeneric (pertaining to sui generis materials, e.g., budding of active solids in plants). This "forms of matter" perspective is philosophically situated in the dialectical materialism of Engels and Hessen and the multilevel physicalism of Neurath and the logical empiricists. We counterpose this view to informationism and to genetic and other hierarchically reductionist physical theories of biological systems and highlight open questions regarding incompletely characterized and enigmatic forms of living matter.