New articles on Quantitative Biology


[1] 2604.00036

When and Where: A Model Hippocampal Network Unifies Formation of Time Cells and Place Cells

Hippocampal place and time cells encode spatial and temporal aspects of experience. Both share the same neural substrate, but have been modeled as having different functions and mechanistic origins: place cells as continuous attractors, and time cells as leaky integrators. Here, we show that both types emerge from two dynamical regimes of a single recurrent neural network (RNN) modeling hippocampal CA3 as a predictive autoencoder. The network receives simulated, partially occluded "experience vectors" containing spatial patterns (location-specific activity sampled during environmental traversal) and/or temporal patterns (correlated activity pairs separated by "void" intervals), and is trained to reconstruct the missing input. During spatial navigation, the network generates stable attractor-like place fields. When trained on temporally structured inputs, however, the network produces sequentially broadened fields, recapitulating time cells. By varying spatio-temporal input patterning, we observe hidden units transition smoothly between time cell-like and place cell-like representations. These results suggest a shared origin for place and time cells, with their differences driven by task structure.
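The reconstruction task described above can be illustrated with a toy sketch (ours, not the authors' network): build occluded "experience vectors" with place-field-like bumps and fit a simple linear map by ridge regression to fill in the missing entries. The bump width, occlusion rate, and ridge penalty are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "experience vectors": Gaussian place-field bumps over 20 cells,
# one bump per sample, centered at a random location
n_pos, n_cells, n_samples = 20, 20, 500
centers = np.arange(n_cells)
positions = rng.integers(0, n_pos, n_samples)
X = np.exp(-0.5 * ((positions[:, None] - centers[None, :]) / 2.0) ** 2)

# Partial occlusion: zero out ~30% of cells in each sample
mask = rng.random(X.shape) < 0.3
X_occluded = np.where(mask, 0.0, X)

# Linear stand-in for the autoencoder: ridge regression mapping the
# occluded vector back to the full vector
lam = 1e-3
W = np.linalg.solve(X_occluded.T @ X_occluded + lam * np.eye(n_cells),
                    X_occluded.T @ X)
X_hat = X_occluded @ W

err_occluded = np.mean((X - X_occluded) ** 2)
err_model = np.mean((X - X_hat) ** 2)
print(err_model < err_occluded)  # reconstruction recovers occluded signal
```

Because neighboring cells carry correlated activity, even this linear map fills in most of the occluded entries; the paper's RNN adds recurrent dynamics on top of this basic predictive objective.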


[2] 2604.00058

GenoBERT: A Language Model for Accurate Genotype Imputation

Genotype imputation enables dense variant coverage for genome-wide association and risk-prediction studies, yet conventional reference-panel methods remain limited by ancestry bias and reduced rare-variant accuracy. We present Genotype Bidirectional Encoder Representations from Transformers (GenoBERT), a transformer-based, reference-free framework that tokenizes phased genotypes and uses a self-attention mechanism to capture both short- and long-range linkage disequilibrium (LD) dependencies. Benchmarking on two independent datasets, the Louisiana Osteoporosis Study (LOS) and the 1000 Genomes Project (1KGP), across ancestry groups and multiple genotype missingness levels (5-50%) shows that GenoBERT achieves the highest overall accuracy compared to four baseline methods (Beagle5.4, SCDA, BiU-Net, and STICI). At practical sparsity levels (up to 25% missing), GenoBERT attains high overall imputation accuracy ($r^2 \approx 0.98$) across datasets, and maintains robust performance ($r^2 > 0.90$) even at 50% missingness. Experimental results across different ancestries confirm consistent gains across datasets, with resilience to small sample sizes and weak LD. A 128-SNP (single-nucleotide polymorphism) context window (approximately 100 Kb) is validated through LD-decay analyses as sufficient to capture local correlation structures. By eliminating reference-panel dependence while preserving high accuracy, GenoBERT provides a scalable and robust solution for genotype imputation and a foundation for downstream genomic modeling.
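A minimal sketch of the tokenize-and-mask setup described above, with a hypothetical vocabulary (the paper's actual tokenization scheme and token ids are not specified in the abstract):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical vocabulary for phased biallelic genotypes at each SNP:
# the four phased states plus a [MASK] token (ids are illustrative)
VOCAB = {"0|0": 0, "0|1": 1, "1|0": 2, "1|1": 3, "[MASK]": 4}

def tokenize(genotypes):
    """Map a list of phased genotype strings to integer token ids."""
    return np.array([VOCAB[g] for g in genotypes])

def mask_tokens(tokens, missing_rate, rng):
    """Replace a fraction of tokens with [MASK], mimicking genotype
    missingness; the model is trained to predict the hidden tokens."""
    masked = tokens.copy()
    hide = rng.random(len(tokens)) < missing_rate
    masked[hide] = VOCAB["[MASK]"]
    return masked, hide

window = ["0|0", "0|1", "1|1", "0|0", "1|0", "1|1", "0|0", "0|0"]
tokens = tokenize(window)
masked, hide = mask_tokens(tokens, 0.25, rng)
print(tokens, masked)
```

In the full model, a 128-SNP window of such tokens would be fed through self-attention layers so that both nearby and distant LD partners inform each masked prediction.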


[3] 2604.00065

Genetic algorithms for multi-omic feature selection: a comparative study in cancer survival analysis

Multi-omic datasets offer opportunities for improved biomarker discovery in cancer research, but their high dimensionality and limited sample sizes make identifying compact and effective biomarker panels challenging. Feature selection in large-scale omics can be efficiently addressed by combining machine learning with genetic algorithms, which naturally support multi-objective optimization of predictive accuracy and biomarker set size. However, genetic algorithms remain relatively underexplored for multi-omic feature selection, where most approaches concatenate all layers into a single feature space. To address this limitation, we introduce Sweeping*, a multi-view, multi-objective algorithm alternating between single- and multi-view optimization. It employs a nested single-view multi-objective optimizer; for this study we use the genetic algorithm NSGA3-CHS. Sweeping* first identifies informative biomarkers within each layer, then jointly evaluates cross-layer interactions; these multi-omic solutions guide the next single-view search. Through repeated sweeps, the algorithm progressively identifies compact biomarker panels capturing cross-modal complementary signals. We benchmark five Sweeping* strategies, including hierarchical and concatenation-based variants, using survival prediction on three TCGA cohorts. Each strategy jointly optimizes predictive accuracy and set size, measured via the concordance index and root-leanness. Overall performance and estimation error are assessed through cross hypervolume and Pareto delta under 5-fold cross-validation. Our results show that Sweeping* can improve the accuracy-complexity trade-off when sufficient survival signal is present and that integrating omic layers can enhance survival prediction beyond clinical-only models, although benefits remain cohort-dependent.
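The joint objective of maximizing the concordance index while minimizing panel size can be illustrated with a minimal Pareto-front filter (a generic sketch, not the NSGA3-CHS machinery itself):

```python
def pareto_front(solutions):
    """Return the non-dominated (c_index, panel_size) pairs:
    maximize concordance index, minimize biomarker panel size."""
    front = []
    for c, s in solutions:
        dominated = any(
            (c2 >= c and s2 <= s) and (c2 > c or s2 < s)
            for c2, s2 in solutions
        )
        if not dominated:
            front.append((c, s))
    return sorted(front)

# Hypothetical candidate panels: (concordance index, number of biomarkers)
candidates = [(0.62, 40), (0.70, 12), (0.68, 5), (0.70, 20), (0.55, 3)]
print(pareto_front(candidates))  # [(0.55, 3), (0.68, 5), (0.70, 12)]
```

Each surviving pair represents a different accuracy-compactness trade-off; a genetic algorithm like NSGA3-CHS evolves a population toward exactly this kind of front rather than a single best panel.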


[4] 2604.00075

Large Language Models for Variant-Centric Functional Evidence Mining

Functional evidence is essential for clinical interpretation of genomic variants, but identifying relevant studies and translating experimental results into structured evidence remains labor intensive. We developed a benchmark based on ClinGen-curated annotations to evaluate two large language models (LLMs), a non-reasoning model (gpt-4o-mini) and a reasoning model (o4-mini), on tasks relevant to functional evidence curation: (1) abstract screening to determine whether a study reports functional experiments directly testing specific variants, and (2) full-text evidence extraction and classification from matched variant-paper pairs, including interpretation of evidence direction and generation of evidence summaries. Starting from ClinGen variants annotated with functional evidence, we processed curator comments with an LLM to extract PubMed identifiers, evidence labels, and narrative text, and retrieved titles, abstracts, and open-access PDFs to construct variant-paper pairs. In abstract screening, both models achieved high recall (0.88-0.90) with moderate specificity (0.59-0.65). For full-text evidence classification under an explicit variant-matching gate, o4-mini achieved 96% accuracy and higher specificity (0.83 vs. 0.37) while maintaining high F1 (0.98 vs. 0.96) compared with gpt-4o-mini. We also used an LLM-as-judge protocol to compare model-generated evidence summaries with expert curator comments. Finally, we developed AcmGENTIC, an end-to-end pipeline that expands variant identifiers, retrieves literature via LitVar2, filters abstracts with LLMs, acquires PDFs, performs multimodal evidence extraction, and generates evidence reports for curator review, with optional agentic parsing of figures and tables. Together, this benchmark and pipeline provide a practical framework for scaling functional evidence curation with human-in-the-loop LLM assistance.


[5] 2604.00153

Macroscopic Signatures of Gauge-Mediated Contagion: Deriving Behavioral Shielding from Stochastic Field Theory

We present a unified theoretical model relating stochastic microscopic epidemic dynamics to macroscopic non-linear population behavior. Utilizing the Doi-Peliti formalism, we model the pathogen as a gauge mediator field coupled to susceptible and infected host populations, and introduce a Reactive Immunity Field capable of spontaneous symmetry breaking. We demonstrate that the naive epidemic vacuum is destabilized by radiative loop corrections via the Coleman-Weinberg mechanism, generating a dynamic herd immunity threshold. By extracting the classical saddle-point limit of the Effective Action, we derive the macroscopic reaction-diffusion equations governing the host population. We show that integrating out the gauge mediator inherently generates a thermodynamic Free Energy dependent on the square of the susceptible density. This non-linearity produces a macroscopic spatial "Fear Drift" proportional to the magnitude of the immunity field, and a cubic shielding penalty in the effective reproductive number ($R_{eff}$). In this work, we establish a mapping between fundamental field-theoretic mechanisms and specific terms in the macroscopic behavioral equations. We demonstrate that Debye screening is physically executed by the spatial cross-diffusion fluxes driving host evacuation. Simultaneously, vacuum polarization manifests as a non-linear cubic penalty ($-S^3 I$) in the dressed reaction rate that dynamically suppresses the effective reproductive number. As a validation of our model, we apply the formalism to high-resolution spatiotemporal COVID-19 data from Germany.
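In generic epidemiological notation (ours, not the paper's), the quoted $-S^3 I$ penalty can be read schematically as a density-dependent correction to the dressed infection rate:

```latex
\partial_t I \;=\; \beta S I \;-\; \gamma I \;-\; \lambda\, S^3 I,
\qquad
R_{\mathrm{eff}} \;\sim\; \frac{\bigl(\beta - \lambda S^2\bigr)\,S}{\gamma},
```

so that the cubic term increasingly suppresses $R_{\mathrm{eff}}$ as the susceptible density grows; the paper's actual effective action and coefficients will differ in detail.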


[6] 2604.00349

Ultrasonic Brain Computer Interfaces for Enhancing Human-Machine Cognition

Low-intensity transcranial focused ultrasound (tFUS) is rapidly emerging as a transformative non-invasive brain stimulation (NIBS) modality characterized by high spatial resolution and the ability to target deep brain circuits. Unlike electromagnetic techniques such as transcranial magnetic stimulation and transcranial direct current stimulation, which are constrained by centimeter-scale resolution and a depth-focality tradeoff, tFUS leverages mechanical pressure waves to modulate both superficial cortical and deep subcortical structures with millimeter precision. This article discusses recent scientific observations and engineering breakthroughs in the advancement of tFUS for next-generation ultrasonic brain-computer interfaces (uBCIs) and human-machine interfaces. These advancements move beyond open-loop systems and demonstrate closed-loop architectures that incorporate real-time electrophysiological feedback to optimize cognitive variables such as attention, learning, trust, and cooperation in various applications. Other advances in the development of ultrasound sensors for sonomyography to decode muscle activation and functional ultrasound to monitor hemodynamic brain activity are discussed as potential elements in bidirectional uBCIs. Together, these advances position ultrasound as a foundational technology for the development of intelligent, adaptive, and bidirectional neural interfaces that will seamlessly integrate human cognition with next-generation automation and robotic systems.


[7] 2604.00393

How to Forage for a Mate?

Foraging is a central decision-making behavior performed by all animals, essential to garnering enough energy for an organism to survive. Similarly, mating is crucial for evolutionary continuity and offspring production. Mate choice is one of the central tenets of sexual selection, driving major evolutionary processes, and can be regarded as a decision-making process between potential mating partners. Researchers have often used coarse-grained models to describe macroscopic phenomenology pertaining to mate choice without detailed quantitative mechanisms of how animals use individual and environmental signals to guide their mating decisions. In this letter, we show that mate choice can be cast as a foraging problem, and we present an analytically tractable, optimal-foraging-inspired mechanistic theory of the decision-making underlying mate choice. We begin from the premise that deciding which partner to mate with is at its core a stochastic decision-making process. Agents adopt a variety of decision strategies, tuned by decision thresholds for leaving or committing to a mate. We find that sensitive leaving thresholds are favored independently of signal availability in the population. By contrast, optimal thresholds for committing to a mate depend upon signal availability in the population, with signal-rich populations generally favoring less eager strategies compared to signal-poor populations.
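The threshold picture described above can be sketched as a simple drift-diffusion process (our toy, not the paper's analytical model): accumulated evidence about a candidate mate drifts with signal strength, and the leave/commit thresholds are free parameters.

```python
import numpy as np

def mate_decision(drift, leave_thr, commit_thr, rng,
                  dt=0.01, sigma=1.0, t_max=100.0):
    """Toy drift-diffusion model of mate choice: evidence x drifts with
    signal strength `drift`; crossing commit_thr means committing to the
    mate, crossing -leave_thr means leaving to search elsewhere."""
    x, t = 0.0, 0.0
    while t < t_max:
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if x >= commit_thr:
            return "commit", t
        if x <= -leave_thr:
            return "leave", t
    return "undecided", t

rng = np.random.default_rng(42)
# A sensitive leaving threshold (1.0) with a less eager commit threshold (2.0)
outcomes = [mate_decision(1.0, 1.0, 2.0, rng)[0] for _ in range(200)]
print(outcomes.count("commit") / len(outcomes))
```

Varying `drift` mimics signal availability in the population: with weaker signals, the same thresholds yield more "leave" outcomes, which is the kind of strategy trade-off the paper analyzes.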


[8] 2604.00467

Stiff-FCS: Single-Cell Stiffness Profiling With Integrated Molecular and Functional Analysis

Cell stiffness is a key determinant of how cells deform, migrate, and adapt to mechanically restrictive environments, yet existing single-cell stiffness assays remain difficult to combine with molecular analysis and downstream functional studies. To address these limitations, we introduce a microfluidic platform, stiffness-based ferrohydrodynamic cell sorting (Stiff-FCS), designed for high-throughput quantification of single-cell stiffness, on-chip molecular analysis, and post-assay cell recovery. Stiff-FCS combines ferrofluid-driven actuation with graded confinement channels to control cell movement, induce deformation, and spatially separate cells based on stiffness. An inverse computational model converts cell position and morphology into quantitative Young's modulus values. We demonstrate stiffness profiling of hundreds to thousands of cells per chip within minutes, same-cell fluorescence-based protein analysis, and recovery of stiffness-defined cells for downstream assays. Across diverse human and mouse cell lines, Lamin A/C showed the most consistent association with stiffness, whereas softer cells exhibited greater migratory capacity than stiffer cells. In a series of human head and neck cancer cell models, Stiff-FCS further resolved a stiff, less migratory subpopulation enriched in a higher-molecular-weight Vimentin state, offering a workflow for linking single-cell stiffness to molecular heterogeneity and cell behavior.


[9] 2604.00602

The fitness landscape of overlapping genes

Natural genomes sometimes encode two different proteins in staggered reading frames of the same DNA sequence. Despite the prevalence of these 'overlapping genes' across the tree of life, it remains unknown whether arbitrary protein pairs can overlap, to what extent such overlaps are feasible, or what design principles govern them. Here, we study compatibility, frustration, and connectivity in the fitness landscape of overlapping genes. We computationally design sequences de novo that satisfy the dual functional constraints of two distinct protein families. The joint fitness landscape, inferred via Potts models from multiple sequence alignments, reveals a fundamental trade-off between the two proteins and provides a simple criterion for when overlap is feasible. We find widespread compatibility between protein families, with one class of reading frames markedly more permissive than others. By exploring alternative genetic codes, we find that the natural genetic code is uniquely well-suited to support overlapping genes. Constructing mutational paths between sequences, we find that sequence-diverse overlapped genes can be connected via a network of near-neutral mutations. Overall, our results suggest that protein fitness landscapes are sufficiently flexible to accommodate the stringent, orthogonal requirements of overlapping genes.


[10] 2604.00871

Analytical characterisation of the Mi- and To-phases in HeMiTo dynamics: exponential growth and logistic saturation of toxic prion-like proteins

Prion-like propagation of misfolded proteins is a key mechanism underlying the progression of neurodegenerative diseases such as Alzheimer's disease. In previous work, we introduced the HeMiTo framework, describing these prion-like dynamics for a class of heterodimer models in terms of three phases: the healthy (He), mixed (Mi), and toxic (To) phases. While the He-phase was characterised analytically, the Mi-phase was described numerically and the To-phase was inferred from linear stability arguments. In this work, we provide a complete analytical characterisation of the Mi- and To-phases for our class of heterodimer models. We derive exact inner solutions governing the Mi-phase and match them with outer solutions from the He-phase, explaining the concave-like behaviour of the healthy species and establishing explicit conditions for exponential growth of the toxic species with a mechanistically interpretable growth rate. Furthermore, we formalise a quasi steady-state reduction near the toxic steady state and show that the dynamics reduce to a logistic growth equation, linking exponential growth to saturation. Together, these results provide a unified and mechanistic description of prion-like dynamics across all phases of disease progression and establish a foundation for predictive modelling of biomarker trajectories.
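In generic notation (the paper's symbols may differ), the quasi steady-state reduction to logistic growth links the two regimes explicitly:

```latex
\frac{\mathrm{d}u}{\mathrm{d}t} = r\,u\left(1 - \frac{u}{K}\right)
\quad\Longrightarrow\quad
u(t) = \frac{K\,u_0\,e^{rt}}{K + u_0\left(e^{rt} - 1\right)},
```

which reduces to exponential growth $u(t) \approx u_0 e^{rt}$ while $u \ll K$ (the Mi-phase) and saturates at $u \to K$ (the To-phase), with $r$ the mechanistically interpretable growth rate and $K$ set by the toxic steady state.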


[11] 2604.01187

Competition at the front of expanding populations

When competing species grow into new territory, the population is dominated by descendants of successful ancestors at the expansion front. Successful ancestry depends both on reproductive advantage (fitness) and on the ability and opportunity to colonize new domains. We present a model that integrates both elements by coupling the classic description of one-dimensional competition (the Fisher equation) to the minimal model of front shape (the KPZ equation). Macroscopic manifestations of these equations are distinct growth morphologies controlled by expansion rates, competitive abilities, or spatial anisotropy. In some cases the ability to expand in space may overcome reproductive advantage in colonizing new territory. When new traits appear with accumulating mutations, we find that variations in fitness in range expansion may be described by the Tracy-Widom distribution.
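For reference, the two classical building blocks named above are, in standard notation:

```latex
\partial_t c = D\,\partial_x^2 c + r\,c\,(1 - c)
\quad \text{(Fisher)},
\qquad
\partial_t h = \nu\,\partial_x^2 h + \tfrac{\lambda}{2}\,(\partial_x h)^2 + \eta(x,t)
\quad \text{(KPZ)},
```

with $c$ a population fraction, $h$ the front height, and $\eta$ a noise term; the specific coupling between the two equations is the paper's contribution and is not spelled out in the abstract.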


[12] 2604.00243

UCell: rethinking generalizability and scaling of bio-medical vision models

The modern deep learning field is a scale-centric one. Larger models have been shown to consistently perform better than smaller models of similar architecture. In many sub-domains of biomedical research, however, model scaling is bottlenecked by the amount of available training data and the high cost associated with generating and validating additional high-quality data. Despite this practical hurdle, the majority of ongoing research still focuses on building bigger foundation models, whereas the alternative of improving the ability of small models has been under-explored. Here we experiment with building models with 10-30M parameters, tiny by modern standards, to perform the single-cell segmentation task. An important design choice is the incorporation of a recursive structure into the model's forward computation graph, leading to a more parameter-efficient architecture. We found that for single-cell segmentation, on multiple benchmarks, our small model, UCell, matches the performance of models 10-20 times its size, with similar generalizability to unseen out-of-domain data. More importantly, we found that UCell can be trained from scratch using only a set of microscopy imaging data, without relying on massive pretraining on natural images, which decouples model building from any external commercial interests. Finally, we examined and confirmed the adaptability of UCell by performing a wide range of one-shot and few-shot fine-tuning experiments on a diverse set of small datasets. Implementation is available at this https URL
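The recursive forward computation mentioned above can be sketched as weight tying (a generic illustration; UCell's actual architecture is not specified in the abstract): the same parameters are applied repeatedly, so effective depth grows without adding parameters.

```python
import numpy as np

def recursive_forward(x, W, n_iter=4):
    """Weight-tied recursion: the same weight matrix W is applied at every
    iteration, so the model behaves like a deeper network while the
    parameter count stays that of a single layer."""
    h = x
    for _ in range(n_iter):
        h = np.tanh(W @ h + x)  # re-inject the input at each iteration
    return h

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, (8, 8))   # one shared layer's parameters
x = rng.normal(size=8)
print(recursive_forward(x, W).shape)
```

Unrolling the loop to `n_iter` steps multiplies compute, not parameters, which is one way a 10-30M-parameter model can punch above its weight.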


[13] 2604.00246

Harmonization mitigates diffusion MRI scanner effects in infancy: insights from the HEALthy Brain and Childhood Development (HBCD) study

The HEALthy Brain and Childhood Development (HBCD) Study is an ongoing longitudinal initiative to understand population-level brain maturation; however, large-scale studies must overcome site-related variance while preserving biologically relevant signal. In addition to diffusion-weighted magnetic resonance images, the HBCD dataset offers analysis-ready derivatives for scientists to conduct their analyses, including scalar diffusion tensor imaging (DTI) metrics in a predetermined set of bundles. The purpose of this study is to characterize HBCD-specific site effects in diffusion MRI data, which have not been systematically reported. In this work, we investigate the sensitivity of HBCD bundle metrics to scanner model-related variance and address these variations with ComBat-GAM harmonization within the current HBCD data release 1.1 across six scanner models. Following ComBat-GAM, we observe no statistically significant differences between the distributions from any scanner models after FDR correction, and Cohen's f effect sizes are reduced across all metrics. Our work underscores the importance of rigorous harmonization efforts in large-scale studies, and we encourage future investigations of HBCD data to control for these effects.
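As a rough illustration of harmonization in this spirit, a location-scale adjustment removes per-scanner shifts in a bundle metric; the actual ComBat-GAM model additionally protects covariates such as age with smooth terms and empirical Bayes pooling, which this sketch omits.

```python
import numpy as np

def location_scale_harmonize(values, sites):
    """Simplified location-scale harmonization (ComBat-style in spirit only):
    standardize each site's distribution, then restore the pooled mean/SD."""
    values = np.asarray(values, dtype=float)
    sites = np.asarray(sites)
    pooled_mean, pooled_sd = values.mean(), values.std()
    out = np.empty_like(values)
    for s in np.unique(sites):
        idx = sites == s
        z = (values[idx] - values[idx].mean()) / values[idx].std()
        out[idx] = z * pooled_sd + pooled_mean
    return out

rng = np.random.default_rng(0)
site_a = rng.normal(0.45, 0.02, 200)   # hypothetical scanner A shifts FA up
site_b = rng.normal(0.40, 0.03, 200)   # hypothetical scanner B shifts FA down
vals = np.concatenate([site_a, site_b])
sites = np.array(["A"] * 200 + ["B"] * 200)
harmonized = location_scale_harmonize(vals, sites)
print(abs(harmonized[sites == "A"].mean() - harmonized[sites == "B"].mean()))
```

After adjustment, the scanner-level means coincide; ComBat-GAM does the analogous correction per metric and per bundle while modeling age nonlinearly, which matters in a rapidly maturing infant cohort.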


[14] 2604.00251

Evaluation of neuroCombat and deep learning harmonization for multi-site magnetic resonance neuroimaging in youth with prenatal alcohol exposure

In cases of prevalent diseases and disorders, such as Prenatal Alcohol Exposure (PAE), multi-site data collection allows for larger study samples. However, multi-site studies introduce additional variability through heterogeneous collection materials, such as scanners and acquisition protocols, which confound biologically relevant signals. Neuroscientists often apply statistical methods to image-derived metrics, such as volumes of regions of interest, after all image processing to minimize site-related variance. HACA3, a deep learning harmonization method, offers an opportunity to harmonize image signals prior to metric quantification; however, HACA3 has not yet been validated in a pediatric cohort. In this work, we investigate HACA3's ability to remove site-related variance and preserve biologically relevant signal relative to a statistical method, neuroCombat, and we additionally pair HACA3 processing with neuroCombat to evaluate the efficacy of combining multiple harmonization methods. Experiments use a pediatric (age 7 to 21) population with PAE cases and controls, scanned on three unique scanners, with downstream MaCRUISE volume metrics. We find that HACA3 qualitatively improves inter-site contrast variations, but statistical methods remove more site-related variance in the MaCRUISE volume metrics under an ANCOVA test, and HACA3 relies on follow-up statistical methods to approach maximal biological preservation in this context.


[15] 2604.00470

Contact-Dependent Ion Gating Explains Directional Asymmetry in the Bacterial Flagellar Motor

The bacterial flagellar motor (BFM) is a rotary molecular machine driven by the ion electrochemical potential across the cell membrane. Recent cryo-EM structures reveal a cogwheel-like architecture in which multiple stators engage a large rotor. A longstanding puzzle is the directional asymmetry of its torque-speed relation: concave in counterclockwise (CCW) rotation but nearly linear in clockwise (CW) rotation. Here, we develop a stochastic mechanochemical model that explicitly incorporates rotor-stator coupling and detailed ion translocation kinetics. By integrating physiological torque-speed data with recent measurements of rotor-stator relative motion, we show that under physiological conditions the motor operates in a tight engagement regime, rendering the torque-speed relation largely insensitive to the specific form of mechanical interactions. This finding rules out differences in rotor-stator mechanics as the origin of CW-CCW asymmetry. Guided by cryo-EM structures, we propose a contact-dependent gating mechanism in which the MotA-FliG interaction modulates the ion release rate of the MotB subunit proximal to the FliG ring. Molecular dynamics simulations indicate tighter MotA-FliG contact in the CW motor, implying a reduced ion release rate compared to CCW. Our model demonstrates that differential gating strength accounts for the observed asymmetry: stronger gating in CCW shortens torque-free waiting phases, enhances torque generation, and produces a concave torque-speed curve, whereas weaker gating in CW yields lower torque and a linear relation. This structure-based framework quantitatively links molecular asymmetry to motor function and identifies specific interfaces for targeted perturbation and mutational studies.


[16] 2604.00580

Representation choice shapes the interpretation of protein conformational dynamics

Molecular dynamics simulations provide detailed trajectories at the atomic level, but extracting interpretable and robust insights from these high-dimensional data remains challenging. In practice, analyses typically rely on a single representation. Here, we show that representation choice is not neutral: it fundamentally shapes the conformational organization, similarity relationships, and apparent transitions inferred from identical simulation data. To complement existing representations, we introduce Orientation features, a geometrically grounded, rotation-aware encoding of the protein backbone. We compare it against common representations across three dynamical regimes: fast-folding proteins, large-scale domain motions, and protein-protein association. Across these systems, we find that different representations emphasize complementary aspects of conformational space, and that no single representation provides a complete picture of the underlying dynamics. To facilitate systematic comparison, we developed ManiProt, a library for efficient computation and analysis of multiple protein representations. Our results motivate a comparative, representation-aware framework for the interpretation of molecular dynamics simulations.


[17] 2604.00756

Stochastic ordering tools for continuous-time Markov chains and applications to reaction network models

Stochastic reaction networks are mathematical models with a wide range of applications in biochemistry, ecology, and epidemiology, and are often complex to analyze. Except for some special cases, it is generally difficult to predict how the abundances of all considered species evolve over time. A possible approach to address this issue is to develop tools to compare the model under study with a similar one whose behavior is better understood. The main contribution of our work is to provide direct and computable conditions that can be used to ensure the existence of an ordered coupling between two stochastic reaction networks and to identify which parameter changes in a given model lead to an increase or decrease in the count of certain species. We also make available an algorithm that implements our theory, and we illustrate it with several applications.
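The notion of an ordered coupling can be illustrated on the simplest case, two birth-death chains with ordered birth rates (our toy construction, not the paper's general conditions): a shared random draw drives both chains each step, arranged so the dominated chain never overtakes the dominating one.

```python
import numpy as np

def coupled_birth_death(lam1, lam2, mu, x0, y0, n_steps, rng):
    """Monotone coupling of two birth-death chains with birth rates
    lam1 <= lam2, per-capita death rate mu, and initial counts x0 <= y0.
    A shared uniform draw decides both chains' jumps (uniformization-style),
    so the order x <= y is preserved at every step."""
    assert lam1 <= lam2 and x0 <= y0
    x, y = x0, y0
    xs, ys = [x], [y]
    for _ in range(n_steps):
        rate = lam2 + mu * y + 1.0       # dominating total rate (y >= x)
        u = rng.random() * rate
        if u < lam1:
            x += 1; y += 1               # both chains give birth
        elif u < lam2:
            y += 1                       # only the faster chain gives birth
        elif u < lam2 + mu * x:
            x -= 1; y -= 1               # coupled deaths keep x <= y
        elif u < lam2 + mu * y:
            y -= 1                       # only the larger chain loses one
        xs.append(x); ys.append(y)
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(3)
xs, ys = coupled_birth_death(1.0, 2.0, 0.1, 2, 5, 2000, rng)
print(bool(np.all(xs <= ys)))
```

The key design point is that every event favoring the smaller chain is nested inside the corresponding event for the larger chain, which is exactly the kind of ordered coupling the paper's computable conditions guarantee for general reaction networks.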


[18] 2604.00763

Non-ignorable fuzziness in granular counts: the case of RNA-seq data

RNA-seq count data are often affected by read-to-gene alignment ambiguity, especially in high-dimensional transcriptomics. This type of ambiguity can be conveniently expressed through granular counts, namely fuzzy-valued observations of latent discrete quantities. We study a class of fuzzy-reporting mechanisms and show that, when reporting exploits graded membership, ignorability fails generically, leading to a coarsening-not-at-random structure. A hierarchical model is then introduced as a tractable instance of this construction and illustrated using RNA-seq data.


[19] 2604.01018

A Bilevel Integer Programming Approach for the Synchronous Attractor Control Problem

Boolean networks are dynamical models of disease development in which the activation levels of genes are represented by binary variables. Given a Boolean network, controls represent mutations or medical treatments that fix the activation levels of selected genes so that all states in every attractor (i.e., long-term recurrent states) satisfy a desired phenotype. Our goal is to enumerate all minimal controls, identifying critical gene subsets in disease development and therapy. This problem has an inherent bilevel integer programming structure and is computationally challenging. We propose an infeasibility-based Benders decomposition, a logic-based Benders framework for bilevel integer programs with multiple subproblems. In our application, each subproblem finds a forbidden attractor of a given length and yields a problem-specific feasibility cut. We also propose an auxiliary IP called subspace separation that finds a Boolean subspace that includes multiple forbidden attractors and thereby strengthens the cut. Numerical experiments show that the resulting algorithms are much more scalable than state-of-the-art methods and that subspace separation substantially improves performance.
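For small networks, the attractors in question can be enumerated by brute force over the $2^n$ state space, which is exactly what becomes intractable at scale and motivates the bilevel IP approach (a toy sketch in our notation):

```python
from itertools import product

def attractors(update, n):
    """Enumerate all attractors of a synchronous Boolean network by
    iterating the update map from every state until a cycle is found."""
    found, seen_global = [], set()
    for state in product((0, 1), repeat=n):
        if state in seen_global:
            continue
        trajectory, seen = [], {}
        s = state
        while s not in seen:
            seen[s] = len(trajectory)
            trajectory.append(s)
            s = update(s)
        cycle = tuple(trajectory[seen[s]:])   # the recurrent part
        seen_global.update(trajectory)
        if set(cycle) not in [set(a) for a in found]:
            found.append(cycle)
    return found

# Toy 2-gene network: gene 0 copies gene 1, gene 1 negates gene 0
update = lambda s: (s[1], 1 - s[0])
print(attractors(update, 2))  # a single length-4 cyclic attractor
```

A control in the paper's sense would fix some coordinates of every state before applying `update`; the enumeration problem is then to find all minimal sets of fixed genes whose surviving attractors all satisfy the target phenotype.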


[20] 2604.01169

Bridging the Simulation-to-Experiment Gap with Generative Models using Adversarial Distribution Alignment

A fundamental challenge in science and engineering is the simulation-to-experiment gap. While we often possess prior knowledge of the governing physical laws, these laws can be too difficult to solve exactly for complex systems. Such systems are commonly modeled using simulators, which impose computational approximations. Meanwhile, experimental measurements more faithfully represent the real world, but experimental data typically consist of observations that only partially reflect the system's full underlying state. We propose a data-driven distribution alignment framework that bridges this simulation-to-experiment gap by pre-training a generative model on fully observed (but imperfect) simulation data, then aligning it with partial (but real) observations of experimental data. While our method is domain-agnostic, we ground our approach in the physical sciences by introducing Adversarial Distribution Alignment (ADA). This method aligns a generative model of atomic positions -- initially trained on a simulated Boltzmann distribution -- with the distribution of experimental observations. We prove that our method recovers the target observable distribution, even with multiple, potentially correlated observables. We also empirically validate our framework on synthetic, molecular, and experimental protein data, demonstrating that it can align generative models with diverse observables. Our code is available at this https URL.


[21] 2604.01182

Digital nanophotonic biosensing empowered by silicon Mie voids

Optical biosensors are indispensable in medical and environmental diagnostics, yet existing approaches are fundamentally limited in their sensitivity due to ensemble-averaged measurements. Digital biosensing has emerged as a promising solution for resolving individual binding events, thereby providing signals at very low analyte concentrations down to the single-molecule level. Here, we present a novel concept for digital optical biosensing empowered by dielectric Mie voids, combining nanoparticle-based contrast enhancement and deep learning for ultrasensitive biomarker detection. The resonantly trapped light in the air cavities of the periodic Mie void arrays ensures strong overlap between the near-fields and the single gold nanoparticles that are captured on the surface in the presence of the protein biomarker. Remarkably, this strong interaction creates high-contrast digital signals for the precise counting of single nanoparticles located both within and outside the voids, yielding efficient use of the entire sensor area for high sensitivity. We employ deep-ultraviolet (DUV) lithography for the scalable and low-cost production of Mie voids in silicon wafers and automated image analysis with a convolutional neural network for robust nanoparticle counting. As a proof of our concept, we demonstrate the detection of an important disease biomarker, interleukin-6 (IL-6), from small sample volumes at concentrations as low as 1.84 pg/ml, within the physiological range of healthy individuals. Owing to its scalability, precision, and adaptability, our digital nanophotonic biosensing approach based on silicon Mie voids establishes a versatile route for applications ranging from bioanalytics to health and environmental monitoring.


[22] 2503.19115

Implementation of Support Vector Machines using Reaction Networks

Can machine learning algorithms be implemented using chemistry? We demonstrate that this is possible in the case of support vector machines (SVMs). SVMs are powerful tools for data classification, leveraging Vapnik-Chervonenkis theory to handle high-dimensional data and small datasets effectively. In this work, we propose a chemical reaction network scheme for implementing SVMs, utilizing the steady-state behavior of reaction network dynamics to model key computational aspects of SVMs. This approach introduces a novel biochemical framework for implementing machine learning algorithms in non-traditional computational environments.
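The steady-state idea can be illustrated with a deliberately simple toy (not the paper's scheme): a species produced at a constant rate and degraded in proportion to its concentration relaxes to a steady state that holds an SVM-style linear score. Real chemical implementations must keep rates nonnegative, e.g. via dual-rail encodings of negative weights, which this sketch ignores.

```python
import numpy as np

def steady_state_score(w, b, x, k=1.0, dt=0.01, n_steps=5000):
    """Toy steady-state computation: a species y is produced at rate
    (w . x + b) and degraded at rate k*y, so dy/dt = (w . x + b) - k*y
    relaxes to y* = (w . x + b) / k, a linear SVM-style score."""
    y = 0.0
    drive = float(np.dot(w, x) + b)
    for _ in range(n_steps):       # forward-Euler integration of the ODE
        y += dt * (drive - k * y)
    return y

w, b = np.array([1.5, -2.0]), 0.5
x = np.array([2.0, 1.0])
score = steady_state_score(w, b, x)
print(round(score, 3))  # relaxes toward w . x + b = 1.5
```

Reading off the sign of the steady-state concentration then corresponds to the SVM's classification decision; the paper's reaction network scheme realizes this kind of computation, and the harder training-time aspects, with mass-action kinetics.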


[23] 2504.09614

Neural mechanisms of predictive processing: a collaborative community experiment through the OpenScope program

This review synthesizes advances in predictive processing within the sensory cortex. Predictive processing theorizes that the brain continuously predicts sensory inputs, refining neuronal responses by highlighting prediction errors. We identify key computational primitives, such as stimulus adaptation, dendritic computation, excitatory/inhibitory balance and hierarchical processing, as central to this framework. Our review highlights convergences, such as top-down inputs and inhibitory interneurons shaping mismatch signals, and divergences, including species-specific hierarchies and modality-dependent layer roles. To address these conflicts, we propose experiments in mice and primates using in-vivo two-photon imaging and electrophysiological recordings to test whether temporal, motor, and omission mismatch stimuli engage shared or distinct mechanisms. The resulting dataset, collected and shared via the OpenScope program, will enable model validation and community analysis, fostering iterative refinement and refutability to decode the neural circuits of predictive processing.


[24] 2507.06280

Direct Evidence of Apex-Hypha Interactions During Vegetative Growth of Fungal Thallus via Comprehensive Network and Trajectory Extraction

The mycelium of a filamentous fungus is a growing, branching network of numerous entangled hyphae exhibiting polarised apical growth. Expansion occurs during the vegetative phase from a single ascospore, driven by the need to explore and occupy the surrounding space, limiting competitors, enhancing nutrient uptake, and promoting spore dispersal. Radial, rapid, and rectilinear growth combined with frequent branching appears adaptive. However, passive growth without interactions or feedback may produce suboptimal networks, as neither local density nor potential connectivity is considered. Reorientations of the apex near existing hyphae suggest apex-hypha feedback. Yet, the diversity of behaviours, spontaneous fluctuations, and the limited number of apical trajectories studied so far leave open the question of active regulation. To investigate possible apex-hypha interactions, we analyse a dataset of Podospora anserina thallus growth by reconstructing all apical trajectories post-branching and fitting them with a classical Langevin model that incorporates potential interactions. Comparing trajectories of isolated and non-isolated hyphae allows us to identify a clear signature of interaction, composed of abrupt deceleration and reorientation. This work opens the path towards a systematic exploration of hyphal interactions.
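A toy Euler-Maruyama integration of a classical Langevin model for tip speed illustrates the comparison of isolated and non-isolated trajectories; the constant deceleration term stands in for an apex-hypha interaction, and every parameter name and value here is an assumption for illustration, not the paper's fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_speed(v0=1.0, gamma=0.5, sigma=0.1, v_eq=1.0,
                   decel=0.0, dt=0.01, steps=2000):
    """Euler-Maruyama for dv = gamma*(v_eq - v)*dt - decel*dt + sigma*dW.
    `decel` is a placeholder interaction term (zero for isolated hyphae)."""
    v = v0
    out = np.empty(steps)
    for t in range(steps):
        v += gamma * (v_eq - v) * dt - decel * dt \
             + sigma * np.sqrt(dt) * rng.normal()
        out[t] = v
    return out

isolated = simulate_speed(decel=0.0)    # relaxes around v_eq = 1.0
near_hypha = simulate_speed(decel=0.3)  # relaxes around v_eq - decel/gamma = 0.4
```

Comparing the two speed traces shows the deceleration signature the fit is designed to detect.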


[25] 2510.00512

Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction

The transcriptional response to genetic perturbation reveals fundamental insights into complex cellular systems. While current approaches have made progress in predicting genetic perturbation responses, they provide limited biological understanding and cannot systematically refine existing knowledge. Overcoming these limitations requires an end-to-end integration of data-driven learning and existing knowledge. However, this integration is challenging due to inconsistencies between data and knowledge bases, such as noise, misannotation, and incompleteness. To address this challenge, we propose ALIGNED (Adaptive aLignment for Inconsistent Genetic kNowledgE and Data), a neuro-symbolic framework based on the Abductive Learning (ABL) paradigm. This end-to-end framework aligns neural and symbolic components and performs systematic knowledge refinement. We introduce a balanced consistency metric to evaluate the predictions' consistency against both data and knowledge. Our results show that ALIGNED outperforms state-of-the-art methods by achieving the highest balanced consistency, while also re-discovering biologically meaningful knowledge. Our work advances beyond existing methods to enable both the transparency and the evolution of mechanistic biological understanding.
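The abstract does not give the formula for the balanced consistency metric; one natural, purely hypothetical "balanced" combination is the harmonic mean of data-side and knowledge-side consistency, which penalizes predictions that satisfy only one side:

```python
def balanced_consistency(data_acc, knowledge_sat):
    """Harmonic mean of data consistency (e.g. predictive accuracy) and
    knowledge consistency (e.g. fraction of symbolic rules satisfied).
    Hypothetical stand-in; the paper's actual metric may differ."""
    if data_acc + knowledge_sat == 0:
        return 0.0
    return 2 * data_acc * knowledge_sat / (data_acc + knowledge_sat)
```

A prediction that fits the data perfectly but violates all knowledge (or vice versa) scores zero under this combination.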


[26] 2510.16082

BIOGEN: Evidence-Grounded Multi-Agent Reasoning Framework for Transcriptomic Interpretation in Antimicrobial Resistance

Interpreting gene clusters derived from RNA sequencing (RNA-seq) remains a persistent challenge in functional genomics, particularly in antimicrobial resistance studies where mechanistic context is essential for downstream hypothesis generation. Conventional pathway enrichment methods summarize co-expressed modules using predefined functional categories, but they often provide limited coverage and do not yield cluster-specific mechanistic explanations grounded in primary literature. We present BIOGEN, an evidence-grounded multi-agent framework for post hoc interpretation of RNA-seq transcriptional modules that integrates biomedical retrieval, structured interpretation, and multi-critic verification. BIOGEN organizes knowledge from PubMed and UniProt into traceable cluster-level explanations with explicit evidence reporting and confidence tiering. On the primary Salmonella enterica dataset, BIOGEN achieved strong evidence grounding and biological coherence, with a BERTScore of 0.689, RAGAS Faithfulness of 0.930, Semantic Alignment Score of 0.715, and KEGG Functional Similarity of 0.342. All retrieval-grounded configurations maintained a hallucination rate of 0.000, compared with 0.100 for the LLM-only baseline. Across four additional bacterial RNA-seq datasets, BIOGEN preserved zero hallucinations and provided broader thematic coverage than KEGG/ORA-based enrichment. Comparative experiments with representative agentic AI baselines further show that retrieval access alone is insufficient to ensure traceable biological interpretation, highlighting the importance of coordinated evidence grounding and verification in biomedical reasoning.
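One plausible, hypothetical reading of the reported hallucination rate, assuming each generated claim is linked to retrieved evidence identifiers, is the fraction of claims with no supporting evidence:

```python
def hallucination_rate(claims, evidence_ids):
    """Fraction of generated claims lacking any linked evidence.
    `claims` is a list of claim keys; `evidence_ids` maps each claim
    to its supporting identifiers (e.g. PubMed IDs). Illustrative only;
    the paper's exact metric is not given in the abstract."""
    if not claims:
        return 0.0
    unsupported = sum(1 for c in claims if not evidence_ids.get(c))
    return unsupported / len(claims)
```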


[27] 2602.13913

Gauge-Mediated Contagion: A Quantum Electrodynamics-Inspired Framework for Non-Local Epidemic Dynamics and Superdiffusion

In this paper, we introduce a gauge-mediated Epidemiological Model inspired by Quantum Electrodynamics (QED). In this model, the ``direct contact'' paradigm of classical SIR models is replaced by a gauge-mediated interaction where the environment, represented by a pathogen field $\varphi$, plays a fundamental role in the epidemic dynamics. In this framework, the non-local characteristics of epidemics appear naturally by integrating out the pathogen field. Utilizing the Doi-Peliti formalism, we derive the effective action of the system and the standard Feynman rules that can be used to compute any observable perturbatively. The standard deterministic SIR equations emerge as the mean-field saddle-point approximation of this formalism. Going beyond this classical limit, we use 1-loop fluctuation computations to analytically derive spatial shielding effects that are inaccessible to standard compartmental models. Using standard QED techniques, we show how to relate the renormalized pathogen mass and Debye screening to epidemiological concepts, we compute the effective reproductive number $R_{eff}$ at first order, and we show how the condition for an epidemic is related to a phase transition in the pathogen mass. We show that superspreading hosts can be included easily in this formalism. We apply our model using high-resolution spatial data from the COVID-19 pandemic across 400 districts in Germany. Our analysis reveals that the gauge field provides an early warning signal, consistently anticipating surges in reported cases with a predictive lead time of approximately one week. Furthermore, the data analysis confirms a density-driven non-linear scaling in the correlation length. By linking out-of-equilibrium statistical physics to epidemiology, this model proves to be a predictive tool that anticipates outbreaks based on the structural instability of the network.
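The mean-field saddle-point limit mentioned above is the deterministic SIR system, whose epidemic threshold at R0 = β/γ = 1 is the classical counterpart of the phase transition in the pathogen mass. A minimal Euler integration (all parameter values are illustrative):

```python
def sir_peak(beta, gamma, s0=0.99, i0=0.01, dt=0.05, steps=4000):
    """Integrate the mean-field SIR equations
        ds/dt = -beta*s*i,  di/dt = beta*s*i - gamma*i
    by forward Euler and return the peak infected fraction."""
    s, i = s0, i0
    peak = i
    for _ in range(steps):
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        s += ds * dt
        i += di * dt
        peak = max(peak, i)
    return peak

no_outbreak = sir_peak(beta=0.5, gamma=1.0)  # R0 = 0.5: infections only decay
outbreak = sir_peak(beta=3.0, gamma=1.0)     # R0 = 3: a macroscopic epidemic
```

Below the threshold the infected fraction never exceeds its initial value; above it, a macroscopic outbreak develops.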


[28] 2603.29727

Latent-Y: A Lab-Validated Autonomous Agent for De Novo Drug Design

Drug discovery relies on iterative expert workflows that are slow to parallelize and difficult to scale. Here we introduce Latent-Y, an AI agent that autonomously executes complete antibody design campaigns from text prompts, covering literature review, target analysis, epitope identification, candidate design, computational validation, and selection of lab-ready sequences. Latent-Y is integrated into the Latent Labs Platform, where it operates in the same environment as drug-discovery experts with access to bioinformatics tools, biological databases, and scientific literature. The agent can run fully autonomously end-to-end, or collaboratively, where researchers review progress, provide feedback, and direct subsequent steps. Candidate antibodies are generated using Latent-X2, our frontier generative model for drug-like antibody design. We demonstrate the agent's capability across three distinct campaign types: epitope discovery guided by therapeutic specifications, cross-species binder design, and autonomous design from a scientific publication targeting human transferrin receptor for blood-brain barrier crossing. Across nine targets, Latent-Y produced lab-confirmed nanobody binders against six, achieving a 67% target-level success rate with binding affinities reaching the single-digit nanomolar range, without human filtering or intervention. In user studies, experts working with Latent-Y completed design campaigns 56 times faster than independent expert time estimates, compressing weeks of work into hours. Because Latent-X2 is a general-purpose atomic-level model for biologics design, the same agent architecture naturally extends to macrocyclic peptide and mini-binder design campaigns, broadening autonomous discovery across therapeutic modalities. Latent-Y is available to selected partners at this https URL.


[29] 2506.15471

Automatic computation of the glycemic index: data driven analysis of the glucose standard

The Glycemic Index (GI) is a tool for classifying carbohydrates based on their impact on postprandial glycemia, useful for diabetes prevention and management. This study applies a mathematical model for a data-driven simulation of the glycemic response following glucose ingestion. The analysis is performed on a dataset of 35 healthy subjects who underwent a standard 50 g oral glucose test. The results reveal a direct correlation between glucose response profiles and parameters describing glucose absorption, enabling the classification of subjects into three groups based on the timing of their glycemic peak: <30 min, 30-50 min, >50 min. These findings highlight the ability of a physiology-based mathematical model to capture inter-individual variability in postprandial glucose dynamics and represent a step toward simulation-based approaches for GI estimation.
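A minimal sketch of the peak-timing grouping, assuming the 30 min boundary belongs to the 30-50 min group (the abstract does not specify boundary handling) and using an invented glucose curve:

```python
import numpy as np

def peak_group(t_minutes, glucose):
    """Assign a subject to one of the study's three groups by the
    timing of the glycemic peak: <30 min, 30-50 min, >50 min."""
    t_peak = t_minutes[np.argmax(glucose)]
    if t_peak < 30:
        return "early"
    elif t_peak <= 50:
        return "intermediate"
    return "late"

# Illustrative sampling times (min) and glucose values (mg/dl), not study data
t = np.array([0, 15, 30, 45, 60, 90, 120])
g = np.array([90, 130, 160, 150, 130, 110, 95])
```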


[30] 2509.07013

Machine Generalize Learning in Agent-Based Models: Going Beyond Surrogate Models for Calibration in ABMs

Calibrating agent-based epidemic models is computationally demanding. We present a supervised machine learning calibrator that learns the inverse mapping from epidemic time series to SIR parameters. A three-layer bidirectional LSTM ingests 60-day incidence together with population size and recovery rate, and outputs transmission probability, contact rate, and R0. Training uses a composite loss with an epidemiology-motivated consistency penalty that encourages R0 \* recovery rate to equal transmission probability \* contact rate. In a 1000-scenario simulation study, we compare the calibrator with Approximate Bayesian Computation (likelihood-free MCMC). The method achieves lower error across all targets (MAE: R0 0.0616 vs 0.275; transmission 0.0715 vs 0.128; contact 1.02 vs 4.24), produces tighter predictive intervals with near nominal coverage, and reduces wall clock time from 77.4 s to 2.35 s per calibration. Although contact rate and transmission probability are partially nonidentifiable, the approach reproduces epidemic curves more faithfully than ABC, enabling fast and practical calibration. We evaluate it on SIR agent based epidemics generated with epiworldR and provide an implementation in R.