New articles on Quantitative Biology


[1] 2510.06230

Robust Federated Anomaly Detection Using Dual-Signal Autoencoders: Application to Kidney Stone Identification in Ureteroscopy

This work introduces Federated Adaptive Gain via Dual Signal Trust (FedAgain), a novel federated learning algorithm designed to enhance anomaly detection in medical imaging under decentralized and heterogeneous conditions. Focusing on the task of kidney stone classification, FedAgain addresses the common challenge of corrupted or low-quality client data in real-world clinical environments by implementing a dual-signal trust mechanism based on reconstruction error and model divergence. This mechanism enables the central server to dynamically down-weight updates from untrustworthy clients without accessing their raw data, thereby preserving both model integrity and data privacy. FedAgain employs deep convolutional autoencoders trained on two diverse kidney stone datasets and is evaluated on 16 types of endoscopy-specific corruption at five severity levels. Extensive experiments demonstrate that FedAgain effectively suppresses "expert forger" clients, enhances robustness to image corruptions, and offers a privacy-preserving solution for collaborative medical anomaly detection. Compared to traditional FedAvg, FedAgain achieves clear improvements across all 16 corruption types, with precision gains of up to +14.49% and F1 score improvements of up to +10.20%, highlighting its robustness and effectiveness in challenging imaging scenarios.
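The abstract does not give the aggregation rule, so the following is only a minimal sketch of what a dual-signal trust step could look like: two per-client suspicion signals (reconstruction error and divergence from the global model) are turned into aggregation weights. The inverse-score softmax and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def trust_weights(recon_errors, divergences, temperature=1.0):
    """Combine two per-client suspicion signals into aggregation weights.

    recon_errors: validation reconstruction error per client update.
    divergences:  distance of each client update from the current global model.
    Higher values on either signal lower a client's weight.
    """
    def norm(x):
        # Normalize each signal to [0, 1] so the two are comparable.
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    suspicion = norm(recon_errors) + norm(divergences)
    # Softmax over negative suspicion: trusted clients get higher weight.
    w = np.exp(-suspicion / temperature)
    return w / w.sum()

def aggregate(client_updates, weights):
    """Trust-weighted FedAvg-style aggregation of flattened parameter vectors."""
    stacked = np.stack(client_updates)            # (n_clients, n_params)
    return (weights[:, None] * stacked).sum(axis=0)

# Toy usage: client 2 behaves like an "expert forger" and is down-weighted.
updates = [np.ones(4), np.ones(4) * 1.1, np.ones(4) * 5.0]
w = trust_weights([0.20, 0.25, 0.90], [0.10, 0.12, 2.30])
global_update = aggregate(updates, w)
```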


[2] 2510.06232

Neu-RadBERT for Enhanced Diagnosis of Brain Injuries and Conditions

Objective: We sought to develop a classification algorithm to extract diagnoses from free-text radiology reports of brain imaging performed in patients with acute respiratory failure (ARF) undergoing invasive mechanical ventilation. Methods: We developed and fine-tuned Neu-RadBERT, a BERT-based model, to classify unstructured radiology reports. We extracted all brain imaging reports (computed tomography and magnetic resonance imaging) performed in patients with ARF from the MIMIC-IV database. Initial manual labelling was performed on a subset of reports for various brain abnormalities, followed by fine-tuning Neu-RadBERT using three strategies: 1) baseline RadBERT, 2) Neu-RadBERT with Masked Language Modeling (MLM) pretraining, and 3) Neu-RadBERT with MLM pretraining and oversampling to address data skewness. We compared the performance of this model to Llama-2-13B, an autoregressive LLM. Results: The Neu-RadBERT model, particularly with oversampling, demonstrated significant improvements in diagnostic accuracy compared to baseline RadBERT for brain abnormalities, achieving up to 98.0% accuracy for acute brain injuries. Llama-2-13B exhibited relatively lower performance, peaking at 67.5% binary classification accuracy. This result highlights potential limitations of current autoregressive LLMs for this specific classification task, though it remains possible that larger models or further fine-tuning could improve performance. Conclusion: Neu-RadBERT, enhanced through target-domain pretraining and oversampling techniques, offers a robust tool for accurate and reliable diagnosis of neurological conditions from radiology reports. This study underscores the potential of transformer-based NLP models for automatically extracting diagnoses from free-text reports, with potential applications to both research and patient care.
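As a sketch of the oversampling idea in strategy 3, the snippet below duplicates reports from rare diagnosis classes until labels are roughly balanced before fine-tuning. The balancing rule (resample with replacement up to the majority count) and all names are illustrative assumptions.

```python
import random
from collections import defaultdict

def oversample(reports, labels, seed=0):
    """Return a label-balanced copy of (reports, labels) by duplicating minority items."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for report, label in zip(reports, labels):
        by_label[label].append(report)
    target = max(len(v) for v in by_label.values())
    out_reports, out_labels = [], []
    for label, items in by_label.items():
        # Keep all originals, then sample with replacement up to the majority count.
        resampled = items + [rng.choice(items) for _ in range(target - len(items))]
        out_reports.extend(resampled)
        out_labels.extend([label] * target)
    return out_reports, out_labels
```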


[3] 2510.06252

Dream2Image : An Open Multimodal EEG Dataset for Decoding and Visualizing Dreams with Artificial Intelligence

Dream2Image is the world's first dataset combining EEG signals, dream transcriptions, and AI-generated images. Based on 38 participants and more than 31 hours of dream EEG recordings, it contains 129 samples offering: the final seconds of brain activity preceding awakening (T-15, T-30, T-60, T-120), raw reports of dream experiences, and an approximate visual reconstruction of the dream. This dataset offers a unique resource for dream research: for studying the neural correlates of dreaming, developing models that decode dreams from brain activity, and exploring new approaches in neuroscience, psychology, and artificial intelligence. Available in open access on Hugging Face and GitHub, Dream2Image is a multimodal resource designed to support research at the interface of artificial intelligence and neuroscience, and to inspire researchers to extend current approaches to brain activity decoding. Limitations include the relatively small sample size and the variability of dream recall, which may affect generalizability.
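Since the dataset is hosted on Hugging Face, access presumably follows the standard `datasets` pattern. The repository id below is a placeholder assumption (the abstract does not give it), and the field names merely reflect the modalities listed above.

```python
from datasets import load_dataset

# Placeholder repo id: the abstract does not state the actual Hugging Face path.
ds = load_dataset("dream2image/dream2image")
sample = ds["train"][0]
# Expected per-sample modalities (per the abstract): pre-awakening EEG windows
# (T-15 / T-30 / T-60 / T-120), a raw dream report, and an AI-generated image.
print(sample.keys())
```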


[4] 2510.06290

Soft-Evidence Fused Graph Neural Network for Cancer Driver Gene Identification across Multi-View Biological Graphs

Identifying cancer driver genes (CDGs) is essential for understanding cancer mechanisms and developing targeted therapies. Graph neural networks (GNNs) have recently been employed to identify CDGs by capturing patterns in biological interaction networks. However, most GNN-based approaches rely on a single protein-protein interaction (PPI) network, ignoring complementary information from other biological networks. Some studies integrate multiple networks by aligning features with consistency constraints to learn unified gene representations for CDG identification. However, such representation-level fusion often assumes congruent gene relationships across networks, which may overlook network heterogeneity and introduce conflicting information. To address this, we propose Soft-Evidence Fusion Graph Neural Network (SEFGNN), a novel framework for CDG identification across multiple networks at the decision level. Instead of enforcing feature-level consistency, SEFGNN treats each biological network as an independent evidence source and performs uncertainty-aware fusion at the decision level using Dempster-Shafer Theory (DST). To alleviate the risk of overconfidence from DST, we further introduce a Soft Evidence Smoothing (SES) module that improves ranking stability while preserving discriminative performance. Experiments on three cancer datasets show that SEFGNN consistently outperforms state-of-the-art baselines and exhibits strong potential in discovering novel CDGs.
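For readers unfamiliar with decision-level fusion under Dempster-Shafer Theory, the sketch below shows the standard combination rule over the frame {driver, non-driver}, the primitive SEFGNN builds on; the mass assignments are illustrative and the paper's Soft Evidence Smoothing module is not reproduced here.

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over frozensets of {'driver','nondriver'}."""
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb            # mass assigned to disjoint hypotheses
    # Normalize by the non-conflicting mass (1 - K).
    return {k: v / (1.0 - conflict) for k, v in combined.items()}, conflict

# Two biological networks both lean "driver", with different uncertainty
# (mass left on the full frame theta encodes "don't know").
theta = frozenset({"driver", "nondriver"})
net1 = {frozenset({"driver"}): 0.6, frozenset({"nondriver"}): 0.1, theta: 0.3}
net2 = {frozenset({"driver"}): 0.5, frozenset({"nondriver"}): 0.2, theta: 0.3}
fused, K = dempster_combine(net1, net2)   # fused belief sharpens; K measures conflict
```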


[5] 2510.06297

Consciousness As Entropy Reduction (Short Version)

We propose a model of consciousness, Consciousness as Entropy Reduction (CER), which has a logical basis and lends itself to simulation with a simple mathematical model. The approach is inspired by previous models such as GWT, IIT, and an earlier, less mainstream psychological model called the "Feature Map". CER treats the contents of consciousness and subconsciousness as scenarios: vectors of patterns (or features) on various "channels" (or feature locations). In CER, a feature map itself is not consciousness but only the input scenario into a world of possible subconscious scenarios, from which the conscious scenario (i.e., the conscious experience) is chosen. Essentially, CER creates an internal simulation of the outside world. Solving problems internally in simulation, as a "thought experiment", is clearly more economical than experimenting in a real environment, supports adaptability, and hence confers a major evolutionary advantage. CER also has connections with the Hopfield model in artificial neural networks.


[6] 2510.06344

Retrieving the structure of probabilistic sequences from EEG data during the goalkeeper game

This work draws on the conjecture that fingerprints of stochastic event sequences can be retrieved from electroencephalographic (EEG) data recorded during a behavioral task. To test this, we used the Goalkeeper Game (this http URL). Acting as a goalkeeper, the participant predicted each kick in a probabilistic sequence while EEG activity was recorded. On each trial, driven by a context tree, the kicker chose one of three options: left, center, or right. The goalkeeper then predicted the next kick by pressing a button. Tree estimation was performed by applying the Context Algorithm to EEG segments locked to the button press (-300 to 0 ms). We calculated the distance between the kicker's tree and the trees retrieved per participant and electrode, and correlated this metric with the goalkeepers' success rates. We observed a clear reduction in the overall distance distribution over time for a subset of electrodes, indicating that EEG dependencies become more congruent with the kicker's tree as the goalkeeper learns the sequence. This distance is inversely related to the goalkeepers' success rates, indicating a clear relationship between performance and the neural signatures associated with the sequence structure.


[7] 2510.06361

Diffusion-Guided Renormalization of Neural Systems via Tensor Networks

Far from equilibrium, neural systems self-organize across multiple scales. Exploiting multiscale self-organization in neuroscience and artificial intelligence requires a computational framework for modeling the effective non-equilibrium dynamics of stochastic neural trajectories. Non-equilibrium thermodynamics and representational geometry offer theoretical foundations, but we need scalable data-driven techniques for modeling collective properties of high-dimensional neural networks from partial subsampled observations. Renormalization is a coarse-graining technique central to studying emergent scaling properties of many-body and nonlinear dynamical systems. While widely applied in physics and machine learning, coarse-graining complex dynamical networks remains unsolved, affecting many computational sciences. Recent diffusion-based renormalization, inspired by quantum statistical mechanics, coarse-grains networks near entropy transitions marked by maximal changes in specific heat or information transmission. Here I explore diffusion-based renormalization of neural systems by generating symmetry-breaking representations across scales and offering scalable algorithms using tensor networks. Diffusion-guided renormalization bridges microscale and mesoscale dynamics of dissipative neural systems. For microscales, I developed a scalable graph inference algorithm for discovering community structure from subsampled neural activity. Using community-based node orderings, diffusion-guided renormalization generates renormalization group flow through metagraphs and joint probability functions. Towards mesoscales, diffusion-guided renormalization targets learning the effective non-equilibrium dynamics of dissipative neural trajectories occupying lower-dimensional subspaces, enabling coarse-to-fine control in systems neuroscience and artificial intelligence.
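The microscale step described above (community structure from subsampled activity, then a metagraph for the renormalization flow) can be illustrated with off-the-shelf graph routines; the library calls below are a stand-in for the author's scalable inference algorithm, and the threshold and sizes are arbitrary assumptions.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(1)
activity = rng.normal(size=(200, 60))        # time points x (subsampled) neurons

# Functional graph from thresholded correlations between neurons.
C = np.corrcoef(activity.T)
A = ((np.abs(C) > 0.3) & ~np.eye(C.shape[0], dtype=bool)).astype(int)
G = nx.from_numpy_array(A)

# Community detection -> community-based coarse-graining into a metagraph,
# one step of a renormalization-group-like flow over network scales.
communities = list(greedy_modularity_communities(G))
meta = nx.quotient_graph(G, [set(c) for c in communities], relabel=True)
print(len(communities), "communities;", meta.number_of_edges(), "meta-edges")
```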


[8] 2510.06554

UniOTalign: A Global Matching Framework for Protein Alignment via Optimal Transport

Protein sequence alignment is a cornerstone of bioinformatics, traditionally approached using dynamic programming (DP) algorithms that find an optimal sequential path. This paper introduces UniOTalign, a novel framework that recasts alignment from a fundamentally different perspective: global matching via Optimal Transport (OT). Instead of finding a path, UniOTalign computes an optimal flow or transport plan between two proteins, which are represented as distributions of residues in a high-dimensional feature space. We leverage pre-trained Protein Language Models (PLMs) to generate rich, context-aware embeddings for each residue. The core of our method is the Fused Unbalanced Gromov-Wasserstein (FUGW) distance, which finds a correspondence that simultaneously minimizes feature dissimilarity and preserves the internal geometric structure of the sequences. This approach naturally handles sequences of different lengths and is particularly powerful for aligning proteins with nonsequential similarities, such as domain shuffling or circular permutations, which are challenging for traditional DP methods. UniOTalign therefore offers a new, mathematically principled, global matching paradigm for protein alignment, moving beyond the limitations of path-finding algorithms.
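As a rough sketch of the global-matching idea, the snippet below computes a fused Gromov-Wasserstein plan with the POT library over random residue embeddings standing in for PLM features. Note the assumptions: this uses POT's balanced FGW solver, whereas the paper's FUGW additionally relaxes the marginal constraints, and the alpha value is arbitrary.

```python
import numpy as np
import ot

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(50, 32)), rng.normal(size=(60, 32))  # residue embeddings

M = ot.dist(X, Y)                      # feature dissimilarity (cost matrix)
C1, C2 = ot.dist(X, X), ot.dist(Y, Y)  # intra-protein geometric structure
p, q = ot.unif(X.shape[0]), ot.unif(Y.shape[0])  # uniform mass on residues

# alpha trades off feature cost (alpha -> 0) vs. structure preservation (alpha -> 1).
T = ot.gromov.fused_gromov_wasserstein(M, C1, C2, p, q,
                                       loss_fun="square_loss", alpha=0.5)
# T[i, j] is the mass matched between residue i of protein 1 and residue j of
# protein 2; unlike a DP path, it can encode non-sequential correspondences
# such as domain shuffling or circular permutations.
```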


[9] 2510.06578

Novel point cloud registration approach for noninvasive patient specific estimation of leaflet strain from 3D images of heart valves

Valvular heart disease is prevalent and a major contributor to heart failure. Valve leaflet strain is a promising metric for evaluating the mechanics underlying the initiation and progression of valvular pathology. However, robust and generalizable methods for noninvasively quantifying valvular strain from clinically acquired patient images remain limited. In this work, we present a novel feature-tracking framework for quantifying leaflet strain in atrioventricular valves using 3D echocardiographic images of pediatric and adult patients. Our method demonstrated superior accuracy in the assessment of anatomical deformation and strain of heart valves compared to other point-based approaches, as verified against a finite element benchmark. Further, our approach can robustly track inter-phase deformation of valves across highly variable morphologies without parameter tuning. Our analysis revealed that a median and interquartile range of the 1st principal strain greater than 0.5 is associated with leaflet billow (prolapse). Further investigation of the biomechanical signatures of heart valve disease has the potential to enhance prognostic assessment and longitudinal evaluation of valvular disease.
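For concreteness, here is the strain quantity the finding above refers to: the first (largest) principal Green-Lagrange strain computed from a deformation gradient. This is standard continuum mechanics, not the authors' feature-tracking pipeline; the example deformation is made up.

```python
import numpy as np

def first_principal_strain(F):
    """F: 3x3 deformation gradient mapping the reference to the deformed leaflet."""
    E = 0.5 * (F.T @ F - np.eye(3))     # Green-Lagrange strain tensor
    return np.linalg.eigvalsh(E)[-1]    # largest eigenvalue = 1st principal strain

# Illustrative 30% stretch along one in-plane direction with thickness thinning.
F = np.diag([1.30, 1.00, 0.77])
print(first_principal_strain(F))        # ~0.345; values > 0.5 flagged billow above
```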


[10] 2510.06671

Utilizing Information Theoretic Approach to Study Cochlear Neural Degeneration

Hidden hearing loss, or cochlear neural degeneration (CND), disrupts suprathreshold auditory coding without affecting clinical thresholds, making it difficult to diagnose. We present an information-theoretic framework to evaluate speech stimuli that maximally reveal CND by quantifying mutual information (MI) loss both between inner hair cell (IHC) receptor potentials and auditory nerve fiber (ANF) responses, and between the acoustic input and ANF responses. Using a phenomenological auditory model, we simulated responses to 50 CVC words under clean, time-compressed, reverberant, and combined conditions across different presentation levels, with systematically varied survival of low-, medium-, and high-spontaneous-rate fibers. MI was computed channel-wise between IHC and ANF responses and integrated across characteristic frequencies. Information loss was defined relative to a normal-hearing baseline. Results demonstrate progressive MI loss with increasing CND, most pronounced for time-compressed speech, while reverberation produced comparatively smaller effects. These findings identify rapid, temporally dense speech as the optimal probe for CND, informing the design of objective clinical diagnostics while revealing problems associated with using reverberation as a probe.
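The framework's core quantity is channel-wise MI between simulated responses; a minimal plug-in estimator is sketched below. The binning is an assumption (the paper does not specify its estimator), and the signals here are generic 1-D response vectors.

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Histogram (plug-in) MI estimate in bits between two 1-D response signals."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum())

# Information loss, as defined in the abstract, is then the drop relative to a
# normal-hearing baseline, integrated across characteristic-frequency channels:
# loss = sum_over_CF( MI_baseline - MI_degraded ).
```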


[11] 2510.06781

The Epigenetic Tapestry: A Review of DNA Methylation and Non-Coding RNA's Interplay with Genetic Threads, Weaving a Network Impacting Gene Expression and Disease Manifestations

The emerging field of epigenetics has recently unveiled a dynamic landscape in which gene expression is not determined solely by genetic sequences but also by intricate regulatory mechanisms. This review examines the interactions between these regulatory mechanisms, including DNA methylation and non-coding RNAs (ncRNAs), that orchestrate gene expression fine-tuning for cellular homeostasis and the pathogenesis of a multitude of diseases. We explore long non-coding RNAs (lncRNAs) such as telomeric repeat-containing RNA (TERRA) and Fendrr, highlighting their role in protein regulation to ensure proper gene activation or silencing. Additionally, we explain the therapeutic potential of brain-derived neurotrophic factor (BDNF)-related microRNA 132, which has shown promise in treating chronic illnesses by restoring BDNF levels. Finally, this review covers the role of DNA methyltransferases and ncRNAs in cancer, focusing on how lncRNAs contribute to X chromosome inactivation and interact with chromatin-modifying complexes and DNA methyltransferase inhibitors to reduce cancer cell aggressiveness. By amalgamating the wide array of research in this field, we aim to provide a glimpse into the complex entanglement of genetics and environment as they control gene expression.


[12] 2510.06914

Gradient of White Matter Functional Variability via fALFF Differential Identifiability

Functional variability in both gray matter (GM) and white matter (WM) is closely associated with human brain cognitive and developmental processes, and is commonly assessed using functional connectivity (FC). However, as a correlation-based approach, FC captures the co-fluctuation between brain regions rather than the intensity of neural activity in each region. Consequently, FC provides only a partial view of functional variability, and this limitation is particularly pronounced in WM, where functional signals are weaker and more susceptible to noise. To tackle this limitation, we introduce the fractional amplitude of low-frequency fluctuation (fALFF) to measure the intensity of spontaneous neural activity and analyze functional variability in WM. Specifically, we propose a novel method to quantify WM functional variability by estimating the differential identifiability of fALFF. Higher differential identifiability is observed in WM fALFF compared to FC, which indicates that fALFF is more sensitive to WM functional variability. Through fALFF differential identifiability, we evaluate the functional variabilities of both WM and GM, and find the overall functional variability pattern is similar, although WM shows slightly lower variability than GM. The regional functional variabilities of WM are associated with structural connectivity, where commissural fiber regions generally exhibit higher variability than projection fiber regions. Furthermore, hypothesis testing reveals that WM functional variability exhibits a spatial gradient ascending from the brainstem to the cortex, which aligns well with evolutionary expansion. The gradient of functional variability in WM provides novel insights for understanding WM function. To the best of our knowledge, this is the first attempt to investigate WM functional variability via fALFF.
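The fALFF measure itself is conventional: the ratio of low-frequency amplitude to amplitude over the full detectable spectrum. The sketch below uses the common 0.01-0.08 Hz band and Welch spectra; these settings are assumptions, as the abstract does not state the paper's exact parameters.

```python
import numpy as np
from scipy.signal import welch

def falff(ts, fs, band=(0.01, 0.08)):
    """ts: voxel/region BOLD time series; fs: sampling rate (1/TR) in Hz."""
    freqs, psd = welch(ts, fs=fs, nperseg=min(len(ts), 256))
    amp = np.sqrt(psd)                               # amplitude spectrum
    low = (freqs >= band[0]) & (freqs <= band[1])
    return amp[low].sum() / amp[freqs > 0].sum()     # low-frequency fraction

# Differential identifiability then contrasts within-subject vs. between-subject
# similarity of regional fALFF profiles across repeated scans.
```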


[13] 2510.07072

A model of the human cornea as a hydrated, fluid-saturated medium

We introduce a new model of the human corneal stroma, regarded as a fluid-saturated continuum, able to describe the surface flattening and thickness thinning observed in several pathological conditions. In contrast with more common approaches that describe the human cornea as a quasi-incompressible hyperelastic medium, possibly including micro-structured anisotropy and heterogeneity, here we focus on the multi-phase nature of the tissue, whose water content reaches about 78% by weight. The study is motivated by the fact that, although purely mechanical continuum models have proven satisfactory and accurate at predicting physiological behaviors, they have not been able to capture the geometrical features of tissue degeneration clearly associated with a reduction of the fluid content in the stroma, such as thinning and loss of curvature. Here, we model the cornea as a fully saturated mixture of a solid phase and a fluid phase, in principle without restricting the formulation to specific assumptions on the actual inhomogeneous nature of both phases. The focus of the study is to understand whether a multiphysics model is capable of explaining, in terms of fluid flux imbalance, pathological conditions such as ectasia and keratoconus. As a first attempt in this direction, we make simple isotropic constitutive assumptions for both phases.
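For orientation, these are the standard relations of a fully saturated solid-fluid mixture of the kind the abstract describes; the paper's exact constitutive choices may differ, so treat this as a generic sketch.

```latex
% Saturation: solid and fluid volume fractions fill the tissue,
%   \varphi_s + \varphi_f = 1 ;
% Darcy-type fluid flux driven by the pore pressure gradient,
%   \mathbf{q} = -k\,\nabla p ;
% quasi-static balance of the mixture with an effective solid stress.
\[
\varphi_s + \varphi_f = 1, \qquad
\mathbf{q} = -k\,\nabla p, \qquad
\nabla\!\cdot\!\left(\boldsymbol{\sigma}_s - p\,\mathbf{I}\right) = \mathbf{0}.
\]
```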


[14] 2510.07140

Quantifying spike train synchrony and directionality: Measures and Applications

By introducing the twin concepts of reliability and precision along with the corresponding measures, Mainen and Sejnowski's seminal 1995 paper "Reliability of spike timing in neocortical neurons" (Mainen and Sejnowski, 1995) paved the way for a new kind of quantitative spike train analysis. In subsequent years a host of new methods were introduced that measure both the synchrony among neuronal spike trains and the directional component, e.g., how activity propagates between neurons. This development culminated in a new class of measures that are both time-scale independent and time resolved. These include the two spike train distances, ISI- and SPIKE-Distance, as well as the coincidence detector SPIKE-Synchronization and its directional companion SPIKE-Order. This article not only reviews all of these measures but also includes two recently proposed algorithms for latency correction, which build on SPIKE-Order and aim to optimize the spike time alignment of sparse spike trains with well-defined global spiking events. For the sake of clarity, all of these methods are illustrated on artificially generated data, but in each case exemplary applications to real neuronal data are described as well.
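The reviewed measures are implemented in the PySpike library, so a hands-on starting point looks roughly like the snippet below (exact call names should be checked against the installed library version; the spike times are artificial, in the spirit of the article's illustrations).

```python
import numpy as np
import pyspike as spk

# Two artificial spike trains on a 0-1000 ms recording window.
st1 = spk.SpikeTrain(np.array([64.0, 305.0, 696.0]), edges=(0.0, 1000.0))
st2 = spk.SpikeTrain(np.array([61.0, 290.0, 705.0, 942.0]), edges=(0.0, 1000.0))

print(spk.isi_distance(st1, st2))    # time-scale independent dissimilarity
print(spk.spike_distance(st1, st2))  # time-resolved spike timing dissimilarity
print(spk.spike_sync(st1, st2))      # coincidence-based synchronization in [0, 1]
```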


[15] 2510.07255

ARGscape: A modular, interactive tool for manipulation of spatiotemporal ancestral recombination graphs

Ancestral recombination graphs (ARGs) encode the complete genealogical history of a population of recombining lineages. ARGs, and their succinct representation, tree sequences, are increasingly central to modern population genetics methods, yet building an intuition for ARGs remains challenging. This is particularly true when analyzing ancestry in a geographic context, as there is a critical lack of dedicated, interactive tools capable of visualizing ARGs as spatiotemporal objects. To address this gap, we introduce ARGscape, an interactive platform for simulating, analyzing, and visualizing ARGs across space and time. ARGscape provides a user-friendly graphical interface featuring dynamic 2- and 3-dimensional visualizations to explore ARGs through space and time, as well as a novel "spatial diff" visualization for quantitative comparison of geographic inference methods. ARGscape is an innovative, unified framework that seamlessly integrates leading command-line, Python, and R-based tools for ARG simulation, manipulation, and use in spatiotemporal inference into both graphical and command-line interfaces. By integrating these various functionalities, ARGscape facilitates novel data exploration and hypothesis generation, while lowering the barrier to entry for spatiotemporal ARG analysis in both research and education use-cases. ARGscape is built with a Python FastAPI backend and a React/TypeScript frontend. It is freely available as a live demo at this https URL and as a Python package on PyPI (pip install argscape). The source code and documentation are available on GitHub at this https URL.
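A natural input for such a tool is a tree sequence produced by msprime, one of the simulation tools the abstract says ARGscape wraps. The sketch below shows only that simulation step; ARGscape's own Python API is not described in the abstract, so none of it is assumed here.

```python
import msprime

ts = msprime.sim_ancestry(
    samples=10,                # diploid sample individuals
    sequence_length=1e5,
    recombination_rate=1e-8,   # recombination is what gives the ARG its structure
    population_size=10_000,
    random_seed=42,
)
print(ts.num_trees, "local trees along the sequence")
ts.dump("example.trees")       # a .trees file that ARG viewers can load
```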


[16] 2510.07265

Entropy and diffusion characterize mutation accumulation and biological information loss

Aging is a universal consequence of life, yet researchers have identified no universal theme. This manuscript considers aging from the perspective of entropy, wherein things fall apart. We first examine biological information change as a mutational distance, analogous to physical distance. In this model, informational change over time is fitted to an advection-diffusion equation, a normal distribution with a time component. The solution of the advection-diffusion equation provides a means of measuring the entropy of diverse biological systems. The binomial distribution is also sufficient to demonstrate that entropy increases as mutations or epimutations accumulate. As modeled, entropy scales with lifespans across the tree of life. This perspective provides potential mechanistic insights and testable hypotheses as to how evolution has attained enhanced longevity: entropy management. We find entropy is an inclusive rather than exclusive aging theory.
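The model sketched in the abstract can be written out explicitly: mutational distance plays the role of position in an advection-diffusion equation, whose point-source solution is a drifting Gaussian, so the entropy grows in time like that of a normal distribution.

```latex
% Advection-diffusion of the mutational-distance density u(x, t):
%   \partial_t u + v\,\partial_x u = D\,\partial_x^2 u .
% Point initial condition gives a drifting Gaussian (mean v t, variance 2 D t):
\[
u(x,t) = \frac{1}{\sqrt{4\pi D t}}\,
         \exp\!\left(-\frac{(x - v t)^2}{4 D t}\right),
\]
% so the differential entropy increases monotonically with time,
\[
S(t) = \tfrac{1}{2}\,\ln\!\left(2\pi e \cdot 2 D t\right),
\]
% consistent with entropy rising as mutations or epimutations accumulate.
```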


[17] 2510.06235

Stacked Regression using Off-the-shelf, Stimulus-tuned and Fine-tuned Neural Networks for Predicting fMRI Brain Responses to Movies (Algonauts 2025 Report)

We present our submission to the Algonauts 2025 Challenge, where the goal is to predict fMRI brain responses to movie stimuli. Our approach integrates multimodal representations from large language models, video encoders, audio models, and vision-language models, combining both off-the-shelf and fine-tuned variants. To improve performance, we enhanced textual inputs with detailed transcripts and summaries, and we explored stimulus-tuning and fine-tuning strategies for language and vision models. Predictions from individual models were combined using stacked regression, yielding solid results. Our submission, under the team name Seinfeld, ranked 10th. We make all code and resources publicly available, contributing to ongoing efforts in developing multimodal encoding models for brain activity.
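The combination step is standard stacked regression; a minimal sketch is shown below, where out-of-fold predictions from several encoding models become features for a ridge meta-model fit per fMRI target. The model choice (RidgeCV) and CV settings are assumptions, not the team's exact configuration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

def stack(base_models, X_per_model, y):
    """base_models: unfitted sklearn regressors; X_per_model: one stimulus-feature
    matrix per base model; y: response of a single fMRI target (parcel/voxel)."""
    # Column i holds out-of-fold predictions of base model i (avoids leakage
    # into the meta-model).
    Z = np.column_stack([
        cross_val_predict(m, X, y, cv=5)
        for m, X in zip(base_models, X_per_model)
    ])
    meta = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(Z, y)
    return meta
```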


[18] 2510.06236

Space, time and altruism in pandemics and the climate emergency

Climate change is a global emergency, as was the COVID-19 pandemic. Why was our collective response to COVID-19 so much stronger than our response to the climate emergency, to date? We hypothesize that the answer has to do with the scale of the systems, and not just spatial and temporal scales but also the `altruistic scale' that measures whether an action must rely upon altruistic motives for it to be adopted. We treat COVID-19 and climate change as common-pool resource problems that exemplify coupled human-environment systems. We introduce a framework that captures regimes of containment, mitigation, and failure to control. As parameters governing these three scales are varied, it is possible to shift from a COVID-like system to a climate-like system. The framework replicates both inaction in the case of climate change mitigation, as well as the faster response that we exhibited to COVID-19. Our cross-system comparison also suggests actionable ways that cooperation can be improved in large-scale common-pool resource problems, like climate change. More broadly, we argue that considering scale and incorporating human-natural system feedbacks are not just interesting special cases within non-cooperative game theory, but rather should be the starting point for the study of altruism and human cooperation.


[19] 2510.06237

Coupled opinion-environmental dynamics in polarized and prejudiced populations

Public opinion on environmental issues remains polarized in many countries, posing a significant barrier to the implementation of effective policies. Behind this polarization, empirical studies have identified social susceptibility, personal prejudice, and personal experience as dominant factors in opinion formation on environmental issues. However, current coupled human-environment models have not yet incorporated all three factors in polarized populations. We developed a stylized coupled human-environment model to investigate how social susceptibility, personal prejudice, and personal experience shape opinion formation and the environment in polarized populations. Using analytical and numerical methods, we characterized the conditions under which polarization, consensus, opinion changes, and cyclic dynamics emerge, depending on the costs of mitigation, environmental damage, and the factors influencing opinion formation. Our model shows that prejudice is the key driver of persistent polarization, with even slightly prejudiced populations maintaining indefinite polarization independent of their level of objectivity. We predict that polarization can be reduced by decreasing the role of prejudice or increasing the willingness to consider opposing opinions. We further find that cost-reduction methods are less effective at reducing environmental impact in prejudiced populations. The model also yields thresholds for when reducing costs versus reducing emissions is more useful, depending on the factors that influence the population's opinion formation. Overall, our model provides a framework for investigating the importance of cognitive and social structures in determining human-environment dynamics.
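To make the structure of such a stylized model concrete, here is a generic illustration, emphatically not the authors' equations: replicator dynamics for the mitigating fraction of the population, with a payoff gap shaped by mitigation cost, environmental damage, and a prejudice term, coupled to a slowly responding environmental state.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, s, cost=0.2, damage=1.0, prejudice=0.3, eps=0.1):
    x, n = s                             # x: fraction of mitigators; n: env. quality
    # Perceived payoff gap for mitigating: environmental damage favors it,
    # while cost and prejudice against the opposing camp work against switching.
    gap = damage * (1.0 - n) - cost - prejudice * (1.0 - 2.0 * x)
    dx = x * (1.0 - x) * gap             # replicator dynamics for opinion share
    dn = eps * (x - n)                   # environment slowly tracks mitigation
    return [dx, dn]

sol = solve_ivp(rhs, (0.0, 400.0), [0.45, 0.8], dense_output=True)
x_final, n_final = sol.y[:, -1]          # with a strong prejudice term, mixed
                                         # (polarized) states can persist
```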


[20] 2510.06584

Improving Artifact Robustness for CT Deep Learning Models Without Labeled Artifact Images via Domain Adaptation

Deep learning models which perform well on images from their training distribution can degrade substantially when applied to new distributions. If a CT scanner introduces a new artifact not present in the training labels, the model may misclassify the images. Although modern CT scanners include design features which mitigate these artifacts, unanticipated or difficult-to-mitigate artifacts can still appear in practice. The direct solution of labeling images from this new distribution can be costly. As a more accessible alternative, this study evaluates domain adaptation as an approach for training models that maintain classification performance despite new artifacts, even without corresponding labels. We simulate ring artifacts from detector gain error in sinogram space and evaluate domain adversarial neural networks (DANN) against baseline and augmentation-based approaches on the OrganAMNIST abdominal CT dataset. Our results demonstrate that baseline models trained only on clean images fail to generalize to images with ring artifacts, and traditional augmentation with other distortion types provides no improvement on unseen artifact domains. In contrast, the DANN approach successfully maintains high classification accuracy on ring artifact images using only unlabeled artifact data during training, demonstrating the viability of domain adaptation for artifact robustness. The domain-adapted model achieved classification performance on ring artifact test data comparable to models explicitly trained with labeled artifact images, while also showing unexpected generalization to uniform noise. These findings provide empirical evidence that domain adaptation can effectively address distribution shift in medical imaging without requiring expensive expert labeling of new artifact distributions, suggesting promise for deployment in clinical settings where novel artifacts may emerge.
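The core DANN ingredient is a gradient-reversal layer: the domain head learns to tell clean from artifact images, while the reversed gradient pushes the shared features toward domain invariance using only unlabeled artifact data. The sketch below is standard DANN machinery with illustrative sizes (11 classes matches OrganAMNIST), not the study's exact architecture.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # Identity forward, sign-flipped (scaled) gradient backward.
        return -ctx.lam * grad_out, None

class DANN(nn.Module):
    def __init__(self, n_classes=11):                    # OrganAMNIST has 11 classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        self.label_head = nn.Linear(32 * 16, n_classes)  # organ classification
        self.domain_head = nn.Linear(32 * 16, 2)         # clean vs. ring artifact

    def forward(self, x, lam=1.0):
        f = self.features(x)
        return self.label_head(f), self.domain_head(GradReverse.apply(f, lam))

# Training: label loss on clean labeled images only; domain loss on both clean
# and unlabeled artifact images, with the reversed gradient enforcing invariance.
```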


[21] 2510.07286

Evolutionary Profiles for Protein Fitness Prediction

Predicting the fitness impact of mutations is central to protein engineering but constrained by limited assays relative to the size of sequence space. Protein language models (pLMs) trained with masked language modeling (MLM) exhibit strong zero-shot fitness prediction; we provide a unifying view by interpreting natural evolution as implicit reward maximization and MLM as inverse reinforcement learning (IRL), in which extant sequences act as expert demonstrations and pLM log-odds serve as fitness estimates. Building on this perspective, we introduce EvoIF, a lightweight model that integrates two complementary sources of evolutionary signal: (i) within-family profiles from retrieved homologs and (ii) cross-family structural-evolutionary constraints distilled from inverse folding logits. EvoIF fuses sequence-structure representations with these profiles via a compact transition block, yielding calibrated probabilities for log-odds scoring. On ProteinGym (217 mutational assays; >2.5M mutants), EvoIF and its MSA-enabled variant achieve state-of-the-art or competitive performance while using only 0.15% of the training data and fewer parameters than recent large models. Ablations confirm that within-family and cross-family profiles are complementary, improving robustness across function types, MSA depths, taxa, and mutation depths. The code will be made publicly available at this https URL.
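The log-odds scoring rule that the IRL view rests on can be demonstrated with any public masked pLM; the sketch below uses a small ESM2 checkpoint as an illustrative stand-in (EvoIF itself additionally fuses homolog profiles and inverse-folding logits, which are not shown). The sequence and mutation are made up.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "facebook/esm2_t6_8M_UR50D"   # small public pLM, illustrative choice
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

def log_odds(seq, pos, wt, mut):
    """log p(mut) - log p(wt) at 0-based position `pos`, with that site masked."""
    toks = tok(seq, return_tensors="pt")
    toks["input_ids"][0, pos + 1] = tok.mask_token_id   # +1 skips the BOS token
    with torch.no_grad():
        logits = model(**toks).logits[0, pos + 1]
    logp = logits.log_softmax(-1)
    return (logp[tok.convert_tokens_to_ids(mut)]
            - logp[tok.convert_tokens_to_ids(wt)]).item()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(log_odds(seq, pos=3, wt="A", mut="V"))  # > 0 suggests the mutant is favored
```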


[22] 2403.19372

Emergent predictability in microbial ecosystems

Microbial ecosystems exhibit a surprising amount of functionally relevant diversity at all levels of taxonomic resolution, presenting a significant challenge for most modeling frameworks. A long-standing hope of theoretical ecology is that some patterns might persist despite community complexity -- or perhaps even emerge because of it. A deeper understanding of such "emergent simplicity" could enable new approaches for predicting the behaviors of the complex ecosystems in nature. However, the concept remains partly intuitive with no consistent definition, and most empirical examples described so far afford limited predictive power. Here, we propose an information-theoretic framework for defining and quantifying emergent simplicity in empirical data based on the ability of coarsened descriptions to predict community-level functional properties. Applying this framework to two published datasets, we demonstrate that all five properties measured across both experiments exhibit robust evidence of what we define as "emergent predictability": surprisingly, as community richness increases, simple compositional descriptions become more predictive. We show that standard theoretical models of high-diversity ecosystems fail to recapitulate this behavior. This is in contrast to simple self-averaging, which is well-understood and generic across models. We propose that, counterintuitively, emergent predictability arises when physiological or environmental feedbacks oppose statistical self-averaging along some axes of community variation. As a result, these axes of variation become increasingly predictive of community function at high richness. We demonstrate this mechanism in a minimal model, and argue that explaining and leveraging emergent predictability will require integrating large-N theoretical models with a minimal notion of physiology, which the dominant modeling frameworks currently omit.


[23] 2411.06539

Advancing glaucoma research with multiphysics continuum mechanics modelling: Opportunities and open challenges

This review examines the emerging role of mechanistic mathematical models based on continuum mechanics in addressing current challenges in glaucoma research. At present, the advent of Artificial Intelligence and data-based models has resulted in significant progress in drug candidate screening, target identification, and delivery optimization for glaucoma treatment. Physics-based models, on the other hand, offer mechanistic insight by modelling fundamental physical knowledge. Mechanistic models, and specifically those based on continuum mechanics, have the potential to contribute to a better understanding of glaucoma through the description of intraocular fluid dynamics, mass and heat transfer, and other basic physical phenomena. So far, these models have expanded our understanding of ocular fluid dynamics, including descriptions of fluid flow profiles within the anterior chamber of the eye under glaucomatous conditions. With the ongoing development of multiphysics modelling frameworks, there is increasing potential to apply these tools to a wide range of current challenges within the field of glaucoma research. These challenges include glaucoma drainage devices, minimally invasive surgical procedures, therapeutic contact lenses, laser-based interventions like peripheral iridotomy, and the design and optimization of biodegradable drug-releasing intracameral implants, which support patient-specific strategies for glaucoma diagnosis and treatment.


[24] 2509.05508

Defining and Estimating Outcomes Directly Averted by a Vaccination Program when Rollout Occurs Over Time

During the COVID-19 pandemic, estimating the total deaths averted by vaccination has been of great public health interest. Instead of estimating total deaths averted by vaccination among both vaccinated and unvaccinated individuals, some studies empirically estimated only "directly averted" deaths among vaccinated individuals, typically suggesting that vaccines prevented more deaths overall than directly due to the indirect effect. Here, we define the causal estimand that quantifies outcomes "directly averted" by vaccination (i.e., the impact of vaccination for vaccinated individuals, holding vaccination coverage fixed) for vaccination at multiple time points, and show that this estimand is a lower bound on the total outcomes averted when the indirect effect is non-negative. We develop an unbiased estimator for the causal estimand in a one-stage randomized controlled trial (RCT) and explore the bias of a popular "hazard difference" estimator frequently used in empirical studies. We show that even in an RCT, the hazard difference estimator is biased if vaccination has a non-null effect, as it fails to account for the greater depletion of susceptibles among unvaccinated individuals. In simulations, the overestimation is small for averted deaths when the infection-fatality rate is low, as for many important pathogens. However, the overestimation can be large for averted infections given a high basic reproduction number. Additionally, we define and compare the estimand and estimators for avertible outcomes (i.e., outcomes that could have been averted by vaccination but were not, due to failure to vaccinate). Future studies can explore the identifiability of the causal estimand in observational settings.


[25] 2510.00353

The Multivariate SEM-PGS Model: Using Polygenic Scores to Investigate Cross-Trait Genetic Nurture and Assortative Mating

Genetic nurture effects and assortative mating (AM) occur across many human behaviors and can bias estimates from traditional genetic models. These influences are typically studied univariately, within the same trait. However, estimation of cross-trait genetic nurture effects and cross-trait AM remains underexplored due to the absence of suitable approaches. To address this, we developed a multivariate extension of the SEM-PGS model for datasets with genotyped and phenotyped parents and offspring, enabling joint estimation of within-trait and cross-trait genetic and environmental influences. By integrating haplotypic polygenic scores (PGS) into a structural equation modeling framework, the model simultaneously estimates same-trait and cross-trait direct effects, genetic nurture, vertical transmission, and assortative mating. We also provide the first formal description of how copaths can be used to model multivariate assortative mating and derive the corresponding parameter expectations in matrix form. Forward-time Monte Carlo simulations under varying conditions of r^2_PGS and N_trio demonstrate that the model yields unbiased estimates of both within-trait and cross-trait effects when assumptions are met. The precision of estimates was adequate with large sample sizes (N_trio > 16k) and improved as PGS predictive power increased. In addition, our simulation results show that failing to model cross-trait effects biases within-trait estimates, underscoring the importance of incorporating cross-trait effects. The multivariate SEM-PGS model offers a powerful and flexible tool for disentangling gene-environment interplay and advancing the understanding of familial influences on human traits.


[26] 2510.02205

Charting dissipation across the microbial world

The energy dissipated by a living organism is commonly identified with heat generation. However, as cells exchange metabolites with their environment they also dissipate energy in the form of chemical entropy. How dissipation is distributed between exchanges of heat and chemical entropy is largely unexplored. Here, we analyze an extensive experimental database recently created [1] to investigate how microbes partition dissipation between thermal and chemical entropy during growth. We find that aerobic respiration exchanges little chemical entropy and dissipation is primarily due to heat production, as commonly assumed. However, we also find several types of anaerobic metabolism that produce as much chemical entropy as heat. Counterintuitively, instances of anaerobic metabolisms such as acetotrophic methanogenesis and sulfur respiration are endothermic. We conclude that, because of their metabolic versatility, microbes are able to exploit all combinations of heat and chemical entropy exchanges that result in a net production of entropy.
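The bookkeeping behind this analysis can be stated in standard nonequilibrium form; this is the generic entropy balance for an open system, not the paper's specific derivation.

```latex
% Total entropy production of a growing culture splits into a thermal term
% (heat \dot{Q} released to the surroundings at temperature T) and a chemical
% term (entropy carried by metabolite exchange):
\[
\dot{S}_{\mathrm{tot}} \;=\; \frac{\dot{Q}}{T} \;+\; \dot{S}_{\mathrm{chem}}
\;\ge\; 0 .
\]
% Endothermic metabolisms (\dot{Q} < 0), such as the acetotrophic methanogens
% noted above, remain thermodynamically feasible because a large positive
% \dot{S}_{\mathrm{chem}} keeps the total entropy production positive.
```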


[27] 2403.17202

The generic temperature response of large biochemical networks

Biological systems are remarkably susceptible to relatively small temperature changes. The most obvious example is fever, when a modest rise in body temperature of only a few kelvin has strong effects on our immune system and how it fights pathogens. Another very important example is climate change, when even smaller temperature changes lead to dramatic shifts in ecosystems. Although it is generally accepted that the main effect of an increase in temperature is the acceleration of biochemical reactions according to the Arrhenius equation, it is not clear how temperature affects large biochemical networks with complicated architectures. For developmental systems like fly and frog, it has been shown that the system response to temperature deviates in a characteristic manner from the linear Arrhenius plot of single reactions, but a rigorous explanation has not been given yet. Here we use a graph-theoretical interpretation of the mean first-passage times of a biochemical master equation to give a statistical description. We find that in the limit of large system size and if the network has a bias towards a target state, then the Arrhenius plot is generically quadratic, in excellent agreement with numerical simulations for large networks as well as with experimental data for developmental times in fly and frog. We also discuss under which conditions this generic response can be violated, for example for linear chains, which have only one spanning tree.
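The central claim can be written compactly; the functional form below restates the abstract, with coefficients left generic.

```latex
% A single reaction follows the Arrhenius law, linear in 1/T on a log scale:
\[
k(T) = A\, e^{-E_a / (k_B T)}
\quad\Longrightarrow\quad
\ln k = \ln A - \frac{E_a}{k_B T} .
\]
% For a large network biased towards a target state, the generic response of
% the inverse completion time is instead quadratic in 1/T,
\[
\ln \frac{1}{\tau(T)} \;\approx\; c_0 \;-\; \frac{c_1}{T} \;-\; \frac{c_2}{T^2},
\]
% with a nonzero curvature c_2 distinguishing the network-level response from
% any single Arrhenius reaction.
```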


[28] 2509.03487

SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models

Proteins play crucial roles in almost all biological processes. The advancement of deep learning has greatly accelerated the development of protein foundation models, leading to significant successes in protein understanding and design. However, the lack of systematic red-teaming for these models has raised serious concerns about their potential misuse, such as generating proteins with biological safety risks. This paper introduces SafeProtein, to the best of our knowledge the first red-teaming framework designed for protein foundation models. SafeProtein combines multimodal prompt engineering and heuristic beam search to systematically design red-teaming methods and conduct tests on protein foundation models. We also curated SafeProtein-Bench, which includes a manually constructed red-teaming benchmark dataset and a comprehensive evaluation protocol. SafeProtein achieved continuous jailbreaks on state-of-the-art protein foundation models (up to a 70% attack success rate for ESM3), revealing potential biological safety risks in current protein foundation models and providing insights for the development of robust security protection technologies for frontier models. The code will be made publicly available at this https URL.


[29] 2509.07627

LSMTCR: A Scalable Multi-Architecture Model for Epitope-Specific T Cell Receptor de novo Design

Designing full-length, epitope-specific TCR {\alpha}\b{eta} remains challenging due to vast sequence space, data biases and incomplete modeling of immunogenetic constraints. We present LSMTCR, a scalable multi-architecture framework that separates specificity from constraint learning to enable de novo, epitope-conditioned generation of paired, full-length TCRs. A diffusion-enhanced BERT encoder learns time-conditioned epitope representations; conditional GPT decoders, pretrained on CDR3\b{eta} and transferred to CDR3{\alpha}, generate chain-specific CDR3s under cross-modal conditioning with temperature-controlled diversity; and a gene-aware Transformer assembles complete {\alpha}/\b{eta} sequences by predicting V/J usage to ensure immunogenetic fidelity. Across GLIPH, TEP, MIRA, McPAS and our curated dataset, LSMTCR achieves higher predicted binding than baselines on most datasets, more faithfully recovers positional and length grammars, and delivers superior, temperature-tunable diversity. For {\alpha}-chain generation, transfer learning improves predicted binding, length realism and diversity over representative methods. Full-length assembly from known or de novo CDR3s preserves k-mer spectra, yields low edit distances to references, and, in paired {\alpha}/\b{eta} co-modelling with epitope, attains higher pTM/ipTM than single-chain settings. LSMTCR outputs diverse, gene-contextualized, full-length TCR designs from epitope input alone, enabling high-throughput screening and iterative optimization.