New articles on Quantitative Biology


[1] 2508.16587

HemePLM-Diffuse: A Scalable Generative Framework for Protein-Ligand Dynamics in Large Biomolecular Systems

Understanding the long-timescale dynamics of protein-ligand complexes is essential for drug discovery and structural biology, but it remains computationally challenging for large biomolecular systems. We introduce HemePLM-Diffuse, a generative transformer model designed to accurately simulate protein-ligand trajectories, inpaint missing ligand fragments, and sample transition paths in systems with more than 10,000 atoms. HemePLM-Diffuse features an SE(3)-invariant tokenization approach for proteins and ligands and uses time-aware cross-attentional diffusion to effectively capture atomic motion. We demonstrate its capabilities on the 3CQV HEME system, showing enhanced accuracy and scalability compared to leading models such as TorchMD-Net, MDGEN, and Uni-Mol.


[2] 2508.16597

Bridging Foundation Models and Efficient Architectures: A Modular Brain Imaging Framework with Local Masking and Pretrained Representation Learning

Functional connectivity (FC) derived from resting-state fMRI plays a critical role in personalized predictions such as age and cognitive performance. However, applying foundation models (FMs) to fMRI data remains challenging due to its high dimensionality, computational complexity, and the difficulty of capturing complex spatiotemporal dynamics and indirect region-of-interest (ROI) interactions. To address these limitations, we propose a modular neuroimaging framework that integrates principles from FMs with efficient, domain-specific architectures. Our approach begins with a Local Masked Autoencoder (LMAE) for pretraining, which reduces the influence of hemodynamic response function (HRF) dynamics and suppresses noise. This is followed by a Random Walk Mixture of Experts (RWMOE) module that clusters features across spatial and temporal dimensions, effectively capturing intricate brain interactions. Finally, a state-space model (SSM)-based predictor performs downstream task inference. Evaluated on the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) dataset, our framework achieved mean absolute errors (MAEs) of 5.343 for age prediction and 2.940 for fluid intelligence, with Pearson correlation coefficients (PCCs) of 0.928 and 0.887, respectively, outperforming existing state-of-the-art methods. Visualization of expert distribution weights further enhances interpretability by identifying key brain regions. This work provides a robust, interpretable alternative to LLM-based approaches for fMRI analysis, offering novel insights into brain aging and cognitive function.


[3] 2508.16667

BrainPath: Generating Subject-Specific Brain Aging Trajectories

Quantifying and forecasting individual brain aging trajectories is critical for understanding neurodegenerative disease and the heterogeneity of aging, yet current approaches remain limited. Most models predict chronological age, an imperfect surrogate for biological aging, or generate synthetic MRIs that enhance data diversity but fail to capture subject-specific trajectories. Here, we present BrainPath, a 3D generative framework that learns longitudinal brain aging dynamics during training and, at inference, predicts anatomically faithful MRIs at arbitrary timepoints from a single baseline scan. BrainPath integrates an age calibration loss, a swap learning strategy, and an age perceptual loss to preserve subtle, biologically meaningful variations. Across held-out ADNI data and an independent NACC dataset, BrainPath outperforms state-of-the-art reference models in structural similarity (SSIM), mean squared error (MSE), peak signal-to-noise ratio (PSNR), and MRI age-difference accuracy, while capturing realistic and temporally consistent aging patterns. Beyond methodological innovation, BrainPath enables personalized mapping of brain aging, synthetic follow-up scan prediction, and trajectory-based analyses, providing a foundation for precision modeling of brain aging and supporting research into neurodegeneration and aging interventions.


[4] 2508.16698

A Computer Vision and Depth Sensor-Powered Smart Cane for Real-Time Obstacle Detection and Navigation Assistance for the Visually Impaired

Visual impairment affects more than 2.2 billion people worldwide and greatly restricts independent mobility and access. Conventional mobility aids - white canes and ultrasound-based intelligent canes - are inherently limited in the feedback they can offer and generally cannot differentiate among types of obstacles in dense or complex environments. Here, we introduce the IoT Cane, an Internet of Things assistive navigation tool that integrates real-time computer vision, using a transformer-based RT-DETRv3-R50 model, with depth sensing through an Intel RealSense camera. Our prototype achieves a mAP of 53.4% and an AP50 of 71.7% when tested on difficult datasets with low Intersection over Union (IoU) thresholds, outperforming comparable ultrasound-based systems. End-to-end latency is around 150 ms per frame, accounting for preprocessing (1-3 ms), inference (50-70 ms), and post-processing (0.5-1.0 ms per detected object). Feedback is provided through haptic vibration motors and audio notifications; the system is powered by a LiPo battery with power regulation by a PowerBoost module. Future directions involve iOS integration to tap into more compute, hardware redesign to minimize cost, and mobile companion app support over Bluetooth. This effort offers a strong, extensible prototype toward large-scale vision-based assistive technology for the visually impaired.


[5] 2508.16740

Integrative Prognostic Modeling of Breast Cancer Survival with Gene Expression and Clinical Data

Background: Accurate survival prediction in breast cancer is essential for patient stratification and personalized therapy. Integrating gene expression data with clinical factors may enhance prognostic performance and support precision medicine. Objective: To develop an integrative survival prediction model combining clinical variables and gene expression signatures, and to assess their contributions using penalized Cox regression and machine learning. Methods: We analyzed 1,867 patients from the METABRIC cohort with clinical annotations and microarray-based gene expression profiles. The top 5,000 most variable genes were retained. Elastic Net-penalized Cox regression identified 75 predictors (70 genes and 5 clinical variables: tumor size, stage, surgery type, age at diagnosis, and Nottingham Prognostic Index). Model performance was evaluated with Harrell's concordance index (C-index) and 36-month time-dependent AUC. Random Survival Forests (RSF) trained on the top 20 genes assessed nonlinear effects and validated variable importance. PCA and heatmaps visualized gene expression patterns across risk groups. Results: The integrative Cox model achieved a C-index of 0.922 and a 36-month AUC of 0.94, outperforming clinical-only models (C=0.64). RSF confirmed the prognostic value of top genes (e.g., OR2T27, TBATA, LINC01165, SLC10A4), yielding a 36-month AUC of 0.88. Conclusions: Combining gene expression signatures with clinical variables substantially improves survival prediction in breast cancer and provides a framework for individualized prognostic assessment and clinical decision-making.
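As a rough illustration of the modeling pipeline described above, here is a minimal sketch of an Elastic Net-penalized Cox fit using the lifelines library; the file and column names are hypothetical, and this is not the authors' code.

```python
# Sketch: Elastic Net-penalized Cox regression on combined expression and
# clinical covariates (hypothetical inputs, not the paper's pipeline).
import pandas as pd
from lifelines import CoxPHFitter

# One row per patient: survival time, event indicator, top variable genes,
# plus clinical variables (tumor size, stage, age, ...).
df = pd.read_csv("metabric_top_genes_plus_clinical.csv")  # hypothetical file

cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)  # elastic net: mix of L1/L2
cph.fit(df, duration_col="time", event_col="event")

print(cph.concordance_index_)                     # Harrell's C-index
selected = cph.params_[cph.params_.abs() > 1e-6]  # predictors kept by the penalty
print(selected.sort_values())
```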


[6] 2508.16760

Rethinking scale in network neuroscience: Contributions and opportunities at the nanoscale

Network science has been applied widely to study brain network organization, especially at the meso-scale, where nodes represent brain areas and edges reflect interareal connectivity inferred from imaging or tract-tracing data. While this approach has yielded important insights into large-scale brain network architecture, its foundational assumptions often misalign with the biological realities of neural systems. In this review, we argue that network science finds its most direct and mechanistically grounded application in nanoscale connectomics: wiring diagrams reconstructed at the level of individual neurons and synapses, often from high-resolution electron microscopy volumes. At this finer scale, core network concepts such as paths, motifs, communities, and centrality acquire concrete biological interpretations. Unlike meso-scale models, nanoscale connectomes are typically derived from individual animals, preserve synaptic resolution, and are richly annotated with cell types, neurotransmitter identities, and morphological detail. These properties enable biologically grounded, mechanistically interpretable analyses of circuit structure and function. We review how nanoscale data support new forms of network modeling, from realistic dynamical simulations to topology-informed circuit inference, and outline emerging directions in multimodal integration, cross-species comparisons, and generative modeling. We also emphasize the continued importance of meso- and macro-scale connectomics, especially in human neuroscience, and discuss how nanoscale insights can inform interpretation at coarser scales. Together, these efforts point toward a multi-scale future for network neuroscience, grounded in the strengths of each resolution.


[7] 2508.16810

Ignition criteria for trigger waves in cell signaling

To rapidly coordinate collective behaviors in tissues, cells communicate with one another through traveling fronts of signaling activity called trigger waves. The stimulus threshold for wave propagation is a key biological parameter that determines when the system produces a global response to a local stimulus. However, it is unclear how this stimulus threshold is determined by the properties of the underlying signaling network. To address this gap, we studied a model of trigger waves with a realistic Hill-type auto-activation function. We obtained an analytic expression for the wave ignition threshold in 1D and numerical solutions in 2D and 3D. In the limit of high sensitivity, we found that the trigger wave threshold depends on both the effective dissociation constant, $K_D$, and the effective Hill coefficient, $n$, of the positive feedback circuit, with the dominant contribution scaling as $K_D^{n/(n-1)}$ in all dimensions. This result leads to a simple method for predicting the trigger wave ignition threshold from bulk activation data and is potentially of use for developing synthetic trigger wave circuits with desired sensitivities.
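The dominant scaling reported here is simple enough to evaluate directly; the following toy snippet (prefactor set to one, purely illustrative rather than the paper's full analytic expression) shows how the predicted ignition threshold varies with $K_D$ and $n$.

```python
# Toy evaluation of the dominant threshold scaling K_D**(n/(n-1)).
# Prefactor is set to 1; this is not the paper's full analytic result.
def ignition_threshold(K_D, n, prefactor=1.0):
    return prefactor * K_D ** (n / (n - 1.0))

for n in (2, 4, 8):                 # effective Hill coefficients
    for K_D in (0.1, 1.0, 10.0):    # effective dissociation constants
        print(f"n={n}, K_D={K_D}: threshold ~ {ignition_threshold(K_D, n):.3g}")
```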


[8] 2508.16877

A coalgebraic perspective on predictive processing

Predictive processing and active inference posit that the brain is a system performing Bayesian inference on the environment. By virtue of this, a prominent interpretation of predictive processing states that the generative model (a POMDP) encoded by the brain synchronises with the generative process (another POMDP) representing the environment while trying to explain what hidden properties of the world generated its sensory input. In this view, the brain is thought to become a copy of the environment. This claim has, however, been disputed on the grounds that a structural copy, or isomorphism as it is at times invoked to be, is not an accurate description of this process: the environment is necessarily more complex than the brain, and what matters is not the capacity to exactly recapitulate the veridical causal structure of the world. In this work, we make parts of this counterargument formal using ideas from the theory of coalgebras, an abstract mathematical framework for dynamical systems that brings together work from automata theory, concurrency theory, probabilistic processes, and other fields. To do so, we cast the generative model and the generative process, both in the form of POMDPs, as coalgebras, and use maps between them to describe a form of consistency that goes beyond mere structural similarity. This provides the mathematical background needed to describe how different processes can be seen as behaviourally, rather than structurally, equivalent, i.e., how they can be seen as emitting the same observations, and thus minimising prediction error, over time without strict assumptions about structural similarity. In particular, we introduce three standard notions of equivalence from the literature on coalgebras, evaluate them in the context of predictive processing, and identify the one closest to the claims made by proponents of this framework.


[9] 2508.17010

Lie-RMSD: A Gradient-Based Framework for Protein Structural Alignment using Lie Algebra

The comparison of protein structures is a fundamental task in computational biology, crucial for understanding protein function, evolution, and for drug design. While analytical methods like the Kabsch algorithm provide an exact, closed-form solution for minimizing the Root Mean Square Deviation (RMSD) between two sets of corresponding atoms, their application is limited to this specific metric. The rise of deep learning and automatic differentiation frameworks offers a new, more flexible paradigm for such optimization problems. We present Lie-RMSD, a novel, fully differentiable framework for protein structural alignment. Our method represents the rigid-body transformation (rotation and translation) as a 6-dimensional vector in the Lie algebra se(3) of the special Euclidean group SE(3). This representation allows the RMSD to be formulated as a loss function that can be directly minimized by modern gradient-based optimizers. We benchmarked our framework by aligning two allosteric conformations of Adenylate Kinase (PDB IDs: 4AKE and 1AKE). We demonstrate that a suite of standard optimizers (SGD, Adam, AdamW, and Sophia) can robustly converge to the global minimum, achieving precision effectively identical to the analytical Kabsch algorithm. This work validates the accuracy of the Lie algebra-based gradient descent approach and establishes a robust foundation for its extension to more sophisticated and biologically relevant scoring functions where no analytical solutions exist.
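The core idea is compact enough to sketch. The snippet below parametrizes a rigid motion by a 6-vector (here a rotation vector plus translation, a simplification of the full se(3) exponential map, which also couples the translation) and minimizes RMSD with Adam in PyTorch on toy coordinates; it is an illustration of the approach, not the authors' implementation.

```python
import torch

def rigid_transform(params, X):
    """Apply a rigid motion parametrized by a 6-vector:
    params[:3] = rotation vector, params[3:] = translation.
    (Simplified parametrization; the full se(3) exp map also couples t.)"""
    omega, t = params[:3], params[3:]
    theta = omega.norm() + 1e-12              # rotation angle
    k = omega / theta                         # rotation axis
    zero = torch.zeros((), dtype=X.dtype)
    K = torch.stack([                         # skew-symmetric matrix [k]_x
        torch.stack([zero, -k[2],  k[1]]),
        torch.stack([ k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0],  zero]),
    ])
    R = torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)
    return X @ R.T + t

torch.manual_seed(0)
X = torch.randn(100, 3)                                                  # mobile structure
Y = rigid_transform(torch.tensor([0.3, -0.2, 0.1, 1.0, 0.5, -0.7]), X)   # target

params = (1e-3 * torch.randn(6)).requires_grad_()  # small init avoids norm(0) gradient
opt = torch.optim.Adam([params], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    rmsd = ((rigid_transform(params, X) - Y) ** 2).sum(dim=1).mean().sqrt()
    rmsd.backward()
    opt.step()
print(f"final RMSD: {rmsd.item():.2e}")  # approaches 0 as optimization converges
```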


[10] 2508.17389

Neural Proteomics Fields for Super-resolved Spatial Proteomics Prediction

Spatial proteomics maps protein distributions in tissues, providing transformative insights for life sciences. However, current sequencing-based technologies suffer from low spatial resolution, and substantial inter-tissue variability in protein expression further compromises the performance of existing molecular data prediction methods. In this work, we introduce the novel task of spatial super-resolution for sequencing-based spatial proteomics (seq-SP) and, to the best of our knowledge, propose the first deep learning model for this task: Neural Proteomics Fields (NPF). NPF formulates seq-SP as a protein reconstruction problem in continuous space by training a dedicated network for each tissue. The model comprises a Spatial Modeling Module, which learns tissue-specific protein spatial distributions, and a Morphology Modeling Module, which extracts tissue-specific morphological features. Furthermore, to facilitate rigorous evaluation, we establish an open-source benchmark dataset, Pseudo-Visium SP, for this task. Experimental results demonstrate that NPF achieves state-of-the-art performance with fewer learnable parameters, underscoring its potential for advancing spatial proteomics research. Our code and dataset are publicly available at this https URL.


[11] 2508.17555

Boltzina: Efficient and Accurate Virtual Screening via Docking-Guided Binding Prediction with Boltz-2

In structure-based drug discovery, virtual screening using conventional molecular docking methods can be performed rapidly but suffers from limited prediction accuracy. Recently, Boltz-2 was proposed, achieving extremely high accuracy in binding affinity prediction, but it requires approximately 20 seconds per compound per GPU, making it difficult to apply to large-scale screening of hundreds of thousands to millions of compounds. This study proposes Boltzina, a novel framework that leverages Boltz-2's high accuracy while significantly improving computational efficiency. Boltzina achieves both accuracy and speed by omitting the rate-limiting structure prediction from Boltz-2's architecture and directly predicting affinity from AutoDock Vina docking poses. We evaluate on eight assays from the MF-PCBA dataset and show that while Boltzina performs below Boltz-2, it provides significantly higher screening performance than AutoDock Vina and GNINA. Additionally, Boltzina achieves speedups of up to 11.8$\times$ through reduced recycling iterations and batch processing. Furthermore, we investigate multi-pose selection strategies and two-stage screening combining Boltzina and Boltz-2, presenting optimization methods for accuracy and efficiency according to application requirements. This study represents the first attempt to apply Boltz-2's high-accuracy predictions to practical-scale screening, offering a pipeline that combines accuracy and efficiency in computational biology. Boltzina is available on GitHub: this https URL.
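The two-stage screening strategy mentioned above follows a generic shortlist-then-rescore pattern; here is a hedged sketch in plain Python (function names are placeholders, not the Boltzina API):

```python
# Generic two-stage screen: rank with a fast scorer (e.g. docking), then
# re-score only the top fraction with a slower, more accurate model.
def two_stage_screen(library, fast_score, slow_score, keep_frac=0.05):
    ranked = sorted(library, key=fast_score, reverse=True)    # cheap pass
    shortlist = ranked[: max(1, int(keep_frac * len(ranked)))]
    return sorted(shortlist, key=slow_score, reverse=True)    # expensive pass
```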


[12] 2508.17599

Species coexistence in the reinforcement learning paradigm

A central goal in ecology is to understand how biodiversity is maintained. Previous theoretical works have employed the rock-paper-scissors (RPS) game as a toy model, demonstrating that population mobility is crucial in determining the species' coexistence. One key prediction is that biodiversity is jeopardized and eventually lost when mobility exceeds a certain value -- a conclusion at odds with empirical observations of highly mobile species coexisting in nature. To address this discrepancy, we introduce a reinforcement learning framework and study a spatial RPS model, where individual mobility is adaptively regulated via a Q-learning algorithm rather than held fixed. Our results show that all three species can coexist stably, with extinction probabilities remaining low across a broad range of baseline migration rates. Mechanistic analysis reveals that individuals develop two behavioral tendencies: survival priority (escaping from predators) and predation priority (remaining near prey). While species coexistence emerges from the balance of the two tendencies, their imbalance jeopardizes biodiversity. Notably, there is a symmetry-breaking of action preference in a particular state that is responsible for the divergent species densities. Furthermore, when Q-learning species interact with fixed-mobility counterparts, those with adaptive mobility exhibit a significant evolutionary advantage. Our study suggests that reinforcement learning may offer a promising new perspective for uncovering the mechanisms of biodiversity and informing conservation strategies.
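For readers unfamiliar with the learning rule involved, a minimal tabular Q-learning update of the kind the abstract invokes (an individual choosing between staying and migrating given its local neighborhood state) looks like the toy sketch below; it is not the paper's model.

```python
import random
from collections import defaultdict

ACTIONS = ("stay", "migrate")
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
Q = defaultdict(float)              # Q[(state, action)] table

def choose(state):
    if random.random() < eps:                          # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])   # exploit

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```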


[13] 2508.18058

Comprehensively stratifying MCIs into distinct risk subtypes based on brain imaging genetics fusion learning

Mild cognitive impairment (MCI) is the prodromal stage of Alzheimer's disease (AD), and thus enrolling MCI subjects in clinical trials is worthwhile. However, MCI groups usually show significant diversity and heterogeneity in pathology and symptoms, which pose a great challenge to accurately selecting appropriate subjects. This study aimed to stratify MCI subjects into distinct subgroups with substantial differences in the risk of transitioning to AD by fusing multimodal brain imaging genetic data. The integrated imaging genetics method comprised three modules, i.e., the whole-genome-oriented risk genetic information extraction module (RGE), the genetic-to-brain mapping module (RG2PG), and the genetic-guided pseudo-brain fusion module (CMPF). We used data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and identified two MCI subtypes, called low-risk MCI (lsMCI) and high-risk MCI (hsMCI). We also validated that the two subgroups showed distinct patterns in multiple biomarkers, including genetics, demographics, fluid biomarkers, brain imaging features, clinical symptoms, and cognitive functioning at baseline, as well as in their longitudinal developmental trajectories. Furthermore, we identified potential biomarkers that may indicate the risk of MCI, providing critical insights for patient stratification at an early stage.


[14] 2508.18084

An integrated vertex model of the mesoderm invagination during the embryonic development of Drosophila

The mesoderm invagination of the Drosophila embryo is an archetypal morphogenetic process. To explore the roles of active cellular forces and the regulation of these forces, we developed an integrated vertex model that combines the regulation of morphogen expression with cell movements and tissue mechanics. Our results suggest that successful furrow formation requires an apical tension gradient, decreased basal tension, and increased lateral tension, corresponding to apical constriction, basal expansion, and apicobasal shortening, respectively. Our model also incorporates mechanical feedback, which leads to ectopic twist expression under external compression, as observed in experiments. Our model predicts that ectopic invagination could occur if an external compressive gradient is applied.


[15] 2508.18096

Realizing Reduced and Sparse Biochemical Reaction Networks from Dynamics

We propose a direct optimization framework for learning reduced and sparse chemical reaction networks (CRNs) from time-series trajectory data. In contrast to widely used indirect methods, such as those based on sparse identification of nonlinear dynamics (SINDy), which infer reaction dynamics by fitting numerically estimated derivatives, our approach fits entire trajectories by solving a dynamically constrained optimization problem. This formulation enables the construction of reduced CRNs that are both low-dimensional and sparse, while preserving key dynamical behaviors of the original system. We develop an accelerated proximal gradient algorithm to efficiently solve the resulting non-convex optimization problem. Through illustrative examples, including a Drosophila circadian oscillator and a glycolytic oscillator, we demonstrate the ability of our method to recover accurate and interpretable reduced-order CRNs. Notably, the direct approach avoids the derivative estimation step and mitigates error accumulation issues inherent in indirect methods, making it a robust alternative for data-driven CRN realizations.
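The sparsity-inducing part of such an accelerated proximal gradient scheme reduces to a soft-thresholding step; here is a minimal sketch illustrating the proximal operator only, not the full dynamically constrained solver.

```python
import numpy as np

def soft_threshold(k, lam):
    """Proximal operator of lam * ||k||_1: shrink rate constants toward zero."""
    return np.sign(k) * np.maximum(np.abs(k) - lam, 0.0)

def prox_grad_step(k, grad_loss, step, lam):
    """One proximal gradient step on (trajectory-fit loss + lam * ||k||_1)."""
    return soft_threshold(k - step * grad_loss(k), step * lam)
```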


[16] 2508.18211

Flexibility-Conditioned Protein Structure Design with Flow Matching

Recent advances in geometric deep learning and generative modeling have enabled the design of novel proteins with a wide range of desired properties. However, current state-of-the-art approaches are typically restricted to generating proteins with only static target properties, such as motifs and symmetries. In this work, we take a step towards overcoming this limitation by proposing a framework to condition structure generation on flexibility, which is crucial for key functionalities such as catalysis or molecular recognition. We first introduce BackFlip, an equivariant neural network for predicting per-residue flexibility from an input backbone structure. Relying on BackFlip, we propose FliPS, an SE(3)-equivariant conditional flow matching model that solves the inverse problem, that is, generating backbones that display a target flexibility profile. In our experiments, we show that FliPS is able to generate novel and diverse protein backbones with the desired flexibility, verified by Molecular Dynamics (MD) simulations. FliPS and BackFlip are available at this https URL.
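For orientation, the flow-matching objective underlying models like FliPS can be written in a few lines in the plain Euclidean, unconditional case; the sketch below omits the SE(3) geometry and flexibility conditioning that are the paper's actual contributions.

```python
import torch

def flow_matching_loss(v_theta, x0, x1):
    """Regress a velocity field v_theta(x_t, t) onto the straight-line
    target x1 - x0 (Euclidean, unconditional toy version)."""
    t = torch.rand(x0.shape[0], 1)       # sample time uniformly in [0, 1]
    x_t = (1 - t) * x0 + t * x1          # linear interpolant between samples
    target = x1 - x0                     # constant velocity along the path
    return ((v_theta(x_t, t) - target) ** 2).mean()
```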


[17] 2505.21407

Breaking co-existence: zealotry vs. nonlinear social impact

We study how zealotry and nonlinear social impact affect consensus formation in the nonlinear voter model, evolutionary games, and the partisan voter model. In all three models, consensus is an absorbing state in finite populations, while co-existence is a possible outcome of the deterministic dynamics. We show that sufficiently strong zealotry -- i.e., the presence of agents who never change state -- can drive infinite populations to consensus in all three models. However, while evolutionary games and the partisan voter model permit zealotry-induced consensus for all values of their model parameters, the nonlinear voter model does not. Central to this difference is the shape of the social impact function, which quantifies how the influence of a group scales with size, and is therefore a measure of majority and minority effects. We derive general conditions relating the slope of this function at small group sizes to the local stability of consensus. Sublinear impact favours minorities and can override zealotry to prevent consensus, whereas superlinear impact promotes majorities and therefore facilitates consensus. We extend the analysis to finite populations, exploring the time-to-consensus, and the shape of quasi-stationary distributions.


[18] 2508.16650

Predicting brain tumour enhancement from non-contrast MR imaging with artificial intelligence

Brain tumour imaging assessment typically requires both pre- and post-contrast MRI, but gadolinium administration is not always desirable, such as in frequent follow-up, renal impairment, allergy, or paediatric patients. We aimed to develop and validate a deep learning model capable of predicting brain tumour contrast enhancement from non-contrast MRI sequences alone. We assembled 11089 brain MRI studies from 10 international datasets spanning adult and paediatric populations with various neuro-oncological states, including glioma, meningioma, metastases, and post-resection appearances. Deep learning models (nnU-Net, SegResNet, SwinUNETR) were trained to predict and segment enhancing tumour using only non-contrast T1-, T2-, and T2/FLAIR-weighted images. Performance was evaluated on 1109 held-out test patients using patient-level detection metrics and voxel-level segmentation accuracy. Model predictions were compared against 11 expert radiologists who each reviewed 100 randomly selected patients. The best-performing nnU-Net achieved 83% balanced accuracy, 91.5% sensitivity, and 74.4% specificity in detecting enhancing tumour. Enhancement volume predictions strongly correlated with ground truth (R2 = 0.859). The model outperformed expert radiologists, who achieved 69.8% accuracy, 75.9% sensitivity, and 64.7% specificity. 76.8% of test patients had Dice over 0.3 (acceptable detection), 67.5% had Dice over 0.5 (good detection), and 50.2% had Dice over 0.7 (excellent detection). Deep learning can identify contrast-enhancing brain tumours from non-contrast MRI with clinically relevant performance. These models show promise as screening tools and may reduce gadolinium dependence in neuro-oncology imaging. Future work should evaluate clinical utility alongside radiology experts.
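For reference, the Dice coefficient quoted above (with detection bands at 0.3, 0.5, and 0.7) is the standard voxel-overlap score; a minimal implementation:

```python
import numpy as np

def dice(pred, truth, eps=1e-8):
    """Dice overlap between two binary masks (1.0 = perfect agreement)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    return 2.0 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum() + eps)
```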


[19] 2508.16803

A predictive modular approach to constraint satisfaction under uncertainty - with application to glycosylation in continuous monoclonal antibody biosimilar production

The paper proposes a modular approach to constraint handling in process optimization and control. This is partly motivated by the recent interest in learning-based methods, e.g., within bioproduction, for which constraint handling under uncertainty is a challenge. The proposed constraint handler, called the predictive filter, is combined with an adaptive constraint margin and a constraint violation cost monitor to minimize the cost of violating soft constraints due to model uncertainty and disturbances. The module can be combined with any controller and is based on minimally modifying the controller output, in a least squares sense, such that constraints are satisfied within the considered horizon. The proposed method is computationally efficient and suitable for real-time applications. The effectiveness of the method is illustrated through a realistic simulation case study of glycosylation constraint satisfaction in continuous monoclonal antibody biosimilar production using Chinese hamster ovary cells, for which the metabolic network model consists of 23 extracellular metabolites and 126 reactions.
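The least-squares modification at the heart of the predictive filter can be pictured as a small constrained projection; below is a hedged toy sketch with a generic linear constraint standing in for the metabolic model (not the paper's formulation).

```python
import numpy as np
from scipy.optimize import minimize

def predictive_filter(u_nom, G, b):
    """Return the u closest to the nominal controller output u_nom
    (least squares) such that the predicted constraints G @ u <= b hold."""
    cons = {"type": "ineq", "fun": lambda u: b - G @ u}
    res = minimize(lambda u: np.sum((u - u_nom) ** 2), u_nom, constraints=cons)
    return res.x

u_nom = np.array([1.0, 2.0])
G, b = np.array([[1.0, 1.0]]), np.array([2.5])   # toy constraint: u1 + u2 <= 2.5
print(predictive_filter(u_nom, G, b))            # minimally shifted output
```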


[20] 2508.16895

Quantum State Fidelity for Functional Neural Network Construction

Neuroscientists face challenges in analyzing high-dimensional neural recording data of dense functional networks. Without ground-truth reference data, finding the best algorithm for recovering neurologically relevant networks remains an open question. We implemented hybrid quantum algorithms to construct functional networks and compared them with the results of documented classical techniques. We demonstrated that our quantum state fidelity can provide a competitive alternative to classical metrics by revealing distinct functional networks. Our results suggest that quantum computing offers a viable and potentially advantageous alternative for data-driven modeling in neuroscience, underscoring its broader applicability in high-dimensional graph inference and complex system analysis.
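As a point of reference, the pure-state fidelity underlying such a metric is the squared overlap of normalized state vectors; the sketch below shows the classical computation for two activity vectors amplitude-encoded as states (illustrative of the metric, not the authors' hybrid quantum pipeline).

```python
import numpy as np

def amplitude_state(x):
    """Normalize an activity vector so it can serve as a state's amplitudes."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def fidelity(x, y):
    """Pure-state fidelity |<psi|phi>|^2 between two encoded signals."""
    return abs(np.dot(amplitude_state(x), amplitude_state(y))) ** 2
```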


[21] 2508.16934

Addressing Annotation Scarcity in Hyperspectral Brain Image Segmentation with Unsupervised Domain Adaptation

This work presents a novel deep learning framework for segmenting cerebral vasculature in hyperspectral brain images. We address the critical challenge of severe label scarcity, which impedes conventional supervised training. Our approach utilizes a novel unsupervised domain adaptation methodology, using a small, expert-annotated ground truth alongside unlabeled data. Quantitative and qualitative evaluations confirm that our method significantly outperforms existing state-of-the-art approaches, demonstrating the efficacy of domain adaptation for label-scarce biomedical imaging tasks.


[22] 2508.17345

Shortlisting Model: A Streamlined Simplex Diffusion for Discrete Variable Generation

Generative modeling of discrete variables is challenging yet crucial for applications in natural language processing and biological sequence design. We introduce the Shortlisting Model (SLM), a novel simplex-based diffusion model inspired by progressive candidate pruning. SLM operates on simplex centroids, reducing generation complexity and enhancing scalability. Additionally, SLM incorporates a flexible implementation of classifier-free guidance, enhancing unconditional generation performance. Extensive experiments on DNA promoter and enhancer design, protein design, character-level and large-vocabulary language modeling demonstrate the competitive performance and strong potential of SLM. Our code can be found at this https URL


[23] 2508.17699

Benchmarking Class Activation Map Methods for Explainable Brain Hemorrhage Classification on Hemorica Dataset

Explainable Artificial Intelligence (XAI) has become an essential component of medical imaging research, aiming to increase transparency and clinical trust in deep learning models. This study investigates brain hemorrhage diagnosis with a focus on explainability through Class Activation Mapping (CAM) techniques. A pipeline was developed to extract pixel-level segmentation and detection annotations from classification models using nine state-of-the-art CAM algorithms, applied across multiple network stages, and quantitatively evaluated on the Hemorica dataset, which uniquely provides both slice-level labels and high-quality segmentation masks. Metrics including Dice, IoU, and pixel-wise overlap were employed to benchmark CAM variants. Results show that the strongest localization performance occurred at stage 5 of EfficientNetV2S, with HiResCAM yielding the highest bounding-box alignment and AblationCAM achieving the best pixel-level Dice (0.57) and IoU (0.40), representing strong accuracy given that models were trained solely for classification without segmentation supervision. To the best of current knowledge, this is among the first works to quantitatively compare CAM methods for brain hemorrhage detection, establishing a reproducible benchmark and underscoring the potential of XAI-driven pipelines for clinically meaningful AI-assisted diagnosis.


[24] 2508.17815

Multi-domain Distribution Learning for De Novo Drug Design

We introduce DrugFlow, a generative model for structure-based drug design that integrates continuous flow matching with discrete Markov bridges, demonstrating state-of-the-art performance in learning chemical, geometric, and physical aspects of three-dimensional protein-ligand data. We endow DrugFlow with an uncertainty estimate that is able to detect out-of-distribution samples. To further enhance the sampling process towards distribution regions with desirable metric values, we propose a joint preference alignment scheme applicable to both flow matching and Markov bridge frameworks. Furthermore, we extend our model to also explore the conformational landscape of the protein by jointly sampling side chain angles and molecules.


[25] 2508.17891

One pocket to activate them all: Efforts on understanding the modulator pocket in K2P channels

The modulator pocket is a cryptic site discovered in the TREK1 K2P channel that accommodates agonists capable of increasing the channel's activity. Since its discovery, equivalent sites in other K2P channels have been shown to bind various ligands, both endogenous and exogenous. In this review, we attempt to elucidate how the modulator pocket contributes to K2P channel activation. To this end, we first describe the gating mechanisms reported in the literature and rationalize their modes of action. We then highlight previous experimental and computational evidence for agonists that bind to the modulator pocket, together with mutations at this site that affect gating. Finally, we elaborate on how the activation signal arising from the modulator pocket is transduced to the gates in K2P channels. In doing so, we outline a potential common modulator pocket architecture across K2P channels: a largely amphipathic structure - consistent with the expected properties of a pocket exposed at the interface between a hydrophobic membrane and the aqueous solvent - but still with some important channel-sequence variations. This architecture and its key differences can be leveraged for the design of new selective and potent modulators.


[26] 2508.18038

Complex dynamic transformations and strange attractors in a tri-trophic predator-prey system

Understanding predator-prey interactions is a fundamental issue in ecology, and the complex dynamics they induce are highly significant for maintaining community stability and self-organising biodiversity. Here, we investigate complex dynamical behaviors and bifurcation structures in a tri-trophic food web comprising a basal prey, an intermediate predator, and an omnivorous top predator. By combining Jacobian analysis with Hopf bifurcation theory, we derive explicit stability conditions and identify thresholds for oscillatory onset at the coexistence equilibrium. Further, we employ Shilnikov's theorem to establish criteria for the emergence of homoclinic orbits. Numerical simulations uncover a rich repertoire of dynamics, including stable limit cycles, Shilnikov homoclinic attractors, strange attractors, period-doubling cascades (with period-2 and 3 windows), chaotic bursts, and crisis-induced intermittency. These regimes are highly sensitive to the omnivore's foraging strategy: minor parameter shifts can destabilize oscillations or precipitate full system collapse. Our results highlight how omnivory and interaction structure critically modulate ecosystem complexity and resilience, offering new insights into the mechanisms that govern stability in multitrophic systems.
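To see the kind of dynamics at stake, a classic tri-trophic food chain (the Hastings-Powell model with Holling type II responses) can be integrated in a few lines; the parameters below sit in its well-known chaotic regime, though this toy omits the omnivory link that is central to the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

def food_chain(t, z, a1=5.0, b1=3.0, a2=0.1, b2=2.0, d1=0.4, d2=0.01):
    x, y, w = z                        # prey, intermediate, top predator
    f1 = a1 * x / (1 + b1 * x)         # Holling type II functional responses
    f2 = a2 * y / (1 + b2 * y)
    return [x * (1 - x) - f1 * y,
            f1 * y - f2 * w - d1 * y,
            f2 * w - d2 * w]

sol = solve_ivp(food_chain, (0, 2000), [0.8, 0.2, 9.0])
print(sol.y[:, -1])                    # long-run state on the attractor
```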


[27] 2508.18066

Arnold: a generalist muscle transformer policy

Controlling high-dimensional and nonlinear musculoskeletal models of the human body is a foundational scientific challenge. Recent machine learning breakthroughs have heralded policies that master individual skills like reaching, object manipulation and locomotion in musculoskeletal systems with many degrees of freedom. However, these agents are merely "specialists", achieving high performance for a single skill. In this work, we develop Arnold, a generalist policy that masters multiple tasks and embodiments. Arnold combines behavior cloning and fine-tuning with PPO to achieve expert or super-expert performance in 14 challenging control tasks from dexterous object manipulation to locomotion. A key innovation is Arnold's sensorimotor vocabulary, a compositional representation of the semantics of heterogeneous sensory modalities, objectives, and actuators. Arnold leverages this vocabulary via a transformer architecture to deal with the variable observation and action spaces of each task. This framework supports efficient multi-task, multi-embodiment learning and facilitates rapid adaptation to novel tasks. Finally, we analyze Arnold to provide insights into biological motor control, corroborating recent findings on the limited transferability of muscle synergies across tasks.


[28] 2508.18226

Disentangling the Factors of Convergence between Brains and Computer Vision Models

Many AI models trained on natural images develop representations that resemble those of the human brain. However, the factors that drive this brain-model similarity remain poorly understood. To disentangle how the model, training and data independently lead a neural network to develop brain-like representations, we trained a family of self-supervised vision transformers (DINOv3) that systematically varied these different factors. We compare their representations of images to those of the human brain recorded with both fMRI and MEG, providing high resolution in spatial and temporal analyses. We assess the brain-model similarity with three complementary metrics focusing on overall representational similarity, topographical organization, and temporal dynamics. We show that all three factors - model size, training amount, and image type - independently and interactively impact each of these brain similarity metrics. In particular, the largest DINOv3 models trained with the most human-centric images reach the highest brain-similarity. This emergence of brain-like representations in AI models follows a specific chronology during training: models first align with the early representations of the sensory cortices, and only align with the late and prefrontal representations of the brain with considerably more training. Finally, this developmental trajectory is indexed by both structural and functional properties of the human cortex: the representations that are acquired last by the models specifically align with the cortical areas with the largest developmental expansion, thickness, least myelination, and slowest timescales. Overall, these findings disentangle the interplay between architecture and experience in shaping how artificial neural networks come to see the world as humans do, thus offering a promising framework to understand how the human brain comes to represent its visual world.
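A core measurement behind such comparisons is representational similarity analysis; here is a minimal sketch of one standard way to quantify overall representational similarity, not necessarily the paper's exact metric.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_similarity(model_feats, brain_resps):
    """Correlate the representational dissimilarity matrices (RDMs) of
    model features and brain responses, both shaped (n_images, n_dims)."""
    rdm_model = pdist(model_feats, metric="correlation")
    rdm_brain = pdist(brain_resps, metric="correlation")
    rho, _ = spearmanr(rdm_model, rdm_brain)
    return rho
```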


[29] 2310.06191

Investigating the Correlation between Force Output, Strains, and Pressure for Active Skeletal Muscle Contractions

Measuring the forces of individual muscles in a muscle group around a joint is non-trivial, and researchers have suggested using surrogates for individual muscle forces instead. Traditionally, experimentalists have shown that the force output of the skeletal muscle tissue can be correlated to the intra-muscular pressure (IMP) generated by the muscle belly. However, IMP proves difficult to measure in vivo, due to variations from sensor placement and invasiveness of the procedure. Numerical biomechanical simulations offer a tool to analyze muscle contractions, enabling new insights into the correlations among non-invasive experimentally measurable quantities such as strains, and the force output. In this work, we investigate the correlations between the muscle force output, the principal, shear and volumetric strains experienced by the muscle, as well as the pressure developed within the muscle belly as the tissue undergoes isometric contractions with varying activation profiles and magnitudes. It is observed that pressure does not correlate well with force output under higher sub-maximal and maximal activation levels, especially at locations away from the center of the muscle belly due to pressure relaxation effects. This study reveals strong correlations between force output and the strains at all locations of the belly, irrespective of the type of activation considered. This observation offers evidence for further in vivo studies using experimentally measurable principal and volumetric strains in the muscle belly as proxies for the force generation by the individual muscle and consequently enables the estimation on the contribution of various muscle groups to the total force.


[30] 2402.07949

Optimizing the Design of an Artificial Pancreas to Improve Diabetes Management

Diabetes, a chronic condition that impairs how the body turns food into energy, i.e. blood glucose, affects 38 million people in the US alone. The standard treatment is to supplement carbohydrate intake with an artificial pancreas, i.e. a continuous insulin pump (basal shots), as well as occasional insulin injections (bolus shots). The goal of the treatment is to keep blood glucose at the center of an acceptable range, as measured through a continuous glucose meter. A secondary goal is to minimize injections, which are unpleasant and difficult for some patients to implement. In this study, neuroevolution was used to discover an optimal strategy for the treatment. Based on a dataset of 30 days of treatment and measurements of a single patient, a random forest was first trained to predict future glucose levels. A neural network was then evolved to prescribe carbohydrates, basal pumping levels, and bolus injections. Evolution discovered a Pareto front that reduced deviation from the target and number of injections compared to the original data, thus improving patients' quality of life. To make the system easier to adopt, a language interface was developed with a large language model. Thus, these technologies not only improve patient care but also adoption in a broader population.


[31] 2504.00020

Celler: A Genomic Language Model for Long-Tailed Single-Cell Annotation

Recent breakthroughs in single-cell technology have ushered in unparalleled opportunities to decode the molecular intricacy of complex biological systems, especially those linked to diseases unique to humans. However, these advances have also introduced new obstacles, specifically the efficient annotation of extensive, long-tailed single-cell data pertaining to disease conditions. To effectively surmount this challenge, we introduce Celler, a state-of-the-art generative pre-training model crafted specifically for the annotation of single-cell data. Celler incorporates two groundbreaking elements: First, we introduce the Gaussian Inflation (GInf) Loss function. By dynamically adjusting sample weights, GInf Loss significantly enhances the model's ability to learn from rare categories while reducing the risk of overfitting for common categories. Second, we introduce an innovative Hard Data Mining (HDM) strategy into the training process, specifically targeting the challenging-to-learn minority data samples, which significantly improves the model's predictive accuracy. Additionally, to further advance research in this field, we have constructed a large-scale single-cell dataset: Celler-75, which encompasses 40 million cells distributed across 80 human tissues and 75 specific diseases. This dataset provides critical support for comprehensively exploring the potential of single-cell technology in disease research. Our code is available at this https URL.


[32] 2507.03472

Intestinal villi and crypts density maximizing nutrient absorption

The villi and crypts of the gastrointestinal tract increase the effective surface area of the intestinal mucosa, potentially enhancing nutrient absorption. It is commonly assumed that this is their primary function, and that a higher villi density necessarily leads to improved absorption. However, when villi are packed too closely together, diffusion can be hindered, potentially offsetting this benefit. In this work, we investigate the relationship between the density of these structures and the overall efficiency of absorption. In three different simplified geometries approximating crypts, leaf-like villi, and finger-like villi, we calculate analytically the concentration profile and the absorption flux, assuming that there is only diffusion between these structures while the lumen is well mixed. When plotting the absorption flux per unit of gut length as a function of the structures' density, we observe that there is a density maximizing absorption. We study this optimum numerically. It depends weakly on the absorption properties of the given nutrient, so that a geometry optimal for one nutrient is close to optimum for another nutrient. Physiological data from various animal species align with this predicted optimal range and potentially reflect evolutionary selection for efficient nutrient uptake, supporting the model's validity.


[33] 2507.09024

CNeuroMod-THINGS, a densely-sampled fMRI dataset for visual neuroscience

Data-hungry neuro-AI modelling requires ever larger neuroimaging datasets. CNeuroMod-THINGS meets this need by capturing neural representations for a wide set of semantic concepts using well-characterized images in a new densely-sampled, large-scale fMRI dataset. Importantly, CNeuroMod-THINGS exploits synergies between two existing projects: the THINGS initiative (THINGS) and the Courtois Project on Neural Modelling (CNeuroMod). THINGS has developed a common set of thoroughly annotated images broadly sampling natural and man-made objects which is used to acquire a growing collection of large-scale multimodal neural responses. Meanwhile, CNeuroMod is acquiring hundreds of hours of fMRI data from a core set of participants during controlled and naturalistic tasks, including visual tasks like movie watching and videogame playing. For CNeuroMod-THINGS, four CNeuroMod participants each completed 33-36 sessions of a continuous recognition paradigm using approximately 4000 images from the THINGS stimulus set spanning 720 categories. We report behavioural and neuroimaging metrics that showcase the quality of the data. By bridging together large existing resources, CNeuroMod-THINGS expands our capacity to model broad slices of the human visual experience.


[34] 2508.05699

Designing de novo TIM Barrels: Insights into Stabilization, Diversification, and Functionalization Strategies

The TIM-barrel fold is one of the most versatile and ubiquitous protein folds in nature, hosting a wide variety of catalytic activities and functions while serving as a model system in protein biochemistry and engineering. This review explores its role as a key fold model in protein design, particularly in addressing challenges in stabilization and functionalization. We discuss historical and recent advances in de novo TIM barrel design from the landmark creation of sTIM11 to the development of the diversified variants, with a special focus on deepening our understanding of the determinants that modulate the sequence-structure-function relationships of this architecture. Also, we examine why the diversification of de novo TIM barrels towards functionalization remains a major challenge, given the absence of natural-like active site features. Current approaches have focused on incorporating structural extensions, modifying loops, and using cutting-edge AI-based strategies to create scaffolds with tailored characteristics. Despite significant advances, achieving enzymatically active de novo TIM barrels has been proven difficult, with only recent breakthroughs demonstrating functionalized designs. We discuss the limitations of stepwise functionalization approaches and support an integrated approach that simultaneously optimizes scaffold structure and active site shape, using both physical- and AI-driven methods. By combining computational and experimental insights, we highlight the TIM barrel as a powerful template for custom enzyme design and as a model system to explore the intersection of protein biochemistry, biophysics, and design.


[35] 2508.06835

Expand or better manage protected areas: a framework for minimising extinction risk when threats are concentrated near edges

Several international agreements have called for the rapid expansion of protected areas to halt biodiversity declines. However, recent research has shown that expanding protected areas may be less cost-effective than redirecting resources towards threat management in existing reserves. These findings often assume that threats are homogeneously distributed in the landscape. In some cases, threats are more concentrated near the edge of protected areas. As protected areas expand, core habitat in the centre expands more rapidly than its edge, potentially creating a refuge from threats. In this paper, we present a framework linking protected area expansion and threat management to extinction risk, via their impact on population carrying capacity and growth rate within core and edge habitats. We demonstrate the framework using a simple population model where individuals are uniformly distributed in a circular protected area threatened by poachers who penetrate the protected area to a fixed distance. We parameterise the model for Peter's Duiker (Cephalophus callipygus) harvested for food in the dense undergrowth of African forests using snares. Expanding protected areas can reduce extinction risk more effectively compared to an equivalent investment in snare removal for larger protected areas that already sustain core unhunted habitat. Our results demonstrate the importance of protected area expansion in buffering susceptible populations from fixed hunting pressure restricted to protected area edges. However, for cases where threats, wildlife, and managers respond to each other strategically in space, the relative importance of expansion versus increased management remains a significant open problem.
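The geometric intuition (core habitat grows faster than edge habitat as a reserve expands) is easy to make concrete: for a circular reserve of radius R with threats penetrating a fixed distance d from the boundary, the hunted fraction of the area falls with R. A toy calculation:

```python
import numpy as np

def hunted_fraction(R, d):
    """Fraction of a circular reserve (radius R) within distance d of the edge."""
    core = np.maximum(R - d, 0.0) ** 2     # core fraction is (R - d)^2 / R^2
    return 1.0 - core / R**2

for R in (5.0, 10.0, 20.0, 40.0):          # illustrative radii, fixed penetration d
    print(R, round(hunted_fraction(R, d=2.0), 3))
```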


[36] 2508.14106

High-Throughput Low-Cost Segmentation of Brightfield Microscopy Live Cell Images

Live cell culture is crucial in biomedical studies for analyzing cell properties and dynamics in vitro. This study focuses on segmenting unstained live cells imaged with bright-field microscopy. While many segmentation approaches exist for microscopic images, none consistently address the challenges of bright-field live-cell imaging with high throughput, where temporal phenotype changes, low contrast, noise, and motion-induced blur from cellular movement remain major obstacles. We developed a low-cost CNN-based pipeline incorporating comparative analysis of frozen encoders within a unified U-Net architecture enhanced with attention mechanisms, instance-aware systems, adaptive loss functions, hard instance retraining, dynamic learning rates, progressive mechanisms to mitigate overfitting, and an ensemble technique. The model was validated on a public dataset featuring diverse live cell variants, showing consistent competitiveness with state-of-the-art methods, achieving 93% test accuracy and an average F1-score of 89% (std. 0.07) on low-contrast, noisy, and blurry images. Notably, the model was trained primarily on bright-field images with limited exposure to phase-contrast microscopy (<20%), yet it generalized effectively to the phase-contrast LIVECell dataset, demonstrating modality robustness and strong performance. This highlights its potential for real-world laboratory deployment across imaging conditions. The model requires minimal compute power and is adaptable using basic deep learning setups such as Google Colab, making it practical for training on other cell variants. Our pipeline outperforms existing methods in robustness and precision for bright-field microscopy segmentation. The code and dataset are available for reproducibility.


[37] 2408.12540

Neural Fields and Noise-Induced Patterns in Neurons on Large Disordered Networks

We study pattern formation in a class of large-dimensional neural networks posed on random graphs and subject to spatio-temporal stochastic forcing. Under generic conditions on coupling and nodal dynamics, we prove that the network admits a rigorous mean-field limit, resembling a Wilson-Cowan neural field equation. The state variables of the limiting system are the mean and variance of neuronal activity. We select networks whose mean-field equations are tractable, and we perform a bifurcation analysis using as control parameter the diffusivity strength of the afferent white noise on each neuron. We find conditions for Turing-like bifurcations in a system where the cortex is modelled as a ring, and we produce numerical evidence of noise-induced spiral waves in models with a two-dimensional cortex. We provide numerical evidence that solutions of the finite-size network converge weakly to solutions of the mean-field model. Finally, we prove a Large Deviation Principle, which provides a means of assessing the likelihood of deviations from the mean-field equations induced by finite-size effects.


[38] 2505.18470

Chemical classification program synthesis using generative artificial intelligence

Accurately classifying chemical structures is essential for cheminformatics and bioinformatics, including tasks such as identifying bioactive compounds of interest, screening molecules for toxicity to humans, finding non-organic compounds with desirable material properties, or organizing large chemical libraries for drug discovery or environmental monitoring. However, manual classification is labor-intensive and difficult to scale to large chemical databases. Existing automated approaches either rely on manually constructed classification rules, or are deep learning methods that lack explainability. This work presents an approach that uses generative artificial intelligence to automatically write chemical classifier programs for classes in the Chemical Entities of Biological Interest (ChEBI) database. These programs can be used for efficient deterministic run-time classification of SMILES structures, with natural language explanations. The programs themselves constitute an explainable computable ontological model of chemical class nomenclature, which we call the ChEBI Chemical Class Program Ontology (C3PO). We validated our approach against the ChEBI database, and compared our results against deep learning models and a naive SMARTS pattern based classifier. C3PO outperforms the naive classifier, but does not reach the performance of state of the art deep learning methods. However, C3PO has a number of strengths that complement deep learning methods, including explainability and reduced data dependence. C3PO can be used alongside deep learning classifiers to provide an explanation of the classification, where both methods agree. The programs can be used as part of the ontology development process, and iteratively refined by expert human curators.
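The generated programs reduce, at run time, to deterministic structure matching; the toy classifier below shows the flavor using RDKit SMARTS matching (our example pattern for a "carboxylic acid" class, not code produced by C3PO).

```python
from rdkit import Chem

CARBOXYLIC_ACID = Chem.MolFromSmarts("C(=O)[OH]")  # example class pattern

def is_carboxylic_acid(smiles: str) -> bool:
    """Deterministic run-time classification of a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and mol.HasSubstructMatch(CARBOXYLIC_ACID)

print(is_carboxylic_acid("CC(=O)O"))  # acetic acid -> True
print(is_carboxylic_acid("CCO"))      # ethanol -> False
```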


[39] 2506.01608

EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models

Understanding behavior requires datasets that capture humans while carrying out complex tasks. The kitchen is an excellent environment for assessing human motor and cognitive function, as many complex actions are naturally exhibited in kitchens from chopping to cleaning. Here, we introduce the EPFL-Smart-Kitchen-30 dataset, collected in a noninvasive motion capture platform inside a kitchen environment. Nine static RGB-D cameras, inertial measurement units (IMUs) and one head-mounted HoloLens 2 headset were used to capture 3D hand, body, and eye movements. The EPFL-Smart-Kitchen-30 dataset is a multi-view action dataset with synchronized exocentric, egocentric, depth, IMUs, eye gaze, body and hand kinematics spanning 29.7 hours of 16 subjects cooking four different recipes. Action sequences were densely annotated with 33.78 action segments per minute. Leveraging this multi-modal dataset, we propose four benchmarks to advance behavior understanding and modeling through 1) a vision-language benchmark, 2) a semantic text-to-motion generation benchmark, 3) a multi-modal action recognition benchmark, 4) a pose-based action segmentation benchmark. We expect the EPFL-Smart-Kitchen-30 dataset to pave the way for better methods as well as insights to understand the nature of ecologically-valid human behavior. Code and data are available at this https URL


[40] 2507.06407

GloBIAS: strengthening the foundations of BioImage Analysis

There is a global need for BioImage Analysis (BIA) as advances in life sciences increasingly rely on cutting-edge imaging systems that have dramatically expanded the complexity and dimensionality of biological images. Turning these data into scientific discoveries requires people with effective data management skills and knowledge of state-of-the-art image processing and data analysis, in other words, BioImage Analysts. The Global BioImage Analysts' Society (GloBIAS) aims to enhance the profile of BioImage Analysts as a key role in science and research. Its vision encompasses fostering a global network, democratising access to BIA by providing educational resources tailored to various proficiency levels and disciplines, while also establishing guidelines for BIA courses. By collaboratively shaping the education of BioImage Analysts, GloBIAS aims to unlock the full potential of BIA in advancing life science research and to consolidate BIA as a career path. To better understand the needs and geographical representation of the BIA community, a worldwide survey was conducted and 291 responses were collected from people across all career stages and continents. This work discusses how GloBIAS aims to address community-identified shortcomings in work environment, funding, and scientific activities. The survey underscores a strong interest from the BIA community in activities proposed by GloBIAS and their willingness to actively contribute. With 72% of respondents willing to pay for membership, the community's enthusiasm for both online and in-person events is set to drive the growth and sustainability of GloBIAS.


[41] 2508.12260

Mantis: A Simulation-Grounded Foundation Model for Disease Forecasting

Infectious disease forecasting in novel outbreaks or low resource settings has been limited by the need for disease-specific data, bespoke training, and expert tuning. We introduce Mantis, a foundation model trained entirely on mechanistic simulations, which enables out-of-the-box forecasting across diseases, regions, and outcomes, even in settings with limited historical data. Mantis is built on over 400 million simulated days of outbreak dynamics spanning diverse pathogens, transmission modes, interventions, and surveillance artifacts. Despite requiring no real-world data during training, Mantis outperformed 39 expert-tuned models we tested across six diseases, including all models in the CDC's COVID-19 Forecast Hub. Mantis generalized to novel epidemiological regimes, including diseases with held-out transmission mechanisms, demonstrating that it captures fundamental contagion dynamics. Critically, Mantis is mechanistically interpretable, enabling public health decision-makers to identify the latent drivers behind its predictions. Finally, Mantis delivers accurate forecasts at 8-week horizons, more than doubling the actionable range of most models, enabling proactive public health planning. Together, these capabilities position Mantis as a foundation for next-generation disease forecasting systems: general, interpretable, and deployable where traditional models fail.