The accurate prediction of protein-RNA binding affinity remains an unsolved problem in structural biology, limiting progress in understanding gene regulation and in designing RNA-targeting therapeutics. A central obstacle is the structural flexibility of RNA: unlike proteins, RNA molecules exist as dynamic conformational ensembles, so committing to a single predicted structure discards information relevant to binding. Here, we show that this obstacle can be addressed by extracting pre-structural embeddings, intermediate representations from a biomolecular foundation model captured before the structure decoding step. Pre-structural embeddings implicitly encode conformational ensemble information without requiring predicted structures. We build ZeroFold, a transformer-based model that combines pre-structural embeddings from Boltz-2 for both protein and RNA molecules through a cross-modal attention mechanism to predict binding affinity directly from sequence. To support training and evaluation, we construct PRADB, a curated dataset of 2,621 unique protein-RNA pairs with experimentally measured affinities drawn from four complementary databases. On a held-out test set constructed with 40% sequence identity thresholds, ZeroFold achieves a Spearman correlation of 0.65, approaching the ceiling imposed by experimental measurement noise. Under progressively fairer evaluation conditions that control for training-set overlap, ZeroFold compares favourably with leading structure-based and sequence-based predictors, with the performance gap widening as sequence similarity to competitor training data is reduced. These results illustrate how pre-structural embeddings offer a representation strategy for flexible biomolecules, opening a route to affinity prediction for protein-RNA pairs for which no structural data exist.
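The cross-modal attention step can be illustrated with a generic single-head sketch in which protein tokens attend over RNA tokens. All dimensions, weights, and inputs below are toy placeholders, not ZeroFold's actual architecture.

```python
import numpy as np

def cross_attention(q_seq, kv_seq, Wq, Wk, Wv):
    """Generic single-head cross-attention: queries from one modality,
    keys/values from the other (illustrative, not ZeroFold's exact design)."""
    Q, K, V = q_seq @ Wq, kv_seq @ Wk, kv_seq @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # scaled dot-product
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)           # row-wise softmax
    return attn @ V                                   # RNA-informed protein tokens

rng = np.random.default_rng(6)
protein = rng.normal(size=(120, 32))    # toy per-residue embeddings
rna = rng.normal(size=(60, 32))         # toy per-nucleotide embeddings
Wq, Wk, Wv = (rng.normal(size=(32, 16)) for _ in range(3))
fused = cross_attention(protein, rna, Wq, Wk, Wv)
```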
A major barrier to decentralized, near-patient diagnostics is the lack of a signal transduction modality that is both analytically precise and accessible at the point of care. Optical readouts remain instrument-dependent and difficult to miniaturize, while compact electrochemical readouts are prone to matrix-derived signal distortion, limiting their biomarker coverage in real clinical settings. Here, we define interfacial potential transduction as a standardized electrical modality for portable, clinical-grade diagnostics across diverse assay formats. A mechanistic framework identifying key sample matrix parameters within the interfacial potential transduction system enables control of biofluid-derived interference, and is demonstrated in a widely accessible lateral flow immunoassay format through quantitative detection of estradiol, progesterone, and luteinizing hormone in human plasma with high correlation (r² > 0.97) to clinical analyzers. Broader applicability across representative diagnostic sectors is further demonstrated through strong performance in three applications: glucose quantification for biochemical analysis with a limit of detection (LOD) of 0.92 µg/dL, detection of HIV p24 capsid protein under an immunomagnetic separation workflow (LOD = 44.8 fg/mL), and hepatitis B virus detection within 5 min via loop-mediated isothermal amplification for molecular diagnostics. Together, these results establish interfacial potential transduction as a unified diagnostic paradigm for near-patient deployment beyond optical and electrochemical approaches.
Our subjective experience of color is typically described by abstract properties such as hue, saturation, and brightness that do not directly correspond to sensory signals arising from cones in the retina. Along the hue dimension, certain colors -- red, green, blue, and yellow -- appear unique in that they are not perceived as combinations of other colors, and the pairs red-green and blue-yellow appear as opposites. However, the anatomical and physiological correlates of these 'unique hues' within the brain, and the reason for their existence, remain a mystery. Here, we demonstrate a direct connection between these hues and the statistics of the natural visual environment. Analysis of simulated cone responses on a dataset of 503 calibrated natural images reveals a strongly non-Gaussian distribution in 3D color space, with heavy tails in distinct, asymmetrically arranged directions. A sparse coding model is then fit to these data, minimizing the total sum of coefficients on the basis vectors used to represent them. A model with six basis vectors converges to the four unique hues plus black and white. Moreover, we find that the nonlinear nature of inference in the sparse coding model yields both excitatory and inhibitory interactions among latent variables; the former facilitate combining adjacent pairs of unique hues to encode the intermediate hues situated between them, while the latter enforce mutual exclusivity between opposite unique hues. Together, these findings shed new light on the distribution of color in the natural environment and provide a linking principle between this structure and the phenomenology of color appearance.
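The sparse inference step can be sketched with a standard ISTA solver, which minimizes a reconstruction error plus an L1 penalty on the coefficients; the dictionary, signal, and penalty below are toy placeholders, not the paper's learned basis or image data.

```python
import numpy as np

def ista(x, D, lam=0.05, n_iter=200):
    """Infer sparse coefficients a minimizing ||x - D a||^2 / 2 + lam*||a||_1
    via iterative shrinkage-thresholding (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2                    # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(3, 6))                          # 6 basis vectors in 3D "color" space
D /= np.linalg.norm(D, axis=0)                       # unit-norm columns
x = 2.0 * D[:, 1] + 0.5 * D[:, 4]                    # signal built from two basis vectors
a = ista(x, D)
```

The L1 penalty drives most coefficients toward zero, so each input is described by a small set of active basis vectors, which is what allows the model's learned directions to be read out as hue axes.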
We derive a Riemannian metric on three-dimensional color space from the Fisher information of neural population codes in the visual pathway. Photoreceptor adaptation, retinal opponent channels, and cortical population encoding each map onto a geometric construction, producing a metric tensor whose components correspond to measurable neural quantities. The resulting 17-parameter model is fitted jointly to four independent threshold datasets: MacAdam's (1942) chromaticity ellipses, the Koenderink et al. (2026) three-dimensional ellipsoids, Wright's (1941) wavelength discrimination function, and the Huang et al. (2012) threshold color difference ellipses, covering 96 independently measured discrimination conditions across varied chromaticities and luminances. The joint fit achieves STRESS of 23.9 on MacAdam, 20.8 on Koenderink et al., 30.1 on Wright, and 30.8 on Huang et al.
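The population-coding construction can be sketched numerically: for independent Poisson neurons with rates $f_i(\theta)$, the Fisher information matrix $g_{kl}(\theta) = \sum_i (\partial f_i/\partial\theta_k)(\partial f_i/\partial\theta_l)/f_i(\theta)$ serves as the metric tensor. The Gaussian tuning curves and parameters below are illustrative toys in a 2D coordinate, not the fitted 17-parameter model.

```python
import numpy as np

rng = np.random.default_rng(1)
centers = rng.uniform(-1, 1, size=(50, 2))     # hypothetical tuning-curve centers

def rates(theta, gain=20.0, width=0.5):
    """Mean Poisson firing rates of the population at stimulus theta."""
    d2 = np.sum((centers - theta) ** 2, axis=1)
    return gain * np.exp(-d2 / (2 * width ** 2)) + 1.0   # +1 keeps rates positive

def fisher_metric(theta, eps=1e-4):
    """Metric tensor g_kl = sum_i (df_i/dtheta_k)(df_i/dtheta_l)/f_i for
    independent Poisson neurons; derivatives by central differences."""
    f = rates(theta)
    J = np.zeros((2, len(centers)))
    for k in range(2):
        dp = theta.copy(); dp[k] += eps
        dm = theta.copy(); dm[k] -= eps
        J[k] = (rates(dp) - rates(dm)) / (2 * eps)
    return (J / f) @ J.T

g = fisher_metric(np.array([0.1, -0.2]))
```

By construction the resulting tensor is symmetric and positive semidefinite, as a Riemannian metric must be; discrimination thresholds then correspond to ellipses of constant geodesic distance.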
Spatial patterning and synchronization are pervasive features of plankton communities, yet the mechanisms that allow such patterns to persist coherently under environmental noise remain unresolved. In vertically structured aquatic ecosystems, plankton populations are often organized into distinct layers, raising the question of how interactions between layers shape both spatial self-organization and robustness. Here, we develop a spatiotemporal ecosystem model of a two-layer plankton community to examine the role of passive diffusive coupling under stochastic environmental fluctuations. We show that interlayer diffusion induces a sharp transition from independent, layer-specific Turing patterns to fully synchronized spatial patterns once the coupling strength exceeds a critical threshold. Importantly, the same coupling mechanism markedly enhances the stability of spatial patterns against environmental noise, extending their persistence far beyond that of uncoupled layers. Moreover, we uncover a trophic hierarchy in noise sensitivity, with zooplankton exhibiting substantially greater vulnerability than phytoplankton. Together, these results identify passive diffusive coupling as a unifying mechanism that simultaneously promotes spatial synchronization and robustness, providing a mechanistic explanation for the persistence of coherent plankton patterns in fluctuating aquatic environments.
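The synchronizing effect of interlayer coupling can be seen in isolation with a minimal sketch of the coupling term alone (the full model also has reaction and within-layer diffusion terms, which are omitted here): each grid cell relaxes as du1/dt = k(u2 - u1), du2/dt = k(u1 - u2), so the difference between layers decays at rate 2k while total density is conserved. Grid size, coupling strength, and initial fields are arbitrary illustrations.

```python
import numpy as np

n, k, dt, steps = 64, 0.5, 0.1, 200
rng = np.random.default_rng(2)
u1 = rng.uniform(0, 1, n)        # independent initial "patterns" in each layer
u2 = rng.uniform(0, 1, n)
d0 = np.linalg.norm(u1 - u2)     # initial interlayer mismatch
s0 = (u1 + u2).sum()             # total density, conserved by pure exchange

for _ in range(steps):
    flux = k * (u2 - u1)         # passive diffusive exchange between layers
    u1, u2 = u1 + dt * flux, u2 - dt * flux

d1 = np.linalg.norm(u1 - u2)     # mismatch after coupling: strongly reduced
```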
Understanding how animals move through heterogeneous landscapes is central to ecology and conservation. In this context, step selection functions (SSFs) have emerged as the main statistical framework for analyzing how biotic and abiotic predictors influence movement paths observed via radio tracking, GPS tags, or similar sensors. A traditional SSF consists of a generalized linear model (GLM) that infers the animal's habitat preferences (selection coefficients) by comparing each observed movement step to random steps. Such GLM-SSFs, however, cannot flexibly capture non-linear or interacting effects unless those have been specified a priori. To address this problem, generalized additive models have been integrated into the SSF framework, but these GAM-SSFs are still limited in their ability to represent complex habitat preferences and inter-individual variability. Here we explore the utility of deep neural networks (DNNs) to overcome these limitations. We find that DNN-SSFs, coupled with explainable AI to extract selection coefficients, offer many advantages for analyzing movement data. For linear effects, they effectively retrieve the same effect sizes and p-values as conventional GLMs. At the same time, they can automatically detect complex interaction effects, nonlinear responses, and inter-individual variability when those are present in the data. We conclude that DNN-SSFs are a promising extension of traditional SSFs. Our analysis extends previous research on DNN-SSFs by exploring the differences and similarities of GLM-, GAM-, and DNN-based SSF models in more depth, in particular regarding the validity of statistical indicators derived from the DNN. We also propose new DNN structures to capture inter-individual effects, which can be viewed as a nonlinear random effect. All methods used in this paper are available via the 'citoMove' R package.
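The GLM-SSF baseline amounts to a conditional-logit likelihood: for each stratum of one observed step and several random steps, the probability of the observed step is a softmax of linear utilities over the candidates. A toy numpy fit is sketched below; the covariates, "true" coefficients, and stratum sizes are hypothetical (real analyses would use SSF software such as the 'amt' R package).

```python
import numpy as np

rng = np.random.default_rng(3)
beta_true = np.array([1.5, -0.8])           # hypothetical selection coefficients
X = rng.normal(size=(500, 10, 2))           # strata x candidate steps x covariates

# Simulate which candidate step is "observed" under the true coefficients.
util = X @ beta_true
p = np.exp(util - util.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
chosen = np.array([rng.choice(10, p=pi) for pi in p])
x_obs = X[np.arange(500), chosen]           # covariates of each chosen step

# Gradient ascent on the conditional-logit log-likelihood.
beta = np.zeros(2)
for _ in range(500):
    u = X @ beta
    w = np.exp(u - u.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)       # model's choice probabilities
    grad = (x_obs - (w[..., None] * X).sum(axis=1)).mean(axis=0)
    beta += 0.5 * grad
```

The fitted `beta` recovers the assumed coefficients; a DNN-SSF replaces the linear utility `X @ beta` with a learned nonlinear function while keeping this same likelihood.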
Capturing dynamic spatiotemporal neural activity is essential for understanding large-scale brain mechanisms. Functional magnetic resonance imaging (fMRI) provides high-resolution cortical representations that form a strong basis for characterizing fine-grained brain activity patterns. However, the high acquisition cost of fMRI limits large-scale applications, making high-quality fMRI reconstruction a crucial task. Electroencephalography (EEG) offers millisecond-level temporal cues that complement fMRI. Leveraging this complementarity, we present an EEG-conditioned framework for reconstructing dynamic fMRI as continuous neural sequences with high spatial fidelity and strong temporal coherence at the cortical-vertex level. To address the sampling irregularities common in real fMRI acquisitions, we incorporate null-space intermediate-frame reconstruction, enabling measurement-consistent completion of arbitrary intermediate frames and improving sequence continuity and practical applicability. Experiments on the CineBrain dataset demonstrate superior voxel-wise reconstruction quality and robust temporal consistency across whole-brain and functionally specific regions. The reconstructed fMRI also preserves essential functional information, supporting downstream visual decoding tasks. This work provides a new pathway for estimating high-resolution fMRI dynamics from EEG and advances multimodal neuroimaging toward more dynamic brain activity modeling.
We develop a novel ultrasound nasogastric tube (UNGT) dataset to address the lack of public nasogastric tube datasets. The UNGT dataset includes 493 images gathered from 110 patients with an average image resolution of approximately 879 $\times$ 583. Four structures -- the liver, stomach, tube, and pancreas -- are precisely annotated. In addition, we propose a semi-supervised adaptive-weighting aggregation medical segmenter (AAMS) to address data limitation and class imbalance concurrently. The introduced adaptive weighting approach tackles severe class imbalance by regulating the loss across categories as training proceeds. The presented multiscale attention aggregation block bolsters the feature representation by integrating local and global contextual information. Together, these components allow AAMS to emphasize sparse or small structures and to strengthen its representation ability. We perform extensive segmentation experiments on our UNGT dataset, and the results show that AAMS outperforms existing state-of-the-art approaches to varying extents. In addition, we conduct comprehensive classification experiments across varying state-of-the-art methods and compare their performance. The dataset and code will be available upon publication at this https URL.
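One common way to regulate the loss across imbalanced categories, sketched below, is inverse-frequency class weighting of the cross-entropy; this is an illustrative reading of the idea, not the paper's exact adaptive scheme, and the class counts are made up.

```python
import numpy as np

def class_weights(counts, smooth=1.0):
    """Inverse-frequency weights, smoothed and normalized to mean weight 1,
    so rarer classes contribute more to the loss."""
    freq = (counts + smooth) / (counts.sum() + smooth * len(counts))
    w = 1.0 / freq
    return w / w.sum() * len(counts)

# Hypothetical pixel counts: background, liver, stomach, tube.
counts = np.array([5000, 1200, 300, 40])
w = class_weights(counts)

def weighted_ce(probs, labels, w):
    """Per-pixel cross-entropy scaled by the class weight of the true label."""
    return -(w[labels] * np.log(probs[np.arange(len(labels)), labels])).mean()
```

In an adaptive variant, `counts` (or a running per-class error) would be re-estimated as training proceeds, so the weighting tracks which structures the model currently under-segments.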
Species complexes are groups of closely related populations exchanging genes through dispersal. We study the dynamics of the structure of species complexes in a class of metapopulation models where demes can exchange genetic material through migration and diverge through the accumulation of new mutations. Importantly, we model the ecological feedback of differentiation on gene flow by assuming that the success of migration decreases with genetic distance, according to a function $h$. We investigate the effects of metapopulation size on the coherence of species structures, depending on mathematical characteristics of the feedback function $h$. Our results suggest that with larger metapopulation sizes, species form increasingly coherent, transitive, and uniform entities. We conclude that the initiation of speciation events in large species requires the existence of idiosyncratic geographic or selective restrictions on gene flow.
People make strategic decisions many times a day - during negotiations, when coordinating actions with others, or when choosing partners for cooperation. The resulting dynamics can be studied with learning theory and evolutionary game theory. These frameworks explore how people adapt their decisions over time, in light of how effective their strategies have been. The outcomes of such learning processes depend on how sensitive individuals are to the performance of their strategies. When they are more sensitive, they systematically favor strategies they deem more successful. When they are less sensitive, their learning process is noisier and more erratic. Traditionally, most models treat this sensitivity as a fixed parameter - like the "selection strength" parameter in evolutionary models. Instead, we study how strategies and sensitivities co-evolve. We find that the co-evolutionary endpoints depend on both the type of strategic interaction and the learning rule employed. In prisoner's dilemmas, sensitivities often increase indefinitely. In snowdrift and stag-hunt games, by contrast, sensitivities often converge to a finite value, or the population undergoes evolutionary branching. These results shed light on how evolution might shape learning mechanisms for social behavior. They suggest that noisy learning need not be a by-product of cognitive constraints. Instead, it can serve as a means to gain strategic advantages.
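The role of the sensitivity parameter can be illustrated with the standard Fermi (pairwise-comparison) update rule, where the probability of switching to an alternative strategy depends on the payoff difference scaled by a sensitivity beta; the payoff values below are arbitrary.

```python
import numpy as np

def switch_prob(payoff_diff, beta):
    """Fermi rule: probability of adopting the alternative strategy given the
    payoff difference (alternative minus current) and sensitivity beta."""
    return 1.0 / (1.0 + np.exp(-beta * payoff_diff))

diff = 0.2                                  # alternative earns slightly more
low = switch_prob(diff, 0.5)                # low sensitivity: near-random choice
high = switch_prob(diff, 50.0)              # high sensitivity: near-deterministic
```

At beta near zero the learner switches almost at random regardless of payoffs (noisy learning); as beta grows, even tiny payoff advantages are followed deterministically, which is the regime the co-evolutionary dynamics select for in the prisoner's dilemma.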
Boolean networks are a widely used modeling framework in systems biology for studying gene regulation, signal transduction, and cellular decision-making. Empirical studies indicate that biological Boolean networks exhibit a high degree of canalization, a structural property of Boolean update rules that stabilizes dynamics and constrains state transitions. Despite its central role, existing software packages provide limited support for the systematic generation of Boolean functions and networks with prescribed canalization properties. We present BoolForge, a Python toolbox for the random generation and analysis of Boolean functions and networks, with a particular focus on canalization. BoolForge enables users to (i) generate random Boolean functions with specified canalizing depth, layer structure, and related constraints; (ii) construct Boolean networks with tunable topological and functional properties; and (iii) analyze structural and dynamical features including canalization measures, robustness, modularity, and attractor structure. By enabling controlled generation alongside analysis, BoolForge facilitates ensemble-based investigations of structure-dynamics relationships, benchmarking of theoretical predictions, and construction of biologically informed null models for Boolean network studies. Availability and Implementation: BoolForge is implemented in Python ($\geq$3.10) and can be installed via \texttt{pip install boolforge}. Source code and documentation are available at this https URL. A PDF tutorial compendium is provided as Supplementary Material.
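The core notion of canalization can be checked directly on a truth table: a function is canalizing if some input variable, fixed to some value, forces the output. The generic check below illustrates the concept only; BoolForge's own API and conventions may differ.

```python
import itertools

def is_canalizing(f, n):
    """True if some input variable, fixed to 0 or 1, forces the output of the
    n-input Boolean function f (given as a truth table in lexicographic order)."""
    for i in range(n):
        for a in (0, 1):
            outputs = {f[idx]
                       for idx, x in enumerate(itertools.product((0, 1), repeat=n))
                       if x[i] == a}
            if len(outputs) == 1:           # output is constant on this slice
                return True
    return False

f_and = [0, 0, 0, 1]    # AND: fixing x1 = 0 forces output 0, so canalizing
f_xor = [0, 1, 1, 0]    # XOR: no single fixed input determines the output
```

Canalizing *depth* extends this idea recursively: after peeling off a canalizing variable, one asks whether the remaining subfunction is again canalizing, which is the layered structure BoolForge generates with prescribed parameters.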
We present this http URL, an SE(3)-equivariant flow-matching model for pocket-aware 3D ligand generation with joint potency and binding affinity prediction and confidence estimation. The model supports de novo generation, interaction- and pharmacophore-conditional sampling, fragment elaboration and replacement, and multi-endpoint affinity prediction (pIC50, pKi, pKd, pEC50). Training combines large-scale ligand libraries with mixed-fidelity protein-ligand complexes, refined on curated co-crystal datasets and adapted to project-specific data through parameter-efficient finetuning. The base this http URL model achieves state-of-the-art performance in unconditional 3D molecule and pocket-conditional ligand generation. On HiQBind, the pre-trained and finetuned model demonstrates highly accurate affinity predictions, and outperforms recent state-of-the-art methods such as Boltz-2 on the FEP+/OpenFE benchmark with substantial speed advantages. However, we show that addressing unseen structure-activity landscapes requires domain adaptation; parameter-efficient LoRA finetuning yields marked improvements on diverse proprietary datasets and PDE10A. Joint generation and affinity prediction enable inference-time scaling through importance sampling, steering design toward higher-affinity compounds. Case studies validate this: selective CK2$\alpha$ ligand generation against CLK3 shows significant correlation between predicted and quantum-mechanical binding energies. Scaffold elaboration on ER$\alpha$, TYK2, and BACE1 demonstrates strong agreement between predicted affinities and QM calculations while confirming geometric fidelity. By integrating structure-aware generation, affinity estimation, property-guided sampling, and efficient domain adaptation, this http URL provides a comprehensive foundation for structure-based drug design from hit identification through lead optimization.
The NBSS (normalized biomass size spectrum) is a common, intuitive approach for the study of natural ecosystems. However, very few studies have been dedicated to verifying possible flaws and paradoxes in this widely used method. Two evident points of concern are (1) the loss of variability due to binning and (2) the use of non-biomass units (such as abundance units) on biomass spectra. The main objectives of this study were to verify, test, and analyze the transformations that lead to the NBSS plot, and to check the correctness of currently used units, testing the hypothesis that the NBSS indeed represents biomass rather than abundance or biomass flux (dB/dM). To this end, we developed (i) a new conceptual framework, (ii) new terminology, (iii) a novel back-transformation method, (iv) high-resolution kernel density estimation (KDE) plots of the density distribution shape, and (v) a new calculation method for numerical values, dimensions, and units. Extensive tests with in situ and synthetic (simulated) data were used to compare the original biomass distributions with binned outputs. Original biomass units and dimensions are retained in the proposed robust 'bootstrapped, backtransformed, and normalized biomass spectrum' (bNBS). The combination of quantitative binning and non-parametric KDE addresses the need for intuitive, high-resolution, simple plotting methods and the importance of avoiding binning artifacts and oversimplifications. If a standardized binning vector and standardized units are used, the proposed bNBS may enable a robust size-spectrum science that allows quantitative inter-comparisons of biomass across regions and time periods.
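The basic NBSS construction being scrutinized can be sketched in a few lines: bin individual biomass into logarithmic size classes, then normalize each bin's total biomass by its linear bin width. The synthetic heavy-tailed masses below are illustrative; the bootstrapping and back-transformation steps of the proposed bNBS are omitted.

```python
import numpy as np

rng = np.random.default_rng(4)
masses = rng.pareto(1.0, 20000) + 1.0     # synthetic individual body masses, heavy-tailed

edges = 2.0 ** np.arange(0, 11)           # log2 size-class boundaries: 1, 2, 4, ..., 1024
biomass, _ = np.histogram(masses, bins=edges, weights=masses)  # total biomass per bin
widths = np.diff(edges)                   # linear bin widths double each octave
nbss = biomass / widths                   # normalized biomass size spectrum
```

Because the normalization divides by a width, `nbss` carries units of biomass per unit mass, not plain biomass; this unit shift under binning is exactly the kind of transformation artifact the study's back-transformation is designed to track.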
Fossil DNA preservation varies with depositional environments and diagenesis, producing fragments of heterogeneous origins and degradation states. We use first-principles biomolecular analysis to classify fossil molecular environments into four system types, distinguished by three orthogonal indicators: origin (H/h: host/heterologous), deamination status (D/d), and similarity ratio (S/s). Conventional aDNA pipelines assume a binary mix of endogenous host DNA and modern contaminants, overlooking multisource complexity from multiple species and time-averaged deposits. This leads to bias: authentic signals suppressed during enrichment, alignment, or damage filtering, and exogenous/ancient admixed fragments misassigned as endogenous, particularly in open systems. We introduce the HSF (Host/Species-specific Fragment) posterior traceability framework to address this. It treats fragments as primary units, maximizes source diversity, detects isolated sequences, defers lineage assignment to preserve uncertainty, and applies phylogenetic consistency to discriminate origins. Combined with preservation characterization (e.g., 3D imaging and volumetric openness assessment), it improves authenticity evaluation and reduces misassignment in mixed-signal samples. Case studies identify novel fossil DNA patterns (CRSRR and SRRA) and demonstrate superior performance compared with conventional methods. The HSF framework enhances aDNA reliability, extends molecular archaeology to challenging contexts, and aids genome evolution and lineage reconstruction.
Background: Autism spectrum disorder (ASD) is characterized by significant clinical and biological heterogeneity. Conventional group-mean analyses of eye movements often mask individual atypicalities, potentially overlooking critical pathological signatures. This study aimed to identify idiosyncratic oculomotor patterns in ASD using an "outlier analysis" of smooth pursuit eye movement (SPEM). Methods: We recorded SPEM during a slow Lissajous pursuit task in 18 adults with ASD and 39 typically developed (TD) individuals. To quantify individual deviations, we derived an "outlier score" based on the Mahalanobis distance. This score was calculated from a feature vector, optimized via Principal Component Analysis (PCA), comprising the temporal lag ($\Delta$t) and the spatial deviation ($\Delta$s). An outlier was statistically defined as a score exceeding $\sqrt{10}$ (approximately 3.16$\sigma$) relative to the TD normative distribution. Results: While the TD group exhibited a low outlier rate of 5.1%, the ASD group demonstrated a significantly higher prevalence of 38.9% (7/18) (binomial P = 0.0034). Furthermore, the mean outlier score was significantly elevated in the ASD group (3.00 $\pm$ 2.62) compared to the TD group (1.52 $\pm$ 0.80; P = 0.002). Notably, these extreme deviations were captured even when conventional mean-based comparisons showed limited sensitivity. Conclusions: Our outlier analysis successfully visualized the high degree of idiosyncratic atypicality in ASD oculomotor control. By shifting the focus from group averages to individual deviations, this approach provides a sensitive metric for capturing the inherent heterogeneity of ASD, offering a potential baseline for identifying clinical subtypes.
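The outlier score can be sketched directly: compute the Mahalanobis distance of an individual's 2D feature vector (temporal lag, spatial deviation) from the TD normative distribution and compare it with the $\sqrt{10}$ threshold. The simulated TD features and test points below are illustrative values, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(5)
# Simulated TD normative sample: (temporal lag, spatial deviation), toy scales.
td = rng.normal([0.15, 1.0], [0.03, 0.2], size=(39, 2))
mu = td.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(td, rowvar=False))

def outlier_score(x):
    """Mahalanobis distance of feature vector x from the TD distribution."""
    d = x - mu
    return np.sqrt(d @ cov_inv @ d)

threshold = np.sqrt(10)                       # ~3.16 sigma, as in the study
typical = outlier_score(np.array([0.16, 1.05]))   # near the TD mean
extreme = outlier_score(np.array([0.40, 2.50]))   # far outside the TD cloud
```

Unlike a per-feature z-score, the Mahalanobis distance accounts for the covariance between the lag and deviation features, so a participant who is moderately atypical on both correlated axes can still be flagged.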
How do humans move? Advances in reinforcement learning (RL) have produced impressive results in capturing human motion using physics-based humanoid control. However, torque-controlled humanoids fail to model key aspects of human motor control, such as biomechanical joint constraints and nonlinear, overactuated musculotendon control. We present KINESIS, a model-free motion imitation framework that tackles these challenges. KINESIS is trained on 1.8 hours of locomotion data and achieves strong motion imitation performance on unseen trajectories. Through a negative mining approach, KINESIS learns robust locomotion priors that we leverage to deploy the policy on several downstream tasks, such as text-to-control, target point reaching, and football penalty kicks. Importantly, KINESIS learns to generate muscle activity patterns that correlate well with human EMG activity. We show that these results scale seamlessly across biomechanical model complexity, demonstrating control of up to 290 muscles. Overall, its physiological plausibility makes KINESIS a promising model for tackling challenging problems in human motor control. Code, videos and benchmarks are available at this https URL.
Determining the binding pose of a ligand to a protein, known as molecular docking, is a fundamental task in drug discovery. Generative approaches promise faster, improved, and more diverse pose sampling than physics-based methods, but are often hindered by chemically implausible outputs, poor generalisability, and high computational cost. To address these challenges, we introduce a novel fragmentation scheme, leveraging inductive biases from structural chemistry, to decompose ligands into rigid-body fragments. Building on this decomposition, we present SigmaDock, an SE(3) Riemannian diffusion model that generates poses by learning to reassemble these rigid bodies within the binding pocket. By operating at the level of fragments in SE(3), SigmaDock exploits well-established geometric priors while avoiding overly complex diffusion processes and unstable training dynamics. Experimentally, we show SigmaDock achieves state-of-the-art performance, reaching Top-1 success rates (RMSD < 2 Å and PB-valid) above 79.9% on the PoseBusters set, compared to 12.7-30.8% reported by recent deep learning approaches, whilst demonstrating consistent generalisation to unseen proteins. SigmaDock is the first deep learning approach to surpass classical physics-based docking under the PB train-test split, marking a significant leap forward in the reliability and feasibility of deep learning for molecular modelling.
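The appeal of the rigid-body parameterisation can be illustrated with a minimal sketch: an SE(3) action (rotation R plus translation t) repositions a fragment without changing its internal geometry, so bond lengths and angles within the fragment stay chemically valid by construction. The fragment coordinates and transform below are toy values.

```python
import numpy as np

def apply_se3(coords, R, t):
    """Apply a rigid-body (SE(3)) transform to an (N, 3) coordinate array."""
    return coords @ R.T + t

theta = np.pi / 3                                 # rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([4.0, -2.0, 1.0])

frag = np.array([[0.0, 0.0, 0.0],                 # toy three-atom fragment
                 [1.5, 0.0, 0.0],
                 [1.5, 1.5, 0.0]])
moved = apply_se3(frag, R, t)                     # repositioned, geometry intact
```

A diffusion model over fragment poses therefore only needs to learn the per-fragment rotations and translations, leaving intra-fragment chemistry fixed.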
We present ContagionRL, a Gymnasium-compatible reinforcement learning platform specifically designed for systematic reward engineering in spatial epidemic simulations. Unlike traditional agent-based models that rely on fixed behavioral rules, our platform enables rigorous evaluation of how reward function design affects learned survival strategies across diverse epidemic scenarios. ContagionRL integrates a spatial SIRS+D epidemiological model with configurable environmental parameters, allowing researchers to stress-test reward functions under varying conditions, including limited observability, different movement patterns, and heterogeneous population dynamics. We evaluate five distinct reward designs, ranging from sparse survival bonuses to a novel potential field approach, across multiple RL algorithms (PPO, SAC, A2C). Through systematic ablation studies, we identify that directional guidance and explicit adherence incentives are critical components for robust policy learning. Our comprehensive evaluation across varying infection rates, grid sizes, visibility constraints, and movement patterns reveals that reward function choice dramatically impacts agent behavior and survival outcomes. Agents trained with our potential field reward consistently achieve superior performance, learning maximal adherence to non-pharmaceutical interventions while developing sophisticated spatial avoidance strategies. The platform's modular design enables systematic exploration of reward-behavior relationships, addressing a knowledge gap in models of this type, where reward engineering has received limited attention. ContagionRL is an effective platform for studying adaptive behavioral responses in epidemic contexts and highlights the importance of reward design, information structure, and environmental predictability in learning. Our code is publicly available at this https URL.
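A potential-field shaping reward can be sketched as follows: infected agents act as repulsive sources, so the reward grows (becomes less negative) as the agent moves away from them. This is an illustrative reading of the idea with made-up positions, not the platform's exact reward formula.

```python
import numpy as np

def potential_reward(agent_pos, infected_pos, eps=1e-6):
    """Negative summed inverse distance to infected agents; higher is safer.
    eps avoids division by zero when an infected agent is co-located."""
    d = np.linalg.norm(infected_pos - agent_pos, axis=1)
    return -np.sum(1.0 / (d + eps))

infected = np.array([[2.0, 3.0], [8.0, 1.0]])     # hypothetical infected positions
near = potential_reward(np.array([2.5, 3.0]), infected)
far = potential_reward(np.array([15.0, 15.0]), infected)
```

Unlike a sparse survival bonus, this signal is dense and directional: its gradient points away from infection clusters at every step, which is consistent with the ablation finding that directional guidance matters for robust policy learning.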