New articles on Quantitative Biology


[1] 2605.26183

What Molecular Structure Cannot Tell Us: A Taxonomy of Explainability Gaps in GNN-Based Drug Toxicity Prediction

Graph Neural Networks (GNNs) have emerged as a structurally natural approach for molecular toxicity prediction, operating directly on atomic connectivity without the information loss inherent to fixed-length fingerprints. However, the fraction of a drug's known pharmacological profile that is actually encodable in its molecular structure remains systematically underexplored. This study addresses this question through a systematic case study using acetylsalicylic acid (ASA, Aspirin) - one of the most comprehensively characterized drugs in pharmacology - as a model compound. A Message Passing Neural Network (MPNN) is trained on the Tox21 benchmark and GNNExplainer is applied to characterize atom-level attribution. Results indicate that molecular structure explains approximately 45% (5/11) of known ASA adverse effects. A four-category Gap Taxonomy (GAP-1 through GAP-4) is introduced distinguishing between principally non-encodable effects, data gaps arising from Missing Not At Random (MNAR) mechanisms, assay panel mismatches, and representation errors. The MNAR gap is empirically quantified via a systematic ChEMBL query (42 documented assays, 0 retrievable bioactivity entries). An attention pooling experiment localizes the representation error to the MPNN message passing layers rather than the aggregation step. The Gap Taxonomy has direct implications for drug safety signal detection workflows and regulatory frameworks including Good Pharmacovigilance Practice (GVP) guidelines and New Approach Methodologies (NAMs).


[2] 2605.26411

Fixation location in structured populations

In stochastic evolutionary dynamics, the replacement of an existing genotype or cultural trait by a newly introduced mutant is typically characterized by the quantities of fixation probability and fixation time. But in a structured population, the disappearance of a lineage occurs at a specific place. For evolutionary dynamics on graphs, we define the fixation location as the node occupied by the last wild-type individual immediately before mutant fixation. Conditional on fixation, this location is described by a probability distribution over the nodes of the graph. We study the fixation location for neutral evolution, for the colonization process, and, more generally, for constant selection on small graphs, cycles, tori, random graphs, and island populations. We find that the distribution of the fixation location is often highly nonuniform, depends strongly on the graph structure and the selection strength, and can differ sharply even when classical fixation statistics are similar. For many graphs, some nodes can never be fixation locations. Our results identify fixation location as a fundamental aspect of evolutionary dynamics and suggest new ways to understand, monitor, and potentially mitigate extinction events in biological and social settings.
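The definition of fixation location can be illustrated with a small simulation. The sketch below assumes a birth-death Moran process with fitness-proportional choice of the reproducing node and uniform choice of the neighbour to replace; the abstract does not state the update rule, so this rule, the function names, and the adjacency-list input are illustrative assumptions rather than the authors' setup.

```python
import random
from collections import Counter

def fixation_location(adj, start, fitness=1.0, rng=random):
    """Run one birth-death Moran process on a graph (assumed update rule).

    adj[i] lists the neighbours of node i. Returns the node that held the
    last wild-type individual if the mutant fixes, or None if it goes extinct.
    """
    n = len(adj)
    mutant = [False] * n
    mutant[start] = True
    count = 1
    last_wild = None
    while 0 < count < n:
        # Pick the reproducing node with probability proportional to fitness.
        weights = [fitness if m else 1.0 for m in mutant]
        parent = rng.choices(range(n), weights=weights)[0]
        # The offspring replaces a uniformly chosen neighbour.
        child = rng.choice(adj[parent])
        if mutant[parent] and not mutant[child]:
            if count == n - 1:
                last_wild = child  # this step removes the final wild type
            mutant[child] = True
            count += 1
        elif not mutant[parent] and mutant[child]:
            mutant[child] = False
            count -= 1
    return last_wild if count == n else None

def fixation_location_distribution(adj, start, fitness=1.0, trials=2000, seed=0):
    """Estimate the distribution of the fixation location, conditional on fixation."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        loc = fixation_location(adj, start, fitness, rng)
        if loc is not None:
            counts[loc] += 1
    total = sum(counts.values())
    return {node: c / total for node, c in sorted(counts.items())} if total else {}

# Example: a cycle of four nodes with the mutant introduced at node 0.
cycle4 = [[1, 3], [0, 2], [1, 3], [0, 2]]
dist = fixation_location_distribution(cycle4, start=0)
```

Runs in which the mutant goes extinct are discarded, which matches the "conditional on fixation" wording of the abstract.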


[3] 2605.26551

Random neural networks match observed dimensionality of neural population recordings and motivate stronger experimental tests

Randomly connected neural networks have long served as a theoretical tool for studying collective dynamics in neural populations, yet quantitative comparisons to experiments remain limited. Recent technological advances have made it possible to resolve population-wide correlations across neurons, and minimal models such as random neural networks predict their generic structure. Whether the two agree quantitatively remains untested. In this work, we examine whether a minimally structured random neural network can account for the low dimensionality of activity in neural population recordings by building on recent developments in Dynamical Mean-Field Theory and incorporating two additional experimentally relevant features into the model: finite measurement time and variability across behavioral contexts. We show that, when these factors are included, the dimensionality measured from large-scale recordings is consistent with the values predicted by random models. However, current recording durations make it difficult to use dimensionality to discriminate among connectivity structures. We further show that analytically predicted dimensionality varies non-monotonically with external input strength, and that the orientation similarity between neural manifolds recorded under different behavioral contexts can be more sensitive to network structure than dimensionality is. Together, these results provide quantitative guidance for experimental design to infer the connectivity structure underlying population activity.
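One common way to quantify the dimensionality of population activity is the participation ratio of the covariance spectrum, $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$. The abstract does not say which measure is used, so the sketch below is an assumption for illustration only; it uses the identity that this ratio equals $\mathrm{tr}(C)^2 / \mathrm{tr}(C^2)$, which avoids an eigendecomposition.

```python
def participation_ratio(activity):
    """Participation ratio of the sample covariance (assumed dimensionality measure).

    activity: list of T samples, each a list of N neuron values.
    """
    T = len(activity)
    N = len(activity[0])
    means = [sum(row[j] for row in activity) / T for j in range(N)]
    centered = [[row[j] - means[j] for j in range(N)] for row in activity]
    # Unbiased sample covariance matrix C (N x N).
    cov = [[sum(r[i] * r[j] for r in centered) / (T - 1) for j in range(N)]
           for i in range(N)]
    trace = sum(cov[i][i] for i in range(N))
    # For a symmetric matrix, tr(C^2) is the sum of squared entries.
    trace_sq = sum(cov[i][j] ** 2 for i in range(N) for j in range(N))
    return trace ** 2 / trace_sq

# Two perfectly correlated neurons span one dimension.
correlated = [[0, 0], [1, 1], [2, 2]]
# Two uncorrelated, equal-variance neurons span two.
independent = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
```

With few samples the sample covariance is itself noisy, so an estimate like this depends on the recording length; that is the kind of finite-measurement-time effect the abstract refers to.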


[4] 2605.26852

Recognizing Level-k-Based Phylogenetic Networks is NP-Complete

Phylogenetic networks generalize phylogenetic trees by representing reticulate evolution. Tree-based networks and their support trees have been extensively studied, but not all networks are tree-based. To measure how far such networks are from being tree-based, Suzuki and Hayamizu (2025) formulated the problem of finding the support network with minimum level of a given rooted almost-binary phylogenetic network. They conjectured that this problem is NP-hard and provided exponential-time algorithms. In this paper, we prove this conjecture by showing that, for every fixed integer $k \geq 1$, it is NP-complete to decide whether the minimum level is at most $k$.


[5] 2605.26856

The Sensation Modulating Network: Haltability as the architectural ground for object-directed phenomenology

Cognitive science remains split between cognitivism - which accounts for recursion and language but cannot ground formal symbols in meaning - and 4E approaches - which ground cognition in the body but rarely specify the body's architecture in enough detail to support generativity. We argue the impasse stems from an incomplete account of the embodied agent's architecture, and propose one: the Sensation Modulating Network (SMN), the cognitive agent conceived as the whole body, organized at every anatomical scale by opponent dynamics, built from Sensation Modulators that sense and act through one substrate, paired into Coordinated Action Zones routed by a body-wide broadcast network. Three commitments give the SMN its purchase. Haltability - the recruitment of antagonistic affordance into co-activated equilibrium - provides the architectural locus that object-directed phenomenology, in Husserl's sense, requires: opponency enables co-activation, co-activation enables halt, halt enables attention, attention enables intentional directedness, with no module added on top. The dual-signal property of self-modulatable action patterns (SMAPs) makes the self/world distinction a structural feature of the wiring rather than a category the agent applies. And a four-level action-pattern hierarchy - Basal, Haltable, Negotiable, Transactional - gives a single trajectory from autonomic regularity to public conventionalization, locating the conditions for grammar-grounded generativity as architectural transitions. The SMN reconciles the cognitivism-4E debate: recursion lives in the modifiable dynamics of Negotiable Action Patterns, embodiment in the opponent substrate that supports them. A tentative formalism and eight predicted registers (seven testable, one hypothetical), with reference simulations, are given in an appendix.


[6] 2605.26904

SpCAST: Decoding spatial transcriptomics at single-cell resolution with fast and interpretable analysis

Single-cell-resolution spatial transcriptomics profiles gene expression at cellular locations in native tissues, yet accurate cell-type annotation remains challenging: imaging-based platforms are constrained by targeted gene panels, whereas sequencing-based platforms often suffer from sparse molecular capture and dropout. Reliable transfer of cell-type labels from single-cell RNA sequencing references is therefore critical for interpreting targeted and sparse spatial datasets. Here, we present SpCAST, a Kolmogorov-Arnold network-based framework for reference-guided spatial transcriptomics analysis. SpCAST captures nonlinear mappings between reference and spatial expression profiles and uses feature attribution to prioritize genes supporting cell-type predictions. Within a unified framework, SpCAST performs cell-type label transfer, spatial gene-expression reconstruction and marker-gene candidate prioritization. We benchmarked SpCAST on 53 datasets comprising 413,376 spatial cells across five technologies and diverse tissue contexts. SpCAST achieved competitive annotation performance with reduced runtime relative to representative existing methods. Case studies demonstrated that SpCAST supports cross-species label transfer and candidate assignment of originally unlabeled cells. It also reconstructs marker-gene expression patterns with improved spatial concordance and prioritizes cell-type-associated marker genes. Together, these results support SpCAST as an efficient and interpretable framework for extracting cell-type and gene-level information from targeted and sparse single-cell-resolution spatial transcriptomics data.


[7] 2605.27100

Logistic dynamics of small populations with demographic stochasticity

We study an ecology-inspired model for a population of bounded size, whose dynamics is governed by random birth, death, and immigration events. Stochastic fluctuations in the number of individuals give rise to a succession of alternating active and vacant periods, where the population is respectively extant and extinct. Using both analytical and numerical techniques, we characterize the statistics of the two kinds of period, quantifying their duration and frequency, and the typical population sizes in active periods. In sharp contrast to the deterministic mean-field behavior, governed by logistic dynamics, active periods may exhibit pronounced bimodality: either short durations with very small populations, or much longer durations with population sizes close to the maximum. We also investigate how these results change when the population evolves on random networks of three classes: Erdős-Rényi, regular, and geographic. The main effect of the network structure is to induce population clustering, with individuals aggregated into localized groups. This, in turn, limits population growth and increases the frequency of vacant periods.
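The alternation of active and vacant periods can be sketched with a Gillespie simulation. The abstract does not give the transition rates, so the ones below (logistic birth `birth * n * (1 - n / N)`, immigration into free capacity `immig * (1 - n / N)`, linear death `death * n`) are assumptions chosen so that the mean-field limit is logistic; the parameter names are hypothetical.

```python
import random

def simulate_periods(N, birth, death, immig, t_max, seed=0):
    """Gillespie simulation of a bounded birth-death-immigration process.

    Returns the durations of completed active (n > 0) and vacant (n == 0)
    periods observed up to time t_max. All rates are illustrative assumptions.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    period_start = 0.0
    active, vacant = [], []
    while True:
        free = 1 - n / N
        up = birth * n * free + immig * free   # birth or immigration
        down = death * n                       # death
        total = up + down
        if total == 0:
            break  # no event can occur (e.g. empty with no immigration)
        dt = rng.expovariate(total)
        if t + dt > t_max:
            break
        t += dt
        if rng.random() * total < up:
            if n == 0:  # immigration ends a vacant period
                vacant.append(t - period_start)
                period_start = t
            n += 1
        else:
            n -= 1
            if n == 0:  # extinction ends an active period
                active.append(t - period_start)
                period_start = t
    return active, vacant

active, vacant = simulate_periods(N=10, birth=2.0, death=1.0, immig=0.5, t_max=200.0, seed=1)
```

A histogram of `active` for different parameter choices is one way to look for the short-versus-long bimodality described above.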


[8] 2605.26192

Co-folding model guided by structural proteomics

Protein structure generative models excel at predicting single protein static structures from sequence, but routinely fail to capture the correct conformational state of protein complexes, critical for protein design and induced proximity modalities such as antibodies and PROTACs. While structural proteomics techniques like Cross-Linking Mass Spectrometry (XL-MS) and Hydrogen-Deuterium Exchange (HDX-MS) offer valuable spatial and dynamic insights, integrating these sparse, heterogeneous measurements into these models remains an open challenge. Here, we bridge this gap by combining structural proteomics data with the rich biophysical priors learned by pretrained diffusion models. We introduce AIMS-Fold, an inference-time guided-diffusion framework that actively steers the generative sampling trajectory using differentiable physical potentials derived from XL-MS spatial restraints and HDX-MS solvent accessibility profiles. We demonstrate that these structural methods individually enhance predictive accuracy, and their integration yields synergistic improvement. Crucially, by leveraging these experimental restraints, AIMS-Fold achieves higher accuracy on challenging induced proximity targets than purely computational, unguided state-of-the-art models like Boltz-2. This establishes our framework as a powerful, integrative computational approach for the structure-based drug design of induced proximity drugs. Evaluation code will be made publicly available upon publication.


[9] 2605.26690

Self-Improvement Imitation with Biologically Guided Search for Protein Design Under Oracle Budgets

Protein sequence optimization under tight oracle budgets requires methods that explore vast combinatorial spaces while making each evaluation informative. Existing reinforcement learning and off-policy generative approaches often degrade under surrogate noise, and position-agnostic mutation proposals risk disrupting functionally critical residues. We introduce SILO, a trajectory-level self-improvement imitation framework for oracle-budgeted protein design. SILO uses a hierarchical edit policy that decomposes each mutation into a position choice followed by a residue choice. In each active-learning round, the policy samples candidate trajectories via incremental stochastic beam search without replacement (SBS), and a UCB-based proxy ensemble, combined with an alanine-scan fitness score (AFS), selects candidates with functionally relevant edits for in silico oracle evaluation. The policy is then updated by next-action cross-entropy imitation on the round's best oracle-labeled trajectories, avoiding value-function estimation. Across eight reproduced protein fitness landscapes and five strong baselines from prior work, SILO achieves the highest maximum and top-100 mean fitness on 8 of 8 landscapes within our evaluations, often exhibiting faster early-stage improvement. In low-data and noisy-proxy stress tests on two landscapes per setting, SILO remains competitive or best when several baselines degrade. Ablations show that SBS with AFS account for much of the gains, with iterative imitation providing additional improvement. Code is available at: this https URL
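The UCB-based selection step can be illustrated in a few lines. The sketch below scores each candidate by the ensemble mean plus `beta` times the ensemble standard deviation, which is a standard upper-confidence-bound form; the abstract does not give SILO's exact scoring rule or how it is combined with the alanine-scan fitness score, so the function names, `beta`, and the omission of AFS are assumptions.

```python
from statistics import mean, pstdev

def ucb_scores(ensemble_predictions, beta=1.0):
    """Score candidates by mean + beta * std across proxy models (assumed form).

    ensemble_predictions maps a candidate identifier to the list of
    predictions made by each member of the proxy ensemble.
    """
    return {cand: mean(preds) + beta * pstdev(preds)
            for cand, preds in ensemble_predictions.items()}

def select_top(ensemble_predictions, k, beta=1.0):
    """Return the k candidates with the highest UCB score."""
    scores = ucb_scores(ensemble_predictions, beta)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical candidates: B has the same mean as A but higher disagreement,
# so the exploration bonus ranks it first.
predictions = {"A": [1.0, 1.0, 1.0], "B": [0.0, 1.0, 2.0], "C": [0.5, 0.5, 0.5]}
chosen = select_top(predictions, k=2)
```

The bonus term favours candidates on which the proxies disagree, so that scarce oracle calls go to sequences where an evaluation is most informative.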


[10] 2605.26758

Biophoton Emission from Palm during Meditation: A Multi-Method Complexity Analysis

Biophotons are ultra-weak photon emissions in the visible spectrum produced by living organisms. While extensively studied in plants, germinating seeds, and cell cultures, no systematic multi-method complexity analysis of human ultraweak photon emission (UPE) under physiological modulation has been reported. We address this gap by applying a comprehensive analytical framework to UPE measurements from the right palm of a human subject. Three independent sessions were conducted on different days, each comprising four consecutive 15-minute phases: Dark reference, pre-meditation resting state (Pre), structured meditation based on the Sama Vritti box-breathing protocol, and post-meditation recovery (Post). Photon count series are analysed with four complementary methods: distributional statistics (Fano factor, skewness, tail Expected Shortfall); multiscale Fano factor and Allan deviation; stripe-filtered Diffusion Entropy Analysis (DEA); and Rényi entropy with a Time Reversal test. The methods show complementary sensitivities, converging on a coherent picture: a systematic reduction of emission intermittency during meditation, consistently detected across all three sessions. Stripe-filtered DEA places the emission in the non-ergodic renewal regime with a Pre-to-Meditation decrease of the scaling exponent. Rényi analysis reveals two effects: reduced marginal amplitude burstiness (Tdir) and increased sequential pattern structure (Tseq), interpreted as entrainment to the Sama Vritti rhythm. These findings are consistent with cardiac complexity transitions during meditation reported by Tuladhar et al. and with EEG reorganization during Sama Vritti breathing by Zaccaro et al., suggesting a coordinated multi-channel physiological response. The results establish a proof-of-concept framework for complexity analysis of human UPE under physiological modulation.
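Of the methods listed, the Fano factor is the simplest to state: it is the variance of the photon counts divided by their mean, and it is close to 1 for a Poisson process. The sketch below computes it for a count series and for the same series re-binned into longer windows. The use of population variance and non-overlapping windows is an assumption, since the abstract does not specify the estimator.

```python
from statistics import mean, pvariance

def fano_factor(counts):
    """Variance-to-mean ratio of a photon count series."""
    return pvariance(counts) / mean(counts)

def multiscale_fano(counts, window_sizes):
    """Fano factor after summing counts in non-overlapping windows of each size."""
    result = {}
    for w in window_sizes:
        binned = [sum(counts[i:i + w]) for i in range(0, len(counts) - w + 1, w)]
        result[w] = fano_factor(binned)
    return result

# Toy series: strongly alternating at the finest scale, regular once binned in pairs.
series = [0, 4, 0, 4]
by_scale = multiscale_fano(series, [1, 2])
```

A Fano factor that changes with window size indicates that the burstiness of the counts is scale-dependent, which is what a multiscale analysis is meant to expose.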


[11] 2605.26921

Revealing the core dimensions underlying representations in brains, behavior and AI

The study of representations is widespread across fields, including neuroscience, psychology, and artificial intelligence. While representations are often studied and compared through similarities between stimuli, current methods provide only limited access to the dimensions that shape these representations and are often limited in interpretability. To overcome these challenges, here we introduce Similarity-Based Representation Factorization (SRF), a general computational method for recovering low-dimensional, non-negative, interpretable embeddings from similarity matrices derived from measured data. Across simulations and many neural, behavioral, and computational datasets, SRF recovers interpretable dimensions from diverse forms of representational data, even for very sparsely sampled, incomplete data. The dimensions derived from these datasets match those obtained by task-specific models, predict independent behavioral properties, improve exploratory analysis, and offer higher power for confirmatory hypothesis testing than comparing similarity matrices. Together, these results establish SRF as a general-purpose method with broad applications for uncovering, understanding, and leveraging the dimensions underlying representations.


[12] 2605.26973

Signal-to-Noise Ratio and Sample Size Govern Representational Alignment in Neural Networks

Neural networks are known to develop latent representations that are aligned, namely structurally similar across networks trained with different architectures, training protocols, or training datasets. We study this phenomenon in a controlled setting, where we train an ensemble of networks on regression and classification tasks using training sets perturbed by independent realizations of a noise process. We show that the signal-to-noise ratio (SNR) and the training sample size influence the alignment in qualitatively similar ways in networks trained on real-world datasets and in an extremely simple linear network with a single hidden layer, for which the alignment can be estimated analytically. Across linear and nonlinear networks, regression and classification tasks, and both synthetic and real-world data, we consistently observe that alignment varies monotonically with SNR but non-monotonically with training sample size. In particular, the alignment is minimized near the interpolation threshold, and a stronger alignment does not necessarily correspond to better generalization error. These findings reveal a non-trivial dependence of alignment on data quality and quantity, decoupled from generalization performance.


[13] 2605.26998

Probabilistic Recurrent Intention Switching Model

Inverse reinforcement learning (IRL) recovers reward functions from observed behavior, yet traditional methods assume a single stationary reward that cannot capture goal switching within an episode. Recent multi-intention IRL methods address this by segmenting trajectories, but model intention transitions as either a memoryless Markov chain or via manual state augmentation with a fixed history window. We propose the Probabilistic Recurrent Intention Switching Model (PRISM), which replaces both mechanisms with a lightweight recurrent network that maps observation history to a per-step intention distribution. We prove that the resulting EM objective decomposes exactly into independent per-intention reward subproblems, each solvable in closed form, yielding an $\mathcal{O}(nK)$ E-step with no variational approximation. We evaluate PRISM on a non-Markovian gridworld, a mouse labyrinth, and BridgeData V2 robotic manipulation, the first large-scale robotic application of multi-intention IRL. Across all settings PRISM achieves the highest held-out log-likelihood while recovering nameable, temporally coherent intentions from unlabeled demonstrations, suggesting that discrete goal switching is present in both biological and artificial agents.


[14] 2605.27189

Beyond Binary: Speech Representations Across the Cognitive Score Hierarchy

This study examines the relationship between speech representations and the hierarchical structure of cognitive assessment in mild cognitive impairment. Utilizing 5,754 German neuropsychological assessment recordings, we evaluate six cognitive tasks across three score levels: task, domain, and global levels. We compare hand-crafted acoustic features with self-supervised learning (SSL) embeddings. Results show that although SSL representations generally outperform hand-crafted features at lower levels, this trend reverses for MCI classification. Furthermore, task-specific constraints influence performance: tasks with greater response freedom exhibit performance dilution as hierarchical levels increase, suggesting "specialist" representations, whereas the performance of highly structured tasks increases toward higher levels, suggesting "generalist" representations. These findings show links between task constraints and assessment hierarchy in automated clinical speech analysis.


[15] 2502.19063

Global population crisis scenarios predicted by a general nonlinear dynamical model

We show that a simple nonlinear differential equation (originally studied in the physics of disordered systems) is able to mathematically describe the global population growth over the past 12000 years. Different regimes of population growth since the early Neolithic until today are shown to be all solutions to the same nonlinear differential equation in its various limits. These also include the well-known Malthus (exponential) and Verhulst (logistic) growth regimes, as well as von Foerster's "doomsday" formula. All these limits correspond to neglecting higher-order terms in a more general nonlinear dynamic model described by the proposed nonlinear differential equation. While the older models may provide valid fits over limited time intervals of the global population growth curve, their clearly approximate nature prevents them from being predictive over longer periods of time. The comprehensive solution of the proposed model is instead well suited to provide predictions for future scenarios. These include a scenario where the global population could halve as early as 2064 under a deliberately conservative, worst-case assumption that carrying-capacity constraints become abruptly active today.
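For reference, the three classical regimes named in the abstract have the following textbook forms. These are the standard equations, not the general nonlinear equation proposed in the paper, which the abstract does not state; the symbols $r$, $K$, $C$, $N_0$ and $t_c$ are notation chosen here.

```latex
\begin{align}
\frac{dN}{dt} &= rN
  && \Rightarrow\; N(t) = N_0 e^{rt}
  && \text{(Malthus)} \\
\frac{dN}{dt} &= rN\left(1 - \frac{N}{K}\right)
  && \Rightarrow\; N(t) = \frac{K}{1 + \frac{K - N_0}{N_0}\, e^{-rt}}
  && \text{(Verhulst)} \\
\frac{dN}{dt} &= \frac{N^2}{C}
  && \Rightarrow\; N(t) = \frac{C}{t_c - t}
  && \text{(hyperbolic, von Foerster type)}
\end{align}
```

The hyperbolic solution diverges at the finite time $t_c$, which is the origin of the "doomsday" label; von Foerster's original fit used an exponent close to, but not exactly, 1.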


[16] 2512.08355

Joint economic and epidemiological modelling of alternative pandemic response strategies

In an emerging pandemic, policymakers need to make important decisions with limited information, for example choosing between a mitigation, suppression or elimination strategy. These strategies may require trade-offs to be made between the health impact of the pandemic and the economic costs of the interventions introduced in response. Mathematical models are a useful tool that can help understand the consequences of alternative policy options on the future dynamics and impact of the epidemic. Most models have focused on direct health impacts, neglecting the economic costs of control measures. Here, we introduce a model framework that captures both health and economic costs. We use this framework to compare the expected aggregate costs of mitigation, suppression and elimination strategies, across a range of different epidemiological and economic parameters. We find that for diseases with low severity, mitigation tends to be the most cost-effective option. For more severe diseases, suppression tends to be most cost effective if the basic reproduction number $R_0$ is relatively low, while elimination tends to be more cost-effective if $R_0$ is high. We use the example of New Zealand's elimination response to the Covid-19 pandemic in 2020 to anchor our framework to a real-world case study. We find that parameter estimates for Covid-19 in New Zealand put it close to or above the threshold at which elimination becomes more cost-effective than mitigation. We conclude that our proposed framework holds promise as a decision-support tool for future pandemic threats, although further work is needed to account for population heterogeneity and other factors relevant to decision-making.


[17] 2512.15534

Characterizing Open-Ended Evolution Through Undecidability Mechanisms in Random Boolean Networks

Discrete dynamical models underpin systems biology, but we still lack substrate-agnostic diagnostics for identifying finite-horizon dynamical signatures that may be relevant to open-ended evolution (OEE), such as the recurrent production of novel phenotypic states rather than rapid settling or unstructured noise. We introduce a simple, model-independent metric, $\Omega$, that summarizes the residence-time-weighted contribution of attractor cycle lengths across the sequence of recurrent episodes realized within a finite observation window. $\Omega$ is zero for single-attractor dynamics and also vanishes for pure novelty without recurrence, while increasing when trajectories repeatedly enter multiple persistent cyclic phenotypes. Using Random Boolean Networks (RBNs) as a controlled testbed, we compare classical Boolean dynamics with biologically motivated non-classical mechanisms (probabilistic context switching, annealed rule mutation, paraconsistent logic, modal necessary/possible gating, and quantum-inspired superposition/paired-state coupling) under homogeneous and heterogeneous updating schemes. Our results support the view that undecidability-adjacent, state-dependent mechanisms -- implemented as probabilistic context switching, modal necessity/possibility gating, paraconsistent logic, or quantum-inspired correlated branching -- are enabling conditions for sustained novelty. At the end of our manuscript we outline a practical extension of $\Omega$ to continuous/hybrid state spaces, positioning $\Omega$ as a portable proxy for OEE in biological modeling and a guide for engineering evolvable synthetic circuits.


[18] 2601.20670

Noise-induced excitability: bloom, bust and extirpation in autotoxic population dynamics

Species populations often modify their environment as they grow. When environmental feedback operates more slowly than population growth, the system can undergo boom-bust dynamics, where the population overshoots its carrying capacity and subsequently collapses. In extreme cases, this collapse leads to total extinction. While deterministic models typically fail to capture these finite-time extinction events, we propose a stochastic framework, derived from an individual-based model, to describe boom-bust-extirpation dynamics. We identify a noise-driven, threshold-like behavior where, depending on initial conditions, the population either undergoes a "boom" or is extirpated before the expansion occurs. Furthermore, we characterize a transition between an excitable regime, where most trajectories are captured by the absorbing state immediately after the first bust, and a persistent regime, where most populations reach a metastable state. We show that this transition is governed by the noise strength and the ratio of environmental-to-population timescales. This framework provides a theoretical basis for understanding irreversible transitions in invasive species, plant succession, microbial dynamics, and the elimination of cancerous tumors.


[19] 2605.00025

MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis

Speech neuroprosthesis systems decode intended speech from neural activity in the absence of audible output, offering a path to restoring communication for individuals with speech-impairing conditions. Current approaches decode predominantly from motor cortical areas, discarding others -- such as area 44, part of Broca's area -- that may encode complementary linguistic information. We introduce MoDAl (Modality Decorrelation and Alignment), a framework that discovers complementary neural modalities through the interplay of two objectives in a shared projection space. A contrastive loss aligns each of several parallel brain encoders with the text embeddings of a pretrained large language model (LLM), while a decorrelation loss prevents the encoders from coalescing to duplicative representations. We prove that these objectives are in productive tension: Contrastive alignment induces transitive modality coalescence, which decorrelation must counteract for the framework to discover diverse neurolinguistic modalities. On the Brain-to-Text Benchmark '24, MoDAl reduces word error rate (WER) from 26.3% to 21.6% compared to the previous best end-to-end method, with the gain from incorporating previously discarded area 44 signals arising entirely from the decorrelation mechanism. Analysis of the discovered modalities reveals functional specialization: Encoders receiving area 44 input capture structural and syntactic properties (sentence length, grammatical voice, wh-words), consistent with the neurolinguistic understanding of Broca's area.


[20] 2605.22133

Atom-level Protein Representation Learning Improves Protein Structure Prediction

Recent advances in generative modeling show that pretrained representations can improve generation as conditioning features or alignment targets. Motivated by this, we study protein representations for predicting structures beyond conventional function annotation. We propose TriProRep, a structure-aware pretraining method that jointly models three aligned residue-level views: amino-acid identity, backbone geometry, and local full-atom geometry, discretely encoded via VQ-VAE tokenizers. By pretraining to recover original tokens from generator-corrupted views, TriProRep learns to distinguish plausible but incorrect cross-view augmentations from the original protein. We further introduce RepSP, a benchmark for evaluating protein representations in structure-predictive settings. RepSP tests three uses of representations: homodimer co-folding from apo-chain representations, residue-level prediction of homodimer-derived interaction properties, and representation-aligned monomer structure prediction. Across these tasks, TriProRep improves over sequence-only and prior structure-aware representation models, while maintaining competitive performance on conventional benchmarks.


[21] 2605.23111

Contextual Role Modulates Object Representational Geometry in the Human Brain

The human brain represents objects in a way that is both invariant across instances and flexible enough to support different contexts and tasks. Yet it remains unknown how object representations are dynamically remapped as the same object shifts across contextual roles. Here we combined fMRI with naturalistic movie viewing to investigate how the same objects are represented when they are passive elements in the scene versus the targets of goal-directed actions. When objects were action targets, they engaged a parietal action network centered in the supramarginal and postcentral gyri, while passive objects recruited a distributed occipito-temporal network involved in visual object recognition. Within the networks most strongly encoding objects in their respective contexts, representational geometry showed a double dissociation: target object representations were organized by action affordance and hand posture affordance dimensions, while passive object representations aligned with semantic dimensions. In addition, visual representational structure was invariant to context. Outside those context-specific brain networks, representational content retained context-invariance, indicating that flexibility and invariance operate at different levels of the same representational system. Together, these findings demonstrate neural remapping of object representational geometries in a manner that depends on moment-to-moment changes in the contextual relevance of objects within a naturalistic scene.


[22] 2503.21450

CMADiff: Cross-Modal Aligned Diffusion for Controllable Protein Generation

AI-assisted protein design has emerged as a critical tool for advancing biotechnology, as deep generative models have demonstrated their reliability in this domain. However, most existing models primarily utilize protein sequence or structural data for training, neglecting the physicochemical properties of proteins. Moreover, they offer limited control over protein generation under intuitive conditions. To address these limitations, we propose CMADiff, a novel framework that enables controllable protein generation by aligning the physicochemical properties of protein sequences with text-based descriptions through a latent diffusion process. Specifically, CMADiff employs a Conditional Variational Autoencoder (CVAE) to integrate physicochemical features as conditional input, forming a robust latent space that captures biological traits. In this latent space, we apply a conditional diffusion process, which is guided by BioAligner, a contrastive learning-based module that aligns text descriptions with protein features, enabling text-driven control over protein sequence generation. Validated by a series of evaluations including AlphaFold3, the experimental results indicate that CMADiff outperforms protein sequence generation benchmarks and holds strong potential for future applications. The implementation and code are available at this https URL.


[23] 2507.07295

Local imperfect feedback control in non-equilibrium biophysical systems enabled by thermodynamic constraints

How biological networks achieve robust control despite relying on imperfect, local information remains an important open question. Here, we identify thermodynamic constraints that can curtail non-equilibrium steady-state responses so severely that even crude, local feedback rules can achieve globally stable control without requiring precise network design or global information. Specifically, using Markov jump processes as a general framework for biophysical dynamics, we derive general non-equilibrium response constraints showing that for many classes of rate perturbations, steady-state responses have fixed signs across all driving strengths, so that near-equilibrium responses predict far-from-equilibrium behavior regardless of system complexity. These constraints clarify several biological phenomena: monotonicity is thermodynamically guaranteed whenever a perturbation acts on a single transition rate, and non-monotonic responses, as observed for example in transcription factor regulation, arise only when an input simultaneously modulates multiple rates. Even in this case, we identify a graph-theoretic concept termed "coherence" that allows for a restoration of monotonicity. We show how coherence naturally and generally emerges in classic biophysical models of adaptation, including E. coli chemotaxis, and transcription factor regulation when biological constraints on network parameterization are included. We next show that, within a control-theoretic framework, these constraints guarantee that simple linear feedback on small subsets of kinetic rates achieves globally stable tracking and adaptation without coordinated manipulation of many variables. For systems with one regulator, local stability implies global stability for arbitrary network topologies without fine tuning, revealing that non-equilibrium thermodynamics fundamentally constrains biochemical network responses.
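
The single-rate monotonicity statement above can be checked numerically on a toy example. The following is a minimal sketch, not the authors' derivation: it assumes a hypothetical three-state Markov jump process with arbitrarily chosen rates, sweeps one transition rate (0 -> 1) over four decades, and checks that each steady-state probability changes with a fixed sign. The names `make_rates`, `steady_state`, and `response_signs` are illustrative only.

```python
import numpy as np


def make_rates(k):
    """Hypothetical 3-state rate matrix; rates[i, j] is the jump rate i -> j.

    Only the 0 -> 1 rate depends on the perturbation parameter k.
    """
    return np.array([
        [0.0, k,   1.0],
        [2.0, 0.0, 0.5],
        [0.3, 1.5, 0.0],
    ])


def steady_state(rates):
    """Solve the master equation dp/dt = W @ p = 0 with sum(p) = 1."""
    n = rates.shape[0]
    W = rates.T.copy()                 # W[j, i] = rate from state i to state j
    np.fill_diagonal(W, 0.0)
    W -= np.diag(W.sum(axis=0))        # diagonal = minus total exit rate
    A = W.copy()
    A[-1, :] = 1.0                     # replace one balance row by normalisation
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def response_signs(k_values=np.logspace(-2, 2, 41)):
    """Return (is_monotone, signs), where signs[i] is the direction in which
    the steady-state probability of state i moves as k increases."""
    ps = np.array([steady_state(make_rates(k)) for k in k_values])
    signs = np.sign(np.diff(ps, axis=0)).astype(int)
    is_monotone = bool((signs == signs[0]).all())
    return is_monotone, signs[0]


if __name__ == "__main__":
    monotone, signs = response_signs()
    print("monotone in the single rate:", monotone)
    print("response signs per state:", signs)
```

For this toy network the fixed sign also follows from the matrix-tree theorem: each steady-state probability is a ratio of two functions that are affine in any single rate, and such a ratio is monotone in that rate.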


[24] 2507.13762

MolPIF: A Parameter Interpolation Flow Model for Molecule Generation

Motivation: Structure-based drug design (SBDD) has advanced with deep generative models, but bridging the gap between continuous atomic coordinates and discrete atom types remains a challenge. Current approaches, such as diffusion and flow matching models, often fail to unify these heterogeneous modalities, relying on separate strategies or ill-fitting Euclidean metrics for discrete variables. This lack of a consistent framework limits generative models' ability to capture the geometric and chemical structure of protein-ligand complexes. Results: We present MolPIF, a parameter interpolation flow mechanism designed to unify the generation of continuous and discrete molecular variables. Unlike traditional flow models that operate in sample space, MolPIF interpolates between distributions in the parameter space, theoretically recovering Wasserstein-2 optimal transport for continuous coordinates and establishing Fisher-Rao geodesics for discrete atom types. We further incorporate a geometry-enhanced learning strategy to improve the capture of atomic contexts. Extensive evaluations on the CrossDocked2020 dataset demonstrate that MolPIF outperforms baselines in binding affinity, chemical validity, geometric fidelity and chemical space coverage. Additionally, MolPIF exhibits versatility in lead optimization and offers flexible prior distribution selection (such as Laplace), establishing a robust paradigm for SBDD. Availability: Source code is freely available at this https URL. Supplementary information: Supplementary data are available at Bioinformatics.


[25] 2512.05794

Mechanistic Interpretability of Antibody Language Models Using SAEs

Sparse autoencoders (SAEs) are a mechanistic interpretability technique that has been used to provide insight into learned concepts within large protein language models. Here, we employ TopK and Ordered SAEs to investigate autoregressive antibody language models and to steer their generation. We show that TopK SAEs can reveal biologically meaningful latent features, but high feature-concept correlation does not guarantee causal control over generation. In contrast, Ordered SAEs impose a hierarchical structure that reliably identifies steerable features, but at the expense of more complex and less interpretable activation patterns. These findings advance the mechanistic interpretability of domain-specific protein language models and suggest that, while TopK SAEs suffice for mapping latent features to concepts, Ordered SAEs are preferable when precise generative steering is required.
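
For readers unfamiliar with the TopK variant mentioned above, a minimal sketch of its forward pass follows. It assumes the commonly used formulation (keep the k largest encoder pre-activations per input and zero the rest) and uses random, untrained weights. The abstract does not specify the architecture, so the function name `topk_sae_forward`, the parameter names, and all dimensions are hypothetical, and the authors' implementation may differ.

```python
import numpy as np


def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Forward pass of a TopK sparse autoencoder (illustrative sketch).

    x:     (batch, d_model) activations from the model under study
    W_enc: (d_model, d_latent), b_enc: (d_latent,)
    W_dec: (d_latent, d_model), b_dec: (d_model,)
    Returns the sparse latent code z and the reconstruction x_hat.
    """
    pre = (x - b_dec) @ W_enc + b_enc
    # Indices of the k largest pre-activations in each row.
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    z = np.maximum(z, 0.0)             # keep latents non-negative
    x_hat = z @ W_dec + b_dec
    return z, x_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_latent, k = 8, 32, 5    # hypothetical sizes
    x = rng.normal(size=(4, d_model))
    W_enc = rng.normal(size=(d_model, d_latent))
    W_dec = rng.normal(size=(d_latent, d_model))
    z, x_hat = topk_sae_forward(
        x, W_enc, np.zeros(d_latent), W_dec, np.zeros(d_model), k
    )
    print("active latents per input:", (z != 0).sum(axis=1))
```

Steering in this setting typically means editing selected entries of `z` before decoding. The abstract's caveat is that a latent correlating strongly with a concept need not give causal control when edited this way.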


[26] 2603.16801

TAMP-OS: An Open-Source Workflow for Tactile 3D-Printable Lithographs

Describe an animal without using the verb "look". Can you effectively provide an alternative method for interpreting complex microscopy images while preserving the length scale? The world is filled with features too small for our eyes to see: the setae on a gecko's feet, the cuticles covering a rat's whisker, or the fuzziness of a bat's wing. Furthermore, these structures are non-homogeneous, often shifting from stiff to soft. We provide a workflow for producing low-data, low-cost, and open-source lithograph files, allowing tactile accessibility in microscopy images. The lithographs made with this workflow can be printed on a 350 USD 3D printer using 3D files under 100 Mb, for a total cost per print of 0.75 USD. This work seeks to leverage advanced 3D printing to create tactile graphics and art that make science more accessible and enable tactile exploration of biological structures. The framework described in this text is accompanied by a GitHub repository that will be continually updated, allowing tactile media to be created as 3D printing and lithography become more streamlined in the years to come.