New articles on Quantitative Biology


[1] 2606.05189

Bio-plausible Neuromorphic Disturbance Observer Based on Emulation Theory: Extended Version

Biological neural systems achieve remarkable robustness and adaptability in uncertain environments through sparse, event-driven spike-based information processing and adaptive regulation. Inspired by this paradigm, this paper develops a neuromorphic disturbance observer (NDO) and control framework that replaces conventional continuous-time signal representations with spike-timing encoding. Both disturbance estimates and control inputs are constructed via integrate-and-fire (IF) neuron dynamics from discrete spike events, yielding intrinsically event-driven updates. An adaptive-threshold triggering mechanism is inspired by spike-frequency adaptation (SFA), enabling history-dependent regulation of spike generation. Simulation results demonstrate that the proposed framework achieves neurally inspired robustness and adaptability, while the adaptive-threshold spiking scheme reduces spike events to 42.6% of the fixed-threshold case under noisy conditions.
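As a rough illustration of the adaptive-threshold idea, the sketch below simulates a generic integrate-and-fire unit whose threshold jumps after each spike and relaxes back to a baseline. It is not the paper's observer or controller: the function name `simulate_if`, the reset-to-zero rule, and all parameter values (`theta0`, `adapt_step`, `tau`) are assumptions chosen only for illustration.

```python
def simulate_if(inputs, dt=0.01, theta0=0.1, adapt_step=0.0, tau=1.0):
    """Generic integrate-and-fire unit with an optional adaptive threshold.

    Returns the step indices at which a spike was emitted.
    adapt_step=0.0 keeps the threshold fixed at theta0 (hypothetical defaults).
    """
    v = 0.0          # membrane potential (integrated input)
    theta = theta0   # current firing threshold
    spikes = []
    for k, u in enumerate(inputs):
        v += u * dt                           # integrate the input
        theta += (theta0 - theta) * dt / tau  # threshold relaxes to baseline
        if v >= theta:
            spikes.append(k)
            v = 0.0                           # reset after a spike
            theta += adapt_step               # spiking raises the threshold
    return spikes


drive = [1.0] * 1000                          # constant input, illustrative only
fixed = simulate_if(drive)                    # fixed threshold
adaptive = simulate_if(drive, adapt_step=0.05)  # history-dependent threshold
print(len(fixed), len(adaptive))
```

Under this toy setup the adaptive unit fires less often than the fixed-threshold one for the same drive, which is the qualitative effect the abstract reports; the 42.6% figure comes from the paper's own simulations and is not reproduced here.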


[2] 2606.05196

Uniform sampling of canalizing Boolean functions reveals hidden biases in Boolean network analysis

Boolean networks are widely used to model gene regulatory systems, where ensembles of Boolean functions serve as null models for assessing structural and dynamical properties. A common approach generates canalizing and nested canalizing functions by sampling their defining parameters uniformly at random. However, because multiple parameterizations can represent the same Boolean function, this induces a non-uniform distribution over distinct functions and systematically biases random ensembles. Here, we develop efficient algorithms for uniform sampling of Boolean functions with prescribed exact or minimal canalizing depth that correct this bias. Our approach combines dynamic programming for sampling canalizing layer structures with rejection-based methods and is implemented in BoolForge. We show that the sampling scheme substantially affects commonly studied function-level metrics. Under traditional parameter-uniform sampling, the expected average sensitivity of nested canalizing functions equals one independent of the number of variables. In contrast, under function-uniform sampling, the expected sensitivity increases with system size and numerically approaches approximately 1.183. This discrepancy arises from an exponential suppression of high-sensitivity functions under parameter-based sampling. These differences propagate to Boolean network models, affecting conclusions about robustness, stability, attractor structure, and baseline dynamical expectations. Revisiting 122 published Boolean gene regulatory network models, we show that function-uniform null models reveal a substantially stronger enrichment of low-sensitivity canalizing architectures than previously inferred. Our results demonstrate that widely used null models systematically underestimate baseline sensitivity and can therefore distort assessments of the stabilizing role of canalization in biological networks.
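For readers unfamiliar with the metric, the following minimal sketch computes the average sensitivity of a Boolean function by brute force. It assumes the unnormalised definition (the expected number of single-bit flips that change the output, over uniformly random inputs); the helper name `average_sensitivity` is hypothetical and this is not BoolForge code.

```python
from itertools import product


def average_sensitivity(f, n):
    """Average number of single-bit flips that change f, over all 2**n inputs."""
    total = 0
    for x in product((0, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = list(x)
            y[i] ^= 1                      # flip input i
            total += f(tuple(y)) != fx     # count sensitive positions
    return total / 2 ** n


and2 = lambda x: x[0] & x[1]   # a nested canalizing function
xor2 = lambda x: x[0] ^ x[1]   # not canalizing

print(average_sensitivity(and2, 2))  # 1.0
print(average_sensitivity(xor2, 2))  # 2.0
```

The exhaustive loop costs n * 2**n evaluations, so it is only practical for small n.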


[3] 2606.05198

An accurate nucleic acid-small molecule docking framework via geometric deep learning with large-scale pretraining

Nucleic acids are increasingly recognized as therapeutic targets beyond conventional protein-centered drug discovery, yet accurate and efficient docking of small molecules to nucleic acid structures remains challenging. Physics-based docking methods often show limited accuracy and efficiency, whereas deep learning approaches are constrained by the scarcity of experimentally resolved nucleic acid-ligand complexes. Here, we present NucleoDock, a deep learning framework for nucleic acid-small molecule docking. To address data scarcity, NucleoDock combines physics-guided large-scale pretraining on millions of docking-generated synthetic complexes with fine-tuning on curated experimental co-crystal structures. It further integrates sequence- and structure-informed nucleotide representations with atomistic three-dimensional features to capture both biological context and binding-site geometry. A mixture density network-based geometric scoring head is used to model conditional interaction-distance distributions for pose ranking. On an external benchmark of 125 nucleic acid-ligand complexes, NucleoDock achieved a top-1 success rate of 56 percent at an RMSD cutoff of 2.0 Angstrom, outperforming rDock with 29 percent, while generating 100 poses in approximately 5 seconds per complex. Retrospective virtual screening on the ROBIN benchmark further showed improved early enrichment. NucleoDock represents a step toward bridging the methodological gap between protein- and nucleic acid-directed computational drug discovery.


[4] 2606.05206

Ontology-constrained multi-LLM scoring of hypothesis support in the predictive processing literature

Fragmentation is common in interdisciplinary fields with diverse methods and theoretical commitments. Predictive coding neuroscience is a clear example: its literature spans computational theory, electrophysiology, imaging, behavior, and modeling, creating a synthesis problem that conventional meta-analysis cannot easily resolve. Here, we describe a local multi-LLM pipeline for ontology-constrained literature synthesis. The pipeline reads papers, extracts evidence, incorporates figure descriptions, assembles constrained prompts, and validates outputs against an expert glossary. We manually defined a predictive-coding glossary of thirty-six concepts grouped into three hypotheses: predictive suppression, feedforward error propagation, and ubiquity. A council of ten local language models scored 31 studies according to their agreement or disagreement with each glossary factor across local and global oddball contexts. This enabled pairwise study-agreement analysis, cross-model comparison, and three-dimensional hypothesis-space mapping. Agreement was high for some hypotheses but weaker for others, revealing structured disagreement, particularly across local versus global oddball paradigms. We further define hypothesis-space temperature, a geometric dispersion metric measuring how compactly studies occupy the hypothesis space. Temperature was lower for local oddball contexts and higher for global oddball contexts, indicating greater dispersion in the latter. The scoring geometry also allowed us to estimate vectors of change between experimental contexts. These results demonstrate that local multi-LLM councils can produce auditable disagreement measurements that map heterogeneous literatures into quantitative evidence spaces. This framework may generalize to cross-study hypothesis mapping where conventional meta-analysis lacks a common comparison space.


[5] 2606.05225

The Language of Elution: Autoregressive Prediction of the Next Feature in Untargeted LC-HRMS Lipidomics

Untargeted liquid chromatography-high-resolution mass spectrometry (LC-HRMS) detects thousands of molecular features per sample, yet only 2-20% receive confident structural annotations. A root cause of this "dark metabolome" is that tandem MS/MS acquisition is reactive: instruments select precursors only after ions appear, blind to what elutes next. We reframe chromatographic elution as an autoregressive sequence prediction task. Because reversed-phase elution order is governed by hydrophobicity, successive features form a physically constrained sequence, like tokens in language. We discretize the mass-to-charge (m/z) axis into 110 bins and train long short-term memory (LSTM) and Transformer models to predict the next eluting m/z bin from five annotation-free per-token features: m/z bin, mass defect, retention-time gap, polarity, and intensity rank. Trained on 15,242 features from four clinical lipidomics cohorts (342 plasma samples; SCIEX TripleTOF 6600+, Waters CSH C18), the LSTM reaches 98.4% top-1 accuracy (99.99% top-5; mean absolute error 3.6 Da) and the Transformer 98.0%. Ablation shows autoregressive context accounts for 55.5 percentage points while no single feature contributes more than 0.2 pp: the sequential pattern, not molecular properties, drives prediction. Models transfer across instruments sharing the method (r=0.999 on an independent Agilent 6530 dataset) but fail under a different column chemistry (5.1% top-1) or polarity mode (2.6%), confirming method- and mode-specificity. Fine-tuning on as few as two to five quality-control injections recovers held-out accuracy from 2.6% to nearly 50%, so cross-condition deployment needs minimal calibration. These results establish that elution sequences are highly predictable and lay the groundwork for predictive MS/MS acquisition to improve annotation coverage in untargeted metabolomics.


[6] 2606.05227

Quantifying the biophysical properties of stomatocytes in health and disease

Hereditary stomatocytosis (HS) comprises red blood cell (RBC) disorders characterized by cup-shaped erythrocytes that respond oppositely to splenectomy: curative in overhydrated HS (OHS) but potentially thrombogenic in dehydrated HS (DHS/xerocytosis). This paradox persists because RBC biomechanics is governed by partly independent parameters--shear modulus, bending rigidity, surface-to-volume ratio (S/V), and cytoplasmic viscosity--that existing assays capture only piecemeal. Here we combine dissipative particle dynamics (DPD) simulations with microfluidic imaging to construct a control discocyte and three stomatocyte models (ST-RBC1-3) at fixed membrane area and decreasing volume (109.7, 101.5, 89.8 fL), spanning the OHS-to-DHS range. Tracing this parameter set through five mechanically orthogonal assays, we find that interendothelial-slit (IES) traversal is geometry-dominated: overhydrated ST-RBC1 requires an order of magnitude higher critical pressure than healthy RBCs, whereas dehydrated ST-RBC3 passes freely. ST-RBC3 nonetheless suppresses membrane tank-treading and raises low-shear whole-blood viscosity by ~29% at physiological haematocrit, comparable to Gaucher-disease hyperviscosity. A funnel-obstacle chip amplifies these differences into a label-free centerline-offset signal predicted to separate all four RBC types (~4.5 standard deviations between extreme phenotypes). These results unite single-cell mechanics, splenic filtration, and hemorheology in one framework, resolve the splenectomy paradox, and point toward microfluidic pre-operative risk stratification in HS.


[7] 2606.05474

AlloGen: Conformation-Selective Binder Generation with Differential State Scoring

Protein binder design has largely optimized for affinity alone, leaving conformational selectivity unaddressed: for allosteric targets such as kinases, nuclear receptors, and GPCRs, a binder that engages both active and inactive states provides no functional specificity regardless of how tightly it binds. We introduce AlloGen, a modular framework that decouples backbone generation from a learned state-selectivity scorer $Q_\theta$, an SE(3)-invariant interface graph transformer trained via a two-phase curriculum that first learns interface geometry before imposing conformational discrimination. Because $Q_\theta$ is fully differentiable and generator-agnostic, it integrates with any backbone generator as a passive reranker or an active gradient-based guide without retraining. Across a diverse benchmark of proteins spanning multiple families and conformational mechanisms, AlloGen consistently identifies binders that preferentially recognize desired structural states while rejecting alternative conformations. Experimental validation on calmodulin further demonstrates that these computational selectivity signals translate to physical molecules, yielding de novo peptides that bind the desired holo conformation while exhibiting no detectable binding to the apo state. Together, these results establish conformational selectivity as a learnable property and provide a general framework for state-selective protein binder design.


[8] 2606.05870

Cross-scale spatially-aware generative modeling of transcriptomic programs underlying neurodegenerative brain organization

Neurodegenerative disorders such as Alzheimer's disease exhibit highly organized patterns of regional brain vulnerability, yet the biological mechanisms underlying this spatial selectivity remain incompletely understood. Existing imaging-transcriptomic studies have largely relied on correlation-based analyses between gene expression and neuroimaging phenotypes, limiting their ability to model how molecular organization gives rise to neurodegeneration. Here, we introduce a cross-scale spatially-aware generative framework for modeling transcriptomic programs underlying cortical neurodegeneration. Regional transcriptomic profiles were derived from the Allen Human Brain Atlas using 910 landmark genes across 68 cortical regions. Neurodegenerative vulnerability maps were constructed from ADNI FreeSurfer cortical thickness measurements by computing regional cortical thinning differences between cognitively normal controls (NC = 926) and Alzheimer's disease subjects (AD = 426). A variational generative architecture was used to learn latent biological programs linking regional gene-expression organization to cortical degeneration while incorporating graph-based spatial smoothness regularization to preserve cortical organization. The proposed framework achieved strong prediction of regional neurodegenerative vulnerability, yielding an explained variance of 0.8604 and a significant spatial correlation between predicted and observed cortical degeneration profiles (r = 0.9439, p < 0.001). The learned latent representations revealed structured transcriptomic organization associated with distributed disease susceptibility. These findings demonstrate that biologically constrained generative modeling can bridge microscale molecular organization with macroscale neurodegeneration, providing a foundation for spatially-aware generative neurobiology and computational neuroscience.


[9] 2606.05918

Federated SPARQL querying for genomic variant functional annotation

Sensitive health data should preferentially be analysed on site. In typical bioinformatics workflows, public databases are duplicated and used by specialised tools to enrich the local datasets. In the case of genomic variation data, this process is called variant annotation. In this session we demonstrate variant annotation using federated SPARQL queries. We first overview how clinico-genomic data can be modelled as a knowledge graph (KG), leveraging state-of-the-art biomedical ontologies. We then perform variant annotation by querying UniprotKB, a massive curated KG for genes and proteins. Our approach avoids public data duplication while maintaining genomic data on site and aligning it with FAIR principles. Our use-case is based on the ICAN project, a research program aimed at studying the physiopathology of cerebral berry aneurysms.


[10] 2606.05980

On the Promises and Limits of Multi-omics Integration for Deconvolution: The HADACA3 Benchmark

Understanding the cellular composition of complex tissues, such as tumors, is a key challenge in biology and medicine. A common approach, known as deconvolution, aims to estimate the cellular composition from bulk molecular measurements. With the growing availability of multiple types of molecular data, it is often assumed that combining data sources should improve deconvolution performance. Here, we present HADACA3, a community-driven benchmark designed to evaluate this assumption. We conducted a four-day collaborative competition followed by a large-scale computational benchmark, testing more than 250,000 analysis pipelines across nine datasets with matched DNA methylation (DNAm) and RNA profiles, representing a wide range of biological and experimental conditions. Our framework jointly evaluates the impact of preprocessing, feature selection, modeling, and integration strategies. We find that DNAm alone achieves the highest median performance across datasets, making it the most stable and reliable single-modality approach. However, multi-omics integration strategies can regularly achieve higher top performance in specific datasets and pipeline configurations. Among the tested strategies, late integration based on error-weighted averaging provides a strong and reliable baseline, while non-linear early integration methods, such as optimal transport, show promising results on real biological datasets. Overall, our results show that multi-omics integration does not systematically improve average performance over DNAm alone, but can improve best-case performance in specific settings. This highlights a trade-off between robustness and peak performance, and emphasizes the importance of aligning integration strategies with the statistical properties of the data. All data, code, and evaluation tools are publicly available to support reproducible research and future method development.


[11] 2606.06117

$p$-adic Bi-Filtrations for Topological Machine Learning on Genomic Sequences

We introduce pVR, a topological machine learning framework for alignment-free genomic sequence classification that combines $p$-adic numbers with topological data analysis. Each DNA sequence is encoded along two complementary axes: a $p$-adic distance on $k$-mer prefixes, which captures hierarchical positional structure, and a compositional $L_1$ distance on $k$-mer frequencies, which captures local sequence content. The two distances jointly parameterise a bi-filtered Vietoris--Rips complex, and per-sequence topological summaries from this bi-filtration serve as features for standard machine learning classifiers. We establish theoretical guarantees for the construction: stability under metric perturbations and invariance to the choice of prime, alongside a result that explains why a single $p$-adic axis is topologically uninformative and why the bi-filtration recovers nontrivial homology. On twelve genomic benchmarks ($28$ to $500$ sequences, $3$ to $7$ classes), pVR outperforms four established alignment-free baselines on three of six low-sample datasets, with gains of up to $21$ percentage points; it underperforms only on a SARS-CoV-2 variant benchmark whose point-mutation divergence violates the hierarchical assumption, and all methods saturate in the large-sample regime. pVR also outperforms zero-shot frozen embeddings from the 500M-parameter Nucleotide Transformer v2 by $6.7$ to $11.4$ percentage points on three low-sample benchmarks. The pVR codebase is publicly available at this https URL.
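To make the notion of a $p$-adic distance on prefixes concrete, here is a minimal sketch under stated assumptions: each base is mapped to a nonzero digit, a sequence is read as an integer with its first base as the lowest-order digit in base $p = 5$, and the distance is $p^{-v}$, where $v$ is the $p$-adic valuation of the difference. The digit map, the choice $p = 5$, and the function names are illustrative assumptions, not the pVR implementation.

```python
DIGIT = {"A": 1, "C": 2, "G": 3, "T": 4}  # assumed nonzero digits, all < p


def encode(seq, p=5):
    """Read seq as an integer whose i-th base is the coefficient of p**i."""
    return sum(DIGIT[b] * p ** i for i, b in enumerate(seq))


def p_adic_distance(a, b, p=5):
    """|x - y|_p = p**(-v), with v the largest power of p dividing x - y."""
    diff = abs(encode(a, p) - encode(b, p))
    if diff == 0:
        return 0.0
    v = 0
    while diff % p == 0:
        diff //= p
        v += 1
    return float(p) ** -v


print(p_adic_distance("ACGT", "ACGA"))  # shared prefix of length 3
print(p_adic_distance("ACGT", "TCGT"))  # differ at the first base
```

With this encoding, two sequences that share a longer prefix are closer, and the distance obeys the strong triangle inequality, which gives the hierarchical (tree-like) structure mentioned in the abstract.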


[12] 2606.06290

Early psychosis shows deviations in scaling behaviour within a critical regime

Accumulating evidence suggests that large-scale brain activity exhibits scale-invariant dynamics consistent with operation in a near-critical regime. Such dynamics have been associated with long-range correlations, efficient information processing, and the emergence of collective organization. While altered criticality-related measures have been reported in psychiatric disorders, previous findings remain fragmented across observables and modalities, making it unclear whether different scaling measures capture a common alteration of large-scale brain dynamics. Here, we investigated scaling properties in resting-state fMRI data from individuals with early psychosis and healthy controls. We combined a phenomenological renormalization group (PRG) framework with power spectral density (PSD) and detrended fluctuation analysis (DFA) to characterize collective dynamics across scales. In healthy controls, resting-state activity exhibited non-trivial scaling behavior consistent with critical-like organization. Early psychosis participants showed the same overall phenomenology of scale-invariant organization, but with systematic shifts in scaling exponents across multiple observables. These findings indicate that early psychosis is not characterized by a simple loss of critical-like dynamics, but rather by a reorganization of collective dynamics within a preserved scaling regime. More broadly, our results suggest that combining coarse-graining approaches with temporal scaling analyses provides a principled framework for studying large-scale brain dynamics in psychiatric disorders.


[13] 2606.06424

Intrinsic Computational Functionalism: From Observer-Relative Maps to Observer-Independent Structures

Anti-computational arguments show that externally imposed computational interpretations cannot ground consciousness, but they do not establish that all computational organisations are observer-relative. We develop intrinsic computational functionalism: the view that, if consciousness is computationally constituted, it depends on physically realised computational structures the system has in virtue of itself rather than on labels imposed by an external interpreter. Two criteria operationalise this view. (C1) System-intrinsic instantiation: the relevant property must be specifiable without an observer's labelling, and invariant under structure-preserving relabellings of the system's variables. (C2) Causal-dynamical organisation under intervention: the property must be grounded in a state-space structure whose variables mutually constrain one another, and whose organisation is exhibited in counterfactual response under intervention. Together these criteria specify what any candidate computational account must satisfy to remain observer-independent, without selecting which intrinsic structures bear on experience. The argumentative core is a three-tier decomposition of identification work: interpreter-relative label selection (tier i), theoretically constrained partition selection (tier ii), and dynamics-internal grain selection (tier iii). We argue that any computational property capable of avoiding the observer-relativity objection must be identified, if at all, through tier (iii) dynamics-internal grain selection, conditional on empirically disciplined tier (ii) choices. Syntax-is-not-semantics arguments, mapmaker arguments, and the observer-relativity component of biological-naturalist objections succeed against views that locate the consciousness-relevant property at tier (i); once the tiers are distinguished, intrinsic computational functionalism survives.


[14] 2606.06434

rsx: A high-performance streaming toolkit for RAD-seq sex determination

Restriction site-associated DNA sequencing (RAD-seq) is widely used to discover sex-linked markers in non-model organisms, but large studies produce marker tables with millions of RAD tags. RADSex provides the reference workflow for building marker-by-individual depth tables and testing sex-biased marker distributions, but its depth, merge, and related table-building commands grow memory-hungry, and its standard output reports frequentist calls with no posterior evidence and no direct Python or C integration. We present rsx, a Rust implementation of the complete RADSex command set that preserves marker-table semantics and command-line compatibility. rsx combines 2-bit DNA keys, parallel ingestion, memory-mapped marker tables, external sorting, bitset group counts, and streamed Gram-matrix PCA so that memory stays bounded by the number of individuals or by explicit buffers. It adds conjugate Beta-Binomial Bayes factors and posterior probabilities under XY and ZW hypotheses, returning strict, posterior-supported, and Bayes-factor-only evidence grades. A portable, libm-independent minimax approximation of the error function keeps the chi-squared tail reproducible across platforms without changing the underlying Yates test. On four real RAD-seq datasets comprising 41.9 billion bases and 29 million markers, rsx reproduced published RADSex v1.2.0 calls, achieved an 8.38-fold geometric-mean speedup across 56 paired timings (2.77-fold for FASTQ processing), and recovered every Bonferroni-significant positive-control marker. In Danio albolineatus, treated as null in the source publication, the posterior layer surfaced 30 W-linked marker hypotheses; in Notothenia rossii it withheld 400 Bayes-factor-only rows compatible with a low-prevalence null. Python bindings, a C API, and a reproducibility archive provide the workflows used for all reported numbers. rsx is released under GPL-3.0-or-later.
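The 2-bit DNA key idea can be sketched as follows. rsx itself is written in Rust; this Python snippet only illustrates the general packing technique, and the base-to-code mapping (A=0, C=1, G=2, T=3) and function names are assumptions rather than details taken from the tool.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code per base
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into an integer, two bits per base."""
    key = 0
    for base in seq:
        key = (key << 2) | CODE[base]
    return key


def unpack(key, length):
    """Recover the DNA string of the given length from a packed key."""
    return "".join(
        BASES[(key >> (2 * (length - 1 - i))) & 3] for i in range(length)
    )


key = pack("ACGT")
print(key, unpack(key, 4))  # 27 ACGT
```

Packing four bases per byte shrinks keys to a quarter of their ASCII size and makes equality checks and sorting integer operations, which is one plausible reason such keys help keep memory bounded.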


[15] 2606.05327

Multimarginal flow matching with optimal transport potentials

Flow matching (FM) has emerged as a powerful framework for learning dynamic transport maps between two empirical distributions. However, less explored is the setting with intermediate observed marginals that can help constrain the flows between the endpoints. This "multimarginal" regime is central to modeling temporal evolution in dynamical systems in many scientific domains that can sample sequential distributions. We tackle this problem with a novel approach that leverages the connection between FM and dynamic optimal transport (OT), softly steering the flow towards the intermediate marginals through potential terms in the dynamic OT action. By extending the conditional FM learning target to incorporate these potentials, we derive an efficient, simulation-free algorithm for multimarginal FM that offers considerable flexibility in the spatiotemporal dynamics of the learned flows. We demonstrate state-of-the-art performance and training efficiency of OT-potential FM (OTP-FM) on diverse single-cell RNA sequencing, oceanographic, and meteorological datasets. Our code is available at this https URL.


[16] 2606.05351

Tricriticality and chaos in a generalized Allee-logistic map

We present a novel nonlinear dynamical model, the generalized Allee-logistic (GAL) map given by $x_{t+1} = r x_t (1 - x_t) G(x_t)$ where $G(x_t) = m (x_t - h) + 1 - m$ incorporates the Allee effect with magnitude $m$ and threshold $h$. The case $m = 0$ yields the logistic map with a continuous transition to extinction. Conversely, $m = 1$ recovers a previously studied model that undergoes only a discontinuous extinction-to-active transition. Between these extremes, the GAL map exhibits nontrivial phenomena, including tricriticality with a closed-form expression for the tricritical point and a universal crossover function. Under a small external input, we verify Widom-like relations. We also note that the Allee effect disfavors the onset of chaos. Our work establishes additional bridges between analytically tractable chaotic maps, nonequilibrium tricriticality, and Allee effects.
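The map stated in the abstract is straightforward to iterate numerically. The sketch below implements $x_{t+1} = r x_t (1 - x_t) G(x_t)$ with $G(x_t) = m (x_t - h) + 1 - m$ exactly as written; the function names and the parameter values in the usage lines are illustrative assumptions.

```python
def gal_step(x, r, m, h):
    """One step of the generalized Allee-logistic map from the abstract."""
    g = m * (x - h) + 1.0 - m      # Allee factor G(x)
    return r * x * (1.0 - x) * g


def iterate(x0, r, m, h, n):
    """Apply the map n times starting from x0."""
    x = x0
    for _ in range(n):
        x = gal_step(x, r, m, h)
    return x


# With m = 0 the Allee factor is 1 and the logistic map is recovered;
# for r = 2.5 the orbit settles on the fixed point 1 - 1/r = 0.6.
print(iterate(0.3, r=2.5, m=0.0, h=0.2, n=200))
```

Sweeping `m` between 0 and 1 at fixed `r` and `h` is one simple way to explore the intermediate regime the abstract describes.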


[17] 2606.05541

Methods for Inferring Interaction Potentials from Cross-Linking Mass Spectrometry Data

Cross-linking mass spectrometry (XL-MS) has emerged as a powerful quantitative technique for probing intra-protein structural information as well as protein-protein interactions at an unprecedented scale. XL-MS data yield information on the pairwise spatial proximity of proteins through inter-molecular linkers. However, systematic methods for adapting such data for coarse-grained interacting particle models remain limited. Predominant focus is put on directly fitting radial distribution functions (RDFs), while numerous observables, e.g. coordination numbers, which are functionals of the RDF, cannot be uniquely inverted. In this work, we develop a framework for parameterizing interaction potentials from such observables in potentially phase-separated mixtures, as encountered in XL-MS results. We establish a connection between this problem and the inverse Henderson problem and adapt algorithms such as Iterative Boltzmann Inversion and Iterative Monte Carlo to its numerical solution. We derive exact and low-density limit gradient approximations and propose two new algorithms based on an adaptation of the predictor-corrector framework. In total, we evaluate several optimization algorithms on biologically realistic ten-component test systems. We demonstrate that for homogeneous fluids, all methods achieve exceptional efficiency and accuracy. Critically, we further demonstrate successful parametrization in a challenging three-phase system. Here, three algorithms, namely Adam and gradient descent employing the low-density derivative as well as Newton's method with the exact gradient, reliably recover the correct parameters. These results establish a clear pathway from XL-MS experiments to coarse-grained protein models for systems where phase separation governs biological function, potentially enabling new investigations of biomolecular condensates and protein aggregation.


[18] 2606.06345

Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation

Brain decoding is limited by the availability of labeled neural data, and remains challenging in low-data regimes. To address this issue, we investigate whether and when brain decoding can be boosted by augmenting small fMRI datasets with synthetic data generated by a pretrained model of fMRI responses to stimuli. We use TRIBE v2, a large encoding model pretrained on more than 1000 hours of fMRI responses to video, audio and language. For each dataset, we evaluate systematic grids that show how the performance of image decoders varies with the amount of synthetic data used for training. Our results, based on two datasets (the 7T fMRI Natural Scenes Dataset and 3T fMRI BOLD5000), show up to 68% improvement in Top-10 image-retrieval accuracy compared to decoders trained only on real data. Importantly, the proportion of augmented data required to reach a given image decoding performance needs to be adjusted depending on the data source. Surprisingly, image decoders trained exclusively on synthetic fMRI can perform above chance in some settings, suggesting that TRIBE v2 can support zero-shot brain-to-image decoding. Together, these results show how large-scale models of the fMRI responses to sight, sound and language may provide a foundation to improve the data efficiency for image decoding.


[19] 2506.11152

HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data

Single-cell transcriptomics and proteomics have become a great source for data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and gene expression at the single-cell level. With the advent of spatial-omics data, we have the promise of characterizing cells within their tissue context as it provides both spatial coordinates and intra-cellular transcriptional or protein counts. Proteomics offers a complementary view by directly measuring proteins, which are the primary effectors of cellular function and key therapeutic targets. However, existing models ignore either the spatial information or the complex genetic and proteomic programs within cells. Thus, they cannot infer how cell-internal regulation adapts to microenvironmental cues. Furthermore, these models often utilize fixed gene vocabularies, hindering their generalizability to unseen genes. In this paper, we introduce HEIST, a hierarchical graph transformer foundation model for spatial transcriptomics and proteomics. HEIST models tissues as hierarchical graphs. The higher-level graph is a spatial cell graph, and each cell, in turn, is represented by its lower-level gene co-expression network graph. HEIST achieves this by performing both intra-level and cross-level message passing to utilize the hierarchy in its embeddings and can thus generalize to novel datatypes including spatial proteomics without retraining. HEIST is pretrained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive and masked autoencoding objectives. Unsupervised analysis of HEIST embeddings reveals spatially informed subpopulations missed by prior models. Downstream evaluations demonstrate generalizability to proteomics data and state-of-the-art performance in clinical outcome prediction, cell type annotation, and gene imputation across multiple technologies.


[20] 2512.23661

Dynamical incompatibilities in paced finger tapping experiments

Paced finger-tapping tasks are used to probe the error correction mechanism underlying sensorimotor synchronization. Despite their century-long history, fundamental contradictions persist in the literature. One such contradiction arises when comparing the two most common types of period perturbation: step change and phase shift. The stimulus sequence is exactly the same up to and including the (unexpected) perturbed stimulus. Why then would the timing of the next response be different between perturbation types, as observed? We show, both experimentally and theoretically, that responses to both types of perturbation are dynamically incompatible when recorded in separate experiments; that is, they cannot be described by a single underlying dynamical system due to the build-up of different temporal contexts. In contrast, when both types of perturbation are presented randomly within the same experiment, the responses become compatible and can be explained by a single mechanism. We conclude that a single underlying dynamical system can represent the response to all perturbation types, signs, and sizes, which is nevertheless calibrated by temporal context. Our results challenge the established idea of phase and period correction processes that are separately activated for different perturbation types.


[21] 2604.04285

Amplification at Equilibrium: Structural and Thermodynamic Limitations, and Implementation

Amplifying weak molecular signals is essential in both natural and engineered biochemical systems. While most amplification schemes operate out of equilibrium, relying on kinetic barriers and fuel-driven cascades, it is also possible to amplify at thermodynamic equilibrium by shifting the energy landscape upon addition of an analyte. Equilibrium amplification is appealing because, in principle, it can remain indefinitely in the untriggered state. In this work, we establish fundamental structural and thermodynamic limits on equilibrium-based amplification. We first prove that dimerization networks--systems restricted to complexes of at most two monomers--are inherently incapable of equilibrium amplification. This no-go theorem explains the absence of amplification in prior undercomplementary "strand commutation" designs. We then show that allowing trimeric complexes breaks this barrier. We propose an isometric trimer-based amplifier whose output preserves the size of the input, enabling modular composition, and validate it experimentally, achieving an amplification factor close to the expected $2\times$. Finally, we derive universal thermodynamic bounds applicable to any equilibrium network regardless of complex size: the maximum amplification factor scales linearly with the free energy of interaction between the analyte and the amplifier components. For nucleic acid systems, this implies that the analyte length must grow linearly with the desired amplification factor, and that composing modular amplifiers yields diminishing returns for a fixed analyte. Together, these results delineate the structural and energetic boundaries of equilibrium amplification and rigorously justify the necessity of out-of-equilibrium approaches for achieving high gain.