New articles on Quantitative Biology


[1] 2606.06516

Probabilistic learning to perform pre-onset individualised prediction of disease severity: application to Veno Occlusive Disease

We advance a new probabilistic supervised learning approach that permits reliable, automated, and early individualised prediction of the severity with which a disease will develop in a prospective patient. The prediction capacity is illustrated via the pre-transplant prediction of the severity score of Veno Occlusive Disease (VOD) in the digital twin (DT) of the considered prospective patient, where this score parametrises the severity with which VOD will develop in this patient after they undergo their Bone Marrow Transplant. The relationship between the pre-transplant variables and a severity score variable is learnt by modelling this relationship as a (random) function that is treated as a sample function of an adequately chosen stochastic process. The parameters of this underlying process are learnt using a training dataset that is generated using the real-time evolution of retrospective patients in a cohort, with this training dataset subsequently augmented in size by a probabilistic inverse learning of the score of prospective patients. The augmented training set then permits the learning of the function that enables, at the pre-transplant stage, automated prediction of the VOD severity score that characterises the DT of a physical patient in their unique pre-transplant state. This score is subsequently fed back to the real prospective patient as the severity with which VOD will develop in them after this patient undergoes their transplant. Such a score then permits the treating Haematologist-Oncologists to decide on the treatment regimen, which in this illustration reduces to deciding on treating the patient with Defibrotide. An AI facility is developed to undertake such automated prediction, with the physician inputting the data on the pre-transplant state that characterises the DT of the prospective patient under consideration.


[2] 2606.06537

DSU-Net: An Attention-Enhanced Dense Skip U-Net for Breast Lesion Segmentation in Mammographic Images

Breast cancer remains one of the leading causes of cancer-related mortality among women worldwide, making early detection essential for effective treatment. Mammography is the primary screening modality; however, accurate delineation of suspicious lesions remains challenging and subject to inter-observer variability. Automated segmentation methods can assist radiologists by providing consistent and efficient lesion localization. This study presents DSU-Net, an attention-enhanced Dense Skip U-Net architecture for automated breast lesion segmentation in mammographic images. The proposed framework integrates dense skip connections and attention mechanisms to improve feature propagation, preserve spatial information, and enhance lesion boundary delineation. Experiments were conducted using the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM). To address severe foreground-background imbalance, a composite loss function combining Dice loss, focal loss, and binary cross-entropy loss was employed during training. The proposed model achieved a Dice Similarity Coefficient of 0.9421, an Intersection over Union of 0.8905, an accuracy of 0.9711, and an AUC-ROC of 0.9878 on the validation dataset. Qualitative evaluation demonstrated accurate delineation of lesions with varying sizes and morphologies, while quantitative results confirmed robust discrimination between lesion and background regions. These findings demonstrate that DSU-Net provides accurate and reliable breast lesion segmentation in mammographic images and highlight the potential of attention-guided deep learning for computer-aided breast cancer screening and diagnosis.
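
The composite loss and the overlap metrics named in this abstract can be illustrated with a minimal pure-Python sketch. The equal weighting of the three terms and the focal-loss parameters (gamma = 2, alpha = 0.25) are assumptions chosen for illustration; the abstract does not state the values the authors used, and all function names here are hypothetical.

```python
import math

def dice_coefficient(pred, target, eps=1e-7):
    """Soft Dice overlap between flattened predictions and a binary mask."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2 * inter + eps) / (sum(pred) + sum(target) + eps)

def iou(pred, target, eps=1e-7):
    """Intersection over Union for flattened predictions and a binary mask."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return (inter + eps) / (union + eps)

def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)

def focal_loss(probs, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean focal loss; gamma and alpha are assumed, commonly used defaults."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1 - eps)
        p_t = p if t == 1 else 1 - p          # probability of the true class
        a_t = alpha if t == 1 else 1 - alpha  # class-balancing weight
        total -= a_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

def composite_loss(probs, target, w_dice=1.0, w_focal=1.0, w_bce=1.0):
    """Weighted sum of Dice, focal, and BCE losses (weights are assumptions)."""
    return (w_dice * (1 - dice_coefficient(probs, target))
            + w_focal * focal_loss(probs, target)
            + w_bce * bce_loss(probs, target))
```

For a single pair of binary masks, Dice (D) and IoU (J) are linked by J = D / (2 - D), so the two metrics rank individual predictions identically. Dataset-level averages need not satisfy this identity exactly.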


[3] 2606.06562

Iterative AI-guided optimisation of selective triple-drug combinations for breast cancer

Personalised cancer therapy aims to tailor treatment to individual tumour profiles, yet tumour heterogeneity and adaptive resistance continue to limit clinical efficacy. Drug combinations offer a strategy to overcome resistance by simultaneously targeting multiple pathways, but their rational design is constrained by the vast combinatorial search space and experimental cost. Here, we present an AI-guided, QSAR-driven iterative optimisation framework that integrates machine learning with automated experimental screening to enable closed-loop discovery of selective multi-drug therapies. Starting from an initial random screen, the system iteratively predicts, tests, and refines three-drug combinations targeting MCF7 breast cancer cells. Incorporation of non-tumorigenic MCF10A cells enables explicit optimisation of tumour-selective efficacy, prioritising regimens that maximise cancer cell killing while sparing healthy cells. Across successive iterations, the framework rapidly enriched for highly selective, high-efficacy combinations, while maintaining chemical and mechanistic diversity and avoiding convergence on a narrow solution space. By continuously learning from experimental feedback, the approach efficiently navigates millions of combinations to identify a small set of validated, tumour-selective regimens. These results establish a scalable proof-of-concept for AI-driven, closed-loop optimisation of higher-order drug combinations, demonstrating how iterative integration of computation and experimentation can enable adaptive and potentially personalised therapeutic design in precision oncology.


[4] 2606.06749

Deterministic access to global viral sequence data enables robust agentic scientific discovery

Public viral genome resources such as the National Center for Biotechnology Information (NCBI) Virus database are central to outbreak response, evolutionary analysis, vaccine design, and genomic surveillance. Yet many high-value retrieval workflows remain optimized for interactive use rather than deterministic, reproducible programmatic interfaces. This creates a challenge for Large Language Model (LLM)-based scientific agents, where errors in metadata interpretation, filtering logic, or retrieval can propagate into incorrect datasets. To evaluate agentic viral data retrieval, we built VirBench, a manually curated benchmark of 120 queries spanning diverse pathogens, taxonomic levels, and metadata filters. When autonomous AI systems, including Biomni, Claude, GPT, and Edison Analysis, were tasked with these queries without a dedicated retrieval layer, performance varied widely: mean accuracy ranged from 16.9% for Claude Sonnet 4 to 91.3% for GPT-5.5, with newer frontier models showing progress but residual errors remaining consequential. To address this, we built gget virus, a deterministic query framework that formalizes NCBI Virus-style filtering as a reproducible programmatic system. By staging retrieval, applying metadata constraints before sequence download, and retrieving structured GenBank records, gget virus reduces data transfer by more than 98% for high-volume queries while preserving exact-match semantics. Instructing autonomous AI systems to use gget virus increased accuracy to at least 90.0% across all evaluated systems and up to 99.7% for GPT-5.5, improved response stability to 0.92-1.00, reduced error magnitude, and generally decreased runtime and tool calls. Together, this work establishes deterministic data access as critical infrastructure for reliable agentic science and provides a reproducible retrieval layer for robust human- and AI-driven viral genomics workflows.


[5] 2606.06889

From Genomes to Algorithms: Neural Network Applications for Palimpsest Detection in Medieval Manuscripts

Biocodicology, the study of biological information preserved in manuscripts, offers new opportunities to examine parchment as both a textual and biological artefact. This study applies non-destructive sampling to isolate and sequence mitochondrial genomes (mtGenomes) from a 14th-century manuscript, Ms. Codex 1629, which contains both single-use and palimpsested folios. We sought to evaluate whether palimpsest preparation, including chemical washing, compromised DNA integrity and whether computational methods could aid in identifying reused parchment. DNA sequencing revealed that both single-use and palimpsested parchments retained sufficient mtGenomes for analysis, with no significant differences in genome coverage or depth. To assess the potential of computational biology in manuscript studies, we implemented machine learning classifiers, including logistic regression and neural networks, to distinguish palimpsests from single-use folios. Models achieved high precision but exhibited reduced recall for the minority palimpsest class, reflecting dataset imbalance. While additional ancient mtGenome samples from palimpsests and further testing are required, this study demonstrates how integrating molecular biology and neural networks opens new approaches for palimpsest detection, and it underscores the evolving role of data science in biocodicology.


[6] 2606.07301

Structure-guided taxonomic placement of divergent RNA viruses with ViraClass

Metatranscriptomic sequencing has expanded our knowledge of the RNA virosphere far more rapidly than novel viruses can be taxonomically classified. Taxonomic assignment above the family level is particularly difficult because the RNA-dependent RNA polymerase (RdRp) is often the only gene retained across RNA viruses yet exhibits little sequence similarity among highly divergent viruses. Here we show that RdRp protein structure retains taxonomic signal at evolutionary depths where RdRp primary sequence similarity has largely collapsed, and that the organization of this signal is consistent with the current ICTV hierarchy. Based on this, we developed ViraClass, a hierarchical framework for RNA virus taxonomic placement that uses RdRp structure for rank-by-rank assignment from phylum to genus, stopping at the deepest rank supported by confidence thresholds, and calibrated structural clustering for viruses that remain outside existing reference space. Across random-split, prospective and taxonomic hold-out benchmarks, ViraClass outperforms sequence-based and genome-content baselines. The largest gains emerge at deep evolutionary distances, in benchmarks that withhold entire families, orders or classes from the reference, where sequence-based methods lose most of their signal. In challenging boundary cases such as the Flaviviridae, ViraClass's structure-based placements capture the taxonomic boundary tensions highlighted by recent phylogenetic studies. When applied to a large collection of previously unclassified RdRp sequences, ViraClass places high-confidence queries into existing phyla and organizes the remainder into compact structural groups. ViraClass therefore provides a scalable approach from large-scale virus discovery to hierarchical taxonomic interpretation, particularly at the deep evolutionary ranges that current sequence-based pipelines cannot reach.
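
The rank-by-rank placement with early stopping described in this abstract can be sketched as follows. This is a hypothetical illustration and not the ViraClass implementation: the function name, the input format, the uniform threshold, and the confidence values are all assumptions.

```python
# Ranks ordered from shallowest to deepest, as listed in the abstract
# ("from phylum to genus").
RANKS = ["phylum", "class", "order", "family", "genus"]

def place(predictions, thresholds):
    """Return the taxonomic path down to the deepest rank that passes its threshold.

    predictions: dict mapping rank -> (label, confidence in [0, 1])
    thresholds:  dict mapping rank -> minimum confidence required at that rank
    Both structures are hypothetical; the real tool may represent them differently.
    """
    path = []
    for rank in RANKS:
        if rank not in predictions:
            break
        label, confidence = predictions[rank]
        # Stop at the first rank that is not supported; deeper ranks are
        # not reported even if their individual confidence is high.
        if confidence < thresholds.get(rank, 1.0):
            break
        path.append((rank, label))
    return path

# Illustrative query with made-up confidences.
example = {
    "phylum": ("Kitrinoviricota", 0.98),
    "class": ("Flasuviricetes", 0.91),
    "order": ("Amarillovirales", 0.55),
    "family": ("Flaviviridae", 0.90),
}
uniform = {rank: 0.8 for rank in RANKS}
```

With these assumed inputs, `place(example, uniform)` stops after the class rank because the order-level confidence falls below the threshold. Queries that fail even at the phylum rank return an empty path; in the abstract's terms these would be candidates for the separate structural clustering step.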


[7] 2606.07336

Fixed point compositionality via low-rank gluing rules in inhibition-dominated threshold-linear networks

Brains routinely generate highly flexible and complex behaviors on a relatively stable structure and limited resources. A key mechanism underlying this ability is compositionality, which allows the brain to efficiently decompose complex tasks into simpler, reusable primitives. While network modularity has often been linked to compositionality in biological and artificial networks, a rigorous mathematical characterization of this relationship in nonlinear networks is still lacking. In this work, we formally investigate how structural modularity supports functional compositionality in inhibition-dominated threshold-linear networks (TLNs). We introduce a novel class of modular network assembly called low-rank gluings, where component subnetworks with arbitrary internal connectivity are connected via specific low-rank couplings. We prove that the global fixed points of these networks are constrained to be combinations of the local fixed points of their constituent modules. For a more structured subclass, called rank-1 gluings, we provide a complete characterization that determines which combinations of local fixed points yield global ones. We apply these results to graph-based networks, extending fixed point decomposition rules from combinatorial threshold-linear networks (CTLNs) to the more flexible family of generalized CTLNs (gCTLNs), thereby proving that these structural rules are more robust than initially posited. Finally, we demonstrate that these gluing rules provide a mathematically tractable recipe for engineering compositional dynamics, enabling the construction of networks with a combinatorially large repertoire of predictable attractors that can be understood from simpler component motifs, ranging from compositions of fixed points to compositional limit cycles.
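
For readers unfamiliar with the model class, the sketch below uses the standard threshold-linear dynamics dx/dt = -x + [Wx + b]_+ and the usual CTLN weight rule (W_ij = -1 + eps if j -> i is an edge, -1 - delta otherwise, zero on the diagonal). The parameter values eps = 0.25 and delta = 0.5 are common choices in the CTLN literature and are assumptions here, as are the function names. The code only checks candidate fixed points and does not implement the gluing rules of the paper.

```python
def ctln_weights(n, edges, eps=0.25, delta=0.5):
    """Build a CTLN weight matrix from a directed graph.

    edges is a set of (source, target) pairs; W[i][j] is the weight from j to i.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = -1 + eps if (j, i) in edges else -1 - delta
    return W

def tln_rhs(W, b, x):
    """Right-hand side of dx/dt = -x + [W x + b]_+ ."""
    n = len(x)
    return [
        -x[i] + max(0.0, sum(W[i][j] * x[j] for j in range(n)) + b[i])
        for i in range(n)
    ]

def is_fixed_point(W, b, x, tol=1e-9):
    """True if x is (numerically) a fixed point of the network."""
    return all(abs(v) < tol for v in tln_rhs(W, b, x))

# Two-node examples with uniform input b = 1.
b = [1.0, 1.0]
W_indep = ctln_weights(2, set())              # no edges: mutual strong inhibition
W_clique = ctln_weights(2, {(0, 1), (1, 0)})  # bidirectional edge
```

In the two-node network without edges, each single-node state such as `[1.0, 0.0]` is a fixed point (winner-take-all). In the two-node clique the only fixed point has both nodes active at 1 / (2 - eps).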


[8] 2606.07372

Nullclines, Subnullclines and the Asymptotic and Transient Attractors in Eco-Evolutionary Dynamics

In the demographic framework, the mortality payoff function describes the cost of an interaction and the fertility payoff function describes its reward. Thus, while the mortality cost depends on the opponent's strategy, the fertility reward can be affected by density-dependent juvenile recruitment survival. This motivates an analysis of the eco-evolutionary dynamics of the classical Hawk-Dove game. It is shown that the stable and unstable equilibria (determined by the intersections of the frequency and density nullclines) are connected by heteroclinic orbits, which attract nearby trajectories. The resulting bundle of trajectories reveals the so-called subnullclines (manifolds placed between the frequency and density nullclines) before the trajectories converge to the stable rest point. The initial isolated system is then extended by adding environmental seasonality (periodic background mortality), which acts as an external factor. This leads to complex cycling behavior, and the subnullclines act as barriers to the propagation of the perturbation (resilience/resistance threshold). Thus, in a way, this paper completes and extends previous works on the eco-evolutionary dynamics of games with demographic payoffs.


[9] 2606.06509

Which Anatomy Matters Under Limited Labels? A Data-Efficient Anatomy-Aware Benchmark for Cardiac Pathology Prediction

Numerous medical imaging problems must be solved under limited labels and constrained compute, yet it remains unclear whether performance gains are driven mainly by more expressive models or by better representation of clinically meaningful anatomy. We study this question through a low-data anatomy-aware benchmark for 5-class cardiac pathology prediction on the public ACDC MRI dataset. Using segmentation-derived patient descriptors from the right ventricle, myocardium, and left ventricle, we compare anatomy-specific and multi-structure representations across linear, kernel, and tree-based classifiers. We find that under limited label settings, representation dominates complexity. These results suggest that in resource-constrained healthcare settings, identifying and representing the most informative anatomy may matter more than the increasing complexity of the model alone.


[10] 2606.06647

The Identity Trap in EEG Foundation Models: A Diagnostic Audit

Objective. EEG foundation models (FMs) report strong accuracy on clinical resting-state EEG. However, high accuracy under subject-disjoint cross-validation remains ambiguous: it can reflect a genuine clinical biomarker, or subject-identity features that correlate with the label. We name this the Identity Trap and ask whether it can be diagnosed at the representation level before fine-tuning. Approach. We propose FMScope, a frozen-representation protocol packaging five diagnostics: variance decomposition, subject-axis erasure, aperiodic 1/f ablation, layer-wise label probing, and within-subject direction consistency. We apply it to three pretrained FMs (LaBraM, CBraMod, REVE) across four datasets in a 2x2 layout: subject relation of label x presence of a consensus cross-subject EEG marker. Main results. (i) The Identity Trap is universal: frozen subject-variance is 13-89x a random null in 12/12 pairs, rising in all 12 under fine-tuning (+10 to +63 pp). This dominance is a removable linear axis: erasing it improves label decoding where the label varies within subject (+6 to +12 pp in primary cells; +4 to +27 pp across external cohorts). (ii) Aperiodic 1/f is one subject carrier: removing it drops the subject probe by 9-19 pp on LaBraM and CBraMod. REVE saturates subject identity without measurable aperiodic dependence. (iii) Fine-tuning amplifies label-variance only in cells with a literature-established cross-subject marker. Significance. The Identity Trap is a physically-grounded instance of shortcut learning: the preferred cue has a measurable physiological component, and subject-disjoint splitting alone cannot rule it out. FMScope separates gains reflecting a biological marker from those reflecting subject identity.


[11] 2606.06717

ShallowBench: Benchmarking Generative Drug Design Models on Shallow-Pocket Targets

While generative AI models have demonstrated remarkable success in structure-based drug design, they predominantly rely on deep binding pockets and struggle to sample effective ligands for challenging low-pocketability targets, such as the historically "undruggable" oncology targets KRAS and MYC. To address this gap, we introduce ShallowBench, a strictly curated benchmark of 5,780 shallow-pocket targets extracted from CrossDocked2020. By computing the difference between an Alpha Shape "lid" volume and the underlying protein atom voxel volume, we successfully isolated targets with low concavity while ensuring sufficient surface area for binding. Evaluating various state-of-the-art generative models reveals weaker predicted binding affinity on these low-concavity interfaces. ShallowBench therefore provides a rigorous benchmark for generative biology models and highlights the necessity of new architectural innovations or loss functions capable of navigating these challenging targets.


[12] 2606.06811

Dependencies and Dataflow in Seed-Filter-Extend Pipelines

Comparing genomes is critical for discovering mutations, tracking evolutionary lineages, and advancing cross-species genomics. Fundamentally, this reduces to an O(n^2) string-matching dynamic programming (DP) problem, a challenge that has driven decades of performance research. However, executing a strict O(n^2) DP algorithm is computationally intractable for genomes spanning millions to billions of base pairs. Consequently, modern aligners rely on global heuristics to identify thousands of candidate similarity regions between species. Unfortunately, these methods are burdened by complex serial dependencies. Once candidate regions are identified, the pipeline executes localized DP alignments, which introduce their own non-trivial heuristics and irregular data dependencies. While parallelizing dense, two-dimensional DP is a well-studied problem, accelerating this end-to-end pipeline is significantly more challenging. Parallelizing across candidate regions and offloading irregular, heuristic-laden local alignments to modern hardware (such as GPUs) remains a major hurdle. In this work, we address the challenge of overcoming these serial bottlenecks by optimizing the global pipeline across regions. We take inspiration from four papers: LASTZ, SegAlign, Darwin-WGA, and SNAP, synthesizing findings across each to inform optimizations, which we either prototype or implement directly in LASTZ.
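
The quadratic dynamic program that the abstract treats as the starting point can be sketched with a textbook Smith-Waterman local alignment score. This is not the LASTZ algorithm, which adds seeding, filtering, and heuristic gapped extension; the scoring parameters and the linear gap penalty are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Runs in O(len(a) * len(b)) time and O(len(b)) memory by keeping only
    the previous row of the DP table.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell depends on its left, upper, and upper-left neighbours;
            # this is the data dependency that constrains parallelisation.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because every cell needs three already-computed neighbours, only cells on the same anti-diagonal are independent of one another. That is the usual starting point for parallelising the dense two-dimensional case the abstract calls well studied.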


[13] 2606.06834

The Dark Regulome: Disentangling Predictability from Regulation in Genomic Foundation Models

High-grade gliomas integrate into neural circuits through functional synapses with neurons, raising the question of which noncoding elements shape synaptogenic gene expression in tumor cells. The regulatory program written across the dark genome, what we call the $\textit{dark regulome}$, is the natural substrate to probe, and sequence foundation models offer a zero-shot route through in-silico mutagenesis (ISM); yet likelihood-based scoring is tautologically coupled to local sequence predictability, leaving the regulatory interpretation underdetermined. Across three architecturally distinct foundation models (Caduceus-Ph, HyenaDNA, Enformer) and 30,448 dark genome elements at 92 glioma-relevant loci, we introduce a residualization-and-permutation diagnostic that separates predictability-driven from regulation-driven RIS variance. A sharp 10kb proximal-regulatory horizon survives every control we apply, but the LM-derived element-class hierarchy does not: a six-feature linear baseline matches Caduceus top-decile membership at AUC $= 0.985$. Cross-architecture decomposition cleanly separates a sequence-predictability layer (the two language models co-rank long well-predicted transposable elements) from a regulatory-output layer (Enformer alone retains residual cCRE-discriminative signal), with literally zero overlap between the two top-100 lists. Conservation, brain cis-eQTL, and STRING-PPI cross-checks then anchor what biology survives: top-100 elements across all three models are $3.3\times$ enriched per model for matching brain eQTLs ($p_\mathrm{emp} < 5\times 10^{-3}$), while a tempting transposable-element regulatory layer and a striking NRXN1+NLGN1 protein-pair convergence both fail proper permutation tests once those tests are constructed. We deliver the diagnostic as a general methodological tool for any ISM-based regulatory study.


[14] 2606.07181

RETROSPECT: RETROsynthesis via Sequential Prediction, and Chemically Transformed-ranking

Single-step retrosynthesis needs both accurate first-ranked suggestions and candidate lists that are rich enough for downstream selection. We study this as a proposal-selection decomposition. Our system, RETROSPECT, combines a single Transformer proposal model, which we call the ChemAlign Transformer, with a LambdaMART reranker over structural, reaction-template, upstream-score, and optional DFT-derived descriptors. The generator is trained with hybrid root-aligned and random SMILES augmentation, Pre-LayerNorm, tied embeddings, exponential moving average weights, and a differentiable atom-balance auxiliary loss. On the full USPTO-50K test set of 5,007 reactions, the generator reaches 55.00% top-1 and 86.18% top-10 exact-match accuracy with 99.86% top-1 validity. On the merged candidate-pool benchmark used for reranking, which contains 5,007 test products and about 111 candidates per product, a LambdaMART model trained on the structural feature set reaches 59.4% top-1 with 0.7171 mean reciprocal rank. Feature ablations show that upstream proposal score and template-frequency statistics provide most of the reranking signal, while DFT and reaction-center DFT features provide smaller and less consistent gains. These results support a modular view of retrosynthesis: stronger single-model proposal and learned candidate selection are complementary, and the proposal model can serve as a drop-in component for ensemble systems such as RetroChimera (Maziarz et al., 2024).
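
The two ranking metrics reported in this abstract, top-k exact-match accuracy and mean reciprocal rank, can be computed as in the sketch below. The function names and the toy candidate lists are hypothetical. The sketch assumes each candidate has already been reduced to a comparable string, for example a canonical SMILES, and that a missing ground truth contributes zero to the reciprocal rank; the abstract does not state how the authors handle that case.

```python
def top_k_accuracy(ranked_lists, truths, k):
    """Fraction of queries whose ground truth appears among the first k candidates."""
    hits = sum(1 for cands, truth in zip(ranked_lists, truths) if truth in cands[:k])
    return hits / len(truths)

def mean_reciprocal_rank(ranked_lists, truths):
    """Average of 1 / rank of the ground truth, counting 0 when it is absent."""
    total = 0.0
    for cands, truth in zip(ranked_lists, truths):
        if truth in cands:
            total += 1.0 / (cands.index(truth) + 1)
    return total / len(truths)

# Toy example with placeholder candidate identifiers.
ranked = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
truths = ["a", "y", "m"]
```

In the toy example the ground truth sits at rank 1, at rank 2, and outside the list, giving a top-1 accuracy of 1/3 and a mean reciprocal rank of 0.5. A reranker improves both numbers only by moving correct candidates upward within a fixed pool, which is why the quality of the proposal model bounds what selection can achieve.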


[15] 2606.07258

CaliPPer: quantifying, predicting and improving AI model performance for binding prediction

Binding prediction models accelerate therapeutic antibody and TCR discovery, but their performance on new datasets is unpredictable, often leading to low discovery rates. Density-ratio methods (PAPE, M-CBPE) provide label-free performance estimation for binary classification, but their assumptions and aggregate-only outputs limit binding prediction on neoepitopes, antigen variants and chemical scaffolds. Here we present CaliPPer (Calibration and Prediction of Performance), a post-hoc framework pairing a multi-chain Sample-to-Domain Distance (S2DD) with distance-aware Bayesian recalibration, operating at three resolutions: generalisability score, aggregate performance prediction, and per-sample confidence. Across ten models, eight architectures and two immune-receptor domains, CaliPPer attains distance--performance correlations $|r|=0.80\text{--}0.92$, predicts AUROC/AP/F1 with mean absolute errors $0.008\text{--}0.070$, and improves AUROC by up to $+0.20$ on unseen epitopes/variants. Applied retrospectively to five published TCR, BCR, MHC--peptide and small-molecule studies, CaliPPer raises true discovery rates in all five (e.g.\ $0/5 \to 3/5$ confirmed neoantigens), providing a triage layer between computational prediction and experimental validation.


[16] 2606.07413

A Nine-Compartment Nonlinear Epidemic Model with Spline-Based Identification of Time-Varying Transmission and Vaccination Dynamics: Application to the COVID-19 Third Wave in Italy

We develop a nine-compartment nonlinear epidemic model incorporating two co-circulating viral strains (ancestral I1 and the Alpha variant B.1.1.7 I2, which is 43-90% more transmissible, c2=1.5), a super-spreader subpopulation, partial vaccine-induced immunity with waning, and explicit hospitalization dynamics with differentiated mortality. Transmission and vaccination rates are treated as time-varying control inputs and identified from Italian COVID-19 data (January-May 2021) via a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) control-node parameterization, reducing calibration to a fourteen-variable Sequential Quadratic Programming (SQP) problem with monotonicity and box constraints. A parametric bootstrap (n=1000) quantifies parameter uncertainty. The calibrated model achieves R^2=0.966 for active hospitalizations, R^2=0.987 for cumulative fatalities, and R^2=0.999 for cumulative vaccinations. Well-posedness, the basic reproduction number in closed form, and local and global stability of the disease-free equilibrium are established analytically. An L-infinity approximation error bound shows that the PCHIP control-node parameterization converges to the true time-varying parameters at rate O(h^2) as the node spacing vanishes. Local identifiability and a noise stability bound are established via the Fisher information matrix. A sufficient threshold condition proves epidemic decay under time-varying suppression whenever the effective reproduction number remains persistently below one. Sensitivity analyses consistently rank hospital throughput parameters above the transmission rate, providing a mathematical basis for the observation that reactive containment measures cannot prevent a hospitalization peak already driven by the pre-existing latent viral load.


[17] 2505.06718

Understanding nature's selection of genetic languages

All living organisms use two universal genetic languages in their molecular biology machinery, one containing four nucleotide bases in its alphabet, and the other containing twenty amino acids in its alphabet. They can be understood as the optimal encodings of genetic information for the tasks they carry out, i.e. replication/transcription for DNA/RNA and translation for polypeptide chains. These tasks select the needed letters of the alphabet by complementary nucleotide base-pairing, from a collection of molecules in the cell. The computer science paradigm for this process is database search; various algorithms for it can be constructed and compared according to the number of attempts (or queries) they need to make to find the correct nucleotide base-pairing. Grover's search algorithm based on oscillatory wave dynamics perfectly fits the number of queries needed to search the genetic alphabets, and it is more efficient than the best Boolean search algorithm (i.e. binary tree search), which needs a larger number of queries. This result strongly suggests that the universal genetic languages have been selected by evolution as the optimal alphabets for the tasks they carry out, and are not an accident of history. The outstanding challenge is to demonstrate how Grover's search algorithm would be executed in vivo by living organisms.
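
The query-count comparison in this abstract can be made concrete with the standard Grover relation (2Q + 1) arcsin(1/sqrt(N)) = pi/2, which gives the database size N that Q queries search with certainty. Reading the abstract's claim through this relation is an assumption; the abstract itself quotes no formula or numbers. The function names are hypothetical.

```python
import math

def grover_database_size(queries):
    """Database size N searched with certainty by the given number of Grover queries.

    Solves (2Q + 1) * asin(1 / sqrt(N)) = pi / 2 for N.
    """
    theta = math.pi / (2 * (2 * queries + 1))
    return 1.0 / math.sin(theta) ** 2

def binary_search_queries(n):
    """Number of yes/no questions a binary tree search needs to single out one of n items."""
    return math.ceil(math.log2(n))

def grover_success_probability(n, queries):
    """Probability of finding the marked item among n after the given number of queries."""
    theta = math.asin(1.0 / math.sqrt(n))
    return math.sin((2 * queries + 1) * theta) ** 2
```

Under this relation one query corresponds to N = 4 and three queries to N of about 20.2, against 2 and 5 yes/no questions for a binary search over 4 and 20 items. For N = 20 the three-query success probability is very close to, but not exactly, one.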


[18] 2507.23146

Lightweight Language Models are Prone to Reasoning Errors for Complex Computational Phenotyping Tasks

Although computational phenotyping is a central informatics activity with resulting cohorts supporting a wide variety of applications, it is time-intensive because of manual data review. We previously assessed the ability of LLMs to perform computational phenotyping tasks using computable phenotypes for ARF respiratory support therapies. They successfully performed concept classification and classification of single-therapy phenotypes but underperformed on multi-therapy phenotypes. To understand issues with these complex tasks, we expanded PHEONA, a generalizable framework for evaluation of LLMs, to include methods specifically for evaluating faulty reasoning. We assessed the responses of two lightweight non-reasoning LLMs (Mistral Small 24 billion and Phi-4 14 billion) and one lightweight reasoning LLM (Qwen-distilled DeepSeek-r1 32 billion) both with and without prompt modifications to identify explanation correctness and unfaithfulness errors for phenotyping. For experiments without prompt modifications, both errors were present across all models. For experiments assessing accuracy impact after prompt modifications, Mistral had the highest overall accuracy impact when compared to DeepSeek and Phi. Since reasoning errors were ubiquitous across models, our enhancement of PHEONA to include a component for assessing faulty reasoning provides critical support for LLM evaluation and evidence for reasoning errors for complex tasks. While insights from reasoning errors can help prompt refinement, a deeper understanding of why LLM reasoning errors occur will likely require further development and refinement of interpretability methods.


[19] 2605.11197

The Same Problem by Different Names: Unifying Regression Dilution and Regression to the Mean

Regression to the Mean (RTM) and Regression Dilution are traditionally treated as unrelated issues in the clinical and ecological literatures. In this work, we demonstrate that within a linear errors-in-variables framework where baseline variables are subject to transient temporal or measurement noise, these two phenomena share an identical underlying mathematical signature. We unify these disparate traditions by comparing specialized clinical tools, such as the Berry shrinkage correction, with standard sign-agnostic structural estimators like Major Axis (MA) and Reduced Major Axis (RMA) regression. Using an analytical framework, we evaluate the closed-form population limits and finite-sample performance of these methods across various noise-to-signal ratios and sample sizes. Our results show that the Berry method is a specialized tool designed for clinical scenarios where a 1:1 relationship is expected. However, applying it to ecological trade-offs with negative slopes can lead to severe errors. We provide maps of optimality to identify which estimator most accurately recovers the true biological signal under different conditions. By reconciling these disparate methods, we offer a principled guide for researchers to choose the correct tool based on their data's noise profile rather than their disciplinary tradition.
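
The attenuation at the centre of this abstract can be reproduced with a small simulation of the classical errors-in-variables setting, where the ordinary least squares slope shrinks by the reliability ratio lambda = var(x) / (var(x) + var(noise)). The sketch below is an illustration under assumed noise levels and sample size and does not implement the Berry correction or the paper's optimality maps. The RMA slope is computed with the usual definition sign(cov) * sd(y) / sd(x); all function names are hypothetical.

```python
import math
import random

def simulate(n=20000, true_slope=1.0, sd_x=1.0, sd_noise=1.0, sd_y=0.5, seed=0):
    """Generate noisy baseline measurements x_obs and outcomes y.

    y depends on the true baseline; only a noisy copy of it is observed.
    """
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, sd_x) for _ in range(n)]
    x_obs = [x + rng.gauss(0.0, sd_noise) for x in x_true]
    y = [true_slope * x + rng.gauss(0.0, sd_y) for x in x_true]
    return x_obs, y

def _moments(x, y):
    """Return sums of squares and cross-products about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    sxx, _, sxy = _moments(x, y)
    return sxy / sxx

def rma_slope(x, y):
    """Reduced major axis slope: sign of the covariance times sd(y) / sd(x)."""
    sxx, syy, sxy = _moments(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)

def reliability_ratio(sd_x, sd_noise):
    """Expected attenuation factor of the OLS slope under classical measurement error."""
    return sd_x ** 2 / (sd_x ** 2 + sd_noise ** 2)
```

With equal signal and noise variances the reliability ratio is 0.5, so a true slope of 1 is recovered by OLS as roughly 0.5, and a true slope of -1 as roughly -0.5. The RMA estimate keeps the sign of the relationship in both cases, which illustrates the sign-agnostic behaviour the abstract contrasts with tools built for an expected 1:1 relationship.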


[20] 2606.02462

APLSuite: An Integrated Suite for CD4+ T Cell Epitope Prediction via Antigen Processing Likelihood

Computational epitope prediction is a critical tool for exploring and understanding CD4+ T cell-mediated immune responses, a key aspect of adaptive immunity. While existing computational methods primarily focus on supervised learning approaches, they often overlook the essential role of antigen processing in determining binding specificity. To address this limitation, our group developed Antigen Processing Likelihood (APL), an algorithm that integrates crystallographic B-factor, solvent accessible surface area (SASA), hydrogen exchange protection factors (COREX), and sequence entropy. In this paper we introduce APLSuite, a comprehensive and lightweight software suite designed to streamline APL-based epitope prediction. APLSuite integrates distributed RESTful API services, a Python client for data aggregation and processing, a data science tool for efficient epitope computation, and a user-friendly graphical user interface for non-coding users. It provides a seamless and efficient pipeline for APL calculation and epitope prediction that can be finished in minutes with GPU acceleration, which has not been implemented by existing tools. This flexible and extensible software suite is deployable on desktop and cloud environments, offering both guided and customizable workflows to meet diverse needs in immunology research and immunotherapy development. (The project page for this work is available at: this https URL)


[21] 2505.12286

Modeling hepatitis D virus kinetics during bulevirtide monotherapy: challenges and solutions

The entry inhibitor Bulevirtide (BLV) was recently approved in Europe for treatment of chronic hepatitis D virus (HDV) infection, which is considered the most severe viral hepatitis infection. Theory indicates that models that account for free virus and infected cells but do not include target cell dynamics (historically called the two-equation model) are limited to predicting a monophasic viral decline for antiviral agents that act only to block viral entry/infection. We tested a recently published two-equation-type model against clinical data obtained from patients with HDV treated with BLV monotherapy for up to 96 weeks, using non-linear mixed effects modelling (NLME). We found that (i) although the model parameters had a relative standard error (RSE) <50\%, suggesting that they were 'precisely estimated', the fits failed to reproduce the non-monophasic HDV kinetic patterns observed in most patients, leading to incorrect predictions of the duration of treatment needed to reach a theoretical cure boundary, defined as less than 1 virion in the patient's entire extracellular body fluid; (ii) the model cannot explain viral breakthrough; and (iii) the model wrongly predicts that viral load will remain at the same level once treatment is stopped. Lastly, we showed that including target cell dynamics in the model can explain not only monophasic viral decline during treatment but also non-monophasic HDV decline patterns such as biphasic decline, flat-partial response, and viral breakthrough. Including target cell dynamics also predicts a viral rebound once BLV is stopped, as observed in clinical studies.
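The structural difference between the two model classes can be sketched as follows. This is a minimal illustration assuming the standard target-cell-limited viral dynamics form (target cells T, infected cells I, free virus V) with an entry-inhibitor efficacy `eta`; it is not the published model, and every parameter value below is a placeholder rather than an estimate from the study. The function name `simulate` and its arguments are hypothetical.

```python
def simulate(days, eta, target_dynamics, dt=0.01,
             s=1e5, d=0.01, beta=1e-8, delta=0.1, p=10.0, c=1.0):
    """Forward-Euler integration; returns a list of (t, T, I, V) tuples.

    eta             -- efficacy of entry blocking, between 0 and 1
    target_dynamics -- False freezes T at its pre-treatment value (two-equation variant)
    """
    # Start from the untreated steady state of the full three-equation model.
    T = delta * c / (beta * p)
    V = (s - d * T) / (beta * T)
    I = c * V / p
    out = [(0.0, T, I, V)]
    for k in range(1, int(round(days / dt)) + 1):
        infection = (1.0 - eta) * beta * V * T       # the entry inhibitor scales new infections
        dT = (s - d * T - infection) if target_dynamics else 0.0
        dI = infection - delta * I                   # infected cells are lost at rate delta
        dV = p * I - c * V                           # virus is produced at rate p, cleared at rate c
        T, I, V = T + dt * dT, I + dt * dI, V + dt * dV
        out.append((k * dt, T, I, V))
    return out


two_equation = simulate(30.0, eta=0.9, target_dynamics=False)
with_targets = simulate(30.0, eta=0.9, target_dynamics=True)
print("V at day 30, two-equation variant:", two_equation[-1][3])
print("V at day 30, with target cells:   ", with_targets[-1][3])
```

In the two-equation variant T never changes, so the only feedback on viral load comes from the loss of infected cells. Letting T recover under treatment adds a second time scale, which is the ingredient the abstract credits for non-monophasic patterns.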


[22] 2602.00163

Deep Learning Pose Estimation for Multi-Label Recognition of Combined Hyperkinetic Movement Disorders

Hyperkinetic movement disorders (HMDs) such as dystonia, tremor, chorea, myoclonus, and tics are disabling motor manifestations across childhood and adulthood. Their fluctuating, intermittent, and frequently co-occurring expressions hinder clinical recognition and longitudinal monitoring, which remain largely subjective and vulnerable to inter-rater variability. Objective and scalable methods to distinguish overlapping HMD phenotypes from routine clinical videos are still lacking. Here, we developed a pose-based machine-learning framework that converts standard outpatient videos into anatomically meaningful keypoint time series and computes kinematic descriptors spanning statistical, temporal, spectral, and higher-order irregularity-complexity features.
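As a rough illustration of turning a keypoint time series into kinematic descriptors, the sketch below computes one statistical, one temporal, and one spectral feature for a single keypoint. It is not the authors' pipeline: the function names, the choice of features, and the naive DFT are assumptions made for illustration, and the irregularity-complexity features are not covered.

```python
import cmath
import math
import statistics


def speed_series(xs, ys, fps):
    """Frame-to-frame speed of one keypoint, in coordinate units per second."""
    return [math.hypot(x1 - x0, y1 - y0) * fps
            for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]


def dominant_frequency(signal, fps):
    """Frequency in Hz of the largest non-DC component, found with a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centred))
        if abs(coeff) > best_power:
            best_k, best_power = k, abs(coeff)
    return best_k * fps / n


def describe(xs, ys, fps):
    """Small descriptor set for one keypoint trajectory."""
    speed = speed_series(xs, ys, fps)
    return {
        "mean_speed": statistics.fmean(speed),      # temporal: average movement speed
        "speed_sd": statistics.stdev(speed),        # statistical: variability of speed
        "dominant_hz": dominant_frequency(xs, fps)  # spectral: main oscillation frequency in x
    }


# Synthetic example: a keypoint oscillating horizontally at 4 Hz, filmed at 30 frames per second.
fps = 30
xs = [math.sin(2 * math.pi * 4.0 * i / fps) for i in range(90)]
ys = [0.0] * 90
print(describe(xs, ys, fps))
```

The naive DFT is quadratic in the number of frames, which is acceptable for short clips; a real pipeline would more likely use an FFT library.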


[23] 2602.09997

Popularity Feedback Constrains Innovation in Cultural Markets

Real-world creative processes ranging from art to science rely on social feedback loops between selection and creation. Yet, the effects of popularity feedback on collective creativity remain poorly understood. We investigate how popularity ratings influence cultural dynamics in a large-scale online experiment where participants ($N = 1\,008$) iteratively \textit{select} images from evolving markets and \textit{produce} their own modifications. Results show that exposing the popularity of images reduces cultural diversity and slows innovation, delaying aesthetic improvements. Popularity feedback is associated with changes to both the selection and creation stages. During selection, popularity information triggers cumulative advantage, with participants preferentially building upon popular images, reducing diversity. During creation, participants make less disruptive changes and are more likely to expand existing visual patterns. Feedback loops in cultural markets thus shape not only selection but also, directly or indirectly, the form and direction of cultural innovation.


[24] 2605.31071

Tree Containment Parameterized by Scanwidth

TREE CONTAINMENT is a central decision problem in mathematical phylogenetics, asking whether a given rooted phylogenetic tree is embeddable in ("displayed by") a given rooted phylogenetic network. While the problem is NP-complete for general networks, many algorithmic advances have relied on structural parameters that capture how "tree-like" a network is. In this paper we investigate TREE CONTAINMENT under the structural parameter scanwidth, a directed width measure generalizing popular parameters measuring tree-likeness of phylogenetic networks. We first present a parameterized algorithm that solves the problem in $O(4^{k + k\log{k}} n + nm^2)$ time, where $n$ and $m$ are the numbers of nodes and arcs in the network and $k$ is the width of a given tree-extension. Complementing this upper bound, we prove a matching lower bound under the Exponential-Time Hypothesis (ETH), showing that there is no algorithm for TREE CONTAINMENT that runs in $2^{o(c\log{c})} n^{O(1)}$ time, even on binary inputs, where $c$ is the directed cutwidth of the input network, which upper-bounds the scanwidth $k$.