Motivation: Biomedical question answering often requires evidence beyond topically retrieved literature, including gene alias resolution, database identifier normalization, and atlas-derived biological measurements. However, existing retrieval-augmented generation (RAG) systems typically follow a fixed workflow and lack an explicit mechanism for deciding when retrieved text is sufficient, when curated biomedical knowledge is required, or when executable evidence assembly over structured measurements should be invoked. This motivates a substrate-aware large language model (LLM) harness that selectively assembles sufficient evidence across literature, knowledge bases, and biological atlases. Results: We introduce BioHarness, an LLM harness for staged biomedical evidence assembly across literature retrieval, curated biomedical knowledge resources, and atlas-derived structured measurements. BioHarness first attempts to answer from reranked literature evidence and escalates through grounded cascade control to REPL-style evidence assembly only when the current evidence is uncertain, weakly grounded, or substrate-mismatched. Across 19,302 biomedical QA items spanning seven answer formats, BioHarness improves the pooled score from 65.9 to 71.0 over the strongest non-oracle baseline. Ablations, case studies, and backbone-scaling analyses show that these gains arise from repairing evidence-substrate mismatches through reranking, entity grounding, and structured measurement access, rather than from indiscriminately invoking more reasoning steps, retrieving additional literature, or relying on a particular answer-model scale.
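The staged escalation described above can be pictured as a small control loop. The sketch below is a hypothetical illustration, not BioHarness's implementation. The `StageResult` fields, the thresholds `min_conf` and `min_grounding`, and the stage callables are all assumptions that stand in for "uncertain", "weakly grounded", and "substrate-mismatched" evidence.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StageResult:
    """Output of one evidence stage; every field here is hypothetical."""
    answer: Optional[str]
    confidence: float         # calibrated confidence in [0, 1]
    grounded_fraction: float  # share of answer entities linked to curated identifiers
    substrate_match: bool     # does the evidence type fit what the question asks for?


def needs_escalation(r: StageResult, min_conf: float = 0.7, min_grounding: float = 0.8) -> bool:
    """Escalate when evidence is uncertain, weakly grounded, or substrate-mismatched."""
    return (
        r.answer is None
        or r.confidence < min_conf
        or r.grounded_fraction < min_grounding
        or not r.substrate_match
    )


def run_cascade(
    question: str, stages: List[Callable[[str], StageResult]]
) -> Tuple[Optional[str], int]:
    """Run stages in order (e.g. literature, knowledge base, REPL) and stop at the first sufficient one.

    Returns the answer and the index of the stage that produced it.
    """
    if not stages:
        return None, -1
    for depth, stage in enumerate(stages):
        result = stage(question)
        if not needs_escalation(result):
            return result.answer, depth
    # Nothing passed the checks: fall back to the deepest stage's answer.
    return result.answer, len(stages) - 1
```

Ordering the stages from cheapest to most expensive means the more costly steps run only for the items that need them. This is consistent with the abstract's point that gains do not come from invoking more reasoning indiscriminately.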
Inferring epidemiological parameters from transmission trees is essential for understanding infectious disease dynamics. Existing tree-based likelihood methods, including the multi-type birth-death models originally applied in phylodynamic settings, provide powerful tools, but most assume homogeneous mixing and rarely capture how an individual's transmission potential changes as they infect more of their contacts. In this work, we develop a likelihood framework that operates directly on transmission trees, in which nodes are individuals and edges are reported transmission events, with no sequence data involved. We derive a likelihood for a stochastic SIR process on a rooted contact tree in which each infected individual is characterised by its total number of effective contacts and by the number of its downstream contacts that are already infected. We obtain closed-form ordinary differential equations for the probability that a clade goes entirely unobserved and for the probability density that it produces an observed (sampled) tip in a given state. The resulting likelihood can be evaluated for a rooted contact tree with known tip states, and we extend it to partially resolved trees by treating internal branching times as latent variables. Validation on simulated outbreaks confirms accurate parameter recovery and well-calibrated uncertainty. Application to empirical COVID-19 contact-tracing data from Karnataka, India, demonstrates the framework's utility in real epidemiological settings. By incorporating contact-degree heterogeneity in a multi-type branching likelihood, our work provides a principled baseline for inferring both transmission dynamics and contact structure from fully or partially resolved transmission trees, complementing rather than relying on sequence-based phylodynamic inference.
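The contact-degree SIR equations themselves are not given in the abstract. As a point of reference, the sketch below integrates the analogous "clade goes unobserved" equation of the standard constant-rate birth-death-sampling model. Here λ is the birth rate, μ the death rate, ψ the sampling rate, and ρ the sampling probability at the present. These are standard phylodynamic symbols, not the paper's notation. A multi-type version would couple one such equation per state.

```python
import numpy as np


def unobserved_probability(t_max, lam, mu, psi, rho=0.0, n_steps=10_000):
    """Integrate dE/dt = mu - (lam + mu + psi) * E + lam * E**2 backwards in time.

    E(t) is the probability that a lineage alive at time t before the present
    leaves no sampled descendants; E(0) = 1 - rho.
    """
    def rhs(e):
        return mu - (lam + mu + psi) * e + lam * e * e

    dt = t_max / n_steps
    e = 1.0 - rho
    trajectory = [e]
    for _ in range(n_steps):
        # Classical fourth-order Runge-Kutta step.
        k1 = rhs(e)
        k2 = rhs(e + 0.5 * dt * k1)
        k3 = rhs(e + 0.5 * dt * k2)
        k4 = rhs(e + dt * k3)
        e += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        trajectory.append(e)
    return np.array(trajectory)
```

Without sampling (ψ = ρ = 0), no lineage is ever observed, so E stays at 1. With ψ > 0, E relaxes towards the smaller root of λE² − (λ + μ + ψ)E + μ = 0.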
Brain networks are commonly divided into modules to analyze their functionally segregated roles in group-level neuroimaging studies. Here, we introduce stochastic modules within brain networks for a robust probabilistic measurement of structural-functional module consistency (SFMC) in a group of subjects. Specifically, a stochastic module describes, for each brain region, the probability across subjects that the region is assigned to a given group-level sub-network. This method has two advantages for evaluating inhomogeneous modules in brain networks. First, it can robustly evaluate the consistency between structural and functional modules even when their population sizes are not the same. Second, it takes into account the inter-individual variability of the modules within each group. Moreover, compared with the conventional structural-functional coupling approach, our stochastic module-based method reveals a more pronounced decline in the coupling between structure and function, indicating stronger developmental reorganization. Our results on the Baby Connectome Project (BCP) dataset show that SFMC decreases from birth to 5 years of age. SFMC is greater in primary brain regions, such as visual areas, and lower in regions supporting more advanced cognition, including those related to attention, control, and the default mode network.
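A minimal sketch of the assignment-probability idea follows. It assumes that each subject's module labels have already been matched to the group-level modules. The `module_agreement` score is a hypothetical illustration and not the paper's SFMC formula. It gives the chance that independently drawn structural and functional assignments of a region coincide. Because each probability matrix is a per-group average, the two groups do not need to contain the same number of subjects.

```python
import numpy as np


def assignment_probabilities(labels, n_modules):
    """labels: (n_subjects, n_regions) integer array of group-level module indices.

    Returns P with shape (n_regions, n_modules), where P[r, m] is the fraction of
    subjects in which region r was assigned to module m.
    """
    labels = np.asarray(labels)
    n_subjects, n_regions = labels.shape
    P = np.zeros((n_regions, n_modules))
    for m in range(n_modules):
        P[:, m] = (labels == m).mean(axis=0)
    return P


def module_agreement(P_struct, P_func):
    """Hypothetical per-region consistency: probability that both assignments coincide."""
    return (np.asarray(P_struct) * np.asarray(P_func)).sum(axis=1)
```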
Gene regulatory networks (GRNs) are fundamental to cellular growth and tissue formation, orchestrating spatially and temporally regulated gene expression during development. These networks are inherently subject to intrinsic fluctuations arising from molecular noise, so analyzing their stability is essential for understanding robust pattern formation and the developmental dynamics of the organism. In this study, we analyze the stability and dynamics of cyclic GRNs with negative feedback and diffusion using both deterministic and stochastic approaches. In the deterministic case, the system exhibits a bifurcation between stability and instability, leading to a Hopf instability in the absence of diffusion and to a Turing-Hopf instability when diffusion is included. We also find that discretizing the spatial domain introduces additional unstable modes, enabling a wider range of patterns. The stochastic framework is based on the second-moment approach and incorporates intrinsic fluctuations. It reveals that, for small system sizes, fluctuations can dominate the dynamics and induce a stochastic Turing instability, even when the system is stable in the absence of diffusion. Notably, Turing instabilities can emerge even when all variables have the same diffusion rate. The developed framework provides a systematic method for analyzing the stability of high-dimensional stochastic systems with diffusion, thereby simplifying the prediction of Turing and Turing-Hopf instabilities. These findings contribute to a deeper understanding of complex dynamics and pattern formation in GRNs, with potential implications for biological processes such as cellular differentiation and development.
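The deterministic part of this analysis can be illustrated with a standard linear-stability sketch. Consider a hypothetical ring of three mutually repressing genes, linearised with unit decay and repression gain b, on a periodic row of discrete cells. Each spatial Fourier mode k contributes a Laplacian eigenvalue q_k = 2 − 2cos(2πk/N), and its growth rate is the largest real part of the eigenvalues of J − q_k·diag(D). For this three-gene ring, the eigenvalues of J are −1 − bω with ω a cube root of unity, so the uniform mode undergoes a Hopf bifurcation at b = 2. The Jacobian, the gain, and the unit cell spacing are all assumptions.

```python
import numpy as np


def cyclic_repression_jacobian(n_genes, b):
    """Jacobian of a ring of n genes, each repressing the next, linearised at a fixed point.

    Each gene decays at unit rate and is repressed by its predecessor with gain b.
    """
    J = -np.eye(n_genes)
    for i in range(n_genes):
        J[i, (i - 1) % n_genes] = -b
    return J


def max_growth_rates(J, D, n_cells):
    """Largest real part of eig(J - q_k * diag(D)) for each Fourier mode k of a ring of cells.

    q_k = 2 - 2 cos(2 pi k / n_cells) are the eigenvalues of the (negated) discrete
    Laplacian with periodic boundaries and unit spacing.
    """
    D = np.diag(D)
    rates = []
    for k in range(n_cells):
        q = 2.0 - 2.0 * np.cos(2.0 * np.pi * k / n_cells)
        rates.append(np.linalg.eigvals(J - q * D).real.max())
    return np.array(rates)
```

With equal diffusion rates, every eigenvalue is simply shifted by −D·q_k, so the uniform mode (k = 0) always grows fastest. This deterministic sketch therefore cannot reproduce the equal-diffusion Turing instabilities mentioned above; those lie outside its scope.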
The Global Alliance for Genomics and Health (GA4GH) Beacon protocol lets researchers ask whether a genomic variant has been observed in a participating cohort and receive aggregate variant-level counts. As Beacon networks grow, two privacy risks remain: host institutions can see plaintext queries, and repeated rare-variant queries can support membership-inference attacks. We present bioETH-Beacon, a smart-contract prototype that runs the Beacon "aggregate count" query over encrypted data on a fully homomorphic Ethereum Virtual Machine (fhEVM). Hospitals upload encrypted marker-count entries, and authorized researchers submit encrypted marker queries. The contract returns an encrypted answer that is released, via an off-chain key-management service, only to the requester named in the contract's on-chain access-control list (ACL). The design is organized as a 3×4 grid of three confidentiality tiers by four query families (genotype, sex, age, and phenotype), with tiers that trade off confidentiality against query cost. For genotype paths, the prototype can add bounded on-chain noise to mitigate probing attacks. Experiments on synthetic panels derived from the Polygenic Score (PGS) Catalog show the expected scaling behavior. They also demonstrate that pre-aggregation can substantially reduce query gas when revealing which markers are present is an acceptable trade-off. Overall, bioETH-Beacon provides a research prototype for confidential Beacon-style genomic querying without a trusted compute evaluator.
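To make the query semantics concrete, the sketch below simulates them in plaintext. In bioETH-Beacon these steps run on ciphertexts inside the fhEVM. Here there is no encryption, contract, or key-management service, so the sketch shows only what is computed, not how it is protected. The class name, the marker-key format, and the clamping of noisy counts at zero are hypothetical. Note that a plaintext dictionary of pre-aggregated counts exposes which markers are present, which is the trade-off the abstract mentions for pre-aggregation.

```python
import random
from collections import defaultdict


class PlaintextBeaconSim:
    """Plaintext stand-in for the encrypted aggregate-count query (no FHE, no chain).

    Entries map a hypothetical marker key, e.g. "chr1:12345:A>G", to a count.
    """

    def __init__(self, noise_bound=0, seed=None):
        self.counts = defaultdict(int)
        self.noise_bound = noise_bound
        self.rng = random.Random(seed)

    def upload(self, hospital_entries):
        # Pre-aggregation: fold each hospital's counts into one running total per marker.
        for marker, count in hospital_entries.items():
            self.counts[marker] += count

    def query(self, marker):
        # Bounded noise in [-noise_bound, noise_bound], clamped so counts stay non-negative.
        noise = self.rng.randint(-self.noise_bound, self.noise_bound)
        return max(0, self.counts.get(marker, 0) + noise)
```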
In recent years, an anomalously high spread of West Nile virus (WNV) has been observed in Italy, with particularly high infection peaks in the southern Lazio, Campania, and Veneto regions. The main vector of WNV is the Culex pipiens mosquito, which transmits infection to humans through its bites. Here, we investigate the spread of West Nile fever in Italy during the summer 2025 season using a computational approach based on a quantum version of the Game of Life (GOL) cellular automaton model. Specifically, human dynamics evolve according to the GOL rules, while the stochastic dynamics of the disease vectors (mosquitoes) and their interactions with humans occur simultaneously. We show that this model fits the curves of cumulative infected individuals with high accuracy at both the local and the average regional level, optimizing only the mosquito birth and removal rates. Furthermore, leveraging the model's flexibility, we show how changes in parameter values elucidate the system's response to environmental variations. For instance, we quantify the impact of measures to contain mosquito spread, or of sudden increases in mosquito abundance due to climatic and ecological changes. Overall, we provide a general, quantitative description of WNV spread in Italy that could serve as a supporting tool for testing different environmental scenarios. It could also help decision makers devise strategies to monitor vector dynamics and control the spread of the virus.
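The quantum GOL used in the study is not reproduced here. For orientation, the sketch below shows the classical Conway update on which such models build. Alongside it is a hypothetical per-cell stochastic birth/removal update for mosquitoes, driven by the two rates the abstract says are fitted. The Poisson/binomial form, the function names, and the omission of mosquito-human infection coupling are all assumptions.

```python
import numpy as np


def life_step(grid):
    """One synchronous step of Conway's (classical) Game of Life on a torus."""
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth with exactly 3 live neighbours; survival with 2 or 3.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(grid.dtype)


def mosquito_step(n_mosquitoes, birth_rate, removal_rate, dt, rng):
    """Hypothetical stochastic birth/removal update for the mosquito count in one cell."""
    births = rng.poisson(birth_rate * n_mosquitoes * dt)
    removals = rng.binomial(n_mosquitoes, min(1.0, removal_rate * dt))
    return max(0, n_mosquitoes + births - removals)
```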
Macrocyclic peptides are promising therapeutic candidates for intracellular targets, but their design requires simultaneous control over non-natural monomer chemistry, ring topology, membrane permeability, and target binding. Existing SMILES- or HELM-string generative models either operate in long atom-level sequence spaces or treat monomers as symbolic tokens with limited chemical grounding. We introduce PepALD, an Autoregressive Latent Diffusion (ALD) foundation model for de novo macrocyclic peptide generation. The model represents HELM monomers with structured chemical embeddings, generates each residue through context-conditioned diffusion in chemically informed latent space, predicts R-group-aware ring closures during autoregressive generation, and aligns the denoiser to affinity rewards using winner-protected diffusion-adapted preference optimization. In silico experiments demonstrate PepALD's generation quality and reward-optimization performance against representative peptide generation baselines.
Tree-like structures appear in many areas of science, and their shapes can help us understand the processes that give rise to them or that they drive. By treating these structures as geometric graphs in $\mathbb{R}^3$, we gain access to tools from computational geometry and topology to study them. In this paper, we use the theory of quadratic forms to measure the directional spread of geometric graphs. We also introduce the hexplot model, equipped with a metric derived from the Fisher metric on the standard triangle, to visualize and measure directional spread and to collect statistics on it.
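One way to make the quadratic-form idea concrete is sketched below, with several assumptions. Sum the outer products of the edge vectors, weighting each edge by its squared length (an assumed choice), and normalise by the trace. The eigenvalues are then non-negative and sum to 1, so each graph maps to a point on the standard triangle. For the metric, we read "Fisher metric on the standard triangle" as the Fisher-Rao metric on the probability 2-simplex, whose geodesic distance is 2·arccos(Σ√(p_i q_i)). The hexplot's own construction, including how it folds the triangle, is not reproduced.

```python
import numpy as np


def directional_spread(points, edges):
    """Normalised quadratic form of edge directions for a geometric graph in R^3.

    points: (n, 3) vertex coordinates; edges: iterable of (i, j) index pairs.
    Returns the eigenvalues of Q = sum_e v_e v_e^T / sum_e |v_e|^2 in decreasing
    order; they are non-negative and sum to 1 (a point on the standard triangle).
    """
    pts = np.asarray(points, dtype=float)
    vecs = np.array([pts[j] - pts[i] for i, j in edges])
    Q = vecs.T @ vecs / np.sum(vecs * vecs)
    eig = np.clip(np.linalg.eigvalsh(Q)[::-1], 0.0, None)  # remove tiny negative round-off
    return eig / eig.sum()


def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between two points of the probability simplex."""
    bc = np.sum(np.sqrt(np.asarray(p, dtype=float) * np.asarray(q, dtype=float)))
    return 2.0 * np.arccos(np.clip(bc, -1.0, 1.0))
```

A straight chain maps to the vertex (1, 0, 0), and three orthogonal edges of equal length map to the centre (1/3, 1/3, 1/3). Distances between graphs can then be measured on the triangle.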
Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and retrieval-augmented generation systems often rely on single-step prompting or retrieval, which can be fragile when clinical evidence is distributed across long electronic health records, medical images, sensor streams, guidelines, and referral constraints. This paper proposes MedRLM, a Recursive Multimodal Health Intelligence framework for long-context clinical reasoning, sensor-guided screening, and community-to-tertiary referral support. Instead of compressing all patient information into one prompt, MedRLM treats the patient case as an external clinical environment that can be recursively inspected, decomposed, retrieved, verified, and synthesized. The framework coordinates specialized agents for clinical text, longitudinal EHR, medical imaging, physiological sensor signals, guideline retrieval, uncertainty auditing, and referral planning. It further introduces a Clinical Evidence Graph Memory to connect patient-specific observations with retrieved evidence, standardized definitions, sensor-derived biomarkers, and referral criteria. A sensor-guided recursive triggering mechanism activates deeper reasoning when abnormal physiological or behavioral patterns are detected, while uncertainty-gated refinement supports clinician review for high-risk or low-confidence cases. We also outline a real-data evaluation design using public and credentialed clinical datasets spanning EHR, radiology, ECG, ICU time series, and referral-proxy outcomes. MedRLM aims to move medical AI from static question answering toward auditable, multimodal, and workflow-aware clinical decision support.
Camera-trap monitoring in African tropical forests increasingly extends beyond closed-canopy interiors to riverbanks, clearings, and park edges. Among available open tools for African forest camera-trap classification, DeepForestVision is the only one providing a matched offline workflow for both photographs and videos, and previous work showed that it outperformed other available baselines on a comparable benchmark. However, it was designed for closed-canopy, ground-level forest interiors and uses a 35-class prediction space that becomes too coarse when deployments encounter arboreal primates, birds, semi-aquatic taxa, or human-associated confounders such as livestock. We present DeepForestVisionV2, an ecology-driven expansion from 35 to 64 prediction classes (61 animal classes plus human, vehicle, and blank) designed to address three recurrent deployment gradients: vertical stratification, scene openness, and anthropogenic interfaces. DeepForestVisionV2 retains the same offline workflow and is trained on 1,535,010 photographs and 243,354 videos from multi-country African tropical-forest projects. Evaluation combines a cross-country cropped-photo validation set, used to assess robustness across sites and camera-trap settings, with three held-out Uganda video benchmarks spanning the targeted gradients. On the validation set, DeepForestVisionV2 reaches 0.86 accuracy, 0.82 macro-F1, and 0.81 balanced accuracy. On the deployment benchmarks, it preserves or improves baseline accuracy despite its harder classification task, while increasing the number of identified taxa from 22 to 29 in forest-interior videos and from 4 to 9 at riverbanks. In the park-edge use case, it raises accuracy from 0.62 to 0.86 and reduces false alarms from 11 to 0. These results show that DeepForestVisionV2 materially improves field utility while preserving robustness across sites, habitats, and camera-trap settings.
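The three validation figures (accuracy, macro-F1, and balanced accuracy) respond differently to class imbalance. The minimal sketch below shows how these standard metrics are usually computed from a confusion matrix. It is not the authors' evaluation code. Balanced accuracy is taken as the mean per-class recall over classes that occur in the ground truth, and macro-F1 averages over all rows of the matrix; both conventions are assumptions.

```python
import numpy as np


def summary_metrics(cm):
    """Accuracy, macro-F1 and balanced accuracy from a confusion matrix.

    cm[i, j] counts items of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)    # true items per class
    predicted = cm.sum(axis=0)  # predicted items per class
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    balanced = recall[support > 0].mean()
    return accuracy, f1.mean(), balanced
```

For example, with a 90/10 class split and a classifier that always predicts the majority class, accuracy is 0.9 but balanced accuracy is only 0.5. This is why reporting both metrics matters for long-tailed camera-trap classes.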
Collective oscillations in neuronal systems often arise from interactions between excitatory and inhibitory populations rather than from recurrent coupling within a single ensemble. Motivated by the coexistence of strongly and partially synchronized regimes in such systems, we study the Kuramoto-Sakaguchi model on a bipartite network. Despite its minimal structure, the model exhibits rich collective dynamics, including both continuous and discontinuous transitions from full synchrony to partial synchrony (PS). In the PS regime, global oscillations fail to entrain one of the two populations. As observed in neuronal networks, the oscillators of that population display quasiperiodic dynamics with an average frequency that can deviate significantly from that of the global field. We show that this PS state constitutes an example of self-organized quasiperiodicity, arising here in the canonical Kuramoto-Sakaguchi model despite its purely linear global coupling.
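A minimal simulation sketch of the model class follows, under stated assumptions. Each oscillator in population A feels only the mean field of population B, and vice versa, with coupling strength K and a common phase lag α. Frequencies, coupling normalisation, and the forward-Euler integrator are all assumptions, and the sketch does not reproduce the paper's analysis.

```python
import numpy as np


def simulate_bipartite_ks(omega_a, omega_b, K, alpha, theta_a, theta_b, dt=0.01, steps=20_000):
    """Euler integration of a two-population Kuramoto-Sakaguchi model with bipartite coupling.

    Each oscillator in population A is driven only by the mean field of B and vice versa:
        dtheta_i/dt = omega_i + K * R_other * sin(Phi_other - theta_i - alpha)
    Returns the final order parameters (R_A, R_B) and the final phases.
    """
    theta_a = np.array(theta_a, dtype=float)
    theta_b = np.array(theta_b, dtype=float)
    for _ in range(steps):
        z_a = np.exp(1j * theta_a).mean()  # complex mean field R_A e^{i Phi_A}
        z_b = np.exp(1j * theta_b).mean()  # complex mean field R_B e^{i Phi_B}
        # Im(Z e^{-i(theta + alpha)}) = R sin(Phi - theta - alpha)
        da = omega_a + K * np.imag(z_b * np.exp(-1j * (theta_a + alpha)))
        db = omega_b + K * np.imag(z_a * np.exp(-1j * (theta_b + alpha)))
        theta_a += dt * da
        theta_b += dt * db
    r_a = np.abs(np.exp(1j * theta_a).mean())
    r_b = np.abs(np.exp(1j * theta_b).mean())
    return r_a, r_b, theta_a, theta_b
```

With identical frequencies and α = 0, both order parameters approach 1 (full synchrony). Exploring the partially synchronized, quasiperiodic regime described above would require heterogeneous frequencies and a nonzero phase lag.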
Resting-state EEG provides a non-invasive view of spontaneous brain activity, but extracting meaningful patterns is often limited by scarce high-quality data and reliance on manually engineered features. Generative adversarial networks (GANs) can synthesize neural signals and learn transferable representations directly from raw data, a dual capability that remains underexplored in EEG research. Here, we introduce REST-GAN, a GAN-based framework for resting-state EEG that combines adversarial training with an auxiliary self-supervised reconstruction objective to support signal synthesis and unsupervised feature extraction. Although trained only on raw time-domain signals, without explicit frequency-domain or sensor-topographic supervision, the generated time series reproduced key temporal, spectral, and connectivity properties of real EEG. In band-power feature space, generated samples showed high precision and moderate recall in both eyes-open and eyes-closed conditions (EO: 0.91/0.67; EC: 0.87/0.65), while group-average spectral coherence matrices showed low mean absolute differences from real data across frequency bands (~0.01-0.03). The representations learned by the model's critic transferred to independent resting-state demographic classification tasks. They outperformed models trained directly on raw EEG and performed competitively with a recent EEG foundation model, while requiring substantially less training data and computational resources. These findings highlight a computationally efficient, architecture-driven strategy in which generative models serve not only as EEG signal generators, but also as unsupervised feature extractors. This approach may support more data-efficient EEG analysis while reducing reliance on manual feature engineering. The implementation code for REST-GAN is available at: this https URL.
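The abstract reports precision and recall in band-power feature space without naming the estimator. One widely used choice for generative models is the k-nearest-neighbour manifold estimator, sketched below under the assumption that band-power features have already been extracted into one row per sample. Generated samples count towards precision if they fall inside the k-NN ball of at least one real sample. Real samples count towards recall if they fall inside the ball of at least one generated sample. The value of k is an assumption.

```python
import numpy as np


def knn_radii(x, k):
    """Distance from each row of x to its k-th nearest neighbour (excluding itself)."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the zero self-distance


def manifold_precision_recall(real, fake, k=3):
    """k-NN manifold precision and recall between two feature sets (rows = samples)."""
    real = np.asarray(real, dtype=float)
    fake = np.asarray(fake, dtype=float)
    r_real, r_fake = knn_radii(real, k), knn_radii(fake, k)
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)  # (n_fake, n_real)
    # Precision: fraction of generated samples inside at least one real-sample ball.
    precision = np.mean((d <= r_real[None, :]).any(axis=1))
    # Recall: fraction of real samples inside at least one generated-sample ball.
    recall = np.mean((d <= r_fake[:, None]).any(axis=0))
    return precision, recall
```

High precision with lower recall, as reported here, is the usual signature of samples that are realistic but do not cover the full spread of the real data.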
Human cognition depends on large-scale communication constrained by white matter architecture. Although weak connections are abundant in mammalian connectomes, they have long been treated as noise and downweighted because of tractography uncertainty in the human brain, and their relevance to human cognition and large-scale functional organization remains unresolved. Across multiple datasets and tractography pipelines, we show that, when tractography-derived connectivity weights are interpreted through a nonlinear weighting framework, weak connections make measurable contributions to cognitive prediction, functional connectivity simulation, and structure-function coupling. These effects are selective: nonlinear weighting improves the prediction of general cognitive ability and memory more than that of crystallized intelligence or processing speed. This is consistent with the notion that weak connections preferentially expand the modal repertoire of brain networks, enhancing both large-scale integration and fine-grained segregation and thereby supporting the functional balance essential for diverse cognitive abilities. Importantly, these effects are replicated in a reliability-aware connectome generated by integrating two post-tractography filtering methods, in which preserving weak links consistently outperforms conventional thresholding strategies. Finally, we show that weak connections contain functionally informative subsets organized along systems-level and transcriptomic gradients. In particular, a specific class of weak connections, predominantly linking visual and motor systems with limbic regions and characterized by negative gene coexpression, exerts a disproportionately large influence on brain function.
Imperfect molecular detection in single-cell experiments introduces technical noise that obscures the true stochastic dynamics of gene regulatory networks. While binomial models of molecular capture provide a principled description of imperfect detection, they have so far been analyzed only for simple gene-expression models that do not explicitly account for regulation. Here, we extend binomial models of capture to general gene regulatory networks to understand how imperfect capture reshapes the observed time-dependent statistics of molecular counts. Our results reveal when capture effects correspond to a renormalization of a subset of the kinetic rates and when they cannot be absorbed into effective rates, providing a systematic basis for interpreting noisy single-cell measurements. In particular, we show that rate renormalization depends on the level of regulatory detail in the model. For implicit regulatory models based on promoter state transitions, it arises whenever gene product synthesis does not trigger a promoter state change, as in the absence of promoter-proximal pausing or when pausing is short-lived. For models with explicit transcription factor binding, the same condition holds, together with sufficiently high transcription factor abundance, which in practice requires only a few tens of molecules per cell. In these cases, technical noise reduces the apparent mean burst size of synthesized gene products and accelerates the apparent rates of transcription factor binding reactions. This acceleration becomes stronger as the number of protein species and/or molecules involved in promoter switching increases. These effects hold for gene regulatory networks of arbitrary connectivity and remain valid under time-dependent kinetic rates.
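The simplest instance of rate renormalization has a closed form. This is constitutive bursty expression, not the regulatory networks the abstract covers. Steady-state counts follow a negative binomial distribution with shape r and mean burst size b, with generating function (1 − b(z − 1))^(−r). Binomial capture with probability p replaces z by 1 − p + pz, which gives (1 − pb(z − 1))^(−r). The observed counts are therefore again negative binomial, with unchanged burst frequency and apparent burst size p·b, so the Fano factor drops from 1 + b to 1 + pb. The simulation below checks this numerically; the parameter values are illustrative assumptions.

```python
import numpy as np


def thinned_burst_stats(r, burst_size, capture, n_samples=200_000, seed=0):
    """Sample bursty (negative binomial) counts, apply binomial capture, return Fano factors.

    True counts ~ NB with shape r and mean burst size b (mean r*b, Fano 1 + b).
    Each molecule is detected independently with probability `capture`.
    """
    rng = np.random.default_rng(seed)
    # numpy's negative_binomial(n, p) has mean n(1-p)/p; p = 1/(1+b) gives mean n*b.
    true_counts = rng.negative_binomial(r, 1.0 / (1.0 + burst_size), size=n_samples)
    observed = rng.binomial(true_counts, capture)

    def fano(x):
        return x.var() / x.mean()

    return fano(true_counts), fano(observed)
```

With b = 10 and p = 0.3, the true Fano factor is close to 11 and the observed one is close to 4, so an observer fitting the same model to the captured counts would infer a mean burst size of about 3.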
Gene regulatory networks govern cellular fate decisions through multistable dynamics. The genetic toggle switch is a canonical model of such behaviour; yet, the impact of cell division on its dynamics remains poorly understood. We derive analytical separatrices for a simplified Boolean toggle switch with and without division. We show that division can redirect trajectories with identical initial conditions to opposing stable states, and we define a region of disagreement where fate decisions are predicted incorrectly if division is neglected. Our results imply that division can fundamentally reshape fate boundaries in multistable regulatory networks.
Graph symmetries identify structural regularities and reduce the computational complexity of network analysis. In weighted graphs, however, exact automorphisms are rare because real-valued weights seldom coincide. We introduce a general framework for detecting approximate symmetries by aggregating weights into discrete categories, generating a sequence of coarser graphs on which classical automorphism analysis applies. The approximation path is fully configurable, based on interaction magnitudes, and can be matched to the empirical weight distribution. Applied to 250 empirical food webs using logarithmic aggregation, the method reveals that automorphisms emerge even at low approximation levels and almost always form small orbits. Orbit sizes rarely exceed two or three vertices, reflecting the combinatorial fragility of larger symmetric sets. Even so, symmetric vertices occupy diverse structural positions in the network, and high connectivity does not imply asymmetry. The finding that symmetries involve only local permutations confirms the conclusions of trophic-species and niche analyses. A case study demonstrates that automorphisms can also recover latent ecological structure. The minimal aggregation level at which two vertices become substitutable provides a quantitative measure of role similarity. The framework offers a principled, automorphism-based approach for quantifying similarity and redundancy in weighted complex networks.
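A minimal sketch of the aggregation idea follows. It assumes nested logarithmic bins of width δ₀·2^k, so that each level merges pairs of bins from the previous one. It checks only pairwise vertex swaps (orbits of size two), where the abstract reports most symmetries lie; a full automorphism-group search would use a dedicated tool such as nauty. The function names and δ₀ are hypothetical. Fixed bin edges can separate nearly equal weights that straddle a boundary (e.g. 0.99 and 1.0).

```python
import numpy as np


def log_categories(W, delta):
    """Map positive weights to bins floor(log10(w) / delta); absent edges get -inf."""
    W = np.asarray(W, dtype=float)
    safe = np.where(W > 0, W, 1.0)  # avoid log10(0)
    return np.where(W > 0, np.floor(np.log10(safe) / delta), -np.inf)


def swap_is_automorphism(C, i, j):
    """True if exchanging vertices i and j leaves the categorised matrix unchanged."""
    perm = np.arange(C.shape[0])
    perm[[i, j]] = perm[[j, i]]
    return np.array_equal(C[np.ix_(perm, perm)], C)


def minimal_level(W, i, j, delta0=0.01, max_level=10):
    """Smallest k at which i and j are swappable with bin width delta0 * 2**k (else None)."""
    for k in range(max_level + 1):
        if swap_is_automorphism(log_categories(W, delta0 * 2 ** k), i, j):
            return k
    return None
```

For example, if vertex 0 links to vertices 1 and 2 with weights 1.0 and 1.1, those two vertices first become substitutable at level 3 (bin width 0.08). The value of k then serves as the role-similarity score the abstract describes.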