Tensor decomposition of donor $\times$ cell-type $\times$ gene single-cell data recovers \emph{multicellular programs}: coordinated axes of inter-individual transcriptional variation that span cell types and stratify disease. Yet immune single-cell atlases are increasingly multi-institution, multi-ancestry, and governed, so patient cells often cannot be pooled. We present a federated estimator: each site computes a local program subspace, and a coordinator merges these by stacked SVD under federated global-mean centering, provably equivalent (up to truncation) to the centralised decomposition. This centering makes the merge robust to site-label confounding (program AUC $0.957$ vs.\ $0.861$ for naive per-site centering). Only program subspaces leave a site, and aggregation is compatible with secure aggregation. On a 261-donor systemic lupus erythematosus atlas it recovers the canonical interferon program (ISG enrichment AUC $0.998$; case--control separation $0.958$; bootstrap $\Delta\text{AUC}=-0.000$, 95\% CI $[-0.004,+0.012]$ vs.\ centralised), across institution-scale and multi-ancestry partitions, and across three \emph{real} COVID-19 sites (subspace correlation $0.989$). It recovers the program when \emph{no site observes all cell types} (correlation $1.000$, exact by construction), which fixed-feature federated PCA cannot. On an interstitial-lung-disease atlas the recovered program predicts disease better than the best single cell type (AUC $0.96$ vs.\ $0.91$; gap 95\% CI excludes zero) and the advantage survives federation; a liver cohort is consistent ($p=0.005$). Membership-inference shows secure aggregation cuts attack AUC from $0.91$ to $0.61$. The method enables cross-institution, cross-ancestry recovery of multicellular immune programs without sharing cells.
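The merge step can be illustrated with a minimal sketch, under simplifying assumptions that are not taken from the paper: each site's donor $\times$ cell-type $\times$ gene tensor is flattened to a donor $\times$ feature matrix, and each site sends its singular-value-weighted loadings. The function names (`federated_mean`, `local_summary`, `merge`) and the synthetic data are hypothetical. With no truncation at the sites, the stacked SVD reproduces the centralised subspace, because the stacked summaries have the same Gram matrix as the pooled, globally centred data.

```python
import numpy as np

def federated_mean(sites):
    """Global feature mean from per-site sums and counts (no rows are pooled)."""
    n_total = sum(X.shape[0] for X in sites)
    return sum(X.sum(axis=0) for X in sites) / n_total

def local_summary(X, global_mean, k_local):
    """Site-side step: centre with the shared mean, return weighted loadings (k_local x p)."""
    _, s, Vt = np.linalg.svd(X - global_mean, full_matrices=False)
    return s[:k_local, None] * Vt[:k_local]

def merge(summaries, k):
    """Coordinator step: SVD of the stacked summaries, keep the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(np.vstack(summaries), full_matrices=False)
    return Vt[:k]

# Synthetic sites whose means differ, mimicking site-label confounding (assumed data).
rng = np.random.default_rng(0)
sites = [rng.normal(loc=shift, size=(n, 6))
         for n, shift in [(20, 0.0), (35, 1.5), (15, -1.0)]]

mu = federated_mean(sites)
k = 2
# k_local = 6 equals the feature count, so nothing is truncated at the sites.
V_fed = merge([local_summary(X, mu, 6) for X in sites], k)

# Centralised reference: pool all rows, centre, take the top-k right singular vectors.
X_all = np.vstack(sites)
_, _, Vt_c = np.linalg.svd(X_all - X_all.mean(axis=0), full_matrices=False)
V_cen = Vt_c[:k]

# Compare subspaces through their projection matrices (invariant to sign and rotation).
gap = np.linalg.norm(V_fed.T @ V_fed - V_cen.T @ V_cen)
print(f"projection gap: {gap:.2e}")
```

Choosing `k_local` below the local rank would make the merge approximate rather than exact, which is the truncation caveat in the equivalence claim.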
Predicting transcriptional responses to genetic perturbations could reduce the experimental burden of functional genomics, but extrapolation to genes that were never perturbed during training remains difficult. We present Stable-Shift, a structured method for estimating unseen-gene responses. Stable-Shift aggregates single-cell measurements into perturbation-level expression shifts, fits a low-rank response basis using training perturbations only, and predicts an unseen gene's coordinates in that basis from biological context. The context combines STRING interactions, network structure, control-cell expression statistics, and Gene Ontology annotations; the evaluated implementation uses graph convolution to integrate these inputs. On the supplied K562 Perturb-seq benchmark, Stable-Shift obtained 0.592 cosine similarity, compared with 0.569 for GEARS, together with higher Spearman correlation and top-gene precision among the evaluated methods. Its mean cosine similarity over five unseen-gene splits was $0.589 \pm 0.008$. The same ordering was observed in the supplied graph-aware, residualized, gene-space, and Norman-dataset comparisons. These results support further study of biologically structured latent-response prediction, while the lower gene-space accuracy and sensitivity to sparse graph neighborhoods limit the scope of the present conclusions.
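The basis-then-coordinates structure described above can be sketched as follows. This is an illustration only: ridge regression stands in for the graph-convolutional context model, the context features are random placeholders rather than STRING or Gene Ontology inputs, and all names (`fit_basis`, `fit_coordinate_map`, `cosine_rows`) are hypothetical.

```python
import numpy as np

def fit_basis(train_shifts, k):
    """Top-k right singular vectors of the training shift matrix (k x n_genes)."""
    _, _, Vt = np.linalg.svd(train_shifts, full_matrices=False)
    return Vt[:k]

def fit_coordinate_map(features, coords, ridge=1e-8):
    """Ridge regression from context features to latent coordinates (a stand-in model)."""
    d = features.shape[1]
    return np.linalg.solve(features.T @ features + ridge * np.eye(d), features.T @ coords)

def cosine_rows(a, b):
    """Row-wise cosine similarity between two matrices of equal shape."""
    return np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

# Synthetic, noiseless data in which shifts are exactly rank-k functions of context.
rng = np.random.default_rng(1)
n_train, n_test, d, k, n_genes = 40, 5, 4, 3, 30
A = rng.normal(size=(d, k))
B = rng.normal(size=(k, n_genes))
F_train = rng.normal(size=(n_train, d))
F_test = rng.normal(size=(n_test, d))
shifts_train = F_train @ A @ B
shifts_test = F_test @ A @ B          # held out: never used for fitting

basis = fit_basis(shifts_train, k)            # fitted on training perturbations only
coords = shifts_train @ basis.T               # training coordinates in the basis
W = fit_coordinate_map(F_train, coords)       # context -> coordinates
pred = F_test @ W @ basis                     # predicted shifts for unseen perturbations

cos = cosine_rows(pred, shifts_test)
print(cos.round(4))
```

Because the synthetic shifts are exactly low rank and noiseless, recovery here is near perfect; it says nothing about the accuracy reported on real Perturb-seq data.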
Alignment-free methods for phylogenetic tree construction offer major gains in computational efficiency over alignment-based methods, but most reduce sequence information to pairwise distances, losing the statistical power of maximum likelihood (ML) inference. We describe ML-MAWS, an algorithm that fills this gap by encoding Minimal Absent Words (MAWs) as a binary presence/absence character matrix and estimating an ML tree under the Lewis Mkv model with ascertainment bias correction. MAWs are obtained in linear time by traversing a suffix automaton. Three new elements contribute to the phylogenetic signal: strand-aware filtering combines forward and reverse-complement MAW sets to eliminate compositional artifacts; entropy-based multi-length selection uses Shannon entropy maximization to select the most informative MAW lengths; and parsimony-informative character capping retains only the most discriminative columns. We tested ML-MAWS on 14 benchmark datasets of bacterial, mitochondrial, viral, and simulated genomes, scoring trees against published reference trees with normalized Robinson-Foulds and matching split distances. The results show that the coarse binary encoding of MAWs can lead to higher topological errors than continuous-valued distance baselines, but ML-MAWS can recover near-correct splits and, unlike those baselines, provides per-branch statistical confidence within a rigorous probabilistic framework.
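To make the encoding concrete, the sketch below computes minimal absent words by brute force and builds the binary presence/absence matrix. It deliberately swaps the linear-time suffix-automaton traversal for exhaustive enumeration, which is only feasible for toy inputs, and it omits strand-aware filtering, length selection, and character capping. The function names are hypothetical. A word is a MAW of a sequence when it does not occur in the sequence but its longest proper prefix and longest proper suffix both do.

```python
from itertools import product

def factors(s, max_len):
    """All substrings of s with length 0..max_len (the empty word included)."""
    return {s[i:i + n] for n in range(max_len + 1) for i in range(len(s) - n + 1)}

def minimal_absent_words(s, alphabet, max_len):
    """Words of length <= max_len absent from s whose longest proper prefix and suffix occur in s."""
    present = factors(s, max_len)
    maws = set()
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if w not in present and w[:-1] in present and w[1:] in present:
                maws.add(w)
    return maws

def presence_matrix(seqs, alphabet, max_len):
    """Rows are sequences, columns are the sorted union of MAWs, entries are 0/1."""
    per_seq = [minimal_absent_words(s, alphabet, max_len) for s in seqs]
    columns = sorted(set().union(*per_seq))
    matrix = [[int(w in maws) for w in columns] for maws in per_seq]
    return columns, matrix

columns, matrix = presence_matrix(["aab", "abb"], "ab", max_len=4)
print(columns)
for row in matrix:
    print(row)
```

A matrix of this form is the kind of binary character input that an Mkv-type likelihood model takes; tree inference itself is outside the scope of the sketch.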
Molecular generation is a central challenge in drug discovery, requiring models that explore vast chemical space while satisfying diverse design constraints. We present Molexar, a unified multimodal molecular foundation model built on Fragment-SELFIES, a robust, fragment-aware molecular language with validity-preserving decoding and explicit fragment structure. A pretrained autoregressive decoder learns the Fragment-SELFIES syntax and molecular distribution; supervised fine-tuning (SFT) then trains the same decoder on condition-molecule pairs spanning scalar molecular properties, pharmacophore fingerprints, protein sequences, and binding pockets, injecting each condition by in-place replacement of value-token embeddings so that all generation modes share one autoregressive path. Molexar achieves strong efficiency at a small parameter count while matching or exceeding larger models. The pretrained model reaches 100% validity and high drug-likeness in unconditional and fragment-constrained generation; the SFT model follows single- and multi-property instructions and remains competitive on target-conditioned generation on the CrossDocked2020 test set. On MolGenBench, Molexar further generates molecules with favorable safety and potency. These results establish Molexar as a practical unified foundation for computational chemistry and drug-design workflows.
Polychronous Neuronal Groups (PNGs), reproducible, time-locked spatiotemporal firing cascades stabilised by Spike-Timing-Dependent Plasticity (STDP) and heterogeneous axonal delays, provide a combinatorially rich substrate for neural computation whose structural determinants remain poorly understood. We simulate a recurrent network of $N=1000$ Izhikevich neurons over ten hours of biological time and identify 1545 unique PNGs via an offline event-driven detection algorithm. A parametric Watts-Strogatz topology sweep reveals that the clustering coefficient $C$ is the primary structural driver of PNG yield: the transition from a ring lattice ($C\sim 0.35$, $\sim\!850$ PNGs) to a random graph ($C\sim 0.20$, $<\!50$ PNGs) reduces representational capacity by more than 90\%. We further introduce a sparse-dot-product Recurrence Plot (RP) framework that identifies PNGs as unit-slope diagonal structures in the phase-space recurrence matrix, entirely independent of anatomical neuron labelling. Recurrence Quantification Analysis yields DET $\sim 0.65$, quantifying the reproducibility of the network's dynamical trajectory. Together, the results establish small-world topology as the structural optimum for polychronization and the RP decoder as a principled, label-free tool for PNG identification.
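The determinism measure (DET) used in Recurrence Quantification Analysis can be sketched on a toy example. The code below assumes a sequence of discrete state labels and marks two time points as recurrent when their labels match; this replaces the sparse-dot-product recurrence criterion on spike data, which is not reproduced here. DET is taken as the fraction of off-diagonal recurrence points that lie on diagonal lines of length at least `l_min`; the function names and the default `l_min=2` are assumptions.

```python
def recurrence_matrix(states):
    """R[i][j] = 1 when the states at times i and j coincide."""
    n = len(states)
    return [[int(states[i] == states[j]) for j in range(n)] for i in range(n)]

def determinism(R, l_min=2):
    """Fraction of off-diagonal recurrence points on diagonal lines of length >= l_min."""
    n = len(R)
    total = 0      # all off-diagonal recurrence points
    on_lines = 0   # those belonging to sufficiently long diagonal runs
    for offset in range(-(n - 1), n):
        if offset == 0:
            continue  # skip the main diagonal (line of identity)
        i_start = max(0, -offset)
        length = n - abs(offset)
        run = 0
        for t in range(length + 1):  # one extra step flushes the final run
            i = i_start + t
            hit = t < length and R[i][i + offset] == 1
            if hit:
                run += 1
            else:
                total += run
                if run >= l_min:
                    on_lines += run
                run = 0
    return on_lines / total if total else 0.0

periodic = recurrence_matrix([0, 1, 2, 0, 1, 2])   # repeats form unit-slope diagonals
scattered = recurrence_matrix([0, 1, 0, 2, 0])     # repeats are isolated points
print(determinism(periodic), determinism(scattered))
```

A repeating sequence yields long unit-slope diagonals and a high DET, while isolated coincidences contribute nothing to it.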
Tabular foundation models (TFMs) achieve strong performance on microbiome abundance data, yet their robustness under realistic distribution shift remains poorly characterized. We introduce a benchmark that evaluates the robustness of TFMs to biologically inspired perturbations across six gut microbiome datasets spanning four disease contexts. In this in-context learning setting, models receive unperturbed support sets as context and are evaluated on perturbed query samples. To isolate robustness beyond "shortcut" features, we preserve the most discriminative taxa and apply three controlled perturbation strategies: (i) removal of high-abundance (uninformative) taxa, (ii) sparsification via increased zero-inflation, and (iii) zero-imputation via spurious non-zero injections. Our results show that protecting discriminative features is insufficient to guarantee stability under support-query shift: across datasets, all perturbations degrade model performance, with zero-imputation consistently the most harmful, indicating that corrupting global feature structure can break generalization even when key taxa are retained. Sparsification disproportionately affects TFMs relative to a classical random forest baseline, suggesting greater sensitivity to zero-inflation-type shifts. The code is publicly available at: this https URL.
Complex systems, from gene regulatory networks to neural circuits and transportation infrastructures, exhibit rich functional behaviour that topology alone does not capture. Here we show that functional memory exhibits a universal organisational regularity: in every biological, ecological, social, and technological domain studied, real interaction strengths organise memory at greater hierarchical depth than random weight assignment on the same topology, across thirty-four networks spanning several orders of magnitude in size and density. Using a thermodynamic description of multiscale information flow, we quantify how memory is distributed across path lengths and show that functional memory organisation collapses onto four recurrent dynamical organisations, revealing an intrinsically low-dimensional structure. Comparing each network against null models that selectively perturb weighted transport geometry, mesoscale structure, and directionality reveals that these ingredients play distinct and non-equivalent roles: weight geometry systematically governs memory depth, mesoscale structure shapes memory organisation across scales, and directionality modulates the sensitivity of the cascade to structural perturbation. The same comparison provides an operational criterion for whether network weights encode genuine functional interaction structure. These results establish weighted transport geometry as a primary organiser of functional memory and show that weighted interactions carry dynamical structure that binary topology alone cannot recover.
We introduce a Dirichlet--multinomial (DM) deviance residualization for sparse, jointly overdispersed count matrices, the regime that dominates sequencing-based biochemical assays. The DM null treats each sample's count vector as a fixed-total composition with a single scalar concentration $\alpha_0$ governing overdispersion, and arises exactly by conditioning independent negative-binomial feature counts on the observed sample total -- making the DM the joint conditional analogue of standard feature-wise overdispersed count models. The resulting transform preserves exact sparsity, evaluates in constant time per nonzero entry, agrees with multinomial residuals on singleton counts, shrinks repeated-count residuals according to the overdispersion the null tolerates, and recovers the multinomial residual as $\alpha_0\to\infty$. The same fixed-dispersion comparison principle extends to ordered and tree-structured features via the generalized DM and the Dirichlet-tree multinomial, giving a single residual family that subsumes joint and feature-wise count nulls under a common compositional logic and is computationally lightweight enough to drop into existing sparse pipelines.
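The conditioning identity behind the DM null can be checked numerically. The sketch below assumes the independent negative-binomial counts share a common scale parameter `p`, the standard condition for the identity, which the abstract does not spell out; the shape parameters then play the role of the DM concentration vector. It also checks that the DM probability approaches the multinomial one as $\alpha_0$ grows. It does not implement the deviance residual itself, and all function names are hypothetical.

```python
from math import lgamma, log

def log_nb_pmf(x, r, p):
    """log of Gamma(x+r) / (Gamma(r) x!) * (1-p)^r * p^x."""
    return lgamma(x + r) - lgamma(r) - lgamma(x + 1) + r * log(1 - p) + x * log(p)

def log_dm_pmf(counts, alpha):
    """log Dirichlet-multinomial probability of a count vector with fixed total."""
    n, a0 = sum(counts), sum(alpha)
    out = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for x, a in zip(counts, alpha):
        out += lgamma(x + a) - lgamma(a) - lgamma(x + 1)
    return out

def log_conditional_nb(counts, shapes, p):
    """log P(counts | total) for independent NB counts with common p.

    Uses the fact that the total is itself NB with shape sum(shapes) and the same p.
    """
    joint = sum(log_nb_pmf(x, r, p) for x, r in zip(counts, shapes))
    return joint - log_nb_pmf(sum(counts), sum(shapes), p)

def log_multinomial_pmf(counts, probs):
    """log multinomial probability of a count vector."""
    out = lgamma(sum(counts) + 1)
    for x, q in zip(counts, probs):
        out += x * log(q) - lgamma(x + 1)
    return out

counts = [3, 0, 1, 2]
shapes = [0.5, 1.2, 0.3, 2.0]

# The conditional law does not depend on p and matches the DM with alpha = shapes.
for p in (0.2, 0.7):
    print(p, log_conditional_nb(counts, shapes, p), log_dm_pmf(counts, shapes))

# Large alpha_0 limit: the DM approaches the multinomial with the same proportions.
probs = [0.4, 0.1, 0.2, 0.3]
big_alpha = [1e6 * q for q in probs]
print(log_dm_pmf(counts, big_alpha), log_multinomial_pmf(counts, probs))
```

The scale parameter `p` cancels in the conditional law, so only the total concentration `sum(shapes)`, the scalar $\alpha_0$, controls overdispersion.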
This study aims to identify typical collective phenomena that emerge in excitatory and inhibitory (E-I) spiking neural networks as reported in recent computational studies. The methodology follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) procedure and comprises three primary stages: an initial literature search in the SCOPUS database, screening against specific inclusion and exclusion criteria, and a review of the selected literature. Out of 491 documents published from 2014 to 2024, six research papers were selected for the review stage. Four generic collective regimes are identified: synchrony, irregular behavior, stationary state, and oscillatory patterns. Our review suggests that the collective dynamics of E-I spiking neurons stem from the interplay of intrinsic neuronal characteristics, network topology, and external stimuli. Additionally, the prevalent use of the Quadratic Integrate-and-Fire (QIF) neuron model in the literature highlights its significance as a robust candidate for exploring collective behaviors in large-scale neuronal networks. The findings outlined in this paper may be useful to readers who are interested in computational modelling of spiking neurons but have no prior familiarity with the field.
In studying primate vision, a large body of work focuses on the first feedforward sweep. During this initial time window, information is thought to pass through ventral stream regions in a stage-like fashion to extract high-level information from the retinal input. Consequently, electrophysiological analyses commonly focus on spatial response patterns, either by averaging data in time or by applying decoders in a temporally local fashion. By analysing data recorded simultaneously across multiple arrays placed along the macaque ventral stream, we show here that this approach may miss key aspects of information encoding. First, time-resolved, multivariate analyses of information transfer between V4 and IT reveal that temporally and semantically varied information is exchanged within the first 100 ms of processing. Second, by employing recurrent neural network (RNN) decoding techniques that extend across the temporal domain, we demonstrate that the neural pattern dynamics themselves carry categorical information far beyond the spatially encoded information available at any given time point. These findings challenge the prevailing view of a single, stage-like feedforward process and suggest that even the earliest parts of visual processing are better characterised as a spatiotemporally evolving process that encodes information in its dynamics rather than in purely spatial response patterns.
Understanding how structural connections are associated with tau propagation in Alzheimer's disease (AD) remains a central open question, yet existing computational models either rely heavily on biophysical assumptions or lack neurobiologically interpretable pathway maps. We present SC-TauPath, a structural connectivity (SC) attribution framework that maps tau propagation pathways from in vivo neuroimaging data. SC-TauPath combines a Network Diffusion Model (NDM)-augmented multilayer perceptron with gradient $\times$ input attribution to score each SC edge's contribution to tau prediction, then translates these attribution scores into multi-scale pathway maps (backbone edges, high-traffic routes, and hub ROIs). Applied to 234 ADNI participants with paired DTI SC and 18F-Flortaucipir PET, SC-TauPath achieves strong cross-validated tau prediction and yields attribution-based pathway maps consistent with established Braak staging anatomy, demonstrating that SC encodes spatially specific information about regional tau distribution in AD.
Continuous Attractor Neural Networks (CANNs) traditionally rely on pre-wired recurrent connectivity to model spatial representations, path integration, and anticipatory dynamics. However, the biological mechanisms through which this structured connectivity emerges via learning remain relatively unexplored. This work presents a theoretical framework revealing how continuous attractor connectivity and its computational properties self-organize through Hebbian plasticity, firing-rate adaptation, and global inhibition. We show that translationally invariant inputs naturally drive the emergence of stable, Gaussian-profiled feedforward weights. Crucially, anticipatory dynamics arise spontaneously within these feedforward architectures, shifting the activity bump forward without requiring recurrent excitatory collaterals. This predictive shift can be linearly amplified across multilayer networks, consistent with anticipatory activity observed in the superficial layers of the entorhinal cortex. Furthermore, introducing recurrent interactions allows the network to learn connections capable of self-sustaining a moving bump of activity. Finally, by modulating the network with an external, time-varying baseline current that encodes speed, the system adjusts its intrinsic velocity to function as a precise unidirectional path integrator. Ultimately, this study suggests that prospective coding and path integration are not manually engineered features, but rather naturally co-emergent properties of a single self-organizing competitive network.
Deep learning-based antimicrobial peptide (AMP) discovery faces critical challenges such as limited controllability, lack of representations that efficiently model antimicrobial properties, and low experimental hit rates. To address these challenges, we introduce OmegAMP, a framework designed for reliable AMP generation with increased controllability. Its diffusion-based generative model leverages a novel conditioning mechanism to achieve fine-grained control over desired physicochemical properties and to direct generation towards specific activity profiles, including species-specific effectiveness. This is further enhanced by a biologically informed encoding space that significantly improves overall generative performance. Complementing these generative capabilities, OmegAMP leverages a novel synthetic data augmentation strategy to train classifiers for AMP filtering, drastically reducing false positive rates and thereby increasing the likelihood of experimental success. Our in silico experiments demonstrate that OmegAMP delivers state-of-the-art performance across key stages of the AMP discovery pipeline, enabling us to achieve an unprecedented success rate in wet lab experiments. Of the 25 candidate peptides we tested, 24 (96%) demonstrated antimicrobial activity, proving effective even against multi-drug-resistant strains. Our findings underscore OmegAMP's potential to significantly advance computational frameworks in the fight against antimicrobial resistance.
The scalability of pool-based active learning is limited by the computational cost of evaluating large unlabeled datasets, a challenge that is particularly acute in virtual screening for drug discovery. While active learning strategies such as Bayesian Active Learning by Disagreement (BALD) prioritize informative samples, they remain computationally intensive when scaled to libraries containing billions of samples. In this work, we introduce BALD-GFlowNet, a generative active learning framework that circumvents this issue. Our method leverages Generative Flow Networks (GFlowNets) to directly sample objects in proportion to the BALD reward. By replacing traditional pool-based acquisition with generative sampling, BALD-GFlowNet achieves scalability that is independent of the size of the unlabeled pool. In our virtual screening experiment, we show that BALD-GFlowNet achieves performance comparable to that of a standard BALD baseline while generating more structurally diverse molecules, offering a promising direction for efficient and scalable molecular discovery.
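The BALD reward can be written as the entropy of the mean prediction minus the mean entropy of the individual predictions, taken over posterior samples of the model. The sketch below computes it for a binary classifier and then normalises the scores over a small, explicitly enumerated candidate set to show the target distribution that reward-proportional sampling aims for. The enumeration is for illustration only, since the point of a GFlowNet is to sample from this distribution without scoring every candidate. The candidate names and prediction values are made up.

```python
from math import log

def entropy(p):
    """Binary entropy in nats; terms with zero probability contribute nothing."""
    return -sum(q * log(q) for q in (p, 1 - p) if q > 0)

def bald(probs):
    """BALD score from per-posterior-sample predictions P(y=1 | x).

    Entropy of the averaged prediction minus the average per-sample entropy.
    """
    mean_p = sum(probs) / len(probs)
    return entropy(mean_p) - sum(entropy(p) for p in probs) / len(probs)

# Hypothetical candidates with predictions from four posterior samples each.
candidates = {
    "mol_A": [0.5, 0.5, 0.5, 0.5],      # uncertain, but the samples agree
    "mol_B": [0.05, 0.95, 0.1, 0.9],    # confident samples that disagree
    "mol_C": [0.9, 0.92, 0.88, 0.91],   # confident and in agreement
}

scores = {name: bald(p) for name, p in candidates.items()}
total = sum(scores.values())
target = {name: s / total for name, s in scores.items()}  # reward-proportional target

for name in candidates:
    print(name, round(scores[name], 4), round(target[name], 4))
```

The score is high only when posterior samples disagree, so a candidate on which every sample predicts 0.5 receives zero reward despite its high predictive entropy.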
Wilson-Cowan and Amari-type models capture nonlinear neural population dynamics, providing a fundamental framework for modeling how sensory and other exogenous inputs shape activity in neural tissue. We study the controllability properties of Amari-type neural fields subject to piecewise-constant or constant-in-time inputs. The model describes the time evolution of the polarization of neural tissue within a spatial continuum, with synaptic interactions represented by a convolution kernel. We study the synthesis of such inputs to achieve two-point boundary-type control objectives, namely, steering neural activity from an initial state to a prescribed target state. This approach is particularly relevant for predicting the emergence of paradoxical neural representations, such as discordant visual illusions that occur in response to overt sensory stimuli. We first present a control synthesis based on the Banach fixed-point theorem, which yields an iterative construction of a constant-in-time input under minimal regularity assumptions on the kernel and transfer function; however, it exhibits practical limitations, even in the linear case. To overcome these limitations, we then develop a generic synthesis framework based on the flow of the neural dynamics drift, enabling explicit piecewise-constant and constant-in-time inputs. Extensive numerical results in one and two spatial dimensions confirm the effectiveness of the proposed syntheses and demonstrate their superior performance compared to inputs derived from naive linearization at the initial or target states when these states are not equilibria of the drift dynamics. By providing a mathematically rigorous framework for controlling Amari-type neural fields, this work advances our understanding of nonlinear neural population control with potential applications in computational neuroscience, psychophysics, and neurostimulation.
Continual learning systems face a fundamental geometric obstacle: as experience accumulates on a fixed-capacity manifold, covering numbers grow linearly with time, eventually forcing representational overlap and catastrophic interference. Prevailing approaches attack this problem by \emph{expansion} - projecting into higher-dimensional spaces via kernels, overparameterization, or replay. We argue the solution is the opposite: \emph{contraction}. We formalize abstraction as the \textbf{Urysohn Ladder}, a hierarchy of quotient maps that recursively collapse validated metric neighborhoods into compact tokens, converting unbounded ambient-space search into bounded navigation on a low-dimensional intrinsic scaffold. Geometrically, each collapsed token acts as a shortcut - a region of extreme metric contraction that bridges distant experiences, much like a wormhole in the representational manifold. We establish four results that collectively guarantee \emph{separability} (metric contraction renders nonlinearly entangled structure linearly separable at each quotient level, and this separability propagates faithfully through the entire hierarchy), \emph{bounded capacity} (covering numbers remain $O(1)$ per quotient level, independent of stream length), \emph{stability} (parity-partitioned flow/scaffold subspaces enable unbounded plasticity without catastrophic interference), and \emph{scalability} (inference cost scales with quotient distance, not ambient distance). We validate each claim empirically with pretrained models and real-world datasets. Moreover, we demonstrate the potential of Urysohn Ladder for scalable continual learning via scaffold amortization.