Biochemical reactions inside living cells often occur in the presence of crowders: molecules that do not participate in the reactions but influence the reaction rates through excluded-volume effects. However, the standard approach to modelling stochastic intracellular reaction kinetics is based on the chemical master equation (CME), whose propensities are derived assuming no crowding effects. Here, we propose a machine learning strategy based on Bayesian optimisation that uses synthetic data from spatial cellular automaton (CA) simulations, which explicitly model volume-exclusion effects, to learn effective propensity functions for CMEs. Using Gaussian process regression, predictions learned from a small CA training set can then be extended across the whole parameter range describing physiologically relevant levels of crowding. We demonstrate the method on an enzyme-catalysed reaction and a genetic feedback loop, showing good agreement between the time-dependent distributions of molecule numbers predicted by the effective CME and by CA simulations.
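
To make the interpolation step concrete, here is a minimal sketch of Gaussian process regression with a squared-exponential kernel, assuming a single scalar crowding parameter as input and a single learned propensity parameter as output. The function names (`rbf`, `gp_posterior`), the kernel choice and the hyperparameter values are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def rbf(a, b, length_scale=0.1, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, length_scale=0.1, variance=1.0, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP (hypothetical helper).

    x_train: crowding levels at which a propensity parameter was fitted
    y_train: the fitted parameter values (assumed, for illustration)
    """
    K = rbf(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test, length_scale, variance)
    K_ss = rbf(x_test, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var
```

The posterior variance grows away from the training points, which is one way to judge where additional (expensive) CA runs would be most informative.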

Can a micron-sized sac of interacting molecules understand and adapt to a constantly fluctuating environment? Cellular life provides an existence proof in the affirmative, but the principles that allow for life's existence are far from being proven. One challenge in engineering and understanding biochemical computation is the intrinsic noise due to chemical fluctuations. In this paper, we draw insights from machine learning theory, chemical reaction network theory, and statistical physics to show that the broad and biologically relevant class of detailed balanced chemical reaction networks is capable of representing and conditioning complex distributions. These results illustrate how a biochemical computer can use intrinsic chemical noise to perform complex computations. Furthermore, we use our explicit physical model to derive the thermodynamic costs of inference.
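
As a toy illustration of a detailed balanced network encoding a distribution, the sketch below solves the CME for the reversible isomerisation A ⇌ B with N conserved molecules (A→B at rate k1 per molecule, B→A at rate k2 per molecule). Its stationary distribution of A counts is binomial with success probability k2/(k1 + k2), and each forward flux balances its reverse flux. This is a standard textbook example, not the paper's construction; the function name is hypothetical.

```python
import numpy as np
from math import comb


def stationary_distribution(N, k1, k2):
    """Stationary CME distribution of n_A for A <-> B with N molecules in total."""
    Q = np.zeros((N + 1, N + 1))  # generator matrix over states n_A = 0..N
    for n in range(N + 1):
        if n > 0:
            Q[n, n - 1] = k1 * n        # one A converts to B
        if n < N:
            Q[n, n + 1] = k2 * (N - n)  # one B converts to A
        Q[n, n] = -Q[n].sum()
    # Solve pi Q = 0 together with the normalisation sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(N + 1)])
    b = np.zeros(N + 2)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi
```

Changing the rate ratio k2/k1 reshapes the stationary distribution, which is the sense in which rates "represent" a distribution in this toy case.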

We present SODA, a lightweight and open-source visualization library for biological sequence annotations that enables straightforward development of flexible, dynamic, and interactive web graphics. SODA is implemented in TypeScript and can be used as a library in both TypeScript and JavaScript projects.

Detecting extremely rare variant alleles, such as those from tumour DNA, within a complex mixture of DNA molecules is difficult. Barcoding DNA template molecules early in next-generation sequencing library construction makes it possible to identify and bioinformatically remove polymerase errors. During a PCR-based barcoding procedure of $t$ consecutive PCR cycles, DNA molecules are barcoded with random nucleotide sequences. Previously, $t = 2$ and $t = 3$ have been used; however, even larger values of $t$ might be relevant. This paper proposes using a multi-type branching process with immigration as a model of the random outcome of an imperfect PCR-barcoding procedure, with $t$ treated as the time parameter. For this model, we focus on the expected numbers of clusters of molecules sharing the same unique molecular identifier.
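
The branching-process view of PCR can be illustrated with a deliberately simplified, single-type Galton-Watson sketch: in each cycle every molecule is copied independently with an assumed efficiency p, so the expected copy number after $t$ cycles is $n_0(1+p)^t$. This ignores immigration, barcode types and cluster structure, so it is only the simplest building block of the paper's model; the function name and parameters are hypothetical.

```python
import numpy as np


def simulate_pcr(n0, p, cycles, runs, seed=0):
    """Copy numbers after `cycles` PCR cycles, one entry per independent run.

    Each molecule is duplicated with probability p per cycle (assumed efficiency).
    """
    rng = np.random.default_rng(seed)
    z = np.full(runs, n0, dtype=np.int64)
    for _ in range(cycles):
        z = z + rng.binomial(z, p)  # successful copies are added to the pool
    return z
```

For example, with `n0=1`, `p=0.8` and three cycles the sample mean should be close to $1.8^3 \approx 5.83$, and every run ends with between 1 and 8 molecules.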

In this work, we derive new analytic expressions for the fixation time in the Wright-Fisher model with selection. The three standard cases are considered: fixation at zero, at one, or at either boundary. Second-order differential equations for the fixation time are obtained by a simplified approach that uses only the law of total probability and Taylor expansions. The solutions are given by a combination of exponential integral functions and elementary functions. We then state approximate formulas, involving only elementary functions, that are valid for small selection effects. The quality of our results is assessed through an extensive simulation study. We show that our results approximate the discrete problem very accurately, even for small population sizes (a few hundred) and large selection coefficients.
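
A simulation study of this kind needs a Monte Carlo reference. The sketch below simulates a haploid Wright-Fisher population of size N with selection coefficient s, using the assumed selection update $p' = p(1+s)/(1+sp)$ followed by binomial resampling, and records absorption times together with the absorbing state, so that times can be conditioned on fixation at zero or at one. It is not the authors' code. In the neutral case, the diffusion approximation gives a mean unconditional absorption time of $-2N[p\ln p + (1-p)\ln(1-p)]$ generations, a useful sanity check.

```python
import numpy as np


def wright_fisher_absorption_times(N, s, p0, runs=2000, seed=0):
    """Absorption times and final counts for a haploid Wright-Fisher model."""
    rng = np.random.default_rng(seed)
    counts = np.full(runs, int(round(p0 * N)), dtype=np.int64)
    times = np.zeros(runs, dtype=np.int64)
    active = (counts > 0) & (counts < N)
    t = 0
    while active.any():
        t += 1
        p = counts[active] / N
        p_sel = p * (1 + s) / (1 + s * p)        # selection before sampling
        counts[active] = rng.binomial(N, p_sel)  # genetic drift
        times[active] = t                        # last generation while segregating
        active = (counts > 0) & (counts < N)
    return times, counts  # counts are 0 (loss) or N (fixation)
```

Conditional fixation times are then `times[counts == N]` and `times[counts == 0]`.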

The human gut microbiome is associated with a large number of disease etiologies. As such, it is a natural candidate for machine-learning-based biomarker development for multiple diseases and conditions. The microbiome is often analyzed using 16S rRNA gene sequencing. However, several properties of 16S rRNA gene sequencing data hinder machine learning, including non-uniform representation, a small number of samples relative to the dimension of each sample, and sparsity, with most bacteria present in only a small subset of samples. We propose two novel methods that use bacterial taxonomy to combine information from different bacteria and improve data representation for machine learning. iMic and gMic translate the microbiome into images and graphs, respectively, and convolutional neural networks are then applied to the image or graph. We show that both algorithms improve the performance of static 16S rRNA gene sequence-based machine learning compared with the best state-of-the-art methods. Furthermore, these methods ease the interpretation of the classifiers. iMic is then extended to dynamic microbiome samples, and an iMic explainable AI algorithm is proposed to detect bacteria relevant to each condition.
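
One way to picture a taxonomy-to-image translation is sketched below: leaves are sorted by their taxonomic path so that related bacteria occupy neighbouring columns, and row k of the image holds the mean abundance of each leaf's taxonomic group at rank k. This is a hypothetical simplification for illustration only; the aggregation rule, ordering and the function `taxonomy_image` are assumptions and not iMic's published algorithm.

```python
import numpy as np
from collections import defaultdict


def taxonomy_image(taxa, abundances):
    """Map leaf abundances to a (ranks x leaves) image (hypothetical sketch).

    taxa: list of equal-length tuples of rank names, e.g. ("Bacteria", "Firmicutes", "Clostridia")
    abundances: one abundance value per leaf
    """
    order = sorted(range(len(taxa)), key=lambda i: taxa[i])  # taxonomic neighbours become adjacent columns
    depth = len(taxa[0])
    img = np.zeros((depth, len(taxa)))
    for level in range(depth):
        groups = defaultdict(list)
        for i in order:
            groups[taxa[i][: level + 1]].append(abundances[i])
        for col, i in enumerate(order):
            img[level, col] = np.mean(groups[taxa[i][: level + 1]])  # group mean at this rank
    return img
```

The resulting 2-D array can be fed to a standard convolutional network, whose local filters then see taxonomically related bacteria together.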

When choosing between options, we must solve an important binding problem. The values of the options must be associated with other information, including the action needed to select them. We hypothesized that the brain solves this binding problem through the use of distinct population subspaces. We examined responses of single neurons in five value-sensitive regions in rhesus macaques performing a risky choice task. In all areas, neurons encoded the values of both possible options, but used semi-orthogonal coding subspaces for the left and right options, which served to link each option to its position in space. We also observed a covariation between subspace orthogonalization and behavior: trials with less orthogonalized subspaces were associated with a greater likelihood of choosing the less valuable option. These semi-orthogonal subspaces arose from a combination of linear and non-linear mixed-selective neurons. By decomposing the neural geometry, we show that this combination of selectivity achieves a code that balances binding/separation and generalization. These results support the hypothesis that binding operations serve to convert high-dimensional codes into multiple low-dimensional neural subspaces to flexibly solve decision problems.
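
The degree of orthogonality between left and right value codes can be illustrated with a simple regression-based measure: fit each neuron's firing rate to both option values, collect the per-neuron weights into one coding vector per option, and take the cosine between the two vectors (near 0 means orthogonal, near 1 means shared). This one-dimensional measure is a common simplification and an assumption here; the paper's subspace analysis may use different methods, and the function names are hypothetical.

```python
import numpy as np


def coding_vectors(rates, v_left, v_right):
    """Per-neuron regression weights for left and right option values.

    rates: array of shape (trials, neurons); v_left, v_right: arrays of shape (trials,)
    """
    X = np.column_stack([np.ones_like(v_left), v_left, v_right])
    beta, *_ = np.linalg.lstsq(X, rates, rcond=None)  # shape (3, neurons)
    return beta[1], beta[2]


def subspace_cosine(b_left, b_right):
    """Cosine between two population coding vectors."""
    return float(b_left @ b_right / (np.linalg.norm(b_left) * np.linalg.norm(b_right)))
```

On synthetic data built from orthogonal left/right weights the cosine is close to 0, and with identical weights it is close to 1, which brackets the "semi-orthogonal" regime described above.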

Recent work in mathematical neuroscience has calculated the directed graph homology of the directed simplicial complex given by the brain's sparse adjacency graph, the so-called connectome. These biological connectomes show an abundance of both high-dimensional directed simplices and Betti numbers in all viable dimensions, in contrast to Erd\H{o}s-R\'enyi graphs of comparable size and density. An analysis of synthetically trained connectomes reveals similar findings, raising questions about the graphs' comparability and the origin of the simplices. We present a new method that gives insight into the emergence of simplices and thus into simplicial abundance. Our approach makes it easy to distinguish simplex-rich connectomes of different origin. The method relies on the novel concept of an almost-d-simplex, that is, a simplex missing exactly one edge, and consequently on the almost-d-simplex closing probability by dimension. We also describe a fast algorithm to identify almost-d-simplices in a given graph. Applying this method to biological and artificial data allows us to identify a mechanism responsible for simplex emergence, and suggests that this mechanism is responsible for the simplex signature of the excitatory subnetwork of a statistical reconstruction of the mouse primary visual cortex. Our highly optimised code for this new method is publicly available.
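
The definition of an almost-d-simplex can be made concrete with a brute-force counter for small graphs. It assumes a directed graph without self-loops or reciprocal edges, treats a d-simplex as a set of d+1 vertices in which every pair is joined by exactly one edge and the induced subgraph is acyclic, and treats an almost-d-simplex as the same with exactly one unconnected pair. This is an exponential-time illustration of the concept under those assumptions, not the fast algorithm described above, and `count_near_simplices` is a hypothetical name.

```python
from itertools import combinations


def is_acyclic(nodes, edges):
    """Kahn's algorithm on a small induced subgraph."""
    indeg = {v: 0 for v in nodes}
    for _, w in edges:
        indeg[w] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for u, w in edges:
            if u == v:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
    return seen == len(nodes)


def count_near_simplices(edges, d, missing):
    """Count (d+1)-vertex sets with exactly `missing` unconnected pairs (0: simplex, 1: almost-simplex).

    Assumes d >= 2 and no reciprocal edges or self-loops.
    """
    edge_set = set(edges)
    nodes = sorted({v for e in edges for v in e})
    count = 0
    for subset in combinations(nodes, d + 1):
        pairs = list(combinations(subset, 2))
        connected = [p for p in pairs if p in edge_set or p[::-1] in edge_set]
        induced = [(u, w) for u in subset for w in subset if (u, w) in edge_set]
        if len(pairs) - len(connected) == missing and is_acyclic(subset, induced):
            count += 1
    return count
```

Comparing counts with `missing=1` against those with `missing=0` dimension by dimension gives a crude, brute-force analogue of how often almost-simplices are found closed.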