New articles on Quantitative Biology


[1] 2106.09444

Unsupervised classification of cell imaging data using the quantization error in a Self Organizing Map

This study exploits previously demonstrated properties of the quantization error in the output of a Self Organizing Map (SOM QE), such as its sensitivity to the spatial extent and the intensity of local image contrast. Here, the SOM QE is applied to cell viability data based on double color staining in 96 image simulations. The results show that the SOM QE consistently, and in only a few seconds, detects fine regular spatial increases in the relative amounts of RED or GREEN pixel staining across the test images, reflecting small, systematic increases or decreases in the percentage of theoretical cell viability below the critical threshold. Such small changes may carry clinical significance but are almost impossible to detect by human vision. Moreover, we demonstrate a clear sensitivity of the SOM QE to differences in the relative physical luminance (Y) of the colors, which here translates into a RED GREEN color selectivity. Across differences in relative luminance, the SOM QE exhibits consistently greater sensitivity to the smallest spatial increases in RED image pixels than to the smallest increases of identical spatial extent in GREEN image pixels. Further selective color contrast studies on simulations of biological imaging data will allow the generation of increasingly large benchmark datasets and, ultimately, unravel the full potential of fast, economic, and unprecedentedly precise biological data analysis using the SOM QE.
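
As a rough illustration of the quantity discussed in this abstract, the quantization error of a SOM is commonly defined as the mean distance between each input vector and the weight vector of its best-matching unit. The sketch below assumes that definition and an already trained map; the function name `quantization_error` and the array shapes are illustrative choices, not taken from the paper.

```python
import numpy as np

def quantization_error(data, weights):
    """Mean Euclidean distance from each input to its best-matching unit.

    data:    array of shape (n_samples, n_features), e.g. RGB pixel values
    weights: array of shape (n_units, n_features), weights of a trained SOM
             (training is assumed to have happened elsewhere)
    """
    # Pairwise distances between every input and every map unit.
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    # For each input keep the distance to its closest unit, then average.
    return dists.min(axis=1).mean()

# Hypothetical toy input: four RGB pixels and a two-unit map.
pixels = np.array([[1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.8, 0.0]])
units = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
qe = quantization_error(pixels, units)
```

Under this definition, adding pixels that lie far from every unit raises the error, which is one way to read the reported sensitivity to small spatial changes in staining.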


[2] 2106.09594

A factor graph EM algorithm for inference of kinetic microstates from patch clamp measurements

We derive a factor graph EM (FGEM) algorithm, a technique that permits combined parameter estimation and statistical inference, to determine hidden kinetic microstates from patch clamp measurements. Using the cystic fibrosis transmembrane conductance regulator (CFTR) and nicotinic acetylcholine receptor (nAChR) as examples, we perform Monte Carlo simulations to demonstrate the performance of the algorithm. We show that the performance, measured in terms of the probability of estimation error, approaches the theoretical performance limit of maximum a posteriori estimation. Moreover, the algorithm provides a reliability score for its estimates, and we demonstrate that the score can be used to further improve the performance of estimation. We use the algorithm to estimate hidden kinetic states in lab-obtained CFTR single channel patch clamp traces.


[3] 2106.09639

Implementing Permutations in the Brain and SVO Frequencies of Languages

The subject-verb-object (SVO) word order prevalent in English is shared by about 42% of world languages. Another 45% of all languages follow the SOV order, 9% the VSO order, and fewer languages use the three remaining permutations. None of the many extant explanations of this phenomenon take into account the difficulty of implementing these permutations in the brain. We propose a plausible model of sentence generation inspired by the recently proposed Assembly Calculus framework of brain function. Our model results in a natural explanation of the uneven frequencies. Estimating the parameters of this model yields predictions of the relative difficulty of dis-inhibiting one brain area from another. Our model is based on the standard syntax tree, a simple binary tree with three leaves. Each leaf corresponds to one of the three parts of a basic sentence. The leaves can be activated through lock and unlock operations and the sequence of activation of the leaves implements a specific word order. More generally, we also formulate and algorithmically solve the problems of implementing a permutation of the leaves of any binary tree, and of selecting the permutation that is easiest to implement on a given binary tree.


[4] 2106.09389

Brain, Rain and Forest Fires -- What is critical about criticality: In praise of the correlation function

We present a brief review of power laws and correlation functions as measures of criticality and the relation between them. By comparing phenomenology from rain, brain and the forest fire model we discuss the relevant features of self-organisation to the vicinity of a critical state. We conclude that organisation to a region of extended correlations and approximate power laws may be a behaviour of interest shared by the three systems considered.


[5] 2106.09408

Predicting cognitive scores with graph neural networks through sample selection learning

Analyzing the relation between intelligence and neural activity is of the utmost importance in understanding the working principles of the human brain in health and disease. In existing literature, functional brain connectomes have been used successfully to predict cognitive measures such as intelligence quotient (IQ) scores in both healthy and disordered cohorts using machine learning models. However, existing methods resort to flattening the brain connectome (i.e., graph) through vectorization, which overlooks its topological properties. To address this limitation, and inspired by the emerging graph neural networks (GNNs), we design a novel regression GNN model (namely RegGNN) for predicting IQ scores from brain connectivity. On top of that, we introduce a novel, fully modular sample selection method to select the best samples to learn from for our target prediction task. However, since such deep learning architectures are computationally expensive to train, we further propose a learning-based sample selection method that learns how to choose the training samples with the highest expected predictive power on unseen samples. For this, we capitalize on the fact that connectomes (i.e., their adjacency matrices) lie in the symmetric positive definite (SPD) matrix cone. Our results on full-scale and verbal IQ prediction outperform comparison methods in autism spectrum disorder cohorts and achieve a competitive performance for neurotypical subjects using 3-fold cross-validation. Furthermore, we show that our sample selection approach generalizes to other learning-based methods, which shows its usefulness beyond our GNN architecture.
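
To make the "flattening" step criticized in this abstract concrete, here is a minimal sketch of one common way to vectorize a symmetric connectome: keep the upper-triangular entries as a flat feature vector. The function name `vectorize_connectome` and the choice to drop the diagonal are assumptions for illustration; the paper's baselines may differ.

```python
import numpy as np

def vectorize_connectome(adj):
    """Flatten a symmetric adjacency matrix into its upper-triangular entries.

    The diagonal is excluded (k=1), an illustrative choice. The result no
    longer records which regions each weight connects, which is the loss
    of topological information the abstract refers to.
    """
    rows, cols = np.triu_indices(adj.shape[0], k=1)
    return adj[rows, cols]

# Hypothetical 3-region connectome.
adj = np.array([[0.0, 0.2, 0.5],
                [0.2, 0.0, 0.7],
                [0.5, 0.7, 0.0]])
features = vectorize_connectome(adj)  # one value per region pair
```

A graph-based model would instead consume `adj` together with node features, so the neighborhood structure stays available to the learner.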


[6] 2106.09553

Do Large Scale Molecular Language Representations Capture Important Structural Information?

Predicting chemical properties from the structure of a molecule is of great importance in many applications, including drug discovery and material design. Machine learning based molecular property prediction holds the promise of enabling accurate predictions at much lower complexity than, for example, Density Functional Theory (DFT) calculations. Features extracted from molecular graphs, using graph neural nets in a supervised manner, have emerged as strong baselines for such tasks. However, the vast chemical space together with the limited availability of labels makes supervised learning challenging, calling for learning a general-purpose molecular representation. Recently, transformer-based language models pre-trained on large unlabeled corpora (PTLMs) have produced state-of-the-art results in many downstream natural language processing tasks. Inspired by this development, here we present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer. The model employs a linear attention mechanism and highly parallelized training on 1D SMILES sequences of 1.1 billion unlabeled molecules from the PubChem and ZINC datasets. Experiments show that the learned molecular representation performs competitively with existing graph-based and fingerprint-based supervised learning baselines on the challenging tasks of predicting properties of QM8 and QM9 molecules. Further task-specific fine-tuning of the MoLFormer representation improves performance on several of those property prediction benchmarks. These results provide encouraging evidence that large-scale molecular language models can capture sufficient structural information to be able to accurately predict quantum chemical properties and beyond.
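
The abstract does not spell out which linear attention variant is used. As a generic illustration only, the sketch below shows kernel-based (non-causal) linear attention with the feature map elu(x) + 1, a widely used formulation; it is an assumption here and not a description of MoLFormer's actual mechanism. The names `feature_map` and `linear_attention` are hypothetical.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so it is positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Non-causal kernel linear attention.

    q, k: arrays of shape (n, d); v: array of shape (n, d_v).
    Cost grows linearly with sequence length n because the (n, n)
    attention matrix is never formed.
    """
    qf, kf = feature_map(q), feature_map(k)
    kv = kf.T @ v                 # (d, d_v) summary of keys and values
    norm = qf @ kf.sum(axis=0)    # (n,) per-query normalizer
    return (qf @ kv) / norm[:, None]

# Hypothetical toy input: a 5-token sequence with 3-dimensional heads.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 5, 3))
out = linear_attention(q, k, v)   # shape (5, 3)
```

The result matches the quadratic form that builds the full (n, n) matrix of `feature_map(q) @ feature_map(k).T` and normalizes each row, but regrouping the products avoids that matrix, which is what makes long SMILES sequences cheaper to process.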