Collective is different: Information exchange and speed-accuracy trade-offs
in self-organized patterning
October 01, 2025
During development, highly ordered structures emerge as cells collectively coordinate with each other. While recent advances have clarified how individual cells process and respond to external signals, understanding collective cellular decision making remains a major challenge. Here, we introduce a minimal, analytically tractable model of cell patterning via local cell-cell communication. Using this framework, we identify a trade-off between the speed and accuracy of collective pattern formation and, by adapting techniques from stochastic chemical kinetics, quantify how information flows between cells during patterning. Our analysis reveals counterintuitive features of collective patterning: globally optimized solutions do not necessarily maximize intercellular information transfer, and individual cells may appear suboptimal in isolation. Moreover, the model predicts that the instantaneous information shared between cells can be non-monotonic in time as patterning occurs. An analysis of recent experimental data from lateral inhibition in the Drosophila pupal abdomen finds a qualitatively similar effect.
Creating a functional organism requires complex multicellular coordination [1], [2]. To this end, eukaryotic cells have evolved a multitude of mechanisms to sense external cues [1]–[3] and respond to them by changing their internal state [4]–[6], migrating [7], or signaling [2], [8] to other cells. The underlying principles of multicellular self-organization remain elusive, despite numerous examples of developmental self-organization observed across biology [9]–[11]. Does development optimize for particular objectives, such as robustness or speed? If so, what constraints shape the outcomes? In the case of a single cell processing an exogenous signal, theoretical progress has been made to determine how cells could optimally process a static [4] or dynamic [12], [13] signal, and how cells can sense their environment and optimally act upon it [14], [15]. Many developmental contexts fit this paradigm: one population emits a signal and another responds without providing feedback to the sender. For example, in C. elegans vulval development an anchor cell secretes an EGF-like ligand which is received by vulval precursor cells but no reciprocal signal is sent [16], [17]. Similarly, during ascidian development, vegetal cells secrete an FGF-like ligand, which induces a neural fate in specific animal cells and no reciprocal signal is sent [18], [19].
Frequently, however, cells both send and receive signals, leading to complex non-linear feedback [1], [2], [20]. Indeed, although the receiving cells in the previous examples do not feed back on the signal sender, they coordinate among themselves, through Notch-Delta signaling in the C. elegans vulva [16] and EphrinA signaling in ascidian neural induction [21]. A single cell making noisy measurements of its external environment and acting on them can be modeled as a partially observed Markov decision process, for which optimal control is governed by the Bellman equation [15], [22]. In contrast, analyzing interacting cells engaged in a collective task, such as patterning, represents a decentralized partially observed Markov decision process, where computing an optimal strategy is generically NP-hard, though finding good solutions is becoming increasingly practical [23], [24]. Difficulties arising from decentralization are well known in computer science; for instance, the two generals’ problem demonstrates that perfect coordination between decentralized actors is impossible when they communicate over noisy channels [25], [26]. Echoing “more is different” [27], [28], while multicellular systems are indeed made up of individual cells, the emergent principles that govern the self-organizing collective are not simple extensions of single-cell behavior. Development provides a proof-by-example that decentralized systems are capable of robust self-organization. Yet, despite numerous models demonstrating collective self-organization [10], [29], [30], the core principles, such as optimality and information flow, remain largely unexplored.
To explore these fundamental questions in an analytically and numerically tractable framework, we focus on a minimal model of self-organization through local cell-cell communication, Fig. 1. In our model, motivated by lateral inhibition [8], [10], [31], cells communicate imperfectly with their neighbors and are tasked with forming a pattern in which exactly one cell adopts an “inhibitor”-like state while its neighbors adopt an “inhibited”-like state. With this simple patterning task we find that the strategies obey a speed-accuracy trade-off: arbitrarily high patterning accuracy is possible, but at the cost of increasing the time required to pattern. By reframing the problem as a stochastic reaction network [32], we can precisely compute information theoretic quantities, such as the mutual information between cells, allowing us to track how information flows in our system. We contrast this dynamic information between trajectories with simpler instantaneous information quantities that can be computed from data, finding that instantaneous quantities can sometimes decrease as patterning occurs. Next, we show that after optimizing for a multicellular objective, individual cells do not appear to behave optimally when considered in isolation, and that the collectively optimal strategies neither maximize nor minimize the information transferred between cells. Finally, we connect these theoretical results to recent live-imaging experiments of collective patterning through Notch-Delta signaling [33], observing similar non-monotonicity in instantaneous mutual information in the data as in our model and demonstrating how information shared between cells can be inferred directly from data.
To explore the principles of communication and self-organization in a multicellular system, we take a model that accounts for the following essential biological features: (i) cells have some internal state, which could be used to classify cells into cell types or to specify a target pattern, (ii) cells have a way to communicate with each other, (iii) cells control their internal state based on signals they have received, and control the signals they send based on their internal state, (iv) both the sending and receiving of signals, as well as control of the internal state, are imperfect and subject to stochastic fluctuations. These criteria are general enough to describe a host of complex developmental feats of self-organization, from neural induction to digit specification. While development regularly features multiple signaling pathways, long-range morphogen diffusion, cell divisions, and physical rearrangements, for the sake of tractability we focus on the simpler example of a small fixed number of cells patterning through lateral inhibition with communication across physical cell-cell contacts.
Lateral inhibition is a way to sort a population of cells into two states in a controlled manner by having cells in one state, or advancing towards that state, inhibit their neighbors from similarly advancing towards that state, Fig. 1. In development, this is often implemented through Notch-Delta signaling, although lateral inhibition appears more generally across biology, for instance, to increase sensory perception in neurons [34]. Often, Notch-Delta signaling is combined with some initial pre-patterning to create highly ordered structures [10]. However, even from an initial population of identical cells, Notch-Delta signaling will sort cells into two states through lateral inhibition and stochasticity in intra-cellular dynamics. A 2D tissue of identical cells will thus form a “salt and pepper” pattern of the two cell states under Notch-Delta dynamics.
Notch and Delta are proteins located on the surface of cells. When a Delta protein in cell \(A\) binds to a Notch receptor on cell \(B\), cleavage of Notch is triggered, releasing the Notch intracellular domain. In turn, the Notch intracellular domain translocates to the nucleus, where it leads to transcriptional changes within cell \(B\). Having received this signal, cell \(B\) suppresses the production of Delta [8]. In this way, a cell that is expressing Delta will inhibit its neighbors from similarly expressing Delta, thus forming a laterally inhibiting system, Fig. 1C. The two final states are cells with high Notch and low Delta expression, and cells with high Delta expression. To model the full complexities of a specific Notch-Delta patterning system, one would have to account for both the Notch and Delta proteins and their mRNA transcripts, as well as a number of transcription factors and their mRNA transcripts, like Scute and E(spl)m3-HLH. Indeed, there are many existing models of Notch-Delta dynamics that capture varying levels of biological detail [11], [29]. However, the key principles of lateral inhibition can be captured with a model where cells have a single internal variable [10], analogous to a reaction coordinate between the two terminal cell states [6].
Specifically, we will study a modified version of the Notch-Delta model in Ref. [10], where instead of a continuous internal variable, our cells have a discrete internal state \(u\in \{0,1,\dots, N\}\). To model cell-cell communication, each cell has a signaling state \(s\), where \(s=0\) means no signal is being received and \(s=1\) means that a signal is being received. In reality, the level of signal depends on how many Notch receptors have been recently cleaved and will not be binary. Cells can adjust their internal state stochastically, Fig. 1B, \[\label{eq:intRate} u \xrightleftharpoons[f^-(u+1,s)]{f^+(u,s)} u + 1,\tag{1}\] where the transition rates \(f^\pm(u,s)\) depend on the cell’s internal state as well as its signaling state. The signaling state can also change stochastically, \[\label{eq:sigRate} s =0 \xrightleftharpoons[k^-]{k^+} s = 1,\tag{2}\] where \(k^-\) is taken to be a fixed parameter, but \(k^+\) is a function of the neighboring cells. Specifically, for cell \(i\), we take \[\label{eq:adjRate} k^+ = \sum_{j} A_{ij} \, g(u_j),\tag{3}\] where \(A_{ij}\) is an adjacency matrix, with \(A_{ij}=1\) if cells \(i\) and \(j\) are neighbors at time \(t\), and \(A_{ij}=0\) otherwise. This form could easily be modified so that the value of the adjacency matrix depends on the area of contact between cells, or even on an area of contact that changes with time, but for now we assume that all neighbors have equal and fixed contact areas. Finally, we assume that the states \(u=0\) and \(u=N\) are absorbing: once a cell has entered one of these states it will not leave. Since cells can implement a crude threshold, these absorbing states model a level of Notch or Delta that triggers an irreversible change of fate [8], [12].
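To make the dynamics concrete, the following is a minimal Gillespie simulation sketch of equations 1–3 for \(M=3\) fully connected cells. The rate functions and the value of \(k^-\) below are illustrative placeholders chosen to produce lateral-inhibition-like behavior; they are not the optimized parameters discussed later.

```python
# Minimal Gillespie sketch of equations 1-3 for M = 3 fully connected cells.
# The rate functions f_plus, f_minus, g and the value of k_minus are
# illustrative placeholders, not the optimized parameters found in the text.
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 6                       # number of cells, top internal state
A = np.ones((M, M)) - np.eye(M)   # all-to-all adjacency matrix
k_minus = 1.0                     # signal switch-off rate (placeholder)

def f_plus(u, s):                 # rate of u -> u + 1, suppressed by a received signal
    return 0.0 if s == 1 else 0.2

def f_minus(u, s):                # rate of u -> u - 1, driven by a received signal
    return 1.0 if s == 1 else 0.0

def g(u):                         # signal emitted by a cell with internal state u
    return 1.0 if u >= 2 else 0.0

def gillespie(t_max=200.0):
    u = np.ones(M, dtype=int)     # initial condition alpha* = ((1,0),(1,0),(1,0))
    s = np.zeros(M, dtype=int)
    t = 0.0
    while t < t_max and not np.all(np.isin(u, [0, N])):
        events, rates = [], []
        for i in range(M):
            if u[i] not in (0, N):                  # u = 0 and u = N are absorbing
                events += [("up", i), ("down", i)]
                rates += [f_plus(u[i], s[i]), f_minus(u[i], s[i])]
            k_plus = sum(A[i, j] * g(u[j]) for j in range(M))
            events.append(("flip", i))              # s = 0 <-> 1
            rates.append(k_plus if s[i] == 0 else k_minus)
        rates = np.array(rates)
        total = rates.sum()
        if total == 0:
            break
        t += rng.exponential(1.0 / total)
        kind, i = events[rng.choice(len(events), p=rates / total)]
        if kind == "up":
            u[i] += 1
        elif kind == "down":
            u[i] -= 1
        else:
            s[i] = 1 - s[i]
    return u, t

print(gillespie())  # e.g. one cell at N and two at 0, plus the time taken
```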
With the model specified, we want to know if there exists some set of rates \(\{f^{\pm},k^-,g\}\) under which a system of \(M\) cells will coordinate to make a target pattern. Eventually all cells will reach an absorbing state, either \(0\) or \(N\), after which there are no cell state transitions, and we refer to this as a terminal state. We focus on a system of \(M=3\) cells where every cell is in contact with every other cell, and the target pattern is one cell in the \(N\) state and two cells in the \(0\) state, Fig. 1. This is the smallest network that exhibits decentralized symmetry breaking with each cell coordinating with multiple neighbors.
Crucially, every cell has the exact same set of transition rates; if one cell had rates with \(f^+ >0\), \(f^-=0\) and the other two cells had different rates with \(f^- >0\), \(f^+=0\), then the correct pattern would always be achieved. Instead, we give every cell the same rates and identical initial conditions. If each cell were to stochastically choose an absorbing state without communication, our target pattern would be achieved with probability at most \(4/9\) (Appendix 9). To reliably reach the target pattern, cells must coordinate with each other through signaling. In the next section, we will explore the optimal strategies for reaching the target state, before quantifying the information transferred between cells executing an optimal strategy in section 4.
In our system, cells can receive a signal (through \(s\)), control their internal state (through \(f^{\pm}\)) and send a signal to neighboring cells (through \(g\)). With these essential ingredients for self-organization, our first question is whether regions of the parameter space \(p = \{ f^\pm,k^-,g\}\) can achieve patterning with a small error, \(\epsilon\), which we define as the probability of our system ending up in a non-target terminal state. Scaling by the fastest time scale, so that \(f^\pm, k^-, g \leq 1\), our model allows for an arbitrarily small error rate (Appendix 11), although this comes at the price of taking infinitely long to reach the terminal states (Appendix 10). This trade-off is generic, occurring for any value of \(M\geq 3\) and any adjacency matrix (Appendix 10). Therefore, a more biologically motivated optimization problem is to specify the degree of error that is permitted, \(\epsilon_{tol}\), and minimize the average time to reach the terminal states, \(\tau\), under the constraint that \(\epsilon \leq \epsilon_{tol}\). We note that the same region of the speed-error plane is accessible whether one constrains \(\epsilon \leq \epsilon_{tol}\) and minimizes \(\tau\) or constrains \(\tau \leq \tau_{\max}\) and minimizes \(\epsilon\).
The level of permitted error, \(\epsilon_{tol}\), in a fate specification depends on its context within development, and on whether any mistake can be corrected or compensated for later on. For instance, ABp specification via Notch signaling is critical for normal development in C. elegans [35], whereas when Notch signaling specifies sensory organ precursor cells in the Drosophila pupal abdomen, errors can be compensated by later cell rearrangements and, in any case, a single misspecification is not critical to survival [9]. Ideally, we want to examine how the minimum average time to reach a terminal state changes as a function of the error constraint \(\epsilon_{tol}\): a Pareto front that, barring additional considerations such as evolvability [36] or the thermodynamic cost of transcription, translation, and signaling [37], evolutionarily optimized systems should lie near. Similar trade-off structures appear for speed and dissipation in transcription [37], and for fluctuations in biochemical reaction networks [38]–[40].
Mathematically, our model is a continuous time Markov chain (CTMC) for the full state of the system \(x_\alpha = ((u_1,s_1), \dots, (u_M, s_M))\), tracking both the internal and signaling states of every cell. A probability distribution over the states evolves according to the master equation, \[\frac{\mathrm{d}}{\mathrm{d}t} P_t(x_\alpha) = \sum_\beta Q_{\alpha \beta} P_t(x_\beta),\] where \(P_t(x_\alpha)\) is the probability of the system being in state \(x_\alpha\) at time \(t\), \(Q_{\alpha\beta}\) is the rate at which \(x_\beta\) transitions to \(x_\alpha\) for \(\alpha\neq\beta\), and \(Q_{\alpha\alpha} = -\sum_{\beta\neq \alpha} Q_{\beta\alpha}\). We can solve this master equation directly, or simulate realizations of the process with the Gillespie algorithm. Finding the error rate \(\epsilon\) or the average time to terminal states \(\tau\) can be done directly from the rate matrix \(Q\), by posing them as first-passage-like problems. To do so, we first define \(\tau_\alpha\) to be the expected time to reach a terminal state starting from state \(x_\alpha\) and \(\epsilon_\alpha\) to be the probability that, starting from \(x_\alpha\), the eventual terminal state does not achieve the target pattern. Calling \(\mathcal{T}^G\) the set of terminal states which achieve the target pattern and \(\mathcal{T}^B\) the set of terminal states that do not achieve the target pattern, then \(\tau_\alpha = 0\) for \(\alpha\in \mathcal{T}^G\cup \mathcal{T}^B\), \(\epsilon_\alpha = 0\) for \(\alpha\in\mathcal{T}^G\), and \(\epsilon_\alpha = 1\) for \(\alpha\in\mathcal{T}^B\). For the remaining \(\alpha \notin \mathcal{T}^G\cup \mathcal{T}^B\), it follows that \[\begin{align} \label{eq:MFP} \tau_\alpha &= -\frac{1}{Q_{\alpha\alpha}} - \sum_{\beta\neq\alpha} \frac{Q_{\beta\alpha}}{Q_{\alpha\alpha}} \tau_\beta, \\ \epsilon_\alpha &= - \sum_{\beta\neq\alpha} \frac{Q_{\beta\alpha}}{Q_{\alpha\alpha}} \epsilon_\beta, \end{align}\tag{4}\] and these linear equations can be solved to find \(\tau_\alpha\) and \(\epsilon_\alpha\). Since the rate matrix depends on the parameters \(p = \{ f^{\pm},k^-,g\}\), and \(\tau_\alpha\), \(\epsilon_\alpha\) depend on the rate matrix, we have that \(\tau_\alpha = \tau_\alpha(p)\), \(\epsilon_\alpha = \epsilon_\alpha(p)\). Supposing that we initialize the system in state \(\alpha^*\), we can now pose the minimization problem as \[\begin{align} \label{eq:optim} \min_{p} \tau_{\alpha^*}(p) \quad s.t. \quad \epsilon_{\alpha^*}(p) \leq \epsilon_{tol}, \end{align}\tag{5}\] where, after minimization, we will find both the optimal \(\tau\) and a set of parameters \(p\) that generate it. As the cells have a discrete internal state, there are only finitely many parameters that we are optimizing over, specifically \(4(N-1)\) for the \(f^\pm\), 1 for \(k^-\) and \(N+1\) for \(g\). The optimization problem is equivalent to a quadratic problem with quadratic constraints (SM), a generically hard problem [41]. In our numerical parameter scans, however, we consistently find that solutions converge to a small number of local minima from different initializations, whether solving via interior-point optimization or gradient descent (SM).
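As a sketch of how equations 4 can be solved numerically, the snippet below assumes that the rate matrix \(Q\) (with the column convention of the master equation above, so that entry \((a,b)\) is the rate from state \(b\) to state \(a\)) and the index arrays of good and bad terminal states have already been enumerated; the names are illustrative.

```python
# Sketch: mean time to absorption and error probability from equations 4,
# assuming Q follows the master-equation convention above and `good`, `bad`
# are integer index arrays of the terminal states. Names are illustrative.
import numpy as np

def first_passage(Q, good, bad):
    n = Q.shape[0]
    interior = np.setdiff1d(np.arange(n), np.concatenate([good, bad]))
    # Equations 4 rearrange to sum_b Q[b, a] tau[b] = -1 and
    # sum_b Q[b, a] eps[b] = 0 for every non-terminal state a, with
    # tau = 0 on terminal states, eps = 1 on bad and 0 on good states.
    A = Q.T[np.ix_(interior, interior)]
    tau = np.zeros(n)
    eps = np.zeros(n)
    eps[bad] = 1.0
    tau[interior] = np.linalg.solve(A, -np.ones(len(interior)))
    eps[interior] = np.linalg.solve(A, -Q.T[np.ix_(interior, bad)] @ np.ones(len(bad)))
    return tau, eps
```

Since \(Q = Q(p)\), the constrained minimization of equation 5 can then be wrapped around such a solve with a standard constrained optimizer.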
Minimizing across many values of \(\epsilon_{tol}\), with \(N=6\) internal states and \(\alpha^* = ((1,0),(1,0),(1,0))\), identifies the optimal trade-off curve between error and average time to reach the terminal state, Fig. 2D. Without penalizing error, by setting \(\epsilon_{tol} = 1\), all cells immediately head towards the closest absorbing state, \(u=0\), and consequently never reach a target pattern. In the other limit of extreme precision, \(\epsilon_{tol}\to 0\), it takes increasingly long to reach a terminal state. For intermediate values, the optimal strategy is able to consistently reach a terminal state in a relatively short time compared to the fastest timescale of the system, Fig. 2D. Above the Pareto front, any parameter combination is accessible, Fig. 2D.
The optimal strategy appears to be one in which cells stochastically increase their internal state, and the first cell to transition signals to its neighbors, which then decrease their internal state, Fig. 2 (A-C). These strategies often implement a sharp change in \(f^\pm\) as a function of internal and signaling state, and in \(g\) as a function of internal state, Fig. 2C. Such sharp responses are a common feature of gene regulatory networks, so this does not represent an unphysical aspect of the model. For a small error rate, the optimal strategy after a cell stochastically increases its internal state is for that cell to wait (\(f^+ < 1\)), making sure it is the only inhibitor-like cell before it commits to that fate permanently, \(\epsilon=0.02\) in Fig. 2C. For larger error rates, the optimal strategy is for a cell to head immediately to the inhibitor state (\(f^+ \approx 1\)) after it has increased its internal state, \(\epsilon=0.1\) in Fig. 2C.
This strategy, of a single cell stochastically increasing its internal state before signaling to its neighbors, Fig. 2A, differs qualitatively from the model in Ref. [10], where all cells increase their internal state concomitantly before one cell eventually becomes dominant and inhibits its neighbors. Both concomitant [10] and non-concomitant [9] dynamics have been observed in recent live-imaging of Notch-Delta patterning. In particular, a recent experiment measured the expression of Scute, a transcription factor that upregulates Delta, while sensory organ precursor (SOP) specification was occurring in the Drosophila dorsal histoblast nest, and saw minimal concomitant increase in Scute [9]. Scute expression appeared to increase in future SOP cells while remaining low in non-SOP cells, which then began to increasingly express Notch [9].
Successful collective self-organization requires cells to exchange information. In our system, in the absence of communication cells can only reach the target pattern with a probability of at most \(4/9\) (Appendix 9). With communication, however, they can find the correct pattern to arbitrary accuracy. In this section, we quantify the information transferred between cells using techniques of information theory [1], [32], [42], [43]. The central object will be the mutual information, \[\label{eq:MI} I(X;Y) = \mathbb{E}\left[ \log \frac{ P(X,Y)}{ P(X) P(Y)} \right]\tag{6}\] where \(X\) and \(Y\) are random variables, \(P(X)\), \(P(Y)\) are their marginal distributions, and the expectation is taken over \(P(X,Y)\), their joint distribution. The mutual information quantifies how much information one gains about \(X\) upon seeing \(Y\), or vice-versa, and is the natural measure of the information shared between \(X\) and \(Y\) [43], [44]. We could use the terminal state of the system to quantify the information shared between neighboring cells in the final pattern, as has been explored recently [1]. However, as we will see, the final state of the system does not capture all the dynamic information shared between cells during patterning. Quantifying dynamic information requires computing the mutual information between trajectories \(I(X_0^T;Y_0^T)\), where \(X_0^T\) denotes the trajectory of a time-dependent variable \(X(t)\) for \(0\leq t \leq T\). If \(X(t)=u_1(t)\) and \(Y(t)=u_2(t)\), then \(I(X_0^T;Y_0^T)\) quantifies the total information that is shared between cell 1 and cell 2 up until time \(T\). We will also make use of the transfer entropy rate, defined as \[\begin{align} \dot{\mathcal{T}}^{{Y}\to{X}}(t) =&\lim_{dt\to 0}\frac{1}{dt}\mathbb{E}\left[\log{\frac{P_{t+dt}[X(t+dt)|X^{t}_{0},Y^{t}_{0}]}{P_{t+dt}[X(t+dt) |X^{t}_{0}]}}\right], \end{align}\] which quantifies the directed rate of information transfer from \(Y\) to \(X\) and, provided \(X\) and \(Y\) cannot simultaneously change [32], is related to the mutual information by \[\begin{align} \label{eq:MI2} I(X_0^T;Y_0^T) = \int_0^T \left[ \dot{\mathcal{T}}^{{Y}\to{X}}(t) + \dot{\mathcal{T}}^{{X}\to{Y}}(t) \right]\, dt . \end{align}\tag{7}\] In this section, we measure the magnitude of the information flow and, by computing conditional transfer entropy rates, we explore how information is transferred from inhibitor to inhibited cells and vice-versa, as well as between inhibited cells.
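As a small, concrete illustration of equation 6, the following evaluates the mutual information, in nats, of a toy joint probability table; the numbers are arbitrary and not drawn from the model.

```python
# Mutual information (equation 6), in nats, for a discrete joint distribution.
# The 2x2 joint table below is an arbitrary toy example.
import numpy as np

def mutual_information(P_xy):
    P_x = P_xy.sum(axis=1, keepdims=True)
    P_y = P_xy.sum(axis=0, keepdims=True)
    mask = P_xy > 0                              # 0 log 0 = 0 by convention
    return np.sum(P_xy[mask] * np.log(P_xy[mask] / (P_x @ P_y)[mask]))

P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print(mutual_information(P_xy))                  # approximately 0.193 nats
```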
It is well established theoretically that gene regulatory networks can process information, and that the mutual information between the network’s inputs and its outputs can be quantified [37], [44]–[48]. Information has been quantified in experimental systems, both directly from trajectories [49]–[51], and from building detailed models that are fit to experimental data [42], [52]. A handful of developmental problems have been studied from the perspective of information theory. The information contained in the Drosophila gap gene pattern was experimentally quantified [4], [43], [53], and it has been proposed that the gene regulatory networks optimally create an information rich pattern [54]. In ascidian neural induction, a recent study compared experimental cell-cell contact surfaces to the surfaces that would maximally transfer information between signal sending and signal receiving cells [19]. In this existing literature, the signal is taken as exogenous; a cell’s response to a signal does not impact the signal’s future values. While this is often an appropriate assumption, such as in ascidian neural induction [18], [19], this need not hold in general, and indeed does not hold for lateral inhibition. In our model, cells are both sending and receiving signals and the signals they receive affect the future signals they will send. Any information theory analysis of our system must take this into account.
In addition to this feedback, other common approximations which simplify the computation of information theoretic quantities do not hold for a laterally inhibiting system. As we have seen, such systems are time-dependent and cannot be approximated as a series of static input and output relationships. Moreover, the multi-modal nature of cell fate transitions means that we cannot approximate the dynamics as a uni-modal Gaussian process, for which computations are more tractable [55], [56]. It is, in theory, possible to simulate our system with the Gillespie algorithm and, treating the simulated data as if it were experimental data, attempt to directly estimate the mutual information. However, direct estimation of mutual information remains challenging due to the high dimensionality of the space of trajectories, and such estimators do not take advantage of any of the known structure of our model.
Computing the mutual information between trajectories is difficult even when the underlying stochastic model is known. To see this, suppose we wanted to compute a mutual information, \(I(X_{0}^T;Y_{0}^T)\), where \(X(t)\) could be \(X(t)=u_1(t)\) or \(X(t) = [u_1(t),u_2(t)]\), and similarly for \(Y(t)\). While the path measure of a CTMC has a closed form expression, working out the mutual information requires computing marginal path measures, such as \(P(X_0^T)\), which are intractable analytically. Recently, Monte-Carlo sampling techniques that take advantage of the known model structure have been used to estimate otherwise intractable terms in mutual-information-like computations [57], [58]. In these approaches, an outer expectation is estimated by Monte-Carlo sampling, but once a sample has been drawn, actually computing the corresponding value of the integrand requires another round of Monte-Carlo sampling. While advanced sampling techniques like these may be required for many problems, in our problem we can exactly compute the integrand in equation 7 following Ref. [32]. There, the authors consider a stochastic reaction network with \(K\) reaction channels and \(n\) chemical species \(Z_1,\dots,Z_n\), with the vector \(Z(t)\) recording the copy number of each species at time \(t\), and the \(k^{th}\) reaction occurring at a rate \(\lambda_k(Z(t))\). If \(X(t)\) and \(Y(t)\) are variables, or disjoint subsets of variables, in \(Z(t)\) that do not change simultaneously, then \[\begin{align} \label{eq:SRNMI} I(X_0^T; Y_0^T) =& \mathbb{E}\left( \sum_{k\in R_{X}}\int_0^T \log \frac{\lambda_k^{XY}(s)}{\lambda^X_k(s)} dN_k(s)\right. \\ \notag &\quad \left. +\sum_{k\in R_{Y}} \int_0^T \log \frac{\lambda_k^{XY}(s)}{\lambda^Y_k(s)} dN_k(s) \right), \end{align}\tag{8}\] where \(R_A\) denotes the set of reactions that involve a change in \(A\), \(dN_k(t)\) is the increment of \(N_k(t)\), which counts the number of times reaction \(k\) has occurred, and \(\lambda_k^A = \mathbb{E}[\lambda_k(Z(t))| A_0^T]\) is the expected rate of the \(k^{th}\) reaction given \(A_0^T\). The first and second terms in the expectation correspond to the transfer entropies \(\mathcal{T}^{Y\to X}(T)\) and \(\mathcal{T}^{X\to Y}(T)\) respectively [32], where throughout we define the transfer entropy to be \(\mathcal{T}^{A\to B}(T) = \int_0^T \dot{\mathcal{T}}^{A\to B}(t)dt\). Note that equation 8 appears as equation A6 in Ref. [32], although here we have neglected the integrals with expectation zero. To actually compute the integrand, we need to compute \(\lambda^A_k\), which requires knowing the conditional distribution \(\pi^A(\bar{z},t) := P(\bar{Z}(t)=\bar{z}|A_0^T)\), where \(\bar{Z}(t)\) is a vector that tracks all molecular abundances except those in \(A\). This conditional distribution obeys a filtering equation [32], [59],
\[\begin{align} \label{eq:filter} d\pi^A(\bar{z},t) = &\sum_{k\in S_{\bar{Z}}} \left[\lambda_k(\bar{z}-\nu_k^{\bar{Z}},A(t))\pi^A(\bar{z}-\nu_k^{\bar{Z}},t) - \lambda_k(\bar{z},A(t))\pi^A(\bar{z},t) \right]dt \\\notag & - \sum_{k\in R_A} [\lambda_k(\bar{z},A(t)) - \lambda_k^A(t) ]\pi^A(\bar{z},t)dt + \sum_{k\in R_A} \left[ \frac{\lambda_k(\bar{z}-\nu_k^{\bar{Z}},A(t))}{\lambda_k^A(t)}\pi^A(\bar{z}-\nu_k^{\bar{Z}},t) - \pi^A(\bar{z},t)\right]dN_k(t), \end{align}\tag{9}\]
where \(\nu_k^{\bar{Z}}\) is a vector that represents the change in \(\bar{Z}\) after the \(k^{th}\) reaction, and \(S_{\bar{Z}}\) is the set of reactions that only involve species in \(\bar{Z}\). The first summation is a master-like equation describing the evolution of the latent variables under reactions that are exclusive to them, the second summation represents an updated belief about the latent variables given that no reaction in \(R_A\) has occurred, and the third summation represents an updated belief given that reaction \(k\) occurred. Keeping track of \(\pi^A(\bar{z},t)\) for every possible count of the latent species \(\bar{z}\) is typically not practical in a chemical reaction network, and the filtering equation is instead used to construct moment closure approximations [32], [59].
We can interpret our model, equations 1–3, as a stochastic reaction network, albeit an unusual one with non-linear reaction rates which constrain the copy number of each species to be bounded above by a finite number. This finiteness of copy number means that we need not approximate equation 9, and instead can solve it exactly. Moreover, it can be solved by discontinuously changing \(\pi^A(\bar{z},t)\) at the points when a reaction in \(R_A\) occurs, and away from those points solving a non-linear differential equation. Our general strategy for computing the mutual information in equation 8 is to (i) draw a sample trajectory from the full system, (ii) solve for \(\pi^A(\bar{z},t)\) for \(0\leq t \leq T\) and use this to compute \(\lambda_k^A(t)\) for \(A = X, Y, [X,Y]\), and (iii) compute the value inside the expectation in equation 8. The overall expectation is computed by a Monte-Carlo average taken over many samples, repeating steps (i)-(iii) for each sample.
Similarly to the information \(I(X_0^T;Y_0^T)\), we can compute an information rate \(\frac{dI(X_0^t;Y_0^t)}{dt}\), as well as transfer entropy rates, \(\dot{\mathcal{T}}^{X\to Y}(t)\) and \(\dot{\mathcal{T}}^{Y\to X}(t)\), representing the directed rate of information transfer from \(X\) to \(Y\) and vice-versa. See SM for additional numerical details.
We are now in a position to quantify information flows in our system. For the optimized model with error \(\epsilon = 0.02\), and taking \(X=u_1\), \(Y=u_2\), we find that there is a sharp increase in the mutual information rate, which then sharply decreases, Fig. 3A. Each cell will eventually reach an absorbing state, after which no further “reactions” involving \(X\) or \(Y\) occur, and hence that trajectory makes no further contribution to the mutual information in equation 8. The probability that a cell has not reached an absorbing state decays exponentially with time, and indeed, the total mutual information between the trajectories of \(u_1\) and \(u_2\) clearly asymptotes as \(T\to\infty\), Fig. 3A. Due to the symmetry of the problem, any pair of internal variables will have the same mutual information between their trajectories as any other pair, and the transfer entropy between them in both directions will be exactly half the value of the mutual information.
For any given value of the error, we can find the optimal model and compute the exact same information theoretic quantities as we have above for \(\epsilon = 0.02\). We find that the total information exchanged between two trajectories decreases monotonically as the allowed error increases, reaching approximately zero at roughly the point where a zero-communication strategy becomes possible, Fig. 3B. In the limit of \(\epsilon \to 0\), it is possible for the trajectories of the cells to share an arbitrarily large amount of information with each other (Appendix 12).
In addition to looking at pairs of cells, we can take one cell and look at the information it has about the remaining cells, or \(X = u_1\), \(Y = [u_2,u_3]\). We similarly see a sharp increase in the information shared between them before this curve asymptotes, at around \(1.89\) nats for \(\epsilon = 0.02\), Fig. 3C. Interestingly, this quantity greatly exceeds the \(0.693\) nats (\(1\) bit) that can be shared between a cell and a fixed binary external signal. Additionally, the transfer entropy also exceeds \(1\) bit and is asymmetric, with each cell receiving more information from its neighbors than it sends, Fig. 3C. This shows that, whether through signaling timing or repeated signal activation, the binarized signaling state is capable of transferring more than one bit of information. Similarly to the pair of cells, the mutual information between one cell and the remaining cells decreases monotonically as the allowed error increases, Fig. 3D.
Due to the inherent symmetry of our system, each cell is initially equally likely to be the inhibitor cell, and this symmetry obscures the way information is transferred. For instance, suppose that information only flowed from the inhibitor cell to the inhibited cells. The naive transfer entropy calculation would still find every pairwise transfer entropy rate to be equal, say \(\dot{\mathcal{T}}^{1\to2}(t) = \dot{\mathcal{T}}^{2\to1}(t)\), since this quantity is computed over both trajectories where cell 1 is the inhibitor and trajectories where cell 2 is the inhibitor. Instead, we would like to decompose our information flows by somehow removing the symmetry that any one of the cells is equally likely to be the inhibitor. In systems with feedback, such as this one, such decompositions can violate the data processing inequality or subtly introduce fictitious dependencies, and thus require careful interpretation. With those caveats in mind, we can define transfer entropy rates conditioned on some terminal event, for instance \(\dot{\mathcal{T}}^{1\to2 | Z}(t)\), where \(Z\) is the event that the terminal state \(u_1 = N\), \(u_2=u_3=0\) is reached. In general, for the event \(Z\) that some final absorbing state is ultimately reached, conditioning the CTMC on \(Z\) results in another CTMC. Specifically, using Doob’s h-transform [60], the rate matrix of the conditioned system is \[Q_{\alpha \beta | Z} = Q_{\alpha \beta} \frac{P(Z|\alpha)}{P(Z|\beta)},\] for \(\alpha \neq \beta\) and \(Q_{\beta\beta | Z} = -\sum_{\alpha\neq \beta} Q_{\alpha\beta|Z}\), where \(P(Z|\alpha)\) is the probability of the event \(Z\) given the system is in state \(\alpha\). Quantities of the form \(P(Z | \alpha)\) can be computed in the same way we compute \(\epsilon_\alpha\) (equation 4), after which we can use this new conditioned CTMC to compute information theoretic quantities exactly as before. We find that the conditioned transfer entropy rates are significantly higher from the inhibitor cell to the inhibited cells than in the reverse direction, Fig. 4A,C. Nevertheless, the flow of information goes both ways, with non-zero conditioned transfer rates from the inhibited cells to the inhibitor cell, Fig. 4C. Further, there is a non-zero conditioned transfer rate between the two inhibited cells (Appendix 13). Exploring these information theoretic quantities highlights the complex picture of information flows in a self-organizing system.
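A minimal sketch of this conditioning step is given below, assuming the rate matrix \(Q\) (with the column convention of the master equation) and the vector of hitting probabilities \(h_\alpha = P(Z|\alpha)\), computed in the same way as \(\epsilon_\alpha\), are already available; the names are illustrative.

```python
# Sketch: Doob h-transform of the rate matrix, conditioning the CTMC on the
# terminal event Z. Q uses the convention Q[a, b] = rate from state b to a,
# and h[a] = P(Z | state a). Names are illustrative.
import numpy as np

def doob_condition(Q, h):
    with np.errstate(divide="ignore", invalid="ignore"):
        Qz = Q * (h[:, None] / h[None, :])   # Q[a, b] * h[a] / h[b]
    Qz[~np.isfinite(Qz)] = 0.0               # states with h = 0 are never visited
    np.fill_diagonal(Qz, 0.0)
    np.fill_diagonal(Qz, -Qz.sum(axis=0))    # columns of a rate matrix sum to zero
    return Qz
```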
To completely quantify the information shared between cells, and to account for situations where a change in one cell only affects another cell at a later time, information quantities should be computed between entire trajectories, as we have done so far. However, while it is possible to compute these quantities exactly in our model, computing the mutual information between trajectories directly from biological data remains impractical due to the high-dimensional space which trajectories inhabit. A more practical approach is to compute information theoretic quantities using only the instantaneous state of the system, rather than full trajectories. In this section, we explore what can be learned from instantaneous information quantities alone.
Consider the mutual information between the internal states of two cells at some fixed time \(t\), \(I(u_1(t);u_2(t))\), which we will refer to as the instantaneous mutual information. Due to the data processing inequality, the instantaneous mutual information is at most the mutual information between the trajectories up until that point, or \(I(X_0^t;Y_0^t) \geq I(X(t);Y(t))\). A related quantity called correlational information (CI) was recently proposed [1], defined as \[CI(t) = \frac{1}{M}\mathbb{E}\left[\log \frac{P(u_1(t),\dots,u_M(t))}{\prod_{i=1}^M P_i(u_i(t))}\right],\] where in our system, all cells are equivalent and hence the marginal probability \(P_i\) is the same for every cell. Note that \(M\times CI\) is sometimes referred to as the multi-information or the total correlation of the random variable \((u_1(t),\dots,u_M(t))\) [61]. Analogously to how we compute the mutual information between trajectories, we can compute a trajectory version of the correlational information (SM).
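As a sketch of how such instantaneous quantities can be evaluated in the model, the snippet below solves the master equation by matrix exponentiation and marginalizes onto \((u_1,u_2)\); it assumes the rate matrix \(Q\), a list of states \(((u_1,s_1),(u_2,s_2),(u_3,s_3))\) aligned with the indexing of \(Q\), and an initial distribution are available, and the names are illustrative.

```python
# Sketch: instantaneous mutual information I(u_1(t); u_2(t)) from the master
# equation. Assumes Q follows the convention above, `states` lists the tuples
# ((u1,s1),(u2,s2),(u3,s3)) in the same order as Q's indexing, and P0 is the
# initial distribution. Names are illustrative.
import numpy as np
from scipy.linalg import expm

def instantaneous_mi(Q, states, P0, t, N=6):
    P_t = expm(Q * t) @ P0                       # solve the master equation
    joint = np.zeros((N + 1, N + 1))
    for prob, ((u1, _), (u2, _), _) in zip(P_t, states):
        joint[u1, u2] += prob                    # marginalize onto (u1, u2)
    p1 = joint.sum(axis=1, keepdims=True)
    p2 = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / (p1 @ p2)[mask]))
```

The correlational information follows analogously, by marginalizing \(P_t\) onto the full triple \((u_1,u_2,u_3)\).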
Plotting the instantaneous mutual information for a particular choice of \(\epsilon=0.02\), we find that the instantaneous information is indeed less than the full dynamic information by a factor of around \(4-6\), Fig. 5. In the long time limit, \(t\to\infty\), the instantaneous mutual information between a pair of cells is around \(0.156\) nats, or around \(15.3\%\) of the total information shared between the full trajectories. The final state instantaneous mutual information is smaller than the dynamic mutual information for all values of \(\epsilon\), with the largest difference occurring at small \(\epsilon\), Fig. 3B, C. In fact, since the final state can only take one of two values, \(u_1(t\to\infty)\in\{0,N\}\), we have \(I(u_1(t\to\infty);u_2(t\to\infty)) \leq H(u_1(t\to\infty)) \leq \log 2\). In contrast, the dynamic information shared between cells can be unbounded in the \(\epsilon\to0\) limit (Appendix 12).
Intriguingly, the shape of the instantaneous mutual information curve, as well as that of the correlational information curve, is non-monotonic, exhibiting a local maximum at a finite time, Fig. 5. Heuristically, we can understand this through the following argument. Suppose after some long time you learn that one cell is in state \(N\). You can deduce that its neighbors are likely to be in state \(0\). If you know one cell is in state \(0\), you deduce that one of its neighbors is likely in state \(N\) and the other is likely in state \(0\). Now suppose instead that we view the system at some intermediate time, after patterning has begun but before all cells have reached their absorbing states. Upon learning that one cell is in state \(N\) or \(0\), we may make similar deductions as before. However, if the cell is in an intermediate state, we can deduce that the neighboring cells are also likely to be in an intermediate state. Therefore, at the intermediate time there are more possible states for the neighbors to be in, and hence one can learn more about the neighbors by knowing the state of a cell than at the final time. This highlights how careful interpretation of the instantaneous mutual information is required, and that a decrease in instantaneous mutual information does not necessarily mean that the system is becoming less coupled or more disordered.
Through evolutionary selection, development is tuned to robustly generate viable, functional offspring. Many cellular processes, including Notch-Delta signaling, are highly conserved and have been tuned across millions of years of evolution. It is therefore natural to ask questions about optimality, such as how gene regulatory networks can optimally process intra-cellular information [44]–[46], how cells can optimally infer an external signal [12], [13], [62], and how cells can act optimally to control their environment [15]. Similarly, development has been modeled by cells acting as optimal Bayesian agents, seeking to minimize uncertainty about their cellular identity [30]. However, fitness is defined at the level of the organism or even the level of the community, making it challenging to ascribe optimality to any particular component of the system. We find that by optimizing a collective objective, the actions of individual cells need not appear individually optimal. Additionally, we find that the collectively optimal system does not maximize information flow. In this section, we briefly examine how collective optimization results in individual cells appearing to act sub-optimally.
Previously, in equation 5 we minimized over one set of parameters \(p\) that every cell shared. Equivalently, we could have given each cell its own set of parameters \(p_1\), \(p_2\), and \(p_3\), and optimized over all of these together with the constraint that \(p_1=p_2=p_3\). Fix the parameters of two cells, say cells \(1\) and \(2\), to follow the collective optimum, \(p_1^*=p_2^*\), and consider the remaining cell \(3\). From the perspective of this cell alone, the strategy given by the collectively optimized parameters \(p_3^*\) is not optimal. Cell \(3\) can optimize its parameters to find a new strategy \(\tilde{p}_3\) which achieves a faster time to the terminal state with the same error, Fig. 6. However, all cells adopting \(p_3^*\) is superior to all cells adopting \(\tilde{p}_3\): when every cell adopts \(\tilde{p}_3\), (i) the resulting system is more error prone, and (ii) even at this new error rate, the strategy is far from optimal, Fig. 6. This reminds us that for self-organizational problems the objective is a collective one. The actions of a single cell, when treating the remaining cells and signaling environment as an exogenous mean field, may appear suboptimal. Only in the context of the collective problem are cells’ actions optimal. Similar trade-offs, where individual strategies appear suboptimal yet enable collective coordination, have recently been observed at larger biological scales [63].
A signaling pathway in an isolated cell receiving an exogenous signal is often considered to optimally translate the information from the external signal into an internal state [37], [44]. However, for a collective, self-interacting system with feedback, where the signal is both sent and received by cells, it becomes less clear what optimal signal processing should look like. For a given error rate, we find suboptimal solutions that transfer both more and less information between cells than the optimal solution, Fig. 7. When the objective is simply to reach the target state as quickly as possible with some allowed error rate, cells do not optimize the information transferred between them.
To explore our theoretical predictions, we examine recent experimental data of Scute expression during sensory organ patterning in the Drosophila pupal abdomen, an example of lateral inhibition through Notch-Delta signaling [9]. Broadly, there are two ways we could connect our theoretical model to experimental data: by fitting the model parameters to experimental data and analyzing the fitted model [42], [52], or by estimating information theoretic quantities directly from measurements without assuming a particular model [49]–[51]. Given the deliberately minimal structure of our model, we adopt the latter approach. Specifically, we compute instantaneous mutual information directly from experimental trajectories. Consistent with our theoretical analysis in Section 5, we find that this measure varies non-monotonically over the course of cellular patterning.
To investigate pattern formation in an experimental laterally inhibiting system, we analyze data from Ref. [9], who performed live imaging of sensory organ formation in the Drosophila pupal abdomen. In these experiments, the transcription factor Scute and a nuclear marker were endogenously tagged with fluorescent reporters. Scute is part of the Notch-Delta feedback loop: it upregulates the expression of Delta and is indirectly suppressed by the activation of Notch receptors. Thus, Scute serves as a proxy for the “reaction coordinate” between the inhibited state and the inhibitor state, with high Scute expression indicating the inhibitor state and low Scute expression indicating the inhibited state. Moreover, the fluorescence intensity of Scute within a nucleus provides a quantitative readout of its expression for every cell simultaneously. During the imaging period, cells coordinate through Notch-Delta signaling to create a somewhat regular pattern of Delta-expressing cells, known as Sensory Organ Precursor (SOP) cells, Fig. 8A. Errors, where two neighboring cells are both SOPs, occur around 10\(\%\) of the time, but can be corrected through cell rearrangements [9]. In total, three live-imaging experiments of this form were performed. Each experiment was imaged for around 12 hours and has around 300 time points, each of resolution \(660\times900\times24\) voxels, corresponding to a physical region of around \(257.4\times351\times31.9\,\mu m^3\).
To explore information transfer in the experiments, we must account for a number of complications of the experimental system that are not present in the model: cells divide, die, and move around in the tissue, which complicates the direct application of the mutual information formula.
Following Ref. [9], we take the intensity of Scute in a cell and compare this to the average intensity of its neighbors, Fig. 8B-C. The neighborhood is determined by taking the three-dimensional Delaunay tessellation of the nuclei centroids and retaining only edges that are shorter than \(12\mu m\) (SM). Note that this neighborhood can change in time. We next select only the nuclei in a region of interest, by drawing a box that approximately includes the anterior dorsal histoblast nest, where patterning is occurring, and not the surrounding regions. We then apply a difference-of-Gaussian filter to the intensities to remove regional variations in background fluorescence, which could erroneously correlate low Scute intensity cells and their neighbors (SM). Having processed the data, for each time point we have a series of pairs \((u_i(t),v_i(t))\), where \(u_i(t)\) is the intensity of the \(i^{th}\) cell and \(v_i(t)\) the mean intensity of its neighbors. To compute the mutual information between \(u\) and \(v\), we apply the Kraskov–Stögbauer–Grassberger estimator [64] (SM), finding that the mutual information initially increases in time before decreasing, a behavior consistent across all experiments, Fig. 8D. Estimating mutual information from finite data is challenging, and data from uncorrelated variables can result in a non-zero point estimate. To confirm that our estimate of mutual information is statistically significant, we create a null data set by randomly reassigning the identity of each neighbor, essentially creating a set of pairs \((u_i(t), v_{\sigma(i)}(t))\) where \(\sigma\) is a random permutation. In this null data set the marginals are preserved, but \(u\) and \(v\) are now approximately independent. For much of the time when patterning is occurring, the mutual information estimate exceeds \(95\%\) of the estimates computed from the null data set, demonstrating that there is statistically significant information shared between a cell and its neighbors, Fig. 8D.
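A minimal sketch of this estimation procedure is shown below, assuming arrays u and v of cell and mean-neighbor intensities at a single time point; this illustrates the Kraskov–Stögbauer–Grassberger estimator and the permutation null, not the exact pipeline detailed in the SM.

```python
# Sketch: KSG nearest-neighbor estimate of I(u; v) and a permutation null.
# `u` and `v` are 1D arrays of cell and mean-neighbor intensities at one time
# point; k is the neighbor count. This is illustrative, not the SM pipeline.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(u, v, k=4):
    n = len(u)
    joint = np.column_stack([u, v])
    # distance to the k-th nearest neighbor in the joint space (max-norm);
    # the query returns the point itself as the zeroth neighbor
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    tree_u, tree_v = cKDTree(u[:, None]), cKDTree(v[:, None])
    n_u = np.array([len(tree_u.query_ball_point([x], r)) - 1
                    for x, r in zip(u, eps * (1 - 1e-10))])
    n_v = np.array([len(tree_v.query_ball_point([x], r)) - 1
                    for x, r in zip(v, eps * (1 - 1e-10))])
    return digamma(k) + digamma(n) - np.mean(digamma(n_u + 1) + digamma(n_v + 1))

def permutation_null(u, v, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    return np.array([ksg_mi(u, rng.permutation(v)) for _ in range(n_perm)])

# significance check: compare ksg_mi(u, v) to the 95th percentile of the null
```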
To explore the underlying principles of decentralized self-organization, we introduced a tractable model of a laterally inhibiting system and demonstrated that it is capable of reaching a target pattern starting from initially identical cells. Our analysis showed that a trade-off exists between patterning accuracy and time taken to pattern, resulting in a Pareto front of optimized solutions with varying error rates. By reframing the model as a stochastic reaction network, we were able to directly compute quantities such as the mutual information and transfer entropies between cell trajectories, revealing how information is transferred between cells. Having optimized for collective patterning speed and error rate, the solutions do not appear to optimize for the flow of information, nor do cells appear individually optimal when considered in isolation. Computing the mutual information between trajectories from data alone remains challenging, and so we explored more tractable information measures, such as the instantaneous mutual information, finding that these quantities may display counterintuitive behavior such as non-monotonicity in time. Finally, we computed such quantities in experimental measurements of Notch-Delta patterning and once again found that instantaneous quantities can be non-monotonic in time as patterning occurs, even as the total dynamic information shared between cells strictly increases in time.
In development, the starting point of patterning is often inhomogeneous. For example, in addition to Notch signaling, there is also a Delta pre-pattern in SOP formation in the Drosophila dorsal thorax [10], allowing for a more regular array of SOP cells. Rather than all cells being initially identical, we could consider them to have some initial pre-patterning containing some useful, but not perfect, information. This would still mean that cells need to coordinate to achieve a target pattern, but presumably they could do so faster than without the pre-pattern for the same accuracy requirement. How optimal strategies and information flow change as the information contained in the pre-pattern changes remains an open question for future study.
Another assumption of our model is that every cell has exactly the same rates as every other cell. In reality, cells are heterogeneous and will differ somewhat in their initial conditions as well as their parameters, and hence in their response to a signal [51], [65]. This initial heterogeneity need not represent a pre-pattern or contain useful information; it could be entirely stochastic. Even so, heterogeneity may well enhance the patterning ability of our system [9], by breaking the initial symmetry between cells. Another source of heterogeneity comes from the non-regular initial packing of cells, along with cell divisions, death, and rearrangement, so that different cells experience different, and dynamic, neighborhoods. It would be interesting to explore how varying amounts of heterogeneity change the optimal speed-accuracy trade-off curve.
Although our model reproduces behavior that is observed in real experimental systems, such as the scale of information transfer and non-monotonicity, it is not intended to be a detailed biological model. Relatedly, due to the complexity of gene regulatory networks, conceptual progress in understanding cell fate transitions has been made not by modeling every molecular event, but by using “gene free” phenomenological models [5], [10], [16]. Similarly, simple discrete models have proved powerful for exploring the concepts underlying cell fate patterning without explicitly modeling the complex underlying gene regulatory networks [66]–[68]. Here, we were able to gain insight into optimal patterning and information flow by studying a simplified phenomenological model. Suppose instead that we wished to precisely calculate information theoretic quantities in a specific biological system. As we have seen, some simple information measures, such as the instantaneous mutual information shared between cells, can be computed directly from data. These quantities can be insightful, but are necessarily an underestimate of the true trajectory mutual information. Estimating full trajectory quantities directly from experimental data remains impractical with existing estimators due to the high dimensionality of trajectories, although this remains an active area of research [69]. The most promising approach, then, seems to be to build a detailed model, fit it to data, and compute the information transfer within this model [37], [42]. While such computations have not yet been attempted in a detailed model with multiple cells and feedback, advances in computing transfer entropy rates [57] might render such computations possible, albeit numerically expensive.
Throughout, for computational simplicity, we bound all rates above by 1, which essentially constrains the fastest time scale in our system. Therefore, strategies which create a high-accuracy pattern at the cost of taking a long time have a time-scale separation between the patterning time scale and the time scale of the internal dynamics or signaling. In other words, accurate self-organized patterning requires transcriptional and signaling dynamics to occur on a faster time scale than patterning. For many developmental processes, this may well be a limiting constraint. The time scale of transcriptional and translational dynamics is on the order of tens of minutes [70], which would constrain accurate self-organized patterning to be on the order of hundreds of minutes, which is indeed a typical patterning time scale. Conversely, responding to an exogenous signal, such as a pre-existing morphogen gradient, need only take as long as the transcriptional and translational time scales. Perhaps one reason why pre-patterning and exogenous signals are so often used in combination with self-organization is not that purely self-organized approaches are incapable of achieving the desired accuracy, but that they take inordinately long compared to self-organization aided by pre-patterning.
We are grateful to François Schweisguth for sharing experimental data with us and to Minh Son Phan for image processing advice. We thank Henry Mattingly, Matthew Smart, and David Denberg for helpful discussions. A.T. acknowledges the support of the Summer@Simons program. The authors also acknowledge the MIT Office of Research Computing and Data for providing high performance computing resources that have contributed to the research results reported within this paper. This research received support through Schmidt Sciences, LLC (J.D.) and the MathWorks Professorship Fund (J.D.).
Without the ability to communicate, each cell can only adjust its internal state to reach either absorbing state, 0 or \(N\). The specific choice of transition rates affects the time it takes to reach an absorbing state, but whatever the choice of rates there will be a probability \(q\) of reaching \(0\) first and a probability \(1-q\) of reaching \(N\) first. Since each cell has to have the same transition rates, and without communication the cells are independent, the probability that exactly one cell reaches \(N\) and that two cells reach \(0\) is \(3q^2(1-q)\). Maximizing over \(q\), we find \(q=2/3\) and the probability is \(4/9\).
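Explicitly, the maximization step reads \[\frac{\mathrm{d}}{\mathrm{d}q}\left[3q^2(1-q)\right] = 6q - 9q^2 = 3q(2-3q) = 0 \;\Rightarrow\; q = \tfrac{2}{3}, \qquad 3\left(\tfrac{2}{3}\right)^2\left(1-\tfrac{2}{3}\right) = \tfrac{4}{9}.\]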
Consider a system of \(M\geq3\) cells and a set of good terminal states where for every good terminal state the number of cells in state \(N\), or \(n_d\), is some fixed number, \(0 < n_d < M\). It need not be the case that any terminal configuration with \(n_d\) cells in state \(N\) is a good state, only that all good states have \(n_d\) cells in state \(N\). Previously, \(M=3\) and every good terminal state had exactly one cell in state \(N\), or \(n_d = 1\). For this system, suppose there exists some set of parameters \(p\) such that the error rate is \(\epsilon\). In this case we can show that the average time taken to reach a terminal state will be at least \(M^{-2}(3M+3)^{-1}(1-\epsilon)\epsilon^{-(2N+2)^{-M}}\), demonstrating that precision comes at the cost of time. To prove this, we will need the following lemma.
Lemma 1. If there is a non-zero probability of reaching a good terminal state, there is a non-zero probability of reaching a bad terminal state.
Let \(\mathcal{X}\) be the set of self-sufficient single cell states: those from which the cell has a non-zero probability of reaching an absorbing state (\(0\) or \(N\)) without receiving any further signals. This could include states where the signal must turn off to reach an absorbing state. The absorbing states are trivially included in this set. We need not assume anything about the cell-cell adjacency matrix although it suffices to prove the lemma for a connected graph and apply the result to the disconnected components of a general adjacency matrix.
Suppose first that the initial state \((1,0)\) is itself in \(\mathcal{X}\). Then each cell can, with non-zero probability, reach an absorbing state from its initial state without ever receiving any signal. Hence there is a path with non-zero probability through which all cells reach the same absorbing state, which, since \(0 < n_d < M\), results in a bad terminal state.
Now suppose instead that \((1,0)\notin\mathcal{X}\). This means that every cell must receive a signal to progress to an absorbing state. We can trace the steps a cell takes to go from \((1,0)\) to when it first enters \(\mathcal{X}\). In doing so, it must be possible for a cell to reach a state \((w,0)\) where \(g(w)>0\) before the cell receives a signal; if not, then no cell could ever receive a signal. Also, there must be a self-sufficient state \((v,s)\in \mathcal{X}\), reachable from \((1,0)\), for which \(g(v)>0\). This is because a successful trajectory starts with no cells in \(\mathcal{X}\) and ends with all \(M\) cells in \(\mathcal{X}\). Since only one cell state changes at a time, there must be a point at which \(M-1\) cells occupy a state in \(\mathcal{X}\) and one does not. To be successful, this final cell needs a signal to enter \(\mathcal{X}\), which requires a non-zero \(g\) from one of the cells already in \(\mathcal{X}\).
Now consider the following finite sequence of steps:
Move cell \(1\) into the state \((w,0)\) with \(g(w)>0\).
Move the remaining cells, starting with the neighbors of cell 1, into the self-sufficient state \((v,s)\) with \(g(v)>0\).
Move cell \(1\) to \((N,s)\) (the exact signaling state does not matter).
Now move the remaining cells into the absorbing state \(N\) if it is accessible from \((v,s)\), else move them to \(0\).
If state \(N\) is accessible from \((v,s)\), or if \(n_d\neq1\), then this produces a path with non-zero probability that reaches a bad terminal state. If \(N\) is not accessible from \((v,s)\) and \(n_d=1\), then consider the following modified sequence of steps.
Move cell \(1\) into the state \((w,0)\) with \(g(w)>0\).
Move \(M-2\) of the remaining \(M-1\) cells, starting with the neighbors of cell 1, into the self-sufficient state \((v,s)\) with \(g(v)>0\).
Move the final cell to \((N,s)\) (the exact signaling state does not matter).
Move cell \(1\) to \((N,s)\) (the exact signaling state does not matter).
Now move the remaining cells into the absorbing state \(0\).
The final state has 2 cells in state \(N\) and hence this represents a path to a bad terminal state with non-zero probability.
Proposition 1. For an \(M\) cell system with error rate \(\epsilon < 1\), the average time \(\tau\) to reach a terminal state satisfies \(\tau \geq M^{-2}(3M+3)^{-1}(1-\epsilon)\epsilon^{-(2N+2)^{-M}}\).
One can interpret a CTMC as a Markov chain over the sequence of states visited, together with residence times drawn from exponential distributions. Specifically, we can write the probability of making a particular transition \(j\to i\), given that we are in state \(j\), as \(\mathbb{P}(j\to i |j) = W_{ij}/\sum_{k\neq j} W_{kj}\). Since, in our system, at most \(3M\) transitions are possible from any given state, and each transition rate is bounded by 1, we can conclude that \(\mathbb{P}(j\to i |j) \geq W_{ij} / 3M\). For a path \(\mathcal{P} = \{\alpha_1 \to \alpha_2 \to \dots \to \alpha_{l+1}\}\), we have that \[\begin{align} \mathbb{P}(\mathcal{P}|\alpha_1) &\geq \prod_{k=1}^{l} \left( \frac{W_{\alpha_{k+1} \alpha_{k}}}{3M} \right) \\ \notag &\geq \left[\min_{1\leq k \leq l} W_{\alpha_{k+1} \alpha_{k}}/3M \right]^{l}, \end{align}\] where, by considering solely the Markov chain path, we have effectively marginalized over the possible waiting times. If we take \(\alpha_1\) as the initial condition (\(\alpha^*\)) and \(\alpha_{l+1}\) as a bad terminal state, then \(\mathbb{P}(\mathcal{P}|\alpha_1) \leq \epsilon\), and hence \[3M \epsilon^{1/l} \geq \min_{1\leq k \leq l} W_{\alpha_{k+1}\alpha_k}.\]
Since \(\epsilon<1\), there is a non-zero probability of reaching a good terminal state, and so Lemma 1 tells us there is a non-zero probability of reaching a bad terminal state. Taking such a path to a bad terminal state, the smallest rate along it satisfies \(0 <W_{\alpha_{k+1}\alpha_k} \leq 3M\epsilon^{1/l}\). Without loss of generality, we can remove any loops in this path (places where \(\alpha_r = \alpha_q\), \(r\neq q\)), leaving a loop-free path with non-zero probability. Since a loop-free path visits each of the at most \((2N+2)^M\) states at most once, its length satisfies \(l\leq(2N+2)^M\), and because \(\epsilon<1\) this gives \(0 <W_{\alpha_{k+1}\alpha_k} \leq 3M\epsilon^{(2N+2)^{-M}}\) for the smallest rate along the path.
Now we progressively prune our rates. In particular, we formally set whatever parameter determines this smallest rate to zero. This parameter will correspond either to exactly one of the \(f^\pm, k^-\), or to a \(k^+\) acting together with the \(g\) values of the signaling neighbors. In the latter case, all the \(g\) values involved must be smaller than \(3M\epsilon^{(2N+2)^{-M}}\), and so we set them all to zero in the new system. If it is possible to reach a good terminal state in this new system, it is possible to reach a bad terminal state by Lemma 1, and hence, as before, there exists a loop-free path with non-zero probability whose smallest rate is at most \(3M\epsilon^{(2N+2)^{-M}}\). So, we prune again. We repeat this procedure until there are no possible paths to the good terminal state, which must happen eventually as there are only finitely many parameters. At this point, we can conclude that to reach the good terminal state, the original system must make a transition whose rate is “slow”. Typically this means the transition rate is at most \(3M\epsilon^{(2N+2)^{-M}}\), although in the case of a \(k^+\) it could be the combination of \(M-1\) small \(g\) values and is then at most \(3(M-1)M\epsilon^{(2N+2)^{-M}}\). The average time to reach a good terminal state, \(\tau_g\), satisfies \(\tau \geq (1-\epsilon)\tau_g\), and hence if we can bound \(\tau_g\) we can bound \(\tau\). To bound \(\tau_g\), we can ask how long it takes on average to make one of these slow transitions, given that at least one of them must be made to reach the good state. Since at most \(3M\) transitions are possible from any given state (of which at most \(M\) correspond to a \(k^\pm\)), even if the system were in a state where every possible transition was slow, the total rate at which a slow transition occurs would be at most \(M^2(3M+3)\epsilon^{(2N+2)^{-M}}\). Thus, this quantity bounds the rate at which a slow transition occurs, whatever state the system is in. Hence, the average time for such a transition to occur is at least \(M^{-2}(3M+3)^{-1}\epsilon^{-(2N+2)^{-M}}\), and in total \(\tau \geq M^{-2}(3M+3)^{-1}(1-\epsilon)\epsilon^{-(2N+2)^{-M}}\). Although not the tightest bound possible, this shows that a speed-accuracy tradeoff must exist and that \(\tau\to\infty\) as \(\epsilon\to0\).
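To make the scaling concrete, the sketch below evaluates this lower bound for illustrative values of \(M\) and \(N\) (the function name and parameter choices are ours, not values used in the paper). Because the exponent \((2N+2)^{-M}\) is so small, the bound diverges extremely slowly as \(\epsilon\to0\), consistent with the remark that it is far from tight.

```python
import numpy as np

def tau_lower_bound(M, N, eps):
    """Proposition 1: tau >= M^-2 (3M+3)^-1 (1 - eps) * eps^(-(2N+2)^-M)."""
    exponent = (2 * N + 2) ** (-M)
    return (1 - eps) * eps ** (-exponent) / (M**2 * (3 * M + 3))

M, N = 3, 4   # illustrative choices: M cells, maximum internal state N
for eps in [1e-2, 1e-6, 1e-12, 1e-24]:
    print(f"eps = {eps:.0e}:  tau >= {tau_lower_bound(M, N, eps):.5f}")
```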
Here we explicitly construct a solution that can achieve an arbitrarily small error rate. Motivated by the structure of numerically optimized solutions, let us take \(f^+(1,0) = \eta\), \(f^+(i,s) = 1\) for \(i>1\), and \(f^+(1,1) = 0\); \(f^-(i,0) = 0\) for all \(i\), \(f^-(1,1) = 1\), and \(f^-(i,1) = 0\) for \(i>1\); \(k^-=0\); \(g(1)=0\) and \(g(i)=1\) for \(i>1\); with \(\eta \ll 1\).
To compute the error rate, suppose that the first transition has occurred, so one cell has moved from \((1,0)\) to \((2,0)\). From this point it will reach \(N\) and contribute a constant signal \(g=1\) to its neighbors; precisely when this happens, or whether it receives a signal along the way, is irrelevant for the error rate. Each of the other two cells then faces a choice: it can transition to \((2,0)\) with probability \(\eta/(1+\eta)\) or to \((1,1)\) with probability \(1/(1+\eta)\). A transition to \((2,0)\) guarantees the system will reach a bad terminal state, whereas a transition to \((1,1)\) guarantees that the cell will eventually reach \((0,1)\) without ever signaling to its neighbors. Thus, if one of these cells does take the path to \((0,1)\), then for the system to reach the good terminal state the remaining cell must also transition to \((0,1)\), which again occurs with probability \(1/(1+\eta)\). Hence the probability of failure is \(1 - 1/(1+\eta)^2 \approx 2\eta\).
The expected time to the first transition is \(1/(3\eta)\). After this, each cell can make at most \(N-1\) further transitions (for \(N>2\)), each with a mean waiting time of at most \(1\). For fixed \(N\) this gives \(\tau = 1/(3\eta) + O(1)\), or \(\tau = 2/(3\epsilon) + O(1)\). This is not asymptotically optimal; the strategy can be improved by choosing \(f^\pm(i,1)\) more carefully.
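The failure probability above follows from two independent races, each between a cell's own up-transition (rate \(\eta\)) and its incoming signal switching on (rate \(1\)). The minimal Monte Carlo sketch below (illustrative only; it simulates just these races rather than the full CTMC) reproduces the \(\approx 2\eta\) error rate.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, trials = 1e-3, 200_000

# After the first cell commits to (2,0), each of the other two cells independently
# races its own up-transition (rate eta) against its incoming signal turning on (rate 1).
# Patterning fails if either up-transition wins its race.
t_up = rng.exponential(1 / eta, size=(trials, 2))   # waiting times for (1,0) -> (2,0)
t_sig = rng.exponential(1.0, size=(trials, 2))      # waiting times for the signal to switch on
failure = (t_up < t_sig).any(axis=1).mean()

print(f"simulated failure rate   : {failure:.4f}")
print(f"predicted 1 - 1/(1+eta)^2: {1 - 1 / (1 + eta) ** 2:.4f}")  # approximately 2*eta
```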
In this section, we will show that the full mutual information between a pair of trajectories in the system in Appendix 11 is unbounded. Let us consider \(I(X_0^T;Y_0^T)\) with \(X=u_1(t)\) and \(Y=u_2(t)\). Using the chain rule for mutual information we have that \[\begin{align} I(X_0^T;Y_0^T,Z) &= I(X_0^T;Y_0^T) + I(X_0^T;Z|Y_0^T), \\\notag &= I(X_0^T;Z) + I(X_0^T;Y_0^T|Z), \end{align}\] for any random variable \(Z\). If \(Z\) has a finite state space of size \(N_Z\), with \(N_Z\) independent of \(\eta\), then the entropy of \(Z\), and the entropy of \(Z\) conditioned on any other variable, is bounded above by \(\log N_Z\), and hence any mutual information term between \(Z\) and another variable is similarly bounded by \(\log N_Z\). Hence, we can determine that as \(\epsilon \to 0\), \[\begin{align} \notag I(X_0^T;Y_0^T) \text{ bounded } \iff& I(X_0^T;Y_0^T,Z) \text{ bounded } \\\notag \iff& I(X_0^T,Z;Y_0^T,Z) \text{ bounded } \\ \iff& I(X_0^T;Y_0^T|Z) \text{ bounded. } \end{align}\] For the transition rates in Appendix 11, the set of possible paths (sans waiting times) from the initial condition to the terminal state is finite, so we set \(Z\) to be the random variable specifying which path is taken. Further, let us take the path where \(u_1\) first transitions from \(1\to2\), followed by \(s_2\) transitioning from \(0\to1\), and then \(u_2\) transitioning from \(1\to 0\). This particular path has an \(O(1)\) probability of occurring, and so if the mutual information conditioned on this path diverges, then the overall mutual information diverges. Further, by the data processing inequality, applying any coarse-graining function to the trajectories \(X_0^T\) and \(Y_0^T\) can only decrease the mutual information. Thus, if we reduce the trajectory \(X_0^T\) to the first time at which \(u_1\) changes, and similarly \(Y_0^T\) to the first time at which \(u_2\) changes, we are left computing the mutual information \(I(\tau_1;\hat{\tau})\), with \(\hat{\tau} = \tau_1+\tau_2+\tau_3\), where the \(\tau_i\)’s are drawn from exponential distributions with rates \(3\eta\) for \(\tau_1\), \(3+2\eta\) for \(\tau_2\), and \(3 + \eta\) for \(\tau_3\) (even after conditioning on a path, each waiting time distribution is unchanged and reflects the sum of possible transition rates before conditioning). Decomposing the mutual information, \[\begin{align} I(\tau_1;\hat{\tau}) &= H(\hat{\tau}) - H(\hat{\tau}|\tau_1) \\\notag &= H(\hat{\tau}) - H(\tau_2+\tau_3) \\ \notag &\geq H(\tau_1) - H(\tau_2+\tau_3) \\ \notag &= O(\log1/\eta), \end{align}\] using the facts that \(\tau_1\) is independent of \(\tau_2+\tau_3\), that \(H(X+Y)\geq \max \{H(X),H(Y)\}\) for independent \(X\) and \(Y\), and that if \(X\sim \text{Exponential}(\lambda)\) then \(H(X) = 1 - \log \lambda\): the first term \(H(\tau_1) = 1 - \log(3\eta)\) grows as \(\log 1/\eta\), while \(H(\tau_2+\tau_3)\) is bounded by a constant independent of \(\eta\) since \(\tau_2\) and \(\tau_3\) have \(O(1)\) rates. Thus the mutual information diverges at least as fast as \(O(\log 1/\eta)\).
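As a numerical illustration of this divergence, the sketch below samples the three waiting times with the rates quoted above and estimates \(I(\tau_1;\hat{\tau})\) with a k-nearest-neighbour mutual-information estimator (scikit-learn is used here purely for illustration; it is not part of the paper's analysis). The estimate grows roughly like \(\log(1/\eta)\), as the bound predicts.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
n = 20_000

for eta in [1e-1, 1e-2, 1e-3]:
    # Waiting times along the conditioned path, with the rates given in the text.
    tau1 = rng.exponential(1 / (3 * eta), n)
    tau_hat = tau1 + rng.exponential(1 / (3 + 2 * eta), n) + rng.exponential(1 / (3 + eta), n)

    # k-NN (KSG-style) estimate of I(tau1; tau_hat), in nats.
    mi = mutual_info_regression(tau1.reshape(-1, 1), tau_hat, n_neighbors=5, random_state=0)[0]
    print(f"eta = {eta:.0e}:  I estimate = {mi:.2f} nats,  log(1/eta) = {np.log(1 / eta):.2f}")
```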
Intuitively, in this system cell 1 can make its first transition at any point within an \(O(1/\eta)\) time window. If you observe a transition \(1\to0\) in cell \(2\) at time \(T\), you know that, with \(O(1)\) probability, cell 1 transitioned within an \(O(1)\) window of \(T\), narrowing an \(O(1/\eta)\) uncertainty down to \(O(1)\) and thus gaining a significant amount of information. However, this intuition also suggests a strategy that has a finite amount of mutual information while still achieving an arbitrarily small error rate. If we keep \(g(i)=1\) for \(i>1\) but set \(f^-(1,1) = \eta\) and \(f^+(i,s)=\eta\) for \(i>1\), the error rate is unaffected. However, observing a transition \(1\to0\) in cell \(2\) no longer pins down the time at which another cell transitioned, since the cell 2 transition now occurs a time \(O(1/\eta)\) after the first transition. If we instead took the information \(I( (u_1,s_1) ; (u_2,s_2))\), this would again diverge: in some sense this new system still carries divergent mutual information, it is just not stored in the internal state. We suspect that the mutual information \(I( (u_1,s_1) ; (u_2,s_2))\) always diverges as the error rate goes to zero.
In Section 4, we showed that successful patterning depends on asymmetric, non-trivial communication among cells. While Fig. 4 depicts transfer between inhibitor and inhibited cells, Fig. 9 illustrates communication among the inhibited cells themselves. Unlike the asymmetric flow involving inhibitor cells, the exchange here is symmetric, reflecting the equivalence of inhibited cells.
dskinner@flatironinstitute.org