October 01, 2025
Spin glasses are models of statistical mechanics in which a large number of simple elements interact with one another in a disordered fashion. One of the fundamental results of the theory is the Parisi formula, which identifies the limit of the free energy of a large class of such models. Yet many interesting models remain out of reach of the classical theory, and direct generalizations of the Parisi formula yield invalid predictions. I will report here on some partial progress towards the resolution of this problem, which also brings a new perspective on classical results.
The aim of statistical mechanics is to describe the emergent properties of systems that are made of a large number of simple elements. Spin glasses are models of this kind, in which there is a lot of “disagreement” between the elementary units of the system. Mathematically, this is usually modeled by introducing randomness into the interactions between the elements. In the first section of this note, we make this concrete by presenting a basic spin glass called the Sherrington-Kirkpatrick (SK) model. We define the free energy of the model, and state a fundamental result, called the Parisi formula, that identifies the asymptotic behavior of the free energy in the limit of large system size. We also discuss surprising aspects of this formula, and in Section 2, we present a more recent alternative formulation. In Section 3, some variants of the SK model are introduced for which the limit free energy is currently not known. This is the fundamental problem that has driven most of my work on this topic. In Section 4, a point of view based on partial differential equations is introduced that allows us to formulate a natural conjecture for the limit free energy of these models. Partial results consistent with this conjecture are also presented. In Section 5, we discuss a promising connection between this point of view based on partial differential equations and the alternative representation of the Parisi formula that appeared in Section 2. The note ends with a short concluding section.
We start by introducing a basic spin glass called the Sherrington-Kirkpatrick (SK) model [2]. We give ourselves independent Gaussian random variables \((W_{i,j})_{i,j \geqslant 1}\) of zero mean and unit variance, and for every \(\sigma \in \mathbb{R}^N\), we set \[\label{e46def46HN} H_N(\sigma) := \frac{1}{\sqrt{N}} \sum_{i,j = 1}^N W_{i,j} \sigma_i \sigma_j.\tag{1}\] We are interested in questions such as: what is the limit of \[\label{e46def46max} \frac{1}{N} \max_{\sigma \in \{-1,1\}^N} H_N(\sigma)\tag{2}\] as \(N\) tends to infinity? This problem is often motivated with the following story. There are \(N\) individuals \(\{1,\ldots, N\}\) that need to be split into two groups. We can represent one such splitting using a vector \(\sigma \in \{-1,1\}^N\), with the understanding that \(\sigma_i\) indicates the group to which individual \(i\) is assigned. The coefficients \(W_{ij}\) represent how much individual \(i\) likes individual \(j\), and we would like to find an assignment \(\sigma\) that maximizes global welfare, that is, a maximizer in \(\{-1,1\}^N\) of \[\label{e46alt46max} \sigma \mapsto \sum_{i,j = 1}^N W_{i,j} \mathbf{1}_{\{\sigma_i = \sigma_j\}}.\tag{3}\] For \(\sigma \in \{-1,1\}^N\), we have \(\sigma_i \sigma_j = 2\mathbf{1}_{\{\sigma_i = \sigma_j\}} - 1\), so the maximization of 3 is essentially equivalent to that in 2 , up to an affine transformation.
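The equivalence between the maximizations in 2 and 3 is easy to check by brute force for small \(N\). The following sketch (plain Python, not from the original text; names and sizes are illustrative) enumerates all \(2^N\) configurations for a single draw of the disorder:

```python
import itertools
import math
import random

rng = random.Random(0)
N = 8

# independent standard Gaussian couplings W_{i,j}
W = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]

def H(sigma):
    # H_N(sigma) = N^{-1/2} sum_{i,j} W_{i,j} sigma_i sigma_j, as in (1)
    return sum(W[i][j] * sigma[i] * sigma[j]
               for i in range(N) for j in range(N)) / math.sqrt(N)

def welfare(sigma):
    # objective in (3): sum of W_{i,j} over pairs assigned to the same group
    return sum(W[i][j] for i in range(N) for j in range(N)
               if sigma[i] == sigma[j])

configs = list(itertools.product([-1, 1], repeat=N))
best_H = max(configs, key=H)
best_w = max(configs, key=welfare)
# sigma_i sigma_j = 2 * 1{sigma_i = sigma_j} - 1, so the two objectives
# differ by an affine transformation and share their maximizers
# (up to the global flip sigma -> -sigma).
```

Since \(\sum_{i,j} W_{i,j} \mathbf{1}_{\{\sigma_i = \sigma_j\}} = \tfrac12\big(\sqrt{N}\, H_N(\sigma) + \sum_{i,j} W_{i,j}\big)\), the two argmax sets coincide, as the brute-force search confirms.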
Finding a configuration \(\sigma \in \{-1,1\}^N\) that maximizes \(H_N(\sigma)\) is a non-trivial task. Even with \(N = 3\), we see in situations such as that depicted in Figure 1 that it will typically not be possible to find a configuration \(\sigma\) such that for every \(i\) and \(j\), we have \((W_{i,j} + W_{j,i}) \sigma_i \sigma_j \geqslant 0\). In other words, certain pairs will be frustrated: two individuals \(i\) and \(j\) may be assigned to different groups even though they would rather be together, or vice versa. The presence of these frustrations is the key signature of spin glasses.
More fundamentally, one can show that the problem, given the coefficients \((W_{ij})\), of finding a configuration \(\sigma \in \{\pm 1\}^N\) that maximizes \(H_N\), is NP-hard in general. In fact, the problem is NP-hard even if we only aim to find a configuration \(\sigma \in \{\pm 1\}^N\) such that \(H_N(\sigma)\) is at least a fixed positive fraction of the maximal value, no matter how small we allow the fraction to be [3]. But here we depart from such worst-case analysis, and focus instead on “typical” choices of the coefficients \((W_{ij})\), by postulating that they are chosen randomly.
Standard concentration inequalities allow us to show that the maximum in 2 deviates only slightly from its expectation in the limit of large \(N\), so we may as well focus on studying its expectation. In addition to the expectation of 2 , it is natural to also consider, for each \(\beta \geqslant 0\), the quantity \[\label{e46def46FN} F_N(\beta) := \frac{1}{N} \mathbb{E}\log \bigg(\frac{1}{2^N}\sum_{\sigma\in \{-1,1\}^N} \exp (\beta H_N(\sigma))\bigg).\tag{4}\] There are several reasons for this. From the point of view of statistical mechanics, this quantity is closely related to the Gibbs measure associated with \(H_N(\sigma)\), which is the probability measure that attributes a probability proportional to \(\exp(\beta H_N(\sigma))\) to each configuration \(\sigma \in \{-1,1\}^N\). (See [4] for some motivations behind the concept of Gibbs measures.) Mathematically, the free energy can also be seen as a sort of Laplace transform of the quantity of interest, and contains much information about the geometry of \(H_N\) and the structure of the Gibbs measure. It is often more convenient to work with, and if we so wish, we can a posteriori recover information about the maximum in 2 by considering \(F_N(\beta)/\beta\) for large \(\beta\). The normalization of \(H_N\) has been chosen so that \(\max_{\{-1,1\}^N} H_N\) is of order \(N\). The free energy thus allows us to interpolate between a large-\(\beta\) regime in which the sum in 4 is dominated by the contribution of the configurations \(\sigma\) for which \(H_N(\sigma)\) is large (“energy dominates”), and a small-\(\beta\) regime in which the very large number of configurations with relatively small \(H_N(\sigma)\) provide the dominant contribution (“entropy dominates”).
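To get a concrete feel for 4 , one can estimate \(F_N(\beta)\) for small \(N\) by enumerating the \(2^N\) configurations and averaging over Monte Carlo draws of the disorder. The sketch below does this (sample sizes are illustrative, not from the original text); by Jensen's inequality, the estimate should lie below the annealed bound \(\beta^2/2\), up to sampling error.

```python
import itertools
import math
import random

def free_energy(N, beta, n_disorder, rng):
    # estimate F_N(beta): disorder average of (1/N) log of the
    # normalized partition sum in (4)
    configs = list(itertools.product([-1, 1], repeat=N))
    total = 0.0
    for _ in range(n_disorder):
        W = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
        def H(sigma):
            return sum(W[i][j] * sigma[i] * sigma[j]
                       for i in range(N) for j in range(N)) / math.sqrt(N)
        Z = sum(math.exp(beta * H(s)) for s in configs) / len(configs)
        total += math.log(Z) / N
    return total / n_disorder

rng = random.Random(1)
F = free_energy(6, 0.5, 300, rng)
# since Var(H_N(sigma)) = N, Jensen's inequality (E log <= log E) gives
# the annealed upper bound F_N(beta) <= beta^2 / 2
```

At \(\beta = 0\) the estimate is exactly zero (every configuration contributes \(\exp(0) = 1\)), consistent with the entropy-dominated end of the interpolation.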
The problem of identifying the large-\(N\) limit of the free energy \(F_N(\beta)\) turns out to be surprisingly rich and difficult. An initial guess for this limit was proposed in the original paper [2] that introduced the model, but it was already understood there that the proposed answer could not be valid for large values of \(\beta\). In 1979, Giorgio Parisi then came up with a sophisticated non-rigorous procedure, called the replica method, that led to what is now called the Parisi formula for this limit [5]–[8] (see also [1], [9] for more on the replica method). After many years of effort, Francesco Guerra and Michel Talagrand managed to prove the Parisi formula rigorously [10], [11] in 2003. A more conceptual proof, centered around the fact that the associated Gibbs measure is asymptotically ultrametric, was then developed by Dmitry Panchenko, and generalized to a broader class of models [12], [13]. The Parisi formula takes the following form.
Theorem 1 (Parisi formula [10]–[13]). For every \(\beta \geqslant 0\), we have \[\label{e46parisi} f(\beta) := \lim_{N\to+\infty}F_N(\beta)=\inf_{\mu \in \mathcal{P}([0,1])}\left( \Phi_\mu(0,0) -{\beta^2}\int_0^1 t\mu([0,t])\mathrm{d}t\right),\tag{5}\] where \(\mathcal{P}([0,1])\) denotes the space of probability measures on \([0,1]\), and \(\Phi_\mu :[0,1]\times \mathbb{R}\to \mathbb{R}\) is the solution to \[\label{e46parisi46pde46dis} \begin{cases} -\partial_t\Phi_\mu(t,x)={\beta^2}\Big(\partial_x^2\Phi_\mu(t,x)+\mu([0,t])\big(\partial_x\Phi_\mu(t,x)\big)^2\Big) & \text{for } (t,x) \in [0,1]\times \mathbb{R},\\ \Phi_\mu(1,x)=\log\cosh(x)& \text{for } x\in \mathbb{R}. \end{cases}\tag{6}\]
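To make 6 more concrete, consider the simplest choice \(\mu = \delta_0\), for which \(\mu([0,t]) = 1\) for every \(t\). The Cole-Hopf substitution \(u = e^{\Phi_\mu}\) then turns 6 into a backward heat equation, and one finds in closed form that \(\Phi_{\delta_0}(0,0) = \log \mathbb{E}\cosh(\beta \sqrt{2}\, Z) = \beta^2\), for \(Z\) a standard Gaussian. The sketch below (an explicit finite-difference scheme; grid parameters are illustrative and not from the original text) recovers this value numerically.

```python
import math

beta = 0.5
L, dx = 8.0, 0.05           # truncate the x-axis to [-L, L]
nx = int(2 * L / dx) + 1
xs = [-L + i * dx for i in range(nx)]
dt, nt = 0.004, 250         # nt * dt = 1.0; CFL: dt * beta^2 / dx^2 = 0.4 < 1/2

# terminal condition Phi(1, x) = log cosh(x)
phi = [math.log(math.cosh(x)) for x in xs]

# march backward in time: Phi(t - dt) = Phi(t) + dt * beta^2 * (Phi_xx + m * Phi_x^2),
# with m = mu([0, t]) = 1 for mu = delta_0
for _ in range(nt):
    new = phi[:]
    for i in range(1, nx - 1):
        lap = (phi[i + 1] - 2 * phi[i] + phi[i - 1]) / dx**2
        grad = (phi[i + 1] - phi[i - 1]) / (2 * dx)
        new[i] = phi[i] + dt * beta**2 * (lap + grad**2)
    # at x = +-L, Phi ~ |x| - log 2, so Phi_xx + Phi_x^2 ~ 1 there
    new[0] = phi[0] + dt * beta**2
    new[-1] = phi[-1] + dt * beta**2
    phi = new

phi00 = phi[nx // 2]  # numerical Phi(0, 0); exact value is beta^2 here
```

For a general \(\mu\), the coefficient \(\mu([0,t])\) varies with \(t\) and no closed form is available, but the same marching scheme applies with \(m\) replaced by a time-dependent coefficient.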
The formula 5 came as a surprise, and its validity was initially controversial. Compared to more classical problems of statistical mechanics, several aspects stand out. The first is simply the complexity of the formula, as the optimization variable is a probability measure, while for more classical models it usually ranges in a finite-dimensional space. A second and perhaps more fundamental surprise is that the limit free energy is expressed as an infimum, rather than a supremum. Indeed, even before passing to the limit \(N \to +\infty\), the free energy of essentially any system can be written as a supremum of a functional involving intuitive energy and entropy terms. To be precise, for every probability measure \(\mu\) over a measure space \(E\) and every bounded measurable function \(g : E \to \mathbb{R}\), we have \[\label{e46gibbs46var} \log \int e^g \, \mathrm{d}\mu = \sup_{\nu \in \mathcal{P}(E)} \left( \int g \, \mathrm{d}\nu - \mathsf H(\nu \, | \, \mu) \right),\tag{7}\] where \(\mathsf H(\nu \, | \, \mu)\) stands for the relative entropy of \(\nu\) with respect to \(\mu\), which is \(+\infty\) if \(\nu\) is not absolutely continuous with respect to \(\mu\), and is otherwise given by \[\mathsf H(\nu \, | \, \mu) := \int \frac{\mathrm{d}\nu}{\mathrm{d}\mu} \log \left( \frac{\mathrm{d}\nu}{\mathrm{d}\mu} \right) \, \mathrm{d}\mu.\] The optimum in 7 is achieved by the corresponding Gibbs measure, that is, the probability measure \(\nu^*\) whose Radon-Nikodym derivative with respect to \(\mu\) is proportional to \(e^g\). The first term on the right side of 7 expresses the average of \(g\) under the measure \(\nu^*\), while the second term expresses the cost for samples from \(\mu\) to look like samples from \(\nu^*\). In our case, we think of \(g\) as being \(\beta H_N\) and of \(\mu\) as being the uniform measure on \(\{-1,1\}^N\). 
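The identity 7 can be checked numerically on a small finite set \(E\): the Gibbs measure attains the supremum exactly, and every other probability measure gives a smaller value. A minimal sketch (all particular values are illustrative):

```python
import math
import random

rng = random.Random(1)
n = 5                        # size of the finite set E
g = [rng.uniform(-1.0, 1.0) for _ in range(n)]
mu = [1.0 / n] * n           # reference measure: uniform on E

# left side of (7)
lhs = math.log(sum(math.exp(gi) * mi for gi, mi in zip(g, mu)))

def functional(nu):
    # integral of g d(nu) minus the relative entropy H(nu | mu)
    mean_g = sum(gi * ni for gi, ni in zip(g, nu))
    rel_ent = sum(ni * math.log(ni / mi)
                  for ni, mi in zip(nu, mu) if ni > 0)
    return mean_g - rel_ent

# the Gibbs measure nu* has density proportional to e^g with respect to mu
Z = sum(math.exp(gi) * mi for gi, mi in zip(g, mu))
nu_star = [math.exp(gi) * mi / Z for gi, mi in zip(g, mu)]

def random_measure():
    # an arbitrary competitor probability measure on E
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]
```

Plugging \(\nu^*\) into the functional gives back the left side exactly, since \(\log(\mathrm{d}\nu^*/\mathrm{d}\mu) = g - \log Z\).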
For finite \(N\), the representation coming from 7 is complicated since \(g\) is random and we need to maximize over all probability measures over \(\{-1,1\}^N\), but one could a priori hope that some simplifications occur in the limit of large \(N\), so that we ultimately end up with a simple formula for the limit free energy of the form \[\label{e46fbeta46rep} f(\beta) = \sup_{(e,s) \in I} (\beta e - s),\tag{8}\] for some explicit set \(I \subseteq\mathbb{R}^2\). In simpler systems of statistical mechanics, one can often obtain a representation of the form in 8 by identifying all pairs \((e,s)\) such that, roughly speaking, \[\label{e46energy-entropy} \frac{1}{N} \log \bigg(2^{-N} \sum_{\sigma \in \{-1,1\}^N} \mathbf{1}_{\{ H_N(\sigma) \simeq N e \}}\bigg) \simeq -s \, ;\tag{9}\] or at a minimum, one can deduce legible information in the spirit of 9 from the identification of the limit free energy. For spin glasses however, although we know that a representation of the form 8 exists simply because \(f\) is convex, I am not aware of a reasonably direct and concrete way to describe the set \(I\), in other words the set of energy-entropy pairs that are achievable by the system. This is related to the fact that, even though the convexity of \(f\) is easily checked as \(F_N\) itself is convex (e.g. by 7 ), it is not at all clear how to verify this convexity property directly from the limit expression given by Theorem 1.
Since maximization problems are much more standard representations of free energies, one may call the variational problem in 5 an “inverted” variational representation.
An “un-inverted” representation of the limit free energy of the SK model, that is, one that takes the form of a supremum, was recently found. Recall that we denote by \(f(\beta)\) the large-\(N\) limit of the free energy \(F_N(\beta)\) defined in 4 .
Theorem 2 (Un-inverted Parisi formula [14]). Let \((B_t)_{t \geqslant 0}\) denote a Brownian motion defined on some filtered probability space \((\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geqslant 0}, \mathbf{P})\), and let \(\mathbf{Mart}\) denote the space of bounded martingales on \(\Omega\). For every \(\beta \geqslant 0\), we have \[\label{e46uninverted} f(\beta) = \sup_{\alpha\in \mathbf{Mart}} \bigg\{ \beta\sqrt{2}\mathbf{E}[\alpha_1 B_1]- \mathbf{E}[\phi^*(\alpha_1)] - {\beta^2} \sup_{t \in [0,1]} \int_t^1 (s - \mathbf{E}[\alpha_s^2]) \, \mathrm{d}s \bigg\} ,\tag{10}\] where for every \(\lambda \in \mathbb{R}\), we set \[\label{e46def46phi42} \phi^*(\lambda) = \left| \begin{array}{ll} \frac{1}{2} \left[(1+\lambda)\log(1+\lambda) + (1-\lambda)\log(1-\lambda)\right] & \text{ if } |\lambda| \leqslant 1, \\ +\infty & \text{ otherwise}. \end{array} \right.\tag{11}\]
The representation in 10 was obtained by manipulating the Parisi formula from Theorem 1, using the fact from [15] that the functional inside the infimum in 5 is convex, together with duality arguments. The connection between this representation and the finite-\(N\) system remains to be discovered. There are however some indications for how this connection might emerge. For instance, the quantity \(\mathbf{E}[\phi^*(\alpha_1)]\) resembles a relative entropy as in 7 . Indeed, denoting by \[\mathsf{Ber}(m) := \frac{1+m}{2}\delta_1 + \frac{1-m}{2}\delta_{-1}\] the law of a random variable taking values in \(\{-1,1\}\) with mean \(m\), we have that \[\frac{1}{N} \mathsf H \left( \bigotimes_{i = 1}^N \mathsf{Ber}(m_i) \, | \, (\mathsf{Ber}(0))^{\otimes N} \right) = \frac{1}{N} \sum_{i = 1}^N \phi^*(m_i),\] and this quantity can be rewritten as \(\mathbf{E}[\phi^*(\alpha_1)]\) provided that the law of \(\alpha_1\) is \(\frac{1}{N} \sum_{i = 1}^N \delta_{m_i}\). We recall that for our model, we think of the formula 7 with \(\mu\) chosen to be the uniform measure over \(\{-1,1\}^N\), which is \((\mathsf{Ber}(0))^{\otimes N}\).
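The displayed relative-entropy identity can be verified directly for small \(N\) by enumerating \(\{-1,1\}^N\); the sketch below does so for arbitrary illustrative means \(m_i\) (not taken from the original text).

```python
import itertools
import math

def phi_star(lam):
    # the convex function in (11); at |lam| = 1 the value is the limit log 2
    if abs(lam) > 1:
        return float("inf")
    if abs(lam) == 1:
        return math.log(2.0)
    return 0.5 * ((1 + lam) * math.log(1 + lam)
                  + (1 - lam) * math.log(1 - lam))

m = [0.3, -0.5, 0.8]  # means of the Bernoulli factors (illustrative)
N = len(m)

def prob(sigma):
    # density of the product measure Ber(m_1) x ... x Ber(m_N) at sigma
    out = 1.0
    for mi, si in zip(m, sigma):
        out *= (1 + mi) / 2 if si == 1 else (1 - mi) / 2
    return out

# relative entropy with respect to the uniform measure (Ber(0))^{otimes N},
# which has density 2^{-N} at every configuration
rel_ent = sum(p * math.log(p * 2**N)
              for p in map(prob, itertools.product([-1, 1], repeat=N)))
# dividing both sides by N gives the identity displayed in the text
```

The identity reduces, factor by factor, to \(\mathsf H(\mathsf{Ber}(m) \, | \, \mathsf{Ber}(0)) = \phi^*(m)\) for a single spin.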
In truth, the Gibbs measure (which is the optimizer of 7 ) is not a product measure of the form \(\bigotimes_{i = 1}^N \mathsf{Ber}(m_i)\), and the last term in 10 accounts for a correction. For specialists, the structure of the formula in 10 is likely to evoke the finite-\(N\) representation of the free energy first introduced by Thouless, Anderson and Palmer [16] and further developed in [17]–[20], where a similar correction term appears.
To give further substance to the claim that there should be a direct way to understand the emergence of the variational formula in 10 from the finite-\(N\) system, I would like to make a detour to questions that concern optimization algorithms; a more detailed overview of these developments is in [21]. First, by taking the large-\(\beta\) limit of \(f(\beta)/\beta\), one can show that \[\begin{gather} \label{e46uninverted46max} \lim_{ N \to +\infty} \frac{1}{N} \mathbb{E}\max_{\sigma \in \{-1,1\}^N} H_N(\sigma) \\ = \sup_{\alpha \in \mathbf{Mart}} \left\{\sqrt{2}\mathbf{E}[\alpha_1 B_1] \;: \;|\alpha_1| \leqslant 1 \text{ and } \forall t \in [0,1], \;\int_t^1 (\mathbf{E}[\alpha_s^2] -s) \, \mathrm{d}s \geqslant 0 \right\}. \end{gather}\tag{12}\] In a series of recent works, an algorithmic threshold \(\mathsf{ALG}\) was identified such that the following holds. On the one hand, there exists an efficient algorithm that, given the coefficients \((W_{ij})\), returns a configuration \(\sigma\) such that \(H_N(\sigma)/N\) is \(\mathsf{ALG}(1+o(1))\) with probability tending to \(1\) as \(N\) tends to infinity [22]–[27]. On the other hand, no matter how small \(\varepsilon> 0\) is chosen, with probability tending to \(1\) as \(N\) tends to infinity, an algorithm that is Lipschitz continuous with respect to the input weights \((W_{ij})\) will not be able to output a configuration \(\sigma\) such that \(H_N(\sigma)/N\) exceeds \(\mathsf{ALG} - \varepsilon\) [28]–[31]. This threshold value \(\mathsf{ALG}\) can be written as \[\label{e46alg46formula} \mathsf{ALG} = \sup_{\alpha \in \mathbf{Mart}} \left\{\sqrt{2}\mathbf{E}[\alpha_1 B_1] \;: \;|\alpha_1| \leqslant 1 \text{ and } \forall t \in [0,1], \; \mathbf{E}[\alpha_t^2] = t \right\}.\tag{13}\] One point I find very interesting is that in this case, we have a clear connection between the variational formula 13 and the spin-glass system at finite \(N\). 
Indeed, for essentially every choice of martingale \(\alpha \in \mathbf{Mart}\) that satisfies the constraints \(|\alpha_1| \leqslant 1\) and \(\forall t,\, \mathbf{E}[\alpha_t^2] = t\), one can construct an algorithm that outputs a configuration \(\sigma\) such that \(H_N(\sigma)/N\) is approximately \(\sqrt{2}\mathbf{E}[\alpha_1 B_1]\). The algorithm iteratively updates a point in \(\mathbb{R}^N\), with small increments that we can here approximate by a continuous evolution \(m : [0,1] \to \mathbb{R}^N\), and for each \(t \in [0,1]\), the empirical measure of the coordinates \(\frac{1}{N} \sum_{i = 1}^N \delta_{m_i(t)}\) converges weakly to the law of \(\alpha_t\).
In short, I am particularly interested in, and have the impression that we can make progress on, the following question:
Can we interpret the variational formula in 10 in terms of finite-\(N\) constructs?
Part of my interest in this question is that I think that it has the potential to give us a new way to understand and study spin glasses. Also, once we have a satisfactory answer to this question at a heuristic level, perhaps we could devise a new proof at least of the inequality stating that, for every \(\alpha \in \mathbf{Mart}\), \[\label{e46desired46ineq} f(\beta) \geqslant\beta\sqrt{2}\mathbf{E}[\alpha_1 B_1]- \mathbf{E}[\phi^*(\alpha_1)] - {\beta^2} \sup_{t \in [0,1]} \int_t^1 (s - \mathbf{E}[\alpha_s^2]) \, \mathrm{d}s.\tag{14}\] As will be explained, having a direct proof of the inequality 14 that does not rely on the Parisi formula would be tremendously useful for resolving the main open problem discussed in the next section.
Building on the insights provided by the Parisi formula and its proof, along with subsequent developments, we now have a much deeper understanding of the Sherrington-Kirkpatrick model and the structure of its Gibbs measures. Moreover, the ideas first developed for the SK model have proven remarkably fruitful in a wide range of other contexts that exhibit similar kinds of “frustration”, ranging from statistics and high-dimensional geometry to computer science and combinatorics. Examples include random constraint satisfaction problems [32]–[37], community detection and related large-scale statistical learning problems [38], [39], [4], error-correcting codes in information theory [40], and classical combinatorial problems such as graph coloring [41]–[43].
This being said, and perhaps surprisingly, some models that seem like modest generalizations of the SK model still resist analysis. One such example can be constructed as follows. In the definition of \(H_N\) for the SK model in 1 , we sum over all pairs \((i,j)\). For the new model we consider now, we imagine that the indices are organized over two layers as in Figure 2, and we only sum over pairs of indices that belong to different layers. Such models are related to several classical models of artificial neural networks, such as the Hopfield model [44]–[50] and restricted Boltzmann machines [51]–[56].
To formalize the model precisely, we can first give ourselves, for each integer \(N\), two integers \(N_1(N)\) and \(N_2(N)\) that represent the sizes of the two layers displayed in Figure 2. We will keep the dependency of \(N_1\) and \(N_2\) on \(N\) implicit from now on, and assume that there exist \(\lambda_1, \lambda_2 \in (0,+\infty)\) such that \[\label{e46def46lambda} \lim_{N \to \infty} \frac{N_1}{N} = \lambda_1 \quad \text{ and } \quad \lim_{N \to \infty} \frac{N_2}{N} = \lambda_2.\tag{15}\] Now, for each \(\sigma = (\sigma_1, \sigma_2) = (\sigma_{1,1},\ldots, \sigma_{1,N_1}, \sigma_{2,1}, \ldots, \sigma_{2,N_2}) \in \mathbb{R}^{N_1} \times \mathbb{R}^{N_2}\), we set \[\label{e46def46HN46bip} H_N^\mathrm{bip}(\sigma) := \frac{1}{\sqrt{N}} \sum_{i\leqslant N_1, j \leqslant N_2} W_{i,j} \sigma_{1,i} \sigma_{2,j}.\tag{16}\] We refer to this model as the bipartite model. Writing \(\Sigma_N := \{-1,1\}^{N_1} \times \{-1,1\}^{N_2}\), we would like for instance to understand the large-\(N\) behavior of \[\label{e46bip46free} \frac{1}{N} \mathbb{E}\log \bigg(\frac{1}{|\Sigma_N|}\sum_{\sigma \in \Sigma_N} \exp(\beta H_N^\mathrm{bip}(\sigma))\bigg).\tag{17}\]
While this bipartite model may seem like a small modification of the SK model, to this day, we do not know what the limit of the quantity in 17 is; in fact, we do not even know that it converges as \(N\) tends to infinity in this case. (The same goes for the maximum of \(H_N^{\mathrm{bip}}\) over \(\Sigma_N\).) Moreover, the problem we encounter here goes beyond that of adjusting some technical part of the proof of the Parisi formula. Indeed, although one can a priori imagine several possible ways of extending the Parisi formula to this bipartite model, one can in fact show that none of those candidates for the limit are valid [57].
In order to clarify the key difference between the bipartite and the SK models at the technical level, it is useful to slightly change our viewpoint on the definition of these random fields \(H_N\) and \(H_N^\mathrm{bip}\). Instead of writing them down explicitly as in 1 and 16 , we can equivalently specify that they are centered Gaussian fields, and display their covariance. For the SK model, we have for every \(\sigma, \tau \in \mathbb{R}^N\) that \[\label{e46cov46HN} \mathbb{E}\left[ H_N(\sigma) H_N(\tau) \right] = N \left( \frac{\sigma \cdot \tau}{N} \right) ^2,\tag{18}\] where \(\sigma \cdot \tau\) denotes the scalar product between \(\sigma\) and \(\tau\). More generally, one could consider centered Gaussian fields \((H_N(\sigma))_{\sigma \in \mathbb{R}^N}\) such that, for some smooth function \(\xi : \mathbb{R}\to \mathbb{R}\), we have for every \(\sigma, \tau \in \mathbb{R}^N\) that \[\label{e46def46cov} \mathbb{E}\left[ H_N(\sigma) H_N(\tau) \right] = N \xi\left( \frac{\sigma \cdot \tau}{N} \right);\tag{19}\] this corresponds to an assumption on the invariance of the law of \(H_N\) under orthogonal transformations. The SK model 1 corresponds to the case when \(\xi(r) = r^2\). For \(\xi(r) = r^3\), we can construct a Gaussian field that satisfies 19 by setting \[H_N(\sigma) := \frac{1}{N} \sum_{i,j,k = 1}^N W_{i,j,k} \sigma_i \sigma_j \sigma_k,\] where \((W_{i,j,k})\) are independent centered Gaussians with unit variance. For every integer \(p \geqslant 1\), we can generalize this and construct a centered Gaussian field \(H_N\) such that 19 holds with \(\xi(r) = r^p\). By considering linear combinations of independent versions of such fields, we can build a centered Gaussian field \(H_N\) such that 19 holds with \[\label{e46valid46xi} \xi(r) = \sum_{p =0}^{+\infty} a_p^2 \, r^p,\tag{20}\] provided that the sequence \((a_p)_{p \geqslant 0}\) decays to zero sufficiently fast. 
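The covariance identity 18 can be checked empirically: for fixed configurations \(\sigma\) and \(\tau\), the average of \(H_N(\sigma) H_N(\tau)\) over many draws of the disorder should approach \(N (\sigma \cdot \tau / N)^2\). A minimal Monte Carlo sketch (sample size and seed are illustrative):

```python
import math
import random

rng = random.Random(2)
N = 5
sigma = [rng.choice([-1, 1]) for _ in range(N)]
tau = [rng.choice([-1, 1]) for _ in range(N)]

def H(W, s):
    # the SK field (1) for a given disorder matrix W
    return sum(W[i][j] * s[i] * s[j]
               for i in range(N) for j in range(N)) / math.sqrt(N)

n_samples = 20000
acc = 0.0
for _ in range(n_samples):
    W = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
    acc += H(W, sigma) * H(W, tau)

empirical = acc / n_samples
overlap = sum(a * b for a, b in zip(sigma, tau)) / N
target = N * overlap**2  # right side of (18)
```

Expanding the product and using \(\mathbb{E}[W_{i,j} W_{k,l}] = \mathbf{1}_{\{(i,j)=(k,l)\}}\) gives the identity exactly; the Monte Carlo average only confirms it up to sampling error.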
It turns out that the functions of the form in 20 are all those such that 19 holds for some centered Gaussian field \(H_N\) (see [58], and [59] for a more general statement covering cases with multiple types of spins).
In the case of the bipartite model 16 , we have instead that, for every \(\sigma, \tau \in \mathbb{R}^{N_1} \times \mathbb{R}^{N_2}\), \[\label{e46def46cov46bip} \mathbb{E}\left[ H_N^{\mathrm{bip}}(\sigma) H_N^{\mathrm{bip}}(\tau) \right] = N \left( \frac{\sigma_1 \cdot \tau_1}{N} \right) \left( \frac{\sigma_2 \cdot \tau_2}{N} \right).\tag{21}\] The key technical difference between the SK and the bipartite models is that here the relevant function that shows up on the right side of 21 is the mapping \((x,y) \mapsto xy\), which is not convex. To be precise, for models with only one type of spin, i.e. of the form in 19 , what is crucial is that the function \(\xi\) is convex over \(\mathbb{R}_+\); as one can see from 20 , this is in fact always the case! This convexity property can however break down as soon as we consider models with two or more types of spins. In general, we can consider models with a fixed number \(D\) of types of spins, say \(\sigma = (\sigma_1, \ldots, \sigma_D) \in \mathbb{R}^{N_1} \times \cdots \times \mathbb{R}^{N_D}\), with \(N_d / N \to \lambda_d \in (0,+\infty)\) for every \(d \in \{1,\ldots, D\}\), and with a covariance such that, for every \(\sigma, \tau \in \mathbb{R}^{N_1} \times \cdots \times \mathbb{R}^{N_D}\), \[\label{e46cov46general} \mathbb{E}\left[ H_N(\sigma) H_N(\tau) \right] = N \xi \left( \left( \frac{\sigma_d \cdot \tau_{d}}{N} \right)_{1 \leqslant d \leqslant D} \right) ,\tag{22}\] where \(\xi\) is some (admissible) function from \(\mathbb{R}^{D}\) to \(\mathbb{R}\). Those models for which we can write down and rigorously prove a Parisi formula for the limit free energy are those for which the function \(\xi\) is convex over \(\mathbb{R}^D_+\) [13], [60]–[65]. 
Some particular models of the form 22 with \(\xi\) that is not convex over \(\mathbb{R}^D_+\) but with several additional symmetries have also been successfully analyzed [66]–[71], but for the most part, the analysis of models of the form 22 with non-convex \(\xi\) remains open.
In order to make progress on the identification of the limit free energy for systems such as the bipartite model, my collaborators and I have explored an approach that consists in seeking a partial differential equation that would be solved by the limit of \(F_N\). The free energy \(F_N\) as we defined it here in 4 (or in 17 for the bipartite model) depends only on \(\beta\), and it is not possible to find a simple equation for \(F_N\) or its limit that would only involve derivatives in \(\beta\). Hence, we first seek to add terms to the energy function \(H_N\) that depend on additional parameters; for instance, we could replace \(\beta H_N(\sigma)\) by \(\beta H_N(\sigma) + \lambda H_N'(\sigma)\) for some free parameter \(\lambda \in \mathbb{R}\) and some well-chosen \(H_N'\). This would yield a free energy that now depends on \(\lambda\) in addition to \(\beta\). We perform this “enrichment” of the free energy with the hope of finding a partial differential equation involving derivatives in, say, \(\beta\) and \(\lambda\), that the free energy would asymptotically solve as \(N\) tends to infinity. Naturally, one would like the additional quantities such as \(H_N'\) in the example to be less complicated to analyze than the original field \(H_N\). On the other hand, we would like the additional parameters to be sufficiently rich that we can ultimately close the equation for the limit free energy. In practice, we will always shoot for first-order partial differential equations, so we can intuitively think of the task as that of building a simpler but “locally equivalent” energy function \(H_N'\), so that we can compensate small variations of \(\beta\) with small variations of \(\lambda\) and keep the free energy roughly constant. If this is indeed possible, then we obtain a way to flow the parameter \(\beta\) from the “easy” case with \(\beta = 0\) towards the value of \(\beta\) of interest.
The idea of thinking of the limit free energy of a model of statistical mechanics as a solution to a partial differential equation goes back at least to [72], [73] (see also [74] for a recent survey on related topics). For simpler models of statistical mechanics, as well as some problems of inference such as community detection on dense graphs, this strategy can work very well, see [75] and [4] for a detailed presentation. The case of spin glasses is more difficult though. Ideas in this spirit were first explored in [76]–[79] under simplifying assumptions. We now informally discuss some of the recent progress in this direction. Similar difficulties, not discussed further here, also show up for community detection on sparse graphs [80]–[82].
In order to keep the notation simple, we first present the approach in the case of the SK model 1 . For every \(t \geqslant 0\) and \(h \geqslant 0\), we set \[\label{e46def46newFN} F_N(t,h) := - \frac{1}{N} \mathbb{E}\log \bigg(\frac{1}{2^N} \sum_{\sigma \in \{- 1,1\}^N} \exp(\sqrt{2t} H_N(\sigma) - Nt + \sqrt{2h} z \cdot \sigma - N h)\bigg),\tag{23}\] where \(z = (z_1,\ldots, z_N)\) is a vector of independent centered Gaussian random variables with unit variance, independent of \(H_N\), and we recall that the function \(H_N\) for the SK model is defined in 1 . The key decision we made is that of adding the term involving \(z \cdot \sigma\). As will be seen more clearly below, the fact that this term is linear in \(\sigma\) makes it much simpler indeed. And if we write \(H_N(\sigma)\) in the form \[H_N(\sigma) = \frac{1}{\sqrt N} \sum_{i = 1}^N \left( \sum_{j = 1}^N W_{ij} \sigma_j\right) \sigma_i ,\] it is perhaps not unreasonable to hope that the term \((\sum_{j = 1}^N W_{ij} \sigma_j)\) could be substituted with equivalent independent Gaussians. We also wrote a factor of \(\sqrt{2t}\) in place of \(\beta\) in front of \(H_N\); since \(H_N\) is Gaussian, this ensures that the variance of \(\sqrt{2t} H_N\) scales linearly, as with Brownian motion. The compensating parameter \(Nt\) is only a convenience. Similar comments also hold concerning the terms \(\sqrt{2h}\) and \(-Nh\). Notice also that, unlike in previous sections, we added a minus sign to the definition of the free energy \(F_N(t,h)\).
Before proceeding, we introduce notation for the Gibbs measure. For any function \(f\), we write \[\label{e46def46Gibbs} \langle f(\sigma) \rangle := \frac{\sum_{\sigma \in \{\pm 1\}^N} f(\sigma) \exp(H_N(t,h,\sigma))}{\sum_{\sigma \in \{\pm 1\}^N} \exp(H_N(t,h,\sigma))},\tag{24}\] where \(H_N(t,h,\sigma) := \sqrt{2t} H_N(\sigma) - Nt + \sqrt{2h} z \cdot \sigma - N h\). In the notation on the left side of 24 , the bracket \(\langle \cdot \rangle\) stands for the expectation with respect to the Gibbs measure, and we think of \(\sigma\) as a random variable that is sampled accordingly. We write \(\sigma'\) to denote an independent copy of \(\sigma\) under the Gibbs measure, so that \[\langle f(\sigma, \sigma') \rangle := \frac{\sum_{\sigma,\sigma' \in \{\pm 1\}^N} f(\sigma, \sigma') \exp(H_N(t,h,\sigma)+H_N(t,h,\sigma'))}{\sum_{\sigma,\sigma' \in \{\pm 1\}^N} \exp(H_N(t,h,\sigma)+H_N(t,h,\sigma'))}.\] This expectation \(\langle \cdot \rangle\) depends on the parameters \(t\) and \(h\), even though we keep it implicit in the notation. 
A simple calculation involving Gaussian integration by parts gives us that \[\label{e46FN46derivatives} \partial_t F_N(t,h) = \mathbb{E}\left\langle\left( \frac{\sigma \cdot \sigma'}{N} \right) ^2 \right\rangle\quad \text{ and } \quad \partial_h F_N(t,h) = \mathbb{E}\left\langle\frac{\sigma \cdot \sigma'}{N} \right\rangle.\tag{25}\] For general models as in 19 , we would find the same expression for \(\partial_h F_N\) as in 25 (for the corresponding definition of the Gibbs measure), while for the derivative in \(t\), we would find that \[\label{e46drt46xi} \partial_t F_N(t,h) = \mathbb{E}\left\langle\xi \left( \frac{\sigma \cdot \sigma'}{N} \right) \right\rangle.\tag{26}\] Coming back to the SK model for now, we thus obtain that \[\label{e46pde46FN} \partial_t F_N - (\partial_h F_N)^2 = \mathbb{E}\left\langle\left( \frac{\sigma \cdot \sigma'}{N} \right) ^2 \right\rangle- \left( \mathbb{E}\left\langle\frac{\sigma \cdot \sigma'}{N} \right\rangle\right) ^2.\tag{27}\] The right-hand side of 27 is the variance of the random variable \(\sigma \cdot \sigma'/{N}\) under \(\mathbb{E}\left\langle\cdot \right\rangle\). Since \(\sigma \cdot \sigma'/N\) is a sum of a large number of terms, we may at first anticipate that it will have small fluctuations. If we assume that this is so for the moment, we are led to the expectation that \(F_N\) may converge to a limit function \(f\) that solves the equation \[\label{e46hj46SK46simple} \partial_t f - (\partial_h f)^2 = 0.\tag{28}\] Moreover, one can easily compute the value of the free energy \(F_N\) 23 at \(t = 0\), as we have that, for every \(h \geqslant 0\), \[\label{e46simple46init} F_N(0,h) = F_1(0,h).\tag{29}\] Hence, if we believe that the random variable \(\sigma \cdot \sigma'/N\) has vanishingly small fluctuations in the limit of large \(N\), then we are led to the belief that \(F_N\) should converge to the function \(f\) that solves 28 with initial condition \(f(0,\cdot) = F_1(0,\cdot)\). 
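The initial condition 29 comes from the fact that at \(t = 0\) the partition sum factorizes over coordinates: \(2^{-N} \sum_{\sigma} \exp(\sqrt{2h}\, z \cdot \sigma - Nh) = \prod_{i = 1}^N \cosh(\sqrt{2h}\, z_i)\, e^{-Nh}\). The sketch below (parameter values are illustrative) checks this factorization for a single draw of \(z\); taking expectations and dividing by \(N\) then shows that \(F_N(0,h)\) does not depend on \(N\).

```python
import itertools
import math
import random

rng = random.Random(3)
h = 0.7
N = 4
z = [rng.gauss(0.0, 1.0) for _ in range(N)]

# direct enumeration of the 2^N configurations at t = 0
lhs = math.log(
    sum(math.exp(math.sqrt(2 * h) * sum(zi * si for zi, si in zip(z, sigma)) - N * h)
        for sigma in itertools.product([-1, 1], repeat=N)) / 2**N)

# factorized form: the sum over sigma splits into a product over coordinates,
# each contributing cosh(sqrt(2h) z_i) e^{-h}
rhs = sum(math.log(math.cosh(math.sqrt(2 * h) * zi)) - h for zi in z)
```

The two expressions agree exactly for every disorder sample, so \(-\frac{1}{N}\mathbb{E}[\text{lhs}]\) equals \(-\mathbb{E}[\log\cosh(\sqrt{2h}\,Z) - h]\) for every \(N\), which is 29 .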
For the model with the covariance as in 19 , we can proceed in the same way, and under the same assumption, we would obtain the limit partial differential equation \[\label{e46hj46xi46simple} \partial_t f - \xi(\partial_h f) = 0.\tag{30}\] Equations of this form go by the name of Hamilton-Jacobi equations.
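As a toy numerical check (our own scalar caricature, anticipating the path-space variational formula in 35 below), one can verify that for \(\xi(r) = r^2\) and an affine initial condition \(\psi(h) = ch\), the Hopf-Lax-type formula \(f(t,h) = \sup_{h' \geqslant 0} \left( \psi(h+h') - t \, \xi^*(h'/t) \right)\) reproduces the explicit solution \(f(t,h) = ch + tc^2\) of 28 :

```python
import numpy as np

def xi_star(s):
    # Convex dual of xi(r) = r^2 over r >= 0: xi*(s) = (max(s, 0))^2 / 4
    return np.maximum(s, 0.0) ** 2 / 4.0

def hopf_lax(psi, t, h, hp_max=50.0, n_grid=200001):
    """Scalar Hopf-Lax-type formula f(t, h) = sup_{h' >= 0} (psi(h + h') - t xi*(h'/t)),
    with the supremum approximated on a grid over [0, hp_max]."""
    hp = np.linspace(0.0, hp_max, n_grid)
    return float(np.max(psi(h + hp) - t * xi_star(hp / t)))

c, t, h = 1.5, 2.0, 1.0
psi = lambda x: c * x  # our toy affine initial condition
print(hopf_lax(psi, t, h), c * h + t * c ** 2)  # both close to 6.0
```

One checks by hand that \(f(t,h) = ch + tc^2\) indeed satisfies \(\partial_t f - (\partial_h f)^2 = c^2 - c^2 = 0\) with \(f(0,h) = ch\).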
Unfortunately, the hypothesis that the random variable \(\sigma \cdot \sigma'/N\) has vanishingly small fluctuations in the limit of large \(N\) is only valid at high temperature, or in other words, for small values of \(t\). For large values of \(t\), the Gibbs measure becomes more complex (with an ultrametric structure), and the variance of \(\sigma \cdot \sigma'/N\) under the Gibbs measure does not tend to zero.
Since for large \(t\), one cannot close the equation for the limit of \(F_N\) using only the variables \(t\) and \(h\), we need to refine this first attempt and introduce a richer additional term than the \(z \cdot \sigma\) used above. The more sophisticated term is still linear in \(\sigma\), but replaces the simple “external field” \(z\) by one with an ultrametric structure. This ultrametric structure is encoded by a number of parameters, which can be collectively bundled into an increasing5 and bounded càdlàg function \(q : [0,1) \to \mathbb{R}\); we denote by \(\mathcal{Q}\) the set of such functions. The detailed motivation and complete definition of this term would take us too far afield, and I will have to ask the reader to accept (or to consult [4]) that it is indeed possible to define an enriched free energy \(F_N(t, q)\), for every \(t \in \mathbb{R}_+\) and \(q \in \mathcal{Q}\), so that asymptotically as \(N\) tends to infinity, we have for the SK model that \[\label{e46approx46hj} \partial_t F_N - \int_0^1 (\partial_q F_N)^2 = \text{some plausibly small conditional variance term},\tag{31}\] and that moreover, we have in analogy with 29 that, for every \(q \in \mathcal{Q}\), \[\label{e46def46psi} F_N(0,q) = F_1(0,q). \quad \text{ For convenience, we write } \psi_1(q) := F_1(0,q).\tag{32}\] In the special case when the path \(q\) is identically equal to \(h\), we recover the quantity in 23 , so this enriched free energy \(F_N : \mathbb{R}_+ \times \mathcal{Q} \to \mathbb{R}\) is an extension of that defined in 23 ; and in particular, the quantity we are ultimately most interested in computing is \(F_N(t,0)\).
Informally, the derivative \(\partial_q\) appearing in 31 is such that, for a sufficiently smooth function \(g : \mathcal{Q} \to \mathbb{R}\) and \(q \in \mathcal{Q}\), the quantity \(\partial_q g(q,\cdot)\) is a function from \([0,1]\) to \(\mathbb{R}\) such that, for every \(q' \in \mathcal{Q}\) and as \(\varepsilon\) tends to zero, \[g((1-\varepsilon) q + \varepsilon q') - g(q) = \varepsilon\int_0^1 \partial_q g(q,u) (q'-q)(u) \, \mathrm{d}u + o(\varepsilon) \, ;\] more explicitly, the integral in 31 stands for \(\int_0^1 (\partial_q F_N(t,q,u))^2 \, \mathrm{d}u.\)
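This definition of \(\partial_q\) can be checked by finite differences on a simple example. The sketch below is our own: it takes \(g(q) = \int_0^1 q(u)^2 \, \mathrm{d}u\), for which \(\partial_q g(q, u) = 2 q(u)\), and compares the two sides of the display above on discretized paths.

```python
import numpy as np

u = np.linspace(0.0, 1.0, 2001)

def integral(y):
    # Trapezoidal rule for \int_0^1 y(u) du on the grid u
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))

def g(path):
    return integral(path ** 2)  # g(q) = \int_0^1 q(u)^2 du, so dq g(q, u) = 2 q(u)

q = u.copy()          # the path q(u) = u
q_prime = u ** 2      # a second path q'(u) = u^2
eps = 1e-6

# Left side: first-order variation of g along the segment from q towards q'
lhs = (g((1.0 - eps) * q + eps * q_prime) - g(q)) / eps
# Right side: \int_0^1 \partial_q g(q, u) (q' - q)(u) du with \partial_q g(q, u) = 2 q(u)
rhs = integral(2.0 * q * (q_prime - q))
print(lhs, rhs)  # both close to -1/6
```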
For all models with a single type, we can indeed characterize the limit free energy as the unique solution of the Hamilton-Jacobi equation obtained by setting the right-hand side of 31 to zero.
Theorem 3 (Limit free energy via a Hamilton-Jacobi equation [83]–[85]). The enriched free energy \(F_N : \mathbb{R}_+ \times \mathcal{Q} \to \mathbb{R}\) of the SK model converges pointwise to the unique function \(f : \mathbb{R}_+ \times \mathcal{Q} \to \mathbb{R}\) that solves \[\label{e46hj46SK} \begin{cases} \partial_t f - \int_0^1 (\partial_q f)^2 = 0 & \text{ on } \mathbb{R}_+ \times \mathcal{Q}, \\ f(0,\cdot) = \psi_1 & \text{ on } \mathcal{Q}. \end{cases}\tag{33}\] More generally, if \(F_N : \mathbb{R}_+ \times \mathcal{Q} \to \mathbb{R}\) stands instead for the enriched free energy associated with a model with covariance given by 19 , then \(F_N\) converges pointwise to the unique function \(f : \mathbb{R}_+ \times \mathcal{Q} \to \mathbb{R}\) that solves \[\label{e46hj46xi} \begin{cases} \partial_t f - \int_0^1 \xi(\partial_q f) = 0 & \text{ on } \mathbb{R}_+ \times \mathcal{Q}, \\ f(0,\cdot) = \psi_1 & \text{ on } \mathcal{Q}. \end{cases}\tag{34}\]
Part of the task in making sense of this theorem is to find a suitable notion of solution for the Hamilton-Jacobi equations in 33 and 34 . The relevant one is the notion of viscosity solutions (see [4] for an introduction tailored to our context).
Using that \(\xi\) is convex on \(\mathbb{R}_+\), we can in fact write the viscosity solution \(f\) of 34 as a variational formula. Indeed, this solution is such that, for every \(t \geqslant 0\) and \(q \in \mathcal{Q}\), \[\label{e46hopf-lax46xi} f(t,q) = \sup_{q' \in \mathcal{Q}} \left(\psi_1(q+q') - t \int_0^1 \xi^* \left( \frac{q'}{t} \right) \right),\tag{35}\] where \(\xi^*\) is the convex dual of \(\xi\), which is defined, for every \(s \in \mathbb{R}\), by \[\xi^*(s) := \sup_{r \geqslant 0} \left( rs - \xi(r) \right) .\] One can recover the Parisi formula by setting \(q = 0\) in 35 , making a change of variables, and doing some explicit calculations involving the function \(\psi_1\) (see again [4] for more details)6.
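The convex dual is straightforward to evaluate numerically. For the SK model, \(\xi(r) = r^2\), and a calculation by hand gives \(\xi^*(s) = (\max(s,0))^2/4\); the sketch below (our own, not from the text) checks a grid approximation of the supremum against this closed form:

```python
import numpy as np

def xi(r):
    return r ** 2  # the SK model has xi(r) = r^2

def xi_star(s, r_max=10.0, n_grid=200001):
    """Convex dual xi*(s) = sup_{r >= 0} (r s - xi(r)), with the supremum
    approximated on a grid; r_max must be large enough to contain the maximizer."""
    r = np.linspace(0.0, r_max, n_grid)
    return float(np.max(r * s - xi(r)))

for s in [-1.0, 0.0, 2.0, 3.0]:
    print(s, xi_star(s), max(s, 0.0) ** 2 / 4.0)  # the two computed values agree closely
```

For \(s < 0\) the supremum is attained at \(r = 0\), which is why \(\xi^*\) vanishes on the negative half-line.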
The main motivation for developing this PDE point of view on the Parisi formula is to tackle the case when \(\xi\) is non-convex, as is the case for the bipartite model. In this case, with \(H_N^{\mathrm{bip}}\) defined in 16 , we can try to mimic the simple arguments that led us to 28 or 30 . The point now is that, since there are two types of spins, we add one variable for each type. That is, for every \(t \geqslant 0\), \(h = (h_1,h_2) \in \mathbb{R}_+^2\), and \(\sigma = (\sigma_1, \sigma_2) \in \Sigma_N\), we set \[H_N(t,h,\sigma) := \sqrt{2t} H_N^{\mathrm{bip}}(\sigma) - N\lambda_1 \lambda_2 t + \sqrt{2h_1} z_1 \cdot \sigma_1 - N_1 h_1 + \sqrt{2 h_2} z_2 \cdot \sigma_2 - N_2 h_2,\] as well as \[F_N(t,h) := - \frac{1}{N} \mathbb{E}\log \bigg( \frac{1}{|\Sigma_N|} \sum_{\sigma \in \Sigma_N} \exp(H_N(t,h,\sigma)) \bigg).\] In the displays above and from now on, we drop the superscript \(^{\mathrm{bip}}\) for ease of notation. The definition of the Gibbs average \(\left\langle\cdot \right\rangle\) is as in the formula in 24 , with \(h \in \mathbb{R}_+^2\) and with the summation variable \(\sigma\) ranging over \(\Sigma_N\). In place of 25 or 26 , we obtain that \[\partial_t F_N(t,h) = \mathbb{E}\left\langle\left( \frac{\sigma_1 \cdot \sigma_1'}{N} \right) \left( \frac{\sigma_2 \cdot \sigma_2'}{N} \right) \right\rangle,\] while we still have \[\partial_{h_1} F_N(t,h) = \mathbb{E}\left\langle\frac{\sigma_1 \cdot \sigma_1'}{N} \right\rangle\quad \text{ and } \quad \partial_{h_2} F_N(t,h) = \mathbb{E}\left\langle\frac{\sigma_2 \cdot \sigma_2'}{N} \right\rangle.\] Under the assumption that the random variables \(\sigma_1 \cdot \sigma_1'/N\) and \(\sigma_2 \cdot \sigma_2'/N\) have vanishing fluctuations in the limit of large \(N\), we would expect \(F_N\) to converge to the solution \(f\) of \[\partial_t f - \partial_{h_1} f \, \partial_{h_2} f = 0.\] The initial condition \(f(0,\cdot)\) is easy to compute, as it factorizes similarly to what we saw in 29 .
As was the case earlier, the assumption of concentration of the random variables \(\sigma_1 \cdot \sigma_1'/N\) and \(\sigma_2 \cdot \sigma_2'/N\) is invalid for large \(t\), and we need again to pass to a more sophisticated free energy \(F_N : \mathbb{R}_+ \times \mathcal{Q}^2 \to \mathbb{R}\), which this time takes as second argument a pair of paths \(q = (q_1,q_2) \in \mathcal{Q}^2\). We have a factorization property similar to that in the first identity of 32 ; for every \(q = (q_1,q_2) \in \mathcal{Q}^2\), we write \[\psi_2(q) := \lim_{N \to +\infty} F_N(0,q).\] It is useful here to distinguish between the function \(\psi_1\) defined on \(\mathcal{Q}\) we introduced earlier in 32 , and that new function \(\psi_2 : \mathcal{Q}^2 \to \mathbb{R}\) we just defined, as indeed one can easily show using 15 that \[\label{e46psi246decomp} \psi_2(q) = \lambda_1 \psi_1(q_1) + \lambda_2 \psi_1(q_2).\tag{36}\] Calculations similar to those leading to 31 lead to the following conjecture.
Conjecture 1. The enriched free energy \(F_N : \mathbb{R}_+ \times \mathcal{Q}^2 \to \mathbb{R}\) for the bipartite model converges to the function \(f : \mathbb{R}_+ \times \mathcal{Q}^2 \to \mathbb{R}\) that solves \[\label{e46hj46bip} \begin{cases} \partial_t f - \int_0^1 \partial_{q_1} f \, \partial_{q_2} f = 0 & \text{ on } \mathbb{R}_+ \times \mathcal{Q}^2, \\ f(0,\cdot) = \psi_2 & \text{ on } \mathcal{Q}^2. \end{cases}\qquad{(1)}\]
In equation (1), the nonlinearity (i.e. the mapping \((x,y) \mapsto xy\)) is neither convex nor concave, so we cannot write a variational representation similar to that in 35 for the solution to (1).
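Indeed, the Hessian of \((x,y) \mapsto xy\) is the constant matrix with rows \((0,1)\) and \((1,0)\), whose eigenvalues are \(\pm 1\); a quick numerical check of this indefiniteness:

```python
import numpy as np

# The Hessian of (x, y) -> x y is [[0, 1], [1, 0]] at every point (x, y).
hessian = np.array([[0.0, 1.0], [1.0, 0.0]])
eigenvalues = np.linalg.eigvalsh(hessian)
print(eigenvalues)  # one negative and one positive eigenvalue: neither convex nor concave
```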
A number of partial results have been obtained that give credence to Conjecture 1. For ease of discussion, let us suppose that the enriched free energy \(F_N : \mathbb{R}_+ \times \mathcal{Q}^2 \to \mathbb{R}\) of the bipartite model converges pointwise to some function \(g\) (one can easily show that converging subsequences exist). First, we know from [57], [59], [83] that \(g \geqslant f\), where \(f\) is the (unique) viscosity solution to (1). Moreover, the limit function \(g\) is differentiable “almost everywhere”7, and the function \(g\) solves the equation displayed in (1) at every point of differentiability [62]. The latter property is weaker than that of being a viscosity solution to (1) though, so this is not sufficient to conclude. Another result from [62] gives a characterization of \(g\) in terms of a critical point of an explicit functional. While one can show that there exists a unique such critical point for small \(t\), there can be more than one critical point for large \(t\), so this is again not sufficient to conclude. For those readers who are familiar with Hamilton-Jacobi equations, the result can be understood as saying that the value of \(g\) at a given point \((t,q)\) is as prescribed by one of the characteristic lines that goes through \((t,q)\).
As already mentioned, direct generalizations of the Parisi formula to models such as the bipartite case yield formulas that are provably invalid. For a long time, I therefore could not see any alternative candidate variational formula for the limit of \(F_N\). I have changed my mind on this, and now believe that the limit of \(F_N\) in the bipartite case does admit a variational representation. However, I do not expect it to take a form similar to that of the Parisi formula in 5 , but rather to take a form similar to its “un-inverted” version in 10 .
For Hamilton-Jacobi equations such as (1), there are two classical settings in which one can write a variational formula. The first, already discussed, is when the non-linearity in the equation is either convex or concave, but this is clearly not the case here. The second classical situation in which one can obtain a variational formula for the solution to (1) is when the initial condition \(\psi_2\) is either convex or concave. Alas, one can also see that the function \(\psi_2\) is neither convex nor concave in general [57]. This being said, the function \(\psi_2\) can be turned into a concave function through the following change of variables. Recalling the decomposition in 36 , we first work with the function \(\psi_1\), and interpret a single path \(q \in \mathcal{Q}\) as the inverse cumulative distribution function of a probability measure. In other words, there is a bijective correspondence between the set \(\mathcal{Q}\) and the set \(\mathcal{P}_c(\mathbb{R}_+)\) of probability measures over \(\mathbb{R}_+\) with compact support, through the mapping that associates with each \(q \in \mathcal{Q}\) the law of \(q(U)\), where \(U\) is a uniform random variable over \([0,1]\). Denoting this mapping by \(M : \mathcal{Q} \to \mathcal{P}_c(\mathbb{R}_+)\), we define \(\widetilde{\psi}_1 : \mathcal{P}_c(\mathbb{R}_+) \to \mathbb{R}\) so that \(\widetilde{\psi}_1(M q) = \psi_1(q)\). It turns out that the function \(\widetilde{\psi}_1\) is concave over \(\mathcal{P}_c(\mathbb{R}_+)\) [15]. Similarly, we can define \(\widetilde{\psi}_2 : (\mathcal{P}_c(\mathbb{R}_+))^2 \to \mathbb{R}\) such that for every \(q_1,q_2 \in \mathcal{Q}\), we have \(\widetilde{\psi}_2(M q_1, M q_2) = \psi_2(q_1, q_2)\), and by 36 , the function \(\widetilde{\psi}_2\) is concave over \((\mathcal{P}_c(\mathbb{R}_+))^2\).
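To make the correspondence \(M\) concrete, here is a small simulation sketch (our own illustration, with an arbitrarily chosen two-step path): the path \(q\) plays the role of a quantile function, and \(Mq\) is the law of \(q(U)\).

```python
import numpy as np

def sample_Mq(q, n_samples=200000, seed=1):
    """Sample from M q, the law of q(U) with U uniform on [0, 1),
    where the nondecreasing path q is given as a callable."""
    rng = np.random.default_rng(seed)
    return q(rng.random(n_samples))

# A two-step path: under M it corresponds to (1/3) delta_{0.2} + (2/3) delta_{0.7}.
q = lambda v: np.where(v < 1.0 / 3.0, 0.2, 0.7)
samples = sample_Mq(q)
print(np.mean(samples == 0.2), np.mean(samples == 0.7))  # close to 1/3 and 2/3
```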
Since \(\widetilde{\psi}_2\) is concave, it can be written as an infimum of affine functions. I now expect that the solution to the Hamilton-Jacobi equation in (1) can be represented as the infimum of the solutions started from these enveloping affine functions. Using also explicit calculations involving the function \(\psi_1\), this would yield an “un-inverted” representation of the limit free energy in the spirit of the right-hand side of 10 .
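For a smooth concave function of one real variable, this enveloping property is just the statement that the graph lies below each of its tangent lines, with equality at the point of tangency; a toy sketch (our own example, \(g(x) = -x^2\)):

```python
import numpy as np

def g(x):
    return -x ** 2      # a smooth concave function

def dg(x):
    return -2.0 * x     # its derivative

def inf_of_tangents(x, anchors):
    """Recover g(x) as the infimum over anchor points y of the affine tangent
    lines x -> g(y) + g'(y) (x - y), each of which dominates g."""
    return min(g(y) + dg(y) * (x - y) for y in anchors)

anchors = np.linspace(-2.0, 2.0, 401)
for x in [-1.0, 0.0, 0.5]:
    print(x, inf_of_tangents(x, anchors), g(x))  # the last two values agree
```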
For general models of the form in 22 with convex \(\xi\), the connection between the Parisi formula, the Hamilton-Jacobi equation, and the un-inverted variational representation is verified in [86]. One natural goal, also interesting from a purely PDE perspective, is to extend the un-inverted variational representation of the Hamilton-Jacobi equation to the non-convex case. Since we know from [57], [59], [83] that the limit free energy is always greater than or equal to the solution to the Hamilton-Jacobi equation, this would give us a lower bound on the free energy in the form of a variational formula similar to the right-hand side of 10 . The justification of an inequality in the spirit of 14 would then allow us to complete this picture, prove Conjecture 1, and obtain a variational representation of the limit free energy for all models with a covariance taking the form in 22 .
Despite the relative simplicity of their definition, spin glasses display a mathematical structure that I find surprisingly rich and profound. Moreover, the ideas and techniques that were developed to study spin glasses have turned out to be useful in the analysis of a broad class of models across disciplines.
This short review is centered around the problem of identifying the free energy of models of spin glasses involving several different types of spins. We chose the bipartite model defined in 16 as our guiding example of models in the class of centered Gaussian fields with a covariance in the form of 22 . While the case when \(\xi\) is convex over \(\mathbb{R}^D_+\) is well-understood, models with non-convex \(\xi\) such as the bipartite model have so far resisted complete analysis. Strikingly, direct generalizations of the Parisi formula to these models yield predictions that are demonstrably false, and an alternative approach is required.
We reviewed one possible approach based on the idea that the limit free energy should satisfy a Hamilton-Jacobi equation. One can formulate the precise Conjecture 1 to this effect, and several partial results have been obtained that give substance to it.
Perhaps most interestingly, a connection between the Hamilton-Jacobi equation appearing in Conjecture 1 and an “un-inverted” variational formula in the spirit of that presented in Theorem 2 is starting to emerge. In my opinion, this “un-inverted” formula deserves further study and could potentially help us to better understand spin glasses, including those models for which the Parisi formula is already proved rigorously.
The topic of spin glasses is much broader than what this short and partial review could cover. Books on spin glasses include [4], [12], [50], [87]–[97]. The book [4] is close in spirit to the discussion presented here, and in particular to Section 4.
Acknowledgements. I would like to warmly thank Hong-Bin Chen, Tomás Dominguez, and Victor Issa, with whom much of what is presented here was developed.
Although the concept of ultrametricity is central to the topic, it will not be discussed much in this note. The interested reader can consult [1] for a light introduction, and [4] and [12] for more precision.↩︎
The real benefit of this terminology is that it is robust to the sign convention we choose, as many authors (and we too later in this note) would add a minus sign to the definition of the free energy.↩︎
In general, we do not need to restrict ourselves to models defined on \(\{-1,1\}^N\). For a model whose covariance is given by 19 , we can consider \[F_N(t,h) := - \frac{1}{N} \mathbb{E}\log \int \exp\left(\sqrt{2t} H_N(\sigma) - N t\xi(|\sigma|^2/N) + \sqrt{2h} z \cdot \sigma - h|\sigma|^2\right) \, \mathrm{d}P_N(\sigma),\] where \(P_N = P_1^{\otimes N}\) is the \(N\)-fold tensor product of a probability measure \(P_1\) on \(\mathbb{R}\) with compact support. We recover the SK model 1 by choosing \(\xi(r) = r^2\) and \(P_1 = (\delta_1 + \delta_{-1})/2\). Notice that with this definition (and thanks to the minus sign appearing there), Jensen’s inequality yields that \(F_N \geqslant 0\). The additional terms involving \(t\xi(|\sigma|^2/N)\) and \(|\sigma|^2\) facilitate the analysis, and can be removed a posteriori as they are not themselves random.↩︎
By “increasing”, we mean that for every \(s \leqslant t \in [0,1)\), we have \(q(s) \leqslant q(t)\).↩︎
Recall that in this section, we added a minus sign in the definition of the free energy in 23 , so what we see as a supremum here is an infimum in the convention of Sections 1 and 2.↩︎
Quotation marks are due here because there is no Lebesgue measure on \(\mathcal{Q}\); the exact formulation is in terms of Gaussian null sets.↩︎