November 17, 2023
Annealing is the process of gradually lowering the temperature of a system to guide it towards its lowest energy states. In an accompanying paper [Luo et al. Phys. Rev. E 108, L052105 (2023)], we derived a general bound on annealing performance by connecting annealing with tools from stochastic thermodynamics, including a speed limit on state transformation derived from entropy production. Here we describe the derivation of the general bound in detail. In addition, we analyze the case of simulated annealing with Glauber dynamics in depth. We show how to bound the two case-specific quantities appearing in the bound, namely the activity, a measure of the number of microstate jumps, and the change in relative entropy between the state and the instantaneous thermal state, which is due to temperature variation. We exemplify the arguments by numerical simulations on the Sherrington-Kirkpatrick (SK) model of spin glasses.
Simulated annealing (SA) is a heuristic optimization algorithm to approximate the global minimum of a function in a large search space [1]–[4]. The algorithm models the physical annealing process in metallurgy [5]: by treating the function to be optimized as the energy landscape of a system, the lowest-energy state, corresponding to the global minimum, can be reached via a gradual cooling process. SA has been widely used in a broad variety of real-world applications [6]–[8] and has stimulated novel developments in hardware [9], [10].
Ideal SA finds the global optimum of any function via a quasistatic cooling process, which requires infinite annealing time [11]. In any realistic setting, however, the cooling schedule is executed in finite time. As a result, real SA algorithms do not guarantee the global optimum but only find it with some probability. Significant efforts have been made to investigate the convergence of real SA algorithms [12]–[14], to analyze the effect of finite-length cooling schedules [15]–[18], and to gauge the performance of SA in specific problems [19], [20]. In an accompanying paper [21], we have derived an analytical bound on the performance of annealing by using recent methods from finite-time stochastic thermodynamics [22]–[29]. The resulting bound holds for all cooling schedules.
In this paper, we provide the full details of the results in Ref. [21] and show explicitly how the theorems and methods in finite-time stochastic thermodynamics can be adapted to quantify the SA performance. We derive a method for bounding the activity in SA, a measure of the number of microstate jumps. We also show how to bound the change in relative entropy between the state and the instantaneous thermal state which is due to temperature variation. Our arguments involve two conjectures which we describe and justify, and which we hope may spark further research.
The paper is organized as follows. In Sec. 2, we introduce key concepts in stochastic thermodynamics that we shall use. In Sec. 3, we prove the bound proposed in Ref. [21], which characterizes the performance of any (physical or simulated) annealing process. In Sec. 4, we apply this bound to SA and discuss in detail the assessment of SA performance from accessible parameters. Finally, in Sec. 5, we conclude.
We start by briefly introducing the stochastic thermodynamical framework of annealing processes, using the standard terminology of, e.g., Refs. [23], [26], [27].
Consider a system of interest with \(N\) energy levels denoted \(\{E_i\}_{i=1}^N\). The statistical state of the system is defined as a vector \(p(t) = \left[p_1(t),\dots,p_N(t)\right]\), where \(p_i(t)\) is the probability of the system being in energy level \(E_i\) at time \(t\). The average energy of the system is \[E_p:=\sum_{i=1}^N p_iE_i.\]
In annealing, the system of interest is in contact with a heat bath whose temperature \(T\) can be controlled to vary in time. If the system is not in thermal equilibrium, it will then thermalize, that is, the state of the system will evolve towards the thermal Gibbs state \(\gamma = \left[\gamma_1,\dots,\gamma_N\right]\), with \(\gamma_i = \exp\left(-\beta E_i\right)/Z\), where \(Z = \sum_i\exp\left(-\beta E_i\right)\) is the partition function and \(\beta = 1/T\) is the inverse temperature (the Boltzmann constant \(k_\mathrm{B}\) is taken to be 1). In annealing, the thermal state \(\gamma(t)\) is time-dependent due to the varying temperature \(T(t)\).
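As a concrete numerical illustration (not part of the original derivation), the Gibbs state is straightforward to compute; the sketch below assumes a hypothetical three-level spectrum:

```python
import numpy as np

def gibbs_state(E, beta):
    """Gibbs state gamma_i = exp(-beta * E_i) / Z, with k_B = 1.

    Energies are shifted by their minimum before exponentiating; the
    shift cancels in the ratio and avoids overflow at large beta.
    """
    w = np.exp(-beta * (np.asarray(E) - np.min(E)))
    return w / w.sum()

E = [0.0, 1.0, 2.0]                  # hypothetical toy spectrum
print(gibbs_state(E, beta=0.0))      # infinite temperature: uniform
print(gibbs_state(E, beta=5.0))      # low temperature: ground state dominates
```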
The dynamics of a thermalization process are described by a master equation [26], [30], [31]: \[\label{eq:master_equation} \dot{p}_i(t) =\sum_{j(\neq i)}\Gamma_{ij}(t)p_j(t)-\Gamma_{ji}(t)p_i(t),\tag{1}\] where \(\Gamma_{ij}(t)\) is the generator satisfying \(\sum_i\Gamma_{ij}(t) = 0, \forall j\) and \(\Gamma_{ij}(t) \ge 0, \forall i\neq j\). In addition, we suppose that the detailed balance condition is satisfied in the process, i.e., \[\Gamma_{ij}(t)\gamma_j(t) = \Gamma_{ji}(t)\gamma_i(t),\quad \forall i,j, \label{eq:detailed_balance}\tag{2}\] which is commonly assumed in stochastic thermodynamics (see, e.g., Refs. [22], [23], [27]).
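A minimal numerical sketch of Eqs. (1) and (2): we assume Glauber-type rates \(\Gamma_{ij} = 1/(1+e^{\beta(E_i-E_j)})\) (the same functional form adopted for SA later in the paper) on a hypothetical fully connected three-level system, verify detailed balance, and Euler-integrate the master equation at fixed temperature:

```python
import numpy as np

# Toy three-level system with assumed Glauber-type rates (illustrative only).
E = np.array([0.0, 1.0, 2.0])
beta = 1.0
N = len(E)

G = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if i != j:
            G[i, j] = 1.0 / (1.0 + np.exp(beta * (E[i] - E[j])))
G -= np.diag(G.sum(axis=0))          # enforce sum_i Gamma_ij = 0 for all j

gamma = np.exp(-beta * E)
gamma /= gamma.sum()

# detailed balance, Eq. (2): Gamma_ij gamma_j = Gamma_ji gamma_i
for i in range(N):
    for j in range(N):
        if i != j:
            assert abs(G[i, j] * gamma[j] - G[j, i] * gamma[i]) < 1e-12

# dp/dt = G p drives any initial state towards gamma
p = np.full(N, 1.0 / N)
dt = 1e-3
for _ in range(20000):               # integrate up to t = 20
    p += dt * (G @ p)
print(np.abs(p - gamma).sum())       # close to 0
```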
As the system evolves, there are changes in its energy and entropy. It is common to break the change in (average) energy \(E_p\) into two parts: \[\begin{align} dE_p & =d\left( \sum_i p_iE_i \right)=\sum_i E_idp_i+\sum_i p_idE_i \nonumber\\ &:=dQ+dW, \end{align}\] where \(Q\) is called the heat added to the system and \(W\) is called the work done on the system. In annealing, the energy spectrum is unchanged and \(dE_p=dQ\). The thermodynamic entropy of the system (again for \(k_\mathrm{B}=1\)), \[S_p:=-\sum_i p_i\ln p_i,\] also changes due to the evolution of \(p\). Note that we shall make no distinction between the thermodynamic entropy and the Shannon entropy since the two quantities coincide within our setting. For the thermal state \(\gamma\) transitioning to a new thermal state at an infinitesimally different temperature, as in quasistatic annealing, one can show \(\beta dQ=dS_p\). If the state is not a thermal state, one has \(dS_p\geq \beta dQ\). The quantity \[\label{eq:entroprodrate} \dot{\Sigma}(t) = \dot{S}_p(t) - \beta(t)\dot{Q}(t),\tag{3}\] is thus non-negative, vanishing in the quasistatic (equilibrium) case. \(\dot{\Sigma}\) is commonly termed the entropy production rate. The entropy production when two systems 1 and 2 (like the system and the bath) interact is by definition \(dS_1+dS_2\), in line with early ideas of entropy being akin to a fluid existing in and flowing between different systems and possibly being created [32]. Here the bath is in a thermal state, such that \(dS_1+dS_2=dS_1+\beta(t)dQ_2=dS_1-\beta(t)dQ_1\), justifying the name entropy production rate for \(\dot{\Sigma}\) (see e.g. Ref. [27] for more).
To characterize the performance of an annealing protocol run in finite time \(\tau\), we consider the 1-norm distance [22]–[24] to the final thermal state \(\gamma(\tau)\): \[L_t := \sum_i|p_i(t)-\gamma_i(\tau)|. \label{eq:L_t}\tag{4}\] The probability of not having a globally optimal state at time \(\tau\) when \(T(\tau)=0\), \(p(\text{non-optimal})\), is bounded by \(L_\tau/2\). To show this, notice that the 1-norm distance is twice the total variation distance [31] such that \[\begin{align} \frac{1}{2}L_\tau&=\max_{\mathrm{events}} |p(\mathrm{events})-\gamma(\mathrm{events})|\nonumber\\ &\geq |p(\text{non-optimal})-\gamma(\text{non-optimal})|\nonumber\\ &=p(\text{non-optimal}), \end{align}\] where in the last line we used the fact that \(\gamma(\text{non-optimal}) = 0\) at zero temperature. \(L_\tau\) can thus be considered the performance error of the annealing. In the next section, an upper bound on \(L_\tau\) in terms of \(L_0\) for a general annealing process will be given.
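A tiny worked example of this argument, with hypothetical numbers: when \(\gamma(\tau)\) is a point mass on a unique optimum, the 1-norm distance is exactly twice the non-optimal probability.

```python
import numpy as np

# Zero-temperature target with a unique ground state (illustrative numbers).
gamma_final = np.array([1.0, 0.0, 0.0])      # gamma(tau) at T(tau) = 0
p_final = np.array([0.85, 0.10, 0.05])       # hypothetical state after the anneal

L_tau = np.abs(p_final - gamma_final).sum()  # 1-norm distance, Eq. (4)
p_non_optimal = p_final[1:].sum()            # probability mass off the optimum

print(L_tau, p_non_optimal)                  # here p(non-optimal) equals L_tau / 2
```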
We now consider such an annealing process, in which a system is initially in thermal equilibrium with a heat bath whose temperature is then continuously decreased from \(T_{\mathrm{i}}=T(0)\) to \(T_{\mathrm{f}}=T(\tau)\) within a finite time \(\tau\). In the accompanying paper [21], we derived a general limit on the annealing performance error represented by \(L_\tau\). Here, we provide the details of the derivation in the following three subsections.
A critical challenge in evaluating the effectiveness of an annealing process is to measure the deviation of the system state, \(p(\tau)\), from the reference thermal state \(\gamma(\tau)\). While the 1-norm distance, \(L_\tau\), is a suitable metric for this purpose, its direct calculation can be difficult. However, relative entropy, a tool widely adopted in information theory for comparing two probability distributions [33], proves to be a good alternative to start with, due to its deep connection to non-equilibrium thermodynamics. The relative entropy between probability distributions \(p(t)\) and \(\gamma(t)\) is defined as \[S(p(t)||\gamma(t)) = \sum_i p_i(t)\ln\dfrac{p_i(t)}{\gamma_i(t)}.\] Its close relationship with free energy is apparent via the expression \[F(t)-F_{eq}(t)=T(t)S(p(t)||\gamma(t)),\] where \(F(t):=E_p(t)-T(t)S_p(t)\) is the non-equilibrium free energy and \(F_{eq}(t):=E_\gamma(t)-T(t)S_\gamma(t)\) is the equilibrium free energy, with \(E_\gamma(t)\) and \(S_\gamma(t)\) being the energy and entropy of the thermal state \(\gamma(t)\), respectively.
Taking the time derivative of \(S(p||\gamma)\), one gets \[\begin{align} \dfrac{\mathrm{d}}{\mathrm{d}t}S(p||\gamma) &= \dot{p}\frac{\partial }{\partial p}S(p||\gamma)+\dot{\gamma}\frac{\partial}{ \partial \gamma}S(p||\gamma) \nonumber\\ &= \dot{p}\frac{\partial }{\partial p}S(p||\gamma)+\dot{\beta}\frac{\partial}{ \partial \beta}S(p||\gamma) \nonumber\\ &= -\dot{\Sigma} + \dot{\mathcal{I}}. \label{eq:relative_entropy_dot} \end{align}\tag{5}\] We immediately see that, compared to a fixed-temperature scenario, a new rate, \(\dot{\mathcal{I}}:=\dot{\beta}\frac{\partial}{ \partial \beta}S(p||\gamma)\), appears in Eq. (5). \(\mathcal{I}\) is an annealing-related quantity, arising from the temperature variation. A direct calculation shows that \[\begin{align} \dot{\mathcal{I}} = \dot{\gamma}\frac{\partial}{ \partial \gamma}S(p||\gamma) =- \sum_i\dot{\gamma}_i\dfrac{p_i}{\gamma_i} = (E_p - E_\gamma)\dot{\beta}, \end{align}\] where the expression \[\dot{\gamma}_i = \dot{\beta}(E_\gamma - E_i)\gamma_i, \label{eq:gamma_dot}\tag{6}\] is used in the last equality.
We now comment on the physical meaning of these terms from Eq. (5 ). From Eq. 3 , the entropy production rate, \(\dot{\Sigma}\), is non-zero when the system thermalizes towards a fixed thermal state. We imagine that, in ideal annealing processes, the change of the thermal states can be regarded as the switching of the system to heat baths at different temperatures. The disconnection and re-connection of the system with baths should not alter the system state instantaneously, resulting in no entropy production during the change of the reference thermal state. However, the variation of the reference state does affect the value of the relative entropy. This contribution is represented by \(\dot{\mathcal{I}}\).
We proceed to split the relative entropy. Integrating both sides of Eq. (5 ) from 0 to \(\tau\) gives us \[S(p(\tau)||\gamma(\tau)) - S(p(0)||\gamma(0)) = -\Sigma(\tau) + \mathcal{I}(\tau),\] where \(\Sigma(\tau) = \int_0^\tau \dot{\Sigma}\mathrm{d} t\) and \[\mathcal{I}(\tau) = \int_0^\tau (E_p-E_\gamma)\dot{\beta}\mathrm{d} t. \label{eq:I95tau}\tag{7}\]
Since the system is assumed to be in equilibrium initially, which is the most common case in annealing, we have \(S(p(0)||\gamma(0)) = 0\) and therefore obtain, \[S(p(\tau)||\gamma(\tau)) = -\Sigma(\tau) + \mathcal{I}(\tau). \label{eq:relative95entropy}\tag{8}\]
Using Pinsker’s inequality [34], \(L_\tau^2\le 2S(p(\tau)||\gamma(\tau))\), we arrive at an upper bound on \(L_\tau^2\), \[L_\tau^2 \le 2\left[-\Sigma(\tau) + \mathcal{I}(\tau)\right]. \label{eq:L94295bound}\tag{9}\]
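Pinsker's inequality is easy to check numerically; the following sketch (illustrative only) samples random strictly positive distributions:

```python
import numpy as np

# Numerical check of Pinsker's inequality, L^2 <= 2 S(p||q),
# on random strictly positive distributions.
rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.random(8) + 1e-3; p /= p.sum()
    q = rng.random(8) + 1e-3; q /= q.sum()
    L = np.abs(p - q).sum()                 # 1-norm distance
    S = np.sum(p * np.log(p / q))           # relative entropy (nats)
    assert L**2 <= 2 * S + 1e-12
print("Pinsker's inequality holds on all samples")
```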
In this subsection, inspired by Ref. [22], we derive a lower bound for the entropy production \(\Sigma(\tau)\) in the varying-temperature case. For clarity of the equations, we omit the time dependence of \(p_i(t)\) and \(\Gamma_{ij}(t)\).
First, we rewrite \(\dot{\Sigma}(t)\) as \[\begin{align} \dot{\Sigma}(t) &= -\sum_i \dot{p}_i\left(\ln p_i + \beta(t) E_i\right) \nonumber\\ &= \dfrac{1}{2} \sum_{i\neq j}\left(\Gamma_{ij}p_j - \Gamma_{ji}p_i\right)\ln\frac{\Gamma_{ij}p_j}{\Gamma_{ji}p_i} \nonumber\\ &\ge \sum_{i\neq j} \dfrac{\left(\Gamma_{ij}p_j - \Gamma_{ji}p_i\right)^2}{\Gamma_{ij}p_j + \Gamma_{ji}p_i}, \label{eq:entropy95prod95rate} \end{align}\tag{10}\] where we have used Eq. 1 , the relation \(\beta(t)E_i=-\ln \gamma_i-\ln Z\) in the second line, and the inequality \((x-y)\ln (x/y) \geq 2(x-y)^2/(x+y)\) in the third line, as suggested by Ref. [22].
Meanwhile, using the triangle and Cauchy-Schwarz inequalities, we have \[\begin{align} \sum_i |\dot{p}_i| &\le \sum_{i\neq j}\left|\Gamma_{ij}p_j - \Gamma_{ji}p_i\right| \nonumber\\ &\le \sqrt{\left(\sum_{i\neq j} \dfrac{\left(\Gamma_{ij}p_j - \Gamma_{ji}p_i\right)^2}{\Gamma_{ij}p_j + \Gamma_{ji}p_i}\right)\left(\sum_{i\neq j}\Gamma_{ij}p_j + \Gamma_{ji}p_i\right)} \nonumber\\ &\le \sqrt{2\dot{\Sigma}(t)\mathcal{A}(t)}, \end{align}\] where we have defined the activity \[\mathcal{A}(t) := \sum_{i}\sum_{j(\neq i)}\Gamma_{ij}(t)p_j(t), \label{eq:activity}\tag{11}\] which is the expected jumping rate among different states at time \(t\). We denote the expected number of jumps during the time interval \([0,\tau]\) as \[\begin{align} \label{eq:expectN} \langle N_{\mathrm{jumps}}\rangle =\int_0^\tau \mathcal{A}(t) dt = \langle \mathcal{A} \rangle_\tau\tau, \end{align}\tag{12}\] where \(\langle \mathcal{A} \rangle_\tau:=\frac{1}{\tau}\int_0^\tau\mathcal{A}(t) dt\). (See also the discussion around Eq. (24) for further explanation of why we call this the expected number of jumps.)
Then, the 1-norm distance between \(p(0)\) and \(p(\tau)\) satisfies \[\begin{align} \sum_i|p_i(0)-p_i(\tau)| &\le \sum_i\int_0^\tau \mathrm{d}t |\dot{p}_i(t)| \nonumber\\ &\le \int_0^\tau \mathrm{d}t \sqrt{2\dot{\Sigma}(t)\mathcal{A}(t)} \nonumber\\ &\le\sqrt{2\Sigma(\tau)\langle \mathcal{A} \rangle_\tau\tau}, \label{eq:speed_limit_mid_step} \end{align}\tag{13}\] where the last line follows from the Cauchy-Schwarz inequality applied to the time integral. Equivalently, we obtain \[\Sigma(\tau) \ge \dfrac{\left(\sum_i|p_i(0)-p_i(\tau)|\right)^2}{2\langle \mathcal{A} \rangle_\tau\tau}. \label{eq:speed_limit_origin}\tag{14}\] This lower bound on \(\Sigma(\tau)\) is termed the speed limit from entropy production on the transformation from the initial state \(p(0)\) to the final state \(p(\tau)\).
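The chain of inequalities above can be checked numerically. The sketch below (a toy fixed-temperature relaxation with assumed Glauber-type rates on a hypothetical three-level system) integrates Eq. (1), accumulates the entropy production of Eq. (10) and the activity of Eq. (11), and compares the two sides of Eq. (14):

```python
import numpy as np

# Toy check of the speed limit, Eq. (14), for an assumed Glauber-type dynamics.
E = np.array([0.0, 1.0, 2.0])
beta, N = 1.0, 3
G = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if i != j:
            G[i, j] = 1.0 / (1.0 + np.exp(beta * (E[i] - E[j])))
G -= np.diag(G.sum(axis=0))

p0 = np.array([0.1, 0.3, 0.6])       # hypothetical initial state
p = p0.copy()
dt, steps = 1e-4, 20000              # tau = 2
Sigma, A_int = 0.0, 0.0
for _ in range(steps):
    for i in range(N):
        for j in range(N):
            if i != j:
                fwd, bwd = G[i, j] * p[j], G[j, i] * p[i]
                Sigma += 0.5 * (fwd - bwd) * np.log(fwd / bwd) * dt  # Eq. (10)
                A_int += fwd * dt                                    # integral of A(t)
    p += dt * (G @ p)

bound = np.abs(p0 - p).sum() ** 2 / (2 * A_int)
print(Sigma >= bound)                # True: Eq. (14) is respected
```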
As a concrete example, consider a quasistatic process, which has zero entropy production (\(\Sigma(\tau)=0\)). Then the speed limit implies that for a finite activity \(\langle \mathcal{A} \rangle_\tau\), a transition between two distinct statistical states requires \(\tau \rightarrow \infty\).
Since \(\sum_i|p_i(0)-p_i(\tau)|\) is a distance, by the triangle inequality, we have \[\begin{align} \label{eq:systdist} &\sum_i|p_i(0)-p_i(\tau)| \nonumber \\ &\ge \Big|\sum_i|p_i(0)-\gamma_i(\tau)|-\sum_i|p_i(\tau)-\gamma_i(\tau)|\Big| \nonumber\\ &= |L_0-L_\tau|. \end{align}\tag{15}\] Substituting Eq. (15) into Eq. (14), we finally obtain a lower bound on the entropy production for a general annealing process: \[\Sigma(\tau) \ge \dfrac{(L_0-L_\tau)^2}{2\langle \mathcal{A} \rangle_\tau\tau}. \label{eq:speed_limit}\tag{16}\]
Combining Eqs. (9) and (16), we have \[\label{eq:UQineq} L_\tau^2 \le - \dfrac{(L_0-L_\tau)^2}{\langle \mathcal{A} \rangle_\tau\tau} + 2\mathcal{I}(\tau).\tag{17}\] Eq. (17) can be regarded as a quadratic inequality in the single variable \(L_\tau\). Solving this inequality for \(L_\tau\) gives us \[L_\tau \le \dfrac{L_0 + \sqrt{\langle\mathcal{A}\rangle_\tau\tau\left(-L_0^2 + 2\mathcal{I}(\tau)\left(\langle\mathcal{A}\rangle_\tau\tau + 1\right)\right)}}{\langle\mathcal{A}\rangle_\tau\tau+1}. \label{eq:main_bound}\tag{18}\]
This is the general bound on an annealing process as obtained in Ref. [21]. It is worth noting that Eq. (18) is applicable to any system energy landscape and any cooling schedule as long as the thermalization dynamics respect the detailed balance condition in Eq. (2). Since the terms inside the square root in Eq. (18) are non-negative given Eq. (17), \(\langle \mathcal{A} \rangle_\tau\) and \(\mathcal{I}(\tau)\) obey a trade-off relation: \[\mathcal{I}(\tau) \ge \dfrac{L_0^2}{2\left(\langle \mathcal{A} \rangle_\tau\tau+1\right)}. \label{eq:trade_off}\tag{19}\] This can be understood as a ‘speed limit’ related to \(\mathcal{I}(\tau)\), which can only be small if the expected number of jumps \(\langle N_{\mathrm{jumps}}\rangle = \langle \mathcal{A} \rangle_\tau\tau\) is large, given that \(L_0\) is fixed.
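Eq. (18) is straightforward to evaluate once its inputs are known. The helper below (an illustrative sketch with hypothetical input numbers) solves the quadratic inequality of Eq. (17) and raises an error when the trade-off relation, Eq. (19), is violated:

```python
import numpy as np

def annealing_error_bound(L0, n_jumps, I_tau):
    """Upper bound on L_tau from Eq. (18).

    n_jumps = <A>_tau * tau is the expected number of jumps.  The
    discriminant is non-negative exactly when Eq. (19) holds.
    """
    disc = n_jumps * (-L0**2 + 2 * I_tau * (n_jumps + 1))
    if disc < 0:
        raise ValueError("inputs violate the trade-off relation, Eq. (19)")
    return (L0 + np.sqrt(disc)) / (n_jumps + 1)

# hypothetical numbers for illustration only
print(annealing_error_bound(L0=1.0, n_jumps=100.0, I_tau=0.02))
```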
In the quasistatic limit associated with the protocol duration \(\tau\rightarrow\infty\), the tightness of our bound [Eq. (18)] for the error \(L_\tau\) is guaranteed. This can be seen by investigating how our bound on SA performance scales with \(\tau\). Let the rate of varying temperature \(\dot{T}\) decrease with \(\tau\). In this case, \(\mathcal{I}(\tau)\) will diminish as \(\tau\) increases. This can be seen from bounding the expression of \(\mathcal{I}(\tau)\) in Eq. (7): \[\mathcal{I}(\tau) = \int_0^\tau (E_p-E_\gamma)\dot{\beta}\mathrm{d} t \le \max_{t\in[0,\tau]} (E_p-E_\gamma)\Delta\beta,\] with \(\Delta\beta\equiv \beta_\mathrm{f} - \beta_\mathrm{i}\) being the maximal range of the inverse temperature. As \(\tau\) increases, the thermal state of the system changes more slowly, leading to a smaller \(\max_{t\in[0,\tau]} (E_p-E_\gamma)\), which measures the departure from the equilibrium energy. We can define the downward scaling of \(\mathcal{I}(\tau)\) for large \(\tau\) in terms of the big O notation for an asymptotic upper bound: \(\mathcal{I}(\tau) = \mathcal{O}(\tau^{-\alpha})\), where \(0<\alpha\le 1\) is some scalar. An upper bound on \(\alpha\) is in fact imposed by the trade-off relation [Eq. (19)], assuming \(\langle \mathcal{A} \rangle_\tau\) is bounded. Substituting that upper bound of \(\mathcal{I}(\tau)\) into Eq. (18), the upper bound on \(L_\tau\), we obtain \(L_\tau = \mathcal{O}(\tau^{-\alpha/2})\). This scaling implies the tightness of the bound for quasistatic protocols: when \(\tau\rightarrow\infty\), the protocol approaches being quasistatic and both \(L_\tau\) and its bound vanish.
We now further analyse how to evaluate Eq. (18 ) when using it to bound the performance of SA, in particular, how to evaluate \(\langle \mathcal{A} \rangle_\tau\) and \(\mathcal{I}(\tau)\) without calculating the evolution of the statistical state of the system, \(p(t)\).
In this section we apply the bound on annealing performance, Eq. (18 ), to the case of SA.
SA is an optimization algorithm inspired by real annealing processes. The algorithm treats the cost function as the energy of a system and uses a control parameter called ‘temperature’ to anneal it. The physical principle is that a system at thermal equilibrium has a higher probability of staying in its ground state at lower temperatures. The algorithm seeks to find the minimal cost by gradually lowering the temperature and moving closer to the final thermal equilibrium. Therefore, the success of the optimization depends on how far the system is from this equilibrium state.
In SA, the annealing time \(\tau\) used in the previous section is replaced by the number of discrete time steps \(\kappa\). \(L_\kappa \equiv \sum_i |p_i(\kappa) - \gamma_i(\kappa)|\) becomes the error from the target state. As we will show, a discrete-time analogue of Eq. (18) provides an upper bound on the error when the SA is run in \(\kappa\) steps.
To illustrate how to evaluate \(\langle \mathcal{A}^\mathrm{D} \rangle_\kappa\) and \(\mathcal{I}(\kappa)\), the discrete version of \(\langle \mathcal{A} \rangle_\tau\) and \(\mathcal{I}(\tau)\), without referring to the intermediate statistical state of the system, we first briefly set up the framework of the SA in Sec. 4.1 and bound \(\langle \mathcal{A}^\mathrm{D} \rangle_\kappa\) and \(\mathcal{I}(\kappa)\) in Sec. 4.2 and Sec. 4.3, respectively. A performance bound of SA is derived in Sec. 4.4. In Sec. 4.5 we show how \(\mathcal{I}(\kappa)\) can be calculated analytically for a 1D Ising chain. A numerical verification of the tightness of the bound on a Sherrington-Kirkpatrick (SK) model, a fully connected spin glass with Gaussian couplings [1], [35], is given in Sec. 4.6. Finally, in Sec. 4.7, we provide further analytical and numerical discussion on the relaxation behaviour of SA by using SK models.
In SA, the system state is updated iteratively until it approximately reaches equilibrium. The evolution of the state can be viewed as a discrete-time Markov chain [31]. Suppose that temperature decreases from \(T_\text{i}\) to \(T_\text{f}\) in \(\kappa\) steps, i.e., \(T_\text{i}=T_0, T_1, \dots, T_k, \dots, T_\kappa=T_\text{f}\). At step \(k\), let the probability of the system staying in state \(i\) be \(p_i(k)\). For each iteration step \(k\), the state changes are described by \[p_i(k+1) = \sum_jP_{ij}(k+1)p_j(k), \label{eq:SA_evolution}\tag{20}\] where \(P_{ij}(k+1) = \mathrm{Prob}\{X(k+1)=i|X(k) = j\}\) is the transition matrix with \(X(k)\) denoting the system state at step \(k\), satisfying \(\sum_i P_{ij}(k) = 1,\forall j\) and \(P_{ij}(k) \ge 0, \forall i,j\).
In typical SA algorithms, \(P_{ij}(k)\) has the following form \[P_{ij}(k) = \left\{ \begin{align} &\dfrac{1}{|\mathcal{N}(j)|} A_{ij}(k), & i \in \mathcal{N}(j)\\ &1-\sum_{l(\neq j)}P_{lj}(k), & i=j\\ &0, & \text{otherwise} \end{align} \right., \label{eq:P(k)}\tag{21}\] where \(|\mathcal{N}(j)|\) is the size of \(\mathcal{N}(j)\), the neighbourhood of the state \(j\) in which all states can be reached in one step from \(j\), and the acceptance rate \(A_{ij}(k)\) is the probability of accepting such a transition from \(j\) to \(i\).
In line with SA simulating a thermalization process, the detailed balance condition [Eq. (2)] is imposed. Here, we more specifically adopt the acceptance rate from the Glauber dynamics, which is widely used in SA and spin glass studies [31], [36]–[38], \[A_{ij}(k) = \frac{1}{1+\exp(\beta(k)\Delta E_{ij})} = \frac{\gamma_i(k)}{\gamma_i(k) + \gamma_j(k)}, \label{eq:A_ij}\tag{22}\] where the energy difference \(\Delta E_{ij} = E_i - E_j\).
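The construction of Eqs. (20)-(22) can be sketched as follows for a toy 3-spin ferromagnetic Ising chain (an illustrative landscape with assumed couplings \(J=1\), not one studied in the paper), where the neighbourhood of a state consists of all single-spin flips, so \(|\mathcal{N}(j)| = 3\) for every state:

```python
import numpy as np
from itertools import product

n_spins = 3
states = list(product((-1, 1), repeat=n_spins))

def energy(s):
    # open 1D chain with J = 1: E = -sum_k s_k s_{k+1}
    return -sum(s[k] * s[k + 1] for k in range(n_spins - 1))

E = np.array([energy(s) for s in states], dtype=float)

def neighbour_indices(j):
    # single-spin flips define the neighbourhood N(j)
    s = states[j]
    idx = []
    for k in range(n_spins):
        t = list(s)
        t[k] = -t[k]
        idx.append(states.index(tuple(t)))
    return idx

def transition_matrix(beta):
    M = len(states)
    P = np.zeros((M, M))
    for j in range(M):
        for i in neighbour_indices(j):
            A = 1.0 / (1.0 + np.exp(beta * (E[i] - E[j])))  # Eq. (22)
            P[i, j] = A / n_spins                           # Eq. (21)
        P[j, j] = 1.0 - P[:, j].sum()
    return P

beta = 1.0
P = transition_matrix(beta)
gamma = np.exp(-beta * E); gamma /= gamma.sum()
print(np.allclose(P.sum(axis=0), 1.0))        # columns are normalised
print(np.allclose(P @ gamma, gamma))          # gamma is stationary (detailed balance)
```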
We now proceed to bound the activity that appears in Eq. (18 ) for SA with Glauber dynamics. We first define the discrete-time activity \(\mathcal{A}^\mathrm{D}\). We then derive an expression for \(\mathcal{A}^\mathrm{D}\) under Glauber dynamics SA. We proceed to conjecture an upper bound on \(\mathcal{A}^\mathrm{D}\) based on analytical and numerical evidence.
Since the time evolution in SA [Eq. (20)] is discrete, a discrete version of the activity is needed. We shall use a superscript \(\mathrm{D}\) to denote discrete, and write the discrete-time activity as \(\mathcal{A}^\mathrm{D}\). In Appendix 6, we show that any discrete-time Markov chain in Eq. (20) can always be simulated by a continuous-time master equation in Eq. (1). It is, therefore, natural to define the discrete-time activity as \[\mathcal{A}^\mathrm{D}(k) = \sum_i\sum_{j(\neq i)} P_{ij}(k+1) p_j(k), \label{eq:activity_D}\tag{23}\] which leads to the iteration-averaged activity \(\langle \mathcal{A}^\mathrm{D} \rangle_\kappa:= \frac{1}{\kappa}\sum_{k=0}^{\kappa-1}\mathcal{A}^\mathrm{D}(k)\). \(\langle \mathcal{A}^\mathrm{D} \rangle_\kappa\) equals the \(\langle \mathcal{A} \rangle_\tau\) of the corresponding continuous-time Markov chain (see Appendix 6) and can be interpreted in terms of the expected number of jumps between micro-states. The RHS of Eq. (23) is a sum over the probabilities of jumping from state \(j\) to \(i\neq j\), so we may write \(\mathcal{A}^\mathrm{D}=p(\mathrm{jump})\). The expected number of jumps over the whole process, by inspection, is \(\langle N_\mathrm{jumps}^\mathrm{D} \rangle=\sum_{k=0}^{\kappa-1} p(\mathrm{jump})\). Thus \[\langle N_\mathrm{jumps} ^\mathrm{D}\rangle = \langle \mathcal{A}^\mathrm{D} \rangle_\kappa\kappa. \label{eq:Njumpsdiscrete}\tag{24}\] Eq. (24) is the discrete-time analogue of Eq. (12). For the case of continuous time, it is natural to define \(\langle N_\mathrm{jumps} \rangle=\int_{t=0}^{\tau} \rho(\mathrm{jump}, t)dt\), where \(\rho(\mathrm{jump}, t)\), which can be termed the rate of jumping, is the probability density for changing microstates in the infinitesimal time interval \(dt\).
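Under the convention of Eq. (23), the discrete activity is simply the probability that one update leaves the diagonal of \(P\); a minimal sketch with hypothetical numbers:

```python
import numpy as np

def discrete_activity(P, p):
    """A^D of Eq. (23): probability that the next update changes the microstate.

    Equals 1 - sum_i P_ii p_i, since each column of P is normalised.
    """
    off_diag = P - np.diag(np.diag(P))
    return float(off_diag.sum(axis=0) @ p)

# toy 2-state chain with hypothetical entries (column-stochastic)
P = np.array([[0.7, 0.4],
              [0.3, 0.6]])
p = np.array([0.5, 0.5])
print(discrete_activity(P, p))       # 0.3 * 0.5 + 0.4 * 0.5 = 0.35
```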
Substituting the general transition matrix \(P_{ij}(k)\) of Eq. (21 ) into Eq. 23 , we have \[\mathcal{A}^\mathrm{D}(k) = \sum_i\sum_{j\in\mathcal{N}(i)}\dfrac{1}{|\mathcal{N}(j)|}A_{ij}(k+1)p_j(k). \label{eq:A_D(k)}\tag{25}\] For simplicity, we denote \(|\mathcal{N}(j)| = n_j\) and ignore the step dependence of quantities, writing \(q(k) \equiv q\), where \(q\) is some quantity. We use prime \('\) to denote the quantity at the next step: \(q(k+1) \equiv q'\).
Note that the acceptance rate \(A_{ij}'\) defined in Eq. (22) has the property \(A_{ij}' + A_{ji}' = 1\). We can therefore rewrite Eq. (25) as \[\begin{align} \mathcal{A}^\mathrm{D} &= \sum_i\sum_{j\in\mathcal{N}(i)}\dfrac{1}{n_j} A_{ij}'p_j \nonumber\\ &= \dfrac{1}{2}\sum_i\sum_{j\in\mathcal{N}(i)}\left(A_{ij}'\dfrac{p_j}{n_j} + A_{ji}'\dfrac{p_i}{n_i}\right) \nonumber\\ &= \dfrac{1}{2}\sum_i\sum_{j\in\mathcal{N}(i)} \dfrac{p_i}{n_i} + \dfrac{1}{2}\sum_i\sum_{j\in\mathcal{N}(i)}A_{ij}'\left(\dfrac{p_j}{n_j} - \dfrac{p_i}{n_i}\right) \nonumber\\ &= \dfrac{1}{2} - \dfrac{1}{4}\sum_i\sum_{j\in\mathcal{N}(i)}\left(A_{ij}'-A_{ji}'\right)\left(\dfrac{p_i}{n_i} - \dfrac{p_j}{n_j}\right) \nonumber\\ &= \dfrac{1}{2} - \dfrac{1}{4}\sum_i\sum_{j\in\mathcal{N}(i)}\dfrac{\gamma_i'-\gamma_j'}{\gamma_i'+\gamma_j'}\left(\dfrac{p_i}{n_i} - \dfrac{p_j}{n_j}\right). \end{align}\] Therefore, the activity, interpreted as the jumping probability, deviates from \(1/2\), the unbiased case, by an amount set by the ordering of the occupation probabilities of neighbouring states. More precisely, we can consider the activity at the thermal state by setting \(p_i = \gamma_i',\forall i\), which gives us \[\begin{align} \mathcal{A}_\gamma^\mathrm{D} &\equiv \sum_i\sum_{j\in\mathcal{N}(i)}\dfrac{1}{n_j} A_{ij}'\gamma'_j \nonumber\\ &= \dfrac{1}{2} - \dfrac{1}{4}\sum_i\sum_{j\in\mathcal{N}(i)}\dfrac{\gamma_i'-\gamma_j'}{\gamma_i'+\gamma_j'}\left(\dfrac{\gamma'_i}{n_i} - \dfrac{\gamma'_j}{n_j}\right).\label{eq:A_D_gamma_mid} \end{align}\tag{26}\] At infinite temperature, the thermal state becomes uniform (unbiased), such that \(\mathcal{A}^\mathrm{D}_\gamma = 1/2\), while at zero temperature \(\mathcal{A}^\mathrm{D}_\gamma = 0\), meaning that all possible jumps are frozen. The analysis of these two extreme cases is intuitive and simple.
However, the value of \(\mathcal{A}^\mathrm{D}_\gamma\) at the intermediate temperatures highly depends on the neighbourhood structure of the energy landscape, making the determination of \(\mathcal{A}_\gamma^\mathrm{D}\) quite complicated.
One common way to simplify the problem is to consider equal-sized neighbourhoods, i.e., \(n_i= n, \forall i\), as in the Edwards-Anderson model [39] we shall introduce later. Now, by Eq. (26), we have \[\mathcal{A}_\gamma^\mathrm{D} = \dfrac{1}{2} - \dfrac{1}{4n}\sum_i\sum_{j\in\mathcal{N}(i)}\dfrac{(\gamma_i'-\gamma_j')^2}{\gamma_i'+\gamma_j'}\le 1/2.\] We conclude that, at least for fixed neighbourhood size, \(0\le\mathcal{A}_\gamma^\mathrm{D}\le 1/2\) during the annealing process, where equality on the left and on the right holds at zero and infinite temperature, respectively.
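The behaviour of \(\mathcal{A}^\mathrm{D}_\gamma\) between the two extremes can be probed numerically. The sketch below sweeps the temperature for a toy 3-spin Ising chain (an illustrative choice with equal neighbourhood sizes \(n=3\), not a landscape from the paper):

```python
import numpy as np
from itertools import product

n_spins = 3
states = list(product((-1, 1), repeat=n_spins))
E = np.array([-sum(s[k] * s[k + 1] for k in range(n_spins - 1))
              for s in states], dtype=float)

def flip(j, k):
    t = list(states[j]); t[k] = -t[k]
    return states.index(tuple(t))

def thermal_activity(beta):
    """A^D_gamma of Eq. (26), evaluated by direct summation."""
    gamma = np.exp(-beta * (E - E.min())); gamma /= gamma.sum()
    A = 0.0
    for j in range(len(states)):
        for k in range(n_spins):
            i = flip(j, k)
            acc = 1.0 / (1.0 + np.exp(beta * (E[i] - E[j])))  # Glauber acceptance
            A += (acc / n_spins) * gamma[j]
    return A

for beta in (0.0, 0.5, 1.0, 2.0, 10.0):
    A = thermal_activity(beta)
    assert 0.0 <= A <= 0.5 + 1e-12
    print(beta, A)          # 1/2 at beta = 0, tending towards 0 as beta grows
```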
There is significant evidence that \(\mathcal{A}^\mathrm{D}(k)\) also has a non-trivial upper bound in SA. In SA, the system, initially in a thermal state at some non-zero temperature, keeps evolving toward a cooler thermal state. It is thus reasonable to conjecture that \[\mathcal{A}^\mathrm{D}(k)\le1/2,\quad 0\le k\le \kappa. \label{eq:A_D_conjecture}\tag{27}\] Numerical evidence from simulations of a \(7\)-spin SK model (the formal definition of the SK model will be given in Sec. 4.6) is provided in Fig. [fig:A_D]. One sees that Eq. (27) is respected. Moreover, there is a decreasing trend of \(\mathcal{A}^\mathrm{D}(k)\) during the annealing, which coincides with previous numerical results [15] and could be a general feature of SA. As a consequence of the conjecture [Eq. (27)], we have \(\langle \mathcal{A}^\mathrm{D} \rangle_\kappa\le 1/2\).
A further, intuitive, route to understanding the conjecture [Eq. (27)] makes use of the tendency for lower energy states to have a higher probability in thermal scenarios. For equal-sized neighbourhoods, the activity \(\mathcal{A}^\mathrm{D}\) is given by \[\mathcal{A}^\mathrm{D} = \dfrac{1}{2} - \dfrac{1}{4n}\sum_i\sum_{j\in\mathcal{N}(i)}\dfrac{(\gamma_i'-\gamma_j')(p_i-p_j)}{\gamma_i'+\gamma_j'}. \label{eq:A_D_explicit}\tag{28}\] Adopting the concept of passive states, in which the lower energy state has the higher probability of occupation [40]–[44], we assume passivity within neighbourhoods and define the neighbourhood-passivity \(\mathcal{P}_i\) for state \(i\), \[\begin{align} \mathcal{P}_i &:= \sum_{j\in\mathcal{N}(i)}\dfrac{(\gamma_i'-\gamma_j')(p_i-p_j)}{\gamma_i'+\gamma_j'} \nonumber\\ & = \sum_{j\in\mathcal{N}(i)}\tanh\left[\frac{\beta'}{2}(E_j-E_i)\right](p_i-p_j). \end{align}\] Note that if two neighbouring states \(i\) and \(j\) respect passivity, \(\tanh\left[\beta'(E_j-E_i)/2\right](p_i-p_j)\ge 0\). Passivity within neighbourhoods thus requires \(\mathcal{P}_i\ge 0, \forall i\). If this assumption holds throughout the SA, by Eq. (28), we have \(\mathcal{A}^\mathrm{D}\le 1/2\) at every step \(k\), and consequently \(\langle \mathcal{A}^\mathrm{D} \rangle_\kappa\le 1/2\). The assumption can be justified by noticing that all thermal states are passive. Thus, as the system, initially in a thermal state, evolves towards a new thermal state in SA, we expect passivity, at least within neighbourhoods, to be preserved. As shown in Fig. 2, \(\mathcal{P}_i\ge 0, \forall i\), for three different kinds of cooling schedules, which verifies our passivity assumption and gives further numerical evidence for the conjecture \(\mathcal{A}^\mathrm{D}(k)\le1/2\), and hence \(\langle \mathcal{A}^\mathrm{D} \rangle_\kappa\le 1/2\).
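The conjecture can also be tested directly on small instances. The sketch below (a toy 3-spin Ising chain standing in for the SK model, with an assumed linear-in-\(\beta\) schedule) evolves \(p(k)\) by Eq. (20) and records \(\mathcal{A}^\mathrm{D}(k)\):

```python
import numpy as np
from itertools import product

n_spins = 3
states = list(product((-1, 1), repeat=n_spins))
M = len(states)
E = np.array([-sum(s[k] * s[k + 1] for k in range(n_spins - 1))
              for s in states], dtype=float)

def transition_matrix(beta):
    # Eqs. (21)-(22) with single-spin-flip neighbourhoods, |N(j)| = 3
    P = np.zeros((M, M))
    for j in range(M):
        for k in range(n_spins):
            t = list(states[j]); t[k] = -t[k]
            i = states.index(tuple(t))
            P[i, j] = 1.0 / (1.0 + np.exp(beta * (E[i] - E[j]))) / n_spins
        P[j, j] = 1.0 - P[:, j].sum()
    return P

kappa = 50
betas = np.linspace(0.1, 3.0, kappa + 1)   # assumed cooling schedule
p = np.exp(-betas[0] * E); p /= p.sum()    # start in equilibrium, as in the text

activities = []
for k in range(kappa):
    P = transition_matrix(betas[k + 1])
    activities.append(1.0 - np.diag(P) @ p)   # A^D(k), Eq. (23)
    p = P @ p                                  # Eq. (20)

print(max(activities))      # stays below 1/2, consistent with Eq. (27)
```

On this instance the activity also decreases along the anneal, matching the trend reported in the text.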
In this subsection, we bound the relative entropy change from temperature variation, \(\mathcal{I}\). In SA, \(\mathcal{I}\) is evaluated by a discrete sum: \[\mathcal{I}(\kappa) = \sum_{k=1}^{\kappa}\left[E_p(k)-E_\gamma(k)\right]\left[\beta(k)-\beta(k-1)\right], \label{eq:I_kappa}\tag{29}\] which is shown in Appendix 6 to be equal to the \(\mathcal{I}(\tau)\) of the underlying continuous-time process that simulates the discrete-time evolution. Although \(E_\gamma(k)\), the energy expectation of the thermal state \(\gamma\), apparently acquires its \(k\)-dependence from the cooling schedule \(T(k)\), it is intrinsically a fixed function of temperature for a given energy landscape. In this sense, we regard it as a problem-specific but dynamics-free quantity. In contrast, the average energy \(E_p(k)\) requires information on the instantaneous system state \(p(k)\). To access it, one has to solve the dynamics in Eq. (20), which is generally difficult and is the main obstacle to analyzing the performance of SA. In the following, we tackle this by providing an upper bound on \(E_p(k)\) in terms of \(E_\gamma(k)\). This is done via the use of a partial swap (PS) model, a simple thermalization protocol in which the system state is swapped with the thermal state with some probability at each step [28], [45], [46]. We shall use the superscript PS to denote quantities in the partial swap model. By showing that \(E_p(k)\le E_p^\mathrm{PS}(k)\) for all \(k\), we will derive a bound on \(\mathcal{I}(\kappa)\) as \(\mathcal{I}(\kappa)\le\mathcal{I}^\mathrm{PS}(\kappa)\).
The evolution of the state of the system \(p^\mathrm{PS}\) in a partial swap model is given by \[p_i'^\mathrm{PS}= p_i^\mathrm{PS}+ \mu^\mathrm{PS}(\gamma_i'^\mathrm{PS}- p_i^\mathrm{PS}), \label{eq:PS_model}\tag{30}\] where \(\mu^\mathrm{PS}\) is the so-called partial swap rate, the time dependence has been suppressed, and the prime \('\) denotes the next time step. Multiplying both sides by \(E_i\) and summing over \(i\) gives \[E_p'^\mathrm{PS}= E_p^\mathrm{PS}+ \mu^\mathrm{PS}(E_\gamma'^\mathrm{PS}- E_p^\mathrm{PS}), \label{eq:E_PS_model}\tag{31}\] and therefore, \[\mu^\mathrm{PS}= \dfrac{E_p'^\mathrm{PS}-E_p^\mathrm{PS}}{E_\gamma'^\mathrm{PS}-E_p^\mathrm{PS}}.\] Similarly, we can define a relaxation rate \(\mu\) for the actual process in SA as \[\mu := \dfrac{E_p' - E_p}{E_\gamma' - E_p}. \label{eq:mu_def}\tag{32}\] This relaxation rate has been employed in analysing the minimal dissipation and designing the optimal cooling schedule in SA [47]–[49]. Note that writing \(\mu\) in this form requires it to be positive. Since in SA \(E_p\ge E_\gamma \ge E_\gamma'\), we need the assumption that \(E_p \ge E_p'\). Such a descending energy expectation has been observed in most uses of SA [15], [50], [51]. \(\mu^\mathrm{PS}\) and \(\mu\) quantify the relaxation speeds of the PS and the actual processes in terms of energy differences, respectively. Intuitively, \(\mu\ge\mu^\mathrm{PS}\) implies that the corresponding PS process is slower than the actual one in SA. This is formulated by the following proposition.
Proposition 1. Consider two identical systems, initially in the same thermal state, annealed according to the same cooling schedule, with relaxation dynamics given by Eq. (20 ) and Eq. (30 ), respectively. If \(\mu\ge\mu^\mathrm{PS}\) throughout the annealing, then \(E_p\le E_p^\mathrm{PS}\) holds at all times.
Proof. Since the two processes correspond to the same energy landscape and cooling schedule, we have \(E_\gamma'^\mathrm{PS}= E_\gamma'\). Let \(\mu\ge\mu^\mathrm{PS}\), namely, \[\dfrac{E_p'-E_p}{E_\gamma'-E_p} \ge \dfrac{E_p'^\mathrm{PS}-E_p^\mathrm{PS}}{E_\gamma'-E_p^\mathrm{PS}}. \label{eq:compare95mu}\tag{33}\] As a partial swap rate, \(\mu^\mathrm{PS}\le 1\). Insufficient thermalization leads to \(E_p'^\mathrm{PS}\ge E_\gamma'\), and therefore the right-hand side of Eq. (33 ) is a monotonically increasing function of \(E_p^\mathrm{PS}\). If \(E_p^\mathrm{PS}\ge E_p\), \[\dfrac{E_p'^\mathrm{PS}-E_p^\mathrm{PS}}{E_\gamma'-E_p^\mathrm{PS}} \ge \dfrac{E_p'^\mathrm{PS}-E_p}{E_\gamma'-E_p},\] and thus, by Eq. (33 ), \[\dfrac{E_p'-E_p}{E_\gamma'-E_p} \ge \dfrac{E_p'^\mathrm{PS}-E_p}{E_\gamma'-E_p}.\] In SA, \(E_p \ge E_\gamma \ge E_\gamma'\). The above inequality then gives \(E_p'^\mathrm{PS}\ge E_p'\). With the initial condition \(E_p^\mathrm{PS}(0) = E_p(0)\), induction shows that \(\mu\ge\mu^\mathrm{PS}\) leads to \(E_p^\mathrm{PS}\ge E_p\) at all times. ◻
Knowing that \(\mu\ge\mu^\mathrm{PS}\) thus gives us an upper bound on \(E_p\), and we further note that \(E_p^\mathrm{PS}\) can be evaluated iteratively through Eq. (31 ) using \(E_\gamma\) only. Replacing \(E_p(k)\) with \(E_p^\mathrm{PS}(k)\) in Eq. (29 ), we thus find the desired bound \(\mathcal{I}(\kappa)\le\mathcal{I}^\mathrm{PS}(\kappa)\).
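This iteration can be sketched as follows: starting from equilibrium, \(E_p^\mathrm{PS}(k)\) is propagated by Eq. (31) using only the equilibrium energies \(E_\gamma(k)\), and Eq. (29) accumulates \(\mathcal{I}^\mathrm{PS}(\kappa)\). The spectrum, schedule, and swap rate below are illustrative placeholders, not values from the paper.

```python
import numpy as np

E = np.linspace(-1.0, 1.0, 16)          # toy energy spectrum (illustrative)
betas = np.linspace(0.5, 2.0, 101)      # cooling schedule beta(0), ..., beta(kappa)

def E_gamma(beta):
    """Equilibrium energy of the thermal state at inverse temperature beta."""
    w = np.exp(-beta * E)
    return E @ (w / w.sum())

E_p_ps = E_gamma(betas[0])              # start in the initial thermal state
mu_ps = 0.1                             # placeholder swap rate, 0 < mu_ps <= 1
I_ps = 0.0
for k in range(1, len(betas)):
    E_p_ps += mu_ps * (E_gamma(betas[k]) - E_p_ps)                    # Eq. (31)
    I_ps += (E_p_ps - E_gamma(betas[k])) * (betas[k] - betas[k - 1])  # Eq. (29)
```

During cooling (\(\beta\) increasing) the PS energy lags behind \(E_\gamma\), so every increment of the sum is non-negative, as the bound requires.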
Now the question is how to find such a \(\mu^\mathrm{PS}\) smaller than \(\mu\) when \(\mu\) itself is not accessible (its calculation still requires \(E_p\)). Recall that \(\mu\) gauges the speed of the relaxation dynamics. So does the relaxation time \(\tau_\mathrm{rel}(P) \equiv 1/(1-\lambda_2(P))\) of the discrete Markov chain [31], defined through the second largest eigenvalue \(\lambda_2(P)\) of the transition matrix \(P\). We therefore conjecture that the relaxation times of the two processes are ordered inversely to \(\mu\) and \(\mu^\mathrm{PS}\), i.e., \[\tau_\mathrm{rel}\le\tau_\mathrm{rel}^\mathrm{PS}\Rightarrow \mu\ge\mu^\mathrm{PS}. \label{eq:ordering95conjecture}\tag{34}\] With this conjecture, we can compare \(\lambda_2(P^\mathrm{PS})\) and \(\lambda_2(P)\) of the transition matrices \(P^\mathrm{PS}\) and \(P\) to find a condition on \(\lambda_2(P^\mathrm{PS})\). The desired \(\mu^\mathrm{PS}\) then follows from \(\lambda_2(P^\mathrm{PS})\) by the following proposition.
Proposition 2. For a partial swap model given by Eq. (30 ), \(\lambda_2(P'^\mathrm{PS}) = 1-\mu^\mathrm{PS}\).
Proof. Let \(N\) be the total number of states. The transition matrix \(P'^\mathrm{PS}\) for such a PS model can be written explicitly as follows: \[\begin{align} P'^{\mathrm{PS}} &= \begin{pmatrix} 1-\mu+\mu\gamma'_1 & \mu\gamma'_1 & \cdots & \mu\gamma'_1 \\ \mu\gamma'_2 & 1-\mu+\mu\gamma'_2 & \cdots & \mu\gamma'_2\\ \vdots & \vdots & \ddots & \vdots \\ \mu\gamma'_N & \mu\gamma'_N & \cdots & 1-\mu+\mu\gamma'_N\\ \end{pmatrix} \nonumber\\ &= (1-\mu) \begin{pmatrix} 1 & & \\ & \ddots &\\ & & 1\\ \end{pmatrix} +\mu \begin{pmatrix} \gamma'_1 & \cdots & \gamma'_1\\ \gamma'_2 & \cdots & \gamma'_2\\ \vdots & & \vdots\\ \gamma'_N & \cdots & \gamma'_N\\ \end{pmatrix}, \end{align}\] where we have omitted the superscript ‘\(\mathrm{PS}\)’ on the matrix elements. Note that the second matrix on the right-hand side has rank \(1\). By the rank-nullity theorem, \(\mathrm{Rank}(T) + \mathrm{Nullity}(T) = \mathrm{dim}(T)\) for a matrix \(T\), where \(\mathrm{Nullity}(T) = \mathrm{dim}\{\vec{v}\,|\,T\vec{v} = 0\}\), the nullity of the second matrix is \(N-1\), i.e., it has eigenvalue \(0\) with multiplicity \(N-1\), and its remaining eigenvalue is \(1\), given by its trace. Therefore, the eigenvalues of \(P'^{\mathrm{PS}}\) are \(\lambda_1(P'^{\mathrm{PS}}) = 1\) and \(\lambda_2(P'^{\mathrm{PS}}) = \dots = \lambda_N(P'^{\mathrm{PS}}) =1-\mu^\mathrm{PS}\). ◻
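Proposition 2 is also easy to verify numerically: the sketch below builds the identity-plus-rank-one matrix of the decomposition above for a randomly drawn thermal state (sizes and rates are illustrative) and confirms the spectrum \(\{1,\,1-\mu^\mathrm{PS},\dots,1-\mu^\mathrm{PS}\}\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, mu_ps = 6, 0.35
gamma = rng.random(N)
gamma /= gamma.sum()                     # a random thermal state gamma'

# P'^PS = (1 - mu) I + mu (gamma' 1^T), as in the decomposition above
P_ps = (1 - mu_ps) * np.eye(N) + mu_ps * np.outer(gamma, np.ones(N))
eigvals = np.sort(np.linalg.eigvals(P_ps).real)[::-1]   # descending order
```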
For the transition matrix \(P\) in SA [Eq. (21 )], the second largest eigenvalue \(\lambda_2(P)\) can be bounded as [52] \[\lambda_2(P) \le 1 - \dfrac{\omega_{\min}}{\gamma_{\max}}\lambda_2(Q), \label{eq:lambda_2_P_bound}\tag{35}\] where \(\omega_{\min} = \min_{i,j\in\mathcal{N}(i)} P_{ij}\gamma_j\) and \(\gamma_{\max} = \max_i \gamma_i\). Here, \(Q\) is the Laplacian matrix associated with the state space [52], \[Q_{ij} =\left\{ \begin{align} &|\mathcal{N}(i)|\quad & \text{if }i=j,\\ &-1 & \text{if } j\in \mathcal{N}(i),\\ &0 & \text{otherwise,} \end{align} \right.\] and \(\lambda_2(Q)\) is the second smallest eigenvalue of \(Q\). By Eq. (21 ), \[\omega_{\min} = \min_{i,j\in \mathcal{N}(i)}\left(\frac{1}{|\mathcal{N}(j)|}\dfrac{\gamma_i\gamma_j}{\gamma_i+\gamma_j}\right)\ge \frac{1}{\mathcal{N}_{\max}}\dfrac{\gamma_{\min}}{2},\] where \(\mathcal{N}_{\max} = \max_i |\mathcal{N}(i)|\) and \(\gamma_{\min} = \min_i \gamma_i\). Substituting the above into Eq. (35 ), we have \[\lambda_2(P) \le 1 - \dfrac{\lambda_2(Q)}{2\mathcal{N}_{\max}}\exp(-\beta\Delta E_{\max}).\] Since we want the PS process to be slower than SA, i.e., \(\tau_\mathrm{rel}(P)\le\tau_\mathrm{rel}(P^\mathrm{PS})\), which requires \(\lambda_2(P)\le\lambda_2(P^\mathrm{PS})\), we can simply set \(\lambda_2(P'^\mathrm{PS}) = 1 - \mu^\mathrm{PS}= 1 - \dfrac{\lambda_2(Q)}{2\mathcal{N}_{\max}}\exp(-\beta'\Delta E_{\max})\), which gives \[\mu^\mathrm{PS}(k) = \dfrac{\lambda_2(Q)}{2\mathcal{N}_{\max}}\exp[-\beta(k+1)\Delta E_{\max}], \label{eq:mu95PS95general}\tag{36}\] where we have recovered the \(k\)-dependence for later use. By the conjecture [Eq. (34 )], such a \(\mu^\mathrm{PS}(k)\) guarantees \(\mu(k)\ge\mu^\mathrm{PS}(k)\). Hence, it leads to \(\mathcal{I}(\kappa)\le\mathcal{I}^\mathrm{PS}(\kappa)\) as discussed above.
For a concrete example of the \(\mu^\mathrm{PS}(k)\) of Eq. (36 ), consider an \(n\)-spin Edwards-Anderson (EA) model [39], a generalisation of the SK model, with the Hamiltonian \[H_{\mathrm{EA}} = \sum_{\langle k,l\rangle}g_{kl}s_ks_l, \label{eq:EA95Hamiltonian}\tag{37}\] where each spin \(s_k\in\{-1,+1\}\) and \(\langle k,l \rangle\) denotes the set of pairs of spins \(s_k\) and \(s_l\) having non-vanishing couplings \(g_{kl}\). Here, we do not consider self-energy terms, i.e., \(g_{kk} = 0, \forall k\). The system state \(i\) is a vector representing the spin configuration \(\vec{s}^{(i)} = \{s_1^{(i)},\dots,s_n^{(i)}\}\). The neighbors of \(i\) are the configurations with exactly one spin flipped with respect to \(\vec{s}^{(i)}\). In other words, every neighborhood has the same size, i.e., \(\mathcal{N}_{\max} = n\). Given that the total number of states is \(2^n\), the state space is an \(n\)-dimensional hypercube, for which \(\lambda_2(Q) = 2\) [52]. In this case, Eq. (36 ) becomes \[\mu^\mathrm{PS}(k) = \dfrac{1}{n}\exp[-\beta(k+1)\Delta E_{\max}]. \label{eq:mu95PS95SK}\tag{38}\] We will use this \(\mu^\mathrm{PS}(k)\) to bound our numerical results in Sec. 4.6.
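A minimal check of the ingredients of Eq. (38): the sketch below builds the graph Laplacian \(Q\) of the \(n\)-dimensional hypercube (states as bit strings, neighbors differing in one spin), confirms \(\lambda_2(Q)=2\), and assembles \(\mu^\mathrm{PS}(k)\); the value passed for \(\Delta E_{\max}\) is a placeholder.

```python
import numpy as np

n = 4                                    # number of spins (illustrative)
N = 2 ** n                               # hypercube of 2^n configurations
Q = np.zeros((N, N))
for a in range(N):
    for b in range(N):
        if a != b and bin(a ^ b).count("1") == 1:  # single spin flip
            Q[a, b] = -1.0
    Q[a, a] = n                          # |N(i)| = n for every state
lam = np.sort(np.linalg.eigvalsh(Q))     # lam[1] is lambda_2(Q)

def mu_ps(beta_next, dE_max, lam2=2.0):
    """Partial swap rate of Eqs. (36)/(38); dE_max is a placeholder value."""
    return lam2 / (2 * n) * np.exp(-beta_next * dE_max)
```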
We now derive the performance bound for the SA we considered in this section.
We define the entropy production in the discrete-time Markov chain, denoted by \(\Sigma(\kappa)\), as the entropy production in the corresponding continuous-time Markov chain. The detailed justification is given in Appendix 6, with the key results listed below. The discrete-time analogue of the speed limit in Eq. (16 ) is given by \[\Sigma(\kappa) \ge \dfrac{(L_0-L_\kappa)^2}{2\langle \mathcal{A}^\mathrm{D} \rangle_\kappa\kappa}.\] Accordingly, Eq. (17 ) becomes \[L_\kappa^2 \le -\frac{(L_0 - L_\kappa)^2}{\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\kappa} + 2\mathcal{I}(\kappa).\] Substituting the conjectured bound \(\langle \mathcal{A}^\mathrm{D} \rangle_\kappa\le 1/2\) and the bound \(\mathcal{I}(\kappa)\le\mathcal{I}^\mathrm{PS}(\kappa)\) with the partial swap rate \(\mu^\mathrm{PS}(k)\) in Eq. (36 ), we have \[L_\kappa^2 \le - \dfrac{2(L_0-L_\kappa)^2}{\kappa} + 2\mathcal{I}^\mathrm{PS}(\kappa).\] Solving this inequality for \(L_\kappa\), we obtain the performance bound of SA, \[L_\kappa \le \dfrac{2L_0 + \sqrt{2\kappa\left(-L_0^2 + \mathcal{I}^\mathrm{PS}(\kappa)\left(\kappa + 2\right)\right)}}{\kappa+2}. \label{eq:SA95bound}\tag{39}\] Compared with the discrete-time analogue of the bound Eq. (18 ) (see Appendix 6), \[L_\kappa \le \dfrac{L_0 + \sqrt{\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\kappa\left(-L_0^2 + 2\mathcal{I}(\kappa)\left(\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\kappa + 1\right)\right)}}{\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\kappa+1}, \label{eq:main95bound95discrete}\tag{40}\] Eq. (39 ) replaces the relative entropy term \(\mathcal{I}\) with the \(\mathcal{I}^\mathrm{PS}\) associated with a partial swap model. Evaluating \(\mathcal{I}^\mathrm{PS}(\kappa)\) does not require solving the evolution of \(p(k)\) [Eq. (20 )] but only the information on \(E_\gamma(k)\), which can be obtained by knowing the energy spectrum of the system.
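For concreteness, Eq. (39) can be wrapped in a small helper (a sketch; the function name and interface are ours). Note from the formula that the square root requires \(\mathcal{I}^\mathrm{PS}(\kappa)(\kappa+2)\ge L_0^2\), and that for large \(\kappa\) at fixed \(\mathcal{I}^\mathrm{PS}\) the bound tends to \(\sqrt{2\mathcal{I}^\mathrm{PS}}\).

```python
import numpy as np

def sa_error_bound(L0, I_ps, kappa):
    """History-independent bound on the final error L_kappa, Eq. (39).

    Valid when the discriminant is non-negative, i.e. when
    I_ps * (kappa + 2) >= L0**2.
    """
    disc = 2 * kappa * (-L0**2 + I_ps * (kappa + 2))
    if disc < 0:
        raise ValueError("bound not applicable: negative discriminant")
    return (2 * L0 + np.sqrt(disc)) / (kappa + 2)
```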
For specific models, the energy spectrum or the probability distribution of energy levels may be known, from which one can construct the partition function \(Z\) and find \(E_\gamma\) by \(E_\gamma = -\partial \ln Z / \partial \beta\).
For systems with a priori unknown energy levels, one can sample the energy landscape to estimate the spectrum. Even then, the computational resources required to evaluate the bound on the error of the solution are much lower than those required to evaluate the error \(L_\kappa\) via SA. Finding \(L_\kappa\), the distance between the statistical state \(p(\kappa)\) and the equilibrium state \(\gamma(\kappa)\), directly through SA requires many trajectories in order to estimate \(p(k)\) at each step \(k\). A key advantage of Eq. (39 ) is that it replaces the need to evaluate that distance with the need to know the energy spectrum. Given a state space of size \(N\), one can map out the energy landscape with of the order of \(N\) calls of the energy function. In contrast, a faithful estimation of \(p(k)\) via SA requires a number of trajectories \(N_{\mathrm{traj}}\gg N\), and each Monte Carlo time step across all trajectories costs \(N_{\mathrm{traj}}\) calls of the energy function. Quantifying the computational cost by the number of calls of the energy function, evaluating Eq. (39 ) is therefore much cheaper than performing SA to measure the error.
In the following, we exemplify the calculation of the bound for 1D Ising chains with known energy spectrum in Sec. 4.5, and study the tightness of the bound by numerical simulation on SK models in Sec. 4.6.
In order to show how this bound can be calculated analytically, we consider SA on a 1D \(n\)-spin Ising chain, a simplified EA model, with the Hamiltonian given by [53] \[H_{\mathrm{1D}} = -J\sum_{i=1}^{n}s_{i}s_{i+1},\] where \(s_i\in\{-1,+1\}\), \(J\) is the coupling constant, and periodic boundary conditions are applied, i.e., \(s_{n}s_{n+1} \equiv s_ns_1\). For even \(n\), the partition function \(Z_\mathrm{1D}\) at the inverse temperature \(\beta\) can be calculated as [53] \[Z_\mathrm{1D} = 2^n \left[\cosh\left(\beta J\right)^n+\sinh\left(\beta J\right)^n\right].\] The equilibrium energy \(E_{\gamma,\mathrm{1D}}\) is therefore
\[\begin{align} E_{\gamma,\mathrm{1D}} &= -\frac{\partial \ln Z_\mathrm{1D}}{\partial \beta} \nonumber\\ &= -\frac{Jn\cosh\left(\beta J\right)\sinh\left(\beta J\right)\left[\cosh\left(\beta J\right)^{n-2}+\sinh\left(\beta J\right)^{n-2}\right]}{\cosh\left(\beta J\right)^n+\sinh\left(\beta J\right)^n}. \end{align}\]
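The closed form can be sanity-checked numerically by comparing it against a central finite difference of \(-\partial\ln Z_\mathrm{1D}/\partial\beta\) (a sketch; the parameter values are arbitrary and \(n\) is taken even, as the partition function assumes):

```python
import numpy as np

def lnZ_1d(beta, n, J):
    """log of the 1D Ising partition function for even n."""
    return n * np.log(2) + np.log(np.cosh(beta * J)**n + np.sinh(beta * J)**n)

def E_gamma_1d(beta, n, J):
    """Closed-form equilibrium energy E_{gamma,1D}."""
    c, s = np.cosh(beta * J), np.sinh(beta * J)
    return -J * n * c * s * (c**(n - 2) + s**(n - 2)) / (c**n + s**n)

# central finite difference of -d lnZ / d beta should match the closed form
beta, n, J, h = 0.7, 6, 1.0, 1e-5
fd = -(lnZ_1d(beta + h, n, J) - lnZ_1d(beta - h, n, J)) / (2 * h)
```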
Using the partial swap model [Eq. (31 )], one can calculate \(E^\mathrm{PS}_{p,\mathrm{1D}}(k)\equiv E^\mathrm{PS}_{p,\mathrm{1D}}(\beta(k))\) iteratively from \(E_{\gamma,\mathrm{1D}}(k)\equiv E_{\gamma,\mathrm{1D}}(\beta(k))\) and find \(\mathcal{I}^\mathrm{PS}_\mathrm{1D}(\kappa)\) by Eq. (29 ). On the other hand, given the energy landscape, the initial distance \(L_0 = \sum_i|\gamma_i(0)-\gamma_i(\kappa)|\) is known. The bound Eq. (39 ) can thus be evaluated for this model.
Apart from evaluating Eq. (39 ) exactly, we can also deduce the scaling of \(L_\kappa\) for large \(\kappa\). We consider typical cooling schedules, for which the temperature change per step decreases as \(\kappa\) grows. Hence, for large \(\kappa\), \(\mathrm{d}k\equiv 1\) and \(\mathrm{d}E^\mathrm{PS}_{p,\mathrm{1D}}(k) \equiv E^\mathrm{PS}_{p,\mathrm{1D}}(k+1) - E^\mathrm{PS}_{p,\mathrm{1D}}(k)\) are relatively small, and Eq. (31 ) can be written as a differential equation: \[\dot{E}^\mathrm{PS}_{p,\mathrm{1D}}(k) = -\mu^\mathrm{PS}_{\mathrm{1D}}(k)\left[E^\mathrm{PS}_{p,\mathrm{1D}}(k) - E_{\gamma,\mathrm{1D}}(k)\right],\] where \(\dot{x}(k) \equiv \mathrm{d} x / \mathrm{d}k\). This equation can be solved with the initial condition \(E^\mathrm{PS}_{p,\mathrm{1D}}(0) = E_{\gamma,\mathrm{1D}}(0)\) as (see the Supplementary Material of Ref. [21]) \[E^\mathrm{PS}_{p,\mathrm{1D}}(k) = E_{\gamma,\mathrm{1D}}(k) -\int_0^k\mathrm{e}^{-\int_s^k\mu^\mathrm{PS}_{\mathrm{1D}}(s')\mathrm{d}s'}\dot{E}_{\gamma,\mathrm{1D}}(s)\mathrm{d}s. \label{eq:E_PS_1D}\tag{41}\] Accordingly, for large \(\kappa\), \(\mathcal{I}^\mathrm{PS}_\mathrm{1D}(\kappa)\) calculated by Eq. (29 ) can be written as an integral in a form similar to Eq. (7 ), \[\begin{align} \mathcal{I}^\mathrm{PS}_\mathrm{1D}(\kappa) &= \int_0^\kappa \left[E^\mathrm{PS}_{p,\mathrm{1D}}(k) - E_{\gamma,\mathrm{1D}}(k)\right] \dot{\beta}(k)\mathrm{d}k\\ &= -\int_0^\kappa \int_0^k\mathrm{e}^{-\int_s^k\mu^\mathrm{PS}_{\mathrm{1D}}(s')\mathrm{d}s'}\dot{E}_{\gamma,\mathrm{1D}}(s)\dot{\beta}(k) \mathrm{d}s \mathrm{d}k, \end{align}\] where the second line follows from substituting Eq. (41 ). We note that \[\begin{align} \dot{E}_\gamma &= \sum_i E_i \dot{\gamma}_i = \sum_i E_i(E_\gamma-E_i)\gamma_i\dot{\beta} = - \sigma(E)^2_\gamma\, \dot{\beta}, \end{align}\] where the second equality used \(\dot{\gamma}_i = (E_\gamma-E_i)\gamma_i\dot{\beta}\) [Eq. (6 )] and \(\sigma(E)^2_\gamma\equiv -\sum_i E_i(E_\gamma-E_i)\gamma_i\) is the energy variance of the thermal state \(\gamma\). \(\mathcal{I}^\mathrm{PS}_\mathrm{1D}(\kappa)\) can therefore be written as
\[\mathcal{I}^\mathrm{PS}_\mathrm{1D}(\kappa) = \int_0^\kappa \int_0^k\exp\!\left[-\int_s^k\mu^\mathrm{PS}_{\mathrm{1D}}(s')\mathrm{d}s'\right]\sigma_\mathrm{1D}(E)^2_\gamma(s)\dot{\beta}(s)\dot{\beta}(k) \mathrm{d}s \mathrm{d}k, \label{eq:I95PS951D951}\tag{42}\] where \[\sigma_\mathrm{1D}(E)^2_\gamma =\frac{\partial^2 \ln Z_{\mathrm{1D}}}{\partial \beta^2} = \frac{4J^2n\left[\cosh\left(\beta J\right)^{2n}\sinh\left(\beta J\right)^2 - \cosh\left(\beta J\right)^{2}\sinh\left(\beta J\right)^{2n}+(n-1)\left(\cosh\left(\beta J\right)\sinh\left(\beta J\right)\right)^n\right]}{\left[\cosh\left(\beta J\right)^n + \sinh\left(\beta J\right)^{n}\right]^2\sinh\left(2 \beta J\right)^2}.\] For large \(\beta J\), \(\cosh\left(\beta J\right)\simeq\sinh\left(\beta J\right)\) and we have \[\sigma_\mathrm{1D}(E)^2_\gamma \simeq \frac{J^2 n(n-1)}{\sinh\left(2 \beta J\right)^2} \simeq 4 J^2 n(n-1)\mathrm{e}^{-4\beta J}.\] Consequently, by Eq. (42 ), \[\begin{align} \mathcal{I}_{1D}^\mathrm{PS}(\kappa) &\simeq 4 J^2 n(n-1)\int_0^\kappa\int_0^k \exp\!\left[-\int_s^k \frac{1}{n}\exp[-2\beta(s')nJ]\mathrm{d}s'\right]\mathrm{e}^{-4\beta(s) J} \dot{\beta}(s)\dot{\beta}(k)\mathrm{d}s\mathrm{d}k. \end{align} \label{eq:I95PS951D952}\tag{43}\]
where, following Eq. (38 ), \(\mu^\mathrm{PS}_{\mathrm{1D}}(k)\) is chosen as \(\mu^\mathrm{PS}_{\mathrm{1D}}(k) \simeq \exp[-2\beta(k)nJ]/n\) with \(\Delta E_{\max} = 2nJ\). Eq. (43 ) hence provides a systematic way to calculate \(\mathcal{I}_{1D}^\mathrm{PS}(\kappa)\) for any annealing schedule \(\beta(k)\).
For example, consider the cooling schedule linear in inverse temperature, \(\beta(k) = \beta_\mathrm{i} + (\beta_\mathrm{f} - \beta_\mathrm{i}) k/\kappa\), and use, for simplicity, the lowest partial swap rate \(\bar{\mu}\equiv \exp[-2\beta_\mathrm{f}nJ]/n\le\mu^\mathrm{PS}_{\mathrm{1D}}(k), \forall k\). We then have
\[\mathcal{I}_{1D}^\mathrm{PS}(\kappa) \lesssim \frac{4 J^2 n(n-1)\mathrm{e}^{-4\beta_\mathrm{i}J}(\beta_\mathrm{f} - \beta_\mathrm{i})^2}{-4J(\beta_\mathrm{f}-\beta_\mathrm{i}) + \bar{\mu}\kappa}\left[\frac{1-\mathrm{e}^{-4J(\beta_\mathrm{f}-\beta_\mathrm{i})}}{4J(\beta_\mathrm{f}-\beta_\mathrm{i})} + \frac{-1 + \mathrm{e}^{-\bar{\mu}\kappa}}{\bar{\mu}\kappa}\right].\]
Choosing \(J = 1/n\) to fix the energy range \(\Delta E_{\max} = 2\), we find \(\mathcal{I}_{1D}^\mathrm{PS}(\kappa) \simeq \mathcal{O}(n\kappa^{-1})\). Substituting it into the bound Eq. (39 ), the scaling of the error \(L_\kappa\) is given as \(L_\kappa \simeq \mathcal{O}(n^{1/2}\kappa^{-1/2})\). This result shows how the annealing performance for the 1D Ising chain is influenced by the system and protocol parameters, for this specific cooling schedule, which to our knowledge has not been derived before.
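Under the stated choices (schedule linear in \(\beta\), lowest swap rate \(\bar\mu\), \(J=1/n\)), the closed-form estimate above can be evaluated directly; the sketch below (with illustrative parameter values) checks the advertised \(\mathcal{O}(n\kappa^{-1})\) scaling by confirming that \(\kappa\,\mathcal{I}^\mathrm{PS}_\mathrm{1D}(\kappa)\) approaches a constant at large \(\kappa\):

```python
import numpy as np

def I_ps_1d(kappa, n, beta_i, beta_f):
    """Closed-form estimate above, with J = 1/n and the lowest rate mu_bar."""
    J = 1.0 / n
    db = beta_f - beta_i
    mu_bar = np.exp(-2 * beta_f * n * J) / n
    pref = 4 * J**2 * n * (n - 1) * np.exp(-4 * beta_i * J) * db**2 \
        / (-4 * J * db + mu_bar * kappa)
    brack = (1 - np.exp(-4 * J * db)) / (4 * J * db) \
        + (-1 + np.exp(-mu_bar * kappa)) / (mu_bar * kappa)
    return pref * brack
```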
To verify the tightness of our bound in Eq. (39 ), we adopt an \(n\)-spin SK model to conduct SA. The Hamiltonian of an \(n\)-spin SK model is defined as [1], [35] \[H_{\mathrm{SK}} = \dfrac{1}{n^{3/2}}\sum_{k=1}^{n}\sum_{l=1}^{n}g_{kl}s_ks_l,\] where \(s_k\in\{-1,+1\}\) and \(\{g_{kl}\}_{k,l=1}^{n}\) is a collection of independent and identically distributed standard Gaussian random variables. The normalization factor \(1/n^{3/2}\) is chosen to ensure that the maximum energy is of order of magnitude 1 [35]. The merit of this choice is that one can use the well-known result for the SK model that \(\lim_{n\rightarrow\infty}\mathbb{E}(H_{\mathrm{SK},\max}) \approx 0.76\), where \(\mathbb{E}(\cdot)\) denotes the Gaussian expectation [54], [55]. The partial swap rate in Eq. (38 ) also applies to the SK model. We thus have a good estimate of \(\Delta E_{\max}\) in the expression of \(\mu^\mathrm{PS}(k)\) in the thermodynamic limit \(n\rightarrow\infty\), i.e., \[\lim_{n\rightarrow\infty}\mathbb{E}(\Delta E_{\max}) = \lim_{n\rightarrow\infty}2\mathbb{E}(H_{\mathrm{SK},\max}) \approx 1.52.\] Together with Eq. (38 ), this provides a reliable way to find an appropriate \(\mu^\mathrm{PS}(k)\) in terms of the cooling schedule alone for an SK model with large \(n\).
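For a small instance, \(\Delta E_{\max}\) can also be obtained exactly by brute force instead of the thermodynamic-limit estimate. The sketch below (names are ours) draws one set of couplings, enumerates all \(2^n\) configurations, and forms the swap rate of Eq. (38):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n = 7
g = rng.standard_normal((n, n))          # i.i.d. standard Gaussian couplings

def H_sk(s):
    """SK Hamiltonian with the 1/n^{3/2} normalization."""
    return (s @ g @ s) / n**1.5

energies = [H_sk(np.array(s)) for s in product([-1, 1], repeat=n)]
dE_max = max(energies) - min(energies)   # exact Delta E_max for this instance

def mu_ps(beta_next):
    return np.exp(-beta_next * dE_max) / n          # Eq. (38)
```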
In Fig. ([fig:result_SK]), we present numerical results of SA conducted on a \(7\)-spin SK model, in which the temperature is decreased inverse-linearly from \(T_\mathrm{i} = 1.8\) to \(T_\mathrm{f} = 0.8\) in \(\kappa\) steps. The left plot validates that both the evolution-history-dependent and -independent bounds [Eq. (40 ) and Eq. (39 )] provide upper limits on the performance error \(L_\kappa\) of SA. To reveal the \(\kappa\)-dependence of the annealing-related quantity, the relative entropy change \(\mathcal{I}(\kappa)\) due to the temperature decrease, bounds from both directions are presented in the right plot, where \(\mathcal{I}^\mathrm{PS}(\kappa)\) is calculated by the method suggested in Sec. 4.3 and the lower bound is given by the trade-off relation in Eq. (19 ) with \(\langle \mathcal{A}^\mathrm{D} \rangle_\kappa\le 1/2\). Notably, the upper and lower bounds on \(\mathcal{I}(\kappa)\) are saturated for fast (small \(\kappa\)) and slow (large \(\kappa\)) cooling, respectively, demonstrating the tightness of the two bounds.
For comparison, a similar simulation is provided in the accompanying paper [21], which adopts linear cooling schedules.
From the previous discussion, we have seen that the choice of \(\mu^\mathrm{PS}(k)\) in Eq. (38 ) does give us an evolution history-independent bound for the example of an SK model. However, as is shown in Fig. ([fig:result_SK]), there is still a gap between the history-independent bound in Eq. (39 ) and the history-dependent bound in Eq. (40 ). A natural question to ask is whether Eq. (38 ) is an optimal choice of \(\mu^\mathrm{PS}(k)\) that satisfies \(\mu(k)\ge\mu^\mathrm{PS}(k)\). Can we find the largest \(\mu^\mathrm{PS}(k)\) allowed? In this section, we will pursue this possibility further by looking at the relaxation rate \(\mu(k)\) defined in Eq. (32 ) in more detail.
Referring to the time evolution of SA in Eq. (20 ), we have (again omitting the arguments \(k\) and using \('\) to label quantities at \(k+1\) step) \[p_i'= p_i + \dfrac{1}{n}\sum_{j\in\mathcal{N}(i)}\left(A_{ij}'p_j - A_{ji}'p_i\right),\] where we have considered the equal-sized neighborhoods, \(|\mathcal{N}(i)| = n, \forall i\), and the acceptance rate \(A_{ij}\) is given by Eq. (22 ). Thus, \(E_p'\) is given by \[\begin{align} E_p' &= \sum_i E_i p_i' \nonumber\\ &= E_p + \dfrac{1}{n}\sum_i\sum_{j\in\mathcal{N}(i)}\left(E_iA_{ij}'p_j - E_iA_{ji}'p_i\right) \nonumber\\ &= E_p + \dfrac{1}{n}\sum_i\sum_{j\in\mathcal{N}(i)}\left(E_j-E_i\right)A_{ji}'p_i \nonumber\\ &= E_p + \dfrac{1}{n}\sum_i\sum_{j\in\mathcal{N}(i)}\Delta E_{ji}\dfrac{\gamma_j' p_i}{\gamma_i' + \gamma_j'}, \label{eq:E_p_prime} \end{align}\tag{44}\] where we have defined \(\Delta E_{ji} \equiv E_j-E_i\). Then \[E_p' - E_p = \dfrac{1}{n}\sum_i\sum_{j\in\mathcal{N}(i)}\Delta E_{ji}\dfrac{\gamma_j'}{\gamma_i'+\gamma_j'}(p_i-\gamma_i'). \label{eq:E_p_prime_E_p}\tag{45}\] To relate it with \(\mu\), we want to rewrite the right-hand side of Eq. (45 ) in terms of \(E_\gamma' - E_p\). The following proposition is thus considered.
Proposition 3. For an \(n\)-spin EA model with the Hamiltonian defined in Eq. (37 ), the following holds: \[\sum_{j\in\mathcal{N}(i)}\Delta E_{ji} = - 4E_i. \label{eq:sum_Delta_E}\qquad{(1)}\]
Proof. We rewrite the Hamiltonian in Eq. (37 ) as \[\begin{align} H_{\mathrm{EA}} &= \sum_{\langle k,l\rangle}g_{kl}s_ks_l \nonumber\\ & = \dfrac{1}{2}\sum_{k=1}^{n}\sum_{l\in\mathcal{C}(k)}g_{kl}s_ks_l \nonumber\\ & \equiv\dfrac{1}{2}\sum_{k=1}^{n}s_kh_k, \end{align}\] where \(\mathcal{C}(k)\) is the set of spins interacting with spin \(s_k\) and \(h_k \equiv \sum_{l\in\mathcal{C}(k)}g_{kl}s_l\) is the effective field applied on \(s_k\). The factor \(1/2\) is from double-counting. Now consider a spin configuration (state \(i\)) \(\vec{s}^{(i)} = \{s_1^{(i)},\dots,s_n^{(i)}\}\) with energy \(E_i = \sum_{k=1}^{n} s_k^{(i)}h_k^{(i)} / 2\). A neighboring state \(j\) can be generated by flipping the \(j\)-th spin in \(\vec{s}^{(i)}\), i. e. \(\vec{s}^{(j)} = \{s_1^{(i)},\dots,-s_j^{(i)}, \dots, s_n^{(i)}\}\). Therefore, the energy difference \(\Delta E_{ji}\) is \[\begin{align} \Delta E_{ji} &= E_j - E_i \nonumber\\ &= E_{\mathrm{rest}, j} - s_j^{(i)}h_j^{(i)} - \left(E_{\mathrm{rest}, j} + s_j^{(i)}h_j^{(i)}\right) \nonumber\\ &= -2s_j^{(i)}h_j^{(i)}, \end{align}\] where \(E_{\mathrm{rest}, j}\) is the energy not involving the interactions with the spin \(s_j\). Summing over all neighbors of \(i\), we find that \[\sum_{j\in\mathcal{N}(i)}\Delta E_{ji} = \sum_{j=1}^{n} -2s_j^{(i)}h_j^{(i)} = - 4E_i.\] ◻
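The identity of Proposition 3 can be verified numerically for a random instance (an illustrative sketch; couplings are drawn with \(g_{kk}=0\), and the upper-triangular matrix encodes one coupling per pair):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
g = np.triu(rng.standard_normal((n, n)), k=1)   # couplings, g_kk = 0

def energy(s):
    """EA energy: sum over pairs k < l of g_kl s_k s_l."""
    return s @ g @ s

s = rng.choice([-1, 1], size=n)
E_i = energy(s)

# sum the energy differences to all n single-flip neighbors
dE_sum = 0.0
for j in range(n):
    t = s.copy()
    t[j] *= -1
    dE_sum += energy(t) - E_i
```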
As a result of this proposition, we have \[E_\gamma'-E_p = \dfrac{1}{4}\sum_i\sum_{j\in\mathcal{N}(i)}\Delta E_{ji}(p_i-\gamma_i').\] Therefore, Eq. (45 ) can be written as
\[\begin{align} E_p' - E_p &= \dfrac{1}{n}\sum_i\sum_{j\in\mathcal{N}(i)}\dfrac{\Delta E_{ji}}{2}(p_i-\gamma_i') + \dfrac{1}{n}\sum_{i}\sum_{j\in\mathcal{N}(i)}\Delta E_{ji}\left(\dfrac{\gamma_j'}{\gamma_i'+\gamma_j'} - \dfrac{1}{2}\right)(p_i-\gamma_i')\nonumber\\ &= \dfrac{2}{n}(E_\gamma'-E_p) - \dfrac{1}{n}\sum_{i}\sum_{j\in\mathcal{N}(i)}\dfrac{\Delta E_{ji}}{2}\tanh\left(\beta'\dfrac{\Delta E_{ji}}{2}\right)(p_i-\gamma_i'). \label{eq:E_p_prime_E_p_final} \end{align}\tag{46}\]
We then obtain \[\begin{align} \mu &:= \dfrac{E_p' - E_p}{E_\gamma'-E_p} \nonumber\\ &= \frac{2}{n} - \dfrac{1}{n}\sum_{i}\sum_{j\in\mathcal{N}(i)}\dfrac{\Delta E_{ji}}{2}\tanh\left(\beta'\dfrac{\Delta E_{ji}}{2}\right)\frac{p_i-\gamma_i'}{E_\gamma'-E_p}. \label{eq:mu95EA} \end{align}\tag{47}\] At high temperatures where \(\beta'\Delta E_{\max}\ll 1\) with \(\Delta E_{\max} = \max_iE_i - \min_iE_i\), \(\mu\simeq 2/n\). This reflects the fact that in the high-temperature limit, the complex structure of the energy landscape can be neglected and the relaxation rate is only diminished by the size of the system.
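Expression (47) can be cross-checked against the direct definition Eq. (32) on a small instance: the sketch below performs one Glauber step on the full distribution (acceptance \(\gamma_j'/(\gamma_i'+\gamma_j')\) for a jump \(i\to j\), as in the derivation above) for an arbitrary state \(p\), and compares the two evaluations of \(\mu\). All names and parameter values are illustrative.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
n = 3
g = np.triu(rng.standard_normal((n, n)), k=1)   # EA couplings, g_kk = 0
confs = np.array(list(product([-1, 1], repeat=n)))
E = np.array([s @ g @ s for s in confs])
N = len(E)

beta_next = 1.3
gam = np.exp(-beta_next * E)
gam /= gam.sum()                                 # thermal state gamma'
p = rng.random(N)
p /= p.sum()                                     # arbitrary current state

nbr = [[j for j in range(N) if np.sum(confs[i] != confs[j]) == 1]
       for i in range(N)]

# one Glauber step on the distribution [cf. Eq. (20)]
p_new = p.copy()
for i in range(N):
    for j in nbr[i]:
        p_new[i] += (gam[i] * p[j] - gam[j] * p[i]) / ((gam[i] + gam[j]) * n)

E_p, E_p_new, E_g = E @ p, E @ p_new, E @ gam
mu_direct = (E_p_new - E_p) / (E_g - E_p)        # Eq. (32)

# Eq. (47): high-temperature term 2/n plus the tanh correction
acc = sum((E[j] - E[i]) / 2 * np.tanh(beta_next * (E[j] - E[i]) / 2)
          * (p[i] - gam[i])
          for i in range(N) for j in nbr[i])
mu_47 = 2 / n - acc / (n * (E_g - E_p))
```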
As the temperature decreases, the second term on the right-hand side of Eq. (47 ) becomes significant and complicates the evaluation of \(\mu\). We note that, given the cooling schedule \(\beta(k)\), \(\mu(k(\beta)) \equiv \mu(\beta)\) can be viewed as a function of \(\beta\). If \(\mu(k)\) did not depend on the system state \(p(k)\), \(\mu(\beta)\) would be the same for all cooling schedules at fixed \(\beta\). However, Fig. ([fig:mu]) provides a counter-example for a \(7\)-spin SK model, showing that \(\mu(k)\) is not a function of \(\beta\) alone but depends on the history of \(\beta\). Hence, finding \(\mu(k)\) without referring to \(p(k)\) seems intractable. Moreover, as depicted by the red dash-dotted line in Fig. ([fig:mu]), the \(\mu^\mathrm{PS}\) of Eq. (38 ) does bound \(\mu\) irrespective of the cooling schedule and is even saturated at large \(\beta\). Therefore, at least for this specific example, at low temperatures we cannot do much better than the \(\mu^\mathrm{PS}\) of Eq. (38 ). On the other hand, Fig. ([fig:mu]) provides numerical evidence for our conjecture [Eq. (34 )].
Finally, we note that the asymptotic behaviour \(\mu\rightarrow 2/n\) at high temperatures, \(\beta\rightarrow 0\), is visible in Fig. ([fig:mu]) for all three cooling schedules. To take advantage of this property, one may use \(\mu^\mathrm{PS}= 2/n\) at high temperatures and then switch to the \(\mu^\mathrm{PS}\) of Eq. (38 ) later in the annealing to obtain a tighter bound than Eq. (39 ).
In this article, we derived a general bound quantifying annealing performance. The bound is derived in the framework of finite-time stochastic thermodynamics and thus applies to nano-scale systems and, in particular, to quantum systems whose evolution satisfies stochastic master equations. The bound shows that the 1-norm distance between the final statistical state of the system and the final thermal state is restricted by two problem-specific parameters: the averaged activity \(\langle \mathcal{A} \rangle_\tau\) and the change of relative entropy due to the annealing, \(\mathcal{I}(\tau)\). The two quantities are shown to obey a trade-off relation enforced by the initial distance \(L_0\).
We further applied this bound to study the finite-time performance of SA. Under the reasonable conjecture that the discrete activity satisfies \(\mathcal{A}^\mathrm{D}\le1/2\), and proposing a slower partial swap model to bound \(\mathcal{I}\le\mathcal{I}^\mathrm{PS}\), we obtained a modified bound that does not require the intermediate states of the system and refers only to the rule of temperature descent and the energy landscape. Consequently, the new bound applies to any Glauber-dynamics SA schedule and any cost function. We employed 1D Ising chains as an example to illustrate how the bound can be evaluated analytically. To validate our conjectures, we presented numerical simulations of \(7\)-spin SK models, demonstrating the performance of our bounds. Overall, our results show that recent developments in stochastic thermodynamics can be adapted to characterize the finite-time behaviour of SA, and we expect such approaches can be further developed to understand other Monte Carlo algorithms.
We gratefully acknowledge valuable discussions with Alexander Yosifov, Barry Sanders, Li Xiao, and Yu Chai. This work was supported by National Natural Science Foundation of China (Grants No. 12050410246, No. 12005091).
Here we prove that the discrete-time evolution in SA, i.e. Eq. (20 ) in the main body, can always be simulated by a continuous-time master equation defined in Eq. (1 ) in the main body. For clarity of notation, below we write probabilities in discrete time with a superscript \(\mathrm{D}\) and probabilities in continuous time with a superscript \(\mathrm{C}\). Our aim is then to show that the evolution \[p^\mathrm{D}_i(k+1) = \sum_{j}P_{ij}(k+1)p^\mathrm{D}_j(k), \label{eq:pD_evolution}\tag{48}\] can be reproduced by \[\dot{p}^\mathrm{C}_i(t) = \sum_j\Gamma_{ij}(t)p^\mathrm{C}_j(t), \label{eq:pC_evolution}\tag{49}\] such that \[p^\mathrm{D}_i(k) = p^\mathrm{C}_i(t_k), \label{eq:equivalence95DC}\tag{50}\] for times \(t_{0}=0,t_{1},\dots,t_{k},\dots,t_{\kappa}=\tau\), with \(\kappa\) denoting the number of iterations in SA.
Integrating Eq. (49 ) from time \(t_k\) to time \(t_{k+1}\), we obtain \[\begin{align} p_i^\mathrm{C}(t_{k+1}) &= p_i^\mathrm{C}(t_k) + \int_{t_k}^{t_{k+1}} \mathrm{d}t \sum_j\Gamma_{ij}(t)p^\mathrm{C}_j(t)\\ &= \sum_{j}W_{ij}(k+1)p_j^\mathrm{C}(t_k), \end{align} \label{eq:W_evolution}\tag{51}\] where we have defined \[W_{ij}(k+1) = \delta_{ij} + \int_{t_k}^{t_{k+1}} \mathrm{d}t \,\Gamma_{ij}(t)p^\mathrm{C}_j(t)/p^\mathrm{C}_j(t_k),\label{eq:W_ij}\tag{52}\] for each \(k=0,1,\dots,\kappa-1\), where \(\delta_{ij}\) is the Kronecker delta. To show that there always exists a valid transition rate matrix making Eq. (51 ) reproduce Eq. (48 ), we introduce \[\Gamma_{ij}(t) = \sum_{k=0}^{\kappa-1}\delta \left(t-t_k^+\right) P_{ij}(k+1), \label{eq:Gamma_ij}\tag{53}\] for \(i\neq j\) and \(\Gamma_{ii}(t) = - \sum_{j(\neq i)}\Gamma_{ji}(t)\). Here, \(\delta(\cdot)\) is the Dirac \(\delta\)-function and \(t_k^+>t_k\) is a value close to \(t_k\).
As we will show, \(P_{ij}(k+1)\) guarantees that the transition rate defined in Eq. (53 ) induces the required thermal states and detailed balance relation. Due to the relation of \(P_{ij}(k+1)\) demanded by SA \[P_{ij}(k+1)\gamma_j^\mathrm{D}(k+1) = P_{ji}(k+1)\gamma_i^\mathrm{D}(k+1), \label{eq:P_ij-DB}\tag{54}\] it can be verified that the state defined as \[\gamma^{\mathrm{C}}\left(t\right)=\gamma^{\mathrm{D}}\left(k+1\right),\text{ if }t_{k}^{+}<t\leqslant t_{k+1} \label{eq:thermal95equivalence95DC}\tag{55}\] satisfies \[\Gamma_{ij}(t)\gamma_j^\mathrm{C}(t) = \Gamma_{ji}(t)\gamma_i^\mathrm{C}(t), \label{eq:DB_Gamma}\tag{56}\] for all \(i\neq j\). Thus, \(\gamma^{\mathrm{C}}\left(t\right)\) always satisfies the detailed balance condition and is a valid equilibrium state for the process governed by Eq. (53 ). Meanwhile, substituting Eq. (53 ) into Eq. (52 ), we obtain \[\begin{align} W_{ij}(k+1) &= P_{ij}(k+1), & \forall i\neq j,\\ W_{ii}(k+1) &= P_{ii}(k+1), & \forall i. \end{align}\] where we have used the fact that \(p_j^\mathrm{C}(t_k^+) / p_j^\mathrm{C}(t_k) \rightarrow 1\).
Finally, by taking \(p^\mathrm{D}_i(0) = p^\mathrm{C}_i(t_0)\), we obtain that the continuous-time evolution described by Eq. (49 ) and Eq. (53 ) is in accordance with the discrete-time evolution described by Eq. (48 ) and Eq. (54 ) at discrete times \(t_0,\dots,t_{\kappa}\).
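The objects appearing in this appendix are straightforward to realize numerically. The sketch below builds a column-stochastic Glauber transition matrix for a small spin system (single-flip neighborhoods with acceptance \(\gamma_i/(\gamma_i+\gamma_j)\), a form we assume consistent with the main text), checks the detailed balance relation of Eq. (54), and evaluates the discrete activity used below. Names and parameters are illustrative.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
n = 4
g = np.triu(rng.standard_normal((n, n)), k=1)    # random couplings, g_kk = 0
confs = np.array(list(product([-1, 1], repeat=n)))
E = np.array([s @ g @ s for s in confs])
N = len(E)

beta = 1.0
gam = np.exp(-beta * E)
gam /= gam.sum()                         # thermal state at this iteration

# Glauber matrix: jump j -> i accepted with probability gam_i / (gam_i + gam_j)
P = np.zeros((N, N))
for j in range(N):
    for i in range(N):
        if i != j and np.sum(confs[i] != confs[j]) == 1:
            P[i, j] = gam[i] / ((gam[i] + gam[j]) * n)
    P[j, j] = 1.0 - P[:, j].sum()        # diagonal fixes column normalization

p = np.full(N, 1.0 / N)                  # any state of the system
A_D = float(sum(P[i, j] * p[j] for i in range(N)
                for j in range(N) if i != j))   # discrete activity A^D
```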
Consequently, the time-averaged activity \(\langle \mathcal{A}^\mathrm{C}\rangle_\tau\) is defined as \[\begin{align} \langle \mathcal{A}^\mathrm{C}\rangle_\tau &= \frac{1}{\tau}\int_0^\tau \mathrm{d}t\sum_{i}\sum_{j(\neq i)}\Gamma_{ij}(t)p_j^\mathrm{C}(t)\\ &= \frac{1}{\tau}\sum_{k=0}^{\kappa-1}\int_{t_k}^{t_{k+1}}\mathrm{d}t\sum_{i}\sum_{j(\neq i)}\Gamma_{ij}(t)p_j^\mathrm{C}(t)\\ &= \frac{1}{\tau}\sum_{k=0}^{\kappa-1}\sum_{i}\sum_{j(\neq i)}P_{ij}(k+1)p^\mathrm{D}_j(k). \end{align}\] To obtain an analogue of \(\langle \mathcal{A}^\mathrm{C}\rangle_\tau\) for the discrete case in SA, we suppose that the time interval \(t_{k+1} - t_{k}\) is a unit time \(1\), such that \(\tau = \kappa\), and define the iteration-averaged activity \(\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\) as \[\begin{align} \langle \mathcal{A}^\mathrm{D}\rangle_\kappa &= \langle \mathcal{A}^\mathrm{C}\rangle_\tau \\ &= \frac{1}{\kappa}\sum_{k=0}^{\kappa-1}\sum_{i}\sum_{j(\neq i)}P_{ij}(k+1)p^\mathrm{D}_j(k)\\ &\equiv \frac{1}{\kappa}\sum_{k=0}^{\kappa-1} \mathcal{A}^\mathrm{D}(k), \label{eq:A95equivalence95DC} \end{align}\tag{57}\] where the discrete activity is defined as \[\mathcal{A}^\mathrm{D}(k) = \sum_{i}\sum_{j(\neq i)}P_{ij}(k+1)p^\mathrm{D}_j(k).\] Since the entropy production depends on the path of the statistical state of the system, it is more sensible to define the entropy production for the discrete-time evolution according to the underlying continuous-time evolution, in which the entropy production rate is well-defined as Eq. (10 ) in the main body, i.e., \[\dot{\Sigma}^\mathrm{C} = \frac{1}{2}\sum_{i}\sum_{j(\neq i)}\left[\Gamma_{ij}p_j^\mathrm{C}-\Gamma_{ji}p_i^\mathrm{C}\right]\ln\frac{\Gamma_{ij}p_j^\mathrm{C}}{\Gamma_{ji}p_i^\mathrm{C}}.\] Therefore, we define \[\Sigma^\mathrm{D}(\kappa) \equiv \Sigma^\mathrm{C}(\tau) = \int_0^\tau \dot{\Sigma}^\mathrm{C}(t) \mathrm{d}t.\] By the ‘speed limit’ with \(\Sigma^\mathrm{C}(\tau)\) in Eq.
(16) in the main body, we have \[\Sigma^\mathrm{D}(\kappa) = \Sigma^\mathrm{C}(\tau) \ge \frac{(L_0 - L_\tau)^2}{2\langle \mathcal{A}^\mathrm{C}\rangle_\tau\tau} = \frac{(L_0 - L_\kappa)^2}{2\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\kappa}, \label{eq:speed95limit95D}\tag{58}\] where we have used \(\tau = \kappa\), Eq. (57), and \[\begin{align} L_0 &= \sum_i|p^\mathrm{C}_i(0)-\gamma^\mathrm{C}_i(\tau)| = \sum_i|p^\mathrm{D}_i(0)-\gamma^\mathrm{D}_i(\kappa)|,\\ L_\tau& = \sum_i|p^\mathrm{C}_i(\tau)-\gamma^\mathrm{C}_i(\tau)|,\\ L_\kappa& = \sum_i|p^\mathrm{D}_i(\kappa)-\gamma^\mathrm{D}_i(\kappa)| = L_\tau. \end{align}\] The last equality, \(L_\kappa = L_\tau\), follows from \(p^\mathrm{D}(\kappa) = p^\mathrm{C}(\tau)\) by Eq. (50) and \(\gamma^\mathrm{D}(\kappa) = \gamma^\mathrm{C}(\tau)\) by Eq. (55). Eq. (58) is therefore a ‘speed limit’ in the discrete-time case.
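The iteration-averaged activity entering this discrete-time speed limit is straightforward to evaluate numerically. A minimal sketch, with a hypothetical three-state system as a placeholder; \(P_{ij}\) is the probability of jumping from microstate \(j\) to \(i\), so each column of \(P\) sums to one:

```python
import numpy as np

def discrete_activity(P, p):
    """Discrete activity A^D(k) = sum_{i != j} P_ij p_j: the expected
    number of microstate jumps in one iteration, starting from state p.
    Since columns of P sum to 1, this equals sum_j p_j (1 - P_jj)."""
    return float(p @ (1.0 - np.diag(P)))

# Hypothetical column-stochastic transition matrix and state (placeholders).
P = np.array([[0.8, 0.1, 0.2],
              [0.1, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
p = np.array([0.5, 0.3, 0.2])
A = discrete_activity(P, p)
```

Averaging \(\mathcal{A}^\mathrm{D}(k)\) over the \(\kappa\) iterations of a run then gives \(\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\).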
Accordingly, we can define the accumulated change in relative entropy \(\mathcal{I}^\mathrm{D}(\kappa)\) via \(\mathcal{I}^\mathrm{C}(\tau)\) in Eq. (7) in the main body, such that \[\begin{align} \mathcal{I}^\mathrm{D}(\kappa) &\equiv \mathcal{I}^\mathrm{C}(\tau) \\ &= \int_0^\tau \left[E_p^\mathrm{C}(t)-E_\gamma^\mathrm{C}(t)\right]\dot{\beta}(t)\mathrm{d} t\\ &= \sum_i \int_0^\tau E_i\left[p_i^\mathrm{C}(t)-\gamma_i^\mathrm{C}(t)\right]\dot{\beta}(t)\mathrm{d} t\\ &= \sum_{k=0}^{\kappa-1} \sum_i \int_{t_k}^{t_{k+1}} E_i\left[p_i^\mathrm{C}(t)-\gamma_i^\mathrm{C}(t)\right]\dot{\beta}(t)\mathrm{d} t \\ &= \sum_{k=0}^{\kappa-1} \sum_i E_i \left[p_i^\mathrm{C}(t_{k+1})-\gamma_i^\mathrm{C}(t_{k+1})\right]\left[\beta(t_{k+1})-\beta(t_{k})\right]\\ &= \sum_{k=0}^{\kappa-1} \sum_i E_i \left[p_i^\mathrm{D}(k+1)-\gamma_i^\mathrm{D}(k+1)\right]\left[\beta(k+1)-\beta(k)\right]\\ &= \sum_{k=0}^{\kappa-1}\left[E_p^\mathrm{D}(k+1)-E_\gamma^\mathrm{D}(k+1)\right]\left[\beta(k+1)-\beta(k)\right], \end{align}\] where in the fourth line we used that \(p^\mathrm{C}(t) = p^\mathrm{C}(t_{k+1})\) and \(\gamma^\mathrm{C}(t) = \gamma^\mathrm{C}(t_{k+1})\) for \(t_k^+ < t \le t_{k+1}\), while in the fifth line \(\beta(t_k) = \beta(k)\), in accordance with Eq. (55).
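The final sum can be evaluated directly from the energies, the state trajectory, and the cooling schedule. A minimal sketch, where the two-level energies, linear schedule, and frozen trajectory below are hypothetical placeholders:

```python
import numpy as np

def gibbs(E, beta):
    """Instantaneous thermal state gamma_i proportional to exp(-beta E_i)."""
    w = np.exp(-beta * (E - E.min()))  # shift energies for numerical stability
    return w / w.sum()

def accumulated_rel_entropy_change(E, p_traj, betas):
    """I^D(kappa) = sum_k [E_p(k+1) - E_gamma(k+1)] [beta(k+1) - beta(k)],
    with p_traj[k] = p^D(k) for k = 0..kappa and betas[k] = beta(k)."""
    total = 0.0
    for k in range(len(betas) - 1):
        gamma = gibbs(E, betas[k + 1])
        total += (E @ p_traj[k + 1] - E @ gamma) * (betas[k + 1] - betas[k])
    return total

# Hypothetical two-level system, cooled from beta = 0 to beta = 1,
# with the state held fixed purely for illustration.
E = np.array([0.0, 1.0])
betas = np.array([0.0, 1.0])
p_traj = np.array([[0.5, 0.5], [0.5, 0.5]])
I_D = accumulated_rel_entropy_change(E, p_traj, betas)
```

As a sanity check, a quasistatic trajectory with \(p^\mathrm{D}(k) = \gamma^\mathrm{D}(k)\) at every step makes each term, and hence \(\mathcal{I}^\mathrm{D}(\kappa)\), vanish.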
From Eq. (8) in the main body, we have \[\begin{align} S(p^\mathrm{D}(\kappa)||\gamma^\mathrm{D}(\kappa)) &= S(p^\mathrm{C}(\tau)||\gamma^\mathrm{C}(\tau)) \\&= -\Sigma^\mathrm{C}(\tau) + \mathcal{I}^\mathrm{C}(\tau) \\&= -\Sigma^\mathrm{D}(\kappa) + \mathcal{I}^\mathrm{D}(\kappa). \end{align}\] Using Pinsker’s inequality \(L_\kappa^2\le 2 S(p^\mathrm{D}(\kappa)||\gamma^\mathrm{D}(\kappa))\) and the ‘speed limit’ [Eq. (58)], we obtain \[L_\kappa^2 \le -\frac{(L_0 - L_\kappa)^2}{\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\kappa} + 2\mathcal{I}^\mathrm{D}(\kappa),\] which is the analog of Eq. (17) in the main body. Solving the inequality for \(L_\kappa\) gives the discrete-time performance bound \[L_\kappa \le \dfrac{L_0 + \sqrt{\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\kappa\left(-L_0^2 + 2\mathcal{I}^\mathrm{D}(\kappa)\left(\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\kappa + 1\right)\right)}}{\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\kappa+1},\] the analog of Eq. (18) in the main body.
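For completeness, the closed-form right-hand side of this bound is easy to evaluate once \(L_0\), \(\langle \mathcal{A}^\mathrm{D}\rangle_\kappa\), and \(\mathcal{I}^\mathrm{D}(\kappa)\) are known. A minimal sketch; the numerical inputs below are placeholders, not values from the paper:

```python
import numpy as np

def performance_bound(L0, A_avg, I, kappa):
    """Right-hand side of the discrete-time performance bound on L_kappa.
    L0:    initial total-variation distance to the final thermal state,
    A_avg: iteration-averaged activity <A^D>_kappa,
    I:     accumulated change in relative entropy I^D(kappa),
    kappa: number of iterations."""
    a = A_avg * kappa
    disc = a * (-L0**2 + 2.0 * I * (a + 1.0))
    # For inputs generated by an actual annealing run the discriminant
    # is non-negative; the clamp only guards against round-off.
    disc = max(disc, 0.0)
    return (L0 + np.sqrt(disc)) / (a + 1.0)

# Hypothetical inputs: L0 = 1, <A^D> = 0.5, I^D = 0.2, kappa = 10.
bound = performance_bound(1.0, 0.5, 0.2, 10)
```

Note that for \(\kappa = 0\) the bound reduces to \(L_\kappa \le L_0\), as it should, since the state has not evolved.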