November 17, 2023
The Worldvolume Hybrid Monte Carlo method (WV-HMC method) [arXiv:2012.08468] is a reliable and versatile algorithm towards solving the sign problem. Similarly to the tempered Lefschetz thimble method, this method removes the ergodicity problem inherent in algorithms based on Lefschetz thimbles. In addition to this advantage, the WV-HMC method significantly reduces the computational cost because one needs not compute the Jacobian of deformation in generating configurations. A crucial step in this method is the RATTLE algorithm, where the Newton method is used at each molecular dynamics step to project a transported configuration onto a submanifold (worldvolume) in the complex space. In this paper, we simplify the RATTLE algorithm by employing a simplified Newton method (the fixed-point method) along with iterative solvers for orthogonal decompositions of vectors, and show that this algorithm further reduces the computational cost. We also apply this algorithm to the HMC algorithm for the generalized thimble method (GT-HMC method). We perform a numerical test for the convergence of the simplified RATTLE algorithm, and show that the convergence depends on the system size only weakly. The application of this simplified algorithm to various models will be reported in subsequent papers.
KUNS-2987
Simplified algorithm for the Worldvolume HMC
and the Generalized-thimble HMC
Masafumi Fukuma1
Department of Physics, Kyoto University, Kyoto 606-8502, Japan
The numerical sign problem has been a major obstacle to first-principles calculations in various important physical systems. Typical examples include finite-density QCD [1], Quantum Monte Carlo calculations for strongly correlated electron systems and frustrated spin systems [2], and the real-time dynamics of quantum many-body systems.
The sign problem has a long history, and specific algorithms have been developed so far for each system with the sign problem. However, in the last two decades there has been a movement to develop more versatile methods for solving the sign problem, and various algorithms have been proposed. One of these is a class of algorithms based on the complex Langevin equation [3]–[8]. Another is based on Lefschetz thimbles [9]–[24] (the path optimization method [25]–[28] may be included in this class). There has also been an intensive study of non-Monte Carlo techniques, such as the tensor network method [29]–[33].
As will be reviewed in Sect. 2, in the Lefschetz thimble method, one deforms the integration surface of the path integral into the complex space so that the sign problem is alleviated on the new integration surface. This algorithm has the advantage that correct convergence is guaranteed by the Picard-Lefschetz theory, and in principle it can be applied to any system so long as the system can be expressed with continuous variables. However, as will be also discussed in Sect. 2, naive Monte Carlo implementations lead to serious ergodicity problems [13], [14].
The tempered Lefschetz thimble method (TLT method) [17] was introduced to solve the sign and ergodicity problems simultaneously, by implementing the tempering algorithm using the deformation parameter as the tempering parameter. The TLT method, however, has a drawback of high computational cost. In fact, one needs to compute the Jacobian of the deformation at every stochastic step along the direction of deformation, whose cost is \(O(N^3)\) (\(N\) is the number of degrees of freedom). To reduce the computational cost, the Worldvolume Hybrid Monte Carlo method (WV-HMC method) was invented in Ref. [23], where one considers molecular dynamics over a continuous accumulation of deformed surfaces (worldvolume). While retaining the advantages of the TLT method, the WV-HMC method significantly reduces the computational cost because it no longer needs the computation of the Jacobian in generating configurations.
The main aim of this paper is to clarify and simplify the WV-HMC algorithm to a level at which it is accessible to a wider range of researchers. A crucial step in the WV-HMC method is the RATTLE algorithm [34], [35], which projects at each molecular dynamics step a transported configuration onto the worldvolume [23]. The projection requires solving constraint equations, which can be done with the Newton method. In this paper, we simplify the RATTLE algorithm by employing a simplified Newton method, and show that this algorithm further reduces the numerical cost. The introduction of a simplified Newton method to the Lefschetz thimble method was first made as the fixed-point method in a seminal paper by the Komaba group [12], where the projection onto a single thimble is considered using the explicit form of the Jacobian.2 The fixed-point method saves us from solving extra linear equations that was necessary for the standard (non-simplified) Newton method [22], [23]. Furthermore, by combining the fixed-method with iterative solvers [19] for orthogonal decompositions of vectors that use only integrations of flow equations, one no longer needs to compute the Jacobian explicitly unlike the original fixed-point method. We also apply this algorithm to the HMC algorithm for the generalized thimble method (GT-HMC method) [21], [22]. The application of the simplified WV-HMC algorithm to various models will be reported in subsequent papers [36]–[38].
This paper is organized as follows. In Sect. 2, we first define the sign problem, and briefly summarize various algorithms proposed so far based on Lefschetz thimbles. Section 3 deals with the simplification of the GT-HMC method. We show that the projection onto a deformed integration surface can be effectively performed by a simplified Newton method (fixed-point method) when combined with iterative solvers for orthogonal decompositions of vectors. This algorithm is extended to the WV-HMC method in Sect. 4. In Sect. 5, we perform a numerical test for the convergence of the simplified RATTLE algorithm, and show that the convergence depends on the system size only weakly. Section 6 is devoted to conclusion and outlook for the application of the simplified algorithm to various models.
Let \(x=(x^i)\in{\mathbb{R}}^N\) be a dynamical variable with flat measure \(dx\equiv dx^1\wedge\cdots\wedge dx^N\), and \(S(x)\) and \({\mathcal{O}}(x)\) the action and observables, respectively. Our aim is to estimate the expectation values of \({\mathcal{O}}\) with respect to the Boltzmann weight \(\rho(x)\equiv e^{-S(x)}/\int dx\,e^{-S(x)}\): \[\begin{align} \langle {\mathcal{O}}\rangle \equiv \int_{{\mathbb{R}}^N} dx\,\rho(x)\,{\mathcal{O}}(x) = \frac{\int_{{\mathbb{R}}^N} dx\,e^{-S(x)}\,{\mathcal{O}}(x)}{\int_{{\mathbb{R}}^N} dx\,e^{-S(x)}}. \label{vev} \end{align}\tag{1}\] When the action is complex-valued, one cannot regard \(\rho(x)\) as a probability distribution, and a direct use of the Monte Carlo method is not possible. A standard way around is to reweight the integral with the real part of the action, \({\rm Re}\,S(x)\), and rewrite the integral as a ratio of reweighted averages: \[\begin{align} \langle {\mathcal{O}}\rangle = \frac{\langle e^{-i {\rm Im}\,S(x)}\,{\mathcal{O}}(x) \rangle_{\rm rewt}}{\langle e^{-i {\rm Im}\,S(x)} \rangle_{\rm rewt}}, \label{rewt1} \end{align}\tag{2}\] where the reweighted average \(\langle\cdots\rangle_{\rm rewt}\) is defined by \[\begin{align} \langle g(x) \rangle_{\rm rewt} \equiv \frac{\int dx\,e^{-{\rm Re}\,S(x)}\,g(x)}{\int dx\,e^{-{\rm Re}\,S(x)}}. \label{rewt2} \end{align}\tag{3}\] The reweighted averages in Eq. 2 become highly oscillatory integrals with large degrees of freedom (\(N\gg 1\)), giving very small values of the form \(e^{-O(N)}\). This should not be a problem if the reweighted averages can be estimated precisely, but in the Monte Carlo calculations they are accompanied by statistical errors of \(O(1/\sqrt{{N_{\rm conf}}\,})\) for a sample of size \({N_{\rm conf}}\,\): \[\begin{align} \langle {\mathcal{O}}\rangle \approx \frac{e^{-O(N)} \pm O(1/\sqrt{{N_{\rm conf}}\,})}{e^{-O(N)} \pm O(1/\sqrt{{N_{\rm conf}}\,})}, \end{align}\] and thus we need an exponentially large sample size, \({N_{\rm conf}}\,\gtrsim e^{O(N)}\), in order to make the statistical errors relatively smaller than the means. This is the sign problem we consider in this paper.
In the Lefschetz thimble method, the integration surface \(\Sigma_0 = {\mathbb{R}}^N=\{x\}\) is continuously deformed into the complex space \({\mathbb{C}}^N=\{z=x+i\,y\}\) in such a way that the oscillatory behavior is alleviated on the deformed surface \(\Sigma \in {\mathbb{C}}^N\). Throughout this paper, we assume that \(e^{-S(z)}\) and \(e^{-S(z)}\,{\mathcal{O}}(z)\) are entire functions in \({\mathbb{C}}^N\) (which usually holds for systems of interest). Then, Cauchy’s theorem ensures that the integrals do not change under deformations if the boundaries at \({\rm Re}\,z\to\pm\infty\) are kept fixed, and we have \[\begin{align} \langle {\mathcal{O}}\rangle &= \frac{\int_\Sigma dz\, e^{-S(z)}\,{\mathcal{O}}(z)}{\int_\Sigma dz\, e^{-S(z)}}. \label{vev95Sigma} \end{align}\tag{4}\] We consider the deformation with the anti-holomorphic flow defined by the following flow equation: \[\begin{align} \dot{z} = \overline{\partial S(z)} \quad \bigl[ \partial S(z) = (\partial_i S(z))~(i=1,\ldots,N) \bigr]. \end{align}\] This leads to the inequality \([S(z)]^\centerdot=(\partial S(z)/\partial z)\cdot \dot{z} = |\partial S(z)|^2\geq 0\), from which we know that \({\rm Re}\,S(z)\) always increases under the flow except at critical points \(\zeta\) [where the gradient of the action vanishes, \(\partial S(\zeta)\)=0], while \({\rm Im}\,S(z)\) is kept constant. This flow sends the original integration surface \(\Sigma_0={\mathbb{R}}^N\) to a deformed surface \(\Sigma_t\) at flow time \(t\), which in the large flow time limit moves to a vicinity of a homological sum of Lefschetz thimbles: \[\begin{align} \Sigma_t \to \sum_\sigma n_\sigma\,{\mathcal{J}}_\sigma \quad (n_\sigma\in{\mathbb{Z}}). \end{align}\] Here, \(\sigma\) labels critical points, and \(J_\sigma\) is the Lefschetz thimble associated with critical point \(\zeta_\sigma\), which is defined as the union of orbits flowing out of \(\zeta_\sigma\).3 Since \({\rm Im}\,S(z)\) is constant on each Lefschetz thimble [\({\rm Im}\,S(z)={\rm Im}\,S(z_\sigma)\) for \(z\in{\mathcal{J}}_\sigma\)], the oscillatory behavior of integrals at large flow times is expected to be much relaxed around each Lefschetz thimble.
While the sign problem attributed to oscillatory integrals gets alleviated as we increase the flow time, there comes out another problem, the ergodicity problem. In fact, \({\rm Re}\,S(z)\) diverges at the boundaries of Lefschetz thimbles, which are zeros of the Boltzmann weight \(\propto e^{-S(z)}\), and it is hard for configurations to move from a vicinity of one thimble to that of another thimble in stochastic processes (see Fig. 1). Thus, we have a dilemma between the alleviation of the sign problem and the emergence of the ergodicity problem.
The generalized thimble method [16] is an algorithm which makes a sampling on a deformed surface at such a flow time that is large enough to relax the oscillatory behavior and at the same time is small enough to avoid the ergodicity problem. However, a closer investigation [20] shows that the oscillatory behaviors usually starts being relaxed only after the deformed surface reaches some of the zeros of \(e^{-S(z)}\), so that one can hardly expect such an ideal flow time to be found. Nevertheless, this algorithm is still useful for grasping a flow time at which the sign problem starts being relaxed, by observing the average phase factors \(\langle e^{-i {\rm Im}\,S(z)}\,dz/|dz| \rangle_{\Sigma_t}\) at various flow times. Configurations on (a connected component of) a deformed surface \(\Sigma_t\) can be generated efficiently with the Hybrid Monte Carlo algorithm [21], [22], which we refer in this paper to the Generalized-thimble Hybrid Monte Carlo (GT-HMC) and review in the next section.4 The tempered Lefschetz thimble (TLT) method [17] avoids the above dilemma by implementing the tempering algorithm with the flow time as the tempering parameter. This is the first algorithm that solves the sign and ergodicity problems simultaneously, but has a drawback of large computational cost [\(O(N^3)\) for generating a configuration]. The Worldvolume Hybrid Monte Carlo (WV-HMC) method [23] was then introduced to reduce the computational cost of the TLT method while still retaining its advantages. This is based on the molecular dynamics on a continuous accumulation (worldvolume) of deformed surfaces (see Fig. 2). The TLT and WV-HMC methods have been successfully applied to \((0+1)\)-dimensional Thirring model [17], the Hubbard model away from half filling [20] and the Stephanov model [23] (although the system sizes are yet small).
We explain the basics of GT-HMC [21], [22]5 and propose its simplified algorithm. In the following, \(\Sigma\equiv\Sigma_t\) is the deformed surface at flow time \(t\), and \(T_z \Sigma\) and \(N_z\Sigma\) represent the tangent and the normal spaces at \(z\) to \(\Sigma\), respectively.
We start from the expression [Eq. 4 ] \[\begin{align} \langle {\mathcal{O}}\rangle &= \frac{\int_\Sigma dz\, e^{-S(z)}\,{\mathcal{O}}(z)}{\int_\Sigma dz\, e^{-S(z)}}. \label{vev95Sigma2} \end{align}\tag{5}\] With local coordinates6 \(x=(x^a)\) \((a=1,\ldots,N)\) and the Jacobian \(E^i_a \equiv \partial z^i/\partial x^a\), the holomorphic \(N\)-form \(dz = dz^1\wedge\cdots\wedge dz^N\) is expressed as \[\begin{align} dz = \det E\,dx. \end{align}\] We introduce the inner product \[\begin{align} \langle u, v \rangle \equiv {\rm Re\,}u^\dagger v = \sum_{i=1}^N {\rm Re\,} \overline{u^i} v^i \,\,\bigl( = \langle v, u \rangle \bigr) \end{align}\] for vectors \(u=(u^i),\,v=(v^i)\in{\mathbb{C}}^N\). The induced metric \(ds^2 = \gamma_{ab}\,dx^a dx^b \equiv |dz(x)|^2\) is then given by7 \[\begin{align} \gamma_{ab} = \langle E_a,E_b \rangle = E_a^\dagger E_b, \end{align}\] which yields the invariant volume element on \(\Sigma\), \[\begin{align} |dz| \equiv \sqrt{\gamma}\,dx = |\det E\,|\,dx. \end{align}\] The expectation value 5 is then expressed as a ratio of reweighted averages on \(\Sigma\): \[\begin{align} \langle {\mathcal{O}}\rangle = \frac{\langle {\mathcal{F}}(z)\,{\mathcal{O}}(z) \rangle_\Sigma}{\langle {\mathcal{F}}(z) \rangle_\Sigma}, \end{align}\] where \(\langle\cdots\rangle_\Sigma\) is defined by \[\begin{align} \langle g(z) \rangle_\Sigma \equiv \frac{\int_\Sigma |dz|\,e^{-{\rm Re}\,S(z)}\,g(z)}{\int_\Sigma |dz|\,e^{-{\rm Re}\,S(z)}} \end{align}\] and \({\mathcal{F}}(z)\) is the associated reweighting factor: \[\begin{align} {\mathcal{F}}(z) \equiv \frac{dz}{|dz|}\,e^{-i\,{\rm Im}\,S(z)} = \frac{\det E}{|\det E\,|}\,e^{-i\,{\rm Im}\,S(z)}. \end{align}\]
The reweighted average \(\langle \cdots \rangle_\Sigma\) can be written as a path integral over the phase space by rewriting the measure \(|dz|=\sqrt{\gamma}\,dx\) to the form \[\begin{align} |dz| = \sqrt{\gamma}\,dx \propto dx\,dp\,e^{-(1/2)\,\gamma^{ab}\,p_a p_b}, \end{align}\] where \(dx\,dp\equiv \prod_a \bigl( dx^a dp_a \bigr)\) is the volume element of the phase space and \((\gamma^{ab})\equiv (\gamma_{ab})^{-1}\). We thus obtain the phase-space path integral in the parameter-space representation: \[\begin{align} \langle g(z) \rangle_\Sigma = \frac{\int dx\,dp\,e^{-(1/2)\,\gamma^{ab}\,p_a p_b - {\rm Re}\,S(z(x))}\,g(z(x))}{\int dx\,dp\,e^{-(1/2)\,\gamma^{ab}\,p_a p_b - {\rm Re}\,S(z(x))}}. \end{align}\] Note that the volume element can be expressed as \(dx\,dp = \omega^N/N!\) with the symplectic 2-form \(\omega\equiv dp_a\wedge dx^a\).
In Monte Carlo calculations, it is more convenient to rewrite everything in terms of the target space coordinates \(z=(z^i)\). To do this, we introduce the momentum \(\pi=(\pi^i)\) which is tangent to \(\Sigma\): \[\begin{align} \pi \in T_z\Sigma ~~~with~~ \pi^i \equiv p^a E_a^i \quad (p^a\equiv \gamma^{ab}\,p_b). \end{align}\] One then can show that the 1-form \[\begin{align} a\equiv \langle\pi, dz\rangle = {\rm Re}\,\overline{\pi^i}\,dz^i \end{align}\] can be expressed as \(a=p_a dx^a\), and thus we find that \(a\) is a symplectic potential of \(\omega\), \(\omega = da\), and obtain the identity \[\begin{align} \omega = {\rm Re}\,d\overline{\pi^i}\wedge dz^i. \end{align}\] Furthermore, noting the identity \[\begin{align} \langle \pi,\pi \rangle = \gamma^{ab} p_a p_b\quad (\pi\in T_z\Sigma), \end{align}\] we have the following target-space representation: \[\begin{align} \langle g(z) \rangle = \frac{\int_{T\Sigma}\, \omega^N\,e^{-H(z,\pi)}\,g(z)}{\int_{T\Sigma}\, \omega^N\,e^{-H(z,\pi)}}. \end{align}\] Here, \(T\Sigma \equiv \{(z,\pi)\,|\,z\in\Sigma,\,\pi\in T_z\Sigma\}\) is the tangent bundle of \(\Sigma\), and \(H(z,\pi)\) is the Hamiltonian of the form8 \[\begin{align} H(z,\pi) = \frac{1}{2}\,\langle\pi,\pi\rangle + V(z) \end{align}\] with the (real-valued) potential \[\begin{align} V(z) = {\rm Re}\,S(z) = \frac{1}{2}\,[S(z) + \overline{S(z)}]. \end{align}\]
We assume that the \(N\)-dimensional real submanifold \(\Sigma\) in \({\mathbb{C}}^N={\mathbb{R}}^{2N}\) is specified by \(N\) independent equations \(\phi^r(z)=0\) \((r=1,\ldots,N)\) with real-valued functions \(\phi^r(z)\). In order to define a consistent molecular dynamics on \(\Sigma\), we consider the Hamilton dynamics for an action of the first-order form: \[\begin{align} I[z,\pi,\lambda] = \int ds\,\Bigl[ \langle \pi, {\overset{\circ}{z}} \rangle - H(z,\pi) - \lambda_r\, \phi^r(z) \Bigr]. \end{align}\] Here, \({\overset{\circ}{z}}\equiv dz/ds\), and \(\lambda_r\in{\mathbb{R}}\) are Lagrange multipliers. Hamilton’s equations are then given by9 \[\begin{align} {\overset{\circ}{z}} &= \pi, \tag{6} \\ {\overset{\circ}{\pi}} &= - 2\overline{\partial V(z)} - 2\lambda_r\, \overline{\partial\phi^r(z)} \tag{7} \end{align}\] with constraints \[\begin{align} \phi^r(z) &= 0, \tag{8} \\ \langle \pi, \overline{\partial\phi^r} \rangle &= 0. \tag{9} \end{align}\] One can easily show that the symplectic potential \(a\) changes under molecular dynamics as \({\overset{\circ}{a}} = d[(1/2)\langle \pi,\pi \rangle - V(z)]\), from which follows \({\overset{\circ}{\omega}} = d{\overset{\circ}{a}} = 0\). Furthermore, noting that \(\lambda\equiv \lambda_r\,\overline{\partial \phi^r(z)}\in N_z\Sigma\),10 one can also show that \({\overset{\circ}{H}} = 0\).
A discretized form of Eqs. 6 –9 with step size \(\Delta s\) is given by RATTLE [34], [35] of the following form (we rescale \(\lambda\to\lambda/\Delta s\) for later convenience) [12], [21], [22]: \[\begin{align} \pi_{1/2} &= \pi - \Delta s\,\overline{\partial V(z)} - \lambda/\Delta s, \tag{10} \\ z' &= z + \Delta s\,\pi_{1/2}, \tag{11} \\ \pi' &= \pi_{1/2} - \Delta s\,\overline{\partial V(z')} - \lambda'/\Delta s. \tag{12} \end{align}\] Here, the Lagrange multipliers \(\lambda\in N_z\Sigma\) and \(\lambda'\in N_{z'}\Sigma\) are determined such that \(z'\in \Sigma\) and \(\pi'\in T_{z'}\Sigma\), respectively. One easily sees that the transformation \((z,\pi)\to(z',\pi')\) satisfies the reversibility.11 Noting that \(\langle \lambda,dz \rangle = 0\) and \(\langle \lambda', dz' \rangle = 0\), one can also show that the symplectic potential \(a= \langle \pi, dz \rangle\) transforms as follows: \[\begin{align} a_{1/2} &\equiv \langle \pi_{1/2}, dz\rangle = a - (\Delta s/2)\,dV(z), \\ a'_{1/2} &\equiv \langle \pi_{1/2}, dz'\rangle = a_{1/2} + (\Delta s/2)\,d\langle \pi_{1/2},\pi_{1/2}\rangle, \\ a' &\equiv \langle \pi', dz'\rangle = a'_{1/2} - (\Delta s/2)\,dV(z') \nonumber \\ &= a + (\Delta s/2)\,d\bigl[\langle \pi_{1/2},\pi_{1/2}\rangle - V(z) - V(z')\bigr], \end{align}\] from which we find that this transformation is symplectic (\(\omega' = \omega\)) and thus volume-preserving (\(\omega'^N = \omega^N\)). One can further show that this transformation preserves the Hamiltonian to \(O(\Delta s^2)\):12 \[\begin{align} H(z',\pi') = H(z,\pi) + O(\Delta s^3). \end{align}\]
As we see in the next subsection, in determining \(\lambda\) and \(\lambda'\), we repeatedly project a vector \(w\in T_z{\mathbb{C}}^N\) onto the tangent space \(T_z\Sigma\) and the normal space \(N_z\Sigma\): \[\begin{align} w = v + n \quad \bigl(v\in T_z\Sigma,~ n\in N_z\Sigma\bigr). \label{gt95projector} \end{align}\tag{13}\] This projection can be carried out by an iterative use of flow as follows [12]. For \(z\in\Sigma\) and its starting configuration \(x\in\Sigma_0={\mathbb{R}}^N\), we introduce an \({\mathbb{R}}\)-linear map \(A:\,T_{x}{\mathbb{C}}^N \to T_z{\mathbb{C}}^N\) which consists of three steps (see Fig. 3):
(1) For a given vector \(w_0\in T_{x}{\mathbb{C}}^N\), decompose it into \[\begin{align} v_0 \equiv \frac{1}{2}\,(w_0 + \bar{w}_0) \in T_{x}\Sigma_0,\quad n_0 \equiv \frac{1}{2}\,(w_0 - \bar{w}_0) \in N_{x}\Sigma_0. \end{align}\] (2) Integrate the following flow equations that maps \(v_0\in T_{x}\Sigma_0\) to \(v\in T_z\Sigma\) and \(n_0\in N_{x}\Sigma_0\) to \(n\in N_z\Sigma\), \[\begin{align} \dot{z} &= \overline{\partial S(z)} ~~with~~ z|_{t=0} = x, \tag{14} \\ \dot{v} &= \overline{H(z)\,v} ~~with~~ v|_{t=0} = v_0, \tag{15} \\ \dot{n} &= - \overline{H(z)\,n} ~~with~~ n|_{t=0} = n_0, \tag{16} \end{align}\] where \(H(z)\equiv (\partial_i \partial_j S(z))\) is the Hessian matrix.13 Note that \(v\) and \(n\) are linear in \(v_0\) and \(n_0\), respectively, and we write them as \[\begin{align} v^i \equiv E^i_a v_0^a = (E v_0)^i,\quad n^i \equiv F^i_a n_0^a = (F n_0)^i. \end{align}\] (3) Define an \({\mathbb{R}}\)-linear map \(A:\,T_{x}{\mathbb{C}}^N\ni w_0 \mapsto w\in T_z{\mathbb{C}}^N\) by \[\begin{align} w = A w_0 \equiv E v_0 + F n_0 ~~for~~ w_0=v_0+n_0. \label{gt95projection} \end{align}\tag{17}\]
Once the map \(A\) is defined, the decomposition 13 of a given \(w\in T_z{\mathbb{C}}^N\) can be carried out as in Algorithm 4. Note that, if we use an iterative method (such as BiCGStab) for solving \(A w_0 = w\) in Step 1, we no longer need to carry out Step 2 and Step 3 [19]. This is because in Step 1 we repeatedly compute \(E \tilde{v}_0\) and \(F \tilde{n}_0\) for a candidate solution \(\tilde{w}_0 = \tilde{v}_0 + \tilde{n}_0\), so that \(v = E v_0\) and \(n = F n_0\) are already obtained when the iteration is converged.
The Lagrange multiplies \(\lambda\in N_z\Sigma\) and \(\lambda'\in N_{z'}\Sigma\) in Eqs. 10 –12 are determined as follows.
The condition that \(z'\in \Sigma\) is equivalent to that \(z'\) can be written as \(z'=z_t(x')\) with \(x'\in\Sigma_0\) (see Fig. 5).
Thus, finding \(\lambda\) satisfying Eqs. 10 and 11 for a given \(z=z_t(x)\in\Sigma\) and \(\pi\in T_z\Sigma\) is equivalent to finding a doublet \((u,\lambda)\) \((u\in T_{x}\Sigma_0,\,\lambda\in N_z\Sigma)\) that satisfies \[\begin{align} z_t(x+u) = z_t(x) + \Delta z - \lambda \label{gt95newton1} \end{align}\tag{18}\] with \[\begin{align} \Delta z(z,\pi)\equiv \Delta s\,\pi - (\Delta s)^2\, \overline{\partial V(z)}. \end{align}\] Equation 18 can be solved iteratively with Newton’s method. There, one solves the following linearized equation in updating an approximate solution \((u,\lambda)\) as \(u\gets u+\Delta u\) and \(\lambda\gets \lambda+\Delta \lambda\): \[\begin{align} E_{\rm new} \Delta u + \Delta \lambda = B, \label{gt95newton2} \end{align}\tag{19}\] where \(E_{\rm new}\equiv \partial z_t(x+u)/\partial u = (\partial z_t/\partial x)(x+u)\), and \[\begin{align} B\equiv z + \Delta z - \lambda - z_{\rm new}\in {\mathbb{C}}^N \label{gt95B} \end{align}\tag{20}\] with \(z_{\rm new}\equiv z_t(x+u)\).
Equation 19 can be solved with direct or iterative methods by regarding it as a linear equation of the form \(\tilde{A} X=B\) with respect to \(X = (\Delta u, F^{-1}\Delta \lambda)\) as carried out in Ref. [22].14 Instead of solving Eq. 19 , we here propose to use the simplified Newton equation (corresponding to the fixed-point method of Ref. [12] for the case of the projection onto a single thimble), where \(E_{\rm new}\) on the left hand side is replaced by the value at \(z=z_t(x)\): \[\begin{align} E \Delta u + \Delta \lambda = B. \label{gt95newton3} \end{align}\tag{21}\] This equation can be readily solved by using the projection introduced in Sect. 3.3. To see this, from the orthogonal decomposition \[\begin{align} B =& B_v + B_n = E B_{0,v} + F B_{0,n} \end{align}\] with \(B_v\in T_z\Sigma\), \(B_n\in N_z\Sigma\), \(B_{0,v}\in T_{x}\Sigma_0\) and \(B_{0,n}\in N_{x}\Sigma_0\), we write \(B\) to the form \[\begin{align} B = E B_{0,v} + B_n. \end{align}\] Then, comparing with the left hand side of Eq. 21 , we obtain15 \[\begin{align} \Delta u = B_{0,v},\quad \Delta \lambda = B_n. \end{align}\] Note that, if we set the initial guess to \(u=0\) and \(\lambda=0\), then the first run in the iteration gives the following result with respect to the decomposition \(\Delta z = E(\Delta z)_{0,v} + (\Delta z)_n\): \[\begin{align} u = (\Delta z)_{0,v},\quad \lambda = (\Delta z)_n \quad (approximate solution). \label{gt95init2} \end{align}\tag{22}\]
Note that determining \(\lambda'\) in Eq. 12 such that \(\pi'\in T_{z'}\Sigma\) is equivalent to projecting \(\tilde{\pi}' \equiv \pi_{1/2} - \Delta s\,\overline{\partial V(z')}\) onto \(T_{z'}\Sigma\). Thus, \(\pi'\) is simply obtained from the decomposition \(\tilde{\pi}' = \tilde{\pi}'_v + \tilde{\pi}'_n\) at \(z'\) as \(\pi' = \tilde{\pi}'_v\).
Molecular dynamics from a configuration \((z,\pi) \in T\Sigma\) is summarized in Algorithm 6.
We summarize in Algorithm 7 the GT-HMC algorithm for updating a configuration \(z\in\Sigma\) with the RATTLE of Algorithm 6.16
We explain the basics of WV-HMC [23] and propose its simplified algorithm. For convenience of the reader who reads only this section, the presentation is made in a completely parallel way to that for GT-HMC without worrying about repetition.
We restart from the expression 4 : \[\begin{align} \langle {\mathcal{O}}\rangle = \frac{\int_{\Sigma_t} dz_t\,e^{-S(z_t)}\,{\mathcal{O}}(z_t)}{\int_{\Sigma_t} dz_t\,e^{-S(z_t)}}, \end{align}\] where we have denoted the configurations by \(z_t\) (instead of \(z\)) to stress that they live on \(\Sigma_t\). Since the numerator and the denominator are both independent of \(t\) (Cauchy’s theorem), they can be averaged over flow time \(t\) with an arbitrary weight \(e^{-W(t)}\), leading to the expression [23] \[\begin{align} \langle {\mathcal{O}}\rangle &= \frac{\int dt\,e^{-W(t)}\,\int_{\Sigma_t} dz_t\,e^{-S(z_t)}\,{\mathcal{O}}(z_t)}{\int dt\,e^{-W(t)}\,\int_{\Sigma_t} dz_t\,e^{-S(z_t)}} \nonumber\\ &\equiv \frac{\int_{{\mathcal{R}}} dt\,dz_t\,e^{-S(z_t)-W(t)}\,{\mathcal{O}}(z_t)}{\int_{{\mathcal{R}}} dt\,dz_t\,e^{-S(z_t)-W(t)}} \equiv \frac{Z_{{\mathcal{O}}}}{Z}. \label{wv95pi1} \end{align}\tag{23}\] The \((N+1)\)-dimensional integration region \({\mathcal{R}}\) is defined by \({\mathcal{R}}\equiv \{z_t\in\Sigma_t \,|\,t\in {\mathbb{R}}\}\), which we refer to the worldvolume, by regarding it as an orbit of an integration surface \(\Sigma_t\) in the target space \({\mathbb{C}}^N={\mathbb{R}}^{2N}\). The extension of \({\mathcal{R}}\) in the flow time direction can be effectively constrained to a finite interval \([T_0,T_1]\) by adjusting the functional form of \(W(t)\). The function \(W(t)\) has another role to lift configurations upwards (positive flow direction) so that they are distributed almost equally over different flow times. In fact, the force of molecular dynamics exerts configurations in the direction opposite to the flow [see, e.g., Eq. 7 with \(-2\overline{\partial V(z)} = -\,\overline{\partial S(z)}\) for the case of GT-HMC], and thus configurations have a tendency to precipitate towards the bottom (near \(\Sigma_0\)) if nothing is done. A possible form of \(W(t)\) is given in Sect. 4.5 (see Ref. [36] for a more detailed study).
Since multimodality becomes more severe at larger flow times, we take the lower bound \(T_0\) to be small enough such that there is no ergodicity problem on \(\Sigma_{T_0}\).17 The upper bound \(T_1\) is chosen such that oscillatory integrals are sufficiently tamed there.18 After global equilibrium is well established over \({\mathcal{R}}\), we estimate the expectation value \(\langle \mathcal{O} \rangle\) with sample averages using the configurations taken from a subinterval \([\tilde{T}_0,\tilde{T}_1]\) (\(T_0 \leq \tilde{T}_0 < \tilde{T}_1 \leq T_1\)), which is free from both of the sign problem at \(t\sim T_0\) and the possible complicated geometry at \(t\sim T_1\).19 With local coordinates \(x=(x^a)\) for \(\Sigma_t\) (see footnote 6), we introduce those of \({\mathcal{R}}\) as \(\hat{x}=(\hat{x}^\mu)=(\hat{x}^0=t,\hat{x}^a=x^a)\). Then, the induced metric on \({\mathcal{R}}\), \(d\hat{s}^2 = \hat{\gamma}_{\mu\nu}\,d\hat{x}^\mu d\hat{x}^\nu \equiv |dz(\hat{x})|^2\), takes the following form [23]: \[\begin{align} d\hat{s}^2 &= |(\partial_t z^i)\,dt + (\partial_{x^a}z^i)\,dx^a|^2 = |\xi^i dt + E_a^i dx^a|^2 = |\xi^i_n dt + (E_a^i dx^a + \xi_v^i dt)|^2 \nonumber \\ &= \alpha^2\,dt^2 + \gamma_{ab}\,(dx^a+\beta^a dt)\,(dx^b+\beta^b dt) \end{align}\] with the induced metric \(\gamma_{ab}\) on \(\Sigma_t\), the shift \(\beta^a\) and the lapse \(\alpha\,(>0)\) given by \[\begin{align} \gamma_{ab} &= \langle E_a, E_b \rangle, \\ \beta^a &= \gamma^{ab}\,\langle \xi, E_b \rangle, \\ \alpha^2 &= \langle \xi_n,\xi_n\rangle. \end{align}\] Here, \(E_a=(E^i_a=\partial z^i/\partial x^a)\) form a basis of \(T_z\Sigma_t\), and \(\xi=\overline{\partial S}\) is the flow vector, which is decomposed into the tangential and the normal components as \(\xi=\xi_v + \xi_n\) [see Eq. 13 ]. Note that \(\xi_v=E_a\,\beta^a \in T_z\Sigma_t\,(\subset T_z{\mathcal{R}})\) and \(\xi_n\in N_z\Sigma_t\cap T_z{\mathcal{R}}\). The invariant volume element on \({\mathcal{R}}\) is then given by \[\begin{align} |dz|_{\mathcal{R}}\equiv \sqrt{\hat{\gamma}}\,d\hat{x} = \alpha\sqrt{\gamma}\,dt\,dx = \alpha\,|dz_t|\,dt, \end{align}\] and \(Z_{\mathcal{O}}\) in Eq. 23 can be written as \[\begin{align} Z_{\mathcal{O}}= \int_{\mathcal{R}}|dz|_{\mathcal{R}}\,e^{-V(z)}\,{\mathcal{F}}(z)\,{\mathcal{O}}(z) \end{align}\] with \[\begin{align} V(z) &\equiv {\rm Re}\,S(z) + W(t(z)), \\ {\mathcal{F}}(z) &\equiv \frac{dt\,dz_t}{|dz|_{\mathcal{R}}}\,e^{-i\,{\rm Im}\,S(z)} = \alpha^{-1}\,\frac{dz_t}{|dz_t|}\,e^{-i\,{\rm Im}\,S(z)} = \alpha^{-1}\,\frac{\det E}{|\det E\,|}\,e^{-i\,{\rm Im}\,S(z)}. \end{align}\] Thus, by defining the reweighted average \(\langle\cdots\rangle_{\mathcal{R}}\) on \({\mathcal{R}}\) by \[\begin{align} \langle g(z) \rangle_{\mathcal{R}}\equiv \frac{\int_{\mathcal{R}}|dz|_{\mathcal{R}}\,e^{-V(z)}\,g(z)}{\int_{\mathcal{R}}|dz|_{\mathcal{R}}\,e^{-V(z)}}, \end{align}\] the expectation value 23 is expressed as a ratio of the reweighted averages: \[\begin{align} \langle {\mathcal{O}}\rangle = \frac{\langle {\mathcal{F}}(z)\,{\mathcal{O}}(z) \rangle_{\mathcal{R}}}{\langle {\mathcal{F}}(z) \rangle_{\mathcal{R}}}. \end{align}\]
Similarly to the GT-HMC algorithm, the reweighted averages \(\langle \cdots \rangle_{\mathcal{R}}\) can be written as a path integral over the phase space by rewriting the measure \(|dz|_{\mathcal{R}}=\sqrt{\hat{\gamma}}\,d\hat{x}\) to the form \[\begin{align} |dz|_{\mathcal{R}}= \sqrt{\hat{\gamma}}\,d\hat{x} \propto d\hat{x}\,d\hat{p}\,e^{-(1/2)\,\hat{\gamma}^{\mu\nu}\,\hat{p}_\mu \hat{p}_\nu}, \end{align}\] where \(d\hat{x}\,d\hat{p}\equiv \prod_\mu \bigl( d\hat{x}^\mu d\hat{p}_\mu \bigr)\) is the volume element of the phase space of \({\mathcal{R}}\) and \((\hat{\gamma}^{\mu\nu})\equiv (\hat{\gamma}_{\mu\nu})^{-1}\). We thus obtain the phase-space path integral in the parameter-space representation: \[\begin{align} \langle g(z) \rangle_{\mathcal{R}} = \frac{\int d\hat{x}\,d\hat{p}\, e^{-(1/2)\,\hat{\gamma}^{\mu\nu}\,\hat{p}_\mu \hat{p}_\nu - V(z(\hat{x}))}\,g(z(\hat{x}))}{\int d\hat{x}\,d\hat{p}\,e^{-(1/2)\,\hat{\gamma}^{\mu\nu}\,\hat{p}_\mu \hat{p}_\nu - V(z(\hat{x}))}}. \end{align}\] Note that the volume element can be expressed as \(d\hat{x}\,d\hat{p} = \hat{\omega}^{N+1}/(N+1)!\) with the symplectic 2-form \(\hat{\omega} \equiv d\hat{p}_\mu\wedge d\hat{x}^\mu\).
In Monte Carlo calculations, it is more convenient to rewrite everything in terms of the target space coordinates \(z=(z^i)\). To do this, we introduce the momentum \(\pi=(\pi^i)\) which is tangent to \({\mathcal{R}}\): \[\begin{align} \pi \in T_z{\mathcal{R}}~~~with~~ \pi^i \equiv \hat{p}^\mu \hat{E}_\mu^i \quad (\hat{p}^\mu\equiv \hat{\gamma}^{\mu\nu}\,\hat{p}_\nu,~ \hat{E}_\mu^i\equiv \partial z^i/\partial \hat{x}^\mu). \end{align}\] One then can show that the 1-form \[\begin{align} \hat{a}\equiv \langle\pi, dz\rangle = {\rm Re}\,\overline{\pi^i}\,dz^i \end{align}\] can be expressed as \(\hat{a}=\hat{p}_\mu d\hat{x}^\mu\), and thus we find that \(\hat{a}\) is a symplectic potential of \(\hat{\omega}\), \(\hat{\omega} = d\hat{a}\), and obtain the identity \[\begin{align} \hat{\omega} = {\rm Re}\,d\overline{\pi^i}\wedge dz^i. \end{align}\] Furthermore, noting the identity \[\begin{align} \langle \pi,\pi \rangle = \hat{\gamma}^{\mu\nu} \hat{p}_\mu \hat{p}_\nu \quad (\pi\in T_z{\mathcal{R}}), \end{align}\] we have the following target-space representation: \[\begin{align} \langle g(z) \rangle_{\mathcal{R}} = \frac{\int_{T{\mathcal{R}}}\, \hat{\omega}^{N+1}\,e^{-H(z,\pi)}\,g(z)}{\int_{T{\mathcal{R}}}\, \hat{\omega}^{N+1}\,e^{-H(z,\pi)}}. \end{align}\] Here, \(T{\mathcal{R}}\equiv \{(z,\pi)\,|\,z\in{\mathcal{R}},\,\pi\in T_z{\mathcal{R}}\}\) is the tangent bundle of \({\mathcal{R}}\), and \(H(z,\pi)\) is the Hamiltonian of the form \[\begin{align} H(z,\pi) = \frac{1}{2}\,\langle\pi,\pi\rangle + V(z) \end{align}\] with the (real-valued) potential \[\begin{align} V(z) = {\rm Re}\,S(z) + W(t(z)). \end{align}\]
In parallel to discussions for GT-HMC, the RATTLE [34], [35] for WV-HMC is given as follows [23]: \[\begin{align} \pi_{1/2} &= \pi - \Delta s\,\overline{\partial V(z)} - \lambda/\Delta s, \tag{24} \\ z' &= z + \Delta s\,\pi_{1/2}, \tag{25} \\ \pi' &= \pi_{1/2} - \Delta s\,\overline{\partial V(z')} - \lambda'/\Delta s. \tag{26} \end{align}\] Here, the Lagrange multipliers \(\lambda\in N_z{\mathcal{R}}\) and \(\lambda'\in N_{z'}{\mathcal{R}}\) are determined such that \(z'\in {\mathcal{R}}\) and \(\pi'\in T_{z'}{\mathcal{R}}\), respectively. This transformation satisfies the reversibility as in footnote 11. One can also show that this transformation is symplectic (\(\hat{\omega}' = \hat{\omega}\)) and thus volume-preserving (\(\hat{\omega}'^{N+1} = \hat{\omega}^{N+1}\)). One can further show that it preserves the Hamiltonian to \(O(\Delta s^2)\): \(H(z',\pi') = H(z,\pi) + O(\Delta s^3)\).
The gradient of the potential, \(\overline{\partial V(z)}\), can be set to the form [23] \[\begin{align} \overline{\partial V(z)} = \frac{1}{2}\,\Bigl[ \xi + \frac{W'(t)}{\langle \xi_n,\xi_n \rangle}\,\xi_n \Bigr]. \label{wv95force} \end{align}\tag{27}\] A simplified proof is given in Appendix 7.
As we see in the next subsection, in determining \(\lambda\) and \(\lambda'\), we repeatedly project a vector \(w\in T_z{\mathbb{C}}^N\) onto the tangent space \(T_z{\mathcal{R}}\) and the normal space \(N_z{\mathcal{R}}\): \[\begin{align} w = w_\parallel + w_\perp \quad \bigl(w_\parallel\in T_z{\mathcal{R}},~ w_\perp\in N_z{\mathcal{R}}\bigr). \label{wv95projector1} \end{align}\tag{28}\] This decomposition can be carried out by using the projection for \(\Sigma=\Sigma_t\) [see Eq. 13 and Algorithm 4]. In fact, let us decompose the vectors \(w\) and \(\xi\) into \[\begin{align} w &= w_v + w_n \quad \bigl(w_v\in T_z\Sigma_t,~ w_n\in N_z\Sigma_t\bigr), \\ \xi &= \xi_v + \xi_n \quad \bigl(\xi_v\in T_z\Sigma_t,~ \xi_n\in N_z\Sigma_t\bigr). \end{align}\] Then, \(w_\parallel\) and \(w_\perp\) are given by20 \[\begin{align} w_\parallel = w_v + c\,\xi_n,\quad w_\perp = w_n - c\,\xi_n \end{align}\] with \[\begin{align} c = \frac{\langle \xi_n, w_n\rangle}{\langle \xi_n,\xi_n \rangle}. \end{align}\]
The Lagrange multiplies \(\lambda\in N_z{\mathcal{R}}\) and \(\lambda'\in N_{z'}{\mathcal{R}}\) in Eqs. 24 –26 are determined as follows.
The condition that \(z'\in {\mathcal{R}}\) is equivalent to that \(z'\) can be written as \(z'=z_{t'}(x')\) with some \(t'\in{\mathbb{R}}\) and \(x'\in\Sigma_0\) (see Fig. 8).
Thus, finding \(\lambda\) satisfying Eqs. 24 and 25 for a given \(z=z_t(x)\in{\mathcal{R}}\) and \(\pi\in T_z{\mathcal{R}}\) is equivalent to finding a triplet \((h,u,\lambda)\) \((h\in{\mathbb{R}},\,u\in T_{x}\Sigma_0,\,\lambda\in N_z{\mathcal{R}})\) that satisfies \[\begin{align} z_{t+h}(x+u) = z_t(x) + \Delta z - \lambda \label{wv95newton1} \end{align}\tag{29}\] with \[\begin{align} \Delta z(z,\pi)\equiv \Delta s\,\pi - (\Delta s)^2\, \overline{\partial V(z)}. \end{align}\] Equation 29 can be solved iteratively with Newton’s method. There, one solves the following linearized equation in updating an approximate solution \((h,u,\lambda)\) as \(h\gets h+\Delta h\), \(u\gets u+\Delta u\) and \(\lambda\gets \lambda+\Delta\lambda\): \[\begin{align} \xi_{\rm new} \Delta h + E_{\rm new} \Delta u + \Delta \lambda = B, \label{wv95newton2} \end{align}\tag{30}\] where \(\xi_{\rm new}\equiv \partial z_{t+h}(x+u)/\partial h = (\partial z_{t+h}/\partial t)(x+u)\), \(E_{\rm new}\equiv \partial z_{t+h}(x+u)/\partial u = (\partial z_{t+h}/\partial x)(x+u)\), and \[\begin{align} B\equiv z + \Delta z - \lambda - z_{\rm new}\in {\mathbb{C}}^N \end{align}\] with \(z_{\rm new}\equiv z_{t+h}(x+u)\).
Equation 30 is proposed in Ref. [23], and can be solved with direct or iterative methods by regarding it as a linear equation of the form \(AX=B\) with respect to \(X = (\Delta h, \Delta u, \Delta \lambda)\). Instead of solving Eq. 30 , we here propose to use the simplified Newton equation (corresponding to the fixed-point method of Ref. [12] for the case of the projection onto a single thimble), where \(\xi_{\rm new}\) and \(E_{\rm new}\) on the left hand side are replaced by the values at \(z=z_t(x)\): \[\begin{align} \xi \Delta h + E \Delta u + \Delta \lambda = B. \label{wv95newton3} \end{align}\tag{31}\] This equation can be readily solved by using the projection introduced in Sect. 4.3. To see this, we introduce the decomposition of \(\xi\) and \(B\) as \[\begin{align} \xi &= \xi_\parallel \nonumber \\ &= E \xi_{0,v} + \xi_n, \tag{32} \\ B &= B_\parallel + B_\perp = (B_v + c_B\,\xi_n) + (B_n - c_B\,\xi_n) \tag{33} \nonumber \\ &= E B_{0,v} + c_B\,\xi_n + (B_n - c_B\,\xi_n) \end{align}\] with \[\begin{align} c_B \equiv \frac{\langle B, \xi_n \rangle}{\langle \xi_n, \xi_n \rangle}. \end{align}\] Meanwhile, the left hand side of Eq. 31 is decomposed as \[\begin{align} \xi \Delta h + E \Delta u + \Delta \lambda &= (E \xi_{0,v} + \xi_n)\,\Delta h + E \Delta u + \Delta \lambda \nonumber \\ &= E(\xi_{0,v}\,\Delta h + \Delta u) + \xi_n\,\Delta h + \Delta \lambda. \label{wv95lhs} \end{align}\tag{34}\] Comparing Eqs. 33 and 34 , we find \[\begin{align} \xi_{0,v}\,\Delta h + \Delta u &= B_{0,v}, \\ \Delta h &= c_B, \\ \Delta \lambda &= B_n - c_B\,\xi_n, \end{align}\] or equivalently,21 \[\begin{align} \Delta h = c_B, \quad \Delta u = B_{0,v} - c_B\,\xi_{0,v}, \quad \Delta \lambda = B_n - c_B\,\xi_n. \end{align}\]
Note that determining \(\lambda'\) in Eq. 26 such that \(\pi'\in T_{z'}{\mathcal{R}}\) is equivalent to projecting \(\tilde{\pi}' \equiv \pi_{1/2} - \Delta s\,\overline{\partial V(z')}\) onto \(T_{z'}{\mathcal{R}}\). Thus, \(\pi'\) is simply obtained from the decomposition \(\tilde{\pi}' = \tilde{\pi}'_\parallel + \tilde{\pi}'_\perp\) at \(z'\) as \(\pi' = \tilde{\pi}'_\parallel\).
Molecular dynamics from a configuration \((z,\pi) \in T{\mathcal{R}}\) is summarized in Algorithm 9.
We require configurations in molecular dynamics to be confined in the region \(T_0\lesssim t\lesssim T_1\). This can be realized by adjusting the function \(W(t)\), whose possible form, e.g., is (see Ref. [36])22 \[\begin{align} W(t) = \begin{cases} -\,\gamma(t-T_0) + c_0\,\bigl(e^{(t-T_0)^2/2d_0^2} - 1\bigr) & (t < T_0) \\ -\,\gamma(t-T_0) & (T_0 \leq t \leq T_1) \\ -\,\gamma(t-T_0) + c_1\,\bigl(e^{(t-T_1)^2/2d_1^2} - 1\bigr) & (t > T_1). \\ \end{cases} \end{align}\] Configurations then bounce off the walls placed at the lower boundary (\(t = T_0)\) and at the upper boundary (\(t = T_1\)) with penetration depths \(d_0\) and \(d_1\), respectively (\(c_0\) and \(c_1\) correspond to the heights at \(t=T_0-d_0\) and \(t=T_1+d_1\) with the gradients \(-\gamma - c_0/d_0\) and \(-\gamma + c_1/d_1\)). However, with a finite step size \(\Delta s\), some configurations may penetrate the wall so deeply that the resulting large repulsive force \(-W'(t)\,\overline{\partial t(z)}\) in \(-\overline{\partial V(z)}\) can lower the numerical precision. The simplest solution to this issue, keeping (1) exact volume preservation, (2) exact reversibility, and (3) approximate energy conservation, is to let such a configuration go back the way it just comes [23]. The algorithm will take the form of Algorithm 10.
We summarize in Algorithm 11 the WV-HMC method with the simplified RATTLE algorithm.
In this section, we perform a numerical test for the convergence of the simplified RATTLE algorithm that uses the simplified Newton method (fixed-point method) combined with iterative solvers for orthogonal decompositions of vectors, and show that the convergence depends on the system size only weakly. We give a discussion only for the GT-HMC algorithm. This is because the comparison of computational costs for different system sizes can be made more precisely if the flow time is fixed. Note that the computational cost with the WV-HMC algorithm are generally smaller than that with the GT-HMC algorithm. In fact, the computational costs for GT-HMC and WV-HMC are almost the same for a fixed flow time, and the flow times appearing in WV-HMC are smaller than the flow time set in GT-HMC. We also demonstrate that the computational cost of the simplified RATTLE algorithm is much less than the original algorithm [22] that uses the standard (non-simplified) Newton method.
We consider the complex scalar field theory at finite density, whose lattice action [5] is given by \[\begin{align} S(\phi) = \sum_n\Bigl[ (2d+m^2)\,|\phi_n|^2 + \lambda\,|\phi_n|^4 - \sum_{\nu=0}^{d-1}\bigl(e^{\mu\,\delta_{\nu,0}}\,\bar{\phi}_n \phi_{n+\nu} + e^{-\mu\,\delta_{\nu,0}}\,\bar{\phi}_{n+\nu} \phi_n \bigr) \Bigr]. \end{align}\] Here, \(\phi_n\) is the value at site \(n\) of a complex field \(\phi\) living on a \(d\)-dimensional periodic square lattice of size \(L^d\), and \(\mu\) is the chemical potential that makes the action complex-valued. We decompose \(\phi_n\) into the real and imaginary parts as \(\phi_n = \phi_{R,n} + i\,\phi_{I,n}\). Then, the set \(\Sigma_0 \equiv \{x=(\phi_{R},\phi_{I})\}\) is the configuration space whose real dimension is \(N=2\,L^d\), and we apply the WV/GT-HMC method following the prescriptions given in the preceding sections.23 In performing numerical tests, we set the physical parameters to \(d=2\), \(m=0.1\), \(\lambda=1.0\), \(\mu=0.5\), and vary the lattice size \(L^2\) from \(16^2\) to \(512^2\). The flow time is fixed at \(t=0.01\), and the molecular dynamics parameters are set to \(\Delta s = 0.02\) and \(N_{\rm step}=50\). Computations are performed with a fixed number of threads \((=8)\), and the flow equations 14 –16 are solved with DOPRI5(4) [40].
As discussed above, every iteration method utilizes the projection of a vector \(w\in T_z{\mathbb{C}}^N\) onto the tangent space \(T_z\Sigma\) and the normal space \(N_z\Sigma\) [Eq. 13 ]. The dominant part in the computation is the inversion of the linear problem \(A w_0 = w\) to find \(w_0\in T_x{\mathbb{C}}^N\) for a given \(w\in T_z{\mathbb{C}}^N\), and we use the BiCGStab method to solve this equation. Figure 12 shows the history of relative errors in BiCGStab, from which we find that the system size dependence of the convergence is very weak. Figure 13 is the elapsed time to solve the linear equation. The iteration is terminated when the relative error falls below a prescribed tolerance \((= 10^{-10})\). The statistical errors are estimated from ten \(w\)’s randomly generated in \(T_z{\mathbb{C}}^N\) with a fixed \(z\). From the figure, the computational cost is expected to be in the range \(O(N)\sim O(N \log N)\).
Figure 14 is the elapsed time for a given \((z,\pi)\in T\Sigma\) to solve Eq. 18 with respect to \((u,\lambda)\) using the simplified Newton equation (Algorithm 6) with iterative solvers for orthogonal decompositions of vectors. The physical and molecular dynamics parameters are the same as above. The iteration is terminated when \(|B|\) \((B=z+\Delta z-\lambda-z_{\rm new})\) [Eq. 20 ] falls below a prescribed absolute tolerance \((= 10^{-10})\). The statistical errors are estimated from ten \((z,\pi)\)’s randomly generated in \(T\Sigma\). This shows that the computational cost would be in the range \(O(N)\sim O(N\,(\log N)^2)\). The computational cost scaling is studied in more detail in a subsequent paper [36].
In this subsection, we compare the convergence of two algorithms for solving Eq. 18 . One is the simplified RATTLE algorithm, where the simplified Newton method (fixed-point method) is used [Eq. 21 ] along with iterative solvers for orthogonal decompositions of vectors (Algorithm 4 with iterative solvers). The other uses the standard Newton method (as in Ref. [22]) with the following equation [Eq. 19 ]: \[\begin{align} &E_{\rm new} \Delta u + \Delta \lambda = B \nonumber \\ & \left( \begin{array}{l} B =~ z + \Delta z - \lambda - z_{\rm new}\in {\mathbb{C}}^N, \\ z_{\rm new}=z_t(x+u), \quad E_{\rm new} = (\partial z_t/\partial x)(x+u) \end{array} \right). \label{gt95newton295app} \end{align}\tag{35}\] Introducing \(\Delta\lambda_0\equiv F^{-1}\Delta \lambda\), we solve Eq. 35 with respect to \(X=(\Delta u, \Delta \lambda_0)\) iteratively (\(\Delta\lambda\) is then obtained as \(\Delta\lambda=F \Delta\lambda_0\)). To do this, we first define an \({\mathbb{R}}\)-linear map \(\tilde{A}:\,T_{x+u}\Sigma_0 \oplus N_x\Sigma_0 \ni w_0 \mapsto w\in T_{z_{\rm new}}\Sigma \oplus T_z\Sigma\) by \[\begin{align} w = \tilde{A}w_0 = E_{\rm new}v'_0 \oplus F n_0 ~~for~~ w_0 = v'_0 \oplus n_0, \end{align}\] where \(E_{\rm new}v'_0 \in T_{z_{\rm new}}\Sigma\) and \(F n_0 \in N_z\Sigma\) are obtained from \(v'_0\in T_{x+u}\Sigma_0\) and \(n_0\in N_x\Sigma_0\), respectively, by integrating two flow equations from \(x+u\) to \(z_{\rm new}=z_t(x+u)\) and from \(x\) to \(z=z_t(x)\). Now Eq. 35 takes the form \(\tilde{A} X=B\) and can be solved iteratively by using the BiCGStab method as \[\begin{align} \Delta u = {\rm Re\,}X,\quad \Delta \lambda_0 = i\,{\rm Im\,}X \quad [and thus~ \Delta \lambda = F\,(i\,{\rm Im\,}X) \,(= i\,E\,{\rm Im\,}X)]. \end{align}\]
Figure 15 shows the convergence of the iteration for two algorithms to solve Eq. 18 on the lattice of size \(32\times 32\) for ten different configurations \((z,\pi)\in T\Sigma\) used in Fig. 14. We see that the iteration of the simplified method (orange lines) rapidly converges almost independently of configurations, while the original method (blue lines) requires many iterations to converge and the convergence varies in very different ways depending on configurations.
Figure 16 shows the elapsed times for two algorithms on the lattice of size \(L^2\) with \(L=8,\,12,\,16,\,24,\,32\) for ten different configurations. We see that both exhibit the computational cost scaling \(\sim O(N)\), but the simplified algorithm is much faster than the original algorithm. The rate of improvement has large deviations, reflecting the serious dependence of the original algorithm on configurations.
We have developed a simplified algorithm for the GT-HMC and the WV-HMC methods, by adopting a simplified Newton method (worldvolume version of the fixed-point method of Ref. [12]) in determining the Lagrange multiplies of RATTLE with iterative solvers for orthogonal decompositions of vectors. Using as a benchmark model the complex scalar field theory at finite density,24 we performed a numerical test for the convergence of the simplified RATTLE algorithm. We found that the convergence depends on the system size only weakly and the computational cost is nearly \(O(N)\) at each molecular dynamics step.
In subsequent papers [36], [37], we apply the current algorithm to various quantum field theories. The target models of the WV-HMC method can be classified into two categories. The first consists of models whose action is purely local, for which the Hessian \((H_{ij}=\partial_i\partial_j S)\) appearing in flow equations are sparse matrices. A typical example is the complex scalar field theory at finite density, and will be studied in Ref. [36]. The other treats models whose bosonized actions include nonlocal terms such as the logarithm of the fermion determinant. A study of dynamical fermion systems with the WV-HMC method will be made in Ref. [37].
Models discussed above have configuration spaces of flat geometry. Actually, the WV-HMC method can also be generalized to models whose configuration spaces are group manifolds. This will be discussed in Ref. [38].
Besides applying the WV-HMC method to various models, we believe that it is also important to keep improving the algorithm itself. It should be interesting to combine the WV-HMC algorithm with other methods towards solving the sign problem, such as the complex Langevin method and/or the tensor network method. It is also interesting to incorporate machine learning techniques in order to further reduce the computational cost.
One of the most important projects in the near future is to develop the Monte Carlo algorithm to study the real-time dynamics of quantum many-body systems (see, e.g., Refs. [41]–[44] for attempts based on the generalized thimble method). A study based on the WV-HMC is now in progress and will be reported elsewhere.
The author thanks Sinya Aoki, Ken-Ichi Ishikawa, Issaku Kanamori, Yoshio Kikukawa, Nobuyuki Matsumoto and Naoya Umeda for valuable discussions, and especially Yusuke Namekawa for collaboration and comments on the manuscript. This work was partially supported by JSPS KAKENHI (Grant Numbers 20H01900, 23H00112, 23H04506) and by MEXT as “Program for Promoting Researches on the Supercomputer Fugaku” (Simulation for basic science: approaching the new quantum era, JPMXP1020230411).
We start from the expression \[\begin{align} \overline{\partial V} = \frac{1}{2}\,\overline{\partial S} + W'(t)\,\overline{\partial t} = \frac{1}{2}\,\xi + W'(t)\,\overline{\partial t}. \end{align}\] We decompose \(\overline{\partial t} \in T_z{\mathbb{C}}^N\) at \(z\in \Sigma_t \,(\subset {\mathcal{R}})\) into the form \[\begin{align} \overline{\partial t} = (\overline{\partial t})_\parallel + (\overline{\partial t})_\perp = v + c\,\xi_n + (\overline{\partial t})_\perp \end{align}\] with \(v=v^a E_a\in T_z\Sigma_t\,(\subset T_z{\mathcal{R}})\). Note that \(\xi_n\in N_z\Sigma_t\cap T_z{\mathcal{R}}\) and \((\overline{\partial t})_\perp\in N_z{\mathcal{R}}\). In the following, we will show that (1) \(v=0\) and (2) \(c=1/(2\,\langle \xi_n,\xi_n \rangle)\). This completes the proof because \((\overline{\partial t})_\perp\in N_z{\mathcal{R}}\) can be absorbed into \(\lambda\) in Eq. 24 .
(1) For \(\forall z\in\Sigma_t\), \(\forall u=u^a E_a\in T_z\Sigma_t\) and an infinitesimally small \(\epsilon\), we have \[\begin{align} t(z,\bar z) = t(z+\epsilon u, \bar z+\epsilon \bar u) = t(z,\bar z) + \epsilon\,[u^i\,\partial_i t + \overline{u^i}\,\overline{\partial_i t}] + O(\epsilon^2), \end{align}\] and thus, \[\begin{align} 0 = u^i\,\partial_i t + \overline{u^i}\,\overline{\partial_i t} = 2\,\langle u,\overline{\partial t} \rangle = 2\,\langle u,v \rangle = 2\,u^a \gamma_{ab} v^b. \end{align}\] This means that \(v=0\) due to the nondegeneracy of \(\gamma_{ab}\).
(2) By using the orthogonality, \(c\) is given by \[\begin{align} c = \frac{\langle \xi_n, \overline{\partial t} \rangle}{\langle \xi_n,\xi_n \rangle}. \end{align}\] Here, noting that \[\begin{align} 1 = [t(z,\bar z)]^\centerdot = \dot{z}^i\,\partial_i t + \overline{\dot{z}^i}\,\overline{\partial_i t} = \xi^i\,\partial_i t + \overline{\xi^i}\,\overline{\partial_i t} = 2\,\langle \xi,\overline{\partial t} \rangle = 2\,\langle \xi_n,\overline{\partial t} \rangle \quad(\because v=0), \end{align}\] we have \(\langle \xi_n,\overline{\partial t} \rangle = 1/2\). We thus obtain \(c = 1/(2\,\langle \xi_n,\xi_n \rangle )\).
E-mail address: fukuma@gauge.scphys.kyoto-u.ac.jp↩︎
The author thanks the referee for the comment on the first version of the manuscript, making him aware that the simplified Newton method considered here is the same as the fixed-point method of Ref. [12].↩︎
If we further introduce the anti-thimble \({\mathcal{K}}_\sigma\) as the union of orbits flowing into \(\zeta_\sigma\), the coefficient \(n_\sigma\) is expressed as the intersection number between the original surface and the anti-thimble, \(n_\sigma = \langle \Sigma_0, {\mathcal{K}}_\sigma \rangle\) [9].↩︎
A HMC algorithm with RATTLE was first introduced to the Lefschetz thimble method in a monumental paper by the Komaba group [12], where sampling is done directly on a single thimble.↩︎
The GT-HMC algorithm is treated in Ref. [22] as part of the TLT method and is combined with the parallel tempering algorithm with the flow time as the tempering parameter. The presentation below follows this reference.↩︎
A canonical choice of \(x\) is initial configurations of the flow, but they can also be set to vectors in the tangent space at a point on \(\Sigma\) as in Refs. [12] and [16] . ↩︎
We have used the identity \({\rm Im}\,E_a^\dagger E_b = 0\) [12], which holds when the original configuration space \(\Sigma_0\) is flat.↩︎
A more precise expression is \(H(z,\bar z,\pi,\bar \pi)=(1/2)\langle \pi,\pi\rangle + V(z,\bar z)\), but we abbreviate it as in the text to simplify expressions.↩︎
Note that \(\overline{\partial V(z)} = (1/2)\,\overline{\partial S(z)}\) because \(V(z) = {\rm Re}\,S(z) = (1/2)\,[S(z)+\overline{S(z)}]\).↩︎
In fact, for any vector \(v\in T_z\Sigma\), we have \[\begin{align} \langle \lambda,v \rangle = \lambda_r\, {\rm Re}\,\langle \overline{\partial\phi^r},v \rangle = (\lambda_r/2)\,(v\cdot\partial+\bar{v}\cdot\bar\partial)\phi^r = (\lambda_r/2)\,\lim_{\epsilon\to 0}(1/\epsilon)[\phi^r(z+\epsilon v)-\phi^r(z)] = 0. \nonumber \end{align}\]↩︎
If \((z,\pi)\to(z',\pi')\) is a motion, so is \((z',-\pi')\to(z,-\pi)\) with \(\lambda\) and \(\lambda'\) interchanged. ↩︎
Note that \(\langle \lambda,\pi \rangle = \langle \lambda',\pi' \rangle = 0\), \(\lambda=O(\Delta s^2)\), \(\lambda'=O(\Delta s^2)\) and \(\lambda-\lambda'=O(\Delta s^3)\).↩︎
Equation 15 is obtained from another flow equation of type 14 , \((z+\epsilon v)^\centerdot = \overline{\partial S(z+\epsilon v)}\) with an infinitesimal parameter \(\epsilon\). Then, Eq. 16 is deduced from the condition that \(\langle v,n\rangle^\centerdot = 0\).↩︎
We discuss in Sec. 5.1 the convergence of iteration to solve Eq. 18 when using the original equation 19 with iterative solvers.↩︎
With knowledge of the explicit form of the Jacobian \(E=(E^i_a)\) [requiring the computational cost of \(O(N^3)\)], they can be written as \(\Delta u = {\rm Re}(E^{-1}B)\) and \(\Delta \lambda = F\,(i\, {\rm Im}(E^{-1}B)) = i\,E\, {\rm Im}(E^{-1}B)\). \({\rm Re}(E^{-1}B)\) and \({\rm Im}(E^{-1}B)\) correspond to Eqs. (3.21) and (3.22) in Ref. [12], respectively.↩︎
One needs to compute the phase of the Jacobian determinant, \(dz/|dz| = \det E/|\det E|\), upon measurement, of which the direct computation costs \(O(N^3)\). However, the phase can be evaluated by using a stochastic estimator, for which the computational cost is reduced to \(O(N\times N_R)\), where \(N_R\) is the number of independent Gaussian noise fields [39].↩︎
When the system already has an ergodicity problem on the original integration surface \(\Sigma_{t=0}\), we further implement other algorithms to reduce the problem or use \(T_0\) of a negative value [17].↩︎
By using the GT-HMC, one can set a criterion for the choice of \(T_1\), e.g., that the average phase factor \(\langle e^{-i{\rm Im}\,S(z)}\,dz/|dz| \rangle_{\Sigma_{T_1}}\) is not zero at least to two standard deviations.↩︎
The subinterval for estimation, \([\tilde{T}_0,\tilde{T}_1]\), is determined by the condition that the estimate of \({\mathcal{O}}\) only varies within small statistical errors against small changes of the subinterval [23]. The set of configurations in the subregion \(\tilde{\mathcal{R}}\equiv \{z\in\Sigma_t\,|\,t\in [\tilde{T}_0,\tilde{T}_1]\}\) can also be regarded as a Markov chain, so that the standard statistical analysis method (such as Jackknife) can also be applied [24].↩︎
One can easily show that \(\langle \xi_n,w_\perp \rangle=0\) and \(\langle w_\parallel, w_\perp \rangle = 0\).↩︎
If we set the initial guess to \(h=0\), \(u=0\) and \(\lambda=0\), then the first run in the iteration gives the result \[\begin{align} h = c_{\Delta z},\quad u = (\Delta z)_{0,v} - c_{\Delta z}\,\xi_{0,v},\quad \lambda = (\Delta z)_n - c_{\Delta z}\,\xi_n \quad (approximate solution) \nonumber \end{align}\] with respect to the decompositions \(\xi = E \xi_{0,v} + \xi_n\) [Eq. 32 ] and \[\begin{align} \Delta z = E (\Delta z)_{0,v} + c_{\Delta z}\,\xi_n + [(\Delta z)_n - c_{\Delta z}\,\xi_n] \quad \Bigl( c_{\Delta z} \equiv \frac{\langle \Delta z, \xi_n \rangle}{\langle \xi_n, \xi_n \rangle} \Bigr). \nonumber \end{align}\]↩︎
\(\gamma\,(>0)\) is the gradient of the tilt that lifts configurations upwards (positive flow direction). If this simple form is not enough for configurations to distribute almost equally over different flow times, one may resort to the multicanonical algorithm to tune \(W(t)\), as employed in Ref. [23].↩︎
A detailed investigation of the model with the simplified GT/WV-HMC algorithm is carried out in Ref. [36].↩︎