October 01, 2025
Computing condensed phase spectra from atomistic simulations requires calculating correlation functions from molecular dynamics and can be very expensive. A totally general, data-driven way to reduce this cost is to rewrite the dynamics exactly as a generalized master equation characterized by a memory kernel. The decay time of the kernel can be shorter than that of the original correlation function, reducing the amount of data required. In this paper we construct the minimal projection operator needed to predict vibrational sum-frequency generation spectra and apply it to the air-water interface simulated using ab initio molecular dynamics. We obtain a modest reduction in cost of just under 50%. We explore various avenues for using more of the available data to expand the projector in an attempt to reduce the cost further. Interestingly, we are not able to effect any change by including quadrupoles, inter-molecular couplings, or a depth dependence. How to strategically maximize the cost reduction achievable with projection operators remains an open question.
It is a central goal of chemical physics to link theoretically calculated microscopic dynamics with experimentally measured macroscopic observables, perhaps the most important being optical spectra. From the simulation side, one of the persistent challenges for molecular dynamics approaches in accessing macroscopic quantities is that of computational cost. New algorithms to evolve the equation of motion (EOM) address the enduring objective of extending the feasible computational timescale for systems of a given size, whether the forces be coarse-grained,[1], [2] atomistic forcefield (FFMD),[3] ab-initio Born-Oppenheimer[4] (AIMD) or non-adiabatic,[5] ring-polymer quantum analogue,[6] or otherwise. Indeed, one of the guiding principles in choosing a particular level of the computational hierarchy is that the characteristic timescales of the phenomenon of interest can be captured within the window available. For condensed phase spectroscopy, inherently quantum observables like the transition dipole moment must be measured over correlation times involving heavier, nuclear degrees of freedom that we might otherwise want to treat as classical, creating a tension between what is desired and what is affordable.
This problem of affordable timescale is intrinsic to the study of quantum processes in condensed media, and so there is a wealth of strategies available with which to attack it. As one example, polaron forming systems that exhibit coupling between electronic and quantum-nuclear motion can exhibit dynamics over many 10s or 100s of picoseconds,[7] far beyond what is typically accessible to fully quantum dynamical approaches. One approach to bridge regimes is to move to new methods that trade accuracy (and sometimes clarity) for efficiency: in this example, semiclassical approaches like surface hopping[8] or embedding methods such as QM/MM.[9] Another line of attack is to map to a model system that reduces the dimensionality of the problem whilst being parameterized by data at the higher level of theory or experiment.[10] A related route is to parameterize the forces (or other observables, like the total current[11]) directly using machine learning, particularly neural network potentials that can, for example, allow AIMD accuracy at FFMD cost with a relatively small set of reference calculations.[12] In this paper we employ the generalized master equation (GME), which is a model-free way to compute correlation functions at possibly much reduced cost. Here, we will use only the natural outputs of AIMD simulation. However, our approach to the method is completely data-driven, meaning the dynamics could in principle be obtained at any level of theory, with any of the aforementioned methods, and it can even be extended to treat spatial finite-size effects.[13]
The GME describes the coupled EOMs for a set of correlation functions.[14] It is obtained in three steps: by choosing some ‘observables’ \(|\boldsymbol{A})\), using the projection operator \(\mathcal{P} =|\boldsymbol{A})(\boldsymbol{A}|\boldsymbol{A})^{-1}(\boldsymbol{A}|\) to define a generalized Langevin equation of motion for some (in principle different) observables \(|\boldsymbol{B})\), and then forming correlation functions by taking the inner product with a third set of observables \(|\boldsymbol{D})\). When all observables are contained within \(|\boldsymbol{A})\), the expression[18]–[20] \[\label{eq:MNZ} \dot{\boldsymbol{ \mathcal{C} }}(t) = \dot{\boldsymbol{ \mathcal{C} }}(0)\boldsymbol{ \mathcal{C} }(t) - \int_0^t \!\textrm d{s} \, \boldsymbol{ \mathcal{K} }(s)\boldsymbol{ \mathcal{C} }(t-s)\tag{1}\] is a formally exact, non-Markovian EOM for the matrix of correlation functions \(\boldsymbol{ \mathcal{C} }(t)\equiv(\boldsymbol{A}(0)|\boldsymbol{A}(t))\). The observables can be classical or quantum mechanical.[14] The dynamics of \(\boldsymbol{ \mathcal{C} }(t)\) are completely determined by its initial condition \(\dot{\boldsymbol{ \mathcal{C} }}(0)\) and the memory kernel, \(\boldsymbol{ \mathcal{K} }(t)\). Efficiency gains are possible when \(\boldsymbol{ \mathcal{K} }(t)\) is a decaying function with some characteristic lifetime \(\tau_K\) that is shorter than the timescales of \(\boldsymbol{ \mathcal{C} }(t)\) we wish to reach. Since \(\boldsymbol{ \mathcal{K} }(t')\) can be extracted from \(\boldsymbol{ \mathcal{C} }(t\leq t')\) by numerical inversion,[21] accurate time series data for \(\boldsymbol{ \mathcal{C} }(t)\) up to \(\tau_K\) give access to \(\boldsymbol{ \mathcal{C} }(t>\tau_K)\) with no loss of accuracy and at (comparatively) trivial cost.
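To make the inversion step concrete, the following minimal sketch (our own construction, not the production TTM approach used later) differentiates Eq. 1 once, which yields a Volterra equation of the second kind for \(\boldsymbol{ \mathcal{K} }(t)\), and solves it step by step with the trapezoid rule:

```python
import numpy as np

def extract_kernel(C, dt):
    """Extract the memory kernel K(t) from correlation data C(t).
    Differentiating Eq. (1) once gives a Volterra equation of the second
    kind for K, solved here one time step at a time with the trapezoid
    rule.  C has shape (n_t, d, d); C(0) must be invertible (it is the
    identity after the normalization described in the text)."""
    n = C.shape[0]
    Cdot = np.gradient(C, dt, axis=0, edge_order=2)
    Cddot = np.gradient(Cdot, dt, axis=0, edge_order=2)
    C0_inv = np.linalg.inv(C[0])
    K = np.zeros_like(C)
    # t = 0:  Cddot(0) = Cdot(0)^2 - K(0) C(0)
    K[0] = (Cdot[0] @ Cdot[0] - Cddot[0]) @ C0_inv
    # the unknown K[i] also appears in the trapezoid sum, with weight dt/2
    lhs_inv = np.linalg.inv(C[0] + 0.5 * dt * Cdot[0])
    for i in range(1, n):
        conv = 0.5 * K[0] @ Cdot[i]
        for m in range(1, i):
            conv += K[m] @ Cdot[i - m]
        K[i] = (Cdot[0] @ Cdot[i] - Cddot[i] - dt * conv) @ lhs_inv
    return K
```

For a Markovian input such as a single exponential decay, the extracted kernel should vanish up to discretization error, consistent with the best-case scenario discussed below.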
In the best-case scenario the matrix \(\boldsymbol{ \mathcal{C} }(t)\) is generated by a Markov process and \(\boldsymbol{ \mathcal{K} }(t)\sim \delta(t)\): the exponential decays of \(\boldsymbol{ \mathcal{C} }(t)\) (with some longest lifetime which can be arbitrarily large depending on the physics of the system) are completely described by the first time step \(\boldsymbol{ \mathcal{C} }(\Delta t)\).[22] In many practical cases in both quantum dynamics[11], [23]–[25] and biophysics,[26]–[28] a large separation of timescales has allowed cost reductions of many orders of magnitude.
Unfortunately for the applicability of the GME method in computational spectroscopy, memory kernels extracted from atomistic MD are often found to have lifetimes comparable to (or even much longer than) the timescales of the correlation functions they parametrise.[29]–[35] Indeed, this computationally unfavourable regime \(\tau_K > \tau_ \textrm{eq}\) arises in applications of the GLE more generally, e.g. in time series describing local weather conditions.[36] The reason for this is that the memory kernel’s lifetime is a measure of the slowest timescale within the complementary space of \({ \mathcal{P} }\), and so acceleration of computation using a non-Markovian EOM to bypass full propagation can only be successful if the projection operator contains both the slowest timescales of the system and the observable of interest. The observable of interest does not itself have to be the slowest timescale. Hence, a projection that includes many degrees of freedom could result in an efficiency gain while a more minimal, obvious choice may lead to none. Ultimately, the definition of the projection operator is a choice made by the practitioner, informed by physical intuition and experience of what the dynamical eigenspectrum might look like.[14] The central question this study will address is therefore, “Do GMEs constructed with spectroscopic correlation functions from AIMD calculations have short-lived memory kernels and, if not, what can be done to amend the projection operator so that they do?”
In what follows, we will look to employ the GME approach in calculating vibrational sum-frequency generation (VSFG) spectra from AIMD simulations. To keep things computationally simple, and because it has been extensively studied,[16], [39]–[49] we take our system to be the air-water interface. The signals we seek to predict are particular directional combinations of the polarizability-dipole correlation,[50], [51] \[\label{eq:corr952nd95vsfg} \chi^{(2)}_{ijk}(\omega) \simeq X(\omega) \int \!\textrm d{t} \, \textrm{e} ^{ \mathrm{i}\omega t} \langle \alpha_{ij}(0)\mu_k (t) \rangle,\tag{2}\] where \(\alpha_{ij}(\boldsymbol{R}) = \Delta\mu_i(\boldsymbol{R}) / E_j\) is computed as a finite difference of dipole moments in the presence and absence of a small external electric field that serves to induce a distortion in the density \(\rho(\boldsymbol{r};\boldsymbol{R})\) for fixed nuclei \(\boldsymbol{R}\).[41] This amounts to taking the resonant term in the full expression for the second order response (see Appendix 5). At this level of theory the nuclei are taken to be classical, which is justified by replacing the exact Kubo-transformed correlation function with its classical counterpart, choosing the prefactor to be \(X(\omega)= \mathrm{i}\beta\omega/2\).[52] Therefore we can use Born-Oppenheimer based MD with explicit electronic density to calculate the correlation functions.
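The finite-difference step for \(\alpha_{ij}\) can be sketched as follows (a minimal illustration; the function name and array layout are our own, not those of any production code):

```python
import numpy as np

def polarizability_fd(mu_field, mu_zero, E=0.005):
    """Finite-difference polarizability alpha_ij = (mu_i(E_j) - mu_i(0)) / E_j.
    mu_field: shape (3, 3), row j holding the dipole vector computed with a
    field of strength E (a.u.) applied along axis j; mu_zero: shape (3,),
    the zero-field dipole.  Returns the 3x3 tensor alpha."""
    return (np.asarray(mu_field) - np.asarray(mu_zero)).T / E
```

Applied to dipoles that respond linearly to the field, this recovers the underlying tensor exactly; for real data the field must be small enough that higher-order terms are negligible.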
We pre-equilibrated a \(17\times17\) Å slab of \(171\) waters using SPC/E[53] in GROMACS;[54] the \(z\) direction was taken as \(L=57\) Å to avoid image interactions in the Ewald sum.[3] We then ran AIMD[55], [56] with the PBE functional[57] plus D3 vdW correction[58] using CP2K[59] at \(350\) K. We ran \(2000\) steps of \(0.5\) fs under a massive thermostat to re-equilibrate and then \(104,000\) steps under standard CSVR[60] with a time constant of \(100\) fs. The finest plane wave cutoff was \(350\) Ry, with \(4\) grids and a relative cutoff of \(40\) Ry. The basis set was DZVP-MOLOPT-SR-GTH with the GTH pseudopotential.[61], [62] We post-processed the trajectory by printing the electron density every \(8\) time steps – chosen as the maximum stride that still allows \(\omega_ \textrm{max}\) to include the O–H stretching region – as well as under an imposed electric field of \(0.005\) a.u. in the \(x\), \(y\), and \(z\) directions separately. The four time series were input to the open-source TRAVIS code[63] for radical Voronoi partitioning,[64] which calculated atomic contributions to the electronic density and dipole (and quadrupole);[65] units are e and e pm respectively. Atomic cells bore net charge (oxygen around \(-1.4\) e and hydrogen around \(0.7\) e), so when values were combined into molecules their origins were re-referenced to a common point, for which we chose the molecular centre of mass. Molecules were on average neutral with a FWHM of around \(0.04\) e; we assumed molecules to be neutral when computing cross-correlations. To correctly account for the inversion symmetry of our slab system, polarizabilities were calculated using a switching function that accounts for the surface normal and also discards contributions close to the central slab layer to reduce noise.[47]
From these simulation data we computed the correlation functions, averaging over windows of \(1024\) steps whose origins were offset by \(512\) steps (half-overlapping). We always worked with time-derivatives of the observables and analytically pre-multiplied by \(1/( \textrm{i} \omega)^2\).[43] Cross terms between molecules were spatially truncated[42], [66] at \(5\) Å[43] using a neighbour list constructed at the start of each averaging window, appropriate for periodic boundaries, as implemented in scipy.spatial.cKDTree. When Fourier transforming to give the spectra, the functions were zero-padded by a factor of \(3\), mirrored, and windowed using a Hann function; the sinc correction for the finite time step was applied to obtain the correct ratio of band intensities. When filtering the raw MD data, a piecewise function that interpolates between \(1\) and \(0\) using the Hann window over \(10\) steps was employed.
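A schematic of this post-processing pipeline, under simplifying assumptions (a single time origin per window, no sinc correction, and no molecular truncation; all names are ours), could look like:

```python
import numpy as np

def windowed_corr(a, b, window=1024, hop=512):
    """Estimate <a(0) b(t)> by averaging over half-overlapping windows.
    For brevity a single time origin per window is used; in practice
    one would average over many origins."""
    n_win = (len(a) - window) // hop + 1
    acc = np.zeros(window)
    for k in range(n_win):
        acc += a[k * hop] * b[k * hop : k * hop + window]
    return acc / n_win

def spectrum(c, dt, pad=3):
    """Fourier transform a one-sided correlation function: Hann-window,
    zero-pad by `pad`x, mirror into an even function (so the transform
    is purely real), and return (omega, intensity)."""
    taper = np.hanning(2 * len(c) - 1)[len(c) - 1 :]  # decaying half of a Hann window
    padded = np.concatenate([c * taper, np.zeros((pad - 1) * len(c))])
    sym = np.concatenate([padded, padded[:0:-1]])     # even extension about t = 0
    intensity = np.fft.rfft(sym).real * dt
    omega = 2 * np.pi * np.fft.rfftfreq(len(sym), dt)
    return omega, intensity
```

For an oscillatory correlation function the spectral peak lands at the oscillation frequency, up to the resolution set by the padded window length.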
For robustness to noise, memory kernels were extracted using the derivative-free TTM [67] numerical approach to inverting the convolution integral. The extracted memory object \(\boldsymbol{ \mathcal{T} }_t\) equals \(\boldsymbol{ \mathcal{K} }(t)\) in the limit \({\Delta t\rightarrow 0}\). We find these to be in close quantitative agreement and full qualitative agreement with the memory kernels extracted using traditional Volterra methods when comparison is possible. To not unduly complicate the paper, we refer to the memory object as the kernel, or \(\boldsymbol{ \mathcal{K} }(t)\) below. We note there is some controversy about the precise details of the integral discretization in the context of evaluating path integrals with Trotterized time steps.[68], [69] Here the kernel is parametrized from the exact data at the larger time step, and no other equations are involved.
We briefly comment on the observation that we do not reproduce the \(3600\) cm\(^{-1}\) shoulder in the \(\chi_{z\parallel\parallel}^{(2)}(\omega)\) spectrum. Absence of this shoulder was shown[43] to be an artifact introduced by using the (ss)VVCF[44], [45] approach. We do not make any of the same approximations, so it is perhaps slightly puzzling at first sight. The most obvious explanation for this is incomplete sampling, but Ref. [43] can still distinguish the feature with only \(20\) ps of data. Whilst several computational convergence settings differ, we assign this change in lineshape to our chosen elevated temperature (to avoid glassiness in PBE water). Our \(350\) K is higher than Ref. [43] (\(320\) K), Ref. [44] (\(330\) K), or Ref. [47] (\(300\) K), and it is precisely over this range that the shoulder feature has been shown to disappear when the temperature-dependence was studied[46] using the MB-pol energy function.
In generalized Langevin approaches, practitioners often focus on projecting the position and momentum of a particular type of atom, molecule, organism, etc., or some non-linear function of them.[14], [21] The target is something like the crossing of a barrier, the rotation or formation of a bond, a mean distance travelled, and so forth.[32], [35], [37], [70], [71] It is equally possible to target a collective property of many molecules,[11], [29], [72] as is the case for the susceptibility in Eq. 2 . Which coordinates of the full phase space are effectively represented is then unclear and, as such, the minimal amount of information required to obtain \(\chi_{ijk}^{(2)}(\omega)\) may not be the choice with the best gain in efficiency. Indeed it is possible that computing additional observables is required, but in what follows we limit ourselves to those directly accessible from the simulations needed to compute VSFG spectra through the standard route,[52] as this represents a clear-cut case of when the GME method can provide improvements in efficiency. Further work will be needed to determine if the results of parallel simulations could be included; the savings would have to be significant to justify running additional AIMD trajectories.
To begin we define the minimal projection operator needed to compute the 3 spectra \(\chi^{(2)}_{\parallel \parallel z}(\omega)\), \(\chi^{(2)}_{\parallel zz}(\omega)\), and \(\chi^{(2)}_{zzz}(\omega)\), where \(x,y \rightarrow \parallel\) denotes the average of the two in-plane directions, which are equivalent by symmetry. We do not compute \(\chi^{(2)}_{z \parallel z}(\omega)\), which previous work[47] suggests is the same as \(\chi^{(2)}_{\parallel zz}(\omega)\), but appears numerically distinguishable in our calculation. Therefore \[\label{eq:proj95vsfg} A \in \lbrace \mu_\parallel, \mu_z, \alpha_{\parallel \parallel}, \alpha_{\parallel z} , \alpha_{z z} \rbrace\tag{3}\] defines a matrix \((\boldsymbol{A}(0)|\boldsymbol{A}(t))\) whose upper \(2\times2\) block is diagonal, with both elements giving the IR spectrum (of the slab); the lower \(3\times3\) block contains the information required for the different polarizations of the Raman spectrum, and the (lower) \(3\times 2\) coupling block contains the VSFG spectra. For the numerics, the matrix described by the GME in Eq. 1 is further rotated by the normalizing factor \((\boldsymbol{A}|\boldsymbol{A})^{-1}\) so that \(\boldsymbol{ \mathcal{C} }(0)=\mathbb{I}\), which requires \((\boldsymbol{A}(0)|\boldsymbol{A}(0))\) to be invertible, and indeed we find that it is.
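The construction of this normalized matrix can be sketched as follows (a simplified illustration with our own naming; the columns stand in for the observables of Eq. 3 ):

```python
import numpy as np

def normalized_corr_matrix(A, n_lags):
    """Build C(t)_ij = <A_i(0) A_j(t)> from observable time series A of
    shape (n_t, d) (rows = time steps, columns = the d projected
    observables), averaging over time origins, then rotate by
    (A|A)^{-1} = C(0)^{-1} so that C(0) is the identity."""
    n_t, d = A.shape
    C = np.empty((n_lags, d, d))
    for lag in range(n_lags):
        # outer product of A(0) with A(lag), averaged over all origins
        C[lag] = A[: n_t - lag].T @ A[lag:] / (n_t - lag)
    C0_inv = np.linalg.inv(C[0])   # requires (A|A) to be invertible
    return np.einsum('ij,tjk->tik', C0_inv, C)
```

By construction the zero-lag matrix of the returned array is the identity, mirroring the rotation described above.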
In Figure 1 we present \(\boldsymbol{ \mathcal{C} }(t)\) and its accompanying \(\boldsymbol{ \mathcal{K} }(t)\) with y-limits chosen to emphasise the behaviour approaching \(\tau_K\) and \(\tau_\mathrm{eq}\). Unlike the earlier cited examples of Ref. [29]–[35], in our data we find that the kernel is clearly shorter lived than the corresponding \(\boldsymbol{ \mathcal{C} }(t)\). Indeed, for the diagonal elements \(\mathcal{K} _{ii}\rightarrow 0\) much faster than the corresponding element of the correlation function. However, the off-diagonal entries appear to have equivalent lifetimes. Disappointingly, then, we might expect no gain in efficiency. To decide whether there is any reduction in cost, we construct an error metric. Any metric must ultimately be judged by comparing a ‘converged’ result with ‘predicted’ spectra of varying cutoff, but what numerical value constitutes an acceptable threshold is difficult to quantify. The ‘goodness’ of the resulting spectrum is some combination of peak positions, line shape, and band intensity ratios that will depend on exactly what purpose the scientist has for the result. Nevertheless, we present a quantitative measure of error in Fig. 3.
By comparing the deviations from our converged result, which uses an amount of data given by the dotted grey line, we can inspect the form of the curves to work backwards to a reasonable earliest cutoff. The GME result at that cutoff can then be compared to the ‘underconverged’ spectrum that would have resulted if the cutoff had been chosen at this earlier time. The underconverged deviations are also plotted, such that when the GME has a smaller error for a particular cutoff, it represents a cost saving. If we define the lifetime to be when the function falls below some sufficiently small value, \(| \mathcal{C} _{ij}(t>\tau_\mathrm{eq})| < \epsilon~\forall~i,j\) and \(| \mathcal{K} _{ij}(t>\tau_K)| < \epsilon~\forall~i,j\) where \(\epsilon = 0.015\), we find the GME method does improve accuracy for correlation functions of duration less than the converged result’s, but efficiency savings are charitably only a factor of 2. The corresponding spectra to be compared visually are presented in Fig. 2. For the IR spectrum the disagreement mainly stems from the inaccurate band intensity ratio, whilst in the VSFG there is slightly more error in lineshape of the stretching region; the 1650 cm\(^{-1}\) band is still in error, but it is relatively much less intense in the second-order spectrum.
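The lifetime criterion used above can be expressed compactly; this sketch (our own naming) returns the earliest cutoff time after which every matrix element stays below the threshold:

```python
import numpy as np

def lifetime(M, dt, eps=0.015):
    """Earliest time after which every element of the matrix time series
    M (shape (n_t, d, d), normalized) stays below eps in magnitude --
    i.e. the criterion |M_ij(t > tau)| < eps for all i, j."""
    below = np.all(np.abs(M) < eps, axis=(1, 2))
    exceed = np.nonzero(~below)[0]          # indices where some element exceeds eps
    return 0.0 if exceed.size == 0 else (exceed[-1] + 1) * dt
```

The same function applies to both \(\boldsymbol{ \mathcal{C} }(t)\) and \(\boldsymbol{ \mathcal{K} }(t)\), giving \(\tau_\mathrm{eq}\) and \(\tau_K\) respectively.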
This result is quite unexpected, being neither the negative result of a longer-lived memory kernel, nor the massive savings that have been achieved in other contexts. A factor of two saving is pleasant – we emphasize there is no significant overhead to using the GME-based approach since all elements of the projector are also necessary to compute the spectrum in the established way – but we can aspire to more. Our objective therefore shifts to trying to find what, if anything, can be done to improve the efficiency gain. In so doing we also wish to understand why it is that a minimal GME fails to achieve significant gains in this system.
The effect of changing the projector on the resulting GME efficiency is difficult to anticipate when working with collective variables. To interrogate the inclusion or omission of observables we must develop an intuition for how the underlying timescales feed through into our projected degrees of freedom.[73]
To start, we could step back from the VSFG problem and ask a simpler question: what effect does including the polarizability data \(\alpha_{ij}(t)\) have on the efficiency of computing the infra-red vibrational spectrum? That is, consider that our \(5\times 5\) matrix was constructed by augmenting a smaller matrix containing only \(\langle \mu_i(0) \mu_j(t)\rangle\). Whilst the polarizability may just be the field derivative of the dipole, in recent work on current-current correlations in polaron forming systems[11] it was found that a simple time derivative of the (collective) electric current autocorrelation was able to reduce cost by up to an order of magnitude. Here, somewhat surprisingly, we find that the prediction of the IR spectrum is insensitive to including the polarizability in the projector, even though the \(\langle \alpha_{ij}(0)\mu_k(t) \rangle\) couplings (the VSFG spectra) are non-zero (see Appendix 6). Thus, it appears that the dipole-dipole correlation functions already contain the information provided by the polarizability time series. The next-slowest motion in the system must be contained in some other observable, but which?
We could look at the problem from the reverse: In the limit that the projector is the identity operator on the full coordinate space, we know a closed system obeys the Schrödinger equation, which is Markovian. Our projection is built from operators which act on the phase-space coordinates, discarding some and aggregating others. Each operation that moves our system away from the full phase-space may be responsible for contaminating the memory kernel with slow degrees of freedom, but to our knowledge there is no established convention on how to go about identifying the culprits and so improve the efficiency. One can try to delineate the possibly harmful steps one has unwittingly taken in following a minimal description of the projection operator as follows:
1. In computing the dipole moment we include the nuclear positions, but the electronic positions are only included through the first moment of the density. In principle, different electronic densities can have the same dipole moment, so we are not fully describing the electronic position.
2. Even though we do fully describe the nuclear positions (classically), since we are working with collective quantities in the GME, the cross-terms between coordinates are not included, even though we do calculate them when constructing the total dipole. That is, we are applying the GME only after performing the molecular sum, \(\boldsymbol{\mu}=\sum_{i=0}^{N-1}\boldsymbol{m}_i\).
3. Our master equation is written purely in terms of time, and not space, as it would be in, say, mode-coupling theory.[73] This happens in the minimal approach because aggregating all molecular contributions into a single value also integrates out position dependence. Yet, we know that the spectral signatures – the dynamics – of waters close to the interface differ from those in the bulk. Hence, removing spatial information may have served to increase the memory kernel’s lifetime.[35]
We will now analyze these steps systematically by quantifying what effect, if any, they have on the efficiency.
First, we query whether inclusion of higher moments of the density will lead to a more Markovian description. An attractive property of the Voronoi method[65] we used to compute the atomic (and then molecular) contributions to the total dipole is that all higher moments of the density are made accessible. This contrasts with the traditional Wannier centre approach,[16], [41] applied either directly or through machine learning.[48] We are therefore able to enlarge the projector at no additional cost with the distinct quadrupole moments, \[\label{eq:proj95quad} A \in \lbrace \mu_\parallel, \mu_z, \alpha_{\parallel \parallel}, \alpha_{\parallel z} , \alpha_{z z} \rbrace \cup \lbrace Q_{\parallel \parallel}, Q_{\parallel z}, Q_{z z} \rbrace.\tag{4}\] We note that we find \(Q_{\parallel z}=Q_{z \parallel}\), which is important because their cross correlations are also identical and so render the time-zero matrix \((\boldsymbol{A}\vert\boldsymbol{A})\) non-invertible when both are included; there is also a technical point that arises about choice of units, see Appendix 7.
Figure 4 displays the \(\boldsymbol{ \mathcal{C} }(t)\) and \(\boldsymbol{ \mathcal{K} }(t)\) matrices for Eq. 4 . There is a hint of coupling between the elements linking the first row and the lower \(3\times 3\) block (\(\mu_\parallel\) and \(Q_{ij}\) in the unnormalized matrix respectively) but it is numerically much smaller than the equivalent coupling with the central \(3\times 3\) block (\(\alpha_{ij}\)). Apparently as a result of this, we find predictions of the SFG spectra are unaffected by including quadrupoles in the projector—there are minor quantitative changes to the equivalent of Fig. 3, but no qualitative improvement. Just as including \(\alpha_{ij}\) does not improve the efficiency of computing the IR spectrum, including \(Q_{ij}\) does not improve the efficiency of computing the VSFG spectra.
We therefore move on to our second question: do the set of molecular dipoles \(\lbrace\boldsymbol{m}_i(t)\rbrace\) serve as a better basis for the projection operator? Certainly, decomposing the collective quantity into molecular level contributions is a chemically intuitive way of viewing the problem and might more directly reveal the nature of the underlying physics. To answer this question, we test whether application of a master equation directly to the molecular correlation matrix, \[\label{eq:C95molecules} C_{ij}(t) = \langle \boldsymbol{m}_i(0)\cdot\boldsymbol{m}_j(t)\rangle,\tag{5}\] where \(i,j\) index the 171 water molecules in the simulation cell, leads to a kernel with a shorter lifetime than starting immediately with the 1-dimensional object \(\sum_{\langle i,j\rangle}\langle \boldsymbol{m}_i(0)\cdot \boldsymbol{m}_j(t)\rangle\). This matrix can be directly visualized, but at full resolution it is hard to interpret (see Appendix 8 for reference). We use a colormap to represent the distance between waters \(i\) and \(j\) at the start of each window, averaged over the whole simulation, and present a zoomed view of a randomly chosen \(6\times 6\) block centred on the diagonal in Fig. 5. Here, we can see that the cross-correlation elements are small compared to the important \(<100\) fs, oscillatory region of the self-terms. The rest of the time series we have (previously) discarded as noise. The cross-correlation elements are therefore of a magnitude and structure very similar to the noisy part of the diagonal elements, even for the waters that are closest in space. It follows that, for these data, the cross-correlations sit below our noise floor and operating on the raw matrix is unreliable. We confirm this by attempting the analysis, and indeed find the GME prediction of the total sum (the full dipole-dipole ACF) is worse than just using the matrix itself.
This was to be expected, since it has been shown that these cross-terms require on the order of nanoseconds to converge.[43] So our second question – whilst motivating advances in simulation technology that unlock cheaper propagation – must go unanswered with these data. We are left asking: how can we work with noise-containing MD data without completely integrating out the molecular-level information?
The third avenue we identified concerned spatial dependence of the memory. The matrix in Fig. 5 is totally raw and unordered, yet we know that over time there is a statistical similarity between waters at a particular depth owing, for example, to the isotropy of \(x\) and \(y\) in a slab system. Indeed, in general, the well-known inhomogeneous broadening of features in the water spectrum arises from molecules in different local environments. This is particularly important in VSFG since the sign of \(\chi^{(2)}_{ijk}(\omega)\) carries information about the orientation of molecules contributing at frequency \(\omega\), which is itself reporting on differences in bonding at varying depth with respect to the interface.[49] The question of spatial dependence therefore goes hand-in-hand with that of molecular cross-correlation and defining objects that are minimally aggregated whilst still containing acceptably low noise levels.
To see if this might be a fruitful line of inquiry we first compute the total dipole autocorrelation (appropriate to the IR spectrum) in two parts: from contributions within 5 Å of the centre of the slab, and those further away (within the lower and upper interfacial layers). In Fig. 6 we show side-by-side the time and frequency domain measures for the surface, centre, and combined contributions. As expected, the surface contribution provides an enriched population with fluctuations to the blue, sharply centred around \(\sim3800\) cm\(^{-1}\), representing not-fully-coordinated O–H stretches.[39], [40] Yet, whilst they are distinct, the functions in the time-domain decay on what seems to be an identical timescale. Indeed we can compute the associated memory kernels, and find they also have equivalent lifetimes. Now, it may be that only when the coupling between the regions is included is the kernel lifetime reduced, so we also construct the matrix with elements \[\label{eq:C95layers} C_{uv}=\langle \boldsymbol{\mu}_u(0)\cdot\boldsymbol{\mu}_v(t)\rangle\tag{6}\] for \(u, v \in \{\mathrm{surface}, \mathrm{centre}\}\). This defines a GME whose projected observables are the total dipole moments arising from particular layers. We find the off-diagonal elements are of very small magnitude (see Appendix 8, Fig. 9). Here, again, the lifetime is unaffected when the kernel is cut off and the resulting correlation matrix is used to construct the IR spectrum (from the sum of diagonal elements). This stands to reason, since we already know that inter-molecular contributions are much smaller than intra-molecular contributions, and most correlations between surface and central regions come from inter-molecular contributions.
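A minimal sketch of the layer decomposition behind Eq. 6 , with instantaneous layer assignment (the naming and array layout are our own), is:

```python
import numpy as np

def layer_dipole_corr(m, z, z_cut, n_lags):
    """Layer-resolved dipole correlation matrix in the spirit of Eq. (6).
    m: molecular dipoles, shape (n_t, n_mol, 3); z: molecular heights
    relative to the slab centre, shape (n_t, n_mol).  Molecules with
    |z| < z_cut are assigned (frame by frame) to the 'centre' layer,
    the rest to 'surface'; each layer's dipole is the sum over its
    current members."""
    centre = (np.abs(z) < z_cut)[..., None]                # (n_t, n_mol, 1) mask
    mu = np.stack([np.where(centre, 0.0, m).sum(axis=1),   # surface total dipole
                   np.where(centre, m, 0.0).sum(axis=1)],  # centre total dipole
                  axis=1)                                  # (n_t, 2, 3)
    n_t = m.shape[0]
    C = np.empty((n_lags, 2, 2))
    for lag in range(n_lags):
        # C_uv(lag) = <mu_u(0) . mu_v(lag)>, averaged over time origins
        C[lag] = np.einsum('tui,tvi->uv', mu[: n_t - lag], mu[lag:]) / (n_t - lag)
    return C
```

Because membership is assigned instantaneously, a molecule that diffuses between layers contributes intra-molecular weight to the off-diagonal elements at later lags, which is exactly the effect discussed below.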
There is a slight nuance to this explanation. Since the correlation function considers the total dipole at time \(t\), there is a possibility for molecules to diffuse between the two regions over that time. This leads to intra-molecular contributions to the off-diagonal elements, and these should grow in number as \(t\) increases. In spite of this, our result implies that these migrating waters are outnumbered by those that remain in (or leave and re-enter) the region in which they started. Hence, even when the coupling of the two layers is included, there is no reduction in the kernel lifetime. We repeat this approach for other numbers of evenly spaced stratifications, up to 7 layers (see Appendix 8, Fig. 10). As the number of layers increases, the residence probability decreases and the intra-molecular contribution to the off-diagonals grows. Still, we find no improvement in efficiency. If spatial dependence of the memory is able to improve our methodology, it will require a more advanced treatment.[34], [36], [70]
Our first result was a significant one: spectroscopically relevant correlation functions directly calculated from AIMD can benefit from an increase in efficiency using a GME. This is in contrast to many studies where the lifetime of the memory kernel is in excess of that of the correlation function. The GME approach comes at negligible additional cost to running the simulations, even for reduced-cost methods, and simple routes to calculating the time non-local kernel – including the TTM method used in this paper – can be implemented in a few lines of code. Yet, the gain in efficiency is not nearly as impressive a result as the orders of magnitude we have come to expect in other systems.
This challenge of how best to define observables in the projection operator to maximize efficiency is a general one. In biophysics the construction of good collective variables is a central problem[72], [74] and there are many approaches ranging from bespoke problem-specific analysis to data-driven, machine-learning-based methods.[75]–[78] Indeed, direct prediction of spectra based on structure – or a set of quantities that are a proxy for the structure – constitutes an orthogonal approach to spectroscopic computation.[79]–[82] Separately, the problem of how to best construct the memory function in glassy dynamics specifically confronts spatial heterogeneity arising in the liquid state.[83] The difficulty of all these interrelated questions suggests that future work would benefit from climbing down from the full, electronic-structure problem and working at the FFMD rung of the theoretical hierarchy, first pinning down the treatment of the nuclear contribution to the position correlation function at reduced cost. The trickier electronic description can then be added on subsequently. The need for a principled approach to finding the optimal projector for atomistic simulations of spectroscopic quantities is clear when we consider that expressions for higher-order susceptibilities do not even have straightforward quantum-to-classical replacements, including other SFG spectroscopies that are electronically on-resonance (see Appendix 5).[84]
In sum, despite the multitude of different routes we employed to find a better projector, even for just the IR spectrum, the lifetime of the kernel was found to be extremely robust. Whether the GME is defined on Eq. 3 , Eq. 4 , Eq. 5 , or Eq. 6 , we cannot gain a reduction in cost larger than \(\sim50\%\). This is an interesting result that poses a clear question: what, if any, projector can practically be defined on the output of AIMD simulations to get closer to the Markovian limit of perfectly efficient spectroscopic prediction?
T.S. is the recipient of an Early Career Fellowship from the Leverhulme Trust. Andrés Montoya-Castillo, David Wilkins, and Yair Litman are thanked for helpful discussions. This work has made use of the Hamilton HPC Service of Durham University.
The author has no conflicts to disclose.
The data that support the findings of this study are available from the corresponding author upon reasonable request.
The derivation has two steps: first we motivate the correlation of the polarizability with the dipole moment as the correct function to be Fourier transformed, and second we show how it can be replaced with its classical counterpart.
We begin by assuming the well-known perturbation theory result for the n\(^\mathrm{th}\) order polarization of a quantum density responding to an external field, \(\mathrm{Tr}\lbrace \rho^{(n)}\mu \rbrace\). The second order susceptibility is the collection of frequency permutations[85] \[\chi^{(2)}(\omega_1, \omega_2) = \frac{1}{2!}\rho_0\sum_p S^{(2)}(\omega_1+\omega_2, \omega_1)\] of the non-linear response function \[\label{eq:commutators} S^{(2)}(t_2, t_1) = \left(\frac{ \mathrm{i}}{\hbar}\right)^2\langle [[V(t_2,t_1), V(t_1)],V(0)]\rho(-\infty)\rangle,\tag{7}\] where \(V\) is the light-matter coupling strength, which under the usual approximation is the dipole, and \(t_1\) and \(t_2\) are understood to be time-ordered (and non-negative). Introducing the quantity \(\mathrm{i}\delta\) to define \(I_{vv'}(\omega)\equiv 1/(\omega-\omega_{vv'}+\mathrm{i}\delta)\) for transition frequencies \(\omega_{vv'}\) (this presumes the matter eigenbasis, which we cannot actually calculate, so in practice we take \(\delta \rightarrow \Gamma_{vv'}\) as phenomenological damping coefficients), the Fourier representation can be used to express the response as a sum of Feynman pathways \[\begin{align} \chi^{(2)}(\omega_1, \Omega) = \frac{1}{2}&\left(\frac{1}{\hbar}\right)^2\sum_p\sum_{a,b,c}\rho_0^{(a)}\mu_{ab}\mu_{bc}\mu_{ca} \times \\ &\big[ I_{ca}(\Omega)I_{ba}(\omega_1) - I_{bc}(\Omega)I_{ba}(\omega_1) \\ &+ I_{ab}(\Omega)I_{ac}(\omega_1) - I_{bc}(\Omega)I_{ac}(\omega_1) \big] \end{align}\] where \(\Omega=\omega_1+\omega_2\) is the sum frequency. In Hilbert space the two permutations give eight separate terms: half contain the difference between the transition frequency and the radiation, and half the sum. Of these, half contain \(\omega_1\) and the other half \(\omega_2\).[50] Since in VSFG only the incoming IR pulse is usually on-resonance with a transition, these two sets are referred to as the resonant and non-resonant contributions. 
Continuing with only the \(\omega_2\) resonant part, \[\begin{align} \chi^{(2),\mathrm{res}}_{pqr}(\Omega, \omega_2)&=\left(\frac{1}{\hbar}\right)^2\sum_{g,n,m}\rho_0^{(g)}\times\\ &\Bigg[\frac{ \left\langle g \right| \mu_p \left| n \right\rangle \left\langle n \right| \mu_q \left| m \right\rangle \left\langle m \right| \mu_r \left| g \right\rangle }{(\Omega-\omega_{ng}+ \mathrm{i}\Gamma_{ng})(\omega_2-\omega_{mg}+ \mathrm{i}\Gamma_{mg})} \\ &-\frac{ \left\langle g \right| \mu_q \left| m \right\rangle \left\langle m \right| \mu_p \left| n \right\rangle \left\langle n \right| \mu_r \left| g \right\rangle }{(\Omega+\omega_{ng}+ \mathrm{i}\Gamma_{ng})(\omega_2+\omega_{mg}+ \mathrm{i}\Gamma_{mg})}\\ &+\frac{ \left\langle g \right| \mu_r \left| m \right\rangle \left\langle m \right| \mu_q \left| n \right\rangle \left\langle n \right| \mu_p \left| g \right\rangle }{(\Omega-\omega_{nm}+ \mathrm{i}\Gamma_{nm})(\omega_2-\omega_{ng}+ \mathrm{i}\Gamma_{ng})} \\ &-\frac{ \left\langle g \right| \mu_r \left| m \right\rangle \left\langle m \right| \mu_p \left| n \right\rangle \left\langle n \right| \mu_q \left| g \right\rangle }{(\Omega-\omega_{nm}+ \mathrm{i}\Gamma_{nm})(\omega_2+\omega_{mg}+ \mathrm{i}\Gamma_{ng})}\Bigg], \end{align}\] where \(g\) and \(m\) are in the electronic ground state but can have different vibrational states. 
To reveal the Raman tensor, we factorize out \(\left\langle m \right| \mu_r \left| g \right\rangle /(\omega_2-\omega_{mg}+i\Gamma_{mg})\) from all four terms, transposing to \[\label{eq:chi95res95double} \begin{align} \chi^{(2),\mathrm{res}}_{pqr}(\Omega, \omega_2)=-\frac{1}{\hbar}\sum_{g,m}\left(\rho_0^{(g)}-\rho_0^{(m)} \right)\times \\ \frac{ \left\langle g \right| \alpha_{pq}(\Omega) \left| m \right\rangle \left\langle m \right| \mu_r \left| g \right\rangle }{\omega_2-\omega_{mg}+i\Gamma_{mg}} \end{align}\tag{8}\] where \[\begin{align}\label{eq:raman95matrix95element} \left\langle g \right| \alpha_{pq}(\Omega) \left| m \right\rangle &\equiv \\-\frac{1}{\hbar}\sum_n&\left[\frac{ \left\langle g \right| \mu_p \left| n \right\rangle \left\langle n \right| \mu_q \left| m \right\rangle }{\Omega-\omega_{ng}+ \mathrm{i}\Gamma_{ng}} -\frac{ \left\langle g \right| \mu_q \left| n \right\rangle \left\langle n \right| \mu_p \left| m \right\rangle }{\Omega+\omega_{nm}+ \mathrm{i}\Gamma_{nm}}\right] \end{align}\tag{9}\] is the quantity that appears when considering the scattering from state \(g\) to \(m\), the diagonal elements of which are themselves the first-order susceptibility.[50], [85] For the non-resonant part \(\omega_2 \rightarrow \omega_1\) and \(pqr\rightarrow prq\) on the right-hand side. It is usual to assume the non-resonant part can be modeled as a background.[50]
At this point, by similar resonance logic, it is assumed that the Raman tensor’s frequency dependence can be ignored over the spectral range (so that \(\omega_{ng}\simeq\omega_{nm}\)), and it is replaced by a constant value \(\alpha(\Omega=0)\) referred to as the polarizability tensor.[50] In the time domain, completeness over \(|m\rangle\langle m|\) can be used to eliminate the second summation in Eq. 8 , giving a quantum trace containing the correlation of \(\alpha(t)\) with \(\mu(0)\).[39] Indeed, given these approximations on \(\Omega\) one can return to Eq. 7 and simplify, instead defining the starting point as[86] \[\chi^{(2)}_{pqr}(\omega_1)\simeq-\frac{1}{\hbar}\mathrm{Tr}\lbrace \alpha_{pq} I(\omega_1)[\mu_r, \rho_0] \rbrace,\] where the remaining commutator is moved onto the initial density matrix.
To move between this quantum-mechanical expression for the susceptibility and something appropriate to classical nuclei, we first perform the Kubo transform.[87] That is, we define the correlation function[88] \[\Phi^{(2)}_{pqr}(t) = \frac{1}{\beta}\int_0^\beta \!\textrm d{\beta'} \, \langle \alpha_{pq}(t- \mathrm{i}\hbar\beta') \mu_r \rangle\] whose Fourier transform is \[\Phi^{(2)}_{pqr}(\omega) = \frac{1- \textrm{e} ^{-\beta\hbar\omega} }{\beta\hbar\omega}\int_{-\infty}^\infty \!\textrm d{t} \,\langle\alpha_{pq}(t)\mu_{r}\rangle \textrm{e} ^{ \mathrm{i}\omega t}\] which follows from the Fourier multiplier property of the imaginary time translation[73] (shift) justified for the imaginary argument by the KMS condition assuring analyticity between the real axis and \(\mathrm{i}\hbar\beta\) (and further that there is rapid decay at \(t\rightarrow\infty\))[88]. The response function can be written in terms of this ‘Kubo correlation function’ because the commutator with the initial density can be rewritten as a time derivative, since by the product rule \[\frac{\partial}{\partial\lambda}A(\lambda) \equiv \frac{\partial}{\partial\lambda}\left( \textrm{e} ^{\lambda H} A \textrm{e} ^{-\lambda H} \right) = [H, A(\lambda)],\] and the temperature integral of this is the original commutator (up to a Boltzmann factor).[88] In other words, one can move between \[-\frac{ \mathrm{i}}{\hbar}\int_0^\infty \!\textrm d{t} \,\mathrm{Tr}\lbrace\alpha_{pq}(t)[\mu_{r},\rho_0]\rbrace \textrm{e} ^{ \mathrm{i}\omega_1 t}\] and \[\frac{ \mathrm{i}\omega_1}{2}\left(\Phi^{(2)}_{pqr}(\omega_1) +\frac{ \mathrm{i}}{\pi}P_p\int \!\textrm d{\omega} \,\frac{\Phi^{(2)}_{pqr}(\omega)}{\omega_1-\omega} \right)\] where the principal part is the boundary term that accounts for moving between Laplace and Fourier transforms (required by causality but often omitted since only the imaginary part is desired).[86] The reason this allows correspondence to a classical result is because the Kubo-transformed 
function has the classical symmetries: the imaginary time translation-plus-integration, which is the ‘detailed balance prefactor’ in frequency space, expresses the quantum correlation function as an anti-commutator using well-known identities, so it becomes real and even, and it reduces to the classical function in the limit \(\hbar\rightarrow 0\).
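As a numerical illustration of the prefactor (ours, not from the paper's implementation), the sketch below evaluates \((1-\mathrm{e}^{-\beta\hbar\omega})/\beta\hbar\omega\) and applies it to the discrete Fourier transform of a sampled correlation function; the function names, units, and discretization are assumptions.

```python
import numpy as np

def kubo_prefactor(omega, beta_hbar):
    """(1 - exp(-beta*hbar*omega)) / (beta*hbar*omega), with the
    analytic omega -> 0 limit of 1 handled explicitly."""
    x = beta_hbar * np.asarray(omega, dtype=float)
    out = np.ones_like(x)
    nz = np.abs(x) > 1e-12
    out[nz] = (1.0 - np.exp(-x[nz])) / x[nz]
    return out

def kubo_spectrum(tcf, dt, beta_hbar):
    """Apply the detailed-balance prefactor to the Fourier transform of a
    uniformly sampled correlation function (one-sided FFT for brevity)."""
    omega = 2.0 * np.pi * np.fft.rfftfreq(len(tcf), d=dt)
    spec = np.fft.rfft(tcf) * dt
    return omega, kubo_prefactor(omega, beta_hbar) * spec
```

Note the prefactor tends to unity at low frequency and decays at high frequency, where the quantum and classical functions differ most.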
Other quantum corrections are available, but this one is thought to be the best.[89] Note that Ref. [47] does perform this replacement, but there is a typo in their equation for the susceptibility.8 In fact we tested the effect of omitting the prefactor and saw little change in the O–H stretching region, as \(\omega\) is already large there.
We pause to note that this route assumes the one-time nature of the correlation function at the start. In principle when the full frequency dependence is retained then a two-time Kubo transform would need to be performed.[90] To our knowledge there is no literature on what effect ignoring this fact has on the quality of the correspondence. The two-time version is already much more complicated because no simple (Kramers-Kronig) relationship exists between the correlation function’s real and imaginary parts, and so previous work has had to resort to approximation.[84], [91] There is alternative work based on classical TCFs, but it also assumes particular (harmonic) forms to derive the connection.[92]
Focusing just on the prediction of the IR spectrum from the diagonal \(2 \times 2\) block, we ask: is the introduction of the polarization limiting the effectiveness of the GME? This does not seem possible since, as we have described the theory, enlarging the projector at worst keeps the memory kernel lifetime constant. We can test this easily, and indeed from Fig. 7 we find that reducing the projector to just the dipole entries (which are completely uncorrelated) yields a very similar error as a function of cutoff to the blue trace in Fig. 3. The lifetime of \(K(t)\) under the same convergence criteria stays effectively the same, lowering from 35 to 33 steps. This is consistent with our expectation that enlarging the projector cannot decrease the kernel lifetime. The lifetime of \(\boldsymbol{ \mathcal{C} }(t)\) actually increases from 54 steps to 64 steps.
The factor \((\boldsymbol{A}\vert \boldsymbol{A})^{-1}\) required for the numerical inversion ensures that the different possible time series of \(A\) have appropriate units after projection. One could also normalize the time series, effectively changing units, before constructing the matrix. Whilst the \(\boldsymbol{ \mathcal{C} }(t)\) matrix will look different, upon converting back to the original time series (to compute the spectrum) these two approaches should give the same result, and we have confirmed they do for the data of Fig. 1. That is, this choice does not affect lifetimes or accuracy.
However, there can be a numerical difference if the matrix \((\boldsymbol{A}\vert \boldsymbol{A})\) becomes poorly conditioned. For example, \(\mu_z\) and \(\alpha_{zz}\) may have different orders of magnitude, and the condition number, defined as the ratio of the largest to smallest eigenvalues, can become very large and render the inversion unstable. This is what we found when expanding the projector to include \(Q_{ij}\), which as computed are an additional order of magnitude greater than \(\alpha_{zz}\) in these units. For reference, fields were converted to V pm\(^{-1}\) using the factor 2.57. Pre-normalizing the quadrupole time series removes the issue, and that is what is used in the main text. The normalization factor is stored in case the original units are demanded at a later time.
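The conditioning effect described here is easy to reproduce with synthetic data. In this sketch (illustrative scales standing in for \(\mu_z\), \(\alpha_{zz}\), and a \(Q_{ij}\)-like entry; not the AIMD values), pre-normalizing each time series to unit standard deviation before building \((\boldsymbol{A}\vert\boldsymbol{A})\) collapses the condition number:

```python
import numpy as np

rng = np.random.default_rng(0)

def static_overlap(A):
    """Equilibrium overlap matrix (A|A) = <A A^T> from time-series rows."""
    return A @ A.T / A.shape[1]

# Three independent series spanning several orders of magnitude in scale.
base = rng.standard_normal((3, 5000))
scales = np.array([1.0, 1e2, 1e3])
A = scales[:, None] * base

cond_raw = np.linalg.cond(static_overlap(A))

# Pre-normalize to unit standard deviation; the stored factors allow
# recovery of the original units later.
norms = A.std(axis=1)
cond_norm = np.linalg.cond(static_overlap(A / norms[:, None]))
```

Because the raw condition number scales as the square of the largest scale ratio, even a two-to-three order-of-magnitude mismatch between observables makes the inversion numerically delicate.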
In Fig. 8 we show the full \(\langle \boldsymbol{m}_i(0)\cdot \boldsymbol{m}_j(t)\rangle\) matrix, which is of dimension \(171 \times 171\). The main observations are that 1) the matrix is quite homogeneous, 2) a 5 Å cutoff actually includes a large number of the other 170 waters in the box, and 3) most waters that start within 5 Å are at a larger distance on average over the length of the full simulation.
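A minimal sketch of how such a cutoff-masked cross-correlation matrix can be assembled, assuming per-molecule dipole trajectories and an initial intermolecular distance matrix (array names, shapes, and the masking-by-initial-distance convention are our illustrative assumptions):

```python
import numpy as np

def dipole_corr_matrix(m, lag, r0, cutoff=5.0):
    """Cross-correlation matrix <m_i(0) . m_j(t)> at a given lag, averaged
    over time origins, with pairs whose initial separation exceeds the
    cutoff zeroed out.
    m:  (n_mol, n_steps, 3) molecular dipole trajectories
    r0: (n_mol, n_mol) initial intermolecular distances in Angstrom"""
    n_mol, n_steps, _ = m.shape
    n_origin = n_steps - lag
    # Cartesian dot product, summed over time origins then averaged
    C = np.einsum('itx,jtx->ij',
                  m[:, :n_origin], m[:, lag:lag + n_origin]) / n_origin
    C[r0 > cutoff] = 0.0
    return C
```

For a simulation box of 171 waters this yields the \(171 \times 171\) matrix shown in Fig. 8, with the cutoff controlling how many of the other molecules each row retains.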
In Fig. 9 we show the surface-centre coupling projector. As described in the main text, the off-diagonal element is small enough to be insignificant. The correlation functions start at different initial values because the number of molecules in each region differs. We also constructed similar matrices for larger numbers of layers, and we show the \(N=4\) result in Fig. 10. It is clear that as \(N\) increases the statistics deteriorate; there are relatively few waters in the top-most layer (bottom right). The layers can be defined to have more equal numbers of molecules, but the general conclusion drawn from the diagonal elements – that the lifetime is the same as that of the correlation function – persists.
Strictly speaking, the dipole moment in a periodic system is not an observable.[15] Only changes in the total polarization are well-defined. Nevertheless, the operator we associate with the dipole (molecular or total) can still be written and computed within the limits of the Berry-phase polarization branch,[16] provided the system remains insulating.[17]↩︎
The authors of Ref. [25] have informed us the lifetime of the RPMD kernel is just 25 fs, two orders of magnitude shorter than the transmission coefficient plateau time in the low-friction regime.↩︎
There is a case where the memory kernel fits a model form that can be extrapolated,[37] but in general long-timescales in the memory kernel cannot easily be inferred from short-time data, especially in the presence of noise.↩︎
This can be seen as the opposite of a method like DMD that takes the full configurational space and extracts the high/infinite dimensional (Markovian) propagator by performing a dimensionality reduction, effectively approaching the minimal space from above by rank reduction.[38]↩︎
We added the requisite 6th element to the projector to see its effect, but there was no discernible difference. Symmetry tables can be found in Ref. [51].↩︎
Analytically, taking time derivatives of the current begins to include operators describing the phonon bath, and is therefore anything but simple (even unobtainable from the point of view of the HEOM solver). However sufficiently high time resolution allows a numerical estimate.↩︎
The \(\langle i,j\rangle\) notation means that when this value is greater than 5 Å we zero the element, as described in the Methods section.↩︎
Confirmed in a private communication with the authors.↩︎