Fisher information and trajectorial interpretation to the Itô–Langevin relative entropy dissipation


Abstract

The dissipation of relative entropy in an Itô–Langevin dynamical system is a classic topic in stochastic analysis. Relying on the time-reversal of diffusions, a novel trajectorial approach investigates the pathwise behavior of the relevant entropy processes, reveals more information from the delicate random structure, and eventually retrieves the known classical results. This expository article presents that approach and shows how it rederives the known results for the Itô–Langevin dynamics. We also view the stochastic time-evolution through the lens of the Wasserstein space, in which we observe the steepest descent property of the entropy decay as well as its exponential rate.

1 Introduction

When Schrödinger tried to explain why intelligent systems tend to have far fewer replication errors than general Statistical Thermodynamics would predict, the concept of entropy was progressively developed in response to the observation that, even for the most isolated physical systems, the level of internal disorder increases significantly as time flows [1].

This time-monotone trend has been described as the dissipation of entropy, a phrase first adopted by Clausius [2]. In this expository article, we present a novel perspective on the entropy dissipation for a large class of models characterized by the Itô–Langevin stochastic differential equations.

1.1 Reviewing the literature

Very heuristically, the notion of entropy and its dissipation was widely discussed by Boltzmann [3]–[5], Gibbs [6], [7], and Shannon [8], [9], until it was generally accepted as the measure that quantifies the level of disorder of a physical system. Several approaches from different disciplines have been developed to characterize the monotonicity of its time-evolution. For a recent overview, see Cover/Thomas [10] and Krzakala/Zdeborová for Information Theory, Kardar [11], [12] for Statistical Physics, and Sudakov [13] for Combinatorics. The notion of entropy has also been utilized in Mathematical Finance, see Choulli [14], Laeven/Stadje [15], and Schweizer [16]. Colloquially, the monotone time-evolution of entropy indicates an irreversible change which is intrinsic to the physical system of interest and which, on average, tends to increase its disorder.

Our discussion focuses on the concept of relative entropy, which lies at the interdisciplinary realm between Stochastic Calculus and Interacting Particle Systems. The notion of relative entropy quantifies the complexity of the evolution of a family of time-parametrized probability measures, see [17]. To summarize, we follow a probabilistic approach to formalize the Langevin diffusion as an entropic gradient flux in an appropriately defined Wasserstein space. A recent breakthrough by Karatzas/Schachermayer/Tschiderer [18], see also [19], [20], on the trajectorial interpretation of the essentially well-known de Bruijn identity, see Brossier/Zozor [21], is our main theme and will be presented in the sequel.

In this expository article, the family of time-parametrized probability measures on \(\mathbb{R}^d\) will be denoted by \((P^\beta_t)_{t\geq0}\), where the \(\beta\)-script indicates that this evolution takes place in the presence of a smooth and time-homogeneous perturbation. The model of interest is the trajectorial dynamics of the relative entropy of \((P^\beta_t)_{t\geq0}\) against a \(\sigma\)-finite reference measure \(Q\) on the Borel sets of \(\mathbb{R}^d\). This trajectorial approach brings us a novel interpretation of the interplay between the relative entropy and other quantities, such as the Fisher information, which reveals more internal structure of the stochastic system. Over recent years, apart from the trajectorial formulation of the relative entropy dissipation, similar trajectorial approaches have been successfully applied to the optimal stopping theorems by Davis/Karatzas [22], to the Doob martingale inequalities by Acciaio/Beiglböck/Penkner/Schachermayer/Temme, and to the Burkholder-Davis-Gundy inequality by Beiglböck/Siorpaes. See also Gentil/Léonard/Ripani for an application to the Schrödinger problem.

1.2 Motivation and rough descriptions

This expository article is an interpretation of the trajectorial Otto calculus, developed by Karatzas/Schachermayer/Tschiderer in [18], when applied to the scenario of relative entropy dissipation. In essence, the trajectorial formulation provides a novel approach to the well-known phenomenon of entropy dissipation [23], through which we are able to extract more internal information from the stochastic dynamics of interest.

Define \(\mathcal{C}\mathrel{\vcenter{:}}=\mathcal{C}(\mathbb{R}_+;\mathbb{R}^d)\), consisting of \(\mathbb{R}^d\)-valued continuous functions on \([0,\infty)\), to be the path space where we will place the stochastic dynamics. The time-evolution of the coordinate process \((X_t(\omega))_{t\geq0}=(\omega(t))_{t\geq0}\) for all \(\omega\in\mathcal{C}\) is characterized by its distribution \(\mathbb{P}^\beta\) on \(\mathcal{C}\), i.e. a probability measure on the Borel sets of \(\mathcal{C}\). At each time point \(t\geq0\), the marginal law of \(X_t\) is denoted by \(P^\beta_t\), a Borel probability measure on \(\mathbb{R}^d\). Collectively, we have a time-parametrized family \((P^\beta_t)_{t\geq0}\) of Borel probability measures on \(\mathbb{R}^d\) at hand. And we choose a fixed \(\sigma\)-finite Borel reference measure \(Q\) on \(\mathbb{R}^d\), such that each \(P^\beta_t\) is absolutely continuous with respect to \(Q\), for quantifying the relative entropy. An approach to compare \(P^\beta_t\) to \(Q\) is computing their Radon–Nikodým derivative. It is well-known that taking \(\mathbb{P}^\beta\)-expectation on the logarithmic derivative \(\log dP^\beta_t/dQ\) yields the classical quantity of relative entropy between \(P^\beta_t\) and \(Q\).

The first insight of the trajectorial formulation is that we work with the process \((\log dP^\beta_t/dQ)_{t\geq0}\) and investigate its trajectorial properties. This approach deals with the pathwise behavior of the relevant processes. Achieving their pathwise limiting identities and subsequently taking \(\mathbb{P}^\beta\)-expectation will give us the well-known results on the dissipation phenomena on the dynamics of relative entropy. In other words, this approach will reveal more information than the classical approach from the delicate pathwise structure of the stochastic dynamical system.

Another insight is letting the time-parametrized family \((P^\beta_t)_{t\geq0}\) undergo time-reversal. This less transparent approach is adopted because it is comparatively simpler than the original forward-time approach, especially in the computation of the semimartingale decomposition of the relative entropy process, which is carried out in Section 4. To phrase this backward-time approach, we fix a compact time interval \([0,T]\) and consider the same family of Borel probability measures \((P^\beta_{T-t})_{0\leq t\leq T}\), indexed backward in time. Our main object of interest, i.e. the trajectorial formulation, originates from observing the difference between the terms \(\log dP^\beta_{T-t}/dQ\) and \(\mathbb{E}^{\mathbb{P}^\beta}[\log dP^\beta_{t_0}/dQ|\sigma(X_{T-\theta},0\leq\theta\leq t)]\) with \(0\leq t\leq T-t_0\leq T\), where the conditioning is on the backward-time history of the coordinate process. Dividing this difference by \(T-t_0-t\) and letting \(t\nearrow T-t_0\), we formally obtain the trajectorial time-derivative of the relative entropy process, conditional on \(\sigma(X_{T-\theta},0\leq\theta\leq t)\). As stated in the previous paragraph, taking \(\mathbb{P}^\beta\)-expectation and using some additional regularity arguments, we obtain the dissipation identity of the relative entropy.

1.3 Structure of this article

From the above characterizations of the trajectorial formulation, we could retrieve the known classical results on the relative entropy dissipation by first taking \(\mathbb{P}^\beta\)-expectation and further collapsing the \(\beta\)-perturbation. Indeed, the classical de Bruijn identity [23] follows immediately from the above procedures. The realization of the rough descriptions relies on the specification of the law \(\mathbb{P}^\beta\) on the path space \(\mathcal{C}\).

In Section 2 of this expository article, we shall require \(\mathbb{P}^\beta\) to be the law of the coordinate process \((X_t)_{t\geq0}\) so that it satisfies an Itô–Langevin stochastic differential equation (1), which describes a broad class of particle system dynamics, see [24], [25]. In Section 3, some necessary terminology and prerequisite theory on the time-reversal principle of diffusion processes are presented, before subsequently discussing the trajectorial formalism which heavily relies on the backward-time techniques.

It is essential to specify the semimartingale decomposition of the Radon–Nikodým derivative process \((dP^\beta_{T-t}/dQ)_{0\leq t\leq T}\), viewed under time-reversal, as well as the semimartingale decomposition of its logarithm \((\log dP^\beta_{T-t}/dQ)_{0\leq t\leq T}\). The computation and some related results on its martingale property will be displayed at the end of Section 4. Their applications to the relative entropy dissipation are presented in Section 5, where the known classical results are shown to be a consequence of the trajectorial approach.

In Section 6, we characterize the time-evolution of the family \((P^\beta_t)_{t\geq0}\) of Borel probability measures on \(\mathbb{R}^d\) through the lens of the suitably defined quadratic Wasserstein space, where its internal connections to the relative entropy dissipation via the Fisher information, as well as the steepest descent property in the unperturbed scenario, are revealed. This article is concluded with an argument of the exponential decay rate of the relative entropy quantity in the absence of perturbation, which can also be derived in parallel from the Bakry-Émery theory.

2 The stochastic Itô–Langevin dynamics

Ever since the seminal contribution [26] to the Brownian motion theory, the Itô–Langevin stochastic differential equations have played an eminent role in non-equilibrium Statistical Mechanics [27], Oliveira/Tomé, and in the study of particle systems [28], [29]. The fundamental idea of the Itô–Langevin dynamics is to describe the diffusing particle in terms of a combination of deterministic forces and stochastic fluctuations.

In this expository article, we will also place the entropy dynamics under the constraint of an Itô–Langevin stochastic differential equation. To express the spirit of our trajectorial formulation and taking into account the conciseness of this exposition, we will focus on the simplest setting of a particle undergoing diffusion in a potential field. Notice that throughout our exposition, \(\abs{\cdot}:\mathbb{R}\to\mathbb{R}_+\) denotes the absolute value of a real number, and \(\norm{\cdot}:\mathbb{R}^d\to\mathbb{R}_+\) denotes the Euclidean \(L^2\)-norm of a vector in \(\mathbb{R}^d\).

2.1 Itô–Langevin dynamics

Denote by \(\psi(\cdot):\mathbb{R}^d\to\mathbb{R}_+\) the potential function, which is assumed to be of class \(\mathcal{C}^\infty(\mathbb{R}^d;\mathbb{R}_+)\) and to satisfy the linear growth condition \(\lVert\nabla\psi(x)\rVert\leq K(1+\norm{x})\) for all \(x\in\mathbb{R}^d\) with some absolute constant \(K>0\). This potential function determines the distribution \(\mathbb{P}^\beta\) on \(\mathcal{C}=\mathcal{C}(\mathbb{R}_+;\mathbb{R}^d)\) and hence also the time-evolution of the family of marginal distributions \((P^\beta_t)_{t\geq0}\) of the coordinate process \((X_t)_{t\geq0}\). Furthermore, starting from a fixed time point \(t_0\geq0\), a smooth perturbation field \(\beta(\cdot):\mathbb{R}^d\to\mathbb{R}^d\) starts to influence the evolution. To capture this setting, the Itô–Langevin stochastic differential equation, \[\label{perturbed32Itô-Langevin32dynamics} dX_t=-\big(\nabla\psi(X_t)+\beta(X_t)I_{\{t>t_0\}}\big)\,dt+dW^\beta_t,\quad\text{for all}\quad t\geq0\quad\text{with}\quad X_{0}\sim P_{0},\tag{1}\] constrains the time-evolution of the coordinate process \((X_t)_{t\geq0}\) in \(\mathcal{C}\), and hence also the family \((P^\beta_t)_{t\geq0}\) of Borel probability measures on \(\mathbb{R}^d\). Here, \((W^\beta_t)_{t\geq0}\) is a \(d\)-dimensional Brownian motion started from zero, independent of \(X_0\). In (1), the initial distribution \(P_0\) of \(X_0\) is taken to be absolutely continuous with respect to the Lebesgue measure on \(\mathbb{R}^d\). Throughout this expository article, we assume that the perturbation \(\beta(\cdot)\) is of compact support. This regularity requirement simplifies our argument, and we have the following.

Lemma 1. The Itô–Langevin diffusion (1) with initial distribution \(P_0\) admits a pathwise unique strong solution \((X_t)_{t\geq0}\), whose distribution on \(\mathcal{C}\) is denoted by \(\mathbb{P}^\beta\). If we assume that the distribution \(P_0\) of \(X_0\) admits a finite second moment, i.e. \(\mathbb{E}^{\mathbb{P}^\beta}[\norm{X_0}^2]<\infty\), and that the potential function \(\psi(\cdot)\) satisfies the following drift condition, \[\label{drift32condition} x\cdot\nabla\psi(x)\geq-C\norm{x}^2,\quad\forall~x\in\mathbb{R}^d\quad\text{with}\quad\norm{x}\geq R,\quad\text{for some}\quad C,R>0,\tag{2}\] then each \(X_t\), with \(t\geq0\), admits a finite second moment, i.e. \(\mathbb{E}^{\mathbb{P}^\beta}[\norm{X_t}^2]<\infty\).

Proof. Since the potential \(\psi(\cdot)\) is of class \(\mathcal{C}^\infty(\mathbb{R}^d;\mathbb{R}_+)\), its gradient \(\nabla\psi(\cdot)\) is of class \(\mathcal{C}^\infty(\mathbb{R}^d;\mathbb{R}^d)\). Hence, both \((\nabla\psi+\beta)(\cdot)\) and \(\nabla\psi(\cdot)\) are Lipschitz continuous on compact sets of \(\mathbb{R}^d\), because \(\beta(\cdot)\) is smooth with compact support. Therefore, together with the linear growth condition on \(\nabla\psi(\cdot)\), the existence of a strong solution \((X_t)_{t\geq0}\) in \(\mathcal{C}\) and its pathwise uniqueness are a consequence of Le Gall. It remains to verify that each \(X_t\) admits a finite second moment. Indeed, in (2) we can choose \(R>0\) sufficiently large so that \(\text{supp}(\beta)\subseteq\{x\in\mathbb{R}^d:\,\norm{x}\leq R\}\). Then, denote \[C_R\mathrel{\vcenter{:}}=\sup\big\{d-2x\cdot (\nabla\psi+\beta I_{\{t>t_0\}})(x):\,\norm{x}\leq R,\,t\geq0\big\}\quad\text{and}\quad\tau_k\mathrel{\vcenter{:}}=\inf\big\{t\geq0:\,\norm{X_t}>k\big\},\;\;\forall~k>R.\] Notice that \(C_R<\infty\), because \(\nabla\psi(\cdot)\) and \(\beta(\cdot)\) are continuous and hence bounded on the compact ball \(\{x\in\mathbb{R}^d:\,\norm{x}\leq R\}\). Then the Itô formula gives \[d\norm{X_t}^2=\big(d-2X_t\cdot (\nabla\psi+\beta I_{\{t>t_0\}})(X_t)\big)\,dt+2X_t\,dW^\beta_t,\quad\text{for all}\quad t\geq0.\] Here, both \(X_t\) and \(W^\beta_t\) are vectors in \(\mathbb{R}^d\), so integrating \(X\) against \(W^\beta\) simply refers to \(\sum_{i=1}^d\int X_t^{(i)}\,dW_t^{\beta,(i)}\).
Taking \(\mathbb{P}^\beta\)-expectation under the localization sequence \((\tau_k)_{k>R}\), we observe that \[\begin{align} \mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{\tau_k\wedge t}}^2\big]&=\mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{0}}^2\big]+\mathbb{E}^{\mathbb{P}^\beta}\bigg[\int_0^{\tau_k\wedge t}\big(d-2X_\theta\cdot (\nabla\psi+\beta I_{\{\theta>t_0\}})(X_\theta)\big)I_{\{\norm{X_\theta}\leq R\}}\,d\theta\bigg]\\ &\quad+\mathbb{E}^{\mathbb{P}^\beta}\bigg[\int_0^{\tau_k\wedge t}\big(d-2X_\theta\cdot (\nabla\psi+\beta I_{\{\theta>t_0\}})(X_\theta)\big)I_{\{\norm{X_\theta}> R\}}\,d\theta\bigg]\\ &\leq\mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{0}}^2\big]+C_R\mathbb{E}^{\mathbb{P}^\beta}\big[\tau_k\wedge t\big]+\mathbb{E}^{\mathbb{P}^\beta}\bigg[\int_0^{\tau_k\wedge t}d+2C\norm{X_\theta}^2\,d\theta\bigg]. \end{align}\] The last inequality above follows from (2) and the definition of \(C_R\). Hence, \[\mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{\tau_k\wedge t}}^2\big]\leq\mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{0}}^2\big]+(C_R+d)t+2C\int_0^t \mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{\tau_k\wedge\theta}}^2\big]\,d\theta\quad\text{for all}\quad t\geq0.\] Applying the Gronwall inequality [30], we obtain \[\label{Gronwall32inequality} \mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{\tau_k\wedge t}}^2\big]\leq\mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{0}}^2\big]+(C_R+d)t+2C\int_0^t e^{2C(t-\theta)}\big(\mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{0}}^2\big]+(C_R+d)\theta\big)\,d\theta.\tag{3}\] The assumption \(\mathbb{E}^{\mathbb{P}^\beta}[\norm{X_{0}}^2]<\infty\) implies that the right-hand side of (3) is finite for all \(t\geq0\).
Applying the monotone convergence theorem [31] and letting \(k\nearrow\infty\), \[\mathbb{E}^{\mathbb{P}^\beta}[\norm{X_{t}}^2]\leq\mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{0}}^2\big]+(C_R+d)t+2C\int_0^t e^{2C(t-\theta)}\big(\mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_{0}}^2\big]+(C_R+d)\theta\big)\,d\theta<\infty,\] for all \(t\geq0\), which verifies the claim. ◻

Lemma 1 tells us that the second moment condition propagates in time. From now on we assume that \(\mathbb{E}^{\mathbb{P}^\beta}[\norm{X_0}^2]<\infty\), which, under the drift condition (2), implies \[\label{temp} \mathbb{E}^{\mathbb{P}^\beta}\big[\norm{X_t}^2\big]=\int_{\mathbb{R}^d}\norm{x}^2\,dP^\beta_t\in\mathbb{R},\quad\text{for all}\quad t\geq0.\tag{4}\] In Section 6, we will endow the set of all Borel probability measures on \(\mathbb{R}^d\) with finite second moments with the Wasserstein space structure. There, (4) shows that \(P^\beta_t\), \(t\geq0\), belongs to this Wasserstein space, whose metric structure provides more insights, for instance Theorem 4, on the evolution of \((P^\beta_t)_{t\geq0}\).
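To make the propagation of second moments concrete, the following is a minimal numerical sketch, not part of the original argument. It discretizes the one-dimensional version of (1) by the Euler–Maruyama scheme and compares the estimated second moment with a closed form. The quadratic potential \(\psi(x)=x^2/2\), the Gaussian initial law \(N(2,1)\), the vanishing perturbation, and all function names are illustrative assumptions.

```python
import math
import random


def simulate_second_moment(grad_psi, beta, t0, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama estimate of E[X_T^2] for the 1-d version of (1):

        dX_t = -(grad_psi(X_t) + beta(X_t) * 1{t > t0}) dt + dW_t.

    Assumption for illustration only: X_0 ~ N(2, 1).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = rng.gauss(2.0, 1.0)  # X_0, independent of the Brownian increments
        for k in range(n_steps):
            t = k * dt
            # The perturbation only acts strictly after time t0.
            drift = grad_psi(x) + (beta(x) if t > t0 else 0.0)
            x += -drift * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        total += x * x
    return total / n_paths


def ou_second_moment(T, m0=2.0, v0=1.0):
    """Closed form of E[X_T^2] when psi(x) = x^2/2 and beta = 0."""
    mean = m0 * math.exp(-T)
    var = v0 * math.exp(-2.0 * T) + 0.5 * (1.0 - math.exp(-2.0 * T))
    return mean * mean + var


estimate = simulate_second_moment(
    grad_psi=lambda x: x,   # psi(x) = x^2 / 2 (assumed example potential)
    beta=lambda x: 0.0,     # perturbation switched off
    t0=0.0, T=1.0, n_steps=100, n_paths=10000,
)
print(estimate, ou_second_moment(1.0))
```

In this example the two printed numbers should agree up to Monte Carlo and discretization error. Passing a compactly supported `beta` and a positive `t0` gives a rough picture of the perturbed dynamics, although no closed form is then available for comparison.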

2.2 Density and reference measure

The absolute continuity of each \(P^\beta_t\) with respect to the Lebesgue measure on \(\mathbb{R}^d\) is guaranteed in [32]. For all \(t\geq0\), we write \(p^\beta_t(\cdot):\mathbb{R}^d\to\mathbb{R}_+\) as the density of \(P^\beta_t\) against the Lebesgue measure on \(\mathbb{R}^d\). Notice that \((t,x)\mapsto p^\beta_t(x)\) satisfies a partial differential equation on \(\mathbb{R}_+\times\mathbb{R}^d\), called the Fokker-Planck equation. This more analytic perspective will be discussed when we compute the semimartingale decomposition of \((dP^\beta_{T-t}/dQ)_{0\leq t\leq T}\) and \((\log dP^\beta_{T-t}/dQ)_{0\leq t\leq T}\) in Section 4, where the time-reversal is performed in a compact interval \([0,T]\). For now, we only remark that the Fokker-Planck equation plays an important role in classical dissipative systems [28], [33] and has internal connections to the Itô-Langevin dynamics [29].

The family of probability measures \((P^\beta_t)_{t\geq0}\) then has two interpretations: Either its density \((p^\beta_t(\cdot))_{t\geq0}\) can be seen as the solution to the Fokker-Planck equation, to be written out in Section 4, or each \(P^\beta_t\) can be seen as the marginal distribution of the solution process \((X_t)_{t\geq0}\) of the Itô–Langevin dynamics (1) at time \(t\geq0\). Apart from that, we introduce a \(\sigma\)-finite measure \(Q\) on the Borel sets of \(\mathbb{R}^d\) with density \[q(\cdot)\mathrel{\vcenter{:}}=\exp\big(-2\psi(\cdot)\big):\mathbb{R}^d\to\mathbb{R}_+\] against the Lebesgue measure on \(\mathbb{R}^d\). This \(\sigma\)-finite Borel measure \(Q\) is specified to be the reference measure when we compute the relative entropy of \((P^\beta_t)_{t\geq0}\) later in this article.

When the potential function \(\psi(\cdot)\) grows rapidly enough so that \(\exp(-\psi(\cdot))\in L^2(\mathbb{R}^d)\), the normalized density \(q(\cdot)\) solves a variant of the Fokker-Planck equation, if we take \(p^\beta_0(\cdot)=q(\cdot)\) modulo normalization. Looking back to (1), an equivalent probabilistic perspective reveals that, in this case and without perturbation, \(X_t\sim Q\) modulo normalization at each time \(t\geq0\), see [28], [33]. Moreover, when \(\exp(-\psi(\cdot))\in L^2(\mathbb{R}^d)\), it is verified [34] that the normalized probability density \(q(\cdot)\) satisfies a variational principle: It minimizes the free energy functional, \[\mathscr{F}(\rho)\mathrel{\vcenter{:}}=\int_{\mathbb{R}^d}\psi(x)\rho(x)\,dx+\frac{1}{2}\int_{\mathbb{R}^d}\rho(x)\log\rho(x)\,dx,\] over all probability densities \(\rho(\cdot):\mathbb{R}^d\to\mathbb{R}_+\) on \(\mathbb{R}^d\).
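As a sanity check of this variational principle, and not as a substitute for the proof in [34], one can rewrite the free energy in terms of the density \(q(\cdot)=\exp(-2\psi(\cdot))\). The normalizing constant \(Z\mathrel{\vcenter{:}}=\int_{\mathbb{R}^d}q(x)\,dx\) is notation introduced here for illustration only. We assume \(Z<\infty\) and that all integrals below are finite. Since \(\psi=-\frac{1}{2}\log q\), \[\mathscr{F}(\rho)=\frac{1}{2}\int_{\mathbb{R}^d}\rho(x)\log\frac{\rho(x)}{q(x)}\,dx=\frac{1}{2}\int_{\mathbb{R}^d}\rho(x)\log\frac{\rho(x)}{q(x)/Z}\,dx-\frac{1}{2}\log Z\;\geq\;-\frac{1}{2}\log Z.\] The middle integral is the relative entropy of \(\rho\) with respect to the probability density \(q/Z\), which is nonnegative by the Jensen inequality and vanishes if and only if \(\rho=q/Z\) almost everywhere. Under these assumptions, the free energy is therefore one half of the relative entropy against \(Q\), and its minimizer is the normalized density \(q/Z\).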

Additionally, we require the perturbation field to be of gradient type, i.e. \(\beta(\cdot)=\nabla B(\cdot):\mathbb{R}^d\to\mathbb{R}^d\) in our exposition. The function \(B(\cdot):\mathbb{R}^d\to\mathbb{R}\), of class \(\mathcal{C}^\infty(\mathbb{R}^d;\mathbb{R})\) and compactly supported, is called the perturbation potential. When the perturbation is switched off, the family of probability measures characterizing the coordinate process \((X_t)_{t\geq0}\) from (1) is denoted by \((P^0_t)_{t\geq0}\), where the zero-script simply indicates that this is the case of vanishing perturbation.

2.3 Quantities from statistical physics

So far we have been working with the family of Borel probability measures \((P^\beta_t)_{t\geq0}\) and the \(\sigma\)-finite Borel reference measure \(Q\) on \(\mathbb{R}^d\). But how do we extract information from their time-evolution? One approach is to translate our language and work through the lens of stochastic processes.

It is now clear that each \(P^\beta_t\) is absolutely continuous with respect to the \(\sigma\)-finite reference measure \(Q\), for any \(t\geq0\). Indeed, we write the likelihood ratio process, or the Radon–Nikodým derivative, as \[\label{likelihood32ratio32process} \ell^{\mathbb{P}^\beta}_t(X_t)=\frac{dP^\beta_t}{dQ},\quad\text{where}\quad\ell^{\mathbb{P}^\beta}_t(x)\mathrel{\vcenter{:}}= p^\beta_t(x)e^{2\psi(x)}\quad\text{for all}\quad(t,x)\in\mathbb{R}_+\times\mathbb{R}^d.\tag{5}\] We call its logarithm the relative entropy process, \[\label{relative32entropy32process} \mathcal{R}^{\mathbb{P}^\beta}_t(X_t)\mathrel{\vcenter{:}}=\log\ell^{\mathbb{P}^\beta}_t(X_t)=\log\frac{dP^\beta_t}{dQ}\;\;\text{for all}\;\;t\geq0.\tag{6}\] This seemingly redundant definition will actually simplify computation in the analysis of the semimartingale decomposition of \((\ell^{\mathbb{P}^\beta}_{T-t}(X_{T-t}))_{0\leq t\leq T}\) and \((\mathcal{R}^{\mathbb{P}^\beta}_{T-t}(X_{T-t}))_{0\leq t\leq T}\) in Section 4. For notational conciseness, from now on we will abbreviate \(\ell^{\mathbb{P}^\beta}\) and \(\mathcal{R}^{\mathbb{P}^\beta}\) as \(\ell^\beta\) and \(\mathcal{R}^\beta\), respectively.

Having defined the basic setting of the relevant stochastic processes, we now introduce some quantities of interest. These quantities have their origins in Statistical Physics [12] and will, essentially, serve as quantitative indicators of our Itô–Langevin stochastic system. Regarding the family \((P^\beta_t)_{t\geq0}\) of Borel probability measures and the \(\sigma\)-finite reference measure \(Q\), we define the relative entropy, \[\label{relative32entropy} \mathbb{H}\big[P^\beta_t|Q\big]\mathrel{\vcenter{:}}=\int_{\mathbb{R}^d}\log\frac{dP^\beta_t}{dQ}\,dP^\beta_t=\int_{\mathbb{R}^d}p^\beta_t(x)\log\frac{p^\beta_t(x)}{q(x)}\,dx,\quad\text{for all}\quad t\geq0,\tag{7}\] as well as the Fisher information, \[\label{Fisher32information} \mathbb{I}\big[P^\beta_t|Q\big]\mathrel{\vcenter{:}}=\int_{\mathbb{R}^d}\big\lVert\nabla\log\frac{dP^\beta_t}{dQ}\big\rVert^2\,dP^\beta_t=\int_{\mathbb{R}^d}\big\lVert\nabla\big(\log p_t^\beta(x)+2\psi(x)\big)\big\rVert^2p^\beta_t(x)\,dx,\quad\text{for all}\quad t\geq0.\tag{8}\]

To avoid a meticulous discussion of the general case, some regularity is assumed for the time-evolution of the relative entropy, which simplifies our argument and lets us better present the trajectorial formulation in Section 5. We add the assumption that our choice of the initial distribution \(P_0\) on \(\mathbb{R}^d\) ensures \(\mathbb{H}[P_0|Q]<\infty\). Incorporating \(\ell^\beta\) and \(\mathcal{R}^\beta\) into the above definitions, the relative entropy (7) and Fisher information (8) can then be written as \[\mathbb{H}\big[P^\beta_t|Q\big]=\mathbb{E}^{\mathbb{P}^\beta}\big[\mathcal{R}^\beta_t(X_t)\big]\quad\text{and}\quad\mathbb{I}\big[P^\beta_t|Q\big]=\mathbb{E}^{\mathbb{P}^\beta}\big[\big\lVert\nabla\mathcal{R}^\beta_t(X_t)\big\rVert^2\big],\quad\text{for all}\quad t\geq0.\]
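For readers who prefer a computational check, the sketch below evaluates the definitions (7) and (8) by the trapezoidal rule in a one-dimensional example and compares them with closed forms. The Gaussian law \(N(m,v)\), the potential \(\psi(x)=x^2/2\) (so that \(q(x)=e^{-x^2}\)), the closed-form expressions, and the function names are assumptions made purely for illustration.

```python
import math


def entropy_and_fisher(m, v, half_width=12.0, n=20001):
    """Trapezoidal approximation of (7) and (8) for P = N(m, v), psi(x) = x^2/2."""
    sd = math.sqrt(v)
    lo = m - half_width * sd
    step = 2.0 * half_width * sd / (n - 1)
    H = 0.0
    I = 0.0
    for k in range(n):
        x = lo + k * step
        weight = 0.5 * step if k in (0, n - 1) else step
        log_p = -((x - m) ** 2) / (2.0 * v) - 0.5 * math.log(2.0 * math.pi * v)
        p = math.exp(log_p)
        log_ratio = log_p + x * x        # log(dP/dQ) = log p + 2 psi
        score = -(x - m) / v + 2.0 * x   # d/dx of log(dP/dQ)
        H += weight * p * log_ratio
        I += weight * p * score * score
    return H, I


def closed_forms(m, v):
    """Closed forms of (7) and (8) in the same Gaussian example."""
    H = -0.5 * math.log(2.0 * math.pi * math.e * v) + m * m + v
    I = (2.0 - 1.0 / v) ** 2 * v + 4.0 * m * m
    return H, I


H_num, I_num = entropy_and_fisher(0.7, 0.3)
H_exact, I_exact = closed_forms(0.7, 0.3)
print(H_num, H_exact)
print(I_num, I_exact)
```

Note that the relative entropy may be negative in this example, because \(Q\) is only \(\sigma\)-finite and not normalized. The Fisher information vanishes at \(m=0\), \(v=1/2\), which is the normalized version of \(Q\).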

One remarkable consequence of defining the relative entropy of probability measure \(P^\beta_t\) with respect to the \(\sigma\)-finite Borel measure \(Q\) is that the mapping \(t\mapsto\mathbb{H}[P^0_t|Q]\) admits a strong version of monotonicity on \(\mathbb{R}_+\), in the absence of perturbation.

Lemma 2. Fix a finite time horizon \(T\geq0\) and let \(\tau_1,\tau_2\) with \(\tau_1\leq\tau_2\) be two stopping times taking values in \([0,T]\) with respect to the filtration generated by the coordinate process \((X_t)_{t\geq0}\). Then, \[\mathbb{H}\big[P^0_{T-\tau_1}|Q\big]\leq\mathbb{H}\big[P^0_{T-\tau_2}|Q\big]\quad\text{and}\quad\mathbb{H}\big[P^0_{T-t_1}|Q\big]\leq\mathbb{H}\big[P^0_{T-t_2}|Q\big],\quad\text{for all}\quad0\leq t_1\leq t_2\leq T.\]

Lemma 2 is in fact a direct corollary of an intrinsic property of the relative entropy process: \((\mathcal{R}^\beta_t)_{t\geq0}\), run backward in time, satisfies the conditions of a \(Q\)-submartingale, where the reference measure \(Q\) is only required to be \(\sigma\)-finite on \(\mathbb{R}^d\), not necessarily a probability measure. For the precise definition of a \(Q\)-submartingale and for the proof of Lemma 2, readers are referred to [35].

Recall that \((P^0_t)_{t\geq0}\) is the family of marginal distributions induced by (1) without perturbation. In fact, \(P^\beta_t\) coincides with \(P^0_t\) when \(0\leq t\leq t_0\), before the perturbation \(\beta(\cdot)\) is initiated. Similarly, we write \(\mathbb{H}[P^0_t|Q]\), \(\mathbb{I}[P^0_t|Q]\), \(\ell^0_t(X_t)\), and \(\mathcal{R}^0_t(X_t)\), for all \(t\geq0\), for the corresponding quantities in the zero-perturbation case. The relative entropy and Fisher information defined above make the motivation and rough descriptions of Section 1 quantitative. The resulting trajectorial formulation will be thoroughly investigated in Section 5.

2.4 Preview of classical results

The trajectorial approach presented in Section 5 of this expository article reveals more internal information from the Itô–Langevin stochastic dynamics, but its formulation is rather abstract and difficult to comprehend at first reading. Therefore, as a courtesy to the readers, we first present the known classical results on the relative entropy dissipation as a guiding light, before embarking on a long journey comprising the time-reversal principles in Section 3 and the semimartingale decomposition of \(\ell^\beta\) and \(\mathcal{R}^\beta\) in Section 4, which eventually leads to the trajectorial approach in Section 5.

The classical result on relative entropy dissipation is phrased as the time-derivative of the relative entropy (7), in the absence of perturbation, \[\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}\bigg(\mathbb{H}[P^0_t|Q]-\mathbb{H}[P^0_{t_0}|Q]\bigg)=-\frac{1}{2}\mathbb{I}[P^0_{t_0}|Q]\;\;\text{for all}\;\;t_0\geq0,\] which is the well-known de Bruijn identity [23]. This limiting identity expresses the time-derivative of the relative entropy as the Fisher information, up to a multiplicative constant. Another quantity of interest is the limiting behavior of the distance between the Borel probability measures \(P^\beta_t\) and \(P^\beta_{t_0}\) on \(\mathbb{R}^d\), as \(t\searrow t_0\).
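The de Bruijn identity can be checked by hand in a simple example. The sketch below does so numerically for the one-dimensional unperturbed dynamics with \(\psi(x)=x^2/2\) and Gaussian initial law, where \(P^0_t\) stays Gaussian. The example, the closed forms for the mean, variance, relative entropy and Fisher information, and the function names are assumptions chosen for illustration, not taken from the references.

```python
import math


def moments(t, m0, v0):
    """Mean and variance of X_t for dX = -X dt + dW with X_0 ~ N(m0, v0)."""
    m = m0 * math.exp(-t)
    v = v0 * math.exp(-2.0 * t) + 0.5 * (1.0 - math.exp(-2.0 * t))
    return m, v


def relative_entropy(t, m0, v0):
    """H[P_t | Q] for Gaussian P_t and q(x) = exp(-x^2)."""
    m, v = moments(t, m0, v0)
    return -0.5 * math.log(2.0 * math.pi * math.e * v) + m * m + v


def fisher_information(t, m0, v0):
    """I[P_t | Q] for Gaussian P_t and q(x) = exp(-x^2)."""
    m, v = moments(t, m0, v0)
    return (2.0 - 1.0 / v) ** 2 * v + 4.0 * m * m


def entropy_slope(t0, m0, v0, h=1e-6):
    """One-sided difference quotient mimicking the limit t decreasing to t0."""
    return (relative_entropy(t0 + h, m0, v0) - relative_entropy(t0, m0, v0)) / h


slope = entropy_slope(0.3, 2.0, 1.0)
half_fisher = 0.5 * fisher_information(0.3, 2.0, 1.0)
print(slope, -half_fisher)
```

In this example the difference quotient and \(-\frac{1}{2}\mathbb{I}[P^0_{t_0}|Q]\) should agree up to the step size `h`. Evaluating `relative_entropy` on a grid of times also illustrates the monotone decay stated in Lemma 2.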

To give a more precise description to this metric, we define \(\mathscr{P}_2(\mathbb{R}^d)\) to be the quadratic Wasserstein space, whose elements consist of all probability measures on \(\mathbb{R}^d\) admitting a finite second moment. And the space \(\mathscr{P}_2(\mathbb{R}^d)\) is equipped with the suitably defined quadratic Wasserstein metric \(W_2(\mu,\nu)\) for all \(\mu,\nu\in\mathscr{P}_2(\mathbb{R}^d)\). For now we just view \(W_2\) as a well-defined metric on \(\mathscr{P}_2(\mathbb{R}^d)\). Its exact definition as well as the detailed discussion of the quadratic Wasserstein space will be deferred to Section 6. We are interested in the limiting behavior of \(W_2(P^\beta_t,P^\beta_{t_0})\) as \(t\searrow t_0\). In Section 6, it will be shown that \[\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^0_t,P^0_{t_0})=\frac{1}{2}\sqrt{\mathbb{I}[P^0_{t_0}|Q]}.\]

Clearly, the limiting time-derivative of the relative entropy and that of the quadratic Wasserstein metric are closely related in the sense that \[\lim\limits_{t\searrow t_0}\frac{\,\mathbb{H}[P^0_t|Q]-\mathbb{H}[P^0_{t_0}|Q]\,}{W_2(P^0_t,P^0_{t_0})}=-\sqrt{\mathbb{I}[P^0_{t_0}|Q]},\] which also reveals that the Fisher information (8) serves as a bridge between the relative entropy and the quadratic Wasserstein metric. The idea of considering the relative entropy dissipation in the context of the quadratic Wasserstein space was first discussed by Jordan/Kinderlehrer/Otto [32] and Otto [36].
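Both Wasserstein limits can be illustrated in the same one-dimensional Gaussian example, using the standard closed form \(W_2^2=(m_1-m_2)^2+(\sqrt{v_1}-\sqrt{v_2})^2\) for the quadratic Wasserstein distance between the Gaussian laws \(N(m_1,v_1)\) and \(N(m_2,v_2)\) on the real line. The sketch below is a hedged numerical check under the assumption \(\psi(x)=x^2/2\) with Gaussian initial law. All function names are illustrative.

```python
import math


def moments(t, m0, v0):
    """Mean and variance of X_t for dX = -X dt + dW with X_0 ~ N(m0, v0)."""
    m = m0 * math.exp(-t)
    v = v0 * math.exp(-2.0 * t) + 0.5 * (1.0 - math.exp(-2.0 * t))
    return m, v


def relative_entropy(m, v):
    """H[N(m, v) | Q] with q(x) = exp(-x^2)."""
    return -0.5 * math.log(2.0 * math.pi * math.e * v) + m * m + v


def fisher_information(m, v):
    """I[N(m, v) | Q] with q(x) = exp(-x^2)."""
    return (2.0 - 1.0 / v) ** 2 * v + 4.0 * m * m


def w2_gaussian(m1, v1, m2, v2):
    """Quadratic Wasserstein distance between two Gaussian laws on the real line."""
    return math.sqrt((m1 - m2) ** 2 + (math.sqrt(v1) - math.sqrt(v2)) ** 2)


t0, h = 0.3, 1e-6
m_a, v_a = moments(t0, 2.0, 1.0)
m_b, v_b = moments(t0 + h, 2.0, 1.0)

I0 = fisher_information(m_a, v_a)
distance = w2_gaussian(m_b, v_b, m_a, v_a)
rate = distance / h                                                     # compare with sqrt(I0) / 2
ratio = (relative_entropy(m_b, v_b) - relative_entropy(m_a, v_a)) / distance  # compare with -sqrt(I0)
print(rate, 0.5 * math.sqrt(I0))
print(ratio, -math.sqrt(I0))
```

In this example the Wasserstein speed matches \(\frac{1}{2}\sqrt{\mathbb{I}[P^0_{t_0}|Q]}\) and the entropy-to-distance ratio matches \(-\sqrt{\mathbb{I}[P^0_{t_0}|Q]}\), up to the step size `h`.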

This expository article takes into consideration an external deterministic perturbation of the unperturbed Itô–Langevin dynamics, i.e. there is no extra randomness governing the perturbation field. This is a natural extension of the known results, but its major importance is the revelation of the so-called steepest descent property of the relative entropy dissipation. The steepest descent property can only be precisely described after we have presented our analysis of the time-displacement of \((P^\beta_t)_{t\geq0}\) viewed from the quadratic Wasserstein space perspective in Section 6. This steepest descent property answers the question of why the unperturbed dynamics is remarkably different from the same stochastic system placed under the smooth perturbation field \(\beta(\cdot)\).

To achieve the trajectorial formulation of the relative entropy dissipation, it is more convenient to look at things backward in time. The following Section 3 presents no new results, but it contains all the necessary background theory of stochastic processes under time-reversal.

3 Time-reversal of diffusion processes

As announced at the end of Section 2, this preparatory section reviews the theory of time-reversal of diffusions. We choose to present this general topic before discussing the semimartingale decomposition of the relevant processes \(\ell^\beta\) and \(\mathcal{R}^\beta\) in Section 4 as well as the trajectorial formulation in Section 5, because many time-reversal techniques are adopted to formalize the main results, which are conveniently written in a backward-time fashion. For pedagogical reasons, a brief review of the various filtrations, Wiener processes, and Itô integration under a backward-time approach is necessary.

3.1 Historical comments

The principle of time-reversal in stochastic analysis has a distinguished history in many scientific disciplines. This type of question has been of interest to physicists, most notably Guerra/Marra [37], Nelson [38], and Witten [39], [40], as well as to control theorists Lindquist/Picci [41], Goussev/Jalabert/Pastawski/Wisniacki [42]. The philosophy of time-reversal principles has also been of use to economists, see Zumbach [43]. Prior to our work, the connection between time-reversal dynamics and the Itô–Langevin stochastic differential equations has also been discussed in [44]–[47].

It is well-known that a Markov process remains a Markov process under time-reversal [48]. However, the strong Markovian property is not necessarily preserved under time-reversal [49], and neither is the semimartingale property [50]. So it is of interest to see whether the diffusion property, i.e. the strong Markovian semimartingale property, is preserved under time-reversal.

Instead of analyzing the Itô-Langevin dynamics (1 ), we start with a general \(\mathbb{R}^d\)-valued diffusion process, i.e. a strong Markovian continuous semimartingale, \((S_t)_{t\geq0}\) driven by a stochastic differential equation, see for instance (10 ), with smooth drift and dispersion coefficients. Our main goal in this section is to assert that its time-reversed process, \[\label{time-reversed32process} \widehat{S}_t\mathrel{\vcenter{:}}= S_{T-t}\;\;\text{for all}\;\;0\leq t\leq T,\tag{9}\] is a diffusion, adapted to a backward filtration which will be specified later, provided that the coefficients of the governing stochastic differential equation, for instance (10 ), are sufficiently regular. Such a question goes back to Boltzmann [4], [5], [51], Schrödinger Schrödinger1?, Schrödinger2?, and Kolmogorov [52]. Time-reversal of stochastic processes was treated systematically by Nelson [38] and Carlen [53] in the context of the dynamical theory of diffusions. It was developed in the context of filtering, interpolation and extrapolation by Haussmann/Pardoux [48] and Pardoux [47]. In a non-Markovian context, the time-reversal of diffusions was developed by Föllmer Föllmer1?, Föllmer2?. See also Margarint [54] and Napolitano/Sakurai [55] for the time-reversal principles applied to Mathematical Physics.

In this expository article, we focus on the time-reversal principles relevant to the Itô-Langevin stochastic differential equation (1 ) and demonstrate that the time-reversal of its solution process retains the diffusion property under a suitable filtered probability space, provided that its drift and dispersion terms are sufficiently regular. Consequently, in Sections 4 and 5, where we formalize the trajectorial interpretation of the relative entropy dissipation, it becomes safe to wield the time-reversal techniques. Moreover, it is convenient to restrict our discussion to a finite time horizon \(T>0\) without loss of generality.

3.2 Backward filtrations↩︎

Under time-reversal, the backward processes are no longer adapted to the original forward-time filtrations. Consequently, it is necessary to construct new filtrations from the known information, which expand backward in time. For a reference on the theory of filtrations, readers are referred to Protter [56]. We work on a filtered probability space \((\Omega,\mathcal{F},\mathbb{F},\mathbb{P})\) with the forward-time filtration \(\mathbb{F}\mathrel{\vcenter{:}}=(\mathcal{F}_t)_{0\leq t\leq T}\), where \[\mathcal{F}_t\mathrel{\vcenter{:}}=\sigma(\xi,W_\theta:\,0\leq\theta\leq t)\quad\text{for all}\quad0\leq t\leq T,\] modulo \(\mathbb{P}\)-augmentation. Here \(\xi\) is an \(\mathcal{F}_0\)-measurable random vector and \((W_t)_{0\leq t\leq T}\) is an \(\mathbb{F}\)-Brownian motion starting from zero, independent of \(\xi\). Next, consider the stochastic differential equations \[\label{forward32diffusion32equation} S^{(i)}_t=\xi^{(i)}+\int_0^ta_i(\theta,S_\theta)\,d\theta+\sum\limits_{\nu=1}^m\int_0^tb_{i\nu}(\theta,S_\theta)\,dW^{(\nu)}_\theta\quad\text{for all}\quad0\leq t\leq T,\tag{10}\] with \(i=1,2,\ldots,d\). We assume that (10 ) admits a pathwise unique strong solution. This conforms to our Itô-Langevin setting of the coordinate process \((X_t)_{0\leq t\leq T}\), for which strong existence and pathwise uniqueness are verified in Lemma 1. Then, \(S=(S^{(1)},\ldots,S^{(d)})^T\) is \(\mathbb{F}\)-adapted and \[\mathcal{F}_t=\sigma(S_\theta,W_\theta:\,0\leq\theta\leq t)=\sigma(S_0,W_t-W_\theta:\,0\leq\theta\leq t)\quad\text{for all}\quad0\leq t\leq T,\] modulo \(\mathbb{P}\)-augmentation. It follows that \((S_t)_{0\leq t\leq T}\) has the \((\mathcal{F}_t)_{0\leq t\leq T}\)-strong Markovian property, see [57].

We further assume that the drifts \(a_i(t,x)\) and dispersions \(b_{i\nu}(t,x)\) are of class \(\mathcal{C}^\infty(\mathbb{R}_+\times\mathbb{R}^d;\mathbb{R})\), for all \(1\leq i\leq d\) and \(1\leq\nu\leq m\), which is more than enough smoothness for what follows. In particular, the covariance matrix \(\sigma(t,x)\) of (10 ), given by \[\sigma_{ij}(t,x)\mathrel{\vcenter{:}}=\sum\limits_{\nu=1}^mb_{i\nu}(t,x)b_{j\nu}(t,x),\quad\text{for all}\quad1\leq i,j\leq d,\] is of class \(\mathcal{C}^\infty(\mathbb{R}_+\times\mathbb{R}^d;\mathbb{R}^{d\times d})\). The density function \(\rho_t(\cdot):\mathbb{R}^d\to\mathbb{R}_+\) of the marginal law of \(S_t\) against the Lebesgue measure on \(\mathbb{R}^d\) solves the forward Kolmogorov equation [58], [57], \[\frac{\partial \rho_t}{\partial t}(x)=\frac{1}{2}\sum\limits_{1\leq i,j\leq d}\frac{\partial^2}{\partial x_i\partial x_j}\big(\sigma_{ij}(t,x)\rho_t(x)\big)-\sum\limits_{1\leq i\leq d}\frac{\partial}{\partial x_i}\big(a_i(t,x)\rho_t(x)\big),\quad\text{for all}\quad(t,x)\in[0,T]\times\mathbb{R}^d.\] Define the following filtration \(\widehat{\mathbb{F}}\mathrel{\vcenter{:}}=(\widehat{\mathcal{F}}_{T-t})_{0\leq t\leq T}\) running backward in time, by \[\label{backward32filtration32F} \widehat{\mathcal{F}}_{T-t}\mathrel{\vcenter{:}}=\sigma(S_{T-\theta},W_{T-\theta}-W_{T-t}:\,0\leq\theta\leq t)\quad\text{for all}\quad0\leq t\leq T.\tag{11}\] For each \(0\leq t\leq T\), this \(\sigma\)-algebra \(\widehat{\mathcal{F}}_{T-t}\) can be equivalently expressed as \[\begin{align} \widehat{\mathcal{F}}_{T-t}&=\sigma(S_{T-t},W_{T-\theta}-W_{T-t}:\,0\leq\theta\leq t)= \sigma(S_{T-t},W_{T}-W_{T-\theta}:\,0\leq\theta\leq t)\\ &=\sigma(S_{T},W_{T-t}-W_{T-\theta}:\,0\leq\theta\leq t)=\sigma(S_T)\vee\mathcal{H}_{T-t}, \end{align}\] where \(\mathcal{H}_{T-t}\mathrel{\vcenter{:}}=\sigma(W_{T-t}-W_{T-\theta}:\,0\leq\theta\leq t)\) is independent of the random vector \(S_{T-t}\), for all \(0\leq t\leq T\), see [59]. 
Then the backward-time process \((\widehat{S}_t)_{0\leq t\leq T}\) and \(\widetilde{W}_t\mathrel{\vcenter{:}}= W_{T-t}-W_T\), \(0\leq t\leq T\) are both adapted to \(\widehat{\mathbb{F}}\) defined in (11 ). Indeed, the \(\sigma\)-algebra \(\widehat{\mathcal{F}}_{T-t}\) can be further expressed as \[\widehat{\mathcal{F}}_{T-t}=\sigma(\widehat{S}_\theta,\widetilde{W}_\theta-\widetilde{W}_t:\,0\leq\theta\leq t)=\sigma(\widehat{S}_0)\vee\mathcal{H}_{T-t},\quad \text{where}\;\;\mathcal{H}_{T-t}=\sigma(\widetilde{W}_\theta-\widetilde{W}_t:\,0\leq\theta\leq t).\]

The notion of a backward filtration is essential once the direction of time is reversed. To have a meaningful discussion of the relevant backward-time stochastic processes, it is necessary to specify the backward-time filtrations to which these processes are adapted. The construction above supplies exactly this, and it is used when we write down the semimartingale decomposition of \(\ell^\beta\) and \(\mathcal{R}^\beta\) backward in time in Section 4, and when we formulate the trajectorial approach to the relative entropy dissipation in a backward-time fashion in Section 5.

3.3 Wiener process and Itô integration↩︎

Reversing the filtrations raises a new question: how do we identify an adapted backward-time Wiener process? Indeed, the time-reversal of a forward-time Brownian motion may lose its martingale property under a backward filtration. Nonetheless, if we subtract a proper backward-time finite variation process, the Lévy theorem Le?, Gall? yields the adapted Brownian motion under time-reversal.

Lemma 3. The backward-time process \((\widetilde{W}_t)_{0\leq t\leq T}\) is a Brownian motion with respect to its own filtration \((\mathcal{H}_{T-t})_{0\leq t\leq T}\), but only a semimartingale with respect to the strictly larger filtration \(\widehat{\mathbb{F}}\). On the other hand, define the backward-time process \((B_t)_{0\leq t\leq T}=((B_t^{(1)},\ldots,B_t^{(m)})^T)_{0\leq t\leq T}\) by \[\label{definition32of32B} B^{(\nu)}_t\mathrel{\vcenter{:}}=\widetilde{W}^{(\nu)}_t-\int_0^t\rho_{T-\theta}^{-1}\sum\limits_{1\leq i\leq d}\frac{\partial}{\partial x_i}\big(\rho_{T-\theta}(\cdot)b_{i\nu}(T-\theta,\cdot)\big)(\widehat{S}_\theta)\,d\theta,\quad\text{for all}\quad0\leq t\leq T,\tag{12}\] with \(\nu=1,2,\ldots,m\). Then \((B_t)_{0\leq t\leq T}\) is an \(\mathbb{R}^m\)-valued \(\widehat{\mathbb{F}}\)-adapted Brownian motion independent of \(\widehat{\mathcal{F}}_T\), and therefore also independent of \(S_T\).

Proof. First, we need to show that each component \(B^{(\nu)}\), \(\nu=1,\ldots,m\), of the backward-time process \(B\) is an \(\widehat{\mathbb{F}}\)-adapted martingale. In other words, for all bounded \(\widehat{\mathcal{F}}_t\)-measurable \(\mathcal{K}\), we have to show that \[\label{how32to32show32B32is32martingale} \mathbb{E}^{\mathbb{P}}\big[\big(B^{(\nu)}_{T-\theta}-B^{(\nu)}_{T-t}\big)\mathcal{K}\big]=0,\quad\text{for all}\quad0\leq\theta\leq t\leq T.\tag{13}\] Since \(\mathbb{E}^{\mathbb{P}}[\mathcal{K}|\mathcal{F}_t]=\mathbb{E}^{\mathbb{P}}[\mathcal{K}|S_t]\) \(\mathbb{P}\)-a.s., there exists a Borel measurable \(K_t:\mathbb{R}^d\to\mathbb{R}\) such that \(K_t(S_t)=\mathbb{E}^{\mathbb{P}}[\mathcal{K}|\mathcal{F}_t]\). We further define \(K_\theta(x)\mathrel{\vcenter{:}}=\mathbb{E}^{\mathbb{P}}[K_t(S_t)|S_\theta=x]\) for all \((\theta,x)\in[0,t]\times\mathbb{R}^d\). Invoking the Markovian property of \((S_t)_{0\leq t\leq T}\) and following the ideas from Meyer [60], we deduce that the process, \[K_\theta(S_\theta)=\mathbb{E}^{\mathbb{P}}\big[K_t(S_t)|S_\theta\big]=\mathbb{E}^{\mathbb{P}}\big[\mathcal{K}|\mathcal{F}_\theta\big],\quad\text{for all}\quad0\leq\theta\leq t,\] is an \(\mathbb{F}\)-martingale, and therefore, \[K_t(S_t)-K_\theta(S_\theta)=\sum\limits_{1\leq i\leq d}\sum\limits_{1\leq\nu\leq m}\int_\theta^t\frac{\partial K_\tau}{\partial x_i}(S_\tau)b_{i\nu}(\tau,S_\tau)\,dW^{(\nu)}_\tau.\] Since \(\mathbb{E}^{\mathbb{P}}[(W^{(\nu)}_t-W^{(\nu)}_\theta)K_t(S_t)]=\mathbb{E}^{\mathbb{P}}[(W^{(\nu)}_t-W^{(\nu)}_\theta)(K_t(S_t)-K_\theta(S_\theta))]\), the Itô isometry gives \[\mathbb{E}^{\mathbb{P}}\big[\big(W^{(\nu)}_t-W^{(\nu)}_\theta\big)K_t(S_t)\big]=\mathbb{E}^{\mathbb{P}}\big[\sum\limits_{i=1}^d\int_\theta^t\frac{\partial K_\tau}{\partial x_i}(S_\tau)b_{i\nu}(\tau,S_\tau)\,d\tau\big]=\sum\limits_{i=1}^d\int_\theta^t\int_{\mathbb{R}^d}\big(b_{i\nu}(\tau,\cdot)\frac{\partial K_\tau}{\partial x_i}\big)(x)\rho_\tau(x)\,dx\,d\tau.\] Integrating by parts, for each 
\(\nu=1,\ldots,m\), this yields, \[\begin{align}\label{consequence4432why32B32is32martingale} &\;\;\;\;-\mathbb{E}^{\mathbb{P}}\big[\big(W^{(\nu)}_t-W^{(\nu)}_\theta\big)K_t(S_t)\big]=\sum\limits_{i=1}^d\int_\theta^t\int_{\mathbb{R}^d}K_\tau(x)\frac{\partial}{\partial x_i}\big(\rho_\tau(\cdot)b_{i\nu}(\tau,\cdot)\big)(x)\,dx\,d\tau\\ &=\int_\theta^t\mathbb{E}^{\mathbb{P}}\big[K_\tau(S_\tau)\rho^{-1}_\tau\sum\limits_{i=1}^d\frac{\partial}{\partial x_i}\big(\rho_\tau(\cdot)b_{i\nu}(\tau,\cdot)\big)(S_\tau)\big]\,d\tau=\mathbb{E}^{\mathbb{P}}\big[K_t(S_t)\int_\theta^t\rho^{-1}_\tau\sum\limits_{i=1}^d\frac{\partial}{\partial x_i}\big(\rho_\tau(\cdot)b_{i\nu}(\tau,\cdot)\big)(S_\tau)\,d\tau\big]. \end{align}\tag{14}\] Combining (12 ) and (14 ), we get \[\mathbb{E}^{\mathbb{P}}\bigg[\mathbb{E}^{\mathbb{P}}\big[\mathcal{K}|\mathcal{F}_t\big]\bigg(W^{(\nu)}_t-W^{(\nu)}_\theta+\int_\theta^t\rho_\tau^{-1}\sum\limits_{i=1}^d\frac{\partial}{\partial x_i}\big(\rho_\tau(\cdot)b_{i\nu}(\tau,\cdot)\big)(S_\tau)\,d\tau\bigg)\bigg]=0,\quad\text{for all}\quad0\leq\theta\leq t\leq T,\] which is equivalent to (13 ), where the conditional expectation can be removed because both \(W^{(\nu)}_t-W^{(\nu)}_\theta\) and \((S_\tau)_{\theta\leq\tau\leq t}\) are \(\mathcal{F}_t\)-measurable. Hence, \((B^{(\nu)}_t)_{0\leq t\leq T}\) is an \(\widehat{\mathbb{F}}\)-martingale for each \(\nu=1,\ldots,m\). In view of the continuity of the sample paths and the property, \[\big\langle B^{(\mu)}, B^{(\nu)}\big\rangle_t=\big\langle \widetilde{W}^{(\mu)}, \widetilde{W}^{(\nu)}\big\rangle_t=t\delta_{\mu\nu}\quad\text{for all}\quad1\leq\mu,\nu\leq m\quad\text{and}\quad0\leq t\leq T,\] we can infer that each \((B^{(\nu)}_t)_{0\leq t\leq T}\) is an \(\widehat{\mathbb{F}}\)-Brownian motion such that \(B^{(\mu)}\) and \(B^{(\nu)}\) are mutually independent for all \(\mu\neq\nu\), by appealing to the Lévy theorem Le?, Gall?. 
Hence, \((B_t)_{0\leq t\leq T}\) is an \(\mathbb{R}^m\)-valued \(\widehat{\mathbb{F}}\)-Brownian motion. ◻

Our main goal in this section is to verify that the time-reversal \((\widehat{S}_t)_{0\leq t\leq T}\) is a diffusion process with respect to a suitable backward filtration. In fact, we also specify its semimartingale decomposition in Lemma 4. To achieve this goal, we need to introduce a notion of backward stochastic integration, which uses finite sums of backward increments to approximate the stochastic integrals. Such a scheme is essential to the proof of Lemma 4, and is called the backward Itô integration.

Consider two continuous semimartingales \(X_t=X_0+M_t+K_t\) and \(Y_t=Y_0+N_t+L_t\), where \((M_t)_{0\leq t\leq T}\) and \((N_t)_{0\leq t\leq T}\) are continuous local martingales, and \((K_t)_{0\leq t\leq T}\) and \((L_t)_{0\leq t\leq T}\) are continuous finite variation processes. By analogy with its forward-time counterpart, the backward Itô integral Le?, Gall? is defined by, \[\label{backward32Itô32integral} \int_0^tY_\theta\bullet dX_\theta\mathrel{\vcenter{:}}=\int_0^tY_\theta\,dM_\theta+\int_0^tY_\theta\,dK_\theta+\langle M,N\rangle_t,\quad\text{for all}\quad0\leq t\leq T.\tag{15}\] If \(\Pi=\{t_0=0,t_1,\ldots,t_m=T\}\) is a partition of the time interval \([0,T]\), denote its mesh by \(\lVert\Pi\rVert\mathrel{\vcenter{:}}=\max\{t_{j}-t_{j-1}:\,1\leq j\leq m\}\). We have the following convergence in probability [61], \[\sum\limits_{0\leq j\leq m-1}Y_{t_{j+1}}(X_{t_{j+1}}-X_{t_j})\xlongrightarrow{\,\mathbb{P}\,}\int_0^TY_t\bullet dX_t,\quad\text{as}\quad\lVert\Pi\rVert\to0.\] Moreover, for all \(f\in\mathcal{C}^2(\mathbb{R}^d;\mathbb{R})\), the backward Itô integral admits the change of variable formula [62], [61], \[f(X_t)=f(X_0)+\sum\limits_{1\leq i\leq d}\int_0^t\frac{\partial f}{\partial x_i}(X_\theta)\bullet dX^{(i)}_\theta-\frac{1}{2}\sum\limits_{1\leq i,j\leq d}\int_0^t\frac{\partial^2f}{\partial x_i\partial x_j}(X_\theta)\,d\langle M^{(i)},M^{(j)}\rangle_\theta,\] for all \(0\leq t\leq T\). Following the definition of the backward Itô integral, in Lemma 4, we show that the time-reversal \((\widehat{S}_t)_{0\leq t\leq T}\) is indeed an \(\widehat{\mathbb{F}}\)-diffusion.
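
As a quick illustration of (15 ), and not a statement taken from the references, consider the one-dimensional case \(X=Y=W\) with \(W\) a standard Brownian motion, so that \(M=N=W\), \(K=L=0\), and \(\langle M,N\rangle_t=t\). The definition then gives \[\int_0^tW_\theta\bullet dW_\theta=\int_0^tW_\theta\,dW_\theta+t=\frac{1}{2}\big(W_t^2-t\big)+t=\frac{1}{2}\big(W_t^2+t\big),\quad\text{for all}\quad0\leq t\leq T.\] The same value follows from the backward change of variable formula with \(f(x)=x^2\), since it reads \(W_t^2=2\int_0^tW_\theta\bullet dW_\theta-t\). The sign of the correction term is thus opposite to that of the forward Itô formula, where \(W_t^2=2\int_0^tW_\theta\,dW_\theta+t\).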

3.4 Diffusions under time-reversal↩︎

The main result of this section states that a forward-time diffusion, with sufficient regularity on its drift and dispersion coefficients, remains a diffusion process under time-reversal with respect to a suitable backward filtration. It is important because we apply time-reversal to obtain the backward-time semimartingale decomposition of \(\ell^\beta\) and \(\mathcal{R}^\beta\) in Lemmas 6 and 7, as well as the trajectorial formulation of relative entropy dissipation in Theorems 1 and 2.

Lemma 4. Given a \(\mathbb{R}^d\)-valued diffusion process \((S_t)_{0\leq t\leq T}\) adapted to \((\mathcal{F}_{t})_{0\leq t\leq T}\), define its time-reversal \((\widehat {S}_t)_{0\leq t\leq T}\) as in (9 ) and define the backward filtration \(\widehat{\mathbb{F}}\) as in (11 ). Then, \((\widehat {S}_t)_{0\leq t\leq T}\) is an \(\widehat{\mathbb{F}}\)-adapted diffusion, i.e. a strong Markovian semimartingale, with the decomposition, \[\label{backward-time32S} \widehat{S}^{(i)}_t=\widehat{S}^{(i)}_0+\int_0^t\widehat{a}_i(T-\theta,\widehat{S}_\theta)\,d\theta+\sum\limits_{\nu=1}^m\int_0^tb_{i\nu}(T-\theta,\widehat{S}_\theta)\,dB^{(\nu)}_\theta\quad\text{for all}\quad0\leq t\leq T,\tag{16}\] where for each \(i=1,\ldots,d\), \[\widehat{a}_i(t,x)\mathrel{\vcenter{:}}=\sum\limits_{1\leq j\leq d}\frac{\partial\sigma_{ij}}{\partial x_j}(t,x)+\sum\limits_{1\leq j\leq d}\sigma_{ij}(t,x)\frac{\partial}{\partial x_j}\log \rho_t(x)-a_i(t,x),\quad\text{for all}\quad(t,x)\in[0,T]\times\mathbb{R}^d.\]

Proof. From (10 ) and by the Itô formula, the process \[b_{i\nu}(t,S_t)-b_{i\nu}(0,\xi)-\sum\limits_{1\leq j\leq d}\sum\limits_{1\leq\kappa\leq m}\int_0^t\big(b_{j\kappa}(\theta,\cdot)\frac{\partial b_{i\nu}}{\partial x_j}(\theta,\cdot)\big)(S_\theta)\,dW^{(\kappa)}_\theta\] is of finite variation. Hence, \[\label{quadratic32variation} \big\langle b_{i\nu}(\cdot,S),W^{(\nu)}\big\rangle_t=\sum\limits_{1\leq j\leq d}\int_0^t\big(b_{j\nu}(\theta,\cdot)\frac{\partial b_{i\nu}}{\partial x_j}(\theta,\cdot)\big)(S_\theta)\,d\theta\quad\text{for all}\quad0\leq t\leq T.\tag{17}\] On the other hand, we can express the forward-time diffusion \((S_t)_{0\leq t\leq T}\) in terms of the backward Itô integral, \[S^{(i)}_t-\xi^{(i)}-\int_0^ta_i(\theta,S_\theta)\,d\theta=\sum\limits_{\nu=1}^m\Big(\int_0^tb_{i\nu}(\theta,S_\theta)\bullet dW^{(\nu)}_\theta-\big\langle b_{i\nu}(\cdot,S),W^{(\nu)}\big\rangle_t\Big).\] Combining with (17 ), we observe that \[S^{(i)}_t=\xi^{(i)}-\int_0^t\bigg(\sum\limits_{j=1}^d\sum\limits_{\nu=1}^m b_{j\nu}(\theta,\cdot)\frac{\partial b_{i\nu}}{\partial x_j}(\theta,\cdot)-a_i(\theta,\cdot)\bigg)(S_\theta)\,d\theta+\sum\limits_{\nu=1}^m\int_0^tb_{i\nu}(\theta,S_\theta)\bullet dW^{(\nu)}_\theta.\] Evaluating also at the terminal time point \(T\) and subtracting, this gives \[S^{(i)}_t=S^{(i)}_T+\int_t^T\bigg(\sum\limits_{j=1}^d\sum\limits_{\nu=1}^m b_{j\nu}(\theta,\cdot)\frac{\partial b_{i\nu}}{\partial x_j}(\theta,\cdot)-a_i(\theta,\cdot)\bigg)(S_\theta)\,d\theta-\sum\limits_{\nu=1}^m\int_t^Tb_{i\nu}(\theta,S_\theta)\bullet dW^{(\nu)}_\theta,\] and, via time-reversal, \[\widehat{S}^{(i)}_t=\widehat{S}^{(i)}_0+\int_0^t\bigg(\sum\limits_{j=1}^d\sum\limits_{\nu=1}^m b_{j\nu}(T-\theta,\cdot)\frac{\partial b_{i\nu}}{\partial x_j}(T-\theta,\cdot)-a_i(T-\theta,\cdot)\bigg)(\widehat{S}_\theta)\,d\theta+\sum\limits_{\nu=1}^m\int_0^tb_{i\nu}(T-\theta,\widehat{S}_\theta)\, d\widetilde{W}^{(\nu)}_\theta.\] In light of (12 ), we can rewrite the time-reversed process \((\widehat{S}_t)_{0\leq t\leq T}\) as \[\begin{align} \widehat{S}^{(i)}_t&=\widehat{S}^{(i)}_0+\sum\limits_{\nu=1}^m\int_0^tb_{i\nu}(T-\theta,\widehat{S}_\theta)\,dB^{(\nu)}_\theta+\int_0^t\bigg(\sum\limits_{j=1}^d\sum\limits_{\nu=1}^m b_{j\nu}(T-\theta,\cdot)\frac{\partial b_{i\nu}}{\partial x_j}(T-\theta,\cdot)\bigg)(\widehat{S}_\theta)\,d\theta\\ &\quad+\int_0^t\bigg(\sum\limits_{\nu=1}^m\rho^{-1}_{T-\theta}(\cdot)b_{i\nu}(T-\theta,\cdot)\sum\limits_{j=1}^d\frac{\partial}{\partial x_j}\big(\rho_{T-\theta}(\cdot)b_{j\nu}(T-\theta,\cdot)\big)-a_i(T-\theta,\cdot)\bigg)(\widehat{S}_\theta)\,d\theta, \end{align}\] which provides a semimartingale decomposition for the \(\widehat{\mathbb{F}}\)-adapted process \((\widehat {S}_t)_{0\leq t\leq T}\).

The strong Markovian property follows from the existence (by assumption) of a pathwise unique strong solution of (10 ), see Le Gall Le?, Gall?. Hence, the time-reversal \((\widehat {S}_t)_{0\leq t\leq T}\) is an \(\widehat{\mathbb{F}}\)-diffusion process. Calculating the drift coefficients gives us (16 ). ◻
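
For the reader's convenience, the drift computation behind (16 ) can be sketched as follows. Here we abbreviate \(a_i=a_i(T-\theta,\cdot)\), \(b_{i\nu}=b_{i\nu}(T-\theta,\cdot)\), \(\sigma_{ij}=\sigma_{ij}(T-\theta,\cdot)\), \(\rho=\rho_{T-\theta}(\cdot)\), and we assume \(\rho>0\) so that \(\log\rho\) is well defined. Expanding the derivative of the product \(\rho\,b_{j\nu}\) and collecting the finite variation terms in the decomposition of \(\widehat{S}^{(i)}\) obtained from (12 ) gives \[\begin{align} &\sum\limits_{j=1}^d\sum\limits_{\nu=1}^m b_{j\nu}\frac{\partial b_{i\nu}}{\partial x_j}+\sum\limits_{\nu=1}^m\rho^{-1}b_{i\nu}\sum\limits_{j=1}^d\frac{\partial}{\partial x_j}\big(\rho\,b_{j\nu}\big)-a_i\\ &=\sum\limits_{j=1}^d\sum\limits_{\nu=1}^m\Big(b_{j\nu}\frac{\partial b_{i\nu}}{\partial x_j}+b_{i\nu}\frac{\partial b_{j\nu}}{\partial x_j}\Big)+\sum\limits_{j=1}^d\Big(\sum\limits_{\nu=1}^mb_{i\nu}b_{j\nu}\Big)\frac{\partial}{\partial x_j}\log\rho-a_i\\ &=\sum\limits_{j=1}^d\frac{\partial\sigma_{ij}}{\partial x_j}+\sum\limits_{j=1}^d\sigma_{ij}\frac{\partial}{\partial x_j}\log\rho-a_i, \end{align}\] which is \(\widehat{a}_i(T-\theta,\cdot)\) as defined in Lemma 4.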

Under sufficient regularity conditions, Lemma 4 shows that the time-reversal of a diffusion process remains a diffusion, adapted to a suitable backward filtration. This lemma paves the way for investigating many relevant processes backward in time, and it also provides their semimartingale decompositions.
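
A simple example, included here only as an illustration under the stated assumptions, is the one-dimensional case \(d=m=1\) with \(a\equiv0\), \(b\equiv1\), and a Gaussian initial condition \(\xi\sim\mathcal{N}(0,s^2)\) for some \(s>0\), so that \(S_t=\xi+W_t\) and \(\rho_t\) is the density of \(\mathcal{N}(0,s^2+t)\). Then \(\sigma\equiv1\), \(\frac{\partial}{\partial x}\log\rho_t(x)=-x/(s^2+t)\), and (12 ) together with (16 ) reduce to \[B_t=\widetilde{W}_t+\int_0^t\frac{\widehat{S}_\theta}{s^2+T-\theta}\,d\theta,\qquad\widehat{S}_t=\widehat{S}_0-\int_0^t\frac{\widehat{S}_\theta}{s^2+T-\theta}\,d\theta+B_t,\quad\text{for all}\quad0\leq t\leq T.\] The two identities are consistent with \(\widehat{S}_t-\widehat{S}_0=W_{T-t}-W_T=\widetilde{W}_t\). The mean-reverting drift has the same form as that of a Brownian bridge, and it reflects the fact that the reversed path must arrive at the initial law \(\mathcal{N}(0,s^2)\) at the backward time \(t=T\).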

4 Semimartingale decomposition of \(\ell^\beta\) and \(\mathcal{R}^\beta\)↩︎

The aim of this section is to provide a semimartingale decomposition of the likelihood ratio process \(\ell^\beta\) (5 ) and of its logarithm, the relative entropy process \(\mathcal{R}^\beta\) (6 ), both running backward in time. Later, in Lemma 8, it is verified that the local martingale part in the semimartingale decomposition of \((\mathcal{R}^\beta_{T-t}(X_{T-t}))_{0\leq t\leq T}\) is in fact a square integrable martingale, adapted to a suitable backward filtration. This martingale property allows us to take the \(\mathbb{P}^\beta\)-expectation of the backward-time \(\mathcal{R}^\beta\) without invoking a localization sequence of stopping times, and hence to retrieve the relative entropy.

After presenting the general principles of time-reversal of diffusions in Section 3, let us turn our attention to the Itô-Langevin dynamics (1 ). The semimartingale decomposition of \(\mathcal{R}^\beta\), and hence also of \(\ell^\beta\), requires some knowledge of the differential structure of the likelihood ratio \(\ell^\beta_t(x)=p^\beta_t(x)e^{2\psi(x)}\), \((t,x)\in\mathbb{R}_+\times\mathbb{R}^d\). First, we write down the partial differential equation satisfied by the density function \(p^\beta_t(x)\), \((t,x)\in\mathbb{R}_+\times\mathbb{R}^d\). This partial differential equation is called the Fokker–Planck equation, and it is intrinsically connected to the Itô-Langevin dynamics, as stated in Section 2.

4.1 Fokker–Planck equation↩︎

In Section 2, we denoted by \(\mathbb{P}^\beta\) the distribution on \(\mathcal{C}=\mathcal{C}(\mathbb{R}_+;\mathbb{R}^d)\) of the strong solution process \((X_t)_{t\geq0}\) of the Itô-Langevin stochastic differential equation (1 ). At each \(t\geq0\), we use \(P^\beta_t\) to denote the law of the marginal \(X_t\). Each \(P^\beta_t\) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb{R}^d\) and thus induces a probability density \(p^\beta_t(\cdot):\mathbb{R}^d\to\mathbb{R}_+\). The Itô–Langevin dynamics is intrinsically connected to the Fokker–Planck equation in that \(p^\beta_t(\cdot)\) satisfies the partial differential equation [32], \[\label{perturbed32Fokker-Plank} \frac{\partial p^\beta_t}{\partial t}(x)=\sum\limits_{j=1}^d\frac{\partial}{\partial x_j}\bigg(\big(\frac{\partial\psi}{\partial x_j}(x)+\beta^{(j)}(x)I_{\{t>t_0\}}\big)p^\beta_t(x)\bigg)+\frac{1}{2}\sum\limits_{j=1}^d\frac{\partial^2p^\beta_t}{\partial x_j^2}(x)\quad\text{with}\quad p^\beta_{0}(\cdot)=p_{0}(\cdot),\tag{18}\] for all \((t,x)\in\mathbb{R}_+\times\mathbb{R}^d\). Here \(p^\beta_0(\cdot)=p_0(\cdot)\) is the density function, against the Lebesgue measure on \(\mathbb{R}^d\), of the initial distribution \(P_0\) of \(X_0\) in (1 ). The existence and uniqueness of a solution to (18 ) is guaranteed, see [63]. The solution \(p^\beta_t(\cdot)\) conserves its \(L^1(\mathbb{R}^d)\) norm [18], which means that \[\int_{\mathbb{R}^d}p^\beta_t(x)\,dx\equiv1,\quad\text{for all}\quad t\geq0.\] This conservation principle confirms that \((p^\beta_t(\cdot))_{t\geq0}\) is indeed a family of probability densities on \(\mathbb{R}^d\).

An inspection of (18 ) tells us that the time-evolution of \((P^\beta_t)_{t\geq0}\), or equivalently of \((p^\beta_t(\cdot))_{t\geq0}\), is governed by the real-valued function \(\psi(\cdot)\) when \(0\leq t\leq t_0\), and additionally by the \(\mathbb{R}^d\)-valued perturbation \(\beta(\cdot)\) when \(t>t_0\). If the perturbation is switched off, i.e. \(\beta(\cdot)\) vanishes, the family of probability measures generated from (18 ) is denoted by \((P^0_t)_{t\geq0}\) with their densities denoted by \((p^0_t(\cdot))_{t\geq0}\). Here, the superscript zero simply indicates the case of vanishing perturbation.

To write down the relative entropy (7 ), we have introduced a \(\sigma\)-finite reference measure \(Q\) on the Borel sets of \(\mathbb{R}^d\). This reference measure \(Q\) is defined via its density function \(q(\cdot)=\exp(-2\psi(\cdot)):\mathbb{R}^d\to\mathbb{R}_+\) against the Lebesgue measure on \(\mathbb{R}^d\). In contrast to the densities \((p^\beta_t(\cdot))_{t\geq0}\), the density \(q(\cdot)\) is time-invariant and solves the stationary version of the Fokker–Planck equation [32], \[\label{Fokker-Plank32for32Q} \sum\limits_{j=1}^d\frac{\partial}{\partial x_j}\big(\frac{\partial\psi}{\partial x_j}(\cdot)q(\cdot)\big)(x)+\frac{1}{2}\sum\limits_{j=1}^d\frac{\partial^2q}{\partial x_j^2}(x)=0,\quad\text{for all}\quad x\in\mathbb{R}^d.\tag{19}\]
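
The identity (19 ) can be checked directly; the short computation below uses only \(q=e^{-2\psi}\) and assumes that \(\psi\) is twice continuously differentiable. Since \(\frac{\partial q}{\partial x_j}=-2\frac{\partial\psi}{\partial x_j}q\), we have \[\frac{1}{2}\sum\limits_{j=1}^d\frac{\partial^2q}{\partial x_j^2}(x)=\frac{1}{2}\sum\limits_{j=1}^d\frac{\partial}{\partial x_j}\big(-2\frac{\partial\psi}{\partial x_j}(\cdot)q(\cdot)\big)(x)=-\sum\limits_{j=1}^d\frac{\partial}{\partial x_j}\big(\frac{\partial\psi}{\partial x_j}(\cdot)q(\cdot)\big)(x),\quad\text{for all}\quad x\in\mathbb{R}^d,\] so the two sums in (19 ) cancel. In other words, the flux \(\frac{\partial\psi}{\partial x_j}q+\frac{1}{2}\frac{\partial q}{\partial x_j}\) vanishes in every coordinate direction.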

Some authors refer to the equations (18 ) and (19 ) as the forward Kolmogorov equations; the two names denote the same equations. Recall that we have defined the relative entropy process \((\mathcal{R}^\beta_t(X_t))_{0\leq t\leq T}=(\log\ell^\beta_t(X_t))_{0\leq t\leq T}\) (6 ) via the function \(\ell^\beta_t(x)=p^\beta_t(x)/q(x)\). Therefore, to understand the semimartingale decomposition of the processes \(\ell^\beta\) and \(\mathcal{R}^\beta\), either forward-time or backward-time, we need to characterize the differential structure of the densities \((p^\beta_t(\cdot))_{0\leq t\leq T}\) and \(q(\cdot)\) as in (18 ) and (19 ).

4.2 Filtration and time-displacement↩︎

At this stage, it becomes important to specify the relevant filtrations. We denote by \((\mathcal{F}_t)_{t\geq0}\) the smallest forward continuous filtration to which the Brownian motion \((W^\beta_t)_{t\geq0}\) and the solution process \((X_t)_{t\geq0}\) of (1 ) are adapted. That is, \[\mathcal{F}_t\mathrel{\vcenter{:}}=\sigma(X_\theta,W^\beta_\theta:\,0\leq\theta\leq t),\quad\text{for all}\quad t\geq0\] modulo \(\mathbb{P}^\beta\)-augmentation. Likewise, given the compact time interval \([0,T]\), we denote by \((\mathcal{G}_{T-t})_{0\leq t\leq T}\) the backward continuous filtration generated by the backward processes \((W^\beta_{T-t})_{0\leq t\leq T}\) and \((X_{T-t})_{0\leq t\leq T}\). That is, \[\mathcal{G}_{T-t}\mathrel{\vcenter{:}}=\sigma(X_{T-\theta},W^\beta_{T-\theta}:\,0\leq\theta\leq t),\quad\text{for all}\quad 0\leq t\leq T.\] Notice that the notation for the forward-time and backward-time filtrations is similar to that of Section 3. We hope this convention leaves no ambiguity, because the two filtrations \((\mathcal{F}_t)_{0\leq t\leq T}\) and \((\mathcal{G}_{T-t})_{0\leq t\leq T}\) are constructed in almost the same way as the filtrations in Section 3, except that the filtrations here are generated by the solution process \((X_t)_{t\geq0}\) rather than by a general diffusion process \((S_t)_{t\geq0}\).

Even though \((W^\beta_t)_{t\geq0}\) is a \((\mathcal{F}_t)_{t\geq0}\)-adapted Brownian motion running forward in time, its time-reversal \((W^\beta_{T-t})_{0\leq t\leq T}\) is not necessarily a backward-time Brownian motion adapted to \((\mathcal{G}_{T-t})_{0\leq t\leq T}\). It turns out that this time-reversal process contains a non-trivial finite variation part in its semimartingale decomposition. And to construct a true \((\mathcal{G}_{T-t})_{0\leq t\leq T}\)-Brownian motion backward in time, we need to subtract this finite variation process.

Lemma 5. Let \((W^\beta_t)_{t\geq0}\) denote the \(d\)-dimensional Brownian motion driving the Itô-Langevin dynamics (1 ). The backward-time process \[\overline{W}^{\mathbb{P}^\beta}_{T-t}\mathrel{\vcenter{:}}= W^\beta_{T-t}-W^\beta_T-\int_0^t\nabla\log p_{T-\theta}^\beta(X_{T-\theta})\,d\theta,\quad\text{for all}\quad0\leq t\leq T\] is a Brownian motion with respect to the backward filtration \((\mathcal{G}_{T-t})_{0\leq t\leq T}\) under \(\mathbb{P}^\beta\). Furthermore, the time-reversed process \((X_{T-t})_{0\leq t\leq T}\) is a \((\mathcal{G}_{T-t})_{0\leq t\leq T}\)-diffusion process with its semimartingale decomposition given by \[\label{semimartingale32decomposition32of32X} dX_{T-t}=\nabla\log p_{T-t}^\beta(X_{T-t})\,dt+(\nabla\psi+\beta I_{\{0\leq t<T-t_0 \}})(X_{T-t})\,dt+d\overline{W}^{\mathbb{P}^\beta}_{T-t},\quad\text{for all}\quad0\leq t\leq T,\tag{20}\] with respect to the backward filtration \((\mathcal{G}_{T-t})_{0\leq t\leq T}\).

Proof. To verify that \((\overline{W}^{\mathbb{P}^\beta}_{T-t})_t\) is a \((\mathcal{G}_{T-t})_{0\leq t\leq T}\)-Brownian motion, we use Lemma 3. And to verify that \((X_{T-t})_{0\leq t\leq T}\) is a \((\mathcal{G}_{T-t})_{0\leq t\leq T}\)-diffusion process with its semimartingale decomposition (20 ), we use Lemma 4. And then the assertion is verified. ◻
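
To make the appeal to Lemmas 3 and 4 explicit, the following sketch records the substitution, assuming (consistently with (18 )) that (1 ) has drift \(-(\nabla\psi+\beta I_{\{t>t_0\}})\) and identity dispersion. With \(m=d\), \(b_{i\nu}=\delta_{i\nu}\), \(\sigma_{ij}=\delta_{ij}\), \(a=-(\nabla\psi+\beta I_{\{t>t_0\}})\), and \(\rho_t=p^\beta_t\), the correction term in (12 ) and the backward drift of Lemma 4 become \[\rho_t^{-1}\sum\limits_{i=1}^d\frac{\partial}{\partial x_i}\big(\rho_t\,\delta_{i\nu}\big)=\frac{\partial}{\partial x_\nu}\log p^\beta_t,\qquad\widehat{a}(t,x)=\nabla\log p^\beta_t(x)+\big(\nabla\psi+\beta I_{\{t>t_0\}}\big)(x).\] Evaluating at the forward time \(T-t\) turns the indicator \(I_{\{T-t>t_0\}}\) into \(I_{\{0\leq t<T-t_0\}}\), which gives the drift in (20 ).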

Having selected the suitable backward filtration and Brownian motions, we look at the semimartingale decomposition of the likelihood ratio process \((\ell^\beta_{T-t}(X_{T-t}))_t\) and of the relative entropy process \((\mathcal{R}^\beta_{T-t}(X_{T-t}))_t\), whose pathwise behavior is the essence to the trajectorial formulation of relative entropy dissipation.

4.3 Semimartingale decomposition↩︎

Looking back to the forward-time coordinate process \((X_t)_{t\geq0}\) on \(\mathcal{C}\) characterized by (1 ), we aim for the semimartingale decomposition of the processes \(\ell^\beta\) and \(\mathcal{R}^\beta\) running under time-reversal. This will be the first step to understand the trajectorial formulation of relative entropy dissipation in Section 5.

Lemma 6. The backward-time likelihood ratio process \((\ell^\beta_{T-t}(X_{T-t}))_{0\leq t\leq T}\) is a semimartingale adapted to \((\mathcal{G}_{T-t})_{0\leq t\leq T}\) with decomposition \[\begin{align}\label{semimartingale32decomposition32of32l} d\ell^\beta_{T-t}(X_{T-t})&=\big(2\beta\cdot\nabla\psi-\sum\limits_{1\leq i\leq d}\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{T-t})\ell^\beta_{T-t}(X_{T-t})I_{\{0\leq t<T-t_0\}}\,dt\\ &\quad+\frac{\big\lVert\nabla\ell^\beta_{T-t}(X_{T-t})\big\rVert^2}{\ell^\beta_{T-t}(X_{T-t})}\,dt+\nabla\ell^\beta_{T-t}(X_{T-t})\,d\overline{W}^{\mathbb{P}^\beta}_{T-t},\quad\text{for all}\quad0\leq t\leq T. \end{align}\tag{21}\]

Proof. Since \(\ell^\beta_{T-t}=dP^\beta_{T-t}/dQ\), we know \(\ell^\beta_{T-t}(\cdot)=p^\beta_{T-t}(\cdot)\exp(2\psi(\cdot))\). From (18 ), (19 ), we can compute that \[\frac{\partial\ell^\beta_{T-t}}{\partial t}(x)=-\frac{1}{2}\Delta\ell^\beta_{T-t}(x)+\nabla\ell^\beta_{T-t}\cdot\big(\nabla\psi-\beta I_{\{0\leq t<T-t_0\}}\big)(x)+\big(2\beta\cdot\nabla\psi-\sum\limits_{1\leq i\leq d}\frac{\partial\beta^{(i)}}{\partial x_i}\big)(x)\ell^\beta_{T-t}(x)I_{\{0\leq t<T-t_0\}}.\] Applying (20 ) and invoking the Itô formula, (21 ) follows, and the assertion is verified. ◻
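
The Itô step can be sketched as follows, writing \(\ell=\ell^\beta_{T-t}\), \(p=p^\beta_{T-t}\), \(I=I_{\{0\leq t<T-t_0\}}\), and evaluating every function at \(X_{T-t}\); this is our own bookkeeping of the computation, under the assumption that \(\ell\) is smooth enough for the Itô formula to apply. The identity dispersion in (20 ) gives \(d\langle X^{(i)}_{T-\cdot},X^{(j)}_{T-\cdot}\rangle_t=\delta_{ij}\,dt\), hence \[\begin{align} d\ell&=\frac{\partial\ell}{\partial t}\,dt+\nabla\ell\cdot dX_{T-t}+\frac{1}{2}\Delta\ell\,dt\\ &=\big(2\beta\cdot\nabla\psi-\sum\limits_{1\leq i\leq d}\frac{\partial\beta^{(i)}}{\partial x_i}\big)\ell\,I\,dt+\nabla\ell\cdot\big(2\nabla\psi+\nabla\log p\big)\,dt+\nabla\ell\cdot d\overline{W}^{\mathbb{P}^\beta}_{T-t}. \end{align}\] The terms \(\pm\frac{1}{2}\Delta\ell\) and \(\pm\nabla\ell\cdot\beta I\) cancel. Since \(\log\ell=\log p+2\psi\), the remaining drift equals \(\nabla\ell\cdot\nabla\log\ell=\lVert\nabla\ell\rVert^2/\ell\).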

Lemma 6 gives us a \((\mathcal{G}_{T-t})_{0\leq t\leq T}\) semimartingale decomposition of the backward-time likelihood ratio process \((\ell^\beta_{T-t})_{0\leq t\leq T}\). Since taking its logarithm produces the relative entropy process \((\mathcal{R}^\beta_{T-t})_{0\leq t\leq T}\), we write the following derivation.

Lemma 7. The backward-time relative entropy process \((\mathcal{R}^\beta_{T-t}(X_{T-t}))_{0\leq t\leq T}\) is a semimartingale adapted to \((\mathcal{G}_{T-t})_{0\leq t\leq T}\) with decomposition \[\begin{align}\label{semimartingale32decomposition32of32R} d\mathcal{R}^\beta_{T-t}(X_{T-t})&=\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{T-t})I_{\{0\leq t<T-t_0\}}\,dt\\ &\quad+\frac{1}{2}\big\lVert\nabla\mathcal{R}^\beta_{T-t}(X_{T-t})\big\rVert^2\,dt+\nabla\mathcal{R}^\beta_{T-t}(X_{T-t})\,d\overline{W}^{\mathbb{P}^\beta}_{T-t},\quad\text{for all}\quad0\leq t\leq T. \end{align}\tag{22}\]

Proof. Remember that \(\mathcal{R}^\beta_{T-t}(X_{T-t})=\log\ell^\beta_{T-t}(X_{T-t})\). Applying (21 ) and invoking the Itô formula, (22 ) follows, and the assertion is verified. ◻
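
In more detail, and again as a sketch with the abbreviations \(\ell=\ell^\beta_{T-t}(X_{T-t})\) and \(\mathcal{R}=\mathcal{R}^\beta_{T-t}(X_{T-t})\), the Itô formula for the logarithm reads \[d\log\ell=\frac{d\ell}{\ell}-\frac{1}{2}\frac{d\langle\ell\rangle_t}{\ell^2},\qquad d\langle\ell\rangle_t=\lVert\nabla\ell\rVert^2\,dt.\] Because \(\nabla\mathcal{R}=\nabla\ell/\ell\), the martingale part of \(d\ell/\ell\) is \(\nabla\mathcal{R}\,d\overline{W}^{\mathbb{P}^\beta}_{T-t}\), and the Itô correction \(-\frac{1}{2}\lVert\nabla\mathcal{R}\rVert^2\,dt\) combines with the drift contribution \(\lVert\nabla\ell\rVert^2/\ell^2\,dt=\lVert\nabla\mathcal{R}\rVert^2\,dt\) of \(d\ell/\ell\) to produce the factor \(\frac{1}{2}\) in (22 ). The perturbation term loses its factor \(\ell\) upon division and is otherwise unchanged.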

The semimartingale decomposition (22 ) splits \((\mathcal{R}^\beta_{T-t}(X_{T-t}))_{0\leq t\leq T}\) into a sum of a local martingale and a finite variation process, adapted to the filtration \((\mathcal{G}_{T-t})_{0\leq t\leq T}\). In the following, we will verify that the local martingale part is actually a martingale. This property allows us to take \(\mathbb{P}^\beta\)-expectation to \(\mathcal{R}^\beta\) and henceforth cancel the martingale part without employing a localization sequence of stopping times. This scheme yields an expression of the relative entropy (7 ) using Fisher information (8 ).
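
A one-dimensional Gaussian example may help to calibrate the constants. It is our own illustration and assumes that (1 ) reads \(dX_t=-\nabla\psi(X_t)\,dt+dW_t\) in the unperturbed case, that (7 ) is \(\mathbb{H}[P|Q]=\mathbb{E}^{P}[\log(dP/dQ)]\), and that (8 ) is \(\mathbb{I}[P|Q]=\mathbb{E}^{P}[\lVert\nabla\log(dP/dQ)\rVert^2]\). Take \(d=1\) and \(\psi(x)=x^2/2\), so that \(q(x)=e^{-x^2}\), and let \(P^0_0=\mathcal{N}(0,v_0)\) with \(v_0>0\). Then \(P^0_t=\mathcal{N}(0,v_t)\) with \(\frac{d}{dt}v_t=1-2v_t\), and \[\mathbb{H}\big[P^0_t|Q\big]=v_t-\frac{1}{2}\log(2\pi e\,v_t),\qquad\mathbb{I}\big[P^0_t|Q\big]=\frac{(2v_t-1)^2}{v_t},\qquad\frac{d}{dt}\mathbb{H}\big[P^0_t|Q\big]=\Big(1-\frac{1}{2v_t}\Big)(1-2v_t)=-\frac{1}{2}\mathbb{I}\big[P^0_t|Q\big].\] Integrating over \([t,T]\) gives \(\mathbb{H}[P^0_t|Q]-\mathbb{H}[P^0_T|Q]=\frac{1}{2}\int_t^T\mathbb{I}[P^0_\theta|Q]\,d\theta\), which is the form of the identity obtained by taking the expectation of (22 ) when \(\beta\) vanishes.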

For clarity of exposition, we introduce some new notation. Denote the backward-time cumulative Fisher information process by \[\label{cumulative32Fisher32information32process} \mathcal{F}^\beta_{T-t}\mathrel{\vcenter{:}}=\int_0^t\Big(\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{T-\theta})I_{\{0\leq \theta<T-t_0\}}+\frac{1}{2}\big\lVert\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\Big)\,d\theta,\tag{23}\] for all \(0\leq t\leq T\), which is of finite variation and adapted to the filtration \((\mathcal{G}_{T-t})_{0\leq t\leq T}\). Similarly, we denote the backward-time local martingale by, \[\label{perturbed32square-integrable32martingale} \mathcal{M}^\beta_{T-t}\mathrel{\vcenter{:}}=\int_0^t\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\,d\overline{W}^{\mathbb{P}^\beta}_{T-\theta},\quad\text{for all}\quad0\leq t\leq T.\tag{24}\] Then the semimartingale decomposition of \((\mathcal{R}^\beta_{T-t})_{0\leq t\leq T}\) can be written as \(\mathcal{R}^\beta_{T-t}-\mathcal{R}^\beta_{T}=\mathcal{M}^\beta_{T-t}+\mathcal{F}^\beta_{T-t}\), with \(0\leq t\leq T.\) It is remarkable that in the absence of perturbation, taking the \(\mathbb{P}^0\)-expectation of the cumulative Fisher information process (23 ) gives us the cumulative integral of the Fisher information (8 ) modulo a multiplicative factor \(\frac{1}{2}\), i.e.  \[\mathbb{E}^{\mathbb{P}^0}\big[\mathcal{F}^0_{t}\big]=\frac{1}{2}\int_t^T\mathbb{E}^{\mathbb{P}^0}\big[\big\lVert\nabla\mathcal{R}^0_{\theta}(X_{\theta})\big\rVert^2\big]\,d\theta=\frac{1}{2}\int_t^T\mathbb{I}\big[P^0_{\theta}|Q\big]\,d\theta,\quad\text{for all}\quad0\leq t\leq T.\] Now, we verify that \((\mathcal{M}^\beta_{T-t})_{0\leq t\leq T}\) is a square integrable martingale.

Lemma 8. The backward-time continuous local martingale \((\mathcal{M}^\beta_{T-t})_{0\leq t\leq T}\) is a square integrable martingale adapted to the filtration \((\mathcal{G}^\beta_{T-t})_{0\leq t\leq T}\).

Proof. It is sufficient to show that \((\mathcal{M}^\beta_{T-t})_{0\leq t\leq T}\) is bounded in \(L^2(\mathbb{P}^\beta)\). Since we have assumed the continuity of \(t\mapsto\nabla\log\ell^\beta_t(x)\) on \([0,T]\), for any fixed \(x\in\mathbb{R}^d\), and by the continuity of the sample paths of \((X_t)_{0\leq t\leq T}\), we observe that \[\int_0^{T-\epsilon}\big\lVert\nabla\log\ell^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta<\infty,\quad\mathbb{P}^\beta\text{-a.s.}\quad\text{for all}\quad0<\epsilon<T.\] On this account, define the sequence of stopping times by \[\tau^\beta_k\mathrel{\vcenter{:}}= T\wedge\inf\big\{t\geq0:\,\int_0^t\big\lVert\nabla\log\ell^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\geq k\big\},\quad\text{for all}\quad k\in\mathbb{N}.\] Then \((\tau^\beta_k)_{k\in\mathbb{N}}\) is non-decreasing and converges to \(T\), \(\mathbb{P}^\beta\)-a.s. Hence, \((\tau^\beta_k)_{k\in\mathbb{N}}\) is a localization sequence for the local martingale \((\mathcal{M}^\beta_{T-t})_{0\leq t\leq T}\). The stopped process \((\mathcal{M}^\beta_{T-\tau^\beta_k\wedge t})_{0\leq t\leq T}\) is therefore an \(L^2(\mathbb{P}^\beta)\)-bounded martingale adapted to \((\mathcal{G}^\beta_{T-t})_{0\leq t\leq T}\), for each \(k\in\mathbb{N}\).

Taking \(\mathbb{P}^\beta\)-expectation to the process \((\mathcal{R}^\beta_{T-t})_{0\leq t\leq T}\) at the stopping time \(\tau^\beta_k\), we observe that \[\frac{1}{2}\mathbb{E}^{\mathbb{P}^\beta}\big[\int_0^{\tau^\beta_k}\big\lVert\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]+\mathbb{E}^{\mathbb{P}^\beta}\big[\int_0^{\tau^\beta_k}\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{T-\theta})\,d\theta\big]=\mathbb{H}\big[P^\beta_{T-\tau^\beta_k}|Q\big]-\mathbb{H}\big[P^\beta_{T}|Q\big],\] for each \(k\in\mathbb{N}\). Since the perturbation field \(\beta(\cdot)=\nabla B(\cdot)\) is of class \(\mathcal{C}^\infty(\mathbb{R}^d;\mathbb{R}^d)\) with compact support, \[C_1\mathrel{\vcenter{:}}=\mathbb{E}^{\mathbb{P}^\beta}\big[\int_0^T\big|2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big|(X_{T-\theta})\,d\theta\big]<\infty.\] Henceforth, \[\frac{1}{2}\mathbb{E}^{\mathbb{P}^\beta}\big[\int_0^{\tau^\beta_k}\big\lVert\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]\leq C_1+\mathbb{H}\big[P^\beta_{T-\tau^\beta_k}|Q\big]-\mathbb{H}\big[P^\beta_{T}|Q\big].\]

Only in this proof, for all \(0\leq t\leq T\), we denote by \(Q^\beta_t\) the \(\sigma\)-finite Borel measure on \(\mathbb{R}^d\) with density \(\exp(-2(\psi+BI_{\{t>t_0\}})(\cdot))\) against the Lebesgue measure on \(\mathbb{R}^d\). A variant of the argument for Lemma 2 implies \[\mathbb{H}\big[P^\beta_{T-\tau^\beta_k}|Q^\beta_{T-\tau^\beta_k}\big]\leq\mathbb{H}\big[P^\beta_{0}|Q^\beta_0\big]\quad\text{for all}\quad k\in\mathbb{N}.\] This yields the \(\mathbb{P}^\beta\)-a.s. boundedness of \(k\mapsto\mathbb{H}[P^\beta_{T-\tau^\beta_k}|Q^\beta_{T-\tau^\beta_k}]\). To proceed with an estimate of the terms \(\mathbb{H}[P^\beta_{T-\tau^\beta_k}|Q]\) and \(\mathbb{H}[P^\beta_{T}|Q]\), we observe that \[C_2\mathrel{\vcenter{:}}=2\max\big\{|B(x)|:\,x\in\mathbb{R}^d\big\}<\infty.\] It is immediate that \(\mathbb{H}[P^\beta_{0}|Q^\beta_{0}]\leq\mathbb{H}[P_0|Q]+C_2\) and \[\mathbb{H}\big[P^\beta_{T-\tau^\beta_k}|Q\big]-C_2\leq\mathbb{H}\big[P^\beta_{T-\tau^\beta_k}|Q^\beta_{T-\tau^\beta_k}\big]\leq\mathbb{H}\big[P^\beta_{T-\tau^\beta_k}|Q\big]+C_2,\quad\text{for each}\quad k\in\mathbb{N}.\] In consequence, \[\mathbb{H}\big[P^\beta_{T-\tau^\beta_k}|Q\big]\leq\mathbb{H}\big[P^\beta_{T-\tau^\beta_k}|Q^\beta_{T-\tau^\beta_k}\big]+C_2\leq\mathbb{H}\big[P^\beta_{0}|Q^\beta_{0}\big]+C_2 \leq\mathbb{H}\big[P_0|Q\big]+2C_2\quad\text{for each}\quad k\in\mathbb{N}\] as well as \[\frac{1}{2}\mathbb{E}^{\mathbb{P}^\beta}\big[\int_0^{\tau^\beta_k}\big\lVert\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]\leq C_1+\mathbb{H}\big[P_0|Q\big]+2C_2-\mathbb{H}\big[P^\beta_T|Q\big]<\infty\] for all \(k\in\mathbb{N}\). 
Since \(\tau^\beta_k\to T\) as \(k\to\infty\) \(\mathbb{P}^\beta\)-a.s., the monotone convergence theorem yields \[\label{monotone32convergence} \mathbb{E}^{\mathbb{P}^\beta}\big[\big\langle\mathcal{M}^\beta\big\rangle_0\big]=\mathbb{E}^{\mathbb{P}^\beta}\big[\int_0^T\big\lVert\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]\leq2\big(C_1+\mathbb{H}\big[P_0|Q\big]+2C_2-\mathbb{H}\big[P^\beta_T|Q\big]\big)<\infty.\tag{25}\] Henceforth, the continuous local martingale \((\mathcal{M}^\beta_{T-t})_{0\leq t\leq T}\) is a \((\mathcal{G}^\beta_{T-t})_{0\leq t\leq T}\)-martingale bounded in \(L^2(\mathbb{P}^\beta)\). And the assertion is verified. ◻

The martingale property of \((\mathcal{M}^\beta_{T-t})_{0\leq t\leq T}\) from the semimartingale decomposition of \((\mathcal{R}^\beta_{T-t})_{0\leq t\leq T}\) allows us to transfer our conclusions on the pathwise behavior of \((\mathcal{R}^\beta_{T-t})_{0\leq t\leq T}\) to the quantity \(\mathbb{H}[P^\beta_{T-t}|Q]\) with \(0\leq t\leq T\) by taking \(\mathbb{P}^\beta\)-expectation, without any localization sequence of stopping times. In fact, Lemma 8 guarantees that our trajectorial formulation can retrieve the known results on relative entropy dissipation.

5 Applications to relative entropy dissipation↩︎

This section contains the main results of this expository article, namely, the trajectorial formulation of the relative entropy dissipation. Based on the preliminaries in Sections 3 and 4, our results, for instance Theorems 1 and 2, describe some remarkable features of the pathwise behavior of \(\mathcal{R}^\beta\) under time-reversal. The reason why we look at things backward in time is explained at the end of this section, where the less transparent forward-time approach is compared to our derivations in Section 4.

The trajectorial approach is an advancement towards understanding the random fluctuations of the relative entropy of a complex system [64], [65]. This approach reveals more information from the Itô-Langevin stochastic system than the known classical results. Indeed, taking \(\mathbb{P}^\beta\)-expectation retrieves the dynamics of relative entropy. Let us now focus on this trajectorial interpretation.

5.1 Time-displacement and derivative of \(\mathcal{R}^\beta\)↩︎

The trajectorial interpretation of the dissipation of relative entropy is given in Theorems 1 and 2, where the Itô-Langevin dynamics is placed under a perturbation \(\beta(\cdot):\mathbb{R}^d\to\mathbb{R}^d\) initiated at \(t_0\geq0\). Corollaries 1 and 2 record the scenario in the absence of perturbation. As a prelude, these corollaries will be of independent interest for the understanding of the so-called steepest descent property, to be discussed in Section 6. Our first step is an argument on the regularity control of the ratio between \(\ell^\beta\) and \(\ell^0\).

Lemma 9. Fix the time interval \([0,T]\) with \(T>t_0\). There exist \(C_1,C_2>0\) such that \[\label{bdd4432l32ratio} C_1\leq\frac{\ell^\beta_t(x)}{\ell^0_t(x)}=\frac{p_t^\beta(x)}{p^0_t(x)}\leq C_2,\quad\text{for all}\quad(t,x)\in[0,T]\times\mathbb{R}^d.\tag{26}\]

Proof. In the Itô-Langevin dynamics (1 ), the forward-time coordinate process is denoted by \((X_t)_{0\leq t\leq T}\). We have also denoted by \((W^\beta_t)_{0\leq t\leq T}\) the \(d\)-dimensional \((\mathcal{F}_t)_{0\leq t\leq T}\)-Brownian motion under \(\mathbb{P}^\beta\), and by \((W^0_t)_{0\leq t\leq T}\) the Brownian motion under \(\mathbb{P}^0\). Hence, \[W^0_t-W^0_{t_0}=W^\beta_t-W^\beta_{t_0}-\int_{t_0}^t\beta(X_\theta)\,d\theta,\quad\text{for all}\quad t_0\leq t\leq T.\] By the Girsanov theorem (see Le Gall), the density of \(\mathbb{P}^\beta\) with respect to \(\mathbb{P}^0\) is \[\label{Z32beta:32control321} Z^\beta_t\mathrel{\vcenter{:}}=\frac{d\mathbb{P}^\beta}{d\mathbb{P}^0}\bigg|_{\mathcal{F}_t}=\exp\big(-\int_{t_0}^t\beta(X_\theta)\,dW^0_\theta-\frac{1}{2}\int_{t_0}^t\big\lVert\beta(X_\theta)\big\rVert^2\,d\theta\big),\quad\text{for all}\quad t_0\leq t\leq T.\tag{27}\] Notice that for each \((t,x)\in[t_0,T]\times\mathbb{R}^d\), the ratio \(\ell^\beta_t(x)/\ell^0_t(x)=p^\beta_t(x)/p^0_t(x)\) is the conditional expectation of \(Z^\beta_t\) given \(X_t=x\), i.e. \[\frac{\ell^\beta_t(x)}{\ell^0_t(x)}=\mathbb{E}^{\mathbb{P}^0}\big[Z^\beta_t|X_t=x\big],\quad\text{for all}\quad(t,x)\in[0,T]\times\mathbb{R}^d.\] Therefore, if we manage to uniformly bound the logarithm \((\log Z^\beta_t)_{0\leq t\leq T}\), then the two-sided bound (26 ) on \(\ell^\beta_t(x)/\ell^0_t(x)\) follows. Since the perturbation \(\beta(\cdot):\mathbb{R}^d\to\mathbb{R}^d\) is smooth with compact support, \[\frac{1}{2}\int_{t_0}^t\big\lVert\beta(X_\theta)\big\rVert^2\,d\theta\leq C^\prime,\quad\text{for all}\quad t_0\leq t\leq T,\quad\mathbb{P}^\beta\text{-a.s.}\] for some constant \(C^\prime>0\). Since \(\beta(\cdot)\) is of gradient type, i.e. \(\beta(\cdot)=\nabla B(\cdot)\) with \(B(\cdot)\) of class \(\mathcal{C}^\infty(\mathbb{R}^d;\mathbb{R})\) and compactly supported, the Itô formula gives \[\label{Z32beta:32control322} \int_{t_0}^t\beta(X_\theta)\,dW^0_\theta=B(X_t)-B(X_{t_0})+\frac{1}{2}\int_{t_0}^t\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_\theta)\,d\theta,\quad\text{for all}\quad t_0\leq t\leq T.\tag{28}\] Invoking the compact support of \(B(\cdot)\) again, every term on the right-hand side of (28 ) is bounded, so that \[\big|\int_{t_0}^t\beta(X_\theta)\,dW^0_\theta\big|\leq C^{\prime\prime},\quad\text{for all}\quad t_0\leq t\leq T,\quad\mathbb{P}^\beta\text{-a.s.}\] for some constant \(C^{\prime\prime}>0\). This implies that \(|\log Z^\beta_t|\leq C^\prime+C^{\prime\prime}\) for all \(t_0\leq t\leq T\), \(\mathbb{P}^\beta\)-a.s., whence the assertion is verified. ◻

The framework of our discussion on the pathwise behavior of \(\mathcal{R}^\beta\) is based on a time-reversal perspective. The following Theorems 1 and 2 present the displacement and time-derivative of \(\mathcal{R}^\beta\) backward in time. Lemma 9 provides quantitative control on the deviation effect of the perturbation \(\beta(\cdot)\) based on its smoothness and compact support, which is necessary to our further derivation on the trajectorial dynamics of the relative entropy process \(\mathcal{R}^\beta\).

Theorem 1. Fix the time interval \([0,T]\) with \(T>t_0\). The time-reversal of the relative entropy process \(\mathcal{R}^\beta\) satisfies, for all \(0\leq t< T-t_0\), the following \(\mathbb{P}^\beta\)-a.s. trajectorial relation, \[\begin{align}\label{perturbed32trajectorial32entropy32decay4432time-displacement} \mathbb{E}^{\mathbb{P}^\beta}\big[\mathcal{R}^\beta_{t_0}(X_{t_0})|\mathcal{G}_{T-t}\big]-\mathcal{R}^\beta_{T-t} (X_{T-t})&=\mathbb{E}^{\mathbb{P}^\beta}\big[\int_t^{T-t_0}\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{T-\theta}) \,d\theta|\mathcal{G}_{T-t}\big]\\ &\quad+\frac{1}{2}\mathbb{E}^{\mathbb{P}^\beta}\big[\int_t^{T-t_0}\big\lVert\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2 \,d\theta|\mathcal{G}_{T-t}\big]. \end{align}\tag{29}\]

Proof. By Lemma 8, the local martingale part from the semimartingale decomposition (22 ) of \(\mathcal{R}^\beta\) is a square integrable martingale. Hence, taking \(\mathcal{G}_{T-t}\)-conditional expectation with respect to \(\mathbb{P}^\beta\) on (22 ) cancels this martingale part, and the assertion follows. ◻

The perturbation term in (29 ) clouds the meaning of the phrase dissipation. Nonetheless, this term indicates how the perturbation \(\beta(\cdot)\) interacts with the potential \(\psi(\cdot)\), and hence affects the Itô-Langevin stochastic dynamics (1 ). If we switch off the perturbation \(\beta(\cdot)\), things become more transparent.

Corollary 1. Switching off the perturbation \(\beta(\cdot):\mathbb{R}^d\to\mathbb{R}^d\), for all \(0\leq t< T-t_0\), Theorem 1 reduces to the \(\mathbb{P}^0\)-a.s. trajectorial displacement of relative entropy dissipation, \[\label{unperturbed32trajectorial32entropy32decay4432time-displacement} \mathbb{E}^{\mathbb{P}^0}\big[\mathcal{R}^0_{t_0}(X_{t_0})|\mathcal{G}_{T-t}\big]-\mathcal{R}^0_{T-t} (X_{T-t})=\frac{1}{2}\mathbb{E}^{\mathbb{P}^0}\big[\int_t^{T-t_0}\big\lVert\nabla\mathcal{R}^0_{T-\theta}(X_{T-\theta})\big\rVert^2 \,d\theta|\mathcal{G}_{T-t}\big].\tag{30}\]

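Recall that the backward cumulative Fisher information process is defined in (23 ); in the unperturbed case, its increment is exactly the integral inside the conditional expectation in (30 ). Since the right-hand side of (30 ) is nonnegative, the time-reversed process \((\mathcal{R}^0_{T-t}(X_{T-t}))_{0\leq t\leq T-t_0}\) is a \((\mathcal{G}_{T-t})_{0\leq t\leq T-t_0}\)-submartingale under \(\mathbb{P}^0\). Read forward in time, that is, replacing \(t\) by \(T-t\), the relative entropy process \((\mathcal{R}^0_t(X_t))_{t\geq0}\) therefore decreases in conditional expectation, \(\mathbb{P}^0\)-a.s., conforming to the phrase dissipation.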

On the other hand, the Fisher information quantity (8 ) appears again if we take the time-derivative of the relative entropy displacement (29 ). This can be seen as the trajectorial rate of time-evolution of \(\mathcal{R}^\beta\). However, things become delicate if we explore more on such limiting trajectorial behavior.

Lemma 9 provides sufficient regularity to allow us to take the limit \(t\nearrow T-t_0\) in (29 ) divided by \(T-t_0-t\), which gives us the time-derivative of the process \(\mathcal{R}^\beta\) under time-reversal. This differential structure of the trajectorial relative entropy process will eventually shed light on the known classical results on entropy dissipation which have been displayed in Section 2.

Theorem 2. Fix the time interval \([0,T]\) with \(T>t_0\). The backward relative entropy process \(\mathcal{R}^\beta\) satisfies the limiting trajectorial identity, \[\label{perturbed32trajectorial32entropy32decay4432derivative1} \lim\limits_{t\nearrow T-t_0}\frac{1}{T-t_0-t}\bigg(\mathbb{E}^{\mathbb{P}^\beta}\big[\mathcal{R}^\beta_{t_0}(X_{t_0})|\mathcal{G}_{T-t}\big]-\mathcal{R}^\beta_{T-t} (X_{T-t})\bigg)=\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2+\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{t_0}),\tag{31}\] where the limit in (31 ) is taken in \(L^1(\mathbb{P}^\beta)\).

Proof. In view of Theorem 1 and invoking the dominated convergence theorem, we deduce from the smoothness and compact support of \(\beta(\cdot)\), together with the uniform boundedness of Lemma 9, that \[\lim\limits_{t\nearrow T-t_0}\frac{1}{T-t_0-t}\mathbb{E}^{\mathbb{P}^\beta}\big[\big|\mathcal{R}^\beta_{T-t}(X_{T-t})-\mathcal{R}^\beta_{t_0}(X_{t_0})\big|\big]=\mathbb{E}^{\mathbb{P}^\beta}\big[\big|\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2+\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{t_0})\big|\big].\] The fundamental theorem of calculus for continuous functions implies that \[\lim\limits_{t\nearrow T-t_0}\frac{1}{T-t_0-t}\big(\mathcal{R}^\beta_{T-t}(X_{T-t})-\mathcal{R}^\beta_{t_0}(X_{t_0})\big)=\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2+\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{t_0}),\quad\mathbb{P}^\beta\text{-a.s.}\] The Scheffé lemma [66] says that if a sequence of integrable random variables converges to a limiting random variable almost surely, then the convergence in \(L^1\) is equivalent to the convergence of their \(L^1\) norms. 
Using the Scheffé lemma, \[\label{Scheffé} \lim\limits_{t\nearrow T-t_0}\mathbb{E}^{\mathbb{P}^\beta}\bigg[\bigg|\frac{\mathcal{R}^\beta_{T-t}(X_{T-t})-\mathcal{R}^\beta_{t_0}(X_{t_0})}{T-t_0-t}-\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2-\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{t_0})\bigg|\bigg]=0.\tag{32}\] Moreover, the triangle inequality and the Jensen inequality for conditional expectation yields \[\begin{align} &\mathbb{E}^{\mathbb{P}^\beta}\bigg[\bigg|\frac{\mathbb{E}^{\mathbb{P}^\beta}[\mathcal{R}^\beta_{t_0}(X_{t_0})|\mathcal{G}_{T-t}]-\mathcal{R}^\beta_{T-t} (X_{T-t})}{T-t_0-t}-\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2-\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{t_0})\bigg|\bigg]\\ &\quad\leq\mathscr{A}^\beta_{t_0,T-t}+\mathscr{B}^\beta_{t_0,T-t}+\mathscr{C}^\beta_{t_0,T-t}, \end{align}\] where \[\begin{align} \mathscr{A}^\beta_{t_0,T-t}&\mathrel{\vcenter{:}}=\mathbb{E}^{\mathbb{P}^\beta}\bigg[\bigg|\frac{\mathcal{R}^\beta_{T-t}(X_{T-t})-\mathcal{R}^\beta_{t_0}(X_{t_0})}{T-t_0-t}-\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2-\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{t_0})\bigg|\bigg],\\ \mathscr{B}^\beta_{t_0,T-t}&\mathrel{\vcenter{:}}=\mathbb{E}^{\mathbb{P}^\beta}\bigg[\bigg|\mathbb{E}^{\mathbb{P}^\beta}\big[\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{t_0})|\mathcal{G}_{T-t}\big]-\big(2\beta\cdot\nabla\psi-\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{t_0})\bigg|\bigg],\\ 
\mathscr{C}^\beta_{t_0,T-t}&\mathrel{\vcenter{:}}=\mathbb{E}^{\mathbb{P}^\beta}\bigg[\bigg|\mathbb{E}^{\mathbb{P}^\beta}\big[\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2|\mathcal{G}_{T-t}\big]-\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2\bigg|\bigg]. \end{align}\] By (32 ), we know \(\mathscr{A}^\beta_{t_0,T-t}\to0\) as \(t\nearrow T-t_0\). Using [67] and the right-continuity of \((\mathcal{G}_{T-t})_{0\leq t\leq T}\), we know \(\mathscr{B}^\beta_{t_0,T-t}\to0\) and \(\mathscr{C}^\beta_{t_0,T-t}\to0\) as \(t\nearrow T-t_0\). Combining these facts, the assertion (31 ) is verified. ◻

The limiting identity (31 ) on the time-derivative of the process \(\mathcal{R}^\beta\) indicates that this time-derivative is split into two parts: the \(\mathbb{P}^\beta\)-integrand of the Fisher information and a perturbation term induced by \(\beta(\cdot)\). This expression conforms with the spirit that the Itô-Langevin system is perturbed. And this expression is transparent in the sense that the perturbation term is separate from the Fisher information integrand. In consequence, this perturbation term vanishes when \(\beta\equiv0\).

Corollary 2. Switching off the perturbation \(\beta(\cdot):\mathbb{R}^d\to\mathbb{R}^d\), Theorem 2 reduces to the unperturbed limiting trajectorial identity \[\label{unperturbed32trajectorial32entropy32decay} \lim\limits_{t\nearrow T-t_0}\frac{1}{T-t_0-t}\bigg(\mathbb{E}^{\mathbb{P}^0}\big[\mathcal{R}^0_{t_0}(X_{t_0})|\mathcal{G}_{T-t}\big]-\mathcal{R}^0_{T-t}(X_{T-t})\bigg)=\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2,\tag{33}\] where the limit in (33 ) is taken in \(L^1(\mathbb{P}^0)\).

Theorem 2 and Corollary 2 present time-derivatives of the relative entropy from a trajectorial approach, in the perturbed and unperturbed cases, respectively. In the subsequent paragraphs, we will see how these trajectorial identities retrieve the known classical phenomena on the dissipation of relative entropy.

5.2 Consequences on the classical results↩︎

The identities presented in Theorems 1 and 2 reveal the trajectorial dynamics of the relative entropy process \(\mathcal{R}^\beta\). For computational convenience, these results are written backward in time. Nevertheless, after taking \(\mathbb{P}^\beta\)-expectation, these results conform to the known phenomena on the forward-time relative entropy dissipation. This confirms that the trajectorial identities presented in this expository article carry more information than the classical approach to the relative entropy dissipation.

The following paragraphs show how the trajectorial approach eventually rediscovers the known results on the relative entropy dissipation of the Itô-Langevin stochastic system. To begin with, Lemma 10 gives another deviation control on the perturbation effect of \(\beta(\cdot)\), apart from Lemma 9. Such control is important when we discuss the time-derivatives of relative entropy quantity.

Lemma 10. Fix the time interval \([0,T]\) with \(T>t_0\). There exist \(C_1,C_2>0\) such that \[\label{ineq3214432lem325462} \big|\frac{\ell^\beta_{T-t}(x)}{\ell^0_{T-t}(x)}-1\big|\leq C_1(T-t_0-t)\tag{34}\] as well as \[\label{ineq3224432lem325462} \mathbb{E}^{\mathbb{P}^0}\big[\int_t^{T-t_0}\big\lVert\nabla(\mathcal{R}^\beta_{T-\theta}-\mathcal{R}^0_{T-\theta})(X_{T-\theta})\big\rVert^2\,d\theta\big]\leq C_2(T-t_0-t)^2,\tag{35}\] for all \(0\leq t< T-t_0\) and \(x\in\mathbb{R}^d\).

Proof. Only in this proof, we write \(L^\beta_t(x)\mathrel{\vcenter{:}}=\ell^\beta_t(x)/\ell^0_t(x)\) for all \((t,x)\in[0,T]\times\mathbb{R}^d\). Since \(\log L^\beta_t(X_t)=\mathcal{R}^\beta_t(X_t)-\mathcal{R}^0_t(X_t)\) for all \(t\geq0\), and since \(Z^\beta\) from (27 ) is the density of \(\mathbb{P}^\beta\) with respect to \(\mathbb{P}^0\), we have \[\begin{align} &\mathbb{E}^{\mathbb{P}^0}\big[\int_0^{T-t_0}\big\lVert\nabla\log L^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]\\ &\quad\leq2\mathbb{E}^{\mathbb{P}^0}\big[\int_0^{T-t_0}\big\lVert\nabla\mathcal{R}^0_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]+2\mathbb{E}^{\mathbb{P}^0}\big[\int_0^{T-t_0}\big\lVert\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]\\ &\quad=2\mathbb{E}^{\mathbb{P}^0}\big[\int_0^{T-t_0}\big\lVert\nabla\mathcal{R}^0_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]+2\mathbb{E}^{\mathbb{P}^\beta}\big[\int_0^{T-t_0}\big(Z^\beta_{T-\theta}\big)^{-1}\big\lVert\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]. \end{align}\] Using (27 ) and (28 ), the logarithm of \(Z^\beta\) is uniformly bounded, so there exists \(C^\prime>0\) such that \((Z^\beta_t)^{-1}\leq C^\prime\) for all \(t_0\leq t\leq T\), \(\mathbb{P}^\beta\)-a.s. Hence, \[\begin{align} &\mathbb{E}^{\mathbb{P}^0}\big[\int_0^{T-t_0}\big\lVert\nabla\log L^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]\\ &\quad\leq2\mathbb{E}^{\mathbb{P}^0}\big[\int_0^{T-t_0}\big\lVert\nabla\mathcal{R}^0_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]+2C^\prime\mathbb{E}^{\mathbb{P}^\beta}\big[\int_0^{T-t_0}\big\lVert\nabla\mathcal{R}^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]\\ &\quad\leq2C^{\prime\prime}(1+C^\prime)<\infty, \end{align}\] for some \(C^{\prime\prime}>0\), where the last inequality is due to (25 ). 
Since \(\nabla\log L^\beta_t(x)=\nabla L^\beta_t(x)/L^\beta_t(x)\) for all \(x\in\mathbb{R}^d\), using (26 ) again, we have \[\label{martingale32part32is322nd-integrable} \mathbb{E}^{\mathbb{P}^0}\big[\int_0^{T-t_0}\big\lVert\nabla L^\beta_{T-\theta}(X_{T-\theta})\big\rVert^2\,d\theta\big]\leq C^{\prime\prime\prime}\tag{36}\] for some constant \(C^{\prime\prime\prime}>0\). Using (21 ) for both the perturbed and unperturbed cases and by the Itô formula, we see that the time-reversal \((L^\beta_{T-t}(X_{T-t}))_{0\leq t\leq T-t_0}\) satisfies \[\label{SDE32for32L} dL^\beta_{T-t}(X_{T-t})=\nabla L^\beta_{T-t}(X_{T-t})\,d\overline{W}^{\mathbb{P}^0}_{T-t}-\big(L^\beta_{T-t}\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(X_{T-t})\,dt-\big(L^\beta_{T-t}\,\beta\cdot\nabla\log(p^0_{T-t}L^\beta_{T-t})\big)(X_{T-t})\,dt\tag{37}\] for all \(0\leq t<T-t_0\), with respect to the filtration \((\mathcal{G}_{T-t})_{0\leq t\leq T-t_0}\). In view of (36 ), the martingale part from the semimartingale decomposition of \((L^\beta_{T-t}(X_{T-t}))_{0\leq t\leq T-t_0}\) is \(L^2(\mathbb{P}^0)\)-bounded. Its drift term vanishes when \(X_{T-t}\) exits the compact support of \(\beta(\cdot)\) in \(\mathbb{R}^d\). Hence, the drift term is bounded by \[C^{\prime\prime\prime\prime}\mathrel{\vcenter{:}}=\sup\big\{\big|L^\beta_{T-t}\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big|(x)+\big|L^\beta_{T-t}\big(\beta\cdot\nabla\log(p^0_{T-t}L^\beta_{T-t})\big)\big|(x):\,x\in\mathbb{R}^d,\,0\leq t\leq T-t_0\big\}<\infty.\] Consequently, the processes \[L^\beta_{T-t}(X_{T-t})+C^{\prime\prime\prime\prime}t\qquad\text{and}\qquad L^\beta_{T-t}(X_{T-t})-C^{\prime\prime\prime\prime}t,\quad\text{with}\quad 0\leq t\leq T-t_0\] are respectively a \((\mathcal{G}_{T-t})_{0\leq t\leq T-t_0}\)-submartingale and a \((\mathcal{G}_{T-t})_{0\leq t\leq T-t_0}\)-supermartingale. 
Then, \[\big|\mathbb{E}^{\mathbb{P}^0}\big[L^\beta_{t_0}(X_{t_0})|X_{T-t}=x\big]- L^\beta_{T-t}(x)\big|\leq C^{\prime\prime\prime\prime}(T-t_0-t),\quad\text{for all}\quad(t,x)\in[0,T-t_0]\times\mathbb{R}^d.\] Since \(L^\beta_{t_0}(\cdot)\equiv1\), taking \(C_1\mathrel{\vcenter{:}}= C^{\prime\prime\prime\prime}\), (34 ) is verified. Using (37 ) and the Itô formula, we observe \[\begin{align} &\mathbb{E}^{\mathbb{P}^0}\big[\int_t^{T-t_0}\big\lVert\nabla(\mathcal{R}^\beta_{T-\theta}-\mathcal{R}^0_{T-\theta})(X_{T-\theta})\big\rVert^2\,d\theta\big]\\ &\quad=\mathbb{E}^{\mathbb{P}^0}\big[\log L^\beta_{T-t}(X_{T-t})-L^\beta_{T-t}(X_{T-t})+1+\int_t^{T-t_0}G_{T-\theta}(X_{T-\theta})\,d\theta\big], \end{align}\] where the function \(G_{T-t}(\cdot):\mathbb{R}^d\to\mathbb{R}\) is defined as \[G_{T-t}(x)\mathrel{\vcenter{:}}=(L^\beta_{T-t}(x)-1)\big(\beta\cdot\nabla\log(p^0_{T-t}L^\beta_{T-t})+\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big)(x),\quad\text{for all}\quad(t,x)\in[0,T-t_0]\times\mathbb{R}^d.\] Introduce the constant \[C^{\prime\prime\prime\prime\prime}\mathrel{\vcenter{:}}=\sup\big\{\big|\beta\cdot\nabla\log(p^0_{T-t}L^\beta_{T-t})+\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}\big|(x):\,x\in\mathbb{R}^d,\,0\leq t\leq T-t_0\big\}<\infty.\] Since \(\log y-y+1\leq0\) for all \(y>0\), using (34 ), it is then immediate that \[\mathbb{E}^{\mathbb{P}^0}\big[\int_t^{T-t_0}\big\lVert\nabla(\mathcal{R}^\beta_{T-\theta}-\mathcal{R}^0_{T-\theta})(X_{T-\theta})\big\rVert^2\,d\theta\big]\leq\mathbb{E}^{\mathbb{P}^0}\big[\int_t^{T-t_0}G_{T-\theta}(X_{T-\theta})\,d\theta\big]\leq C^{\prime\prime\prime\prime}C^{\prime\prime\prime\prime\prime}(T-t_0-t)^2.\] Taking \(C_2\mathrel{\vcenter{:}}= C^{\prime\prime\prime\prime}C^{\prime\prime\prime\prime\prime}\), (35 ) is verified. ◻

With Lemmas 9 and 10 as deviation control, we retrieve the known classical results on the relative entropy dissipation from the trajectorial approach in Theorems 1 and 2.

Theorem 3. Fix \(t_0\geq0\) to be the time point when \(\beta(\cdot)\) is initiated. We retrieve the known result on the forward-time dissipation of the relative entropy under perturbation, \[\label{perturbed32classical32entropy32decay4432time-displacement} \mathbb{H}\big[P^\beta_t|Q\big]-\mathbb{H}\big[P^\beta_{t_0}|Q\big]=-\frac{1}{2}\mathbb{E}^{\mathbb{P}^\beta}\big[\int_{t_0}^t\big\lVert\nabla\mathcal{R}^\beta_{\theta}(X_{\theta})\big\rVert^2 \,d\theta\big]+ \mathbb{E}^{\mathbb{P}^\beta}\big[\int_{t_0}^t\big(\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}-2\beta\cdot\nabla\psi\big)(X_{\theta})\,d\theta\big]\tag{38}\] for all \(t\geq t_0\). Furthermore, we also have the time-derivative, \[\label{perturbed32classical32entropy32decay4432derivative} \lim\limits_{t\searrow t_0} \frac{1}{t-t_0} \bigg(\mathbb{H}[P^\beta_t|Q]-\mathbb{H}[P^\beta_{t_0}|Q]\bigg)=-\frac{1}{2}\mathbb{I}[P^0_{t_0}|Q]-\mathbb{E}^{\mathbb{P}^0}\big[\big(\beta\cdot\nabla\mathcal{R}^0_{t_0}\big)(X_{t_0})\big].\tag{39}\]

Proof. Following Theorem 1 and taking \(\mathbb{P}^\beta\)-expectation on (29 ), we get (38 ). Since \(\beta(\cdot)\) is smooth with compact support, the continuity of the sample paths of \((X_t)_{t\geq0}\) gives \[\label{second32term} \lim\limits_{t\searrow t_0} \frac{1}{t-t_0}\mathbb{E}^{\mathbb{P}^\beta}\big[\int_{t_0}^t\big(\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}-2\beta\cdot\nabla\psi\big)(X_{\theta})\,d\theta\big]=\mathbb{E}^{\mathbb{P}^\beta}\big[\big(\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}-2\beta\cdot\nabla\psi\big)(X_{t_0})\big].\tag{40}\] Notice that \(X_{t_0}\) has the same distribution under \(\mathbb{P}^\beta\) and \(\mathbb{P}^0\). Hence, we can replace the \(\mathbb{P}^\beta\)-expectation in (40 ) with the \(\mathbb{P}^0\)-expectation. Moreover, integration by parts yields \[\begin{align} &\mathbb{E}^{\mathbb{P}^0}\big[\big(\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}-2\beta\cdot\nabla\psi\big)(X_{t_0})\big]=\int_{\mathbb{R}^d}\big(\sum\limits_{i=1}^d\frac{\partial\beta^{(i)}}{\partial x_i}-2\beta\cdot\nabla\psi\big)(x)p_{t_0}(x)\,dx\\ &\quad\quad\quad=-\int_{\mathbb{R}^d}\beta\cdot\nabla(\log p_{t_0}+2\psi)(x)p_{t_0}(x)\,dx=-\mathbb{E}^{\mathbb{P}^0}\big[\big(\beta\cdot\nabla\mathcal{R}^0_{t_0}\big)(X_{t_0})\big]. \end{align}\] Applying (34 ) and (35 ), \[\label{first32term} \lim\limits_{t\searrow t_0} \frac{1}{t-t_0}\mathbb{E}^{\mathbb{P}^\beta}\big[\int_{t_0}^t\big\lVert\nabla\mathcal{R}^\beta_{\theta}(X_{\theta})\big\rVert^2 \,d\theta\big]=\lim\limits_{t\searrow t_0} \frac{1}{t-t_0}\mathbb{E}^{\mathbb{P}^0}\big[\int_{t_0}^t\big\lVert\nabla\mathcal{R}^0_{\theta}(X_{\theta})\big\rVert^2 \,d\theta\big]=\mathbb{I}\big[P^0_{t_0}|Q\big],\tag{41}\] where the last equality is due to Tschiderer [35]. Combining (40 ) and (41 ), the assertion (39 ) is verified. ◻
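The integration by parts step in the proof above can be checked numerically in a toy setting. The following is a minimal sketch under assumptions that are not part of the text: dimension \(d=1\), potential \(\psi(x)=x^2/2\), a Gaussian density with hypothetical mean `M` and variance `V` standing in for \(p_{t_0}\), and the standard bump function as \(B\). All function names are illustrative.

```python
import math

# Hypothetical parameters of the Gaussian stand-in for p_{t_0} (not from the text).
M, V = 0.3, 0.8

def bump(x):
    """Smooth potential B with compact support [-1, 1] (illustrative choice)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def beta(x):
    """Perturbation beta = B' for the bump above."""
    if abs(x) >= 1.0:
        return 0.0
    u = 1.0 - x * x
    return -2.0 * x / (u * u) * bump(x)

def dbeta(x):
    """Divergence of beta in one dimension, i.e. beta' = B''."""
    if abs(x) >= 1.0:
        return 0.0
    u = 1.0 - x * x
    return (4.0 * x * x / u**4 - 2.0 / u**2 - 8.0 * x * x / u**3) * bump(x)

def density(x):
    """Gaussian density N(M, V)."""
    return math.exp(-(x - M) ** 2 / (2.0 * V)) / math.sqrt(2.0 * math.pi * V)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def divergence_side():
    """Integral of (beta' - 2 beta psi') p, with psi'(x) = x."""
    return simpson(lambda x: (dbeta(x) - 2.0 * beta(x) * x) * density(x), -1.0, 1.0)

def gradient_side():
    """Minus the integral of beta (log p + 2 psi)' p, i.e. -E[beta grad R]."""
    return -simpson(lambda x: beta(x) * (2.0 * x - (x - M) / V) * density(x), -1.0, 1.0)

if __name__ == "__main__":
    print(divergence_side(), gradient_side())
```

Both quadratures are restricted to \([-1,1]\) because \(\beta\) vanishes outside this interval; the two printed values should agree up to quadrature error.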

When there is no perturbation, the time-derivative of the relative entropy dissipation conforms to the Fisher information modulo a multiplicative factor \(1/2\). This unperturbed scenario will be further discussed in Section 6, when we look at the steepest descent property of the relative entropy, from the Wasserstein space perspective.

Corollary 3. Switching off the perturbation \(\beta(\cdot):\mathbb{R}^d\to\mathbb{R}^d\), Theorem 3 reduces to the classical result of the dissipation of relative entropy, \[\label{unperturbed32classical32entropy32decay4432time-displacement} \mathbb{H}\big[P^0_t|Q\big]-\mathbb{H}\big[P^0_{t_0}|Q\big]=-\frac{1}{2}\mathbb{E}^{\mathbb{P}^0}\big[\int_{t_0}^t\big\lVert\nabla\mathcal{R}^0_{\theta}(X_{\theta})\big\rVert^2 \,d\theta\big],\quad\text{for all}\quad t\geq t_0,\tag{42}\] as well as the classical limiting identity \[\label{unperturbed32classical32entropy32decay4432derivative} \lim\limits_{t\searrow t_0}\frac{1}{t-t_0}\bigg(\mathbb{H}[P^0_t|Q]-\mathbb{H}[P^0_{t_0}|Q]\bigg)=-\frac{1}{2}\mathbb{I}[P^0_{t_0}|Q].\tag{43}\]
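As a sanity check of (42 ) and (43 ), one may look at a case where everything is available in closed form. The sketch below assumes \(d=1\), \(\psi(x)=x^2/2\) and a Gaussian initial law \(N(m_0,v_0)\); these choices are ours and do not come from the text. Then \(X\) is an Ornstein-Uhlenbeck process, \(Q\) has density \(e^{-x^2}\), and \(P^0_t=N(m_t,v_t)\) with \(m_t=m_0e^{-t}\) and \(v_t=\frac{1}{2}+(v_0-\frac{1}{2})e^{-2t}\).

```python
import math

def moments(t, m0, v0):
    """Mean and variance of X_t for dX = -X dt + dW started from N(m0, v0)."""
    m = m0 * math.exp(-t)
    v = 0.5 + (v0 - 0.5) * math.exp(-2.0 * t)
    return m, v

def rel_entropy(t, m0, v0):
    """H[P_t|Q] = E[log p_t(X_t) + X_t**2] for Gaussian p_t and Q(dx) = exp(-x**2) dx."""
    m, v = moments(t, m0, v0)
    return -0.5 * math.log(2.0 * math.pi * v) - 0.5 + v + m * m

def fisher_information(t, m0, v0):
    """I[P_t|Q] = E[((log p_t)' + 2x)**2] evaluated in closed form."""
    m, v = moments(t, m0, v0)
    return 4.0 * m * m + (2.0 * v - 1.0) ** 2 / v

def entropy_derivative(t, m0, v0, h=1e-5):
    """Central finite difference of t -> H[P_t|Q]."""
    return (rel_entropy(t + h, m0, v0) - rel_entropy(t - h, m0, v0)) / (2.0 * h)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

if __name__ == "__main__":
    m0, v0, t0, t1 = 1.0, 2.0, 0.3, 1.5
    # Differential form (43): dH/dt should match -I/2.
    print(entropy_derivative(t0, m0, v0), -0.5 * fisher_information(t0, m0, v0))
    # Integrated form (42): H[P_t1] - H[P_t0] should match -(1/2) * integral of I.
    print(rel_entropy(t1, m0, v0) - rel_entropy(t0, m0, v0),
          -0.5 * simpson(lambda s: fisher_information(s, m0, v0), t0, t1))
```

In this Gaussian example the identity can also be verified by hand, since \(v_t'=-(2v_t-1)\) and \(m_t'=-m_t\) give \(\frac{d}{dt}\mathbb{H}=-\frac{(2v_t-1)^2}{2v_t}-2m_t^2\).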

So far we have demonstrated that the classical consequences on the dissipation of relative entropy can be retrieved from the trajectorial formulation via Theorem 3 and Corollary 3. However, it has not been answered why we choose the indirect, and probably less transparent, approach of time-reversal. In the remainder of this section, it will be shown that the backward-time approach is indeed more convenient than the forward-time approach.

5.3 Defects in the forward-time approach↩︎

We have mentioned in Section 1 that the time-reversal principle is advantageous in its computational convenience. By comparing our derivation to the forward-time approach, we highlight where such computational convenience comes from.

We first compute the stochastic differential equations satisfied by the forward-time processes \((\ell^\beta_t(X_t))_{t\geq0}\) and \((\mathcal{R}^\beta_t(X_t))_{t\geq0}\). Notice that both processes are \((\mathcal{F}_t)_{t\geq0}\)-adapted. Similar to the backward-time scenario, we use the Fokker-Planck equations (18 ) and (19 ) to capture the differential structure of the likelihood ratio \(\ell^\beta_t(x)=p^\beta_t(x)\exp(2\psi(x))\).

Lemma 11. In the forward-time approach, the likelihood ratio process \((\ell^\beta_t(X_t))_{t\geq0}\) satisfies the stochastic differential equation, \[\begin{align}\label{forward-time32l} d\ell^\beta_t(X_t)&=\big(\sum\limits_{1\leq i\leq d}\frac{\partial\beta^{(i)}}{\partial x_i}-2\beta\cdot\nabla\psi\big)\ell^\beta_t(X_t)I_{\{t>t_0\}}\,dt\\ &\quad+\big(\sum\limits_{1\leq i\leq d}\frac{\partial^2\ell^\beta_t}{\partial x_i^2}-2\nabla\ell^\beta_t\cdot\nabla\psi\big)(X_t)\,dt+\nabla\ell^\beta_t(X_t)\,dW^\beta_t,\quad\text{for all}\quad t\geq0. \end{align}\tag{44}\] And its logarithm, the forward-time relative entropy process \((\mathcal{R}^\beta_t(X_t))_{t\geq0}\) satisfies, \[\begin{align}\label{forward-time32R} d\mathcal{R}^\beta_t(X_t)&=\big(\sum\limits_{1\leq i\leq d}\frac{\partial\beta^{(i)}}{\partial x_i}-2\beta\cdot\nabla\psi\big)(X_t)I_{\{t>t_0\}}\,dt+\big(\sum\limits_{1\leq i\leq d}\frac{1}{\ell^\beta_t}\frac{\partial^2\ell^\beta_t}{\partial x_i^2}-2\nabla\mathcal{R}^\beta_t\cdot\nabla\psi\big)(X_t)\,dt\\ &\quad-\frac{1}{2}\big\lVert\nabla\mathcal{R}^\beta_t(X_t)\big\rVert^2\,dt+\nabla\mathcal{R}^\beta_t(X_t)\,dW^\beta_t,\quad\text{for all}\quad t\geq0. \end{align}\tag{45}\]

Proof. From the Fokker-Planck equations (18 ) and (19 ), we can compute that \[\label{forward-time32PDE4432l} \frac{\partial\ell^\beta_t}{\partial t}(x)=\frac{1}{2}\Delta\ell^\beta_t(x)+\nabla\ell^\beta_t\cdot(\beta I_{\{t>t_0\}}-\nabla\psi)(x)+\big(\sum\limits_{1\leq i\leq d}\frac{\partial\beta^{(i)}}{\partial x_i}-2\beta\cdot\nabla\psi\big)\ell^\beta_t(x)I_{\{t>t_0\}}.\tag{46}\] Via the Itô formula and that \(\mathcal{R}^\beta_t(X_t)=\log\ell^\beta_t(X_t)\) for all \(t\geq0\), the assertions (44 ), (45 ) are verified. ◻

Compared to the backward-time approach (22 ), some extra terms show up in (45 ). The analysis of the trajectorial behavior of the forward-time process \((\mathcal{R}^\beta_t(X_t))_{t\geq0}\) is therefore more involved. Consequently, the additional computation needed to deal with these extra terms makes the forward-time approach less transparent and eventually clouds the intuition behind the word dissipation.

We may still take \(\mathbb{P}^\beta\)-expectation, formally, to retrieve the classical identity of relative entropy dissipation through the forward-time approach. Indeed, after we verify the integrability of the additional term in (45 ), performing the integration by parts shows that, \[\mathbb{E}^{\mathbb{P}^\beta}\big[\sum\limits_{1\leq i\leq d}\frac{1}{\ell^\beta_t}\frac{\partial^2\ell^\beta_t}{\partial x_i^2}(X_t)-2(\nabla\mathcal{R}^\beta_t\cdot\nabla\psi)(X_t)\big]=0.\] Hence, despite its computational complexity, this forward-time approach eventually leads to the same results, i.e. Theorem 3 and Corollary 3. Nevertheless, we prefer to work with the interpretation of the trajectorial dynamics backward in time.
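As a hedged numerical sanity check of the vanishing expectation above, the following sketch works in dimension one under assumptions that are not part of the text: the potential is taken to be \(\psi(x)=x^2/2\), so that \(q(x)=\exp(-2\psi(x))=\exp(-x^2)\), and the marginal law is a Gaussian \(N(m,s^2)\) with illustrative values of \(m\) and \(s\). It uses the elementary identity \(\ell''/\ell=\mathcal{R}''+(\mathcal{R}')^2\) for \(\ell=\exp(\mathcal{R})\).

```python
import numpy as np

# Illustrative assumptions (not from the text): d = 1, psi(x) = x^2 / 2,
# and the marginal law at a fixed time is Gaussian N(m, s^2).
m, s = 0.7, 1.3

x = np.linspace(m - 12.0 * s, m + 12.0 * s, 200_001)
dx = x[1] - x[0]
p = np.exp(-((x - m) ** 2) / (2.0 * s**2)) / (s * np.sqrt(2.0 * np.pi))

# R = log(p / q) with q = exp(-2 psi) = exp(-x^2), up to an additive constant.
dR = -(x - m) / s**2 + 2.0 * x   # R'
d2R = -1.0 / s**2 + 2.0          # R'' (constant for a Gaussian p)
dpsi = x                         # psi'

# l''/l = R'' + (R')^2 for l = exp(R); this is the integrand of the expectation.
integrand = d2R + dR**2 - 2.0 * dR * dpsi

# Riemann sum approximating the expectation under the density p.
expectation = float(np.sum(integrand * p) * dx)
print(f"E[l''/l - 2 R' psi'] ~ {expectation:.2e}")
```

In this Gaussian toy case the printed value should be zero up to quadrature and rounding error, which matches the integration by parts argument.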

6 Connections to derivative in Wasserstein space↩︎

The motivation of the Wasserstein space comes from a comparison between probability measures. In essence, the Wasserstein space is a suitably defined collection of probability measures endowed with a metric. An intuitive picture is to view each distribution as a unit amount of soil piled on the ground. This metric quantifies the minimal cost of transporting one pile into the other, see Ambrosio/Gigli/Savaré. By this analogy, the metric is known in Computer Science as the earth mover's distance, see Levina/Bickel [68].

The name, Wasserstein metric, was coined by Dobrushin [69] after learning of the work of Vaseršteĭn on Markov processes describing large systems of automata. Nevertheless, this metric had already been introduced by Kantorovich [70], [71] in the context of Optimal Transport Theory. The Wasserstein metric is a natural way to compare the laws of two random variables, where one is derived from the other by some perturbation, or undergoes time-evolution.

In our context of Itô-Langevin dynamics (1 ), \(P^\beta_t\) and \(P^\beta_{t_0}\) correspond to the marginal laws on \(\mathbb{R}^d\) of \((X_t)_{t\geq t_0}\) at time \(t\) and \(t_0\). As previewed in Section 1, we present the time-derivative of \(t\mapsto W_2(P^\beta_t,P^\beta_{t_0})\) at \(t=t_0\). This limiting behavior reveals a correlation to the relative entropy dissipation via the quantity of Fisher information (8 ). But first of all, let us proceed in an orderly way and start with the basic formulation of the quadratic Wasserstein space.

6.1 Basic structure of Wasserstein space↩︎

Let \(\mathscr{P}(\mathbb{R}^d)\) denote the set of all probability measures on the Borel sets of \(\mathbb{R}^d\). In this expository article, the quadratic Wasserstein space is defined to be a metric space whose elements form a subset of \(\mathscr{P}(\mathbb{R}^d)\). This metric structure quantifies the distance between probability measures on \(\mathbb{R}^d\). To be precise, the elements of the quadratic Wasserstein space \(\mathscr{P}_2(\mathbb{R}^d)\) consist exactly of those elements in \(\mathscr{P}(\mathbb{R}^d)\) with finite second moment, i.e. \[\label{def4432quad4432W32space} \mathscr{P}_2(\mathbb{R}^d)\mathrel{\vcenter{:}}=\big\{P\in\mathscr{P}(\mathbb{R}^d):\,\int_{\mathbb{R}^d}\norm{x}^2\,dP(x)<\infty\big\},\tag{47}\] together with a metric function \(W_2(\cdot,\cdot):\mathscr{P}_2(\mathbb{R}^d)\times\mathscr{P}_2(\mathbb{R}^d)\to\mathbb{R}_+\), which will be specified soon.

On the other hand, to simplify our exposition, occasionally we identify probability density functions on \(\mathbb{R}^d\) with their associated Borel probability measures. Notice that if \(p(\cdot):\mathbb{R}^d\to\mathbb{R}_+\) denotes a probability density function on \(\mathbb{R}^d\), then its associated probability measure satisfies \[\label{equivalence32for32density32in32W2} p(x)\,dx\in\mathscr{P}_2(\mathbb{R}^d)\qquad\text{if and only if}\qquad\int_{\mathbb{R}^d}\norm{x}^2p(x)\,dx<\infty.\tag{48}\] In fact, if condition (48 ) is satisfied, \(p(\cdot):\mathbb{R}^d\to\mathbb{R}_+\) will be identified with an element in \(\mathscr{P}_2(\mathbb{R}^d)\). Readers should stay alert to this convention, although in this expository article it should leave no ambiguity.

Having specified the elements of the quadratic Wasserstein space \(\mathscr{P}_2(\mathbb{R}^d)\) in (47 ), we give a precise description of the Wasserstein metric \(W_2\). First of all, we adopt some notions and terminologies from the Optimal Transport Theory, see [72] and Ambrosio/Gigli/Savaré. Given \(\mu,\nu\in\mathscr{P}(\mathbb{R}^d)\), let \(\Gamma(\mu,\nu)\) denote the set of Kantorovich transport plans, i.e. probability measures \(\gamma\) on the Borel sets of \(\mathbb{R}^d\times\mathbb{R}^d\) with marginals \(\mu\) and \(\nu\). Then, \(\gamma\) satisfies \(\pi^1_{\#}\gamma=\gamma\circ(\pi^1)^{-1}=\mu\) and \(\pi^2_{\#}\gamma=\gamma\circ(\pi^2)^{-1}=\nu\), where \(\pi^i:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}^d\), \(i=1,2\) are the canonical projections. The Wasserstein metric \(W_2\) is defined by, \[\label{quad32W4432metric} W_2(\mu,\nu)^2\mathrel{\vcenter{:}}=\inf\big\{\int_{\mathbb{R}^d\times\mathbb{R}^d}\norm{x-y}^2\,d\gamma(x,y):\,\gamma\in\Gamma(\mu,\nu)\big\},\quad\text{for all}\quad\mu,\nu\in\mathscr{P}_2(\mathbb{R}^d).\tag{49}\] It is verified that \(W_2\) is indeed a metric on \(\mathscr{P}_2(\mathbb{R}^d)\), see Sturm [73], [74]. In fact, the definition (49 ) of \(W_2\) gives more regularity on the metric structure of \(\mathscr{P}_2(\mathbb{R}^d)\). The quadratic Wasserstein space \(\mathscr{P}_2(\mathbb{R}^d)\), equipped with the metric \(W_2\), is separable and completely metrizable, i.e. a Polish space, see Ambrosio/Gigli/Savaré, [75], [76].
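To make the definition (49 ) concrete, here is a minimal sketch in dimension one, where the infimum over transport plans is attained by the monotone coupling that pairs order statistics. The function names `w2_empirical_1d` and `w2_gauss_1d`, the sample size, and the two Gaussian laws are illustrative choices and do not appear in the text. The closed form for Gaussians on the real line, \(W_2^2=(m_1-m_2)^2+(s_1-s_2)^2\), serves as a reference value.

```python
import numpy as np

def w2_empirical_1d(x, y):
    """Quadratic Wasserstein distance between two equal-size 1D samples.

    In dimension one the monotone (sorted) coupling is optimal for the
    quadratic cost, so it suffices to pair the order statistics.
    """
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

def w2_gauss_1d(m1, s1, m2, s2):
    """Closed form of W2 between N(m1, s1^2) and N(m2, s2^2) on the real line."""
    return float(np.hypot(m1 - m2, s1 - s2))

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(0.0, 1.0, size=n)   # sample from N(0, 1)
y = rng.normal(2.0, 0.5, size=n)   # sample from N(2, 0.25)

w2_mc = w2_empirical_1d(x, y)
w2_exact = w2_gauss_1d(0.0, 1.0, 2.0, 0.5)
print(f"empirical W2 = {w2_mc:.4f}, closed form = {w2_exact:.4f}")
```

The empirical value only approximates the closed form, with a sampling error that shrinks as the sample size grows.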

The Wasserstein metric \(W_2\) is furthermore compatible with a Riemannian interpretation of the Wasserstein space [36], [77]. Regarded formally as a Riemannian manifold consisting of Borel probability measures on \(\mathbb{R}^d\), the characterization of \(W_2\) suggests the tangent bundle \(T\mathscr{P}_2(\mathbb{R}^d)\mathrel{\vcenter{:}}=\cup_\mu T_\mu\mathscr{P}_2(\mathbb{R}^d)\) to \(\mathscr{P}_2(\mathbb{R}^d)\), where \[\label{quad32W4432tangent32space} T_\mu\mathscr{P}_2(\mathbb{R}^d)\mathrel{\vcenter{:}}=\overline{\big\{\nabla\varphi:\,\varphi\in\mathcal{C}^\infty_c(\mathbb{R}^d;\mathbb{R})\big\}}^{L^2(\mu)},\quad\text{for all}\quad\mu\in\mathscr{P}_2(\mathbb{R}^d).\tag{50}\] Naturally, (50 ) hints at a differential structure on the metric space \(\mathscr{P}_2(\mathbb{R}^d)\).

In light of the Riemannian structure of \(\mathscr{P}_2(\mathbb{R}^d)\), we can speak of constant speed geodesics. This concept is studied in Differential Geometry, where two sufficiently close points on a smooth Riemannian manifold are connected by a unique length-minimizing curve, called the geodesic [78][80]. In the Wasserstein space theory, (50 ) provides a tangent bundle to \(\mathscr{P}_2(\mathbb{R}^d)\) and hence, formally, defines its manifold structure. Then, given two arbitrary probability measures in \(\mathscr{P}_2(\mathbb{R}^d)\), the scheme to transport one probability measure to the other with minimal effort, i.e. cumulative tangential distance, corresponds exactly to a geodesic on \(\mathscr{P}_2(\mathbb{R}^d)\). This geodesic can also be written as a parametrized family of probability measures in \(\mathscr{P}_2(\mathbb{R}^d)\).

Let \(I\mathrel{\vcenter{:}}=[a,b]\) be the parameter interval. Fix \(\mu_a,\mu_b\in\mathscr{P}_2(\mathbb{R}^d)\). If we can find a transport map \(\mathcal{T}^G\mathrel{\vcenter{:}}=\nabla G:\mathbb{R}^d\to\mathbb{R}^d\) such that \(\mu_b=(\mathcal{T}^G)_\#\mu_a=\mu_a\circ(\mathcal{T}^G)^{-1}\) and \(G\) is convex on \(\mathbb{R}^d\), then the parametrized family \((\mu_t)_{t\in I}\) in \(\mathscr{P}_2(\mathbb{R}^d)\) defined by, \[\mu_t\mathrel{\vcenter{:}}=(\mathcal{T}^G_t)_\#\mu_a,\qquad\text{where}\quad\mathcal{T}^G_t\mathrel{\vcenter{:}}= \frac{b-t}{b-a} Id_{\mathbb{R}^d}+\frac{t-a}{b-a}\nabla G,\quad\text{for all}\quad t\in[a,b],\] is a curve in \(\mathscr{P}_2(\mathbb{R}^d)\) connecting \(\mu_a\) and \(\mu_b\) which, for all \(a\leq u\leq v\leq b\), satisfies, \[\label{optimal32transport} W_2(\mu_u,\mu_v)=\frac{v-u}{b-a}\sqrt{\int_{\mathbb{R}^d}\big\lVert x-\nabla G(x)\big\rVert^2\,d\mu_a(x)}=\frac{v-u}{b-a}\big\lVert x-\nabla G(x)\big\rVert_{L^2(\mu_a)}.\tag{51}\] And this parametrized curve \((\mu_t)_{t\in I}\) in \(\mathscr{P}_2(\mathbb{R}^d)\) is the constant speed geodesic from \(\mu_a\) to \(\mu_b\). The result (51 ) is the Brenier theorem for the Wasserstein spaces. Readers are referred to Brenier [81] and Villani [82] for more details. In practice, we often construct a transport map \(\mathcal{T}^G\) from a convex function \(G(\cdot):\mathbb{R}^d\to\mathbb{R}\). Then, this \((\mu_t)_{t\in I}\) is automatically a constant speed geodesic in \(\mathscr{P}_2(\mathbb{R}^d)\).
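The constant speed property (51 ) can be observed numerically in dimension one. The sketch below is only an illustration under stated assumptions: the convex function \(G(x)=x^2/2+\log\cosh(x)+x\), the interval \([a,b]=[0,2]\), and the standard normal sample standing in for \(\mu_a\) are hypothetical choices made here. Because every interpolating map is increasing, the sorted coupling between two interpolated samples is optimal, and the identity holds up to rounding.

```python
import numpy as np

def w2_empirical_1d(x, y):
    """W2 between two equal-size 1D samples via the sorted (monotone) coupling."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

def grad_G(x):
    """Gradient of the hypothetical convex function G(x) = x^2/2 + log(cosh(x)) + x."""
    return x + np.tanh(x) + 1.0

a, b = 0.0, 2.0
rng = np.random.default_rng(1)
x_a = rng.normal(size=50_000)  # sample standing in for mu_a (assumed standard normal)

def transport(t, x):
    """Displacement interpolation T_t = (b-t)/(b-a) Id + (t-a)/(b-a) grad G."""
    return (b - t) / (b - a) * x + (t - a) / (b - a) * grad_G(x)

# Distance between the endpoints of the curve.
total = w2_empirical_1d(transport(a, x_a), transport(b, x_a))

checks = []
for u, v in [(0.0, 0.5), (0.3, 1.7), (1.0, 2.0)]:
    lhs = w2_empirical_1d(transport(u, x_a), transport(v, x_a))
    rhs = (v - u) / (b - a) * total
    checks.append((lhs, rhs))
    print(f"u={u}, v={v}: W2(mu_u, mu_v)={lhs:.6f}, (v-u)/(b-a) * W2(mu_a, mu_b)={rhs:.6f}")
```

Replacing `grad_G` by a non-monotone map would break the optimality of the interpolation, and the two columns would in general no longer agree.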

In view of the Wasserstein space theory, the family \((P^\beta_t)_{t\geq0}\) can be equivalently seen as a parametrized curve in the manifold \(\mathscr{P}_2(\mathbb{R}^d)\), in light of Lemma 1. And we further impose some regularity conditions on the potential \(\psi(\cdot)\) which drives the Itô-Langevin dynamics (1 ) in this context. At each \(t\geq0\), we assume that \(\psi(\cdot)\) is chosen such that there exists a sequence of functions \((\varphi^{0,(m)}_t(\cdot))_{m\in\mathbb{N}}\) of class \(\mathcal{C}^\infty(\mathbb{R}^d;\mathbb{R})\) with compact support, whose gradients \((\nabla\varphi^{0,(m)}_t(\cdot))_{m\in\mathbb{N}}\) admit the mean square convergence \[\nabla\varphi^{0,(m)}_t(\cdot)\xrightarrow{\,L^2(\mathbb{R}^d,P^0_t)\,}V^0_t(\cdot)\mathrel{\vcenter{:}}=\nabla\varphi^0_t(\cdot)\quad\text{as}\quad m\to\infty,\] where the time-dependent velocity field \(V^0_t(\cdot)\) is of gradient type with \(\varphi^0_t(\cdot)\mathrel{\vcenter{:}}=-\psi(\cdot)-\tfrac{1}{2}\log p^0_t(\cdot)\). Here, \(P^0_t\) corresponds to the unperturbed marginal distribution of \(X_t\) in (1 ) at \(t\geq0\). In particular, at \(t_0\), \[V^0_{t_0}(\cdot)\in T_{P^0_{t_0}}\mathscr{P}_2(\mathbb{R}^d)=\overline{\big\{\nabla\varphi(\cdot):\,\varphi\in\mathcal{C}_c^\infty(\mathbb{R}^d,\mathbb{R})\big\}}^{L^2(P^0_{t_0})}.\] When there is perturbation, denote by \(V^\beta_t(\cdot)\mathrel{\vcenter{:}}=\nabla\varphi^\beta_t(\cdot)\) with \(\varphi^\beta_t(\cdot)\mathrel{\vcenter{:}}=-(\psi+BI_{\{t>t_0\}})(\cdot)-\tfrac{1}{2}\log p^\beta_t(\cdot)\). Since \(\beta(\cdot)\) is of gradient type, i.e. 
\(\beta=\nabla B\), where \(B(\cdot)\) is smooth and compactly supported, it is clear that \[\label{tangent32space32condition} V^\beta_{t_0}(\cdot)\in\text{T}_{P^0_{t_0}}\mathscr{P}_2(\mathbb{R}^d)=\overline{\big\{(\nabla\varphi+\beta)(\cdot):\,\varphi\in\mathcal{C}_c^\infty(\mathbb{R}^d,\mathbb{R})\big\}}^{L^2(P^0_{t_0})}.\tag{52}\] In practice, the expression (52 ) ensures we could find a sequence of compactly supported smooth vector fields to approximate \(V^\beta_{t_0}(\cdot)\), all of which are of gradient type.

6.2 Local behavior of Wasserstein metric↩︎

Having introduced the Wasserstein spaces, let us turn to the limiting behavior of \((t-t_0)^{-1}W_2(P^\beta_{t},P^\beta_{t_0})\) as \(t\searrow t_0\). This limiting identity is stated in Theorem 4. When the perturbation vanishes, this limiting identity reduces to an expression of the Fisher information quantity (8 ) and is therefore correlated to the time-derivative (39 ) of the relative entropy.

For the clarity of our exposition, we first claim that a family of random variables is \(\mathbb{P}^\beta\)-uniformly integrable. An inspection of (53 ) shows that these random variables are all functionals of the velocity field \((V^\beta_t(\cdot))_{t\geq0}\). Their uniform integrability is important for Lemma 13. Readers are encouraged to go through the proof of Lemma 12, but it is also fine to skip it on a first reading of this section.

Lemma 12. The family of random variables, \[\label{family32of32r46v46} \bigg(\big\lVert\frac{1}{t-t_0}\int_{t_0}^tV^\beta_\theta(X_\theta)\,d\theta-V^\beta_{t_0}(X_{t_0})\big\rVert^2\bigg)_{t\geq t_0},\tag{53}\] is \(\mathbb{P}^\beta\)-uniformly integrable.

Proof. Notice that for each \(t\geq0\), the velocity vector \(V^\beta_t(\cdot)\) is of class \(\mathcal{C}^\infty(\mathbb{R}^d;\mathbb{R}^d)\) with compact support. Hence we know \(\lVert V^\beta_{t}(X_{t})\rVert^2\in L^1(\mathbb{P}^\beta)\) for all \(t\geq0\), and by the Jensen inequality we have \[\big\lVert\frac{1}{t-t_0}\int_{t_0}^tV^\beta_\theta(X_\theta)\,d\theta\big\rVert^2\leq\frac{1}{t-t_0}\int_{t_0}^t\big\lVert V^\beta_\theta(X_\theta)\big\rVert^2\,d\theta.\] It then suffices to prove the uniform integrability of the family \[\bigg(\frac{1}{t-t_0}\int_{t_0}^t\big\lVert V^\beta_\theta(X_\theta)\big\rVert^2\,d\theta\bigg)_{t\geq t_0}.\] Invoking the definition of the velocity field \(V^\beta_t(\cdot)\) and that \(\beta(\cdot)\) is smooth with compact support, \[\bigg(\frac{1}{t-t_0}\int_{t_0}^t\big\lVert V^\beta_\theta(X_\theta)\big\rVert^2\,d\theta\bigg)_{t\geq t_0}\quad\text{is U.I. if and only if}\quad\bigg(\frac{1}{t-t_0}\int_{t_0}^t\big\lVert\nabla\mathcal{R}^\beta_\theta(X_\theta)\big\rVert^2\,d\theta\bigg)_{t\geq t_0}\quad\text{is U.I.,}\] where U.I. abbreviates the phrase uniformly integrable. By continuity of the sample paths of \((X_t)_{t\geq0}\), \[\frac{1}{t-t_0}\int_{t_0}^t\big\lVert\nabla\mathcal{R}^\beta_\theta(X_\theta)\big\rVert^2\,d\theta\to\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})\big\rVert^2\;\;\text{as}\quad t\to t_0,\quad\mathbb{P}^\beta\text{-a.s.}\] Since \(L^1(\mathbb{P}^\beta)\) convergence implies \(\mathbb{P}^\beta\)-uniform integrability, it suffices to check the convergence of their \(\mathbb{P}^\beta\)-expectation by the Scheffé lemma. In fact, using (41 ), we ascertain this claim. ◻

The \(\mathbb{P}^\beta\)-uniform integrability of the random variables (53 ) is needed in the proof of Lemma 13, where we upgrade the \(\mathbb{P}^0\)-a.s. convergence of a sequence of random variables to the corresponding \(L^2(\mathbb{P}^0)\)-convergence in (54 ).

Lemma 13. The velocity field \((V^\beta_t(\cdot))_{t\geq t_0}\) induces a curved flow \((\mathcal{L}^\beta_t)_{t\geq t_0}\), characterized by \[\mathcal{L}^\beta_{t_0}=Id_{\mathbb{R}^d}\qquad\text{and}\qquad\frac{d}{dt}\mathcal{L}^\beta_t=V^\beta_t(\mathcal{L}^\beta_t),\quad\text{for all}\quad t\geq t_0.\] Then, for all \(t\geq t_0\), \((\mathcal{L}^\beta_t)_\#P^\beta_{t_0}=P^\beta_t\), i.e. the map \(\mathcal{L}^\beta_t:\mathbb{R}^d\to\mathbb{R}^d\) transports the probability measure \(P^\beta_{t_0}=P^0_{t_0}\) to the probability measure \(P^\beta_t\). Moreover, \[\label{eqn:326468} \lim\limits_{t\searrow t_0}\frac{1}{t-t_0}\mathbb{E}^{\mathbb{P}^0}\big[\big\lVert\mathcal{L}^\beta_t(X_{t_0})-X_{t_0}-(t-t_0)V^\beta_{t_0}(X_{t_0})\big\rVert^2\big]^{1/2}=0.\tag{54}\]

Proof. First note that \[\mathcal{L}^\beta_t(x)=x+\int_{t_0}^tV^\beta_\theta(\mathcal{L}^\beta_\theta(x))\,d\theta,\quad\text{for all}\quad(t,x)\in[t_0,\infty)\times\mathbb{R}^d.\] On this account, for all \(t\geq t_0\), \[\mathbb{E}^{\mathbb{P}^0}\big[\big\lVert\mathcal{L}^\beta_t(X_{t_0})-X_{t_0}-(t-t_0)V^\beta_{t_0}(X_{t_0})\big\rVert^2\big]=\mathbb{E}^{\mathbb{P}^0}\big[\big\lVert\int_{t_0}^tV^\beta_\theta(\mathcal{L}^\beta_\theta(X_{t_0}))\,d\theta-(t-t_0)V^\beta_{t_0}(X_{t_0})\big\rVert^2\big].\] In light of Lemma 9, \(\mathbb{P}^\beta\) and \(\mathbb{P}^0\) are mutually absolutely continuous with uniformly bounded density process. Hence, to verify (54 ), it suffices to show the limiting assertion, \[\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}\mathbb{E}^{\mathbb{P}^\beta}\big[\big\lVert\int_{t_0}^tV^\beta_\theta(X_\theta)\,d\theta-(t-t_0)V^\beta_{t_0}(X_{t_0})\big\rVert^2\big]^{1/2}=0.\] By the continuity of the sample paths of \((X_t)_{t\geq0}\), \[\big\lVert\frac{1}{t-t_0}\int_{t_0}^tV^\beta_\theta(X_\theta)\,d\theta-V^\beta_{t_0}(X_{t_0})\big\rVert^2\to0\quad\text{as}\quad t\to t_0,\quad\mathbb{P}^\beta\text{-a.s.}\] By the \(\mathbb{P}^\beta\)-uniform integrability in Lemma 12, the convergence still holds after taking \(\mathbb{P}^\beta\)-expectation. ◻

The limiting assertion (54 ) to the non-optimal transport plan \((\mathcal{L}^\beta_t)_{t\geq t_0}\) will be used in Theorem 4, where we decompose the transport from \(P^\beta_{t_0}\) to \(P^\beta_t\) into a composition of a sequence of optimal transport plans \((\mathcal{J}^{\beta,(m)}_t)_{t\geq t_0, m\in\mathbb{N}}\) and the non-optimal transport \((\mathcal{L}^\beta_t)_{t\geq t_0}\).

Theorem 4. We have the local limiting behavior of the quadratic Wasserstein metric, \[\label{eqn4432w-space32distance} \lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^\beta_t,P^\beta_{t_0})=\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})+2\beta(X_{t_0})\big\rVert_{L^2(\mathbb{P}^0)}.\tag{55}\]

Proof. According to (52 ), there exists a sequence of compactly supported functions \((\varphi^{\beta,(m)}_{t_0}(\cdot))_{m\in\mathbb{N}}\) of class \(\mathcal{C}^\infty(\mathbb{R}^d;\mathbb{R})\), such that \[\label{L232limit32of32velocity32vector} \lim\limits_{m\to\infty}\mathbb{E}^{\mathbb{P}^0}\big[\big\lVert V^\beta_{t_0}(X_{t_0})-\nabla\varphi^{\beta,(m)}_{t_0}(X_{t_0})\big\rVert^2\big]=0.\tag{56}\] We call the gradients \((\nabla\varphi^{\beta,(m)}_{t_0}(\cdot))_{m\in\mathbb{N}}\) the localized gradient fields, which have compact support and approximate the velocity field \(V^\beta_{t_0}(\cdot)\) in \(L^2(\mathbb{P}^0)\). These localized gradient fields induce a sequence of localized linear transports \((\mathcal{J}^{\beta,(m)}_t)_{t\geq t_0,m\in\mathbb{N}}\), defined by \[\mathcal{J}^{\beta,(m)}_t(x)\mathrel{\vcenter{:}}= x+(t-t_0)\nabla\varphi^{\beta,(m)}_{t_0}(x)\quad\text{for all}\quad x\in\mathbb{R}^d,\quad t\geq t_0,\quad\text{and}\quad m\in\mathbb{N}.\] Denote by \(P^{\beta,(m)}_{\mathcal{J}_t}\) the transport image of \(P_{t_0}\) under \(\mathcal{J}^{\beta,(m)}_t\), i.e. \(P^{\beta,(m)}_{\mathcal{J}_t}=(\mathcal{J}^{\beta,(m)}_t)_\# P_{t_0}\) for all \(t\geq t_0\) and \(m\in\mathbb{N}\). We claim that \[\label{optimal32local32transport} \lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^{\beta,(m)}_{\mathcal{J}_t},P_{t_0})=\big\lVert\nabla\varphi^{\beta,(m)}_{t_0}(X_{t_0})\big\rVert_{L^2(\mathbb{P}^0)}.\tag{57}\] In order to deduce (57 ), we have to show that \(\mathcal{J}^{\beta,(m)}_t(\cdot):\mathbb{R}^d\to\mathbb{R}^d\) is the gradient of a convex function, for all \(t\geq t_0\) sufficiently close to \(t_0\). 
From its definition, \[\mathcal{J}^{\beta,(m)}_t(x)=\nabla\big(\frac{1}{2}\norm{x}^2+(t-t_0)\varphi^{\beta,(m)}_{t_0}(x)\big)\quad\text{for all}\quad x\in\mathbb{R}^d.\] Hence, it suffices to show that \(\tfrac{1}{2}\norm{\cdot}^2+(t-t_0)\varphi^{\beta,(m)}_{t_0}(\cdot)\) is convex for all \(m\in\mathbb{N}\), when \(t\geq t_0\) is close enough to \(t_0\). Its Hessian matrix is given by, \[\label{hessioan32of32matrix32phi} Id_{\mathbb{R}^d}+(t-t_0)\text{Hess}(\varphi^{\beta,(m)}_{t_0})(x),\quad\text{for all}\quad x\in\mathbb{R}^d.\tag{58}\] Since \(\varphi^{\beta,(m)}_{t_0}(\cdot)\) is smooth with compact support, there exists \(\epsilon_m>0\) such that (58 ) is positive definite for all \(t_0\leq t\leq t_0+\epsilon_m\), uniformly in \(x\in\mathbb{R}^d\). Hence, for each \(m\in\mathbb{N}\), \(\mathcal{J}^{\beta,(m)}_t(\cdot)\) is indeed the gradient of a convex function when \(t_0\leq t\leq t_0+\epsilon_m\). And (57 ) follows from the Brenier theorem, [81], [82]. Invoking (56 ), \[\label{w32limit321} \lim\limits_{m\to\infty}\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^{\beta,(m)}_{\mathcal{J}_t},P_{t_0})=\big\lVert V^\beta_{t_0}(X_{t_0})\big\rVert_{L^2(\mathbb{P}^0)}=\frac{1}{2}\big\lVert\nabla\mathcal{R}^0_{t_0}(X_{t_0})+2\beta(X_{t_0})\big\rVert_{L^2(\mathbb{P}^0)}.\tag{59}\] Our next step is to show that \[\label{w32limit322} \lim\limits_{m\to\infty}\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^\beta_t,P^{\beta,(m)}_{\mathcal{J}_t})=0.\tag{60}\] To achieve this, we construct a transport plan from \(P^{\beta,(m)}_{\mathcal{J}_t}\) to \(P^\beta_t\). In Lemma 13, we have the non-optimal transport \(\mathcal{L}^\beta_t\) with \((\mathcal{L}^\beta_t)_\#P_{t_0}=P^\beta_t\). And we have the localized linear transport \(\mathcal{J}^{\beta,(m)}_t\) with \((\mathcal{J}^{\beta,(m)}_t)_\# P_{t_0}=P^{\beta,(m)}_{\mathcal{J}_t}\). 
To this end, let \(\mathcal{H}^{\beta,(m)}_t\mathrel{\vcenter{:}}=\mathcal{L}^\beta_t\circ(\mathcal{J}^{\beta,(m)}_t)^{-1}\), whence \((\mathcal{H}^{\beta,(m)}_t)_\#P^{\beta,(m)}_{\mathcal{J}_t}=P^\beta_t\) for all \(t\geq t_0\). Let \(\mathbb{P}^{\beta,(m)}_{\mathcal{J}}\) denote a probability measure on the path space \(\mathcal{C}\) under which the canonical coordinate process \((X_t)_{t\geq0}\) has the marginal distribution \(P^{\beta,(m)}_{\mathcal{J}_t}\) at each \(t\geq t_0\) and such that the marginals of \(\mathbb{P}^{\beta,(m)}_{\mathcal{J}}\) agree with those of \(\mathbb{P}\) at time \(t\) when \(0\leq t\leq t_0\). Then, \[\mathbb{E}^{\mathbb{P}^{\beta,(m)}_{\mathcal{J}}}\big[\big\lVert\mathcal{H}^{\beta,(m)}_t(X_t)-X_t\big\rVert^2\big]=\mathbb{E}^{\mathbb{P}^0}\big[\big\lVert\mathcal{L}^\beta_t(X_{t_0})-\mathcal{J}^{\beta,(m)}_t(X_{t_0})\big\rVert^2\big],\quad\text{for all}\quad t\geq t_0.\] Notice that, \[\frac{1}{2(t-t_0)^2}\big\lVert\mathcal{L}^\beta_t(x)-\mathcal{J}^{\beta,(m)}_t(x)\big\rVert^2\leq\big\lVert V^\beta_{t_0}(x)-\nabla\varphi^{\beta,(m)}_{t_0}(x)\big\rVert^2+\big\lVert\frac{1}{t-t_0}\int_{t_0}^tV^\beta_\theta(\mathcal{L}^\beta_\theta(x))\,d\theta-V^\beta_{t_0}(x)\big\rVert^2.\] Since \(\mathcal{H}^{\beta,(m)}_t\) is a (not necessarily optimal) transport from \(P^{\beta,(m)}_{\mathcal{J}_t}\) to \(P^\beta_t\), using (56 ) and Lemma 13 we can conclude that \[\lim\limits_{m\to\infty}\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^\beta_t,P^{\beta,(m)}_{\mathcal{J}_t})\leq\lim\limits_{m\to\infty}\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}\mathbb{E}^{\mathbb{P}^{\beta,(m)}_{\mathcal{J}}}\big[\big\lVert\mathcal{H}^{\beta,(m)}_t(X_t)-X_t\big\rVert^2\big]^{1/2}=0\] which verifies (60 ). 
Since \[\lim\limits_{m\to\infty}\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^{\beta,(m)}_{\mathcal{J}_t},P_{t_0})\leq\lim\limits_{m\to\infty}\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^{\beta,(m)}_{\mathcal{J}_t},P^\beta_t)+\liminf\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^\beta_t,P_{t_0})\] as well as \[\limsup\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^\beta_t,P_{t_0})\leq\lim\limits_{m\to\infty}\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^\beta_t,P^{\beta,(m)}_{\mathcal{J}_t})+\lim\limits_{m\to\infty}\lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^{\beta,(m)}_{\mathcal{J}_t},P_{t_0}),\] using (59 ) and (60 ) we can conclude (55 ). And the assertion is verified. ◻

Theorem 4 reveals the time-derivative of the Wasserstein metric from \((P^\beta_t)_{t\geq t_0}\) to \(P^\beta_{t_0}\). This limiting identity (55 ) includes both the gradient of the relative entropy process and perturbation terms. In fact, if we collapse the perturbation, the results become more transparent.

Corollary 4. Switching off the perturbation \(\beta(\cdot):\mathbb{R}^d\to\mathbb{R}^d\), Theorem 4 reduces to the time-derivative of the unperturbed Wasserstein metric from \((P^0_t)_{t\geq t_0}\) to \(P^0_{t_0}\), \[\label{eqn443264615} \lim\limits_{t\searrow t_0}\frac{1}{t-t_0}W_2(P^0_t,P^0_{t_0})=\frac{1}{2}\norm{\nabla\mathcal{R}^0_{t_0}(X_{t_0})}_{L^2(\mathbb{P}^0)}.\tag{61}\]

Without perturbation, the time-derivative of the Wasserstein metric (61 ) is, up to the factor \(\tfrac{1}{2}\), the square root of the Fisher information. Through this limiting identity, (61 ) is therefore correlated to the time-derivative of the relative entropy (42 ). Additionally, this insight reveals the steepest descent property of the dissipation of relative entropy in the scenario of the unperturbed dynamics.
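A one-dimensional Gaussian example may help to see (61 ) at work. The sketch below rests on assumptions made only for illustration: \(\psi(x)=x^2/2\), the unperturbed dynamics taken to be \(dX_t=-\psi'(X_t)\,dt+dW_t\) (an Ornstein-Uhlenbeck process), a Gaussian law \(N(m,s^2)\) at time \(t_0\), and the closed form of \(W_2\) between Gaussians on the real line. With \(q(x)=\exp(-x^2)\) one finds \((\mathcal{R}^0_{t_0})'(x)=(2-s^{-2})(x-m)+2m\).

```python
import numpy as np

# Illustrative assumptions (not from the text): d = 1, psi(x) = x^2 / 2,
# dynamics dX = -psi'(X) dt + dW, and law N(m, s^2) at time t0.
m, s = 0.8, 1.5

def marginal(h):
    """Mean and standard deviation of X_{t0 + h} under the assumed OU dynamics."""
    mean = m * np.exp(-h)
    var = s**2 * np.exp(-2.0 * h) + 0.5 * (1.0 - np.exp(-2.0 * h))
    return mean, np.sqrt(var)

def w2_gauss_1d(m1, s1, m2, s2):
    """Closed form of W2 between N(m1, s1^2) and N(m2, s2^2)."""
    return float(np.hypot(m1 - m2, s1 - s2))

# E[(R')^2] for R'(x) = (2 - 1/s^2)(x - m) + 2m and X ~ N(m, s^2).
grad_R_sq = (2.0 * s**2 - 1.0) ** 2 / s**2 + 4.0 * m**2
rhs = 0.5 * np.sqrt(grad_R_sq)  # right-hand side of (61)

ratios = []
for h in [1e-2, 1e-3, 1e-4]:
    mh, sh = marginal(h)
    ratios.append(w2_gauss_1d(mh, sh, m, s) / h)
    print(f"h={h:.0e}: W2/h = {ratios[-1]:.6f}, (1/2)||grad R|| = {rhs:.6f}")
```

As \(h\) decreases, the difference quotient should approach the right-hand side, in line with the corollary for this toy case.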

6.3 Steepest descent property of relative entropy \(\mathbb{H}\)↩︎

The philosophy of steepest descent is to locate a parametrized curve on an abstract manifold, such that the varying rate of some indexed quantities is extremized. This idea was adopted by Debye [83], who used it to approximate integrals arising in the asymptotic study of Bessel functions [84]. In the work of Lagrange [85], [86], Landau/Lifshitz [87], and Feynman [88], the Lagrangian formalism of mechanics was progressively designed to interpret the variational principles and the trajectory of classical particles.

Throughout this expository article, the phrase steepest descent has appeared without an explanation. What it refers to is not completely aligned with the literature listed above. Its precise interpretation will be clarified at this point. This steepest descent property, corresponding to the unperturbed scenario of (1 ), will also answer the question of why we are interested in introducing the smooth perturbation \(\beta(\cdot)\) into our Itô-Langevin dynamics, and why the unperturbed case is remarkable.

Combining the results from Theorems 3 and 4, we observe that the time-derivatives of relative entropy and Wasserstein metric, evaluated at \(t_0\geq0\), are correlated via an expression of Fisher information (8 ) as well as some perturbation terms, i.e. \[\label{derivetive:32composed4432perturbed} \lim\limits_{t\searrow t_0}\frac{\,\mathbb{H}[P^\beta_t|Q]-\mathbb{H}[P^\beta_{t_0}|Q]\,}{W_2(P^\beta_t,P^\beta_{t_0})}=-\mathbb{E}^{\mathbb{P}^0}\bigg[\nabla\mathcal{R}^0_{t_0}(X_{t_0})\cdot\frac{\nabla\mathcal{R}^0_{t_0}(X_{t_0})+2\beta(X_{t_0})}{\norm{\nabla\mathcal{R}^0_{t_0}(X_{t_0})+2\beta(X_{t_0})}_{L^2(\mathbb{P}^0)}}\bigg].\tag{62}\] When the perturbation vanishes, the RHS of (62 ) reduces to minus the square root of Fisher information, i.e. \[\label{derivetive:32composed4432unperturbed} \lim\limits_{t\searrow t_0}\frac{\,\mathbb{H}[P^0_t|Q]-\mathbb{H}[P^0_{t_0}|Q]\,}{W_2(P^0_t,P^0_{t_0})}=-\mathbb{E}^{\mathbb{P}^0}\bigg[\nabla\mathcal{R}^0_{t_0}(X_{t_0})\cdot\frac{\nabla\mathcal{R}^0_{t_0}(X_{t_0})}{\norm{\nabla\mathcal{R}^0_{t_0}(X_{t_0})}_{L^2(\mathbb{P}^0)}}\bigg].\tag{63}\] Comparing (62 ) and (63 ) and in light of the Cauchy-Schwarz inequality, we observe that their difference, \[\lim\limits_{t\searrow t_0}\frac{\,\mathbb{H}[P^\beta_t|Q]-\mathbb{H}[P^\beta_{t_0}|Q]\,}{W_2(P^\beta_t,P^\beta_{t_0})}-\lim\limits_{t\searrow t_0}\frac{\,\mathbb{H}[P^0_t|Q]-\mathbb{H}[P^0_{t_0}|Q]\,}{W_2(P^0_t,P^0_{t_0})},\] is always nonnegative, and strictly positive when \(\beta(\cdot)\) is not parallel to \(\nabla\mathcal{R}^0_{t_0}\). If we view \(\mathbb{H}[P^\beta_t|Q]\), \(t\geq0\) as a function along the curve \((P^\beta_t)_{t\geq0}\subseteq\mathscr{P}_2(\mathbb{R}^d)\), then its slope with respect to the Wasserstein distance attains its infimum in the absence of perturbation. In other words, the relative entropy decreases as fast as possible along the unperturbed curve. This extremal phenomenon is therefore referred to as the steepest descent property.
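The Cauchy-Schwarz comparison between (62 ) and (63 ) can be illustrated by a small Monte Carlo sketch. Everything specific here is an assumption made for illustration: dimension one, \(\psi(x)=x^2/2\), a Gaussian law \(N(m,s^2)\) for \(X_{t_0}\), and the two test perturbations, which are hypothetical and not required to satisfy the regularity conditions imposed on \(\beta(\cdot)\) in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
m, s = 0.8, 1.5                           # illustrative Gaussian law at time t0
x = rng.normal(m, s, size=100_000)        # Monte Carlo sample of X_{t0}

# R' for q(x) = exp(-x^2), i.e. psi(x) = x^2 / 2 (assumed potential).
grad_R = (2.0 - 1.0 / s**2) * (x - m) + 2.0 * m

def l2_norm(f):
    """Empirical L^2 norm of a function sampled at X_{t0}."""
    return float(np.sqrt(np.mean(f**2)))

def slope(beta):
    """Empirical right-hand side of (62) for a perturbation sampled at X_{t0}."""
    direction = grad_R + 2.0 * beta
    return float(-np.mean(grad_R * direction) / l2_norm(direction))

slope_unperturbed = slope(np.zeros_like(x))        # right-hand side of (63)
beta_generic = np.exp(-x**2) * np.sin(3.0 * x)     # hypothetical perturbation
beta_parallel = 0.25 * grad_R                      # perturbation parallel to grad R

gap_generic = slope(beta_generic) - slope_unperturbed
gap_parallel = slope(beta_parallel) - slope_unperturbed
print(f"generic gap = {gap_generic:.4e}, parallel gap = {gap_parallel:.4e}")
```

The generic perturbation yields a strictly positive gap, whereas the parallel one gives a gap that vanishes up to rounding, which is the equality case of the Cauchy-Schwarz inequality.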

6.4 Dissipative velocity of relative entropy \(\mathbb{H}\)↩︎

Applying additional non-degeneracy conditions on the second-order derivatives of the potential \(\psi(\cdot)\), we can extract more information from the unperturbed Itô-Langevin stochastic dynamics (1 ). Namely, we obtain the Bakry-Émery exponential decay rate of \(\mathbb{H}[P^0_t|Q]\). For an invitation to the relevant topics in the Bakry-Émery theory, which also derives the exponential decay of \(\mathbb{I}[P^0_t|Q]\) defined in (8 ), readers are referred to Bakry/Émery, Bakry/Gentil/Ledoux [89], and Gentil [90].

In our exposition, the derivation relies on the analysis of the geodesics in \(\mathscr{P}_2(\mathbb{R}^d)\), viewed as a manifold. First, let \(\mu_a,\mu_b\), \(a<b\), be two elements in \(\mathscr{P}_2(\mathbb{R}^d)\), both absolutely continuous with respect to the reference measure \(Q\) introduced in Section 2. Let \(\mathcal{T}\) denote the optimal transport from \(\mu_a\) to \(\mu_b\), i.e. \(\mu_b=(\mathcal{T})_\#\mu_a\). Then, the interpolation family \((\mathcal{T}_t)_{a\leq t\leq b}\) of transport plans induced by \(\mathcal{T}\) such that \[\mathcal{T}_t\mathrel{\vcenter{:}}=\frac{b-t}{b-a} Id_{\mathbb{R}^d}+\frac{t-a}{b-a}\mathcal{T},\quad\text{for all}\quad t\in[a,b],\] generates a law \(\mu\) on the Borel sets of \(\mathcal{C}([a,b];\mathbb{R}^d)\). For each \(t\in[a,b]\), let \(\mu_t\) denote the marginal distribution of \(\mu\) on \(\mathbb{R}^d\) at \(t\). Then, the parametrized family \((\mu_t)_{a\leq t\leq b}\) is a curve in \(\mathscr{P}_2(\mathbb{R}^d)\) and satisfies \(\mu_t=(\mathcal{T}_t)_\#\mu_a\) for all \(t\in[a,b]\). Hence, \((\mu_t)_{a\leq t\leq b}\) is a constant speed geodesic.

In Section 2, we have defined the relative entropy \(\mathbb{H}[P^\beta_t|Q]\) and its associated process \(\mathcal{R}^{\mathbb{P}^\beta}_t\), abbreviated as \(\mathcal{R}^\beta_t\), for each \(t\geq0\). We can similarly define the relative entropy \(\mathbb{H}[\mu_t|Q]\) as well as its associated process \(\mathcal{R}^\mu_t(X_t)\mathrel{\vcenter{:}}=\log d\mu_t/dQ\) with \(a\leq t\leq b\), where \((X_t)_{a\leq t\leq b}\) is the canonical coordinate process in \(\mathcal{C}([a,b];\mathbb{R}^d)\) so that \(X_t\sim\mu_t\) for all \(a\leq t\leq b\). The dissipation of \(\mathbb{H}[\mu_t|Q]\), calculated against \(Q\) and along the geodesic \((\mu_t)_{a\leq t\leq b}\), will be our first step to understand the exponential decay rate of \(\mathbb{H}[P^0_t|Q]\) with \(t\geq t_0\).

Lemma 14. Fix \(\mu_a,\mu_b\in\mathscr{P}_2(\mathbb{R}^d)\), both absolutely continuous with respect to \(Q\). We have the time-derivative of the relative entropy along the constant speed geodesic \((\mu_t)_{a\leq t\leq b}\), \[\lim\limits_{t\searrow a}\frac{1}{t-a}\bigg(\mathbb{H}\big[\mu_t|Q\big]-\mathbb{H}\big[\mu_a|Q\big]\bigg)=\frac{1}{b-a}\mathbb{E}^{\mu}\big[\nabla\mathcal{R}^\mu_a\cdot(\mathcal{T}-Id_{\mathbb{R}^d})(X_a)\big]\]

Proof. Only in this proof, we use \(V^\mu_t=(V^{\mu,(1)}_t,\ldots,V^{\mu,(d)}_t)\) to denote the velocity field defined by, \[(t,x)\mapsto V^\mu_t(x)\mathrel{\vcenter{:}}=(\mathcal{T}-Id_{\mathbb{R}^d})\big((\mathcal{T}_t)^{-1}x\big),\quad\text{for all}\quad(t,x)\in[a,b]\times\mathbb{R}^d.\] Then, \(V^\mu_t(\cdot)\) is associated to the transport \(\mathcal{T}_t\) in the sense that \[\mathcal{T}_t(x)=x+\frac{1}{b-a}\int_a^t V^\mu_s\big(\mathcal{T}_s(x)\big)\,ds,\quad\text{for all}\quad a\leq t\leq b.\] Since each \(\mu_t\) is absolutely continuous with respect to \(Q\), while \(Q\) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb{R}^d\), then \(\mu_t\) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb{R}^d\) with density \(\rho^\mu_t(\cdot):\mathbb{R}^d\to\mathbb{R}_+\). According to [82], and since the curve is driven by the velocity field \((b-a)^{-1}V^\mu_t(\cdot)\), \[-\frac{\partial\rho^\mu_t}{\partial t}(x)=\frac{1}{b-a}\Big(\sum\limits_{1\leq i\leq d}\frac{\partial V^{\mu,(i)}_t}{\partial x_i}(x)\rho^\mu_t(x)+\big(V^\mu_t\cdot\nabla\rho^\mu_t\big)(x)\Big),\quad\text{for all}\quad(t,x)\in[a,b]\times\mathbb{R}^d.\] Recall that \((X_t)_{a\leq t\leq b}\) denotes the coordinate process in \(\mathcal{C}([a,b];\mathbb{R}^d)\) with \(X_a\sim\mu_a\). Then the integral form, \[X_t=X_a+\frac{1}{b-a}\int_a^tV^{\mu}_s(X_s)\,ds,\quad\text{for all}\quad a\leq t\leq b,\] characterizes the law of \(X_t\) satisfying \(X_t\sim(\mathcal{T}_t)_\#\mu_a\) for all \(a\leq t\leq b\). Hence, \[d\rho^\mu_t(X_t)=\frac{\partial\rho^\mu_t}{\partial t}(X_t)\,dt+\nabla\rho^\mu_t(X_t)\,dX_t=-\frac{1}{b-a}\sum\limits_{1\leq i\leq d}\frac{\partial V^{\mu,(i)}_t}{\partial x_i}(X_t)\rho^\mu_t(X_t)\,dt.\] Therefore, \(d\log\rho^\mu_t(X_t)=-(b-a)^{-1}\sum_{1\leq i\leq d}(\partial V^{\mu,(i)}_t/\partial x_i)(X_t)\,dt\), \(a\leq t\leq b\). 
Since \(q(\cdot)=\exp(-2\psi(\cdot))\), \[d\log q(X_t)=-2\nabla\psi(X_t)\cdot dX_t=-\frac{2}{b-a}(\nabla\psi\cdot V^{\mu}_t)(X_t)\,dt.\] Hence, \[d\mathcal{R}^\mu_t(X_t)=d\log\frac{\rho^\mu_t}{q}(X_t)=\frac{1}{b-a}\big(2(\nabla\psi\cdot V^{\mu}_t)-\sum\limits_{1\leq i\leq d}\frac{\partial V^{\mu,(i)}_t}{\partial x_i}\big)(X_t)\,dt,\quad\text{for all}\quad a\leq t\leq b.\] Taking \(\mu\)-expectation, \[\mathbb{H}\big[\mu_t|Q\big]-\mathbb{H}\big[\mu_a|Q\big]=\mathbb{E}^{\mu}\big[\mathcal{R}^\mu_t(X_t)\big]-\mathbb{E}^{\mu}\big[\mathcal{R}^\mu_a(X_a)\big]=\frac{1}{b-a}\mathbb{E}^{\mu}\big[\int_a^t\big(2(\nabla\psi\cdot V^{\mu}_s)-\sum\limits_{1\leq i\leq d}\frac{\partial V^{\mu,(i)}_s}{\partial x_i}\big)(X_s)\,ds\big],\] for all \(a\leq t\leq b\). Consequently, \[\lim\limits_{t\searrow a}\frac{b-a}{t-a}\bigg(\mathbb{H}\big[\mu_t|Q\big]-\mathbb{H}\big[\mu_a|Q\big]\bigg)=\mathbb{E}^{\mu}\big[2(\nabla\psi\cdot V^{\mu}_a)(X_a)-\sum\limits_{1\leq i\leq d}\frac{\partial V^{\mu,(i)}_a}{\partial x_i}(X_a)\big]=\mathbb{E}^{\mu}\big[(\nabla\mathcal{R}^{\mu}_a\cdot V^\mu_a)(X_a)\big],\] where the last equality is due to integration by parts, using \(\nabla\mathcal{R}^\mu_a=\nabla\log\rho^\mu_a+2\nabla\psi\). Since \(V^\mu_a=\mathcal{T}-Id_{\mathbb{R}^d}\), the assertion is verified. ◻
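As a numerical sanity check of Lemma 14 (not part of the original argument), the sketch below works in an assumed one-dimensional Gaussian setting: \(\psi(x)=\kappa x^2/2\) up to an additive normalizing constant, so that \(Q=N(0,1/(2\kappa))\), and \(\mu_a=N(m_a,s_a^2)\), \(\mu_b=N(m_b,s_b^2)\). In that setting the optimal map is affine, \(\mathcal{T}(x)=m_b+(s_b/s_a)(x-m_a)\), and the geodesic stays Gaussian with linearly interpolated mean and standard deviation. All parameter values and helper names are hypothetical.

```python
import math

def gaussian_mean(f, n=4001, half_width=10.0):
    """Approximate E[f(Z)], Z ~ N(0,1), by the trapezoidal rule on a truncated grid."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -half_width + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def rel_entropy(m, s, sigma):
    """Closed form of H[N(m, s^2) | N(0, sigma^2)]."""
    return math.log(sigma / s) + (s * s + m * m) / (2.0 * sigma * sigma) - 0.5

# Hypothetical parameters (assumptions for illustration only).
kappa = 0.7
sigma = 1.0 / math.sqrt(2.0 * kappa)  # Q = N(0, sigma^2) when q is proportional to exp(-kappa x^2)
a, b = 0.5, 2.0
m_a, s_a = 1.0, 0.6
m_b, s_b = -0.4, 1.3

def entropy_along_geodesic(t):
    """Relative entropy of mu_t, the Gaussian with linearly interpolated mean and std."""
    theta = (t - a) / (b - a)
    m_t = (1.0 - theta) * m_a + theta * m_b
    s_t = (1.0 - theta) * s_a + theta * s_b
    return rel_entropy(m_t, s_t, sigma)

# Left-hand side of Lemma 14: one-sided difference quotient at t = a.
eps = 1e-6
lhs = (entropy_along_geodesic(a + eps) - entropy_along_geodesic(a)) / eps

def integrand(z):
    """grad R_a(X_a) * (T - Id)(X_a) with X_a = m_a + s_a * z."""
    x = m_a + s_a * z
    grad_R = -(x - m_a) / s_a**2 + x / sigma**2  # gradient of log(rho_a / q)
    displacement = (m_b + (s_b / s_a) * (x - m_a)) - x
    return grad_R * displacement

# Right-hand side of Lemma 14.
rhs = gaussian_mean(integrand) / (b - a)

print("difference quotient:", lhs)
print("expectation formula:", rhs)
```

Under these assumptions the two printed numbers should agree up to the discretization error of the difference quotient.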

We now impose displacement convexity on the potential \(\psi(\cdot):\mathbb{R}^d\to\mathbb{R}_+\), which is needed to formulate the following lemma.

Lemma 15. Suppose the potential \(\psi(\cdot)\) satisfies the curvature bound, \(\text{Hess}(\psi)\geq\kappa Id_{\mathbb{R}^d}\), for some \(\kappa>0\). Fix \(\mu_a,\mu_b\in\mathscr{P}_2(\mathbb{R}^d)\) such that both are absolutely continuous with respect to \(Q\). Then we have, \[\mathbb{H}\big[\mu_a|Q\big]-\mathbb{H}\big[\mu_b|Q\big]\leq-\mathbb{E}^{\mu}\big[\nabla\mathcal{R}^\mu_a\cdot(\mathcal{T}-Id_{\mathbb{R}^d})(X_a)\big]-\frac{\kappa}{2}W_2(\mu_a,\mu_b)^2.\]

Proof. Only in this proof, we define the following two functions, \[\mathscr{F}(t)\mathrel{\vcenter{:}}=\int_{\mathbb{R}^d}\rho^\mu_t(x)\log\rho^\mu_t(x)\,dx\qquad\text{and}\qquad\mathscr{H}(t)\mathrel{\vcenter{:}}=\int_{\mathbb{R}^d}2\psi(x)\rho^\mu_t(x)\,dx,\quad\text{for all}\quad a\leq t\leq b.\] By [82], the functions \(\mathscr{F}(t)\) and \(\mathscr{H}(t)\) are, respectively, displacement convex and \(\kappa\)-uniformly displacement convex from the Wasserstein space perspective, i.e. \[\frac{\partial^2}{\partial t^2}\mathscr{F}(t)\geq0\qquad\text{and}\qquad\frac{\partial^2}{\partial t^2}\mathscr{H}(t)\geq\frac{\kappa}{(b-a)^2} W_2(\mu_a,\mu_b)^2,\quad\text{for all}\quad a\leq t\leq b.\] Notice that \(\mathbb{H}[\mu_t|Q]=\mathscr{F}(t)+\mathscr{H}(t)\), \(a\leq t\leq b\). Hence, the relative entropy function \(t\mapsto\mathbb{H}[\mu_t|Q]\) is \(\kappa\)-uniformly displacement convex from the Wasserstein space perspective, i.e. \[\frac{\partial^2}{\partial t^2}\mathbb{H}\big[\mu_t|Q\big]\geq\frac{\kappa}{(b-a)^2} W_2(\mu_a,\mu_b)^2,\quad\text{for all}\quad a\leq t\leq b.\] Using the Taylor formula with integral remainder, we observe that \[\mathbb{H}\big[\mu_b|Q\big]=\mathbb{H}\big[\mu_a|Q\big]+(b-a)\frac{\partial}{\partial t}\mathbb{H}\big[\mu_t|Q\big]\bigg|_{t=a^+}+\int_a^b(b-t)\frac{\partial^2}{\partial t^2}\mathbb{H}\big[\mu_t|Q\big]\,dt.\] By Lemma 14, the first-order term equals \(\mathbb{E}^{\mu}[\nabla\mathcal{R}^\mu_a\cdot(\mathcal{T}-Id_{\mathbb{R}^d})(X_a)]\), while the convexity bound and \(\int_a^b(b-t)\,dt=(b-a)^2/2\) control the remainder from below. This implies \[\mathbb{H}\big[\mu_b|Q\big]-\mathbb{H}\big[\mu_a|Q\big]\geq\mathbb{E}^{\mu}\big[\nabla\mathcal{R}^\mu_a\cdot(\mathcal{T}-Id_{\mathbb{R}^d})(X_a)\big]+\frac{\kappa}{2}W_2(\mu_a,\mu_b)^2,\] and the assertion is verified. ◻
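The two ingredients of this proof can be checked numerically in a simple case. The sketch below is an illustration under assumptions that are not in the text: dimension one, \(\psi(x)=\kappa x^2/2\) up to an additive normalizing constant (so \(Q=N(0,1/(2\kappa))\)), and Gaussian endpoints \(N(m_a,s_a^2)\), \(N(m_b,s_b^2)\), for which \(W_2(\mu_a,\mu_b)^2=(m_a-m_b)^2+(s_a-s_b)^2\) and the geodesic interpolates mean and standard deviation linearly. The helper names and parameter values are hypothetical.

```python
import math

def rel_entropy(m, s, sigma):
    """Closed form of H[N(m, s^2) | N(0, sigma^2)]."""
    return math.log(sigma / s) + (s * s + m * m) / (2.0 * sigma * sigma) - 0.5

# Hypothetical parameters (assumptions for illustration only).
kappa = 0.7
sigma = 1.0 / math.sqrt(2.0 * kappa)  # Q = N(0, sigma^2)
a, b = 0.5, 2.0

def check_pair(m_a, s_a, m_b, s_b, h=1e-4):
    """Return (convexity margin, Lemma 15 margin); both should be nonnegative."""
    w2_sq = (m_a - m_b) ** 2 + (s_a - s_b) ** 2  # squared W_2 between 1-d Gaussians

    def entropy_at(t):
        theta = (t - a) / (b - a)
        m_t = (1.0 - theta) * m_a + theta * m_b
        s_t = (1.0 - theta) * s_a + theta * s_b
        return rel_entropy(m_t, s_t, sigma)

    # Central second differences of t -> H[mu_t | Q] at interior times.
    interior = [a + (b - a) * k / 10.0 for k in range(1, 10)]
    second = [
        (entropy_at(t + h) - 2.0 * entropy_at(t) + entropy_at(t - h)) / h**2
        for t in interior
    ]
    convexity_margin = min(second) - kappa * w2_sq / (b - a) ** 2

    # Closed form of E[grad R_a . (T - Id)(X_a)] for Gaussians, obtained by
    # writing X_a = m_a + s_a Z with Z ~ N(0,1) and using E[Z] = 0, E[Z^2] = 1.
    slope = (m_a / sigma**2) * (m_b - m_a) + (s_a / sigma**2 - 1.0 / s_a) * (s_b - s_a)

    # Lemma 15: H[mu_a|Q] - H[mu_b|Q] <= -slope - (kappa/2) W_2^2.
    lhs = rel_entropy(m_a, s_a, sigma) - rel_entropy(m_b, s_b, sigma)
    lemma15_margin = (-slope - 0.5 * kappa * w2_sq) - lhs
    return convexity_margin, lemma15_margin

pairs = [
    (1.0, 0.6, -0.4, 1.3),
    (0.0, 1.5, 2.0, 0.5),
    (-1.0, 0.8, -1.0, 2.0),
]
margins = [check_pair(*p) for p in pairs]
for p, (conv, lem) in zip(pairs, margins):
    print(p, "convexity margin:", conv, "Lemma 15 margin:", lem)
```

A nonnegative first margin reflects the \(\kappa\)-uniform displacement convexity of \(t\mapsto\mathbb{H}[\mu_t|Q]\), and a nonnegative second margin reflects the inequality of Lemma 15, in this assumed Gaussian setting only.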

Lemma 15 is an inequality of HWI type, in the sense of Cordero-Erausquin [91] and Otto/Villani [92]: it relates the fundamental quantities of relative entropy (H), quadratic Wasserstein distance (W), and Fisher information (I). Its convexity properties and its relation to the relative entropy dissipation have also been discussed by Datta/Rouzé, McCann [93], and Villani [94]. Another exposition on the HWI inequalities is Gentil/Léonard/Ripani/Tamanini.

Theorem 5. Suppose the potential \(\psi(\cdot)\) satisfies the curvature bound, \(\text{Hess}(\psi)\geq\kappa Id_{\mathbb{R}^d}\), for some \(\kappa>0\). Then the relative entropy decays exponentially. In particular, \[\mathbb{H}\big[P^0_t|Q\big]\leq\mathbb{H}\big[P^0_{t_0}|Q\big]e^{-\kappa(t-t_0)},\quad\text{for all}\quad t\geq t_0.\tag{64}\]

Proof. By the Cauchy-Schwarz inequality, \[-\mathbb{E}^{\mu}\big[\nabla\mathcal{R}^\mu_a\cdot(\mathcal{T}-Id_{\mathbb{R}^d})(X_a)\big]\leq\big\lVert\nabla\mathcal{R}^\mu_a(X_a)\big\rVert_{L^2(\mu_a)}\big\lVert(\mathcal{T}-Id_{\mathbb{R}^d})(X_a)\big\rVert_{L^2(\mu_a)}.\] Using Lemma 15, the definition of Fisher information (8) as well as the optimal transport property (51), we obtain \[\mathbb{H}\big[\mu_a|Q\big]-\mathbb{H}\big[\mu_b|Q\big]\leq W_2(\mu_a,\mu_b)\sqrt{\mathbb{I}\big[\mu_a|Q\big]}-\frac{\kappa}{2}W_2(\mu_a,\mu_b)^2.\tag{65}\] We apply (65) twice. Taking \((\mu_a,\mu_b)=(Q,P^0_t)\), for which \(\mathbb{H}[Q|Q]=0\) and \(\mathbb{I}[Q|Q]=0\), gives \(\mathbb{H}[P^0_t|Q]\geq\frac{\kappa}{2}W_2(P^0_t,Q)^2\). Taking \((\mu_a,\mu_b)=(P^0_t,Q)\) gives \(\mathbb{H}[P^0_t|Q]\leq W_2(P^0_t,Q)\sqrt{\mathbb{I}[P^0_t|Q]}-\frac{\kappa}{2}W_2(P^0_t,Q)^2\). Since \(w\sqrt{I}-\frac{\kappa}{2}w^2\leq\frac{I}{2\kappa}\) for all \(w\geq0\) and \(I\geq0\), it follows that \[\mathbb{H}\big[P^0_t|Q\big]\leq\frac{1}{2\kappa}\mathbb{I}\big[P^0_t|Q\big],\quad\text{for all}\quad t\geq0.\] Applying Corollary 3, \[\frac{\partial}{\partial t}\mathbb{H}\big[P^0_t|Q\big]\leq-\kappa\mathbb{H}\big[P^0_t|Q\big].\] Integrating this differential inequality over \([t_0,t]\) yields (64), and the assertion is verified. ◻
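The conclusion of Theorem 5 can be illustrated on a case where everything is explicit. The sketch below is hedged on several assumptions that are not taken from the text above: dimension one, \(\psi(x)=\kappa x^2/2\) up to an additive normalizing constant (so \(Q=N(0,1/(2\kappa))\)), dynamics of the form \(dX_t=-\nabla\psi(X_t)\,dt+dW_t\) (an Ornstein-Uhlenbeck process whose invariant density is proportional to \(e^{-2\psi}\)), Fisher information taken as \(\mathbb{E}|\nabla\log(\rho/q)|^2\), a Gaussian initial law, and \(t_0=0\). Function names and parameter values are hypothetical.

```python
import math

def rel_entropy(m, s, sigma):
    """Closed form of H[N(m, s^2) | N(0, sigma^2)]."""
    return math.log(sigma / s) + (s * s + m * m) / (2.0 * sigma * sigma) - 0.5

def fisher_info(m, s, sigma):
    """E|grad log(rho/q)|^2 for rho = N(m, s^2) and q = N(0, sigma^2)."""
    return (m / sigma**2) ** 2 + (s / sigma**2 - 1.0 / s) ** 2

# Hypothetical parameters (assumptions for illustration only).
kappa = 0.7
sigma = 1.0 / math.sqrt(2.0 * kappa)  # stationary standard deviation
m0, s0 = 2.0, 0.3                     # initial law N(m0, s0^2)

def law_at(t):
    """Mean and standard deviation at time t of dX = -kappa X dt + dW, X_0 ~ N(m0, s0^2)."""
    decay = math.exp(-2.0 * kappa * t)
    mean = m0 * math.exp(-kappa * t)
    var = s0**2 * decay + (1.0 - decay) / (2.0 * kappa)
    return mean, math.sqrt(var)

h0 = rel_entropy(m0, s0, sigma)
rows = []
for k in range(21):
    t = 0.25 * k
    m, s = law_at(t)
    entropy = rel_entropy(m, s, sigma)
    lsi_bound = fisher_info(m, s, sigma) / (2.0 * kappa)  # bound H <= I / (2 kappa)
    decay_bound = h0 * math.exp(-kappa * t)               # right-hand side of (64), t_0 = 0
    rows.append((t, entropy, lsi_bound, decay_bound))

for t, entropy, lsi_bound, decay_bound in rows:
    print(f"t={t:4.2f}  H={entropy:.6f}  I/(2k)={lsi_bound:.6f}  H0*exp(-k t)={decay_bound:.6f}")
```

In each printed row the entropy should sit below both bounds; this checks one explicit example and says nothing about the general case, which is the content of the theorem.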

Theorem 5 thus yields, in line with the Bakry-Émery theory, the exponential decay rate of the relative entropy for the unperturbed Itô-Langevin dynamics.

References↩︎

[1]
.
[2]
.
[3]
.
[4]
.
[5]
.
[6]
.
[7]
.
[8]
.
[9]
.
[10]
.
[11]
.
[12]
.
[13]
.
[14]
.
[15]
.
[16]
.
[17]
.
[18]
.
[19]
.
[20]
.
[21]
.
[22]
.
[23]
.
[24]
.
[25]
.
[26]
.
[27]
.
[28]
.
[29]
.
[30]
.
[31]
.
[32]
.
[33]
.
[34]
.
[35]
.
[36]
.
[37]
.
[38]
.
[39]
.
[40]
.
[41]
.
[42]
.
[43]
.
[44]
.
[45]
.
[46]
.
[47]
.
[48]
.
[49]
.
[50]
.
[51]
.
[52]
.
[53]
.
[54]
.
[55]
.
[56]
.
[57]
.
[58]
.
[59]
.
[60]
.
[61]
.
[62]
.
[63]
.
[64]
.
[65]
.
[66]
.
[67]
.
[68]
.
[69]
.
[70]
.
[71]
.
[72]
.
[73]
.
[74]
.
[75]
.
[76]
.
[77]
.
[78]
.
[79]
.
[80]
.
[81]
.
[82]
.
[83]
.
[84]
.
[85]
.
[86]
.
[87]
.
[88]
.
[89]
.
[90]
.
[91]
.
[92]
.
[93]
.
[94]
.

  1. chen.jiaming@cims.nyu.edu↩︎