October 30, 2025
We prove Kantorovich duality for a linearized version of a recently proposed non-quadratic quantum optimal transport problem, where quantum channels realize the transport. As an application, we determine optimal solutions of both the primal and the dual problem using this duality in the case of quantum bits and distinguished cost operators, with certain restrictions on the states involved. Finally, we use this information on optimal solutions to give an analytical proof of the triangle inequality for the induced quantum Wasserstein divergences.
Although Gaspard Monge formulated the first version of the optimal transport problem already at the end of the 18th century [1], the theory of optimal transportation became a vital part of mathematical analysis only in the 20th century, when major advances were obtained by Leonid Kantorovich in the 1940s [2], [3], and the breakthrough result of Yann Brenier on the structure of optimal transport maps [4], [5] induced intense research activity on the topic by various authors working in analysis and mathematical physics.
Techniques relying on the theory of optimal transport and using desirable properties of the induced Wasserstein distances on probability measures played a key role in significant advancements in several areas of mathematics, such as probability theory [6], [7], the study of physical evolution equations [8]–[10] and stochastic partial differential equations [11], [12], variational analysis [13], [14], and the geometry of metric measure spaces [15]–[18]. We refer to the monographs [19]–[22] for a detailed overview of the field.
Beyond their theoretical importance, transport-related metrics and optimal transport techniques have found their place in a large variety of disciplines outside mathematics, such as economics [23], finance, and biology [24], and they have also become popular in applied sciences such as biomedical image processing [25]–[27], data analysis and classification [28], [29], and machine learning [30]–[34].
Recent decades have also seen several non-commutative (or quantum) versions of the optimal transport problem and induced Wasserstein distances. In the early 1990s, relying on duality phenomena, Connes and Lott proposed a spectral distance in the framework of non-commutative geometry [35]. A few years later, Słomczyński and Życzkowski defined a distance on quantum states by the classical Wasserstein distance of their Husimi transforms [36], [37], and a free probability approach was proposed by Biane and Voiculescu in 2001 [38]; see also the works of Shlyakhtenko [39]–[41] on the topic. Carlen and Maas laid down the foundations of a dynamical theory [42]–[45] relying on the classical Benamou-Brenier formula and Jordan-Kinderlehrer-Otto theory, and this work has been continued by Datta, Rouzé [46], [47], and Wirth [48], among others. Caglioti, Golse, Mouhot, and Paul worked out a quantum optimal transport concept based on quantum couplings [49]–[56], while De Palma and Trevisan established a similar, yet different, concept based on quantum channels [57], [58]. The concept of Friedland, Eckstein, Cole, and Życzkowski [59]–[61] is also based on couplings, but with strikingly different cost operators. Duvenhage used modular couplings to define quantum Wasserstein distances [62]–[65], and separable quantum Wasserstein distances have also been introduced and studied [66]–[68]. A substantial part of the above-mentioned current approaches to non-commutative optimal transport is covered by the book [69], and the reader is advised to consult the survey papers [70] and [71] as well.
In this paper, we take the quantum optimal transport concept developed by De Palma and Trevisan [57], [58] as a starting point, and consider a non-quadratic generalization of the transport problem introduced there, which we proposed recently in [72]. We consider a linear relaxation of this latter transport problem and prove strong Kantorovich duality for it with an appropriate dual problem. When proving the duality, we will follow the approach of Caglioti, Golse, and Paul [50], [55] (see also [73]), which partially relies on ideas from the proof of the classical Kantorovich duality (see, e.g., [21]). As an application, we determine optimal solutions of both the primal and the dual problem using this duality in the case of quantum bits and distinguished cost operators, with certain restrictions on the states involved. Finally, we use this information on optimal solutions to give an analytical proof of the triangle inequality for the induced quantum Wasserstein divergences.
Let us recall now those elements of the mathematical formalism of quantum mechanics that we will use throughout this paper. Let \(\mathcal{H}\) be a separable complex Hilbert space. In the sequel, we denote by \(\mathcal{L}(\mathcal{H})^{sa}\) the set of self-adjoint but not necessarily bounded operators on \(\mathcal{H}\), and \(\mathcal{S}(\mathcal{H})\) stands for the set of states, that is, the set of positive trace-class operators on \(\mathcal{H}\) with unit trace. The space of all bounded operators on \(\mathcal{H}\) is denoted by \(\mathcal{B}(\mathcal{H}),\) and we recall that the collection of trace-class operators on \(\mathcal{H}\) is denoted by \(\mathcal{T}_1(\mathcal{H})\) and defined by \(\mathcal{T}_1(\mathcal{H})= \left\{ X \in \mathcal{B}(\mathcal{H}) \, \middle| \, \mathrm{tr}_{\mathcal{H}}[\sqrt{X^*X}] < \infty \right\}.\) Similarly, \(\mathcal{T}_2(\mathcal{H})\) stands for the set of Hilbert-Schmidt operators defined by \(\mathcal{T}_2(\mathcal{H})= \left\{ X \in \mathcal{B}(\mathcal{H}) \, \middle| \, \mathrm{tr}_{\mathcal{H}}[X^*X] < \infty \right\}.\) A quantum channel is a completely positive and trace preserving (CPTP) linear map on \(\mathcal{T}_1(\mathcal{H}).\) The transpose \(A^T\) of a linear operator \(A\) acting on a Hilbert space \(\mathcal{H}\) is a linear operator on the dual space \(\mathcal{H}^*\) defined by the identity \((A^T \eta) (\varphi) \equiv \eta (A \varphi)\) where \(\eta \in \mathcal{H}^*\) and \(\varphi \in \mathcal{H}.\)
We briefly recall also the classical optimal transport problem. If \(\mu\) and \(\nu\) are Borel probability measures on a complete and separable metric space \((\mathcal{X},d)\) representing the capacity of production and intensity of consumption of the goods to be transported, respectively, and \(c: \mathcal{X}\times \mathcal{X}\rightarrow \mathbb{R}\) is a non-negative lower semicontinuous function representing the transport cost in the sense that \(c(x,y)\) is the cost of transporting one unit of goods from \(x\) to \(y,\) then finding the optimal (that is, cheapest) transport plan is mathematically formalized as follows: \[\label{eq:cl-ot-prob} \text{minimize } \pi \mapsto \iint_{\mathcal{X}\times \mathcal{X}} c(x,y) \mathrm{d}\pi(x,y)\tag{1}\] where \(\pi\) runs over all possible couplings of \(\mu\) and \(\nu.\) A measure \(\pi \in \mathrm{Prob}(\mathcal{X}\times \mathcal{X})\) is called a coupling of \(\mu\) and \(\nu\) (in notation: \(\pi \in \mathcal{C}(\mu,\nu)\)) if the marginals of \(\pi\) are \(\mu\) and \(\nu,\) that is, \(\iint_{\mathcal{X}\times \mathcal{X}} f(x) \mathrm{d}\pi(x,y)=\int_{\mathcal{X}} f(x) \mathrm{d}\mu(x)\) and \(\iint_{\mathcal{X}\times \mathcal{X}} g(y) \mathrm{d}\pi(x,y)=\int_{\mathcal{X}} g(y) \mathrm{d}\nu(y)\) for all continuous and bounded functions \(f,g \in C_b(\mathcal{X}).\) A consequence of the tightness (that is, sequential compactness in the weak topology) of \(\mathcal{C}(\mu, \nu)\) and the lower-semicontinuity of \(c\) is that there is a coupling (in other words: transport plan) \(\pi_0 \in \mathcal{C}(\mu, \nu)\) that minimizes 1 , see, e.g., [22]. 
If the cost function is the power of order \(p\) of the distance, that is, \(c(x,y)=d(x,y)^p,\) then optimal transport plans determine a genuine distance called \(p\)-Wasserstein distance and denoted by \(d_{\mathcal{W}_p}\) on probability measures: \[\begin{align} \label{eq:classical-p-Wass-def} d_{\mathcal{W}_p}(\mu,\nu)=\left( \inf_{\pi \in \mathcal{C}(\mu,\nu)} \left\{ \iint_{\mathcal{X}\times \mathcal{X}} d^p(x,y) \mathrm{d}\pi(x,y) \right\} \right)^{\frac{1}{p}}. \end{align}\tag{2}\]
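As a concrete illustration of definition 2, the following minimal sketch computes the \(p\)-Wasserstein distance between two uniform empirical measures on the real line with the same number of atoms. It rests on assumptions that are not part of the text above: both measures are uniform, so couplings are rescaled doubly stochastic matrices and, by the Birkhoff-von Neumann theorem, the linear objective attains its minimum at a permutation. The function name `wasserstein_p` is a hypothetical helper, and the brute-force search is only meant for very small inputs.

```python
from itertools import permutations


def wasserstein_p(xs, ys, p):
    """p-Wasserstein distance between the uniform measures on xs and on ys.

    Assumes len(xs) == len(ys) and the distance d(x, y) = |x - y| on the real line.
    The minimum over couplings is searched over permutations only, which is
    sufficient for uniform measures with equally many atoms.
    """
    n = len(xs)
    best_total_cost = min(
        sum(abs(x - ys[j]) ** p for x, j in zip(xs, perm))
        for perm in permutations(range(n))
    )
    # Each atom carries mass 1/n, hence the division by n before taking the root.
    return (best_total_cost / n) ** (1.0 / p)


# Two atoms shifted by one unit: every atom travels distance 1.
distance = wasserstein_p([0.0, 1.0], [1.0, 2.0], p=2)
```

In this toy case the identity matching has total cost 2 and the swapped matching has total cost 4, so the monotone matching is selected.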
An influential work of De Palma and Trevisan introduced a quantum mechanical counterpart of the classical optimal transport problem with quadratic cost, and also quadratic Wasserstein distances induced by optimal solutions of these transport problems [57]. A key idea of this quantum optimal transport concept is that the transport between quantum states is realized by quantum channels [57], [58]. A brief summary of their approach reads as follows. The inputs of the transport problem are the initial and final states \(\rho, \omega \in \mathcal{S}\left( \mathcal{H} \right),\) where \(\mathcal{H}\) is a separable Hilbert space, and a finite collection of observable quantities \(\mathcal{A}=\left\{ A_1, \dots, A_K \right\}\) where \(A_k \in \mathcal{L}(\mathcal{H})^{sa}\) for all \(k.\) The transport plans between \(\rho\) and \(\omega\) are quantum channels \(\Phi: \mathcal{T}_1\left( \mathrm{supp}(\rho) \right) \to \mathcal{T}_1(\mathcal{H})\) sending \(\rho\) to \(\omega,\) and a transport plan \(\Phi\) gives rise to the quantum coupling \(\Pi_{\Phi}\) in the following way:
\[\begin{align} \label{eq:Pi-phi-def} \Pi_{\Phi}=\left( \Phi \otimes \mathrm{id}_{\mathcal{T}_1\left( \mathcal{H}^* \right)} \right) \left( || \sqrt{\rho} \rangle\rangle\langle\langle \sqrt{\rho} || \right), \end{align}\tag{3}\] where \(|| \sqrt{\rho} \rangle\rangle\langle\langle \sqrt{\rho} || \in \mathcal{S}\left( \mathcal{H}\otimes \mathcal{H}^* \right)\) is the canonical purification [74] of the state \(\rho \in \mathcal{S}\left( \mathcal{H} \right).\) Here, and in the sequel, we use the canonical linear isomorphism between \(\mathcal{T}_2(\mathcal{H})\) and \(\mathcal{H}\otimes \mathcal{H}^*\) which is the linear extension of the map \[\begin{align} \label{eq:canon-isom} \psi \otimes \eta \mapsto | \psi \rangle\circ \eta \qquad \left( \psi \in \mathcal{H}, \, \eta \in \mathcal{H}^* \right). \end{align}\tag{4}\] Accordingly, for an \(X \in \mathcal{T}_2(\mathcal{H}),\) the symbol \(|| X \rangle\rangle\) denotes the map \(\mathbb{C}\ni z \mapsto z X \in \mathcal{T}_2(\mathcal{H}) \simeq \mathcal{H}\otimes \mathcal{H}^*,\) while \(\langle\langle X ||\) stands for the map \(\mathcal{T}_2(\mathcal{H}) \ni Y \mapsto \mathrm{tr}_{\mathcal{H}}\left[ X^* Y \right],\) where \(X^*\) is the adjoint of \(X.\) It is easy to check that \(\Pi_{\Phi}\) defined in 3 is a state on \(\mathcal{H}\otimes \mathcal{H}^*\) such that its first marginal is \(\omega\) while the second marginal is \(\rho^T,\) that is, \[\begin{align} \mathrm{tr}_{\mathcal{H}^*}\left[ \Pi_{\Phi} \right]=\omega \text{ and } \mathrm{tr}_{\mathcal{H}}\left[ \Pi_{\Phi} \right]=\rho^T. 
\end{align}\] Therefore, the set of all quantum couplings of the states \(\rho,\omega \in \mathcal{S}\left( \mathcal{H} \right)\) (denoted by \(\mathcal{C}(\rho, \omega)\)) was defined in [57] by \[\label{eq:q-coup-def} \mathcal{C}\left( \rho, \omega \right)=\left\{ \Pi \in \mathcal{S}\left( \mathcal{H}\otimes \mathcal{H}^* \right) \, \middle| \, \mathrm{tr}_{\mathcal{H}^*} [\Pi]=\omega, \, \mathrm{tr}_{\mathcal{H}} [\Pi]=\rho^T \right\}.\tag{5}\] In other words, and this rephrasing will prove useful when formalizing the dual transport problems, a coupling of \(\rho\) and \(\omega\) is a state \(\Pi\) on \(\mathcal{H}\otimes \mathcal{H}^*\) such that \[\label{eq:part-trace-def} \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}[\left( A\otimes I_{\mathcal{H}}^T \right) \Pi]=\mathrm{tr}_{\mathcal{H}} [\omega A] \text{ and } \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \left( I_{\mathcal{H}} \otimes B^{T} \right) \Pi \right]=\mathrm{tr}_{\mathcal{H}^*} [\rho^T B^T]=\mathrm{tr}_{\mathcal{H}} [\rho B]\tag{6}\] for all bounded \(A, B \in \mathcal{L}(\mathcal{H})^{sa}.\) The analogy of the above definition of quantum couplings with the classical notion of couplings recapped below equation 1 is clear, and we note that \(\mathcal{C}\left( \rho,\omega \right)\) is never empty, because the trivial coupling \(\omega\otimes\rho^T\) belongs to \(\mathcal{C}\left( \rho,\omega \right)\).
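The marginal identities for \(\Pi_{\Phi}\) can be checked numerically in the simplest case. The sketch below is only an illustration under assumptions not made in the text: \(\mathcal{H}=\mathbb{C}^2,\) the channel \(\Phi\) is the identity (so that \(\omega=\rho\)), and operators on \(\mathcal{H}\otimes \mathcal{H}^*\) are written as \(4 \times 4\) matrices in the basis \(e_i \otimes e_j^*\) with index \(2i+j.\) The helper names are hypothetical, and the square root uses the Cayley-Hamilton closed form \(\sqrt{\rho}=(\rho+\sqrt{\det \rho}\, I)/\sqrt{1+2\sqrt{\det \rho}},\) valid for \(2 \times 2\) states.

```python
import math


def sqrt_qubit_state(rho):
    """Square root of a 2x2 density matrix (positive, unit trace), closed form."""
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    s = math.sqrt(max(det, 0.0))          # sqrt(det rho) = det sqrt(rho)
    t = math.sqrt(1.0 + 2.0 * s)          # trace of sqrt(rho)
    return [[(rho[i][j] + (s if i == j else 0.0)) / t for j in range(2)]
            for i in range(2)]


def purification_projector(x):
    """Matrix of ||X>><<X|| on H (x) H^*, with basis index (i, j) -> 2*i + j."""
    vec = [x[i][j] for i in range(2) for j in range(2)]
    return [[a * b.conjugate() for b in vec] for a in vec]


def marginal_on_H(pi):
    """Partial trace over the second factor H^*."""
    return [[sum(pi[2 * i + j][2 * k + j] for j in range(2)) for k in range(2)]
            for i in range(2)]


def marginal_on_H_dual(pi):
    """Partial trace over the first factor H."""
    return [[sum(pi[2 * i + j][2 * i + l] for i in range(2)) for l in range(2)]
            for j in range(2)]


# A qubit state with complex off-diagonal entries, so that rho^T differs from rho.
rho = [[0.7 + 0j, 0.2 - 0.1j], [0.2 + 0.1j, 0.3 + 0j]]
pi = purification_projector(sqrt_qubit_state(rho))
first = marginal_on_H(pi)         # expected to reproduce rho (here omega = rho)
second = marginal_on_H_dual(pi)   # expected to reproduce the transpose of rho
```

With a state that is not real symmetric, the second marginal visibly differs from \(\rho\) itself, which is the reason the transpose appears in 5.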
The definition of couplings 5 proposed by De Palma and Trevisan [57] is different from the definition proposed by Golse, Mouhot, and Paul [51] in the sense that it involves the dual Hilbert space \(\mathcal{H}^*\) and hence the transpose operation. For a clarification of this difference, see Remark 1 in [57], while for more detail on the latter concept of quantum couplings, the interested reader should consult [49]–[56], [75].
The goal of this section is to formalize a linear relaxation of a non-linear primal quantum optimal transport problem that we proposed in [72], and to propose a corresponding dual problem for which we can prove strong Kantorovich duality following the approach of Caglioti, Golse, and Paul [50], [55], which is explained also in [73].
In [72] we considered the following quantum mechanical optimal transport problem: let \(\mathcal{H}\) be a separable Hilbert space, \(\mathcal{A}=\left\{ A_1, \dots, A_K \right\}\) a finite collection of observables on \(\mathcal{H},\) and let \(c: \mathbb{R}^K \times \mathbb{R}^K \to \mathbb{R}\) be a non-negative, lower semicontinuous classical cost function. The positive and possibly unbounded self-adjoint cost operator \(C_{c}^{(\mathcal{A})}\) acting on a dense subspace of \(\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}\) is defined by \[\begin{align} \label{eq:C-c-A-def} C_{c}^{(\mathcal{A})} =\iint_{\mathbb{R}^K \times \mathbb{R}^K} c\left( x_1, \dots, x_K, y_1, \dots, y_K \right) \mathrm{d}E_1(y_1) \otimes \mathrm{d}E_1^T (x_1) \otimes \cdots \otimes \mathrm{d}E_K(y_K) \otimes \mathrm{d}E_K^T (x_K), \end{align}\tag{7}\] where \(E_k\) is the spectral measure of \(A_k,\) that is, \(A_k=\int_\mathbb{R}\lambda \mathrm{d}E_k(\lambda)\) for \(k \in \left\{ 1, \dots, K \right\}.\) The transport problem is to \[\begin{align} \label{eq:IID-primal-problem} \text{minimize } \mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\left[ \Pi^{\otimes K} C_{c}^{(\mathcal{A})} \right] \end{align}\tag{8}\] where \(\Pi\) runs over the set of all couplings of \(\rho, \omega \in \mathcal{S}\left( \mathcal{H} \right),\) that is, \[\begin{align} \label{eq:IID-primal-constraint} \Pi \in \mathcal{C}(\rho,\omega)=\left\{ \Pi \in \mathcal{S}\left( \mathcal{H}\otimes \mathcal{H}^* \right) \, \middle| \, \mathrm{tr}_{\mathcal{H}^*} [\Pi]=\omega, \, \mathrm{tr}_{\mathcal{H}} [\Pi]=\rho^T \right\}. \end{align}\tag{9}\]
It is important to note that the loss function \(\mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\left[ \Pi^{\otimes K} C_{c}^{(\mathcal{A})} \right]\) in the primal problem 8 is non-linear in its variable \(\Pi \in \mathcal{C}\left( \rho,\omega \right).\) However, there is a natural linear relaxation, which is described as follows.
Problem 1. Let the initial and final states \(\rho, \omega \in \mathcal{S}\left( \mathcal{H} \right)\) and the cost operator \(C_c^{(\mathcal{A})}\) acting on \(\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}\) and defined by 7 be given. The optimization task is to \[\begin{align} \label{eq:linear-primal-problem} \text{minimize } \mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\left[ \Gamma C_c^{\mathcal{A}} \right] \end{align}\qquad{(1)}\] subject to the constraints \[\begin{align} \label{eq:linear-primal-constraint} \Gamma \in \mathcal{S}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right) , \, \left( \Gamma \right)_{2k-1}=\omega, \, \left( \Gamma \right)_{2k}=\rho^T \text{ for all } k\in \left\{ 1, \dots, K \right\}, \end{align}\qquad{(2)}\] where \[\begin{align} \label{eq:Gamma-2k-1-def} \left( \Gamma \right)_{2k-1} =\mathrm{tr}_{1,\dots, 2k-2,2k,\dots,2K}\left[ \Gamma \right] =\mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes(k-1)} \otimes \mathcal{H}^* \otimes \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes(K-k)}}\left[ \Gamma \right], \end{align}\qquad{(3)}\] and \[\begin{align} \label{eq:Gamma-2k-def} \left( \Gamma \right)_{2k} =\mathrm{tr}_{1,\dots, 2k-1,2k+1,\dots,2K}\left[ \Gamma \right] =\mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes(k-1)} \otimes \mathcal{H}\otimes \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes(K-k)}}\left[ \Gamma \right]. \end{align}\qquad{(4)}\]
Note that \(\Pi^{\otimes K}\) satisfies the constraint (2) of Problem 1 whenever \(\Pi \in \mathcal{C}(\rho, \omega)\) (defined in 5 ), and therefore, the infimum of 8 is lower bounded by the infimum of the linearized objective (1) of Problem 1. The difference between the non-linear problem 8 and its linear relaxation (Problem 1) is that the couplings of \(\rho\) and \(\omega\) acting on different subsystems are required to be independent in the former version while they may have correlations in the latter version. We will present an explicit example in the sequel (see Proposition 2) which demonstrates that the minimum of the linearized problem (Problem 1) can be strictly smaller than that of 8 .
It is instructive to consider the case when the transport cost factorizes, that is, \[\begin{align} \label{eq:tr-cost-factorizes} c(x_1, \dots, x_K, y_1, \dots, y_K)=f_1\left( x_1,y_1 \right)+ \dots +f_K\left( x_K,y_K \right). \end{align}\tag{10}\] In this case, the cost operator \(C_c^{(\mathcal{A})}\) defined in 7 has the simpler form \[\begin{align} C_c^{(\mathcal{A})}=\sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes(k-1)} \otimes \left( \iint_{\mathbb{R}\times \mathbb{R}} f_k\left( x_k,y_k \right) \mathrm{d}E_k(y_k) \otimes \mathrm{d}E_k^T (x_k) \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes(K-k)}. \end{align}\] Therefore, introducing the shorthand \(C_k:=\iint_{\mathbb{R}\times \mathbb{R}} f_k\left( x_k,y_k \right) \mathrm{d}E_k(y_k) \otimes \mathrm{d}E_k^T (x_k),\) for any \(\Gamma \in \mathcal{S}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)\) one gets \[\begin{align} \mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\left[ \Gamma C_c^{\mathcal{A}} \right] =\sum_{k=1}^K \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \left( \Gamma \right)_{(2k-1,2k)} C_k \right], \end{align}\] where the marginals \(\left( \Gamma \right)_{(2k-1,2k)}\) are defined similarly to those in Problem 1, that is, \[\left( \Gamma \right)_{(2k-1,2k)}=\mathrm{tr}_{1,\dots, 2k-2, 2k+1, \dots, 2K}\left[ \Gamma \right].\] Consequently, since each marginal \(\left( \Gamma \right)_{(2k-1,2k)}\) of a feasible \(\Gamma\) belongs to \(\mathcal{C}(\rho, \omega)\) and any couplings \(\Pi_1, \dots, \Pi_K \in \mathcal{C}(\rho, \omega)\) arise as the marginals of the feasible state \(\Gamma=\Pi_1 \otimes \dots \otimes \Pi_K,\) the linearized primal problem (Problem 1) reduces to the following: \[\begin{align} \label{eq:linear-factorized} \text{minimize } \sum_{k=1}^K \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*} \left[ \Pi_k C_k \right] \end{align}\tag{11}\] under the constraints \[\begin{align} \Pi_1, \dots, \Pi_K \in \mathcal{C}(\rho, \omega). 
\end{align}\] On the contrary, the non-linear primal problem 8 proposed in [72] reduces to \[\begin{align} \label{eq:primal-factorized} \text{minimize }\mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*} \left[ \Pi \left( \sum_{k=1}^K C_k \right) \right] \text{ subject to } \Pi \in \mathcal{C}(\rho, \omega), \end{align}\tag{12}\] as noted in [72] for the special case \(c(x_1, \dots, x_K, y_1, \dots, y_K)=\sum_{k=1}^K\left| x_k-y_k \right|^p.\) Note that if the transport cost factorizes in the sense of 10 , then the a priori non-linear loss function of the primal problem 8 becomes linear, as clearly shown by 12 .
The classical dual problem for the optimal transportation problem 1 on the complete and separable metric space \(\mathcal{X}\) is to \[\begin{align} \label{eq:cl-dual-prob} \text{maximize } \int_{\mathcal{X}} \psi(y) \mathrm{d}\nu(y) +\int_{\mathcal{X}} \varphi(x) \mathrm{d}\mu(x) \end{align}\tag{13}\] subject to the constraint \[\begin{align} \label{eq:cl-dual-const} \psi(y)+\varphi(x) \leq c(x,y) \end{align}\tag{14}\] for all \(x,y \in \mathcal{X},\) and the classical Kantorovich duality asserts that \[\begin{align} \label{eq:class-Kant-dual} \sup \left\{ \int_{\mathcal{X}} \psi(y) \mathrm{d}\nu(y) +\int_{\mathcal{X}} \varphi(x) \mathrm{d}\mu(x) \, \middle| \, \psi(y)+\varphi(x) \leq c(x,y) \right\}= \nonumber \\ = \min \left\{ \iint_{\mathcal{X}\times \mathcal{X}} c(x,y) \mathrm{d}\pi(x,y) \, \middle| \, \pi \in \mathcal{C}(\mu, \nu) \right\}, \end{align}\tag{15}\] see, e.g., Theorem 1.3. in [21]. In view of 13 and 14 , a natural generalization of the classical dual problem to our quantum setting is the following.
Problem 2. Let the initial and final states \(\rho, \omega \in \mathcal{S}\left( \mathcal{H} \right)\) and the cost operator \(C_c^{(\mathcal{A})}\) defined by 7 and acting on \(\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}\) be given. The optimization task is to \[\begin{align} \label{eq:linear-k-partite-dual-problem} \text{maximize } \sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \end{align}\qquad{(5)}\] subject to the constraint \[\begin{align} \label{eq:linear-k-partite-dual-constraint} \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \leq C_c^{\mathcal{A}}. \end{align}\qquad{(6)}\]
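To make the roles of Problem 1 and Problem 2 tangible, the following sketch checks weak duality in a toy case. Everything specific in it is an assumption made for illustration and is not taken from the text: \(K=1,\) \(\mathcal{H}=\mathbb{C}^2,\) the observable is \(\sigma_z\) with eigenvalues \(\pm 1,\) the cost is \(c(x,y)=(x-y)^2,\) the states and the dual variables are diagonal in the eigenbasis of \(\sigma_z,\) and the function names are hypothetical. Under these assumptions all operators involved are diagonal, so the Löwner inequality in the dual constraint reduces to entrywise inequalities, and the trivial coupling \(\omega \otimes \rho^T\) serves as a feasible primal point.

```python
def cost_diagonal(spectrum, c):
    """Diagonal of the cost operator for K = 1 and A = diag(spectrum).

    Entry [i][j] belongs to the basis vector e_i (x) e_j^* and equals c(x, y)
    with x = spectrum[j] (factor H^*) and y = spectrum[i] (factor H).
    """
    n = len(spectrum)
    return [[c(spectrum[j], spectrum[i]) for j in range(n)] for i in range(n)]


def is_dual_feasible(y, x, cost, tol=1e-12):
    """Check Y (x) I + I (x) X^T <= C when Y, X and C are all diagonal."""
    n = len(y)
    return all(y[i] + x[j] <= cost[i][j] + tol
               for i in range(n) for j in range(n))


def dual_value(omega_diag, rho_diag, y, x):
    """tr[omega Y] + tr[rho X] for diagonal Y and X."""
    return (sum(w * a for w, a in zip(omega_diag, y))
            + sum(r * b for r, b in zip(rho_diag, x)))


def trivial_coupling_cost(omega_diag, rho_diag, cost):
    """tr[(omega (x) rho^T) C] for a diagonal cost operator C."""
    n = len(cost)
    return sum(omega_diag[i] * rho_diag[j] * cost[i][j]
               for i in range(n) for j in range(n))


spectrum = [1.0, -1.0]                        # eigenvalues of sigma_z
cost = cost_diagonal(spectrum, lambda x, y: (x - y) ** 2)
omega_diag, rho_diag = [0.9, 0.1], [0.2, 0.8]  # diagonals of omega and rho
y, x = [2.0, -2.0], [-2.0, 2.0]                # hand-picked dual candidate

feasible = is_dual_feasible(y, x, cost)
lower = dual_value(omega_diag, rho_diag, y, x)
upper = trivial_coupling_cost(omega_diag, rho_diag, cost)
```

Any feasible dual pair yields a lower bound on the cost of any feasible primal state, so `lower` cannot exceed `upper` here; the sketch does not attempt to find the optimizers of either problem.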
It turns out that the above proposed Problem 2 is indeed the strong Kantorovich dual of Problem 1. The precise statement is formalized in the following theorem, which is the main result of this section.
Theorem 1. Let \(\mathcal{A}=\left\{ A_1,\dots,A_K \right\}\) be a finite collection of observables on a separable Hilbert space \(\mathcal{H},\) let the cost operator \(C_c^{(\mathcal{A})}\) be defined as in 7 , and let \(\rho, \omega \in \mathcal{S}\left( \mathcal{H} \right).\) Then \[\begin{align} \label{eq:duality-nr-1} \sup \left\{ \sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \, \middle| \, \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \leq C_c^{\mathcal{A}} \right\} = \nonumber \\ =\min \left\{ \mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\left[ \Gamma C_c^{(\mathcal{A})} \right] \, \middle| \, \Gamma \in \mathcal{S}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right) , \, \left( \Gamma \right)_{2k-1}=\omega, \, \left( \Gamma \right)_{2k}=\rho^T \text{ for all } k\in \left\{ 1, \dots, K \right\} \right\}, \end{align}\qquad{(7)}\] where the variables \(X_1, Y_1, \dots, X_K, Y_K\) to be optimized are self-adjoint and bounded operators on \(\mathcal{H},\) and the marginals \(\left( \Gamma \right)_{2k-1}\) and \(\left( \Gamma \right)_{2k}\) are the ones defined in (3) and (4) of Problem 1.
Proof. Let us define the functional \(\Theta: \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \rightarrow (-\infty,+\infty]\) by \[\begin{align} \label{eq:IID-Theta-def} \Theta(U):=\begin{cases} 0 & \text{if } U \geq -C_c^{\mathcal{A}} \\ + \infty & \text{else.} \end{cases} \end{align}\tag{16}\] Here, the inequality \(U \geq -C_c^{\mathcal{A}}\) is to be understood in the Löwner sense, that is, \(\langle x |U| x \rangle \geq -\langle x |C_c^{\mathcal{A}}| x \rangle\) for all \(x \in \mathrm{dom}\left( C_c^{\mathcal{A}} \right).\) The constraint \(U \geq -C_c^{\mathcal{A}}\) defines a convex domain in \(\mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa},\) and hence the functional \(\Theta\) defined by 16 is convex. Furthermore, we define the functional \(\Xi: \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \rightarrow (-\infty,+\infty]\) by \[\begin{align} \label{eq:IID-Xi-def} \Xi(U):=\begin{cases} \sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) & \text{if } U= \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \\ + \infty & \text{else,} \end{cases} \end{align}\tag{17}\] where \(X_1,Y_1, \dots,X_K, Y_K \in \mathcal{B}(\mathcal{H})^{sa}.\) It is important to note that the domain of \(\Xi,\) that is, the region where it takes finite values, is convex. 
Indeed, it is a direct sum of linear subspaces of \(\mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa},\) namely, \[\begin{align} \label{eq:IID-Xi-domain-char} \mathrm{domain}(\Xi)=\bigoplus_{k=1}^K\left( I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \mathcal{B}(\mathcal{H})^{sa}\otimes I_{\mathcal{H}}^T\otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \oplus I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes I_{\mathcal{H}} \otimes \mathcal{B}\left( \mathcal{H}^* \right)^{sa}\otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \right). \end{align}\tag{18}\] Recall that the Legendre-Fenchel transform \(\Omega^*\) of a convex function \(\Omega\) defined on the real normed vector space \(\mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa}\) equipped with the operator norm topology is defined by \[\begin{align} \label{eq:IID-LF-transform-def} \Omega^*\left( \widetilde{\Gamma} \right):=\sup \left\{ \widetilde{\Gamma}(U)-\Omega(U) \, \middle| \, U \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \right\} \end{align}\tag{19}\] for all \(\widetilde{\Gamma} \in \left( \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \right)^*.\) The famous Fenchel-Rockafellar duality theorem [76] asserts that \[\begin{align} \label{eq:IID-LF-dualiy} \inf\left\{ \Theta(U)+\Xi(U) \, \middle| \, U \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \right\} =\max \left\{ -\Theta^*\left( -\widetilde{\Gamma} \right)-\Xi^*\left( \widetilde{\Gamma} \right) \, \middle| \, \widetilde{\Gamma} \in \left( \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \right)^* \right\} \end{align}\tag{20}\] whenever there exists a \(U_0 \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} 
\right)^{sa}\) such that both \(\Theta(U_0)\) and \(\Xi(U_0)\) are finite, and \(\Theta\) is continuous at \(U_0.\) Clearly, \(U_0:=I_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\) does the job. Indeed, by 16 , we have \(\Theta\left( I_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}} \right)=0,\) and by 17 we get \(\Xi\left( I_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}} \right)=1\) (take \(Y_1=I_{\mathcal{H}}\) and all other variables equal to zero, and use \(\mathrm{tr}_{\mathcal{H}}[\omega]=1\)). Moreover, \(I_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\) lies in the interior of the cone of positive semidefinite operators in the operator norm topology on \(\mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa},\) because every self-adjoint \(U\) with \(\left\| U-I_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}} \right\| < 1\) is positive semidefinite. Recall that the classical cost function \(c\) is nonnegative, and hence the induced cost operator \(C_c^{\mathcal{A}}\) defined in 7 is positive semidefinite. Therefore, \(U\geq -C_c^{\mathcal{A}}\) holds for any positive semidefinite \(U \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa},\) and hence there is an open neighborhood of \(I_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\) where \(\Theta\) vanishes. Consequently, \(\Theta\) is continuous at \(I_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}.\)
By the definition of the Legendre-Fenchel transform 19 and straightforward steps, we can compute \(\Theta^*\) as follows: \[\begin{align} \label{eq:IID-Theta-star-computation} \Theta^*\left( -\widetilde{\Gamma} \right) = \sup \left\{ -\widetilde{\Gamma}(U)- \Theta(U) \, \middle| \, U \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \right\} = \sup_{U \, : \, U \geq -C_c^{\mathcal{A}}} \left\{ -\widetilde{\Gamma}(U) \right\}= \\ =-\inf_{U \, : \, U \geq -C_c^{\mathcal{A}}} \left\{ \widetilde{\Gamma}(U) \right\} =\begin{cases} \widetilde{\Gamma}\left( C_c^{\mathcal{A}} \right), & \text{if } \widetilde{\Gamma} \geq 0, \\ +\infty, & \text{else.} \end{cases} \end{align}\tag{21}\] Indeed, if \(\widetilde{\Gamma} \geq 0,\) that is, \(\widetilde{\Gamma}(R)\geq 0\) for all positive semi-definite \(R \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa},\) then \[\begin{align} -\inf_{U \, : \, U \geq -C_c^{\mathcal{A}}} \left\{ \widetilde{\Gamma}(U) \right\} =-\inf_{ S \geq 0, \, S \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa}} \left\{ \widetilde{\Gamma}\left( -C_c^{(\mathcal{A})} \right)+\widetilde{\Gamma}(S) \right\}=-\widetilde{\Gamma}\left( -C_c^{\mathcal{A}} \right)=\widetilde{\Gamma}\left( C_c^{\mathcal{A}} \right). 
\end{align}\] On the other hand, if \(\widetilde{\Gamma} \ngeq 0,\) that is, \(\widetilde{\Gamma}(R)<0\) for some \(R\geq 0,\) then \(\widetilde{\Gamma}\left( -C_c^{\mathcal{A}}+tR \right)\) tends to \(-\infty\) as \(t\) tends to \(+\infty,\) and hence \(\inf_{U \, : \, U \geq -C_c^{\mathcal{A}}}\left\{ \widetilde{\Gamma}(U) \right\}=-\infty.\) As for the convex conjugate of \(\Xi,\) one gets \[\begin{align} \Xi^*\left( \widetilde{\Gamma} \right) = \sup \left\{ \widetilde{\Gamma}(U) - \Xi(U) \, \middle| \, U \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \right\} = \nonumber \\ =\sup \left\{ \widetilde{\Gamma}(U)-\left( \sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \right) \, \middle| \, U= \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \right\} \nonumber \end{align}\] \[\begin{align} =\sup_{X_1,Y_1,\dots,X_K,Y_K \in \mathcal{B}(\mathcal{H})^{sa}}\left\{ \widetilde{\Gamma}\left( \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \right)-\left( \sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \right) \right\} \nonumber \end{align}\] \[\begin{align} \label{eq:IID-Xi-star-computation} =\begin{cases} 0, & \text{if } \widetilde{\Gamma}\left( \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \right)=\sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + 
\mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \\ +\infty, & \text{else,} \end{cases} \end{align}\tag{22}\] where the condition in the first line of 22 means that the equation holds for all \(X_1, Y_1, \dots, X_K, Y_K \in \mathcal{B}(\mathcal{H})^{sa}.\)
On one hand, the left-hand side of 20 can be written as \[\begin{align} \label{eq:IID-LHS-of-LF-duality} \inf\left\{ \Theta(U)+\Xi(U) \, \middle| \, U \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \right\} \nonumber \\ =\inf\left\{ \sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \, \middle| \, \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \geq -C_c^{\mathcal{A}} \right\} \nonumber \end{align}\tag{23}\] \[\begin{align} =\inf\left\{ -\sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega(-Y_k) \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho( -X_k) \right] \right) \, \middle| \, \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( (-Y_k) \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes (-X_k)^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \leq C_c^{\mathcal{A}} \right\} \nonumber \end{align}\] \[\begin{align} =\inf\left\{ -\sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \, \middle| \, \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \leq C_c^{\mathcal{A}} \right\} \nonumber \\ =-\sup\left\{ \sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \, \middle| \, \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \leq C_c^{\mathcal{A}} \right\}. 
\end{align}\] On the other hand, by 21 and 22 , the right-hand side of 20 reads as \[\begin{align} \label{eq:IID-RHS-of-LF-duality} \max \left\{ -\Theta^*\left( -\widetilde{\Gamma} \right)-\Xi^*\left( \widetilde{\Gamma} \right) \, \middle| \, \widetilde{\Gamma} \in \left( \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa} \right)^* \right\} \end{align}\tag{24}\] \[\begin{align} \label{eq:IID-RHS-of-LF-duality-2} &=\max \left\{ -\widetilde{\Gamma}(C_c^{\mathcal{A}}) \, \middle| \, \widetilde{\Gamma} \geq 0, \widetilde{\Gamma}\left( \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \right)=\sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \right\} \nonumber \\ &=-\min \left\{ \widetilde{\Gamma}(C_c^{\mathcal{A}}) \, \middle| \, \widetilde{\Gamma} \geq 0, \widetilde{\Gamma}\left( \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \right)=\sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \right\}. 
\end{align}\tag{25}\] Consequently, \[\begin{align} \label{eq:IID-KD-final} \sup\left\{ \sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \, \middle| \, \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \leq C_c^{\mathcal{A}} \right\} = \nonumber \\ =\min \left\{ \widetilde{\Gamma}(C_c^{\mathcal{A}}) \, \middle| \, \widetilde{\Gamma} \geq 0, \widetilde{\Gamma}\left( \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \right)=\sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \right\}. \end{align}\tag{26}\] It can be shown very similarly to the proof of [50] that for any functional \(\widetilde{\Gamma}\) satisfying the conditions described on the right-hand side of 26 there exists a positive trace-class operator \(\Gamma \in \mathcal{T}_1\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)\) such that \(\widetilde{\Gamma}(U)=\mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\left[ \Gamma U \right]\) for all \(U \in \mathcal{B}\left( \left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K} \right)^{sa}.\) The requirement that
\[\begin{align} \label{eq:partial-trace-requirement} \mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\left[ \Gamma \left( \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \right) \right]=\sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \end{align}\tag{27}\] holds for all \(X_1, \dots, X_K, Y_1, \dots, Y_K \in \mathcal{B}(\mathcal{H})^{sa}\) is clearly equivalent to the condition \[\begin{align} \left( \Gamma \right)_{2k-1}=\omega, \, \left( \Gamma \right)_{2k}=\rho^T \text{ for all } k\in \left\{ 1, \dots, K \right\}, \end{align}\] and hence 26 can be written as
\[\begin{align} \label{eq:IID-KD-final-final} \sup\left\{ \sum_{k=1}^K \left( \mathrm{tr}_{\mathcal{H}}\left[ \omega Y_k \right] + \mathrm{tr}_{\mathcal{H}}\left[ \rho X_k \right] \right) \, \middle| \, \sum_{k=1}^K I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left( Y_k \otimes I_{\mathcal{H}^*}+I_{\mathcal{H}} \otimes X_k^T \right) \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (K-k)} \leq C_c^{\mathcal{A}} \right\} = \nonumber \\ =\min \left\{ \mathrm{tr}_{\left( \mathcal{H}\otimes \mathcal{H}^* \right)^{\otimes K}}\left[ \Gamma C_c^{(\mathcal{A})} \right] \, \middle| \, \Gamma \geq 0, \, \left( \Gamma \right)_{2k-1}=\omega, \, \left( \Gamma \right)_{2k}=\rho^T \text{ for all } k\in \left\{ 1, \dots, K \right\} \right\}, \end{align}\tag{28}\] as desired. ◻
We noted before that in the case of a factorizing transport cost (see eq. 10 ), the primal task 8 reduces to the linear problem \[\begin{align} \label{eq:primal-factorized-2} \text{minimize } \Pi \mapsto \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*} \left[ \Pi \left( \sum_{k=1}^K \iint_{\mathbb{R}\times \mathbb{R}} f_k\left( x_k,y_k \right) \mathrm{d}E_k(y_k) \otimes \mathrm{d}E_k^T (x_k) \right) \right] \text{ over } \mathcal{C}\left( \rho,\omega \right). \end{align}\tag{29}\] Let us consider the special case \(K=1\) in Theorem 1, and let us replace the cost operator \(C_c^{(\mathcal{A})}\) there by \(C_{fac},\) where \(C_{fac}\) is shorthand for \(\sum_{k=1}^K \iint_{\mathbb{R}\times \mathbb{R}} f_k\left( x_k,y_k \right) \mathrm{d}E_k(y_k) \otimes \mathrm{d}E_k^T (x_k).\) Observe that the concrete form of the cost operator \(C_c^{(\mathcal{A})}\) does not play any role in the proof of Theorem 1; the cost operator can be replaced by any self-adjoint operator. Consequently, the proof of Theorem 1 shows that strong Kantorovich duality also holds for the primal problem 29 , which we formalize in the following corollary.
Corollary 1. Assume that the transport cost factorizes in the sense of 10 . In this case, the primal problem 8 admits a strong Kantorovich dual problem, which is to maximize \(\mathrm{tr}_{\mathcal{H}}\left[ \omega Y \right]+\mathrm{tr}_{\mathcal{H}}\left[ \rho X \right]\) under the constraint \(Y \otimes I_{\mathcal{H}}^T +I_{\mathcal{H}} \otimes X^T \leq C_{fac}:=\sum_{k=1}^K \iint_{\mathbb{R}\times \mathbb{R}} f_k\left( x_k,y_k \right) \mathrm{d}E_k(y_k) \otimes \mathrm{d}E_k^T (x_k).\) That is, \[\begin{align} \sup \left\{ \mathrm{tr}_{\mathcal{H}}\left[ \omega Y \right]+\mathrm{tr}_{\mathcal{H}}\left[ \rho X \right] \, \middle| \, Y \otimes I_{\mathcal{H}}^T +I_{\mathcal{H}} \otimes X^T \leq C_{fac} \right\} =\min \left\{ \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*} \left[ \Pi C_{fac} \right] \, \middle| \, \Pi \in \mathcal{C}\left( \rho, \omega \right) \right\}, \end{align}\] where the variables \(X\) and \(Y\) to be optimized are self-adjoint and bounded operators on \(\mathcal{H}.\)
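For orientation, the easy inequality \(\sup \leq \min\) behind Corollary 1 can be checked directly. The following is only a sketch, under the additional assumption that all traces involved are finite; it uses the marginal conditions \(\mathrm{tr}_{\mathcal{H}^*}[\Pi]=\omega\) and \(\mathrm{tr}_{\mathcal{H}}[\Pi]=\rho^T\) that define \(\mathcal{C}(\rho,\omega).\) Let \(X, Y\) be feasible for the dual problem and let \(\Pi \in \mathcal{C}(\rho,\omega).\) Then \[\begin{align} \mathrm{tr}_{\mathcal{H}}\left[ \omega Y \right]+\mathrm{tr}_{\mathcal{H}}\left[ \rho X \right] =\mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \Pi \left( Y \otimes I_{\mathcal{H}}^T +I_{\mathcal{H}} \otimes X^T \right) \right] \leq \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \Pi C_{fac} \right], \nonumber \end{align}\] where the equality follows from the marginal conditions together with \(\mathrm{tr}_{\mathcal{H}^*}\left[ \rho^T X^T \right]=\mathrm{tr}_{\mathcal{H}}\left[ \rho X \right],\) and the inequality follows from \(\Pi \geq 0\) and the constraint \(Y \otimes I_{\mathcal{H}}^T +I_{\mathcal{H}} \otimes X^T \leq C_{fac}.\) The content of the corollary is that this inequality is saturated and that the minimum is attained.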
In [72] we also considered the following quantum mechanical optimal transport problem: let \(\mathcal{H}:=L^2(\mathbb{R}^K) \simeq L^2(\mathbb{R})^{\otimes K},\) and let \(c: \mathbb{R}^K \times \mathbb{R}^K \to [0,\infty)\) be a non-negative lower semi-continuous classical cost function. Let \(E: \mathcal{B}(\mathbb{R})\to \mathcal{P}(L^2(\mathbb{R}))\) be the spectral measure of the position operator \(Q\) acting on \(L^2(\mathbb{R}),\) that is, \(E(S)=M_{\chi_S},\) where \(\chi_S\) is the characteristic function of \(S\) and \(M_f\) is the multiplication by \(f\) given by \((M_f \psi)(x)=f(x)\psi(x).\)
The cost operator \(C_c \in \mathrm{Lin}\left( L^2(\mathbb{R}^K) \otimes (L^2(\mathbb{R}^K))^* \right)\) corresponding to the classical cost \(c\) is defined by Borel functional calculus the following way: \[\begin{align} \label{eq:pos-cost-op-def} C_c=\iint_{\mathbb{R}^K \times \mathbb{R}^K} c(x_1, \dots, x_K, y_1, \dots, y_K) \mathrm{d}E(y_1) \otimes \dots \otimes \mathrm{d}E(y_K) \otimes \mathrm{d}E(x_1)^T \otimes \dots \otimes \mathrm{d}E(x_K)^T. \end{align}\tag{30}\] Note that \(C_c\) is unbounded if \(c\) is so. Let \(\rho\) and \(\omega\) be states on \(L^2(\mathbb{R}^K).\) The optimization task is to \[\begin{align} \label{eq:pos-primal-task} \text{minimize } \mathrm{tr}_{L^2(\mathbb{R}^K) \otimes (L^2(\mathbb{R}^K))^*}\left[ \Pi C_c \right] \end{align}\tag{31}\] under the constraints \[\begin{align} \label{eq:pos-primal-constraints} \Pi \in \mathcal{S}\left( L^2(\mathbb{R}^K) \otimes (L^2(\mathbb{R}^K))^* \right), \, \mathrm{tr}_{(L^2(\mathbb{R}^K))^*}[\Pi]=\omega, \, \mathrm{tr}_{L^2(\mathbb{R}^K)} [\Pi]=\rho^T. \end{align}\tag{32}\]
Just like in the case of Corollary 1, the proof of Theorem 1 with \(K=1\) and with the appropriate cost operator demonstrates that the primal quantum optimal transport problem described in 31 and 32 has a strong dual. We formalize the precise statement in the following corollary.
Corollary 2. Let the cost operator \(C_c \in \mathrm{Lin}\left( L^2\left( \mathbb{R}^K \right)\otimes \left( L^2\left( \mathbb{R}^K \right) \right)^* \right)\) be defined as in 30 , and let \(\rho, \omega \in \mathcal{S}\left( L^2\left( \mathbb{R}^K \right) \right).\) Then \[\begin{align} \label{eq:duality-nr-2} \sup \left\{ \mathrm{tr}_{L^2\left( \mathbb{R}^K \right)}[\omega Y]+\mathrm{tr}_{L^2\left( \mathbb{R}^K \right)}[\rho X] \, \middle| \, X,Y \in \mathcal{B}(\mathcal{H}), \, Y \otimes I^T +I \otimes X^T \leq C_c \right\}= \nonumber \\ =\min \left\{ \Gamma(C_c) \, \middle| \, \Gamma \geq 0, \Gamma(A \otimes I^T + I\otimes B^T)=\mathrm{tr}_{L^2\left( \mathbb{R}^K \right)}[\omega A] + \mathrm{tr}_{L^2\left( \mathbb{R}^K \right)}[\rho B] \text{ for all } A, B \in \mathcal{B}\left( L^2\left( \mathbb{R}^K \right) \right)^{sa} \right\}. \end{align}\qquad{(8)}\]
The following statement demonstrates that the minimum of the primal problem 8 can indeed be larger than the minimum of its linear relaxation ?? .
Proposition 2. There exists \(C_{c}^{(\mathcal{A})}\) defined as in 7 and states \(\rho,\omega \in \mathcal{S}\left( \mathcal{H} \right)\), such that the infimum of the primal problem defined in ?? is strictly smaller than the infimum of the primal problem defined in 8 .
Proof. Let \(\mathcal{H}=\mathbb{C}^2,\) and with the notations introduced at the beginning of this section, let \(K=3,\) and \[c(x_1,x_2,x_3,y_1,y_2,y_3):=\left| x_1-y_1 \right|^p+\left| x_2-y_2 \right|^p+\left| x_3-y_3 \right|^p\] for some parameter \(p \geq 1.\) Let \(\mathcal{A}=\{\sigma_1, \sigma_2, \sigma_3\},\) where \[\begin{align} \label{eq:Pauli-def} \sigma_1=\sigma_x=\left[ \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right], \, \quad \sigma_2=\sigma_y=\left[ \begin{array}{cc} 0 & -i \\ i & 0 \end{array} \right], \, \quad \sigma_3=\sigma_z=\left[ \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right], \end{align}\tag{33}\] that is, we set \(\mathcal{A}\) to be the collection of the Pauli matrices. Finally, let \[\begin{align} \label{eq:rho-omega-concrete} \rho:=1/2(I+1/2\sigma_z) \text{ and } \omega:=1/2(I-1/2\sigma_z). \end{align}\tag{34}\] The cost operator \(C_c^{(\mathcal{A})}\) given by 7 factorizes now the following way: \[\begin{align} \label{eq:C-c-A-factorized} C_{c}^{(\mathcal{A})}&=\sum_{k=1}^3 I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (k-1)} \otimes \left| \sigma_k\otimes I^T-I\otimes \sigma_k^T \right|^p \otimes I_{\mathcal{H}\otimes \mathcal{H}^*}^{\otimes (3-k)}. \end{align}\tag{35}\] Thus, on one hand, as we noted in 12 and 29 , the task 8 takes the form \[\begin{align} \text{minimize } \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*} \left[ \Pi \left( \sum_{k=1}^3 \left| \sigma_k\otimes I^T-I\otimes \sigma_k^T \right|^p \right) \right] \end{align}\] where \(\Pi\) runs over the set of all couplings of \(\rho, \omega \in \mathcal{S}\left( \mathcal{H} \right).\) On the other hand, as we noted in 11 , the task ?? 
takes the form \[\begin{align} \text{minimize } \sum_{k=1}^3\mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*} \left[ \Pi_k \left| \sigma_k\otimes I^T-I\otimes \sigma_k^T \right|^p \right] \end{align}\] where all \(\Pi_k\) run over the set of all couplings of \(\rho, \omega \in \mathcal{S}\left( \mathcal{H} \right).\) Taking into account the concrete form of \(\rho\) and \(\omega\) (see 34 ), we conclude that the minimum of 8 takes the form \[\begin{align} \min \left\{ \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \Pi \left( \sum_{k=1}^3 \left| \sigma_k\otimes I^T-I\otimes \sigma_k^T \right|^p \right) \right] \, \middle| \, \Pi \in \mathcal{C}\left( 1/2(I+1/2\sigma_z), 1/2(I-1/2\sigma_z) \right) \right\} \nonumber \\ =2^p\left(1+\frac{1}{2}-\sqrt{\left(1-\frac{1}{2}\right)\left(1-\frac{1}{2}\right)}\right)=2^p, \end{align}\] where we made use of the fact that \(\rho\) and \(\omega\) given by 34 commute, and used Theorem 4 from the subsequent section, where we give an explicit closed form for the optimal transport cost between commuting states. On the other hand, using again the concrete form of \(\rho\) and \(\omega,\) we conclude that the minimum of ?? takes the following form: \[\begin{align} \label{eq:genprimal95halfway} \min \left\{ \sum_{k=1}^3\mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \Pi_k \left| \sigma_k\otimes I^T-I\otimes \sigma_k^T \right|^p \right] \, \middle| \, \Pi_1,\Pi_2, \Pi_3 \in \mathcal{C}\left( 1/2(I+1/2\sigma_z), 1/2(I-1/2\sigma_z) \right) \right\} \nonumber \\ =\sum_{k=1}^3 \min\left\{ \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*} \left[ \Pi_k \left| \sigma_k\otimes I^T-I\otimes \sigma_k^T \right|^p \right] \middle| \, \Pi_k \in \mathcal{C}\left( 1/2(I+1/2\sigma_z), 1/2(I-1/2\sigma_z) \right) \right\}. 
\end{align}\tag{36}\] The first two terms of the sum on the right-hand side of 36 can be computed explicitly by Theorem 7 of the next Section (with appropriate changes of basis), while the third term is given by Proposition 10 there. Accordingly, \[\begin{align} \sum_{k=1}^3 \min\left\{ \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*} \left[ \Pi_k \left| \sigma_k\otimes I^T-I\otimes \sigma_k^T \right|^p \right] \middle| \, \Pi_k \in \mathcal{C}\left( 1/2(I+1/2\sigma_z), 1/2(I-1/2\sigma_z) \right) \right\} \nonumber \\ =2^p\left(1-\sqrt{1-\frac{1}{2^2}}\right)+2^{p-1}=2^p-\left(\frac{\sqrt{3}-1}{2}\right)2^p<2^p, \end{align}\] which completes the proof. ◻
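The two values compared in the proof of Proposition 2 can be reproduced numerically. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, the first function evaluates the closed form used for the joint problem at \(\alpha=1/2,\) \(\beta=-1/2,\) and the second one simply adds up the three terms quoted in the proof (the third term \(2^{p-1}\) is taken as stated there, not recomputed).

```python
import math


def joint_minimum(p):
    """Minimum of the problem with a single coupling, via the closed form
    2^p (1 + |a-b|/2 - sqrt((1+min(a,b))(1-max(a,b)))) at a = 1/2, b = -1/2."""
    alpha, beta = 0.5, -0.5
    lo, hi = min(alpha, beta), max(alpha, beta)
    return 2 ** p * (1 + abs(alpha - beta) / 2 - math.sqrt((1 + lo) * (1 - hi)))


def relaxed_minimum(p):
    """Sum of the three separately minimized terms quoted in the proof:
    two terms 2^(p-1) (1 - sqrt(1 - 1/4)) and one term 2^(p-1)."""
    transverse = 2 ** (p - 1) * (1 - math.sqrt(1 - 0.25))
    return 2 * transverse + 2 ** (p - 1)


for p in (1, 2, 3):
    # The second value should be strictly smaller than the first one.
    print(p, joint_minimum(p), relaxed_minimum(p))
```

For every \(p\) the gap between the two printed values is \(\frac{\sqrt{3}-1}{2}\,2^p,\) in agreement with the last display of the proof.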
In this section, we will apply the Kantorovich duality results Theorem 1 and Corollary 1 to prove the optimality of certain quantum couplings and operator Kantorovich potentials. We consider the case of quantum bits, that is, \(\mathcal{H}=\mathbb{C}^2,\) and the following transportation costs will be studied (with the notation introduced at the beginning of Section 2):
\(K=3, \, \mathcal{A}=\left\{ \sigma_1, \sigma_2,\sigma_3 \right\},\) and \(c(x,y)=\left|\left|x-y\right|\right|_p^p,\) where \(x,y \in \mathbb{R}^3,\) and \(\left|\left|\cdot\right|\right|_p\) is the \(l_p\) norm there;
\(K=1, \, \mathcal{A}=\left\{ \sigma_3 \right\},\) and \(c(x,y)=\left| x-y \right|^p,\) where \(x,y \in \mathbb{R}.\)
Let \(K=3, \, \mathcal{A}=\left\{ \sigma_1, \sigma_2,\sigma_3 \right\},\) and \(c(x,y)=\left|\left|x-y\right|\right|_p^p\) for some parameter \(p\geq 1.\) According to 7 , the cost operator \(C_c^{(\mathcal{A})}\) is the one given in 35 , and the primal quantum optimal transport problem 8 reduces to \[\begin{align} \text{minimize } \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \Pi C_{\text{symm},p} \right] \text{ over all } \Pi \in \mathcal{C}\left( \rho, \omega \right), \end{align}\] where \[\begin{align} C_{\text{symm},p}=\sum_{k=1}^3 \left| \sigma_k\otimes I^T-I\otimes \sigma_k^T \right|^p. \end{align}\] The cost operator \(C_{\text{symm},p}\) can be computed explicitly: \[\begin{align} \label{eq:Csymm} C_{\text{symm},p}=2^{p+1}I\otimes I^T -2^p|| I \rangle\rangle\langle\langle I || =\left[ \begin{array}{cccc} 2^{p} & 0 & 0 & -2^{p}\\ 0 & 2^{p+1} & 0 & 0\\ 0 & 0 & 2^{p+1} & 0\\ -2^{p} & 0 & 0 & 2^{p}\end{array} \right]. \end{align}\tag{37}\] The matrix form of the symmetric cost operator \(C_{\text{symm},p}\) is in fact basis-invariant, that is, \[\begin{align} \label{eq:C-symm-unitary-invariance} \left( U \otimes \left( U^* \right)^T \right) C_{\text{symm},p} \left( U^* \otimes U^T \right)=C_{\text{symm},p} \end{align}\tag{38}\] for every unitary \(U\) acting on \(\mathbb{C}^2.\) Consequently, for commuting quantum bits one can assume without loss of generality that both qubits commute with \(\sigma_z\).
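The explicit form 37 of \(C_{\text{symm},p}\) can be cross-checked numerically. The sketch below is an illustration in plain Python with hypothetical helper names. It relies on one assumption that the code itself verifies: each operator \(A_k=\sigma_k\otimes I^T-I\otimes \sigma_k^T\) is self-adjoint and satisfies \(A_k^3=4A_k,\) so its spectrum lies in \(\{-2,0,2\}\) and \(\left| A_k \right|^p=2^{p-2}A_k^2.\)

```python
def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


IDENTITY = [[1, 0], [0, 1]]
PAULIS = [
    [[0, 1], [1, 0]],      # sigma_x
    [[0, -1j], [1j, 0]],   # sigma_y
    [[1, 0], [0, -1]],     # sigma_z
]


def c_symm_from_definition(p):
    """Sum over k of |sigma_k (x) I^T - I (x) sigma_k^T|^p, using |A|^p = 2^(p-2) A^2."""
    total = [[0j] * 4 for _ in range(4)]
    for sigma in PAULIS:
        left = kron(sigma, IDENTITY)
        right = kron(IDENTITY, transpose(sigma))
        a = [[left[i][j] - right[i][j] for j in range(4)] for i in range(4)]
        a2 = matmul(a, a)
        a3 = matmul(a2, a)
        # A^3 = 4A confirms that the spectrum of A lies in {-2, 0, 2}.
        assert all(abs(a3[i][j] - 4 * a[i][j]) < 1e-12
                   for i in range(4) for j in range(4))
        for i in range(4):
            for j in range(4):
                total[i][j] += 2 ** (p - 2) * a2[i][j]
    return total


def c_symm_closed_form(p):
    """2^(p+1) I - 2^p |I>><<I| with the unnormalized vector |I>> = e_00 + e_11."""
    v = [1, 0, 0, 1]
    return [[2 ** (p + 1) * (i == j) - 2 ** p * v[i] * v[j] for j in range(4)]
            for i in range(4)]


def max_deviation(p):
    """Largest entrywise difference between the two expressions."""
    lhs, rhs = c_symm_from_definition(p), c_symm_closed_form(p)
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))
```

Calling `max_deviation(p)` for a few values of \(p\) should return a number at the level of rounding error.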
In the following Proposition 3 and Theorem 4 we determine the optimal couplings of commuting quantum bits with respect to the transportation cost described by \(C_{\text{symm},p},\) and we give a simple closed form for the induced \(p\)-Wasserstein distance \(D_{\text{symm},p}.\) We recall that according to the recipe given in [72], the \(p\)-Wasserstein distance \(D_{\text{symm},p}\) corresponding to the cost operator \(C_{\text{symm},p}\) is defined by \[\begin{align} \label{eq:D-symm-p-def} D_{\text{symm},p}=\left( \min_{\Pi \in \mathcal{C}(\rho, \omega)} \left\{ \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \Pi C_{\text{symm},p} \right] \right\} \right)^{\frac{1}{p}}. \end{align}\tag{39}\]
Proposition 3. Let \[\label{eq:rho95alpha}\begin{align} \rho(\alpha):=\frac{1}{2}\left(I+\alpha\sigma_z\right)=\begin{pmatrix}\frac{1+\alpha}{2} & 0 \\ 0 & \frac{1 - \alpha}{2} \end{pmatrix}, \end{align}\qquad{(9)}\] for \(\alpha \in [-1,1].\) Then the optimal coupling of \(\rho(\alpha)\) and \(\rho(\beta)\) is given by 40 , and \[\label{eq:D95comm}\begin{align} D^p_{\text{symm},p}(\rho(\alpha),\rho(\beta))=2^p\left(1+\frac{1}{2}\left| \alpha-\beta \right|-\sqrt{(1+\min(\alpha,\beta))(1-\max(\alpha,\beta))}\right). \end{align}\qquad{(10)}\]
Proof. By using the symmetry mentioned above, one could also assume without losing generality that e.g. \(\alpha\geq\beta\) and then arrive to ?? without the extrema. Instead, for completeness we will prove ?? directly. Let \(z_-:=\min(\alpha,\beta)\) and \(z_+:=\max(\alpha,\beta)\), then let \[\begin{align} \label{eq:rho-alpha-beta-optimal-coupling} \Pi(\alpha,\beta):=\frac{1}{2}\left[ \begin{array}{cccc} 1+z_{-} & 0 & 0 & \sqrt{(1+z_-)(1-z_+)}\\ 0 & \max(\beta-\alpha,0) & 0 & 0\\ 0 & 0 & \max(\alpha-\beta,0) & 0\\ \sqrt{(1+z_-)(1-z_+)} & 0 & 0 & 1-z_+\end{array} \right]. \end{align}\tag{40}\] The matrix \(\Pi(\alpha,\beta)\) is clearly hermitian and is positive-semidefinite by Sylvester’s criterion, since all principal minors of \(\Pi(\alpha,\beta)\) are nonnegative. It is easy to check that \(\mathrm{tr}_1 \left[ \Pi(\alpha,\beta) \right]=\rho(\alpha)^T\) while \(\mathrm{tr}_2 \left[ \Pi(\alpha,\beta) \right]=\rho(\beta)\) (and consequently, \(\mathrm{tr}\left[ \Pi(\alpha,\beta) \right]=1\)), which demonstrate that \(\Pi(\alpha,\beta)\) is a coupling of \(\rho(\alpha)\) and \(\rho(\beta).\) It follows that \[\begin{align} \label{eq:D95upperbound} D^p_{\text{symm},p}(\rho(\alpha),\rho(\beta))&\leq \mathrm{tr}\left[ C_{\text{symm},p}\Pi(\alpha,\beta) \right] =2^{p+1}-2^p\langle \langle I | |\Pi(\alpha,\beta)| | I \rangle \rangle \nonumber \\ &=2^{p+1}-2^{p-1}\left((1+z_-)+(1-z_+)+2\sqrt{(1+z_-)(1-z_+)} \right)\\ &=2^p\left(1+\frac{1}{2}\left| \alpha-\beta \right|-\sqrt{(1+\min(\alpha,\beta))(1-\max(\alpha,\beta))}\right). 
\end{align}\tag{41}\] On the other hand, if \(\left| \alpha \right|\neq 1\) and \(\left| \beta \right|\neq 1\) consider \[\begin{align} X_1=\left[ \begin{array}{cc} -2^p\sqrt{\frac{1-\beta}{1+\alpha}}-2^p & 0 \\ 0 & 0 \end{array} \right],\quad Y_1=\left[ \begin{array}{cc} 2^{p+1} & 0 \\ 0 & 2^p-2^p\sqrt{\frac{1+\alpha}{1-\beta}} \end{array} \right], \end{align}\] and \[\begin{align} X_2=\left[ \begin{array}{cc} 2^{p+1} & 0 \\ 0 & 2^p-2^p\sqrt{\frac{1+\beta}{1-\alpha}} \end{array} \right],\quad Y_2=\left[ \begin{array}{cc} -2^p\sqrt{\frac{1-\alpha}{1+\beta}}-2^p & 0 \\ 0 & 0 \end{array} \right]. \end{align}\] Clearly, \(X_1\), \(X_2\), \(Y_1\) and \(Y_2\) are self-adjoint. It is also evident by Sylvester’s criterion, that \[\begin{align} C_{\text{symm},p}-Y_1\otimes I^T-I\otimes X_1^T= \left[ \begin{array}{cccc}2^p\sqrt{\frac{1-\beta}{1+\alpha}} & 0 & 0 & -2^p\\ 0 & 0 & 0 & 0\\ 0 & 0 & 2^{p+1}+2^p\sqrt{\frac{1-\beta}{1+\alpha}}+2^p\sqrt{\frac{1+\alpha}{1-\beta}} & 0\\ -2^p & 0 & 0 & 2^p\sqrt{\frac{1+\alpha}{1-\beta}}\end{array} \right]\geq 0, \end{align}\] and \[\begin{align} C_{\text{symm},p}-Y_2\otimes I^T-I\otimes X_2^T= \left[ \begin{array}{cccc}2^p\sqrt{\frac{1-\alpha}{1+\beta}} & 0 & 0 & -2^p\\ 0 & 2^{p+1}+2^p\sqrt{\frac{1-\alpha}{1+\beta}}+2^p\sqrt{\frac{1+\beta}{1-\alpha}} & 0 & 0\\ 0 & 0 & 0 & 0\\ -2^p & 0 & 0 & 2^p\sqrt{\frac{1+\beta}{1-\alpha}}\end{array} \right]\geq 0. \end{align}\] Therefore, \[\begin{align}\label{eq:D95lowerbound} &D^p_{\text{symm},p}(\rho(\alpha),\rho(\beta))\geq \max\left\{ \mathrm{tr}\left[ X_1\rho(\alpha) \right]+\mathrm{tr}\left[ Y_1\rho(\beta) \right],\mathrm{tr}\left[ X_2\rho(\alpha) \right]+\mathrm{tr}\left[ Y_2\rho(\beta) \right] \right\}\\ =&\max\left\{ -2^{p}\sqrt{(1+\alpha)(1-\beta)}+2^p+2^{p-1}(\beta-\alpha),-2^{p}\sqrt{(1+\beta)(1-\alpha)}+2^p+2^{p-1}(\alpha-\beta) \right\}\\ =&2^p\left(1+\frac{1}{2}\left| \alpha-\beta \right|-\sqrt{(1+\min(\alpha,\beta))(1-\max(\alpha,\beta))}\right). 
\end{align}\tag{42}\] For mixed states, combining 41 and 42 completes the proof. If either state is pure then it is known that there is only one coupling, the tensor product, and therefore 41 is an equality rather than an upper bound. ◻
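The primal and dual certificates used in the proof of Proposition 3 can be evaluated numerically for concrete parameters. The sketch below uses hypothetical function names and assumes \(\left| \alpha \right|,\left| \beta \right|<1.\) It builds the coupling 40 and the diagonal potentials \(X_1,Y_1,X_2,Y_2\) exactly as written in the proof, and compares the resulting primal and dual values. It does not check the operator inequalities (feasibility of the potentials), which are established in the proof itself.

```python
import math


def c_symm(p):
    """Cost operator 2^(p+1) I - 2^p |I>><<I| as a 4x4 nested list."""
    v = [1, 0, 0, 1]
    return [[2 ** (p + 1) * (i == j) - 2 ** p * v[i] * v[j] for j in range(4)]
            for i in range(4)]


def coupling(alpha, beta):
    """Candidate optimal coupling Pi(alpha, beta) from the proof."""
    z_minus, z_plus = min(alpha, beta), max(alpha, beta)
    off = math.sqrt((1 + z_minus) * (1 - z_plus))
    return [
        [(1 + z_minus) / 2, 0, 0, off / 2],
        [0, max(beta - alpha, 0) / 2, 0, 0],
        [0, 0, max(alpha - beta, 0) / 2, 0],
        [off / 2, 0, 0, (1 - z_plus) / 2],
    ]


def marginals(pi):
    """Return (marginal on the first factor, marginal on the second factor)."""
    first = [[pi[2 * i][2 * j] + pi[2 * i + 1][2 * j + 1] for j in range(2)]
             for i in range(2)]
    second = [[pi[i][j] + pi[2 + i][2 + j] for j in range(2)] for i in range(2)]
    return first, second


def primal_value(alpha, beta, p):
    """tr[C_symm,p Pi(alpha, beta)]."""
    c, pi = c_symm(p), coupling(alpha, beta)
    return sum(c[i][j] * pi[j][i] for i in range(4) for j in range(4))


def dual_value(alpha, beta, p):
    """Larger of tr[X_i rho(alpha)] + tr[Y_i rho(beta)] for i = 1, 2."""
    rho_a = ((1 + alpha) / 2, (1 - alpha) / 2)  # diagonal of rho(alpha)
    rho_b = ((1 + beta) / 2, (1 - beta) / 2)    # diagonal of rho(beta)
    x1 = (-2 ** p * math.sqrt((1 - beta) / (1 + alpha)) - 2 ** p, 0)
    y1 = (2 ** (p + 1), 2 ** p - 2 ** p * math.sqrt((1 + alpha) / (1 - beta)))
    x2 = (2 ** (p + 1), 2 ** p - 2 ** p * math.sqrt((1 + beta) / (1 - alpha)))
    y2 = (-2 ** p * math.sqrt((1 - alpha) / (1 + beta)) - 2 ** p, 0)

    def value(x, y):
        return sum(x[i] * rho_a[i] + y[i] * rho_b[i] for i in range(2))

    return max(value(x1, y1), value(x2, y2))


# Example: the two numbers should agree up to rounding.
print(primal_value(0.3, -0.6, 2), dual_value(0.3, -0.6, 2))
```

Agreement of the two printed numbers illustrates how the upper bound 41 and the lower bound 42 meet.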
Theorem 4. Let \(\rho\) denote now the standard Bloch parametrization of quantum bits, that is, \[\label{eq:rho95r}\begin{align} \rho(\vec{r}):=\frac{1}{2}\left(I+\vec{r}\cdot \vec{\sigma}\right), \text{ where } \vec{\sigma}=\left( \sigma_1,\sigma_2, \sigma_3 \right), \end{align}\qquad{(11)}\] and let us assume that \(\vec{r}_1\) and \(\vec{r}_2\) are scalar multiples of each other implying that \(\rho(\vec{r}_1)\) and \(\rho(\vec{r}_2)\) commute. Then \[\label{eq:D95comm95gen}\begin{align} D^p_{\text{symm},p}(\rho(\vec{r}_1),\rho(\vec{r}_2)) =2^p\left(1+\frac{1}{2}\left| \vec{r}_1-\vec{r}_2 \right|-\sqrt{\left( 1+\frac{\vec{r}_1 \cdot \vec{r}_2}{\max\{\left| \vec{r_1} \right|,\left| \vec{r_2} \right|\}} \right)\left( 1-\max\{\left| \vec{r_1} \right|,\left| \vec{r_2} \right|\} \right)}\right). \end{align}\qquad{(12)}\]
It is an interesting phenomenon that, according to many of the approaches, including the one we follow in the present work [57], [58], the quantum Wasserstein distance of states is not a bona fide metric; for example, states may have positive distance from themselves. As a response to this phenomenon, De Palma and Trevisan introduced quadratic quantum Wasserstein divergences [58], which are appropriately modified versions of quadratic quantum Wasserstein distances, to eliminate self-distances. Their definition of the quadratic quantum Wasserstein divergence \(d_{\mathcal{A},2}\) corresponding to the collection \(\mathcal{A}=\left\{ A_1, \dots, A_K \right\}\) of observables is the following: \[\begin{align} \label{eq:quadratic-divergence-def} d_{\mathcal{A},2}\left( \rho, \omega \right):=\left( D_{\mathcal{A},2}^2\left( \rho, \omega \right)-\frac{1}{2}\left( D_{\mathcal{A},2}^2\left( \rho, \rho \right)+D_{\mathcal{A},2}^2\left( \omega, \omega \right) \right) \right)^{\frac{1}{2}}, \end{align}\tag{43}\] where \[\begin{align} \label{eq:quadratic-distance-def} D_{\mathcal{A},2}^2\left( \rho, \omega \right)= \min\left\{ \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \Pi \left( \sum_{k=1}^K \left( A_k \otimes I^T - I \otimes A_k^T \right)^2 \right) \right] \, \middle| \, \Pi \in \mathcal{C}(\rho, \omega) \right\}. \end{align}\tag{44}\] They conjectured that the divergences defined this way are genuine metrics on quantum state spaces [58], and this conjecture has recently been justified under certain additional assumptions [77].
In the following corollary, we use Theorem 4 to obtain a closed form for the quadratic divergence \(d_{\text{symm},2}=d_{\left\{ \sigma_1,\sigma_2,\sigma_3 \right\},2}.\)
Corollary 3. Let \(\rho\) denote the Bloch parametrization as in ?? , let the \(2\)-Wasserstein distance \(D_{\text{symm},2}\) be given by 39 , and let the corresponding quadratic Wasserstein divergence \(d_{\text{symm},2}\) be given by 43 . Assume that \(\vec{r}_2\) is a scalar multiple of \(\vec{r}_1\) and hence \(\rho(\vec{r}_1)\) and \(\rho(\vec{r}_2)\) commute. Then \[\begin{align} d^2_{\text{symm},2}(\rho(\vec{r}_1),\rho(\vec{r}_2)) \nonumber \\ =2\left(\left| \vec{r}_1-\vec{r}_2 \right|+\sqrt{1-r_1^2}+\sqrt{1-r_2^2}-2\sqrt{\left( 1+\frac{\vec{r}_1 \cdot \vec{r}_2}{\max\{\left| \vec{r_1} \right|,\left| \vec{r_2} \right|\}} \right)\left( 1-\max\{\left| \vec{r_1} \right|,\left| \vec{r_2} \right|\} \right)}\right). \end{align}\]
Proof. Direct computation shows that \[\begin{align} &d^2_{\text{symm},2}(\rho(\vec{r}_1),\rho(\vec{r}_2))\\ =&D^2_{\text{symm},2}(\rho(\vec{r}_1),\rho(\vec{r}_2))-\frac{1}{2}\mathrm{Tr}C_{\text{symm},2}\left(| | \sqrt{\rho(\vec{r}_1)} \rangle \rangle\langle \langle \sqrt{\rho(\vec{r}_1)} | |+| | \sqrt{\rho(\vec{r}_2)} \rangle \rangle\langle \langle \sqrt{\rho(\vec{r}_2)} | |\right)\\ =&D^2_{\text{symm},2}(\rho(\vec{r}_1),\rho(\vec{r}_2))-\frac{1}{2} \left(\langle \langle \sqrt{\rho(\vec{r}_1)} | |C_{\text{symm},2}| | \sqrt{\rho(\vec{r}_1)} \rangle \rangle+\langle \langle \sqrt{\rho(\vec{r}_2)} | |C_{\text{symm},2}| | \sqrt{\rho(\vec{r}_2)} \rangle \rangle\right)\\ =&D^2_{\text{symm},2}(\rho(\vec{r}_1),\rho(\vec{r}_2))-\frac{1}{2} \left(2^3-2^2\left| \left<\left<I\middle\|\sqrt{\rho(\vec{r}_1)}\right>\right> \right|^2+2^3-2^2\left| \left<\left<I\middle\|\sqrt{\rho(\vec{r}_2)}\right>\right> \right|^2\right)\\ =&D^2_{\text{symm},2}(\rho(\vec{r}_1),\rho(\vec{r}_2))-2^3+2 \left(\left[\mathrm{Tr}\sqrt{\rho(\vec{r}_1)}\right]^2+\left[\mathrm{Tr}\sqrt{\rho(\vec{r}_2)}\right]^2\right)\\ =&D^2_{\text{symm},2}(\rho(\vec{r}_1),\rho(\vec{r}_2))-2^3+2 \left(\left[\sqrt{\frac{1+r_1}{2}}+\sqrt{\frac{1-r_1}{2}}\right]^2+\left[\sqrt{\frac{1+r_2}{2}}+\sqrt{\frac{1-r_2}{2}}\right]^2\right)\\ =&D^2_{\text{symm},2}(\rho(\vec{r}_1),\rho(\vec{r}_2))-2^2+2\sqrt{1-r_1^2}+2\sqrt{1-r_2^2}\\ =&2^2\left(1+\frac{1}{2}\left| \vec{r}_1-\vec{r}_2 \right|-\sqrt{\left(1+\frac{\vec{r}_1\cdot\vec{r}_2}{\max(r_1,r_2)}\right)(1-\max(r_1,r_2))}\right)-2^2+2\sqrt{1-r_1^2}+2\sqrt{1-r_2^2}\\ =&2\left(\left| \vec{r}_1-\vec{r}_2 \right|+\sqrt{1-r_1^2}+\sqrt{1-r_2^2}-2\sqrt{\left(1+\frac{\vec{r}_1\cdot\vec{r}_2}{\max(r_1,r_2)}\right)(1-\max(r_1,r_2))}\right), \end{align}\] where we used Theorem 4 in the penultimate equality and wrote \(r_i:=\left| \vec{r}_i \right|.\) ◻
Using the closed formula obtained above for the quadratic Wasserstein divergence \(d_{\text{symm},2},\) we prove in the next proposition that even the squared quantity \(d_{\text{symm},2}^2\) satisfies the triangle inequality if all three qubits involved commute with each other.
Proposition 5. For commuting qubits \(\rho,\sigma,\omega \in \mathcal{S}\left( \mathbb{C}^2 \right)\) the triangle inequality \[\begin{align} d^2_{\text{symm},2}(\rho,\sigma)+d^2_{\text{symm},2}(\sigma,\omega) \geq d^2_{\text{symm},2}(\rho,\omega) \end{align}\] holds.
Proof. For commuting \(\rho,\sigma,\omega\) it can be assumed that there are real numbers \(-1\leq \alpha,\beta,\gamma\leq 1\), for which \(\rho=\rho(\alpha),\sigma=\rho(\beta),\omega=\rho(\gamma)\) as in ?? . Thus by Corollary 3 we have that \[\begin{align}\label{eq:triang95ineq} &\frac{1}{2}\left(d^2_{\text{symm},2}(\rho,\sigma)+d^2_{\text{symm},2}(\sigma,\omega)-d^2_{\text{symm},2}(\rho,\omega)\right)\\=&\left| \alpha-\beta \right|+\sqrt{1-\alpha^2}+\sqrt{1-\beta^2}-2\sqrt{(1+\min(\alpha,\beta))(1-\max(\alpha,\beta))}\\+&\left| \beta-\gamma \right|+\sqrt{1-\beta^2}+\sqrt{1-\gamma^2}-2\sqrt{(1+\min(\beta,\gamma))(1-\max(\beta,\gamma))}\\-&\left| \alpha-\gamma \right|-\sqrt{1-\alpha^2}-\sqrt{1-\gamma^2}+2\sqrt{(1+\min(\alpha,\gamma))(1-\max(\alpha,\gamma))}\\=&\left| \alpha-\beta \right|+\left| \beta-\gamma \right|-\left| \alpha-\gamma \right|\\ +&2\sqrt{1-\beta^2}-2\sqrt{(1+\min(\alpha,\beta))(1-\max(\alpha,\beta))}\\ -&2\sqrt{(1+\min(\beta,\gamma))(1-\max(\beta,\gamma))}+2\sqrt{(1+\min(\alpha,\gamma))(1-\max(\alpha,\gamma))}\\ \geq&2\sqrt{1-\beta^2}-2\sqrt{(1+\min(\alpha,\beta))(1-\max(\alpha,\beta))}\\ -&2\sqrt{(1+\min(\beta,\gamma))(1-\max(\beta,\gamma))}+2\sqrt{(1+\min(\alpha,\gamma))(1-\max(\alpha,\gamma))}\\ \end{align}\tag{45}\] where we used the triangle inequality for \(d(a,b):=\left| a-b \right|\).
If \(\alpha\leq\beta\leq\gamma\), then the last line of 45 takes the following form: \[\begin{align}\label{eq:order95abc} 2\sqrt{(1+\beta)(1-\beta)}-2\sqrt{(1+\alpha)(1-\beta)}-2\sqrt{(1+\beta)(1-\gamma)}+2\sqrt{(1+\alpha)(1-\gamma)}. \end{align}\tag{46}\] In this case, \[\begin{align} (\gamma-\beta)(\beta-\alpha)=\gamma\beta-\beta^2-\alpha\gamma+\alpha\beta&\geq 0\quad \Leftrightarrow\\ -\beta^2-\alpha\gamma&\geq -\gamma\beta-\alpha\beta\quad \Leftrightarrow\\ \left(\sqrt{(1+\beta)(1-\beta)}+\sqrt{(1+\alpha)(1-\gamma)}\right)^2&\geq \left(\sqrt{(1+\alpha)(1-\beta)}+\sqrt{(1+\beta)(1-\gamma)}\right)^2\quad \Leftrightarrow\\ \sqrt{(1+\beta)(1-\beta)}+\sqrt{(1+\alpha)(1-\gamma)}&\geq \sqrt{(1+\alpha)(1-\beta)}+\sqrt{(1+\beta)(1-\gamma)}, \end{align}\] from which it follows that 46 is nonnegative, and hence so is the last line of 45 . If \(\beta\leq\alpha\leq\gamma\), then half of the last line of 45 takes the following form: \[\begin{align} &\sqrt{(1+\beta)(1-\beta)}-\sqrt{(1+\beta)(1-\alpha)}-\sqrt{(1+\beta)(1-\gamma)}+\sqrt{(1+\alpha)(1-\gamma)}\\ =&\sqrt{(1+\beta)}\left(\sqrt{1-\beta}-\sqrt{(1-\alpha)}\right)+\sqrt{1-\gamma}\left(\sqrt{(1+\alpha)}-\sqrt{(1+\beta)}\right), \end{align}\] which is nonnegative because \(\beta\leq\alpha.\) The other four orderings of \(\alpha,\beta,\gamma\) reduce to one of the two cases above by the sign change \(\alpha':=-\alpha\), \(\beta':=-\beta\), \(\gamma':=-\gamma\) (which reverses the order and leaves \(\sqrt{1-\beta^2}\) and every product of the form \((1+\min(\cdot,\cdot))(1-\max(\cdot,\cdot))\) unchanged) and by the fact that the last line of 45 is symmetric in \(\alpha\) and \(\gamma\). Thus the last line of 45 is nonnegative, which completes the proof. ◻
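As a sanity check of Proposition 5, the closed form of Corollary 3 can be evaluated on a grid of commuting qubits. The sketch below is a numerical illustration with hypothetical names, not a substitute for the proof. It assumes the parametrization \(\rho(\alpha)=\frac{1}{2}(I+\alpha\sigma_z)\) with \(\alpha\in[-1,1],\) for which the formula of Corollary 3 reads \(d^2_{\text{symm},2}=2\left( \left| \alpha-\beta \right|+\sqrt{1-\alpha^2}+\sqrt{1-\beta^2}-2\sqrt{(1+\min(\alpha,\beta))(1-\max(\alpha,\beta))} \right).\)

```python
import math


def d2(alpha, beta):
    """Squared divergence d^2_{symm,2}(rho(alpha), rho(beta)) for states
    diagonal in the sigma_z basis, from the closed form of Corollary 3."""
    lo, hi = min(alpha, beta), max(alpha, beta)
    return 2 * (abs(alpha - beta)
                + math.sqrt(1 - alpha ** 2)
                + math.sqrt(1 - beta ** 2)
                - 2 * math.sqrt((1 + lo) * (1 - hi)))


def triangle_defect(alpha, beta, gamma):
    """d^2(a, b) + d^2(b, c) - d^2(a, c); Proposition 5 says this is >= 0."""
    return d2(alpha, beta) + d2(beta, gamma) - d2(alpha, gamma)


# Grid on [-1, 1] with step 1/20 (an arbitrary illustrative choice).
GRID = [k / 20 for k in range(-20, 21)]
worst = min(triangle_defect(a, b, c) for a in GRID for b in GRID for c in GRID)
print(worst)  # expected to be zero up to rounding error
```

The smallest defect on the grid is attained when the middle state coincides with one of the endpoints, where the inequality holds with equality.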
In this subsection we consider the case when a single observable generates the transport cost. On quantum bits, we may assume (up to an affine rescaling of the observable and a conjugation by a unitary) that this observable is \(\sigma_3=\sigma_z.\) So we consider the setting described at the beginning of Section 2 and take \(K=1, \, \mathcal{A}=\{\sigma_z\},\) and \(c(x,y)=\left| x-y \right|^p.\) This choice gives rise to the cost operator \[\begin{align} \label{eq:Cz} C_{z,p}:= C_c^{(\mathcal{A})}= \left| \sigma_z\otimes I^T-I\otimes \sigma_z^T \right|^p=2^{p-1}\left(I\otimes I^T -\sigma_z\otimes\sigma_z^T\right) =\left[ \begin{array}{cccc} 0 & 0 & 0 & 0\\ 0 & 2^{p} & 0 & 0\\ 0 & 0 & 2^{p} & 0\\ 0 & 0 & 0 & 0\end{array} \right] \end{align}\tag{47}\] where \(p\geq 1.\) The transport cost \(C_{z,p}\) is invariant under unitary conjugations of the form \[\label{eq:unit-rot-1}\begin{align} X \mapsto \left( I\otimes \left( \exp\left( i\frac{\varphi}{2}\sigma_z \right) \right)^T \right) X \left( I\otimes \left( \exp\left( -i\frac{\varphi}{2}\sigma_z \right) \right)^T \right) \end{align}\tag{48}\] and \[\label{eq:unit-rot-2}\begin{align} X \mapsto \left( \exp\left( i\frac{\varphi}{2}\sigma_z \right)\otimes I^T \right) X \left( \exp\left( -i\frac{\varphi}{2}\sigma_z \right)\otimes I^T \right) \end{align}\tag{49}\] however, \[\begin{align} \Pi \in \mathcal{C}(\rho,\omega)\Longleftrightarrow \left( \exp\left( i\frac{\varphi}{2}\sigma_z \right)\otimes I^T \right) \Pi \left( \exp\left( -i\frac{\varphi}{2}\sigma_z \right)\otimes I^T \right) \in \mathcal{C}\left( \rho, \exp\left( i\frac{\varphi}{2}\sigma_z \right) \omega \exp\left( -i\frac{\varphi}{2}\sigma_z \right) \right) \nonumber \\ \Longleftrightarrow \left( I\otimes \left( \exp\left( i\frac{\varphi}{2}\sigma_z \right) \right)^T \right) \Pi \left( I\otimes \left( \exp\left( -i\frac{\varphi}{2}\sigma_z \right) \right)^T \right) \in\mathcal{C}\left( \exp\left( i\frac{\varphi}{2}\sigma_z \right) \rho \exp\left( -i\frac{\varphi}{2}\sigma_z \right),\omega \right). \end{align}\]
This shows that in general whenever evaluating the \(p\)-Wasserstein distance \[\begin{align} \label{eq:D-z-p-def} D_{z,p}(\rho, \omega):=\left( \min\left\{ \mathrm{tr}_{\mathcal{H}\otimes \mathcal{H}^*}\left[ \Pi C_{z,p} \right] \, \middle| \, \Pi \in \mathcal{C}(\rho, \omega) \right\} \right)^{\frac{1}{p}} \end{align}\tag{50}\] between two qubits, one can rotate them such that neither qubit has a \(\sigma_y\) coordinate anymore and compute \(D_{z,p}\) then.
In the following Proposition 6 and Theorem 7 we give a simple closed formula for the \(p\)-Wasserstein distance \(D_{z,p}\) in the case when both qubits are orthogonal to \(\sigma_z\) in the Hilbert-Schmidt sense. We obtain the formula for \(D_{z,p}\) by determining the optimal transport plans and Kantorovich potentials, and we use the Kantorovich duality obtained in Section 2 to prove the optimality of these couplings and potentials.
Proposition 6. Let \(\rho\) denote now the following reduced Bloch parametrization: \[\label{eq:rho95alpha95x}\begin{align} \rho(\alpha):=\frac{1}{2}\left(I+\alpha\sigma_x\right)=\frac{1}{2}\left[ \begin{array}{cc}1 & \alpha \\ \alpha & 1 \end{array} \right]. \end{align}\qquad{(13)}\] Then \[\label{eq:D95z95xy}\begin{align} D^p_{z,p}(\rho(\alpha),\rho(\beta))=2^{p-1}\left(1-\sqrt{1-\max\left(\alpha^2,\beta^2\right)}\right). \end{align}\qquad{(14)}\]
Proof. By using the symmetry mentioned above, one can assume without loss of generality that \(\left| \alpha \right|\geq\left| \beta \right|\). However, for completeness we will prove (14) directly for \(\left| \alpha \right|<\left| \beta \right|\) as well. Suppose now that \(\left| \alpha \right|\geq\left| \beta \right|\) and \(\left| \alpha \right|>0\) and consider \[\begin{align} \Pi_+(\alpha,\beta):=\frac{1}{4}\left[ \begin{array}{cccc} 1+\sqrt{1-\alpha^2} & \alpha & \beta & \frac{\left(1+\sqrt{1-\alpha^2}\right)\beta}{\alpha}\\ \alpha & 1-\sqrt{1-\alpha^2} & \frac{\left(1-\sqrt{1-\alpha^2}\right)\beta}{\alpha} & \beta\\ \beta & \frac{\left(1-\sqrt{1-\alpha^2}\right)\beta}{\alpha} & 1-\sqrt{1-\alpha^2} & \alpha\\ \frac{\left(1+\sqrt{1-\alpha^2}\right)\beta}{\alpha} & \beta & \alpha & 1+\sqrt{1-\alpha^2}\end{array} \right]. \end{align}\] \(\Pi_+(\alpha,\beta)\) is clearly Hermitian and is positive-semidefinite by Sylvester’s criterion. To see that all principal minors are nonnegative, note that the first two columns, the last two columns, the first two rows and the last two rows are all proportional pairs, with ratio \(\frac{\alpha}{1+\sqrt{1-\alpha^2}}=\frac{1-\sqrt{1-\alpha^2}}{\alpha}\). It follows that the determinant and all minors of size 3 vanish. All the elements in the diagonal are clearly nonnegative. Two of the principal minors of size 2 vanish by the linear dependence. The nontrivial principal minors of size 2 are given by rows and columns \(\{(1,3),(1,4),(2,3),(2,4)\}\). Nonnegativity for principal minors \(\{(1,3),(2,4)\}\) yields the same condition \[\begin{align} \left(1+\sqrt{1-\alpha^2}\right)\left(1-\sqrt{1-\alpha^2}\right)=\alpha^2\geq\beta^2, \end{align}\] which is fulfilled by assumption. 
Nonnegativity for principal minors \(\{(1,4),(2,3)\}\) yields \[\begin{align} \left(1+\sqrt{1-\alpha^2}\right)^2\geq\frac{\left(1+\sqrt{1-\alpha^2}\right)^2\beta^2}{\alpha^2}\Leftrightarrow1\geq\frac{\beta^2}{\alpha^2}\Leftrightarrow\left(1-\sqrt{1-\alpha^2}\right)^2\geq\frac{\left(1-\sqrt{1-\alpha^2}\right)^2\beta^2}{\alpha^2}, \end{align}\] which is again fulfilled by assumption. Easy computations show that \(\mathrm{Tr}_1 \left[ \Pi_+(\alpha,\beta) \right]=\rho(\alpha)^T\), while \(\mathrm{Tr}_2 \left[ \Pi_+(\alpha,\beta) \right]=\rho(\beta),\) which means that \(\Pi_+(\alpha,\beta)\) is a coupling of \(\rho(\alpha)\) and \(\rho(\beta)\). It follows that whenever \(\left| \alpha \right|\geq\left| \beta \right|\) and \(\left| \alpha \right|>0\), \[\begin{align} \label{eq:D95z95upperbound43} D^p_{z,p}(\rho(\alpha),\rho(\beta))&\leq \mathrm{Tr}C_{z,p}\Pi_+(\alpha,\beta)=2^{p-1}\left(1-\sqrt{1-\alpha^2}\right). \end{align}\tag{51}\] If \(\left| \alpha \right|<\left| \beta \right|\), then let us define \[\begin{align} \Pi_-(\alpha,\beta):=(\Pi_+(\beta,\alpha))^{ST}=\frac{1}{4}\left[ \begin{array}{cccc}1+\sqrt{1-\beta^2} & \alpha & \beta & \frac{\left(1+\sqrt{1-\beta^2}\right)\alpha}{\beta}\\ \alpha & 1-\sqrt{1-\beta^2} & \frac{\left(1-\sqrt{1-\beta^2}\right)\alpha}{\beta} & \beta\\ \beta & \frac{\left(1-\sqrt{1-\beta^2}\right)\alpha}{\beta} & 1-\sqrt{1-\beta^2} & \alpha\\ \frac{\left(1+\sqrt{1-\beta^2}\right)\alpha}{\beta} & \beta & \alpha & 1+\sqrt{1-\beta^2}\end{array} \right], \end{align}\] where \((\cdot)^{ST}\) denotes the swap transposition on \(\mathcal{T}_1\left( \mathcal{H}\otimes \mathcal{H}^* \right),\) which is the linear extension of the map \(A \otimes B^T \mapsto B \otimes A^T.\) Since \(\Pi_+(\beta,\alpha)\) is a coupling of \(\rho(\beta)\) and \(\rho(\alpha)\), the swapped state \(\Pi_-(\alpha,\beta)\) is a coupling of \(\rho(\alpha)\) and \(\rho(\beta)\). 
It follows that whenever \(\left| \alpha \right|<\left| \beta \right|\), \[\begin{align} \label{eq:D95z95upperbound-} D^p_{z,p}(\rho(\alpha),\rho(\beta))&\leq \mathrm{Tr}C_{z,p}\Pi_-(\alpha,\beta)=2^{p-1}\left(1-\sqrt{1-\beta^2}\right). \end{align}\tag{52}\] If \(\alpha=\beta=0\), then \[\begin{align} \Pi_0:=\left[ \begin{array}{cccc}\frac{1}{2} & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \frac{1}{2}\end{array} \right] \end{align}\] can be directly seen to be an optimal coupling yielding \[\begin{align} \label{eq:D95z95upperbound0} D^p_{z,p}(I/2,I/2)=\mathrm{Tr}C_{z,p}\Pi_0=0. \end{align}\tag{53}\] Together, (51), (52), and (53) yield \[\label{eq:D95z95upperbound}\begin{align} D^p_{z,p}(\rho(\alpha),\rho(\beta))\leq2^{p-1}\left(1-\sqrt{1-\max\left(\alpha^2,\beta^2\right)}\right), \end{align}\tag{54}\] without further assumptions other than \(\rho(\alpha),\rho(\beta)\) having the form of (13). Now let \(M=\max(\left| \alpha \right|,\left| \beta \right|)\), suppose that \(M<1\) and consider \[\begin{align} X_{\pm}=2^{p-1}\left[ \begin{array}{cc} 1-\frac{1}{\sqrt{1-M^2}} & \pm\sqrt{\frac{M^2}{1-M^2}} \\ \pm\sqrt{\frac{M^2}{1-M^2}} & 1-\frac{1}{\sqrt{1-M^2}} \end{array} \right],\quad Y=\left[ \begin{array}{cc} 0 & 0 \\ 0 & 0 \end{array} \right]. \end{align}\] Clearly, \(X_\pm\) and \(Y\) are self-adjoint. 
It is also evident by Sylvester’s criterion that \[\begin{align} C_{z,p}-Y\otimes I^T-I\otimes X_\pm^T= 2^{p-1}\left[ \begin{array}{cccc} \frac{1}{\sqrt{1-M^2}}-1& \mp\sqrt{\frac{M^2}{1-M^2}} & 0 & 0\\ \mp\sqrt{\frac{M^2}{1-M^2}} & \frac{1}{\sqrt{1-M^2}}+1 & 0 & 0\\ 0 & 0 & \frac{1}{\sqrt{1-M^2}}+1 & \mp\sqrt{\frac{M^2}{1-M^2}}\\ 0 & 0 & \mp\sqrt{\frac{M^2}{1-M^2}} & \frac{1}{\sqrt{1-M^2}}-1\end{array} \right]\geq 0, \end{align}\] as well as \[\begin{align} C_{z,p}-X_\pm\otimes I^T-I\otimes Y^T= 2^{p-1}\left[ \begin{array}{cccc} \frac{1}{\sqrt{1-M^2}}-1& 0 & \mp\sqrt{\frac{M^2}{1-M^2}} & 0\\ 0 & \frac{1}{\sqrt{1-M^2}}+1 & 0 & \mp\sqrt{\frac{M^2}{1-M^2}}\\ \mp\sqrt{\frac{M^2}{1-M^2}} & 0 & \frac{1}{\sqrt{1-M^2}}+1 & 0\\ 0 & \mp\sqrt{\frac{M^2}{1-M^2}} & 0 & \frac{1}{\sqrt{1-M^2}}-1\end{array} \right]\geq 0. \end{align}\] Thus \[\begin{align}\label{eq:D95z95lowerbound} &D^p_{z,p}(\rho(\alpha),\rho(\beta))\geq \max\left(\mathrm{Tr}X_+\rho(\alpha),\mathrm{Tr}X_-\rho(\alpha),\mathrm{Tr}X_+\rho(\beta),\mathrm{Tr}X_-\rho(\beta)\right)\\ =&2^{p-1}\left(1-\frac{1}{\sqrt{1-M^2}}\right)+2^{p-1}M\sqrt{\frac{M^2}{1-M^2}}\\ =&2^{p-1}\left(1-\frac{1-M^2}{\sqrt{1-M^2}}\right)=2^{p-1}\left(1-\sqrt{1-M^2}\right)=2^{p-1}\left(1-\sqrt{1-\max\left(\alpha^2,\beta^2\right)}\right). \end{align}\tag{55}\] For mixed states, combining (54) and (55) completes the proof. If either state is pure, then it is known that there is only one coupling, and therefore (54) is an equality rather than an upper bound. ◻
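The primal and dual ingredients of this proof can be checked numerically. The sketch below is an illustration in plain Python under our own naming (`pi_plus`, `partial_trace_1`, `partial_trace_2`, `principal_minors_nonnegative`, `primal_cost`, and `dual_value` are hypothetical helpers, not notation from the text). It builds \(\Pi_+(\alpha,\beta)\), tests positivity through the principal minors as in the argument above, computes both partial traces, and compares the transport cost with the dual value obtained from \(X_\pm\) in the case \(M<1\).

```python
import math
from itertools import combinations

def pi_plus(alpha, beta):
    """Pi_+(alpha, beta) from the proof; meant for |alpha| >= |beta|, alpha != 0."""
    s = math.sqrt(1 - alpha ** 2)
    hi = (1 + s) * beta / alpha
    lo = (1 - s) * beta / alpha
    return [[(1 + s) / 4, alpha / 4, beta / 4, hi / 4],
            [alpha / 4, (1 - s) / 4, lo / 4, beta / 4],
            [beta / 4, lo / 4, (1 - s) / 4, alpha / 4],
            [hi / 4, beta / 4, alpha / 4, (1 + s) / 4]]

def partial_trace_1(m):
    """Trace out the first tensor factor (row index = 2 * first + second)."""
    return [[sum(m[2 * k + i][2 * k + j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def partial_trace_2(m):
    """Trace out the second tensor factor."""
    return [[sum(m[2 * i + k][2 * j + k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def principal_minors_nonnegative(m, tol=1e-12):
    """Sylvester-type test for a real symmetric matrix: all principal minors >= 0."""
    n = len(m)
    return all(det([[m[i][j] for j in idx] for i in idx]) >= -tol
               for size in range(1, n + 1)
               for idx in combinations(range(n), size))

def primal_cost(m, p):
    """tr[C_{z,p} m] with C_{z,p} = diag(0, 2^p, 2^p, 0) as in (47)."""
    return 2 ** p * (m[1][1] + m[2][2])

def dual_value(alpha, beta, p):
    """Largest of Tr[X_+- rho(alpha)], Tr[X_+- rho(beta)]; requires M < 1."""
    big_m = max(abs(alpha), abs(beta))
    t = math.sqrt(1 - big_m ** 2)
    return 2 ** (p - 1) * (1 - 1 / t + big_m ** 2 / t)

# Illustrative parameters (our choice).
alpha, beta, p = 0.8, 0.5, 2
coupling = pi_plus(alpha, beta)
print(principal_minors_nonnegative(coupling))
print(partial_trace_1(coupling))  # compare with rho(alpha)^T
print(partial_trace_2(coupling))  # compare with rho(beta)
print(primal_cost(coupling, p), dual_value(alpha, beta, p))
```

Swapping the roles of the two parameters, for example `pi_plus(0.5, 0.8)`, makes the \((1,3)\) principal minor negative, which is why the proof treats \(\left| \alpha \right|<\left| \beta \right|\) with \(\Pi_-\) instead.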
Theorem 7. Let \(\rho\) denote the standard Bloch parametrization, that is, \[\label{eq:Bloch-para}\begin{align} \rho(\vec{r}):=\frac{1}{2}\left(I+\vec{r}\cdot \vec{\sigma}\right), \end{align}\qquad{(15)}\] and let us assume that both \(\vec{r}_1\) and \(\vec{r}_2\) are orthogonal to \((0,0,1).\) Then, writing \(r_i:=\left| \vec{r}_i \right|\), \[\label{eq:D95xy95gen}\begin{align} D^p_{z,p}(\rho(\vec{r}_1),\rho(\vec{r}_2))=2^{p-1}\left(1-\sqrt{1-\max\left(r_1^2,r_2^2\right)}\right). \end{align}\qquad{(16)}\] In particular, if \(r_1\geq r_2\), then \[\begin{align} D^p_{z,p}(\rho(\vec{r}_1),\rho(\vec{r}_2))=D^p_{z,p}(\rho(\vec{r}_2),\rho(\vec{r}_1))=D^p_{z,p}(\rho(\vec{r}_1),\rho(\vec{r}_1)). \end{align}\]
Proof. This follows immediately from the invariance of (47) under unitary conjugations implementing rotations around the \(\sigma_z\) axis (see (48) and (49)), and Proposition 6. ◻
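As a small illustration of the closed formula (16), the following Python sketch evaluates \(D^p_{z,p}\) for Bloch vectors in the \(xy\) plane. The function name `wasserstein_power` and the clamping of rounding errors for pure states are our own choices.

```python
import math

def wasserstein_power(r1, r2, p):
    """D_{z,p}^p from (16) for Bloch vectors r1, r2 = (x, y) in the xy plane."""
    radius_sq = max(r1[0] ** 2 + r1[1] ** 2, r2[0] ** 2 + r2[1] ** 2)
    # Clamp at 0 so that rounding errors for pure states (radius 1) do not
    # produce a negative argument of the square root.
    return 2 ** (p - 1) * (1 - math.sqrt(max(0.0, 1 - radius_sq)))

# Illustrative inputs (our choice): two mixed qubits with different directions.
a, b = (0.6, 0.0), (0.0, 0.3)
print(wasserstein_power(a, b, 2))
print(wasserstein_power(a, a, 2))  # same value: only the larger radius matters
```

The output depends only on the larger of the two radii, which reflects the last statement of Theorem 7.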
The formula obtained above for the \(p\)-Wasserstein distance \(D_{z,p}\) gives rise to an explicit closed form for the corresponding quadratic Wasserstein divergence \(d_{z,2}\); this is the content of the next corollary.
Corollary 4. Let \(\rho\) denote the Bloch parametrization as in (15), and let us consider the cost operator \(C_{z,2}\) given in (47). Assume that both \(\vec{r}_1\) and \(\vec{r}_2\) are orthogonal to \((0,0,1)\). Then we have \[\begin{align} d^2_{z,2}(\rho(\vec{r}_1),\rho(\vec{r}_2))=\sqrt{1-\min(r_1,r_2)^2}-\sqrt{1-\max(r_1,r_2)^2}. \end{align}\]
Proof. Immediate from Theorem 7, as \[\begin{align} d^2_{z,2}(\rho(\vec{r}_1),\rho(\vec{r}_2))&=D^2_{z,2}(\rho(\vec{r}_1),\rho(\vec{r}_2))-\frac{1}{2}\left(D^2_{z,2}(\rho(\vec{r}_1),\rho(\vec{r}_1))+D^2_{z,2}(\rho(\vec{r}_2),\rho(\vec{r}_2))\right)\\ &=D^2_{z,2}(\rho(\vec{r}_+),\rho(\vec{r}_+))-\frac{1}{2}\left(D^2_{z,2}(\rho(\vec{r}_1),\rho(\vec{r}_1))+D^2_{z,2}(\rho(\vec{r}_2),\rho(\vec{r}_2))\right)\\ &=\frac{1}{2}\left(D^2_{z,2}(\rho(\vec{r}_+),\rho(\vec{r}_+))-D^2_{z,2}(\rho(\vec{r}_-),\rho(\vec{r}_-))\right),\\ \end{align}\] where \((\vec{r}_+,\vec{r}_-)\) is a permutation of \((\vec{r}_1,\vec{r}_2)\), so that \(\left| \vec{r}_+ \right|\geq\left| \vec{r}_- \right|\). ◻
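For concreteness, the last expression in this chain can be evaluated with the closed formula of Theorem 7 for \(p=2\). Writing \(r_\pm:=\left| \vec{r}_\pm \right|\) (our shorthand), so that \(r_+=\max(r_1,r_2)\) and \(r_-=\min(r_1,r_2)\), one obtains \[\begin{align} \frac{1}{2}\left(D^2_{z,2}(\rho(\vec{r}_+),\rho(\vec{r}_+))-D^2_{z,2}(\rho(\vec{r}_-),\rho(\vec{r}_-))\right)&=\frac{1}{2}\left(2\left(1-\sqrt{1-r_+^2}\right)-2\left(1-\sqrt{1-r_-^2}\right)\right)\\ &=\sqrt{1-r_-^2}-\sqrt{1-r_+^2}, \end{align}\] which is the expression stated in Corollary 4.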
Remark 8. The quantity \(\sqrt{1-r^2}\) in Theorem 7 and Corollary 4 is the length of the tangent that can be drawn from the perimeter of the circle given by the intersection of the Bloch ball and the \(xy\) plane to the centered circle of radius \(r\) on which the qubit \(\rho(\vec{r})\) lies.
A consequence of the closed formula for single observable cost and qubits perpendicular to the observable is that we can prove the quadratic triangle inequality in this case as follows.
Proposition 9. Let \(\rho,\sigma,\omega \in \mathcal{S}(\mathbb{C}^2)\) be quantum bits such that all of them are orthogonal to \(\sigma_z\) in the Hilbert-Schmidt sense. Then even the square of the quadratic Wasserstein divergence \(d_{z,2}\) satisfies the triangle inequality, that is, \[\begin{align} d^2_{z,2}(\rho,\sigma)+d^2_{z,2}(\sigma,\omega)\geq d^2_{z,2}(\rho,\omega). \end{align}\]
Proof. Let \(\vec{r}_\rho\), \(\vec{r}_\sigma\) and \(\vec{r}_\omega\) be the Bloch vectors of \(\rho,\sigma\) and \(\omega\), and let \(r_\rho\), \(r_\sigma\) and \(r_\omega\) denote their lengths. By Corollary 4, \[\label{eq:triang95ineq95xz}\begin{align} &d^2_{z,2}(\rho,\sigma)+d^2_{z,2}(\sigma,\omega)-d^2_{z,2}(\rho,\omega)=\sqrt{1-\min(r_\rho,r_\sigma)^2}-\sqrt{1-\max(r_\rho,r_\sigma)^2}\\ +&\sqrt{1-\min(\vphantom{r_\rho}r_\sigma,r_\omega)^2}-\sqrt{1-\max(\vphantom{r_\rho}r_\sigma,r_\omega)^2}-\sqrt{1-\min(r_\rho,r_\omega)^2}+\sqrt{1-\max(r_\rho,r_\omega)^2}. \end{align}\tag{56}\] If \(r_\rho\leq r_\sigma\leq r_\omega\), then (56) takes the following form: \[\begin{align} &d^2_{z,2}(\rho,\sigma)+d^2_{z,2}(\sigma,\omega)-d^2_{z,2}(\rho,\omega)=\sqrt{1-r_\rho^2}-\sqrt{1-r_\sigma^2\vphantom{r_\rho}}\\ +&\sqrt{1-r_\sigma^2\vphantom{r_\rho}}-\sqrt{1-r_\omega^2\vphantom{r_\rho}}-\sqrt{1-r_\rho^2}+\sqrt{1-r_\omega^2\vphantom{r_\rho}}=0. \end{align}\] If \(r_\rho\leq r_\omega\leq r_\sigma\), then (56) takes the following form: \[\begin{align} &d^2_{z,2}(\rho,\sigma)+d^2_{z,2}(\sigma,\omega)-d^2_{z,2}(\rho,\omega)=\sqrt{1-r_\rho^2}-\sqrt{1-r_\sigma^2\vphantom{r_\rho}}\\ +&\sqrt{1-r_\omega^2\vphantom{r_\rho}}-\sqrt{1-r_\sigma^2\vphantom{r_\rho}}-\sqrt{1-r_\rho^2}+\sqrt{1-r_\omega^2\vphantom{r_\rho}}=2\left(\sqrt{1-r_\omega^2\vphantom{r_\rho}}-\sqrt{1-r_\sigma^2\vphantom{r_\rho}}\right), \end{align}\] which is nonnegative because \(r_\omega\leq r_\sigma\). If \(r_\sigma\leq r_\rho\leq r_\omega\), then (56) takes the following form: \[\begin{align} &d^2_{z,2}(\rho,\sigma)+d^2_{z,2}(\sigma,\omega)-d^2_{z,2}(\rho,\omega)=\sqrt{1-r_\sigma^2\vphantom{r_\rho}}-\sqrt{1-r_\rho^2}\\ +&\sqrt{1-r_\sigma^2\vphantom{r_\rho}}-\sqrt{1-r_\omega^2\vphantom{r_\rho}}-\sqrt{1-r_\rho^2}+\sqrt{1-r_\omega^2\vphantom{r_\rho}}=2\left(\sqrt{1-r_\sigma^2\vphantom{r_\rho}}-\sqrt{1-r_\rho^2}\right), \end{align}\] which is nonnegative because \(r_\sigma\leq r_\rho\). 
The remaining three orderings of \(r_\rho, r_\sigma, r_\omega\) reduce to one of the cases above, because (56) is symmetric in \(r_\rho\) and \(r_\omega\). Thus (56) is nonnegative, which completes the proof. ◻
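The case analysis can be cross-checked numerically. The sketch below is an illustration only, with function names of our choosing: it evaluates the left-hand side of (56) from the closed form of Corollary 4 on a grid of Bloch radii and records the smallest value found.

```python
import math
from itertools import product

def divergence_squared(r1, r2):
    """d_{z,2}^2 from Corollary 4 for Bloch radii r1, r2 in [0, 1]."""
    lo, hi = min(r1, r2), max(r1, r2)
    return math.sqrt(1 - lo ** 2) - math.sqrt(1 - hi ** 2)

def triangle_defect(r_rho, r_sigma, r_omega):
    """The quantity in (56); Proposition 9 states that it is nonnegative."""
    return (divergence_squared(r_rho, r_sigma)
            + divergence_squared(r_sigma, r_omega)
            - divergence_squared(r_rho, r_omega))

# Grid of radii 0, 0.05, ..., 1 (the resolution is an arbitrary choice).
grid = [k / 20 for k in range(21)]
worst = min(triangle_defect(a, b, c) for a, b, c in product(grid, repeat=3))
print(worst)  # expected to be zero up to floating-point rounding
```

The minimum is attained when \(r_\sigma\) lies between \(r_\rho\) and \(r_\omega\), matching the first case of the proof, where the expression vanishes identically.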
We conclude this section with a simple computation which is an ingredient of the proof of Proposition 2, and also a sanity check showing that one recovers the classical optimal transportation problem when both states involved commute with the observable generating the transport cost.
Proposition 10. Let \[\label{eq:rho95alpha95z}\begin{align} \rho(\alpha):=\frac{1}{2}\left(I+\alpha\sigma_z\right) =\frac{1}{2}\left[ \begin{array}{cc} 1+\alpha & 0 \\ 0 & 1-\alpha \end{array} \right], \end{align}\qquad{(17)}\] then \[\label{eq:D95z95comm}\begin{align} D^p_{z,p}(\rho(\alpha),\rho(\beta))=2^{p-1}\left| \alpha-\beta \right|. \end{align}\qquad{(18)}\]
Proof. Consider \[\begin{align} \Pi(\alpha,\beta):=\frac{1}{2}\left[ \begin{array}{cccc} 1+\min(\alpha,\beta) & 0 & 0 & 0\\ 0 & \max(\beta-\alpha,0) & 0 & 0\\ 0 & 0 & \max(\alpha-\beta,0) & 0\\ 0 & 0 & 0 & 1-\max(\alpha,\beta)\end{array} \right]. \end{align}\] \(\Pi(\alpha,\beta)\) is clearly a coupling of \(\rho(\alpha)\) and \(\rho(\beta)\) and thus \[\begin{align} \label{eq:D95z95upperboundz} D^p_{z,p}(\rho(\alpha),\rho(\beta))&\leq \mathrm{tr}\left[ C_{z,p}\Pi(\alpha,\beta) \right]=2^{p-1}\left| \alpha-\beta \right|. \end{align}\tag{57}\] Now consider \[\begin{align} X=\left[ \begin{array}{cc}2^p & 0 \\ 0 & 0 \end{array} \right],\quad Y=-X=\left[ \begin{array}{cc} -2^{p} & 0 \\ 0 & 0 \end{array} \right]. \end{align}\] Clearly, \(X\) and \(Y\) are self-adjoint. It is also evident that \[\begin{align} C_{z,p}-Y\otimes I^T-I\otimes X^T= \left[ \begin{array}{cccc} 0 & 0 & 0 & 0\\ 0 & 2^{p+1} & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{array} \right] \geq 0, \end{align}\] and similarly \[\begin{align} C_{z,p}-X\otimes I^T-I\otimes Y^T= \left[ \begin{array}{cccc} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 2^{p+1} & 0\\ 0 & 0 & 0 & 0\end{array} \right]\geq 0. \end{align}\] It follows that \[\begin{align} \label{eq:D95z95lowerboundz} D^p_{z,p}(\rho(\alpha),\rho(\beta))&\geq \max\left(\mathrm{tr}\left[ X\left(\rho(\alpha)-\rho(\beta)\right) \right],\mathrm{tr}\left[ X\left(\rho(\beta)-\rho(\alpha)\right) \right]\right)=2^{p-1}\left| \alpha-\beta \right|. \end{align}\tag{58}\] Combining (57) and (58) completes the proof. ◻
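Since every matrix in this proof is diagonal, the argument can be replayed with plain lists of diagonal entries. The following Python sketch is an illustration under our own naming (`classical_coupling`, `marginal_diagonals`, `primal_cost_commuting`, and `dual_value_commuting` are hypothetical helpers). It checks the marginals of \(\Pi(\alpha,\beta)\) and compares the primal cost (57) with the dual value (58).

```python
def classical_coupling(alpha, beta):
    """Diagonal of Pi(alpha, beta) from the proof, in the basis 00, 01, 10, 11."""
    return [(1 + min(alpha, beta)) / 2,
            max(beta - alpha, 0) / 2,
            max(alpha - beta, 0) / 2,
            (1 - max(alpha, beta)) / 2]

def marginal_diagonals(diag):
    """Diagonals of the two partial traces of a diagonal 4x4 matrix."""
    tr1 = [diag[0] + diag[2], diag[1] + diag[3]]  # first factor traced out
    tr2 = [diag[0] + diag[1], diag[2] + diag[3]]  # second factor traced out
    return tr1, tr2

def primal_cost_commuting(alpha, beta, p):
    """tr[C_{z,p} Pi(alpha, beta)] with C_{z,p} = diag(0, 2^p, 2^p, 0)."""
    diag = classical_coupling(alpha, beta)
    return 2 ** p * (diag[1] + diag[2])

def dual_value_commuting(alpha, beta, p):
    """max of tr[X (rho(a) - rho(b))] and tr[X (rho(b) - rho(a))], X = diag(2^p, 0)."""
    gap = 2 ** p * ((1 + alpha) / 2 - (1 + beta) / 2)
    return max(gap, -gap)

# Illustrative parameters (our choice).
alpha, beta, p = 0.3, -0.5, 3
tr1, tr2 = marginal_diagonals(classical_coupling(alpha, beta))
print(tr1, tr2)  # compare with the diagonals of rho(alpha) and rho(beta)
print(primal_cost_commuting(alpha, beta, p), dual_value_commuting(alpha, beta, p))
```

Both values agree with \(2^{p-1}\left| \alpha-\beta \right|\), the classical transport cost between the two-point distributions given by the diagonals of \(\rho(\alpha)\) and \(\rho(\beta)\).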
The following is an immediate corollary.
Corollary 5. Let \[\begin{align} \rho(\alpha):=\frac{1}{2}\left(I+\alpha\sigma_z\right)=\frac{1}{2}\begin{pmatrix}1+\alpha & 0 \\ 0 & 1-\alpha \end{pmatrix}, \end{align}\] then \[\begin{align} d^2_{z,2}(\rho(\alpha),\rho(\beta))=D^2_{z,2}(\rho(\alpha),\rho(\beta))=2\left| \alpha-\beta \right|, \text{ and } D^p_{z,p}(\rho(\alpha),\rho(\beta))=2^{p-1} \left| \alpha-\beta \right| \text{ for } p \geq 1. \end{align}\]