July 05, 2024
Spectral properties of bounded linear operators play a crucial role in several areas of mathematics and physics. For each self-adjoint, trace-class operator \(O\) we define a set \(\Lambda_n\subset \mathbb{R}\), and we show that it converges to the spectrum of \(O\) in the Hausdorff metric under mild conditions. Our set \(\Lambda_n\) only depends on the first \(n\) moments of \(O\). We show that it can be effectively calculated for physically relevant operators, and it approximates the spectrum well without diagonalization.
We prove that this method converges to the minimal and maximal eigenvalues with super-exponential speed.
We also construct monotone increasing lower bounds \(q_n\) for the minimal eigenvalue (or decreasing upper bounds for the maximal eigenvalue). This sequence only depends on the moments of \(O\) and a concrete upper estimate of its \(1\)-norm; we also demonstrate that \(q_n\) can be effectively calculated for a large class of physically relevant operators. This rigorous lower bound \(q_n\) tends to the minimal eigenvalue with super-exponential speed provided that \(O\) is not positive semidefinite. As a by-product, we obtain computable upper bounds for the \(1\)-norm of \(O\), too.
Numerical examples demonstrate the relevance of our approximation in estimating entropy and negativity, which is useful, among others, in quantum optical and open quantum system models. The results are directly applicable to problems in quantum information, statistical mechanics, and quantum thermodynamics, where traditional techniques based on diagonalization are impractical.
Self-adjoint trace-class linear operators on infinite dimensional separable Hilbert spaces played an indispensable role in the foundation of quantum mechanics [1]–[5]. In particular, density operators [1] have found many applications in quantum chemistry, statistical mechanics, quantum optics, quantum information theory, and the theory of open quantum systems [6]–[13]. These operators occur in mathematical physics [14]–[16], and they recently appeared in the modern physical description of our universe [17], [18].
Finding the spectrum of linear operators, or deciding whether they are at least positive semidefinite is an important and delicate problem: positivity preservation during the time evolution is a crucial question in the dynamics of open quantum systems [19]–[24]. Quantum entanglement problems still pose hard challenges to mathematicians and theoretical physicists [25]–[28]; they are also intimately connected to positivity via the Peres–Horodecki criterion [27], [29]. Describing entanglement in a quantitative way is a very important task in the theory of quantum computing, and quantum communication. Therefore, it is useful to know the magnitude of the negativity1 of a density operator after the partial transpose [30]–[33]. While the spectrum can be calculated relatively easily in finite dimensional Hilbert spaces [34]–[37], for infinite dimensional non-Gaussian operators it is still a hard problem [38]–[41]. Infinite dimensional trace-class integral operators arise in the modeling of quantum optical channels, the theory of distributed quantum communication, and the quantum internet [33], [42]–[44].
The spectrum of a trace-class operator determines many important quantities; arguably the most important and difficult ones are the entropies, for example the Rényi or von Neumann entropy, see [45]. Entropy is a crucial notion in the areas of quantum information theory [46], quantum communication [42], quantum computation [43], quantum cryptography [47], and the quantum technology of entanglement distillation and noisy quantum channels [48], just to name a few. The eigenvalues and kernels of self-adjoint trace-class integral operators also have many applications in the area of machine learning, see e.g. [49] and the references therein. As the eigenvalues and eigenvectors of a density operator encode all information about the physical subsystem it describes [1], [50], [51], approximating the spectrum has far-reaching physical applications and mathematical relevance.
One of the main challenges in quantum technology is to filter out quantum noise stemming from imprecise quantum physical measurements and other external circumstances [52]. Mathematical and physical properties of quantum noise and quantum decoherence are so far only understood in simple models [53], [54]. Understanding the dynamics of entanglement between two coupled Brownian oscillators (considered as an open quantum system) is a notoriously hard problem [55]. Here the time evolution is described by master equations which are very hard to solve analytically, and the problem is usually formulated in infinite dimensional Hilbert spaces, which adds to the challenge. The Gaussian case can be solved exactly under certain conditions [55], but the non-Gaussian case is still mostly unexplored. Here approximating the eigenvalues yields estimates for the fidelity [46] and for various quantum entropies [56].
The maximal eigenvalue also has a meaning in statistical physics as follows. We consider a Hamiltonian operator \(\hat{H}\) of a physical system, whose energy spectrum is supposed to have a lower bound. The eigenstate with the lowest energy is called the ground state. In many cases, after a physical measurement, the examined physical system can be found in the ground state with larger probability than in any other state. Also, the largest probability equals the maximal eigenvalue of the statistical physical density operator. For example, in case of canonical ensembles in statistical quantum physics, the equilibrium state is characterized by a density operator \(\hat{\rho}=e^{-\beta \hat{H}}/\mathop{\mathrm{Tr}}\big\{e^{-\beta \hat{H}}\big\}\), see [57]. Here the maximal eigenvalue equals the probability of being in the ground state; this probability is an important quantity e. g. in the quantum thermodynamic description of physical systems [58].
The minimal eigenvalue of a self-adjoint operator, if negative, is a quantitative measure of how far the operator is from being positive semidefinite.
Estimating the Schatten–von Neumann \(1\)-norm of trace-class integral operators plays an important role in the mathematics of anharmonic oscillators [59] and areas of harmonic analysis related to physics [60].
One of our main examples will be polynomial Gaussian operators, see Definition 1. For the physical relevance of polynomial Gaussian operators and for qualitative results about their positivity the readers might consult [61].
Self-adjoint trace-class operators are not only used in physics, but in related fields as well [62]. The kernels of these integral operators are given by polynomial times Gaussian kernels, which occur, for example, in general quantum physics (see, for example, the article [63] and the references therein), quantum optics [64], physics of open quantum systems [65], [66], in nonequilibrium statistical physics [67], high energy nuclear physics [68], and at last, but not least machine learning and related applications [69].
Finally, we believe that our results can be extended to more general cases, and thus we can get closer to answering many questions in theoretical and experimental physics. For example, an interesting problem in the case of infinite dimensional Hilbert spaces is to calculate entanglement-related quantities for real physical systems [66]. The method of diagonalization is often too involved to extract spectral quantities, or it might not work at all. Instead, we provide a more general method for which it is enough to calculate the moments of our operator.
Our results offer a powerful and efficient alternative to traditional spectral methods in various areas of physics, most importantly quantum physics. In particular, they open the way for computing quantum entropies and entanglement measures [70], [71] in systems where diagonalization does not work for theoretical or practical reasons; this includes non-Gaussian infinite dimensional models that arise naturally in quantum optics and open quantum systems.
In this subsection we examine a system comprising two coupled harmonic oscillators with different masses. The Hamiltonian operator for this system is expressed as \[\hat{H}_\text{sys}=\frac{\hat{P}_1^2}{2m_1}+\frac{1}{2}m_1\Omega^2\hat{x}_1^2+\frac{\hat{P}_2^2}{2m_2}+\frac{1}{2}m_2\Omega^2\hat{x}_2^2+\kappa(\hat{x}_1-\hat{x}_2)^2, \label{Hamiltonian}\tag{1}\] where \(\hat{x}_i\) and \(\hat{P}_i\) are the position and momentum operators of the mechanical oscillators with masses \(m_1\) and \(m_2\), both oscillating at the same frequency \(\Omega\). The coupling constant between the two oscillators is denoted by \(\kappa\). We introduce the following notation for quantities interpreted in the center of mass and relative coordinate systems \[\begin{align} \hat{X}:=&\frac{m_1 \hat{x}_1 + m_2 \hat{x}_2}{m_1+m_2},~ \hat{P}:=\hat{P}_1+\hat{P}_2,~M:=m_1+m_2 \\ \hat{x}:=&\hat{x}_1-\hat{x}_2,~\hat{p}:=\frac{m_2}{m_1+m_2}\hat{P}_1-\frac{m_1}{m_1+m_2}\hat{P}_2,~\mu:=\frac{m_1 m_2}{m_1+m_2}, \end{align}\] thus, the Hamiltonian with the new operators has the form \[\hat{H}_{\text{sys}}=\frac{\hat{P}^2}{2M}+\frac{1}{2}M\Omega^2\hat{X}^2+\frac{\hat{p}^2}{2\mu}+\frac{1}{2}\mu\Omega^2\hat{x}^2+\kappa\hat{x}^2. \label{fvswghdl}\tag{2}\] The state of this system evolves according to the von Neumann equation \[\frac{d\hat{\rho}_{\text{sys}}(t)}{dt}=-\frac{i}{\hbar}\left [\hat{H}_\text{sys},\hat{\rho}_{\text{sys}}(t)\right].\] In the next step we consider two representations \[\begin{align} \rho_o(\Delta_1,k_1,\Delta_2,k_2, t)&=\mathrm{Tr} \big\{\hat{\rho}(t) \exp\{ik_1 \hat{x}_1- i \Delta_1 \hat{P}_1+ik_2 \hat{x}_2- i \Delta_2 \hat{P}_2\} \big\} \text{ and } \\ \rho_c(\Delta,K,\delta,k, t)&= \mathrm{Tr} \big\{\hat{\rho}(t) \exp\{ik \hat{x}- i \delta \hat{p}+iK \hat{X}- i \Delta \hat{P}\} \big\}, \end{align}\] where \(\rho_o\) and \(\rho_c\) are the characteristic functions in the original variables and in the center of mass and relative variables, respectively.
Here we have \[\begin{align} \label{eq:transformation951} K&:=&k_1+k_2, \quad k:=\frac{m_2}{m_1+m_2}k_1-\frac{m_1}{m_1+m_2}k_2, \nonumber\\ \Delta &:=&\frac{m_1 \Delta_1+m_2\Delta_2}{m_1+m_2}, \quad \delta:=\Delta_1-\Delta_2, \nonumber \end{align}\tag{3}\] and finally, after the necessary transformations and calculations we get the equation for the time evolution of the characteristic function \[\label{eq:master95equation} \frac{\partial}{\partial t} \rho_c(\mathbf{w},t) = \biggl[-\frac{\hbar K}{M} \frac{\partial}{\partial \Delta}-\frac{\hbar k}{\mu} \frac{\partial}{\partial \delta}+ \frac{M \Omega^2}{\hbar} \Delta\frac{\partial}{\partial K}+ \frac{\mu\Omega^2+2\kappa}{\hbar} \delta \frac{\partial}{\partial k} \biggr]\rho_c(\mathbf{w},t),\tag{4}\] where \[\mathbf{w}^\top=(\Delta, K, \delta, k).\] Eq. 4 must be solved with initial condition that at \(t=0\) the \(\rho\) phase space function is given by a prescribed \(\rho_{c_0}\) function \[\label{eq:initial95value} \rho_c(\mathbf{w},0)= \rho_{c_0} (\mathbf{w}) .\tag{5}\] We note that \(\mathrm{Tr}\,\hat{\rho}=1\) requires \[\rho_{c_0} (\mathbf{w}=\mathbf{0})=1.\] Let us introduce the following quantity: \[\label{eq:32new95frequencies} \quad \omega^2_d:=\Omega^2+\frac{2 \kappa}{\mu},\tag{6}\] then 4 can be written as \[\frac{\partial \rho_c}{\partial t} +\frac{\hbar K}{M}\frac{\partial \rho_c}{\partial \Delta}-\frac{M \Omega^2 \Delta}{\hbar} \frac{\partial \rho_c}{\partial K}+\frac{\hbar k}{\mu}\frac{\partial \rho_c}{\partial \delta}-\frac{\mu\omega_d^2 \delta}{\hbar} \frac{\partial \rho_c}{\partial k}=0. \label{eq:master95eq95rewritten}\tag{7}\] The left-hand side is a total derivative if one identifies the following equations for the characteristics: \[\frac{ d\Delta}{dt}=\frac{\hbar K}{M}, \quad \frac{ d K}{dt}=-\frac{M\Omega^2 \Delta}{\hbar}, \quad \frac{ d\delta}{dt}= \frac{\hbar k}{\mu}, \quad \frac{ d k}{dt}=-\frac{\mu\omega_d^2 \delta}{\hbar}. 
\label{eq:ODE95for95characterteristics}\tag{8}\] The equations (8 ) are linear in the phase-space variables with a constant coefficient matrix \[\label{eq:characteristic95equation} \frac{d}{dt} \begin {pmatrix}\Delta\\ \noalign{\medskip}K \\ \noalign{\medskip}\delta\\ \noalign{\medskip}k\end {pmatrix}= \begin {pmatrix} 0 &\frac{\hbar}{M}&0&0\\ \noalign{\medskip} -\frac{M{\Omega}^{2}}{\hbar}&0&0&0\\ \noalign{\medskip}0&0&0&\frac{ \hbar}{\mu} \\ \noalign{\medskip}0&0&-\frac{\mu{\omega_d}^{2}}{\hbar}&0 \end {pmatrix} \begin {pmatrix}\Delta\\ \noalign{\medskip}K \\ \noalign{\medskip}\delta\\ \noalign{\medskip}k\end {pmatrix}\equiv \mathbf{M} \cdot\begin{pmatrix} \Delta \\K \\ \delta \\k \end{pmatrix}.\tag{9}\] The solution of 9 is \[\mathbf{w}(t)=\exp{(\mathbf{M} t)}\cdot \mathbf{w}(0). \label{eq:evolution95of32characteristics}\tag{10}\] It can be checked that the solution of 4 is \[\label{eq:solution} \rho_c(\mathbf{w},t)=\rho_{c0}\Bigl(\exp(-\mathbf{M} t) \cdot \mathbf{w}\Bigr).\tag{11}\] The initial condition is trivially fulfilled.
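The evolution 10 –11 is straightforward to evaluate numerically. The following minimal Python sketch (with hypothetical parameter values, an assumed Gaussian initial characteristic function, and \(\hbar=1\)) propagates the arguments of \(\rho_{c_0}\) backwards in time according to Eq. 11 ; the matrix exponential is computed by a plain Taylor series, which is adequate for this small, well-scaled matrix.

```python
import numpy as np

# Hypothetical parameters (hbar = 1); omega_d^2 = Omega^2 + 2*kappa/mu, Eq. (6)
hbar, M_tot, mu, Omega, kappa = 1.0, 2.0, 0.5, 1.0, 0.3
omega_d2 = Omega ** 2 + 2 * kappa / mu

# Coefficient matrix of the characteristic equations, Eq. (9)
Mmat = np.array([
    [0.0,                        hbar / M_tot, 0.0,                    0.0],
    [-M_tot * Omega ** 2 / hbar, 0.0,          0.0,                    0.0],
    [0.0,                        0.0,          0.0,              hbar / mu],
    [0.0,                        0.0,          -mu * omega_d2 / hbar,  0.0],
])

def mat_exp(A, terms=60):
    """Truncated Taylor series for exp(A); sufficient for this small matrix."""
    E = term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    return E

def rho_c(w, t, rho_c0):
    """Eq. (11): rho_c(w, t) = rho_c0(exp(-M t) w)."""
    return rho_c0(mat_exp(-Mmat * t) @ w)

# An assumed Gaussian initial characteristic function with rho_c0(0) = 1
rho_c0 = lambda w: np.exp(-0.5 * w @ w)
w = np.array([0.1, 0.2, -0.1, 0.05])
```

Note that the normalization \(\rho_c(\mathbf{0},t)=1\) is preserved automatically, since \(\exp(-\mathbf{M}t)\cdot\mathbf{0}=\mathbf{0}\).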
We note that the characteristic function can be obtained as the Fourier transform of the Wigner function in all of its phase-space variables. Consequently, the solution also preserves the mathematical structure of the initial density operator in the position representation. So, if we start the time evolution of a density operator from a polynomial Gaussian form, it remains in this form at all times, but with time-dependent parameters. After partial transposition, our approximation method described above can provide a good estimate of the logarithmic negativity even for non-trivial initial conditions. As a non-trivial example, we mention that the mutual information between individual physical subsystems can be calculated for more general operator families with external noises, as in the article [66].
Let \(\mathcal{H}\) be a separable, complex Hilbert space and let \(O\colon \mathcal{H}\to \mathcal{H}\) be a self-adjoint, trace-class linear operator. Then \(O\) has only countably many real eigenvalues, let us enumerate them with multiplicities as \(\{\lambda_i\}_{i\geq 0}\). As \(O\) is trace-class, we have \(\sum_{i\geq 0} |\lambda_i|<\infty\). If there are only finitely many eigenvalues \(\lambda_0,\dots,\lambda_m\) then define \(\lambda_i=0\) for each integer \(i>m\).
Following [21] we define symmetric functions of the eigenvalues \(e_k\) such that \(e_0=1\) and for \(k\geq 1\) we have \[e_k\stackrel{\text{def}}{=}\sum_{0\leq i_1<\dots<i_k} \lambda_{i_1}\cdots \lambda_{i_k};\] we can calculate \(e_k\) from the moments of \(O\), see Subsection 3.1. In Section 3 we introduce the finite set \(\Lambda_n\subset \mathbb{R}\) as \[\Lambda_n\stackrel{\text{def}}{=}\left\{-\frac{1}{x}: x\in \mathbb{R}\text{ and } \sum_{k=0}^n e_kx^k=0\right\}.\] In order to find \(\Lambda_n\), we only need to calculate the above mentioned quantities \(e_k\) for \(k=0,\dots,n\). We prove in Theorem 2 that \(\Lambda_n\) converges to the closure of the set of non-zero eigenvalues in the Hausdorff metric, see Subsection 3.4. In Subsection 3.2 we demonstrate by a Gaussian example that \(\Lambda_n\) is very close to the set of eigenvalues even for \(n=10\); in this case the exact spectrum is known and can be compared to \(\Lambda_n\). In Subsection 3.3 we calculate \(\Lambda_{10}\) for a non-Gaussian example as well, where the exact spectrum is unknown. In a subsequent subsection we calculate \(\Lambda_{15}\) for a positive semidefinite poly-Gaussian example, where not only the spectrum but also the von Neumann entropy is approximated, for a special case of the polynomial Gaussian operators of [61].
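Given the quantities \(e_k\), the set \(\Lambda_n\) is just the image of the real roots of a polynomial under \(x\mapsto -1/x\). A minimal Python sketch (the helper name `Lambda_n` and the toy finite spectrum are our own illustrative choices) computes it from precomputed \(e_k\):

```python
import numpy as np

def Lambda_n(e):
    """Real roots x of g_n(x) = sum_k e_k x^k, mapped to -1/x."""
    roots = np.roots(np.asarray(e)[::-1])   # np.roots wants highest degree first
    real = roots[np.abs(roots.imag) < 1e-9].real
    return sorted(-1.0 / x for x in real if abs(x) > 1e-12)

# Toy operator with finitely many eigenvalues: g(x) = prod (1 + lam*x) factors
# exactly, so Lambda_n recovers the spectrum once n >= number of eigenvalues.
lams = [0.6, 0.3, 0.1]
e = [1.0,
     sum(lams),                                               # e_1
     lams[0]*lams[1] + lams[0]*lams[2] + lams[1]*lams[2],     # e_2
     lams[0]*lams[1]*lams[2]]                                 # e_3
print(Lambda_n(e))  # ≈ [0.1, 0.3, 0.6]
```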
In the latter sections we approximate the extremal eigenvalues and verify faster convergence. We only consider the case of the minimal eigenvalue, since replacing the operator \(O\) with \(-O\) handles the maximal eigenvalue as well. Let us define \[\lambda_{\min}\stackrel{\text{def}}{=}\min\{0,\lambda_i: i\geq 0\},\] which is zero if \(O\) is positive semidefinite, and equals the minimal eigenvalue2, which is smaller than zero, otherwise.
In Section 4 we approximate \(\lambda_{\min}\) by the sequence \(\{q_{n,0}\}_{n\geq 1}\), where \[q_{n,0}\stackrel{\text{def}}{=}\frac{-1}{\min \left\{x>0: \sum_{k=0}^n e_kx^k=0\right\}}.\] In Theorem 4 we prove that \(q_{n,0}\to \lambda_{\min}\) with super-exponential speed: \[|q_{n,0}-\lambda_{\min}|\leq \exp\left[-n \log(n)+\mathcal{O}(n)\right]\] under the slight technical condition that \(\lambda_{\min}\) is negative and has multiplicity one.
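The definition of \(q_{n,0}\) can be sketched in a few lines of Python (the helper name `q_n0` and the toy spectrum are our own; for a finite spectrum the construction recovers \(\lambda_{\min}\) exactly once \(n\) reaches the number of eigenvalues):

```python
import numpy as np

def q_n0(e):
    """-1 / (smallest positive real root of g_n(x) = sum_k e_k x^k)."""
    roots = np.roots(np.asarray(e)[::-1])
    pos = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 1e-12]
    return -1.0 / min(pos) if pos else None   # None: no positive root yet

# Toy spectrum with one negative eigenvalue lambda_min = -0.2
lams = [0.7, 0.5, -0.2]
e = [1.0,
     sum(lams),
     lams[0]*lams[1] + lams[0]*lams[2] + lams[1]*lams[2],
     lams[0]*lams[1]*lams[2]]
print(q_n0(e))  # ≈ -0.2
```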
In Section 6 we assume that \(c>0\) is a number satisfying \[\label{eq:cc} \sum_{i\geq 0} |\lambda_i|\leq c.\tag{12}\] Then we can construct and calculate a sequence \(\{q_{n,c}\}_{n\geq 0}\) such that
(i) \(q_{n,c}\) is monotone increasing and \(q_{n,c}\to \lambda_{\min}\) as \(n\to \infty\);
(ii) \(q_{n,c}\approx -\frac{c}{n}\) if \(O\) is positive semidefinite;
(iii) \(q_{n,c}\to \lambda_{\min}\) with super-exponential speed if \(O\) is not positive semidefinite.
In Section 5 we demonstrate that we can effectively find values \(c\) according to 12 for a large class of integral operators having polynomial Gaussian kernels, see Definition 1 for the precise notion. Finding an upper bound for the Schatten–von Neumann \(1\)-norm \(\sum_{i\geq 0} |\lambda_i|\) is a hard problem in itself. We do this by writing \(O\) as the product of two (typically not self-adjoint) Hilbert–Schmidt integral operators in different ways, and we apply Hölder’s inequality 34; finally, we are even able to optimize the result. This upper bound gives an automatic lower bound for the minimal eigenvalue and an upper bound for the negativity as well, see inequality 38 and its practical calculation in 50. In Subsections 5.2 and 5.3 we show this method in the case of a Gaussian kernel, and a Gaussian kernel multiplied by a quadratic polynomial, respectively.
In Subsection 6.1 we define \(q_{n,c}\) using the above defined quantities \(e_k\) and \(c\). We prove (i) in Theorem 6. This means that the sequence \(q_{n,c}\) is monotone increasing and converges to \(\lambda_{\min}\) no matter what the value of \(c\) is, yielding better and better rigorous lower bounds in the process. We show (ii) with precise lower and upper bounds in Theorem 7, implying that \(q_{n,c}\) depends on \(c\) essentially linearly if \(O\) is positive semidefinite. For (iii) see Theorem 8, where we prove a super-exponential speed of convergence \[|q_{n,c}-\lambda_{\min}|\leq \exp\left[-\frac{|\lambda_{\min}|}{c} n \log(n)+\mathcal{O}(n)\right]\] if \(O\) is not positive semidefinite. We also calculate and compare our estimates \(q_{n,0}\) and \(q_{n,c}\) for a concrete operator in Figure 3, obtaining that the sequence \(q_{n,0}\) converges extremely fast in practice.
Let \(\mathcal{H}\) be a separable, complex Hilbert space3 and let \(O\colon \mathcal{H}\to \mathcal{H}\) be a bounded linear operator. We say that \(\lambda\in \mathbb{C}\) is an eigenvalue of \(O\) if there exists a nonzero \(f\in \mathcal{H}\) such that \(Of=\lambda f\); then \(f\) is an eigenvector corresponding to \(\lambda\). The set of eigenvectors \[E_{\lambda}=\{f\in \mathcal{H}: Of=\lambda f\}\] is called the eigenspace corresponding to \(\lambda\). We call \(O\) self-adjoint if \(O^\dagger=O\), which implies that all eigenvalues are real. We say that \(O\) is positive semidefinite if \(\langle Of, f\rangle \geq 0\) for all \(f\in \mathcal{H}\). A positive semidefinite operator is always self-adjoint. We call \(O\) Hilbert–Schmidt if there exists an orthonormal basis \(\{f_i\}_{i\geq 0}\)4 of \(\mathcal{H}\) such that \(\sum_{i\geq 0} \langle Of_i, Of_i \rangle<\infty\). We say that \(O\) is trace-class5 if there is an orthonormal basis \(\{f_i\}_{i\geq 0}\) of \(\mathcal{H}\) such that \(\sum_{i\geq 0} |\langle |O|f_i, f_i \rangle |<\infty\), where \(|O|\stackrel{\text{def}}{=}\sqrt{O^{\dagger}O}\). Then the trace of \(O\) is defined as \[\mathop{\mathrm{Tr}}\{O\}\stackrel{\text{def}}{=}\sum_{i\geq 0} \langle Of_i, f_i \rangle,\] where the sum is absolutely convergent and independent of the orthonormal basis \(f_i\), see [72]. Now assume that \(O\) is self-adjoint and trace-class. Every trace-class operator is compact by [72]. As \(O\) is compact and self-adjoint, the spectral theorem [73] implies that it has countably many eigenvalues6 \(\{\lambda_i\}_{i\geq 0}\), and there exists an orthonormal basis consisting of eigenvectors. 
Choosing such a basis yields \[\mathop{\mathrm{Tr}}\{O\}=\sum_{i\geq 0} \lambda_i.\] The main example in this paper is \(\mathcal{H}=L^2(\mathbb{R}^{d})\), that is, the Hilbert space of the complex-valued square integrable functions defined on \(\mathbb{R}^d\) endowed with the scalar product \(\langle f,g \rangle=\int_{\mathbb{R}^d} f(\mathbf{x})g^{*}(\mathbf{x}) \, \mathrm{d} \mathbf{x}\), where \(z^{*}\) denotes the complex conjugate of \(z\). A kernel \(K \in L^2(\mathbb{R}^{2d})\) defines an integral operator \(\widehat{K}\colon L^2(\mathbb{R}^{d})\to L^2(\mathbb{R}^{d})\) by the formula \[\big( \widehat{K} f \big) (\mathbf{x}) = \int_{\mathbb{R}^{d}}\, K(\mathbf{x},\mathbf{y}) f(\mathbf{y})\, \mathrm{d}\mathbf{y}.\] The operator \(\widehat{K}\) is always Hilbert–Schmidt, so compact, see e.g. [72]. Thus \(\widehat{K}\) has countably many eigenvalues \(\{\lambda_i\}_{i\geq 0}\), and each eigenvalue \(\lambda\) satisfies a Fredholm integral equation \[\int_{\mathbb{R}^d} \, K(\mathbf{x},\mathbf{y}) f(\mathbf{y})\,\mathrm{d}\mathbf{y}=\lambda f(\mathbf{x})\] for some nonzero function \(f\in L^2(\mathbb{R}^d)\). If the kernel \(K\) is continuous, then \(\widehat{K}\) is self-adjoint if and only if \(K(\mathbf{y},\mathbf{x})=K^*(\mathbf{x},\mathbf{y})\) for all \(\mathbf{x},\mathbf{y}\in \mathbb{R}^d\). It is a trickier question when \(\widehat{K}\) is trace-class; somewhat surprisingly, adding the natural-looking condition \(K\in L^1(\mathbb{R}^{2d})\) is not sufficient, see [72]. The Schwartz space \(\mathcal{S}(\mathbb{R}^d)\) is the set of smooth functions \(f\colon \mathbb{R}^d\to \mathbb{C}\) with rapidly decreasing mixed partial derivatives, see e.g. [74]. If \(K\in \mathcal{S}(\mathbb{R}^{2d})\), then \(\widehat{K}\) is trace-class, see [75] or [76] with the remark afterwards.
If \(K\) is continuous and \(\widehat{K}\) is trace-class, then [76], [77] yield the useful formula \[\mathop{\mathrm{Tr}}\big\{\widehat{K}\big\}=\int_{\mathbb{R}^d} K(\mathbf{x},\mathbf{x}) \, \mathrm{d}\mathbf{x}.\] Note that \(\widehat{K}\) is positive semidefinite if and only if all of its eigenvalues are greater than or equal to zero, which is equivalent to \[\int_{\mathbb{R}^{d}} \int_{\mathbb{R}^{d}} K(\mathbf{x},\mathbf{y}) f(\mathbf{x})f^{*}(\mathbf{y}) \, \mathrm{d} \mathbf{x} \, \mathrm{d} \mathbf{y} \geq 0 \quad \text{for all } f \in L^2(\mathbb{R}^d).\] For more on Hilbert–Schmidt and trace-class operators see e. g. [72], [75], [78].
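For kernels given in closed form, the trace formula is easy to check numerically against a quadrature discretization of the operator. A minimal Python sketch with a toy rank-one Gaussian kernel (an illustrative choice of ours, not one of the paper's examples):

```python
import math
import numpy as np

# Toy rank-one kernel K(x, y) = exp(-x^2 - y^2) = f(x) f(y) with f(x) = exp(-x^2);
# its only nonzero eigenvalue is ||f||^2 = integral of exp(-2x^2) dx = sqrt(pi/2).
n, L = 400, 8.0
x = np.linspace(-L, L, n)
h = 2 * L / (n - 1)
Kmat = np.exp(-x[:, None] ** 2 - x[None, :] ** 2) * h  # quadrature-weighted kernel

trace = np.trace(Kmat)                  # approximates the integral of K(x, x)
lam_max = np.linalg.eigvalsh(Kmat)[-1]  # approximates the single nonzero eigenvalue
exact = math.sqrt(math.pi / 2)
```

For this rapidly decaying smooth kernel the simple Riemann sum is already accurate to near machine precision, and the trace and the largest eigenvalue both reproduce \(\sqrt{\pi/2}\).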
We say that \(\widehat{\rho}\) is a density operator if it is positive semidefinite with \(\mathop{\mathrm{Tr}}\big\{\widehat{\rho}\big\}=1\), that is, we have \(\lambda_i \geq 0\) and \(\sum_{i\geq 0} \lambda_i=1\).
Definition 1. A Gaussian kernel \(K_G\in L^2(\mathbb{R}^{2d})\) is of the form \[K_G(\mathbf{x},\mathbf{y})=\exp\left\{-\mathbf{r}^{\top}\mathbf{M}\mathbf{r}- \mathbf{V}^{\top}\mathbf{r} + F\right\},\] where \(\mathbf{r}=(\mathbf{x},\mathbf{y})^{\top}\), and \(\mathbf{M}=\mathbf{M}_1+i\mathbf{M}_2\in \mathbb{C}^{2d\times 2d}\) is a symmetric complex matrix such that \(\mathbf{M}_1,\mathbf{M}_2\in \mathbb{R}^{2d\times 2d}\) and \(\mathbf{M}_1\) is positive definite, \(\mathbf{V}\in \mathbb{C}^{2d}\) is a complex vector, and \(F\in \mathbb{C}\) is a complex number. In particular, a self-adjoint Gaussian kernel \(K_G\in L^2(\mathbb{R}^{2d})\) is of the form \[\begin{align} K_G(\mathbf{x},\mathbf{y})=&\exp \left\{-(\mathbf{x}-\mathbf{y})^{\top} \mathbf{A} (\mathbf{x}-\mathbf{y}) -i(\mathbf{x}-\mathbf{y})^{\top} \mathbf{B} (\mathbf{x}+\mathbf{y})\right. \\ &\quad \quad \left. -(\mathbf{x}+\mathbf{y})^{\top} \mathbf{C}(\mathbf{x}+\mathbf{y})-i\mathbf{D}^{\top}(\mathbf{x}-\mathbf{y})-\mathbf{E}^{\top}(\mathbf{x}+\mathbf{y})+F\right\}, \end{align}\] where \(\mathbf{A},\mathbf{B},\mathbf{C}\in \mathbb{R}^{d\times d}\) are real matrices such that \(\mathbf{A},\mathbf{C}\) are symmetric and positive definite, \(\mathbf{E},\mathbf{D}\in \mathbb{R}^d\), and \(F\in \mathbb{R}\). We call \(K\in L^2(\mathbb{R}^{2d})\) a polynomial Gaussian kernel if \[\label{eq:poly} K(\mathbf{x},\mathbf{y})=P(\mathbf{x},\mathbf{y})K_G(\mathbf{x},\mathbf{y}),\tag{13}\] where \(P\) is a self-adjoint polynomial in \(2d\) variables, and \(K_G\in L^2(\mathbb{R}^{2d})\) is a self-adjoint Gaussian kernel.
Recall that we use the following notation throughout the paper. Let \(O\colon \mathcal{H}\to \mathcal{H}\) be a self-adjoint, trace-class linear operator acting on a separable, complex Hilbert space \(\mathcal{H}\). Then \(O\) has only countably many real eigenvalues, let us enumerate them with multiplicities as \(\{\lambda_i\}_{i\geq 0}\). As \(O\) is trace-class, we have \(\sum_{i\geq 0} |\lambda_i|<\infty\). If there are only finitely many eigenvalues \(\lambda_0,\dots,\lambda_m\) then let \(\lambda_i=0\) for each integer \(i>m\).
Following [21] let \(e_0=1\) and for integers \(k\geq 1\) define \[e_k\stackrel{\text{def}}{=}\sum_{0\leq i_1<\dots<i_k} \lambda_{i_1}\cdots \lambda_{i_k}.\] The following fact, which might be standard, shows that the quantities \(e_k\) are intimately connected to the positivity of the operator \(O\); see [21] for an exact reference.
Claim 1. The operator \(O\) is positive semidefinite \(\Longleftrightarrow\) \(e_k\geq 0\) for all \(k\geq 1\).
Let \(K\in L^2(\mathbb{R}^{2d})\) be a self-adjoint kernel. By the Plemelj–Smithies formulas7 we obtain that the values \(e_k\) corresponding to the operator \(\widehat{K}\) can be calculated by \[e_k =\frac{1}{k!}\left|\begin{array}{ccccc} M_1 & 1 & 0 & \cdots \\ M_2 & M_1 & 2 & 0 & \cdots \\ \vdots & & \ddots & \ddots \\ M_{k-1} & M_{k-2} & \cdots & M_1 & k-1 \\ M_k & M_{k-1} & \cdots & M_2 & M_1 \end{array}\right|, \label{eq:32e95k-k}\tag{14}\] where the moments \(\{M_\ell\}_{1\leq \ell\leq k}\) are defined as \[M_\ell \stackrel{\text{def}}{=}\sum_{i=0}^{\infty}\lambda_i^\ell=\mathop{\mathrm{Tr}}\big\{\widehat{K}^\ell\big\} =\int_{\mathbb{R}^{\ell d}} \, \left[K(\mathbf{x}_\ell,\mathbf{x}_1) \prod_{i=1}^{\ell-1} K(\mathbf{x}_i,\mathbf{x}_{i+1}) \right]\, \mathrm{d}\mathbf{x}_1\cdots \mathrm{d}\mathbf{x}_\ell. \label{eq:32M95l-ek}\tag{15}\] Also, it was demonstrated in [21] that this calculation is effective for polynomial Gaussian kernels according to Definition 1.
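Expanding the determinant in 14 along its last row yields the recursion \(e_k=\frac{1}{k}\sum_{\ell=1}^{k}(-1)^{\ell-1}M_\ell\, e_{k-\ell}\), a form of Newton's identities, which is more convenient in code. A minimal Python sketch (the toy spectrum is ours, chosen so that both sides are computable by hand):

```python
def e_from_moments(M, n):
    """Newton's identities: e_k = (1/k) * sum_{l=1}^{k} (-1)^(l-1) M_l e_{k-l};
    equivalent to the determinant formula (14). M[l] holds M_l (M[0] is unused)."""
    e = [1.0]
    for k in range(1, n + 1):
        s = sum((-1) ** (l - 1) * M[l] * e[k - l] for l in range(1, k + 1))
        e.append(s / k)
    return e

# Cross-check against a toy spectrum: e_1 = sum, e_2 = pairwise sums, e_3 = product
lams = [1.0, -0.5, 0.25]
M = [None] + [sum(x ** l for x in lams) for l in range(1, 4)]
print(e_from_moments(M, 3))  # [1.0, 0.75, -0.375, -0.125]
```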
Define \(g\colon \mathbb{R}\to \mathbb{R}\) as \[\label{eq:g40x41} g(x)\stackrel{\text{def}}{=}\prod_{i=0}^{\infty} (1+\lambda_i x).\tag{16}\] By [79] we obtain that \(g(x)\) is finite for all \(x\in \mathbb{R}\) and \[\label{eq:g} g(x)=\sum_{k=0}^{\infty} e_k x^k,\tag{17}\] see also [21] for an elementary proof. Taking the algebraic expansion of \(\left(\sum_{i=0}^{\infty} |\lambda_i|\right)^k\) yields that the sequence \(e_k\) rapidly converges to \(0\). More precisely, we obtain the following [79]: \[\label{eq:ek} |e_k|\leq \frac{\left(\sum_{i=0}^{\infty}|\lambda_i| \right)^k}{k!} \quad \textrm{for all } k\geq 1.\tag{18}\]
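The bound 18 is easy to check numerically on a toy summable spectrum; the sketch below (our own illustration) computes the \(e_k\) directly from the generating product 16 :

```python
import math

def elementary_symmetric(lams):
    """Coefficients e_0, e_1, ..., e_N of prod_i (1 + lam_i x), cf. (16)-(17)."""
    coeffs = [1.0]
    for lam in lams:
        # multiply the running polynomial by (1 + lam * x)
        coeffs = [a + lam * b for a, b in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

lams = [0.5, -0.3, 0.2, -0.1]           # toy summable spectrum
c = sum(abs(l) for l in lams)            # the 1-norm, here 1.1
e = elementary_symmetric(lams)
ok = all(abs(e[k]) <= c ** k / math.factorial(k) for k in range(1, len(e)))
print(ok)  # True: the bound (18) holds
```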
Definition 2. For each integer \(n\geq 1\) define the set \(\Lambda_n\subset \mathbb{R}\) as \[\Lambda_n= \left\{-\frac{1}{x}: x\in \mathbb{R}\text{ and } g_n(x)=0\right\}, \quad \text{where } g_n(x)=\sum_{k=0}^n e_kx^k.\]
The main idea of this section is that \(\Lambda_n\) will approximate the closure of the set of non-zero eigenvalues given by \[\left\{-\frac{1}{x}: x\in \mathbb{R}\text{ and } g(x)=\sum_{k=0}^{\infty} e_k x^k=0\right\}.\] The connection between the two sets is that the polynomial \(g_n\) is a truncation of the power series \(g\). The more precise statement is based on the concept of the Hausdorff metric, see Subsection 3.4 for more details.
First we apply our method to a Gaussian operator, where our approximation \(\Lambda_n\) can be directly compared to the exact spectrum. A self-adjoint Gaussian kernel \(K_G\colon \mathbb{R}^2 \to \mathbb{C}\) with trace \(1\) is of the form \[\begin{align} \begin{aligned} K_G(x,y)=N_0\exp&\left\{-A(x-y)^2-iB(x^2-y^2)-C(x+y)^2 \right.\\ &~ \left. -iD(x-y)-E(x+y)\right\}, \label{eq:rho95G} \end{aligned} \end{align}\tag{19}\] with real parameters \(A>0\), \(C>0\), \(B\), \(D\), \(E\), where \(N_0=2\sqrt{\frac{C}{\pi}} \exp \left[-\frac{E^2}{4C}\right]\) is a normalizing factor ensuring that \(\mathop{\mathrm{Tr}}\big\{\widehat{K}_G\big\}=1\). Let \(\beta=\frac{\sqrt{A}-\sqrt{C}}{\sqrt{A}+\sqrt{C}}\); then the eigenvalues are \(\lambda_n=(1-\beta)\beta^n\) where \(n\geq 0\), and the quantities \(e_k\) can be calculated as \[e_k=\frac{(1-\beta)^k \beta^{\frac{k(k-1)}{2}}}{\prod_{i=1}^k (1-\beta^i)}. \label{eq:gaussian95e95k}\tag{20}\] In Table 1 we considered the parameters \(A=1\), \(C=4\), \(B=D=E=0\).
\(k\) | \(\lambda_{k}\) | \(\lambda^{(10)}_{k}\) | \(r_{k}\) |
---|---|---|---|
\(0\) | \(1.333333333\) | \(1.333333333\) | \(3.86624\cdot 10^{-27}\) |
\(1\) | \(-0.4444444444\) | \(-0.4444444444\) | \(1.7122\cdot 10^{-22}\) |
\(2\) | \(0.1481481481\) | \(0.1481481481\) | \(3.79164\cdot 10^{-18}\) |
\(3\) | \(- 0.04938271605\) | \(- 0.04938271605\) | \(2.39836\cdot 10^{-14}\) |
\(4\) | \(0.01646090535\) | \(0.01646090536\) | \(5.31401\cdot 10^{-11}\) |
\(5\) | \(- 0.00548696845\) | \(- 0.00548696824\) | \(3.85098 \cdot10^{-8}\) |
\(6\) | \(0.001828989483\) | \(0.00182897225\) | \(9.42254\cdot 10^{-6}\) |
\(7\) | \(- 0.00060966316\) | \(- 0.00061011957\) | \(0.000748632\) |
\(8\) | \(0.00020322\) | \(0.0002074\) | \(0.0204092\) |
\(9\) | \(- 0.00006774\) | \(- 0.00005448\) | \(0.195762\) |
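The closed form 20 is easy to sanity-check numerically. A short Python sketch (an illustrative aid of ours, not part of the original computation) compares \(e_k\) from Eq. 20 with the elementary symmetric functions of the truncated geometric spectrum \(\lambda_n=(1-\beta)\beta^n\) for the parameters of Table 1:

```python
import math

A, C = 1.0, 4.0                       # parameters of Table 1 (B = D = E = 0)
beta = (math.sqrt(A) - math.sqrt(C)) / (math.sqrt(A) + math.sqrt(C))  # = -1/3
lams = [(1 - beta) * beta ** n for n in range(120)]  # truncated spectrum

def e_closed(k):
    """Closed form (20) for the Gaussian kernel."""
    num = (1 - beta) ** k * beta ** (k * (k - 1) // 2)
    den = 1.0
    for i in range(1, k + 1):
        den *= 1 - beta ** i
    return num / den

# elementary symmetric functions from the product representation (16)
coeffs = [1.0]
for lam in lams:
    coeffs = [a + lam * b for a, b in zip(coeffs + [0.0], [0.0] + coeffs)]

print(e_closed(2))  # ≈ -0.5, matching coeffs[2]
```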
Now we test our estimate in the case of an integral operator whose kernel is a quadratic polynomial multiplied by a Gaussian. We will show that determining the approximate eigenvalues for positive semidefinite kernels (i.e., for density operators) according to Definition 2 allows us to accurately estimate the von Neumann entropy \[S= - \sum^{\infty}_{i=0}\lambda_i \log{\lambda_i}\] by \[S^{(n)}= - \sum_{i}\lambda^{(n)}_i \log{\lambda^{(n)}_i},\] where \(\lambda^{(n)}_i\) denotes the \(i\)th eigenvalue in the \(n\)th approximation. Note that the polynomial \(g_n(x)\) of degree \(n\) in Definition 2 is the truncated version of the power series \(g(x)\) from Eq. 17 . Here we set the Boltzmann constant to be \(k_B=1\). For non-positive kernels we will also calculate the approximate Schatten–von Neumann norm by \[\| \hat{\rho} \|_1 \approx \sum_{i} |\lambda^{(n)}_i|.\]
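These two approximations can be sketched in a few lines of Python; skipping non-positive approximate eigenvalues in the entropy sum is our own pragmatic convention (for a density operator they should be small truncation artifacts), not something prescribed by the method:

```python
import math

def S_approx(approx_eigs):
    """S^(n) = -sum lambda_i^(n) log lambda_i^(n), with k_B = 1.
    Non-positive approximate eigenvalues are skipped (assumed convention)."""
    return -sum(l * math.log(l) for l in approx_eigs if l > 0)

def schatten_1_approx(approx_eigs):
    """Approximate Schatten-von Neumann 1-norm: sum of |lambda_i^(n)|."""
    return sum(abs(l) for l in approx_eigs)

print(S_approx([0.5, 0.5]))  # log 2 ≈ 0.6931
```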
Recall that a polynomial Gaussian kernel \(K\colon \mathbb{R}^2\to \mathbb{C}\) is given by \[K(x,y)=P(x,y) K_G(x,y),\] where \(K_G(x,y)\) is a self-adjoint Gaussian kernel with trace \(1\) of the form 19 with real parameters \(A,B,C,D,E\) such that \(A,C>0\); in our example \(P(x,y)\) is a self-adjoint quadratic polynomial given by \[\begin{align} P(x,y)=\frac{1}{N} &\left(A_P (x-y)^2+i B_P(x^2-y^2) + C_P (x+y)^2 \right. \\ &\, \left. +i D_P(x-y)+E_P(x+y) +F_P\right), \end{align}\] with real parameters \(A_P,B_P, C_P, D_P,E_P,F_P\) and normalizing factor \[N=F_P + \frac{C_P-E_P E}{2C}+\frac{C_P E^2}{4 C^2}.\]
The tables below show data for a positive semidefinite kernel and for a non-positive kernel, respectively: the approximate eigenvalues together with the von Neumann entropy and the Schatten–von Neumann norm for the parameters shown in the captions. One can also compare these values with the more precise values obtained from direct diagonalization (see Subsection 6.2). The diagonalization values quoted in the captions confirm the correctness of the method and its fast convergence to the asymptotic values, which is a very useful property in practice. These numerical results explicitly demonstrate that our method can accurately approximate key quantum information quantities such as the von Neumann entropy and the logarithmic negativity, even in non-Gaussian, physically motivated scenarios. This confirms the practical relevance of our approach in real-world quantum technologies, especially in continuous-variable and noisy quantum systems. The quantities \(e_k\) were calculated with Mathematica.
\[\begin{array}{|c||c|c|c|c|c|c|} \hline n & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 & \lambda_5 & \lambda_6 \\ \hline
1 & 1.0000000 & & & & & \\
2 & - & & & & & \\
3 & 0.6270753 & 0.3061496 & 0.0667750 & & & \\
4 & 0.6261790 & 0.3103465 & 0.0538776 & 0.0095969 & & \\
5 & 0.6261805 & 0.3103324 & 0.0541775 & 0.0080588 & 0.0012508 & \\
6 & 0.6261805 & 0.3103324 & 0.0541768 & 0.0080884 & 0.0010669 & 0.0001550\\
7 & 0.6261805 & 0.3103324 & 0.0541768 & 0.0080883 & 0.0010701 & 0.0001333 \\
\hline
\end{array}\]
\[\begin{array}{|c||c|c|c|c|c|c|}
\hline n & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline S^{(n)} & 0.0000000 & - & 0.8357542 & 0.8582210 & 0.8614143 & 0.8618263 \\
\hline\hline n & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline S^{(n)} & 0.8618767 & 0.8618827 & 0.8618834 & 0.8618835 & 0.8618835 & 0.8618835 \\ \hline
\end{array}\]
\[\begin{array}{|c||c|c|c|c|c|c|} \hline n & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 & \lambda_5 & \lambda_6 \\ \hline
1 & 1.0000000 & & & & & \\
2 & 0.5253423 & 0.4746577 & & & & \\
3 & 0.5747363 & 0.4133032 & 0.0119604 & & & \\
4 & 0.5746924 & 0.4133888 & 0.0110457 & 0.0008731 & & \\
5 & 0.5746912 & 0.4133922 & 0.0085830 & 0.0062595 & -0.0029260 & \\
6 & 0.5746912 & 0.4133922 & 0.0086980 & 0.0060556 & -0.0030094 & 0.0001724 \\
7 & 0.5746912 & 0.4133922 & 0.0086978 & 0.0060563 & -0.0030100 & 0.0001458 \\
\hline
\end{array}\]
\[\begin{array}{|c||c|c|c|c|c|c|} \hline n & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
\|\hat{\rho}\|_1 & 1.0000000 & 1.0000000 & 1.0000000 & 1.0000000 & 1.0058519 & 1.0060189 \\ \hline\hline n & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline
\|\hat{\rho}\|_1 & 1.0060201 & 1.0060201 & 1.0060201 & 1.0060201 & 1.0060201 & 1.0060201 \\ \hline
\end{array}\]
The main goal of this subsection is to show that the set \(\Lambda_n\) defined in Definition 2 converges to the closure of the set of non-zero eigenvalues. Indeed, as the quantities \(e_k\) from Subsection 3.1 carry no information about the zero eigenvalues, we need to implement a mild modification to the set of eigenvalues. Let \(\Lambda\subset \mathbb{R}\) denote the closure of the set of non-zero \(\lambda_i\), that is, \[\Lambda=\overline{\{\lambda_i: i\geq 0\}\setminus\{0\}}.\] This means that \(0\in \Lambda\) if and only if there are infinitely many non-zero \(\lambda_i\).
Definition 3. For two non-empty compact sets \(K_1,K_2\subset \mathbb{R}\) let us define their Hausdorff distance as \[\mathop{\mathrm{d_H}}(K_1,K_2)\stackrel{\text{def}}{=}\inf\{r: K_1\subset B(K_2,r) \text{ and } K_2\subset B(K_1,r) \},\] where \[B(A,r)=\{x\in \mathbb{R}: \exists y\in A \text{ such that } |x-y|<r\}.\] Then the non-empty compact subsets of \(\mathbb{R}\) endowed with the Hausdorff metric \(\mathop{\mathrm{d_H}}\) form a complete, separable metric space, see [80] for more information.
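For finite sets, e.g. a truncated spectrum and an approximation of it, the Hausdorff distance of Definition 3 can be computed directly; a minimal sketch:

```python
def hausdorff(K1, K2):
    """Hausdorff distance between two finite non-empty sets of reals."""
    d = lambda x, S: min(abs(x - y) for y in S)   # distance from a point to a set
    return max(max(d(x, K2) for x in K1),
               max(d(y, K1) for y in K2))

# Example: a 4-point "spectrum" and a 2-point approximation of it;
# the point 0 is at distance 0.5 from {0.5, 1}, which dominates.
print(hausdorff([0.0, 0.25, 0.5, 1.0], [0.5, 1.0]))  # 0.5
```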
We can now state and prove the main theorem of this subsection.
Theorem 2. Assume that each non-zero \(\lambda_i\) has multiplicity \(1\). Then \(\Lambda_n\to \Lambda\) in the Hausdorff metric.
Proof. First assume that there are only finitely many non-zero \(\lambda_i\), say \(\lambda_0, \dots, \lambda_{k-1}\). Then clearly \(g_n(x)=g(x)\) for all \(n\geq k\) for each \(x\). Hence \(\Lambda_n=\Lambda\) for all \(n\geq k\).
Now assume that there are infinitely many non-zero \(\lambda_i\). Fix an arbitrary \(\varepsilon>0\); we need to prove that there exists \(N\in \mathbb{N}\) such that \(\mathop{\mathrm{d_H}}(\Lambda_n, \Lambda)\leq \varepsilon\) for all \(n\geq N\). Define \[K_{\varepsilon}=\left\{-\frac{1}{x}: x\in \mathbb{R}\setminus B(\Lambda,\varepsilon)\right\}.\] Clearly \(K_{\varepsilon}\) is closed, and \(0\in \Lambda\) implies that \(K_{\varepsilon}\subset [-1/\varepsilon, 1/\varepsilon]\), so \(K_{\varepsilon}\) is compact. As \(g\) is continuous and non-zero on \(K_{\varepsilon}\), we can define \(\theta\stackrel{\text{def}}{=}\min\{|g(x)|: x\in K_{\varepsilon}\}>0\). Since \(g_n\to g\) uniformly on \(K_{\varepsilon}\), there exists \(N_1\in \mathbb{N}\) such that \(|g_n(x)-g(x)|<\theta\) for all \(n\geq N_1\) and \(x\in K_{\varepsilon}\). Therefore, the definition of \(\theta\) yields \(K_{\varepsilon}\cap \{x: g_n(x)=0\}=\emptyset\) for all \(n\geq N_1\). Hence the definitions of \(K_{\varepsilon}\) and \(\Lambda_n\) imply for each \(n\geq N_1\) that \[\label{eq:cont1} \Lambda_n\subset B(\Lambda,\varepsilon).\tag{21}\] Now choose finitely many non-zero \(\lambda_i\) (we may assume that they are \(\lambda_0,\dots, \lambda_{k-1}\)) and \(\delta>0\) such that \[\label{eq:ldelta} [\lambda_i-\delta, \lambda_i+\delta]\cap \Lambda=\{\lambda_i\} \quad \text{for all } 0\leq i<k,\tag{22}\] and for any points \(x_i\in [\lambda_i-\delta, \lambda_i+\delta]\) we have \[\label{eq:ldelta2} \Lambda\subset B\left(\cup_{i=0}^{k-1} \{x_i\},\varepsilon\right).\tag{23}\] Define a \(2k\) element set \[C\stackrel{\text{def}}{=}\left\{\frac{1}{-\lambda_i\pm \delta}: 0\leq i<k\right\},\] and let \(I_i\) be the open interval with endpoints \(1/(-\lambda_i\pm\delta)\) for any \(0\leq i<k\). As \(\lambda_i\) has multiplicity \(1\), 22 yields that \(g(\frac{1}{-\lambda_i-\delta})\) and \(g(\frac{1}{-\lambda_i+\delta})\) are non-zero and have opposite signs for all \(i\). 
As \(g_n\to g\) uniformly on \(C\), we infer that there exists \(N_2\in \mathbb{N}\) such that for each \(n\geq N_2\) the values \(g_n(\frac{1}{-\lambda_i-\delta})\) and \(g_n(\frac{1}{-\lambda_i+\delta})\) are non-zero and have opposite signs for all \(0\leq i<k\). Thus for any \(n\geq N_2\) for all \(0\leq i<k\) by the continuity of \(g_n\) we can define \(z_{i,n}\in I_i\) such that \(g_n(z_{i,n})=0\). Let \(x_{i,n}=-1/z_{i,n}\), then clearly \(\{x_{i,n}: 0\leq i<k\}\subset \Lambda_n\), and \(x_{i,n}\in (\lambda_i-\delta, \lambda_i+\delta)\) for all \(0\leq i<k\). Then 23 implies that \[\label{eq:cont2} \Lambda\subset B(\{x_{i,n}: 0\leq i<k\},\varepsilon) \subset B(\Lambda_n,\varepsilon).\tag{24}\] Finally, let \(N=\max\{N_1,N_2\}\), then 21 and 24 imply for all \(n\geq N\) that \[\mathop{\mathrm{d_H}}(\Lambda_n,\Lambda)\leq \varepsilon.\] As \(\varepsilon>0\) was arbitrary, the proof is complete. ◻
We apply the method of Section 3 to estimate the minimal eigenvalue. Let the operator \(O\) and its real eigenvalues \(\lambda_i\) be the same as in Section 3. Recall that \(\lambda_{\min}=\min\{0,\lambda_i: i\geq 0\}\), and note that \[\label{eq:lmin} \lambda_{\min}=\frac{-1}{\min\{x>0: g(x)=0\}},\tag{25}\] where we use the convention \(\min \emptyset=+\infty\). Let us recall the following definition.
Definition 4. For all integers \(n\geq 1\) define \[\label{eq:qn0} q_{n,0}\stackrel{\text{def}}{=}\frac{-1}{\min \left\{x>0: g_n(x)=0\right\}}, \quad \text{where } g_n(x)=\sum_{k=0}^n e_kx^k.\tag{26}\]
We consider the approximation of \(\lambda_{\min}\) by \(q_{n,0}\). In the next theorem we prove that \(q_{n,0}\to \lambda_{\min}\) with super-exponential speed under the slight technical condition that \(\lambda_{\min}<0\) has multiplicity one. First we need the following easy fact.
Fact 3. Let \(n\geq 0\) be an integer and let \(0\leq x\leq \frac{n+2}{2}\). Then \[\sum_{k=n+1}^{\infty} \frac{x^k}{k!}\leq \frac{2x^{n+1}}{(n+1)!}.\]
Proof. We can use a geometric sum for the estimate as \[\sum_{k=n+1}^{\infty} \frac{x^k}{k!}\leq \frac{x^{n+1}}{(n+1)!}\sum_{i=0}^{\infty} \left(\frac{x}{n+2}\right)^i\leq \frac{2x^{n+1}}{(n+1)!}. \qedhere\] ◻
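Fact 3 is easy to check numerically by comparing the exponential tail with the stated bound at a few sample points; a quick sketch:

```python
import math

# Numeric check of Fact 3: the tail of exp(x) beyond degree n
# versus the bound 2 x^(n+1)/(n+1)!  for x in [0, (n+2)/2].
for n in (0, 3, 10):
    for t in (0.25, 0.5, 1.0):
        x = t * (n + 2) / 2
        tail = math.exp(x) - sum(x**k / math.factorial(k) for k in range(n + 1))
        bound = 2 * x**(n + 1) / math.factorial(n + 1)
        assert tail <= bound + 1e-12
print("Fact 3 verified on sample points")
```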
Theorem 4. Assume that \(\lambda_{\min}<0\) has multiplicity one. Then \(q_{n,0}\to \lambda_{\min}\). Moreover, \[\label{eq:pn} |q_{n,0}-\lambda_{\min}|\leq \exp[-n \log(n)+\mathcal{O}(n)].\tag{27}\]
Proof. First we prove that \(q_{n,0}\to \lambda_{\min}\). Assume that \(\lambda_0=\lambda_{\min}\) and fix \(\gamma_0\) such that \(\lambda_0<\gamma_0<\min\{0, \lambda_i: i\geq 1\}\). Define \[\theta_0=-\frac{1}{\lambda_0} \quad \text{and} \quad \theta_1=-\frac{1}{\gamma_0}.\] Since \(g\) has no root between \(\theta_0\) and \(\theta_1\), and \(\theta_0\) is the smallest root of \(g\) with multiplicity \(1\), we obtain that \(g(x)>0\) for all \(0\leq x<\theta_0\) and \(g(x)<0\) for all \(\theta_0<x\leq \theta_1\). Fix an arbitrary \(0<\varepsilon<\min\{\theta_0,\theta_1-\theta_0\}\), and define \(\delta>0\) as \[\delta\stackrel{\text{def}}{=}\min\{|g(x)|: x\in [0, \theta_0-\varepsilon]\cup\{\theta_0+\varepsilon \} \}.\] Since \(g_n\) uniformly converges to \(g\) on the interval \([0, \theta_0+\varepsilon]\), we obtain that there exists an integer \(N=N(\varepsilon)\geq 0\) such that \[\label{eq:hgdelta} |g_n(x)-g(x)|<\delta \quad \text{for all } x\in [0, \theta_0+\varepsilon] \text{ and } n\geq N.\tag{28}\] The definition of \(\delta\) and 28 imply that for all \(n\geq N\) we have \(g_n(x)>0\) for all \(x\in [0, \theta_0-\varepsilon]\) and \(g_n(\theta_0+\varepsilon)<0\). Hence \(\min \left\{x>0: g_n(x)=0\right\}\in [\theta_0-\varepsilon, \theta_0+\varepsilon]\) for all \(n\geq N\). Taking the limit \(\varepsilon \to 0+\) yields that \(\min \left\{x>0: g_n(x)=0\right\}\to \theta_0\) as \(n\to \infty\), which implies that \(q_{n,0}\to \lambda_0\) by definition.
Now we prove the upper bound 27 . Let \(F(x)=|\lambda_0| \prod_{i\geq 1} (1+\lambda_i x)\), and let \[\Delta\stackrel{\text{def}}{=}\min\left\{F(x): 0\leq x\leq \theta_1\right\}.\] As \(F\) is continuous and positive on the interval \([0,\theta_1]\), we obtain that \(\Delta>0\). Consider \(x=\theta_0+s\), where \(s\in [-\theta_0,\theta_1-\theta_0]\). Then the definition of \(\Delta\) yields \[\label{eq:glin} |g(x)|=\left|\prod_{i=0}^{\infty} (1+\lambda_ix)\right|=|s \lambda_0| \prod_{i\geq 1} (1+\lambda_i x)\geq \Delta|s|.\tag{29}\] Define \[L\stackrel{\text{def}}{=}\sum_{i=0}^{\infty} |\lambda_i|,\] and assume that \(n>2 \theta_1 L\). Then \(Lx\leq L\theta_1<\frac{n}{2}\) for every \(x\in [0,\theta_1]\). Thus 18 , Fact 3, and the inequality \(n!\geq e\left(\frac{n}{e}\right)^{n}\) imply that for all \(0\leq x\leq \theta_1\) we have \[\label{eq:hg} |g_n(x)-g(x)|=\left|\sum_{k=n+1}^{\infty} e_kx^k\right|\leq \sum_{k=n+1}^{\infty} \frac{(Lx)^k}{k!}\leq 2\frac{(Lx)^{n+1}}{(n+1)!}< \left(\frac{eL\theta_1}{n+1}\right)^{n+1}.\tag{30}\] Now define \(\varepsilon_n>0\) as \[\label{eq:eps} \varepsilon_n \stackrel{\text{def}}{=}\frac{1}{\Delta}\left(\frac{eL\theta_1}{n+1}\right)^{n+1}.\tag{31}\] Then 29 , 30 , and 31 yield that for all large enough \(n\) we have \(g_n(x)>0\) for all \(0\leq x\leq \theta_0-\varepsilon_n\) and \(g_n(\theta_0+\varepsilon_n)<0\); so we can define \[r_n\stackrel{\text{def}}{=}\min\{x>0: g_n(x)=0 \}\in [\theta_0-\varepsilon_n, \theta_0+\varepsilon_n].\] Then \(r_n\in [\theta_0-\varepsilon_n, \theta_0+\varepsilon_n]\), and the definition of \(\varepsilon_n\) with some straightforward analysis imply that for all large enough \(n\) we have \[|q_{n,0}-\lambda_0|=\left|\frac{1}{\theta_0}-\frac{1}{r_n}\right|\leq \left|\frac{1}{\theta_0}-\frac{1}{\theta_0-\varepsilon_n}\right|\leq 2|\lambda_0|^2 |\varepsilon_n|=\exp[-n \log(n)+\mathcal{O}(n)].\] The proof of the theorem is complete. ◻
Figure 3 compares this approximation with the one in Section 6.
Remark 5. Note that the proof of an analogous theorem would also work if we only assumed that \(\lambda_{\min}<0\) has odd multiplicity \(m\) in the sequence \(\{\lambda_i\}_{i\geq 0}\), with the only difference that \(n\log(n)\) needs to be replaced by \(\frac{1}{m} n\log(n)\) in 27 . However, a multiplicity bigger than one does not seem to be a natural condition for the minimal eigenvalue of a linear operator.
The main goal of this section is to give a method which can provide rigorous upper bounds for the Schatten–von Neumann \(1\)-norm, and to demonstrate that it works effectively for polynomial Gaussian operators, see Definition 1. We use this section as a preparation for Section 6, but it can be interesting in its own right. Let \(\mathcal{H}\) be a separable, complex Hilbert space. For the following definition see for example [74].
Definition 5. Assume that \(O\colon \mathcal{H}\to \mathcal{H}\) is a Hilbert–Schmidt operator. Let \(\{s_i\}_{i\geq 0}\) be the singular values of \(O\), that is, the sequence of the square roots of the eigenvalues of \(OO^{\dagger}\) with multiplicities. Then we can define the finite \(2\)-norm of \(O\) as \[\| O \|_2 \stackrel{\text{def}}{=}\Bigg( \sum_{i\geq 0} s_{i}^2 \Bigg)^{\frac{1}{2}}.\] If \(O\) is trace-class, we can define its Schatten–von Neumann \(1\)-norm as \[\| O \|_1 \stackrel{\text{def}}{=}\sum_{i\geq 0} s_{i}.\]
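For a self-adjoint operator the singular values are the absolute values of the eigenvalues, so both norms can be illustrated on a small example. A sketch for a \(2\times 2\) symmetric matrix (a toy stand-in with closed-form eigenvalues):

```python
import math

# 2x2 symmetric example [[a, b], [b, d]]; for a self-adjoint operator
# the singular values are |eigenvalues|.
a, b, d = 2.0, 1.0, -1.0
mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
eigs = [mid + rad, mid - rad]                 # closed-form eigenvalues

norm1 = sum(abs(l) for l in eigs)             # Schatten-von Neumann 1-norm
norm2 = math.sqrt(sum(l * l for l in eigs))   # 2-norm

# The 2-norm coincides with the entrywise (Frobenius) norm:
frob = math.sqrt(a*a + 2*b*b + d*d)
print(norm1, norm2, frob)
```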
By [74] for \(K\in L^2(\mathbb{R}^{2d})\) the \(2\)-norm of the Hilbert–Schmidt integral operator \(\widehat{K}\) can be calculated as \[\label{eq:Knorm} \big\| \widehat{K}\big\|^2_2=\int_{\mathbb{R}^{2d}} |K(\mathbf{x},\mathbf{y}) |^2 \, \mathrm{d} \mathbf{x} \, \mathrm{d} \mathbf{y}.\tag{32}\]
Now assume that \(O\) is a self-adjoint, trace-class operator; then clearly \(\| O \|_1=\sum_{i\geq 0} |\lambda_i|<\infty\), where \(\{\lambda_i\}_{i\geq 0}\) are the eigenvalues with multiplicities. Our aim is to find an upper bound for \(\|O\|_1\) which can be effectively calculated. By [74] the operator \(O\) can be written as a product \[\label{eq:O12} O=O_1 O_2,\tag{33}\] where \(O_1,O_2\colon \mathcal{H}\to \mathcal{H}\) are Hilbert–Schmidt operators, so \(\|O_i\|_2<\infty\) for \(i=1,2\). Hölder’s inequality [74] yields \[\label{eq:OHolder} \|O\|_1\leq \| O_1\|_2 \| O_2\|_2,\tag{34}\] giving an upper bound for the Schatten–von Neumann \(1\)-norm of \(O\). However, it is a hard problem to find a decomposition 33 where we can also calculate or estimate the norms \(\| O_1\|_2\) and \(\| O_2\|_2\).
We propose a solution for self-adjoint trace-class integral operators \(\widehat{K}\) with kernels consisting of a polynomial multiplied by a Gaussian kernel, see Subsection 5.1. Then a decomposition into Hilbert–Schmidt operators \[\label{eq:K12} \widehat{K}=\widehat{K}_1 \widehat{K}_2\tag{35}\] implies that the kernels satisfy \[\label{eq:prod95of95int95ops} K(\mathbf{x},\mathbf{y}) = \int_{\mathbb{R}^d} K_1(\mathbf{x},\mathbf{z}) K_2(\mathbf{z},\mathbf{y}) \, \mathrm{d} \mathbf{z}.\tag{36}\] Applying Hölder’s inequality 34 to the decomposition 35 , and using the formula 32 for \(K_1\) and \(K_2\), yields \[\label{eq:Hold} \big\| \widehat{K} \big\|_1\leq \left(\int_{\mathbb{R}^{2d}} |K_1(\mathbf{x},\mathbf{y}) |^2 \, \mathrm{d} \mathbf{x} \, \mathrm{d} \mathbf{y}\right)^{\frac{1}{2}} \left(\int_{\mathbb{R}^{2d}} |K_2(\mathbf{x},\mathbf{y}) |^2 \, \mathrm{d} \mathbf{x} \, \mathrm{d} \mathbf{y}\right)^{\frac{1}{2}}.\tag{37}\] This upper bound has a direct connection to the minimal eigenvalue problem. We obtain an immediate lower bound for \(\lambda_{\min}\) and an upper bound for the negativity \(\mathcal{N} \stackrel{\text{def}}{=}\sum_{i=0}^{\infty} |\min\{\lambda_i,0\}|\) as \[\label{eq:Trmin} |\lambda_{\min}|\leq \mathcal{N}= \frac{\big\| \widehat{K} \big\|_1- \mathop{\mathrm{Tr}}\big\{\widehat{K}\big\}}{2}\leq \frac{\big\| \widehat{K}_1 \big\|_{2} \big\| \widehat{K}_2 \big\|_{2}- \mathop{\mathrm{Tr}}\big\{\widehat{K}\big\}}{2}.\tag{38}\] It is interesting that while our operator \(\widehat{K}\) is always self-adjoint, often there are only decompositions of the form 35 in which neither \(\widehat{K}_1\) nor \(\widehat{K}_2\) is self-adjoint. We will see more sophisticated estimates of \(\lambda_{\min}\) in Section 6, where we will also use an upper bound for \(\big\| \widehat{K} \big\|_1\) coming from 35 and 37 . In the rest of this section we show how to calculate such upper bounds.
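The chain of inequalities in 38 can be illustrated by a finite-dimensional toy analogue (diagonal matrices chosen for transparency; this is an assumption for illustration, not one of the integral operators above):

```python
import math

# Toy analogue of 33-38: O = diag(2, -1) factored as O = O1 O2
# with O1 = diag(1, 1) and O2 = diag(2, -1).
O_eigs = [2.0, -1.0]
O1 = [1.0, 1.0]
O2 = [2.0, -1.0]

norm1_O = sum(abs(l) for l in O_eigs)                 # ||O||_1 = 3
holder = (math.sqrt(sum(x*x for x in O1))
          * math.sqrt(sum(x*x for x in O2)))          # ||O1||_2 ||O2||_2 = sqrt(10)
trace = sum(O_eigs)                                   # Tr O = 1

# Negativity bound of 38: |lambda_min| <= N <= (||O1||_2 ||O2||_2 - Tr O)/2
negativity = sum(-min(l, 0.0) for l in O_eigs)        # N = 1
bound = (holder - trace) / 2
print(norm1_O, holder, negativity, bound)
```

Here Hölder gives \(3\leq\sqrt{10}\approx 3.162\), and the negativity bound \(1\leq (\sqrt{10}-1)/2\approx 1.081\).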
Given a polynomial Gaussian kernel \(K\in L^2(\mathbb{R}^{2d})\) of the form 13 , we want to decompose the operator \(\widehat{K}\) to a product according to 35 . We will search for the decomposition in the form \[K_1(\mathbf{x},\mathbf{y})=P_1(\mathbf{x},\mathbf{y})K_{G_1}(\mathbf{x},\mathbf{y}) \quad \text{and} \quad K_2(\mathbf{x},\mathbf{y})=P_2(\mathbf{x},\mathbf{y})K_{G_2}(\mathbf{x},\mathbf{y}),\] where \(K_{G_1},K_{G_2}\) are not necessarily self-adjoint Gaussian kernels in \(2d\) variables, and \(P_1,P_2\) are polynomials in \(2d\) variables such that \(\deg P_1+\deg P_2=\deg P\). In particular, if \(\deg P_1=0\) or \(\deg P_2=0\) then the decomposition leads us to a system of linear equations which can be solved effectively, and we can even try to minimize \(\big\| \widehat{K}_1 \big\|_{2} \big\| \widehat{K}_2 \big\|_{2}\) coming from the different decompositions of \(\widehat{K}\). We demonstrate this algorithm in the \(d=1\) case together with a minimization in the following subsections.
Recall that a self-adjoint Gaussian kernel \(K_G\colon \mathbb{R}^2\to \mathbb{C}\) is of the form \[\begin{align} \label{eq:g1d} \begin{aligned} K_G(x,y)=N_0\exp&\left\{-A(x-y)^2-iB(x^2-y^2)-C(x+y)^2 \right.\\ &~ \left. -iD(x-y)-E(x+y)\right\}, \end{aligned} \end{align}\tag{39}\] with real parameters \(A>0\), \(C>0\), \(B\), \(D\), \(E\), where \(N_0=2\sqrt{\frac{C}{\pi}} \exp \left[-\frac{E^2}{4C}\right]\) is a normalizing factor ensuring that \(\mathop{\mathrm{Tr}}\big\{\widehat{K}_G\big\}\)=1. The eigenvalue equation was solved in [81]; the eigenvalues are given by \[\lambda_i=\frac{2\sqrt{C}}{\sqrt{A}+\sqrt{C}} \left(\frac{\sqrt{A}-\sqrt{C}}{\sqrt{A}+\sqrt{C}}\right)^i, \quad i=0,1,\ldots.\] Correspondingly, the Schatten–von Neumann \(1\)-norm of the Gaussian \(\widehat{K}_G\) is \[\label{eq:exact95l1norm:for95gauss} \big\| \widehat{K}_G\big\|_1=\sum_{i=0}^\infty |\lambda_i|=\begin{cases} 1 & \text{if } A\geq C, \\ \sqrt{\frac{C}{A}} & \text{if } C>A. \end{cases}\tag{40}\] By 36 we consider the decomposition of \(K_G\) as \[\label{eq:gauss95decomp} K_G (x,y)=\int_{-\infty}^\infty \, K_1(x,z) K_2(z,y)\, \mathrm{d}z.\tag{41}\] We define \(K_1\) and \(K_2\) containing the free parameter \(w\in \mathbb{R}\) as follows. Let \[K_1(x,y)=N_1 \exp \left(-A_1 x^2 - B_1 y^2 - C_1 xy - D_1 x - E_1 y \right),\] with \[\begin{align} \begin{aligned} \label{eq:first95exp95par1} A_1&=w+ i B +\frac{AC}{w}\\ B_1&=w- i B +\frac{AC}{w},\\ C_1&=-2 \left( w-\frac{AC}{w}\right), \\ D_1&= i D+ \frac{AE}{w}, \\ E_1&=-i D+ \frac{AE}{w}, \\ N_1&=\frac{2\sqrt{C}}{\pi}\sqrt{\frac{(w^2-AC)^2}{w(A-w)(C-w)}}\exp{\left(-\frac{\left(AC-w^2\right)E^2}{4w(C-w)C}\right)}. 
\end{aligned} \end{align}\tag{42}\] Similarly let \[K_2(x,y)= \exp \left(-A_2 x^2 - B_2 y^2 - C_2 xy - D_2 x - E_2 y \right)\] with \[\begin{align} \begin{aligned} \label{eq:second95exp95par1} A_2&= i B +\frac{(A-w)C}{C-w}+\frac{A(C-w)}{A-w},\\ B_2&= - i B + \frac{(A-w)C}{C-w}+\frac{A(C-w)}{A-w}, \\ C_2&=2 \left( \frac{(A-w)C}{C-w}-\frac{A(C-w)}{A-w} \right), \\ D_2&= i D + \frac{(A-w)E}{C-w}, \\ E_2&=-i D + \frac{(A-w)E}{C-w}. \end{aligned} \end{align}\tag{43}\] It is easy to check that 41 holds and the integral is finite if \(w\) satisfies \[\label{eq:allowed95w} 0 < w < \min(A,C) \qquad \text{or} \qquad 0< \max(A,C) < w.\tag{44}\] Straightforward calculation leads to \[\begin{align} \begin{aligned} \label{eq:rho12} \iint_{\mathbb{R}^2} \left| K_1(x,y)\right|^2 \, \mathrm{d} x \, \mathrm{d} y=&\frac{C(w^2-A C)^2}{\pi \sqrt{AC} w(A-w)(C-w)} \exp{\left( -\frac{(A-w)E^2}{2(C-w)C}\right)},\\ \iint_{\mathbb{R}^2} \left| K_2(x,y)\right|^2 \, \mathrm{d} x \, \mathrm{d} y=&\frac{\pi}{4\sqrt{AC}} \exp{\left( \frac{(A-w)E^2}{2(C-w)C}\right)}, \end{aligned} \end{align}\tag{45}\] provided that \(w\) is taken from the allowed intervals 44 . By 45 we can calculate the product \[\iint_{\mathbb{R}^2} \left| K_1(x,y)\right|^2 \, \mathrm{d} x \, \mathrm{d} y \iint_{\mathbb{R}^2} \left| K_2(x,y)\right|^2 \, \mathrm{d} x \, \mathrm{d} y=\frac{(w^2-AC)^2}{4Aw(A-w)(C-w)}.\] We can minimize this in \(w\) on \((0,\min(A,C))\cup (\max(A,C),\infty)\), which gives \[\begin{align} \label{eq:location95of95min95for95pos95gauss} w_{\min}^{G}=\begin{cases} A \mp \sqrt{A^2-AC}& \qquad \text{if } A\geq C, \\ C \mp \sqrt{C^2-AC}& \qquad \text{if } C>A. \end{cases} \end{align}\tag{46}\] The minimum value is the same at both \((\pm)\) values. Therefore, using Hölder’s inequality 37 we obtain the upper bound \[\begin{align} \big\| \widehat{K}_G\big\|_1&\leq \left. 
\sqrt{\iint_{\mathbb{R}^2} \left| K_1(x,y)\right|^2 \, \mathrm{d} x \, \mathrm{d} y \iint_{\mathbb{R}^2} \left| K_2(x,y)\right|^2 \, \mathrm{d} x \, \mathrm{d} y }\right|_{w=w_{\min}^{G}} \\ &=\begin{cases} 1 & \text{if } A\geq C, \\ \sqrt{\frac{C}{A}}& \text{if } C>A. \end{cases} \end{align}\] Comparing this with 40 shows that our upper bound equals the exact value of the norm \(\big\| \widehat{K}_G\big\|_1\).
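The closed form 40 is easy to confirm numerically from the eigenvalue formula of [81]; a short sketch:

```python
import math

def gauss_norm1(A, C, terms=2000):
    """Sum |lambda_i| for the Gaussian-kernel eigenvalues of [81]."""
    r = (math.sqrt(A) - math.sqrt(C)) / (math.sqrt(A) + math.sqrt(C))
    lam0 = 2 * math.sqrt(C) / (math.sqrt(A) + math.sqrt(C))
    return sum(abs(lam0 * r**i) for i in range(terms))

# A >= C: all eigenvalues are positive and the norm is exactly 1
print(gauss_norm1(4.0, 1.0))   # ~ 1.0
# C > A: alternating signs, and the norm is sqrt(C/A)
print(gauss_norm1(1.0, 4.0))   # ~ 2.0
```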
Our second example is the polynomial Gaussian kernel \(K\colon \mathbb{R}^2\to \mathbb{C}\) given by \[\label{eq:rho95Q} K(x,y)=P(x,y) K_G(x,y),\tag{47}\] where \(K_G(x,y)\) is the self-adjoint Gaussian kernel with trace \(1\) from 39 , and \(P(x,y)\) is the self-adjoint polynomial given by \[\begin{align} P(x,y)=\frac{1}{N} &\left(A_P (x-y)^2+i B_P(x^2-y^2) + C_P (x+y)^2 \right. \\ &\, \left. +i D_P(x-y)+E_P(x+y) +F_P\right), \end{align}\] with real parameters \(A_P,B_P, C_P, D_P, E_P, F_P\) and normalizing factor \[N=F_P + \frac{C_P-E_P E}{2C}+\frac{C_P E^2}{4 C^2}.\] This ensures that \(\mathop{\mathrm{Tr}}\big\{ \widehat{K}\big\}=1\). Let us decompose \(\widehat{K}=\widehat{K}_1 \widehat{K}_2\); by 36 the kernels must satisfy \[\label{eq:decomposition95B} K(x,y)=\int_{-\infty}^{\infty} K_1(x,z)K_2(z,y)\, \mathrm{d}z.\tag{48}\] We assume that \(K_1\) is a Gaussian kernel and \(K_2\) is a Gaussian multiplied by a quadratic polynomial of the form \[\begin{align} K_1(x,y)&=N'_1 \exp\left( -A_1 x^2 - B_1 y^2 - C_1 xy - D_1 x - E_1 y\right); \\ K_2(x,y)&=P_2(x,y)\exp\left( -A_2 x^2 - B_2 y^2 - C_2 xy - D_2 x - E_2 y\right), \end{align}\] where \[\begin{align} P_2(x,y)=&A_{P_2} (x-y)^2+i B_{P_2}(x^2-y^2) + C_{P_2} (x+y)^2\\ &+i D_{P_2}(x-y)+E_{P_2}(x+y) +F_{P_2}, \end{align}\] and all the parameters are real. Choose the exponential parameters \(A_1,\dots,E_1\) and \(A_2,\dots,E_2\) according to 42 and 43 . By a straightforward calculation the normalizing factor \(N'_1\) and the polynomial parameters \(A_{P_2},\dots,F_{P_2}\) are uniquely determined by the decomposition 48 : one obtains linear equations for the polynomial parameters. Then the only free parameter is \(w\) (as before), which allows minimization. The allowed regions for \(w\) are still given by 44 , where \(A\) and \(C\) are the real parameters of the Gaussian \(K_G(x,y)\). 
The quantity \[R(w)=\left(\iint_{\mathbb{R}^2} \left| K_1(x,y)\right|^2 \, \mathrm{d} x \, \mathrm{d} y \right) \left( \iint_{\mathbb{R}^2} \left| K_2(x,y)\right|^2 \, \mathrm{d} x \, \mathrm{d} y\right)\] can be easily calculated. It is a lengthy rational function of the free parameter \(w\), thus we do not quote it here. Finding the minimum point \(w_{\min}\) of \(R(w)\) in the allowed regions 44 gives an optimized upper bound on the norm \(\big\| \widehat{K}\big\|_1\). Note that the value of \(w_{\min}\) is different from \(w_{\min}^{G}\) given in 46 ; our minimum is affected by the polynomial parameters as well. Figure 1 displays upper bounds of the Schatten–von Neumann \(1\)-norm for a family of quadratic Gaussian operators of the form 47 . The decomposition 48 is not as optimal as in our first example. It can be proved that at \(C_{P}=1\) the operator \(\widehat{K}\) is positive semidefinite, thus the \(1\)-norm is exactly \(1\), but our decomposition 48 after the minimization leads to the upper bound \[\label{eq:104} \big\|\widehat{K} \big\|_1\leq 1.04054.\tag{49}\] Then 38 yields a lower bound for the minimal eigenvalue \(\lambda_{\min}\) and an upper bound for the negativity \(\mathcal{N}\) as follows: \[\label{eq:lminbound} |\lambda_{\min}|\leq \mathcal{N}\leq \frac{1.04054-1}{2}=0.02027.\tag{50}\]
Let \(O\colon \mathcal{H}\to \mathcal{H}\) be a self-adjoint, trace-class linear operator acting on a separable, complex Hilbert space \(\mathcal{H}\). Then \(O\) has countably many real eigenvalues \(\{\lambda_i\}_{i\geq 0}\) satisfying \(\sum_{i\geq 0} |\lambda_i|<\infty\). If there are only finitely many eigenvalues \(\lambda_0,\dots,\lambda_m\) then define \(\lambda_i=0\) for each integer \(i>m\). From now on let us fix \(c>0\) satisfying \[\label{eq:c} \sum_{i=0}^{\infty} |\lambda_i|\leq c.\tag{51}\]
Definition 6. For \(n\geq 0\) define \(h_n=h_{n,c}\colon [0,\infty)\to \mathbb{R}\) as \[h_{n,c}(x)\stackrel{\text{def}}{=}\sum_{k=0}^n e_kx^k-\sum_{k=n+1}^{\infty} \frac{(cx)^k}{k!},\] and define \(q_n=q_{n,c}\) as \[\label{eq:q95n95definition} q_{n,c}\stackrel{\text{def}}{=}\frac{-1}{\min\{x>0: h_{n,c}(x)=0\}}.\tag{52}\]
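A sketch of how \(q_{n,c}\) can be computed in practice: the tail in \(h_{n,c}\) equals \(e^{cx}\) minus a partial sum of the exponential series, and the smallest positive root is located by bisection. The toy spectrum below is an illustrative assumption; Theorem 6 below guarantees the monotonicity observed in the output:

```python
import math

# Toy spectrum (illustrative assumption): lambda_min = -0.5.
lams = [-0.5] + [0.3 * 0.5**i for i in range(30)]
c = sum(abs(l) for l in lams)       # a valid constant for 51

coeffs = [1.0]                      # e_k from g(x) = prod (1 + lambda_i x)
for l in lams:
    coeffs = [a + l * (coeffs[i-1] if i > 0 else 0.0)
              for i, a in enumerate(coeffs + [0.0])]

def h_nc(x, n):
    head = sum(coeffs[k] * x**k for k in range(min(n, len(coeffs) - 1) + 1))
    # tail sum_{k>n} (cx)^k / k!  via exp minus its partial sum
    tail = math.exp(c*x) - sum((c*x)**k / math.factorial(k) for k in range(n + 1))
    return head - tail

def q_nc(n):
    a, b = 0.0, 1e-9
    while h_nc(b, n) > 0:           # h_nc(0) = 1 > 0; march to a sign change
        a, b = b, 2 * b
    for _ in range(80):             # bisection for the smallest positive root
        m = (a + b) / 2
        a, b = (m, b) if h_nc(m, n) > 0 else (a, m)
    return -1.0 / a

qs = [q_nc(n) for n in range(4, 25, 4)]
print(qs)   # monotone increasing rigorous lower bounds approaching -0.5
```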
The following theorem shows that \(q_n\) is a monotone, asymptotically sharp lower bound of the minimal eigenvalue \(\lambda_{\min}\).
Theorem 6. The sequence \(q_n\) is monotone increasing and \[\lim_{n\to \infty} q_n= \lambda_{\min}.\]
Proof. First we show that for all \(x\geq 0\) and \(n\geq 0\) we have \[\label{eq:gn431} g(x)\geq h_{n+1}(x)\geq h_n(x).\tag{53}\] Let us fix \(x\geq 0\) and \(n\geq 0\). The definition of \(g,h_n\) and 18 imply that \[g(x)=\sum_{k=0}^{\infty} e_k x^k\geq \sum_{k=0}^n e_k x^k-\sum_{k=n+1}^{\infty} \frac{(cx)^k}{k!}=h_n(x).\] Inequality 18 also implies that \[h_{n+1}(x)-h_n(x)=\left(e_{n+1}+\frac{c^{n+1}}{(n+1)!}\right)x^{n+1}\geq 0,\] thus 53 holds. Therefore the definition of \(q_n\), inequalities 25 and 53 , and \(g(0)=h_n(0)=1\) imply that \(q_n\) is monotone increasing and \(q_n\leq \lambda_{\min}\) for all \(n\geq 0\). Finally, \(q_n\to \lambda_{\min}\) follows from the fact that \(h_n\) uniformly converges to \(g\) on any bounded interval \(I\subset [0,\infty)\). ◻
The following theorem states that if \(\lambda_{\min}=0\) then \(q_{n}\approx -\frac{c}{n}\), see Figure 2 for a numerical example.
Theorem 7. Let \(\lambda_{\min}=0\). Then for any integer \(n\geq 0\) we have \[\frac{c}{n+1} \leq |q_n|\leq \frac{ec}{n+1}.\] Moreover, the lower bound of \(|q_n|\) holds without the assumption \(\lambda_{\min}=0\), too.
Proof. First we prove the upper bound. Let \(\psi_n(x)=\sum_{k=n+1}^{\infty} \frac{x^{k}}{k!}\) and let \(b_n\) be the only positive solution to the equation \(\psi_n(x)=1\). As \(e_k\geq 0\) for all \(k\), we have \(\sum_{k=0}^n e_kx^k\geq 1\) for all \(x\geq 0\). Therefore, the definition of \(h_n\) implies that \[\min\{x>0: h_n(x)=0\}\geq \min\{x>0: \psi_n(cx)=1\}=\frac{b_n}{c},\] which yields \[\label{eq:qb} |q_n|\leq \frac{c}{b_n}.\tag{54}\] Before proving a lower bound for \(b_n\) we show that \[\label{eq:Kn} b_n\leq \frac{n+2}{2}.\tag{55}\] Let \(x_n=\frac{n+2}{2}\), we need to prove that \(\psi_n(x_n)\geq 1\). By a straightforward estimate \[\psi_{n}(x_n)\geq \frac{(x_n)^{n+1}}{(n+1)!}=\frac{(n+2)^{n+1}}{2^{n+1} (n+1)!},\] so it is enough to show that the sequence \(a_n\stackrel{\text{def}}{=}\frac{(n+2)^{n+1}}{2^{n+1} (n+1)!}\) satisfies \(a_n\geq 1\) for all \(n\). Indeed, \(a_0=1\) and using the monotonicity of the sequence \(\left(1+\frac{1}{n}\right)^n\) for all \(n\geq 1\) we obtain \[\frac{a_n}{a_{n-1}}=\frac{1}{2} \left(1+\frac{1}{n+1}\right)^{n+1}>1,\] so \(a_n\geq 1\). Therefore 55 holds.
By Fact 3 we have \[\psi_n(x)\leq \frac{2x^{n+1}}{(n+1)!} \quad \text{for all } 0\leq x\leq \frac{n+2}{2}.\] Using this together with 55 , and the well-known inequality \(n!\geq e\left(\frac{n}{e}\right)^{n}\) we obtain \[1=\psi_n(b_n)\leq 2 \frac{(b_n)^{n+1}}{(n+1)!}< \left(\frac{eb_n}{n+1} \right)^{n+1}.\] Thus \(b_n\geq \frac{n+1}{e}\). Hence 54 yields \[|q_n|\leq \frac{c}{b_n}\leq \frac{ec}{n+1},\] which finishes the proof of the upper bound.
Finally, we will prove the lower bound. For integers \(n\geq 0\) and reals \(x\geq 0\) consider the function \[f_n(x)=\sum_{k=0}^n \frac{x^k}{k!}-\sum_{k=n+1}^{\infty} \frac{x^k}{k!},\] and let \[d_n=\min\{x>0: f_n(x)=0\}.\] By 18 we obtain that \(h_n(x)\leq f_n(cx)\). This inequality and \(h_n(0)=f_n(0)=1\) imply that \[\min\{x>0: h_n(x)=0\}\leq \min\{x>0: f_n(cx)=0\} \leq \frac{d_n}{c},\] which yields that \[\label{eq:qd} |q_n|\geq \frac{c}{d_n}.\tag{56}\] Thus it is enough to show that \[\label{eq:dn} d_n\leq n+1,\tag{57}\] for which it is sufficient to see that \(f_n(n+1)\leq 0\). In order to see this, it is enough to prove that for all integers \(0\leq m\leq n\) we have \[\label{eq:frac} \frac{(n+1)^{n-m}}{(n-m)!}\leq \frac{(n+1)^{n+1+m}}{(n+1+m)!},\tag{58}\] since summarizing 58 from \(m=0\) to \(n\) clearly yields \[\sum_{k=0}^n \frac{(n+1)^k}{k!}\leq \sum_{k=n+1}^{2n+1} \frac{(n+1)^k}{k!}<\sum_{k=n+1}^{\infty} \frac{(n+1)^k}{k!}.\] Inequality 58 is equivalent to \[\prod_{i=-m}^{m} (n+1+i)\leq (n+1)^{2m+1},\] which clearly follows from the identity \[(n+1-i)(n+1+i)=(n+1)^2-i^2\leq (n+1)^2.\] Hence we proved 57 . Inequalities 56 and 57 imply that \[|q_n|\geq \frac{c}{d_n}\geq \frac{c}{n+1},\] so the proof of the lower bound is also complete. ◻
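The two-sided bound of Theorem 7 can be checked numerically on a purely positive toy spectrum (an illustrative assumption) with \(\lambda_{\min}=0\):

```python
import math

# Positive toy spectrum, lambda_min = 0: check c/(n+1) <= |q_n| <= e c/(n+1).
lams = [0.5 * 0.5**i for i in range(40)]
c = sum(lams)

coeffs = [1.0]                      # e_k from g(x) = prod (1 + lambda_i x)
for l in lams:
    coeffs = [a + l * (coeffs[i-1] if i > 0 else 0.0)
              for i, a in enumerate(coeffs + [0.0])]

def h_nc(x, n):
    head = sum(coeffs[k] * x**k for k in range(min(n, len(coeffs) - 1) + 1))
    tail = math.exp(c*x) - sum((c*x)**k / math.factorial(k) for k in range(n + 1))
    return head - tail

def q_abs(n):
    """|q_n| = 1 / (smallest positive root of h_{n,c}), by bisection."""
    a, b = 0.0, 1e-9
    while h_nc(b, n) > 0:
        a, b = b, 2 * b
    for _ in range(80):
        m = (a + b) / 2
        a, b = (m, b) if h_nc(m, n) > 0 else (a, m)
    return 1.0 / a

for n in (2, 5, 10):
    qa = q_abs(n)
    assert c / (n + 1) <= qa <= math.e * c / (n + 1)
print("Theorem 7 bounds verified")
```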
In the case of \(\lambda_{\min}<0\) the following theorem establishes that \(q_n\to \lambda_{\min}\) with super-exponential speed. On the good side, \(q_{n,c}\) is monotone, so our method provides a rigorous lower bound for \(\lambda_{\min}\). On the bad side, \(q_{n,0}\) approximates \(\lambda_{\min}\) much better than \(q_{n,c}\); the main reason for this is that \(q_{n,0}\), unlike \(q_{n,c}\), does not depend on the value of \(c\) coming from the sub-optimal estimate 18 . Note that inequality 27 is better than 60 , since the coefficient of \(-n\log(n)\) in the exponent is \(1\) instead of \(\frac{|\lambda_{\min}|}{c}\leq 1\), avoiding the critical case when \(\frac{|\lambda_{\min}|}{c}\) is very small.
Theorem 8. Let \(\lambda_{\min}<0\) and \(\alpha=\frac{c}{|\lambda_{\min}|}\). Then for all \(n\) we have \[\label{eq:qlambda} |q_{n,c}-\lambda_{\min}|\leq \frac{\delta_n}{1-\delta_n} |\lambda_{\min}|,\tag{59}\] where \(0<\delta_n<1\) is the unique solution of the equation \[x^{\alpha}=2\sum_{k=n+1}^{\infty} \frac{(\alpha(1-x))^{k}}{k!}.\] For \(n\geq \max\{3,2(\alpha-1)\}\) we have the estimate \[\label{eq:delta} \delta_n\leq \left(\frac{e\alpha}{n+1}\right)^{\frac{n+1}{\alpha}}=\exp \left[-\frac{|\lambda_{\min}|}{c} n \log(n)+\mathcal{O}(n)\right].\tag{60}\]
Proof. First we show 59 . Let \[\theta=\min\{x>0: g(x)=0\}=\frac{1}{|\lambda_{\min}|} \quad \text{and} \quad \theta_n=\min\{x>0: h_n(x)=0\}.\] Since \(\theta_n\leq \theta\), we can define \(0\leq s_n<1\) such that \(\theta_n=(1-s_n) \theta\). Then we have \[|q_n-\lambda_{\min}|=\left|-\frac{1}{\theta_n}+\frac{1}{\theta}\right|=\frac{|\theta_n-\theta|}{\theta_n \theta}=\frac{s_n}{1-s_n} |\lambda_{\min}|,\] so as the function \(x\mapsto \frac{x}{1-x}\) is monotone increasing on \([0,1)\), it is enough to show that \(s_n\leq \delta_n\). For \(0\leq x\leq 1\) let \[\varphi_n(x)=x^{\alpha}-2\sum_{k=n+1}^{\infty} \frac{(\alpha(1-x))^{k}}{k!}.\] Since \(\varphi_n\) is strictly monotone increasing and \(\varphi_n(\delta_n)=0\), it is enough to prove that \[\label{eq:phi} \varphi_n(s_n)\leq 0.\tag{61}\] First we prove that \[\label{eq:theta} g(\theta_n)\geq s_n^{\alpha}.\tag{62}\] Let \(\lambda=|\lambda_{\min}|\). It is easy to see that the function \(x\mapsto (1-x)^{\frac{1}{x}}\) is monotone decreasing on \((0,1]\), so for all \(0\leq x\leq \theta =\frac{1}{\lambda}\) we have \[g(x)=\prod_{i=0}^{\infty} (1+\lambda_ix)\geq \prod_{i: \lambda_i<0} (1+\lambda_i x)\geq \prod_{i: \lambda_i<0} (1-\lambda x)^{\frac{|\lambda_i|}{\lambda}}\geq (1-\lambda x)^{\alpha},\] which implies 62 . On the other hand, by the definitions of \(g,h_n\) and 18 for all \(x\geq 0\) we obtain \[g(x)-h_n(x)=\sum_{k=n+1} ^{\infty} \left(e_k+\frac{c^k}{k!}\right)x^k\leq 2 \sum_{k=n+1} ^{\infty} \frac{(cx)^k}{k!},\] therefore \[\label{eq:sn} g(\theta_n)\leq 2\sum_{k=n+1}^{\infty} \frac{(\alpha(1-s_n))^{k}}{k!}.\tag{63}\] Then 62 and 63 imply 61 , so the proof of 59 is complete.
Finally, we prove 60 . By a well-known estimate for factorials we have \[\label{eq:fact} (n+1)!\geq \sqrt{2n\pi} \left( \frac{n+1}{e}\right)^{n+1}> 4 \left( \frac{n+1}{e}\right)^{n+1}\tag{64}\] and \(n\geq \max\{3,2(\alpha-1)\}\) implies \[\label{eq:n432} \alpha(1-\delta_n)<\alpha\leq \frac{n+2}{2}.\tag{65}\] Then 65 with Fact 3 and 64 yield \[\delta_n^{\alpha} =2\sum_{k=n+1}^{\infty} \frac{(\alpha(1-\delta_n))^{k}}{k!}\leq 4\frac{(\alpha (1-\delta_n))^{n+1}}{(n+1)!} \leq \left(\frac{e\alpha (1-\delta_n)}{n+1}\right)^{n+1}.\] Substituting \(N=\frac{n+1}{\alpha}\) implies \[N\delta_n^{\frac{1}{N}}\leq e(1-\delta_n)\leq e.\] Thus \[\delta_n\leq \left(\frac{e}{N}\right)^N=\left(\frac{e\alpha}{n+1}\right)^{\frac{n+1}{\alpha}}=\exp \left[-\frac{|\lambda_{\min}|}{c} n \log(n)+\mathcal{O}(n)\right],\] hence 60 holds. This completes the proof of the theorem. ◻
The matrix representation \(K_{m,n}\) of an arbitrary quadratic polynomial Gaussian kernel in the case \(d=1\) can be calculated rigorously, see [21]. Since the matrix elements \(K_{m,n}\) decrease exponentially as \(m,n\to \infty\), we can truncate the matrix at a finite size, and the original spectrum can be numerically approximated by the eigenvalues of the resulting finite dimensional operator. In the more general case, numerical diagonalization is not an easy task even for a polynomial Gaussian kernel. Our results derived from numerical diagonalization are shown in Figure 1, and we also used this method for Figure 3. The efficiency and accuracy of this method are confirmed by comparing it with the results of [21], [61].
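The truncation strategy can be sketched as follows. The matrix elements below are a made-up symmetric example with exponential decay, standing in for a rigorously computed kernel representation \(K_{m,n}\); the point is only that the leading eigenvalues stabilize as the truncation size grows:

```python
import numpy as np

def mat_elem(m, n):
    # Hypothetical symmetric matrix elements with exponential decay,
    # standing in for a rigorously computed kernel representation K_{m,n}
    return 0.5 ** ((m + n) / 2.0) / (1.0 + abs(m - n))

def truncated_spectrum(N):
    # truncate at size N and diagonalize the finite symmetric matrix
    K = np.array([[mat_elem(m, n) for n in range(N)] for m in range(N)])
    return np.sort(np.linalg.eigvalsh(K))[::-1]   # eigenvalues, descending

s30, s60 = truncated_spectrum(30), truncated_spectrum(60)
print(s30[:4])
print(s60[:4])   # leading eigenvalues should agree once the tail is negligible
```

Comparing two truncation sizes in this way gives a practical (though non-rigorous) indication that the truncation error is already below the working precision.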
The eigenvalues of self-adjoint trace-class operators play a very important role in physics. These operators naturally appear in many areas of quantum physics, and their eigenvalues are given by a Fredholm type integral equation, whose analytical solution is unfortunately completely hopeless in general. As our main result, we approximated the spectrum of the aforementioned operators both in theory and in practice. In particular, our methods provide estimates of the Schatten–von Neumann 1-norm and the von Neumann entropy.
We also investigated a mathematically precise non-monotone approximation of the minimal and maximal eigenvalues, which also often play an important role in physics. Here, we paid special attention to estimating the speed of convergence, which is important for many practical applications. We proved that if there is a negative eigenvalue, then the speed of convergence to the minimal eigenvalue given by our algorithm is super-exponential, see Section 4. Analogously, the speed of convergence to the maximal eigenvalue is always super-exponential.
In Section 5 we provided rigorous upper bounds for the Schatten–von Neumann 1-norm. We demonstrated that the method works effectively for Gaussian and polynomial Gaussian operators as well. It is interesting that non-self-adjoint operators also come into the picture: we decompose our operator into the product of two not necessarily self-adjoint operators, and bound the \(1\)-norm of our operator by calculating the two \(2\)-norms and applying Hölder’s inequality. This method might be advantageous if calculating the moments of our operator is challenging, and the aforementioned decompositions might be useful in practice as well.
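In finite dimensions this Hölder-type bound reads \(\|AB\|_1 \le \|A\|_2 \|B\|_2\), with the trace norm on the left and Hilbert–Schmidt (Frobenius) norms on the right. A minimal sketch, with random matrices standing in for the two not necessarily self-adjoint factors:

```python
import numpy as np

# Finite-dimensional sketch: random matrices stand in for the two
# (not necessarily self-adjoint) Hilbert-Schmidt factors of O
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
O = A @ B

trace_norm = np.linalg.svd(O, compute_uv=False).sum()            # ||O||_1
hs_bound = np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro")   # ||A||_2 ||B||_2
print(trace_norm, hs_bound)   # Hoelder: ||O||_1 <= ||A||_2 ||B||_2
```

The right-hand side requires only the two \(2\)-norms, which are typically much easier to compute than the singular values of \(O\) itself.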
Often, the only goal is to decide whether a self-adjoint trace-class operator is positive semidefinite, which depends on whether the minimal eigenvalue is negative. We constructed a monotone sequence of lower bounds for the minimal eigenvalue that only depends on the moments of the operator and a concrete upper estimate of its Schatten–von Neumann \(1\)-norm, and also converges to the minimal eigenvalue. We demonstrated that this approximation can be effectively calculated for a large class of physically relevant operators. We proved that this approximation converges with super-exponential speed to the minimal eigenvalue if the operator is not positive semidefinite.
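A minimal finite-dimensional sketch of this pipeline (a toy spectrum stands in for a trace-class operator; in applications the moments \(M_k=\operatorname{Tr}(O^k)\) come from integrals and \(c\) is a computable upper estimate of \(\|O\|_1\)): compute \(e_k\) from the moments via Newton's identities, form \(h_n\), and set \(q_n=-1/\theta_n\) where \(\theta_n\) is the least positive root of \(h_n\):

```python
import math

# Toy illustration of the lower-bound pipeline; the spectrum below is a
# made-up stand-in for a trace-class operator with a negative eigenvalue
lam = [-0.5, 0.3, 0.2, 0.1]
c = sum(abs(l) for l in lam)          # upper estimate of the 1-norm

def moments(K):
    # power sums M_k = Tr(O^k) for k = 1..K; in practice these come from integrals
    return [sum(l ** k for l in lam) for k in range(1, K + 1)]

def newton_e(M):
    # e_k via Newton's identities: k*e_k = sum_{j=1}^k (-1)^(j-1) M_j e_{k-j}
    e = [1.0]
    for k in range(1, len(M) + 1):
        e.append(sum((-1) ** (j - 1) * M[j - 1] * e[k - j]
                     for j in range(1, k + 1)) / k)
    return e

def h(n, x, e):
    # h_n(x) = sum_{k<=n} e_k x^k - sum_{k>n} (c x)^k / k!
    head = sum(e[k] * x ** k for k in range(n + 1))
    tail = math.exp(c * x) - sum((c * x) ** k / math.factorial(k)
                                 for k in range(n + 1))
    return head - tail

def q(n, e):
    # q_n = -1/theta_n, where theta_n is the least positive root of h_n
    lo, hi = 0.0, 1.0
    while h(n, hi, e) > 0:            # h_n(0) = 1 and h_n -> -inf, so a root exists
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(n, mid, e) > 0:
            lo = mid
        else:
            hi = mid
    return -1.0 / (0.5 * (lo + hi))

e = newton_e(moments(12))
print([round(q(n, e), 6) for n in (4, 6, 8, 10)])  # monotone, approaching -0.5
```

In this toy example the bounds \(q_4<q_6<q_8<q_{10}\) increase toward the minimal eigenvalue \(-0.5\) from below, matching the monotonicity and convergence established above.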
Finally, we note that the algorithm outlined in this paper is an efficient alternative to numerical diagonalization for a polynomial Gaussian kernel, see Subsection 6.2. In principle, it can be applied to a much broader class of self-adjoint operators, not only to polynomial Gaussian operators. A major advantage of our method is that it avoids the need for an appropriate basis for diagonalization. However, it requires an extremely precise determination of the moments \(M_k\). Usually, the calculation of low moments is easy, but the determination of higher moments can be more demanding. The next step, the calculation of \(e_k\) from the moments, can also be numerically difficult: the numbers \(e_k\) converge to zero much faster than the moments \(M_k\) as \(k\) increases. With extended precision to a few tens of digits, offered by Mathematica, our method presented here is numerically stable, accurate, fast, and it can be implemented efficiently.
We are indebted to József Zsolt Bernád, András Bodor,
András Frigyik, Mátyás Koniorczyk, Miklós Pintér, and Géza Tóth for some helpful conversations.
The first author was supported by the National Research, Development and Innovation Office – NKFIH, grants no. 124749 and 146922, and by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.
The second author thanks the “Frontline” Research Excellence Programme of the NKFIH (Grant no. KKP133827) and Project no. TKP2021-NVA-04, which has been implemented with the support provided by the Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund, financed under the TKP2021-NVA funding scheme.
The third author was supported by the Hungarian National Research, Development and Innovation Office within the Quantum Information National Laboratory of Hungary grants no. 2022-2.1.1-NL-2022-00004 and 134437.
the sum of the absolute values of the negative eigenvalues↩︎
\(\lambda_{\min}\) is the infimum of the eigenvalues except for the case when \(O\) has only finitely many strictly positive eigenvalues; we are clearly more interested in the infinite dimensional case though.↩︎
We use the mathematical convention \(\langle \lambda f, g\rangle=\lambda \langle f, g\rangle\) and \(\langle f, \lambda g\rangle=\lambda^{*} \langle f, g\rangle\), where \(\lambda^{*}\) denotes the conjugate of \(\lambda\).↩︎
The notation \(\{f_i\}_{i\geq 0}\) and \(\{\lambda_i\}_{i\geq 0}\) is meant to account for the fact that there might be only finitely many basis vectors \(f_i\) and eigenvalues \(\lambda_i\), respectively.↩︎
Linear operators satisfy the inclusion: \(\text{trace-class} \subset \text{Hilbert--Schmidt}\subset \text{compact} \subset \text{bounded}\).↩︎
Note that in the enumeration \(\{\lambda_i\}_{i\geq 0}\) we list every eigenvalue \(\lambda\) with multiplicity \(\dim E_{\lambda}\).↩︎
See also Newton’s identities or the Faddeev–LeVerrier algorithm.↩︎
This means that, in particular, for large \(n\) the set \(\Lambda_n\) is non-empty.↩︎
If it does not cause confusion, we simply use \(h_n\) and \(q_n\) instead of \(h_{n,c}\) and \(q_{n,c}\), respectively.↩︎