May 16, 2022
We propose a new proof method for direct coding theorems for wiretap channels where the eavesdropper has access to a quantum version of the transmitted signal on an infinite-dimensional Hilbert space and the legitimate parties communicate through a classical channel or a classical-quantum (cq) channel. The transmitter input can be subject to an additive cost constraint, which specializes to the case of an average energy constraint. This method yields errors that decay exponentially with increasing block length. Moreover, it provides a guarantee of a quantum version of semantic security, which is an established concept in classical cryptography and physical layer security. It therefore complements existing works which either do not prove the exponential error decay or use weaker notions of security. The main part of this proof method is a direct coding result on channel resolvability which states that there is only a doubly exponentially small probability that a standard random codebook does not solve the channel resolvability problem for the cq channel. Semantic security has strong operational implications, meaning essentially that the eavesdropper cannot use its quantum observation to gather any meaningful information about the transmitted signal. We also discuss the connections between semantic security and various other established notions of secrecy.
Developments in the area of quantum computing in the last decades have put a spotlight on how vulnerable many state-of-the-art security techniques for communication networks are against attacks based on the execution of quantum algorithms that exploit the laws of quantum physics [1]. Another aspect of the rapid development in experimental quantum physics, however, has received significantly less attention in communication engineering: Instead of simply performing quantum processing steps on an intercepted classical signal, an attacker can use quantum measurement devices to exploit the quantum nature of the signals themselves. For instance, both radio waves and the visible light used in optical fiber communications consist of photons. This fact was exploited in [2] to introduce the so-called photonic side channels.
Any communication network that involves a vast number of interconnected network elements and systems communicating with each other exposes, by its very nature, a large attack surface to potential adversaries. This is exacerbated further if many of the communication paths are wireless, as is the case in cellular networks such as 6G. Examples of possible attacks that can be carried out against various parts of such networks are algorithm implementation attacks, jamming attacks, side-channel attacks, and attacks on the physical layer of the communication system. This multi-faceted nature of the threat necessitates a diverse range of countermeasures. It has therefore been a longstanding expectation that established defenses based on cryptography will need to be complemented with defenses based on physical layer security (pls), a promising approach to protect against lower-layer attacks. It can be used as an additional layer of security to either increase the overall system security or reduce the complexity of cryptographic algorithms and protocols running at higher layers of the protocol stack. We point out that complexity may be a key aspect in massive wireless networks of the future such as 6G, where many low-cost, resource- and computationally constrained devices will be deployed, making it difficult to use advanced cryptographic techniques.
In cryptography, sequences of bits are protected against attacks. The main threat posed by recent progress in the implementation of quantum computers and in known quantum algorithms [1], [3], which directly affect the security of certain cryptographic schemes, is therefore that an attacker may be able to process ciphertexts with quantum computers. This has triggered significant research and development efforts in the field of post-quantum cryptography [4]. The goal in this field is to develop new cryptographic algorithms and protocols that are resistant to attacks by quantum computers. The threat for defense mechanisms based on pls, on the other hand, is of a different nature: Since no assumption on the computational capability of attackers is needed to guarantee security, such techniques are inherently safe against attacks with quantum computers that process classically represented signals. But since pls seeks to protect the communication signals themselves against attacks, such techniques are vulnerable to violations of system assumptions regarding what type of signal the attacker can intercept. Therefore, security is not guaranteed if an attack is carried out with quantum hardware such as optical quantum detectors and similar quantum measurement equipment. For example, the photon emission from integrated circuits has been exploited in [2], [5] to read out the secret keys used. The underlying physical process represents a quantum side channel since the properties of the emitted photons strongly depend on the operation that the involved device performs and the data being processed. This shortcoming of (classical) pls techniques can be addressed by establishing results for pls that take the quantum nature of wireless communication signals and of side channels in transmitters and receivers into account.
For example, the work [6] defined the notion of an exclusion region with the goal of guaranteeing information-theoretic security as long as no wiretapper can interact with signals inside the exclusion region.
Another important aspect is the availability of bounds for secure communication in the finite block length regime. These are important for many practical purposes, such as the construction of secrecy maps for indoor and outdoor wireless networks in [7]–[9] that we expect to be crucial for the integration of pls into mobile communication networks.
In this work, we prove direct coding results for wiretap channels that take both of these aspects into consideration. The channel models considered have a quantum output at the eavesdropper’s channel terminal, and the derived bounds can be numerically evaluated at finite block lengths. Besides this, we also discuss operational implications of the resulting security guarantees and compare them to other notions of security that are commonly used in the literature.
The results we provide in this paper are rooted in and draw from several branches of research, such as classical pls and its connection to cryptography, channel resolvability, and quantum Shannon theory. For this reason, we give a short overview of the existing literature in these fields that is closely related to this work.
An important branch of research with a long history in pls addresses fundamental bounds on the confidentiality of communication, which is traditionally based on the communication model of the wiretap channel [10]–[15].
It is important to emphasize that the measure of confidentiality itself has undergone a tremendous evolution. While the results of [10], [11] use equivocation as the underlying measure, [12]–[15] rely on strong secrecy or closely related measures as the security metric. The disadvantage of these security metrics is that they allow no or very limited operational interpretation, i.e., they do not provide a way to quantify leakage of information for specific types of eavesdropping attacks. A confidentiality measure that has been known in the cryptography community for a long time and allows a clear operational interpretation is semantic security [16]. It has been adopted in the pls community as a suitable measure of secrecy and used for wiretap channel coding problems in [17]–[20].
A quantum version of the wiretap channel was first analyzed in [21] for a one-shot scenario. The works [22], [23] derive non-asymptotic results for general finite-dimensional wiretap channels with classical input and quantum outputs as well as in the case of quantum input and quantum outputs under the strong secrecy criterion, but due to the proof methods used, semantic security is implicitly established as well. In [24], error exponents and equivocation rates are established for the case that only randomness of limited quality is available at the transmitter. [25] extends the result from [22] to pure-loss bosonic channels under the strong secrecy criterion. The series of works [26]–[29] explores a trade-off region between semantically secure communication, public communication and secret key generation which can be specialized to just semantically secure communication in a straightforward way. The channel models considered are the pure-loss bosonic channel, the thermal-noise bosonic channel, the quantum amplifier channel and general infinite-dimensional quantum channels. All achievability results are of an asymptotic nature. In [30], a feedback scenario is considered under weak secrecy. One-shot achievability bounds for the finite-dimensional broadcast channel with private and confidential messages are given in [31]–[33]. In contrast to the prior works discussed, the present work derives achievability results for general infinite-dimensional cq channels, providing bounds that can be numerically evaluated for finite block lengths.
Multi-letter converse results for finite-dimensional channels appear in [22], [23]. One-shot converse results for finite-dimensional broadcast channels with private and confidential messages are proposed in [31], [32]. In [25], [27], single-letter converse results for pure-loss bosonic channels are given, but they rely on an unproven conjecture, the entropy photon number inequality. Converse results for general quantum channels and certain bosonic channels that do not rely on unproven conjectures are given in [24], [28], [29]. Single-letter versions are available only in the case of degradable channels, otherwise the converse results are multi-letter.
As mentioned above, many works have either given results that imply semantic security or established it implicitly in their proofs. The works [24], [34], however, have considered the question of the security notion explicitly and established an interesting connection between strong secrecy and semantic security. [24] contains a construction that can transform any wiretap code which achieves strong secrecy over a cqq wiretap channel into a code that ensures semantic security and has (asymptotically) the same rate. As observed in [34], this also holds in the infinite-dimensional case. For the finite-dimensional case, [34] gives an alternative construction for this transformation; the advantage here is that the construction is explicit, so that if it is applied to a practically feasible code that achieves strong secrecy, the resulting semantically secure wiretap code can be expected to retain the practical feasibility. The prior works study semantic security either implicitly or explicitly under one of multiple possible definitions. In this work, we connect these definitions with each other and with strong and weak secrecy by showing, similarly as was done before for classical channels [35], which implications hold between them. In doing so, we also pay special attention to the subtleties that arise in the infinite-dimensional case and make explicit which of these implications continue to hold.
To the best of our knowledge, the first works that contain resolvability results for cq channels are [22], [23]. Further results appeared in [36] and a much more in-depth treatment with generalizations to the case of imperfect randomness at the transmitter can be found in [24]. All of these works are specific to the finite-dimensional case, while in this paper, we propose results that are valid for general infinite-dimensional cq channels.
The contribution of this paper can be summarized as follows:
We establish a semantic security based direct coding theorem for wiretap channels with infinite-dimensional quantum output observed by the eavesdropper (cf. Fig. 1). This result applies both to the case where the legitimate parties communicate over a (possibly) continuous-alphabet classical channel, and to the case in which they also use a channel with classical input and infinite-dimensional quantum output.
We develop a new proof method for wiretap channels in which the eavesdropper has access to a quantum version of the input signal. This method also establishes a resolvability result for cq channels with infinite-dimensional output. Our method builds on the symmetrization arguments used to prove non-asymptotic versions of uniform laws of large numbers via Rademacher complexity. One advantage of this new method, besides its amenability to infinite-dimensional channels, is that it naturally yields non-asymptotic results that are valid for any block length and not only asymptotically for block lengths tending to infinity.
We state versions of our results that are amenable to numerical evaluations for finite block lengths. While it may not be possible from these results to obtain nontrivial bounds for very small block lengths, it is possible to numerically determine minimum block lengths for which our theorems guarantee certain performance metrics.
We establish our results under additive cost constraints. These types of constraints specialize in particular to the case of average input energy constraints.
We illustrate the finite-block-length nature of our methods with numerical evaluations of the obtained error bounds in the special case where both the legitimate communication parties and the eavesdropper use Gaussian cq channels.
We discuss various established secrecy metrics for quantum communication systems and show how they are related to each other.
In Section [sec:main-result], we introduce the wiretap channel models that are considered in this paper, and state the main results. In Section 3, we give the definitions of semantic security and other established secrecy metrics for quantum communication systems, and we discuss how they are related. In particular, we show how the results stated in Section [sec:main-result] imply a semantic security guarantee. To establish the main results, we need a theorem on cq channel resolvability and a coding theorem for the cq channel, which we state and briefly discuss in Section 4. Section 5 contains the proofs of the theorems in Sections [sec:main-result] and 4. Finally, in Section 6, we specialize our results to Gaussian cq channels. For the example of a set of system parameters that is plausible for a real-world optical communication system, we numerically evaluate and plot the bounds derived in this paper. A number of technical lemmas are relegated to the appendix.
In this section, we introduce the wiretap channel model and state our main results.
Figure 1: Quantum wiretap channel models. (a) ccq wiretap channel consisting of a classical channel \(W\) with continuous input and output alphabets and a channel \(D_\mathfrak{E}\) with classical continuous input alphabet and quantum output on a separable infinite-dimensional Hilbert space. (b) cqq wiretap channel consisting of a pair of channels \((D_\mathfrak{B}, D_\mathfrak{E})\), both with classical continuous input alphabet and quantum output on a separable infinite-dimensional Hilbert space.
Before we can make the formal problem statement, it is necessary to introduce some preliminaries. Let \(\mathcal{H}\) be a separable Hilbert space over \(\mathbb{C}\) and let \[\mathcal{B}({\mathcal{H}}):=\left\{A: \mathcal{H}\to \mathcal{H}: A\textrm{ is linear and } \sup_{h\in \mathcal{H}: || h||\le 1} || Ah||< \infty \right\}\] be the set of bounded linear operators on \(\mathcal{H}\) where \(|| \cdot ||\) denotes the norm on the Hilbert space \(\mathcal{H}\) induced by the inner product. The set \(\mathcal{T}({\mathcal{H}})\) of trace class operators [37] is defined by \[\mathcal{T}({\mathcal{H}}):=\left\{A\in \mathcal{B}({\mathcal{H}}): \mathrm{tr}(A^* A)^{\frac{1}{2}}< \infty \right\}.\] Endowed with the norm \(\left\lVert {\cdot} \right\rVert_\mathrm{tr}: \mathcal{T}({\mathcal{H}})\to \mathbb{R}_{+}\) defined by \[\left\lVert {A} \right\rVert_\mathrm{tr}:=\mathrm{tr}(A^* A)^{\frac{1}{2}},\] for \(A\in \mathcal{T}({\mathcal{H}})\), the pair \((\mathcal{T}({\mathcal{H}}), \left\lVert {\cdot} \right\rVert_\mathrm{tr})\) is a Banach space (cf. [37]) and is called the trace class on \(\mathcal{H}\).
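As a purely illustrative aside (not part of the formal development), in the finite-dimensional special case the trace norm can be computed from singular values, since \(\mathrm{tr}(A^*A)^{1/2}\) is the sum of the singular values of \(A\). The following numpy sketch, with a hypothetical \(2\times 2\) operator, checks this against the eigenvalue characterization for Hermitian operators:

```python
import numpy as np

def trace_norm(a: np.ndarray) -> float:
    """||A||_tr = tr((A* A)^{1/2}) = sum of singular values of A."""
    return float(np.sum(np.linalg.svd(a, compute_uv=False)))

# For a Hermitian operator, the trace norm equals the sum of |eigenvalues|.
a = np.array([[1.0, 2.0], [2.0, -1.0]])   # eigenvalues are +sqrt(5), -sqrt(5)
eig_sum = float(np.sum(np.abs(np.linalg.eigvalsh(a))))
assert abs(trace_norm(a) - eig_sum) < 1e-12
```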
The set of states or density operators \(\mathcal{S}({\mathcal{H}})\subset \mathcal{T}({\mathcal{H}})\) is given by \[\label{eq:def-state} \mathcal{S}({\mathcal{H}}) := \left\{ \rho\in \mathcal{T}({\mathcal{H}}):~ \rho\geq 0,~ \mathrm{tr}\rho= 1 \right\},\tag{1}\] where \(\geq\) denotes the positive semi-definite partial ordering of operators, and we follow the tradition of using lowercase Greek letters for states. We will also occasionally use the operator norm \(\left\lVert {\cdot} \right\rVert_\mathrm{op}\) on \(\mathcal{B}({\mathcal{H}})\) which is given by \[\left\lVert {A} \right\rVert_\mathrm{op}:= \sup_{h\in \mathcal{H}: || h||\le 1} || Ah||\] for \(A\in \mathcal{B}({\mathcal{H}})\), and for the reader’s convenience we summarize the specific properties of the operator and trace norms that we use in Lemma 15 in Appendix 6.6. Note that the pair \((\mathcal{B}({\mathcal{H}}), \left\lVert {\cdot} \right\rVert_\mathrm{op})\) is a Banach space as well (cf. [37]).
Given any finite set \(V\), a \(V\)-valued povm [38], [39] is a sequence \((F_v)_{v\in V}\) of operators on \(\mathcal{H}\) such that for all \(v\), \(F_v\geq 0\), and \[\label{eq:def-povm} \sum_{v\in V} F_v = \mathbf{1}.\tag{2}\] The povm \((F_v)_{v\in V}\) is a mathematical description of a measurement with the possible outcomes \(v\in V\). For a system in the state \(\rho\in \mathcal{S}({\mathcal{H}})\), the probability of the outcome \(v\in V\) when performing the measurement represented by the povm \((F_v)_{v\in V}\) is given by the Born rule as \[p(v):=\mathrm{tr}\left(\rho F_{v}\right).\] Since \(\rho\ge 0\) and \(F_{v}\ge 0\), we have \(p(v)\ge 0\). The defining relations (1 ), (2 ), and the linearity of the trace show that \[\sum_{v\in V}p(v)=1,\] so that, indeed, \(p\) is a pmf on \(V\).
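For intuition, the Born rule can be checked numerically in the simplest finite-dimensional setting. The state and povm below are hypothetical toy choices on a qubit, not objects from the channel models of this paper:

```python
import numpy as np

# A qubit state (density operator): rho >= 0 and tr(rho) = 1, cf. (1).
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)

# A two-outcome POVM: projective measurement in the computational basis.
povm = {0: np.diag([1.0, 0.0]).astype(complex),
        1: np.diag([0.0, 1.0]).astype(complex)}
assert np.allclose(sum(povm.values()), np.eye(2))  # completeness relation (2)

# Born rule: p(v) = tr(rho F_v).
p = {v: np.trace(rho @ f).real for v, f in povm.items()}
assert all(pv >= 0 for pv in p.values())
assert abs(sum(p.values()) - 1.0) < 1e-12          # p is indeed a pmf on V
```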
Throughout the paper, we use the following abbreviations. For a Hilbert space \(\mathcal{H}\), \(A\in \mathcal{B}({\mathcal{H}})\), and \(n\in \mathbb{N}\), \(\mathcal{H}^{\otimes n}:=\mathcal{H}\otimes \ldots \otimes \mathcal{H}\) stands for the \(n\)-fold tensor product of \(\mathcal{H}\) and \(A^{\otimes n}:= A\otimes\ldots \otimes A\) denotes the \(n\)-fold tensor power of \(A\). For a set \(\mathcal{X}\) and \(n\in \mathbb{N}\), we write \(\mathcal{X}^{n}:=\mathcal{X}\times \ldots \times \mathcal{X}\) for the \(n\)-fold Cartesian product of \(\mathcal{X}\). Moreover, we use the abbreviation \(x^{n}:=(x_1, \ldots , x_{n})\in \mathcal{X}^{n}\).
Throughout this work, expectations and integrals of operator-valued random variables and functions will play an important role. Since we do not assume that the input alphabet \(\mathcal{X}\) is necessarily finite or discrete, we assume that \(\mathcal{X}\) is equipped with a \(\sigma\)-algebra \(\Sigma\) which represents a collection of measurable sets. This means in particular that the probability space underlying these expectations is possibly infinite as well. The expectations are therefore formally defined as Bochner integrals on the Banach space \((\mathcal{T}({\mathcal{H}}), \left\lVert {\cdot} \right\rVert_\mathrm{tr})\). The exact definitions and many important facts about Bochner integration can, e.g., be found in [40]. Therefore, many technical questions of measurability and integrability arise in our proofs. We summarize the preliminaries on measurability that we use in this paper in Lemma 18 and the necessary preliminaries on Bochner integration in Lemma 19 in Appendix 6.6. When referring to the notions of continuity (measurability) of functions, it is important to specify with respect to which topology (\(\sigma\)-algebra) the function in question is continuous (measurable). We therefore adopt the following conventions: When the domain or range of a function is given as \(\mathcal{X}\), measurability is with respect to \(\Sigma\). When the domain or range is given as \(\mathcal{B}({\mathcal{H}})\), the continuity (measurability) of the function is with respect to the topology (Borel \(\sigma\)-algebra) induced by the operator norm. When the domain or range is given as \(\mathcal{T}({\mathcal{H}})\) or \(\mathcal{S}({\mathcal{H}})\), the continuity (measurability) of the function is with respect to the topology (Borel \(\sigma\)-algebra) induced by the trace norm. When the domain or range is a subset of \(\mathbb{R}\) or \(\mathbb{C}\), we consider the topology (Borel \(\sigma\)-algebra) induced by the Euclidean norm.
In this work, we study ccq wiretap channels and cqq wiretap channels. Formally, a ccq wiretap channel is a pair \((W, D_\mathfrak{E})\), where \(W\) is a stochastic kernel (i.e., a classical channel), \(D_\mathfrak{E}\) is a cq channel, and \(W\) and \(D_\mathfrak{E}\) share the same input alphabet. The idea behind this definition is that the eavesdropper observes a quantum system instead of a classical output while the legitimate receiver observes a classical output. A cqq wiretap channel is a pair \((D_\mathfrak{B},D_\mathfrak{E})\) where both \(D_\mathfrak{B}\) and \(D_\mathfrak{E}\) are cq channels.
In the following, we describe the system model (which is also depicted in Fig. 1) in more detail. The transmitter \(\mathfrak{A}\) transmits a channel input \(X\) which is a random variable ranging over a measurable space \((\mathcal{X},\Sigma)\), the input alphabet. The eavesdropper \(\mathfrak{E}\) observes the output of a cq channel which is described by a measurable map \(D_\mathfrak{E}: \mathcal{X}\rightarrow \mathcal{S}({\mathcal{H}_\mathfrak{E}})\) where \(\mathcal{H}_\mathfrak{E}\) is a separable Hilbert space. The measurability of \(D_\mathfrak{E}\) is with respect to \(\Sigma\) and the Borel \(\sigma\)-algebra induced by the trace norm. In the case of the cqq wiretap channel (Fig. 1 (b)), the legitimate receiver \(\mathfrak{B}\) also observes the output of a cq channel, described by a measurable map \(D_\mathfrak{B}: \mathcal{X}\rightarrow \mathcal{S}({\mathcal{H}_\mathfrak{B}})\), where \(\mathcal{H}_\mathfrak{B}\) is a separable Hilbert space. In the case of the ccq wiretap channel (Fig. 1 (a)), the legitimate receiver observes the output \(\hat{X}\) which is a random variable ranging over another measurable space \((\hat{\mathcal{X}},\hat{\Sigma})\) and its relationship with \(X\) is described by a stochastic kernel \(W: \hat{\Sigma}\times \mathcal{X}\to [0,1]\), the legitimate receiver’s (classical) channel.
For \(x^n:=(x_1, \ldots ,x_{n})\in \mathcal{X}^n\) and \(B_1,\ldots, B_n \in \hat{\Sigma}\), we set \[W^n(B_1\times \ldots \times B_n,x^n):= \prod_{i=1}^{n}W(B_i, x_i),\;\] for classical channels \(W\), and \[D^{n} (x^n) := \bigotimes_{i=1}^{n} D(x_i)= D(x_1)\otimes \dots \otimes D(x_{n}),\] for cq channels \(D\), i.e., we consider \(n\)-th memoryless extensions of the channels \(W\) and \(D\).
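In finite dimensions, the \(n\)-th memoryless extension of a cq channel is an \(n\)-fold Kronecker product of density operators. The following sketch illustrates this for a hypothetical toy channel `cq_channel` (its input-to-state map is invented for illustration and is not one of the channels studied here):

```python
import numpy as np
from functools import reduce

def cq_channel(x: float) -> np.ndarray:
    """Hypothetical toy cq channel mapping a real input to a qubit state."""
    p = 1 / (1 + np.exp(-x))        # invented input-to-probability map
    return np.diag([p, 1 - p])

def memoryless_extension(xs) -> np.ndarray:
    """D^n(x^n) = D(x_1) (x) ... (x) D(x_n) as an n-fold Kronecker product."""
    return reduce(np.kron, (cq_channel(x) for x in xs))

rho_n = memoryless_extension([0.0, 1.0, -1.0])
assert rho_n.shape == (8, 8)                  # (2^3)-dimensional output space
assert abs(float(np.trace(rho_n)) - 1.0) < 1e-12  # tensor products of states are states
```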
A wiretap code for the ccq channel \((W, D_\mathfrak{E})\) (for the cqq channel \((D_\mathfrak{B}, D_\mathfrak{E})\)) with message set size \(M\) and block length \(n\) is a pair \((\mathrm{Enc}, \mathrm{Dec})\) such that
The encoder is a stochastic kernel mapping \(\mathrm{Enc}: \Sigma^{\otimes n} \times \{1, \dots, M\} \to [0,1]\), where \(\mathcal{X}\) is the common input alphabet of \(W\) and \(D_\mathfrak{E}\) (of \(D_\mathfrak{B}\) and \(D_\mathfrak{E}\)).
In the case of a classical channel \(W\) to the legitimate receiver, the decoder is a deterministic mapping \(\mathrm{Dec}: \hat{\mathcal{X}}^n\to \{1, \dots, M\}\), where \(\hat{\mathcal{X}}\) is the output alphabet of \(W\).
In the case of a cq channel \(D_\mathfrak{B}\) to the legitimate receiver, the decoder is a \(\{1, \dots, M\}\)-valued povm \((Y_m)_{m=1}^M\) on \(\mathcal{H}_\mathfrak{B}\).
The rate of a wiretap code is defined as \(\log (M) / n\). In this paper, we always use the logarithm (as well as the exponential function denoted \(\exp\)) with Euler’s number as a base and consequently, the rate is given in nats per channel use. For the ccq channel and \(\varepsilon\ge 0\), we say that the wiretap code has average error \(\varepsilon\) if \[\label{eq:def-average-error} \mathbb{E}_\mathfrak{M} \mathbb{P}\big( \mathrm{Dec}(\hat{X}^n) \neq \mathfrak{M} \big) \leq \varepsilon\tag{3}\] where \(\hat{X}^n\) is the random variable observed by the legitimate receiver, resulting from encoding and transmission of a uniformly distributed message \(\mathfrak{M}\) through the channel \(W^n\). For the cqq channel and \(\varepsilon\ge 0\), we say that the wiretap code has average error \(\varepsilon\) if \[\label{eq:def-average-error-cq} \mathbb{E}_\mathfrak{M} \mathrm{tr}\left( D_\mathfrak{B}^n\circ \mathrm{Enc}(\mathfrak{M}) (\mathbf{1}- Y_\mathfrak{M}) \right) \leq \varepsilon\tag{4}\] for a uniformly distributed message \(\mathfrak{M}\). The composition of the stochastic kernel mapping \(\mathrm{Enc}: \Sigma^{\otimes n} \times \{1, \dots, M\} \to [0,1]\) and the cq channel \(D_\mathfrak{B}^n:\mathcal{X}^n\to \mathcal{S}({\mathcal{H}_\mathfrak{B}^{\otimes n}})\) is given by \[\label{eq:def-channel-composition} D_\mathfrak{B}^n\circ \mathrm{Enc}(m) := \int_{\mathcal{X}^n} D_\mathfrak{B}^n(x^n)\mathrm{Enc}(d x^n, m),\tag{5}\] for \(m\in \{1, \dots, M\}\). Note that the integral is well-defined by the measurability of \(D_\mathfrak{B}\) and that the integral exists in Bochner sense by Lemma 19-[item:bochner-integral-basics-existence]. Moreover, \(D_\mathfrak{B}^n\circ \mathrm{Enc}(m)\in \mathcal{S}({\mathcal{H}_\mathfrak{B}^{\otimes n}})\) by Lemma 19-[item:bochner-integral-basics-trace] and Lemma 19-[item:bochner-integral-basics-bounded-functional].
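For intuition, the average error criterion (4 ) can be evaluated directly in a toy finite-dimensional example with a deterministic encoder. The encoded states and decoding povm below are hypothetical choices for illustration only:

```python
import numpy as np

# Two messages, encoded (deterministically) into hypothetical qubit states.
states = [np.diag([0.9, 0.1]),     # state received for message m = 1
          np.diag([0.2, 0.8])]     # state received for message m = 2

# Decoding POVM (Y_1, Y_2): measurement in the computational basis.
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# Average error (4): (1/M) * sum_m tr( rho_m (1 - Y_m) ) for uniform messages.
M = len(states)
avg_err = sum(np.trace(rho @ (np.eye(2) - y)).real
              for rho, y in zip(states, povm)) / M
assert abs(avg_err - 0.15) < 1e-12   # (0.1 + 0.2) / 2
```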
Remark 1. Despite their different appearance, the expressions in (3 ) and (4 ) are closely related, as we will briefly explain. Introducing the indicator functions of the decoding sets \(\mathbb{1}_{\mathrm{Dec}^{-1}(m)}\) for \(m\in \{1, \ldots, M\}\), we can write (3 ) as \[\label{eq:remark-average-errors-1} \mathbb{E}_\mathfrak{M} \mathbb{P}\big( \mathrm{Dec}(\hat{X}^n) \neq \mathfrak{M} \big) = \frac{1}{M}\sum_{m=1}^{M}\sum_{\substack{\hat{m}=1\\ \hat{m}\neq m}}^{M} \int \left( \int\mathbb{1}_{\mathrm{Dec}^{-1}(\hat{m})}(\hat{x}^{n})W^n(d \hat{x}^{n}, x^n)\right) \mathrm{Enc}(d x^n, m).\qquad{(1)}\] On the other hand, using (2 ) for the decoding povm \((Y_m)_{m=1}^M\) together with the linearity of the trace and the integral, we have \[\label{eq:remark-average-errors-2} \mathbb{E}_\mathfrak{M} \mathrm{tr}\left( D_\mathfrak{B}^n\circ \mathrm{Enc}(\mathfrak{M}) (\mathbf{1}- Y_\mathfrak{M}) \right) = \frac{1}{M}\sum_{m=1}^{M}\sum_{\substack{\hat{m}=1\\ \hat{m}\neq m}}^{M} \int \mathrm{tr}\left(Y_{\hat{m}}D_\mathfrak{B}^n(x^n)\right)\mathrm{Enc}(d x^n, m) .\qquad{(2)}\] The inner integral in (?? ) and the term containing the trace in (?? ) both describe the probability that the message \(\hat{m}\) is detected/measured given that the channel input is \(x^{n}\in \mathcal{X}^{n}\).
For \(\delta\ge 0\), we say that the wiretap code \((\mathrm{Enc}, \mathrm{Dec})\) has distinguishing security level \(\delta\) if \[\label{eq:def-distinguishing-security} \forall m_1, m_2 \in \{1, \dots, M\} ~~ \left\lVert { D_\mathfrak{E}^n\circ \mathrm{Enc}(m_1) - D_\mathfrak{E}^n\circ \mathrm{Enc}(m_2) } \right\rVert_\mathrm{tr} \leq \delta,\tag{6}\] where \(D_\mathfrak{E}^n\circ \mathrm{Enc}\) is defined in an analogous way to (5 ).
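In finite dimensions, the distinguishing security level (6 ) is simply a maximal trace-norm distance between encoder outputs, as the following sketch illustrates with two hypothetical eavesdropper states. By the Holevo–Helstrom theorem, a level \(\delta\) bounds the success probability of any measurement distinguishing two equiprobable messages by \(1/2 + \delta/4\):

```python
import numpy as np

def trace_norm(a: np.ndarray) -> float:
    """Trace norm via singular values (finite-dimensional case)."""
    return float(np.sum(np.linalg.svd(a, compute_uv=False)))

# Hypothetical eavesdropper states for two different messages; the security
# level delta in (6) bounds the trace distance over all message pairs.
rho1 = np.diag([0.55, 0.45])
rho2 = np.diag([0.50, 0.50])
delta = trace_norm(rho1 - rho2)
assert abs(delta - 0.10) < 1e-12
# Any POVM distinguishes these two (equiprobable) messages with success
# probability at most 1/2 + delta/4  (Holevo-Helstrom).
```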
The main results of this work state that wiretap codes for certain ccq and cqq channels exist that simultaneously achieve low average error and low distinguishing security level. Achievable secrecy rates are characterized by two information quantities. The information density and mutual information of the classical channel \(W\) associated with an input distribution \(P\) on \(\mathcal{X}\) are defined in the usual way as \[i_{P}({ x };{ \hat{x} }) := \log \frac{d{W(\cdot, x)}}{d{\hat{P}}} (\hat{x}) ,~~ I(P,W) := \mathbb{E} i_{P}({ X };{ \hat{X} })\] where \(\hat{P}\) is the distribution of the output of \(W\) under the input distribution \(P\), and \(\hat{X}\) is a random variable distributed according to \(\hat{P}\) (or \(\hat{X}\sim\hat{P}\) for short). For \(\rho\in \mathcal{S}({\mathcal{H}})\), we define the von Neumann entropy \[H\left(\rho\right) := - \mathrm{tr}( \rho\log \rho )\] with the convention \(H\left(\rho\right) = \infty\) if \(\rho\log \rho\notin \mathcal{T}({\mathcal{H}})\). For an input distribution \(P\), \(X\sim P\), and a cq channel \(D: \mathcal{X}\rightarrow \mathcal{S}({\mathcal{H}})\), we define \[D_P:= \mathbb{E}D(X)\] to be the density operator of the output of the channel under input distribution \(P\). Note that since \(\left\lVert {D(x)} \right\rVert_\mathrm{tr}=1\) for all \(x\in \mathcal{X}\), the expectation \(\mathbb{E}D(X)\) exists by Lemma 19-[item:bochner-integral-basics-existence]. Moreover, the von Neumann entropy \(H\left(\cdot\right)\) is lower semi-continuous with respect to the topology induced by the trace norm \(\left\lVert {\cdot} \right\rVert_\mathrm{tr}\) (cf. [41]). Consequently, the map \(\mathcal{X}\ni x\mapsto H\left(D(x)\right)\) is measurable.
Therefore, for a random variable \(X\sim P\), we can define the Holevo information as \[\chi(P;D) := \mathbb{E} \mathrm{tr}\big( D(X) \log D(X) \big) - \mathrm{tr}\big( D_P \log D_P \big) = H\left(D_P\right) - \mathbb{E} H\left(D(X)\right),\] where we adopt the convention that \(\chi(P;D)= \infty\) whenever \(H\left(D_P\right) = \infty\) or if the integral \(\mathbb{E} H\left(D(X)\right)\) does not exist. More information on definition, measurability, and structural properties of \(\chi(P;D)\) can be found in [42].
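In the finite-dimensional case, the Holevo information can be computed directly from eigenvalues. The following sketch evaluates \(\chi(P;D) = H\left(D_P\right) - \mathbb{E} H\left(D(X)\right)\) for a hypothetical toy ensemble of two pure qubit states:

```python
import numpy as np

def von_neumann_entropy(rho: np.ndarray) -> float:
    """H(rho) = -tr(rho log rho), in nats (finite-dimensional case)."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-15]                 # 0 log 0 = 0 convention
    return float(-np.sum(ev * np.log(ev)))

# Hypothetical ensemble: inputs with probabilities P(x) and output states D(x).
P = [0.5, 0.5]
D = [np.diag([1.0, 0.0]),                         # |0><0|
     np.array([[0.5, 0.5], [0.5, 0.5]])]          # |+><+|

D_P = sum(p * rho for p, rho in zip(P, D))        # average output state
chi = von_neumann_entropy(D_P) - sum(p * von_neumann_entropy(rho)
                                     for p, rho in zip(P, D))
# Both ensemble states are pure, so chi = H(D_P); it obeys 0 <= chi <= log 2.
assert 0.0 < chi < np.log(2)
```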
Definition 1. An additive cost constraint \((c,C)\) consists of a measurable cost function \(c: \mathcal{X}\rightarrow [0,\infty)\) and a constraint \(C\in (0,\infty)\). A tuple \(x^n\in \mathcal{X}^n\) satisfies the additive cost constraint \((c,C)\) if \[c(x_1) + \dots + c(x_n) \leq nC.\] We say that the cost constraint is compatible with a distribution \(P\) on \(\mathcal{X}\) if, for \(X\sim P\), we have \[\mathbb{E}c(X) < C,\] and if there is \(t\in (0,\infty)\) with \(\mathbb{E}\exp(tc(X)) < \infty\).
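For the special case of an average energy constraint \(c(x)=x^2\), compatibility of a standard Gaussian input with a hypothetical constraint value \(C\) can be checked numerically (the values of \(C\), the block length, and the seed below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Average energy constraint: c(x) = x^2 with hypothetical constraint C = 2.0.
c = lambda x: x ** 2
C = 2.0

# A standard Gaussian input P = N(0, 1) is compatible with (c, C):
# E c(X) = 1 < C, and E exp(t c(X)) = E exp(t X^2) < infinity for any t < 1/2.
x_n = rng.normal(size=1000)                   # a sample block of length n = 1000
satisfies = np.sum(c(x_n)) <= len(x_n) * C    # additive cost constraint check
```

By the law of large numbers, an i.i.d. block drawn from a compatible \(P\) satisfies the constraint with probability tending to one, which is why such constraints can be enforced by the random codebooks used in the proofs.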
We have now introduced all necessary terminology to state the main result of this work. For better readability, we state the result for the ccq wiretap channel (Theorem 1) and for the cqq wiretap channel (Theorem 2) separately although the theorems are very similar. The proofs are deferred to Section 5.6.
Theorem 1. Let \((W, D_\mathfrak{E})\) be a ccq wiretap channel, let \(P\) be a probability distribution on the input alphabet \(\mathcal{X}\), and let \(X\sim P\) such that
there is \(\alpha_{\min} < 1\) with \(D_\mathfrak{E}(x)^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}})\) for almost all \(x\), \(D_P^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}})\), and the Bochner integral \(\mathbb{E}D_\mathfrak{E}(X)^{\alpha_{\min}}\) exists;
there is \(t> 0\) such that \(\mathbb{E}\exp(ti_{P}({X};{\hat{X}})) < \infty\).
Let \((c, C)\) be a cost constraint compatible with \(P\), and let \(R< I(P,W)- \chi(P;D_\mathfrak{E})\). Then there are \(\gamma_1, \gamma_2 \in (0,\infty)\) such that for sufficiently large \(n\), there exists a wiretap code with the following properties:
The code has a rate of at least \(R\).
The output of the encoder satisfies the cost constraint \((c,C)\) almost surely.
The code has an average error of \(\varepsilon= \exp(-\gamma_1 n)\) as defined in (3 ).
The code has distinguishing security level \(\delta= \exp(-\gamma_2 n)\) as defined in (6 ).
Theorem 2. Let \((D_\mathfrak{B},D_\mathfrak{E})\) be a cqq wiretap channel, and let \(P\) be a probability distribution on the input alphabet \(\mathcal{X}\) such that for both choices of \(D\in \{D_\mathfrak{E}, D_\mathfrak{B}\}\), there is \(\alpha_{\min} < 1\) with \(D(x)^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}})\) for almost all \(x\), \(D_P^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}})\), and the Bochner integral \(\mathbb{E}D(X)^{\alpha_{\min}}\) exists, where \(X\sim P\).
Let \((c, C)\) be a cost constraint compatible with \(P\), and let \(R< \chi(P;D_\mathfrak{B}) - \chi(P;D_\mathfrak{E})\). Then there are \(\gamma_1, \gamma_2 \in (0,\infty)\) such that for sufficiently large \(n\), there exists a wiretap code with the following properties:
The code has a rate of at least \(R\).
The output of the encoder satisfies the cost constraint \((c,C)\) almost surely.
The code has an average error of \(\varepsilon= \exp(-\gamma_1 n)\) as defined in (4).
The code has distinguishing security level \(\delta= \exp(-\gamma_2 n)\) as defined in (6).
Remark 2. As mentioned in the introduction, converse results were derived in [24], [28], [29]. Single-letter versions are shown for degradable channels; otherwise the bounds are of the multi-letter variety. In [43], it has been shown by means of many explicit examples that the capacity of cqq and ccq wiretap channels is non-additive. Therefore, single-letter converses are not possible in general.
In this subsection, we state versions of Theorem 1 and Theorem 2 that can be evaluated at given, finite block lengths \(n\). While these versions neither yield closed-form expressions for the achievable average error and security level at a given \(n\) nor guarantee nontrivial bounds for these performance metrics at arbitrarily short block lengths, they do make it possible to numerically evaluate tradeoffs between \(n\), average error, and security level for given channels and communication rates. For instance, one can numerically approximate the average error and security level for a given channel at a set communication rate and block length, or conversely, determine a minimum block length at which a set combination of performance metrics can be achieved. We show examples for this in Section 6.
Before we can state these somewhat more involved versions of our main results, we need some further definitions. Given the classical channel \(W\) and input distribution \(P\), we define for \(\alpha\in [0,1) \cup (1,\infty)\) \[I_{\alpha}(P,W) := \frac{1}{\alpha-1} \log \mathbb{E}_{P\hat{P}}\left( \frac{d{W(\cdot, X)}}{d{\hat{P}}}(\hat{X}) \right)^\alpha,\] the Rényi divergence between the joint input-output distribution and the product of its marginals. As before, we use \(X,\hat{X}\) to denote the channel input and output, and we use \(\hat{P}\) to denote the marginal output distribution of \(W\) under the input distribution \(P\). It is well known [44] that this definition can be continuously extended to \(\alpha\in [0,\infty]\) and that with this extension, \(I_{1}(P,W)=I(P,W)\).
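As a numerical sanity check of this definition (for a finite alphabet, where the Radon–Nikodym derivative reduces to the ratio \(W(y\,|\,x)/\hat{P}(y)\)), the following sketch evaluates \(I_\alpha(P,W)\) for a binary symmetric channel with illustrative parameters and confirms that \(I_\alpha(P,W)\) approaches \(I(P,W)\) as \(\alpha\to 1\):

```python
import numpy as np

# Illustrative channel (not from the paper): BSC with crossover 0.1,
# uniform input distribution.
P = np.array([0.5, 0.5])
W = np.array([[0.9, 0.1],   # W[x, y] = W(y | x)
              [0.1, 0.9]])
P_hat = P @ W               # output marginal

def I_alpha(alpha):
    """Renyi divergence between the joint distribution and P x P_hat."""
    inner = np.einsum("x,xy,y->", P, W**alpha, P_hat**(1.0 - alpha))
    return float(np.log(inner) / (alpha - 1.0))

def mutual_information():
    joint = P[:, None] * W
    return float(np.sum(joint * np.log(joint / (P[:, None] * P_hat[None, :]))))

print(I_alpha(1.001), mutual_information())  # nearly equal
```

Since the Rényi divergence is nondecreasing in \(\alpha\), values computed for \(\alpha>1\) upper-bound the mutual information, matching the role \(I_\alpha\) plays in the exponent \(\mathcal{W}_\mathrm{class}^{W}\) below.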
For every cq channel \(D\) and \(x\in \mathcal{X}\), we fix a spectral decomposition \[D(x) = \sum_{y\in \mathbb{N}} P_{D}( y| x) \left\lvert {e_{y| x}} \right\rangle\left\langle {e_{y| x}} \right\rvert,\] where \(\{e_{y| x}\}_{y\in \mathbb{N}}\) is an orthonormal basis of the output Hilbert space of \(D\) for every \(x\in \mathcal{X}\). Whenever the channel \(D\) is clear from context, we will drop the corresponding subscript. By Lemma 18, for every \(\mathcal{Y}\subseteq \mathbb{N}\), the map \(x \mapsto \sum_{y\in \mathcal{Y}} P_{D}(y| x)\) is a pointwise limit of measurable maps and therefore measurable. Moreover, for all \(x\), we have \(\sum_{y\in \mathbb{N}} P_{D}(y| x) = \mathrm{tr}D(x) = 1\). Therefore, the eigenvalues \(P_{D}(y| x)\) induce a stochastic kernel from \(\mathcal{X}\) to \(\mathbb{N}\), and together with a probability distribution \(P\) on \(\mathcal{X}\), we obtain a joint probability distribution on \(\mathcal{X}\times \mathbb{N}\). In a slight abuse of notation, we will use the symbol \(P\) to denote this joint probability distribution as well as the marginal on \(\mathbb{N}\) and all corresponding pmfs. Let \((\Omega,\mathcal{F}, \mu)\) be a probability space and \(X: \Omega\rightarrow \mathcal{X}\) be a random variable distributed according to \(P\). We write the density operator of the channel output in terms of its spectral decomposition \[D_P := \mathbb{E}_XD(X) = \sum_{y\in \mathbb{N}} U_{D}(y) \left\lvert {e_y} \right\rangle\left\langle {e_y} \right\rvert.\] Clearly, \(U_{D}\) is a pmf on \(\mathbb{N}\), which we identify with the probability measure it induces.
We define the entropies \[\begin{align} \tag{7} H_{P_{D}} &:= - \sum_{x\in \mathcal{X}} \sum_{y\in \mathbb{N}} P_{D}(x,y) \log P_{D}(y| x) = - \mathbb{E}_X \sum_{y\in \mathbb{N}} P_{D}(y| X) \log P_{D}(y| X) = - \mathbb{E}_X \mathrm{tr}( D(X) \log D(X) ) = \mathbb{E}_X H\left(D(X)\right) , \\ \tag{8} H_{U_{D}} &:= - \sum_{y\in \mathbb{N}} U_{D}(y) \log U_{D}(y) = - \mathrm{tr}( D_P \log D_P ) = H\left(D_P\right). \end{align}\] If \(\chi(P;D)< \infty\), then \(H_P\) and \(H_U\) are both finite, and we have \[\label{eq:quantum-information} \chi(P;D) = -H_P+ H_U.\tag{9}\]
Furthermore, we define for \(\alpha\in [\alpha_{\min},1) \cup (1,\infty)\) \[\begin{align} \tag{10} H_{P_{D},{\alpha}} &:= \frac{1}{1-\alpha} \log \mathbb{E}_{P_{D}}\left( P_{D}(Y| X)^{\alpha-1} \right) = \frac{1}{1-\alpha} \log \mathbb{E}_{P_{D}} \sum\limits_{y\in \mathbb{N}} P_{D}(y| X)^{\alpha} = \frac{1}{1-\alpha} \log\mathbb{E}_{P_{D}}\mathrm{tr}\left( D(X)^{\alpha} \right) \\ \tag{11} H_{U_{D},{\alpha}} &:= \frac{1}{1-\alpha} \log \mathbb{E}_{U_{D}}\left( U_{D}(Y)^{\alpha-1} \right) = \frac{1}{1-\alpha} \log \sum\limits_{y\in \mathbb{N}} U_{D}(y)^{\alpha} = \frac{1}{1-\alpha} \log\mathrm{tr}\left( D_P^{\alpha} \right). \end{align}\]
We will use properties of \(H_{P_{D},{\alpha}}\) and \(H_{U_{D},{\alpha}}\) which are stated and proved in Lemmas 20 and 21 in Appendix 6.6. Lemma 21 allows us to expand the definitions of \(H_{U_{D},{\alpha}}\) and \(H_{P_{D},{\alpha}}\) to the domain \([\alpha_{\min},\infty)\) and obtain continuous functions in \(\alpha\) with \(H_{U_{D},{1}} = H_{U_{D}}\) and \(H_{P_{D},{1}} = H_{P_{D}}\).
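For a finite-dimensional output state, \(H_{U_{D},{\alpha}} = \frac{1}{1-\alpha}\log\mathrm{tr}(D_P^\alpha)\) can be computed directly from the eigenvalues of \(D_P\). The following sketch (on an arbitrary example state, chosen only for illustration) demonstrates the continuous extension at \(\alpha = 1\) to the von Neumann entropy \(H(D_P)\):

```python
import numpy as np

# Arbitrary 2-dimensional example density operator (illustrative only).
rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])

eigvals = np.linalg.eigvalsh(rho)  # eigenvalues of rho, i.e. the pmf U_D

def H_alpha(alpha):
    """Renyi entropy log(tr(rho^alpha)) / (1 - alpha) for alpha != 1."""
    return float(np.log(np.sum(eigvals**alpha)) / (1.0 - alpha))

def von_neumann():
    """Continuous extension at alpha = 1: H(rho) = -tr(rho log rho)."""
    return float(-np.sum(eigvals * np.log(eigvals)))

# H_alpha is nonincreasing in alpha and approaches H(rho) as alpha -> 1.
print(H_alpha(0.999), von_neumann())
```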
We define the following functions: \[\begin{align} \nonumber \mathcal{W}_\mathrm{class}^{W}(R,n) &:= \inf_{\substack{\varepsilon\in (0,\\ I(P,W)-R)}}\Bigg( \exp\left( - n \sup_{\alpha\in (1,\infty)}\left( (\alpha-1) \left( I(P,W) + \varepsilon - I_{\alpha}(P,W) \right) \right) \right) + \exp\left( - n (I(P,W)+\varepsilon-R) \right) \Bigg) \\ \tag{12} \mathcal{R}_1^D(\varepsilon, n) &:= \exp\left( -n \sup_{\alpha\in (1,\infty)} (\alpha-1)(H_{P_{D},{\alpha}} + \varepsilon- H_{P_{D}}) \right) \\ \tag{13} \mathcal{R}_2^D(\varepsilon, n) &:= \exp\left( -n \sup_{\alpha\in [\alpha_{\min},1)} (1-\alpha)(H_{P_{D}} + \varepsilon- H_{P_{D},{\alpha}}) \right) \\ \tag{14} \mathcal{R}_3^D(\varepsilon, n) &:= \exp\left( - n \sup_{\alpha\in (1,\infty)} (\alpha-1) ( H_{U_{D},{\alpha}} + \varepsilon - H_{U_{D}} ) \right) \\ \tag{15} \mathcal{R}_4^D(\varepsilon, n) &:= \exp\left( - n \sup_{\alpha\in [\alpha_{\min},1)} (1-\alpha) ( H_{U_{D}} + \varepsilon - H_{U_{D},{\alpha}} ) \right) \\ \tag{16} \mathcal{W}_\mathrm{coding}^D(R,n) &:= \inf_{\substack{\varepsilon\in (0,\\(\chi(P;D)-R)/2)}}\Big( 2 \mathcal{R}_1^D(\varepsilon,n) + 2 \mathcal{R}_2^D(\varepsilon,n) + 4 \mathcal{R}_3^D(\varepsilon,n) + 4 \mathcal{R}_4^D(\varepsilon,n) + 4 \exp\big( - n (\chi(P;D)- R- 2\varepsilon) \big) \Big) \\ \tag{17} \mathcal{W}_\mathrm{res}^D(R,n) &:=\inf_{\substack{\varepsilon\in (0,\\(R-\chi(P;D))/4)}}\left( 2\mathcal{R}_1^D(\varepsilon, n) + 2\mathcal{R}_2^D(\varepsilon, n) + 2\mathcal{R}_3^D(\varepsilon, n) + 2\mathcal{R}_4^D(\varepsilon, n) + 6\exp\left( - \frac{1}{2} n (R- \chi(P;D)- 4\varepsilon) \right) \right) \\ \tag{18} \mathcal{W}_\mathrm{cost}^{(c,C)}( \beta, R, n ) &:= \exp\Big( -2\exp(n(R-2\beta))\big(1-\exp(-n(\beta_1-\beta))\big)^2 \Big), \end{align}\] where \((c,C)\) is an additive cost constraint and \[\label{eq:mgf-parameter-choice} \beta_1 := \sup_{\hat{t} \in (0,\infty)} \left( - \log p(\hat{t}) \right),\tag{19}\] with \(p(t) := \mathbb{E}_P\exp(t(c(X)-C))\), the moment 
generating function of \(c(X)-C\).
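The exponent \(\beta_1\) from (19) can be approximated numerically. The following sketch uses a simple grid search over \(\hat{t}\) for an illustrative finite-alphabet cost function (all numbers are hypothetical); compatibility of the cost constraint gives \(p'(0) = \mathbb{E}[c(X)] - C < 0\), so \(p\) dips below \(1\) and \(\beta_1 > 0\):

```python
import numpy as np

# Hypothetical finite-alphabet cost setup (illustrative numbers).
P = np.array([0.5, 0.4, 0.1])
cost = np.array([0.0, 1.0, 4.0])
C = 2.0

def p(t):
    """Moment generating function of c(X) - C."""
    return float(np.sum(P * np.exp(t * (cost - C))))

# beta_1 = sup_{t > 0} ( -log p(t) ), approximated by a grid search.
ts = np.linspace(0.0, 5.0, 20001)
beta_1 = max(-np.log(p(t)) for t in ts[1:])
print(beta_1)  # positive, since E[c(X)] = 0.8 < C = 2
```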
With this, we are ready to state finite block length versions of Theorem 1 and Theorem 2.
Let \((W, D_\mathfrak{E})\) be a ccq wiretap channel, let \(P\) be a probability distribution on the input alphabet \(\mathcal{X}\), and let \(X\sim P\) such that
there is \(\alpha_{\min} < 1\) with \(D_\mathfrak{E}(x)^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}})\) for almost all \(x\), \(D_P^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}})\), and the Bochner integral \(\mathbb{E}D_\mathfrak{E}(X)^{\alpha_{\min}}\) exists;
there is \(t> 0\) such that \(\mathbb{E}\exp(ti_{P}({X};{\hat{X}})) < \infty\).
Let \((c, C)\) be a cost constraint compatible with \(P\), let \(M, L\in \mathbb{N}\), and let \(R,\tilde{R},\hat{R}\in (0,\infty)\) with the properties \[\begin{align} \tag{20} M&\geq \exp(n\tilde{R}) \\ \tag{21} ML&\leq \exp(n\hat{R}) \\ \tag{22} L&\geq \exp(nR). \end{align}\]
Further assume that \[\begin{align} \tilde{R}&> \chi(P;D_\mathfrak{E}) \\ \hat{R}&< I(P,W). \end{align}\]
Let \(\beta_2 \in (0, \min(\beta_1,(R+\tilde{R})/2))\), \(\beta_3 \in (0,\infty)\), \(\beta_4 \in (0, \min(\beta_1,\tilde{R}/2))\), \(\beta_5 \in (0, \tilde{R}/2)\), and \(n\in \mathbb{N}\), where \(\beta_1\) is defined in (19), be chosen such that \[\begin{gather} \label{eq:wiretap-ccq-prob-one} \mathcal{W}_\mathrm{class}^W(\hat{R},n)\exp(n\beta_3) + \mathcal{W}_\mathrm{cost}^{(c,C)}(\beta_2,R+\tilde{R},n) \\+ \mathcal{W}_\mathrm{cost}^{(c,C)}( \beta_4, \tilde{R}, n ) \exp(n(\hat{R}-\tilde{R})) + \exp\left( -\frac{1}{2}\exp( n (\tilde{R}-2\beta_5) ) + n( \hat{R}-\tilde{R} ) \right) < 1. \end{gather}\tag{23}\]
Then there exists a wiretap code with the following properties:
The code has size \(L\).
The output of the encoder satisfies the cost constraint \((c,C)\) almost surely.
The code has an average error \[\varepsilon = \exp(-n\beta_2) + \exp(-n\beta_3)\] as defined in (3).
The code has distinguishing security level \[\delta = 2\mathcal{W}_\mathrm{res}^{D_\mathfrak{E}}(\tilde{R},n) + 4\exp(-\beta_4n) + 2\exp(-\beta_5n)\] as defined in (6).
Let \((D_\mathfrak{B},D_\mathfrak{E})\) be a cqq wiretap channel, and let \(P\) be a probability distribution on the input alphabet \(\mathcal{X}\) such that for both choices of \(D\in \{D_\mathfrak{E}, D_\mathfrak{B}\}\), there is \(\alpha_{\min} < 1\) with \(D(x)^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}})\) for almost all \(x\), \(D_P^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}})\), and the Bochner integral \(\mathbb{E}D(X)^{\alpha_{\min}}\) exists, where \(X\sim P\).
Let \((c, C)\) be a cost constraint compatible with \(P\), let \(M,L\in \mathbb{N}\), and let \(R,\tilde{R},\hat{R}\in (0,\infty)\) satisfy (20), (21), and (22).
Further assume that \[\begin{align} \tilde{R}&> \chi(P;D_\mathfrak{E}) \\ \hat{R}&< \chi(P;D_\mathfrak{B}). \end{align}\]
Let \(\beta_2 \in (0, \min(\beta_1,(R+\tilde{R})/2))\), \(\beta_3 \in (0,\infty)\), \(\beta_4 \in (0, \min(\beta_1,\tilde{R}/2))\), \(\beta_5 \in (0,\tilde{R}/2)\), and \(n\in \mathbb{N}\), where \(\beta_1\) is defined in (19), be chosen such that \[\begin{gather} \label{eq:wiretap-cq-prob-one} \mathcal{W}_\mathrm{coding}^{D_\mathfrak{B}}(\hat{R},n)\exp(n\beta_2) + \mathcal{W}_\mathrm{cost}^{(c,C)}\left(\beta_3,R+\tilde{R},n\right) \\+ \mathcal{W}_\mathrm{cost}^{(c,C)}( \beta_4, \tilde{R}, n ) \exp(n(\hat{R}-\tilde{R})) + \exp\left( -\frac{1}{2}\exp( n (\tilde{R}-2\beta_5) ) + n( \hat{R}-\tilde{R} ) \right) < 1. \end{gather}\tag{24}\]
Then there exists a wiretap code with the following properties:
The code has size \(L\).
The output of the encoder satisfies the cost constraint \((c,C)\) almost surely.
The code has an average error \[\varepsilon = \exp(-n\beta_2) + 2\exp(-n\beta_3)\] as defined in (4).
The code has distinguishing security level \[\delta = 2\mathcal{W}_\mathrm{res}^{D_\mathfrak{E}}(\tilde{R},n) + 4\exp(-\beta_4n) + 2\exp(-\beta_5n)\] as defined in (6).
The distinguishing security criterion provided by Theorems 1 and 2 implies security guarantees against eavesdropping attacks that extend beyond an attacker’s ability to reconstruct the entire transmitted message. In this section, we discuss various notions of security and their operational implications.
In this subsection, we give a definition of semantic security that is analogous to the classical archetype in [17]. Formally, semantic security means that a large class of possible objectives (which includes the objective of reconstructing the entire transmitted message, but also many others) cannot be reached by the eavesdropper. To the best of our knowledge, this definition of semantic security has not appeared before in the literature for channels with quantum outputs.
Problem 1 (Eavesdropper’s objective). Each eavesdropper’s objective that we consider in this paper is defined by a partition \(\Pi\) of the message space \(\{1, \dots, M\}\). The eavesdropper’s objective is then to output some \(\pi\in \Pi\) such that the originally transmitted message is contained in \(\pi\).
The partition of \(\{1, \dots, M\}\) that corresponds to the reconstruction of the entire message consists of all singleton sets, i.e., \[\Pi= \big\{ \{1\}, \dots, \{M\} \big\}.\]
Other objectives can likewise be defined by a message space partition. For instance, the task of reconstructing the first bit of the transmitted message is represented by \[\Pi= \big\{ \{ m:~ \text{the binary representation of } m \text{ starts with } 0 \}, \{ m:~ \text{the binary representation of } m \text{ starts with } 1 \} \big\}.\]
In this section, we assume that the eavesdropper has prior knowledge of the probability distribution \(P_\mathfrak{M}\) from which the transmitted message \(\mathfrak{M}\) is drawn. While this may not be realistic in some practical scenarios, it is certainly possible, for example when the eavesdropper knows that a protocol is executed in which only certain messages can be transmitted at certain time instances. In any case, this is not a restrictive assumption since it represents an additional advantage that the eavesdropper has compared to a case where no such prior knowledge is available.
We assume in this work that, after message \(\mathfrak{M}\) has been transmitted, the eavesdropper can perform any \(\Pi\)-valued POVM on the output \(\rho_{\mathfrak{E}}^{n}(\mathfrak{M})\) of the channel \[\label{eq:eve-q-output-state} \rho_{\mathfrak{E}}^{n}:= D_\mathfrak{E}^n\circ \mathrm{Enc}.\tag{25}\] By the Born rule, a \(\Pi\)-valued POVM \((F_\pi)_{\pi\in \Pi}\) and \(\rho_{\mathfrak{E}}^{n}(\mathfrak{M})\) induce a probability distribution on \(\Pi\), described by the pmf \[p(\pi) := \mathrm{tr}(F_\pi\rho_{\mathfrak{E}}^{n}(\mathfrak{M})).\] Here, \(p\) describes the distribution of the eavesdropper’s conclusion about the correct partition element \(\pi\) under a fixed communication scheme, message distribution, and reconstruction strategy of the eavesdropper.
We define the eavesdropper’s maximum success probability when a quantum measurement is performed on the quantum state \(\rho_{\mathfrak{E}}^{n}(\mathfrak{M})\) as \[\label{eq:def-eavesdropper-success} \mathrm{Succ}(\Pi, P_\mathfrak{M}, \rho_{\mathfrak{E}}^{n}) := \sup\left\{ \mathbb{E}_\mathfrak{M}\sum_{\pi\in \Pi: \mathfrak{M}\in \pi} \mathrm{tr}(F_\pi\rho_{\mathfrak{E}}^{n}(\mathfrak{M})) :~ (F_\pi)_{\pi\in \Pi} \text{ is a POVM} \right\}.\tag{26}\] Note that since \(\Pi\) is a partition, the sum consists of only one summand for each fixed realization of \(\mathfrak{M}\).
It is important to emphasize at this point that it is not possible in general to guarantee a low success probability for the eavesdropper. For instance, \(\mathfrak{M}\) could be drawn from a distribution under which the binary representation of \(\mathfrak{M}\) almost surely starts with \(0\). What we can control, however, is the advantage that the eavesdropper gains by observing \(\rho_{\mathfrak{E}}^{n}(\mathfrak{M})\) compared to the situation where it cannot observe any channel output. We denote the maximum achievable success probability in solving Problem 1 for partition \(\Pi\) and message distribution \(P_\mathfrak{M}\) without observing any quantum state (i.e., by pure guessing) by \[\mathrm{Guess}(\Pi, P_\mathfrak{M}) := \sup\left\{ \mathbb{P}(\mathfrak{M}\in \pi):~ \pi\text{ is a random variable ranging over } \Pi \right\}.\]
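The following sketch (with a hypothetical message distribution on eight messages) computes \(\mathrm{Guess}(\Pi, P_\mathfrak{M})\) for the two example objectives discussed above. Since the best blind strategy is to deterministically output a partition element of maximal prior mass, the supremum equals \(\max_{\pi\in\Pi} \mathbb{P}(\mathfrak{M}\in\pi)\):

```python
# Hypothetical prior on messages {0, ..., 7} (illustrative numbers only).
P_M = [0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05]

def guess(partition, P_M):
    """Guess(Pi, P_M): best blind success probability over the partition."""
    return max(sum(P_M[m] for m in block) for block in partition)

# Objective "recover the whole message": the partition into singletons.
singletons = [[m] for m in range(8)]
# Objective "recover the first bit" of the 3-bit representation.
first_bit = [[m for m in range(8) if m < 4], [m for m in range(8) if m >= 4]]

print(guess(singletons, P_M))  # 0.4
print(guess(first_bit, P_M))   # 0.7
```

This also illustrates why only the advantage over pure guessing, and not the success probability itself, can be kept small: coarser objectives (such as a single bit) may already be guessable with high probability from the prior alone.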
This leads us to the following definition of semantic security.
Definition 2. The eavesdropper’s semantic security advantage associated with the eavesdropper’s output state \(\rho_{\mathfrak{E}}^{n}(\mathfrak{M})\) is defined as \[\begin{gather} \label{eq:def-eavesdropper-advantage} \mathrm{Adv}_\mathrm{sem}(\rho_{\mathfrak{E}}^{n}) := \sup\Big\{ \mathrm{Succ}(\Pi, P_\mathfrak{M}, \rho_{\mathfrak{E}}^{n}) - \mathrm{Guess}(\Pi, P_\mathfrak{M}) :~ \\ \Pi\text{ is a partition of } \{1, \dots, M\},~ P_\mathfrak{M}\text{ is a probability distribution on } \{1, \dots, M\} \Big\}. \end{gather}\qquad{(3)}\] We say that semantic security level \(\delta\) is satisfied for the output state \(\rho_{\mathfrak{E}}^{n}\) if \[\mathrm{Adv}_\mathrm{sem}(\rho_{\mathfrak{E}}^{n}) \leq \delta.\]
This is a straightforward quantum analog of the term semantic security which is already an established performance metric both in classical cryptography and information-theoretic secrecy [17].
There are alternative security definitions that are easier to work with in proofs than semantic security in the sense of Definition 2. In this section, we introduce several of these alternative established notions of security. We point out that the terminology is not uniform in the literature, neither for classical nor for quantum wiretap channels. Because both distinguishing security and mutual information security, as defined in Definition 3 below, imply semantic security in the sense of Definition 2 above, some papers use these alternative definitions directly for semantic security. In this section, we define these alternative notions and show some implications between them. This is very similar to the treatment of the classical case [17], [35].
Definition 3. Given a cq channel \(\{1, \dots ,M\}\ni m\mapsto \rho_{\mathfrak{E}}^{n}(m)\) (cf. (25 )) and \(\delta\in\mathbb{R}\), \(\delta\ge 0\), we say that:
Weak secrecy level \(\delta\) is satisfied if \[\mathrm{Adv}_\mathrm{weak}(\rho_{\mathfrak{E}}^{n}) := \frac{1}{n} \chi(P_\mathfrak{M};\rho_{\mathfrak{E}}^{n}) \leq \delta,\] where \(P_\mathfrak{M}\) is the uniform distribution over all messages \(\{1, \dots, M\}\).
Strong secrecy level \(\delta\) is satisfied if \[\mathrm{Adv}_\mathrm{str}(\rho_{\mathfrak{E}}^{n}) := \chi(P_\mathfrak{M};\rho_{\mathfrak{E}}^{n}) \leq \delta,\] where \(P_\mathfrak{M}\) is the uniform distribution over all messages \(\{1, \dots, M\}\).
Mutual information security level \(\delta\) is satisfied if \[\mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n}) := \max_{P_\mathfrak{M}} \chi(P_\mathfrak{M};\rho_{\mathfrak{E}}^{n}) \leq \delta,\] where \(P_\mathfrak{M}\) ranges over all probability distributions on the messages \(\{1, \dots, M\}\).
Distinguishing security level \(\delta\) is satisfied if \[\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) := \max\Big\{ \left\lVert {\rho_{\mathfrak{E}}^{n}(m_1) - \rho_{\mathfrak{E}}^{n}(m_2)} \right\rVert_\mathrm{tr} ~:~ m_1, m_2 \in \{1, \dots, M\} \Big\} \leq \delta.\]
The terms on the left-hand side of these inequalities are called the weak secrecy (strong secrecy, mutual information security, distinguishing security) advantage of the eavesdropper.
For a sequence of encoding schemes with growing block length \(n\), we say that the sequence is semantically secure (weakly secret, strongly secret, mutual information secure, distinguishing secure) if the corresponding advantage vanishes as \(n\) tends to infinity.
Remark 3. We note that post-processing steps carried out by the eavesdropper cannot increase any of the advantages given in Definition 2 and Definition 3. To see this, let \(\mathcal{N}: \mathcal{T}({\mathcal{H}_\mathfrak{E}})\to \mathcal{T}({\mathcal{H}_\mathfrak{E}'})\) be a quantum channel, i.e., a positive, trace-preserving map such that the dual map \(\mathcal{N}': \mathcal{B}({\mathcal{H}_\mathfrak{E}'})\to \mathcal{B}({\mathcal{H}_\mathfrak{E}})\), given by \[\label{eq:cp-duality} \mathrm{tr}(A\mathcal{N}(B))=\mathrm{tr}(\mathcal{N}'(A) B)\qquad{(4)}\] for all \(A\in \mathcal{B}({\mathcal{H}_\mathfrak{E}'})\) and all \(B\in\mathcal{T}({\mathcal{H}_\mathfrak{E}})\), is completely positive [39]. The map \(\mathcal{N}\) represents a possible post-processing that can be performed by the eavesdropper. The monotonicity of quantum relative entropy under the action of quantum channels [45] implies that \[\mathrm{Adv}_\mathrm{weak}(\mathcal{N}\circ \rho_{\mathfrak{E}}^{n})\le \mathrm{Adv}_\mathrm{weak}(\rho_{\mathfrak{E}}^{n}), \qquad \mathrm{Adv}_\mathrm{str}(\mathcal{N}\circ \rho_{\mathfrak{E}}^{n}) \le \mathrm{Adv}_\mathrm{str}(\rho_{\mathfrak{E}}^{n}), \qquad \mathrm{Adv}_\mathrm{inf}(\mathcal{N}\circ \rho_{\mathfrak{E}}^{n})\le \mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n}).\] Moreover, since for all states \(\rho, \sigma\in\mathcal{S}({\mathcal{H}})\) we have [38] \[\left\lVert {\mathcal{N}(\rho) - \mathcal{N}(\sigma)} \right\rVert_\mathrm{tr} \le \left\lVert {\rho-\sigma} \right\rVert_\mathrm{tr},\] we can conclude that \[\mathrm{Adv}_\mathrm{dist}(\mathcal{N}\circ \rho_{\mathfrak{E}}^{n}) \le \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})\] holds as well. Finally, we have \[\mathrm{Adv}_\mathrm{sem}(\mathcal{N}\circ \rho_{\mathfrak{E}}^{n}) \le \mathrm{Adv}_\mathrm{sem}(\rho_{\mathfrak{E}}^{n}),\] which can be deduced as follows. Since for any given partition \(\Pi\) and probability distribution \(P_\mathfrak{M}\), the quantity \(\mathrm{Guess}(\Pi, P_\mathfrak{M})\) does not depend on the eavesdropper’s output state, it is sufficient to prove \[\mathrm{Succ}(\Pi, P_\mathfrak{M}, \mathcal{N}\circ \rho_{\mathfrak{E}}^{n}) \le \mathrm{Succ}(\Pi, P_\mathfrak{M}, \rho_{\mathfrak{E}}^{n})\] for all partitions \(\Pi\) and probability distributions \(P_\mathfrak{M}\). This is the aim of the following argument: \[\begin{align} \mathrm{Succ}(\Pi, P_\mathfrak{M}, \mathcal{N}\circ \rho_{\mathfrak{E}}^{n}) &= \sup\left\{ \mathbb{E}_\mathfrak{M}\sum_{\pi\in \Pi: \mathfrak{M}\in \pi} \mathrm{tr}(F_\pi\mathcal{N}\circ \rho_{\mathfrak{E}}^{n}(\mathfrak{M})) :~ (F_\pi)_{\pi\in \Pi} \text{ is a POVM on } \mathcal{H}_\mathfrak{E}' \right\} \\ \overset{(a)}&{=} \sup\left\{ \mathbb{E}_\mathfrak{M}\sum_{\pi\in \Pi: \mathfrak{M}\in \pi} \mathrm{tr}(\mathcal{N}'(F_\pi) \rho_{\mathfrak{E}}^{n}(\mathfrak{M})) :~ (F_\pi)_{\pi\in \Pi} \text{ is a POVM on } \mathcal{H}_\mathfrak{E}' \right\} \displaybreak[0] \\ &= \sup\left\{ \mathbb{E}_\mathfrak{M}\sum_{\pi\in \Pi: \mathfrak{M}\in \pi} \mathrm{tr}(F'_\pi\rho_{\mathfrak{E}}^{n}(\mathfrak{M})) :~ F'_{\pi}=\mathcal{N}'(F_{\pi}) \text{ where } (F_\pi)_{\pi\in \Pi} \text{ is a POVM on } \mathcal{H}_\mathfrak{E}' \right\} \displaybreak[0] \\ \overset{(b)}&{\le} \sup\left\{ \mathbb{E}_\mathfrak{M}\sum_{\pi\in \Pi: \mathfrak{M}\in \pi} \mathrm{tr}(F'_\pi\rho_{\mathfrak{E}}^{n}(\mathfrak{M})) :~ (F'_\pi)_{\pi\in \Pi} \text{ is a POVM on } \mathcal{H}_\mathfrak{E} \right\} \\ &= \mathrm{Succ}(\Pi, P_\mathfrak{M}, \rho_{\mathfrak{E}}^{n}), \end{align}\] where step (a) is by the defining relation of the dual map \(\mathcal{N}'\). The inequality (b) follows from the fact that \(\mathcal{N}'\) is a unital (i.e., \(\mathcal{N}'(\mathbf{1}_{\mathcal{H}_\mathfrak{E}'})=\mathbf{1}_{\mathcal{H}_\mathfrak{E}}\)) and positive map. Therefore, it maps the POVMs on \(\mathcal{H}_\mathfrak{E}'\) to a subset of the POVMs on the Hilbert space \(\mathcal{H}_\mathfrak{E}\), which leads to the larger supremum in (b).
Consequently, in proofs of security guarantees, it is sufficient to focus on the case \(\rho_{\mathfrak{E}}^{n}= D^n\circ \mathrm{Enc}\), as we do in Theorems 1 and 2, since the guarantees automatically extend to all post-processing attempts represented by the application of quantum channels at the eavesdropper.
In the remainder of this section, we follow the arguments in [35] for the classical case and prove that similar implications also hold in the quantum case between the different security notions of Definitions 2 and 3. For the implication from distinguishing security to mutual information security, we use a slightly more general approach that applies not only to finite-dimensional systems but also to some practically relevant infinite-dimensional systems.
Lemma 1. \(\mathrm{Adv}_\mathrm{sem}(\rho_{\mathfrak{E}}^{n}) \leq \frac{1}{2} \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})\).
Proof. Suppose the eavesdropper’s output state is \(\rho_{\mathfrak{E}}^{n}(\mathfrak{M})\), and let \((F_{\pi})_{\pi\in \Pi}\) be a povm.
We define a pmf \[\label{eq:semantic-security-eve-guess} q: \Pi\rightarrow [0,1],~ \pi \mapsto \mathbb{E}_\mathfrak{M} \mathrm{tr}(F_\pi\rho_{\mathfrak{E}}^{n}(\mathfrak{M})).\tag{27}\] Clearly, \(q\) does not depend on an observed channel output, so we have \[\label{eq:semantic-security-comparison-advantage} \mathrm{Guess}(\Pi, P_\mathfrak{M}) \geq \mathbb{E}_\mathfrak{M} \sum_{\pi\in \Pi: \mathfrak{M}\in \pi} q(\pi).\tag{28}\] Therefore, \[\begin{align} \mathbb{E}_\mathfrak{M}\sum_{\pi\in \Pi: \mathfrak{M}\in \pi} \mathrm{tr}(F_\pi\rho_{\mathfrak{E}}^{n}(\mathfrak{M})) - \mathrm{Guess}(\Pi, P_\mathfrak{M}) \overset{(\ref{eq:semantic-security-comparison-advantage})}&{\leq} \mathbb{E}_\mathfrak{M} \sum_{\pi\in \Pi: \mathfrak{M}\in \pi}\left( \mathrm{tr}(F_\pi\rho_{\mathfrak{E}}^{n}(\mathfrak{M})) - q(\pi) \right) \\ \overset{(\ref{eq:semantic-security-eve-guess})}&{=} \mathbb{E}_\mathfrak{M} \sum_{\pi\in \Pi: \mathfrak{M}\in \pi}\left( \mathrm{tr}\big( F_{\pi}\rho_{\mathfrak{E}}^{n}(\mathfrak{M}) \big) - \mathbb{E}_\mathfrak{M} \mathrm{tr}\big( F_{\pi}\rho_{\mathfrak{E}}^{n}(\mathfrak{M}) \big) \right) \displaybreak[0] \\ &= \mathbb{E}_\mathfrak{M} \mathrm{tr}\left( \sum_{\pi\in \Pi: \mathfrak{M}\in \pi}\left( F_{\pi}\rho_{\mathfrak{E}}^{n}(\mathfrak{M}) - F_{\pi}\mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(\mathfrak{M}) \right) \right) \displaybreak[0] \\ &= \mathbb{E}_\mathfrak{M} \mathrm{tr}\left( \left( \sum_{\pi\in \Pi: \mathfrak{M}\in \pi} F_{\pi} \right)( \rho_{\mathfrak{E}}^{n}(\mathfrak{M}) - \mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(\mathfrak{M}) ) \right) \displaybreak[0] \\ \overset{(a)}&{\leq} \frac{1}{2} \mathbb{E}_\mathfrak{M} \left\lVert { \rho_{\mathfrak{E}}^{n}(\mathfrak{M}) - \mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(\mathfrak{M}) } \right\rVert_\mathrm{tr} \displaybreak[0] \\ \overset{(b)}&{\leq} \frac{1}{2} \mathbb{E}_{\mathfrak{M},\mathfrak{M}'} \left\lVert { \rho_{\mathfrak{E}}^{n}(\mathfrak{M}') - 
\rho_{\mathfrak{E}}^{n}(\mathfrak{M}) } \right\rVert_\mathrm{tr} \\ &\leq \frac{1}{2} \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}). \end{align}\] Here, (a) is by Lemma 15, and (b) is by replacing \(\mathfrak{M}\) with an independent copy \(\mathfrak{M}'\) and then applying Lemma 19. Since this derivation holds for every choice of \(F\), \(P_\mathfrak{M}\), and \(\Pi\), the lemma immediately follows. ◻
Lemma 2. \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \leq 4 \mathrm{Adv}_\mathrm{sem}(\rho_{\mathfrak{E}}^{n})\).
Proof. Fix arbitrary \(m_1, m_2 \in \{1, \dots, M\}\). We will show that \[\left\lVert {\rho_{\mathfrak{E}}^{n}(m_1) - \rho_{\mathfrak{E}}^{n}(m_2)} \right\rVert_\mathrm{tr} \leq 4\mathrm{Adv}_\mathrm{sem}(\rho_{\mathfrak{E}}^{n}).\] Since this is immediately clear for \(m_1 = m_2\), we may assume \(m_1 \neq m_2\) in the following. We fix the message probability distribution \[P_\mathfrak{M}(m) := \begin{cases} \frac{1}{2}, &m\in \{m_1, m_2\} \\ 0, &\text{otherwise,} \end{cases}\] and the partition \(\Pi:= \{\{m_1\}, \{1, \dots, M\} \setminus \{m_1\}\}\). Any guess over this partition is correct with probability \(1/2\). So for any operator \(0 \leq F\leq \mathbf{1}\), we can calculate \[\mathrm{Adv}_\mathrm{sem}(\rho_{\mathfrak{E}}^{n}) \overset{(\ref{eq:def-eavesdropper-advantage})}{\geq} \mathrm{Succ}(\Pi,P_\mathfrak{M},\rho_{\mathfrak{E}}^{n}) - \frac{1}{2} \overset{(\ref{eq:def-eavesdropper-success})}{\geq} \frac{1}{2} \mathrm{tr}(F\rho_{\mathfrak{E}}^{n}(m_1)) + \frac{1}{2} \mathrm{tr}((\mathbf{1}-F) \rho_{\mathfrak{E}}^{n}(m_2)) - \frac{1}{2} = \frac{1}{2} \mathrm{tr}(F(\rho_{\mathfrak{E}}^{n}(m_1)-\rho_{\mathfrak{E}}^{n}(m_2))).\] We now fix \(F\) as the operator which maximizes the latter term according to Lemma 15, and obtain \[\mathrm{Adv}_\mathrm{sem}(\rho_{\mathfrak{E}}^{n}) \geq \frac{1}{4} \left\lVert {\rho_{\mathfrak{E}}^{n}(m_1)-\rho_{\mathfrak{E}}^{n}(m_2)} \right\rVert_\mathrm{tr},\] concluding the proof. ◻
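The key tool in both proofs above is the variational characterization of the trace norm from Lemma 15: for the traceless Hermitian difference \(\Delta = \rho_{\mathfrak{E}}^{n}(m_1) - \rho_{\mathfrak{E}}^{n}(m_2)\), the maximum of \(\mathrm{tr}(F\Delta)\) over \(0\le F\le \mathbf{1}\) equals \(\frac{1}{2}\lVert\Delta\rVert_\mathrm{tr}\), attained by the projector onto the positive eigenspace of \(\Delta\). A numerical check on two arbitrary qubit states (chosen only for illustration):

```python
import numpy as np

# Two arbitrary qubit density matrices (illustrative only).
rho1 = np.array([[0.8, 0.1], [0.1, 0.2]])
rho2 = np.array([[0.4, -0.2], [-0.2, 0.6]])
delta = rho1 - rho2  # traceless and Hermitian

# Projector F onto the positive eigenspace of delta maximizes tr(F delta).
eigvals, eigvecs = np.linalg.eigh(delta)
pos = eigvecs[:, eigvals > 0]
F = pos @ pos.conj().T

lhs = float(np.trace(F @ delta))
trace_norm = float(np.sum(np.abs(eigvals)))
print(lhs, trace_norm / 2)  # equal
```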
Lemma 3. \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \leq 2\sqrt{2\mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n})}\).
Proof. Fix arbitrary \(m_1, m_2 \in \{1, \dots, M\}\). We will show that \[\label{eq:information-security-to-distinguishing-objective} \left\lVert {\rho_{\mathfrak{E}}^{n}(m_1) - \rho_{\mathfrak{E}}^{n}(m_2)} \right\rVert_\mathrm{tr} \leq 2\sqrt{2\mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n})}.\tag{29}\] Since this is immediately clear for \(m_1 = m_2\), we may assume \(m_1 \neq m_2\) in the following. We define quantum states \[\begin{align} \rho &:= \frac{1}{2} \left\lvert {m_1} \right\rangle\left\langle {m_1} \right\rvert \otimes \rho_{\mathfrak{E}}^{n}(m_1) + \frac{1}{2} \left\lvert {m_2} \right\rangle\left\langle {m_2} \right\rvert \otimes \rho_{\mathfrak{E}}^{n}(m_2) \\ \sigma &:= \left( \frac{1}{2} \left\lvert {m_1} \right\rangle\left\langle {m_1} \right\rvert + \frac{1}{2} \left\lvert {m_2} \right\rangle\left\langle {m_2} \right\rvert \right) \otimes \left( \frac{1}{2} \rho_{\mathfrak{E}}^{n}(m_1) + \frac{1}{2} \rho_{\mathfrak{E}}^{n}(m_2) \right), \end{align}\] where \(\left\lvert {m_1} \right\rangle, \left\lvert {m_2} \right\rangle\) are an orthonormal basis of some \(2\)-dimensional Hilbert space. 
We further fix a distribution \(P_\mathfrak{M}\) on the set \(\{1, \dots, M\}\) of messages via \[P_\mathfrak{M}(m) := \begin{cases} \frac{1}{2}, &m\in \{m_1, m_2\} \\ 0, &\text{otherwise.} \end{cases}\] Then, we can calculate \[\begin{align} \mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n}) &\geq \chi(P_\mathfrak{M};\rho_{\mathfrak{E}}^{n}) \\ \overset{(a)}&{=} H(\rho;\sigma) \\ \overset{(b)}&{\geq} \frac{1}{2} \left\lVert {\rho-\sigma} \right\rVert_\mathrm{tr}^2 \\ &= \frac{1}{2} \left\lVert { \frac{1}{4} \left( \left\lvert {m_1} \right\rangle\left\langle {m_1} \right\rvert - \left\lvert {m_2} \right\rangle\left\langle {m_2} \right\rvert \right) \otimes \left( \rho_{\mathfrak{E}}^{n}(m_1) - \rho_{\mathfrak{E}}^{n}(m_2) \right) } \right\rVert_\mathrm{tr}^2 \\ &= \frac{1}{2} \cdot \frac{1}{16} \left\lVert { \left\lvert {m_1} \right\rangle\left\langle {m_1} \right\rvert - \left\lvert {m_2} \right\rangle\left\langle {m_2} \right\rvert } \right\rVert_\mathrm{tr}^2 \left\lVert { \rho_{\mathfrak{E}}^{n}(m_1) - \rho_{\mathfrak{E}}^{n}(m_2) } \right\rVert_\mathrm{tr}^2 \\ &= \frac{1}{8} \left\lVert { \rho_{\mathfrak{E}}^{n}(m_1) - \rho_{\mathfrak{E}}^{n}(m_2) } \right\rVert_\mathrm{tr}^2, \end{align}\] where step (a) uses the definition of the quantum relative entropy \[H(\rho;\sigma) := \mathrm{tr}(\rho\log \rho) - \mathrm{tr}(\rho\log \sigma),\] where by convention, \(H(\rho;\sigma) = \infty\) whenever the support of \(\rho\) is not contained in the support of \(\sigma\) or one of the traces is infinite [41]. Step (b) is a quantum version of the Pinsker inequality, namely, the stronger version of [41] that is mentioned in [41] and originally proven for a more general scenario in [46]. This yields (29), concluding the proof of the lemma. ◻
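The quantum Pinsker inequality used in step (b) can be checked numerically. The following sketch does so for two arbitrary qubit states (the states are illustrative; in finite dimensions the relative entropy can be computed by eigendecomposition):

```python
import numpy as np

def logm_psd(A):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.log(w)) @ V.conj().T

# Arbitrary example states (illustrative only).
rho = np.array([[0.75, 0.1], [0.1, 0.25]])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])

# H(rho; sigma) = tr(rho log rho) - tr(rho log sigma)
rel_ent = float(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma))).real)
# ||rho - sigma||_tr as the sum of absolute eigenvalues of the difference
trace_dist = float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

print(rel_ent, 0.5 * trace_dist**2)  # Pinsker: first >= second
```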
For the bound of \(\mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n})\) in terms of \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})\), we introduce some technical terms first. A linear operator \(G\) on \(\mathcal{H}\) is called a Gibbs observable if it satisfies the following properties:
It is of the form \[\label{eq:gibbs-eigenvalues} G\left\lvert {h} \right\rangle := \sum_{i\in \mathbb{N}} g_i \left\langle {e_i}, {h} \right\rangle\left\lvert {e_i} \right\rangle,\tag{30}\] defined on the dense domain \[\label{eq:gibbs-domain} \mathcal{D}(G):= \left\{ h\in \mathcal{H} : \sum_{i\in \mathbb{N}} \left \lvert g_i \right \rvert^2 \left \lvert \left\langle {e_i}, {h} \right\rangle \right \rvert^2 < \infty \right\},\tag{31}\] where \((g_i)_{i\in \mathbb{N}}\) is an unbounded sequence of nonnegative real numbers that includes \(0\) and \(\left\lvert {e_i} \right\rangle_{i\in \mathbb{N}}\) form an orthonormal basis in \(\mathcal{H}\) (cf. [41] and [47]). Note that \(G\) is an unbounded self-adjoint operator on the domain \(\mathcal{D}(G)\) (which is a simple consequence of [37]).
For every \(\beta\in (0,\infty)\), it holds that \[\label{eq:gibbs-regularity} \exp(-\beta G) \in \mathcal{T}({\mathcal{H}}),\tag{32}\] with the definition \[\exp(-\beta G) \left\lvert {h} \right\rangle := \sum_{i\in \mathbb{N}} \exp(-\beta g_i) \left\langle {e_i}, {h} \right\rangle \left\lvert {e_i} \right\rangle,\] \(h\in \mathcal{H}\) (cf. [47]). Note that for (32 ) to hold, it is necessary that the spectrum of \(G\) given by the sequence \((g_i)_{i\in \mathbb{N}}\) is unbounded and every eigenvalue has finite multiplicity.
For a Gibbs observable \(G\) and \(\rho\in \mathcal{S}({\mathcal{H}})\), we adopt the convention (cf. [41]) \[\label{eq:gibbs-trace} \mathrm{tr}(\rho G) := \sum_{i\in \mathbb{N}} g_i \left\langle {e_i}, {\rho e_i} \right\rangle \in [0,\infty],\tag{33}\] and we define (see [47]) for \(E\in [0,\infty)\) \[\label{eq:gibbs-max-entropy-function} H_{G}(E) := \sup\left\{ H\left(\rho\right) : \rho\in \mathcal{S}({\mathcal{H}}),~ \mathrm{tr}(\rho G) \leq E \right\} \in [0,\infty).\tag{34}\] In [48], it is proven that this supremum is finite and is in fact realized by a state of maximum entropy.
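To make (34 ) concrete, consider the special case where the eigenvalues of \(G\) are \(g_i = i\) (the number operator of (46 ) below). Here the supremum is attained by a thermal state, and \(H_{G}(E)\) has the closed form \((E+1)\log(E+1) - E\log E\). The sketch below (grid and truncation parameters are arbitrary choices) compares this with the variational representation \(H_{G}(E) = \inf_{\beta > 0}\left(\beta E + \log \mathrm{tr}\exp(-\beta G)\right)\) from [48], which is invoked later in the proof of Lemma 7:

```python
import math

def log_tr_exp(beta, n_terms=1000):
    # truncated partition function log tr exp(-beta G) for eigenvalues g_i = i
    return math.log(sum(math.exp(-beta * i) for i in range(n_terms)))

def H_G_variational(E):
    # H_G(E) = inf_beta ( beta*E + log tr exp(-beta G) ), scanned on a coarse grid
    betas = [0.01 * k for k in range(1, 501)]
    return min(b * E + log_tr_exp(b) for b in betas)

def H_G_closed(E):
    # entropy of the maximizing thermal state with mean occupation number E
    return (E + 1) * math.log(E + 1) - E * math.log(E)

assert abs(H_G_variational(2.0) - H_G_closed(2.0)) < 1e-3
```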
Lemma 4. Suppose that \(\hat{G}\) is a Gibbs observable on \(\mathcal{H}_\mathfrak{E}^{\otimes n}\) and \(E\in [0,\infty)\) such that for every \(m\in \{1, \dots, M\}\), we have \(\mathrm{tr}(\rho_{\mathfrak{E}}^{n}(m) \hat{G}) \leq E\). Assume further that \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \leq 1\). Then, we have \[\mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n}) \leq \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) H_{\hat{G}}\left( \frac{2E}{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})} \right) + h\left( \frac{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})}{2} \right),\] where \(h: [0,1] \rightarrow \mathbb{R}, t\mapsto -t\log t- (1-t)\log(1-t)\) is the binary entropy.
Proof. We fix an arbitrary probability distribution \(P_\mathfrak{M}\) on \(\{1, \dots, M\}\) and define \[\label{eq:security-implication-minentropy} \tilde{m} := \mathop{\mathrm{arg\,min}}_{m\in \{1, \dots, M\}} H\left(\rho_{\mathfrak{E}}^{n}(m)\right).\tag{35}\] Then, we have \[\begin{align} \chi(P_\mathfrak{M};\rho_{\mathfrak{E}}^{n}) &= H\left(\mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(m)\right) - \mathbb{E}_\mathfrak{M} H\left(\rho_{\mathfrak{E}}^{n}(\mathfrak{M})\right) \\ \overset{(\ref{eq:security-implication-minentropy})}&{\leq} H\left(\mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(m)\right) - H\left(\rho_{\mathfrak{E}}^{n}(\tilde{m})\right) \\ \overset{(a)}&{\leq} \left\lVert { \mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(m) - \rho_{\mathfrak{E}}^{n}(\tilde{m}) } \right\rVert_\mathrm{tr} H_{\hat{G}}\left( \frac{2E}{ \left\lVert { \mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(m) - \rho_{\mathfrak{E}}^{n}(\tilde{m}) } \right\rVert_\mathrm{tr} } \right) + h\left( \frac{ \left\lVert { \mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(m) - \rho_{\mathfrak{E}}^{n}(\tilde{m}) } \right\rVert_\mathrm{tr} }{ 2 } \right) \\ \overset{(b)}&{\leq} \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) H_{\hat{G}}\left( \frac{ 2E }{ \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) } \right) + h\left( \frac{ \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) }{ 2 } \right), \end{align}\] where (a) is an application of [47]. 
For (b), we first observe that due to Lemma 19-[item:bochner-integral-basics-trace-norm-jensen] and Definition 3-[item:distinguishing-security], we have \[\left\lVert { \mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(\mathfrak{M}) - \rho_{\mathfrak{E}}^{n}(\tilde{m}) } \right\rVert_\mathrm{tr} \leq \mathbb{E}_\mathfrak{M} \left\lVert { \rho_{\mathfrak{E}}^{n}(\mathfrak{M}) - \rho_{\mathfrak{E}}^{n}(\tilde{m}) } \right\rVert_\mathrm{tr} \leq \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \leq 1.\] The inequality in the first summand then follows from [47], and in the second summand from the well-known fact that binary entropy is nondecreasing on \([0,1/2]\). ◻
It is not clear from Lemma 4 how \(\mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n})\) behaves for \(n\rightarrow \infty\) since both \(\rho_{\mathfrak{E}}^{n}\) and the Gibbs observable \(\hat{G}\) depend on \(n\). In the following Lemmas 5 and 6, we derive a bound for \(\mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n})\) that depends on \(n\) only explicitly and via \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})\). To this end, we need a way to lift a Gibbs observable \(G\) on a Hilbert space \(\mathcal{H}\) to a Gibbs observable \(G^n\) on \(\mathcal{H}^{\otimes n}\), and a relation between \(H_{G}\) and \(H_{G^n}\). Let \((g_i)_{i\in \mathbb{N}}\) be the eigenvalues and \(\left\lvert {e_i} \right\rangle_{i\in \mathbb{N}}\) be the eigenvectors associated with \(G\), i.e., \(G\) can be written as in (30 ) and (31 ). Then \((\left\lvert {e_{i_{1}}} \right\rangle \otimes \dots \otimes \left\lvert {e_{i_{n}}} \right\rangle)_{i_{1}, \ldots ,i_{n} \in \mathbb{N}}\) is an orthonormal basis of \(\mathcal{H}^{\otimes n}\) and the operator \[\label{eq:n-block-gibbs} G^n := G\otimes \mathbf{1}\otimes \dots \otimes \mathbf{1} + \dots + \mathbf{1}\otimes \dots \otimes \mathbf{1}\otimes G\tag{36}\] is well-defined on the set of (finite) linear combinations of \((\left\lvert {e_{i_{1}}} \right\rangle \otimes \dots \otimes \left\lvert {e_{i_{n}}} \right\rangle)_{i_{1}, \ldots ,i_{n} \in \mathbb{N}}\). 
Moreover, for any \(i_{1}, \ldots ,i_{n} \in \mathbb{N}\), we have \[G^{n} \left\lvert {e_{i_{1}}} \right\rangle \otimes \dots \otimes \left\lvert {e_{i_{n}}} \right\rangle = (g_{i_{1}}+\dots + g_{i_{n}})\left\lvert {e_{i_{1}}} \right\rangle \otimes \dots \otimes \left\lvert {e_{i_{n}}} \right\rangle.\] Then on the dense domain \[\label{eq:n-block-gibbs-domain} \mathcal{D}(G^{n}):= \left\{ h\in \mathcal{H}^{\otimes n} : \sum_{i_1,\ldots , i_{n} \in \mathbb{N}} \left \lvert g_{i_{1}}+\dots + g_{i_{n}} \right \rvert^2 \left \lvert \left\langle { {e_{i_{1}}} \otimes \dots \otimes {e_{i_{n}}} }, {h} \right\rangle \right \rvert^2 < \infty \right\},\tag{37}\] we can write \[\label{eq:n-block-gibbs-alt} G^{n} \left\lvert {h} \right\rangle= \sum_{i_1,\ldots , i_{n} \in \mathbb{N}}(g_{i_{1}}+\dots + g_{i_{n}})\left\langle { {e_{i_{1}}} \otimes \dots \otimes {e_{i_{n}}} }, {h} \right\rangle \left\lvert {e_{i_{1}}} \right\rangle \otimes \dots \otimes \left\lvert {e_{i_{n}}} \right\rangle.\tag{38}\] It is easily shown using [37] that (37 ) and (38 ) define a self-adjoint operator on \(\mathcal{H}^{\otimes n}\). Additionally, for any \(\beta\in (0,\infty)\), we have \[\begin{align} \mathrm{tr}\big(\exp(-\beta G^n)\big) &= \sum_{i_1,\ldots , i_{n} \in \mathbb{N}} \left\langle { e_{i_{1}} \otimes \dots \otimes e_{i_{n}} }, {\exp(-\beta G^{n}) e_{i_{1}} \otimes \dots \otimes e_{i_{n}} } \right\rangle \\ &= \sum_{i_1,\ldots , i_{n} \in \mathbb{N}}\exp(-\beta( g_{i_{1}}+\dots + g_{i_{n}} )) \displaybreak[0] \\ \overset{(a)}&{=} \left( \sum_{i\in \mathbb{N}}\exp(-\beta g_{i})\right)^{n} \\ &< \infty, \end{align}\] where (a) is due to the theorem of Fubini-Tonelli applied to the \(n\)-fold product of the counting measure on \(\mathbb{N}\). Therefore, \[\label{eq:n-block-gibbs-regularity} \exp(-\beta G^{n}) \in \mathcal{T}({\mathcal{H}^{\otimes n}}),\tag{39}\] for all \(\beta\in (0,\infty)\). 
Consequently, (37 ), (38 ), and (39 ) show that \(G^{n}\) is a Gibbs observable on \(\mathcal{H}^{\otimes n}\).
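The factorization in step (a) above is easy to check numerically for \(n = 2\) and a made-up spectrum \(g_i = i\):

```python
import math

def z_single(beta, n_terms=60):
    # tr exp(-beta G) for eigenvalues g_i = i (truncated toy spectrum)
    return sum(math.exp(-beta * i) for i in range(n_terms))

def z_two(beta, n_terms=60):
    # tr exp(-beta G^2): the eigenvalues of the two-fold block are g_{i1} + g_{i2}
    return sum(math.exp(-beta * (i + j))
               for i in range(n_terms) for j in range(n_terms))

# the double sum factorizes into the square of the single-letter sum
assert abs(z_two(0.5) - z_single(0.5) ** 2) < 1e-9
```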
Lemma 5. Let \(G\) be a Gibbs observable on \(\mathcal{H}\), and let \(G^{n}\) be the corresponding Gibbs observable on \(\mathcal{H}^{\otimes n}\) given by (36 ) – (38 ). Let \(E\in [0,\infty)\). Then \[H_{G^n}(E) = nH_{G}\left( \frac{E}{n} \right).\]
Proof. Let \(\rho\in \mathcal{S}({\mathcal{H}^{\otimes n}})\) be the quantum state which attains the supremum in the definition of \(H_{G^n}(E)\) (cf. [48]) analogous to (34 ), i.e., \(\mathrm{tr}(\rho G^n) \leq E\) and \(H\left(\rho\right) = H_{G^n}(E)\). Further, use \(\rho_1, \dots, \rho_n\) to denote the marginal states of \(\rho\) on the factors of \(\mathcal{H}^{\otimes n}\). We have \[\label{eq:gibbs-extension-sum} \mathrm{tr}(\rho G^n) = \mathrm{tr}(\rho(G\otimes \mathbf{1}\otimes \dots \otimes \mathbf{1})) + \dots + \mathrm{tr}(\rho(\mathbf{1}\otimes \dots \otimes \mathbf{1}\otimes G)) = \mathrm{tr}(\rho_1 G) + \dots + \mathrm{tr}(\rho_nG).\tag{40}\] By sub-additivity of von Neumann entropy, we have \[\label{eq:ineq-1} H\left(\rho\right) \leq H\left(\rho_1\right) + \dots + H\left(\rho_n\right) \le H_{G}(E_1) + \dots + H_{G}(E_n),\tag{41}\] where we defined \(E_i:= \mathrm{tr}(\rho_iG)\) for \(i\in \{1, \dots, n\}\). On the other hand, for states \(\sigma_i\), \(i\in \{1, \dots, n\}\), with \[H\left(\sigma_i\right)= H_{G}(E_i)\] we define \[\tilde{\sigma}:=\sigma_1 \otimes \dots \otimes \sigma_n.\] Then \[\mathrm{tr}(\tilde{\sigma}G^{n})\le E\] by a calculation similar to (40 ), and by additivity of von Neumann entropy on product states we obtain \[\label{eq:ineq-2} H_{G}(E_1) + \dots + H_{G}(E_n) = H\left(\sigma_1\right) + \dots + H\left(\sigma_n\right) = H\left(\tilde{\sigma}\right)\le H_{G^{n}}(E)= H\left(\rho\right).\tag{42}\] From (41 ) and (42 ) we arrive at \[\label{eq:gibbs-energy-intermediate-equality} H_{G^n}(E) = H\left(\rho\right) = H\left(\rho_1\right) + \dots + H\left(\rho_n\right) = H_{G}(E_1) + \dots + H_{G}(E_n).\tag{43}\] From (40 ), it is clear that \(E\geq E_1 + \dots + E_n\). Let \(E_1', \dots, E_n' \in [0,\infty)\) be such that \(E_1' + \dots + E_n' \leq E\), but otherwise arbitrary.
By choosing states \(\rho_1', \dots, \rho_n'\) with \(\mathrm{tr}(\rho_i'G) \leq E_i'\) and \(H\left(\rho_i'\right) = H_{G}(E_i')\), we can argue \[H_{G}(E_1') + \dots + H_{G}(E_n') = H\left(\rho_1'\right) + \dots + H\left(\rho_n'\right) = H\left(\rho_1' \otimes \dots \otimes \rho_n'\right) \overset{(a)}{\leq} H_{G^n}(E_1' + \dots + E_n') \overset{(b)}{\leq} H_{G^n}(E),\] where (a) follows from a calculation similar to (40 ) and the definition (34 ), and (b) is by monotonicity of \(H_{G^n}\). By (43 ), these inequalities hold with equality in case \(E_1 = E_1', \dots, E_n= E_n'\), so we have \[H_{G}(E_1) + \dots + H_{G}(E_n) = \max\Big\{ H_{G}(E_1') + \dots + H_{G}(E_n') ~:~ E_1', \dots, E_n' \in [0,\infty) ,~~ E_1' + \dots + E_n' \leq E \Big\}.\] Moreover, we have for any such choice of \(E_1', \dots, E_n'\) \[\label{eq:gibbs-concavity-consequence} n H_{G}\left( \frac{E}{n} \right) \overset{(a)}{\geq} n H_{G}\left( \sum_{i=1}^n \frac{1}{n} E_i' \right) \overset{(b)}{\geq} n \sum_{i=1}^n \frac{1}{n} H_{G}(E_i') = H_{G}(E_1') + \dots + H_{G}(E_n'),\tag{44}\] where (a) is again by monotonicity and (b) follows from the fact that \(H_{G}\) is a concave function [48]. Due to the strict monotonicity (it is shown in [48] that the derivative is strictly positive) and strict concavity of \(H_{G}\) [48], these inequalities clearly both hold with equality iff \(E_1' = \dots = E_n' = E/n\), and since we know that \(E_1, \dots, E_n\) maximize the right-hand side of (44 ), we get \(E_1 = \dots = E_n= E/n\). Substituting this in (43 ) proves the lemma. ◻
Lemma 6. Let \(\rho_{\mathfrak{E}}^{n}\) be as defined in (25 ). Assume that for all \(x\in \mathcal{X}\), we have \(\mathrm{tr}(D_\mathfrak{E}(x) G) \leq E\), where \(G\) is a Gibbs observable on \(\mathcal{H}_\mathfrak{E}\). Suppose further that \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \leq 1\). Then \[\mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n}) \leq n \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) H_{G}\left( \frac{2E}{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})} \right) + h\left( \frac{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})}{2} \right).\]
Proof. Let \(G^n\) be given in (36 ) – (38 ). We have, for every \(x^n\in \mathcal{X}^n\), \[\label{eq:gibbs-product-energy} \mathrm{tr}( D_\mathfrak{E}^n(x^n) G^n ) = \mathrm{tr}(D_\mathfrak{E}(x_1) G) + \dots + \mathrm{tr}(D_\mathfrak{E}(x_n) G) \leq nE.\tag{45}\] Note that for each \(i\in \{1,\ldots ,n\}\), the function \(\mathrm{tr}(D_\mathfrak{E}(x_i) G)\) is measurable due to representation as a series of measurable functions given in (33 ). Therefore, \(\mathrm{tr}( D_\mathfrak{E}^n(x^n) G^n )\) is a measurable and bounded function. Thus, using the representation (38 ) of \(G^{n}\), we obtain, for every \(m\), \[\begin{align} \mathrm{tr}(\rho_{\mathfrak{E}}^{n}(m)G^n) &= \mathrm{tr}\left( \int D_\mathfrak{E}^n(x^n)\mathrm{Enc}(d x^n, m) G^n \right) \\ \overset{(\ref{eq:gibbs-trace})}&{=} \sum_{i_1,\ldots , i_{n} \in \mathbb{N}} (g_{i_{1}}+\dots + g_{i_{n}}) \left\langle {\int D_\mathfrak{E}^n(x^n)\mathrm{Enc}(d x^n, m)e_{i_{1}}\otimes \dots \otimes e_{i_{n}} }, { e_{i_{1}}\otimes \dots \otimes e_{i_{n}} } \right\rangle \\ \overset{(a)}&{=} \sum_{i_1,\ldots , i_{n} \in \mathbb{N}} (g_{i_{1}}+\dots + g_{i_{n}}) \int \left\langle {D_\mathfrak{E}^n(x^n) e_{i_{1}}\otimes \dots \otimes e_{i_{n}} }, { e_{i_{1}}\otimes \dots \otimes e_{i_{n}} } \right\rangle \mathrm{Enc}(d x^n, m) \\ \overset{(b)}&{=} \int \sum_{i_1,\ldots , i_{n} \in \mathbb{N}} (g_{i_{1}}+\dots + g_{i_{n}}) \left\langle {D_\mathfrak{E}^n(x^n) e_{i_{1}}\otimes \dots \otimes e_{i_{n}} }, { e_{i_{1}}\otimes \dots \otimes e_{i_{n}} } \right\rangle \mathrm{Enc}(d x^n, m) \\ \overset{(\ref{eq:gibbs-trace})}&{=} \int \mathrm{tr}(D_\mathfrak{E}^n(x^n)G^n)\mathrm{Enc}(d x^n, m) \\ \overset{(\ref{eq:gibbs-product-energy})}&{\leq} nE, \end{align}\] where (a) is by Lemma 19-[item:bochner-integral-basics-bounded-functional] and (b) is by the monotone convergence theorem. 
Therefore, we can apply Lemmas 4 and 5 and obtain \[\begin{align} \mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n}) &\leq \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) H_{G^n}\left( \frac{2nE}{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})} \right) + h\left( \frac{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})}{2} \right) \\ &= n \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) H_{G}\left( \frac{2E}{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})} \right) + h\left( \frac{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})}{2} \right), \end{align}\] concluding the proof. ◻
It is known from [48] that \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) H_{G}\left( 2E/\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \right)\) tends to \(0\) as \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})\) tends to \(0\), and it is also clear that the binary entropy term vanishes in this case. However, in general, the presence of the extra factor \(n\) in the upper bound of Lemma 6 means that an additional argument is necessary to show that distinguishing security implies mutual information security. We do not have such an argument for the general case, but we show in the following lemma that the implication holds in important special cases. The first of these cases is that the eavesdropper’s output is finite-dimensional. The other cases are infinite-dimensional and assume that the energy of the eavesdropper’s output is bounded with respect to Gibbs observables closely related to the Hamiltonian of the harmonic oscillator. Therefore, they specialize to bosonic systems with either one or finitely many modes. In the single-mode case, the Hilbert space under consideration is \(\mathcal{H}_\mathfrak{E}=L^2(\mathbb{R})\) and the Gibbs observable is given by \[\label{eq:def-number-operator} G:=a^*a,\tag{46}\] defined, for the moment, on the domain \[\mathcal{D}(G)=\mathcal{S}(\mathbb{R})\] where \(a^*\) and \(a\) denote the creation and annihilation operators, respectively (cf. [41]), defined on \(\mathcal{S}(\mathbb{R})\), and where \(\mathcal{S}(\mathbb{R})\) is the Schwartz space of rapidly decreasing functions, which is dense in \(L^2(\mathbb{R})\). It is well known that \(G=a^* a\) is essentially self-adjoint on \(\mathcal{D}(G)\) (cf. [49], from which the claim follows by a slight modification of the argument). The operator \(G\) is, up to an additive scalar multiple of \(\mathbf{1}\), the Hamiltonian of the quantum harmonic oscillator [41]. Moreover, the operator \(G\) has discrete spectrum with spectral decomposition (cf.
[49], [41]) \[\label{eq:spec-decomp-number-operator} G=\sum_{i=0}^{\infty} i\left\lvert {e_{i}} \right\rangle\left\langle {e_{i}} \right\rvert,\tag{47}\] where \(\left\lvert {e_{i}} \right\rangle_{i\in \mathbb{N}_{0}}\) is an orthonormal basis of \(\mathcal{H}_\mathfrak{E}=L^2(\mathbb{R})\) consisting of so-called number state vectors. The final domain of the operator \(G\) is then given by \[\bar{\mathcal{D}}(G)=\left\{\left\lvert {h} \right\rangle\in L^2(\mathbb{R}): \sum_{i=0}^{\infty} i^2 \left \lvert \left\langle {e_{i}}, {h} \right\rangle \right \rvert^2<\infty \right\}.\] The operator \(G\) is self-adjoint on \(\bar{\mathcal{D}}(G)\).
For the case of \(s\) modes, we similarly define the operator \[\label{eq:def-s-mode-number-operator} G^{(s)} := \sum_{j=1}^s\mathbf{1}^{\otimes j-1} \otimes \omega_jG_j\otimes \mathbf{1}^{\otimes s- j},\tag{48}\] on \(\mathcal{D}(G^{(s)}):=\mathcal{S}(\mathbb{R})^{\otimes s} \subseteq (L^2(\mathbb{R}))^{\otimes s}\), where \(\omega_1,\ldots , \omega_s\in (0,\infty)\) denote the frequencies of the modes and the operators \(G_1, \ldots , G_s\) are given in (46 ) with the respective creation and annihilation operators of the modes. Using the spectral decomposition (47 ) for each of the modes, we obtain the spectral decomposition \[G^{(s)} = \sum_{i_1, \dots, i_s=0}^\infty\left( \sum_{j=1}^s \omega_j i_j \right) \left\lvert { e_{1,i_1} \otimes \dots \otimes e_{s,i_s} } \right\rangle\left\langle { e_{1,i_1} \otimes \dots \otimes e_{s,i_s} } \right\rvert,\] where \(( e_{1,i_1} \otimes \dots \otimes e_{s,i_s} )\) is the orthonormal basis of \((L^2(\mathbb{R}))^{\otimes s}\) obtained by tensoring the number state bases (47 ) of the individual modes. Defining \[\bar{\mathcal{D}}\left(G^{(s)}\right):= \left\{ \left\lvert {h} \right\rangle\in (L^2(\mathbb{R}))^{\otimes s} : \sum_{i_1, \dots, i_s=0}^\infty \left( \sum_{j=1}^s \omega_j i_j \right)^2 \left \lvert \left\langle { e_{1,i_1} \otimes \dots \otimes e_{s,i_s} }, {h} \right\rangle \right \rvert^2 <\infty \right\},\] we note that the operator \(G^{(s)}\) is self-adjoint on the dense domain \(\bar{\mathcal{D}}(G^{(s)})\) [37].
In order to show that \(G\) and \(G^{(s)}\) are Gibbs observables, it remains to verify that for all \(\beta\in (0,\infty)\), we have \[\exp(-\beta G) \in \mathcal{T}({L^2(\mathbb{R})})\quad \textrm{ and } \quad \exp(-\beta G^{(s)})\in \mathcal{T}({(L^2(\mathbb{R}))^{\otimes s}}),\] which will be proven in the implications 3) and 4) of Lemma 7. An example of a cqq wiretap channel for which \(G\) is a Gibbs observable is given in Section 6.1, and a general account and further examples of energy-constrained quantum channels can be found in [41]. According to the following lemma, superlinear convergence of the distinguishing security level to \(0\) is sufficient to guarantee mutual information security in the finite-dimensional case. In the case that the eavesdropper’s channel has energy constraints described by Gibbs observables of the form (46 ), (48 ), superquadratic convergence is a sufficient condition. Therefore, mutual information security in these cases follows in particular from the exponential bounds on distinguishing security level that we obtain in this work.
Lemma 7. \((\rho_{\mathfrak{E}}^{n})_{n\in \mathbb{N}}\) is mutual information secure if any of the following assumptions holds:
\(\dim \mathcal{H}_\mathfrak{E}= d< \infty\) and \(n\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \rightarrow 0\) as \(n\rightarrow \infty\).
There exists a Gibbs observable \(G\) on \(\mathcal{H}_\mathfrak{E}\) and \(c_1,E\in [0,\infty)\) such that for all \(\beta\in (0,\infty)\), we have \(\log \mathrm{tr}\exp(-\beta G) \leq c_1\beta^{-1}\), and for all \(x\in \mathcal{X}\), we have \(\mathrm{tr}(D_\mathfrak{E}(x)G) \leq E\). Moreover, there exist \(c_2 \in (0, \infty), \alpha\in (2,\infty)\) with \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \leq c_2 n^{-\alpha}\).
There is a Gibbs observable \(G:= \sum_{i=0}^\infty i\left\lvert {e_i} \right\rangle\left\langle {e_i} \right\rvert\) on \(\mathcal{H}_\mathfrak{E}\), where \(\left\lvert {e_i} \right\rangle_{i=0}^\infty\) is an orthonormal basis of \(\mathcal{H}_\mathfrak{E}\). Moreover, there is \(E\in (0,\infty)\) such that for all \(x\in \mathcal{X}\), we have \(\mathrm{tr}(D_\mathfrak{E}(x)G) \leq E\), and there are \(c\in (0, \infty), \alpha\in (2,\infty)\) with \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \leq cn^{-\alpha}\).
We have \(\mathcal{H}_\mathfrak{E}= \mathcal{H}^{\otimes s}\) and a Gibbs observable \(G:= \sum_{j=1}^s\mathbf{1}^{\otimes j-1} \otimes \omega_jG_j\otimes \mathbf{1}^{\otimes s- j}\), where each \(G_j\) is a Gibbs observable on \(\mathcal{H}\) of the same form as in [item:distinguishing-to-information-specialized-harmonic]) and each \(\omega_j\in (0,\infty)\). We assume further that there is \(E\in (0,\infty)\) such that for all \(x\in \mathcal{X}\), we have \(\mathrm{tr}(D_\mathfrak{E}(x)G) \leq E\), and there are \(c\in (0, \infty), \alpha\in (2,\infty)\) with \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \leq cn^{-\alpha}\).
Proof. We first prove the lemma for case [item:distinguishing-to-information-specialized-finite]). In this case, the proof proceeds in parallel to Lemma 4. We fix an arbitrary probability distribution \(P_\mathfrak{M}\) on \(\{1, \dots, M\}\) and define \(\tilde{m}\) as in (35 ). With this, we obtain, for sufficiently large \(n\), \[\begin{align} \chi(P_\mathfrak{M};\rho_{\mathfrak{E}}^{n}) &= H\left(\mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(m)\right) - \mathbb{E}_\mathfrak{M} H\left(\rho_{\mathfrak{E}}^{n}(\mathfrak{M})\right) \\ \overset{(\ref{eq:security-implication-minentropy})}&{\leq} H\left(\mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(m)\right) - H\left(\rho_{\mathfrak{E}}^{n}(\tilde{m})\right) \displaybreak[0] \\ \overset{(a)}&{\leq} \left\lVert { \mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(m) - \rho_{\mathfrak{E}}^{n}(\tilde{m}) } \right\rVert_\mathrm{tr} n\log d + h\left( \frac{ \left\lVert { \mathbb{E}_\mathfrak{M}\rho_{\mathfrak{E}}^{n}(m) - \rho_{\mathfrak{E}}^{n}(\tilde{m}) } \right\rVert_\mathrm{tr} }{ 2 } \right) \\ \overset{(b)}&{\leq} \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) n\log d + h\left( \frac{ \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) }{ 2 } \right), \end{align}\] where (a) is due to the universal bound of [47] (note that \(\dim \mathcal{H}_\mathfrak{E}^{\otimes n} = d^n\)) and (b) holds for \(n\) large enough so that \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \leq 1\). Clearly, this bound vanishes as \(n\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})\) vanishes.
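The vanishing of this bound can be illustrated numerically; in the following sketch, the dimension \(d\) and the decay \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) = n^{-2}\) are made-up choices satisfying \(n\,\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \rightarrow 0\):

```python
import math

def h(t):
    # binary entropy (in nats)
    return 0.0 if t in (0.0, 1.0) else -t * math.log(t) - (1 - t) * math.log(1 - t)

def bound(n, d=4, c=1.0, alpha=2.0):
    # upper bound from the finite-dimensional case: Adv * n * log d + h(Adv / 2),
    # evaluated along the made-up decay Adv = c * n^(-alpha)
    adv = c * n ** -alpha
    return adv * n * math.log(d) + h(adv / 2)

vals = [bound(n) for n in (10, 100, 1000)]
assert vals[0] > vals[1] > vals[2]   # the bound decreases along the sequence
```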
Next, we prove the lemma for case [item:distinguishing-to-information-specialized-gibbs]). We have \[\begin{align} n \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) H_{G}\left( \frac{2E}{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})} \right) \overset{(a)}&{=} n \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \inf_{\beta\in (0,\infty)}\left( \beta \frac{2E}{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})} + \log \mathrm{tr}\exp(-\beta G) \right) \\ \overset{(b)}&{\leq} n \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) \inf_{\beta\in (0,\infty)}\left( \beta \frac{2E}{\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})} + \frac{c_1}{\beta} \right) \\ \overset{(c)}&{\leq} 2E n^{1-\alpha'} + c_1 \mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n}) n^{1+\alpha'} \\ \overset{(d)}&{\leq} 2E n^{1-\alpha'} + c_1 c_2 n^{1+\alpha'-\alpha}, \end{align}\] where (a) uses the variational representation of the function \(H_{G}\) [48], (b) follows from the assumption \(\log \mathrm{tr}\exp(-\beta G) \leq c_1\beta^{-1}\), (c) follows by upper bounding the infimum with the choices \(\beta:=n^{-\alpha'}\) and \(\alpha' \in (1,\alpha-1)\), and in (d) we have substituted the upper bound for \(\mathrm{Adv}_\mathrm{dist}(\rho_{\mathfrak{E}}^{n})\) from the lemma statement. The obtained upper bound vanishes, and thus mutual information security follows from Lemma 6 and the fact that binary entropy vanishes at \(0\).
The remaining two cases are special cases of [item:distinguishing-to-information-specialized-gibbs]). For [item:distinguishing-to-information-specialized-harmonic]), we note that with \(G\) as defined in this case, we have for \(\beta\in (0,\infty)\) \[\begin{align} \nonumber \log\mathrm{tr}\exp(-\beta G) &= \log \sum_{i= 0}^\infty \exp(-\beta i) \\ \nonumber \overset{(a)}&{=} - \log(1-\exp(-\beta)) \\ \nonumber \overset{(b)}&{\leq} - \left( 1 - \frac{1}{1-\exp(-\beta)} \right) \\ \nonumber &= \frac{1}{\exp(\beta)-1} \\ \label{eq:distinguishing-to-information-specialized-harmonic-trace} \overset{(c)}&{\leq} \frac{1}{\beta}, \end{align}\tag{49}\] where (a) is the known convergence behavior of the geometric series, (b) is due to the inequality \(\forall t\in (0,\infty)~\log(t) \geq 1-1/t\), and (c) is due to the inequality \(\forall t\in \mathbb{R}~ \exp(t) \geq t+ 1\). Thus, the assumption of case [item:distinguishing-to-information-specialized-gibbs]) is satisfied.
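The chain of inequalities (49 ) can be spot-checked on a grid of values of \(\beta\):

```python
import math

# grid check of (49): log tr exp(-beta G) = -log(1 - exp(-beta)) <= 1 / beta
for k in range(1, 500):
    beta = 0.02 * k
    assert -math.log(1.0 - math.exp(-beta)) <= 1.0 / beta
```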
For case [item:distinguishing-to-information-specialized-harmonic-multimode]), we fix suitable orthonormal bases \(\left\lvert {e_{1,i}} \right\rangle_{i=0}^\infty, \dots, \left\lvert {e_{s,i}} \right\rangle_{i=0}^\infty\) of \(\mathcal{H}\) such that \[\begin{align} G &= \sum_{j=1}^s \sum_{i_1, \dots, i_s=0}^\infty \omega_j i_j \left\lvert { e_{1,i_1} \otimes \dots \otimes e_{s,i_s} } \right\rangle\left\langle { e_{1,i_1} \otimes \dots \otimes e_{s,i_s} } \right\rvert \\ &= \sum_{i_1, \dots, i_s=0}^\infty\left( \sum_{j=1}^s \omega_j i_j \right) \left\lvert { e_{1,i_1} \otimes \dots \otimes e_{s,i_s} } \right\rangle\left\langle { e_{1,i_1} \otimes \dots \otimes e_{s,i_s} } \right\rvert. \end{align}\] It suffices to show \(\log \mathrm{tr}\exp(-\beta G) \leq \beta^{-1}(\omega_1^{-1} + \dots + \omega_s^{-1})\), which we do by induction on \(s\). For \(s=1\), this reduces to (49 ). For \(s> 1\), we calculate \[\begin{align} \log \mathrm{tr}\exp(-\beta G) &= \log \sum_{i_1, \dots, i_s=0}^\infty \exp\left( -\beta \sum_{j=1}^s \omega_j i_j \right) \\ &= \log \sum_{i_1, \dots, i_s=0}^\infty \exp\left( -\beta \sum_{j=1}^{s-1} \omega_j i_j \right) \exp( -\beta \omega_s i_s ) \displaybreak[0] \\ &= \log \sum_{i_1, \dots, i_{s-1}=0}^\infty \exp\left( -\beta \sum_{j=1}^{s-1} \omega_j i_j \right) + \log \sum_{i_s=0}^\infty \exp( -\beta \omega_s i_s ) \displaybreak[0] \\ \overset{(\ref{eq:distinguishing-to-information-specialized-harmonic-trace})}&{\leq} \frac{1}{\omega_s\beta} + \log \sum_{i_1, \dots, i_{s-1}=0}^\infty \exp\left( -\beta \sum_{j=1}^{s-1} \omega_j i_j \right) \\ \overset{(a)}&{\leq} \beta^{-1}(\omega_1^{-1} + \dots + \omega_s^{-1}), \end{align}\] where the induction hypothesis is used in step (a). ◻
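The multimode bound \(\log \mathrm{tr}\exp(-\beta G^{(s)}) \leq \beta^{-1}(\omega_1^{-1} + \dots + \omega_s^{-1})\) can likewise be checked numerically with truncated partition functions; the frequencies below are made up:

```python
import math

def log_z_mode(beta, omega, n_terms=400):
    # log of the partition function of one mode with eigenvalues omega * i (truncated)
    return math.log(sum(math.exp(-beta * omega * i) for i in range(n_terms)))

omegas = [1.0, 2.5, 0.7]   # made-up mode frequencies omega_1, ..., omega_s
beta = 0.8
lhs = sum(log_z_mode(beta, w) for w in omegas)   # log tr exp(-beta G^{(s)}) factorizes over modes
rhs = sum(1.0 / w for w in omegas) / beta
assert lhs <= rhs
```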
Observing the trivial inequalities \[\mathrm{Adv}_\mathrm{weak}(\rho_{\mathfrak{E}}^{n}) \leq \mathrm{Adv}_\mathrm{str}(\rho_{\mathfrak{E}}^{n}) \leq \mathrm{Adv}_\mathrm{inf}(\rho_{\mathfrak{E}}^{n}),\] the preceding lemmas imply the following equivalences and implications for the asymptotic security notions for sequences of codes, in parallel to the classical case [35]: \[\text{semantically secure} \Leftrightarrow \text{distinguishing secure} \Leftrightarrow \text{mutual information secure} \Rightarrow \text{strongly secure} \Rightarrow \text{weakly secure}.\] It should be noted that Lemma 7 shows \[\text{distinguishing secure} \Rightarrow \text{mutual information secure}\] only under additional assumptions which capture, however, some practically important cases. But even in the general case, both mutual information secure and distinguishing secure sequences of codes are also semantically secure according to Definition 2.
Figure 2: Point-to-point cq channel models considered in Section 4. (a) Generic cq channel model for channel resolvability. (b) Generic cq channel model for transmission of messages.
Important prerequisites for proving the achievability of rates for the wiretap channel are achievability results for the channel resolvability problem, depicted in Fig. 2 (a), and for the channel coding problem, depicted in Fig. 2 (b).
Here, we consider just a point-to-point cq channel described by a measurable map \(D: \mathcal{X}\rightarrow \mathcal{S}({\mathcal{H}})\) with some separable Hilbert space \(\mathcal{H}\). We focus on two different problems: In channel resolvability, the question is which transmit strategies (i.e., rules for generating \(X^n\)) the transmitter can employ to approximate the output state \(D_P^{\otimes n}\) at the receiver by \(D^n(X^n)\). In channel coding, the task is to encode a given message for transmission through the channel in such a way that the receiver can decode the message with high probability. In the proof of Theorems 1 and 2, the security criterion can then be shown by applying the resolvability result to the channel \(D= D_\mathfrak{E}\), and for the case of a quantum output at the legitimate receiver, the average decoding error criterion can be shown by applying the channel coding result to the channel \(D= D_\mathfrak{B}\) (for the case of classical output at \(\mathfrak{B}\), we cite a classical channel coding result which is used instead).
In the solution to both of these problems, we use the same standard random codebook construction: The random codebook \(\mathcal{C}\) of block length \(n\) and size \(M\) is of the form \(\mathcal{C}:= (\mathcal{C}(m))_{m=1}^M\), where the codewords \(\mathcal{C}(m)\) are vectors of length \(n\) with entries from \(\mathcal{X}\), and all entries across all codewords are i.i.d. with distribution \(P\).
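A minimal sketch of this codebook construction (the alphabet and the distribution below are made-up examples):

```python
import random

def sample_codebook(M, n, alphabet, pmf, seed=0):
    # codebook C = (C(1), ..., C(M)); all n * M entries are i.i.d. with distribution P
    rng = random.Random(seed)
    return [tuple(rng.choices(alphabet, weights=pmf, k=n)) for _ in range(M)]

codebook = sample_codebook(M=4, n=6, alphabet=[0, 1], pmf=[0.3, 0.7])
assert len(codebook) == 4 and all(len(c) == 6 for c in codebook)
```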
Both theorems stated in this section use the technical assumption that there is some \(\alpha_{\min} \in (0,1)\) such that: \[\label{eq:technical-assumptions} \begin{align} &D_P^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}}), \\ &\text{for P-almost all } x:~ D(x)^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}}), \\ &\text{the Bochner integral } \mathbb{E}(D(X)^{\alpha_{\min}}) \in \mathcal{T}({\mathcal{H}}) \text{ exists}. \end{align}\tag{50}\] It is an immediate consequence of Lemmas 20 and 21 that \(\chi(P;D) < \infty\) whenever (50 ) holds, and we will implicitly use this fact in the following.
For channel resolvability, each fixed realization of \(\mathcal{C}\) defines an induced channel output state \[\label{eq:codebook-channel-output-definition} D_\mathcal{C} := \frac{1}{M} \sum\limits_{m=1}^M D^n(\mathcal{C}(m))\tag{51}\] which results if the transmitter chooses a codeword uniformly at random for transmission through the channel. With these definitions, we now have the necessary terminology to state our resolvability and coding results. The proofs are deferred to Sections 5.2 and 5.3, and in Sections 5.4 and 5.5, we analyze the concentration behavior of the errors and extend both results to the case of cost-constrained channel inputs.
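For a toy channel whose output states commute (so that states are probability vectors and the trace norm reduces to an \(L_1\) distance), the approximation of \(D_P^{\otimes n}\) by \(D_\mathcal{C}\) can be simulated directly; all parameters below are made up:

```python
import random

# commuting toy example: every D(x) is diagonal in a fixed basis, so states are
# pmfs on Y = {0, 1} and the trace norm is an L1 distance (values are made up)
D = {0: [0.8, 0.2], 1: [0.3, 0.7]}
P = [0.5, 0.5]                     # input distribution on X = {0, 1}
n, M = 2, 2000
rng = random.Random(1)

def out_pmf(xn):
    # product output state D^n(x^n), flattened over Y^n
    return [D[xn[0]][a] * D[xn[1]][b] for a in range(2) for b in range(2)]

d_p = [0.5 * D[0][y] + 0.5 * D[1][y] for y in range(2)]         # D_P
target = [d_p[a] * d_p[b] for a in range(2) for b in range(2)]  # D_P tensor n

codebook = [tuple(rng.choices([0, 1], weights=P, k=n)) for _ in range(M)]
d_c = [0.0] * 4
for cw in codebook:                # induced output D_C as in (51)
    for i, v in enumerate(out_pmf(cw)):
        d_c[i] += v / M

tv = sum(abs(a - b) for a, b in zip(d_c, target))
assert tv < 0.2                    # D_C approximates the target output
```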
Theorem 3. Let \(R> \chi(P;D)\), and suppose \(M\geq \exp(nR)\). Moreover, assume that (50 ) holds. Then, for all \(n\in \mathbb{N}\), we have \[\label{eq:q-resolvability-all-blocklengths} \mathbb{E}_\mathcal{C} \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \leq \mathcal{W}_\mathrm{res}(R,n),\qquad{(5)}\] with \(\mathcal{W}_\mathrm{res}\) defined in 17 . Furthermore, \(\mathcal{W}_\mathrm{res}(R,n)\) tends to \(0\) exponentially as \(n\) tends to \(\infty\). That is, there is \(\gamma\in (0,\infty)\) such that for sufficiently large \(n\in \mathbb{N}\), we have \[\label{eq:q-resolvability-large-blocklengths} \mathcal{W}_\mathrm{res}(R,n) \leq \exp(-n\gamma).\qquad{(6)}\]
For channel coding, given a fixed codebook \(\mathcal{C}\), the transmitter chooses the codeword \(\mathcal{C}(\mathfrak{M})\) to encode a message \(\mathfrak{M}\).
Theorem 4. Let \(R< \chi(P;D)\), and suppose \(M\leq \exp(nR)\). Moreover, assume that (50 ) holds. Then, for each \(\mathcal{C}\), there is a decoding povm \((Y_m)_{m=1}^M\) such that every \(Y_m\) is measurable as a function of \(\mathcal{C}\), and for all \(n\in \mathbb{N}\), we have \[\label{eq:decoding-error-all-blocklengths} \mathbb{E}_\mathcal{C}\left( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D(\mathcal{C}(m)) (\mathbf{1}- Y_m) \right) \right) \leq \mathcal{W}_\mathrm{coding}(R,n),\qquad{(7)}\] with \(\mathcal{W}_\mathrm{coding}\) defined in 16 . Furthermore, \(\mathcal{W}_\mathrm{coding}(R,n)\) tends to \(0\) exponentially. That is, there is \(\gamma\in (0,\infty)\) such that for sufficiently large \(n\in \mathbb{N}\), we have \[\label{eq:decoding-error-large-blocklengths} \mathcal{W}_\mathrm{coding}(R,n) \leq \exp(-n\gamma).\qquad{(8)}\]
In this section, we prove the theorems which are stated in the preceding sections. In Section 5.1, we introduce notions of typicality and state two lemmas around typicality that will be needed both for the resolvability and coding results. We then proceed to proving Theorem 3 on resolvability in Section 5.2 and Theorem 4 on coding in Section 5.3. In Section 5.4, we show that the error terms are very tightly concentrated around their expectations. Extensions that incorporate an additive input cost constraint can be found in Section 5.5. Finally, everything is put together in Section 5.6 where Theorems [theorem:wiretap-ccq-all-blocklengths], 1, [theorem:wiretap-cq-all-blocklengths], and 2 on coding for the wiretap channel are proved. The proofs of technical lemmas are relegated to Appendix 6.5.
In this section, we introduce typicality notions and related technical lemmas that are used in the proofs of Theorems 3 and 4.
For \(n\)-fold uses of the channel \(D\), we note that via the definitions \[\begin{align} e_{y^n| x^n} &:= \bigotimes\limits_{i=1}^n e_{y_i| x_i} \\ e_{y^n} &:= \bigotimes\limits_{i=1}^n e_{y_i} \\ P(x^n) &:= \prod\limits_{i=1}^n P(x_i) \\ P(y^n| x^n) &:= \prod\limits_{i=1}^n P(y_i| x_i) \\ U(y^n) &:= \prod\limits_{i=1}^n U(y_i), \end{align}\] we identify \(P\) and \(U\) with corresponding distributions on \(\mathcal{X}^n\times \mathbb{N}^n\) (and the resulting marginals) and \(\mathbb{N}^n\), and we have \[\begin{align} D^n(x^n) &= \sum_{y^n\in \mathbb{N}^n} P(y^n| x^n) \left\lvert {e_{y^n| x^n}} \right\rangle\left\langle {e_{y^n| x^n}} \right\rvert, \\ D_P^{\otimes n} &= \sum_{y^n\in \mathbb{N}^n} U(y^n) \left\lvert {e_{y^n}} \right\rangle\left\langle {e_{y^n}} \right\rvert. \end{align}\]
For any \(\varepsilon\in (0, \infty)\), we define \[\begin{align} \mathcal{P}_{\varepsilon,n} &:= \left\{ ( x^n, y^n ) \in \mathcal{X}^n \times \mathbb{N}^n :~ n(H_P- \varepsilon) < -\log P( y^n ~|~ x^n ) < n(H_P+ \varepsilon) \right\} \\ \mathcal{U}_{\varepsilon,n} &:= \left\{ y^n \in \mathbb{N}^n :~ n(H_U- \varepsilon) < -\log U( y^n ) < n(H_U+ \varepsilon) \right\} \end{align}\] and, based on these definitions, \[\begin{align} \label{eq:spectral-decomposition-joint-typicality-operator} \Psi_{\varepsilon,n} (x^n) &:= \sum\limits_{y^n\in \mathbb{N}^n} \mathbb{1}_{\mathcal{P}_{\varepsilon,n}} ( x^n, y^n ) \left\lvert {e_{ y^n | x^n }} \right\rangle\left\langle {e_{ y^n | x^n }} \right\rvert \displaybreak[0] \\ \nonumber \Theta_{\varepsilon,n} &:= \sum\limits_{y^n\in \mathbb{N}^n} \mathbb{1}_{\mathcal{U}_{\varepsilon,n}} (y^n) \left\lvert {e_{y^n}} \right\rangle\left\langle {e_{y^n}} \right\rvert. \end{align}\tag{52}\] While \(\Theta_{\varepsilon,n}\) is a fixed operator, \(\Psi_{\varepsilon,n}\) is an operator-valued function and its measurability is not immediately clear from the definition. Therefore, we note that it can also be written as \[\Psi_{\varepsilon,n} (x^n) = \mathbb{1}_{( \exp(-n(H_P+ \varepsilon)), \exp(-n(H_P- \varepsilon)) )} \big( D^n(x^n) \big)\] and is therefore a measurable map \(\mathcal{X}^n\rightarrow \mathcal{T}({\mathcal{H}^{\otimes n}})\) by Lemma 18-[item:spectral-decompositions-measurable-indicators]. We also note the following basic properties of these projections which are straightforward to verify from their definitions: \[\label{eq:typicality-operators-basic-properties} \Theta_{\varepsilon,n}^2 = \Theta_{\varepsilon,n} , ~~ \Psi_{\varepsilon,n} (x^n)^2 = \Psi_{\varepsilon,n} (x^n) , ~~ D^n(x^n) \Psi_{\varepsilon,n} (x^n) = \Psi_{\varepsilon,n} (x^n) D^n(x^n).\tag{53}\]
Finally, define \[\begin{align} \tag{54} \Gamma_{\varepsilon,n}:~ \mathcal{X}^n\rightarrow \mathcal{T}({\mathcal{H}^{\otimes n}}),~ x^n &\mapsto \Theta_{\varepsilon,n} \Psi_{\varepsilon,n} (x^n) \\ \tag{55} \Phi_{\varepsilon,n}:~ \mathcal{X}^n\rightarrow \mathcal{T}({\mathcal{H}^{\otimes n}}),~ x^n &\mapsto \Theta_{\varepsilon,n} \Psi_{\varepsilon,n}(x^n) \Theta_{\varepsilon,n}. \end{align}\] which clearly are also measurable.
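The classical core of these definitions is the entropy-typical set; \(\Theta_{\varepsilon,n}\) simply projects onto the eigenvectors indexed by \(\mathcal{U}_{\varepsilon,n}\). A small numerical sketch (the distribution \(U\), \(n\), and \(\varepsilon\) below are toy assumptions) shows that the typical set already carries most of the probability mass:

```python
import numpy as np
from itertools import product

# Entropy-typical set U_{eps,n} for a toy i.i.d. output distribution U.
U = np.array([0.6, 0.4])
H_U = -np.sum(U * np.log(U))             # entropy in nats, matching the paper's exp/log

def is_typical(yn, eps):
    n = len(yn)
    logp = sum(np.log(U[y]) for y in yn)
    return n * (H_U - eps) < -logp < n * (H_U + eps)

n, eps = 12, 0.1
mass = sum(np.prod([U[y] for y in yn])
           for yn in product(range(len(U)), repeat=n) if is_typical(yn, eps))
print(mass > 0.8)    # True: the typical set captures most of the mass
```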
These notions of typicality will allow us to split the error terms that appear in the proofs of Theorems 3 and 4 into a typical and an atypical part. These parts can be bounded separately with the help of the next two lemmas, which are fairly well-known facts in quantum information theory. For this reason, we omit the proofs here, but include them in Appendix 6.5 for the sake of self-containedness of the paper.
Lemma 8 (Bound for typical terms). We have
\(\forall x^n\in \mathcal{X}^n~~ \Psi_{\varepsilon,n} (x^n) D^n (x^n) \Psi_{\varepsilon,n} (x^n) \leq \exp\big( - n (H_P- \varepsilon) \big) \Psi_{\varepsilon,n} (x^n)\)
\(\forall x^n\in \mathcal{X}^n~~ \mathrm{tr}\Psi_{\varepsilon,n} (x^n) < \exp\big( n (H_P+ \varepsilon) \big)\)
\(\Theta_{\varepsilon,n} D_P^{\otimes n} \Theta_{\varepsilon,n} \leq \exp\big( - n (H_U- \varepsilon) \big) \Theta_{\varepsilon,n}\)
\(\mathrm{tr}\Theta_{\varepsilon,n} < \exp\big( n (H_U+ \varepsilon) \big)\).
Lemma 9 (Bound for atypical terms). For all \(n\in \mathbb{N}, \alpha_1, \alpha_3 \in (1,\infty), \alpha_2, \alpha_4 \in [\alpha_{\min},1)\), we have \[\begin{align} \tag{56} \mathbb{E}\mathrm{tr}\left( D^n(X^n) \Gamma_{\varepsilon,n} (X^n) ^* \right) &\geq 1 - \mathcal{R}_1(\varepsilon, n) - \mathcal{R}_2(\varepsilon, n) - \mathcal{R}_3(\varepsilon, n) - \mathcal{R}_4(\varepsilon, n) \\ \tag{57} \mathbb{E}\mathrm{tr}\left( D^n(X^n) \Phi_{\varepsilon,n} (X^n) \right) &\geq 1 - \mathcal{R}_1(\varepsilon, n) - \mathcal{R}_2(\varepsilon, n) - 2\mathcal{R}_3(\varepsilon, n) - 2\mathcal{R}_4(\varepsilon, n), \end{align}\] where \(\Gamma_{\varepsilon,n}\) is defined in (54 ), \(\Phi_{\varepsilon,n}\) is defined in (55 ), and \(\mathcal{R}_1\), \(\mathcal{R}_2\), \(\mathcal{R}_3\), and \(\mathcal{R}_4\) are defined in 12 to 15 . Furthermore, the lower bounds 56 and 57 tend to \(1\) exponentially as \(n\) tends to \(\infty\). That is, there is \(\gamma_1 \in (0,\infty)\) such that for sufficiently large \(n\), we have \[\label{eq:atypicalTermsFunc-bounds} \mathcal{R}_1(\varepsilon, n), \mathcal{R}_2(\varepsilon, n), \mathcal{R}_3(\varepsilon, n), \mathcal{R}_4(\varepsilon, n) < \exp(-\gamma_1 n).\tag{58}\]
We start with a preliminary lemma which is used in the proof of Theorem 3 and encapsulates a symmetrization argument similar to the one used in [50].
Lemma 10. Let \((\mathcal{X}, \Sigma)\) be a measurable space and \(T: \mathcal{X}\rightarrow \mathcal{T}({\mathcal{H}})\) a measurable map. Let \(P\) be a probability measure on \(\mathcal{X}\), and \(X:= (X_1, \dots, X_\ell)\) be a tuple of \(\mathcal{X}\)-valued random variables i.i.d. according to \(P\). Let \[T_X := \frac{1}{\ell} \sum\limits_{i=1}^\ell T(X_i).\] Then, we have \[\begin{align} \label{eq:symmetrization-first-alternative} \mathbb{E}\left\lVert { T_X - \mathbb{E} T_X } \right\rVert_\mathrm{tr} &\leq 2 \mathrm{tr}\sqrt{ \frac{1}{\ell} \mathbb{E}\big( T(X_1)^* T(X_1) \big) }, \\ \label{eq:symmetrization-second-alternative} \mathbb{E}\left\lVert { T_X - \mathbb{E} T_X } \right\rVert_\mathrm{tr} &\leq 2 \mathrm{tr}\sqrt{ \frac{1}{\ell} \mathbb{E}\big( T(X_1) T(X_1)^* \big) }. \end{align}\]
Proof. Let \(\hat{X}_1, \dots, \hat{X}_\ell\) be independent copies of \(X_1, \dots, X_\ell\), and let \(\mathcal{E}_1, \dots, \mathcal{E}_\ell\) be i.i.d., uniformly distributed on \(\{-1,1\}\) and independent of all other random variables. The following derivations are adapted from the proof of [50]. \[\begin{align} \mathbb{E}\left\lVert { T_X - \mathbb{E} T_X } \right\rVert_\mathrm{tr} &= \mathbb{E}_X \left\lVert { \mathbb{E}_{\hat{X}} \sum\limits_{i=1}^\ell \frac{1}{\ell} \left( T(X_i) - T(\hat{X}_i) \right) } \right\rVert_\mathrm{tr} \\ \overset{(a)}&{\leq} \mathbb{E}_{X,\hat{X}} \left\lVert { \sum\limits_{i=1}^\ell \frac{1}{\ell} \left( T(X_i) - T(\hat{X}_i) \right) } \right\rVert_\mathrm{tr} \displaybreak[0] \\ \overset{(b)}&{=} \mathbb{E}_{X,\hat{X},\mathcal{E}} \left\lVert { \sum\limits_{i=1}^\ell \frac{1}{\ell} \mathcal{E}_i \left( T(X_i) - T(\hat{X}_i) \right) } \right\rVert_\mathrm{tr} \displaybreak[0] \\ &\leq \mathbb{E}_{X,\mathcal{E}} \left\lVert { \sum\limits_{i=1}^\ell \frac{1}{\ell} \mathcal{E}_i T(X_i) } \right\rVert_\mathrm{tr} + \mathbb{E}_{\hat{X},\mathcal{E}} \left\lVert { \sum\limits_{i=1}^\ell \frac{1}{\ell} \mathcal{E}_i T(\hat{X}_i) } \right\rVert_\mathrm{tr} \displaybreak[0] \\ &= 2 \mathbb{E}_{X,\mathcal{E}} \left\lVert { \sum\limits_{i=1}^\ell \frac{1}{\ell} \mathcal{E}_i T(X_i) } \right\rVert_\mathrm{tr} \displaybreak[0] \\ &= 2 \mathbb{E}_{X,\mathcal{E}} \mathrm{tr} \sqrt{ \left( \sum\limits_{i=1}^\ell \frac{1}{\ell} \mathcal{E}_i T(X_i) \right) ^* \left( \sum\limits_{i=1}^\ell \frac{1}{\ell} \mathcal{E}_i T(X_i) \right) } \displaybreak[0] \\ \overset{(c)}&{\leq} 2 \mathrm{tr} \sqrt{ \mathbb{E}_{X,\mathcal{E}} \left( \left( \sum\limits_{i=1}^\ell \frac{1}{\ell} \mathcal{E}_i T(X_i) \right) ^* \left( \sum\limits_{i=1}^\ell \frac{1}{\ell} \mathcal{E}_i T(X_i) \right) \right) } \displaybreak[0] \\ &= 2 \mathrm{tr} \sqrt{ \frac{1}{\ell^2} \mathbb{E}_X \sum\limits_{i_1,i_2=1}^\ell \mathbb{E}_\mathcal{E} ( \mathcal{E}_{i_1} \mathcal{E}_{i_2} ) T(X_{i_1}) ^* T(X_{i_2}) } \displaybreak[0] \\ 
&\overset{(d)}{=} 2 \mathrm{tr} \sqrt{ \frac{1}{\ell^2} \mathbb{E}_X \sum\limits_{i=1}^\ell T(X_{i}) ^* T(X_{i}) } \\ &= 2 \mathrm{tr} \sqrt{ \frac{1}{\ell} \mathbb{E}_{X_1} T(X_{1}) ^* T(X_{1}) }. \end{align}\] Step (a) follows by Lemma 19-[item:bochner-integral-basics-trace-norm-jensen]. For step (b), we observe that the equality holds conditioned on any realization of \(\mathcal{E}_1, \dots, \mathcal{E}_\ell\) since \(X_i\) and \(\hat{X}_i\) are identically distributed and therefore can be swapped if \(\mathcal{E}_i= -1\). Inequality (c) is by Lemma 19-[item:bochner-integral-basics-square-root-jensen]. Finally, equality (d) holds because \(\mathbb{E}_\mathcal{E} ( \mathcal{E}_{i_1} \mathcal{E}_{i_2} )\) equals \(1\) if \(i_1=i_2\) and \(0\) otherwise. This concludes the proof of the first inequality of the lemma. The second inequality follows similarly, using \(\left\lVert {A} \right\rVert_\mathrm{tr} = \left\lVert {A^*} \right\rVert_\mathrm{tr}\) before expanding the trace norm. ◻
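The two bounds of Lemma 10 can be sanity-checked numerically in finite dimensions. In the sketch below, \(T\) maps a three-letter alphabet to fixed projectors (an illustrative assumption), the left-hand side is estimated by Monte Carlo, and the right-hand side is evaluated exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy T: letters {0,1,2} -> fixed rank-one projectors (illustrative assumption).
vs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0]) / np.sqrt(2)]
T = [np.outer(v, v) for v in vs]
P = np.array([0.3, 0.3, 0.4])
mean_T = sum(p * t for p, t in zip(P, T))       # E T(X_1), computed exactly

def trace_norm(A):
    return np.abs(np.linalg.eigvalsh(A)).sum()  # valid for Hermitian A

ell = 50
# Monte Carlo estimate of E || T_X - E T_X ||_tr over 2000 random tuples.
lhs = np.mean([trace_norm(np.mean([T[x] for x in rng.choice(3, ell, p=P)], axis=0)
                          - mean_T)
               for _ in range(2000)])
# Right-hand side 2 tr sqrt( (1/ell) E[T(X_1)* T(X_1)] ), evaluated exactly.
second_moment = sum(p * t @ t for p, t in zip(P, T))
rhs = 2 * np.sqrt(np.clip(np.linalg.eigvalsh(second_moment / ell), 0, None)).sum()
print(lhs <= rhs)    # True
```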
Remark 4. Reference [50] (from whose proof the first part of the calculation above is adapted) bounds the absolute deviation of an empirical average from its expectation in terms of the Rademacher complexity of a suitably defined class of functions. Indeed, via the trace norm duality stated in Lemma 15-[item:norm-basics-trace-norm-duality], the term \[\mathbb{E}_{X,\mathcal{E}} \left\lVert { \sum\limits_{i=1}^\ell \frac{1}{\ell} \mathcal{E}_i T(X_i) } \right\rVert_\mathrm{tr}\] which appears in the calculation can be argued to be equal to the Rademacher complexity of the function class \[\{ \mathcal{X} \ni x \mapsto \mathrm{tr}\left( A T(x) \right) \in \mathbb{C} ~:~ A\in \mathcal{B}({\mathcal{H}}),~ \left\lVert {A} \right\rVert_\mathrm{op} \leq 1 \}.\]
Proof of Theorem 3. In this proof, we use the definitions of Section 5.1. We multiply with identities and use the triangle inequality to bound \[\begin{align} \nonumber &\hphantom{{}={}} \mathbb{E}_\mathcal{C} \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \\ \nonumber &= \mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\Big( D^n(\mathcal{C}(m)) - \mathbb{E}_\mathcal{C} D^n(\mathcal{C}(m)) \Big) } \right\rVert_\mathrm{tr} \displaybreak[0] \\ \nonumber &= \mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\bigg( \big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) + \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) - \mathbb{E}_\mathcal{C}\Big( \big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) + \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \Big) \bigg) } \right\rVert_\mathrm{tr} \displaybreak[0] \\ \nonumber &\leq \begin{aligned}[t] &\mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\Big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) - \mathbb{E}_\mathcal{C}\big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) \big) } \right\rVert_\mathrm{tr} \\ &+ \mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\bigg( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) - \mathbb{E}_\mathcal{C}\Big( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \Big) \bigg) } \right\rVert_\mathrm{tr} \end{aligned} \displaybreak[0] \\ \nonumber &= \begin{align}[t] &\mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\Big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) - \mathbb{E}_\mathcal{C}\big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) \big) } \right\rVert_\mathrm{tr} \\ &+ \begin{multlined}[t] \mathbb{E}_\mathcal{C} \Bigg\lVert \frac{1}{M} \sum\limits_{m=1}^M\bigg( \big( \mathbf{1} - 
\Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) + \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big)^* \\- \mathbb{E}_\mathcal{C}\Big( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) + \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big)^* \Big) \bigg) \Bigg\rVert_\mathrm{tr} \end{multlined} \end{align} \displaybreak[0] \\ \label{eq:q-resolvability-split} &\begin{align}[c] {}\leq{} &\mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\Big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) - \mathbb{E}_\mathcal{C}\big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) \big) } \right\rVert_\mathrm{tr} \\ &+ \mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\Big( D^n(\mathcal{C}(m)) \Gamma_{\varepsilon, n}(\mathcal{C}(m))^* - \mathbb{E}_\mathcal{C}\big( D^n(\mathcal{C}(m)) \Gamma_{\varepsilon, n}(\mathcal{C}(m))^* \big) \Big) } \right\rVert_\mathrm{tr} \\ &+ \mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\bigg( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) \Gamma_{\varepsilon, n}(\mathcal{C}(m))^* - \mathbb{E}_\mathcal{C}\Big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) \Gamma_{\varepsilon, n}(\mathcal{C}(m))^* \Big) \bigg) } \right\rVert_\mathrm{tr} \\ &+ \begin{multlined}[t] \mathbb{E}_\mathcal{C} \Bigg\lVert \frac{1}{M} \sum\limits_{m=1}^M\bigg( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big)^*\\ - \mathbb{E}_\mathcal{C}\Big( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big)^* \Big) \bigg) \Bigg\rVert_\mathrm{tr} \end{multlined} \end{align} \end{align}\tag{59}\] We next bound the summands in 
(59 ) separately. For the last summand, we calculate \[\begin{align} &\hphantom{{}={}} \begin{multlined}[t] \mathbb{E}_\mathcal{C} \Bigg\lVert \frac{1}{M} \sum\limits_{m=1}^M\bigg( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big)^*\\ - \mathbb{E}_\mathcal{C}\Big( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big)^* \Big) \bigg) \Bigg\rVert_\mathrm{tr} \end{multlined} \\ &\overset{(a)}{\leq} \begin{aligned}[t] &\mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\Big( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big)^* \Big) } \right\rVert_\mathrm{tr} \\ &+ \left\lVert { \frac{1}{M} \mathbb{E}_\mathcal{C} \sum\limits_{m=1}^M\Big( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big)^* \Big) } \right\rVert_\mathrm{tr} \end{aligned} \\ &\overset{(b)}{=} 2\frac{1}{M} \sum\limits_{m=1}^M \mathbb{E}_\mathcal{C}\mathrm{tr}{\Big( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big) D^n(\mathcal{C}(m)) \big( \mathbf{1} - \Gamma_{\varepsilon, n} (\mathcal{C}(m)) \big)^* \Big)} \\ &= 2 \mathbb{E}_{X^n}\mathrm{tr}{\Big( \big( \mathbf{1} - \Gamma_{\varepsilon, n} (X^n) \big) D^n(X^n) \big( \mathbf{1} - \Gamma_{\varepsilon, n} (X^n) \big)^* \Big)} \\ &= 2\mathbb{E}_{X^n}\left( 1 - \mathrm{tr}{\Big( \Theta_{\varepsilon,n} \Psi_{\varepsilon,n} (X^n) D^n(X^n) \Big)} - \mathrm{tr}{\Big( D^n(X^n) \Gamma_{\varepsilon, n}(X^n)^* \Big)} + \mathrm{tr}{\Big( \Theta_{\varepsilon,n} \Psi_{\varepsilon,n} (X^n) D^n(X^n) \Psi_{\varepsilon,n} (X^n) \Theta_{\varepsilon,n} \Big)} \right) \\ &\overset{(c)}{=} 2\left( 1 - \mathbb{E}_{X^n}\Big( \mathrm{tr}{\big( D^n(X^n) 
\Gamma_{\varepsilon, n}(X^n)^* \big)} \Big) \right) \\ &\overset{(d)}{\leq} 2\mathcal{R}_1(\varepsilon, n) + 2\mathcal{R}_2(\varepsilon, n) + 2\mathcal{R}_3(\varepsilon, n) + 2\mathcal{R}_4(\varepsilon, n). \end{align}\] (a) is due to the triangle inequality, (b) uses Lemma 19-[item:bochner-integral-basics-bounded-functional] and the fact that trace and trace norm are equal for positive semi-definite operators, (c) is by cyclic permutations inside the trace and using (53 ). Finally, (d) is due to Lemma 9.
For the first summand in (59 ), we apply the second inequality of Lemma 10 with \(T: \mathcal{X}^n \rightarrow \mathcal{T}({\mathcal{H}}), x^n \mapsto \Gamma_{\varepsilon, n} (x^n) D^n(x^n)\) (which is measurable by Lemma 16), \(\ell= M\) and \(X_m= \mathcal{C}(m)\). This yields \[\begin{align} &\hphantom{{}={}} \mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\Big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) - \mathbb{E}_\mathcal{C}\big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) \big) \Big) } \right\rVert_\mathrm{tr} \\ &\leq 2\mathrm{tr}{ \sqrt{ \frac{1}{M} \mathbb{E}_{X^n}\big( \Gamma_{\varepsilon, n} (X^n) D^n(X^n)^2 \Gamma_{\varepsilon, n} (X^n) ^* \big) } } \\ \overset{(\ref{eq:typicality-operators-basic-properties})}&{=} 2\mathrm{tr}{ \sqrt{ \frac{1}{M} \mathbb{E}_{X^n}\big( \Theta_{\varepsilon, n} D^n (X^n)^{\frac{1}{2}} \Psi_{\varepsilon, n} (X^n) D^n(X^n) \Psi_{\varepsilon, n} (X^n) D^n (X^n)^{\frac{1}{2}} \Theta_{\varepsilon, n} \big) } }. \end{align}\] For the second summand in (59 ), we use the first inequality of Lemma 10 with \(T: \mathcal{X}^n \rightarrow \mathcal{T}({\mathcal{H}}), x^n \mapsto D^n(x^n) \Gamma_{\varepsilon, n} (x^n)^*\) (which is again measurable by Lemma 16) and obtain the exact same upper bound.
For the third summand in (59 ), we use Lemma 10 one more time with \(T: \mathcal{X}^n \rightarrow \mathcal{T}({\mathcal{H}}), x^n \mapsto \Gamma_{\varepsilon, n} (x^n) D^n(x^n) \Gamma_{\varepsilon, n} (x^n)^*\) (measurability is again by Lemma 16; this time it does not matter which alternative we use because \(T\) has self-adjoint values) and obtain \[\begin{align} &\hphantom{{}={}} \mathbb{E}_\mathcal{C} \left\lVert { \frac{1}{M} \sum\limits_{m=1}^M\Big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) \Gamma_{\varepsilon, n}(\mathcal{C}(m))^* - \mathbb{E}_\mathcal{C}\big( \Gamma_{\varepsilon, n} (\mathcal{C}(m)) D^n(\mathcal{C}(m)) \Gamma_{\varepsilon, n}(\mathcal{C}(m))^* \big) \Big) } \right\rVert_\mathrm{tr} \\ &\leq 2\mathrm{tr}\sqrt{ \frac{1}{M} \mathbb{E}_{X^n}\left( \Gamma_{\varepsilon, n} (X^n) D^n(X^n) \Gamma_{\varepsilon, n} (X^n) ^* \Gamma_{\varepsilon, n} (X^n) D^n(X^n) \Gamma_{\varepsilon, n} (X^n) ^* \right) } \\ \overset{(\ref{eq:typicality-operators-basic-properties})}&{=} 2\mathrm{tr}{ \sqrt{ \frac{1}{M} \mathbb{E}_{X^n}\big( \Theta_{\varepsilon, n} D^n (X^n)^{\frac{1}{2}} \Psi_{\varepsilon, n} (X^n) D^n (X^n)^{\frac{1}{2}} \Psi_{\varepsilon, n} (X^n) \Theta_{\varepsilon, n} \Psi_{\varepsilon, n} (X^n) D^n (X^n)^{\frac{1}{2}} \Psi_{\varepsilon, n} (X^n) D^n (X^n)^{\frac{1}{2}} \Theta_{\varepsilon, n} \big) } } \\ \overset{(a)}&{\leq} 2\mathrm{tr}{ \sqrt{ \frac{1}{M} \mathbb{E}_{X^n}\big( \Theta_{\varepsilon, n} D^n (X^n)^{\frac{1}{2}} \Psi_{\varepsilon, n} (X^n) D^n(X^n) \Psi_{\varepsilon, n} (X^n) D^n (X^n)^{\frac{1}{2}} \Theta_{\varepsilon, n} \big) } }. \end{align}\] (a) is by \(\Theta_{\varepsilon,n} \leq \mathbf{1}\) and once more applying (53 ). We have obtained matching upper bounds in all three cases and can conclude our calculation with a successive application of the sub-items of Lemma 8 in conjunction with the operator monotonicity of the square root [51]. 
The number of the subitem of Lemma 8 being applied is indicated above the corresponding inequality sign. \[\begin{align} &\hphantom{{}={}} \mathrm{tr}{ \sqrt{ \frac{1}{M} \mathbb{E}_{X^n}\big( \Theta_{\varepsilon, n} D^n (X^n)^{\frac{1}{2}} \Psi_{\varepsilon, n} (X^n) D^n(X^n) \Psi_{\varepsilon, n} (X^n) D^n (X^n)^{\frac{1}{2}} \Theta_{\varepsilon, n} \big) } } \\ &\overset{\ref{item:q-resolvability-typical-terms-joint})}{\leq} \mathrm{tr}{ \sqrt{ \frac{1}{M} \exp\big( - n (H_P- \varepsilon) \big) \mathbb{E}_{X^n}\big( \Theta_{\varepsilon, n} D^n (X^n) \Theta_{\varepsilon, n} \big) } } \displaybreak[0] \\ &= \mathrm{tr}{ \sqrt{ \frac{1}{M} \exp\big( - n (H_P- \varepsilon) \big) \Theta_{\varepsilon, n} D_P^{\otimes n} \Theta_{\varepsilon, n} } } \displaybreak[0] \\ &\overset{\ref{item:q-resolvability-typical-terms-output})}{\leq} \mathrm{tr}{ \sqrt{ \frac{1}{M} \exp\big( - n (H_P+ H_U- 2\varepsilon) \big) \Theta_{\varepsilon, n} } } \displaybreak[0] \\ &\overset{(a)}{\leq} \exp\left( - \frac{1}{2} n (H_P+ H_U+ R- 2\varepsilon) \right) \mathrm{tr}{ \Theta_{\varepsilon, n} } \displaybreak[0] \\ &\overset{\ref{item:q-resolvability-typical-terms-output-trace})}{\leq} \exp\left( - \frac{1}{2} n (H_P- H_U+ R- 4\varepsilon) \right) \\ \overset{eq. (\ref{eq:quantum-information})}&{=} \exp\left( - \frac{1}{2} n (R- \chi(P;D)- 4\varepsilon) \right). \end{align}\] Step (a) is due to the assumption \(M\geq \exp(nR)\) and (53 ). The bound (5 ) now follows from 59 and the upper bounds for the four summands on its right-hand side which we have calculated. To prove (6 ), we choose \(\varepsilon\in (0,(R-\chi(P;D))/4)\) and invoke Lemma 9 to fix \(\gamma_1\) which satisfies 58 for this choice of \(\varepsilon\). The infimum in 17 is clearly upper bounded by the realization for our choice of \(\varepsilon\), so we have argued that (6 ) holds for any choice \[\gamma \in \left( 0, \min\left( \gamma_1, \frac{1}{2} (R- \chi(P;D)- 4\varepsilon) \right) \right). \qedhere\] ◻
In this section, we follow the methodology in [52], making adaptations as needed to derive the exponential error bound as stated in Theorem 4. An essential ingredient will be the following lemma from [52]. In the statement of the lemma, we need the notion of Moore-Penrose pseudoinverse which assigns to any \(A\in\mathcal{B}({\mathcal{H}})\) an (unbounded) operator \(A^{-1}\) acting on \(\mathcal{H}\) [53]. Moreover, for \(A\in\mathcal{B}({\mathcal{H}})\), we use the notation \(\mathrm{im}(A):=\{h\in \mathcal{H}: \exists \tilde{h} \in \mathcal{H}\textrm{ such that }h= A\tilde{h} \}\). The following lemma is a well-established fact in quantum information theory, but some care needs to be taken in the infinite-dimensional case to ensure the Moore-Penrose pseudoinverse which appears is well-behaved. Therefore, we include a proof of this part of the lemma in Appendix 6.5.
Lemma 11 (Hayashi-Nagaoka [52]). Let \(A, B\in \mathcal{B}({\mathcal{H}})\) with \(0 \leq A\leq \mathbf{1}\) and \(0 \leq B\) be such that \(\mathrm{im}(A+B)\) is closed. Then the following statements hold true:
The Moore-Penrose pseudoinverse \(\sqrt{A+B}^{-1}\) is a bounded linear operator, i.e., \(\sqrt{A+B}^{-1} \in \mathcal{B}({\mathcal{H}})\).
For any real number \(c>0\), we have \[\mathbf{1} - \sqrt{A+ B}^{-1} A \sqrt{A+ B}^{-1} \leq (1+c) (\mathbf{1}- A) + (2+c+c^{-1}) B.\]
For the proof of Theorem 4, we use the definitions from Section 5.1. For decoding, we choose the povm \((Y_m)_{m=1}^M\) defined as \[Y_m := \sqrt{ \sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) }^{-1} \Phi_{\varepsilon,n} (\mathcal{C}(m)) \sqrt{ \sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) }^{-1},\] where we use \(\Phi_{\varepsilon,n}\) defined in (55 ). We note that \[\dim \mathrm{im}\left(\sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m}))\right)< \infty,\] by Lemma 8-[item:q-resolvability-typical-terms-output-trace] which implies that \[\mathrm{im}\left(\sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m}))\right)\] is closed. Consequently, by the first statement of Lemma 11 we have \[\sqrt{ \sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) }^{-1} \in \mathcal{B}({\mathcal{H}}).\]
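In finite dimensions, this square-root construction of the decoding povm can be carried out directly. The sketch below replaces the typicality operators \(\Phi_{\varepsilon,n}(\mathcal{C}(m))\) by generic positive semi-definite matrices (an illustrative assumption) and checks that the resulting operators resolve the identity on the image of their sum (here the whole space, since the sum has full rank):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for Phi_{eps,n}(C(m)): generic PSD matrices (illustrative assumption).
d, M = 4, 3
Phi = []
for _ in range(M):
    A = rng.normal(size=(d, d))
    Phi.append(A @ A.T)

S = sum(Phi)
w, V = np.linalg.eigh(S)
# Moore-Penrose pseudoinverse of sqrt(S): invert only the nonzero eigenvalues.
inv_sqrt = V @ np.diag([1 / np.sqrt(x) if x > 1e-12 else 0.0 for x in w]) @ V.T

Y = [inv_sqrt @ Phi_m @ inv_sqrt for Phi_m in Phi]
# Here S has full rank almost surely, so the Y_m sum to the identity.
print(np.allclose(sum(Y), np.eye(d)))    # True
```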
The first part of the statement of Theorem 4, namely that the \(Y_m\) are measurable functions of \(\mathcal{C}\), is proven in the next two lemmas. This measurability is also essential for the proof of the remainder of Theorem 4.
Lemma 12. Let \(\mathcal{H}\) be a finite-dimensional complex Hilbert space. Then the function \[\cdot^{-1}:~ \mathcal{B}({\mathcal{H}}) \rightarrow \mathcal{B}({\mathcal{H}}) ,~~ A \mapsto A^{-1}\] which maps every operator to its Moore-Penrose pseudoinverse is measurable.
Proof. We can represent the Moore-Penrose pseudoinverse as a limit (see [54]) \[A^{-1} = \lim_{k\rightarrow \infty} \left( A^* A + \frac{1}{k} \mathbf{1} \right)^{-1} A^*.\] Matrix inversion is continuous (see [54]), and so are addition and multiplication. Therefore, \(\cdot^{-1}\) is represented as a pointwise limit of continuous (and therefore measurable) functions, hence it is measurable. ◻
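The limit representation used in this proof is easy to verify numerically for a rank-deficient matrix, where the ordinary inverse does not exist (the matrix below is an illustrative assumption):

```python
import numpy as np

# Check of A^{-1} = lim_k (A* A + (1/k) 1)^{-1} A* for a rank-one matrix.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])               # rank 1: not invertible in the ordinary sense
pinv = np.linalg.pinv(A)                 # Moore-Penrose pseudoinverse

k = 1e8
approx = np.linalg.inv(A.T @ A + (1 / k) * np.eye(2)) @ A.T
print(np.allclose(approx, pinv, atol=1e-6))   # True
```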
Lemma 13. For every \(m\in \{1, \dots, M\}\), the map \(\mathcal{X}^{nM} \rightarrow \mathcal{T}({\mathcal{H}}) ,~~ \mathcal{C}\mapsto Y_m\) is measurable.
Proof. Clearly, \[\sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m}))\] is a measurable function of \(\mathcal{C}\), so by Lemma 17, its square root is also measurable. Denote the restriction of \[\sqrt{ \sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) }\] to \(\mathrm{im}(\Theta_{\varepsilon, n})\) by \(A\). It can be seen in (55 ) that \(A: \mathrm{im}(\Theta_{\varepsilon, n}) \rightarrow \mathrm{im}(\Theta_{\varepsilon, n})\), and that we can write \[\label{eq:inverse-representation} \sqrt{ \sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) }^{-1} (h) = \begin{cases} A^{-1} h, &h\in \mathrm{im}(\Theta_{\varepsilon, n}) \\ 0, &\text{otherwise.} \end{cases}\tag{60}\] By Lemma 8-[item:q-resolvability-typical-terms-output-trace], \(\mathrm{im}(\Theta_{\varepsilon, n})\) is finite-dimensional, so we may apply Lemma 12 to argue that the operator represented in (60 ) is a measurable function of \(\mathcal{C}\). Since all norms are equivalent on finite-dimensional spaces, this measurability also applies with respect to the trace norm. The measurability of \(Y_m\) is then a straightforward consequence of Lemma 16. ◻
Proof of Theorem 4. Clearly, \(Y_m\geq 0\) and \(Y_1 + \dots + Y_M= \mathbf{1}\), so \((Y_m)_{m=1}^M\) is a \(\{1, \dots, M\}\)-valued povm. We have for the decoding error \[\begin{align} \nonumber &\hphantom{{}={}} \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D(\mathcal{C}(m)) (\mathbf{1}- Y_m) \right) \\ \nonumber &= \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( \sqrt{D(\mathcal{C}(m))} \left( \mathbf{1} - \sqrt{ \sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) }^{-1} \Phi_{\varepsilon,n} (\mathcal{C}(m)) \sqrt{ \sum_{\hat{m}=1}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) }^{-1} \right) \sqrt{D(\mathcal{C}(m))} \right) \\ \label{eq:decoding-typical-split} \overset{(a)}&{\leq} \frac{2}{M} \sum_{m=1}^M \mathrm{tr}\Big( D(\mathcal{C}(m)) \big(\mathbf{1}-\Phi_{\varepsilon,n}(\mathcal{C}(m))\big) \Big) + \frac{4}{M} \sum_{m=1}^M \mathrm{tr}\left( D(\mathcal{C}(m)) \sum_{\substack{\hat{m}=1\\ \hat{m} \neq m}}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) \right), \end{align}\tag{61}\] where (a) is an application of Lemma 11 with \[A := \Phi_{\varepsilon,n} (\mathcal{C}(m)) ,~~ B := \sum_{\substack{\hat{m}=1\\ \hat{m} \neq m}}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) ,~~ c:= 1,\] where \(\mathrm{im}(A+B)\) is closed because it is finite-dimensional by Lemma 8-[item:q-resolvability-typical-terms-output-trace].
As a prerequisite to bounding the expectation of the second summand in (61 ), we calculate \[\begin{align} \mathrm{tr}\left( D_P^{\otimes n} \Phi_{\varepsilon,n} (\mathcal{C}(m)) \right) \overset{(\ref{eq:typicality-operator-product-decoder})}&{=} \mathrm{tr}\left( D_P^{\otimes n} \Theta_{\varepsilon,n} \Psi_{\varepsilon,n}(\mathcal{C}(m)) \Theta_{\varepsilon,n} \right) \\ \overset{(a)}&{=} \mathrm{tr}\left( \Psi_{\varepsilon,n}(\mathcal{C}(m)) \Theta_{\varepsilon,n} D_P^{\otimes n} \Theta_{\varepsilon,n} \Psi_{\varepsilon,n}(\mathcal{C}(m)) \right) \\ \overset{(b)}&{\leq} \exp\big( - n (H_U- \varepsilon) \big) \mathrm{tr}\left( \Psi_{\varepsilon,n}(\mathcal{C}(m)) \Theta_{\varepsilon,n} \Psi_{\varepsilon,n}(\mathcal{C}(m)) \right) \\ \overset{(c)}&{\leq} \exp\big( - n (H_U- \varepsilon) \big) \mathrm{tr}\left( \Psi_{\varepsilon,n}(\mathcal{C}(m)) \right) \\ \overset{(d)}&{\leq} \exp\big( - n (H_U- H_P- 2\varepsilon) \big) \\ \overset{(\ref{eq:quantum-information})}&{=} \exp\big( - n (\chi(P;D)- 2\varepsilon) \big), \end{align}\] where (a) is by (53 ) and the cyclic property of the trace, (b) is by Lemma 8-[item:q-resolvability-typical-terms-output], (c) is due to \(\Theta_{\varepsilon,n} \leq \mathbf{1}\) and (53 ), and (d) is by Lemma 8-[item:q-resolvability-typical-terms-joint-trace]. We can use this in conjunction with the independence of codewords and bound \[\begin{align} \nonumber \mathbb{E}_\mathcal{C} \mathrm{tr}\left( D(\mathcal{C}(m)) \sum_{\substack{\hat{m}=1\\ \hat{m} \neq m}}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) \right) &= \mathbb{E}_\mathcal{C} \mathrm{tr}\left( D_{P}^{\otimes n} \sum_{\substack{\hat{m}=1\\ \hat{m} \neq m}}^M \Phi_{\varepsilon,n} (\mathcal{C}(\hat{m})) \right) \\ \nonumber &\leq M \exp\big( - n (\chi(P;D)- 2\varepsilon) \big) \\ \label{eq:decoding-typical-expectation} &\leq \exp\big( - n (\chi(P;D)- R- 2\varepsilon) \big). \end{align}\tag{62}\]
Next, we apply \(\mathbb{E}_\mathcal{C}\) in (61 ). We use Lemma 9 in the first summand and (62 ) in the second summand, which yields (7 ). Next, we note that the infimum in 16 is upper bounded by the realization for any fixed \(\varepsilon\). So we pick any \(\varepsilon\in (0,(\chi(P;D)- R)/2)\), then invoke Lemma 9 and fix some \(\gamma_1\) which satisfies 58 . With the choice \(\gamma\in (0, \min(\gamma_1, \chi(P;D)- R- 2\varepsilon))\), this proves (8 ). ◻
Theorems 3 and 4 are formulated in terms of expectation, but both the decoding error and the trace distance from the ideal output distribution are concentrated around their mean, as can be seen in the following corollaries.
Corollary 1. Under the assumptions of Theorem 3 and for every \(\gamma\in (0,R/2)\), we have \[\label{eq:q-resolvability-cor-statement-all-blocklengths} \mathbb{P}_\mathcal{C}\left( \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}(R,n) + \exp(-\gamma n) \right) \leq \exp\left( -\frac{1}{2}\exp( n (R-2\gamma) ) \right).\qquad{(9)}\]
Furthermore, ?? implies that \(\left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr}\) tends to \(0\) exponentially with a doubly exponentially small error probability as \(n\rightarrow \infty\). That is, there are \(\gamma_1,\gamma_2 \in (0,\infty)\) such that ?? implies for all sufficiently large \(n\) that \[\label{eq:q-resolvability-cor-statement-large-blocklengths} \mathbb{P}_\mathcal{C}\left( \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \exp(-\gamma_1n) \right) \leq \exp( -\exp( \gamma_2 n ) ).\qquad{(10)}\]
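To get a feeling for how strong a doubly exponential bound of this form is, one can simply evaluate the right-hand side of ?? for sample values; the rate \(R = 1\) nat and \(\gamma = 0.25\) below are hypothetical choices, not values from the paper:

```python
import math

R, gamma = 1.0, 0.25    # hypothetical rate (in nats) and gamma in (0, R/2)
# evaluate the doubly exponential bound exp(-(1/2) * exp(n * (R - 2*gamma)))
bounds = {n: math.exp(-0.5 * math.exp(n * (R - 2 * gamma))) for n in (5, 10, 20)}
```

Already at blocklength \(n = 20\), the bound underflows double precision, i.e., it is smaller than \(10^{-300}\).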
Corollary 2. Under the assumptions of Theorem 4 and for any choice of \(\gamma_1 \in (0,\infty)\), we have \[\label{eq:decoding-cor-all-blocklengths} \mathbb{P}_\mathcal{C}\left( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D^n(m) (\mathbf{1}-Y_m) \right) \geq \exp(-n\gamma_1) \right) \leq \exp(n\gamma_1) \mathcal{W}_\mathrm{coding}(R,n).\qquad{(11)}\] Furthermore, ?? implies that the decoding error tends to \(0\) exponentially with an exponentially small error probability. That is, there is a suitable choice for \(\gamma_1\) and some \(\gamma_2 \in (0,\infty)\) such that for sufficiently large \(n\), \[\label{eq:decoding-cor-large-blocklengths} \mathbb{P}_\mathcal{C}\left( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D^n(m) (\mathbf{1}-Y_m) \right) \geq \exp(-n\gamma_1) \right) \leq \exp(-n\gamma_2).\qquad{(12)}\]
The proof of Corollary 1 is essentially an application of the bounded differences inequality [55]. For the reader’s convenience, we reproduce the result here in the form that we will be using.
Theorem 5 (Bounded differences inequality as stated in [56]). Let \(\mathcal{X}\) be a measurable space, and let \(f: \mathcal{X}^\ell\rightarrow \mathbb{R}\) be measurable. Assume that there are nonnegative constants \(c_1, \dots, c_\ell\) with the property \[\label{eq:bounded-differences} \forall i\in \{1, \dots, \ell\}~ \sup\Big\{ \left \lvert f(x_1, \dots, x_\ell) - f( x_1, \dots, x_{i- 1}, x_i', x_{i+1}, \dots, x_\ell ) \right \rvert :~ x_1, \dots, x_\ell, x_i' \in \mathcal{X} \Big\} \leq c_i,\tag{63}\] and denote \[v:= \frac{1}{4} \sum_{i=1}^\ell c_i^2.\] Let \(X_1, \dots, X_\ell\) be independent random variables such that \(\mathbb{E}f(X_1, \dots, X_\ell)\) exists. Then, for any \(t\in (0, \infty)\), \[\mathbb{P}\left( f(X_1, \dots, X_\ell) - \mathbb{E}f(X_1, \dots, X_\ell) > t \right) \leq \exp\left( -\frac{t^2}{2v} \right).\]
Condition (63 ) is called the bounded differences property, which gives Theorem 5 its name.
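As a quick sanity check (not part of any proof), Theorem 5 can be probed by Monte Carlo simulation. In the sketch below, with arbitrarily chosen parameters, \(f\) is the sample mean of \(\ell\) independent \(\pm 1\) variables, so the bounded differences property holds with \(c_i = 2/\ell\) and hence \(v = 1/\ell\):

```python
import numpy as np

rng = np.random.default_rng(0)
ell, trials, t = 200, 20000, 0.1
# f = sample mean of independent +/-1 variables; changing one coordinate
# moves f by at most c_i = 2/ell, hence v = (1/4) * ell * (2/ell)**2 = 1/ell
X = rng.choice([-1.0, 1.0], size=(trials, ell))
f = X.mean(axis=1)                          # E f = 0
empirical_tail = np.mean(f > t)             # estimate of P(f - E f > t)
theorem_bound = np.exp(-t ** 2 * ell / 2)   # exp(-t^2/(2v)) with v = 1/ell
```

With these values, the empirical tail should come out well below the bound \(\exp(-t^2\ell/2) = e^{-1}\), as the theorem guarantees.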
Proof of Corollary 1. We apply Theorem 5 to the function \[f : (\mathcal{C}(1), \dots, \mathcal{C}(M)) \mapsto \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr},\] so we first need to compute \(c_{\hat{m}}\) with \[\sup\left\{ \vphantom{\bigg(} \left \lvert \vphantom{\Big(} \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} - \left\lVert { D_{\mathcal{C}'} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \right \rvert :~ \forall m \in \{ 1, \dots, \hat{m}-1, \hat{m}+1, \dots, M \} :~ \mathcal{C}(m) = \mathcal{C}'(m) \right\} \leq c_{\hat{m}}.\] To this end, we calculate \[\begin{align} \left \lvert \vphantom{\Big(} \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} - \left\lVert { D_{\mathcal{C}'} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \right \rvert \overset{(a)}&{\leq} \left\lVert { D_\mathcal{C} - D_P^{\otimes n} - D_{\mathcal{C}'} + D_P^{\otimes n} } \right\rVert_\mathrm{tr} \\ \overset{(\ref{eq:codebook-channel-output-definition})}&{=} \left\lVert { \frac{1}{M} \sum_{m=1}^M\Big( D^n(\mathcal{C}(m)) - D^n(\mathcal{C}'(m)) \Big) } \right\rVert_\mathrm{tr} \\ \overset{(b)}&{=} \frac{1}{M} \left\lVert { D^n(\mathcal{C}(\hat{m})) - D^n(\mathcal{C}'(\hat{m})) } \right\rVert_\mathrm{tr} \\ \overset{(c)}&{\leq} \frac{2}{M}, \end{align}\] where (a) is due to the triangle inequality, (b) is because \(\mathcal{C}(m) = \mathcal{C}'(m)\) whenever \(m\neq \hat{m}\), and (c) is an application of the triangle inequality and the fact \(D\) is a map to \(\mathcal{S}({\mathcal{H}})\).
This means that we can choose \(c_{\hat{m}}:= 2/M\) for all \(\hat{m}\). Consequently, we obtain \[v = \frac{1}{4} \sum_{\hat{m}=1}^M \frac{2^2}{M^2} = M^{-1}.\]
Hence, \[\begin{align} \mathbb{P}_\mathcal{C}\left( \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}(R,n) + \exp(-\gamma n) \right) \overset{(a)}&{\leq} \mathbb{P}_\mathcal{C}\left( \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} - \mathbb{E}_\mathcal{C} \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \exp(-\gamma n) \right) \\ \overset{(b)}&{\leq} \exp\left( -\frac{1}{2}\exp(-2\gamma n)M \right), \end{align}\] where (a) is the application of Theorem 3 and (b) the application of Theorem 5. ?? then follows from \(M\geq \exp(nR)\). In order to prove ?? , we invoke ?? of Theorem 3 to find \(\beta\in (0,\infty)\) such that \(\mathcal{W}_\mathrm{res}(R,n) \leq \exp(-\beta n)\) for large enough \(n\). Then, fixing some \(\gamma_1 \in (0,\min(\beta,\gamma))\), we have for sufficiently large \(n\), \[\begin{align} \mathbb{P}_\mathcal{C}\left( \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \exp(-\gamma_1n) \right) &\leq \mathbb{P}_\mathcal{C}\left( \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \exp(-\beta n) + \exp(-\gamma n) \right) \\ &\leq \mathbb{P}_\mathcal{C}\left( \left\lVert { D_\mathcal{C} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}(R,n) + \exp(-\gamma n) \right) \\ \overset{\eqref{eq:q-resolvability-cor-statement-all-blocklengths}}&{\leq} \exp\left( -\frac{1}{2}\exp( n (R-2\gamma) ) \right). \end{align}\] This proves ?? with any choice of \(\gamma_2 \in (0,R-2\gamma)\). ◻
Proof of Corollary 2. ?? is an immediate consequence of Theorem 4 and Markov’s inequality. In order to argue ?? , we first invoke Theorem 4 to obtain \(\gamma\in (0,\infty)\) such that ?? holds. ?? then follows from ?? with the choices \(\gamma_1 \in (0,\gamma)\) and \(\gamma_2 := \gamma- \gamma_1\). ◻
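The Markov step above, \(\mathbb{P}(X \geq a) \leq \mathbb{E}X/a\) for nonnegative \(X\), is easy to illustrate numerically; the exponential distribution below is merely a hypothetical stand-in for a nonnegative random quantity such as the decoding error:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=0.1, size=100_000)   # hypothetical nonnegative samples, mean 0.1
a = 0.5
empirical = np.mean(x >= a)                    # estimate of P(X >= a)
markov = x.mean() / a                          # Markov bound E[X] / a
```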
In this section, we extend Corollaries 1 and 2 to a cost-constrained version \(\mathcal{C}_{c,C}\) of the codebook \(\mathcal{C}\), where \((c,C)\) is an additive cost constraint compatible with the input distribution \(P\) chosen for the generation of \(\mathcal{C}\).
As long as there exists at least one \(x^n\in \mathcal{X}^n\) which satisfies the cost constraint (which is always the case if there is an input distribution compatible with the cost constraint), we can (given any codebook \(\mathcal{C}\)) define the cost-constrained codebook \(\mathcal{C}_{c,C}\) via \[\mathcal{C}_{c,C}(m) = \begin{cases} \mathcal{C}(m), &\mathcal{C}(m) \text{ satisfies the cost constraint } (c,C), \\ x^n, &\text{otherwise.} \end{cases}\]
We define a set of bad codeword indices \[\mathbb{B} := \left\{ m\in \{1, \dots, M\} :~~ \mathcal{C}(m) \neq \mathcal{C}_{c,C}(m) \right\}.\]
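In code, the passage from \(\mathcal{C}\) to \(\mathcal{C}_{c,C}\) and the extraction of \(\mathbb{B}\) amount to a simple replacement rule. The sketch below is illustrative only: the blocklength, codebook size, the average energy cost \(c(x) = |x|^2\), and the budget \(C\) are hypothetical choices, and the all-zero word serves as the fixed \(x^n\) satisfying the constraint:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, E, C = 100, 50, 1.0, 1.2           # hypothetical blocklength, size, energy, budget
# i.i.d. complex Gaussian codewords with E|X|^2 = E < C (compatible with the constraint)
cw = (rng.normal(size=(M, n)) + 1j * rng.normal(size=(M, n))) * np.sqrt(E / 2)
cost = np.mean(np.abs(cw) ** 2, axis=1)  # additive cost (1/n) * sum_i |x_i|^2
x_feasible = np.zeros(n, dtype=complex)  # a fixed x^n satisfying the cost constraint
cw_cc = np.where((cost <= C)[:, None], cw, x_feasible)  # cost-constrained codebook
bad = np.flatnonzero(cost > C)           # the set B of bad codeword indices
```

Every row of `cw_cc` satisfies the constraint by construction, and for large \(n\) the fraction `len(bad) / M` concentrates near zero, in line with Lemma 14 below.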
With methods similar to the ones used in [57], it is possible to bound the probability that the codebook contains a large number of bad codewords, as stated in the following lemma. For completeness, we give a full proof in Appendix 6.5.
Lemma 14. Let \(\mathcal{C}\) be a random codebook generated from a channel input distribution which is compatible with the additive cost constraint \({(c,C)}\). Let \(\beta_1\) be as defined in 19 . Then, \(\beta_1 > 0\) and for all \(\beta\in (0,\beta_1)\), we have \[\label{eq:bad-codewords-all-blocklengths} \mathbb{P}_\mathcal{C}\left( \left\lvert \mathbb{B} \right \rvert \geq M \exp(-n\beta) \right) \leq \mathcal{W}_\mathrm{cost}^{(c,C)}\left( \beta, \frac{\log M}{n}, n \right),\qquad{(13)}\] with \(\mathcal{W}_\mathrm{cost}^{(c,C)}\) defined in 18 . Furthermore, for all \(R\in (0,\infty)\) and \(\beta\in (0,\min(R/2,\beta_1))\), \(\mathcal{W}_\mathrm{cost}^{(c,C)}(\beta,R,n)\) tends to \(0\) doubly exponentially fast as \(n\) tends to \(\infty\). That is, for every \(R\), \(\beta\in (0,\min(R/2,\beta_1))\), and \(\gamma\in (0,R-2\beta)\), we have, for sufficiently large \(n\), \[\label{eq:bad-codewords-large-blocklengths} \mathcal{W}_\mathrm{cost}^{(c,C)}\left( \beta, R, n \right) \leq \exp(-\exp(\gamma n)).\qquad{(14)}\]
Corollary 3. Make the same assumptions as in Theorem 3, and let \((c, C)\) be an additive cost constraint which is compatible with the input distribution \(P\) and induces the cost-constrained random codebook \(\mathcal{C}_{c,C}\). Then, we have for \(\beta_1\) chosen as in 19 and all \(\beta_2 \in (0,\min(\beta_1,R/2)), \beta_3 \in (0,R/2)\) \[\label{eq:q-resolvability-constrained-all-blocklengths} \hphantom{{}={}} \mathbb{P}_\mathcal{C}\bigg( \left\lVert { D_{\mathcal{C}_{c,C}} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}(R,n) + 2\exp(-\beta_2n) + \exp(-\beta_3n) \bigg) \leq \mathcal{W}_\mathrm{cost}^{(c,C)}( \beta_2, R, n ) + \exp\left( -\frac{1}{2}\exp( n (R-2\beta_3) ) \right)\qquad{(15)}\]
Furthermore, ?? implies that \(\left\lVert { D_{\mathcal{C}_{c,C}} - D_P^{\otimes n} } \right\rVert_\mathrm{tr}\) tends to \(0\) exponentially with a doubly exponentially small error probability as \(n\rightarrow \infty\). That is, there are \(\gamma_1,\gamma_2 \in (0,\infty)\) such that ?? implies for all sufficiently large \(n\) that \[\label{eq:q-resolvability-constrained-large-blocklengths} \mathbb{P}_\mathcal{C}\left( \left\lVert { D_{\mathcal{C}_{c,C}} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \exp(-\gamma_1n) \right) \leq \exp( -\exp( \gamma_2 n ) ).\qquad{(16)}\]
Proof. We will bound \(\left\lVert {D_\mathcal{C}- D_{\mathcal{C}_{c,C}}} \right\rVert_\mathrm{tr}\) in such a way that this corollary follows as an immediate consequence of Corollary 1.
Conditioned on the event \(\left\lvert \mathbb{B} \right \rvert < M \exp(-n\beta_2)\), we have almost surely \[\begin{align} \nonumber \left\lVert { D_\mathcal{C}- D_{\mathcal{C}_{c,C}} } \right\rVert_\mathrm{tr} &= \left\lVert { \frac{1}{M} \sum_{m\in\mathbb{B}} \Big( D^n\big( \mathcal{C}(m) \big) - D^n\big( \mathcal{C}_{c,C}(m) \big) \Big) } \right\rVert_\mathrm{tr} \\ \nonumber \overset{(a)}&{\leq} \frac{1}{M} \sum_{m\in\mathbb{B}}\left( \left\lVert { D^n\big( \mathcal{C}(m) \big) } \right\rVert_\mathrm{tr} + \left\lVert { D^n\big( \mathcal{C}_{c,C}(m) \big) } \right\rVert_\mathrm{tr} \right) \displaybreak[0] \\ \nonumber &= \frac{2\left\lvert \mathbb{B} \right \rvert}{M} \\ \label{eq:resolvability-cost-constraint-bad-codeword-conditioned} &< 2\exp(-n\beta_2), \end{align}\tag{64}\] where step (a) is due to the triangle inequality. Consequently, we obtain \[\begin{align} &\hphantom{{}={}} \mathbb{P}_\mathcal{C}\bigg( \left\lVert { D_{\mathcal{C}_{c,C}} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}(R,n) + 2\exp(-\beta_2n) + \exp(-\beta_3n) \bigg) \\ \overset{(a)}&{\leq} \mathbb{P}_\mathcal{C}\bigg( \left\lVert { D_{\mathcal{C}_{c,C}} - D_{\mathcal{C}} } \right\rVert_\mathrm{tr} + \left\lVert { D_{\mathcal{C}} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}(R,n) + 2\exp(-\beta_2n) + \exp(-\beta_3n) \bigg) \\ \overset{(b)}&{\leq} \mathbb{P}_\mathcal{C}\left( \left\lVert { D_{\mathcal{C}_{c,C}} - D_{\mathcal{C}} } \right\rVert_\mathrm{tr} \geq 2\exp(-\beta_2n) \right) + \mathbb{P}_\mathcal{C}\left( \left\lVert { D_{\mathcal{C}} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}(R,n) + \exp(-\beta_3n) \right), \end{align}\] where (a) is due to the triangle inequality and (b) is by the union bound. We conclude the proof of ?? by applying Lemma 14 along with 64 and \(M\geq \exp(nR)\) in the first summand and ?? of Corollary 1 in the second summand.
In order to prove ?? , we invoke ?? of Theorem 3 to find \(\beta_4 \in (0,\infty)\) such that \(\mathcal{W}_\mathrm{res}(R,n) \leq \exp(-\beta_4 n)\) for large enough \(n\). Then, with any choice of \(\gamma_1 \in (0,\min(\beta_2,\beta_3,\beta_4)), \gamma_2 \in (0,\min(R-2\beta_2,R-2\beta_3))\), we obtain, for sufficiently large \(n\), \[\begin{align} &\hphantom{{}={}} \mathbb{P}_\mathcal{C}\bigg( \left\lVert { D_{\mathcal{C}_{c,C}} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \exp(-n\gamma_1) \bigg) \\ &\leq \mathbb{P}_\mathcal{C}\bigg( \left\lVert { D_{\mathcal{C}_{c,C}} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \exp(-\beta_4 n) + 2\exp(-\beta_2n) + \exp(-\beta_3n) \bigg) \\ &\leq \mathbb{P}_\mathcal{C}\bigg( \left\lVert { D_{\mathcal{C}_{c,C}} - D_P^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}(R,n) + 2\exp(-\beta_2n) + \exp(-\beta_3n) \bigg) \\ \overset{\eqref{eq:q-resolvability-constrained-all-blocklengths}}&{\leq} \mathcal{W}_\mathrm{cost}^{(c,C)}( \beta_2, R, n ) + \exp\left( -\frac{1}{2}\exp( n (R-2\beta_3) ) \right) \\ \overset{\eqref{eq:bad-codewords-large-blocklengths}}&{\leq} \exp(-\exp(n\gamma_2)). \qedhere \end{align}\] ◻
Corollary 4. Make the same assumptions as in Theorem 4, and let \((c,C)\) be an additive cost constraint which is compatible with the input distribution \(P\) and induces the cost-constrained random codebook \(\mathcal{C}_{c,C}\). Then, for each \(\mathcal{C}\), there is a decoding povm \((Y_m)_{m=1}^M\) such that every \(Y_m\) is measurable as a function of \(\mathcal{C}\), and for all \(\beta_2 \in (0,\infty) , \beta_3 \in (0,\beta_1)\) with \(\beta_1\) defined in 19 , we have \[\label{eq:decoding-constrained-all-blocklengths} \mathbb{P}_\mathcal{C}\Bigg( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D(\mathcal{C}_{c,C}(m)) (\mathbf{1}- Y_m) \right) \geq \exp(-n\beta_2) + 2\exp(-n\beta_3) \Bigg) \leq \mathcal{W}_\mathrm{coding}(R,n)\exp(n\beta_2) + \mathcal{W}_\mathrm{cost}^{(c,C)}\left(\beta_3,\frac{\log M}{n},n\right).\qquad{(17)}\] Furthermore, ?? implies that if \(M\geq \exp(nR_{\min})\) for some \(R_{\min} \in (0,R]\) and all \(n\), the decoding error tends to \(0\) exponentially with an exponentially small error probability as \(n\rightarrow \infty\). That is, there are \(\gamma_1, \gamma_2 \in (0,\infty)\) such that for sufficiently large \(n\), \[\label{eq:decoding-constrained-large-blocklengths} \mathbb{P}_\mathcal{C}\left( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D(\mathcal{C}(m)) (\mathbf{1}- Y_m) \right) \geq \exp(-n\gamma_1) \right) \leq \exp(-n\gamma_2).\qquad{(18)}\]
Proof. We apply Corollary 2 to obtain a povm \((Y_m)_{m=1}^M\) which is measurable as a function of \(\mathcal{C}\). Due to the union bound, \[\begin{align} \nonumber &\hphantom{{}={}} \mathbb{P}_\mathcal{C}\left( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D(\mathcal{C}_{c,C}(m)) (\mathbf{1}- Y_m) \right) \geq \exp(-n\beta_2) + 2\exp(-n\beta_3) \right) \\ \label{eq:decoding-constrained-split} &\leq \begin{multlined}[t] \mathbb{P}_\mathcal{C}\left( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D(\mathcal{C}(m)) (\mathbf{1}- Y_m) \right) \geq \exp(-n\beta_2) \right) \\+ \mathbb{P}_\mathcal{C}\left( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\Big( \big( D(\mathcal{C}_{c,C}(m)) - D(\mathcal{C}(m)) \big) \big(\mathbf{1}- Y_m\big) \Big) \geq 2\exp(-n\beta_3) \right). \end{multlined} \end{align}\tag{65}\] The first summand in 65 can now be upper bounded by \(\mathcal{W}_\mathrm{coding}(R,n)\exp(n\beta_2)\) due to Corollary 2. For the second summand, we note that \[\begin{align} \mathrm{tr}\Big( \big( D(\mathcal{C}_{c,C}(m)) - D(\mathcal{C}(m)) \big) \big(\mathbf{1}- Y_m\big) \Big) \overset{(a)}&{\leq} \left\lVert { \big( D(\mathcal{C}_{c,C}(m)) - D(\mathcal{C}(m)) \big) \big(\mathbf{1}- Y_m\big) } \right\rVert_\mathrm{tr} \\ \overset{(b)}&{\leq} \left\lVert { D(\mathcal{C}_{c,C}(m)) - D(\mathcal{C}(m)) } \right\rVert_\mathrm{tr} \left\lVert { \mathbf{1}- Y_m } \right\rVert_\mathrm{op} \\ \overset{(c)}&{\leq} 2, \end{align}\] where (a) is by Lemma 15-[item:norm-basics-trace-norm-duality], (b) is by Lemma 15-[item:norm-basics-trace-norm-submultiplicative] and (c) follows because \(Y_m\leq \mathbf{1}\) and \(D\) maps to \(\mathcal{S}({\mathcal{H}})\). Hence, the second summand in 65 is upper bounded by \[\mathbb{P}_\mathcal{C}\left( \frac{2\left\lvert \mathbb{B} \right \rvert}{M} \geq 2\exp(-n\beta_3) \right) \leq \mathcal{W}_\mathrm{cost}^{(c,C)}\left(\beta_3,\frac{\log M}{n},n\right)\] where the inequality follows by Lemma 14. This concludes the proof of ?? . In order to prove ?? 
, we invoke ?? of Theorem 4 to obtain \(\beta_4 \in (0,\infty)\) with \(\mathcal{W}_\mathrm{coding}(R,n) \leq \exp(-n\beta_4)\) for large enough \(n\), and we invoke ?? of Lemma 14 to obtain \(\beta_5 \in (0,\infty)\) with \(\mathcal{W}_\mathrm{cost}^{(c,C)}(\beta_3,R_{\min},n) \leq \exp(-\exp(n\beta_5))\) for sufficiently large \(n\). Since ?? holds for every \(\beta_2 \in (0,\infty)\), we may choose \(\beta_2 \in (0,\beta_4)\). With any choice of \(\gamma_1 \in (0, \min(\beta_2, \beta_3))\) and \(\gamma_2 \in (0,\beta_4-\beta_2)\), we then have \[\begin{align} \mathbb{P}_\mathcal{C}\left( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D(\mathcal{C}(m)) (\mathbf{1}- Y_m) \right) \geq \exp(-n\gamma_1) \right) &\leq \mathbb{P}_\mathcal{C}\left( \frac{1}{M} \sum_{m=1}^M \mathrm{tr}\left( D(\mathcal{C}(m)) (\mathbf{1}- Y_m) \right) \geq \exp(-n\beta_2) + 2\exp(-n\beta_3) \right) \\ \overset{\eqref{eq:decoding-constrained-all-blocklengths}}&{\leq} \mathcal{W}_\mathrm{coding}(R,n)\exp(n\beta_2) + \mathcal{W}_\mathrm{cost}^{(c,C)}\left(\beta_3,R_{\min},n\right) \\ &\leq \exp(-n(\beta_4-\beta_2)) + \exp(-\exp(\beta_5 n)) \\ &\leq \exp(-n\gamma_2). \qedhere \end{align}\] ◻
We now have everything needed to prove the main results of this paper.
Proof of Theorem [theorem:wiretap-ccq-all-blocklengths]. We prove the existence of a codebook by arguing that if we draw a codebook at random, it has all the properties claimed in the theorem statement with positive probability.
We generate a wiretap codebook \(\mathcal{C}:= (\mathcal{C}_1, \dots, \mathcal{C}_L)\) which is an \(L\)-tuple of i.i.d. standard random codebooks drawn according to \(P\). We also define the associated cost-constrained wiretap codebook \(\mathcal{C}_{c,C} = (\mathcal{C}_{1, c,C}, \dots, \mathcal{C}_{L, c,C})\).
Let \(\mathfrak{M}\in \{1, \dots, L\}\) be the message to be transmitted. In order to generate the corresponding output of \(\mathrm{Enc}\), we draw a random number \(m\in \{1, \dots, M\}\) and output \(\mathcal{C}_{\mathfrak{M}, c,C}(m)\). Note that (22 ) ensures that our code has a rate of at least \(R\).
We use a joint typicality decoder; i.e., \(\mathrm{Dec}\) outputs \(\hat{\mathfrak{M}}\) if there is \(m\in \{1, \dots, M\}\) such that \(\mathcal{C}_{\hat{\mathfrak{M}}}(m)\) is jointly typical with \(\hat{X}^n\) and \(\hat{\mathfrak{M}}, m\) are unique with this property. If no such \(\hat{\mathfrak{M}}, m\) exist, the decoder outputs \(1\). The definition of typicality we use is in terms of information density (cf. [58]). Let \[\varepsilon_\mathcal{C}:~ \{1, \dots, L\} \times \{1, \dots, M\} \rightarrow [0,1] ,~~ (\ell,m) \mapsto \mathbb{P}\left( \hat{\mathfrak{M}} \neq \ell ~|~ X^n= \mathcal{C}_\ell(m) \right).\]
Let us first look at the average decoding error of an alternative encoder \(\mathrm{Enc}'\) which draws a random number \(m\in \{1, \dots, M\}\) and outputs \(\mathcal{C}_{\mathfrak{M}}(m)\) to transmit message \(\mathfrak{M}\). The average decoding error in case \(\mathrm{Enc}'\) is used can be expressed as \[\frac{1}{LM} \sum_{\ell=1}^L \sum_{m=1}^M \varepsilon_\mathcal{C}(\ell,m),\]
and so we have by [58] that \[\mathbb{E}_\mathcal{C}\left( \frac{1}{LM} \sum_{\ell=1}^L \sum_{m=1}^M \varepsilon_\mathcal{C}(\ell,m) \right) \leq \mathcal{W}_\mathrm{class}^{W}(\hat{R},n).\] We use Markov’s inequality to infer that for any \(\beta_3 \in (0,\infty)\), we have \[\mathbb{P}_\mathcal{C}\left( \frac{1}{LM} \sum_{\ell=1}^L \sum_{m=1}^M \varepsilon_\mathcal{C}(\ell,m) \geq \exp(-n\beta_3) \right) \leq \mathcal{W}_\mathrm{class}^{W}(\hat{R},n)\exp(n\beta_3).\] Noting that \[\frac{1}{LM} \sum_{\ell=1}^L \sum_{m=1}^M \varepsilon_\mathcal{C}(\ell,m) - \frac{1}{LM} \sum_{\ell=1}^L \sum_{m=1}^M \varepsilon_{\mathcal{C}_{c, C}}(\ell,m) \leq \frac{\left\lvert \mathbb{B} \right \rvert}{LM}\] and using Lemma 14, we obtain for every \(\beta_2 \in (0,(R+\tilde{R})/2)\), \[\begin{align} \nonumber &\hphantom{{}={}} \mathbb{P}_\mathcal{C}\left( \frac{1}{LM} \sum_{\ell=1}^L \sum_{m=1}^M \varepsilon_{\mathcal{C}_{c, C}}(\ell,m) \geq \exp(-n\beta_2) + \exp(-n\beta_3) \right) \\ \nonumber &\leq \mathbb{P}_\mathcal{C}\left( \frac{1}{LM} \sum_{\ell=1}^L \sum_{m=1}^M \varepsilon_\mathcal{C}(\ell,m) \geq \exp(-n\beta_3) \right) + \mathbb{P}_\mathcal{C}\left( \frac{1}{LM} \sum_{\ell=1}^L \sum_{m=1}^M \varepsilon_\mathcal{C}(\ell,m) - \frac{1}{LM} \sum_{\ell=1}^L \sum_{m=1}^M \varepsilon_{\mathcal{C}_{c, C}}(\ell,m) \geq \exp(-n\beta_2) \right) \\ \label{eq:wiretap-ccq-decoding-error} &\leq \mathcal{W}_\mathrm{class}^{W}(\hat{R},n)\exp(n\beta_3) + \mathcal{W}_\mathrm{cost}^{(c,C)}(\beta_2,R+\tilde{R},n). \end{align}\tag{66}\]
We first note that by passing \(\mathfrak{M}\) through \(\mathrm{Enc}\) and \(D_\mathfrak{E}^n\), we obtain the density operator \(D_{\mathfrak{E}, \mathcal{C}_{\mathfrak{M}, c,C}}\). We apply Corollary 3 to obtain, for every \(\ell\in \{1, \dots, L\}\), \[\mathbb{P}_\mathcal{C}\bigg( \left\lVert { D_{\mathfrak{E}, \mathcal{C}_{\ell, c,C}} - D_{\mathfrak{E}, P}^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}^{D}(\tilde{R},n) + 2\exp(-\beta_4n) + \exp(-\beta_5n) \bigg) \leq \mathcal{W}_\mathrm{cost}^{(c,C)}( \beta_4, \tilde{R}, n ) + \exp\left( -\frac{1}{2}\exp( n (\tilde{R}-2\beta_5) ) \right).\]
Hence, \[\begin{gather} \label{eq:ccq-wiretap-tvdist} \mathbb{P}_\mathcal{C}\bigg( \exists \ell\in \{1, \dots, L\}~ \left\lVert { D_{\mathfrak{E}, \mathcal{C}_{\ell, c,C}} - D_{\mathfrak{E}, P}^{\otimes n} } \right\rVert_\mathrm{tr} \geq \mathcal{W}_\mathrm{res}^{D}(\tilde{R},n) + 2\exp(-\beta_4n) + \exp(-\beta_5n) \bigg) \\ \leq \mathcal{W}_\mathrm{cost}^{(c,C)}( \beta_4, \tilde{R}, n ) \exp(n(\hat{R}-\tilde{R})) + \exp\left( -\frac{1}{2}\exp( n (\tilde{R}-2\beta_5) ) + n( \hat{R}-\tilde{R} ) \right) \end{gather}\tag{67}\]
Conditioned on the event that we have for every \(\ell\in \{1, \dots, L\}\) \[\label{eq:wiretap-security-assumption} \left\lVert { D_{\mathfrak{E}, \mathcal{C}_{\ell, c,C}} - D_{\mathfrak{E}, P}^{\otimes n} } \right\rVert_\mathrm{tr} < \mathcal{W}_\mathrm{res}^{D}(\tilde{R},n) + 2\exp(-\beta_4n) + \exp(-\beta_5n),\tag{68}\] the triangle inequality yields, for all \(\ell_1, \ell_2\), \[\begin{align} \left\lVert { D_{\mathfrak{E}, \mathcal{C}_{\ell_1, c,C}} - D_{\mathfrak{E}, \mathcal{C}_{\ell_2, c,C}} } \right\rVert_\mathrm{tr} &\leq \left\lVert { D_{\mathfrak{E}, \mathcal{C}_{\ell_1, c,C}} - D_{\mathfrak{E}, P}^{\otimes n} } \right\rVert_\mathrm{tr} + \left\lVert { D_{\mathfrak{E}, \mathcal{C}_{\ell_2, c,C}} - D_{\mathfrak{E}, P}^{\otimes n} } \right\rVert_\mathrm{tr} \\ \overset{(\ref{eq:wiretap-security-assumption})}&{\leq} 2\mathcal{W}_\mathrm{res}^{D}(\tilde{R},n) + 4\exp(-\beta_4n) + 2\exp(-\beta_5n), \end{align}\] which means that our wiretap code has distinguishing security level \(2\mathcal{W}_\mathrm{res}^{D}(\tilde{R},n) + 4\exp(-\beta_4n) + 2\exp(-\beta_5n)\).
In conclusion, if 23 is satisfied, then we can combine 66 and 67 with the union bound to argue that if we carry out the random codebook construction as described above, there is a nonzero probability that the resulting cost-constrained codebook simultaneously satisfies the decoding error property claimed in item [item:wiretap-ccq-all-blocklengths-error] and the security property claimed in item [item:wiretap-ccq-all-blocklengths-security] of the theorem statement. Furthermore, every codebook that can be drawn in this fashion satisfies items [item:wiretap-ccq-all-blocklengths-codebook-size] and [item:wiretap-ccq-all-blocklengths-cost]. Hence, at least one codebook exists that has all the properties claimed in the statement of Theorem [theorem:wiretap-ccq-all-blocklengths]. ◻
Proof of Theorem 1. We have \(R< I(P,W)- \chi(P;D_\mathfrak{E})\). This allows us to fix \(\tilde{R}\in (\chi(P;D_\mathfrak{E}), I(P,W)- R)\), and, subsequently, \(\hat{R}\in (R+ \tilde{R}, I(P,W))\). For sufficiently large \(n\), this then allows us to fix \(M, L\in \mathbb{N}\) satisfying 20 , 21 , and 22 .
We fix \(\beta_2 \in (0, \min(\beta_1,(R+\tilde{R})/2))\) and \(\beta_4 \in (0, \min(\beta_1,\tilde{R}/2))\), where \(\beta_1\) is defined in 19 . Then we invoke [58], Lemma 14, and Theorem 3 to find \(\beta_6, \beta_7, \beta_8, \beta_9 \in(0,\infty)\) with the properties that for sufficiently large \(n\), \[\begin{align} \mathcal{W}_\mathrm{class}^{W}(\hat{R},n) &\leq \exp(-\beta_6 n) \\ \mathcal{W}_\mathrm{cost}^{(c,C)}(\beta_2,R+\tilde{R},n) &\leq \exp(-\exp(\beta_7 n)) \\ \mathcal{W}_\mathrm{cost}^{(c,C)}(\beta_4,\tilde{R},n) &\leq \exp(-\exp(\beta_8 n)) \\ \mathcal{W}_\mathrm{res}^{D}(\tilde{R},n) &\leq \exp(-\beta_9 n). \end{align}\] Finally, we pick \(\beta_3 \in (0,\beta_6)\) and \(\beta_5 \in (0,\tilde{R}/2)\). Then the left-hand side in 23 is upper bounded by \[\exp(-(\beta_6-\beta_3) n) + \exp(-\exp(\beta_7 n)) + \exp(-\exp(\beta_8 n) + n(\hat{R}-\tilde{R})) + \exp\left( -\frac{1}{2}\exp( n (\tilde{R}-2\beta_5) ) + n(\hat{R}-\tilde{R}) \right),\] which clearly tends to \(0\) for \(n\rightarrow \infty\). In particular, for large enough \(n\), condition 23 is satisfied. With the choices \(\gamma_1 \in (0,\min(\beta_2,\beta_3))\) and \(\gamma_2 \in (0, \min(\beta_4,\beta_5,\beta_9))\), Theorem [theorem:wiretap-ccq-all-blocklengths] gives us a wiretap code which by item [item:wiretap-ccq-all-blocklengths-error] has, for sufficiently large \(n\), average error \[\varepsilon = \exp(-n\beta_2) + \exp(-n\beta_3) < \exp(-n\gamma_1)\] and by item [item:wiretap-ccq-all-blocklengths-security] distinguishing security level \[\delta = 2\mathcal{W}_\mathrm{res}^{D}(\tilde{R},n) + 4\exp(-\beta_4n) + 2\exp(-\beta_5n) \leq 2\exp(-\beta_9 n) + 4\exp(-\beta_4n) + 2\exp(-\beta_5n) < \exp(-n\gamma_2). \qedhere\] ◻
Proof of Theorem [theorem:wiretap-cq-all-blocklengths]. The proof is similar to that of Theorem [theorem:wiretap-ccq-all-blocklengths], except that we invoke Corollary 2 for the average decoding error at the legitimate receiver.
See proof of Theorem 1.
See proof of Theorem 1.
We treat the wiretap codebook \(\mathcal{C}\) as one large codebook \(\tilde{\mathcal{C}}\) of size \(LM\) defined by \(\tilde{\mathcal{C}}(\ell,m) := \mathcal{C}_\ell(m)\). Together with (21 ), this allows us to invoke Corollary 4 to obtain a decoding povm \((Y_{\ell,m})_{\ell,m=1}^{L,M}\) for the codebook \(\tilde{\mathcal{C}}_{c,C}\). Define a corresponding decoding povm \((Y_{\ell})_{\ell=1}^{L}\) for the wiretap channel by \[Y_{\ell} := \sum_{m=1}^M Y_{\ell,m}.\]
Then \[\begin{align} \mathbb{E}_\mathfrak{M} \mathrm{tr}\left( D_\mathfrak{B}^n\circ \mathrm{Enc}(\mathfrak{M}) (\mathbf{1}- Y_\mathfrak{M}) \right) &= \frac{1}{L} \sum_{\ell=1}^{L} \mathrm{tr}\left( D_\mathfrak{B}^n\circ \mathrm{Enc}(\ell) \left(\mathbf{1}- Y_{\ell}\right) \right) \\ &= \frac{1}{L} \sum_{\ell=1}^{L} \mathrm{tr}\left( \left( \frac{1}{M} \sum_{m=1}^M D_\mathfrak{B}^n\left(\tilde{\mathcal{C}}_{c,C}(\ell,m)\right) \right) \left( \mathbf{1} - \sum_{\hat{m}=1}^M Y_{\ell,\hat{m}} \right) \right) \\ &= \frac{1}{LM} \sum_{\ell=1}^{L} \sum_{m=1}^M \mathrm{tr}\left( D_\mathfrak{B}^n\left(\tilde{\mathcal{C}}_{c,C}(\ell,m)\right) \left( \mathbf{1} - \sum_{\hat{m}=1}^M Y_{\ell,\hat{m}} \right) \right) \\ &\leq \frac{1}{LM} \sum_{\ell=1}^{L} \sum_{m=1}^M \mathrm{tr}\left( D_\mathfrak{B}^n\left(\tilde{\mathcal{C}}_{c,C}(\ell,m)\right) \left( \mathbf{1} - Y_{\ell,m} \right) \right), \end{align}\] and therefore, we have by Corollary 4 that \[\begin{align} &\hphantom{{}={}} \mathbb{P}_\mathcal{C}\Big( \mathbb{E}_\mathfrak{M} \mathrm{tr}\left( D_\mathfrak{B}^n\circ \mathrm{Enc}(\mathfrak{M}) (\mathbf{1}- Y_\mathfrak{M}) \right) \geq \exp(-n\beta_2) + 2\exp(-n\beta_3) \Big) \\ &\leq \mathcal{W}_\mathrm{coding}^{D_\mathfrak{B}}(\hat{R},n)\exp(n\beta_2) + \mathcal{W}_\mathrm{cost}^{(c,C)}\left(\beta_3,\frac{\log (LM)}{n},n\right) \\ &\leq \mathcal{W}_\mathrm{coding}^{D_\mathfrak{B}}(\hat{R},n)\exp(n\beta_2) + \mathcal{W}_\mathrm{cost}^{(c,C)}\left(\beta_3,R+\tilde{R},n\right). \end{align}\]
See proof of Theorem 1.
As in the proof of Theorem 1, the existence of a wiretap code as claimed in the theorem statement is assured for sufficiently large \(n\). ◻
Proof of Theorem 2. Theorem 2 follows from Theorem [theorem:wiretap-cq-all-blocklengths] with the same argument we used to show that Theorem 1 follows from Theorem [theorem:wiretap-ccq-all-blocklengths]. ◻
In this section, we demonstrate the finite-blocklength nature of our results using the example of the Gaussian cqq wiretap channel. In the rest of this section, we only sketch the additional steps necessary to evaluate the bounds for this channel, but the full annotated Python source code that reproduces the plots in Figures 4, 5 and 6 is available as an electronic supplement to this paper.
In the following, we introduce the Gaussian cq channel along with some properties that we need in this section. For more details, we refer the reader to [41].
Let \(\mathcal{H}:= L^2(\mathbb{R})\) be the complex Hilbert space of equivalence classes of complex-valued, square integrable functions on \(\mathbb{R}\). For \(k=0,1,\dots\), let \(\left\lvert {k} \right\rangle\) be the number state vector defined in [41]. Here, it will only be important that \(\left\lvert {0} \right\rangle, \left\lvert {1} \right\rangle, \dots\) form an orthonormal basis of \(\mathcal{H}\). For every \(\zeta\in \mathbb{C}\), we define a coherent state vector (see [41]) \[\label{eq:coherent-state-vector} \left\lvert {\zeta} \right\rangle := \exp\left( -\frac{\left \lvert \zeta \right \rvert^2}{2} \right) \sum_{k=0}^\infty \frac{\zeta^k}{\sqrt{k!}} \left\lvert {k} \right\rangle.\tag{69}\] The Gaussian cq channel is then defined (cf. [41]) as \[D^{(\eta,N)}:~ \mathbb{C}\rightarrow \mathcal{S}({\mathcal{H}}) ,~~ x \mapsto \frac{1}{\pi N} \int_\mathbb{C} \left\lvert {\zeta} \right\rangle\left\langle {\zeta} \right\rvert \exp\left( -\frac{\left \lvert \zeta-\sqrt{\eta}x \right \rvert^2}{N} \right) d\zeta.\] It is parametrized by the noise power \(N>0\), and the transmittivity \(\eta\in [0,1]\). With \(\eta=1\), this is the same definition as in [41]. In the following, we summarize some properties of this channel that we need in this section. They are derived in [41] for the case \(\eta=1\), but they carry over because \(D^{(\eta,N)}(x) = D^{(1,N)}(\sqrt{\eta}x)\). 
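Definition (69) is straightforward to check numerically; the truncation at \(60\) Fock states in the sketch below is an assumption that is more than sufficient for the amplitude used:

```python
import numpy as np
from math import factorial

def coherent(zeta, K=60):
    # number-basis amplitudes of |zeta> from eq. (69), truncated at K terms
    k = np.arange(K)
    sqrt_fact = np.sqrt(np.array([float(factorial(i)) for i in k]))
    return np.exp(-abs(zeta) ** 2 / 2) * zeta ** k / sqrt_fact

psi = coherent(1.5 + 0.5j)
norm = np.linalg.norm(psi)   # coherent states are unit vectors, so this should be ~1
overlap0 = psi[0]            # <0|zeta> = exp(-|zeta|^2/2), directly from (69)
```

Both the unit norm of \(\left\lvert \zeta \right\rangle\) and the overlap \(\langle 0 | \zeta \rangle = \exp(-|\zeta|^2/2)\) follow directly from (69) and the orthonormality of the number states.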
For every \(x\in \mathbb{C}\), there is [41] a unitary displacement operator \(U_{x}\) such that \[D^{(1,N)}(x) = U_{x} D^{(1,N)}(0) U_{x}^*,\] and hence \[\label{eq:gaussian-cq-unitary} D^{(\eta,N)}(x) = D^{(1,N)}(\sqrt{\eta}x) = U_{\sqrt{\eta}x} D^{(1,N)}(0) U_{\sqrt{\eta}x}^* = U_{\sqrt{\eta}x} D^{(\eta,N)}(0) U_{\sqrt{\eta}x}^*.\tag{70}\] \(D^{(\eta,N)}(0)=D^{(1,N)}(0)\) can be written as [41] \[\label{eq:gaussian-cq-zero-decomposition} D^{(\eta,N)}(0) = \frac{1}{N+1} \sum_{k=0}^\infty \left( \frac{N}{N+1} \right)^k \left\lvert {k} \right\rangle\left\langle {k} \right\rvert.\tag{71}\] This state has the von Neumann entropy [41] \[\label{eq:gaussian-cq-entropy} H\left(D^{(\eta,N)}(0)\right) = g(N),\tag{72}\] where \[\label{eq:gordon} g:~ [0,\infty) \rightarrow [0,\infty),~~ t\mapsto \begin{cases} (t+1) \log(t+1) - t\log t, &t> 0, \\ 0, &t=0 \end{cases}\tag{73}\] is called the Gordon function.
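Relations (71 ), (72 ), and (73 ) can be cross-checked numerically: the spectrum of \(D^{(\eta,N)}(0)\) is geometric, so its von Neumann entropy is the Shannon entropy of that sequence and must equal \(g(N)\). A minimal sketch (natural logarithms; the noise power and the truncation are hypothetical choices):

```python
import numpy as np

def gordon(t):
    # the Gordon function g of eq. (73), with natural logarithms
    return 0.0 if t == 0 else (t + 1) * np.log(t + 1) - t * np.log(t)

N, K = 0.5, 200                        # hypothetical noise power and truncation
k = np.arange(K)
lam = (N / (N + 1)) ** k / (N + 1)     # eigenvalues of D^{(eta,N)}(0), eq. (71)
entropy = -np.sum(lam * np.log(lam))   # von Neumann entropy via the spectrum
```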
A consequence of (70 ) and (71 ) is that for every \(f: \mathbb{R}\rightarrow \mathbb{R}\) and every \(x\in \mathbb{C}\), we have \[\begin{align} \nonumber \mathrm{tr} f\big( D^{(\eta,N)}(x) \big) \overset{(\ref{eq:gaussian-cq-unitary})}&{=} \mathrm{tr} f\big( U_{\sqrt{\eta}x} D^{(\eta,N)}(0) U_{\sqrt{\eta}x}^* \big) \\ \nonumber \overset{(\ref{eq:gaussian-cq-zero-decomposition})}&{=} \mathrm{tr} f\left( \frac{1}{N+1} \sum_{k=0}^\infty \left( \frac{N}{N+1} \right)^k U_{\sqrt{\eta}x} \left\lvert {k} \right\rangle\left\langle {k} \right\rvert U_{\sqrt{\eta}x}^* \right) \\ \nonumber \overset{(a)}&{=} \mathrm{tr} f\left( \frac{1}{N+1} \sum_{k=0}^\infty \left( \frac{N}{N+1} \right)^k \left\lvert {k} \right\rangle\left\langle {k} \right\rvert \right) \\ \label{eq:gaussian-cq-trace-function} \overset{(\ref{eq:gaussian-cq-zero-decomposition})}&{=} \mathrm{tr}f(D^{(\eta,N)}(0)), \end{align}\tag{74}\] where step (a) is due to the fact that both \(\left\lvert {0} \right\rangle, \left\lvert {1} \right\rangle, \dots\) and \(U_{\sqrt{\eta}x}\left\lvert {0} \right\rangle, U_{\sqrt{\eta}x}\left\lvert {1} \right\rangle, \dots\) are orthonormal bases of \(\mathcal{H}\).
As the input distribution \(P\), we choose a complex Gaussian distribution whose real and imaginary parts are independent with mean \(0\) and variance \(E/2\), where \(E\) is the average energy per channel use. In order to relate this to the case \(\eta=1\), we also define an auxiliary input distribution \(P_\eta\) which is complex Gaussian with mean \(0\) and variance \(\eta E/2\) per complex component. With this choice, we have [41] \[\label{eq:gaussian-cq-average} D^{(\eta,N)}_P = D^{(1,N)}_{P_\eta} = \frac{1}{N+\eta E+1} \sum_{k=0}^\infty \left( \frac{N+\eta E}{N+\eta E+1} \right)^k \left\lvert {k} \right\rangle\left\langle {k} \right\rvert.\tag{75}\]
In this section, we consider a cqq wiretap channel where \(D_\mathfrak{B}= D^{(\eta_\mathfrak{B}, N_\mathfrak{B})}\) and \(D_\mathfrak{E}= D^{(\eta_\mathfrak{E},N_\mathfrak{E})}\).
In the following, we describe a commonly employed channel model of physical systems [25], [59] that is a special case of the Gaussian cqq wiretap channel \((D_\mathfrak{B},D_\mathfrak{E})\) introduced in Section 6.1. It uses a quantum channel \(\mathcal{V}_\eta\) called a beam splitter [60] with a parameter \(\eta\in [0,1]\) called transmittivity. Denoting by \(a^{\ast},b^\ast\) and \(a,b\) the creation, respectively annihilation operators on two copies \(\mathcal{H}_\mathfrak{A},\mathcal{H}'_\mathfrak{A}\) of the Hilbert space \(\mathcal{H}=L^2(\mathbb{R})\), \(\mathcal{V}_{\eta}: \mathcal{S}({\mathcal{H}_{\mathfrak{A}}\otimes \mathcal{H}'_{\mathfrak{A}}}) \to \mathcal{S}({\mathcal{H}_{\mathfrak{A}}\otimes \mathcal{H}'_{\mathfrak{A}}})\) is defined as \[\mathcal{V}_{\eta} (\rho) := B_{\theta} \rho B_{\theta}^*,\] where \(\theta\in[0,\pi/2]\) is such that \(\eta=\cos^2(\theta)\) and the unitary \(B_{\theta}: \mathcal{H}_{\mathfrak{A}}\otimes \mathcal{H}'_{\mathfrak{A}} \to\mathcal{H}_{\mathfrak{A}}\otimes \mathcal{H}'_{\mathfrak{A}}\) is given by [60] \[B_\theta= \exp\{\theta(a^\ast \otimes b-a\otimes b^\ast)\}.\] For every \(x,x'\in\mathbb{C}\) and corresponding coherent state vectors \(\left\lvert {x} \right\rangle\in\mathcal{H}_{\mathfrak{A}}\) and \(\left\lvert {x'} \right\rangle\in \mathcal{H}'_{\mathfrak{A}}\), we have [60] \[\label{eq:beam-splitter-key-property} \mathcal{V}_\eta\left( \left\lvert {x} \right\rangle\left\langle {x} \right\rvert \otimes \left\lvert {x'} \right\rangle\left\langle {x'} \right\rvert \right) = \left\lvert {\sqrt{\eta}x+\sqrt{\eta'}x'} \right\rangle\left\langle {\sqrt{\eta}x+\sqrt{\eta'}x'} \right\rvert \otimes \left\lvert {\sqrt{\eta'}x-\sqrt{\eta}x'} \right\rangle\left\langle {\sqrt{\eta'}x-\sqrt{\eta}x'} \right\rvert,\tag{76}\] where \(\eta':=1-\eta\). 
Moreover, we use an additive thermal noise channel \(\mathcal{N}_N(\rho)\) which is described [59] by \[\mathcal{N}_N(\rho) = \frac{1}{\pi N} \int_\mathbb{C} U_{\zeta} \rho U_{\zeta}^* \exp\left( - \frac{\left \lvert \zeta \right \rvert^2}{N} \right) d\zeta,\] where \(U_{\zeta}\) are displacement operators and \(\zeta\in\mathbb{C}\). For every coherent state \(\left\lvert {\zeta} \right\rangle\left\langle {\zeta} \right\rvert\) and \(N\in (0,\infty)\), it holds that \[\label{eq:additive-noise-channel-key-property} \mathcal{N}_{N}\left( \left\lvert {\zeta} \right\rangle\left\langle {\zeta} \right\rvert \right) = D^{(1,N)} (\zeta).\tag{77}\]
A bosonic wiretap model [25], [59] can be based on the assumption that every photon which is lost from the original signal can be detected by the eavesdropper. This is an extremely pessimistic model, since it does not account for the photons which, due to channel loss, are detected neither by the legitimate receiver nor by the eavesdropper. In contrast, we evaluate a channel model, depicted in Fig. 3, in which both the legitimate receiver and the eavesdropper are subject to channel loss. To this end, we consider Hilbert spaces \(\mathcal{H}_{\mathfrak{A},1}\), \(\mathcal{H}'_{\mathfrak{A},1}\), \(\mathcal{H}_{\mathfrak{A},2}\), \(\mathcal{H}'_{\mathfrak{A}, 2}\), \(\mathcal{H}_\mathfrak{B}\), \(\mathcal{H}_\mathfrak{E}\), and \(\mathcal{H}_{\mathrm{env}}\), all of which are copies of \(\mathcal{H}=L^2(\mathbb{R})\). Moreover, we choose transmittivity parameters \(\eta_1, \eta_2 \in [0,1]\).
In this model, the transmitter’s channel alphabet is given as \(\mathcal{X}:= \mathbb{C}\). The transmission symbol \(x\in \mathbb{C}\) is transformed into a coherent state \(\left\lvert {x} \right\rangle\left\langle {x} \right\rvert\) with \(\left\lvert {x} \right\rangle\in \mathcal{H}_{\mathfrak{A},1}\) defined in (69 ). This state \(\left\lvert {x} \right\rangle\left\langle {x} \right\rvert\) is passed to the first input of the beam splitter \(\mathcal{V}_{\eta_1}: \mathcal{S}({\mathcal{H}_{\mathfrak{A},1}\otimes \mathcal{H}'_{\mathfrak{A},1} }) \to \mathcal{S}({\mathcal{H}_{\mathfrak{A},1}\otimes \mathcal{H}_{\mathfrak{A},2}})\), where the second input of the beam splitter receives the vacuum state \(\left\lvert {0} \right\rangle\left\langle {0} \right\rvert\in \mathcal{S}({\mathcal{H}'_{\mathfrak{A},1}})\). The first output is passed through \(\mathcal{N}_{N_\mathfrak{B}}:\mathcal{S}({\mathcal{H}_{\mathfrak{A},1}})\to \mathcal{S}({\mathcal{H}_\mathfrak{B}})\) to model the thermal noise at the legitimate receiver. The second output is passed through another beam splitter \(\mathcal{V}_{\eta_2}: \mathcal{S}({\mathcal{H}_{\mathfrak{A},2}\otimes \mathcal{H}'_{\mathfrak{A},2} }) \to \mathcal{S}({\mathcal{H}_{\mathfrak{A},2}\otimes \mathcal{H}_{\mathrm{env}}})\), again with the vacuum state \(\left\lvert {0} \right\rangle\left\langle {0} \right\rvert\in \mathcal{S}({\mathcal{H}'_{\mathfrak{A},2}})\) at the second input. The eavesdropper receives a version of the first output of the beam splitter \(\mathcal{V}_{\eta_2}\) which is passed through \(\mathcal{N}_{N_\mathfrak{E}}: \mathcal{S}({ \mathcal{H}_{\mathfrak{A},2} })\to \mathcal{S}({\mathcal{H}_\mathfrak{E}})\) to model the eavesdropper’s thermal receiver noise. The remaining beam splitter output is considered an environment state which is received neither at the legitimate receiver nor at the eavesdropper. 
The joint quantum state at the legitimate receiver’s channel output, the eavesdropper’s channel output, and the environment channel output can thus be written as \[\begin{align} &\hphantom{{}={}} \left( \mathcal{N}_{N_\mathfrak{B}} \otimes \mathcal{N}_{N_\mathfrak{E}} \otimes \textrm{id}_{\mathcal{B}({\mathcal{H}'_{\mathfrak{A},2}})}\right) \circ \left(\textrm{id}_{\mathcal{B}({\mathcal{H}_{\mathfrak{A},1}})} \otimes \mathcal{V}_{\eta_2}\right) \Big( \mathcal{V}_{\eta_1}\big( \left\lvert {x} \right\rangle\left\langle {x} \right\rvert \otimes \left\lvert {0} \right\rangle\left\langle {0} \right\rvert \big) \otimes \left\lvert {0} \right\rangle\left\langle {0} \right\rvert \Big) \\ \overset{(\ref{eq:beam-splitter-key-property})}&{=} \left(\mathcal{N}_{N_\mathfrak{B}} \otimes \mathcal{N}_{N_\mathfrak{E}} \otimes \textrm{id}_{\mathcal{B}({\mathcal{H}'_{\mathfrak{A},2}})}\right) \circ \left(\textrm{id}_{\mathcal{B}({\mathcal{H}_{\mathfrak{A},1}})} \otimes \mathcal{V}_{\eta_2}\right) \Big( \left\lvert {\sqrt{\eta_1}x} \right\rangle\left\langle {\sqrt{\eta_1}x} \right\rvert \otimes \left\lvert {\sqrt{\eta'_1}x} \right\rangle\left\langle {\sqrt{\eta'_1}x} \right\rvert \otimes \left\lvert {0} \right\rangle\left\langle {0} \right\rvert \Big) \\ \overset{(\ref{eq:beam-splitter-key-property})}&{=} \left( \mathcal{N}_{N_\mathfrak{B}} \otimes \mathcal{N}_{N_\mathfrak{E}} \otimes \textrm{id}_{\mathcal{B}({\mathcal{H}'_{\mathfrak{A},2}})} \right) \Big( \left\lvert {\sqrt{\eta_1}x} \right\rangle\left\langle {\sqrt{\eta_1}x} \right\rvert \otimes \left\lvert {\sqrt{\eta_2\eta'_1}x} \right\rangle\left\langle {\sqrt{\eta_2\eta'_1}x} \right\rvert \otimes \left\lvert {\sqrt{\eta'_2\eta'_1}x} \right\rangle\left\langle {\sqrt{\eta'_2\eta'_1}x} \right\rvert \Big) \\ \overset{(\ref{eq:additive-noise-channel-key-property})}&{=} D^{(1,N_\mathfrak{B})} \left(\sqrt{\eta_1}x\right) \otimes D^{(1,N_\mathfrak{E})} \left(\sqrt{\eta_2\eta'_1}x\right) \otimes \left\lvert {\sqrt{\eta'_2\eta'_1}x} \right\rangle\left\langle {\sqrt{\eta'_2\eta'_1}x} \right\rvert \\ &= D^{(\eta_1,N_\mathfrak{B})} \left(x\right) \otimes D^{(\eta_2\eta'_1,N_\mathfrak{E})} \left(x\right) \otimes \left\lvert {\sqrt{\eta'_2\eta'_1}x} \right\rangle\left\langle {\sqrt{\eta'_2\eta'_1}x} \right\rvert, \end{align}\] where we have used \(\eta'_1 := 1-\eta_1\), \(\eta'_2 := 1-\eta_2\), and the identity maps \(\textrm{id}_{\mathcal{B}({\mathcal{H}_{\mathfrak{A},1}})}\) and \(\textrm{id}_{\mathcal{B}({\mathcal{H}'_{\mathfrak{A},2}})}\) on the spaces \(\mathcal{B}({\mathcal{H}_{\mathfrak{A},1}})\) and \(\mathcal{B}({\mathcal{H}'_{\mathfrak{A},2}})\).
Therefore, this system is a special case of the cqq wiretap channel \(D_\mathfrak{B}= D^{(\eta_\mathfrak{B}, N_\mathfrak{B})}\) and \(D_\mathfrak{E}= D^{(\eta_\mathfrak{E},N_\mathfrak{E})}\) described in Section 6.1 where \(\eta_\mathfrak{B}:= \eta_1\) and \(\eta_\mathfrak{E}:= \eta_2(1 - \eta_1)\).
In this subsection, we fix a channel \(D^{(\eta,N)}\), show how to compute the quantities \(H_P,H_U,H_{P,{\alpha}},H_{U,{\alpha}}\), and evaluate (12) to (18) for such a channel under a Gaussian input distribution with energy \(E\) as defined in Section 6.1.
The properties collected in Section 6.1 allow us to calculate the quantities defined in (7 ) and (8 ) as \[\begin{align} \tag{78} H_P \overset{(\ref{eq:joint-entropy})}&{=} -\mathbb{E}_X \mathrm{tr}\left( D^{(\eta,N)}(X) \log D^{(\eta,N)}(X) \right) \overset{(\ref{eq:gaussian-cq-trace-function})}{=} - \mathrm{tr}( D^{(\eta,N)}(0) \log D^{(\eta,N)}(0) ) \overset{(\ref{eq:gaussian-cq-entropy})}{=} g(N) \\ \tag{79} H_U \overset{(\ref{eq:output-entropy})}&{=} H\left(D^{(\eta,N)}_P\right) \overset{(\ref{eq:gaussian-cq-entropy}),(\ref{eq:gaussian-cq-average})}{=} g(N+\eta E). \end{align}\]
Next, we use the convergence of the geometric series to argue that for \(\alpha\in (0,\infty)\), \[\label{eq:gaussian-cq-power} \mathrm{tr}\left(D^{(\eta,N)}(0)^\alpha\right) = \frac{1}{(N+1)^\alpha} \sum_{k=0}^\infty \left( \frac{N}{N+1} \right)^{\alpha k} = \frac{1}{(N+1)^\alpha(1-N^\alpha/(N+1)^\alpha)} = \frac{1}{(N+1)^\alpha-N^\alpha}.\tag{80}\]
With this, we can calculate the quantities defined in (10 ) and (11 ) as \[\begin{align} H_{P,{\alpha}} \overset{(\ref{eq:conditional-renyi-entropy})}&{=} \frac{1}{1-\alpha} \log\mathbb{E}_X\left( \mathrm{tr}\left(D^{(\eta,N)}(X)^\alpha\right) \right) \overset{(\ref{eq:gaussian-cq-trace-function})}{=} \frac{1}{1-\alpha} \log \mathrm{tr}\left(D^{(\eta,N)}(0)^\alpha\right) \overset{(\ref{eq:gaussian-cq-power})}{=} \frac{1}{\alpha-1} \log( (N+1)^\alpha-N^\alpha ) = g_\alpha(N) \\ H_{U,{\alpha}} \overset{(\ref{eq:renyi-entropy})}&{=} \frac{1}{1-\alpha} \log \mathrm{tr}\left(\left(D^{(\eta,N)}_P\right)^\alpha\right) \overset{(\ref{eq:gaussian-cq-average}),(\ref{eq:gaussian-cq-power})}{=} g_\alpha(N+\eta E), \end{align}\] where we use a modified version of the Gordon function \[\label{eq:gordon-renyi} g_\alpha:~ [0,\infty) \rightarrow [0,\infty) ,~~ t \mapsto \frac{\log((t+1)^\alpha-t^\alpha)}{\alpha-1}.\tag{81}\] From the considerations in Section 5.1, we know that with the convention \(g_1 := g\), the map \((0,\infty) \rightarrow (0,\infty), \alpha\mapsto g_\alpha(t)\) is continuous. We can also calculate the derivative \[\label{eq:gordon-renyi-derivative} g_\alpha'(t) := \frac{\partial}{\partial\alpha} g_\alpha(t) = \frac{ \frac{ (t+1)^\alpha\log(t+1) - t^\alpha\log t }{ (t+1)^\alpha-t^\alpha} - g_\alpha(t) }{ \alpha-1 }.\tag{82}\] With this, it is possible to numerically evaluate the functions \(\mathcal{R}_1,\mathcal{R}_2,\mathcal{R}_3,\mathcal{R}_4\) defined in (12) to (15) for any given values of \(\varepsilon,n\). Consequently, the evaluation of \(\mathcal{W}_\mathrm{coding}\) defined in (16) and \(\mathcal{W}_\mathrm{res}\) defined in (17) requires only a one-dimensional optimization over all admissible values of the parameter \(\varepsilon\).
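Both the closed form (80) and the continuity of \(\alpha\mapsto g_\alpha(t)\) at \(\alpha=1\) (with the convention \(g_1:=g\)) admit a quick numerical sanity check; a sketch with illustrative parameter values, using natural logarithms throughout:

```python
import math

def gordon(t):
    """Gordon function (73), in nats."""
    return 0.0 if t == 0 else (t + 1) * math.log(t + 1) - t * math.log(t)

def gordon_renyi(alpha, t):
    """Modified Gordon function (81): g_alpha(t) = log((t+1)^alpha - t^alpha) / (alpha - 1)."""
    return math.log((t + 1) ** alpha - t ** alpha) / (alpha - 1)

def trace_power_series(alpha, N, terms=2000):
    """Truncated series for tr(D^{(eta,N)}(0)^alpha), the left-hand side of (80)."""
    return sum(((N / (N + 1)) ** k / (N + 1)) ** alpha for k in range(terms))

# (80): the series sums to the closed form 1 / ((N+1)^alpha - N^alpha)
alpha, N = 1.5, 2.0
closed_form = 1.0 / ((N + 1) ** alpha - N ** alpha)
```

Evaluating `gordon_renyi` at \(\alpha = 1 \pm 10^{-7}\) recovers `gordon(t)` up to small numerical error, illustrating the continuity at \(\alpha = 1\).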
The evaluation of the bounds of Theorem [theorem:wiretap-cq-all-blocklengths] also requires the calculation of \(\mathcal{W}_\mathrm{cost}^{(c,C)}\) defined in (18) and of another doubly exponential term that appears in (24). However, we observe that in the regime in which we evaluate the bounds for the plots in the present paper, these terms are so small that they are negligible. This means that we can avoid a costly simultaneous optimization over the parameters \(\beta_2\), \(\beta_3\), \(\beta_4\), and \(\beta_5\) that appear in the statement of Theorem [theorem:wiretap-cq-all-blocklengths]. Instead, we only consider the terms that contain \(\mathcal{W}_\mathrm{res}\) or \(\mathcal{W}_\mathrm{coding}\) and only check that the remaining terms are indeed negligible. For details on how small we ensure these terms are, we refer the reader to Section 6.4. Full details and comments on how this is ensured can be found in the source code in the electronic supplement of this paper.
We plot various quantities of interest for a model of an optical communication system with a noiseless source as described in Section 6.2. In order to find reasonable system parameters, we make a series of rough estimates of what values these parameters may have in a realistic communication system. When a transmitter communicates a line-of-sight optical signal to a receiver, the transmittivity \(\eta\in [0,1]\) can be roughly estimated as \[\eta= c\cdot\left(\frac{L_\mathrm{diam}}{L_\mathrm{dist}}\right)^2,\] where \(c\) is a system-dependent constant, \(L_\mathrm{diam}\) is the receiver diameter and \(L_\mathrm{dist}\) is the distance between transmitter and receiver [61]. In slightly non-optimal situations where fog disturbs the link, this transmittivity in a free-space optical system can easily be as low as \(\eta=10^{-5}\) already at distances \(L_\mathrm{dist}=1\,\mathrm{km}\) [61]. For our plots, we choose \(\eta_1 := 10^{-5}\) and \(\eta_2 := 6 \cdot 10^{-6}\) to obtain a scenario in which the eavesdropper is at a slightly greater distance from the sender than the legitimate receiver (or equivalently, guaranteed to be unable to place its receive antenna close enough to the center of the optical beam, cf. [6]), and otherwise uses equivalent equipment. For the average transmit energy, we note that the number of photons per channel use can be estimated by considering a laser with \(1\,\mathrm{W}\) output power at \(1550\,\mathrm{nm}\). Such a system emits on the order of \(10^{19}\) photons per second and typically uses a modulation format such that \(10^{10}\) pulses are emitted per second. Thus, \(E:= 10^9\) photons per channel use is a fair estimate. We assume that the number of noise photons per channel use is around \(10^{-5}\), which is at the lower end of the plausible range of values.
At a wavelength of \(1550\,\mathrm{nm}\) and a baud rate of \(10^{10}\) pulses per second, this is equivalent to a background noise power of about \(1.5\cdot 10^{-14}\,\mathrm{W}\). This means that in our example, we choose \(N_\mathfrak{B}:= N_\mathfrak{E}:= 10^{-5}\).
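The photon-budget estimates above follow from elementary physics; the sketch below reproduces the orders of magnitude (the values of Planck's constant and the speed of light are the only inputs not taken from the text):

```python
h = 6.62607e-34          # Planck constant, J*s (physical constant, not from the text)
c = 2.99792e8            # speed of light, m/s (physical constant, not from the text)

wavelength = 1550e-9                     # 1550 nm
photon_energy = h * c / wavelength       # roughly 1.3e-19 J per photon

laser_power = 1.0                        # 1 W output power
pulse_rate = 1e10                        # pulses per second
photons_per_second = laser_power / photon_energy     # on the order of 10^19
photons_per_pulse = photons_per_second / pulse_rate  # on the order of 10^9 = E

# 1e-5 noise photons per channel use corresponds to a background noise
# power on the order of 1e-14 W at this wavelength and baud rate.
noise_power = 1e-5 * pulse_rate * photon_energy
```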
This yields the system parameters \(\eta_\mathfrak{B}= \eta_1 = 10^{-5}\), \(\eta_\mathfrak{E}= \eta_2(1-\eta_1) \approx 6 \cdot 10^{-6}\), and \(N_\mathfrak{B}= N_\mathfrak{E}= 10^{-5}\). By Theorem 2 and (78 ), (79 ), we obtain the bound \[\chi(P;D_\mathfrak{B}) - \chi(P;D_\mathfrak{E}) = g(N_\mathfrak{B}+ \eta_\mathfrak{B}E) - g(N_\mathfrak{B}) - g(N_\mathfrak{E}+ \eta_\mathfrak{E}E) + g(N_\mathfrak{E}) \approx 0.5108\] on the achievable rates. When calculating the achievable bounds, we do not optimize over the doubly exponential terms that appear in (24), but we make sure they do not exceed \(\exp(-100)\) in value. This threshold is chosen arbitrarily, but we believe it is small enough to convince the reader that neglecting these terms does not have more severe effects than other limitations that are inherent in numerical evaluations, such as the machine precision of the computer used. For the cost constraint and the concentration of the security level and decoding error around their expectations, we also need to allow for a little bit of slack due to the nature of the results we evaluate. We allow the security level and decoding error to differ by at most \(10^{-8}\) from their expected values and the individual code words to exceed the expected energy per code word by at most \(10\%\). Again, the choice of these threshold values is somewhat arbitrary. However, we believe they are small enough that the difference between the expected and actual security level and decoding error does not significantly impact the accuracy of the plots, given that we do not display values of the security level and decoding error smaller than \(10^{-5}\), and that the possible additional energy consumption of some of the code words is not large enough to meaningfully impact the practicality of such a code. In Figs. 4 and 5, it can be seen how the security level and decoding error decay with increasing block length for various rates.
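The numerical bound \(\approx 0.5108\) can be reproduced directly from the Gordon function and the system parameters above; a minimal sketch (values in nats, matching the natural logarithm used here):

```python
import math

def gordon(t):
    """Gordon function (73), in nats."""
    return 0.0 if t == 0 else (t + 1) * math.log(t + 1) - t * math.log(t)

# System parameters from the text
eta_1, eta_2 = 1e-5, 6e-6
E = 1e9                  # average photon number per channel use
N_B = N_E = 1e-5         # thermal noise photons (legitimate receiver and eavesdropper)

eta_B = eta_1
eta_E = eta_2 * (1 - eta_1)

rate_bound = (gordon(N_B + eta_B * E) - gordon(N_B)
              - gordon(N_E + eta_E * E) + gordon(N_E))
# rate_bound is approximately 0.5108 nats per channel use
```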
As expected, the decay is always at least exponential, but the slope of the lines is steeper for larger gaps to the upper bound of achievable rates. In Fig. 6, we plot the block lengths necessary to achieve a few selected combinations of decoding error and security level as a function of the rate. It can be seen that the necessary block length increases sharply when the rate approaches the upper bound.
In this appendix, we give the proofs for the technical lemmas related to quantum information theory and stochastics that are omitted in the main part of the paper.
Proof of Lemma 8. For [item:q-resolvability-typical-terms-joint]), we use the presence of the indicator in the definition of \(\Psi_{\varepsilon,n}(x^n)\) to bound \[\begin{align} \Psi_{\varepsilon,n} (x^n) D^n (x^n) \Psi_{\varepsilon,n} (x^n) &= \sum_{y^n\in \mathbb{N}^n} \mathbb{1}_{\mathcal{P}_{\varepsilon,n}} (x^n,y^n) P(y^n| x^n) \left\lvert { e_{y^n| x^n} } \right\rangle\left\langle { e_{y^n| x^n} } \right\rvert \\ &\leq \sum_{y^n\in \mathbb{N}^n} \mathbb{1}_{\mathcal{P}_{\varepsilon,n}} (x^n,y^n) \exp\big(-n(H_P- \varepsilon)\big) \left\lvert { e_{y^n| x^n} } \right\rangle\left\langle { e_{y^n| x^n} } \right\rvert \\ &= \exp\big( - n (H_P- \varepsilon) \big) \Psi_{\varepsilon,n} (x^n). \end{align}\]
For [item:q-resolvability-typical-terms-joint-trace], we fix some \(x^n\) and note \[\label{eq:q-resolvability-typical-terms-joint-cardinality} \mathrm{tr}\Psi_{\varepsilon,n}(x^n) = \mathrm{tr} \sum\limits_{y^n\in \mathbb{N}^n} \mathbb{1}_{\mathcal{P}_{\varepsilon,n}} ( x^n, y^n ) \left\lvert {e_{ y^n | x^n }} \right\rangle\left\langle {e_{ y^n | x^n }} \right\rvert = \left\lvert \{ y^n:~ (x^n, y^n) \in \mathcal{P}_{\varepsilon, n} \} \right \rvert.\tag{83}\] For every \((x^n, y^n) \in \mathcal{P}_{\varepsilon, n}\), we have \[P(y^n| x^n) > \exp\big( - n (H_P+ \varepsilon) \big).\] This allows us to argue \[1 \geq \sum_{y^n:~ (x^n, y^n) \in \mathcal{P}_{\varepsilon, n}} P(y^n| x^n) > \sum_{y^n:~ (x^n, y^n) \in \mathcal{P}_{\varepsilon, n}} \exp\big( - n (H_P+ \varepsilon) \big) = \left\lvert \{ y^n:~ (x^n, y^n) \in \mathcal{P}_{\varepsilon, n} \} \right \rvert \exp\big( - n (H_P+ \varepsilon) \big),\] from which item [item:q-resolvability-typical-terms-joint-trace]) follows by (83 ).
For [item:q-resolvability-typical-terms-output]), we have a calculation very similar to the one for [item:q-resolvability-typical-terms-joint]) \[\begin{align} \Theta_{\varepsilon,n} D_P^{\otimes n} \Theta_{\varepsilon,n} &= \sum_{y^n\in \mathbb{N}^n} \mathbb{1}_{\mathcal{U}_{\varepsilon,n}} (y^n) U(y^n) \left\lvert {e_{y^n}} \right\rangle\left\langle {e_{y^n}} \right\rvert \\ &\leq \sum_{y^n\in \mathbb{N}^n} \mathbb{1}_{\mathcal{U}_{\varepsilon,n}} (y^n) \exp\big( - n (H_U- \varepsilon) \big) \left\lvert {e_{y^n}} \right\rangle\left\langle {e_{y^n}} \right\rvert \\ &= \exp\big( - n (H_U- \varepsilon) \big) \Theta_{\varepsilon,n}. \end{align}\] For [item:q-resolvability-typical-terms-output-trace]), the argument is very similar to the one for [item:q-resolvability-typical-terms-joint-trace]). We note \[\label{eq:q-resolvability-typical-terms-output-cardinality} \mathrm{tr}\Theta_{\varepsilon,n} = \mathrm{tr}\sum_{y^n\in \mathcal{U}_{\varepsilon, n}} \left\lvert {e_{y^n}} \right\rangle\left\langle {e_{y^n}} \right\rvert = \left\lvert \mathcal{U}_{\varepsilon, n} \right \rvert.\tag{84}\] For every \(y^n\in \mathcal{U}_{\varepsilon, n}\), we have \[U(y^n) > \exp\big( - n (H_U+ \varepsilon) \big).\] This allows us to argue \[1 \geq U( y^n \in \mathcal{U}_{\varepsilon,n} ) = \sum_{y^n\in \mathcal{U}_{\varepsilon, n}} U(y^n) > \sum_{y^n\in \mathcal{U}_{\varepsilon, n}} \exp\big( - n (H_U+ \varepsilon) \big) = \left\lvert \mathcal{U}_{\varepsilon, n} \right \rvert \exp\big( - n (H_U+ \varepsilon) \big),\] from which item [item:q-resolvability-typical-terms-output-trace]) follows by (84 ). ◻
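The counting argument used twice above (every typical sequence carries probability mass above a threshold, so the total mass bounds the cardinality) can be illustrated classically. The following brute-force sketch uses a truncated and renormalized geometric distribution (the eigenvalue distribution in (71) for \(N=1\)); the truncation point and the values of \(n\) and \(\varepsilon\) are illustrative:

```python
import itertools
import math

# Truncated, renormalized geometric distribution (eigenvalues of (71), N = 1)
K, N = 12, 1.0
weights = [(N / (N + 1)) ** k / (N + 1) for k in range(K)]
Z = sum(weights)
U = [w / Z for w in weights]
H = -sum(u * math.log(u) for u in U)   # Shannon entropy in nats

n, eps = 4, 0.3
threshold = math.exp(-n * (H + eps))

# Count the sequences y^n with U(y^n) > exp(-n(H + eps))
count = sum(
    1
    for yn in itertools.product(range(K), repeat=n)
    if math.prod(U[y] for y in yn) > threshold
)

# Each counted sequence has probability > threshold and the total mass is <= 1,
# so the cardinality is bounded by exp(n(H + eps)) -- as in the lemma.
assert count <= math.exp(n * (H + eps))
```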
Proof of Lemma 9. We bound the trace in (56 ) as \[\begin{align} \nonumber \mathrm{tr}\left( D^n(X^n) \Gamma_{\varepsilon,n} (X^n) ^* \right) \overset{(\ref{eq:typicality-operator-product})}&{=} \mathrm{tr}\left( D^n(X^n) \Psi_{\varepsilon,n} (X^n) \Theta_{\varepsilon,n} \right) \\ \nonumber &= \mathrm{tr}\left( D^n(X^n) \Psi_{\varepsilon,n} (X^n) \right) - \mathrm{tr}\left( D^n(X^n) \Psi_{\varepsilon,n} (X^n) (\mathbf{1}-\Theta_{\varepsilon,n}) \right) \\ \nonumber \overset{(a)}&{=} \mathrm{tr}\left( D^n(X^n) \Psi_{\varepsilon,n} (X^n) \right) - \mathrm{tr}\left( (\mathbf{1}-\Theta_{\varepsilon,n}) \sqrt{D^n(X^n)} \Psi_{\varepsilon,n} (X^n) \sqrt{D^n(X^n)} (\mathbf{1}-\Theta_{\varepsilon,n}) \right) \\ \nonumber &\overset{(b)}{\geq} \mathrm{tr}\left( D^n(X^n) \Psi_{\varepsilon,n} (X^n) \right) - \mathrm{tr}\left( D^n(X^n) (\mathbf{1}-\Theta_{\varepsilon,n}) \right) \\ \label{eq:q-resolvability-atypical-terms-split} &= 1 - \mathrm{tr}\left( D^n(X^n) ( \mathbf{1} - \Psi_{\varepsilon,n} (X^n) ) \right) - \mathrm{tr}\left( D^n(X^n) (\mathbf{1}-\Theta_{\varepsilon,n}) \right) \end{align}\tag{85}\] where (a) and (b) use (53 ) along with the cyclic property of the trace and (b) additionally uses \(\Psi_{\varepsilon,n}(X^n) \leq \mathbf{1}\), which can be verified with the spectral representation (52 ).
The trace in (57 ) can be bounded as \[\begin{align} \nonumber \mathrm{tr}\left( D^n(X^n) \Phi_{\varepsilon,n} (X^n) \right) \overset{(\ref{eq:typicality-operator-product-decoder})}&{=} \mathrm{tr}\Big( D^n(X^n) \Theta_{\varepsilon,n} \Psi_{\varepsilon,n}(X^n) \Theta_{\varepsilon,n} \Big) \\ \nonumber \overset{(a)}&{\geq} \mathrm{tr}\Big( D^n(X^n) \Psi_{\varepsilon,n}(X^n) \Big) - 2 \mathrm{tr}\Big( D^n(X^n) (\mathbf{1}- \Theta_{\varepsilon,n}) \Big) \\ \label{eq:decoder-atypical-terms-split} &= 1 - \mathrm{tr}\Big( D^n(X^n) (\mathbf{1}- \Psi_{\varepsilon,n}(X^n)) \Big) - 2 \mathrm{tr}\Big( D^n(X^n) (\mathbf{1}- \Theta_{\varepsilon,n}) \Big), \end{align}\tag{86}\] where (a) is due to [52].
The same two terms appear in (85 ) and (86 ), and we will bound their expectations separately. For the first term, we recall the definition (10 ). Since \(\mathrm{tr}(D(X)^\alpha)\) is nonincreasing in \(\alpha\), it is clear that (50 ) and Lemma 19-[item:bochner-integral-basics-existence] ensure that \(H_{P,{\alpha}} < \infty\) for all \(\alpha\in [\alpha_{\min},\infty)\). With that, we have for every \(\alpha_1 \in (1,\infty)\) and \(\alpha_2 \in [\alpha_{\min},1)\) \[\begin{align} \nonumber &\hphantom{{}={}} \mathbb{E}\mathrm{tr}\left( D^n(X^n) ( \mathbf{1} - \Psi_{\varepsilon,n} (X^n) ) \right) \\ \nonumber &= \mathbb{E} \mathrm{tr}\left( \sum\limits_{y^n\in \mathbb{N}^n} \mathbb{1}_{ (\mathcal{X}^n\times \mathbb{N}^n) \setminus \mathcal{P}_{\varepsilon,n} } ( X^n, y^n ) P(y^n| X^n) \left\lvert {e_{ y^n | X^n }} \right\rangle\left\langle {e_{ y^n | X^n }} \right\rvert \right) \displaybreak[0] \\ \nonumber &= P( (\mathcal{X}^n\times \mathbb{N}^n) \setminus \mathcal{P}_{\varepsilon,n} ) \displaybreak[0] \\ \nonumber &= P\big( -\log(P(Y^n| X^n)) \leq n(H_P- \varepsilon) \big) + P\big( -\log(P(Y^n| X^n)) \geq n(H_P+ \varepsilon) \big) \displaybreak[0] \\ \nonumber &= P\big( P(Y^n| X^n)^{\alpha_1 - 1} \geq \exp\left( -n(\alpha_1-1)(H_P- \varepsilon) \right) \big) + P\big( P(Y^n| X^n)^{\alpha_2-1} \geq \exp\left( -n(\alpha_2-1)(H_P+ \varepsilon) \right) \big) \displaybreak[0] \\ \nonumber \overset{(a)}&{\leq} \mathbb{E}_P\left( P(Y^n| X^n)^{\alpha_1 - 1} \right) \exp\left( n(\alpha_1-1)(H_P- \varepsilon) \right) + \mathbb{E}_P\left( P(Y^n| X^n)^{\alpha_2-1} \right) \exp\left( n(\alpha_2-1)(H_P+ \varepsilon) \right) \\ \label{eq:q-resolvability-atypical-terms-joint} &= \exp\left( -n(\alpha_1-1)(H_{P,{\alpha_1}} + \varepsilon- H_P) \right) + \exp\left( -n(1-\alpha_2)(H_P+ \varepsilon- H_{P,{\alpha_2}}) \right), \end{align}\tag{87}\] where the inequality step (a) is due to Markov’s inequality.
Similarly, we recall the definition (11 ) and note that if \(D_P^\alpha\in \mathcal{T}({\mathcal{H}})\), then \(H_{U,{\alpha}} < \infty\). For \(\alpha_3 \in (1,\infty)\) and \(\alpha_4 \in [\alpha_{\min}, 1)\), we obtain \[\begin{align} \nonumber &\hphantom{{}={}} \mathbb{E}\mathrm{tr}\left( D^n(X^n) (\mathbf{1}-\Theta_{\varepsilon,n}) \right) \\ \nonumber &= \mathrm{tr}\left( D_P^{\otimes n} (\mathbf{1}-\Theta_{\varepsilon,n}) \right) \displaybreak[0] \\ \nonumber &= \mathrm{tr}\left( \sum_{y^n\in \mathbb{N}} \mathbb{1}_{ \mathbb{N}^n \setminus \mathcal{U}_{\varepsilon,n} } U(y^n) \left\lvert {e_{y^n}} \right\rangle\left\langle {e_{y^n}} \right\rvert \right) \displaybreak[0] \\ \nonumber &= U( \mathbb{N}^n \setminus \mathcal{U}_{\varepsilon,n} ) \displaybreak[0] \\ \nonumber &= U\Big( -\log U( Y^n ) \leq n(H_U- \varepsilon) \Big) + U\Big( -\log U( Y^n ) \geq n(H_U+ \varepsilon) \Big) \displaybreak[0] \\ \nonumber &= U\Big( U( Y^n )^{\alpha_3-1} \geq \exp\big( - n (\alpha_3-1) (H_U- \varepsilon) \big) \Big) + U\Big( U( Y^n )^{\alpha_4-1} \geq \exp\big( - n (\alpha_4-1) (H_U+ \varepsilon) \big) \Big) \displaybreak[0] \\ \nonumber &\leq \mathbb{E}_U\Big( U( Y^n )^{\alpha_3-1} \Big) \exp\big( n (\alpha_3-1) (H_U- \varepsilon) \big) + \mathbb{E}_U\Big( U( Y^n )^{\alpha_4-1} \Big) \exp\big( n (\alpha_4-1) (H_U+ \varepsilon) \big) \\ \label{eq:q-resolvability-atypical-terms-output} &= \exp\big( - n (\alpha_3-1) ( H_{U,{\alpha_3}} + \varepsilon - H_U ) \big) + \exp\big( - n (1-\alpha_4) ( H_U + \varepsilon - H_{U,{\alpha_4}} ) \big). \end{align}\tag{88}\]
Optimizing over the choices of \(\alpha_1\), \(\alpha_2\), \(\alpha_3\), and \(\alpha_4\), (56) now follows from (85), (87), and (88), and (57) follows from (86), (87), and (88). To argue (58), we invoke Lemma 21 to argue that there exist \(\alpha_1 \in (1,\infty)\) with \(H_{P,{\alpha_1}} > H_P- \varepsilon\), \(\alpha_2 \in [\alpha_{\min},1)\) with \(H_{P,{\alpha_2}} < H_P+ \varepsilon\), \(\alpha_3 \in (1,\infty)\) with \(H_{U,{\alpha_3}} > H_U- \varepsilon\), and \(\alpha_4 \in [\alpha_{\min},1)\) with \(H_{U,{\alpha_4}} < H_U+ \varepsilon\). These properties ensure we can choose \[\begin{align} \gamma_1 \in \Big( 0, \min\big( &(\alpha_1-1)(H_{P,{\alpha_1}} + \varepsilon- H_P) , \\ &(1-\alpha_2)(H_P+ \varepsilon- H_{P,{\alpha_2}}) , \\ &(\alpha_3-1) ( H_{U,{\alpha_3}} + \varepsilon - H_U ) , \\ &(1-\alpha_4) ( H_U + \varepsilon - H_{U,{\alpha_4}} ) \big) \Big). \end{align}\] This choice of \(\gamma_1\) clearly satisfies \[\begin{align} \gamma_1 &< (\alpha_1-1)(H_{P,{\alpha_1}} + \varepsilon- H_P) \leq \sup_{\alpha\in (1,\infty)} (\alpha-1)(H_{P,{\alpha}} + \varepsilon- H_P) \\ \gamma_1 &< (1-\alpha_2)(H_P+ \varepsilon- H_{P,{\alpha_2}}) \leq \sup_{\alpha\in [\alpha_{\min},1)} (1-\alpha)(H_P+ \varepsilon- H_{P,{\alpha}}) \displaybreak[0] \\ \gamma_1 &< (\alpha_3-1) ( H_{U,{\alpha_3}} + \varepsilon - H_U ) \leq \sup_{\alpha\in (1,\infty)} (\alpha-1) ( H_{U,{\alpha}} + \varepsilon - H_U ) \\ \gamma_1 &< (1-\alpha_4) ( H_U + \varepsilon - H_{U,{\alpha_4}} ) \leq \sup_{\alpha\in [\alpha_{\min},1)} (1-\alpha) ( H_U + \varepsilon - H_{U,{\alpha}} ). \end{align}\] Since \(\gamma_1\) is strictly less than each of the exponents that appear in the definitions of \(\mathcal{R}_1\), \(\mathcal{R}_2\), \(\mathcal{R}_3\), and \(\mathcal{R}_4\), these choices ensure that (58) is also satisfied as long as \(n\) is large enough. ◻
Proof of Lemma 11. [item:hninequality-bounded]) In the proof of the first claim, we will use the following fact: If \(T\in \mathcal{B}({\mathcal{H}})\), \(T\ge 0\), and \(\mathrm{im}(T)\) is closed, then \(\mathrm{im}(T)=\mathrm{im}(\sqrt{T})\). To prove this, first observe that clearly \(\mathrm{im}(T)\subseteq \mathrm{im}(\sqrt{T})\) holds since \(T= \sqrt{T}\cdot \sqrt{T}\). Therefore, it suffices to show \(\mathrm{im}(\sqrt{T})\subseteq \mathrm{im}(T)\). The last inclusion is a consequence of the elementary relations \(\mathrm{ker}(T)=\mathrm{ker}(\sqrt{T})\), and \(\mathrm{ker}(T)=\mathrm{im}(T)^{\perp}\), where \(\mathrm{ker}(T)=\{h\in\mathcal{H}: Th=0 \}\) and \(M^{\perp}\) denotes the orthogonal complement of a subspace \(M\subseteq\mathcal{H}\): \[\label{eq:images-1} \mathrm{im}(T)^{\perp}=\mathrm{ker}(T)=\mathrm{ker}(\sqrt{T})=\mathrm{im}(\sqrt{T})^{\perp}.\tag{89}\] Passing to the orthogonal complement in (89 ), using our assumption that \(\mathrm{im}(T)\) is closed, and the fact that for every subspace \(M\subseteq \mathcal{H}\) we have \(M^{\perp \perp}= \overline{M}\), we obtain \[\mathrm{im}(T)= \mathrm{im}(\sqrt{T})^{\perp \perp}=\overline{\mathrm{im}(\sqrt{T})}\supseteq \mathrm{im}(\sqrt{T}).\]
Since \(\mathrm{im}(A+B)\) is closed by assumption, we have \(\mathrm{im}(A+ B)=\mathrm{im}(\sqrt{A+B})\). Consequently, \(\mathrm{im}(\sqrt{A+B})\) is closed as well, and [53] states that in this case the Moore-Penrose pseudoinverse \(\sqrt{A+B}^{-1}\) is a bounded linear operator, i.e., \(\sqrt{A+B}^{-1} \in \mathcal{B}({\mathcal{H}})\).
[item:hninequality-main]) The operator inequality in the lemma follows from the proof of [52]. ◻
Proof of Lemma 14. The compatibility of the cost constraint by Definition 1 clearly implies that \(p(t) < \infty\) in an interval around \(0\) and that \(p'(0) = \mathbb{E}_P(c(X)-C) < 0\). Since \(p(0) = 1\), this means that there is \(\hat{t} > 0\) with \(p(\hat{t}) < 1\) and hence, \(\beta_1>0\).
Furthermore, for any \(\hat{t} > 0\) with \(p(\hat{t}) < \infty\), we have \[\begin{align} \mathbb{P}_\mathcal{C}\left( \sum_{i=1}^nc(\mathcal{C}(m)_i) > nC \right) &= \mathbb{P}_\mathcal{C}\left( \prod_{i=1}^n \exp\left( \hat{t} \left( c(\mathcal{C}(m)_i) - C \right) \right) > 1 \right) \\ \overset{(a)}&{\leq} \prod_{i=1}^n \mathbb{E}_\mathcal{C} \exp\left( \hat{t} \left( c(\mathcal{C}(m)_i) - C \right) \right) \\ &= p(\hat{t})^n, \end{align}\] where step (a) uses Markov’s inequality and the independence of the codeword entries. This clearly implies \[\mathbb{P}_\mathcal{C}\left( \sum_{i=1}^nc(\mathcal{C}(m)_i) > nC \right) \leq \exp(-n\beta_1)\] with \(\beta_1\) defined in (19).
This means that \(\left\lvert \mathbb{B} \right \rvert\) follows a binomial distribution with \(M\) trials and some success probability \(s\leq \exp(-n\beta_1)\). Hence, for every \(\beta\in (0,\beta_1)\), \[\begin{align} \mathbb{P}_\mathcal{C}\big( \left\lvert \mathbb{B} \right \rvert \geq M\exp(-n\beta) \big) &= \mathbb{P}_\mathcal{C}\big( \left\lvert \mathbb{B} \right \rvert \geq Ms + M(\exp(-n\beta) - s) \big) \\ &= \mathbb{P}_\mathcal{C}\big( \left\lvert \mathbb{B} \right \rvert - \mathbb{E}\left\lvert \mathbb{B} \right \rvert \geq M(\exp(-n\beta) - s) \big) \\ \overset{(a)}&{\leq} \exp\left( -2\frac{M^2(\exp(-n\beta) - s)^2}{M} \right) \\ &= \exp\left( -2M(\exp(-n\beta) - s)^2 \right) \\ \overset{(b)}&{\leq} \exp\Big( -2M\exp(-2n\beta)\big(1-\exp(-n(\beta_1-\beta))\big)^2 \Big) \\ &= \mathcal{W}_\mathrm{cost}^{(c,C)}\left( \beta, \frac{\log M}{n}, n \right) \end{align}\] where step (a) is due to the Chernoff-Hoeffding inequality in the form stated, e.g., in [56] and step (b) uses \(s\leq\exp(-n\beta_1)\leq\exp(-n\beta)\). This concludes the proof of ?? . Furthermore, since \(\beta<\beta_1\), it is clear that \(1-\exp(-n(\beta_1-\beta))\) tends to \(1\) as \(n\) tends to infinity. In particular, this expression is lower bounded by, say, \(1/\sqrt{2}\) for sufficiently large \(n\). The bound ?? is then clear. ◻
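The Chernoff-Hoeffding step can be checked against an exact binomial tail for moderate parameters; in the sketch below, the values of \(M\), \(s\), and the deviation threshold are illustrative and not taken from the paper:

```python
import math

M, s = 500, 0.02      # number of trials and success probability (illustrative)
t = 0.06              # deviation threshold, t > s
k = math.ceil(M * t)

# Exact tail probability P(|B| >= k) for |B| ~ Binomial(M, s)
tail = sum(math.comb(M, j) * s**j * (1 - s) ** (M - j) for j in range(k, M + 1))

# Chernoff-Hoeffding, as in step (a): P(|B| >= M*t) <= exp(-2 M (t - s)^2)
bound = math.exp(-2 * M * (t - s) ** 2)
assert tail <= bound
```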
In this appendix, we collect technical facts on functional analysis and Rényi entropy that we need for the proofs in this paper.
Lemma 15. The norms \(\left\lVert {\cdot} \right\rVert_\mathrm{op}\) on \(\mathcal{B}({\mathcal{H}})\) and \(\left\lVert {\cdot} \right\rVert_\mathrm{tr}\) on \(\mathcal{T}({\mathcal{H}})\) have the following properties:
1) \(\forall A\in \mathcal{T}({\mathcal{H}}):~ \left\lVert {A} \right\rVert_\mathrm{op} \leq \left\lVert {A} \right\rVert_\mathrm{tr}\)
2) \(\forall A_1 \in \mathcal{B}({\mathcal{H}}), A_2 \in \mathcal{T}({\mathcal{H}}), A_3 \in \mathcal{B}({\mathcal{H}}) :~ A_1 A_2 A_3 \in \mathcal{T}({\mathcal{H}}) \wedge \left\lVert {A_1 A_2 A_3} \right\rVert_\mathrm{tr} \leq \left\lVert {A_1} \right\rVert_\mathrm{op} \left\lVert {A_2} \right\rVert_\mathrm{tr} \left\lVert {A_3} \right\rVert_\mathrm{op}\)
3) \(\forall A_1, A_2 \in \mathcal{T}({\mathcal{H}}) :~ \left\lVert {A_1 A_2} \right\rVert_\mathrm{tr} \leq \left\lVert {A_1} \right\rVert_\mathrm{tr} \left\lVert {A_2} \right\rVert_\mathrm{tr}\)
4) The maps \(\mathcal{T}({\mathcal{H}}) \rightarrow \mathcal{K}({\mathcal{H}})', A\mapsto \mathrm{tr}(A\cdot)\) and \(\mathcal{B}({\mathcal{H}}) \rightarrow \mathcal{T}({\mathcal{H}})', A\mapsto \mathrm{tr}(A\cdot)\) are isometric isomorphisms. In particular, \[\forall A\in \mathcal{T}({\mathcal{H}}):~ \left\lVert {A} \right\rVert_\mathrm{tr} = \sup\left\{ \left \lvert \mathrm{tr}(AB) \right \rvert :~ B\in \mathcal{B}({\mathcal{H}}) \wedge \left\lVert {B} \right\rVert_\mathrm{op} \leq 1 \right\}.\]
5) For any \(\rho, \sigma\in \mathcal{S}({\mathcal{H}})\), we have \[\left\lVert {\rho- \sigma} \right\rVert_\mathrm{tr} = 2\max\left\{ \mathrm{tr}(B(\rho- \sigma)) :~ 0 \leq B\leq \mathbf{1} \right\}.\]
Since these are well-known facts, we provide references to textbooks in lieu of a proof: 1) and 2) follow from [62], 3) is immediate from 1) and 2), 4) is [37], and 5) is stated in [63] for the finite-dimensional case; however, the proof also works for infinite-dimensional spaces without modification.
Lemma 16. Let \(\ell\in \mathbb{N}\). Then the map \(\mathcal{T}({\mathcal{H}})^\ell\rightarrow \mathcal{T}({\mathcal{H}}), (A_1, \dots, A_\ell) \mapsto A_1 \cdots A_\ell\) given by the operator product is continuous.
Proof. The proof is by induction on \(\ell\). The case \(\ell= 1\) is clear, and every case \(\ell> 2\) follows from the induction hypothesis together with the case \(\ell= 2\).
Hence, the only case left to prove is \(\ell= 2\). Let \(A_1, A_2 \in \mathcal{T}({\mathcal{H}})\), and for all \(i\in \{1,2\}\), let \((A_i^{(j)})_{j\in \mathbb{N}}\) be a sequence in \(\mathcal{T}({\mathcal{H}})\) with \[\lim\limits_{j\rightarrow \infty} \left\lVert { A_i - A_i^{(j)} } \right\rVert_\mathrm{tr} = 0.\] Then we use the triangle inequality and the sub-multiplicativity of the trace norm from Lemma 15-3) to argue \[\begin{align} \left\lVert { A_1 A_2 - A_1^{(j)} A_2^{(j)} } \right\rVert_\mathrm{tr} &= \left\lVert { A_1 A_2 - A_1^{(j)} A_2 + A_1^{(j)} A_2 - A_1^{(j)} A_2^{(j)} } \right\rVert_\mathrm{tr} \\ &\leq \left\lVert { A_1 - A_1^{(j)} } \right\rVert_\mathrm{tr} \left\lVert { A_2 } \right\rVert_\mathrm{tr} + \left\lVert { A_1^{(j)} } \right\rVert_\mathrm{tr} \left\lVert { A_2 - A_2^{(j)} } \right\rVert_\mathrm{tr}. \end{align}\] Since every convergent sequence is bounded, the norms \(\lVert A_1^{(j)} \rVert_\mathrm{tr}\) are uniformly bounded in \(j\), so we can take the limit on both sides and obtain \[\lim\limits_{j\rightarrow \infty} \left\lVert { A_1 A_2 - A_1^{(j)} A_2^{(j)} } \right\rVert_\mathrm{tr} = 0. \qedhere\] ◻
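The key estimate in this proof can be illustrated numerically in finite dimensions, where the trace norm is the nuclear norm. The sketch below (random matrices and perturbation sizes are arbitrary example choices) checks the displayed inequality.

```python
import numpy as np

rng = np.random.default_rng(0)

def tr_norm(A):
    """Trace (nuclear) norm: sum of singular values."""
    return np.linalg.norm(A, 'nuc')

# Random operators on a finite-dimensional space and small perturbations of them.
d = 6
A1, A2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
A1p = A1 + 1e-3 * rng.normal(size=(d, d))
A2p = A2 + 1e-3 * rng.normal(size=(d, d))

# ||A1 A2 - A1' A2'||_tr <= ||A1 - A1'||_tr ||A2||_tr + ||A1'||_tr ||A2 - A2'||_tr
lhs = tr_norm(A1 @ A2 - A1p @ A2p)
rhs = tr_norm(A1 - A1p) * tr_norm(A2) + tr_norm(A1p) * tr_norm(A2 - A2p)
assert lhs <= rhs + 1e-12
```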
Lemma 17. Let \(s\in (0,\infty)\), and let \(D: \mathcal{X}\rightarrow \{ A\in \mathcal{T}({\mathcal{H}}):~ 0 \leq A\leq s\mathbf{1}\}\) be measurable. Let \(f: [0,s] \rightarrow \mathbb{R}\) be continuous with \(f(0)=0\), and assume that \(f(D(x)) \in \mathcal{T}({\mathcal{H}})\) for all \(x\in \mathcal{X}\). Then \(\mathcal{X}\rightarrow \mathcal{T}({\mathcal{H}}), x\mapsto f(D(x))\) is measurable.
A sufficient condition for \(f(D(x)) \in \mathcal{T}({\mathcal{H}})\) is that there is \(c\in (0,s]\) with \(f(t) = 0\) for all \(t< c\).
Proof. We begin with the second part of the statement and show that if there is \(c\in (0,s]\) with \(f(t) = 0\) for all \(t< c\), then \(f(D(x)) \in \mathcal{T}({\mathcal{H}})\) for all \(x\in \mathcal{X}\). To this end, we apply the spectral theorem for self-adjoint trace class operators to write \[f(D(x)) = \sum_{\ell=1}^\infty f(\lambda_\ell(x)) \left\lvert {e_\ell(x)} \right\rangle\left\langle {e_\ell(x)} \right\rvert,\] where \((\lambda_\ell(x))_{\ell\in \mathbb{N}}\) is the non-increasing sequence of eigenvalues of \(D(x)\) which contains all nonzero eigenvalues counted with multiplicity and converges to \(0\), and \((e_\ell(x))_{\ell\in \mathbb{N}}\) is a sequence of corresponding orthonormal eigenvectors. The Riesz-Schauder theorem (see, e.g., [37]) ensures that it is possible to arrange the eigenvalues in such a way. For every \(x\), we have \(\lambda_\ell(x) < c\) for all sufficiently large \(\ell\), so \(f(\lambda_\ell(x)) \neq 0\) for only finitely many \(\ell\), and thus \(f(D(x)) \in \mathcal{T}({\mathcal{H}})\).
Let us briefly outline the proof of the first part of the lemma statement. We will first prove that if \(p: \mathbb{R}\rightarrow \mathbb{R}\) is a polynomial function with \(p(0) = 0\), \[\label{eq:continuous-functions-operator-measurable-polynomials} \mathcal{X}\rightarrow \mathcal{T}({\mathcal{H}}), x \mapsto p(D(x)) \textrm{ is measurable.}\tag{90}\] Next, we will infer that \[\label{eq:continuous-functions-operator-measurable-weak-measurability} \forall h_1, h_2 \in \mathcal{H} :~ \mathcal{X}\rightarrow \mathbb{C}, x \mapsto \left\langle {h_1}, { f(D(x)) h_2 } \right\rangle \textrm{ is measurable.}\tag{91}\] Finally, we will use (91 ) to verify the criterion [64] for measurability of operator valued functions.
For (90 ), we note that since \(p(0) = 0\), we can write \(p(t) = \sum_{i=1}^\ell a_i t^i\) for suitable coefficients \(a_1, \dots, a_\ell\in \mathbb{R}\), and hence \[p(D(x)) = \sum\limits_{i=1}^\ell a_i D(x)^i.\] Each summand is the composition of the measurable map \(\mathcal{X}\rightarrow \mathcal{T}({\mathcal{H}})^i, x\mapsto (D(x), \dots, D(x))\) with the product of operators, which is continuous by Lemma 16. Both the trace class and the set of measurable functions are closed under finite sums, hence we obtain (90 ).
In order to prove (91 ), we apply the Weierstraß approximation theorem to obtain a sequence of polynomials \((p_i)_{i\in \mathbb{N}}\) with \(p_i(0) = 0\) for every \(i\) and \[\lim\limits_{i\rightarrow \infty} \left\lVert { f - p_i } \right\rVert_\infty = 0,\] where \(\left\lVert {\cdot} \right\rVert_\infty\) denotes the supremum norm on \([0,s]\) and \(p_i\) is implicitly identified with its restriction to \([0,s]\). By the continuity of the continuous functional calculus, this implies, for every \(x\), \[\lim\limits_{i\rightarrow \infty} \left\lVert { f(D(x)) - p_i(D(x)) } \right\rVert_\mathrm{op} = 0.\] The map \(\mathcal{B}({\mathcal{H}}) \rightarrow \mathbb{C}, A\mapsto \left\langle {h_1}, {Ah_2} \right\rangle\) is continuous for every fixed \(h_1,h_2 \in \mathcal{H}\), so it follows that \[\label{eq:operator-functions-weakly-measurable-limit-representation} \lim\limits_{i\rightarrow \infty} \left\langle {h_1}, {p_i(D(x)) h_2} \right\rangle = \left\langle {h_1}, {f(D(x)) h_2} \right\rangle.\tag{92}\] The identity map \(\mathcal{T}({\mathcal{H}}) \rightarrow \mathcal{B}({\mathcal{H}})\) is also continuous, and \(\mathcal{X}\rightarrow \mathcal{T}({\mathcal{H}}), x\mapsto p_i(D(x))\) is measurable by (90 ). Therefore, (92 ) represents the map \(x\mapsto \left\langle {h_1}, { f(D(x)) h_2} \right\rangle\) as a pointwise limit of measurable functions, hence we obtain (91 ).
Clearly, \(\mathrm{im}(f\circ D)\subseteq \mathcal{T}({\mathcal{H}})\) is separable due to the separability of \(\mathcal{H}\), so the remaining criterion [64] requires us to verify that for every \(\varphi\in \mathcal{T}({\mathcal{H}})'\), the map \(\mathcal{X}\rightarrow \mathbb{C}, x\mapsto \varphi(f(D(x)))\) is measurable. By Lemma 15-4), the map \[\mathcal{B}({\mathcal{H}}) \rightarrow \mathcal{T}({\mathcal{H}})', A\mapsto \mathrm{tr}(A~\cdot)\] is an isometric isomorphism. Therefore, it remains to show that for all \(A\in \mathcal{B}({\mathcal{H}})\), the map \(\mathcal{X}\rightarrow \mathbb{C}, x\mapsto \mathrm{tr}(Af(D(x)))\) is measurable.
By Lemma 15-2), \(Af(D(x)) \in \mathcal{T}({\mathcal{H}})\). Pick an orthonormal basis \((e_i)_{i\in \mathbb{N}}\) of \(\mathcal{H}\). Then, by [37], \[\mathrm{tr}(Af(D(x))) := \sum_{i=1}^\infty \left\langle {e_i}, {Af(D(x))e_i} \right\rangle\] converges absolutely and is independent of the choice of basis. For every \(\ell\in \mathbb{N}\), we have \[\sum\limits_{i=1}^\ell \left\langle {e_i}, {Af(D(x))e_i} \right\rangle = \sum\limits_{i=1}^\ell \left\langle {A^*e_i}, {f(D(x))e_i} \right\rangle,\] which is a measurable function of \(x\) by (91 ). So we have written the map \(\mathcal{X}\rightarrow \mathbb{C}, x\mapsto \mathrm{tr}(Af(D(x)))\) as a pointwise limit of measurable maps, and it is therefore measurable itself. ◻
Lemma 18. Let \(D: \mathcal{X}\rightarrow \mathcal{S}({\mathcal{H}})\) be measurable. For every \(\ell\in \mathbb{N}\), define a map \(\lambda_\ell: \mathcal{X}\rightarrow [0,1]\) in such a way that for every \(x\in \mathcal{X}\), \((\lambda_\ell(x))_{\ell\in \mathbb{N}}\) is the non-increasing sequence which contains all nonzero eigenvalues of \(D(x)\) counted with multiplicity (padded with zeros if \(D(x)\) has only finitely many nonzero eigenvalues).
Then, we have the following:
1) For every \(\ell\in \mathbb{N}\), the map \(\lambda_\ell: \mathcal{X}\rightarrow [0,1]\) is measurable.
2) Let \(0 < a< b\leq 1\). Then \(\mathcal{X}\rightarrow \mathcal{T}({\mathcal{H}}), x \mapsto \mathbb{1}_{(a,b)} (D(x))\) is measurable.
Proof. For item 1), we let \(s_\ell: \mathcal{S}({\mathcal{H}}) \rightarrow [0,1]\) be the map that maps each \(A\) to its \(\ell\)-th largest eigenvalue (counted with multiplicity). We note that \(s_\ell\circ D= \lambda_\ell\). Since, by Lemma 15-1), the operator norm is upper bounded by the trace norm, [65] implies that the map \(s_\ell\) is continuous. Therefore, \(\lambda_\ell\) is measurable, since it is the composition of a continuous and a measurable map.
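The continuity of the ordered eigenvalue maps used here is a Weyl-type perturbation bound: the \(\ell\)-th largest eigenvalue changes by at most the operator norm of the perturbation. A finite-dimensional numerical illustration (random Hermitian matrices; all sizes are example choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def eigvals_desc(A):
    """Eigenvalues of a Hermitian matrix in non-increasing order."""
    return np.linalg.eigvalsh(A)[::-1]

# Random Hermitian A and a small Hermitian perturbation B = A + E.
d = 8
X = rng.normal(size=(d, d))
A = (X + X.T) / 2
E = rng.normal(size=(d, d))
B = A + 1e-2 * (E + E.T) / 2

# Weyl's bound: max_l |lambda_l(A) - lambda_l(B)| <= ||A - B||_op.
gap = np.max(np.abs(eigvals_desc(A) - eigvals_desc(B)))
op_norm = np.linalg.norm(A - B, 2)
assert gap <= op_norm + 1e-12
```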
For item 2), we define, for every \(i\in \mathbb{N}\) with \(i> \frac{2}{b- a}\), a function \(f_i: [0,1] \rightarrow \mathbb{R}\) as follows: \[f_i(t) := \begin{cases} 1, &t\in [a+ \frac{1}{i}, b- \frac{1}{i}] \\ i(t- a), &t\in (a, a+ \frac{1}{i}) \\ i(b- t), &t\in (b- \frac{1}{i}, b) \\ 0, &t\notin (a, b). \end{cases}\] It is straightforward to verify that this is a sequence of continuous functions and that for all \(i\) and all \(t< a\), we have \(f_i(t) = 0\). This allows us to apply Lemma 17 and conclude that the functions \(\mathcal{X}\rightarrow \mathcal{T}({\mathcal{H}}), x\mapsto f_i(D(x))\) are measurable.
It remains to show that for every \(x\), \(f_i(D(x))\) converges to \(\mathbb{1}_{(a,b)}(D(x))\) in trace norm. We are going to prove the stronger statement \[\forall x\in \mathcal{X}~ \exists i' \in \mathbb{N}~ \forall i\geq i': f_i(D(x)) = \mathbb{1}_{(a,b)}(D(x)).\] To this end, we observe that we have equality if \(f_i\) and \(\mathbb{1}_{(a,b)}\) coincide on \(\sigma\left({D(x)}\right)\). It can easily be verified that these functions coincide everywhere but on \((a, a+ \frac{1}{i})\) and \((b- \frac{1}{i}, b)\). But if \(\sigma\left({D(x)}\right)\) has nonempty intersection with one of these intervals for infinitely many \(i\), we have found an accumulation point of \(\sigma\left({D(x)}\right)\) either at \(a\) or \(b\). By the Riesz-Schauder theorem (see, e.g., [37]), \(\sigma\left({D(x)}\right)\) does not have nonzero accumulation points, so this contradicts \(a, b> 0\). ◻
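The eventual equality \(f_i(D(x)) = \mathbb{1}_{(a,b)}(D(x))\) can be illustrated numerically: on a fixed finite spectrum, applying \(f_i\) to the eigenvalues reproduces the spectral indicator once \(i\) is large enough. The interval endpoints and the spectrum below are made-up example values.

```python
import numpy as np

a, b = 0.25, 0.45

def f_i(t, i):
    """Continuous approximation of the indicator of (a, b), as in the proof."""
    if a + 1/i <= t <= b - 1/i:
        return 1.0
    if a < t < a + 1/i:
        return i * (t - a)
    if b - 1/i < t < b:
        return i * (b - t)
    return 0.0

# A small density operator with spectrum {0.5, 0.3, 0.2}; only 0.3 lies in (a, b).
eigvals = np.array([0.5, 0.3, 0.2])
indicator = ((a < eigvals) & (eigvals < b)).astype(float)

# For i large enough that the spectrum avoids (a, a+1/i) and (b-1/i, b), f_i matches.
for i in (100, 1000):
    approx = np.array([f_i(t, i) for t in eigvals])
    assert np.array_equal(approx, indicator)
```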
Lemma 19. Let \((\Omega, \mathcal{F}, \mu)\) be a probability space, \(\mathcal{H}\) a separable Hilbert space and let \(A: \Omega\rightarrow \mathcal{T}({\mathcal{H}})\) be a random variable. Then, we have the following:
1) The Bochner integral \(\mathbb{E}A\in \mathcal{T}({\mathcal{H}})\) exists iff the Lebesgue integral \(\mathbb{E}\left\lVert {A} \right\rVert_\mathrm{tr}\) exists and is finite.
2) \(\left\lVert {\mathbb{E}A} \right\rVert_\mathrm{tr} \leq \mathbb{E}\left\lVert {A} \right\rVert_\mathrm{tr}\)
3) Let \(B\) be a bounded linear functional on \(\mathcal{T}({\mathcal{H}})\). Then, if \(\mathbb{E}A\) exists, \(\mathbb{E}(B(A))\) exists and \(B(\mathbb{E}A) = \mathbb{E}(B(A))\).
4) If \(\mathbb{E}A\) exists, then \(\mathbb{E}\mathrm{tr}A\) exists and \(\mathrm{tr}\mathbb{E}A= \mathbb{E}\mathrm{tr}A\).
5) If for all \(\omega\in \Omega\), we have \(A(\omega) \geq 0\) and \(\sqrt{A(\omega)} \in \mathcal{T}({\mathcal{H}})\), then \[\mathbb{E}\sqrt{A} \leq \sqrt{\mathbb{E}A}.\]
Proof. For 1) and 2), we note that the separability of \(\mathcal{H}\) implies that \(\mathcal{T}({\mathcal{H}})\) is separable, which means that \(A\) has separable range. Therefore, the statements are proven in [40]. 3) is a special case of [40]. 4) is a special case of 3), since the trace is a bounded linear functional on \(\mathcal{T}({\mathcal{H}})\) [62].
For 5), first observe that the measurability of \(\omega\mapsto \sqrt{A(\omega)}\) follows from Lemma 17. We apply the theorem in [66] in the form of eq. (K’) with the function \(\mathcal{T}({\mathcal{H}}) \rightarrow \mathcal{T}({\mathcal{H}}), A\mapsto A^* A\) substituted for \(g\). To this end, we first have to note that this function is continuous by Lemma 16 and that it satisfies the convexity condition in [66]. In order to show the latter, we calculate for \(t\in [0,1]\) and \(A_1, A_2 \in \mathcal{T}({\mathcal{H}})\): \[\begin{align} &\hphantom{{}={}} \big( tA_1 + (1-t) A_2 \big) ^* \big( tA_1 + (1-t) A_2 \big) \\ &= t^2 A_1^* A_1 + t(1-t)( A_1^* A_2 + A_2^* A_1 ) + (1-t)^2 A_2^* A_2 \\ &= tA_1^* A_1 + (1-t) A_2^* A_2 + (t^2 - t) A_1^* A_1 + \big((1-t)^2-(1-t)\big) A_2^* A_2 + t(1-t)( A_1^* A_2 + A_2^* A_1 ) \\ &= tA_1^* A_1 + (1-t) A_2^* A_2 - t(1-t) \big( A_1^* A_1 - A_1^* A_2 - A_2^* A_1 + A_2^* A_2 \big) \\ &= tA_1^* A_1 + (1-t) A_2^* A_2 - t(1-t) \big( A_1 - A_2 \big) ^* \big( A_1 - A_2 \big) \\ &\leq tA_1^* A_1 + (1-t) A_2^* A_2. \end{align}\] We can now argue \[\mathbb{E}\sqrt{A} \overset{(a)}{=} \sqrt{ \left( \mathbb{E}\sqrt{A} \right) ^* \mathbb{E}\sqrt{A} } \overset{(b)}{\leq} \sqrt{ \mathbb{E} \left( \sqrt{A}^* \sqrt{A} \right) } \overset{(c)}{=} \sqrt{\mathbb{E}{A}}.\] (a) is due to \(\mathbb{E}\sqrt{A} \geq 0\), (b) is the application of [66] combined with the fact that the square root is operator monotone [51], and (c) follows directly from \(A\geq 0\). ◻
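The operator Jensen inequality of item 5) can be checked numerically in finite dimensions, with the expectation replaced by an average over a few random positive semidefinite matrices (dimension and sample count below are arbitrary example choices). The claim is that \(\sqrt{\mathbb{E}A} - \mathbb{E}\sqrt{A}\) is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(2)

def sqrtm_psd(A):
    """Square root of a positive semidefinite symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)  # guard against tiny negative round-off
    return (V * np.sqrt(w)) @ V.T

# Empirical "expectation" over a finite sample of random PSD matrices.
d, n = 5, 10
samples = []
for _ in range(n):
    X = rng.normal(size=(d, d))
    samples.append(X @ X.T)  # PSD by construction

mean_A = sum(samples) / n
mean_sqrt = sum(sqrtm_psd(A) for A in samples) / n

# E sqrt(A) <= sqrt(E A): the difference must be PSD (up to round-off).
min_eig = np.linalg.eigvalsh(sqrtm_psd(mean_A) - mean_sqrt).min()
assert min_eig >= -1e-10
```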
Lemma 20. Let \(p: \mathbb{N}\rightarrow [0,1]\) be a pmf with \[\sum\limits_{i\in \mathbb{N}} p(i)^{\alpha_{\min}} < \infty.\] Then the function \[\label{eq:renyi-entropy-basics-preliminary-function} f_p: \alpha \mapsto \sum\limits_{i\in \mathbb{N}} p(i)^\alpha\qquad{(19)}\] is continuously differentiable on \((\alpha_{\min},\infty)\), and its derivative is \[\label{eq:renyi-entropy-basics-preliminary-derivative} f_p': \alpha \mapsto \sum\limits_{i\in \mathbb{N}} p(i)^\alpha \log p(i) \in (-\infty,0].\qquad{(20)}\] Moreover, for every \(\alpha_0 > \alpha_{\min}\), there exists \(g_{\max}(\alpha_0) \in [0, \infty)\) such that for all \(\alpha\in [\alpha_0,\infty)\) \[\label{eq:renyi-entropy-basics-preliminary-derivative-bound} \left \lvert f_p'(\alpha) \right \rvert \leq g_{\max}(\alpha_0) \sum\limits_{i\in \mathbb{N}} p(i)^{\alpha_{\min}} < \infty .\qquad{(21)}\]
Proof. Fix an arbitrary \(\alpha_0 > \alpha_{\min}\). Because \(\alpha_0\) can be chosen arbitrarily close to \(\alpha_{\min}\), it is sufficient to compute \(f_p'\) on \([\alpha_0,\infty)\). By [67], we only have to show that the series \[\sum\limits_{i\in \mathbb{N}} \frac{d}{d \alpha} p(i)^\alpha = \sum\limits_{i\in \mathbb{N}} p(i)^\alpha \log p(i)\] converges uniformly for \(\alpha\in [\alpha_0,\infty)\). To this end, we write \[- p(i)^\alpha \log p(i) = - p(i)^{\alpha_{\min}} p(i)^{\alpha_0 - \alpha_{\min}} p(i)^{\alpha- \alpha_0} \log p(i) \leq - p(i)^{\alpha_{\min}} p(i)^{\alpha_0 - \alpha_{\min}} \log p(i),\] where the inequality uses \(p(i)^{\alpha- \alpha_0} \leq 1\). By the Weierstraß criterion, we only have to show that \[\sum\limits_{i\in \mathbb{N}} \Big( - p(i)^{\alpha_{\min}} p(i)^{\alpha_0 - \alpha_{\min}} \log p(i) \Big)\] converges. To this end, we first examine the function \[g: (0,1] \rightarrow [0,\infty), t \mapsto - t^{\alpha_0 - \alpha_{\min}} \log t.\] Clearly, \(g(1) = 0\). Moreover, by substituting \(\hat{t} := - \log t\), we get \[\lim\limits_{t\rightarrow 0} g(t) = \lim\limits_{t\rightarrow 0} \Big( - t^{\alpha_0 - \alpha_{\min}} \log t \Big) = \lim\limits_{\hat{t} \rightarrow \infty} \hat{t} \exp\left(-\hat{t}(\alpha_0 - \alpha_{\min})\right) = 0,\] so \(g\) can be extended to a continuous function on the whole interval \([0,1]\) with \(g(0) = g(1) = 0\). This means that \(g\) takes a finite maximum value \(g_{\max}(\alpha_0)\) on \([0,1]\). So we have \[\sum\limits_{i\in \mathbb{N}} \Big( - p(i)^{\alpha_{\min}} p(i)^{\alpha_0 - \alpha_{\min}} \log p(i) \Big) \leq g_{\max}(\alpha_0) \sum\limits_{i\in \mathbb{N}} p(i)^{\alpha_{\min}} < \infty\] by the assumption of the lemma, concluding the proof. ◻
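The derivative bound of the lemma can be illustrated for a concrete pmf. The sketch below uses a geometric pmf and the natural logarithm (both example choices, as are \(\alpha_{\min}\) and \(\alpha_0\)); for this \(g\), elementary calculus gives the maximum \(g_{\max}(\alpha_0) = \frac{1}{e(\alpha_0 - \alpha_{\min})}\), attained at \(t = e^{-1/(\alpha_0-\alpha_{\min})}\).

```python
import math

# Geometric pmf p(i) = (1-q) q^i, truncated for the numerical check.
q, alpha_min, alpha_0 = 0.5, 0.5, 0.75
N = 500  # truncation of the infinite series (the tail is negligible here)
p = [(1 - q) * q**i for i in range(N)]

# Closed-form maximum of g(t) = -t^(alpha_0 - alpha_min) * log(t) on (0, 1].
eps = alpha_0 - alpha_min
g_max = 1 / (math.e * eps)

# |f_p'(alpha)| <= g_max(alpha_0) * sum_i p(i)^alpha_min, uniformly for alpha >= alpha_0.
bound = g_max * sum(pi**alpha_min for pi in p)
for alpha in (0.75, 1.0, 1.5, 3.0):
    f_prime = sum(pi**alpha * math.log(pi) for pi in p if pi > 0)
    assert abs(f_prime) <= bound
```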
Lemma 21. Let \(P\) and \(D\) be such that (50 ) holds, let \(H_P\) be as defined in (7 ), \(H_U\) as defined in (8 ), \(H_{P,{\alpha}}\) as defined in (10 ), and \(H_{U,{\alpha}}\) as defined in (11 ).
Then \(\alpha\mapsto H_{U,{\alpha}}\) and \(\alpha\mapsto H_{P,{\alpha}}\) are continuously differentiable on \((\alpha_{\min},1) \cup (1,\infty)\). Moreover, we have \[\begin{align} \label{eq:renyi-entropy-basics-output} \lim\limits_{\alpha\rightarrow 1} H_{U,{\alpha}} &= H_U \\ \label{eq:renyi-entropy-basics-joint} \lim\limits_{\alpha\rightarrow 1} H_{P,{\alpha}} &= H_P. \end{align}\]
Proof. In order to apply Lemma 20, we define functions \(f_U\) and \(f_{P(\cdot | X)}\) as in (?? ), where the pmf \(U\) (respectively \(P(\cdot | X)\)) is substituted for \(p\). Observe that \[H_{U,{\alpha}} = \frac{1}{1-\alpha} \log f_U(\alpha).\] Therefore, the continuous differentiability follows immediately from Lemma 20.
The argument for the continuous differentiability of \[H_{P,{\alpha}} = \frac{1}{1-\alpha} \log\mathbb{E} f_{P(\cdot | X)} (\alpha)\] is similar, but we need an additional argument that \[\label{eq:renyi-entropy-basics-intermediate-derivative} \frac{d}{d\alpha} \mathbb{E} f_{P(\cdot | X)} (\alpha) = \mathbb{E} f_{P(\cdot | X)}' (\alpha),\tag{93}\] (and, implicitly, that the derivative on the left-hand side exists) where \(f_{P(\cdot | X)}'\) is given in (?? ). It is sufficient to show that this is the case for all \(\alpha> \alpha_0 > \alpha_{\min}\). We argue this with the criterion [68] for interchangeability of differentiation and expectation, which requires us to show that \(f_{P(\cdot | x)}'\) is a continuous function of \(\alpha\) for fixed \(x\) and that \(\left \lvert f_{P(\cdot | X)}'(\alpha) \right \rvert\) has an upper bound that is uniform in \(\alpha\) and integrable over \(X\). The former clearly follows from Lemma 20, and for the latter, we note the bound (?? ) in Lemma 20 and observe that \[\sum\limits_{y\in \mathbb{N}} P(y| x)^{\alpha_{\min}} = \mathrm{tr}\left( D(x)^{\alpha_{\min}} \right),\] so by the interchangeability of expectation and trace, the integrability follows from the assumption that \(\mathbb{E}D(X)^{\alpha_{\min}} \in \mathcal{T}({\mathcal{H}})\) exists.
In order to show (?? ), we note that \(f_U(1) = \sum_i U(i) = 1\) and hence \[\lim\limits_{\alpha\rightarrow 1} H_{U,{\alpha}} = - \lim\limits_{\alpha\rightarrow 1} \frac{ \log f_U(\alpha) - \log f_U(1) }{ \alpha- 1 } = -(\log \circ f_U)' (1).\] Lemma 20 and the chain rule yield \[-(\log \circ f_U)' (1) = -\frac{f'_U(1)}{f_U(1)} = H_U.\]
For (?? ), we have \[\lim\limits_{\alpha\rightarrow 1} H_{P,{\alpha}} = - \lim\limits_{\alpha\rightarrow 1} \frac{ \log\mathbb{E} f_{P(\cdot | X)} (\alpha) - \log\mathbb{E} f_{P(\cdot | X)} (1) }{ \alpha- 1 } = -g' (1),\] where \(g'\) is the derivative of \[g: \alpha \mapsto \log\mathbb{E} f_{P(\cdot | X)} (\alpha)\] and we have used \(\mathbb{E} f_{P(\cdot | X)} (1) = 1\). We use (93 ) and the chain rule for differentiation to obtain \[g'(\alpha) = \frac{ \mathbb{E}f'_{P(\cdot | X)}(\alpha) }{ \mathbb{E}f_{P(\cdot | X)}(\alpha) } = \frac{ \mathbb{E}\sum_{y\in \mathbb{N}} P(y| X)^\alpha \log P(y| X) }{ \mathbb{E}\sum_{y\in \mathbb{N}} P(y| X)^\alpha },\] so \(-g'(1) = H_P\), concluding the proof of the lemma. ◻
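The limit statement of the lemma can be illustrated numerically for a finite pmf: evaluating the Rényi entropy \(H_\alpha = \frac{1}{1-\alpha}\log\sum_i p(i)^\alpha\) close to \(\alpha = 1\) recovers the Shannon entropy. The pmf and the natural logarithm below are example choices.

```python
import math

# Example pmf (dyadic, so the check is well-conditioned).
p = [0.5, 0.25, 0.125, 0.125]

def renyi(alpha):
    """Renyi entropy of order alpha (natural log), alpha != 1."""
    return math.log(sum(pi**alpha for pi in p)) / (1 - alpha)

shannon = -sum(pi * math.log(pi) for pi in p)

# Approaching alpha = 1 from both sides recovers the Shannon entropy.
for alpha in (0.999, 1.001):
    assert abs(renyi(alpha) - shannon) < 1e-2
```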