Super-Linear Growth of the Capacity-Achieving Input Support for the Amplitude-Constrained AWGN Channel


Abstract

THIS PAPER IS ELIGIBLE FOR THE STUDENT PAPER AWARD. We study the growth of the support size of the capacity-achieving input distribution for the amplitude-constrained additive white Gaussian noise (AWGN) channel. While it has been known since Smith (1971) that the optimal input is discrete with finitely many mass points, tight bounds on the number of support points \(K_A\) as the amplitude constraint \(A\) increases remain open. Building on recent work by Dytso et al. (2019) and Mattingly et al. (2018), we derive a new analytical lower bound showing that \(K_A\) grows super-linearly in \(A\). Our approach combines total-variation convergence of the output distribution to the uniform law with quantitative limits on Gaussian mixture approximation.

1 Introduction

The amplitude-constrained AWGN channel \[Y = X + Z, \qquad Z \sim \mathcal{N}(0,1),\] is a fundamental model where the input \(X\) is restricted to \(|X|\le A\). Its capacity is \[C(A) = \sup_{P_X:|X|\le A} I(X;Y).\] In the seminal work by Smith [1], it was proved that the capacity-achieving input \(P_X^\star = \pi_A^\star\) is discrete with finitely many mass points \(K_A = |\operatorname{supp}(\pi_A^\star)|\).
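
For concreteness, the mutual information in the definition above can be evaluated numerically for any candidate discrete input. The following Python sketch (our illustration, not part of the formal development; the grid width and padding are arbitrary choices) computes \(I(X;Y)\) in nats via \(I(X;Y) = H(Y) - \log(\sqrt{2\pi e})\).

```python
import numpy as np
from scipy.stats import norm

def mutual_information(atoms, probs, pad=8.0, n_grid=20_000):
    """I(X;Y) in nats for Y = X + Z, Z ~ N(0,1), X discrete on `atoms` with weights `probs`."""
    atoms, probs = np.asarray(atoms, float), np.asarray(probs, float)
    y = np.linspace(atoms.min() - pad, atoms.max() + pad, n_grid)
    dy = y[1] - y[0]
    f_y = probs @ norm.pdf(y[None, :] - atoms[:, None])              # output density phi * pi
    h_y = -np.sum(np.where(f_y > 0, f_y * np.log(f_y), 0.0)) * dy    # differential entropy of Y
    return h_y - 0.5 * np.log(2 * np.pi * np.e)                      # I(X;Y) = H(Y) - H(Y|X)

# Example: three equally likely mass points at -2, 0, 2 (i.e., A = 2).
print(mutual_information([-2.0, 0.0, 2.0], [1/3, 1/3, 1/3]))
```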

The asymptotic behavior of \(K_A\) as \(A\!\to\!\infty\) remains an active topic: Dytso et al. [2] established \(\sqrt{1+\frac{2A^2}{\pi e}} \le K_A \le a_2 A^2 + a_1 A + a_0\) and conjectured \(K_A=\Theta(A)\) as \(A\to\infty\). Numerical simulations [3] suggested \(K_A\propto A^{4/3}\) as \(A\to\infty\), but no rigorous proof is available. Moreover, this empirical scaling law should be treated with caution: the simulations probe only finitely many values of \(A\), and limited numerical precision can distort the estimated number of mass points.
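
To make the numerical-accuracy caveat concrete, estimates of \(K_A\) are typically obtained by discretizing the input alphabet and maximizing \(I(X;Y)\) over the resulting finite-alphabet channel, for instance with the Blahut-Arimoto algorithm. The sketch below is a hypothetical illustration of this approach (it is not the procedure of [3]; the grid sizes, iteration budget, and mass threshold are arbitrary choices), and it shows why the reported number of mass points can depend delicately on discretization and thresholding.

```python
import numpy as np
from scipy.stats import norm

def blahut_arimoto_awgn(A, n_in=301, n_out=1501, iters=2000, tol=1e-12):
    """Approximate the capacity-achieving input of Y = X + Z, |X| <= A, Z ~ N(0,1),
    by discretizing input and output alphabets and running Blahut-Arimoto."""
    x = np.linspace(-A, A, n_in)
    y = np.linspace(-A - 8.0, A + 8.0, n_out)
    W = norm.pdf(y[None, :] - x[:, None])
    W /= W.sum(axis=1, keepdims=True)                  # row-stochastic channel matrix W(y|x)
    p = np.full(n_in, 1.0 / n_in)                      # initial input distribution
    for _ in range(iters):
        q = p @ W                                       # induced output distribution
        D = np.sum(W * np.log(np.where(W > 0, W / q[None, :], 1.0)), axis=1)  # KL(W(.|x) || q)
        p_new = p * np.exp(D - D.max())
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    return x, p

x, p = blahut_arimoto_awgn(A=3.0)
# Grid points carrying non-negligible mass; the resulting count of "mass points"
# depends heavily on this threshold and on the input grid resolution.
print(np.count_nonzero(p > 1e-6))
```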

In this paper, we prove a stronger analytic lower bound:

Theorem 1. As \(A \to \infty\), the support size of the optimal input grows super-linearly: \[\lim_{A\to\infty} \frac{K_A}{A} = \infty.\]

Contributions:

  • We show that the output density induced by \(\pi_A^\star\) converges in total variation to that of a uniform input over \([-A,A]\).

  • We establish a quantitative lower bound on how well any \(K\)-point Gaussian mixture can approximate the uniform distribution, improving the result of [4].

  • Combining the two yields the super-linear growth law of Theorem 1.

Organization: Section 2 defines notation. Sections 3 and 4 present the two key ingredients, followed by the proof of Theorem 1 in Section 5. Section 6 concludes.

2 Preliminaries and Notation

Let \(\phi(x) = \tfrac{1}{\sqrt{2\pi}} e^{-x^2/2}\) be the standard Gaussian density. For any input distribution \(\pi\), define the output density \(f_\pi = \phi * \pi\). The total variation distance between two densities \(p,q\) is \[\operatorname{TV}(p,q) = \frac{1}{2} \int_{\mathbb{R}} |p(y)-q(y)| dy.\] We denote by \(\operatorname{Unif}([-A,A])\) the uniform distribution on \([-A,A]\).

For brevity, write \(f_{\text{unif,A}} = \phi * \operatorname{Unif}([-A,A])\) and \(f_A^\star = \phi * \pi_A^\star\).
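
The quantities defined above are straightforward to tabulate numerically. The following sketch (an illustration under the definitions of this section; the grid and padding parameters are arbitrary) evaluates \(f_\pi\), \(f_{\text{unif,A}}\), and their total variation distance on a grid; the closed form used for \(f_{\text{unif,A}}\) is derived in the appendix.

```python
import numpy as np
from scipy.stats import norm

def f_pi(y, atoms, probs):
    """Output density (phi * pi)(y) for a discrete input pi with the given atoms and weights."""
    return np.asarray(probs) @ norm.pdf(y[None, :] - np.asarray(atoms)[:, None])

def f_unif(y, A):
    """Output density (phi * Unif([-A, A]))(y); Phi is the standard Gaussian CDF."""
    return (norm.cdf(A - y) - norm.cdf(-A - y)) / (2 * A)

def tv(p_vals, q_vals, y):
    """Total variation distance between two densities tabulated on an equispaced grid y."""
    return 0.5 * np.sum(np.abs(p_vals - q_vals)) * (y[1] - y[0])

A = 3.0
y = np.linspace(-A - 8.0, A + 8.0, 40_001)
atoms, probs = np.array([-3.0, 0.0, 3.0]), np.array([0.3, 0.4, 0.3])
print(tv(f_pi(y, atoms, probs), f_unif(y, A), y))
```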

3 Convergence of the Capacity-Achieving Output

We first show that the output distribution approaches the uniform distribution as the amplitude constraint grows.

Theorem 2. As \(A\to\infty\), \[\operatorname{TV}(f_A^\star, f_{\text{unif,A}}) \to 0.\]

This result may be understood intuitively as follows. The capacity-achieving output distribution \(f_A^\star\) maximizes \(I(X;Y) = H(Y) - H(Y|X) = H(Y) - \log(\sqrt{2\pi e})\); hence \(f_A^\star\) is also the maximum-entropy output distribution among those induced by admissible inputs. When \(A\) is large, most of the mass of \(f_A^\star\) lies in the bulk interior of \([-A,A]\), and a maximum-entropy density with this property must be close to uniform there. Therefore, the \(\operatorname{TV}\) distance between \(f_A^\star\) and \(f_{\text{unif,A}}\) is small.

4 Approximation Limits for Finite Gaussian Mixtures

Consider any input distribution \(\pi\) supported on \(K\) points, and denote \(f_\pi = \phi * \pi\). The next result quantifies how close such a mixture can be to the uniform output.

Theorem 3. For every \(A>0\) and any discrete \(\pi\) with \(K\) mass points, \[\operatorname{TV}\!\left(f_\pi, f_{\text{unif,A}}\right) \ge \frac{1}{4}\exp\!\Big(-2\pi^2\frac{K^2}{A^2}\Big).\]

This result improves on Theorems 6 and 7 of [4]: their Theorem 6 fails to bound \(\operatorname{TV}\!\left(f_\pi, f_{\text{unif,A}}\right)\) away from zero when \(K=\Theta(A)\) and \(A\to\infty\), while their Theorem 7 bounds the \(\operatorname{TV}\) away from zero only when \(A\ge\sqrt{2\pi} K\). In comparison, our Theorem 3 bounds the \(\operatorname{TV}\) away from zero whenever \(K=O(A)\), providing a sharper characterization of the regime in which finite Gaussian mixtures cannot approximate the uniform output. The proof technique for Theorem 3 is quite similar to that of [4], and a concise proof is provided in the appendix.
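
As a numerical sanity check (not a substitute for the proof), one can verify that the bound of Theorem 3 is respected by specific \(K\)-point inputs in the regime \(K=\Theta(A)\). The sketch below (illustrative; the equispaced, equally weighted input and the grid parameters are our own choices) compares the numerically computed \(\operatorname{TV}\) distance with the lower bound \(\tfrac14\exp(-2\pi^2 K^2/A^2)\); the theorem only asserts that the former dominates the latter, not that the bound is tight.

```python
import numpy as np
from scipy.stats import norm

def tv_numerical(atoms, probs, A, pad=8.0, n_grid=80_001):
    """TV( phi*pi , phi*Unif[-A,A] ) computed on an equispaced grid."""
    y = np.linspace(-A - pad, A + pad, n_grid)
    f_mix = np.asarray(probs) @ norm.pdf(y[None, :] - np.asarray(atoms)[:, None])
    f_uni = (norm.cdf(A - y) - norm.cdf(-A - y)) / (2 * A)
    return 0.5 * np.sum(np.abs(f_mix - f_uni)) * (y[1] - y[0])

def tv_lower_bound(K, A):
    """The lower bound of Theorem 3."""
    return 0.25 * np.exp(-2 * np.pi**2 * K**2 / A**2)

# K proportional to A: K equispaced, equally weighted mass points on [-A, A].
for A in [4.0, 8.0, 16.0]:
    K = int(A)
    atoms = np.linspace(-A, A, K)
    probs = np.full(K, 1.0 / K)
    print(A, K, tv_numerical(atoms, probs, A), tv_lower_bound(K, A))
```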

5 Proof of the Main Result

Combining Theorems 2 and 3 yields the desired scaling law.

Suppose, for the sake of contradiction, that \(K_A/A\) does not tend to infinity. Then there exist a constant \(c>0\) and a sequence \(A_n\to\infty\) along which \(K_{A_n} \le c A_n\). Theorem 3 then gives \(\operatorname{TV}(f_{A_n}^\star,f_{\text{unif},A_n})\ge \tfrac14 e^{-2\pi^2 c^2}\), a fixed positive number, contradicting Theorem 2, which asserts that \(\operatorname{TV}(f_A^\star,f_{\text{unif,A}})\to0\) as \(A\to\infty\). Hence \(K_A/A \to \infty\) as \(A \to \infty\).

6 Conclusion

We have proved that the number of mass points \(K_A\) in the capacity-achieving input for the amplitude-constrained AWGN channel grows super-linearly with the amplitude \(A\). This rigorously disproves the standing conjecture that \(K_A = \Theta(A)\).

This result reveals that the optimal input becomes dense at a super-linear rate, but the precise scaling law remains an open and challenging question. Numerical simulations in [3] suggested \(K_A \propto A^{4/3}\). Verifying this conjecture, or finding the true asymptotic behavior, is a key direction for future work. Further research could also investigate the asymptotic structure of the support points and their associated probabilities.

7 Acknowledgment

The author is deeply grateful to Professor Yihong Wu for bringing this problem to the author’s attention and for numerous valuable discussions and suggestions throughout the course of this research.

Appendix

In this appendix, we provide proofs of Theorems 2 and 3.

The following is a key technical result that will be used in the proof of Theorem 2. It appears as Theorem 4.16 in [5].

Lemma 1. For \(0<B<A\), as \(A \to \infty\) and \(A-B \to \infty\), \[\begin{align} \lim_{A \to \infty, A-B \to \infty} \sup_{|x| \le B} |2Af_A^\star(x) - 1| = 0 \label{eq:key} \end{align}\qquad{(1)}\]

In addition to the above lemma, we also need the following similar but much simpler result.

Lemma 2. For \(0<B<A\), as \(A \to \infty\) and \(A-B \to \infty\), \[\begin{align} \lim_{A \to \infty, A-B \to \infty} \sup_{|x| \le B} |2Af_{\text{unif,A}}(x) - 1| = 0 \label{eq:key2} \end{align}\qquad{(2)}\] Furthermore, combining with Lemma 1, we also have \[\begin{align} \lim_{A \to \infty, A-B \to \infty} \sup_{|x|\le B} |2Af_{\text{unif,A}}(x) - 2Af_A^\star(x)| = 0 \label{eq:key3} \end{align}\qquad{(3)}\]

The density \(f_{\text{unif,A}}(x) = \phi * \operatorname{Unif}([-A,A])(x)\) has a simple closed form: \[\begin{align} f_{\text{unif,A}}(x) = \frac{1}{2A} \big(\Phi(A-x) - \Phi(-A-x)\big), \end{align}\] where \(\Phi\) denotes the standard Gaussian CDF. For \(|x|\le B\), \(2A f_{\text{unif,A}}(x)\) satisfies \[\Phi(A+B) - \Phi(-A-B) \ge \Phi(A-x) - \Phi(-A-x) \ge \Phi(A-B) - \Phi(-A+B).\] Both the left-hand side and the right-hand side converge to 1 as \(A \to \infty\) and \(A-B \to \infty\), so (2) follows by the squeeze theorem. Combining (2) with Lemma 1 via the triangle inequality yields (3), which completes the proof of the lemma.
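
Both the closed form and the uniform convergence in (2) are easy to check numerically. The sketch below (our illustration; the choice \(A-B=\sqrt{A}\) matches the proof of Theorem 2 below, while the grid sizes are arbitrary) compares the closed form against a brute-force convolution and tabulates \(\sup_{|x|\le B}|2Af_{\text{unif,A}}(x)-1|\).

```python
import numpy as np
from scipy.stats import norm

def f_unif(x, A):
    """(phi * Unif([-A, A]))(x) = (Phi(A - x) - Phi(-A - x)) / (2A)."""
    return (norm.cdf(A - x) - norm.cdf(-A - x)) / (2 * A)

# Closed form vs. brute-force convolution (1/2A) * integral_{-A}^{A} phi(x - t) dt.
A = 10.0
t = np.linspace(-A, A, 200_001)
for x0 in [0.0, 3.0, A - np.sqrt(A)]:
    brute = np.sum(norm.pdf(x0 - t)) * (t[1] - t[0]) / (2 * A)
    print(x0, f_unif(x0, A), brute)

# Uniform convergence (2) with A - B = sqrt(A): the sup shrinks as A grows.
for A in [10.0, 100.0, 1000.0]:
    B = A - np.sqrt(A)
    x = np.linspace(-B, B, 100_001)
    print(A, np.max(np.abs(2 * A * f_unif(x, A) - 1)))
```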

With this lemma, we can prove Theorem 2 as follows.

Choose \(B\) so that both \(B\) and \(A-B\) tend to infinity while \(A-B\) grows slower than \(A\), for example \(A-B = A^{1/2}\) (so that \(B/A \to 1\)). We can break the integral \[\begin{align} \label{eq:tv-integral} \operatorname{TV}(f_A^\star, f_{\text{unif,A}}) = \frac{1}{2} \int_{-\infty}^{\infty} |f_A^\star(x) - f_{\text{unif,A}}(x)| dx \end{align}\tag{1}\] into three parts: \(\int_{-\infty}^{-B}, \int_{-B}^{B}, \int_{B}^{\infty}\). For the middle part, applying Lemma 2, \[\begin{align} \int_{-B}^{B} |f_A^\star(x) - f_{\text{unif,A}}(x)| dx &\le \frac{B}{A} \sup_{|x|\le B} |2Af_A^\star(x) - 2Af_{\text{unif,A}}(x)| \\ &\le \sup_{|x|\le B} |2Af_A^\star(x) - 2Af_{\text{unif,A}}(x)| \to 0, \end{align}\] where the last convergence is due to (3) in Lemma 2; hence \(\int_{-B}^{B} |f_A^\star(x) - f_{\text{unif,A}}(x)| dx \to 0\). Similarly, using Lemma 1, \[\begin{align} \int_{-B}^{B} \Big|f_A^\star(x) - \frac{1}{2A}\Big| dx \to 0. \end{align}\] This further implies that \[\begin{align} \int_{-B}^B f_A^\star(x) dx &= \int_{-B}^B \Big(\frac{1}{2A} + f_A^\star(x) - \frac{1}{2A}\Big) dx \\ &\ge \int_{-B}^B \Big(\frac{1}{2A} - \Big|f_A^\star(x) - \frac{1}{2A}\Big|\Big) dx \\ &= \frac{B}{A} - \int_{-B}^B \Big|f_A^\star(x) - \frac{1}{2A}\Big| dx \to 1, \end{align}\] since \(B/A \to 1\). The left-hand side is at most 1 and is lower bounded by a quantity that converges to 1; therefore it must also converge to 1: \[\begin{align} \int_{-B}^{B} f_A^\star(x) dx \to 1. \end{align}\] This implies that \[\begin{align} \int_{-\infty}^{-B} f_A^\star(x) dx \to 0, \quad \int_{B}^{\infty} f_A^\star(x) dx \to 0. \end{align}\] The same argument, with Lemma 2 in place of Lemma 1, shows that the corresponding tails of \(f_{\text{unif,A}}\) also vanish.

Now, we have enough information to handle the left and right parts of the integral in (1). For the left part \(\int_{-\infty}^{-B}\), we have \[\begin{align} \int_{-\infty}^{-B} |f_A^\star(x) - f_{\text{unif,A}}(x)| dx &\le \int_{-\infty}^{-B} \big(f_A^\star(x) + f_{\text{unif,A}}(x)\big) dx \to 0. \end{align}\] The right part of the integral converges to 0 in the same way.

Therefore, we have shown that \(\operatorname{TV}(f_A^\star, f_{\text{unif,A}}) \to 0\) as \(A \to \infty\).

Before we proceed to prove Theorem 3, let us first introduce some preliminary notation. The \(k\)-th trigonometric moment of a random variable \(X\) is defined as \(t_k(X) = \mathbb{E} [\exp(ikX)]\). Furthermore, we define the \(n\)-th trigonometric moment matrix of \(X\) as \[T_n(X) = \begin{pmatrix} t_0(X) & t_1(X) & \cdots & t_{n}(X) \\ t_{-1}(X) & t_0(X) & \cdots & t_{n-1}(X) \\ \vdots & \vdots & \ddots & \vdots \\ t_{-n}(X) & t_{-n+1}(X) & \cdots & t_{0}(X) \end{pmatrix},\] which is an \((n+1) \times (n+1)\) Hermitian Toeplitz matrix.
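
As a small numerical illustration of this object (our own sketch; the atoms and weights are arbitrary), the code below constructs \(T_n(\delta X_K)\) for a discrete \(X_K\) and confirms the two structural facts used below: the matrix is Hermitian, and its rank is at most \(K\) because it is a mixture of \(K\) rank-one matrices.

```python
import numpy as np

def trig_moment_matrix(n, atoms, probs, delta):
    """(n+1) x (n+1) Hermitian Toeplitz matrix with entries T[j, l] = t_{l-j}(delta * X)."""
    atoms, probs = np.asarray(atoms), np.asarray(probs)
    t = lambda k: np.sum(probs * np.exp(1j * k * delta * atoms))   # k-th trigonometric moment
    idx = np.arange(n + 1)
    return np.array([[t(l - j) for l in idx] for j in idx])

A, K = 4.0, 3
delta = np.pi / A
rng = np.random.default_rng(1)
atoms = rng.uniform(-A, A, K)            # arbitrary K-point input, for illustration only
probs = np.full(K, 1.0 / K)

T = trig_moment_matrix(2 * K, atoms, probs, delta)
print(np.allclose(T, T.conj().T))        # Hermitian
print(np.linalg.matrix_rank(T))          # at most K (= 3 here): T = sum_m p_m w_m w_m^*
# For X ~ Unif([-A, A]): t_k(delta * X) = sin(k*pi)/(k*pi) = 0 for integer k != 0,
# so T_{2K}(delta * X) is the identity matrix.
```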

This proof adapts the trigonometric moment method of [4]. Our key improvement is simple: we consider a higher-order trigonometric moment matrix. For completeness, we provide a concise version of the full proof here; for full details, please refer to the original proof in [4].

Assume we have the following random variables: \(Z\sim \mathcal{N}(0,1)\), \(X\sim \operatorname{Unif}([-A,A])\), \(X_K \sim \pi\), \(Y = X + Z\), \(Y_K = X_K + Z\), where \(\pi\) is a discrete distribution with \(K\) mass points and \(Z\) is independent of \(X\) and \(X_K\). Then \(Y\) has density \(f_{\text{unif,A}}\) and \(Y_K\) has density \(f_\pi\).

We can lower bound the TV distance between \(f_{\text{unif,A}}\) and \(f_\pi\) using the variational representation: \[\begin{align} \label{eq:tv-variational} \operatorname{TV}(f_{\text{unif,A}}, f_\pi) = \frac{1}{2} \sup_{\|g\|_\infty \le 1} \Big|\mathbb{E} \left[ g(Y) \right] - \mathbb{E} \left[ g(Y_K) \right]\Big|. \end{align}\tag{2}\] We choose the test functions \(g_k(t) = \exp(i\delta kt)\), where \(\delta=\frac{\pi}{A}\) and \(k\in\mathbb{Z}\). Then we have: \[\begin{align} \mathbb{E} \left[ g_k(Y) \right] &= \mathbb{E} \left[ g_k(X+Z) \right] \\ &= \mathbb{E} \left[ g_k(X)\right]\mathbb{E} \left[ g_k(Z)\right] \\ &= \exp(-k^2\delta^2/2) t_k(\delta X) \\ &= \exp(-k^2\delta^2/2) \mathbb{I}\left\{k=0\right\} \tag{3} \\ \mathbb{E} \left[ g_k(Y_K) \right] &= \exp(-k^2\delta^2/2) t_k(\delta X_K) \tag{4} \end{align}\] where \(\mathbb{I}\left\{k=0\right\}\) is the indicator function that equals 1 if \(k=0\) and 0 otherwise; the last step of (3) uses \(t_k(\delta X) = \mathbb{E}[e^{ik\pi X/A}] = \frac{\sin(k\pi)}{k\pi} = 0\) for every nonzero integer \(k\).
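
The identities (3) and (4) can also be checked by simulation. The sketch below (illustrative; the atoms, weights, and sample size are arbitrary) compares Monte Carlo estimates of \(\mathbb{E}[g_k(Y)]\) and \(\mathbb{E}[g_k(Y_K)]\) with the right-hand sides of (3) and (4); the discrepancies are of order \(n^{-1/2}\).

```python
import numpy as np

A, K, n = 4.0, 5, 400_000
delta = np.pi / A
rng = np.random.default_rng(0)

atoms = rng.uniform(-A, A, K)                 # arbitrary K-point input (not the optimal pi)
probs = np.full(K, 1.0 / K)

Z = rng.standard_normal(n)
X_unif = rng.uniform(-A, A, n)                # X ~ Unif([-A, A])
X_disc = rng.choice(atoms, size=n, p=probs)   # X_K ~ pi

for k in range(4):
    theta = delta * k
    lhs_unif = np.mean(np.exp(1j * theta * (X_unif + Z)))                          # E[g_k(Y)]
    rhs_unif = np.exp(-theta**2 / 2) * (1.0 if k == 0 else 0.0)                    # eq. (3)
    lhs_disc = np.mean(np.exp(1j * theta * (X_disc + Z)))                          # E[g_k(Y_K)]
    rhs_disc = np.exp(-theta**2 / 2) * np.sum(probs * np.exp(1j * theta * atoms))  # eq. (4)
    print(k, abs(lhs_unif - rhs_unif), abs(lhs_disc - rhs_disc))
```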

Restricting to integers \(k\) with \(|k|\le 2K\) and substituting (3) and (4) into (2), we obtain the following bound: \[\begin{align} \operatorname{TV}(f_{\text{unif,A}}, f_\pi) &\ge \sup_{|k| \le 2K} \frac{1}{2e^{k^2\delta^2/2}} \Big| t_k(\delta X) - t_k(\delta X_K) \Big| \\ &\ge \frac{1}{2e^{2K^2\delta^2}} \| T_{2K}(\delta X) - T_{2K}(\delta X_K)\|_{\max} \\ &\ge \frac{1}{2e^{2K^2\delta^2}(2K+1)} \| T_{2K}(\delta X) - T_{2K}(\delta X_K)\|_{\text{fro}}, \end{align}\] where \(\| \cdot \|_{\max}\) is the element-wise maximum norm of a matrix, and \(\| \cdot \|_{\text{fro}}\) is the Frobenius norm of a matrix.

Note that \(T_{2K}(\delta X)\) is the identity matrix, while \(T_{2K}(\delta X_K)\) has rank at most \(K\) because \(\pi\) is a discrete distribution with \(K\) mass points. Therefore, by the Eckart-Young-Mirsky theorem, we have \(\| T_{2K}(\delta X) - T_{2K}(\delta X_K)\|_{\text{fro}} \ge K+1\). Substituting this into the above inequality, we obtain: \[\begin{align} \operatorname{TV}(f_{\text{unif,A}}, f_\pi) &\ge \frac{K+1}{2(2K+1)} e^{-2K^2\delta^2}\\ &\ge \frac{1}{4}\exp\left(-2\pi^2 \frac{K^2}{A^2}\right). \end{align}\]

References

[1]
J. G. Smith, “The Information Capacity of Amplitude- and Variance-Constrained Scalar Gaussian Channels,” Information and Control, vol. 18, pp. 203–219, 1971.
[2]
A. Dytso, S. Yagli, H. V. Poor, and S. Shamai, “The Capacity Achieving Distribution for the Amplitude Constrained Additive Gaussian Channel: An Upper Bound on the Number of Mass Points,” https://arxiv.org/abs/1901.03264v4, Jan. 2019.
[3]
H. H. Mattingly, M. K. Transtrum, M. C. Abbott, and B. B. Machta, “Maximizing the information learned from finite data selects a simple model,” Proceedings of the National Academy of Sciences, Feb. 2018.
[4]
Y. Ma, Y. Wu, and P. Yang, “On the best approximation by finite Gaussian mixtures,” Apr. 2024.
[5]
Z. Zhang, “Discrete Noninformative Priors,” Ph.D. dissertation, Yale University, New Haven, Nov. 1994.