August 01, 2024
Abstract
This article introduces operator on operator regression in quantum probability. In the regression model, the response and the independent variables are certain operator valued observables, linearly associated through an unknown scalar coefficient (denoted by \(\beta\)), and the error is a random operator. In the course of this study, we propose a quantum version of a class of estimators (the \(M\)-estimators) of \(\beta\), and we derive the large sample behaviour of these estimators under the assumptions that the true model is also linear and that the samples are observed eigenvalue pairs of the operator valued observables.
Regression analysis is one of the most well-known toolkits in statistical modelling. More specifically, regression analysis studies the “relationship” between the dependent/response variable and one or more independent/explanatory variables. This “relationship” can be linear, non-linear or even non-parametric (see, e.g., [1] and a few relevant references therein), and using various statistical techniques, one may estimate the unknown parameters involved in the “relationship”. Recently, the concepts of regression have been extended to function valued random elements as well; for instance, the interested reader may look at [2] and a few relevant references therein regarding function on function regression. However, to the best of our knowledge, regression analysis is an area unexplored in the setting of quantum probability. This work aims to investigate various statistical issues involved in operator on operator regression in quantum probability. We first provide a literature review and then briefly describe the research problem, without much technical detail, in the following.
In the context of other statistical concepts studied in the language of quantum probability theory, recently, [3] extended the concepts of sufficient statistics and Rao-Blackwellization to the framework of quantum probability, and a few years before that, [4] reviewed the problem of statistical sufficiency in the setting of quantum statistics. They provided necessary and sufficient conditions for sufficiency and established many results from traditional statistical theory in the language of noncommutative geometry; around the same time, [5] derived the noncommutative version of the well-known factorization theorem in traditional Statistics (see, e.g., [6]).
Let us now consider a pair of operator valued observables \((X, Y)\) (see, e.g., [7] for details on observables) with \(n\) observed pairs of eigenvalues \((\lambda_1, \mu_1), \ldots, (\lambda_{n}, \mu_n)\). Observe that here, in the setting of quantum probability, we observe the eigenvalues of \((X, Y)\), unlike in traditional Statistics/classical Probability, where one observes realizations of the random element \((X, Y)\). Suppose that the unobserved operators \(Y\) and \(X\) are truly linearly associated through a scalar valued coefficient (see 1 ), and, in the sense of regression modelling, the operator valued observables are assumed to be linearly related through a scalar valued coefficient together with some operator valued error (see 2 ). In this work, based on \((\lambda_1, \mu_1), \ldots, (\lambda_{n}, \mu_n)\), we estimate the unknown parameter \(\beta\) involved in the model described in 2 (the estimator is denoted by \(\hat{\beta}_{n}\)). In the course of this study, we establish the connection between the concepts of quantum probability and classical probability (see Section 2.1), and in view of this relation, we investigate the large sample properties of \(\hat{\beta}_{n}\) (see Section 3).
The rest of the article is structured as follows. Section 2 describes the preliminaries of the model in terms of quantum probability; for this problem, a key connection between quantum probability and classical probability is investigated in Section 2.1, and the problem is reformulated in terms of classical probability in Section 2.2. The large sample properties of the proposed estimator are studied in Section 3, and Section 4 contains a few relevant concluding remarks. Finally, necessary technical details are provided in the Appendix in Section 5.
Let \(\mathcal{H}\) be a real separable Hilbert space with inner-product \(\left\langle \cdot, \cdot \right\rangle_\mathcal{H}\) and norm \(\|\cdot\|_\mathcal{H}\). If \(T\) and \(S\) are two bounded linear operators on \(\mathcal{H}\) related by \(T = \beta S\) for some non-zero real scalar \(\beta\), then given any eigenvalue \(\lambda\) of \(S\), we have an eigenvalue \(\mu = \beta \lambda\) of \(T\) with the same eigenvector of unit norm. Conversely, given any eigenvalue \(\mu\) of \(T\), we have an eigenvalue \(\lambda = \frac{\mu}{\beta}\) of \(S\) with the same eigenvector of unit norm. In light of this fact, one may expect that the linear regression model associating two operator valued observables can be represented through their corresponding eigenvalues. The research problem is formally described in the following.
Let \(X\) and \(Y\) be two compact self-adjoint operator (on \(\mathcal{H}\)) valued observables related by \[\begin{align} \label{modeltrue} Y = \beta_{0} X, \end{align}\tag{1}\] where \(\beta_{0}\in\mathbb{R}\) is unknown. We now consider the operator on operator regression model \[\begin{align} \label{model} Y = \beta X + \epsilon, \end{align}\tag{2}\] where \(\epsilon\) is a random operator valued error, and \(\beta\in\mathbb{R}\) is an unknown constant. In this work, our objective is to propose an estimator (denoted by \(\hat{\beta}_{n}\)) of the unknown \(\beta\) based on the observed eigenvalues of \((X, Y)\). The technical assumptions are stated in Section 3, where the large sample properties of \(\hat{\beta}_{n}\) are studied.
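To make the eigenvalue relation behind 1 and 2 concrete, the following minimal Python sketch uses a random real symmetric matrix as a finite-dimensional stand-in for a compact self-adjoint operator, sets \(Y = \beta_{0} X\), and checks that every unit-norm eigenvector of \(X\) is also an eigenvector of \(Y\) with eigenvalue scaled by \(\beta_{0}\). The dimension and the value \(\beta_{0} = 2.5\) are illustrative choices, not taken from the formal development.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
X = (A + A.T) / 2              # real symmetric, hence self-adjoint (stand-in for a compact operator)
beta0 = 2.5
Y = beta0 * X                  # true model (1): Y = beta0 * X

lam, V = np.linalg.eigh(X)     # eigenvalues lam[j] with orthonormal eigenvectors V[:, j]

# Each unit-norm eigenvector v_j of X is also an eigenvector of Y,
# with eigenvalue mu_j = beta0 * lam_j, as in the discussion above.
for j in range(d):
    v = V[:, j]
    assert np.allclose(X @ v, lam[j] * v)
    assert np.allclose(Y @ v, beta0 * lam[j] * v)
print("shared eigenvectors verified")
```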
Suppose that \((\lambda_1, \mu_1), \ldots, (\lambda_n, \mu_n)\) are \(n\) pairs of observed eigenvalues of \((X, Y)\). Motivated by our earlier observation on the deterministic operators \(T\) and \(S\), we assume that each pair \((\lambda_j, \mu_j)\) appears with a common eigenvector \(v_j\) of unit norm. Here, as mentioned earlier, using the observed \((\lambda_{j}, \mu_{j})_{j = 1}^{n}\), we would like to estimate the unknown parameter \(\beta\). The estimation procedure for \(\beta\) is described in the following.
In order to carry out the estimation of \(\beta\), we first reformulate the problem, from its original statement in terms of \(X\) and \(Y\) in the quantum probability setup to one in terms of their eigenvalue pairs \((\lambda, \mu)\) in the classical probability setup. Next, using the classical probability model thus obtained, the problem is reduced to a standard linear regression problem, and we can then use classical statistical tools to estimate \(\beta\). Section 2.1 studies the eigen decomposition of the operator valued observables \(X\) and \(Y\), which is the key concept in reformulating the problem in the setup of classical probability.
Assume that \(X\) and \(Y\) are compact self-adjoint operators on \(\mathcal{H}\) with discrete spectra \(\sigma(X)\) and \(\sigma(Y)\), respectively. As per our discussion above, the eigenspaces corresponding to each feasible eigenvalue pair are the same. Consequently, we have the following spectral decompositions (see [8]): \[X = \sum_{\alpha \in \sigma(X)} \alpha P_\alpha, \quad Y = \sum_{\alpha \in \sigma(Y)} \alpha P_\alpha,\] where the sums are countable, and \(P_\alpha\) denotes the projection onto the eigenspace of the eigenvalue \(\alpha\). Here, the eigenvalues are all real, and the eigenspaces corresponding to \(0 \neq \alpha \in \sigma(X)\) are finite-dimensional. Now, given any state \(\phi\) (a positive trace-class operator of unit trace), we have the expectations in the setting of quantum probability as
\[E X = tr(\phi X) = \sum_{\alpha \in \sigma(X)} \alpha \, tr(\phi P_\alpha), \quad E Y = tr(\phi Y) = \sum_{\alpha \in \sigma(Y)} \alpha \, tr (\phi P_\alpha).\]
Note that \(tr(\phi P_\alpha) \geq 0\) for all \(\alpha \in\sigma(X)\). Moreover, \(\sum_{\alpha \in \sigma(X)} tr(\phi P_\alpha) = tr(\phi) = 1\). A similar statement is true when \(X\) is replaced by \(Y\).
Remark 1. Observe that one can think of the \(\lambda_j\)’s as drawn independently from a discrete probability distribution supported on \(\sigma(X)\) with probability mass function \[f_\lambda(\alpha) = tr(\phi P_\alpha) 1_{(\alpha \in \sigma(X))}.\] Similarly, the \(\mu_j\)’s are drawn independently from a discrete probability distribution supported on \(\sigma(Y)\) with probability mass function \[f_\mu(\alpha) = tr(\phi P_\alpha)1_{(\alpha \in \sigma(Y))}.\] This fact enables us to reformulate the problem in the setup of traditional statistics.
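As an illustration of Remark 1, the following hedged Python sketch (finite-dimensional, with a randomly generated \(X\) and state \(\phi\) as stand-ins) computes the weights \(tr(\phi P_\alpha)\) from the spectral decomposition, verifies that they form a probability mass function and that \(E X = tr(\phi X) = \sum_{\alpha} \alpha\, tr(\phi P_\alpha)\), and then draws i.i.d. samples of \(\lambda\) from \(f_\lambda\).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = rng.standard_normal((d, d))
X = (A + A.T) / 2                    # self-adjoint stand-in for the observable X
B = rng.standard_normal((d, d))
phi = B @ B.T
phi /= np.trace(phi)                 # state: positive semidefinite with tr(phi) = 1

lam, V = np.linalg.eigh(X)           # simple spectrum almost surely
pmf = np.array([V[:, j] @ phi @ V[:, j] for j in range(d)])   # tr(phi P_alpha)

print(np.isclose(pmf.sum(), 1.0))                          # the weights sum to 1
print(np.isclose(np.trace(phi @ X), np.sum(lam * pmf)))    # E X = sum_alpha alpha tr(phi P_alpha)

samples = rng.choice(lam, size=1000, p=pmf)  # i.i.d. draws of lambda_j from f_lambda
```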
Given \(n\) pairs \((\lambda_1, \mu_1), \ldots, (\lambda_n, \mu_n)\) of observed eigenvalues of \((X, Y)\), let \(v_1, v_2, \ldots, v_n\) be the corresponding eigenvectors of unit norm. Then, \[\begin{align} \label{eigenX} X v_{j} = \lambda_{j} v_{j} \quad \text{for all } j = 1, \ldots, n, \end{align}\tag{3}\] where \(\|v_{j}\|_\mathcal{H}= 1\) for \(j = 1, \ldots, n\), and similarly, we have \[\begin{align} \label{eigenY} Y v_{j} = \mu_{j} v_{j} \quad \text{for all } j = 1, \ldots, n. \end{align}\tag{4}\] Next, it follows from 2 , 3 and 4 that for all \(j = 1, \ldots, n\), \(Y v_{j} = \beta X v_{j} + \epsilon v_{j}\), which yields \[\mu_{j}v_{j} = \beta \lambda_{j}v_{j} + \epsilon v_{j},\] and consequently, taking the inner product with \(v_{j}\) and using \(\|v_{j}\|_\mathcal{H} = 1\), \[\label{reduced-model} \mu_j = \left\langle v_j, \mu_jv_j \right\rangle_\mathcal{H} = \left\langle v_j, \beta \lambda_{j}v_{j} + \epsilon v_{j} \right\rangle_\mathcal{H} = \beta \lambda_j + \left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H},\, \forall j = 1, \ldots, n.\tag{5}\]
We now estimate \(\beta\), involved in 5 , based on \((\lambda_{j}, \mu_{j})_{j = 1}^{n}\) in various ways, by imposing conditions on the error terms \(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H},\, j = 1, \ldots, n\). Now, recall from Remark 1 that the \(\lambda_j\)’s are i.i.d. random variables following a discrete probability distribution supported on \(\sigma(X)\) with probability mass function \[f_\lambda(\alpha) = tr(\phi P_\alpha) 1_{(\alpha \in \sigma(X))},\] and the \(\mu_j\)’s are i.i.d. random variables following a discrete probability distribution supported on \(\sigma(Y)\) with probability mass function \[f_\mu(\alpha) = tr(\phi P_\alpha)1_{(\alpha \in \sigma(Y))}.\]
For the data \((\lambda_{j}, \mu_{j})\) (\(j = 1, \ldots, n\)), the \(M\)-estimator of \(\beta\) (see 5 ), which is denoted by \(\hat{\beta}_{n}\), is defined as \[\begin{align} \label{Mestimation} \hat{\beta}_{n} &=& \mathop{\mathrm{arg\,min}}_{\beta}\sum\limits_{j = 1}^{n} \rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H})\nonumber\\ & = & \mathop{\mathrm{arg\,min}}_{\beta}\sum\limits_{j = 1}^{n} \rho(\mu_{j} - \beta \lambda_{j})\quad\text{(using \eqref{reduced-model})}, \end{align}\tag{6}\] where \(\rho(.)\) is a certain convex function, and a few more assumptions will be stated in the subsequent section. Observe that for \(\rho(x) = x^{2}\), \(\hat{\beta}_{n}\) coincides with the well-known least squares estimator (see, e.g., [9]) based on the data \((\lambda_{j}, \mu_{j})_{j = 1}^{n}\), and when \(\rho (x) = |x|\), \(\hat{\beta}_{n}\) coincides with the well-known least absolute deviation estimator (see, e.g., [9]) based on the same data. Besides, this class also includes Huber’s estimator with \(\rho(x) = x^{2}1_{(|x|\leq c)}/2 + (c|x| - c^{2}/2)1_{(|x| > c)}\), where \(c> 0\) (see, e.g., [9]), the \({\cal{L}}^{q}\) regression estimator with \(\rho(x) = |x|^{q}\), where \(q\in [1, 2]\) (see, e.g., [10]), and the well-known \(\alpha\)-th regression quantiles with \(\rho(x) = |x| + (2\alpha - 1) x\), where \(\alpha\in (0, 1)\) (see, e.g., [9], equation (5.5)). This is one of the reasons to consider the technique of \(M\)-estimation: it includes many well-known estimators of the unknown regression parameter.
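The following Python sketch illustrates the minimization in 6 on simulated eigenvalue pairs. It is a minimal illustration only: the i.i.d. Gaussian errors, the three-point spectrum and the Huber cutoff \(c = 1.345\) are all assumptions of the example, not part of the formal development.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber(x, c=1.345):
    """Huber's rho: x^2/2 on |x| <= c, and c|x| - c^2/2 otherwise."""
    return np.where(np.abs(x) <= c, x**2 / 2, c * np.abs(x) - c**2 / 2)

def m_estimate(lam, mu, rho):
    """Minimize sum_j rho(mu_j - beta * lam_j) over the scalar beta, as in (6)."""
    objective = lambda beta: np.sum(rho(mu - beta * lam))
    return minimize_scalar(objective).x

rng = np.random.default_rng(2)
n, beta = 500, 2.5
lam = rng.choice([-1.0, 0.5, 2.0], size=n, p=[0.3, 0.4, 0.3])  # discrete spectrum
mu = beta * lam + rng.standard_normal(n)                        # reduced model (5)

print(m_estimate(lam, mu, huber))             # Huber estimate
print(m_estimate(lam, mu, lambda x: x**2))    # least squares, rho(x) = x^2
print(m_estimate(lam, mu, np.abs))            # least absolute deviation, rho(x) = |x|
```

All three estimates should lie close to the true value \(2.5\); they differ only through the choice of \(\rho\).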
In order to study the large sample properties of \(\hat{\beta}_{n}\) defined in 6 , one needs to assume the following technical conditions. (A1) \(\rho(.)\) is a strictly convex function.
(A2) The error random variables \(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H}\) (\(j = 1, \ldots, n\)) are i.i.d., and their common distribution, denoted by \(F\) (which depends on the joint distribution of \((\lambda, \mu)\)), is such that \(F ({\cal{D}}) = 0\). Here \({\cal{D}}\) is the set of discontinuity points of \(\rho{'}(.)\), where \(\rho{'}(.)\) denotes the derivative of \(\rho(.)\).
(A3) \(E[\rho{'} (\left\langle v_1, \epsilon v_1 \right\rangle_\mathcal{H} + c)] = ac + o(|c|)\) as \(|c|\rightarrow 0\), where \(a\) is a constant.
(A4) \(g(c) : = E[\rho{'} (\left\langle v_1, \epsilon v_1 \right\rangle_\mathcal{H} + c) - \rho{'} (\left\langle v_1, \epsilon v_1 \right\rangle_\mathcal{H})]\) exists for all \(c\in (-\delta, \delta)\), where \(\delta > 0\) is some constant. Moreover, \(g(.)\) is a continuous function at \(0\).
(A5) \(E[\{\rho{'} (\left\langle v_1, \epsilon v_1 \right\rangle_\mathcal{H})\}^{2}] : = D > 0\).
(A6) \(S_{n} : = \sum\limits_{j = 1}^{n} \lambda_{j}^{2} > 0\) for all \(n\geq n_{0}\), where \(n_{0}\) is some positive integer, and \[d_{n}^{2} : = \max_{1\leq j \leq n}\frac{\lambda_{j}^{2}}{S_{n}}\rightarrow 0\quad\text{as}\quad n\rightarrow\infty .\]
Remark 2. Condition (A1) is required for the minimization problem described in 6 to have a unique minimizer, and it is a common assumption across the literature on \(M\)-estimation (see, e.g., [9]) in traditional Statistics. Condition (A2) indicates certain smoothness of the function \(\rho\), along with the fact that the error terms (in the form of inner products) are i.i.d. random variables. Conditions (A3) and (A4) imply the existence of the derivative of the criterion function of \(M\)-estimation in some neighbourhood of \(0\), which is necessary to solve the minimization problem involved in \(M\)-estimation. Next, Condition (A5) implies that the \(M\)-estimator \(\hat{\beta}_{n}\) (see 6 ), after appropriate normalization, will weakly converge to a non-degenerate random variable (see the statement of Theorem 2 below). Finally, Condition (A6) is required to obtain the asymptotic normality of \(\hat{\beta}_{n}\); from a statistical point of view, one can interpret it as requiring that the variation explained by any single eigenvalue not be a dominating factor.
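The following small simulation illustrates condition (A6) under the assumption that the \(\lambda_j\)'s are i.i.d. draws from a fixed three-point distribution (an illustrative choice): since the spectrum is bounded and \(S_{n}\) grows linearly in \(n\), \(d_{n}^{2}\) decays at rate \(1/n\).

```python
import numpy as np

rng = np.random.default_rng(3)
support, pmf = np.array([-1.0, 0.5, 2.0]), np.array([0.3, 0.4, 0.3])
for n in (10, 100, 1000, 10000):
    lam = rng.choice(support, size=n, p=pmf)
    S_n = np.sum(lam**2)
    print(n, np.max(lam**2) / S_n)   # d_n^2 = max_j lam_j^2 / S_n -> 0 as n grows
```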
Now, we state the consistency and the asymptotic normality results associated with \(\hat{\beta}_{n}\).
Theorem 1. Under (A1)–(A6), \(\hat{\beta}_{n}\stackrel{p}\rightarrow \beta\) as \(n\rightarrow\infty\). Here \(\stackrel{p}\rightarrow\) denotes convergence in probability, and \(\hat{\beta}_{n}\) and \(\beta\) are the same as defined in 6 and 5 , respectively.
Theorem 2. Under (A1)–(A6), \[T_{n}^{-1/2}K_{n} (\hat{\beta}_{n} - \beta)\stackrel{w}\rightarrow Z,\] where \(Z\) is a standard normal random variable, and \(\stackrel{w}\rightarrow\) denotes convergence in distribution. Here \(T_{n} = D S_{n}\) and \(K_{n} = a S_{n}\), where \(D\), \(a\) and \(S_{n}\) are the same as defined in (A5), (A3) and (A6), respectively.
Remark 3. In the setup of quantum probability, the observations \((\lambda_{j}, \mu_{j})_{j = 1}^{n}\) are \(n\) eigenvalue pairs of \((X, Y)\), unlike in classical probability, where the observations \((X_{j}, Y_{j})_{j = 1}^{n}\) are obtained from the distribution of \((X, Y)\). Despite this difference, Theorems 1 and 2 assert that \(\hat{\beta}_{n}\) is a consistent estimator of \(\beta\) and that, after appropriate normalization, \((\hat{\beta}_{n} - \beta)\) converges weakly to a standard normal random variable.
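A hedged Monte Carlo sketch of Theorems 1 and 2, specialized to the least squares case \(\rho(x) = x^{2}\) with standard normal errors (both illustrative choices): here \(\rho{'}(x) = 2x\), so \(a = 2\) in (A3) and \(D = E[(2\epsilon)^{2}] = 4\) in (A5), and the standardized statistic \(T_{n}^{-1/2}K_{n}(\hat{\beta}_{n} - \beta)\) should have mean approximately \(0\) and standard deviation approximately \(1\).

```python
import numpy as np

rng = np.random.default_rng(4)
beta, n, reps = 2.5, 400, 2000
support, pmf = np.array([-1.0, 0.5, 2.0]), np.array([0.3, 0.4, 0.3])
a, D = 2.0, 4.0                  # for rho(x) = x^2 and N(0,1) errors: E[rho'(e+c)] = 2c, E[rho'(e)^2] = 4

stats = []
for _ in range(reps):
    lam = rng.choice(support, size=n, p=pmf)      # i.i.d. eigenvalues of X (Remark 1)
    mu = beta * lam + rng.standard_normal(n)      # reduced model (5)
    S_n = np.sum(lam**2)
    beta_hat = np.sum(lam * mu) / S_n             # least squares M-estimator (6)
    T_n, K_n = D * S_n, a * S_n
    stats.append(K_n * (beta_hat - beta) / np.sqrt(T_n))

stats = np.array(stats)
print(stats.mean(), stats.std())  # approximately 0 and 1, in line with Theorem 2
```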
In this work, observe that the true model (see 1 ) is also linear with a scalar valued coefficient; in other words, we work with a correctly specified model. However, the study would be entirely different if the model were mis-specified, which is of great interest in many statistical problems (see, e.g., [11] and a few relevant references therein). Investigating this issue may be of interest for future research. Besides, we have assumed our operators to be compact and self-adjoint. The classical analogue would be to assume that the real random variables are discrete. It would be of interest to consider operators that are merely self-adjoint; then the spectrum may be continuous, and in this case, our technique would not suffice. Moreover, since the parameter \(\beta\) in this work is scalar, the operators involved here are a priori simultaneously diagonalizable and can therefore be measured simultaneously.
In the course of this study, it is assumed that the error random variables \(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H}\) (\(j = 1, \ldots, n\)) involved in 5 are i.i.d., and the assertions in the main results (i.e., Theorems 1 and 2) rely on this assumption. However, this assumption can be weakened by following the approach considered in [12]. We did not pursue this issue here, as establishing the large sample properties of \(\hat{\beta}_{n}\) is not the main theme of the work; we therefore kept the assumptions simpler so that readers can conveniently appreciate the notion of quantum probability in operator on operator regression.
Subhra Sankar Dhar gratefully acknowledges his core research grant (CRG/2022/001489), Government of India. Suprio Bhar gratefully acknowledges the Matrics grant MTR/2021/000517 from the Science and Engineering Research Board (Department of Science & Technology, Government of India). Soumalya Joardar acknowledges support from the SERB MATRICS grant (MTR 2022/000515).
The arguments in the proof are parallel to those in the proof of Theorem 2.2 in [13]. Without loss of generality, we assume that \(\beta = 0\). It is now enough to show that \[P[|\hat{\beta}_{n}| \geq a_{n}]\rightarrow 0\] as \(n\rightarrow\infty\), where \(\{a_{n}\}_{n\geq 1}\) is such that \(a_{n}\rightarrow\infty\) as \(n\rightarrow\infty\). Observe that it follows from (3.11) in [13] that there exists a sequence \(a_{n}^{'}\) such that \(a_{n}^{'}\rightarrow\infty\) as \(n\rightarrow\infty\) and \(a_{n}^{'}\leq a_{n}\) for all \(n\in\mathbb{N}\). Moreover, in view of this, we also have \[\begin{align} \label{ineq1} \sup_{|\beta|\leq a_{n}^{'}}\left|\sum\limits_{j = 1}^{n}[\rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H} - \beta\lambda_{j}) - \rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H}) + \beta\lambda_{j}\rho{'}(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H})] - \frac{\beta^{2}K_{n}}{2} \right|\stackrel{p}\rightarrow 0 \end{align}\tag{7}\] as \(n\rightarrow\infty\), where \(K_{n}\) is the same as defined in the statement of Theorem 2.
Further, when \(|\beta| = a_{n}^{'}\) for some \(n\in\mathbb{N}\), observe that \[\begin{align} \label{ineq2} \frac{\beta^{2}K_{n}}{2}\geq\frac{a_{n}^{'2}\max\limits_{1\leq j\leq n}\lambda_{j}}{2}, \end{align}\tag{8}\] which follows from the definition of \(K_{n}\) in the statement of Theorem 2 and of \(S_{n}\) in (A6) before Remark 2. Moreover, observe that using (A4) on 7 , we have \[\begin{align} \label{ineq3} \sum\limits_{j = 1}^{n}\lambda_{j} \rho{'}(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H}) = O_{p}(1) , \end{align}\tag{9}\] and hence, \[\begin{align} \label{ineq4} \sup_{|\beta|\leq a_{n}^{'}}\left|\beta\sum\limits_{j = 1}^{n}\lambda_{j} \rho{'}(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H})\right| = O_{p}(a_{n}^{'}). \end{align}\tag{10}\]
Therefore, using 7 , 8 , 9 and 10 , we have \[\begin{align} \label{ineq5} P\left(\inf_{|\beta| = a_{n}^{'}}\sum\limits_{j = 1}^{n}\left[\rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H} - \beta\lambda_{j}) - \rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H})\right] \leq 0\right)\rightarrow 0 \end{align}\tag{11}\] as \(n\rightarrow\infty\), and next, using (A1), we have \[\begin{align} \label{ineq6} P\left(\inf_{|\beta| \geq a_{n}^{'}}\sum\limits_{j = 1}^{n}\left[\rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H} - \beta\lambda_{j}) - \rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H})\right] \leq 0\right)\rightarrow 0 \end{align}\tag{12}\] as \(n\rightarrow\infty\).
Finally, by the definition of \(\hat{\beta}_{n}\) in 6 , we have \[\begin{align} \label{ineq7} P(|\hat{\beta}_{n}|\geq a_{n}^{'})\rightarrow 0 \end{align}\tag{13}\] as \(n\rightarrow\infty\). As \(a_{n}^{'}\leq a_{n}\) for all \(n\in\mathbb{N}\), the proof follows. \(\Box\)
Without loss of generality, we consider \(\beta = 0\). First of all, using (A6) and in view of the Lindeberg–Feller CLT (see [9]), we have \[\begin{align} \label{inweakcon} T_{n}^{-1/2}\sum\limits_{j = 1}^{n} \lambda_{j} \rho{'}(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H}) = : T_{n}^{-1/2} K_{n}\widetilde{\beta}_{n}\stackrel{w}\rightarrow Z, \end{align}\tag{14}\] as \(n\rightarrow\infty\), where \(Z\) is a standard normal random variable, \(T_{n}\) and \(K_{n}\) are the same as defined in the statement of Theorem 2, and ‘\(\stackrel{w}\rightarrow\)’ denotes weak convergence. Hence, in order to prove this theorem, it is enough to show that \[\begin{align} \label{rqconprob} \hat{\beta}_{n} -\widetilde{\beta}_{n}\stackrel{p}\rightarrow 0 \end{align}\tag{15}\] as \(n\rightarrow\infty\), where \(\widetilde{\beta}_{n}\) and \(\hat{\beta}_{n}\) are the same as defined in 14 and 6 , respectively. Observe that using 7 and the definition of \(\widetilde{\beta}_{n}\) in 14 , we have \[\begin{align} \label{ergzpskx} \sum\limits_{j = 1}^{n}[\rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H} - \widetilde{\beta}_{n}\lambda_{j}) - \rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H})] + \frac{\widetilde{\beta}_{n}^{2} K_{n}}{2}\stackrel{p}\rightarrow 0 \end{align}\tag{16}\] as \(n\rightarrow\infty\), and for some sequence \(\{b_{n}\}_{n\geq 1}\) such that \(b_{n}\rightarrow\infty\) as \(n\rightarrow\infty\) and for an arbitrary constant \(\delta > 0\), we have \[\begin{align} \label{ineq8} \sup_{|\beta|\leq b_{n} + \delta} \left|\sum\limits_{j = 1}^{n}[\rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H} - \beta\lambda_{j}) - \rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H})] + \beta\widetilde{\beta}_{n} K_{n} - \frac{\beta^{2}K_{n}}{2}\right|\stackrel{p}\rightarrow 0 \end{align}\tag{17}\] as \(n\rightarrow\infty\). Now, it follows from 16 and 17 that \[\begin{align} \label{ineq9} \sup_{|\beta - \widetilde{\beta}_{n}| = \delta} \left|\sum\limits_{j = 1}^{n}[\rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H} - \beta\lambda_{j}) - \rho(\left\langle v_j, \epsilon v_j \right\rangle_\mathcal{H} - \widetilde{\beta}_{n}\lambda_{j})] - \frac{(\beta - \widetilde{\beta}_{n})^{2}K_{n}}{2}\right|\stackrel{p}\rightarrow 0 \end{align}\tag{18}\] as \(n\rightarrow\infty\). Moreover, observe that when \(|\beta - \widetilde{\beta}_{n}| = \delta\), we have \[\begin{align} \label{ineq10} (\beta - \widetilde{\beta}_{n})^{2}K_{n}\geq\left(\min_{1\leq j \leq n}\lambda_{j}\right)\delta^{2}. \end{align}\tag{19}\] Therefore, using 18 and 19 , we have \[\begin{align} P(|\hat{\beta}_{n} - \widetilde{\beta}_{n}|> \delta)\rightarrow 0 \end{align}\] as \(n\rightarrow\infty\), where \(\delta > 0\) is an arbitrary constant. This completes the proof. \(\Box\)