Cautions on Tail Index Regressions

Thomas T. Yang1
Australian National University


Abstract

We revisit tail-index regressions. For linear specifications, we find that the usual full-rank condition can fail because conditioning on extreme outcomes causes regressors to degenerate to constants. More generally, the conditional distribution of the covariates in the tails concentrates on the values at which the tail index is minimized. Away from those points, the conditional density tends to zero. For local nonparametric tail index regression, the convergence rate can be very slow. We conclude with practical suggestions for applied work.

1 Introduction↩︎

Tail index estimation has been widely applied in modeling extreme events, such as environmental and financial market extremes; see, for example, [1] and the references therein. Building on the seminal paper of [2], subsequent work has sought to model the tail index as a function of covariates, either parametrically (e.g., [3] and [4]), semiparametrically (e.g., [5]), or nonparametrically (e.g., [6]). The literature is large; we name only a few important and representative developments. In this paper, we caution about several issues that arise in tail index regressions. We emphasize that our purpose is not to criticize the existing literature, as these works are valuable and important contributions. Rather, our goal is to highlight issues that researchers should be aware of when applying tail index regressions.

To set the stage, suppose we observe \((Y,X)\), where \(X\) enters the tail index function of \(Y\). Following the literature, we assume \[\bar{F}(y\mid x)=1-F(y\mid x)=y^{-\alpha(x)}L(y;x),\quad\text{for }y\geq w,\label{eq:surviveCDF}\tag{1}\] where \(F(y\mid x)\) is the cumulative distribution function of \(Y\) conditional on \(X=x,\) and \(L(y;x)\) is a slowly varying function that satisfies \(L(yt;x)/L(y;x)\to1\) for any \(t>0\) as \(y\to\infty\). The inclusion of \(L(y;x)\) makes the distributional assumption in 1 much more general than formulations without it; see [1] and [4] for discussion.

Because of the presence of \(L(y;x)\), maximum likelihood estimation based on the approximation \(\bar{F}(y\mid x)\approx Cy^{-\alpha(x)}\textrm{ for some }C>0\) contains a bias term, which vanishes only as \(w\to\infty\). Therefore, as the sample size grows, one must take a larger threshold \(w\) and restrict the analysis to observations with \(Y>w\) in order to eliminate the bias asymptotically.

We find that the full rank condition, which is regularly assumed in regression analysis, is likely to fail asymptotically in tail index regressions. A similar phenomenon arises in semiparametric and nonparametric tail index regressions. The intuition follows from Bayes’ theorem: conditional on large values of \(Y\), the distribution of \(X\) degenerates toward the point \(x\) where \(\alpha(x)\) attains its minimum. Since only observations with \(Y>w\) are retained and the threshold \(w\to\infty\) as the sample size grows, the effective variation in \(X\) diminishes, leading to a failure of the full rank condition.

Specifically, the main results of this paper are as follows:

  1. In the parametric case where \(\alpha(X)=\exp\left(X'\theta^{*}\right)\), with the \(\exp\) function ensuring that the index is always positive, we show that the population Gram matrix \(\bar{\varSigma}=\mathbb{E}(XX'\mid Y>w)\) becomes nearly singular as \(w\) grows large, under fairly general conditions, even when \(\mathbb{E}(XX')\) is nonsingular. As a consequence, the convergence rate of the parametric estimator in [4] is slower than originally anticipated. We provide conditions under which the nearly singular \(\bar{\varSigma}\) can still be accommodated, and establish the corresponding asymptotic properties.

  2. In semiparametric cases where \(\alpha(X)\) is a general smooth function of \(X\), we show that the conditional density \(f(x\mid Y>w)\) converges to zero for all \(x\) except at points where \(\alpha(x)\) attains its minimum, even though the unconditional density \(f(x)\) is bounded away from zero. This property may undermine the estimation of the nonparametric component in semiparametric models.

  3. In nonparametric settings, we find that the convergence can be very slow, even slower than that of standard nonparametric mean regression, affecting not only the variance but also the bias. When a large bandwidth is chosen for local smoothing, the issues described in points 1 and 2 may also arise, and the tail index is likely underestimated.

The rest of the paper is structured as follows. Section 2 formalizes the results. Section 3 concludes. All proofs and technical lemmas are collected in the Appendix.

Notation. \(\log\left(\cdot\right)\) stands for the natural logarithm. For a vector \(x\), \(\left\Vert x\right\Vert\) denotes the \(L_{2}\) norm, and for a matrix \(A,\) \(\left\Vert A\right\Vert\) denotes the Frobenius norm. \(\rho_{\min}(\cdot)\) and \(\rho_{\max}(\cdot)\) denote the minimum and the maximum eigenvalues of a matrix, respectively. \(\mathbb{I}_{p}\) denotes the \(p\times p\) identity matrix. For a positive definite matrix \(A=LL'\), we write \(A^{1/2}=L\). For deterministic sequences \(\{a_{n},b_{n}\}_{n=1}^{\infty}\), we denote \(a_{n}\propto b_{n}\) if \(0<C_{1}\leq\liminf_{n\rightarrow\infty}\left\vert a_{n}/b_{n}\right\vert \leq\limsup_{n\rightarrow\infty}\left\vert a_{n}/b_{n}\right\vert \leq C_{2}<\infty\) for some constants \(C_{1}\) and \(C_{2}\), \(a_{n}\ll b_{n}\) if \(a_{n}=o(b_{n})\), and \(a_{n}\gg b_{n}\) if \(b_{n}\ll a_{n}\). \(\overset{P}{\rightarrow}\) and \(\overset{d}{\rightarrow}\) denote convergence in probability and distribution, respectively. \(C\) denotes a generic positive constant that may vary from line to line.

2 The Model and the Issue↩︎

2.1 The Linear Model↩︎

[4] assumed that \(\alpha(X)=\exp\left(X'\theta^{*}\right)\), and observations are \((x_{i},y_{i}),\,i=1,2,\ldots,n\), i.i.d. across \(i\). For finite samples, the truncation parameter is allowed to depend on \(n\), denoted by \(w_{n}\). They proposed estimating \(\theta^{*}\) by minimizing the approximate negative log-likelihood function: \[\hat{\theta}=\arg\min_{\theta}\sum_{i=1}^{n}\Bigl\{\exp(x_{i}'\theta)\,\log\!\left(\frac{y_{i}}{w_{n}}\right)-x_{i}'\theta\Bigr\}\,I(y_{i}>w_{n}),\] where \(I(\cdot)\) is the indicator function.
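
To see where this objective comes from, note that under the approximation \(\bar{F}(y\mid x)\approx Cy^{-\alpha(x)}\), the conditional tail satisfies \(\Pr(Y>y\mid Y>w_{n},X=x)\approx(y/w_{n})^{-\alpha(x)}\) for \(y\geq w_{n}\), so the conditional density of \(Y\) given \(Y>w_{n}\) and \(X=x\) is approximately \(\alpha(x)\,w_{n}^{\alpha(x)}y^{-\alpha(x)-1}\). With \(\alpha(x)=\exp(x'\theta)\), the negative log-density of an observation with \(y_{i}>w_{n}\) is \[\exp(x_{i}'\theta)\log\!\left(\frac{y_{i}}{w_{n}}\right)-x_{i}'\theta+\log y_{i};\] dropping \(\log y_{i}\), which does not involve \(\theta\), and summing over the retained observations gives the objective above.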

A key condition in [4] for identification of \(\theta^{*}\) is that the Gram matrix \[\hat{\varSigma}_{w_{n}}=\frac{1}{n_{0}}\sum_{i=1}^{n}x_{i}x_{i}'I(y_{i}>w_{n}),\quad\text{with }n_{0}=\sum_{i=1}^{n}I(y_{i}>w_{n}),\] is non-singular. When deriving the asymptotic properties of \(\hat{\theta}\), they implicitly assume that the minimum eigenvalue of \(\hat{\varSigma}_{w_{n}}\) is uniformly bounded away from zero as \(n\rightarrow\infty\), so that \[\hat{\theta}-\theta^{*}=O_{P}(n_{0}^{-1/2}),\] see, for example, their proof of Theorem 4.
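
As a concrete illustration, the following minimal simulation sketch computes \(\hat{\theta}\) and the conditional Gram matrix \(\hat{\varSigma}_{w_{n}}\) for one simulated data set. The data-generating process, the choice \(L(y;x)=1\), the 99% sample quantile as \(w_{n}\), and the optimizer are our own illustrative assumptions and are not taken from [4].

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Illustrative DGP: alpha(x) = exp(theta0 + theta1 * x1) with L(y; x) = 1,
# so that P(Y > y | X = x) = y^{-alpha(x)} for y >= 1.
n = 200_000
theta_star = np.array([0.0, 1.0])                # hypothetical true parameter
x1 = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x1])            # X = (1, X1)
alpha = np.exp(X @ theta_star)
y = rng.uniform(size=n) ** (-1.0 / alpha)        # inverse-CDF draw from the Pareto tail

w_n = np.quantile(y, 0.99)                       # threshold: an arbitrary illustrative choice
keep = y > w_n
Xk, yk, n0 = X[keep], y[keep], int(keep.sum())

def neg_loglik(theta):
    # Approximate negative log-likelihood on {y_i > w_n}, scaled by 1/n0
    # (the scaling does not change the minimizer).
    xb = Xk @ theta
    return np.mean(np.exp(xb) * np.log(yk / w_n) - xb)

theta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x

# Conditional Gram matrix and its minimum eigenvalue (the quantity at issue in what follows).
Sigma_hat = (Xk.T @ Xk) / n0
print("n0:", n0)
print("theta_hat:", theta_hat)
print("min eigenvalue of Sigma_hat:", np.linalg.eigvalsh(Sigma_hat).min())
```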

We will show that under fairly general conditions, the minimum eigenvalue of the population counterpart \[\bar{\varSigma}_{w_{n}}=\mathbb{E}\left(\left.XX'\right|Y>w_{n}\right)\label{eq:Sigma95w95bar}\tag{2}\] converges to zero even if \(\mathbb{E}(XX')\) is non-singular. That is, \(\bar{\varSigma}_{w_{n}}\) becomes nearly singular as \(w_{n}\) grows.

2.2 The Issue of the Rank Condition↩︎

We start with the simple regression case where there is only one regressor along with the intercept, \(X=(1,X_{1})\). To simplify notation, let \[\alpha(X)=\exp\left(X_{1}\right),\label{eq:alpha}\tag{3}\] where the intercept term is 0, and \(X_{1}\) has compact support \(\left[\underline{u}_{x},\bar{u}_{x}\right]\) with \(\bar{u}_{x}>\underline{u}_{x}\). Note that (3) is equivalent to \[\alpha(X)=\exp\left(\theta_{0}^{*}+\theta_{1}^{*}\tilde{X}_{1}\right),\] where \(\theta_{0}^{*}=\underline{u}_{x},\theta_{1}^{*}=\bar{u}_{x}-\underline{u}_{x}\) and \(\tilde{X}_{1}=(X_{1}-\underline{u}_{x})\left/\left(\bar{u}_{x}-\underline{u}_{x}\right)\right.\), and the support of \(\tilde{X}_{1}\) is \([0,1]\). Thus, (3) is simply a normalization of a general linear index.

For illustration, we write \[Z=\exp\left(X_{1}\right).\] We first establish the properties of \(Z\), and then extend the results to \(X\).

Assumption 1. The support of \(X_{1}\) is finite, that is, \(-\infty<\underline{u}_{x}<\bar{u}_{x}<\infty\). The density function of \(X_{1}\) is bounded and bounded away from zero: \(0<\underline{c}_{x}\leq f(x_{1})\leq\bar{c}_{x}<\infty.\)

Denote \(\underline{u}=\exp\left(\underline{u}_{x}\right)\) and \(\bar{u}=\exp\left(\bar{u}_{x}\right)\). Then the support of \(Z\) is \(\left[\underline{u},\bar{u}\right]\), and the density of \(Z\) is bounded and bounded away from zero; specifically, there exist constants such that \[0<\underline{c}\leq f(z)\leq\bar{c}<\infty\textrm{ for }z\in[\underline{u},\bar{u}].\] For simplicity of analysis, we assume \(L(y;x)=1\); the results would not change much with a more general \(L(y;x)\).

Theorem 1. Suppose Assumption 1 holds and \(Y\) follows the conditional distribution in 1 with \(L=1\). Then, \[f(z\mid Y>w)\leq\frac{\bar{c}}{\underline{c}}\frac{w^{-(z-\underline{u})}\log w}{1-w^{-(\bar{u}-\underline{u})}},\quad z\in[\underline{u},\bar{u}],\] and as \(w\to\infty\), \[\begin{align} \mathbb{E}(Z\mid Y>w) & \to\underline{u}\textrm{ \textrm{ and}},\\ \operatorname{Var}(Z\mid Y>w) & \to0. \end{align}\]

Theorem 1 shows that \(Z\) degenerates to its lower bound as \(w\to\infty\).

Note that \(X_{1}=\log Z.\) The conditional density of \(X_{1}\) takes a similar form by the change of variable method. Lemma 1 shows that the result applies to \(X_{1}\) as well: \[\begin{align} \mathbb{E}(X_{1} & \mid Y>w)\rightarrow\log\left(\underline{u}\right)=\underline{u}_{x}\textrm{ and}\nonumber \\ \textrm{Var}(X_{1} & \mid Y>w)\rightarrow0.\label{eq:var-620} \end{align}\tag{4}\]

In the special case where \(Z=\exp\left(X_{1}\right)=\alpha(X)\) is uniformly distributed, we can derive the rate at which \(\operatorname{Var}(Z\mid Y>w)\) converges to zero.

Corollary 1. Suppose the assumptions in Theorem 1 hold and \(Z\) is uniformly distributed on \([\underline{u},\bar{u}]\). Then, as \(w\to\infty\), \[\begin{align} \mathbb{E}(Z\mid Y>w) & =\underline{u}+\frac{1}{\log w}+o\!\left(\frac{1}{\log w}\right)\textrm{ \textrm{ and}},\\[6pt] \operatorname{Var}(Z\mid Y>w) & =\frac{1}{(\log w)^{2}}+o\!\left(\frac{1}{(\log w)^{2}}\right). \end{align}\]

For \(X_{1}=\log Z\), Lemma 2 shows \[\frac{1}{\bar{u}^{2}}\textrm{Var}\left(Z\mid Y>w\right)\leq\textrm{Var}\left(X_{1}\mid Y>w\right)\leq\frac{1}{\underline{u}^{2}}\textrm{Var}\left(Z\mid Y>w\right).\label{eq:varX1bound}\tag{5}\] Therefore, the minimum eigenvalue of \(\bar{\varSigma}_{w_{n}}\), defined in (2 ), is proportional to \(\textrm{Var}\left(X_{1}\mid Y>w\right)\) and is at the rate of \(1/(\log w)^{2}.\)
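
The rate in Corollary 1 is easy to check numerically. The following minimal Monte Carlo sketch draws \(Z\) uniformly on \([1,e]\) (so that \(X_{1}=\log Z\) has support \([0,1]\)) and \(Y\) from the Pareto tail with \(L=1\); the sample size, seed, and thresholds are arbitrary illustrative choices. The reported ratio \(\operatorname{Var}(Z\mid Y>w)(\log w)^{2}\) should approach one as \(w\) grows, although the Monte Carlo error increases because few observations survive large thresholds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Z = exp(X1) = alpha(X) uniform on [1, e]; Y drawn from P(Y > y | Z) = y^{-Z}, y >= 1.
n = 5_000_000
Z = rng.uniform(np.exp(0.0), np.exp(1.0), n)
Y = rng.uniform(size=n) ** (-1.0 / Z)

for w in [10.0, 100.0, 1000.0]:
    z_tail = Z[Y > w]
    print(f"w={w:6.0f}  n0={z_tail.size:8d}  "
          f"E(Z|Y>w)={z_tail.mean():.3f}  "
          f"Var(Z|Y>w)*(log w)^2={z_tail.var() * np.log(w) ** 2:.3f}")
```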

Now consider the case with multiple regressors, \(X=(1,X_{1},X_{2},\ldots,X_{p})\). Suppose we are in the most favorable scenario for the rank condition of the Gram matrix: the regressors are mutually independent, \(X_{1}\perp X_{2}\perp\cdots\perp X_{p}\), and each \(X_{j}\) satisfies the support and density conditions in Assumption 1. In addition, assume \[\alpha(X)=\exp\left(X_{1}+X_{2}+\cdots+X_{p}\right).\]

Define \(\tilde{X}_{1}=X_{1}+X_{2}+\cdots+X_{p}\). Clearly, \(\tilde{X}_{1}\) also satisfies Assumption 1. Applying Theorem 1 and the result in (4 ), we obtain:

Corollary 2. Suppose \(X_{1},X_{2},\ldots,X_{p}\) satisfy Assumption 1 and are mutually independent. Let \(\underline{\tilde{u}}_{x}=\inf\tilde{X}_{1}\). Then, as \(w\to\infty\), \[\begin{align} \mathbb{E}(\tilde{X}_{1}\mid Y>w) & \to\underline{\tilde{u}}_{x}\textrm{\textrm{ and}},\\[6pt] \operatorname{Var}(\tilde{X}_{1}\mid Y>w) & \to0. \end{align}\]

In other words, \((1,X_{1},X_{2},\ldots,X_{p})\) becomes nearly collinear as \(w\to\infty\), since \(\tilde{X}_{1}=X_{1}+X_{2}+\cdots+X_{p}\) behaves like a constant. Thus, under fairly general conditions, \(\bar{\varSigma}_{w}\) degenerates to a singular matrix as \(w\to\infty\).

2.3 A Remedy for the Asymptotics↩︎

In this section, we provide additional conditions and a new proof showing that the main result in [4] continues to hold, albeit with a slower convergence rate and an extra condition.

We continue to assume \(X=(1,X_{1},X_{2},\ldots,X_{p})\). We show that as long as the eigenvalues of \[\bar{\varSigma}_{w_{n}}=\mathbb{E}\!\left(XX'|Y>w_{n}\right),\] the population counterpart of \(\hat{\varSigma}_{w_{n}}\), do not decay too quickly, the estimator \(\hat{\theta}\) remains consistent and asymptotically normal. For simplicity, we assume that the minimum and maximum eigenvalues of \(\bar{\varSigma}_{w_{n}}\) converge to zero at the same rate. That is, there exist positive finite constants \(\underline{B}\) and \(\bar{B}\) that do not depend on \(n\), and a sequence \(\{a_{n}\}\), such that \[\underline{B}a_{n}^{-1}\;\leq\;\rho_{\min}\!\left(\bar{\varSigma}_{w_{n}}\right)\;\leq\;\rho_{\max}\!\left(\bar{\varSigma}_{w_{n}}\right)\;\leq\;\bar{B}a_{n}^{-1},\label{eq:varying95rank}\tag{6}\] with \[a_{n}\to\infty\quad\text{as }w_{n}\to\infty.\] The results can be generalized straightforwardly to the case where the eigenvalues tend to zero at different rates, at the cost of more tedious notation.

To simplify the analysis, we assume \(L(y;x)=1\), so that we do not need to account for the bias term in [4]. The results are qualitatively unchanged if \(L(y;x)\) is not constant.

Theorem 2. Suppose \(L=1\), that \(\mathbb{E}\!\left[\|X\|^{2+\delta}\mid Y>w_{n}\right]\) is uniformly bounded for some \(\delta\geq2\), and that the rank condition 6 holds. Let \((x_{i},y_{i}),i=1,2,\ldots,n\), be i.i.d. across \(i\). In addition, \(a_{n}\) satisfies \(a_{n}^{2}/n_{0}\to0\), where \(n_{0}=\sum_{i=1}^{n}I(y_{i}>w_{n})\). Then \[\sqrt{n_{0}}\,\hat{\varSigma}_{w_{n}}^{1/2}(\hat{\theta}-\theta^{*})\;\overset{d}{\longrightarrow}\;N(0,\mathbb{I}_{p}).\]

Theorem 2 has the same form as the main theorem in [4], but with two key differences. First, the convergence rate of \(\hat{\theta}\) is \(\sqrt{n_{0}/a_{n}}\), which is slower than \(\sqrt{n_{0}}\). Second, we require the crucial condition \(a_{n}^{2}/n_{0}\to0\). Since \(a_{n}\) generally increases while \(n_{0}\) decreases as \(w_{n}\to\infty\), this condition is satisfied as long as \(w_{n}\) does not grow too quickly. This implies that practitioners should effectively use more observations for estimation; in other words, they should adopt a relatively small \(w_{n}\).

In practice, there is no need to know or estimate \(a_{n}\); inference can be conducted using the asymptotic distribution in Theorem 2 directly.
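
For concreteness, the following minimal sketch shows how such inference could proceed: since \(\sqrt{n_{0}}\,\hat{\varSigma}_{w_{n}}^{1/2}(\hat{\theta}-\theta^{*})\) is approximately standard normal, \(\hat{\theta}\) is approximately \(N(\theta^{*},\hat{\varSigma}_{w_{n}}^{-1}/n_{0})\), so standard errors can be read off the diagonal of \(\hat{\varSigma}_{w_{n}}^{-1}/n_{0}\). The data-generating process and the threshold are the same illustrative assumptions as in the earlier sketch, not a prescription.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Same illustrative DGP as in the earlier sketch: alpha(x) = exp(theta0 + theta1 * x1), L = 1.
n, theta_star = 200_000, np.array([0.0, 1.0])
x1 = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x1])
y = rng.uniform(size=n) ** (-1.0 / np.exp(X @ theta_star))

w_n = np.quantile(y, 0.99)               # a relatively small threshold, per the discussion above
keep = y > w_n
Xk, yk, n0 = X[keep], y[keep], int(keep.sum())

# Approximate negative log-likelihood, scaled by 1/n0 (same minimizer).
neg_loglik = lambda th: np.mean(np.exp(Xk @ th) * np.log(yk / w_n) - Xk @ th)
theta_hat = minimize(neg_loglik, np.zeros(2), method="BFGS").x

# Theorem 2 implies theta_hat is approximately N(theta*, Sigma_hat^{-1} / n0); a_n is not needed.
Sigma_hat = (Xk.T @ Xk) / n0
se = np.sqrt(np.diag(np.linalg.inv(Sigma_hat)) / n0)
z975 = norm.ppf(0.975)
for j in range(len(theta_hat)):
    lo, hi = theta_hat[j] - z975 * se[j], theta_hat[j] + z975 * se[j]
    print(f"theta_{j}: {theta_hat[j]: .3f}   SE {se[j]:.3f}   95% CI [{lo: .3f}, {hi: .3f}]")
```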

2.4 Semiparametric Tail Index Regression↩︎

The problem is more severe in the semiparametric case, particularly for the nonparametric component of the model. We first derive the conditional density of \(X\) in the tail, and then present the results for the semiparametric regression.

Density of \(X\) in the Tail↩︎

For convenience, suppose that \[\alpha(X)=X_{1},\label{eq:alphax95semiparametric}\tag{7}\] and we wish to estimate \(\alpha(X)\) nonparametrically. The support of \(X_{1}\) is \([\underline{u}_{x},\bar{u}_{x}]\) with \(\underline{u}_{x}>0\).

Assume the density of \(X_{1}\) is bounded and bounded away from zero. This is the most favorable scenario for nonparametric estimation. Theorem 1 implies that \[f(x_{1}\mid Y>w)\;\leq\;C\frac{w^{-(x_{1}-\underline{u}_{x})}\log w}{1-w^{-(\bar{u}_{x}-\underline{u}_{x})}},\quad x_{1}\in[\underline{u}_{x},\bar{u}_{x}],\] for some \(C>0.\) This shows that the conditional density of \(X_{1}\) converges to zero for all \(x_{1}\in(\underline{u}_{x},\bar{u}_{x}]\), i.e., for all values in the support of \(X_{1}\) except the minimum, even under the best-case scenario. The speed of decay becomes faster as \(x_{1}\) moves farther away from its minimum. The problem can be exacerbated in the presence of bias, where \(L\left(y;x\right)\) is not a constant. The bias term will dominate more easily as \(x_{1}\) moves away from its minimum.
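
A quick simulation illustrates this concentration. In the sketch below we take \(\alpha(X)=X_{1}\) with \(X_{1}\) uniform on \([1,2]\) and \(L=1\); these are our own illustrative choices. Conditional on \(Y>w\), the quantiles of \(X_{1}\) collapse toward the lower endpoint as \(w\) grows.

```python
import numpy as np

rng = np.random.default_rng(2)

# alpha(X) = X1 with X1 uniform on [1, 2]; P(Y > y | X1) = y^{-X1} for y >= 1.
n = 2_000_000
x1 = rng.uniform(1.0, 2.0, n)
y = rng.uniform(size=n) ** (-1.0 / x1)

for w in [10.0, 100.0, 1000.0]:
    x_tail = x1[y > w]
    q50, q90, q99 = np.quantile(x_tail, [0.5, 0.9, 0.99])
    print(f"w={w:6.0f}  n0={x_tail.size:7d}  median={q50:.3f}  q90={q90:.3f}  q99={q99:.3f}")
```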

The discussion under condition 7 can be generalized. Suppose we have a general nonlinear \(\alpha(X)\). Define \[Z=\alpha(X).\] Assume \(\alpha(x)\) is continuously differentiable, with \(\alpha'(x)\) bounded and bounded away from zero, and the density of \(X\) bounded and bounded away from zero. For any value of \(z\), there exist only finitely many \(x\) such that \(\alpha(x)=z\). Then \(Z\) also has bounded support, and \[f(z)=\sum_{\alpha(x)=z}\frac{f(x)}{|\alpha'(x)|},\label{eq:fzfx}\tag{8}\] which is well defined for all \(z\) in the support of \(Z\). It is straightforward to see that \(f(z)\) is also bounded and bounded away from zero. If we denote the support of \(Z\) as \([\underline{u},\bar{u}]\), then applying the previous result we obtain \[f(z\mid Y>w)\;\leq\;C\,\frac{w^{-(z-\underline{u})}\log w}{1-w^{-(\bar{u}-\underline{u})}},\quad z\in[\underline{u},\bar{u}],\label{eq:fz124y}\tag{9}\] for some positive constant \(C\). The inequality above, together with 8 , implies that \(f(x\mid Y>w)\) converges to zero whenever \(\alpha(x)\neq\underline{u}\), with faster decay the further \(\alpha(x)\) lies above its minimum.

Semiparametric Regression↩︎

Suppose \(X=\left[1,X_{1},X_{2}\right]\). The semiparametric framework in [5] can be written as \[\alpha(X)=\theta_{0}^{*}+\theta_{1}^{*}X_{1}+g^{*}\left(X_{2}\right),\] where \(g^{*}\) is some unknown function. In an attempt to estimate the nonparametric component together with the parametric component, the initial step (Step 1) in [5] approximated the nonparametric part using sieves, allowing both components to be estimated simultaneously. That is, \[\min_{\theta,g_{n}}\sum_{i=1}^{n}\Bigl\{\exp\left(\theta_{0}+\theta_{1}x_{1i}+g_{n}\left(x_{2i}\right)\right)\,\log\!\left(\frac{y_{i}}{w_{n}}\right)-\theta_{0}-\theta_{1}x_{1i}-g_{n}\left(x_{2i}\right)\Bigr\}\,I(y_{i}>w_{n}),\] where \(g_{n}\) is an approximation of \(g^{*}\) using sieves (e.g., B-splines). Thus, the above mimics a parametric tail index regression; a schematic sketch of this step is given below.
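
In the sketch, we generate data with an exponential link, \(\alpha(X)=\exp\left(\theta_{0}+\theta_{1}X_{1}+g^{*}(X_{2})\right)\), to match the exponential in the displayed objective; this choice, the function \(g^{*}\), and the cubic polynomial sieve used in place of B-splines are all our own illustrative assumptions, not the specification or implementation of [5].

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Illustrative DGP: alpha(X) = exp(theta0 + theta1 * x1 + g*(x2)) with L = 1.
n = 500_000
x1, x2 = rng.uniform(0.0, 1.0, n), rng.uniform(0.0, 1.0, n)
g_star = lambda u: np.sin(np.pi * u)                    # hypothetical nonparametric component
alpha = np.exp(0.5 + 1.0 * x1 + g_star(x2))
y = rng.uniform(size=n) ** (-1.0 / alpha)

w_n = np.quantile(y, 0.995)
keep = y > w_n
# Design: intercept, x1, and a cubic polynomial sieve for g_n(x2) (no constant, so g_n(0) = 0).
D = np.column_stack([np.ones(keep.sum()), x1[keep]] +
                    [x2[keep] ** k for k in (1, 2, 3)])
log_yw = np.log(y[keep] / w_n)

def step1_obj(coef):
    # coef = (theta0, theta1, sieve coefficients of g_n); objective scaled by 1/n0.
    idx = D @ coef
    return np.mean(np.exp(idx) * log_yw - idx)

coef_hat = minimize(step1_obj, np.zeros(D.shape[1]), method="BFGS").x
print("theta0, theta1 estimates:", coef_hat[:2])
print("sieve coefficients of g_n:", coef_hat[2:])
```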

We caution that this may create issues for the nonparametric component, \(g^{*}\left(X_{2}\right)\), due to the degenerate density function in (9 ). Specifically, most of the extreme observations are likely to be concentrated around the point where the minimum of \(\alpha\) is attained. Consequently, \(g^{*}\) is weakly identified and estimated imprecisely for \(X_{2}\) values away from the minimizer.

2.5 Nonparametric Tail Index Regression↩︎

The nonparametric tail index regression in [6] is more robust to the issue we identified, because it relies only on local data for estimation, and \(\alpha(x)\) behaves approximately like a constant locally. In addition, they use order statistics to decide which observations to include in the local estimation; the order statistics themselves reflect the local value of the index.

As in typical nonparametric regressions, a potential limitation is that the convergence rate can be very slow. For instance, in standard nonparametric univariate kernel estimation of conditional means, the convergence rate is \[O_{P}\!\left((nh)^{-1/2}+h^{r}\right),\label{eq:nonparametric95rate}\tag{10}\] where \(h\) is the bandwidth, \(h\to0\) as \(n\to\infty\), \(r\) is the order of the kernel function, and usually \(r\geq2\).

Suppose we are in the general setting \[\begin{align} \bar{F}(y\mid x) & =1-F(y\mid x)=y^{-\alpha(x)}L(y;x),\quad y\geq w,\\ L(y;x) & =c_{1}+c_{2}y^{-\beta(x)}+o\!\left(y^{-\beta(x)}\right),\quad c_{1},c_{2}\geq0,\\ \alpha(x) & >0,\quad\beta(x)\geq0. \end{align}\] This is a typical condition for tail index regression in the literature, originally proposed by [7]. In this case, we need to be concerned about the bias arising from the use of an approximate likelihood function. Suppose we use \(2\lfloor nh\rfloor\) local observations around \(x\), where \(\lfloor nh\rfloor\) denotes the integer part of \(nh\), to estimate \(\alpha(X)\) for \(X=x\). From [8], the fastest achievable convergence rate, obtained by optimally choosing \(w_{n}\) (or equivalently, the order statistics) to balance variance and bias and assuming \(\left(\alpha(x),\beta(x)\right)\) is locally constant, is \[O_{P}\!\left((nh)^{-\frac{\beta(x)}{2\beta(x)+\alpha(x)}}\kappa_{n}\right),\text{with, e.g., }\kappa_{n}=\log n.\] In addition, kernel estimation introduces an extra bias term of order \(h\) (not \(h^{r}\)), regardless of the choice of kernel, since we are estimating the index in the distribution rather than a conditional mean. As a result, the convergence rate becomes \[O_{P}\!\left((nh)^{-\frac{\beta(x)}{2\beta(x)+\alpha(x)}}\kappa_{n}+h\right),\] which is slower than the rate in (10 ). The rate is particularly slow when \(\beta(x)\) is small (so that \(y^{-\alpha(x)}\) provides a poor approximation) but \(\alpha(x)\) is large (corresponding to a light tail).
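
To get a sense of the magnitudes, the following toy calculation compares the variance part \((nh)^{-1/2}\) of the standard rate in (10 ) with \((nh)^{-\frac{\beta(x)}{2\beta(x)+\alpha(x)}}\) for a few illustrative values of \(\alpha(x)\), \(\beta(x)\), and \(nh\); constants and \(\kappa_{n}\) are ignored, and all numbers are arbitrary choices for illustration.

```python
# Toy comparison of rates; nh and the (alpha, beta) pairs are arbitrary illustrative values.
nh = 10_000                                   # effective local sample size

print(f"(nh)^(-1/2)                       : {nh ** -0.5:.4f}")
for alpha, beta in [(1.0, 2.0), (2.0, 1.0), (4.0, 0.5)]:
    rate = nh ** (-beta / (2.0 * beta + alpha))
    print(f"alpha={alpha:.1f}, beta={beta:.1f} -> (nh)^(-b/(2b+a)) = {rate:.4f}")
```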

Since the rate \((nh)^{-\frac{\beta(x)}{2\beta(x)+\alpha(x)}}\) can be very slow, one may choose a relatively large bandwidth \(h\), as the kernel bias is only of order \(h\), in order to balance variance and bias. However, when \(h\) is large, \(\alpha(x)\) may vary substantially within the local window, to the degree that the situation discussed in the previous subsection, particularly the result in (9 ), becomes relevant for small-sample performance. This means that extreme observations are more likely to occur at \(X=x^{*}\), where \(\alpha(x^{*})\) attains the minimum over the local neighborhood of \(x\). Therefore, \(\alpha(x)\) is likely to be underestimated. The impact of this underestimation on subsequent analysis remains unclear.

2.6 Implications for Practice↩︎

From the previous discussions, we caution that the nonparametric part in the semiparametric estimation might be problematic if one attempts to estimate it globally using sieves. The reason is that the conditional density in the tail region decays to zero everywhere except at the minimum, and the speed of decay increases rapidly as the tail index moves away from its minimum.

Nonparametric tail index regressions can converge at a much slower rate than regular nonparametric regressions for conditional means, because \(\alpha\left(x\right)\) can be large and \(\beta\left(x\right)\) can be small for some \(x\), while the bias term from kernel smoothing is only of order \(h\). The issue identified for semiparametric estimation may also arise here when \(h\) is relatively large, which is likely to lead to underestimation of the tail index.

Simple linear or parametric models are more practical, and the problem there is much less severe: the minimum eigenvalue of the Gram matrix decays at the rate of \((\log w)^{-2}\) in a special case. The new asymptotics developed in this paper ensure valid inference, and the inference does not rely on knowing the unknown rate of decay of the minimum eigenvalue. When applying linear models, we recommend choosing a relatively small threshold \(w\), or equivalently, using a larger proportion of the sample, at least from a theoretical point of view. In addition, one should remain vigilant about the rank condition of the conditional Gram matrix; to this end, we recommend explicitly checking its minimum eigenvalue across candidate choices of \(w\), as in the sketch below.
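
A minimal diagnostic sketch along these lines is given below: for each candidate threshold (taken here as a sample quantile of \(Y\)), it reports the number of retained observations \(n_{0}\) and the minimum eigenvalue of the conditional Gram matrix. The data-generating process is an illustrative assumption; in applications the loop would run over the observed \((x_{i},y_{i})\).

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data; replace X and y with the observed design matrix and outcome.
n = 1_000_000
x1 = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x1])
y = rng.uniform(size=n) ** (-1.0 / np.exp(x1))     # alpha(x) = exp(x1), L = 1

for q in (0.90, 0.99, 0.999):
    w = np.quantile(y, q)
    Xk = X[y > w]
    Sigma_hat = (Xk.T @ Xk) / Xk.shape[0]
    min_eig = np.linalg.eigvalsh(Sigma_hat).min()
    print(f"quantile={q:.3f}  w={w:12.2f}  n0={Xk.shape[0]:8d}  min eigenvalue={min_eig:.4f}")
```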

3 Conclusion↩︎

In this paper, we identify problems with the rank condition and with the conditional density of the explanatory variables in tail-index regressions. In this setting, the convergence rate of local nonparametric regression can also be very slow. These issues arise because the effective estimation sample is not a random subsample: selection into the tail depends on the explanatory variables.

A promising direction for future research is to examine whether similar issues occur in related settings.

Appendix

4 Lemmas and Proofs↩︎

Lemma 1. The results in Theorem 1 imply (4 ).

Proof. Note that for \(z>\underline{u}>0\), \[\log z-\log\underline{u}=\frac{z-\underline{u}}{\xi}\leq\frac{z-\underline{u}}{\underline{u}},\] for some \(\xi\in\left(\underline{u},z\right)\) by the mean value theorem. Therefore, \[\mathbb{E}(X_{1}\mid Y>w)-\log\left(\underline{u}\right)=\mathbb{E}(\log Z\mid Y>w)-\log\left(\underline{u}\right)\leq\left[\mathbb{E}\left(Z\mid Y>w\right)-\underline{u}\right]/\underline{u}\to0.\] For the variance, let \(Z'\) be an independent copy of \(Z\) (conditional on \(Y>w\)). Then \[\begin{align} \textrm{Var}\left(X_{1}\mid Y>w\right) & =\frac{1}{2}\mathbb{E}\left[\left(\log Z-\log Z'\right)^{2}\mid Y>w\right]\leq\frac{1}{2\underline{u}^{2}}\mathbb{E}\left[\left(Z-Z'\right)^{2}\mid Y>w\right]\\ & =\frac{1}{\underline{u}^{2}}\textrm{Var}\left(Z\mid Y>w\right)\to0, \end{align}\] as desired.\(\blacksquare\)

Lemma 2. The results in Corollary 1 imply (5 ).

Proof. The second inequality in (5 ) follows directly from the proof of Lemma 1. Note that for \(x_{1}<x_{1}'<\log\bar{u}=\bar{u}_{x}<\infty,\) \[0<\exp\left(x_{1}'\right)-\exp\left(x_{1}\right)=\exp\left(\xi\right)\left(x_{1}'-x_{1}\right)\leq\bar{u}\left(x_{1}'-x_{1}\right),\] for some \(\xi\in\left(x_{1},x_{1}'\right)\) by the mean value theorem. Let \(X_{1}'\) be an independent copy of \(X_{1}\) (conditional on \(Y>w\)). The first inequality is obtained by \[\begin{align} \textrm{Var}\left(Z\mid Y>w\right) & =\frac{1}{2}\mathbb{E}\left[\left(\exp X_{1}-\exp X_{1}'\right)^{2}\mid Y>w\right]\\ & \leq\frac{\bar{u}^{2}}{2}\mathbb{E}\left[\left(X_{1}-X_{1}'\right)^{2}\mid Y>w\right]=\bar{u}^{2}\textrm{Var}\left(X_{1}\mid Y>w\right), \end{align}\] as desired.\(\blacksquare\)

Proof of Theorem 1. We first derive the conditional density of \(Z|Y>w.\)

By Bayes’ theorem, \[\begin{align} f\left(z|Y>w\right) & =\frac{\Pr\left(Y>w|z\right)f\left(z\right)}{\int_{\underline{u}}^{\bar{u}}\Pr\left(Y>w|z\right)f\left(z\right)\textrm{d}z}\nonumber \\ & =\frac{w^{-z}f\left(z\right)}{\int_{\underline{u}}^{\bar{u}}w^{-z}f\left(z\right)\textrm{d}z}=\frac{w^{\underline{u}-z}f\left(z\right)}{\int_{\underline{u}}^{\bar{u}}w^{\underline{u}-z}f\left(z\right)\textrm{d}z}\nonumber \\ & \leq\frac{\bar{c}}{\underline{c}}\frac{w^{\underline{u}-z}}{\int_{\underline{u}}^{\bar{u}}w^{\underline{u}-z}\textrm{d}z}=\frac{\bar{c}}{\underline{c}}\frac{w^{-\left(z-\underline{u}\right)}\log w}{1-w^{-\left(\bar{u}-\underline{u}\right)}}.\label{eq:f40z124Y62w41} \end{align}\tag{11}\] This establishes the first result.

Using the second line in (11 ), \[\begin{align} \mathbb{E}\left(Z|Y>w\right) & =\int_{\underline{u}}^{\bar{u}}zf\left(z|Y>w\right)\textrm{d}z\\ & =\frac{\int_{\underline{u}}^{\bar{u}}w^{\underline{u}-z}zf\left(z\right)\textrm{d}z}{\int_{\underline{u}}^{\bar{u}}w^{\underline{u}-z}f\left(z\right)\textrm{d}z}=\frac{\int_{\underline{u}}^{\underline{u}+\epsilon}w^{\underline{u}-z}zf\left(z\right)\textrm{d}z+\int_{\underline{u}+\epsilon}^{\bar{u}}w^{\underline{u}-z}zf\left(z\right)\textrm{d}z}{\int_{\underline{u}}^{\underline{u}+\epsilon}w^{\underline{u}-z}f\left(z\right)\textrm{d}z+\int_{\underline{u}+\epsilon}^{\bar{u}}w^{\underline{u}-z}f\left(z\right)\textrm{d}z}\\ & \equiv\frac{D_{1\epsilon}\left(w\right)+R_{1\epsilon}\left(w\right)}{D_{2\epsilon}\left(w\right)+R_{2\epsilon}\left(w\right)}, \end{align}\] for some small \(\epsilon>0\). We claim that as \(w\rightarrow\infty,\) \[R_{1\epsilon}\left(w\right)=o\left(D_{2\epsilon}\left(w\right)\right)\textrm{ and }R_{2\epsilon}\left(w\right)=o\left(D_{2\epsilon}\left(w\right)\right).\] We defer its proof to the end.

Note that \[\frac{D_{1\epsilon}\left(w\right)}{D_{2\epsilon}\left(w\right)}=\frac{\int_{\underline{u}}^{\underline{u}+\epsilon}w^{\underline{u}-z}zf\left(z\right)\textrm{d}z}{\int_{\underline{u}}^{\underline{u}+\epsilon}w^{\underline{u}-z}f\left(z\right)\textrm{d}z}\leq\frac{\left(\underline{u}+\epsilon\right)\int_{\underline{u}}^{\underline{u}+\epsilon}w^{\underline{u}-z}f\left(z\right)\textrm{d}z}{\int_{\underline{u}}^{\underline{u}+\epsilon}w^{\underline{u}-z}f\left(z\right)\textrm{d}z}=\underline{u}+\epsilon.\] Therefore, when \(w\) is large enough, \[\mathbb{E}\left(Z|Y>w\right)\leq\underline{u}+2\epsilon.\] Since \(\epsilon\) can be arbitrarily small, we have \[\mathbb{E}\left(Z|Y>w\right)\rightarrow\underline{u},\textrm{ as }w\rightarrow\infty.\label{eq:EZ124Y}\tag{12}\] Similarly, \[\mathbb{E}\left(Z^{2}|Y>w\right)\rightarrow\underline{u}^{2},\textrm{ as }w\rightarrow\infty.\label{eq:EZ942124Y}\tag{13}\] By (12 ) and (13 ), we must have \[\textrm{Var}\left(Z|Y>w\right)=\mathbb{E}\left(Z^{2}|Y>w\right)-\left[\mathbb{E}\left(Z|Y>w\right)\right]^{2}\rightarrow0\] \(\textrm{as }w\rightarrow\infty.\)

We now show the claim. Note that Assumption 1 implies \[D_{2\epsilon}\left(w\right)\geq\int_{\underline{u}}^{\underline{u}+\epsilon/2}w^{\underline{u}-z}f\left(z\right)\textrm{d}z\geq\frac{\epsilon}{2}w^{-\epsilon/2}\underline{c}.\label{eq:D2w}\tag{14}\] On the other hand, Assumption 1 guarantees \[\max\left\{ R_{1\epsilon}\left(w\right),R_{2\epsilon}\left(w\right)\right\} \leq\left(\bar{u}-\underline{u}\right)\max\left\{ 1,\bar{u}\right\} w^{-\epsilon}\bar{c}.\label{eq:maxR}\tag{15}\] Since \(\epsilon w^{-\epsilon/2}\gg w^{-\epsilon}\) as \(w\rightarrow\infty\) for a fixed \(\epsilon,\) (14 ) and (15 ) imply that \[D_{2\epsilon}\left(w\right)\gg\max\left\{ R_{1\epsilon}\left(w\right),R_{2\epsilon}\left(w\right)\right\} ,\] as desired. \(\blacksquare\)

Proof of Corollary 1. We first derive the conditional density of \(Z:\) \[\begin{align} f\left(z|Y>w\right) & =\frac{\Pr\left(Y>w|z\right)f\left(z\right)}{\int_{\underline{u}}^{\bar{u}}\Pr\left(Y>w|z\right)f\left(z\right)\textrm{d}z}\\ & =\frac{w^{-z}}{\int_{\underline{u}}^{\bar{u}}w^{-z}\textrm{d}z}=\frac{w^{-z}\log w}{w^{-\underline{u}}-w^{-\bar{u}}}=\frac{w^{-\left(z-\underline{u}\right)}\log w}{1-w_{}^{\underline{u}-\bar{u}}},\textrm{ for }z\in\left[\underline{u},\bar{u}\right]. \end{align}\] Based on it, \[\begin{align} \mathbb{E}\left(Z|Y>w\right) & =\int_{\underline{u}}^{\bar{u}}z\frac{w^{\underline{u}-z}\log w}{1-w_{}^{\underline{u}-\bar{u}}}\textrm{d}z=\underline{u}+\frac{1}{\log w}-\frac{\left(\bar{u}-\underline{u}\right)w^{\underline{u}-\bar{u}}}{1-w^{\underline{u}-\bar{u}}}=\underline{u}+\frac{1}{\log w}+o\left(\frac{1}{\log w}\right). \end{align}\] and \[\begin{align} \mathbb{E}\left(Z^{2}|Y>w\right) & =\int_{\underline{u}}^{\bar{u}}z^{2}\frac{w^{\underline{u}-z}\log w}{1-w^{\underline{u}-\bar{u}}}\textrm{d}z\\ & =\frac{\underline{u}^{2}-\bar{u}^{2}w^{\underline{u}-\bar{u}}}{1-w^{\underline{u}-\bar{u}}}+\frac{2}{\log w}\mathbb{E}\left(Z|Y>w\right)\\ & =\underline{u}^{2}-\frac{\left(\bar{u}^{2}-\underline{u}^{2}\right)w^{\underline{u}-\bar{u}}}{1-w^{\underline{u}-\bar{u}}}+\frac{2}{\log w}\mathbb{E}\left(Z|Y>w\right). \end{align}\] We proceed to calculate its variance, \[\begin{align} \textrm{Var}\left(Z|Y>w\right) & =\mathbb{E}\left(Z^{2}|Y>w\right)-\left[\mathbb{E}\left(Z|Y>w\right)\right]^{2}\\ & =\frac{1}{\left(\log w\right)^{2}}-\frac{\left(\bar{u}-\underline{u}\right)^{2}w^{-\left(\bar{u}-\underline{u}\right)}}{\left(1-w^{-\left(\bar{u}-\underline{u}\right)}\right)^{2}}\\ & =\frac{1}{\left(\log w\right)^{2}}+o\left(\frac{1}{\left(\log w\right)^{2}}\right)\text{,} \end{align}\] as desired. An unrelated note: the variance above can be shown to be positive for any \(w\).\(\blacksquare\)

Proof of Theorem 2. We only show the parts that differ from the proof in [4], namely the convergence rate \(\sqrt{n_{0}/a_{n}}\) and the asymptotic normality that accommodates this irregular rate. The rest is the same as in [4]. Denote \[\varsigma_{n}=n_{0}/a_{n}.\]

We denote \(\gamma=\theta-\theta^{*}\) and \[\mathcal{K}_{n}^{*}\left(\gamma\right)=\sum_{i=1}^{n}\left\{ \exp\left[x_{i}'\left(\gamma+\theta^{*}\right)\right]\log\left(\left.y_{i}\right/w_{n}\right)-x_{i}'\left(\gamma+\theta^{*}\right)\right\} I\left(y_{i}>w_{n}\right).\] For a fixed \(p\times1\) vector \(u\), write \(u=\sqrt{\varsigma_{n}}\gamma\). The second-order Taylor expansion of \(\mathcal{K}_{n}^{*}\left(u/\sqrt{\varsigma_{n}}\right)\) around 0 is \[\mathcal{K}_{n}^{*}\left(u/\sqrt{\varsigma_{n}}\right)-\mathcal{K}_{n}^{*}\left(0\right)=\varsigma_{n}^{-1/2}u'\dot{\mathcal{K}}_{n}^{*}\left(0\right)+\varsigma_{n}^{-1}u'\mathcal{\ddot{K}}_{n}^{*}\left(0\right)u/2+o_{P}\left(1\right),\label{eq:Taylor95expansion}\tag{16}\] where \(\dot{\mathcal{K}}_{n}^{*}\) and \(\ddot{\mathcal{K}}_{n}^{*}\) denote the first- and second-order derivatives of \(\mathcal{K}_{n}^{*}\), respectively.

We establish the properties of \(\varsigma_{n}^{-1/2}\dot{\mathcal{K}}_{n}^{*}\left(0\right)\) and \(\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\) at the end of the proof. (21 ) shows that \(\varsigma_{n}^{-1/2}\dot{\mathcal{K}}_{n}^{*}\left(0\right)=O_{P}\left(1\right),\) and (22 ) and (23 ) imply that \(\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\) behaves like a full-rank matrix whose eigenvalues are bounded and bounded away from zero.

Given these results, \(\mathcal{K}_{n}^{*}\left(u/\sqrt{\varsigma_{n}}\right)-\mathcal{K}_{n}^{*}\left(0\right)\leq0\) can hold with high probability only if \(\left\Vert u\right\Vert\) is uniformly bounded; otherwise the quadratic term \(\varsigma_{n}^{-1}u'\mathcal{\ddot{K}}_{n}^{*}\left(0\right)u/2\) dominates in (16 ) and forces \(\mathcal{K}_{n}^{*}\left(u/\sqrt{\varsigma_{n}}\right)-\mathcal{K}_{n}^{*}\left(0\right)>0\). By definition, \(\mathcal{K}_{n}^{*}\left(\hat{\gamma}\right)-\mathcal{K}_{n}^{*}\left(0\right)\leq0\) with \(\hat{\gamma}=\hat{\theta}-\theta^{*}\). Setting \(\hat{u}=\sqrt{\varsigma_{n}}\hat{\gamma}=\sqrt{\varsigma_{n}}\left(\hat{\theta}-\theta^{*}\right)\), the preceding argument gives \(\hat{u}=O_{P}\left(1\right)\); otherwise \(\mathcal{K}_{n}^{*}\left(\hat{\gamma}\right)-\mathcal{K}_{n}^{*}\left(0\right)\leq0\) could not hold with high probability. Therefore, \[\hat{\theta}-\theta^{*}=\hat{u}/\sqrt{\varsigma_{n}}=O_{P}\left(\varsigma_{n}^{-1/2}\right)=O_{P}\left(\sqrt{a_{n}/n_{0}}\right).\]

We now show the asymptotic normality. The first order condition yields \[\mathcal{\dot{K}}_{n}^{*}\left(\hat{\gamma}\right)=0,\] which, by the first order Taylor expansion around \(\mathcal{\dot{K}}_{n}^{*}\left(0\right)\), leads to \[\left[1+o_{P}\left(1\right)\right]\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\left(\hat{\theta}-\theta^{*}\right)=-\mathcal{\dot{K}}_{n}^{*}\left(0\right).\] Multiply both sides by \(a_{n}^{-1/2}\varsigma_{n}^{-1/2}\bar{\varSigma}_{w_{n}}^{-1/2}\), \[\left[1+o_{P}\left(1\right)\right]\left(a_{n}^{-1/2}\bar{\varSigma}_{w_{n}}^{-1/2}\right)\cdot\left[\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\right]\cdot\varsigma_{n}^{1/2}\left(\hat{\theta}-\theta^{*}\right)=-\left(a_{n}^{-1/2}\bar{\varSigma}_{w_{n}}^{-1/2}\right)\cdot\varsigma_{n}^{-1/2}\mathcal{\dot{K}}_{n}^{*}\left(0\right).\] Finally, applying the results in (21 ), (22 ), and (23 ) and the continuous mapping theorem yield \[\left(a_{n}^{1/2}\bar{\varSigma}_{w_{n}}^{1/2}\right)\cdot\varsigma_{n}^{1/2}\left(\hat{\theta}-\theta^{*}\right)\overset{d}{\rightarrow}N\left(0,\mathbb{I}_{p}\right).\] Note that \(\varsigma_{n}=n_{0}/a_{n}\) and \(a_{n}\left\Vert \hat{\varSigma}_{w_{n}}-\bar{\varSigma}_{w_{n}}\right\Vert \overset{P}{\rightarrow}0\) (for similar reason as in (23 )). Using the continuous mapping theorem again, the above can be written as \[\left(a_{n}^{1/2}\hat{\varSigma}_{w_{n}}^{1/2}\right)\cdot\sqrt{n_{0}/a_{n}}\left(\hat{\theta}-\theta^{*}\right)\overset{d}{\rightarrow}N\left(0,\mathbb{I}_{p}\right),\] which is \[\sqrt{n_{0}}\hat{\varSigma}_{w_{n}}^{1/2}\left(\hat{\theta}-\theta^{*}\right)\overset{d}{\rightarrow}N\left(0,\mathbb{I}_{p}\right),\] as desired.

Result 1: Properties of \(\varsigma_{n}^{-1/2}\dot{\mathcal{K}}_{n}^{*}\left(0\right)\).

Note that \[\varsigma_{n}^{-1/2}\dot{\mathcal{K}}_{n}^{*}\left(0\right)=\sum_{i=1}^{n}\varsigma_{n}^{-1/2}x_{i}\left[\exp\left(x_{i}'\theta^{*}\right)\log\left(\left.y_{i}\right/w_{n}\right)-1\right]I\left(y_{i}>w_{n}\right)\equiv\sum_{i=1}^{n}q_{ni},\] with \[\begin{align} q_{ni} & \equiv\varsigma_{n}^{-1/2}x_{i}\left[\exp\left(x_{i}'\theta^{*}\right)\log\left(\left.y_{i}\right/w_{n}\right)-1\right]I\left(y_{i}>w_{n}\right)\\ & \equiv\varsigma_{n}^{-1/2}x_{i}\epsilon_{ni}I\left(y_{i}>w_{n}\right) \end{align}\] where \[\epsilon_{ni}\equiv\exp\left(x_{i}'\theta^{*}\right)\log\left(\left.y_{i}\right/w_{n}\right)-1.\] Then, \[\begin{align} \mathbb{E}\left(q_{ni}\right) & =\varsigma_{n}^{-1/2}\Pr\left(y_{i}>w_{n}\right)\mathbb{E}\left(x_{i}\epsilon_{ni}|y_{i}>w_{n}\right)\\ & =\varsigma_{n}^{-1/2}\Pr\left(y_{i}>w_{n}\right)\mathbb{E}\left[\left.x_{i}\mathbb{E}\left(\epsilon_{ni}|y_{i}>w_{n},x_{i}\right)\right|y_{i}>w_{n}\right]\\ & =0, \end{align}\] because \(\epsilon_{ni}+1=\exp\left(x_{i}'\theta^{*}\right)\log\left(\left.y_{i}\right/w_{n}\right)\) is the standard exponential conditional on \(\left\{ y_{i}>w_{n},x_{i}\right\} .\) Further, \[\begin{align} \mathbb{E}\left(q_{ni}q_{ni}'\right) & =\varsigma_{n}^{-1}\mathbb{E}\left[x_{i}x_{i}'\epsilon_{ni}^{2}I\left(y_{i}>w_{n}\right)\right]\nonumber \\ & =\varsigma_{n}^{-1}\Pr\left(y_{i}>w_{n}\right)\mathbb{E}\left[\left.x_{i}x_{i}'\mathbb{E}\left(\epsilon_{ni}^{2}|y_{i}>w_{n},x_{i}\right)\right|y_{i}>w_{n}\right]\nonumber \\ & =\varsigma_{n}^{-1}\Pr\left(y_{i}>w_{n}\right)\mathbb{E}\left(\left.x_{i}x_{i}'\right|y_{i}>w_{n}\right),\label{eq:EZZ39} \end{align}\tag{17}\] since \(\epsilon_{ni}+1|\left\{ y_{i}>w_{n},x_{i}\right\}\) is standard exponential.

Take any finite \(p\times1\) vector \(c\) with \(\left\Vert c\right\Vert =1\). Using (17 ), the variance of \(c'\sum_{i=1}^{n}q_{ni}\) (a scalar) is \[\begin{align} s_{n}^{2}\left(c\right) & =c'\left[\sum_{i=1}^{n}\mathbb{E}\left(q_{ni}q_{ni}'\right)\right]c=nc'\mathbb{E}\left(q_{ni}q_{ni}'\right)c\nonumber \\ & =n\varsigma_{n}^{-1}\Pr\left(y_{i}>w_{n}\right)c'\mathbb{E}\left(\left.x_{i}x_{i}'\right|y_{i}>w_{n}\right)c\nonumber \\ & =\frac{n\Pr\left(y_{i}>w_{n}\right)}{n_{0}}a_{n}^ {}c'\mathbb{E}\left(\left.x_{i}x_{i}'\right|y_{i}>w_{n}\right)c.\label{eq:s95n} \end{align}\tag{18}\] [4] showed that \(n\Pr\left(y_{i}>w_{n}\right)/n_{0}=1+o\left(1\right).\) Together with (6 ), \[s_{n}^{2}\left(c\right)=a_{n}^ {}c'\bar{\varSigma}_{w_{n}}c\left(1+o\left(1\right)\right),\textrm{ and}\] \[\left(1+o\left(1\right)\right)\underline{B}\leq s_{n}^{2}\left(c\right)\leq\left(1+o\left(1\right)\right)\bar{B}.\label{eq:s95n95finite}\tag{19}\] In addition, \[\begin{align} \sum_{i=1}^{n}\mathbb{E}\left(c'q_{ni}\right)^{2+\delta} & =n\varsigma_{n}^{-\left(2+\delta\right)/2}\mathbb{E}\left[\left(c'x_{i}\right)^{2+\delta}\epsilon_{ni}^{2+\delta}I\left(y_{i}>w_{n}\right)\right]\nonumber \\ & =n\Pr\left(y_{i}>w_{n}\right)\varsigma_{n}^{-\left(2+\delta\right)/2}\mathbb{E}\left[\left.\left(c'x_{i}\right)^{2+\delta}\mathbb{E}\left(\epsilon_{ni}^{2+\delta}|y_{i}>w_{n},x_{i}\right)\right|y_{i}>w_{n}\right]\nonumber \\ & \leq Cn_{0}\left(\frac{a_{n}}{n_{0}}\right)^{1+\delta/2}\mathbb{E}\left[\left.\left|c'x_{i}\right|^{2+\delta}\right|y_{i}>w_{n}\right]\nonumber \\ & =C\frac{a_{n}^{1+\delta/2}}{n_{0}^{\delta/2}}\mathbb{E}\left[\left.\left|c'x_{i}\right|^{2+\delta}\right|y_{i}>w_{n}\right]\rightarrow0,\label{eq:x94243delta} \end{align}\tag{20}\] for some positive \(C,\) due to \(n\Pr\left(y_{i}>w_{n}\right)/n_{0}=1+o\left(1\right),\) the moment condition \(\mathbb{E}\left[\left.\left\Vert x_{i}\right\Vert ^{2+\delta}\right|y_{i}>w_{n}\right]\) being finite and \[\left.a_{n}^{1+\delta/2}\right/n_{0}^{\delta/2}=a_{n}^{1-\delta/2}\left(\left.a_{n}^{2}\right/n_{0}\right)^{\delta/2}\rightarrow0\textrm{ by }\delta\geq2\textrm{ and }\left.a_{n}^{2}\right/n_{0}\rightarrow0.\]

(19 ) and (20 ) imply that \[\frac{\sum_{i=1}^{n}\mathbb{E}\left(c'q_{ni}\right)^{2+\delta}}{s_{n}^{2+\delta}\left(c\right)}\rightarrow0,\] which is the Lyapunov condition for \(c'\sum_{i=1}^{n}q_{ni}\). Therefore, the Lyapunov central limit theorem implies \[\frac{c'\varsigma_{n}^{-1/2}\dot{\mathcal{K}}_{n}^{*}\left(0\right)}{s_{n}\left(c\right)}=\frac{c'\sum_{i=1}^{n}q_{ni}}{s_{n}\left(c\right)}\overset{d}{\rightarrow}N\left(0,1\right).\] Recalling that \(s_{n}^{2}\left(c\right)=a_{n}c'\bar{\varSigma}_{w_{n}}c\left(1+o\left(1\right)\right)\), applying the Cramér-Wold device and the continuous mapping theorem yields \[\left(a_{n}^{-1/2}\bar{\varSigma}_{w_{n}}^{-1/2}\right)\cdot\varsigma_{n}^{-1/2}\dot{\mathcal{K}}_{n}^{*}\left(0\right)=a_{n}^{-1/2}\bar{\varSigma}_{w_{n}}^{-1/2}\sum_{i=1}^{n}q_{ni}\overset{d}{\rightarrow}N\left(0,\mathbb{I}_{p}\right).\label{eq:Kn46}\tag{21}\]

Result 2: Properties of \(\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\).

Recall that \(\epsilon_{ni}+1=\exp\left(x_{i}'\theta^{*}\right)\log\left(\left.y_{i}\right/w_{n}\right)\) is the standard exponential conditional on \(\left\{ y_{i}>w_{n},x_{i}\right\} .\) Using the same logic in (20 ), \[\begin{align} \mathbb{E}\left[\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\right] & =\mathbb{E}\left[\varsigma_{n}^{-1}\sum_{i=1}^{n}x_{i}x_{i}'\exp\left(x_{i}'\theta^{*}\right)\log\left(\left.y_{i}\right/w_{n}\right)I\left(y_{i}>w_{n}\right)\right]\nonumber \\ & =\mathbb{E}\left[\varsigma_{n}^{-1}\sum_{i=1}^{n}x_{i}x_{i}'\left(\epsilon_{ni}+1\right)I\left(y_{i}>w_{n}\right)\right]\nonumber \\ & =\left(1+o\left(1\right)\right)a_{n}\mathbb{E}\left[\left.x_{i}x_{i}'\right|y_{i}>w_{n}\right]\nonumber \\ & =\left(1+o\left(1\right)\right)a_{n}\bar{\varSigma}_{w_{n}}.\label{eq:Kn4646} \end{align}\tag{22}\] As a result, \[\left(1+o\left(1\right)\right)\underline{B}\leq\rho_{\min}\left\{ \mathbb{E}\left[\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\right]\right\} \leq\rho_{\max}\left\{ \mathbb{E}\left[\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\right]\right\} \leq\left(1+o\left(1\right)\right)\bar{B}.\]

The \(\left(j,l\right)\)-th element of \(\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\) converges to its expectation by the Markov inequality due to the following: \[\begin{align} & \textrm{Var}\left[\varsigma_{n}^{-1}\sum_{i=1}^{n}x_{ij}x_{il}\exp\left(x_{i}'\theta^{*}\right)\log\left(\left.y_{i}\right/w_{n}\right)I\left(y_{i}>w_{n}\right)\right]\\ = & n\varsigma_{n}^{-2}\textrm{Var}\left[x_{ij}x_{il}\left(\epsilon_{ni}+1\right)I\left(y_{i}>w_{n}\right)\right]\leq n\varsigma_{n}^{-2}\mathbb{E}\left[x_{ij}^{2}x_{il}^{2}\left(\epsilon_{ni}+1\right)^{2}I\left(y_{i}>w_{n}\right)\right]\\ = & n\Pr\left(y_{i}>w_{n}\right)\varsigma_{n}^{-2}\mathbb{E}\left\{ \left.x_{ij}^{2}x_{il}^{2}\mathbb{E}\left[\left.\left(\epsilon_{ni}+1\right)^{2}\right|y_{i}>w_{n},x_{i}\right]\right|y_{i}>w_{n}\right\} \\ = & 2n_{0}\varsigma_{n}^{-2}\left(1+o\left(1\right)\right)\mathbb{E}\left(\left.x_{ij}^{2}x_{il}^{2}\right|y_{i}>w_{n}\right)\\ = & 2\frac{a_{n}^{2}}{n_{0}}\left(1+o\left(1\right)\right)\mathbb{E}\left(\left.x_{ij}^{2}x_{il}^{2}\right|y_{i}>w_{n}\right)\\ \leq & 2\frac{a_{n}^{2}}{n_{0}}\left(1+o\left(1\right)\right)\left[\mathbb{E}\left(\left.x_{ij}^{4}\right|y_{i}>w_{n}\right)\right]^{1/2}\left[\mathbb{E}\left(\left.x_{il}^{4}\right|y_{i}>w_{n}\right)\right]^{1/2}\\ \rightarrow & 0, \end{align}\] by \(\left.a_{n}^{2}\right/n_{0}\rightarrow0\) and the finite fourth moment condition. Since the dimension of \(\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\) is finite, the above implies \[\left\Vert \varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)-\mathbb{E}\left[\varsigma_{n}^{-1}\mathcal{\ddot{K}}_{n}^{*}\left(0\right)\right]\right\Vert \overset{P}{\rightarrow}0.\label{eq:Kn4646-62E}\tag{23}\] \(\blacksquare\)

References↩︎

[1]
L. de Haan and A. Ferreira, Extreme Value Theory: An Introduction, 1st ed. New York, NY: Springer, 2006.
[2]
B. M. Hill, “A simple general approach to inference about the tail of a distribution,” The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.
[3]
P. Hall and N. Tajvidi, “Nonparametric analysis of temporal trend when fitting parametric models to extreme-value data,” Statistical Science, vol. 15, no. 2, pp. 153–167, 2000.
[4]
H. Wang and C.-L. Tsai, “Tail index regression,” Journal of the American Statistical Association, vol. 104, no. 487, pp. 1233–1240, 2009, doi: 10.1198/jasa.2009.tm08458.
[5]
R. Li, C. Leng, and J. You, “Semiparametric tail index regression,” Journal of Business & Economic Statistics, vol. 40, no. 1, pp. 82–95, 2022, doi: 10.1080/07350015.2020.1775616.
[6]
L. de Haan and C. Zhou, “Trends in extreme value indices,” Journal of the American Statistical Association, vol. 116, no. 535, pp. 1265–1279, 2021, doi: 10.1080/01621459.2019.1705307.
[7]
P. Hall, “On some simple estimates of an exponent of regular variation,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 44, no. 1, pp. 37–42, Dec. 1982.
[8]
E. Haeusler and J. L. Teugels, “On asymptotic normality of Hill’s estimator for the exponent of regular variation,” The Annals of Statistics, vol. 13, no. 2, pp. 743–756, 1985, doi: 10.1214/aos/1176349551.

  1. Corresponding author. Research School of Economics, The Australian National University, Canberra, ACT 0200, Australia. Email: tao.yang@anu.edu.au.↩︎