Abstract

The normality assumption for random errors is fundamental in the analysis of variance (ANOVA) models, yet it is seldom subjected to formal testing in practice. In this paper, we develop Neyman’s smooth tests for assessing normality in a broad class of ANOVA models. The proposed test statistics are constructed via the Gaussian probability integral transformation of ANOVA residuals and are shown to follow an asymptotic Chi-square distribution under the null hypothesis, with degrees of freedom determined by the dimension of the smooth model. We further propose a data-driven selection of the model dimension based on a modified Schwarz’s criterion. Monte Carlo simulations demonstrate that the tests maintain the nominal size and achieve high power against a wide range of alternatives. Our framework thus provides a systematic and effective tool for formally validating the normality assumption in ANOVA models.

Keywords: ANOVA; Estimation effect; Normality; Schwarz’s selection rule; Smooth test.

1 Introduction↩︎

Analysis of Variance (ANOVA) is a fundamental and widely used tool in both exploratory and confirmatory data analysis [1], particularly for comparing group means and assessing the significance of factors in experimental designs. The theory of ANOVA has been well established in the literature; see, for example, [2], [3], [4], [5], [6], and [7]. A standard assumption in ANOVA is that the random errors are normally distributed. Violations of this assumption may invalidate normal-based inference. From an estimation perspective, departures from normality undermine the validity of variance component estimators, since the classical variance formulae rely on mean squares of random effects being constant multiples of Chi-square variables [2]. From a testing perspective, the \(F\)-test in ANOVA is sensitive to nonnormality, especially when group sizes are unequal [8]. The impact of nonnormality on the size and power of the \(F\)-test has been extensively studied [9]–[18]. These considerations highlight the importance of rigorously evaluating the normality assumption in ANOVA models.

Although a variety of normality tests have been proposed for observed data and linear models, relatively little attention has been paid to systematically assessing normality in ANOVA. A straightforward diagnostic approach is to use the normal probability plots of ANOVA residuals [5], [6], which are intuitive but lack theoretical justification. To our knowledge, formal hypothesis testing procedures designed explicitly for ANOVA are scarce in the literature. Most existing methods simply apply classical normality tests to ANOVA residuals, see, for example, [19], [20], and [21]. While applicable in practice, such approaches do not provide a rigorous assessment of normality within the ANOVA framework and fail to account for the structural constraints inherent in ANOVA models.

In this article, following the spirit of [22], we propose a unified smooth test framework for assessing normality in various types of ANOVA models. Neyman’s smooth test is widely used in diverse scientific fields due to its theoretical soundness and practical effectiveness, with proven power against a broad class of alternatives; see, e.g., [23], [24], and [25]. Specifically, we reformulate the normality testing problem as one of uniformity via the Gaussian probability integral transform (PIT), and construct test statistics based on the Gaussian PIT of ANOVA residuals. Under mild conditions, the proposed statistics are asymptotically Chi-square distributed under the null hypothesis. We further analyze the power properties under both fixed and local alternatives. A modified Schwarz’s selection rule is proposed to determine the direction of tests. Monte Carlo experiments demonstrate the good performance of our methodology in finite samples.

The main contributions of this paper are threefold. First, the proposed asymptotic Chi-square tests are simple to implement: critical values are obtained directly from the limiting distributions without resorting to resampling. Second, we systematically investigate the properties of the test statistics in three types of one-way fixed effects models, potentially with a diverging number of groups, within a unified asymptotic framework. Our analysis clarifies how parameter estimation affects the test statistics: heterogeneity in group means and variances alters their explicit form and the corresponding regularity conditions, but the convergence rate depends only on the total sample size under both null and alternative hypotheses. Third, we provide theoretical justification for data-driven smooth tests in the ANOVA setting, including a revised limiting null distribution for the corresponding statistics in finite samples.

The remainder of this paper is organized as follows. Section 2 introduces the general testing framework. Section 3 establishes theoretical results for three types of ANOVA models. Section 4 proposes a data-driven testing procedure. Section 5 reports simulation results. Section 6 discusses further extensions. Section 7 concludes. Proofs and additional numerical results are provided in Appendix 8.

2 The testing framework↩︎

Our problem of interest is to determine whether the random error \(\varepsilon = \sigma e\) in ANOVA models is normally distributed with mean zero and variance \(\sigma^2\), that is, \(\varepsilon \sim \mathcal{N} (0,\sigma^2)\) for some \(\sigma>0\). Equivalently, this can be formulated as testing the null hypothesis about the standardized random error \(H_0: e \sim \mathcal{N} (0,1)\) against alternatives of nonnormality. To motivate our methodology, consider the transformation \(Z=\Phi(e)\), where \(\Phi(\cdot)\) denotes the cumulative distribution function (CDF) of the standard normal distribution. Under the null hypothesis of normality, the CDF of the transformed random variable \(Z\) is given by \[G(z)=\mathrm{P}(Z\le z)=\mathrm{P}\left(e \le \Phi^{-1}(z)\right)=\Phi(\Phi^{-1}(z))=z.\] Or, equivalently, under \(H_0\), the probability density function (PDF) of \(Z\) is \(g(z)\equiv 1, z\in[0,1]\), that is, \(Z \sim \mathcal{U}[0,1]\), where \(\mathcal{U}[0,1]\) denotes the uniform distribution on the interval \([0,1]\). Under the alternative, the density \(g(z)\) deviates from unity. This distinct behavior of \(Z\) under \(H_0\) and \(H_1\) provides the foundation for Neyman’s smooth test. In particular, [22] introduced the following smooth alternative to the uniform density: \[\label{pit95density1} h\left(z\right) =c\left( \boldsymbol{\theta}_K\right) \exp\left[ \sum_{k=1}^{K} \theta_k\pi_k\left(z\right) \right], \quad 0\le z\le 1,\tag{1}\] where \(c( \boldsymbol{\theta}_K)\) is a a normalizing constant depending on \(\boldsymbol{\theta}_K = (\theta_1,\theta_2,...,\theta_K)^\top\), and \(\{ \pi_k\left(z\right)\} _{k=0}^{\infty}\) denotes an orthonormal system in \(L_2[0,1]\) with \(\pi_0(z)=1\) and \[\int_0^1\pi_k\left(z\right) \pi_{l}\left(z\right) \mathrm{d}z=\delta _{kl}, \text{ where }\delta_{kl}=\begin{cases} =1\text{, if }k=l,\\ =0\text{, if }k\neq l. \end{cases} \label{orth95cond1}\tag{2}\] The null hypothesis \(Z\sim \mathcal{U}[0,1]\) can thus be assessed by testing \(\theta_1=\theta_2= \ldots =\theta_K=0\) in 1 .

If independent and identically distributed (i.i.d.) observations \(\{\varepsilon_{i}\}_{i=1}^n\) are available and \(\sigma\) were known, the smooth test statistic for testing \(H_0\) has the following quadratic form: \[\label{iFY} \Psi_K^2 = \sum_{k=1}^{K}\left(\frac{1}{n} \sum_{i=1}^{n} \pi_k(Z_{i})\right)^2,\tag{3}\] where \(K\) is a fixed and given positive integer and \(Z_{i}=\Phi(e_i)=\Phi(\varepsilon_i/\sigma)\). Under \(H_0\), the statistic \(n\Psi_K^2\) converges in distribution to a Chi-square law with \(K\) degrees of freedom, denoted by \(\chi_K^2\). Moreover, the test inherits the local optimal properties of Rao’s score test.

However, in practice, the sequence \(\{\varepsilon_{i}\}_{i=1}^n\) is unobserved and \(\sigma\) is unknown. We instead rely on approximations \(\{\hat{\varepsilon}_i\}_{i=1}^n\) for \(\{\varepsilon_{i}\}_{i=1}^n\) and an estimator \(\hat{\sigma}\) for \(\sigma\) (e.g., residuals and the sample standard deviation, respectively). Accordingly, we consider the following approximation for \(Z_i\): \[\hat{Z}_{i}=\Phi(\hat{e}_i)=\Phi\left(\frac{\hat{\varepsilon}_{i}}{\hat{\sigma}}\right), \quad i=1,\ldots,n,\] and construct a feasible test statistic of the following form: \[\sum_{k=1}^{K}\left(\frac{1}{n}\sum_{i=1}^{n}\pi_k\left(\hat{Z}_{i}\right)\right)^2. \label{naive}\tag{4}\] The random variables \(\{\hat{Z}_{i}\}_{i=1}^n\) are not i.i.d. any longer due to estimation effects caused by \(\hat{\varepsilon}_i\) and \(\hat{\sigma}\). Therefore, it is crucial to study the asymptotic properties of \({n}^{-1}\sum_{i=1}^{n}\pi_k(\hat{Z}_{i})\). Compared with \(\Psi_K^2\) in 3 , the statistic based on \(\hat{Z}_{i}\) suffers from non-negligible estimation effects, as demonstrated in the theoretical results below. Consequently, the presence of estimation effects invalidates the form of the “naive” statistic 4 and requires a different normalization matrix to restore the \(\chi_K^2\) limiting null distribution. This motivates a detailed analysis of estimation effects, leading to the correct quadratic-form test statistic.

Another important issue concerns sample sizes in ANOVA models. To highlight the essence of smooth tests, we first focus on the simplified setting of a single random sequence with sample size \(n\) throughout this section. This simplified setting captures the core methodological ideas and facilitates the derivation of key asymptotic properties. In contrast, the complete ANOVA setting involves additional complications arising from multiple sample sizes and the associated indexing structure, which demand more refined analysis. From a practical standpoint, our method is further designed to accommodate scenarios where the number of groups increases with the total sample size. These issues will be systematically investigated in subsequent analysis of specific ANOVA models.

3 Testing for normality in one-way fixed effects models↩︎

In this section, we develop smooth tests for normality in three types of one-way fixed effects models, where each model assumes equality in at least one of the group means or the group variances. The testing procedure and its theoretical properties are investigated separately for each case. Let \(\{Y_{ij}\}_{i=1,j=1}^{N_j,J}\) denote the observations of interest coming from \(J\) groups, with \(N_j\) being the sample sizes in group \(j\) for \(1 \le j \le J\). The total sample size is defined as \(N = \sum_{j=1}^J N_j\). In our theoretical framework, the number of groups, \(J\), is allowed to diverge as \(N\) increases.

3.1 Same mean and same variance↩︎

We begin with the ANOVA models where both the group means and group variances are equal. Specifically, consider the model \[Y_{ij} = \mu + \varepsilon_{ij} = \mu + \sigma e_{ij}, \quad i = 1, \ldots, N_j, \quad j = 1, \ldots, J, \label{model95same}\tag{5}\] where \(\{e_{ij}\}_{i=1,j=1}^{N_j,J}\) are i.i.d. standardized random errors with mean zero and unit variance, \(\mu\) denotes the common group mean, and \(\sigma\) is the standard deviation of random errors \(\{\varepsilon_{ij}\}_{i=1,j=1}^{N_j,J}\). We also define \(Z_{ij} = \Phi(e_{ij})\).

The parameters \(\mu\) and \(\sigma^2\) in 5 are estimated by the sample mean \(\hat{\mu}=N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}Y_{ij}\) and the sample variance \(\hat{\sigma}^2=N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(Y_{ij}-\hat{\mu}\right)^2\), respectively. Define \(\hat{\varepsilon}_{ij}=Y_{ij}-\hat{\mu}\), \(\hat{e}_{ij}=\hat{\varepsilon}_{ij}/\hat{\sigma}=(Y_{ij}-\hat{\mu}) / \hat{\sigma}\) and \(\hat{Z}_{ij} = \Phi (\hat{e}_{ij})\). To derive the feasible test statistic and its properties, we first analyze the asymptotic behavior of \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\) for \(k=1,\ldots,K\). For this purpose, we impose the following two assumptions.

Assumption 1. \(\{e_{ij}\}_{i=1,j=1}^{N_j,J}\) are i.i.d. with continuous CDF \(F(x)\), PDF \(f(x)\), mean zero, unit variance, and finite sixth moments.

Assumption 2. For \(k =1,\ldots, K\), \(\pi_k(\cdot)\) are twice continuously differentiable with derivatives \(\dot{\pi}_k(\cdot)\) and \(\ddot\pi_k(\cdot)\), and they are both bounded.

The two mild assumptions are in line with similar ones adopted in the literature of Neyman’s smooth tests, including [26] and [27]. For subsequent analysis, we introduce two constants: \[c_{1k} = \int_0^1 \pi_k (z) \Phi^{-1} (z) \mathrm{d}z, \quad c_{2k} = \int_0^1 \pi_k (z) \left( \Phi^{-1} (z) \right)^2 \mathrm{d}z, \quad k=1,\ldots,K.\]

Theorem 1. Suppose Assumptions 1 and 2 hold. Then, under the null hypothesis \(H_0: e \sim \Phi(x)\), for \(k=1,\ldots,K\), \[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right)=\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left\{\pi_k(Z_{ij})-c_{1k}e_{ij}-\frac{c_{2k}}{2}\left(e_{ij}^2-1\right)\right\} + o_p\left(\frac{1}{\sqrt{N}}\right),\] as \(N\to\infty\).

Theorem 1 states that \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\) is equivalent to the sum of three terms after neglecting the higher-order term: \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(Z_{ij})\), \(-c_{1k}N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}\), and \(-c_{2k}(2N)^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}(e_{ij}^2-1)\). The latter two represent the estimation effects due to estimating \(\mu\) and \(\sigma^2\), respectively, and they contribute to the limiting distribution of the feasible smooth test statistic based on \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\), \(k=1,\ldots,K\).

With the assistance of Theorem 1, and by invoking the central limit theorem (CLT), we obtain the asymptotic normality of the \(K\)-dimensional vector \[\label{asym:thm1:K95dim95vec} \begin{align} \frac{1}{\sqrt{N}}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right) & =\sqrt{N}\left[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_1\left(\hat{Z}_{ij}\right),\ldots,\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_K\left(\hat{Z}_{ij}\right)\right]^\top \\ & \to_d \mathcal{N}_K\left(\boldsymbol{0}, \boldsymbol{\Sigma}_K\right), \end{align}\tag{6}\] where the asymptotic covariance matrix \(\boldsymbol{\Sigma}_K=\left(\sigma_{kl}\right)_{K\times K}\) is given by \[\begin{align} \sigma_{kl} &= \mathrm{E}\left[\left(\pi_k(Z)-c_{1k}e-\frac{c_{2k}}{2}\left(e^2-1\right)\right)\left(\pi_{l}(Z)-c_{1l}e-\frac{c_{2l}}{2}\left(e^2-1\right)\right)\right]\\ & =\delta_{kl} - c_{1k} c_{1l} - \frac{1}{2} c_{2k} c_{2l}. \end{align}\] Based on this result, we define the feasible smooth test statistic as \[\hat{\Psi}_K^2=\left(\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right)\right)^\top\boldsymbol{\Sigma}_K^{-1}\left(\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right)\right). \label{hat95Psi95k951}\tag{7}\] The following corollary establishes the limiting null distribution of \(\hat{\Psi}_K^2\) defined in 7 .

Corollary 1. Suppose Assumptions 1 and 2 hold. Then, under the null hypothesis \(H_0:e \sim\Phi(x)\), \[N\hat{\Psi}_K^2 \to_d \chi_K^2,\] as \(N\to\infty\).

Corollary 1 establishes an asymptotic \(\chi^2\) test for the normality of model 5 based on the statistic \(N\hat{\Psi}_K^2\). Given the asymptotic significance level of \(\alpha\), we reject the null hypothesis if \[N\hat{\Psi}_K^2 > \chi_{K, 1 - \alpha}^2,\] where \(\chi_{K, 1 - \alpha}^2\) denotes the \((1 - \alpha)\)-th quantile of the \(\chi_K^2\) distribution.

We now investigate the asymptotic behavior of \(\hat{\Psi}_K^2\) under the fixed alternatives as well as under a Pitman-type sequence of local alternatives. For the fixed alternatives, we consider the following form: \[H_1: F(x)\ne\Phi(x)\text{, and there exists at least one }1\le k \le K\text{ such that } \mathrm{E}[\pi_k (Z)] \ne 0. \label{H195fixed}\tag{8}\] Under \(H_1\) given by 8 , the random error \(e\) is no longer normally distributed. Consequently, the asymptotic behavior of \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\) and \(\hat{\Psi}_K^2\) differs from that under \(H_0\). To formally state the corresponding theoretical results under \(H_1\), we introduce the following notations. Let \[d_{1k}=\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]=\int_{- \infty}^{\infty}\dot{\pi}_k\left(\Phi\left(x\right)\right)\phi\left(x\right)f(x)\mathrm{d}x,\] \[d_{2k}=\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]=\int_{- \infty}^{\infty}\dot{\pi}_k\left(\Phi\left(x\right)\right)\phi\left(x\right)xf(x)\mathrm{d}x,\] \[d_{3k}=\mathrm{E}\left[\pi_k(Z)e\right]=\int_{- \infty}^{\infty}\pi_k\left(\Phi\left(x\right)\right)xf(x)\mathrm{d}x,\] and \[d_{4k}=\mathrm{E}\left[\pi_k(Z)e^2\right]=\int_{- \infty}^{\infty}\pi_k\left(\Phi\left(x\right)\right)x^2f(x)\mathrm{d}x,\] for \(k=1,\ldots,K\). Note that under \(H_0\), the constants \(d_{1k}=d_{3k}=c_{1k}\) and \(d_{2k}=d_{4k}=c_{2k}\) since \(f(x) = \phi(x)\). The following theorem characterizes the asymptotic properties of \(\hat{\Psi}_K^2\) under the alternative hypothesis \(H_1\).

Theorem 2. Suppose Assumptions 1 and 2 hold. Then, under the alternative hypothesis \(H_1\) in 8 , for \(k=1,\ldots,K\), \[\label{H195same95decomp} \begin{align} \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right)=&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left\{\left(\pi_k(Z_{ij})-\mathrm{E}\left[\pi_k(Z)\right]\right)-d_{1k}e_{ij}-\frac{d_{2k}}{2}\left(e_{ij}^2-1\right)\right\}\\ &+\mathrm{E}\left[\pi_k(Z)\right]+o_p\left(\frac{1}{\sqrt N}\right), \end{align}\qquad{(1)}\] as \(N\to\infty\). Furthermore, \[\hat{\Psi}_K^2 \to_p \boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K,\] and \[\sqrt{N} \left( \hat{\Psi}_K^2 - \boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K \right) \to_d \mathcal{N}\left(0, 4 \boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{\Xi}_K \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K\right), \label{sandwitch}\qquad{(2)}\] where \(\boldsymbol{a}_K\equiv (a_1,\ldots,a_K)^\top = ( \mathrm{E}[\pi_1 (Z)], \ldots, \mathrm{E}[\pi_K (Z)])^\top\) and \(\boldsymbol{\Xi}_K=\left(\xi_{kl}\right)_{K\times K}\) is given by \[\begin{align} \xi_{kl} = & \mathrm{E}\left[ \pi_k (Z) \pi_l (Z) \right]-a_k a_l-\left[d_{1k}d_{3l} + d_{1l}d_{3k} \right] + d_{1k} d_{1l} + \frac{1}{2} \left[ a_k d_{2l} + a_l d_{2k} \right] \\ &-\frac{1}{2}\left[ d_{2k}d_{4l} + d_{2l}d_{4k}\right] + \frac{1}{2}\left[ d_{1k}d_{2l} + d_{1l}d_{2k}\right] \mathrm{E}\left[e^3\right] +\frac{d_{2k} d_{2l} }{4} \left[\mathrm{E}\left[e^4\right]-1 \right]. \end{align}\]

From Theorem 2, it follows that under the alternative hypothesis \(H_1\) in 8 , if there exists at least one \(1\le k\le K\) such that \[a_k \equiv \mathrm{E}[\pi_k(Z)]=\int_{-\infty}^{\infty} \pi_k(\Phi(x))f(x)\mathrm{d}x =\int_0^1\pi_k(z)\frac{f(\Phi^{-1}(z))}{\phi(\Phi^{-1}(z))}\mathrm{d}z\ne 0,\] then the smooth test statistic satisfies \(N \hat{\Psi}_K^2 \to \infty\) in probability as \(N \to \infty\), implying that the asymptotic power of the test is \(1\). Specifically, the power function is given by \[1 - \Phi \left( \left(4 N \boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{\Xi}_K \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K\right)^{-1/2} \chi_{K, 1 - \alpha}^2 - \left( N \boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{\Xi}_K \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K/4 \right)^{1/2}\right).\]

For further investigation of the test’s power, we consider a Pitman-type sequence of local alternatives that converges to the null hypothesis at an appropriate rate, which is specified as follows: \[H_{1L}: F(x) = (1-\delta_N)\Phi(x)+\delta_N Q(x), \label{H1L}\tag{9}\] where \(Q\left(x\right)\) (which admits PDF \(q\left(x\right)\)) represents some distribution function that is different from \(\Phi(x)\), and \(\delta_N \to 0\) as \(N\to \infty\). Define \[\Delta_k = \int_0^1\pi_k(z)\frac{q(\Phi^{-1}(z))}{\phi(\Phi^{-1}(z))} \mathrm{d}z, \quad k=1,\ldots,K.\] The following theorem presents the theoretical properties of \(N\hat{\Psi}_K^2\) under the local alternatives. Specifically, the rate of \(\delta_N\) tending to \(0\) as \(N \to \infty\) is crucial to the nontrivial local power of the test.

Theorem 3. Suppose Assumptions 1 and 2 hold. Then, under the local alternative hypothesis \(H_{1L}\) in 9 , with \(\delta_N = N^{-1/2}\), for \(k=1,\ldots,K\), \[\label{decomp95H1L} \begin{align} \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right)=&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left\{\left(\pi_k(Z_{ij})-\mathrm{E}\left[\pi_k(Z)\right]\right)-c_{1k}e_{ij}-\frac{c_{2k}}{2}\left(e_{ij}^2-1\right)\right\}\\ &+\delta_N \Delta_k+o_p\left(\frac{1}{\sqrt N}\right), \end{align}\qquad{(3)}\] as \(N\to\infty\). Furthermore, \[N\hat{\Psi}_K^2 \to_d \chi_K^2\left(\boldsymbol{\Delta}_K^\top\boldsymbol{\Sigma}_K^{-1}\boldsymbol{\Delta}_K\right),\] where \(\boldsymbol{\Delta}_K = ( \Delta_1, \ldots, \Delta_K)^\top\), and \(\chi^2_K \left(\tau\right)\) denotes the noncentral \(\chi^2\) distribution with \(K\) degrees of freedom and nonnegative noncentrality parameter \(\tau\).

It is worthwhile to mention that \(N^{-1/2}\) is the critical order of magnitude for our test to achieve nontrivial power against the Pitman-type sequence of local alternatives. If \(\delta_N = o( N^{-1/2})\), it would be impossible to distinguish the alternatives from the null. Under \(H_{1L}\) in 9 , as long as there exists at least one \(1 \le k \le K\) such that \(\Delta_k \neq 0\), the proposed test statistic \(N \hat{\Psi}_K^2\) attains nontrivial asymptotic power against the local alternatives because the noncentrality parameter \(\boldsymbol{\Delta}_K^\top\boldsymbol{\Sigma}_K^{-1}\boldsymbol{\Delta}_K>0\).

Remark 1. We emphasize that the aforementioned smooth test is not “consistent” in the strict sense; that is, the power of the test does not necessarily approach \(1\) as \(N \to \infty\) under all directions of fixed alternatives. Only under alternatives where \(\mathrm{E}[\pi_k(Z)]\neq 0\) for at least one \(1 \leq k \leq K\) does the asymptotic power converge to \(1\). Otherwise, if \(\mathrm{E}[\pi_k(Z)]=0\) for all \(1 \leq k \leq K\) (but \(\mathrm{E}[\pi_{K+1}(Z)]\neq 0\) maybe), \(\hat{\Psi}_K^2\) will fail to detect the discrepancy between \(F\) and \(\Phi\). For example, let \(e\sim \mathcal{U}[-\sqrt{3},\sqrt{3}]\) so that \(\mathrm{E}[e]=0\) and \(\operatorname{Var}[e]=1\), and consider the first-order orthonormal Legendre polynomial \(\pi_1(z)=\sqrt{3}(2z-1)\). For \(Z=\Phi(e)\), we have \[\begin{align} \mathrm{E}[Z] &=\frac{1}{2\sqrt{3}} \int_{-\sqrt{3}}^{\sqrt{3}} \Phi(z) \mathrm{d}z \\ &= \frac{1}{2\sqrt{3}} z\Phi(z)\Big|_{-\sqrt{3}}^{\sqrt{3}}-\frac{1}{2\sqrt{3}} \int_{-\sqrt{3}}^{\sqrt{3}}z \mathrm{d} \Phi(z) \\ &=\frac{1}{2} \left(\Phi\left(\sqrt{3}\right)+\Phi\left(-\sqrt{3}\right)\right) \\ & = \frac{1}{2}, \end{align}\] which implies \(\mathrm{E}[\pi_1(Z)]=0\). In fact, smooth tests are neither directional nor omnibus; that is, they maintain reasonable power across a broad—but not universal—range of alternatives, and generally exhibit good finite-sample power properties; see [26] for further discussion.

3.2 Different means and same variance↩︎

Next, we consider ANOVA models in which the group means are allowed to differ, that is, \[Y_{ij} = \mu_j + \varepsilon_{ij}=\mu_j+ \sigma e_{ij},\quad i=1,\ldots,N_j,\quad j=1,\ldots,J, \label{model95diff95mean}\tag{10}\] where \(\{\mu_j\}_{j=1}^J\) denote the potentially distinct means of each group. Let \(\hat{\mu}_j=N_j^{-1}\sum_{i=1}^{N_j}Y_{ij}\) and \(\hat{\sigma}^2=N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(Y_{ij}-\hat{\mu}_j\right)^2\) be the estimators of \(\mu_j\) (\(j = 1,\ldots,J\)) and \(\sigma^2\), respectively. Correspondingly we define \(\hat{\varepsilon} _{ij} = Y_{ij}-\hat{\mu}_j\), \(\hat{e}_{ij}=\hat{\varepsilon} _{ij}/\hat{\sigma}=(Y_{ij}-\hat{\mu}_j) / \hat{\sigma}\), and \(\hat{Z}_{ij}=\Phi(\hat{e}_{ij})\). The following theorem characterizes the properties of \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\) under the model 10 .

Theorem 4. Suppose Assumptions 1 and 2 hold. Then, under the null hypothesis \(H_0: e \sim\Phi(x)\), for \(k=1,\ldots,K\), \[\begin{align} \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right)=&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left\{\pi_k(Z_{ij})-c_{1k}e_{ij}-\frac{c_{2k}}{2}\left(e_{ij}^2-1\right)\right\}+ o_p\left(\frac{1}{\sqrt{N}}\right), \end{align}\] as \(\min\{N_1,\ldots,N_J\}\to\infty\), \(J = o( N^{1/2})\) and \(\sum_{j=1}^J N_j^{-1}=o(1)\). Furthermore, for \(\hat{\Psi}_K^2\) defined in 7 , we have \[N\hat{\Psi}_K^2 \to_d \chi_K^2.\]

Theorem 4 parallels Theorem 1 and Corollary 1, except that it requires additional conditions on the group sample sizes and the number of groups. These conditions are imposed for technical reasons and to account for heterogeneity across group means in model 10 . The intuition behind the conditions is illustrated as follows. First, the sample size of each group, \(N_j\), must tend to infinity. In model 10 , where \(\mu_j\) can be distinct, only the observations within group \(j\) can be used to estimate \(\mu_j\), making \(N_j \to \infty\) necessary for the consistency of \(\hat{\mu}_j\). Second, the number of groups \(J\) is required to be finite or to diverge no faster than \(N^{1/2}\), and \(\sum_{j=1}^J N_j^{-1}=o(1)\) must hold jointly for \(\{N_j\}_{j=1}^J\) and \(J\). This rules out scenarios where the number of groups grows faster than the sample sizes within each group. In contrast, for model 5 in the previous section, it suffices that the total sample size \(N \to \infty\) to derive asymptotic properties, regardless of the magnitudes of \(N_j\) and \(J\). This is because in model 5 , all groups share the same mean \(\mu\) and variance \(\sigma^2\), so the observations \(\{Y_{ij}\}_{i=1,j=1}^{N_j,J}\) are i.i.d., allowing them to be pooled for efficient estimation and inference.

The following two theorems establish formally the asymptotic properties of \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\) and \(\hat{\Psi}_K^2\) (defined in 7 ) under the alternatives. The conditions are identical to those in Theorem 4, and the results parallel those in Theorems 2 and 3.

Theorem 5. Suppose Assumptions 1 and 2 hold. Then, under the alternative hypothesis \(H_1\) in 8 , for \(k=1,\ldots,K\), \[\begin{align} \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right)=&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left\{\left(\pi_k(Z_{ij})-\mathrm{E}\left[\pi_k(Z)\right]\right)-d_{1k}e_{ij}-\frac{d_{2k}}{2}\left(e_{ij}^2-1\right)\right\}\\ &+\mathrm{E}\left[\pi_k(Z)\right]+o_p\left(\frac{1}{\sqrt N}\right), \end{align}\] as \(\min\{N_1,\ldots,N_J\}\to\infty\), \(J = o( N^{1/2})\) and \(\sum_{j=1}^J N_j^{-1}=o(1)\). Furthermore, \[\hat{\Psi}_K^2 \to_p \boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K,\] and \[\sqrt{N} \left( \hat{\Psi}_K^2 - \boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K \right) \to_d \mathcal{N}\left(0, 4\boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{\Xi}_K \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K\right).\]

Theorem 6. Suppose Assumptions 1 and 2 hold. Then, under the local alternative hypothesis \(H_{1L}\) in 9 , with \(\delta_N = N^{-1/2}\), for \(k=1,\ldots,K\), \[\begin{align} \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right)=&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left\{\left(\pi_k(Z_{ij})-\mathrm{E}\left[\pi_k(Z)\right]\right)-c_{1k}e_{ij}-\frac{c_{2k}}{2}\left(e_{ij}^2-1\right)\right\}\\ &+\delta_N \Delta_k+o_p\left(\frac{1}{\sqrt N}\right), \end{align}\] as \(\min\{N_1,\ldots,N_J\}\to\infty\), \(J = o( N^{1/2})\) and \(\sum_{j=1}^J N_j^{-1}=o(1)\). Furthermore, \[N\hat{\Psi}_K^2 \to_d \chi_K^2\left(\boldsymbol{\Delta}_K^\top\boldsymbol{\Sigma}_K^{-1}\boldsymbol{\Delta}_K\right).\]

Theorems 4, 5, and 6 provide an asymptotic \(\chi^2\) test for the normality of model 10 and demonstrate its power properties. Such a test is quite similar to that in Section 3.1. When the group means in 10 are all equal in the sense that \(\mu_1=\ldots=\mu_J=\mu\), the methodology in this section can be reduced to that in Section 3.1 for model 5 , which validates the unified inferential framework and reflects the effects of group heterogeneity in means.

3.3 Same mean and different variances↩︎

We now extend model 5 by allowing the group variances to differ across groups, that is, \[Y_{ij} = \mu + \varepsilon_{ij} = \mu + \sigma_j e_{ij}, \quad i = 1, \ldots, N_j, \quad j = 1, \ldots, J, \label{model95diff95var}\tag{11}\] where \(\{\sigma_j\}_{j=1}^J\) represents the potentially distinct standard deviations of each group. For estimation, we adopt \(\hat{\mu} = J^{-1} \sum_{j = 1}^J N_j^{-1} \sum_{i = 1}^{N_j} Y_{ij}\) and \(\hat{\sigma}_j^2 = N_j^{-1} \sum_{i = 1}^{N_j} ( Y_{ij} - \hat{\mu})^2\) for \(\mu\) and \(\sigma_j^2\) (\(j=1,\ldots, J\)), respectively. Note that the estimator of the population mean here is different from the sample mean in Section 3.1 due to the heterogeneity of group variances. Define \(\hat{\varepsilon}_{ij} = Y_{ij}-\hat{\mu}\), \(\hat{e}_{ij} = \hat{\varepsilon}_{ij}/ \hat{\sigma}_j= (Y_{ij} - \hat{\mu}) / \hat{\sigma}_j\) and \(\hat{Z}_{ij} = \Phi (\hat{e}_{ij})\). Let \(p_j=\lim N_j/N\) and \(q_j=\lim J N_j / N\) be quantities characterizing the relative proportions of the group sample sizes. Here we further impose the following conditions for model 11 : () there exist \(0<\underline{\sigma}\le \overline{\sigma}<\infty\) such that \(\underline{\sigma}< \inf_{1\le j\le J} \sigma_j\le \sup_{1\le j\le J} \sigma_j<\overline{\sigma}\); () there exist \(0<\underline{q}\le \overline{q}<\infty\) such that \(\underline{q}< \inf_{1\le j\le J} q_j\le \sup_{1\le j\le J} q_j<\overline{q}\). These conditions imply that the group-specific sample sizes and standard deviations are of the same order of magnitude.

Following the approaches in Sections 3.1 and 3.2, we first examine the asymptotic behavior of \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\), which is summarized in the theorem below.

Theorem 7. Suppose Assumptions 1 and 2 hold. Then, given the conditions of model 11 , under \(H_0:e \sim\Phi(x)\), for \(k=1,\ldots,K\), \[\label{H095different95var95decomp} \begin{align} \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right) =& \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left\{ \pi_k (Z_{ij}) - c_{1k}\left( \sum_{\ell = 1}^J \frac{ p_{\ell}}{\sigma_{\ell}} \right) \frac{\sigma_j e_{ij}}{q_j} - \frac{c_{2k}}{2} \left( e_{ij}^2 - 1 \right)\right\} \\ &+ o_p\left(\frac{1}{\sqrt N}\right), \end{align}\qquad{(4)}\] as \(\min\{N_1,\ldots,N_J\}\to\infty\) and \(J = o( N^{1/2})\).

Note that the asymptotic decomposition of \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\) in Theorem 7 is no longer the same as that in Theorems 1 and 4. The difference lies in the parameter estimation effect of \(\hat{\mu}\), which is reflected by the term \(-c_{1k}N^{-1}(\sum_{\ell =1}^J p_{\ell}\sigma^{-1}_{\ell})\sum_{j=1}^J\sum_{i=1}^{N_j}\sigma_j e_{ij}/q_j\) on the right-hand side of ?? . Unlike model 5 , which uses the sample mean to estimate \(\mu\), model 11 employs a different estimator, \(\hat{\mu}\), resulting in this distinct estimation effect. In particular, when all groups share the same variance, i.e., \(\sigma_1=\ldots=\sigma_J\), the term reduces to \(-c_{1k}N^{-1} \sum_{j =1}^J q_j^{-1} \sum_{i=1}^{N_j} e_{ij} = -c_{1k} \sum_{j =1}^J N_j^{-1} \sum_{i=1}^{N_j} e_{ij}\); further if the group sample sizes are equal, i.e., \(N_1=\ldots=N_J\), then the estimation effect simplifies to that in Theorem 1, \(-c_{1k}N^{-1} \sum_{j =1}^J\sum_{i=1}^{N_j} e_{ij}\). In contrast, the estimation effect of the variances always aggregates into the term \(-c_{2k}(2N)^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}(e_{ij}^2-1)\) regardless of the equality of group variances or group sample sizes.

The asymptotic normality of the vector \(N^{-1} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K (\hat{Z}_{ij})\) follows directly from ?? . Under the conditions of Theorem 7, an application of the CLT yields that \[\begin{align} \frac{1}{\sqrt{N}}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right) & =\sqrt{N}\left[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_1\left(\hat{Z}_{ij}\right),\ldots,\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_K\left(\hat{Z}_{ij}\right)\right]^\top \\ & \to_d \sum_{j = 1}^J \sqrt{p_j} \boldsymbol{Z}_j, \end{align}\] where \(\boldsymbol{Z}_1, \ldots, \boldsymbol{Z}_J\) are independently distributed as \[\boldsymbol{Z}_j \sim \mathcal{N}_K \left( \boldsymbol{0}, \boldsymbol{\Omega}_K^{(j)} \right),\] with \(\boldsymbol{\Omega}_K^{(j)}=\left(\omega_{kl}^{(j)}\right)_{K\times K}\) given by \[\begin{align} \omega_{kl}^{(j)} & = \mathrm{E}\left\{ \pi_k (Z) -c_{1k} \left( \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}} \right) \frac{\sigma_j e}{q_j} - \frac{c_{2k}}{2} \left( e^2 - 1 \right)\right\} \\ & \qquad \left\{ \pi_l (Z) - c_{1l}\left( \sum_{\ell = 1}^J \frac{ p_{\ell}}{\sigma_{\ell}} \right) \frac{\sigma_j e}{q_j} - \frac{c_{2l}}{2} \left( e^2 - 1 \right)\right\} \\ & = \delta_{kl} - \frac{2 c_{1k} c_{1l} \sigma_j}{q_j} \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}} + \frac{c_{1k}c_{1 l} \sigma_j^2}{q_j^2} \left( \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}}\right)^2 - \frac{c_{2k} c_{2l}}{2}. \end{align}\] Therefore, under \(H_0\), we have \[\frac{1}{\sqrt{N}}\sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) \to_d \mathcal{N}_K \left( \boldsymbol{0}, \sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)}\right).\] By replacing \(\sigma_j\), \(p_j\) and \(q_j\) with their sample analogues \(\hat{\sigma}_j\), \(\hat{p}_j = N_j / N\) and \(\hat{q}_j = J N_j / N\) respectively, we obtain a consistent estimator of \(\boldsymbol{\Omega}_K^{(j)}\), denoted by \(\widehat{\boldsymbol{\Omega}}_K^{(j)}\). Accordingly, the test statistic is defined as \[\label{hat95Psi95k952} \hat{\Psi}_K^2 = \left( \frac{1}{N}\sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) \right)^\top \left(\sum_{j = 1}^J \hat{p}_j \widehat{\boldsymbol{\Omega}}_K^{(j)} \right)^{-1} \left( \frac{1}{N}\sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) \right).\tag{12}\] We also introduce the infeasible version, \[\label{tilde95Psi95k} \tilde{\Psi}_K^2 = \left( \frac{1}{N}\sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) \right)^\top \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1} \left( \frac{1}{N}\sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) \right).\tag{13}\]

Corollary 2. Suppose Assumptions 1 and 2 hold. Then, given the conditions of model 11 , under \(H_0:e \sim\Phi(x)\), \[N\hat{\Psi}_K^2 \to_d \chi_K^2 \text{ and } N\tilde{\Psi}_K^2 \to_d \chi_K^2,\] as \(\min\{N_1,\ldots,N_J\}\to\infty\) and \(J = o( N^{1/2})\).

Corollary 2 shows that under the null hypothesis, both the feasible test statistic \(\hat{\Psi}_K^2\) (defined in 12 ) and its infeasible counterpart \(\tilde{\Psi}_K^2\) (in 13 ) are asymptotically \(\chi_K^2\) distributed when multiplied by the total sample size \(N\). Although these test statistics take a slightly different form from those in the previous Sections 3.1 and 3.2, the limiting Chi-square distribution is retained, and the convergence rate remains \(N\), which further supports the validity of our unified theoretical framework. The following two theorems characterize the asymptotic behavior of \(\tilde{\Psi}_K^2\) under both fixed and local alternatives, in a manner analogous to that of \(\hat{\Psi}_K^2\) discussed in Sections 3.1 and 3.2.

Theorem 8. Suppose Assumptions 1 and 2 hold. Then, given the conditions of model 11 , under the alternative hypothesis \(H_1\) in 8 , for \(k=1,\ldots,K\), \[\begin{align} \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right) =& \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left\{ \left(\pi_k (Z_{ij})-\mathrm{E}\left[\pi_k(Z)\right]\right) - d_{1k}\left( \sum_{\ell = 1}^J \frac{ p_{\ell}}{\sigma_{\ell}} \right) \frac{\sigma_j e_{ij}}{q_j} \right. \nonumber\\ &-\left.\frac{d_{2k}}{2} \left( e_{ij}^2 - 1 \right)\right\}+\mathrm{E}\left[\pi_k(Z)\right]+ o_p\left(\frac{1}{\sqrt N}\right), \label{H195different95var95decomp} \end{align}\qquad{(5)}\] as \(\min\{N_1,\ldots,N_J\}\to\infty\) and \(J = o( N^{1/2})\). Furthermore, \[\tilde{\Psi}_K^2 \to_p \boldsymbol{a}_K^\top \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1}\boldsymbol{a}_K,\] and \[\sqrt{N} \left( \tilde{\Psi}_K^2 - \boldsymbol{a}_K^\top \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1} \boldsymbol{a}_K \right) \to_d \mathcal{N}\left(0, 4\boldsymbol{a}_K^\top\boldsymbol{\Upsilon}_K \boldsymbol{a}_K\right), \label{Psi95tilde95asym95normal}\qquad{(6)}\] where \[\boldsymbol{\Upsilon}_K= \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1}\left(\sum_{j = 1}^J p_j \boldsymbol{\Lambda}_K^{(j)} \right) \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1},\] and \(\boldsymbol{\Lambda}_K^{(j)}=\left(\lambda_{kl}^{(j)}\right)_{K\times K}\) is given by \[\begin{align} \lambda_{kl}^{(j)} = &\mathrm{E}\left[ \pi_k (Z) \pi_l (Z) \right]-a_k a_l- \frac{ (d_{1k} d_{3l}+d_{1l} d_{3k}) \sigma_j}{q_j} \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}} + \frac{d_{1k}d_{1 l} \sigma_j^2}{q_j^2} \left( \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}}\right)^2 \\ &+ \frac{1}{2} \left[ a_k d_{2l} + a_l d_{2k} \right]-\frac{1}{2}\left[ d_{2k}d_{4l} + d_{2l}d_{4k}\right]+ \frac{ (d_{1k} d_{2l}+d_{1l} d_{2k}) \sigma_j}{2q_j} \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}}\mathrm{E}\left[e^3\right]\\ &+\frac{d_{2k} d_{2l} }{4} \left[\mathrm{E}\left[e^4\right]-1 \right]. \end{align}\]

Theorem 9. Suppose Assumptions 1 and 2 hold. Then, given the conditions of model 11 , under the local alternative hypothesis \(H_{1L}\) in 9 , with \(\delta_N = N^{-1/2}\), for \(k=1,\ldots,K\), \[\begin{align} \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right) =& \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left\{ \left(\pi_k (Z_{ij})-\mathrm{E}\left[\pi_k(Z)\right]\right) - c_{1k}\left( \sum_{\ell = 1}^J \frac{ p_{\ell}}{\sigma_{\ell}} \right) \frac{\sigma_j e_{ij}}{q_j} \right.\\ &-\left. \frac{c_{2k}}{2} \left( e_{ij}^2 - 1 \right)\right\}+\delta_N\Delta_k+ o_p\left(\frac{1}{\sqrt N}\right), \end{align}\] as \(\min\{N_1,\ldots,N_J\}\to\infty\) and \(J = o( N^{1/2})\). Furthermore, \[N\tilde{\Psi}_K^2 \to_d \chi_K^2\left(\boldsymbol{\Delta}_K^\top\left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1}\boldsymbol{\Delta}_K\right).\]

Remark 2. To conclude this section, we summarize the regularity conditions for the three models in Table 1. On one hand, more complex model structures naturally impose additional restrictions on parameters and group sample sizes, as the heterogeneity across groups introduces extra challenges for theoretical analysis. On the other hand, under these mild conditions, an asymptotic Chi-square test with convergence rate \(N\) can be established for each model. This highlights that our proposed methodology provides a unified theoretical framework while remaining flexible enough to accommodate the distinctive features of different ANOVA settings.

Table 1: The regularity conditions for the three models
Model	Conditions
Same mean & same variance 5	\(N \to \infty\)
Different means & same variance 10
\(J = o( N^{1/2})\), \(\sum_{j=1}^J N_j^{-1}=o(1)\)
Same mean & different variances 11
\(J = o( N^{1/2})\) , \(\sum_{j=1}^J N_j^{-1}=o(1)\),
\(\underline{\sigma}< \inf_{1\le j\le J} \sigma_j\le \sup_{1\le j\le J} \sigma_j<\overline{\sigma}\),
\(\underline{q}< \inf_{1\le j\le J} q_j\le \sup_{1\le j\le J} q_j<\overline{q}\)

4 Data-driven choice of \(K\)↩︎

The methodology in Section 3 relies on the fact that the order of smooth alternatives 1 , \(K\), is fixed before testing. The choice of \(K\) critically affects the performance of smooth tests [18], [28]–[39]. A large \(K\) introduces redundant components in the alternatives that contribute little to the test statistic but inflate the degrees of freedom, leading to power dilution. Conversely, a small \(K\) increases the risk that \(\mathrm{E}[\pi_k (Z)] = 0\) for all \(1\le k \le K\), rendering the test powerless. Hence, an appropriate choice of \(K\) is crucial. In this section, we focus on data-driven selection of \(K\) for the three cases in Section 3.

In the literature of Neyman’s smooth tests, data-driven selection was first proposed by [28] for testing uniformity. The data-driven procedure consists of two steps. First, Schwarz’s selection rule (also known as the Bayesian Information Criterion, BIC) is applied to determine a suitable dimension for the smooth model that best fits the data. Second, Neyman’s smooth test is performed within the selected model space, yielding a data-driven test statistic. The data-driven smooth test thus serves to combine preliminary model selection with a more precise inferential procedure, being Neyman’s test in the “right” direction. Desirable theoretical properties and extensive numerical results of the selection rule and the induced test statistic were established in [29] and [30], [31]. Later, modifications of Schwarz’s rule addressed more complex problems, including testing for composite hypotheses [32]–[34], testing for independence [35], two-sample testing problems [18], and so on. These modified rules leverage smooth test statistics directly, avoiding the computation of the maximized log-likelihood, and are easier to implement.

Motivated by these strategies, we adopt a modified Schwarz’s rule to determine the order \(K\) for smooth tests in ANOVA models. The selection rule is specified as follows:

\[\hat{K} = \min\left\{ \mathop{\arg\max}\limits_{1 \le k \le D} \left(N \hat{\Psi}_k^2 - k \log N \right)\right\}, \label{bic}\tag{14}\] where \(D\) is a fixed positive integer, and \(\hat{\Psi}_k^2\) takes the form of 7 or 12 depending on the specific scenario. The resulting data-driven test statistic is \(N\Psi_{\hat{K}}^2\), with \(\hat{K}\) given by 14 . The form of 14 is mainly inspired by [36], [37], [38] and [39] where the upper bound of selection \(D\) is fixed. An alternative is to let \(D = D(N) \to \infty\) as \(N \to \infty\) [18], [29]–[35]. Although a diverging upper bound can theoretically improve the consistency of smooth tests, the required rate of divergence is slow, and simulation studies show that empirical power levels off rapidly as \(D\) increases. Hence, our proposed selection rule 14 with fixed \(D\) is both reasonable and easy to implement in practice.

To establish the theoretical properties of \(\hat{K}\) and \(N\hat{\Psi}_{\hat{K}}^2\), we introduce a revised version of the alternative hypothesis \(H_1\) 8 and Assumption 2. Consider \[H_1^\prime: \mathrm{E}[\pi_1(Z)]=\ldots=\mathrm{E}[\pi_{K_0-1}(Z)]=0, \mathrm{E}[\pi_{K_0}(Z)]\ne 0, \quad K_0 \le D, \label{H195data95driven}\tag{15}\] which encompasses a broader range of alternatives than \(H_1\) by replacing the fixed order \(K\) with the larger \(D\). Correspondingly, the condition on the orthonormal system is extended to include all functions up to order \(D\), as follows.

Assumption 3. For \(k =1,\ldots, D\), \(\pi_k(\cdot)\) are two times differentiable with derivatives \(\dot{\pi}_k(\cdot)\) and \(\ddot\pi_k(\cdot)\), and they are both bounded.

The following theorem establishes the properties of data-driven tests under the null and the revised alternatives.

Theorem 10. Suppose Assumptions 1 and 3 hold.

For model 5 in Section 3.1, as \(N \to \infty\), under \(H_0: e \sim \Phi(x)\), \(\mathrm{P}(\hat{K} = 1) \to 1\) and \(N \hat{\Psi}_{\hat{K}}^2 \to_{d} \chi_1^2\); under \(H_1^\prime\) in 15 , \(\mathrm{P}( \hat{K} \ge K_0) \to 1\) and \(\mathrm{P}(N\hat{\Psi}^2_{\hat{K}}\le x) \to 0\) for any \(x\in \mathbb{R}\).
For model 10 in Section 3.2, as \(\min\{N_1,\ldots,N_J\}\to\infty\), \(J = o( N^{1/2})\), and \(\sum_{j=1}^J N_j^{-1}=o(1)\), under \(H_0: e \sim \Phi(x)\), \(\mathrm{P}(\hat{K} = 1) \to 1\) and \(N \hat{\Psi}_{\hat{K}}^2 \to_{d} \chi_1^2\); under \(H_1^\prime\) in 15 , \(\mathrm{P}( \hat{K} \ge K_0 ) \to 1\) and \(\mathrm{P}(N\hat{\Psi}^2_{\hat{K}}\le x) \to 0\) for any \(x\in \mathbb{R}\).
For model 11 in Section 3.3, with additional conditions such that () there exist \(0<\underline{\sigma}\le \overline{\sigma}<\infty\) such that \(\underline{\sigma}< \inf_{1\le j\le J} \sigma_j\le \sup_{1\le j\le J} \sigma_j<\overline{\sigma}\); () there exist \(0<\underline{q}\le \overline{q}<\infty\) such that \(\underline{q}< \inf_{1\le j\le J} q_j\le \sup_{1\le j\le J} q_j<\overline{q}\), as \(\min\{N_1,\ldots,N_J\}\to\infty\) and \(J = o( N^{1/2})\), under \(H_0: e \sim \Phi(x)\), \(\mathrm{P}(\hat{K} = 1) \to 1\), \(N \hat{\Psi}_{\hat{K}}^2 \to_{d} \chi_1^2\) and \(N \tilde{\Psi}_{\hat{K}}^2 \to_{d} \chi_1^2\); under \(H_1^\prime\) in 15 , \(\mathrm{P}( \hat{K} \ge K_0 ) \to 1\) and \(\mathrm{P}(N\tilde{\Psi}^2_{\hat{K}}\le x) \to 0\) for any \(x\in \mathbb{R}\).

Theorem 10 presents the unified results of data-driven smooth tests for the three cases in Section 3. Under the null hypothesis, the probability of \(\{\hat{K} = 1\}\) tends to \(1\) asymptotically, implying that the first-order smooth model provides the best fit to \(\{\hat{Z}_{ij}\}_{i=1,j=1}^{N_j,J}\) among the \(D\) candidate models. From the perspective of hypothesis testing, this means that \(\hat{\Psi}_1^2\) is informative and sufficient for assessing uniformity or normality. Meanwhile, the data-driven selection procedure allows for the identification of the mechanism behind nonuniformity (and also nonnormality) and enhances the test’s power against a broader class of alternatives, \(H_1^\prime\). These findings are consistent with previous results on data-driven smooth tests.

In practice, however, the \(\chi_1^2\) limiting null distribution of data-driven smooth test statistics often performs poorly in finite samples [18], [30], [32]–[39]. To address this issue in our problem, and provided the nested orthonormal system (i.e. \(\{\pi_k\}_{k=1}^K \subseteq \{\pi_k\}_{k=1}^{K+1}\) for \(1\le K\le D-1\)), we follow the approach of [30], [32], [33], [35], [18], [38] and adopt the following finite-sample approximation for the null distribution of \(N \hat{\Psi}_{\hat{K}}^2\): \[\begin{align} &H(x) = \begin{cases} (2 \Phi(\sqrt{x})-1)(2 \Phi(\sqrt{\log N})-1), & x \le \log N, \\ H(\log N)+(x-\log N)(H(2 \log N)-H(\log N))/(\log N), & \log N < x < 2 \log N, \\ (2 \Phi(\sqrt{x})-1)(2 \Phi(\sqrt{\log N})-1)+2(1-\Phi(\sqrt{\log N})), & x \ge 2 \log N. \end{cases} \end{align} \label{null95approx}\tag{16}\] Such an approximation primarily accounts for the selection uncertainty of \(\hat{K}\). In finite samples, under the null, the empirical probability of the event \(\{\hat{K} = 1\}\) may not be exactly \(1\), as \(\{\hat{K} = 2\}\) can occur with a small but negligible probability. Consequently, 16 is derived from the approximation \(\mathrm{P}(N \hat{\Psi}_{\hat{K}}^2 \le x) \approx \mathrm{P}(N \hat{\Psi}_1^2 \le x, \hat{K} = 1) + \mathrm{P}(N \hat{\Psi}_2^2 \le x, \hat{K} = 2)\). The reliability and accuracy of \(H(x)\) are further illustrated through simulation studies in the next section and the appendix.

5 Simulation↩︎

In this section, we conduct Monte Carlo experiments to examine the finite-sample performance of the proposed tests developed in Sections 3 and 4. Following [22], [26] and [27], we adopt the orthonormal Legendre polynomials on the interval \([0,1]\) for \(\{\pi_k\}_{k = 0}^{\infty}\). Under this choice, the analytical properties of the constants \(c_{1k}\) and \(c_{2k}\) are provided in Proposition 1 of [39], and numerical values for \(1 \le k \le 10\) are available in Table 1 of their Supplementary Material. From a computational perspective, the test statistic \(N\hat{\Psi}_K^2\), introduced in Sections 3.1 and 3.2, can be computed using a simplified expression analogous to equation (14) in [39], which leverages the closed form of \(\boldsymbol{\Sigma}_K^{-1}\) [see equations (12) and (13) therein].

We first study the tests for model 5 through two experiments. The observations are generated as follows.

Experiment :

\[Y_{ij}^{(0,1)} \sim \mathcal{N}\left( 5, 4\right), \quad Y_{ij}^{(1,1)} \sim \chi_2^2 + 3, \quad i = 1, \ldots, jm, \quad j = 1, \ldots, 5.\]

Experiment :

\[Y_{ij}^{(0,2)} \sim \mathcal{N}\left( 8, 1\right), \;Y_{ij}^{(1,2)} \sim \mathcal{U} \left[ 8 - \sqrt{3}, 8 + \sqrt{3}\right], \;i = 1, \ldots, jm, \;j = 1, \ldots, 5.\] Here \(\{Y_{ij}^{(0,1)}\}_{i=1,j=1}^{jm,5}\) and \(\{Y_{ij}^{(0,2)}\}_{i=1,j=1}^{jm,5}\) are generated under \(H_0\), while \(\{Y_{ij}^{(1,1)}\}_{i=1,j=1}^{jm,5}\) and \(\{Y_{ij}^{(1,2)}\}_{i=1,j=1}^{jm,5}\) are generated under \(H_1\). The sample size parameter \(m\) varies from 10 to 150 in increments of 10. We set the significance level \(\alpha = 5\%\) and perform \(500\) replications. We evaluate the test statistic \(N\hat{\Psi}_K^2\) from Section 3.1, with \(1 \leq K \leq 5\), and the data-driven test statistic \(N\hat{\Psi}_{\hat{K}}^2\) from Section 4, where \(\hat{K}\) is determined by 14 with \(D=5\).

Results of Experiment are shown in Tables 2, 3, 4 and Figure 1. Table 2 reports the empirical rejection rates under \(H_0\). For fixed \(K\), \(N\hat{\Psi}_K^2\) maintains the nominal level well, even for small sample sizes such as \(m = 10\) (corresponding to a total sample size of \(N = 150\)). For the data-driven test statistic \(N\hat{\Psi}_{\hat{K}}^2\), the limiting \(\chi^2_1\) distribution underestimates the variability under \(H_0\), while the revised distribution \(H(x)\) achieves better size control. Tables 3 and 4 present the empirical frequency of \(\hat{K}\) under \(H_0\) and \(H_1\), respectively. Under \(H_0\), the frequency of \(\{\hat{K}=1\}\) approaches \(1\) as \(m\) increases, whereas under \(H_1\), \(\hat{K}\) is consistently greater than \(1\), in line with theoretical expectations. Figure 1 displays the sample means of \(\hat{K}\) across different values of \(m\), with error bars representing one standard deviation around the mean. Under \(H_0\), the sample mean of \(\hat{K}\) stays close to \(1\) with diminishing variability as \(m\) grows, indicating consistent model selection. Under \(H_1\), \(\hat{K}\) increases steadily and approaches the maximum \(D=5\), with reduced variability, demonstrating the adaptiveness of the data-driven procedure.

The results of Experiment are quite similar; see Tables 5, 6, and 7, as well as Figure 2. However, from Table 5, under \(H_1\), the test statistic \(N\hat{\Psi}_1^2\) fails to exhibit power because in this case, \(\mathrm{E}[\pi_1(Z)] = \mathrm{E}[\pi_1(\Phi(Y_{ij}^{(1,2)} - 8))] = 0\). This illustrates that \(N\hat{\Psi}_K^2\) may be inconsistent against certain alternatives, especially when \(K\) is small. Correspondingly, both Table 6 and Figure 2 show that under \(H_1\), \(\{\hat{K} = 1\}\) does not occur, consistent with Theorem 10 (3), which ensures that \(\mathrm{P}(\hat{K} \ge K_0) \to 1\) under \(H_1^\prime\) with \(K_0 = 2\).

Table 2: Empirical rejection rates under \(H_0\) in Experiment , for fixed \(K\) (\(K = 1,\ldots,5\)), the data-driven test with limiting \(\chi^2_1\) null distribution (\(\hat{K}\&\chi^2_1\)), and the data-driven test with approximated null distribution \(H(x)\) (\(\hat{K}\&H(x)\)). The empirical power under \(H_1\) equals 1 across all settings.
\(m\)	\(K = 1\)	\(K = 2\)	\(K = 3\)	\(K = 4\)	\(K = 5\)	\(\hat{K}\&\chi^2_1\)	\(\hat{K}\&H(x)\)
10	0.050	0.048	0.048	0.044	0.040	0.056	0.034
20	0.046	0.044	0.034	0.040	0.048	0.066	0.046
30	0.042	0.036	0.044	0.068	0.068	0.060	0.046
40	0.058	0.040	0.036	0.058	0.052	0.058	0.050
50	0.064	0.044	0.054	0.054	0.062	0.068	0.060
60	0.054	0.034	0.038	0.044	0.050	0.070	0.062
70	0.056	0.038	0.048	0.046	0.052	0.056	0.048
80	0.038	0.050	0.072	0.042	0.048	0.058	0.048
90	0.062	0.066	0.052	0.064	0.058	0.048	0.040
100	0.034	0.050	0.050	0.060	0.054	0.052	0.038
110	0.054	0.032	0.054	0.060	0.050	0.038	0.036
120	0.052	0.052	0.038	0.044	0.052	0.052	0.050
130	0.052	0.050	0.064	0.060	0.058	0.068	0.064
140	0.062	0.040	0.026	0.046	0.030	0.038	0.030
150	0.054	0.044	0.062	0.034	0.058	0.056	0.048

Table 3: Empirical frequency of \(\hat{K}\) under \(H_0\) in Experiment
\(m\)	\(\hat{K}=1\)	\(\hat{K}=2\)	\(\hat{K}=3\)	\(\hat{K}=4\)	\(\hat{K}=5\)
10	0.982	0.018	0	0	0
20	0.988	0.010	0	0	0.002
30	0.986	0.012	0	0.002	0
40	0.984	0.014	0.002	0	0
50	0.988	0.012	0	0	0
60	0.980	0.020	0	0	0
70	0.994	0.006	0	0	0
80	0.996	0.002	0.002	0	0
90	0.996	0.002	0.002	0	0
100	0.996	0.004	0	0	0
110	0.994	0.006	0	0	0
120	0.990	0.010	0	0	0
130	0.994	0.006	0	0	0
140	0.994	0.006	0	0	0
150	0.994	0.006	0	0	0

Table 4: Empirical frequency of \(\hat{K}\) under \(H_1\) in Experiment
\(m\)	\(\hat{K}=1\)	\(\hat{K}=4\)	\(\hat{K}=5\)
10	0.004	0.484	0.512
20	0	0.326	0.674
30	0	0.194	0.806
40	0	0.090	0.910
50	0	0.076	0.924
60	0	0.030	0.970
70	0	0.012	0.988
80	0	0.010	0.990
90	0	0.006	0.994
100	0	0.004	0.996
110	0	0.006	0.994
120	0	0	1
130	0	0	1
140	0	0.002	0.998
150	0	0	1

Figure 1: The sample means and error bars of \hat{K} in Experiment — Figure 1: The sample means and error bars of \(\hat{K}\) in Experiment

Table 5: Empirical rejection rates in Experiment , for fixed \(K\) (\(K = 1,\ldots,5\)), the data-driven test with limiting \(\chi^2_1\) null distribution (\(\hat{K}\&\chi^2_1\)), and the data-driven test with approximated null distribution \(H(x)\) (\(\hat{K}\&H(x)\)).
	\(H_0\)							\(H_1\)
\(m\)	\(K = 1\)	\(K = 2\)	\(K = 3\)	\(K = 4\)	\(K = 5\)	\(\hat{K}\&\chi^2_1\)	\(\hat{K}\&H(x)\)	\(K = 1\)	\(K = 2\)	\(K = 3\)	\(K = 4\)	\(K = 5\)	\(\hat{K}\&\chi^2_1\)	\(\hat{K}\&H(x)\)
10	0.042	0.050	0.046	0.042	0.052	0.076	0.048	0.046	0.998	0.996	0.998	0.996	0.998	0.998
20	0.034	0.042	0.026	0.050	0.060	0.070	0.054	0.018	1	1	1	1	1	1
30	0.028	0.048	0.036	0.046	0.046	0.070	0.062	0.022	1	1	1	1	1	1
40	0.054	0.054	0.046	0.050	0.060	0.062	0.052	0.042	1	1	1	1	1	1
50	0.064	0.062	0.042	0.060	0.050	0.068	0.058	0.032	1	1	1	1	1	1
60	0.032	0.054	0.060	0.040	0.040	0.078	0.064	0.026	1	1	1	1	1	1
70	0.076	0.038	0.052	0.052	0.036	0.052	0.042	0.022	1	1	1	1	1	1
80	0.040	0.046	0.034	0.048	0.058	0.074	0.068	0.028	1	1	1	1	1	1
90	0.044	0.052	0.050	0.054	0.042	0.060	0.050	0.032	1	1	1	1	1	1
100	0.038	0.068	0.050	0.046	0.046	0.064	0.056	0.028	1	1	1	1	1	1
110	0.052	0.052	0.046	0.052	0.048	0.050	0.046	0.032	1	1	1	1	1	1
120	0.052	0.042	0.038	0.034	0.052	0.040	0.036	0.030	1	1	1	1	1	1
130	0.048	0.052	0.044	0.036	0.048	0.062	0.054	0.040	1	1	1	1	1	1
140	0.062	0.058	0.056	0.038	0.038	0.046	0.042	0.044	1	1	1	1	1	1
150	0.038	0.056	0.046	0.052	0.060	0.060	0.054	0.034	1	1	1	1	1	1

Table 6: Empirical frequency of \(\hat{K}\) under \(H_0\) in Experiment
\(m\)	\(\hat{K}=1\)	\(\hat{K}=2\)	\(\hat{K}=3\)	\(\hat{K}=5\)
10	0.968	0.026	0.004	0.002
20	0.982	0.016	0.002	0
30	0.978	0.020	0.002	0
40	0.992	0.008	0	0
50	0.990	0.006	0.004	0
60	0.996	0.004	0	0
70	0.994	0.006	0	0
80	0.984	0.016	0	0
90	0.996	0.004	0	0
100	0.994	0.006	0	0
110	0.996	0.004	0	0
120	0.996	0.004	0	0
130	0.994	0.006	0	0
140	0.994	0.006	0	0
150	0.988	0.012	0	0

Table 7: Empirical frequency of \(\hat{K}\) under \(H_1\) in Experiment
\(m\)	\(\hat{K}=1\)	\(\hat{K}=2\)	\(\hat{K}=3\)	\(\hat{K}=5\)
10	0.002	0.984	0.012	0.002
20	0	0.994	0.004	0.002
30	0	0.990	0.008	0.002
40	0	0.996	0.004	0
50	0	0.996	0.004	0
60	0	0.994	0.004	0.002
70	0	0.998	0.002	0
80	0	1	0	0
90	0	1	0	0
100	0	0.996	0.004	0
110	0	0.998	0.002	0
120	0	0.994	0.006	0
130	0	0.998	0.002	0
140	0	0.998	0.002	0
150	0	0.996	0.004	0

Figure 2: The sample means and error bars of \hat{K} in Experiment — Figure 2: The sample means and error bars of \(\hat{K}\) in Experiment

The next two experiments are designed to evaluate the tests for models 10 and 11 , respectively. All settings are inherited from the previous two experiments, except that the data-generating process is modified accordingly.

Experiment :

\[Y_{ij}^{(0,3)} \sim \mathcal{N}\left( 5j, 4\right), \quad Y_{ij}^{(1,3)} \sim \chi_2^2 + (5j - 2), \quad i = 1, \ldots, jm, \quad j = 1, \ldots, 5.\]

Experiment :

\[Y_{ij}^{(0,4)} \sim \mathcal{N}\left( 8, j^2\right), \;Y_{ij}^{(1,4)} \sim \mathcal{U} \left[ 8 - \sqrt{3} j, 8 + \sqrt{3} j\right], \;i = 1, \ldots, jm, \;j = 1, \ldots, 5.\] Here \(\{Y_{ij}^{(0,3)}\}_{i=1,j=1}^{jm,5}\) and \(\{Y_{ij}^{(0,4)}\}_{i=1,j=1}^{jm,5}\) are generated under \(H_0\), while \(\{Y_{ij}^{(1,3)}\}_{i=1,j=1}^{jm,5}\) and \(\{Y_{ij}^{(1,4)}\}_{i=1,j=1}^{jm,5}\) are generated under \(H_1\). Note that in Experiment , \(\mathrm{E}[Y_{ij}^{(0,3)}] = \mathrm{E}[Y_{ij}^{(1,3)}] = 5j\) and \(\operatorname{Var}[Y_{ij}^{(0,3)}] = \operatorname{Var}[Y_{ij}^{(1,3)}] = 4\), which corresponds to model 10 ; in Experiment , \(\mathrm{E}[Y_{ij}^{(0,4)}] = \mathrm{E}[Y_{ij}^{(1,4)}] = 8\) and \(\operatorname{Var}[Y_{ij}^{(0,4)}] = \operatorname{Var}[Y_{ij}^{(1,4)}] = j^2\), which corresponds to model 11 . The conclusions from Experiments and are analogous to those from Experiments and , respectively (see Tables 8–13 and Figures 3–4), thus validating our unified testing framework. To examine the robustness of our findings with respect to the number of groups, we additionally consider a variant of Experiment with \(J=10\) (referred to as Experiment \(^\prime\)). The results corroborate the main conclusions, as reported in 8.

Table 8: Empirical rejection rates under \(H_0\) in Experiment , for fixed \(K\) (\(K = 1,\ldots,5\)), the data-driven test with limiting \(\chi^2_1\) null distribution (\(\hat{K}\&\chi^2_1\)), and the data-driven test with approximated null distribution \(H(x)\) (\(\hat{K}\&H(x)\)). The empirical power under \(H_1\) equals 1 across all settings.
\(m\)	\(K = 1\)	\(K = 2\)	\(K = 3\)	\(K = 4\)	\(K = 5\)	\(\hat{K}\&\chi^2_1\)	\(\hat{K}\&H(x)\)
\(10\)	0.050	0.046	0.048	0.048	0.052	0.076	0.050
\(20\)	0.072	0.060	0.050	0.046	0.050	0.054	0.048
\(30\)	0.054	0.046	0.054	0.038	0.056	0.074	0.062
\(40\)	0.046	0.048	0.060	0.054	0.048	0.054	0.050
\(50\)	0.044	0.060	0.052	0.052	0.048	0.040	0.038
\(60\)	0.066	0.044	0.048	0.052	0.058	0.048	0.042
\(70\)	0.044	0.048	0.038	0.046	0.048	0.072	0.064
\(80\)	0.050	0.044	0.048	0.038	0.042	0.074	0.068
\(90\)	0.040	0.060	0.046	0.054	0.046	0.048	0.042
\(100\)	0.062	0.056	0.042	0.050	0.048	0.058	0.052
\(110\)	0.060	0.046	0.066	0.052	0.050	0.068	0.054
\(120\)	0.034	0.044	0.038	0.038	0.036	0.058	0.056
\(130\)	0.038	0.050	0.042	0.052	0.050	0.058	0.048
\(140\)	0.038	0.050	0.048	0.042	0.044	0.044	0.040
\(150\)	0.064	0.056	0.048	0.058	0.058	0.046	0.042

Table 9: Empirical frequency of \(\hat{K}\) under \(H_0\) in Experiment
\(m\)	\(\hat{K}=1\)	\(\hat{K}=2\)	\(\hat{K}=3\)	\(\hat{K}=5\)
\(10\)	0.976	0.024	0	0
\(20\)	0.974	0.018	0.006	0.002
\(30\)	0.986	0.014	0	0
\(40\)	0.990	0.010	0	0
\(50\)	0.998	0.002	0	0
\(60\)	0.988	0.012	0	0
\(70\)	0.992	0.008	0	0
\(80\)	0.990	0.008	0.002	0
\(90\)	0.988	0.012	0	0
\(100\)	0.996	0.004	0	0
\(110\)	0.994	0.004	0.002	0
\(120\)	0.992	0.006	0.002	0
\(130\)	0.998	0.002	0	0
\(140\)	0.992	0.008	0	0
\(150\)	0.996	0.004	0	0

Table 10: Empirical frequency of \(\hat{K}\) under \(H_1\) in Experiment
\(m\)	\(\hat{K}=1\)	\(\hat{K}=2\)	\(\hat{K}=4\)	\(\hat{K}=5\)
\(10\)	0.032	0.004	0.418	0.546
\(20\)	0	0	0.314	0.686
\(30\)	0	0	0.174	0.826
\(40\)	0	0	0.098	0.902
\(50\)	0	0	0.064	0.936
\(60\)	0	0	0.036	0.964
\(70\)	0	0	0.022	0.978
\(80\)	0	0	0.018	0.982
\(90\)	0	0	0.008	0.992
\(100\)	0	0	0.008	0.992
\(110\)	0	0	0.002	0.998
\(120\)	0	0	0	1
\(130\)	0	0	0	1
\(140\)	0	0	0.002	0.998
\(150\)	0	0	0	1

Figure 3: The sample means and error bars of \hat{K} in Experiment — Figure 3: The sample means and error bars of \(\hat{K}\) in Experiment

Figure 4: The sample means and error bars of \hat{K} in Experiment — Figure 4: The sample means and error bars of \(\hat{K}\) in Experiment

In summary, the experiments demonstrate that the proposed tests in Section 3 maintain the nominal significance level well under the null, and exhibit high power under the alternatives, except when \(\mathrm{E}[\pi_k(Z)] = 0\) for all \(1 \le k \le K\). The data-driven tests with the approximated null distribution \(H(x)\) reliably control the Type error and achieve excellent power performance.

Table 11: Empirical rejection rates in Experiment , for fixed \(K\) (\(K = 1,\ldots,5\)), the data-driven test with limiting \(\chi^2_1\) null distribution (\(\hat{K}\&\chi^2_1\)), and the data-driven test with approximated null distribution \(H(x)\) (\(\hat{K}\&H(x)\)).
	\(H_0\)							\(H_1\)
\(m\)	\(K = 1\)	\(K = 2\)	\(K = 3\)	\(K = 4\)	\(K = 5\)	\(\hat{K}\&\chi^2_1\)	\(\hat{K}\&H(x)\)	\(K = 1\)	\(K = 2\)	\(K = 3\)	\(K = 4\)	\(K = 5\)	\(\hat{K}\&\chi^2_1\)	\(\hat{K}\&H(x)\)
\(10\)	0.034	0.044	0.040	0.030	0.028	0.062	0.042	0.040	0.998	0.998	0.996	0.994	0.998	0.998
\(20\)	0.062	0.064	0.060	0.056	0.058	0.060	0.050	0.034	1	1	1	1	1	1
\(30\)	0.040	0.040	0.038	0.042	0.048	0.060	0.044	0.034	1	1	1	1	1	1
\(40\)	0.038	0.042	0.044	0.028	0.032	0.052	0.040	0.024	1	1	1	1	1	1
\(50\)	0.044	0.042	0.042	0.042	0.044	0.060	0.048	0.036	1	1	1	1	1	1
\(60\)	0.062	0.080	0.052	0.064	0.074	0.082	0.066	0.016	1	1	1	1	1	1
\(70\)	0.062	0.066	0.060	0.050	0.062	0.052	0.042	0.050	1	1	1	1	1	1
\(80\)	0.046	0.048	0.042	0.058	0.036	0.040	0.038	0.042	1	1	1	1	1	1
\(90\)	0.060	0.058	0.050	0.052	0.046	0.052	0.042	0.032	1	1	1	1	1	1
\(100\)	0.052	0.038	0.046	0.058	0.042	0.054	0.048	0.018	1	1	1	1	1	1
\(110\)	0.040	0.034	0.038	0.032	0.040	0.052	0.046	0.032	1	1	1	1	1	1
\(120\)	0.056	0.044	0.066	0.046	0.046	0.066	0.058	0.040	1	1	1	1	1	1
\(130\)	0.052	0.058	0.054	0.056	0.050	0.058	0.052	0.028	1	1	1	1	1	1
\(140\)	0.050	0.060	0.054	0.042	0.040	0.074	0.068	0.028	1	1	1	1	1	1
\(150\)	0.056	0.064	0.054	0.058	0.060	0.054	0.052	0.028	1	1	1	1	1	1

Table 12: Empirical frequency of \(\hat{K}\) under \(H_0\) in Experiment
\(m\)	\(\hat{K}=1\)	\(\hat{K}=2\)	\(\hat{K}=3\)	\(\hat{K}=4\)
\(10\)	0.972	0.024	0.004	0
\(20\)	0.986	0.012	0	0.002
\(30\)	0.980	0.018	0.002	0
\(40\)	0.992	0.008	0	0
\(50\)	0.990	0.010	0	0
\(60\)	0.996	0.004	0	0
\(70\)	0.994	0.006	0	0
\(80\)	0.996	0.004	0	0
\(90\)	0.996	0.004	0	0
\(100\)	0.990	0.010	0	0
\(110\)	0.994	0.006	0	0
\(120\)	0.992	0.006	0.002	0
\(130\)	0.992	0.008	0	0
\(140\)	0.986	0.014	0	0
\(150\)	0.998	0.002	0	0

Table 13: Empirical frequency of \(\hat{K}\) under \(H_1\) in Experiment
\(m\)	\(\hat{K}=2\)	\(\hat{K}=3\)	\(\hat{K}=4\)	\(\hat{K}=5\)
\(10\)	0.990	0.008	0.002	0
\(20\)	0.982	0.018	0	0
\(30\)	0.990	0.008	0	0.002
\(40\)	0.994	0.006	0	0
\(50\)	0.996	0.004	0	0
\(60\)	0.996	0.004	0	0
\(70\)	0.994	0.006	0	0
\(80\)	0.996	0.004	0	0
\(90\)	1	0	0	0
\(100\)	0.998	0.002	0	0
\(110\)	0.996	0.004	0	0
\(120\)	1	0	0	0
\(130\)	1	0	0	0
\(140\)	0.988	0.012	0	0
\(150\)	1	0	0	0

6 Discussion of other ANOVA models↩︎

In this section, we briefly discuss smooth tests for normality of other ANOVA models beyond those in Section 3. The testing approach above is also applicable to these models.

6.1 One-way fixed effects model with heterogeneous means and variances↩︎

We first consider the one-way ANOVA model with potentially different group means and variances, given by \[\label{model95diff} Y_{ij} = \mu_j + \varepsilon_{ij} = \mu_j + \sigma_j e_{ij}, \quad i = 1, \ldots, N_j, \quad j = 1, \ldots, J.\tag{17}\] Here \(\{\mu_j\}_{j=1}^J\) and \(\{\sigma_j\}_{j=1}^J\) denote the group-specific means and standard deviations. Owing to heterogeneity across groups, the estimators of each group mean and variance rely solely on the corresponding observations, in the sense that \(\hat{\mu}_j=N_j^{-1}\sum_{i=1}^{N_j}Y_{ij}, \hat{\sigma}_j^2=N_j^{-1} \sum_{i=1}^{N_j}\left(Y_{ij}-\hat{\mu}_j\right)^2\) for \(j=1,\ldots,J\). Accordingly, we define \(\hat{\varepsilon}_{ij} = Y_{ij}-\hat{\mu}_j\), \(\hat{e}_{ij}=\hat{\varepsilon}_{ij}/\hat{\sigma}_j =(Y_{ij}-\hat{\mu}_j) / \hat{\sigma}_j\), and \(\hat{Z}_{ij}=\Phi(\hat{e}_{ij})\). To implement the smooth test, one may follow the procedure in Section 3 and derive an expression for \(N^{-1}\sum_{j=1}^J \sum_{i=1}^{N_j} \pi_k(\hat{Z}_{ij})\), \(k=1,\ldots,K\), that accounts for the estimation effects. The test statistic is then constructed as a quadratic form of \(N^{-1/2}\sum_{j=1}^J \sum_{i=1}^{N_j} \boldsymbol{\pi}_K(\hat{Z}_{ij})\), normalized by the inverse of its asymptotic covariance matrix. Under mild conditions, an asymptotic \(\chi^2\) test can be established for model 17 , thereby extending the unified framework to this case.

6.2 One-way random effects model↩︎

Next, we consider the one-way random effects model: \[Y_{ij} = \mu + A_j + \varepsilon_{ij} = \mu + A_j + \sigma e_{ij}, \quad i = 1, \ldots, N_j, \quad j = 1, \ldots, J, \label{random95effects}\tag{18}\] where the random effects \(\{A_j\}_{j=1}^J\) are i.i.d. with \(\mathrm{E}[A_j]=0\), \(\operatorname{Var}[A_j]=\sigma_a^2\), and are independent of the errors \(\{e_{ij}\}_{i=1,j=1}^{N_j,J}\). We estimate \(\sigma^2\) by \[\hat{\sigma}^2 = \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( Y_{ij} - \bar{Y}_{\cdot j} \right)^2 = \frac{1}{N}\sum_{j = 1}^J \sum_{i = 1}^{N_j} \hat{\varepsilon}_{ij}^2,\] where \(\bar Y_{\cdot j} = N_j^{-1}\sum_{i=1}^{N_j} Y_{ij}\) and \(\hat{\varepsilon}_{ij}=Y_{ij}-\bar Y_{\cdot j}\). Note that \(\bar Y_{\cdot j}\) coincides with \(\hat{\mu}_j\) in Section 3.2, so the expression for \(\hat{\sigma}^2\) is identical to that in Section 3.2. Consequently, the methodology in Section 3.2 can be directly applied to test the normality of \(e_{ij}\) in 18 , since the random effects \(A_j\) are removed by within-group centering.

6.3 Two-way ANOVA models↩︎

We now extend the discussion to two-way ANOVA models. Consider the two-way fixed effects model: \[Y_{i j_1 j_2} = \mu_{j_1 j_2} + \varepsilon_{i j_1 j_2} = \big( \mu + \alpha_{j_1} + \beta_{j_2} + \gamma_{j_1 j_2} \big) + \sigma e_{i j_1 j_2}, \label{two-way95fixed95effects}\tag{19}\] subject to the identifiability constraints \[\sum_{j_1 = 1}^{J_1} \alpha_{j_1}=\sum_{j_2 = 1}^{J_2} \beta_{j_2}= \sum_{j_1 = 1}^{J_1} \gamma_{j_1 j_2}= \sum_{j_2 = 1}^{J_2} \gamma_{j_1 j_2}=0.\] Inspired by the methodology developed in Section 3, we can construct analogous estimators for \(\mu\), \(\{\alpha_{j_1}\}_{j_1=1}^{J_1}\), \(\{\beta_{j_2}\}_{j_2=2}^{J_2}\), \(\{\gamma_{j_1 j_2}\}_{j_1=1,j_2=1}^{J_1,J_2}\), and \(\sigma^2\), and derive the corresponding test statistic. The theoretical framework in Section 3 remains valid in this setting.

Similarly, consider the two-way random effects model: \[Y_{i j_1 j_2} = \mu_{j_1 j_2} + \varepsilon_{i j_1 j_2} = \big( \mu + A_{j_1} + B_{j_2} + D_{j_1 j_2} \big) + \sigma e_{i j_1 j_2}, \label{two-way95random95effects}\tag{20}\] where the i.i.d. random effects \(\{A_{j_1}\}\), \(\{B_{j_2}\}\), \(\{D_{j_1 j_2}\}\) and the errors \(\{e_{ij}\}\) are mutually independent. To test the normality of \(e_{ij}\) in 20 , one can apply the strategy of Section 6.2, first eliminating the random effects by within-cell centering and then constructing the test statistic based on the residuals.

To conclude, the methodology proposed in Section 3 can be seamlessly extended to more general ANOVA settings. This unified framework offers a flexible and effective approach to testing for normality across diverse linear models.

7 Concluding remarks↩︎

In this paper, we establish Neyman’s smooth tests for assessing the normality assumption in ANOVA models. For three types of one-way fixed effects models, we derive the asymptotic properties of the proposed tests and validate their finite-sample performance through extensive simulations. Our results provide a rigorous and practical tool for evaluating normality in a broad range of ANOVA settings.

Several directions remain open for future research. First, while the proposed tests are developed for specific ANOVA models, the true data-generating mechanism may not be known in practice. A natural extension is to combine our procedures with preliminary structural tests to identify the most appropriate ANOVA specification before applying the proposed methodology. Second, the assumption of independent random errors can be relaxed. For example, repeated clinical trials often involve correlated errors [40]–[42]. In such cases, our smooth test framework may be adapted by incorporating estimated correlation structures. Finally, for the data-driven selection rule, it is of theoretical interest to explore the scenario where the upper bound \(D = D(N)\) diverges with \(N\), that is, \(D \to \infty\) as \(N \to \infty\).

8 Appendix↩︎

We provide the proof of our main theoretical results, along with some additional numerical results, in this Appendix. The symbol “\(\lesssim\)" means that the left side is bounded by a positive constant times the right side. The symbol”\(\asymp\)" means that both sides are asymptotically equivalent.

Proof of Theorem 1.

We first study the estimation effects of \(\hat{\mu}-\mu\) and \(\hat{\sigma}^2-\sigma^2\). Note that \[\hat{\mu}-\mu=\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(Y_{ij}-\mu\right)=\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\varepsilon_{ij}=\frac{\sigma}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}. \label{mu95hat-mu95same}\tag{21}\] Under \(H_0\), \(e_{ij} \sim \mathcal{N}(0,1) i.i.d\)., then \(\sqrt{N}(\hat{\mu}-\mu)\sim \mathcal{N}(0,\sigma^2)\), and \(\hat{\mu}-\mu=O_p(N^{-1/2})\) as \(N\to\infty\). Besides, \[\begin{align} \hat{\sigma}^2-\sigma^2=&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(\left(Y_{ij}-\hat{\mu}\right)^2-\sigma^2\right) \nonumber\\ =&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(\varepsilon_{ij}^2-\sigma^2\right)-\left(\hat{\mu}-\mu\right)^2\nonumber\\ =&\frac{\sigma^2}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(e_{ij}^2-1\right)+O_p\left(\frac{1}{N}\right). \label{sigma94295hat-sigma94295same} \end{align}\tag{22}\] Under \(H_0\), by CLT, \(\sqrt{N}(\hat{\sigma}^2-\sigma^2)\to_d \mathcal{N}(0,2\sigma^4)\) as \(N\to \infty\), then \(\hat{\sigma}^2-\sigma^2=O_p(N^{-1/2})\).

Now we deal with \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\). By second order Taylor expansion of \(\pi_k(\hat{Z}_{ij})\) with respect to \(\hat{Z}_{ij}\) at \(Z_{ij}\), we obtain \[\begin{align} \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right)=&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(Z_{ij})+\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\left( \hat{Z}_{ij}-Z_{ij}\right) \nonumber\\ &+\frac{1}{2N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\ddot\pi_k\left(\tilde{Z}_{ij}\right)\left( \hat{Z}_{ij}-Z_{ij}\right)^2, \label{decomp} \end{align}\tag{23}\] where \(\tilde{Z}_{ij}\) lies between \(\hat{Z}_{ij}\) and \(Z_{ij}\). Recall that \(\hat{Z}_{ij}=\Phi({\hat{\varepsilon}_{ij}} / {\hat{\sigma}})\) and \(Z_{ij}=\Phi(\varepsilon_{ij}/\sigma)=\Phi(e_{ij})\). Then, by Taylor expansion again, the second term on the right-hand side of 23 can be further decomposed as \[\begin{align} &\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\left(\frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}}- \frac{\varepsilon_{ij}}{\sigma}\right)+\frac{1}{2N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\dot{\phi}\left(\frac{\tilde{\varepsilon}_{ij}}{\tilde{\sigma}}\right)\left(\frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}}-\frac{\varepsilon_{ij}}{\sigma}\right)^2, \label{main95term} \end{align}\tag{24}\] where \(\tilde{\varepsilon}_{ij}/\tilde{\sigma}\) an intermediate point between \({\hat{\varepsilon}_{ij}} / {\hat{\sigma}}\) and \(\varepsilon_{ij}/\sigma\), whose exact value may vary across different scenarios.

We focus on the first term in 24 . Straightforward calculation leads to \[\begin{align} &\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\left(\frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}}-\frac{\varepsilon_{ij}}{\sigma}\right)\nonumber\\ =&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\frac{\hat{\varepsilon}_{ij}-\varepsilon_{ij}}{\hat{\sigma}}-\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\frac{\varepsilon_{ij}(\hat{\sigma}^2-\sigma^2)}{\hat{\sigma}\sigma(\hat{\sigma}+\sigma)}\nonumber\\ =&-\frac{\hat{\mu}-\mu}{\hat{\sigma}}\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)-\frac{\hat{\sigma}^2-\sigma^2}{\hat{\sigma}(\hat{\sigma}+\sigma)}\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)e_{ij}\nonumber\\ =&-\frac{\hat{\mu}-\mu}{\sigma}\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]-\frac{\hat{\sigma}^2-\sigma^2}{2\sigma^2}\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]\nonumber\\ &+o_p\left(\hat{\mu}-\mu\right)+o_p\left(\hat{\sigma}^2-\sigma^2\right), \label{a1} \end{align}\tag{25}\] where the last step follows due to the law of large numbers (LLN) for \(N^{-1}\sum_{j=1}^J\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi(e_{ij})=\mathrm{E}[\dot{\pi}_k(Z)\phi(e)]+o_p(1)\) and \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi(e_{ij})e_{ij}=\mathrm{E}[\dot{\pi}_k(Z)\phi(e)e]+o_p(1)\) as \(N\to\infty\), as well as the consistency of \(\hat{\sigma}\) to \(\sigma\) from the arguments above. Under \(H_0\), through integration by parts, we have \[\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]=\int_0^1\pi_k(z)\Phi^{-1}(z)\mathrm{d}z=c_{1k},\] and \[\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]=\int_0^1\pi_k(z) \left[ \left(\Phi^{-1}(z)\right)^2 - 1\right]\mathrm{d}z = \int_0^1\pi_k(z) \left(\Phi^{-1}(z)\right)^2 \mathrm{d}z=c_{2k},\] where we use the fact that \(\mathrm{d} \Phi^{-1} (z) = \mathrm{d}z/\phi( \Phi^{-1} (z))\) and \(\int_0^1 \pi_k (z) \mathrm{d}z = 0\). Thus, along with 21 and 22 , 25 can be re-expressed as \[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(Z_{ij})-c_{1k}\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}-\frac{c_{2k}}{2}\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(e_{ij}^2-1\right)+o_p\left(\frac{1}{\sqrt N}\right). \label{main95same95H0}\tag{26}\]

It suffices to show that \[R_{N1}\equiv \frac{1}{2N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\ddot\pi_k\left(\tilde{Z}_{ij}\right)\left( \hat{Z}_{ij}-Z_{ij}\right)^2=o_p\left(\frac{1}{\sqrt N}\right), \label{RN1}\tag{27}\] and \[R_{N2}\equiv\frac{1}{2N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\dot{\phi}\left(\frac{\tilde{\varepsilon}_{ij}}{\tilde{\sigma}}\right)\left(\frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}}-\frac{\varepsilon_{ij}}{\sigma}\right)^2=o_p\left(\frac{1}{\sqrt N}\right).\label{RN2}\tag{28}\] For \(R_{N1}\), by the boundedness of \(\ddot\pi_k\) (in Assumption 2) and \(\phi\), \[\begin{align} \vert R_{N1}\vert & \lesssim \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j} \left[ \Phi \left( \frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}} \right) - \Phi \left( \frac{{\varepsilon}_{ij}}{\sigma}\right)\right]^2 \\ & =\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j} \phi^2 \left( \frac{\tilde{\varepsilon}_{ij}}{\tilde{\sigma}} \right) \left( \frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}} - \frac{{\varepsilon}_{ij}}{\sigma}\right)^2 \\ & \lesssim \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left( \frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}} - \frac{{\varepsilon}_{ij}}{\sigma}\right)^2 \\ & = \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j}\left( \frac{\hat{\varepsilon}_{i j}-\varepsilon_{i j}}{\hat{\sigma}} - \frac{\varepsilon_{i j}(\hat{\sigma}^2-\sigma^2)}{\hat{\sigma} \sigma(\hat{\sigma}+\sigma)} \right)^2. \end{align}\] By the inequality \((a-b)^2 \le 2(a^2+b^2)\) for any \(a,b \in \mathbb{R}\), \[\begin{align} & \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j}\left( \frac{\hat{\varepsilon}_{i j}-\varepsilon_{i j}}{\hat{\sigma}} - \frac{\varepsilon_{i j}(\hat{\sigma}^2-\sigma^2)}{\hat{\sigma} \sigma(\hat{\sigma}+\sigma)} \right)^2 \\ \lesssim & \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left(\frac{\hat{\varepsilon}_{i j}-\varepsilon_{i j}}{\hat{\sigma}} \right)^2 + \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left( \frac{\varepsilon_{i j}(\hat{\sigma}^2-\sigma^2)}{\hat{\sigma} \sigma(\hat{\sigma}+\sigma)} \right)^2 \\ \asymp & \left(\hat{\mu} - \mu \right)^2 + (\hat{\sigma}^2-\sigma^2)^2 \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \varepsilon_{i j}^2 \\ = & O_p \left( \frac{1}{N} \right) + O_p \left( \frac{1}{N} \right) \left( \sigma^2 + o_p(1) \right) = o_p \left( \frac{1}{\sqrt{N}}\right), \end{align}\] where the last step is the result of 21 , 22 and LLN for \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j} \varepsilon_{i j}^2=\sigma^2+o_p(1)\). For \(R_{N2}\), we have \[\begin{align} \vert R_{N2}\vert & = \left\vert \frac{1}{2N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \dot{\pi}_k (Z_{ij}) \left[ -\frac{\tilde{\varepsilon}_{ij}}{\tilde{\sigma}} \phi \left( \frac{\tilde{\varepsilon}_{ij}}{\tilde{\sigma}} \right)\right] \left( \frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}} - \frac{{\varepsilon}_{ij}}{\sigma}\right)^2 \right\vert \\ & \lesssim \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left\vert \frac{\tilde{\varepsilon}_{ij}}{\tilde{\sigma}} \right\vert \left( \frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}} - \frac{{\varepsilon}_{ij}}{\sigma}\right)^2 \\ & \le \left( \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \frac{\tilde{\varepsilon}_{ij}^2}{\tilde{\sigma}^2} \right)^{1/2} \left( \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left( \frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}} - \frac{{\varepsilon}_{ij}}{\sigma}\right)^4 \right)^{1/2}, \end{align}\] where the first “\(=\)" is due to \(\dot{\phi}(x) = -x\phi (x)\), and the last”\(\le\)" is due to Cauchy–Schwarz inequality. Similar arguments as above yield that \[\begin{align} &\frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left( \frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}} - \frac{{\varepsilon}_{ij}}{\sigma}\right)^4 \\ = & \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left( \frac{\hat{\varepsilon}_{i j}-\varepsilon_{i j}}{\hat{\sigma}} - \frac{\varepsilon_{i j}(\hat{\sigma}^2-\sigma^2)}{\hat{\sigma} \sigma(\hat{\sigma}+\sigma)} \right)^4 \\ \lesssim &\left[ \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left( \frac{\hat{\varepsilon}_{ij}-\varepsilon_{ij}}{\hat{\sigma}} \right)^4 + \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left( \frac{\varepsilon_{i j}(\hat{\sigma}^2-\sigma^2)}{\hat{\sigma} \sigma(\hat{\sigma}+\sigma)} \right)^4 \right] \\ \asymp &\left( \hat{\mu} - \mu \right)^4 + \left( \hat{\sigma}^2-\sigma^2\right)^4 \left( \mathrm{E}\left[e^4\right] + o_p(1)\right) = O_p \left( \frac{1}{N^2}\right), \end{align}\] and \[\begin{align} \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \frac{\tilde{\varepsilon}_{ij}^2}{\tilde{\sigma}^2} & \le \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left( \frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}}\right)^2 + \frac{1}{N} \sum_{j=1}^{J}\sum_{i=1}^{N_j} \left( \frac{{\varepsilon}_{ij}}{{\sigma}}\right)^2 \nonumber\\ & = 1 + \left( 1 + o_p(1)\right) = O_p(1). \label{RN295Op1} \end{align}\tag{29}\] Thus, \(\vert R_{N2}\vert = O_p(N^{-1})=o_p(N^{-1/2})\). Hence 27 and 28 are verified. Combining 23 , 24 , 25 , 26 , 27 and 28 , we complete the proof of Theorem 1. \(\square\)

Proof of Theorem 2.

The proof of Theorem 2 is similar to that of Theorem 1. Under \(H_1\), 21 still holds, and \(\sqrt{N}(\hat{\mu}-\mu) \to_d \mathcal{N}(0,\sigma^2)\) since \(e_{ij}\sim F \ne \Phi\). Analogously, 22 still holds, and \(\sqrt{N}(\hat{\sigma}^2-\sigma^2) \to_d \sigma^2\mathcal{N}(0,\mathrm{E}[e^4]-1)\).

We start from 23 . Under \(H_1\), the arguments and results in 23 , 24 and 25 still hold. Unlike 26 , here in the context of \(H_1\) the last line of 25 should be expressed as \[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(Z_{ij})-d_{1k}\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}-\frac{d_{2k}}{2}\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(e_{ij}^2-1\right)+o_p\left(\frac{1}{\sqrt N}\right), \label{main95same95H1}\tag{30}\] where we replace \(c_{1k}\) with \(d_{1k}\) and \(c_{2k}\) with \(d_{2k}\). Besides, the remainder terms \(R_{N1}\) in 27 and \(R_{N2}\) in 28 are still \(O_p(N^{-1/2})\) through similar derivation above. Combining 23 , 24 , 25 , 27 , 28 and 30 , we obtain ?? in Theorem 2. From ?? , we know that \[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right) = \mathrm{E}\left[\pi_k (Z)\right] + o_p(1),\] as \(N\to\infty\), which implies \(\hat{\Psi}_K^2 \to_p \boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K\). Furthermore, \[\begin{align} &\sqrt{N} \left( \hat{\Psi}_K^2 - \boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K \right) \nonumber\\ =& \sqrt{N} \left[ \boldsymbol{a}_K + \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right)-\boldsymbol{a}_K\right]^\top \boldsymbol{\Sigma}_K^{-1} \left[ \boldsymbol{a}_K + \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right)-\boldsymbol{a}_K\right] \nonumber\\ &- \sqrt{N}\boldsymbol{a}_K^\top \boldsymbol{\Sigma}_K^{-1} \boldsymbol{a}_K \nonumber\\ =& 2a^\top \boldsymbol{\Sigma}_K^{-1} \frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) - \boldsymbol{a}_K \right) +\left[\frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) - \boldsymbol{a}_K \right)\right]^\top\boldsymbol{\Sigma}_K^{-1}\nonumber\\ &\left[\frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) - \boldsymbol{a}_K \right)\right]. \label{quadratic} \end{align}\tag{31}\] The following part demonstrates the asymptotic normality of \[\frac{1}{\sqrt N}\sum_{j = 1}^J \sum_{i = 1}^{N_j}\left(\boldsymbol{\pi}_K (\hat{Z}_{ij}) - \boldsymbol{a}_K \right).\] Indeed, the \(k\)-th component in it is \[\frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \pi_k (\hat{Z}_{ij}) - a_k \right) = \frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \zeta_k (e_{ij}) - \mathrm{E}\left[\zeta_k (e_{ij})\right] \right) + o_p(1),\] where \[\zeta_k(e) \equiv \pi_k(\Phi(e)) - d_{1k} e - \frac{d_{2k}}{2} (e^2 - 1).\] Then, by CLT, \[\frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \boldsymbol{\pi}_K (\hat{Z}_{ij}) - \boldsymbol{a}_K \right) \to_d \mathcal{N}_K\left( \boldsymbol{0}, \boldsymbol{\Xi}_K\right),\] with \(\boldsymbol{\Xi}_K=\left(\xi_{kl}\right)_{K \times K}\) given by \[\begin{align} \xi_{kl} =& \mathrm{E}\left[\zeta_k(e) \zeta_l (e) \right] \\ =&\mathrm{E}\left[ \pi_k (Z) \pi_l (Z) \right]-a_k a_l-\left[d_{1k}d_{3l} + d_{1l}d_{3k} \right] + d_{1k} d_{1l} + \frac{1}{2} \left[ a_k d_{2l} + a_l d_{2k} \right] \\ &-\frac{1}{2}\left[ d_{2k}d_{4l} + d_{2l}d_{4k}\right] + \frac{1}{2}\left[ d_{1k}d_{2l} + d_{1l}d_{2k}\right] \mathrm{E}\left[e^3\right] +\frac{d_{2k} d_{2l} }{4} \left[\mathrm{E}\left[e^4\right]-1 \right]. \end{align}\] Along with 31 , ?? in Theorem 2 is validated. This completes the proof of Theorem 2. \(\square\)

Proof of Theorem 3.

The proof of Theorem 3 is similar to that of Theorems 1 and 2. We just follow the arguments in the previous proof and carefully specify the behavior of the estimation effects. Under \(H_{1L}\) in 9 , the PDF of \(e\) is given by \[f(x) = (1-\delta_N) \phi(x)+\delta_N q(x).\]

We first examine the convergence of \(\hat{\mu}-\mu\) and \(\hat{\sigma}^2-\sigma^2\) as before. For \(\hat{\mu}-\mu\), 21 still holds under \(H_{1L}\), and \(\sqrt{N}(\hat{\mu}-\mu) \to_d \mathcal{N}(0,\sigma^2)\) as \(N\to \infty\). For \(\hat{\sigma}^2-\sigma^2\), under \(H_{1L}\), 22 still holds, and \(\sqrt{N}(\hat{\sigma}^2-\sigma^2) \to_d \mathcal{N}(0,2\sigma^4)\) ⁴ since \[\begin{align} \mathrm{E}\left[e^4\right]=&\int_{-\infty}^{\infty} x^4f(x)\mathrm{d}x \\ =&(1-\delta_N)\int_{-\infty}^{\infty}x^4\phi(x)\mathrm{d}x+\delta_N\int_{-\infty}^{\infty}x^4q(x)\mathrm{d}x\\ =&3+o(1). \end{align}\]

Next, we clarify the relationship of the denoted constants. Recall the definitions of the constants \(d_{1k}=\mathrm{E}[\dot{\pi}_k(Z)\phi\left(e\right)]=\int_{- \infty}^{\infty}\dot{\pi}_k(\Phi\left(x\right))\phi\left(x\right)f(x)\mathrm{d}x\) and \(d_{2k}=\mathrm{E}[\dot{\pi}_k(Z)\phi\left(e\right)e]=\int_{- \infty}^{\infty}\dot{\pi}_k(\Phi\left(x\right))\phi\left(x\right)xf(x)\mathrm{d}x\). Then under \(H_{1L}\), as \(N\to \infty\), we have \[\begin{align} d_{1k}=&\int_{-\infty}^{\infty}\dot{\pi}_k(\Phi\left(x\right))\phi\left(x\right)f(x)\mathrm{d}x \nonumber \\ =&(1-\delta_N)\int_{-\infty}^{\infty}\dot{\pi}_k(\Phi\left(x\right))\phi^2\left(x\right)\mathrm{d}x+\delta_N\int_{-\infty}^{\infty}\dot{\pi}_k(\Phi\left(x\right))\phi\left(x\right)q(x)\mathrm{d}x \nonumber\\ =&c_{1k}+o(1), \label{c195d195H1L95same} \end{align}\tag{32}\] \[\begin{align} d_{2k}=&\int_{- \infty}^{\infty}\dot{\pi}_k(\Phi\left(x\right))\phi\left(x\right)xf(x)\mathrm{d}x \nonumber\\ =&(1-\delta_N)\int_{-\infty}^{\infty}\dot{\pi}_k(\Phi\left(x\right))x\phi^2\left(x\right)\mathrm{d}x+\delta_N\int_{-\infty}^{\infty}\dot{\pi}_k(\Phi\left(x\right))x\phi\left(x\right)q(x)\mathrm{d}x \nonumber\\ =&c_{2k}+o(1), \label{c295d295H1L95same} \end{align}\tag{33}\] and \[\begin{align} \mathrm{E}[\pi_k (Z)] =& \int_{-\infty}^{\infty} \pi_k \left(\Phi(x)\right)f(x) \mathrm{d}x \nonumber\\ =&\left(1-\delta_N\right)\int_{-\infty}^{\infty} \pi_k \left(\Phi(x)\right)\phi(x) \mathrm{d}x+\delta_N\int_{-\infty}^{\infty} \pi_k \left(\Phi(x)\right)q(x) \mathrm{d}x \nonumber\\ =& \delta_N \Delta_k. \label{drift95H1L95same} \end{align}\tag{34}\]

Now we start from 23 again. Under \(H_{1L}\), we still have 23 , 24 and 25 . 26 also holds as a result of 32 and 33 . The remainder terms \(R_{N1}\) in 27 and \(R_{N2}\) in 28 are still \(O_p(N^{-1/2})\). Combining 23 , 24 , 25 , 26 , 27 , 28 and 34 , we obtain ?? in Theorem 3.

It suffices to derive the asymptotic normality of \(N^{-1/2}\sum_{j = 1}^J \sum_{i = 1}^{N_j}\boldsymbol{\pi}_K (\hat{Z}_{ij})\). Under \(H_{1L}\), straightforward calculation yields that \[\begin{align} \mathrm{E}\left[\pi_k(Z)\pi_l(Z)\right]=&\int_{-\infty}^{\infty} \pi_k(\Phi(x))\pi_l(\Phi(x))f(x)\mathrm{d}x \\ =&(1-\delta_N)\int_{-\infty}^{\infty}\pi_k(\Phi(x))\pi_k(\Phi(x))\phi(x)\mathrm{d}x\\ &+\delta_N\int_{-\infty}^{\infty}\pi_k(\Phi(x))\pi_k(\Phi(x))q(x)\mathrm{d}x\\ =&\delta_{kl}+o(1), \end{align}\] \[\begin{align} d_{3k}=&\mathrm{E}\left[\pi_k(Z)e\right]=\int_{-\infty}^{\infty} \pi_k(\Phi(x))xf(x)\mathrm{d}x \\ =&(1-\delta_N)\int_{-\infty}^{\infty}\pi_k(\Phi(x))x\phi(x)\mathrm{d}x+\delta_N\int_{-\infty}^{\infty}\pi_k(\Phi(x))xq(x)\mathrm{d}x\\ =&c_{1k}+o(1), \end{align}\] \[\begin{align} d_{4k}=&\mathrm{E}\left[\pi_k(Z)e^2\right]=\int_{-\infty}^{\infty} \pi_k(\Phi(x))x^2f(x)\mathrm{d}x \\ =&(1-\delta_N)\int_{-\infty}^{\infty}\pi_k(\Phi(x))x^2\phi(x)\mathrm{d}x+\delta_N\int_{-\infty}^{\infty}\pi_k(\Phi(x))x^2q(x)\mathrm{d}x\\ =&c_{2k}+o(1), \end{align}\] and \[\begin{align} \mathrm{E}\left[e^3\right]=&\int_{-\infty}^{\infty} x^3f(x)\mathrm{d}x \\ =&(1-\delta_N)\int_{-\infty}^{\infty}x^3\phi(x)\mathrm{d}x+\delta_N\int_{-\infty}^{\infty}x^3q(x)\mathrm{d}x\\ =&o(1). \end{align}\] Thus, by ?? and CLT, \[\begin{align} \frac{1}{\sqrt{N}}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right) & =\sqrt{N}\left[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_1\left(\hat{Z}_{ij}\right),\ldots,\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_K\left(\hat{Z}_{ij}\right)\right]^\top \\ & \to_d \mathcal{N}_K\left(\boldsymbol{\Delta}_K, \boldsymbol{\Sigma}_K\right), \end{align}\] since the asymptotic covariance matrix is still \(\boldsymbol{\Sigma}_K\) from the calculation above. Besides, \[N\hat{\Psi}_K^2 \to_d \chi_K^2\left(\boldsymbol{\Delta}_K^\top\boldsymbol{\Sigma}_K^{-1}\boldsymbol{\Delta}_K\right),\] by definition of the non-central \(\chi^2\) distribution. This completes the proof of Theorem 3. \(\square\)

Proof of Theorem 4, 5 and 6.

The proof is similar to that of Theorems 1, 2, and 3. Note that in the case of different means and same variance, we assume \(\min\{N_1,\ldots,N_J\}\to\infty\), \(J = o( N^{1/2})\) and \(\sum_{j=1}^J N_j^{-1}=o(1)\). To simplify the procedure and notations, here we just clarify the asymptotic properties of \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k(\hat{Z}_{ij})\) without distinction of the null and alternatives. Our goal is to prove that \[\begin{align} &\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right)\nonumber\\ =&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left\{\pi_k(Z_{ij})-\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]e_{ij}-\frac{\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]}{2}\left(e_{ij}^2-1\right)\right\}\nonumber\\ &+o_p\left(\frac{1}{\sqrt N}\right). \label{decomp95diff95mean} \end{align}\tag{35}\]

We first focus on the estimation effects of \(\hat{\mu}_j-\mu_j\) and \(\hat{\sigma}^2-\sigma^2\). For \(\hat{\mu}_j-\mu_j\), by CLT, we have \(\sqrt{N_j}(\hat{\mu}_j-\mu_j)\to_d \mathcal{N}(0,\sigma^2)\) for each \(1\le j\le J\) as \(\min\{N_1,\ldots,N_J\}\to\infty\). Besides, since \[\mathrm{E}\left[\frac{1}{N}\sum_{j=1}^{J}N_j\left(\hat{\mu}_j-\mu_j\right)^2\right]=\frac{\sigma^2}{N}\sum_{j=1}^{J}\mathrm{E}\left[N_j\left(\frac{1}{N_j}\sum_{i=1}^{N_j}e_{ij}\right)^2\right]=\frac{J}{N}\sigma^2, \label{mu95j95hat-mu95j95sum952}\tag{36}\] we have \(N^{-1}\sum_{j=1}^{J}N_j\left(\hat{\mu}_j-\mu_j\right)^2=O_p(J/N)=o_p(N^{-1/2})\) from \(J = o( N^{1/2})\). Analogously, by \[\mathrm{E}\left[\left(\frac{1}{N_j}\sum_{i=1}^{N_j}e_{ij}\right)^4\right] =\frac{1}{N_j^4} \left[3N_j(N_j-1)+N_j\mathrm{E}\left[e^4\right]\right]=\frac{3}{N_j^2}+\frac{\mathrm{E}\left[e^4\right]-3}{N_j^3},\] we obtain \[\begin{align} &\mathrm{E}\left[\frac{1}{N}\sum_{j=1}^{J}N_j\left(\hat{\mu}_j-\mu_j\right)^4\right]\nonumber\\ =&\frac{\sigma^4}{N}\sum_{j=1}^{J}\mathrm{E}\left[N_j\left(\frac{1}{N_j}\sum_{i=1}^{N_j}e_{ij}\right)^4\right]=\frac{\sigma^4}{N}\sum_{j=1}^{J}\left(\frac{3}{N_j}+\frac{\mathrm{E}\left[e^4\right]-3}{N_j^2}\right), \label{mu95j95hat-mu95j95sum954} \end{align}\tag{37}\] which implies \(N^{-1}\sum_{j=1}^{J}N_j\left(\hat{\mu}_j-\mu_j\right)^4=N^{-1}O_p(\sum_{j=1}^J N_j^{-1})=o_p(N^{-1})\) since \(\sum_{j=1}^J N_j^{-1}=o_p(1)\). For \(\hat{\sigma}^2-\sigma^2\), we have \[\begin{align} \hat{\sigma}^2-\sigma^2=&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(\left(Y_{ij}-\hat{\mu}_j\right)^2-\sigma^2\right)\\ =&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(\varepsilon_{ij}^2-\sigma^2\right)-\frac{1}{N}\sum_{j=1}^{J}N_j\left(\hat{\mu}_j-\mu_j\right)^2\\ =&\frac{\sigma^2}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(e_{ij}^2-1\right)+o_p\left(\frac{1}{\sqrt{N}}\right), \end{align}\] with the last step following 36 . Note that the main term in the last line is the same as 22 .

Next, we will show that \[\begin{align} &\frac{1}{N\sigma}\sum_{j=1}^{J}N_j\left(\hat{\mu}_j-\mu_j\right)\left\{\frac{1}{N_j}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\right\}\nonumber\\ =&\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}+o_p\left(\frac{1}{\sqrt{N}}\right).\label{c1k95different95mean} \end{align}\tag{38}\] Indeed, since \[\begin{align} &\frac{1}{N\sigma}\sum_{j=1}^{J}N_j\left(\hat{\mu}_j-\mu_j\right)\left\{\frac{1}{N_j}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\right\} \\ = & \frac{1}{N}\sum_{j=1}^{J}\frac{1}{N_j}\sum_{i_1=1}^{N_j}\sum_{i_2=1}^{N_j}\dot{\pi}_k(Z_{i_1j})\phi\left(e_{i_1j}\right)e_{i_2j}, \end{align}\] and \[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij} = \frac{1}{N}\sum_{j=1}^{J}\frac{1}{N_j}\sum_{i_1=1}^{N_j}\sum_{i_2=1}^{N_j}e_{i_2j},\] then we have \[\begin{align} &\frac{1}{N\sigma}\sum_{j=1}^{J}N_j\left(\hat{\mu}_j-\mu_j\right)\left\{\frac{1}{N_j}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\right\}- \mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}\\ =&\frac{1}{N}\sum_{j=1}^{J}\frac{1}{N_j}\sum_{i_1=1}^{N_j}\sum_{i_2=1}^{N_j}\left(\dot{\pi}_k(Z_{i_1j})\phi\left(e_{i_1j}\right)-\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\right)e_{i_2j}\\ =&\frac{1}{N}\sum_{j=1}^{J}\frac{1}{N_j}\sum_{i=1}^{N_j}\left(\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)-\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\right)e_{ij}\\ &+\frac{1}{N}\sum_{j=1}^{J}\frac{1}{N_j}\sum_{i_1=1}^{N_j}\sum_{i_2\ne i_1}^{N_j}\left(\dot{\pi}_k(Z_{i_1j})\phi\left(e_{i_1j}\right)-\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\right)e_{i_2j}\\ \equiv& A_{N1}+A_{N2}. \end{align}\] For \(A_{N1}\), we have \[\mathrm{E}[A_{N1}] = \frac{J}{N} \mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right],\] and \[\operatorname{Var}[A_{N1}] = \operatorname{Var}\left[\left(\dot{\pi}_k(Z)\phi\left(e\right)-\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\right)e\right]\frac{1}{N^2}\sum_{j=1}^{J}\frac{1}{N_j},\] then \(A_{N1} = o_p(N^{-1/2})\) since \(J=o(N^{1/2})\) as \(N\to\infty\). For \(A_{N2}\), we have \(\mathrm{E}[A_{N2}]=0\), and \[\begin{align} &\operatorname{Var}[A_{N2}] \\ =& \frac{1}{N^2} \sum_{j=1}^{J} \frac{1}{N_j^2} \operatorname{Var}\left[\sum_{i_1=1}^{N_j}\sum_{i_2\ne i_1}^{N_j}\left(\dot{\pi}_k(Z_{i_1j})\phi\left(e_{i_1j}\right)-\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\right)e_{i_2j}\right] \\ =&\frac{1}{N^2} \sum_{j=1}^{J} \frac{1}{N_j^2} \mathrm{E}\left[\left(\sum_{i_1=1}^{N_j}\sum_{i_2\ne i_1}^{N_j}\left(\dot{\pi}_k(Z_{i_1j})\phi\left(e_{i_1j}\right)-\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\right)e_{i_2j}\right)^2\right]\\ =&\frac{1}{N^2} \sum_{j=1}^{J} \frac{N_j (N_j-1)}{N_j^2}\left(\operatorname{Var}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]+\mathrm{E}^2 \left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]\right)\\ =&\left(\operatorname{Var}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]+\mathrm{E}^2 \left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]\right)\frac{1}{N^2}\left(J-\sum_{j=1}^J\frac{1}{N_j}\right), \end{align}\] then \(A_{N2} = o_p(N^{-1/2})\). Hence, 38 is verified.

Now we start from 23 . In the case of different means and same variance, 23 and 24 still hold. The difference lies in that the first term of 24 should be further expressed as \[\begin{align} &\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(\frac{{\varepsilon}_{ij}}{\sigma}\right)\left(\frac{\hat{\varepsilon}_{ij}}{\hat{\sigma}}-e_{ij}\right)\nonumber\\ =&\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\frac{\hat{\varepsilon}_{ij}-\varepsilon_{ij}}{\hat{\sigma}}-\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\frac{\varepsilon_{ij}(\hat{\sigma}^2-\sigma^2)}{\hat{\sigma}\sigma(\hat{\sigma}+\sigma)}\nonumber\\ =&-\frac{1}{\hat{\sigma}}\frac{1}{N}\sum_{j=1}^{J}N_j\left(\hat{\mu}_j-\mu_j\right)\left\{\frac{1}{N_j}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)\right\}\nonumber\\ &-\frac{\hat{\sigma}^2-\sigma^2}{\hat{\sigma}(\hat{\sigma}+\sigma)}\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)e_{ij}\nonumber\\ =&-\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}-\frac{1}{2\sigma^2}\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]\left(\hat{\sigma}^2-\sigma^2\right)\nonumber\\ &+o_p\left(\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}\right)+o_p\left(\hat{\sigma}^2-\sigma^2\right) \nonumber\\ =&-\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}\nonumber\\ &-\frac{1}{2N}\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(e_{ij}^2-1\right)+o_p\left(\frac{1}{\sqrt{N}}\right), \label{a2} \end{align}\tag{39}\] where we apply 22 , 38 , LLN for \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\dot{\pi}_k(Z_{ij})\phi\left(e_{ij}\right)e_{ij}\), and CLT for \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}\) as \(N \to \infty\).

It suffices to show \(R_{N1}\) (in 27 ) and \(R_{N2}\) (in 28 ) are \(o_p( N^{-1/2})\) in order to obtain 35 . The technique is similar to that in the proof of Theorem 1. For \(R_{N1}\), we have \[\begin{align} \vert R_{N1}\vert & \lesssim \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \frac{\hat{\varepsilon}_{ij} - \varepsilon_{ij}}{\hat{\sigma}}\right)^2 + \frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\varepsilon_{i j}(\hat{\sigma}^2-\sigma^2)}{\hat{\sigma} \sigma(\hat{\sigma}+\sigma)}\right)^2 \\ & \lesssim \frac{1}{N \sigma^2} \sum_{j = 1}^J N_j \left( \hat{\mu}_j - \mu_j\right)^2 + \left( \hat{\sigma}^2 - \sigma^2 \right)^2 \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \varepsilon_{ij}^2 \\ & \asymp O_{p}\left(\frac{J}{N}\right) + O_p \left( \frac{1}{N}\right) \left( \sigma^2 + o_p (1)\right)=O_p \left( \frac{J}{N}\right) = o_p \left( \frac{1}{\sqrt{N}}\right), \end{align}\] by 36 and the assumption \(J = o(N^{1/2})\). For \(R_{N2}\), we first have \[\begin{align} &\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\hat{\varepsilon}_{i j}}{\hat{\sigma}}-\frac{\varepsilon_{i j}}{\sigma}\right)^{4}\\ \lesssim &\left[\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\hat{\varepsilon}_{i j}-\varepsilon_{ij}}{\hat{\sigma}}\right)^{4}+\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\varepsilon_{i j}(\hat{\sigma}^2-\sigma^2)}{\hat{\sigma} \sigma(\hat{\sigma}+\sigma)}\right)^{4}\right] \\ \lesssim &\frac{1}{N} \sum_{j = 1}^J N_j \left( \hat{\mu}_j - \mu_j \right)^4 + (\hat{\sigma}^2-\sigma^2)^{4}\left(\mathrm{E}\left[e^{4}\right]+o_{p}(1)\right) \\ \asymp & o_p \left( \frac{1}{N}\right) + O_p \left( \frac{1}{N^2}\right) = o_p \left( \frac{1}{N}\right), \end{align}\] by 37 . Along with \(N^{-1} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}} \tilde{\varepsilon}_{i j}^2/\tilde{\sigma}^2 = O_p(1)\) from 29 , we obtain \[\left\vert R_{N 2}\right\vert \lesssim\left[\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}} \frac{\tilde{\varepsilon}_{i j}^2}{\tilde{\sigma}^2}\right]^{1 / 2}\left[\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\hat{\varepsilon}_{i j}}{\hat{\sigma}}-\frac{\varepsilon_{i j}}{\sigma}\right)^{4}\right]^{1 / 2} = o_p \left( \frac{1}{\sqrt{N}}\right).\] Thus, 35 is verified. The form of 35 can be further refined under \(H_0\), \(H_1\) and \(H_{1L}\) respectively. For example, under \(H_0\), we can write \(\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]=c_{1k}\) and \(\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]=c_{2k}\) in 35 , and then obtain the results of Theorem 4. \(\square\)

Proof of Theorem 7.

The technique is still the same as the previous ones. Recall that in the context of same mean and different variances, we denote \(p_j=N_j/N\), \(q_j=JN_j/N\) for \(1\le j \le J\), and we further assume that as \(\min\{N_1,\ldots,N_J\}\to\infty\) and \(J = o( N^{1/2})\), () there exist \(0<\underline{\sigma}\le \overline{\sigma}<\infty\) such that \(\underline{\sigma}< \inf_{1\le j\le J} \sigma_j\le \sup_{1\le j\le J} \sigma_j<\overline{\sigma}\); () there exist \(0<\underline{q}\le \overline{q}<\infty\) such that \(\underline{q}< \inf_{1\le j\le J} q_j\le \sup_{1\le j\le J} q_j<\overline{q}\).

We first tackle the estimation effects of \(\hat{\mu}-\mu\) and \(\hat{\sigma}^2_j-\sigma^2_j\). For \(\hat{\mu}-\mu\), note that \(\hat{\mu} = J^{-1}\sum_{j = 1}^J N_j^{-1}\sum_{i = 1}^{N_j} Y_{ij}\) here. Then, \[\hat{\mu} - \mu = \frac{1}{J} \sum_{j = 1}^{J} \frac{1}{N_j} \sum_{i = 1}^{N_j} \varepsilon_{ij}=\frac{1}{J} \sum_{j = 1}^{J} \frac{\sigma_j}{N_j} \sum_{i = 1}^{N_j} e_{ij}. \label{mu95hat-mu95diff95var}\tag{40}\] From 40 we obtain \(\mathrm{E}[\hat{\mu} - \mu]=0\), and \(\operatorname{Var}[\hat{\mu} - \mu] = J^{-2} \sum_{j = 1}^{J} N_j^{-1}\sigma_j^2\asymp J^{-2} \sum_{j = 1}^{J} N_j^{-1}=N^{-1}J^{-1} \sum_{j = 1}^{J} q_j^{-1}=O_p(N^{-1})\). Then \(\hat{\mu}-\mu=O_p(N^{-1/2})\). For \(\hat{\sigma}^2_j-\sigma^2_j\), we have \[\label{sigma95j94295hat-sigma95j942} \begin{align} \hat{\sigma}_j^2 - \sigma_j^2 & = \frac{1}{N_j} \sum_{i = 1}^{N_j} \left(\left(Y_{i j}-\hat{\mu}\right)^2-\sigma_j^2\right) = \frac{1}{N_j} \sum_{i = 1}^{N_j} \left( \varepsilon_{ij}^2 - \sigma_j^2 \right) + \left( \hat{\mu} - \mu \right)^2- 2\left( \hat{\mu} - \mu \right) \frac{1}{N_j} \sum_{i = 1}^{N_j} \varepsilon_{ij}\\ & = \frac{\sigma_j^2}{N_j} \sum_{i = 1}^{N_j} \left( e_{ij}^2 - 1\right) + \left( \hat{\mu} - \mu \right)^2- 2\left( \hat{\mu} - \mu \right) \frac{\sigma_j}{N_j} \sum_{i = 1}^{N_j} e_{ij}\\ & = O_p\left(\frac{1}{\sqrt{N_j}}\right)+O_p\left(\frac{1}{N}\right)+O_p\left(\frac{1}{\sqrt{N_j N}}\right)=O_p\left(\frac{1}{\sqrt{N_j}}\right), \end{align}\tag{41}\] by CLT for \(N_j^{-1/2} \sum_{i = 1}^{N_j} e_{ij} \to_d \mathcal{N}(0,1)\) and \(N_j^{-1/2} \sum_{i = 1}^{N_j} ( e_{ij}^2 - 1) \to_d \mathcal{N}(0,\mathrm{E}[e^4]-1)\) as \(N_j \to \infty\) and 40 .

We next prove some preliminary results such that \[\sum_{j = 1}^{J} \frac{N_j}{\sigma_jN} \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \dot{\pi}_k (Z_{ij}) \phi \left( e_{ij}\right)\right\} = \mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right] \sum_{j = 1}^{J} \frac{p_j}{\sigma_j}+O_p\left(\frac{1}{\sqrt{N}}\right),\label{c1k95different95var}\tag{42}\] \[\begin{align} &\frac{1}{N} \sum_{j = 1}^J \frac{1}{N_j} \left\{\sum_{i_1 = 1}^{N_j}\sum_{i_2=1}^{N_j} \dot{\pi}_k\left(Z_{i_1 j}\right) \phi\left(e_{i_1 j}\right) e_{i_1 j}\left(e_{i_2 j}^2-1\right)\right\}\nonumber\\ =& \mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}(e_{ij}^2-1)+o_p\left(\frac{1}{\sqrt{N}}\right), \label{c2k95different95var} \end{align}\tag{43}\] \[\frac{1}{N} \sum_{j = 1}^J \frac{1}{N_j} \sum_{i_1 = 1}^{N_j}\sum_{i_2=1}^{N_j} \dot{\pi}_k\left(Z_{i_1 j}\right) \phi\left(e_{i_1 j}\right) e_{i_1 j} e_{i_2 j} = \mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}e_{ij}+o_p\left(\frac{1}{\sqrt{N}}\right), \label{c2k95different95var95minus}\tag{44}\] \[\frac{1}{N} \sum_{j = 1}^J \frac{1}{\sigma_j^2} \left\{\sum_{i = 1}^{N_j} \dot{\pi}_k\left(Z_{i j}\right) \phi\left(e_{ij}\right) e_{ij}\right\} = O_p(1),\label{DN295Op1}\tag{45}\] as \(\min\{N_1,\ldots,N_J\}\to\infty\), \(J = o( N^{1/2})\). For 42 , straightforward calculation leads to \[\mathrm{E}\left[\sum_{j = 1}^{J} \frac{N_j}{\sigma_jN} \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \dot{\pi}_k (Z_{ij}) \phi \left( e_{ij}\right)\right\}\right] =\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right] \sum_{j = 1}^{J} \frac{p_j}{\sigma_j},\] and \[\begin{align} &\operatorname{Var}\left[\sum_{j = 1}^{J} \frac{N_j}{\sigma_jN} \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \dot{\pi}_k (Z_{ij}) \phi \left( e_{ij}\right)\right\}\right] =\operatorname{Var}\left[\dot{\pi}_k (Z) \phi \left( e\right)\right] \sum_{j = 1}^{J} \frac{p^2_j}{N_j\sigma^2_j}\\ =&\operatorname{Var}\left[\dot{\pi}_k (Z) \phi \left( e\right)\right]\frac{1}{N^2} \sum_{j = 1}^{J} \frac{N_j}{\sigma^2_j}=O\left(\frac{1}{N}\right). \end{align}\] Thus, 42 is verified. The derivation of 43 is analogous to that of 38 if we first rewrite \(N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j}(e_{ij}^2-1)\) on the right side as \[\frac{1}{N}\sum_{j=1}^{J}\frac{1}{N_j}\sum_{i_1=1}^{N_j}\sum_{i_2=1}^{N_j}(e_{i_2j}^2-1),\] and so is 44 . 45 is obtained by \[\mathrm{E}\left[\frac{1}{N}\sum_{j = 1}^J \frac{1}{\sigma_j^2} \{\sum_{i = 1}^{N_j} \dot{\pi}_k(Z_{i j}) \phi(e_{ij}) e_{ij}\}\right] =\mathrm{E}[\dot{\pi}_k(Z) \phi(e) e]\frac{1}{N}\sum_{j=1}^J\frac{N_j}{\sigma_j^2}=O(1),\] and \[\operatorname{Var}\left[\frac{1}{N}\sum_{j = 1}^J \frac{1}{\sigma_j^2}\sum_{i = 1}^{N_j} \dot{\pi}_k(Z_{i j}) \phi(e_{ij}) e_{ij}\right]=O\left(\frac{1}{N}\right).\]

Now we start from 23 . In the case of the same mean and different variances, 23 and 24 still hold. The difference lies in the fact that the first term of 24 should be further expressed as \[\begin{align} &\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}} \dot{\pi}_k\left(Z_{i j}\right) \phi\left(\frac{\varepsilon_{i j}}{\sigma_j}\right)\left(\frac{\hat{\varepsilon}_{i j}}{\hat{\sigma}_j}-\frac{\varepsilon_{i j}}{\sigma_j}\right) \\ =&\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}} \dot{\pi}_k\left(Z_{i j}\right) \phi\left(e_{ij}\right) \frac{\hat{\varepsilon}_{i j}-\varepsilon_{i j}}{\hat{\sigma}_j}\\ &-\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}} \dot{\pi}_k\left(Z_{i j}\right) \phi\left(e_{ij}\right) \frac{\varepsilon_{i j} (\hat{\sigma}_j^2-\sigma_j^2 )}{\hat{\sigma}_j \sigma_j(\hat{\sigma}_j+\sigma_j)}.\\ \equiv& D_{N1} - D_{N2}. \end{align}\] For \(D_{N1}\), under \(H_0\), we have \[\begin{align} D_{N1} &= - \frac{\hat{\mu} - \mu}{N} \sum_{j = 1}^{J} \frac{N_j}{\sigma_j} \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \dot{\pi}_k (Z_{ij}) \phi \left( e_{ij}\right)\right\} \left(1+o_p(1)\right) \nonumber\\ & = - \frac{1}{N} \left( \sum_{j = 1}^J \frac{N}{J N_j} \sum_{i = 1}^{N_j} \varepsilon_{ij}\right) \left( \sum_{j = 1}^{J} \frac{N_j}{\sigma_jN} \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \dot{\pi}_k (Z_{ij}) \phi \left( e_{ij}\right)\right\} \right) \left(1+o_p(1)\right) \nonumber\\ & = - \frac{1}{N} \left( \sum_{j = 1}^J \sum_{i = 1}^{N_j} \frac{\sigma_j e_{ij}}{q_j}\right) \left(c_{1k}\sum_{j = 1}^J \frac{p_j}{\sigma_j} + O_p \left( \frac{1}{\sqrt{N}}\right)\right) \left(1+o_p(1)\right)\nonumber\\ & = - \frac{c_{1k}}{N} \left[ \sum_{j = 1}^J \frac{ p_j}{\sigma_j} \right] \sum_{j = 1}^J \sum_{i = 1}^{N_j} \frac{\sigma_j e_{ij}}{q_j} + o_p \left( \frac{1}{\sqrt{N}}\right), \label{DN1} \end{align}\tag{46}\] where the second last line is followed by 42 , and \(\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)\right]=c_{1k}\) under \(H_0\). The last line is followed by \(N^{-1} \sum_{j = 1}^J q_j^{-1}\sum_{i = 1}^{N_j} \varepsilon_{ij}=o_p(1)\) using LLN. For \(D_{N2}\), under \(H_0\), we have \[\begin{align} D_{N2} =& \frac{1}{N} \sum_{j = 1}^J \frac{ (\hat{\sigma}_j^2-\sigma_j^2)}{\hat{\sigma}_j (\hat{\sigma}_j+\sigma_j)} \left\{\sum_{i = 1}^{N_j} \dot{\pi}_k\left(Z_{i j}\right) \phi\left(e_{ij}\right) e_{ij}\right\} \nonumber\\ =& \frac{1}{N} \sum_{j = 1}^J \frac{1}{\hat{\sigma}_j (\hat{\sigma}_j+\sigma_j)} \left\{\sum_{i = 1}^{N_j} \dot{\pi}_k\left(Z_{i j}\right) \phi\left(e_{ij}\right) e_{ij}\right\}\left\{\frac{\sigma_j^2}{N_j} \sum_{i = 1}^{N_j} \left( e_{ij}^2 - 1\right) + \left( \hat{\mu} - \mu \right)^2 \right. \nonumber \\ &\left.- 2\left( \hat{\mu} - \mu \right) \frac{\sigma_j}{N_j} \sum_{i = 1}^{N_j} e_{ij}\right\}\nonumber\\ =& \frac{1}{N} \sum_{j = 1}^J \frac{\sigma_j^2}{N_j\hat{\sigma}_j (\hat{\sigma}_j+\sigma_j)} \left\{\sum_{i_1 = 1}^{N_j}\sum_{i_2=1}^{N_j} \dot{\pi}_k\left(Z_{i_1 j}\right) \phi\left(e_{i_1 j}\right) e_{i_1 j}\left(e_{i_2 j}^2-1\right)\right\}\nonumber\\ &+\left(\hat{\mu} - \mu\right)^2\frac{1}{N} \sum_{j = 1}^J \frac{1}{\hat{\sigma}_j (\hat{\sigma}_j+\sigma_j)} \left\{\sum_{i = 1}^{N_j} \dot{\pi}_k\left(Z_{i j}\right) \phi\left(e_{ij}\right) e_{ij}\right\}\nonumber\\ &- 2 \left( \hat{\mu} - \mu \right) \frac{1}{N} \sum_{j = 1}^J \frac{\sigma_j}{N_j\hat{\sigma}_j (\hat{\sigma}_j+\sigma_j)} \left\{\sum_{i_1 = 1}^{N_j}\sum_{i_2=1}^{N_j} \dot{\pi}_k\left(Z_{i_1 j}\right) \phi\left(e_{i_1 j}\right) e_{i_1 j} e_{i_2 j}\right\}\nonumber\\ =& \frac{1}{2N} \sum_{j = 1}^J \frac{1}{N_j} \left\{\sum_{i_1 = 1}^{N_j}\sum_{i_2=1}^{N_j} \dot{\pi}_k\left(Z_{i_1 j}\right) \phi\left(e_{i_1 j}\right) e_{i_1 j}\left(e_{i_2 j}^2-1\right)\right\}\left(1+o_p(1)\right)\nonumber\\ &+\left(\hat{\mu} - \mu\right)^2\frac{1}{2N} \sum_{j = 1}^J \frac{1}{\sigma_j^2} \left\{\sum_{i = 1}^{N_j} \dot{\pi}_k\left(Z_{i j}\right) \phi\left(e_{ij}\right) e_{ij}\right\}\left(1+o_p(1)\right)\nonumber\\ &-\left( \hat{\mu} - \mu \right) \frac{1}{N} \sum_{j = 1}^J \frac{1}{N_j \sigma_j} \left\{\sum_{i_1 = 1}^{N_j}\sum_{i_2=1}^{N_j} \dot{\pi}_k\left(Z_{i_1 j}\right) \phi\left(e_{i_1 j}\right) e_{i_1 j}e_{i_2 j}\right\} \nonumber \\ =&\frac{c_{2k}}{2}\frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left(e_{ij}^2 - 1\right) + o_p \left( \frac{1}{\sqrt{N}}\right), \label{DN2} \end{align}\tag{47}\] following 40 , 41 , 43 , 44 and 45 , as well as \(\mathrm{E}\left[\dot{\pi}_k(Z)\phi\left(e\right)e\right]=c_{2k}\) under \(H_0\).

It suffices to show \(R_{N1}\) (in 27 ) and \(R_{N2}\) (in 28 ) are \(o_p(N^{-1/2})\) in this case. For \(R_{N1}\), referring to the arguments in the proof of Theorem 4, here we have \[\vert R_{N 1}\vert \lesssim \frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\hat{\varepsilon}_{i j}-\varepsilon_{i j}}{\hat{\sigma}_j}\right)^2+\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\varepsilon_{i j}(\hat{\sigma}_j^2-\sigma_j^2)}{\hat{\sigma}_j \sigma_j(\hat{\sigma}_j+\sigma_j)}\right)^2 \equiv R_{N1}^A + R_{N1}^B.\] For \(R_{N1}^A\), we obtain \[R_{N1}^A = \frac{(\hat{\mu} - \mu)^2}{N} \sum_{j = 1}^J \frac{N_j}{\hat{\sigma}_j^2} \asymp \frac{1}{N} O_p \left( \frac{1}{N}\right) O_p (N)=O_p \left( \frac{1}{N}\right) = o_p \left( \frac{1}{\sqrt{N}}\right),\] by 40 and \[\sum_{j = 1}^J \frac{N_j}{\hat{\sigma}_j^2} = \sum_{j = 1}^J \frac{N_j}{\sigma_j^2}(1+o_p(1))=O_p(N).\] For \(R_{N1}^B\), we have \[\begin{align} R_{N1}^B \lesssim& \frac{1}{N} \sum_{j = 1}^{J} N_j \left( \hat{\sigma}_j^2 - \sigma_j^2\right)^2 \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \varepsilon_{ij}^2 \right\}\\ =&\frac{1}{N} \sum_{j = 1}^{J}\sigma_j^2 \left( \frac{\sigma_j^2}{N_j} \sum_{i = 1}^{N_j} \left( e_{ij}^2 - 1\right) + \left( \hat{\mu} - \mu \right)^2- 2\left( \hat{\mu} - \mu \right) \frac{\sigma_j}{N_j} \sum_{i = 1}^{N_j} e_{ij}\right)^2 \left(\sum_{i = 1}^{N_j} e_{ij}^2\right) \\ \lesssim &\frac{1}{N} \sum_{j = 1}^{J}\frac{\sigma_j^6}{N_j^2}\left( \sum_{i = 1}^{N_j} \left( e_{ij}^2 - 1\right) \right)^2 \left(\sum_{i = 1}^{N_j} e_{ij}^2\right)+\left( \hat{\mu} - \mu \right)^4\left(\frac{1}{N} \sum_{j = 1}^{J}\sigma_j^2\sum_{i = 1}^{N_j} e_{ij}^2\right) \\ &+4\left( \hat{\mu} - \mu \right)^2 \frac{1}{N} \sum_{j = 1}^{J}\frac{\sigma_j^4}{N_j^2} \left( \sum_{i = 1}^{N_j} e_{ij} \right)^2 \left(\sum_{i = 1}^{N_j} e_{ij}^2\right)\\ \equiv& R_{N1}^{B,1}+\left( \hat{\mu} - \mu \right)^4 R_{N1}^{B,2}+4 \left( \hat{\mu} - \mu \right)^2 R_{N1}^{B,3}. \end{align}\] For \(R_{N1}^{B,1}\), we rewrite it as \[R_{N1}^{B,1}=\frac{1}{N}\sum_{j=1}^J\frac{\sigma_j^6}{N_j^2}\sum_{i_1 = 1}^{N_j} \sum_{i_2 = 1}^{N_j}\sum_{i_3 = 1}^{N_j}\left( e_{i_1 j}^2 - 1\right)\left( e_{i_2 j}^2 - 1\right) e_{i_3 j}^2.\] Note that \(\mathrm{E}[( e_{i_1 j}^2 - 1)( e_{i_2 j}^2 - 1) e_{i_3 j}^2]=\mathrm{E}[(e^2-1)^2e^2]\mathbb{1}(i_1= i_2=i_3)+\mathrm{E}[(e^2-1)^2]\mathbb{1}(i_1= i_2\ne i_3)\). Then, \[\begin{align} \mathrm{E}\left[R_{N1}^{B,1}\right] &= \frac{1}{N}\sum_{j=1}^J\frac{\sigma_j^6}{N_j^2} \left(N_j\mathrm{E}\left[(e^2-1)^2e^2\right]+N_j(N_j-1)\mathrm{E}\left[(e^2-1)^2\right] \right)\\ &= O\left(\frac{J}{N}\right), \end{align}\] which implies \(R_{N1}^{B,1}=o_p(N^{-1/2})\). Similarly, for \(R_{N1}^{B,3}\), by \(\mathrm{E}[ e_{i_1 j} e_{i_2 j} e_{i_3 j}^2]=\mathrm{E}[e^4]\mathbb{1}(i_1= i_2)+\mathbb{1}(i_1= i_2\ne i_3)\), we have \[\mathrm{E}\left[R_{N1}^{B,3}\right]=\frac{1}{N}\sum_{j=1}^J\frac{\sigma_j^4}{N_j^2} \left(N_j\mathrm{E}\left[e^4\right]+N_j(N_j-1)\right)=O\left(\frac{J}{N}\right),\] which implies \(R_{N1}^{B,3} = o_p(N^{-1/2})\). For \(R_{N1}^{B,2}\), straightforward calculation leads to \[\mathrm{E}\left[R_{N1}^{B,2}\right] = \mathrm{E}\left[\frac{1}{N} \sum_{j = 1}^{J}\sigma_j^2\sum_{i = 1}^{N_j} e_{ij}^2\right]=\frac{1}{N} \sum_{j = 1}^{J}N_j\sigma_j^2=O(1),\] then \(R_{N1}^{B,2}=O_p(1)\). Along with 40 , we obtain \(R_{N1}^B = o_p ( N^{-1/2})\). Hence \(R_{N1} = o_p ( N^{-1/2})\). On the other hand, for \(R_{N2}\), we first note that \[\begin{align} \frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\hat{\varepsilon}_{i j}}{\hat{\sigma}_j}-\frac{\varepsilon_{i j}}{\sigma_j}\right)^{4} & \lesssim \frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\hat{\varepsilon}_{i j}-\varepsilon_{ij}}{\hat{\sigma}_j}\right)^{4}+\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\varepsilon_{i j}\left(\hat{\sigma}_j^2-\sigma_j^2\right)}{\hat{\sigma}_j \sigma_j(\hat{\sigma}_j+\sigma_j)}\right)^{4} \\ & \asymp \left( \hat{\mu} - \mu\right)^4 + \frac{1}{N} \sum_{j = 1}^J N_j \left( \hat{\sigma}_j^2 - \sigma_j^2 \right)^4 \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \varepsilon_{ij}^4 \right\} \\ & \asymp O_p \left(\frac{1}{N^2} \right) + o_p \left(\frac{1}{N} \right) = o_p \left(\frac{1}{N} \right), \end{align}\] since \[\begin{align} &\frac{1}{N} \sum_{j = 1}^J N_j \left( \hat{\sigma}_j^2 - \sigma_j^2 \right)^4 \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \varepsilon_{ij}^4 \right\} \\ \le& \left\{\max_{1\le j\le J}\left(\hat{\sigma}_j^2-\sigma_j^2\right)^2\right\}\left\{\frac{1}{N} \sum_{j = 1}^J N_j \left( \hat{\sigma}_j^2 - \sigma_j^2 \right)^2 \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \varepsilon_{ij}^4 \right\} \right\}\\ =&o_p\left(\frac{1}{N} \sum_{j = 1}^J N_j \left( \hat{\sigma}_j^2 - \sigma_j^2 \right)^2 \left\{ \frac{1}{N_j} \sum_{i = 1}^{N_j} \varepsilon_{ij}^4 \right\}\right)\\ =&o_p\left(\frac{1}{\sqrt{N}}\right), \end{align}\] where we use \(\hat{\sigma}^2_j-\sigma^2_j=o_p(1)\) from 41 , and the last step can be obtained through similar arguments above. Besides, \(N^{-1}\sum_{j=1}^{J} \sum_{i=1}^{N_{j}} \tilde{\varepsilon}_{i j}^2/\tilde{\sigma}_j^2=O_{p}(1)\) can be derived from the procedure in 29 analogously. Then we have \[\left\vert R_{N 2}\right\vert \lesssim\left[\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}} \frac{\tilde{\varepsilon}_{i j}^2}{\tilde{\sigma}_j^2}\right]^{1 / 2}\left[\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{N_{j}}\left(\frac{\hat{\varepsilon}_{i j}}{\hat{\sigma}_j}-\frac{\varepsilon_{i j}}{\sigma_j}\right)^{4}\right]^{1 / 2} = o_p \left( \frac{1}{\sqrt{N}}\right).\] Therefore, ?? is validated. This completes the proof of Theorem 7. \(\square\)

Proof of Theorem 8 and 9.

The proof of Theorem 8 is analogous to that of Theorem 2. Under \(H_1\), in order to obtain ?? , we just repeat the procedure in the proof of Theorem 7, and replace \(c_{1k}\) with \(d_{1k}\) in 46 , \(c_{2k}\) with \(d_{2k}\) in 47 . From ?? we know that \[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_k\left(\hat{Z}_{ij}\right) = \mathrm{E}\left[\pi_k (Z)\right] + o_p(1).\] Thus, \[\tilde{\Psi}_K^2=\left( \frac{1}{N}\sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) \right)^\top \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1} \left( \frac{1}{N}\sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) \right) \to_p \boldsymbol{a}_K^\top \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1}\boldsymbol{a}_K.\] Besides, \[\begin{align} &\sqrt{N} \left( \tilde{\Psi}_K^2 - \boldsymbol{a}_K^\top \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1}\boldsymbol{a}_K \right) \nonumber\\ =& \sqrt{N} \left\{\left[ \boldsymbol{a}_K + \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right)-\boldsymbol{a}_K\right]^\top \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1} \left[ \boldsymbol{a}_K + \frac{1}{N} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right)-\boldsymbol{a}_K\right] \right. \nonumber\\ &\left. \qquad- \boldsymbol{a}_K^\top \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1} \boldsymbol{a}_K \right\} \nonumber\\ =& 2a^\top \left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1} \frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) - \boldsymbol{a}_K \right) +\left[\frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) - \boldsymbol{a}_K \right)\right]^\top\nonumber\\ &\left(\sum_{j = 1}^J p_j \boldsymbol{\Omega}_K^{(j)} \right)^{-1}\left[\frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) - \boldsymbol{a}_K \right)\right]. \label{Psi95tilde94295derivation} \end{align}\tag{48}\] Under \(H_1\), by CLT, we have \[\begin{align} \frac{1}{\sqrt{N}}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\left(\boldsymbol{\pi}_K\left(\hat{Z}_{ij}\right)-\boldsymbol{a}_K\right) & =\sqrt{N}\left[\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_1\left(\hat{Z}_{ij}\right)-a_1,\ldots,\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\pi_K\left(\hat{Z}_{ij}\right)-a_K\right]^\top \\ &\to_d \sum_{j = 1}^J \sqrt{p_j} \boldsymbol{W}_j, \end{align}\] where \(\boldsymbol{W}_1, \ldots, \boldsymbol{W}_J\) are independently distributed as \[\boldsymbol{W}_j \sim \mathcal{N}_K \left( 0, \boldsymbol{\Lambda}_K^{(j)} \right),\] with \(\boldsymbol{\Lambda}_K^{(j)}=\left(\lambda_{kl}^{(j)}\right)_{K \times K}\) given by \[\begin{align} \lambda_{kl}^{(j)} =& \mathrm{E}\left\{ \left(\pi_k (Z)-\mathrm{E}\left[ \pi_k (Z)\right]\right) -d_{1k} \left( \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}} \right) \frac{\sigma_j e}{q_j} - \frac{d_{2k}}{2} \left( e^2 - 1 \right)\right\} \\ &\quad \left\{ \left(\pi_l (Z)-\mathrm{E}\left[ \pi_l (Z)\right]\right) - d_{1l}\left( \sum_{\ell = 1}^J \frac{ p_{\ell}}{\sigma_{\ell}} \right) \frac{\sigma_j e}{q_j} - \frac{d_{2l}}{2} \left( e^2 - 1 \right)\right\} \\ = &\mathrm{E}\left[ \pi_k (Z) \pi_l (Z) \right]-a_k a_l- \frac{ (d_{1k} d_{3l}+d_{1l} d_{3k}) \sigma_j}{q_j} \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}} + \frac{d_{1k}d_{1 l} \sigma_j^2}{q_j^2} \left( \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}}\right)^2 \\ &+ \frac{1}{2} \left[ a_k d_{2l} + a_l d_{2k} \right]-\frac{1}{2}\left[ d_{2k}d_{4l} + d_{2l}d_{4k}\right]+ \frac{ (d_{1k} d_{2l}+d_{1l} d_{2k}) \sigma_j}{2q_j} \sum_{\ell = 1}^J \frac{p_{\ell}}{\sigma_{\ell}}\mathrm{E}\left[e^3\right]\\ &+\frac{d_{2k} d_{2l} }{4} \left[\mathrm{E}\left[e^4\right]-1 \right]. \end{align}\] Therefore, \[\frac{1}{\sqrt{N}} \sum_{j = 1}^J \sum_{i = 1}^{N_j} \left( \boldsymbol{\pi}_K \left(\hat{Z}_{ij}\right) - \boldsymbol{a}_K \right) \to_d \mathcal{N}_K\left( \boldsymbol{0}, \sum_{j = 1}^J p_j \boldsymbol{\Lambda}_K^{(j)}\right),\] and the asymptotic distribution of \(\tilde{\Psi}_K^2\) in ?? can be derived from 48 through similar arguments above. This completes the proof of Theorem 8.

The proof of Theorem 9 can be referred to that of Theorem 3 due to the similar arguments and techniques used. To obtain the results in Theorem 9, following the proof of Theorem 8, it suffices to show that under \(H_{1L}\), \(\mathrm{E}[\pi_k (Z)]=\delta_N \Delta_k\), \(\mathrm{E}[\pi_k(Z)\pi_l(Z)]=\delta_{kl}+o(1)\), \(d_{1k}=c_{1k}+o(1)\), \(d_{2k}=c_{2k}+o(1)\), \(d_{3k}=c_{1k}+o(1)\), \(d_{4k}=c_{2k}+o(1)\), \(\mathrm{E}[e^3]=o(1)\), and \(\mathrm{E}[e^4]=3+o(1)\), which can be found in the proof of Theorem 3. \(\square\)

Proof of Theorem 10.

We focus on the first case in Theorem 10, and the latter two cases can be verified analogously. We primarily follow well-established steps in the literature, for example, the proof of Theorem 2 in [37], Lemma 1 in [38], and Propositions 6 and 7 in [39]. Note that \(\{\hat{K}=k\} \subseteq \{N\hat{\Psi}^2_k-k \log N\ge N\hat{\Psi}^2_{k^\prime}-k^\prime \log N\}\) for \(k^\prime\ne k\), then we have \[\mathrm{P}\left( \hat{K} = k \right) \le \mathrm{P}\left(N\hat{\Psi}^2_k-k \log N\ge N\hat{\Psi}^2_{k^\prime}-k^\prime \log N \right),\quad k^\prime\ne k. \label{K95hat95subadditive}\tag{49}\]

Under \(H_0\), in order to show \(\mathrm{P}(\hat{K}=1)\to 1\) as \(N\to \infty\), we first show the contrary such that \(\sum_{k=2}^{D}\mathrm{P}(\hat{K}=k)\to 0\). In fact, from 49 we have \[\begin{align} &\sum_{k = 2}^{D} \mathrm{P}\left( \hat{K} = k \right)\\ \le& \sum_{k = 2}^{D} \mathrm{P}\left( N\hat{\Psi}^2_k - k \log N \ge N\hat{\Psi}^2_1 - \log N \right) \\ \le& \sum_{k = 2}^{D} \mathrm{P}\left( N\hat{\Psi}^2_k - k \log N \ge - \log N \right)\\ =& \sum_{k = 2}^{D} \mathrm{P}\left( N\hat{\Psi}^2_k \ge (k-1)\log N \right). \end{align}\] From Corollary 1 we know that under \(H_0\), \(N\hat{\Psi}^2_k\to_d \chi^2_k\) for each \(k=1,\ldots,D\). Thus for \(k=2,\ldots,D\), \(\mathrm{P}( N\hat{\Psi}^2_k \ge (k-1)\log N ) \to 0\) as \(N\to \infty\). Therefore, \[\sum_{k = 2}^{D} \mathrm{P}\left( \hat{K} = k \right)\le\sum_{k = 2}^{D}\mathrm{P}\left( N\hat{\Psi}^2_k \ge (k-1)\log N \right) \to 0,\] as \(N\to \infty\), then \(\mathrm{P}( \hat{K} = 1 ) =1- \sum_{k = 2}^{D} \mathrm{P}( \hat{K} = k ) \to 1\), and \(N\hat{\Psi}^2_{\hat{K}}\to_d \chi^2_1\) follows.

Under \(H_1^\prime\), 49 yields that \[\begin{align} \mathrm{P}\left( \hat{K} = k \right) &\le \mathrm{P}\left(N\hat{\Psi}^2_k-k \log N\ge N\hat{\Psi}^2_{K_0}-K_0 \log N \right) \\ &= \mathrm{P}\left(\hat{\Psi}^2_k-k \frac{\log N}{N}\ge \hat{\Psi}^2_{K_0}-K_0 \frac{\log N}{N} \right). \end{align}\] From Theorem 2 we know that as \(N\to \infty\), \(\hat{\Psi}^2_k\to_p 0\) for \(k=1,\ldots, K_0-1\), \(\hat{\Psi}^2_{K_0} \to_p \boldsymbol{a}_{K_0}^\top \boldsymbol{\Sigma}_{K_0}^{-1} \boldsymbol{a}_{K_0}>0\) and \(\log N/N \to 0\). Thus \(\mathrm{P}( \hat{K} = k ) \to 0\) for \(k=1,\ldots, K_0-1\), and \[\lim_{N\to \infty} \mathrm{P}\left( \hat{K} \ge K_0 \right)=1-\lim_{N\to\infty} \sum_{k=1}^{K_0-1} \mathrm{P}\left( \hat{K} = k \right)=1.\] Then, for any \(x\in \mathbb{R}\), as \(N\to\infty\), \[\begin{align} \mathrm{P}\left(N\hat{\Psi}^2_{\hat{K}}\le x\right)&=\sum_{k=1}^D \mathrm{P}\left(N\hat{\Psi}^2_k\le x,\hat{K}=k \right)\\ &\le \sum_{k=1}^{K_0-1} \mathrm{P}\left( \hat{K} = k \right) + \sum_{k=K_0}^D \mathrm{P}\left(N\hat{\Psi}^2_k\le x\right)\to 0, \end{align}\] since \(\hat{\Psi}^2_k \to_p \boldsymbol{a}_k^\top \boldsymbol{\Sigma}_k^{-1} \boldsymbol{a}_k>0\) for \(k=K_0,\ldots,D\). Therefore, for any \(x\in \mathbb{R}\), \[\lim_{N\to\infty}\mathrm{P}\left(N\hat{\Psi}^2_{\hat{K}}\le x\right) = 0,\] which means the test is consistent against the alternatives given by 15 . \(\square\)

Intuition behind the revised limiting null distribution 16 .

We primarily follow the arguments in [35], [18], and [38], and provide a heuristic derivation of the approximation \(H(x)\) for the first case, as the latter two cases can be derived analogously. Note that \[\begin{align} \label{eq95H40x41} \mathrm{P}\left(N \hat{\Psi}_{\hat{K}}^2 \le x\right)=&\mathrm{P}\left(N \hat{\Psi}_1^2 \le x, \hat{K} = 1\right) + \mathrm{P}\left(N \hat{\Psi}_2^2 \le x, \hat{K} = 2\right)\nonumber\\ &+ \mathrm{P}\left(N \hat{\Psi}_{\hat{K}}^2 \le x, \hat{K} \ge 3\right). \end{align}\tag{50}\] The third term on the right-hand side of 50 , \(\mathrm{P}(N \hat{\Psi}_{\hat{K}}^2 \le x, \hat{K} \ge 3)\), can be neglected under \(H_0\) [30]. The event \(\{\hat{K}=1\}\) is approximated by \(\{N\hat{\Psi}^2_1-\log N \ge N\hat{\Psi}^2_2-2\log N\}=\{N(\hat{\Psi}^2_2-\hat{\Psi}^2_1)\le \log N\}\), and \(\{\hat{K}=2\}\) is approximated by \(\{N(\hat{\Psi}^2_2-\hat{\Psi}^2_1)> \log N\}\).

We first investigate the asymptotic distribution of \((N\hat{\Psi}^2_1, N(\hat{\Psi}^2_2-\hat{\Psi}^2_1))^\top\). The limiting distributions of \(N\hat{\Psi}^2_1,N\hat{\Psi}^2_2\) are functions of a bivariate normal vector \((R_1,R_2)^\top \sim \mathcal{N}_2(\boldsymbol{0},\boldsymbol{\Sigma}_2)\). We generally denote elements of \(\boldsymbol{\Sigma}_2\) as \(\begin{pmatrix} a & b\\ b & c \end{pmatrix}\) and \(\rho=b/\sqrt{ac}\). The distribution of \((R_1,R_2)^\top\) can be constructed through two independent standard normal variables \(G_1, G_2\) if we denote \(\tilde{R}_1= \sqrt{a}(\sqrt{1- \rho ^2}G_1+ \rho G_2)\) and \(\tilde{R}_2= \sqrt{c}G_2\), then \((\tilde{R}_1, \tilde{R}_2)^\top \overset{d}{=} (R_1,R_2)^\top\). Thus, \(N\hat{\Psi}^2_1\to_d R_1^2/a\overset{d}{=} \tilde{R}_1^2/ a= (\sqrt{1-\rho^2}G_1+\rho G_2)^2\). For \(N\hat{\Psi}_2^2\), straightforward computations yield that \[\begin{align} N\hat{\Psi}_2^2&\to_d \left(\tilde{R}_1, \tilde{R}_2\right)\boldsymbol{\Sigma}_2^{-1}\left(\tilde{R}_1, \tilde{R}_2\right)^\top \\ &=\frac{1}{ac-b^2} \left(\tilde{R}_1, \tilde{R}_2\right)\begin{pmatrix} c & -b\\ -b & a\end{pmatrix}\left(\tilde{R}_1, \tilde{R}_2\right)^\top\\ &=\frac{1}{ac-b^2}\left(c\tilde{R}_1^2-2b\tilde{R}_1\tilde{R}_2+a\tilde{R}_2^2\right)\\ &=G_1^2+G_2^2. \end{align}\] Then \(N(\hat{\Psi}_2^2-\hat{\Psi}_1^2)\to_d(\rho G_1- \sqrt {1- \rho ^2}G_2)^2\). Since \((\sqrt{1-\rho^2}G_1+\rho G_2,\rho G_1-\sqrt {1- \rho ^2}G_2)^\top \sim \mathcal{N}_2(\boldsymbol{0},\mathbf{I}_2)\), we obtain that \(N\hat{\Psi}^2_1\), \(N(\hat{\Psi}^2_2- \hat{\Psi}^2_1)\) are both asymptotically \(\chi_1^2\) distributed and are asymptotically independent.

Now we calculate the approximation \(H(x)\). We will treat \(H(x)\) separately for \(x\le \log N\), \(\log N<x<2\log N\) and \(x\geq2\log N\). For \(x\le \log N\), note that \[\mathrm{P}\left(N\hat{\Psi}^2_2\le x,\hat{K}=2\right)\approx\mathrm{P}\left(N\hat{\Psi}^2_2\le x,N\hat{\Psi}^2_2-N\hat{\Psi}^2_1>\log N\right)=0,\] by \(N\hat{\Psi}^2_1\geq0\). Thus, we have the approximation \[\begin{align} &\mathrm{P}\left(N\hat{\Psi}^2_1\le x, N\hat{\Psi}^2_2- N\hat{\Psi}^2_1\le \log N\right)\\ \approx&\left[ 2\Phi \left(\sqrt{x}\right) - 1\right] \left[ 2\Phi \left(\sqrt{\log N}\right)-1\right]\equiv H(x), \quad x\le \log N, \end{align}\] from the asymptotic \(\chi_1^2\) distribution and asymptotic independence of \(N\hat{\Psi}^2_1\), \(N(\hat{\Psi}^2_2- \hat{\Psi}^2_1)\). For \(x\geq2\log N\), we have \[\begin{align} &\mathrm{P}\left(N\hat{\Psi}^2_2\le x,\hat{K}=2\right)\\ \approx&\mathrm{P}\left(N\hat{\Psi}^2_2\le x,N\hat{\Psi}^2_2-N\hat{\Psi}^2_1\ge\log N\right)\\ =&\mathrm{P}\left(N\hat{\Psi}^2_2-N\hat{\Psi}^2_1\ge\log N\right)-\mathrm{P}\left(N\hat{\Psi}^2_2>x,N\hat{\Psi}^2_2-N\hat{\Psi}^2_1\ge\log N\right). \end{align}\] As \(N(\hat{\Psi}^2_2-\hat{\Psi}^2_1)\) is approximately \(\chi_1^2\) distributed, we obtain \[\begin{align} &\mathrm{P}\left(N\hat{\Psi}^2_2-N\hat{\Psi}^2_1\ge\log N\right)\\ \approx& 2\left(1-\Phi\left(\sqrt{\log N}\right)\right)\approx 2\frac{\phi\left(\sqrt{\log N}\right)}{\sqrt{\log N}}=\sqrt{\frac{2}{\pi N\log N}}, \end{align}\] by the fact that \(t\phi(t)/(t+1) \le1-\Phi (t) \le \phi(t)/t\) for \(t\to \infty\). Besides, \[\begin{align} &\mathrm{P}\left(N\hat{\Psi}^2_2>x,N\hat{\Psi}^2_2-N\hat{\Psi}^2_1\ge\log N\right)\\ \le&\mathrm{P}\left(N\hat{\Psi}^2_2>x\right)\\ \le&\mathrm{P}\left(N\hat{\Psi}^2_2>2\log N\right)\\ \approx&\exp\left\{-\frac{1}{2}\times 2\log N\right\}\\ =&\frac{1}{N}, \end{align}\] where we use the distribution function of the limiting \(\chi^2_2\) for \(N\hat{\Psi}^2_2\). Hence \(\mathrm{P}(N\hat{\Psi}^2_2-N\hat{\Psi}^2_1\ge\log N)\) converges to 0 much slower than \(\mathrm{P}(N\hat{\Psi}^2_2>x,N\hat{\Psi}^2_2-N\hat{\Psi}^2_1\ge\log N)\), and thus the latter can be neglected. Therefore, for \(x\geq2\log N\), we have the approximation \[\begin{align} &\mathrm{P}\left(N\hat{\Psi}^2_1\le x,N\hat{\Psi}^2_2-N\hat{\Psi}^2_1\le\log N\right)+\mathrm{P}\left(N\hat{\Psi}^2_2-N\hat{\Psi}^2_1\ge\log N\right)\\ \approx&\left[2\Phi\left(\sqrt{x}\right)-1\right]\left[2\Phi\left(\sqrt{\log N}\right)-1\right]+2\left[1-\Phi\left(\sqrt{\log N}\right)\right]\\ \equiv& H(x),\quad x\geq2\log N. \end{align}\] For \(\log N<x < 2\log N\), [30] suggested the linearization as follows \[H(x)=H(\log N)+\frac{x-\log N}{\log N}[H(2\log N)-H(\log N)],\quad\log N<x<2\log N.\] This completes the derivation of the approximation 16 for the null distribution of the test statistic \(N\hat{\Psi}^2_{\hat{K}}\). \(\square\)

Implement issues in simulation studies.

In our simulation study, we adopt the orthonormal Legendre polynomials on \([0,1]\) for \(\{\pi_k\}_{k = 0}^{\infty}\). The properties of orthonormal Legendre polynomials lead to the following expressions for constants \(c_{1k}\) and \(c_{2k}\): \[\begin{align} c_{1k} &= \frac{\sqrt{2 k + 1}}{2^k} \, \sum_{j = 0}^{[k / 2]} (-1)^j \binom{k}{j} \binom{2(k - j)}{k} \int_0^1 (2 z - 1)^{k - 2j} \, \Phi^{-1}(z) \, \mathrm{d}z \\ & = \sqrt{2 k + 1} \, \sum_{j = 0}^{[k / 2]} \left( \frac{-1}{4} \right)^j \binom{k}{j} \binom{2(k - j)}{k} \, \int_{-1/2}^{1/2} x^{k - 2j} \, \Phi^{-1} \left( x + \frac{1}{2}\right) \, \mathrm{d}z, \end{align}\] and \[c_{2k} = \sqrt{2 k + 1} \, \sum_{j = 0}^{[k / 2]} \left( \frac{-1}{4} \right)^j \binom{k}{j} \binom{2(k - j)}{k} \, \int_{-1/2}^{1/2} x^{k - 2j} \, \left[\Phi^{-1} \left( x + \frac{1}{2}\right) \right]^2 \, \mathrm{d}z.\] Note that \(c_{2k}=0\) when \(k\) is odd and \(c_{1k}=0\) when \(k\) is even. Therefore, in our experiments, we only compute the non-zero elements based on the above two formulae for \(k=1,\ldots, K\) with \(K\le 5\).

Additional simulation results.

The results of Experiment \(^\prime\) are reported in Tables 14, 15, 16 and Figure 5. The main findings are broadly consistent with those in Experiment , with two additional observations. First, Table 14 indicates that the limiting \(\chi^2_1\) distribution for the data-driven test yields valid control of the Type error. Second, the empirical distribution of \(\hat{K}\) exhibits increased stability under both the null and the alternative hypotheses, with instances where it degenerates to a single value with probability one. Both phenomena are due to the increase in total sample size resulting from the larger number of groups, which is consistent with the theoretical results.

Table 14: Empirical rejection rates under \(H_0\) in Experiment \(^\prime\), for fixed \(K\) (\(K = 1,\ldots,5\)), the data-driven test with limiting \(\chi^2_1\) null distribution (\(\hat{K}\&\chi^2_1\)), and the data-driven test with approximated null distribution \(H(x)\) (\(\hat{K}\&H(x)\)). The empirical power under \(H_1\) equals 1 across all settings.
\(m\)	\(K = 1\)	\(K = 2\)	\(K = 3\)	\(K = 4\)	\(K = 5\)	\(\hat{K}\&\chi^2_1\)	\(\hat{K}\&H(x)\)
10	0.028	0.046	0.062	0.036	0.054	0.060	0.048
20	0.046	0.066	0.078	0.052	0.068	0.056	0.052
30	0.050	0.056	0.044	0.040	0.048	0.044	0.042
40	0.054	0.054	0.042	0.040	0.044	0.058	0.052
50	0.052	0.062	0.052	0.056	0.054	0.048	0.042
60	0.062	0.050	0.060	0.048	0.048	0.052	0.046
70	0.068	0.052	0.052	0.054	0.044	0.058	0.054
80	0.052	0.034	0.044	0.062	0.052	0.048	0.046
90	0.048	0.048	0.034	0.050	0.058	0.042	0.040
100	0.064	0.048	0.052	0.048	0.052	0.058	0.054
110	0.044	0.048	0.058	0.058	0.032	0.060	0.056
120	0.052	0.058	0.066	0.044	0.052	0.080	0.078
130	0.046	0.050	0.038	0.058	0.062	0.058	0.052
140	0.058	0.046	0.032	0.054	0.046	0.056	0.050
150	0.046	0.040	0.060	0.050	0.054	0.038	0.038

Table 15: Empirical frequency of \(\hat{K}\) under \(H_0\) in Experiment \(^\prime\)
\(m\)	\(\hat{K}=1\)	\(\hat{K}=2\)
10	0.994	0.006
20	1	0
30	0.992	0.008
40	0.994	0.006
50	0.988	0.012
60	0.994	0.006
70	0.998	0.002
80	0.992	0.008
90	0.996	0.004
100	0.994	0.006
110	0.996	0.004
120	0.998	0.002
130	0.990	0.010
140	0.994	0.006
150	0.998	0.002

Table 16: Empirical frequency of \(\hat{K}\) under \(H_1\) in Experiment \(^\prime\)
\(m\)	\(\hat{K}=4\)	\(\hat{K}=5\)
10	0.120	0.880
20	0.024	0.976
30	0	1
40	0	1
50	0	1
60	0	1
70	0	1
80	0	1
90	0	1
100	0	1
110	0	1
120	0	1
130	0	1
140	0	1
150	0	1

Figure 5: The sample means and error bars of \hat{K} in Experiment ^\prime — Figure 5: The sample means and error bars of \(\hat{K}\) in Experiment \(^\prime\)

References↩︎

[1]

Gelman, A. (2005). Analysis of variance: Why it is more important than ever. , 33: 1–53.

[2]

Scheffé, H. (1959). . John Wiley & Sons, New York.

[3]

Miller Jr, R. G. (1997). . CRC press.

[4]

Wickens, T. D. and Keppel, G. (2004). . Pearson Prentice-Hall Upper Saddle River, NJ.

[5]

Dean, A., Voss, D., and Draguljić, D. (2017). . Springer.

[6]

Hirotsu, C. (2017). . John Wiley & Sons.

[7]

Montgomery, D. C. (2017). . John wiley & sons.

[8]

Ali, M. M. and Sharma, S. C. (1996). Robustness to nonnormality of regression f-tests. , 71: 175–205.

[9]

Pearson, E. S. (1931). The analysis of variance in cases of non-normal variation. , pages 114–133.

[10]

Gayen, A. (1950). The distribution of the variance ratio in random samples of any size drawn from non-normal universes. , 37: 236–255.

[11]

David, F. and Johnson, N. (1951a). A method of investigating the effect of nonnormality and heterogeneity of variance on tests of the general linear hypothesis. , pages 382–392.

[12]

David, F. N. and Johnson, N. (1951b). The effect of non-normality on the power function of the f-test in the analysis of variance. , 38: 43–57.

[13]

Srivastava, A. (1959). Effect of non-normality on the power of the analysis of variance test. , 46: 114–122.

[14]

Atiqullah, M. (1962). The estimation of residual variance in quadratically balanced least-squares problems and the robustness of the f-test. , 49: 83–91.

[15]

Tiku, M. (1964). Approximating the general non-normal variance-ratio sampling distributions. , 51: 83–95.

[16]

Donaldson, T. S. (1968). Robustness of the f-test to errors of both kinds and the correlation between the numerator and denominator of the f-ratio. , 63: 660–676.

[17]

Tiku, M. L. (1971). Power function of the f-test under non-normal situations. , 66: 913–916.

[18]

Janic-Wróblewska, A. and Ledwina, T. (2000). Data driven rank test for two-sample problem. , 27: 281–297.

[19]

Bonett, D. G. and Woodward, J. A. (1990). Testing residual normality in the anova model. , 17: 383–387.

[20]

Bonett, D. G. and Seier, E. (2002). A test of normality with high uniform power. , 40: 435–445.

[21]

Hwang, Y.-T. and Wei, P. F. (2006). A novel method for testing normality in a mixed model of a nested classification. , 51: 1163–1183.

[22]

Neyman, J. (1937). Smooth test for goodness of fit. , 1937: 149–199.

[23]

Bera, A. K. and Ghosh, A. (2002). Neyman’s smooth test and its applications in econometrics. , 165: 170–230.

[24]

Thas, O. (2010). . Springer, New York.

[25]

Lehmann, E. and Romano, J. P. (2022). . Springer, New York.

[26]

Bera, A. K., Ghosh, A., and Xiao, Z. (2013). A smooth test for the equality of distributions. , 29: 419–446.

[27]

Song, X. and Xiao, Z. (2022). On smooth tests for the equality of distributions. , 38: 194–208.

[28]

Ledwina, T. (1994). Data-driven version of neyman’s smooth test of fit. , 89: 1000–1005.

[29]

Kallenberg, W. C. and Ledwina, T. (1995a). Consistency and monte carlo simulation of a data driven version of smooth goodness-of-fit tests. , pages 1594–1608.

[30]

Kallenberg, W. C. M. and Ledwina, T. (1995b). On data driven neyman’s tests. , 15: 409–426.

[31]

Inglot, T. and Ledwina, T. (1996). Asymptotic optimality of data-driven neyman’s tests for uniformity. , 24: 1982–2019.

[32]

Inglot, T., Kallenberg, W. C., and Ledwina, T. (1997). Data driven smooth tests for composite hypotheses. , 25: 1222–1250.

[33]

Kallenberg, W. C. and Ledwina, T. (1997). Data-driven smooth tests when the hypothesis is composite. , 92: 1094–1104.

[34]

Kallenberg, W. C. and Teresa, L. (1997). Data driven smooth tests for composite hypotheses comparison of powers. , 59: 101–121.

[35]

Kallenberg, W. C. and Ledwina, T. (1999). Data-driven rank tests for independence. , 94: 285–301.

[36]

Ducharme, G. R. and Fontez, B. (2004). A smooth test of goodness-of-fit for growth curves and monotonic nonlinear regression models. , 60: 977–986.

[37]

Ducharme, G. R. and Lafaye de Micheaux, P. (2004). Goodness-of-fit tests of normality for the innovations in arma models. , 25: 373–395.

[38]

Kraus, D. (2007). Data-driven smooth tests of the proportional hazards assumption. , 13: 1–16.

[39]

Duchesne, P., Lafaye De Micheaux, P., and Tagne Tatsinkou, J. (2016). Estimating the mean and its effects on neyman smooth tests of normality for arma models. , 44: 241–270.

[40]

Ma, Y., Mazumdar, M., and Memtsoudis, S. G. (2012). Beyond repeated-measures analysis of variance: advanced statistical methods for the analysis of longitudinal data in anesthesia research. , 37: 99–105.

[41]

Schober, P. and Vetter, T. R. (2018). Repeated measures designs and analysis of longitudinal data: if at first you do not succeed—try, try again. , 127: 569–575.

[42]

Langenberg, B., Helm, J. L., and Mayer, A. (2022). Repeated measures anova with latent variables to analyze interindividual differences in contrasts. , 57: 2–19.

[43]

Van der Vaart, A. W. (2000). . Cambridge university press.

Department of Business Statistics and Econometrics, Guanghua School of Management, Peking University, Beijing, 100871, China. E-mail: 2201111026@stu.pku.edu.cn.↩︎
Department of Business Statistics and Econometrics, Guanghua School of Management, Peking University, Beijing, 100871, China. E-mail: sxj@gsm.pku.edu.cn.↩︎
Department of Economics, University of California, San Diego, La Jolla, 92092, USA. E-mail: h8wei@ucsd.edu.↩︎
Here under \(H_{1L}\), the asymptotic normality of \(\sqrt{N}(\hat{\mu}-\mu)\) and \(\sqrt{N}(\hat{\sigma}^2-\sigma^2)\) are both derived from CLT for triangular arrays since the distribution of \(e_{ij}\) changes over \(N\). It can be verified that the conditions of Lindeberg–Feller CLT hold for both random sequences, see, for example, Proposition 2.27 of [43] .↩︎

Smooth Tests for Normality in ANOVA