Abstract

The analysis of platform trials can be enhanced by utilizing non-concurrent controls. Since including this data might also introduce bias in the treatment effect estimators if time trends are present, methods for incorporating non-concurrent controls adjusting for time have been proposed. However, so far their behavior has not been systematically investigated in platform trials that include interim analyses. To evaluate the impact of a futility interim analysis in trials utilizing non-concurrent controls, we consider a platform trial featuring two experimental arms and a shared control, with the second experimental arm entering later. We focus on a frequentist regression model that uses non-concurrent controls to estimate the treatment effect of the second arm and adjusts for time using a step function to account for temporal changes. We show that performing a futility interim analysis in Arm 1 may introduce bias in the point estimation of the effect in Arm 2, if the regression model is used without adjustment, and investigate how the marginal bias and bias conditional on the first arm continuing after the interim depend on different trial design parameters. Moreover, we propose a new estimator of the treatment effect in Arm 2, aiming to eliminate the bias introduced by both the interim analysis in Arm 1 and the time trends, and evaluate its performance in a simulation study. The newly proposed estimator is shown to substantially reduce the bias and type I error rate inflation while leading to power gains compared to an analysis using only concurrent controls.

1 Introduction

Recent years have seen an increased demand for innovative clinical trial designs to accelerate drug development and optimize the use of resources [1]. Platform trials provide a framework for investigating multiple treatment arms simultaneously, while allowing new promising arms that become available during the trial to enter later on [2], [3]. Typically, all treatment arms are compared to a shared control arm, which leads to a reduction in the required sample size compared to conducting separate clinical trials. Moreover, interim analyses are often included to enable an early decision on the efficacy or futility of the treatment arms and further acceleration of the drug development process [4]. For late-entering arms, the shared control arm is split into two groups: the concurrent (CC) and non-concurrent controls (NCC), where the first group refers to patients randomized to the control arm while the evaluated arm was also open for randomization, and the latter to patients allocated to the control arm before the evaluated arm joined the platform. In recent years, it has been critically discussed whether and how the non-concurrent controls should be used for the analysis of late-entering arms, since direct pooling of concurrent and non-concurrent controls may lead to biased treatment effect estimates and type I error rate inflation due to temporal drifts [5], [6]. Such temporal drifts may result from, for instance, changes in the patient population, standard of care treatment, or seasonal effects [7].

Several approaches aiming to utilize non-concurrent controls while ensuring valid statistical inference have recently been proposed. In particular, Lee and Wason [8] and Bofill Roig et al. [9] considered incorporating the NCC data using a regression model with categorical time adjustment, including the periods in the trial as a fixed effect, where periods are defined as time intervals bounded by any experimental arm entering or leaving the platform. This approach was investigated in a simple platform trial with two treatment arms and it was shown that this model yields unbiased treatment effect estimates if the time trends are equal across all arms and additive on the model scale [9]. Krotka et al. [10] examined the performance of this model in more complex platform trials with an arbitrary number of treatment arms and extended the frequentist methodology for incorporating non-concurrent controls by proposing more flexible methods, such as spline regression or mixed models. Another frequentist method proposed by Marschner and Schou [11] uses a network meta-analysis approach to analyze the platform trial as a network of direct randomized comparisons and indirect non-randomized comparisons. Methods based on propensity score weighting for incorporating NCC in platform trials with time trends under the framework of causal inference were considered by Guo et al. [12]. Among Bayesian approaches considered for utilizing NCC data are the Bayesian Time Machine [13] and the meta-analytic-predictive (MAP) prior approach [14], [15]. The Time Machine approach uses a Bayesian hierarchical model to smooth the control response over calendar time intervals. The MAP prior approach borrows data from the non-concurrent periods to obtain the prior distribution for the control response in the concurrent periods, while accounting for the between-trial heterogeneity. 
Several frequentist and Bayesian methods for incorporating NCC data were recently compared in a comprehensive simulation study [16]. Interim analyses in platform trials were discussed by Greenstreet et al. [17] in the context of a multi-stage design that allows for the late entry of new arms in a pre-planned way. They proposed an approach for performing interim analyses pre-specified in the planning phase, while controlling the family-wise error rate. However, this approach does not incorporate non-concurrent controls, nor does it account for potential time trends. Including interim analyses in platform trial designs that aim to utilize non-concurrent controls has not yet been discussed in the literature.

In group sequential trials, which allow for early termination of the trial due to efficacy or futility, it is known that the standard maximum likelihood estimates are generally no longer unbiased [18], [19]. This results from early stopping if more extreme results are observed in interim analyses. Many authors have proposed adjusted estimators to eliminate this bias [20], [21]. Aiming to reduce the mean bias, Whitehead [22] considered an adjustment strategy in which a new estimator is constructed from the original maximum likelihood estimator by subtracting the estimate of its bias. The resulting estimator is referred to as the mean adjusted estimator (MAE). This estimator offers a balanced trade-off in settings where equal importance is given to both bias and residual mean squared error, computed marginally as well as conditionally on stopping in a given stage [20]. Further improvement in the efficiency of unbiased estimators is often of interest. Emerson and Fleming proposed a derivation of a uniform minimum variance unbiased estimator (UMVUE) by applying the Rao-Blackwell theorem to an unbiased estimate of the treatment effect calculated from the data from the first stage [23]. The Rao-Blackwell theorem states that the variance of an unbiased estimator of an unknown parameter can be improved by conditioning on a sufficient test statistic for this parameter. Furthermore, according to the Lehmann-Scheffé theorem, if the considered test statistic is both sufficient and complete, the resulting Rao-Blackwellized estimator is a unique minimum variance unbiased estimator of the unknown parameter. Later, it was shown that the test statistic considered by Emerson and Fleming, assumed to be sufficient and complete, is indeed sufficient, but unfortunately not complete [24]. 
However, it was demonstrated that the proposed Rao-Blackwellized estimator is still UMVUE in a class of estimators that can be constructed in case of early stopping without requiring any knowledge of future analyses [20], [21], [24]. Another class of unbiased estimators is the so-called median unbiased estimators (MUE), which are constructed such that the probability of overestimating the true value of the unknown parameter is the same as the probability of underestimating it. In order to derive a median unbiased estimator in group sequential trials, a sample space ordering has to be chosen upfront, as the resulting MUE depends on this ordering. MUEs based on the stage-wise ordering [23], likelihood ratio ordering [25], or the score test ordering [26] can be constructed. A recent review by Grayling and Wason [20] provides a detailed overview of nine point estimators proposed for group sequential designs and compares them within a common framework for a two-stage group sequential trial. The authors argue that the optimal estimator depends on which operating characteristics one wishes to optimize in a given trial, and on the relative importance of their marginal and conditional values.

Even though the bias in treatment effect estimates in classical group sequential trials has been well studied, the impact of conducting an interim analysis on the effect estimation in a platform trial utilizing non-concurrent controls in the analysis has not yet been investigated. In this work, we aim to examine how, in platform trials that evaluate multiple experimental treatments, the effect estimation in late-entering arms is affected by interim results in earlier arms. For simplicity, we focus on a two-arm platform trial, where the second arm enters the trial later on, and an interim analysis for Arm 1 is performed at the time when Arm 2 is added. Such a timing of the interim analysis enables monitoring of the ongoing trial at the time of addition of the new arm [17]. We examine the previously proposed model-based approach for including non-concurrent controls that includes time as a fixed effect [8], [9]. Focusing on platform trials with continuous endpoints, we describe how the weight of non-concurrent controls included in the estimation of the treatment effect of Arm 2 depends on the interim result in Arm 1, and show that applying the current regression model, as considered by Lee and Wason [8] and Bofill Roig et al. [9], in group sequential platform trials leads to biased treatment effect estimators and type I error rate inflation. Following the idea from Whitehead [22], we propose a mean adjusted estimator to mitigate this bias. We derive the analytical expression for the bias introduced due to the interim analysis enabling futility stopping, and investigate the performance of the newly proposed estimator in terms of the bias, root mean squared error, type I error rate, and power in a simulation study. The extensions to efficacy stopping are discussed in the Discussion Section.

The remainder of this paper is structured as follows: Section 2 describes the considered design setting and reviews the current model-based approach that includes periods as a fixed effect. In Section 3, we explore the marginal and conditional bias that may arise when using the current regression model without any adjustments in a platform trial with an interim analysis. In Section 4, we propose a treatment effect estimator that mitigates the bias due to interim analysis and consider the corresponding hypothesis test using this newly proposed estimator. In Section 5, we investigate the performance of the proposed estimators using simulations and compare them with the current methods. We conclude the paper with a discussion in Section 6.

2 Design Setting

Consider a platform trial with \(2\) treatment arms (indexed by \(k=1,2\)) and a common control group (\(k=0\)), where Arm 1 starts at the beginning of the trial and Arm 2 enters the ongoing trial later on. Both arms finish the trial at the same time. The trial is split into two periods, divided by the time point where Arm 2 joins the platform. For Arm 1, an interim analysis is performed, with the possibility of dropping this arm due to futility. It is assumed that this interim analysis is conducted at the time point of adding Arm 2. The second arm is only assessed in the final analysis at the end of the trial, after the total sample size is reached. In this final analysis, we aim to compare the efficacy of Arm 2 with the shared control group, i.e., including the NCC data for this arm. In this work, we consider platform trials with a continuous endpoint. The considered design is illustrated in Figure 1.

Before discussing trials involving an interim analysis, we review the model-based approach for analyzing platform trials using non-concurrent controls without interim analyses [8], [9], and describe the estimation of the treatment effect in Arm 2 compared to the shared control. Subsequently, we aim to adjust this estimate for an interim analysis conducted in Arm 1.

Figure 1: The considered platform trial design with an interim analysis for Arm 1 at the time point where Arm 2 joins the platform.

2.1 Regression model with period adjustment in trials without interim analysis

For the final analysis of Arm 2, we use a frequentist regression model that is fitted using all available data from the trial and adjusts for time trends by including the factor period as a categorical covariate [8], [9]. This model is defined as follows:

\[\label{eq95freqmodel} E(y_j) = \eta_0 + \sum_{k=1,2} \theta_k \cdot I(k_j = k) + \tau \cdot I(s_j=2)\tag{1}\]

where \(y_j\) is the continuous response of patient \(j\) (\(j = 1, \ldots, N\)), with \(N\) denoting the total sample size in the trial. The intercept \(\eta_0\) represents the control response in period 1, \(\theta_k\) denotes the treatment effect of arm \(k\), and \(\tau\) denotes the stepwise effect of period 2. The response variance, denoted by \(\sigma^2\), is common to all arms and is assumed to be known.

Let \(\bar y_{ks}\) denote the sample mean in arm \(k\) and period \(s\). The regression model estimates the effect of the treatment Arm 2 as the difference between the sample mean from this treatment arm and the model-based estimate of the control response in period 2. This estimator is then given by:

\[\label{eq95trt295estimate} \tilde{\theta}_2 = \bar{y}_{22} - \tilde{y}_{02}\tag{2}\]

The model-based estimate of the control response in period 2, \(\tilde{y}_{02}\), is a weighted average of the mean of the concurrent controls and the mean of the non-concurrent controls, where the latter is adjusted by the time trend estimated from Arm 1:

\[\label{eq95ctrl95estimate} \tilde{y}_{02} = (1- \varrho) \cdot \bar{y}_{02} + \varrho \cdot [ \bar{y}_{01} + \bar y_{12} - \bar y_{11} ]\tag{3}\]

The weight given to the non-concurrent controls is given by the factor \(\varrho\), which takes into account the sample sizes in each arm and period [9]:

\[\label{eq95rho} \varrho = \frac{ \frac{ 1 }{ n_{02} } }{ \frac{ 1 }{ n_{01} } + \frac{ 1 }{ n_{02} } + \frac{ 1 }{ n_{11} } + \frac{ 1 }{ n_{12} } }\tag{4}\] where \(n_{ks}\) denotes the sample size in arm \(k\) and period \(s\).

It was shown that, in trials without interim analyses, the estimator in (2) is an unbiased estimator of the treatment effect if the time trends in all arms are equal and additive on the model scale [9].
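As a minimal illustration of (2) to (4), the following Python sketch computes the weight \(\varrho\), the model-based control estimate \(\tilde{y}_{02}\), and \(\tilde{\theta}_2\) from the cell means. The function names, the dictionary layout keyed by (arm, period), and the numerical cell means are hypothetical choices made for this example only.

```python
def ncc_weight(n01, n02, n11, n12):
    """Weight rho given to the non-concurrent controls, Eq. (4)."""
    return (1 / n02) / (1 / n01 + 1 / n02 + 1 / n11 + 1 / n12)


def model_based_estimate(means, sizes):
    """Return (rho, control estimate, theta2 estimate) following Eqs. (2)-(4).

    means, sizes: dicts keyed by (arm, period), e.g. (0, 1) = control, period 1.
    """
    rho = ncc_weight(sizes[0, 1], sizes[0, 2], sizes[1, 1], sizes[1, 2])
    # Non-concurrent control mean shifted by the time trend seen in Arm 1.
    ncc_adjusted = means[0, 1] + means[1, 2] - means[1, 1]
    ctrl = (1 - rho) * means[0, 2] + rho * ncc_adjusted  # Eq. (3)
    return rho, ctrl, means[2, 2] - ctrl                 # Eq. (2)


# Hypothetical cell means with 150 patients per arm and period.
sizes = {(0, 1): 150, (1, 1): 150, (0, 2): 150, (1, 2): 150, (2, 2): 150}
means = {(0, 1): 0.00, (1, 1): 0.10, (0, 2): 0.20, (1, 2): 0.35, (2, 2): 0.50}

rho, ctrl, theta2 = model_based_estimate(means, sizes)
print(rho, ctrl, theta2)
```

With equal cell sizes the non-concurrent controls receive a weight of one quarter, so in this example the control estimate moves from the concurrent mean 0.20 towards the trend-adjusted non-concurrent value 0.25.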

3 Bias introduced by futility interim analysis in regression-based estimators

In this section, we investigate the bias of the model-based estimator (2) if it is applied in a platform trial where an interim analysis is performed for Arm 1.

Suppose that an interim analysis of Arm 1 is performed at the time when Arm 2 is added to the trial. This interim analysis is based on a one-sided z-test comparing the mean responses between Arm 1 and control in period 1, i.e., testing \(H_0: \theta_1=0\) vs \(H_1: \theta_1 > 0\). We consider stopping for futility if the one-sided \(p\)-value from the z-test is larger than a futility bound \(\alpha_1\). Note that when including an interim analysis for Arm 1, the treatment effect estimator for Arm 2 depends on the interim decision. If Arm 1 continues in the trial, the estimator given by (2) depends on the non-concurrent control data, since \(\varrho>0\) by (4). However, if Arm 1 stops at the interim analysis, \(n_{12}=0\) and therefore the weight \(\varrho\) of the non-concurrent control data becomes zero and the model does not utilize the NCC data. In this case, the treatment effect estimate of Arm 2 is \(\tilde{\theta}_2 = \bar y_{22} - \bar{y}_{02}\). Hence, the extent of borrowing from the NCC data and thus the model-based treatment effect estimator in the final analysis for Arm 2 is stochastically dependent on the interim treatment effect estimate for Arm 1. As a result, it is also dependent on the non-concurrent control data. Below, we derive the marginal and conditional bias of the model-based treatment effect estimator for Arm 2 for the platform trial design in Figure 1.

Assume there is a stepwise time trend with strength \(\lambda\), i.e., the mean response increases by \(\lambda\) at the beginning of period 2. Hence, the sample means in period 1, \(\bar y_{01}\) and \(\bar y_{11}\), follow normal distributions with means \(\mu_0\) and \(\mu_1\), respectively. In period 2, the sample means \(\bar y_{02}\), \(\bar y_{12}\), and \(\bar y_{22}\) follow normal distributions with means \(\mu_0 + \lambda\), \(\mu_1 + \lambda\), and \(\mu_2 + \lambda\), respectively. Note that the treatment effects of Arms 1 and 2, denoted by \(\theta_1 = \mu_1 - \mu_0\) and \(\theta_2 = \mu_2 - \mu_0\), respectively, are constant across the whole trial, since the time trend affects all arms equally. Denote by \(Z_{11} = (\bar y_{11} - \bar y_{01}) / (\sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}})\) the interim Z-statistic for Arm 1. The arm is stopped for futility if the interim \(p\)-value \(p_{11} = 1- \Phi(Z_{11})\) is larger than the futility bound \(\alpha_1\), that is, if \(Z_{11} < \Phi^{-1}(1-\alpha_1) = c_1\). The probability of stopping Arm 1 at the interim analysis is \(\mathbb{P}(Z_{11} < c_1) = \Phi(\gamma)\), with \(\gamma = c_1 - \theta_1/(\sigma\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}})\). Note that under the null hypothesis for Arm 1, this probability simplifies to \(1-\alpha_1\).
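The futility rule and the stopping probability \(\Phi(\gamma)\) can be evaluated directly. The sketch below uses only the Python standard library; the function names and the example values (150 patients per arm, \(\sigma = 1\), \(\alpha_1 = 0.3\)) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def futility_critical_value(alpha1):
    """c_1 = Phi^{-1}(1 - alpha_1); Arm 1 stops if Z_11 < c_1 (0 < alpha_1 < 1)."""
    return STD_NORMAL.inv_cdf(1 - alpha1)


def stopping_probability(theta1, sigma, n11, n01, alpha1):
    """P(Z_11 < c_1) = Phi(gamma), gamma = c_1 - theta_1 / (sigma * sqrt(1/n11 + 1/n01))."""
    se = sigma * sqrt(1 / n11 + 1 / n01)
    gamma = futility_critical_value(alpha1) - theta1 / se
    return STD_NORMAL.cdf(gamma)


# Under the null hypothesis for Arm 1 the stopping probability equals 1 - alpha_1.
p_stop_null = stopping_probability(theta1=0.0, sigma=1.0, n11=150, n01=150, alpha1=0.3)
# A positive effect in Arm 1 makes futility stopping less likely.
p_stop_alt = stopping_probability(theta1=0.25, sigma=1.0, n11=150, n01=150, alpha1=0.3)
print(p_stop_null, p_stop_alt)
```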

As shown in Appendix 7, \(\tilde{\theta}_2\) is biased. Its marginal bias, that is, the expected bias when averaging over both interim outcomes (Arm 1 stopping at the interim analysis or continuing to period 2), is given by:

\[\begin{align} \label{eq95bias95marg} E[ \tilde{\theta}_2 - \theta_2] = \varrho \cdot \sigma\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \cdot\phi \left( \gamma \right) \end{align}\tag{5}\]

where \(\phi(x)\) is the probability density function of the standard normal distribution evaluated at \(x\). Note that the bias does not depend on the strength of the time trend \(\lambda\). It is not caused by the time trend but arises from the interim analysis. Moreover, conditional on the event that Arm 1 stops, \(\tilde{\theta}_2\) is unbiased. This follows since, in this case, the treatment effect estimate in Arm 2 does not depend on the data from period 1. Hence, the bias of \(\tilde{\theta}_2\) conditional on the event that Arm 1 continues is given by the marginal bias (5) divided by the probability that Arm 1 continues:

\[\begin{align} \label{eq95bias95cond} E[ \tilde{\theta}_2 - \theta_2 | Z_{11} \ge c_1] = \frac{E[ \tilde{\theta}_2 - \theta_2]}{1-\Phi\left( \gamma \right)} \end{align}\tag{6}\]
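The expressions (5) and (6) can be checked numerically. The following sketch is a simplified Monte Carlo check, not the simulation study of this paper: it draws the five cell means directly from their normal distributions, assumes equal cell sizes (so that \(\varrho = 1/4\)), sets \(\mu_0 = 0\), and uses hypothetical function names and parameter values.

```python
import random
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()


def closed_form_bias(n, sigma, alpha1, theta1):
    """Marginal (5) and conditional (6) bias for equal cell sizes n (rho = 1/4)."""
    se_int = sigma * sqrt(2 / n)
    gamma = N01.inv_cdf(1 - alpha1) - theta1 / se_int
    marginal = 0.25 * se_int * N01.pdf(gamma)
    return marginal, marginal / (1 - N01.cdf(gamma))


def simulate_bias(n=150, sigma=1.0, alpha1=0.5, theta1=0.0, theta2=0.0,
                  lam=0.1, reps=100_000, seed=1):
    """Monte Carlo estimate of the marginal and conditional bias of theta2-tilde."""
    rng = random.Random(seed)
    c1 = N01.inv_cdf(1 - alpha1)
    se_cell, se_int, rho = sigma / sqrt(n), sigma * sqrt(2 / n), 0.25
    total_err, cont_err, n_cont = 0.0, 0.0, 0
    for _ in range(reps):
        y01 = rng.gauss(0.0, se_cell)            # control, period 1 (mu_0 = 0)
        y11 = rng.gauss(theta1, se_cell)         # Arm 1, period 1
        y02 = rng.gauss(lam, se_cell)            # control, period 2
        y22 = rng.gauss(theta2 + lam, se_cell)   # Arm 2, period 2
        if (y11 - y01) / se_int < c1:            # Arm 1 stops: separate analysis
            est = y22 - y02
        else:                                    # Arm 1 continues: Eqs. (2)-(3)
            y12 = rng.gauss(theta1 + lam, se_cell)
            est = y22 - ((1 - rho) * y02 + rho * (y01 + y12 - y11))
            cont_err += est - theta2
            n_cont += 1
        total_err += est - theta2
    return total_err / reps, cont_err / n_cont


marg_formula, cond_formula = closed_form_bias(150, 1.0, 0.5, 0.0)
marg_mc, cond_mc = simulate_bias(lam=0.1)
marg_mc_neg, _ = simulate_bias(lam=-0.1)  # same seed, opposite time trend
print(marg_formula, marg_mc, cond_formula, cond_mc)
```

Because the stepwise trend cancels in the estimator, rerunning with a different \(\lambda\) and the same seed changes the simulated bias only by floating-point rounding.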

The extent of this bias depends on: the weight of the NCC data when Arm 1 continues, \(\varrho\) (i.e., the preplanned sample sizes in Arm 1 and control); the futility boundary, \(\alpha_1\); and the standardized effect size in Arm 1, \(\theta_1 / (\sigma\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}})\). The model-based estimator is biased under both the null and alternative hypotheses for Arm 1. However, the bias does not depend on the treatment effect in Arm 2, \(\theta_2\). Note that under the null hypothesis for Arm 1 (\(\theta_1 = 0\)), (6) simplifies to:

\[\begin{align} \label{eq95bias95cond95null} E[ \tilde{\theta}_2 - \theta_2 | Z_{11} \ge c_1] = \varrho \cdot \sigma\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \cdot \frac{\phi \left( c_1 \right)}{\alpha_1} \end{align}\tag{7}\]

Figure 2: Marginal and conditional bias of the unadjusted model-based treatment effect estimator for Arm 2 (2) when varying different design parameters. In all cases, no treatment effect for Arm 1 is assumed (\(\theta_1=0\)), and a unit variance \(\sigma^2=1\) in each arm is used.
A) Varying the futility bound \(\alpha_1\). Sample sizes are set to 150 per arm and period.
B) Varying the sample sizes ratio between period 1 and period 2 \(r = n_{01}/n_{02} = n_{11}/n_{12}\). Sample sizes per arm in period 2 are fixed (\(n_{02} = n_{12} = 150\)). Futility bound \(\alpha_1 = 0.5\) is used.
C) Varying the allocation ratio in Arm 1 vs control \(a = n_{11}/n_{01} = n_{12}/n_{02}\). The sample size in the control arm is fixed to 150 in each period (\(n_{01} = n_{02} = 150\)). Futility bound \(\alpha_1 = 0.5\) is used.

Figure 2 shows the marginal (5) and conditional (6) bias when varying different design parameters of the considered platform trial. Figure 2-A shows the bias as a function of the futility bound \(\alpha_1\) when \(\theta_1 = 0\). While the conditional bias is decreasing in \(\alpha_1\), the marginal bias is maximized for \(\alpha_1 = 0.5\), i.e., when Arm 1 has a 50% probability of continuing after the interim analysis. Note that for \(\alpha_1=0\) and \(\alpha_1=1\), the treatment effect estimator \(\tilde{\theta}_2\) is marginally unbiased, since these futility bounds result either in always using the separate analysis (since Arm 1 always stops) or always using the model-based approach (since Arm 1 always continues), respectively. The behavior of the bias with respect to the sample size ratio between period 1 and period 2 is shown in Figure 2-B. In particular, the sample sizes per arm in period 2 are set to 150, and the ratio \(r = n_{01}/n_{02} = n_{11}/n_{12}\) is varied. Both the marginal and the conditional bias are maximized for \(r=1\), i.e., when the periods are equally sized. This is because for smaller period 1 sample sizes (\(r<1\)), the weight of the period 1 data from Arm 1 in the Arm 2 treatment effect estimate is smaller. On the other hand, for larger period 1 sample sizes (\(r>1\)), the period 1 estimates have smaller standard errors and therefore their conditional bias (conditional on the event that the trial continues) is smaller. The impact of the allocation ratio between Arm 1 and control on the bias is illustrated in Figure 2-C. Specifically, the sample size in the control arm is fixed in each period (\(n_{01} = n_{02} = 150\)), and the sample size in Arm 1 is varied using the allocation ratio \(a = n_{11}/n_{01} = n_{12}/n_{02}\). Note that the sample sizes per period are equal in each arm. Both the marginal and the conditional bias increase with the sample size in Arm 1 relative to the control sample size.
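The qualitative patterns described for the three panels can be reproduced by evaluating (4), (5), and (6) on a small grid. The sketch below is only an illustration: the grids for \(\alpha_1\), \(r\), and \(a\) are arbitrary choices and not necessarily the values plotted in Figure 2, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()


def bias(n01, n02, n11, n12, alpha1, theta1=0.0, sigma=1.0):
    """Return (marginal, conditional) bias of theta2-tilde per Eqs. (5) and (6)."""
    rho = (1 / n02) / (1 / n01 + 1 / n02 + 1 / n11 + 1 / n12)  # Eq. (4)
    se = sigma * sqrt(1 / n11 + 1 / n01)
    gamma = N01.inv_cdf(1 - alpha1) - theta1 / se
    marginal = rho * se * N01.pdf(gamma)
    return marginal, marginal / (1 - N01.cdf(gamma))


# Panel A: vary the futility bound, 150 patients per arm and period.
panel_a = {a1: bias(150, 150, 150, 150, a1) for a1 in (0.1, 0.3, 0.5, 0.7, 0.9)}

# Panel B: vary r = n01/n02 = n11/n12 with n02 = n12 = 150 and alpha_1 = 0.5.
panel_b = {r: bias(round(150 * r), 150, round(150 * r), 150, 0.5)
           for r in (0.2, 0.5, 1, 2, 5)}

# Panel C: vary a = n11/n01 = n12/n02 with n01 = n02 = 150 and alpha_1 = 0.5.
panel_c = {a: bias(150, 150, round(150 * a), round(150 * a), 0.5)
           for a in (0.5, 1, 2)}

for name, panel in (("A", panel_a), ("B", panel_b), ("C", panel_c)):
    for value, (marginal, conditional) in panel.items():
        print(name, value, round(marginal, 4), round(conditional, 4))
```

For the reference setting (150 patients per arm and period, \(\alpha_1 = 0.5\), \(\sigma = 1\)), the marginal bias is approximately 0.0115 and the conditional bias approximately 0.0230.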

4 An Estimator Adjusting for Bias Induced by the Interim Analysis

In this section, we introduce a bias-adjusted treatment effect estimator for Arm 2 to mitigate the bias induced by the interim analysis in Arm 1.

4.1 Mean adjusted estimator

To account for the bias due to the interim analysis for Arm 1, we consider a mean adjusted estimator (MAE) [22], constructed by subtracting the estimated conditional bias of \(\tilde{\theta}_2\) (conditional on the event that Arm 1 continues after the interim analysis), \(E[ \tilde{\theta}_2 - \theta_2 | Z_{11} \ge c_1]\), from the original model-based estimator, \(\tilde{\theta}_2\), in cases where Arm 1 continues after the interim analysis. The resulting bias-adjusted estimator of the treatment effect in Arm 2, \(\tilde{\theta}^A_2\), is given by:

\[\begin{align} \label{eq95MAE} \tilde{\theta}^A_2 = \begin{cases} \bar{y}_{22} - \bar{y}_{02}, & \text{if } Z_{11} < c_1, \\[6pt] \tilde{\theta}_2 - \widehat{B}\left[\tilde{\theta}_2 \mid Z_{11} \ge c_1\right], & \text{if } Z_{11} \ge c_1, \end{cases} \end{align}\tag{8}\]

with

\[\begin{align} \widehat{B}\left[\tilde{\theta}_2 \mid Z_{11} \ge c_1 \right]= \frac{\varrho \cdot \sigma\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \cdot \phi (\hat{\gamma})}{ 1 - \Phi (\hat{\gamma})}, \quad \text{where } \hat{\gamma} = c_1 - \frac{\hat{\theta}_1}{\sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}}} \end{align}\]

The term \(\widehat{B}[\tilde{\theta}_2 | Z_{11} \ge c_1]\) denotes the estimator of the conditional bias defined in (6). Note that we use the conditional bias of \(\tilde{\theta}_2\) for the adjustment, because the MAE is only employed in case Arm 1 continues, as there is no bias in the effect estimation for Arm 2 if Arm 1 stops at the interim.

The expression for the conditional bias (6) depends on the futility bound and the standardized treatment effect of Arm 1. While the sample sizes, futility bound, and response variance are considered as known parameters here, the treatment effect in Arm 1, \(\theta_1\), is an unknown parameter. However, given that the mean adjusted estimator is computed at the end of the trial, we can estimate \(\theta_1\) using the observed data. Therefore, in order to compute \(\widehat{B}[\tilde{\theta}_2 | Z_{11} \ge c_1]\), we can use a plug-in estimator for \(\theta_1\), denoted by \(\hat{\theta}_1\). In particular, we consider four approaches for estimating \(\theta_1\): (i) using the sample means of responses from Arm 1 and the control arm from the whole trial, denoted by \(\bar y_{1\cdot}\) and \(\bar y_{0\cdot}\) respectively, that is \(\hat{\theta}_1 = \bar y_{1\cdot} - \bar y_{0\cdot}\); (ii) using only the sample means of responses from period 1, that is \(\hat{\theta}_1 = \bar y_{11} - \bar y_{01}\); (iii) using only data from period 2, that is \(\hat{\theta}_1 = \bar y_{12} - \bar y_{02}\); and (iv) the conditional uniform minimum variance unbiased estimator (CUMVUE), denoted by \(\hat{\theta}_1^{CUMVUE}\), derived by applying the Rao-Blackwell theorem. The estimators (iii) and (iv) are unbiased estimators of \(\theta_1\) conditionally on continuing after the interim.
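A minimal sketch of the mean adjusted estimator (8) with the plug-in options (i) to (iii) is given below. The CUMVUE option (iv) is deliberately omitted, and the function names, option labels, dictionary layout keyed by (arm, period), and numerical cell means are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()


def plug_in_theta1(means, sizes, option):
    """Plug-in estimate of theta_1: options (i)-(iii) from the text."""
    if option == "whole_trial":  # (i) pooled means over both periods
        y1 = (sizes[1, 1] * means[1, 1] + sizes[1, 2] * means[1, 2]) / (sizes[1, 1] + sizes[1, 2])
        y0 = (sizes[0, 1] * means[0, 1] + sizes[0, 2] * means[0, 2]) / (sizes[0, 1] + sizes[0, 2])
        return y1 - y0
    if option == "period1":      # (ii)
        return means[1, 1] - means[0, 1]
    if option == "period2":      # (iii)
        return means[1, 2] - means[0, 2]
    raise ValueError(f"unknown option: {option}")


def mean_adjusted_estimate(means, sizes, sigma, alpha1, option="period2"):
    """Mean adjusted estimator of theta_2, Eq. (8)."""
    c1 = N01.inv_cdf(1 - alpha1)
    se_int = sigma * sqrt(1 / sizes[1, 1] + 1 / sizes[0, 1])
    z11 = (means[1, 1] - means[0, 1]) / se_int
    if z11 < c1:  # Arm 1 stopped for futility: concurrent controls only
        return means[2, 2] - means[0, 2]
    rho = (1 / sizes[0, 2]) / sum(1 / sizes[cell] for cell in ((0, 1), (0, 2), (1, 1), (1, 2)))
    ctrl = (1 - rho) * means[0, 2] + rho * (means[0, 1] + means[1, 2] - means[1, 1])
    gamma_hat = c1 - plug_in_theta1(means, sizes, option) / se_int
    bias_hat = rho * se_int * N01.pdf(gamma_hat) / (1 - N01.cdf(gamma_hat))
    return means[2, 2] - ctrl - bias_hat


# Hypothetical cell means, 150 patients per arm and period, alpha_1 = 0.5.
sizes = {(0, 1): 150, (1, 1): 150, (0, 2): 150, (1, 2): 150, (2, 2): 150}
means = {(0, 1): 0.00, (1, 1): 0.10, (0, 2): 0.20, (1, 2): 0.20, (2, 2): 0.50}
mae_continue = mean_adjusted_estimate(means, sizes, sigma=1.0, alpha1=0.5)

# Same trial, but Arm 1 looks worse than control at the interim and is dropped.
means_stop = dict(means)
means_stop[1, 1] = -0.10
mae_stop = mean_adjusted_estimate(means_stop, sizes, sigma=1.0, alpha1=0.5)
print(mae_continue, mae_stop)
```

In the continuing case the unadjusted model-based estimate is 0.325, and the adjustment subtracts the estimated conditional bias from it; in the stopping case the estimator reduces to \(\bar y_{22} - \bar y_{02}\).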

As shown by Grayling and Wason [20], the CUMVUE estimator for Arm 1 after period 2 can be constructed from the uniform minimum variance unbiased estimator (UMVUE), given by:

\[\begin{align} \label{eq95umvue} \hat{\theta}_1^{UMVUE} = (\bar y_{1\cdot} - \bar y_{0\cdot}) - \frac{I_2 - I_1}{I_2 \sqrt{I_1}} \cdot \frac{- \phi(c_1, Z_{12}\sqrt{I_1/I_2}, (I_2 - I_1)/I_2)}{1-\Phi(c_1, Z_{12} \sqrt{I_1/I_2}, (I_2 - I_1)/I_2)} \end{align}\tag{9}\]

where \(Z_{12} = (\bar y_{1\cdot} - \bar y_{0\cdot}) / (\sigma\sqrt{1/(n_{11} + n_{12}) + 1/(n_{01} + n_{02})})\) denotes the Z-statistic for Arm 1 at the end of the trial, and \(I_1 = 1/(\sigma^2(1/n_{11} + 1/n_{01}))\) and \(I_2 = 1/(\sigma^2(1/(n_{11} + n_{12}) + 1/(n_{01} + n_{02})))\) are the information levels after periods 1 and 2, respectively. The CUMVUE for \(\theta_1\) after period 2 is then constructed using (9) as follows:

\[\begin{align} \label{eq95cumvue} \hat{\theta}_1^{CUMVUE} = \frac{Z_{12} \cdot \sqrt{I_2} - I_1 \cdot \hat{\theta}_1^{UMVUE}}{ (I_2 - I_1)} \end{align}\tag{10}\]

4.1.1 Hypothesis testing

Now consider the problem of testing the null hypothesis of no treatment effect in Arm 2, \(H_{0,2}: \theta_2 = 0\), against the one-sided alternative \(H_{1,2}: \theta_2 > 0\). If Arm 1 continues after the interim analysis, we aim to utilize non-concurrent controls while adjusting for biases resulting from time trends and the interim analysis in Arm 1. Therefore, we propose using the mean adjusted estimator in (8) to define the Wald-type test statistic:

\[\begin{align} \label{eq95test95stat} T = \begin{cases} \tilde{\theta}^A_2 / \left(\sigma\sqrt{1/n_{02} + 1/n_{22}}\right), & \text{if } Z_{11} < c_1, \\[6pt] \tilde{\theta}^A_2 / \sqrt{\widehat{\text{Var}}(\tilde{\theta}^A_2 \mid Z_{11} \ge c_1)}, & \text{if } Z_{11} \ge c_1. \end{cases} \end{align}\tag{11}\]

Under the null hypothesis \(H_{0,2}\), the test statistic \(T\) follows the standard normal distribution. We reject \(H_{0,2}\) at the one-sided significance level \(\alpha\) if \(T > \Phi^{-1}(1-\alpha)\). Note that for the case where Arm 1 stops at the interim analysis, this is equivalent to the standard z-test.

To obtain an estimator of the variance of the mean adjusted estimator \(\tilde{\theta}^A_2\) in cases where Arm 1 continues, we employ the bootstrap. Specifically, the conditional variance \(\text{Var}(\tilde{\theta}^A_2 \mid Z_{11} \ge c_1)\) is estimated via bootstrap stratified per arm and period. Letting \(Y^{i*}_{ks}\) denote a response observation randomly drawn from arm \(k\) and period \(s\), we considered the following bootstrap algorithm:

1. Draw two independent bootstrap samples \(Y^{1 *}_{11}, \ldots, Y^{n_{11} *}_{11}\) and \(Y^{1 *}_{01}, \ldots, Y^{n_{01} *}_{01}\) from the period 1 observations in Arm 1 and control, respectively.
2. Compute the interim test statistic \(Z_{11}^* = (\bar y^*_{11} - \bar y^*_{01}) / \sqrt{\frac{\sigma^2}{n_{11}} + \frac{\sigma^2}{n_{01}}}\) using data from the bootstrap samples. Here, \(\bar y^*_{11}\) and \(\bar y^*_{01}\) denote the means of the bootstrap samples drawn in Step 1.
3. If \(Z_{11}^* < c_1\), discard this resample and restart at Step 1. Otherwise, continue to Step 4.
4. Draw two independent bootstrap samples \(Y^{1 *}_{12}, \ldots, Y^{n_{12} *}_{12}\) and \(Y^{1 *}_{02}, \ldots, Y^{n_{02} *}_{02}\) from the period 2 observations in Arm 1 and control, respectively.
5. Compute \(\tilde{\theta}_{2}^{A*}\) with the bootstrapped datasets from both periods.
6. Repeat Steps 1-5 until Arm 1 has continued after the interim analysis \(B\) times, yielding estimates \(\tilde{\theta}_{2,1}^{A*}, \ldots, \tilde{\theta}_{2, B}^{A*}\).
7. Compute the bootstrap variance of \(\tilde{\theta}^A_2\) as follows:

\[\widehat{\text{Var}} \left( \tilde{\theta}^A_2 \mid Z_{11} \ge c_1 \right) = \frac{1}{B} \sum_{j=1}^{B} \left( \tilde{\theta}_{2,j}^{A*} - \frac{1}{B} \sum_{l=1}^{B} \tilde{\theta}_{2,l}^{A*} \right)^2\]

Note that the bootstrap procedure mimics the course of the original trial, preserving the sample sizes per arm and period, as well as the futility stopping rule used at the interim analysis. In order to base the variance estimation on an equal number of bootstrap estimates (denoted by \(B\)), regardless of the futility bound \(\alpha_1\), we repeat the bootstrap procedure until we obtain \(B\) samples where Arm 1 has continued after the interim analysis.
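The bootstrap algorithm above can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements of the method: the Arm 2 observations are also resampled in Step 4 (the steps above only mention Arm 1 and control), the plug-in estimator (iii) from period 2 is used inside the MAE, and the data set, seeds, and function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

N01 = NormalDist()


def mae_continuing(data, sigma, c1):
    """MAE (8) for the case Z_11 >= c_1, using plug-in (iii) (assumption)."""
    m = {cell: fmean(ys) for cell, ys in data.items()}
    n = {cell: len(ys) for cell, ys in data.items()}
    rho = (1 / n[0, 2]) / (1 / n[0, 1] + 1 / n[0, 2] + 1 / n[1, 1] + 1 / n[1, 2])
    ctrl = (1 - rho) * m[0, 2] + rho * (m[0, 1] + m[1, 2] - m[1, 1])
    se_int = sigma * sqrt(1 / n[1, 1] + 1 / n[0, 1])
    gamma_hat = c1 - (m[1, 2] - m[0, 2]) / se_int
    bias_hat = rho * se_int * N01.pdf(gamma_hat) / N01.cdf(-gamma_hat)  # 1 - Phi(g) = Phi(-g)
    return m[2, 2] - ctrl - bias_hat


def bootstrap_variance(data, sigma, alpha1, B=1000, seed=0):
    """Bootstrap variance of the MAE, conditional on Arm 1 continuing."""
    rng = random.Random(seed)
    c1 = N01.inv_cdf(1 - alpha1)
    se_int = sigma * sqrt(1 / len(data[1, 1]) + 1 / len(data[0, 1]))

    def resample(ys):
        return rng.choices(ys, k=len(ys))  # draw with replacement, same size

    estimates = []
    while len(estimates) < B:                                     # Step 6
        star = {cell: resample(data[cell]) for cell in ((1, 1), (0, 1))}  # Step 1
        z_star = (fmean(star[1, 1]) - fmean(star[0, 1])) / se_int         # Step 2
        if z_star < c1:                                           # Step 3: discard
            continue
        for cell in ((1, 2), (0, 2), (2, 2)):                     # Step 4 (+ Arm 2, assumption)
            star[cell] = resample(data[cell])
        estimates.append(mae_continuing(star, sigma, c1))         # Step 5
    centre = fmean(estimates)
    return sum((e - centre) ** 2 for e in estimates) / B          # Step 7


# Hypothetical trial data: 150 patients per arm and period, sigma = 1.
gen = random.Random(42)
cell_means = {(0, 1): 0.0, (1, 1): 0.5, (0, 2): 0.1, (1, 2): 0.6, (2, 2): 0.4}
data = {cell: [gen.gauss(mu, 1.0) for _ in range(150)] for cell, mu in cell_means.items()}

var_hat = bootstrap_variance(data, sigma=1.0, alpha1=0.5, B=1000)
t_stat = mae_continuing(data, 1.0, N01.inv_cdf(0.5)) / sqrt(var_hat)  # Eq. (11), continuing case
print(var_hat, t_stat)
```

The rejection loop presupposes that Arm 1 continued in the observed trial; if the observed interim statistic lay far below \(c_1\), very few resamples would pass Step 3 and the loop would run for a long time.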

5 Simulation Study

5.1 Evaluated operating characteristics

We performed a simulation study to evaluate the properties of the proposed estimator and of the corresponding test statistic.

In addition to examining the bias of the new estimator, it is also crucial to investigate its variability, quantified in terms of the root mean squared error, since unbiased or bias-reducing estimators often come at the price of increased variance. An estimator that, despite being unbiased, has a substantially larger root mean squared error than another approach should not be considered for practical use [18]. Moreover, in group sequential designs, each operating characteristic can be considered in its marginal (overall) value, as well as conditional on stopping in a particular stage.

In the simulation study presented below, we therefore investigate the newly proposed estimator in terms of the bias and root mean squared error, and the corresponding hypothesis test in terms of type I error rate and statistical power. We discuss the marginal and conditional bias of both considered methods. For the remaining operating characteristics, we focus on their values conditional on Arm 1 continuing after the interim analysis, since the mean adjusted estimator is only employed in this case. Results for marginal operating characteristics can be found in the supplementary material. We compare the performance of the proposed mean adjusted estimate to the unadjusted model-based estimate and the separate analysis, which only considers data from period 2. All simulation scenarios were replicated 100,000 times to estimate the investigated operating characteristics. In the bootstrap algorithm for estimating the variance of \(\tilde{\theta}^A_2\), we used \(B=1000\) replicates.

5.2 Data generation and considered design parameters↩︎

We evaluate the performance of the proposed estimators in platform trials without time trends, as well as in trials that contain temporal drifts of different patterns and strengths.

Consider a platform trial with two experimental arms, as described in Section 2 and illustrated in Figure 1. The continuous outcome \(y_j\) for patient \(j\) is drawn from a normal distribution according to \(y_j \sim \mathcal{N}(\mu, \sigma^2)\) with \[\begin{gather} \mu = \eta_0 + \sum_{k=1,2} \theta_k \cdot I(k_j = k) + f(j) \quad \text{and} \quad \sigma^2 = 1 \end{gather}\]

where \(\eta_0\) and \(\theta_k\) are the response in the control arm and the effect of treatment \(k\), respectively, and \(k_j\) denotes the arm to which patient \(j\) is allocated. The function \(f(j)\) denotes the time trend with a magnitude parameterized by the parameter \(\lambda\). In particular, we consider the following time trend patterns:

Linear time trend: \(f(j) = \lambda \cdot \frac{j-1}{N-1}\), where \(N\) indicates the maximum sample size in the trial, assuming that Arm 1 continues to period 2. The mean response in arm \(k\) linearly increases with the slope \(\lambda\) over time.
Stepwise time trend: \(f(j) = \lambda \cdot I(s_j = 2)\), where \(s_j\) denotes the period for patient \(j\). Hence, there is a jump in the mean response of size \(\lambda\) when Arm 2 is added to the trial.
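The data-generating model and the two trend patterns above can be sketched in a few lines. The function names, the dictionary layout for the treatment effects, and the default \(\eta_0 = 0\) are assumptions for illustration; the source does not state the value of \(\eta_0\) or how patients are ordered within a period.

```python
import random

def time_trend(j, N, period, lam, pattern):
    """Time trend f(j) for patient index j (1-based) out of N patients."""
    if pattern == "linear":
        return lam * (j - 1) / (N - 1)      # rises from 0 to lam over the trial
    if pattern == "stepwise":
        return lam if period == 2 else 0.0  # jump of size lam when Arm 2 enters
    if pattern == "none":
        return 0.0
    raise ValueError(f"unknown pattern: {pattern}")

def draw_outcome(j, arm, period, N, theta, lam, pattern, eta0=0.0, rng=random):
    """Draw y_j ~ N(mu, 1) with mu = eta0 + theta_arm + f(j).

    theta: dict such as {1: 0.0, 2: 0.32}; the control arm (0) has effect 0.
    eta0 = 0 is an assumed control response, not a value given in the text.
    """
    mu = eta0 + theta.get(arm, 0.0) + time_trend(j, N, period, lam, pattern)
    return rng.gauss(mu, 1.0)
```

For example, `draw_outcome(400, arm=2, period=2, N=750, theta={1: 0.0, 2: 0.32}, lam=0.15, pattern="stepwise")` draws one period 2 outcome in Arm 2 under the alternative, where \(N = 750\) corresponds to 150 patients per arm and period.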

Table 1 summarizes the values of the design parameters that we consider in the simulation study. In particular, the futility bound \(\alpha_1\) varies from 0.1 to 0.95. We do not consider the extreme bounds of 0 and 1, as they result in either always conducting a separate analysis based only on the period 2 data or always using the regression model, and thus the analysis is marginally unbiased in these cases. The sample sizes per arm and period are varied in such a way that the impact of two ratios on the operating characteristics can be investigated. The first is the ratio between the sample sizes in periods 1 and 2, \(r = n_{01}/n_{02} = n_{11}/n_{12}\), which is varied from 1/15 to 10. The sample size per arm in period 1 can thus be expressed as a multiple of the sample size per arm in period 2, which is kept constant. The second is the allocation ratio between Arm 1 and the control arm, \(a = n_{11}/n_{01} = n_{12}/n_{02}\), which is also varied from 1/15 to 10, allowing us to examine the behavior of the methods under different allocation ratios. In particular, the sample size in Arm 1 is \(a\) times the sample size in the control arm, which is kept constant. The sample sizes in Arm 2 and the control arm in period 2 are always set to 150. The strength of the time trend \(\lambda\) is varied from \(-0.15\) to \(0.15\), i.e., considering both an increase and a decrease in the mean response over time.

Table 1: Considered values of the design parameters varied in the simulations and the resulting sample sizes. When varying one design parameter, the remaining ones were kept constant at the value highlighted in bold type. In total, 28 scenarios were considered under both the null and alternative hypotheses.
Design parameters:	Considered values:
Futility bound \(\alpha_1\)	0.1, 0.15, 0.2, 0.25, 0.35, 0.50, 0.65, 0.75, 0.95
Ratio \(r = n_{01}/n_{02} = n_{11}/n_{12}\)	1/15, 1/3, 1, 2, 4, 7, 10
Ratio \(a = n_{11}/n_{01} = n_{12}/n_{02}\)	1/15, 1/3, 1, 2, 4, 7, 10
Time trend strength \(\lambda\)	-0.15, -0.075, 0, 0.075, 0.15
Sample sizes considered fixed:
\(n_{02}\)	150
\(n_{22}\)	150
Sample sizes resulting from the considered ratios:
\(n_{01}\)	10, 50, 150, 300, 600, 1050, 1500
\(n_{11}\)	10, 50, 150, 300, 600, 1050, 1500
\(n_{12}\)	10, 50, 150, 300, 600, 1050, 1500
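The sample sizes in Table 1 follow directly from the two ratios. The sketch below recomputes them with exact fractions; the function name and the defaults \(r = a = 1\) (matching the figures that use 150 patients per arm and period) are illustrative choices.

```python
from fractions import Fraction

RATIOS = [Fraction(1, 15), Fraction(1, 3), 1, 2, 4, 7, 10]

GRID = {
    "alpha1": [0.1, 0.15, 0.2, 0.25, 0.35, 0.50, 0.65, 0.75, 0.95],
    "r": RATIOS,
    "a": RATIOS,
    "lambda": [-0.15, -0.075, 0, 0.075, 0.15],
}

def sample_sizes(r=Fraction(1), a=Fraction(1), n02=150, n22=150):
    """Per-arm, per-period sample sizes implied by r = n01/n02 and a = n11/n01 = n12/n02."""
    sizes = {
        "n01": r * n02,      # control, period 1
        "n11": a * r * n02,  # Arm 1, period 1
        "n12": a * n02,      # Arm 1, period 2
        "n02": Fraction(n02),
        "n22": Fraction(n22),
    }
    for name, value in sizes.items():
        if value.denominator != 1:
            raise ValueError(f"{name} = {value} is not an integer")
    return {name: int(value) for name, value in sizes.items()}

# One-at-a-time variation: the number of scenarios is the sum of the grid lengths.
n_scenarios = sum(len(values) for values in GRID.values())
```

Using `Fraction` avoids floating-point artifacts such as `150 * (1/15)` not being exactly 10.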

We focus on testing the null hypothesis \(H_{0,2}: \theta_2 = \mu_2 - \mu_0 = 0\) against the one-sided alternative \(H_{1,2}: \theta_2 > 0\) at the significance level \(\alpha=0.025\) using the proposed mean adjusted estimator \(\tilde{\theta}^A_2\), as well as the unadjusted model-based estimator \(\tilde{\theta}_2\). For comparison, we include results for the root mean squared error and statistical power obtained when using the separate analysis, regardless of the interim outcome. We consider scenarios where Arm 2 is under the null hypothesis (\(\theta_2 = 0\)), as well as cases where the alternative holds. In the latter case, the treatment effect for Arm 2 is set to \(\theta_2 = 0.32\) such that the separate analysis comparing Arm 2 to the concurrent controls leads to approximately 80% power using the given sample sizes. In the simulations, we only consider cases where the null hypothesis holds for Arm 1 (\(\theta_1=0\)). This is sufficient, as the results for Arm 2 depend on the probability of Arm 1 continuing to period 2, which is based on the combination of both the treatment effect and the rule used for the interim analysis.
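The choice \(\theta_2 = 0.32\) can be checked with a simple power calculation. The sketch below assumes a one-sided z-test with known \(\sigma = 1\), which approximates but need not coincide with the test used in the simulations; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def ztest_power(theta, n_trt, n_ctrl, sigma=1.0, alpha=0.025):
    """Power of a one-sided two-sample z-test for H0: theta = 0 vs. H1: theta > 0."""
    nd = NormalDist()
    se = sigma * math.sqrt(1 / n_trt + 1 / n_ctrl)  # standard error of the mean difference
    critical = nd.inv_cdf(1 - alpha)                # rejection threshold on the Z scale
    return 1 - nd.cdf(critical - theta / se)

power_separate = ztest_power(0.32, n_trt=150, n_ctrl=150)
```

With 150 patients in Arm 2 and 150 concurrent controls, `power_separate` is close to the targeted 80%, and setting `theta=0` returns the significance level.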

5.3 Results↩︎

We first examine the marginal and conditional bias, summarized in Figures 3 and 4. Figure 3 shows the marginal bias in the effect estimator for Arm 2 for varying futility bounds \(\alpha_1\) and under different time trend patterns. The bias is given for the unadjusted model-based estimator, as well as the proposed mean adjusted estimator with \(\theta_1\) estimated from both periods, only period 1, only period 2, or using the Rao-Blackwellized CUMVUE. As discussed in Section 3, the marginal bias of the unadjusted estimator \(\tilde{\theta}_2\) reaches its maximum for \(\alpha_1 = 0.5\), i.e., in cases where Arm 1 continues in the trial with 50% probability. The mean adjusted estimators provide a substantial reduction in the marginal bias. In particular, the MAEs with \(\theta_1\) estimated only from period 2 or using the CUMVUE even lead to a conservative effect estimator for Arm 2 and hence to a slightly negative bias. The MAEs with \(\hat{\theta}_1\) estimated from both periods or only period 1 still maintain some bias due to the inclusion of the period 1 data, which was used for the interim decision. This is more pronounced when the period 1 data are used for the estimation of \(\theta_1\) exclusively. The obtained bias is consistent across scenarios with no time trend, and linear or stepwise time trends with \(\lambda=0.15\). Hence, the mean adjusted estimator is robust to time trends, even if their shape is misspecified relative to the assumption of a stepwise pattern used for the derivation. This holds for all considered operating characteristics. Therefore, in the main paper, we only present results under no time trends and under linear time trends. The corresponding results for other time trends are presented in the supplementary material.

Figure 4 shows the bias conditional on Arm 1 continuing after the interim analysis, as a function of the futility bound \(\alpha_1\), the ratio between sample sizes in periods 1 and 2 \(r\), the ratio between sample sizes in Arm 1 and control \(a\), and the strength of the time trend \(\lambda\). Again, the conditional bias of the model-based estimator is reduced in all cases when using the mean adjusted estimator. This improvement is more pronounced in designs that use a lower futility bound at the interim analysis, since the conditional bias of the unadjusted estimator has the highest value in these cases (see Figure 4-A). The MAEs with \(\theta_1\) estimated from either both periods or only period 1 still show a positive conditional bias, which also decreases with increasing \(\alpha_1\). This is because with lower futility bounds, only more extreme samples in period 1, i.e., those with higher estimated treatment effects in Arm 1, will allow for continuation after the interim analysis compared to scenarios with higher futility bounds. This overestimation of \(\theta_1\) in period 1 leads to lower values of the estimated bias used for the computation of the MAEs, if the period 1 data is included. As a result, there is insufficient bias reduction and hence a remaining positive bias in the treatment effect estimation for Arm 2. On the other hand, the MAEs with \(\theta_1\) estimated only from period 2 and using the CUMVUE are slightly negatively biased. However, their performance is more robust to the varying \(\alpha_1\). For increasing sample sizes in period 1, while keeping the period 2 sample sizes constant, the MAE with \(\theta_1\) estimated only from period 2 becomes more negatively biased, and can even lead to a similar absolute conditional bias as the unadjusted model-based estimator (see Figure 4-B). 
In this case, the MAE with \(\theta_1\) estimated using the CUMVUE provides a substantial improvement, since \(\hat{\theta}_1^{CUMVUE}\) has a lower variance than \(\hat{\theta}_1\) computed from period 2 only. Moreover, the conditional bias of the MAEs with \(\theta_1\) estimated from period 2 or using the CUMVUE is not sensitive to the allocation ratio between Arm 1 and the control arm (see Figure 4-C), and does not depend on the strength of the time trend \(\lambda\) (see Figure 4-D).
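The mechanism behind the residual positive bias can be illustrated numerically: the conditional bias formula derived in Section 7 decreases in \(\theta_1\), so plugging in an overestimate of \(\theta_1\) yields too small a bias correction. The sketch below assumes that the bias estimate is obtained by substituting \(\hat{\theta}_1\) into that formula, and treats the weight \(\varrho\) as a given input; the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def plug_in_conditional_bias(theta1_hat, rho, n11, n01, alpha1, sigma=1.0):
    """Conditional bias formula evaluated at an estimate of theta_1."""
    nd = NormalDist()
    s = sigma * math.sqrt(1 / n11 + 1 / n01)        # SE of the period 1 mean difference
    gamma = nd.inv_cdf(1 - alpha1) - theta1_hat / s
    return rho * s * nd.pdf(gamma) / (1 - nd.cdf(gamma))

# Larger plug-in values of theta_1 give smaller estimated biases
# (rho = 0.5 is an arbitrary illustrative weight).
estimated = [plug_in_conditional_bias(t, 0.5, 150, 150, 0.25) for t in (-0.1, 0.0, 0.1, 0.2)]
```

The list `estimated` is strictly decreasing, which mirrors the argument above: continuation selects trials with high period 1 estimates of \(\theta_1\), and these in turn produce a smaller subtracted bias.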

Figure 5 shows the conditional root mean squared error (rMSE) of the newly proposed estimator compared to the model-based unadjusted estimator. In all scenarios, the adjusted estimators achieve a lower rMSE than the separate analysis using concurrent controls only. However, in most scenarios, the bias adjusted estimators based on the period 2 estimate of \(\theta_1\) and the CUMVUE have a larger rMSE than the unadjusted estimator. For all considered methods, the rMSE decreases with increasing sample sizes in period 1 (see Figure 5-B), which is due to the increase in the total sample size in the platform trial in these scenarios.

Next, we assess the properties of the corresponding hypothesis test. The conditional type I error rate is shown in Figure 6. The inflation of the conditional type I error rate caused by applying the unadjusted model-based estimator is substantially reduced when using the mean adjusted estimators. However, the MAEs computed using \(\hat{\theta}_1\) from period 1 or both periods still show some inflation. The MAEs with \(\theta_1\) estimated using the CUMVUE yield type I error rates within the simulation error in all considered scenarios. In contrast, the MAE with \(\hat{\theta}_1\) computed only from period 2 data leads to a strictly conservative test if the sample size in period 1 is much larger than in period 2 (see Figure 6-B). Figure 6-D demonstrates that the type I error rate for given sample sizes and futility bound is not dependent on the strength and pattern of the time trend.
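The width of the simulation error band referred to above can be approximated as follows. This sketch assumes a normal approximation to the binomial proportion centered at the nominal level, and that under \(\theta_1 = 0\) about a fraction \(\alpha_1\) of the 100,000 replicates continue past the interim analysis; the exact construction used for Figure 6 may differ, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def simulation_error_band(alpha1, n_sim=100_000, alpha=0.025, level=0.95):
    """Approximate band for a simulated conditional type I error rate."""
    n_cont = n_sim * alpha1  # expected number of replicates where Arm 1 continues (theta_1 = 0)
    se = math.sqrt(alpha * (1 - alpha) / n_cont)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return alpha - z * se, alpha + z * se

band_strict = simulation_error_band(0.1)    # few continuing replicates, wide band
band_lenient = simulation_error_band(0.95)  # many continuing replicates, narrow band
```

The band is wider for small \(\alpha_1\) because fewer replicates contribute to the conditional estimate.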

Figure 7 shows the conditional power of the model-based procedures and, for comparison, the separate analysis using only concurrent controls. The highest power is achieved by the unadjusted model-based estimator in all considered cases, which, however, is due to the bias in the treatment effect estimation and inflation in the type I error rate. Nevertheless, the mean adjusted estimators that control the type I error rate (based on \(\theta_1\) estimates from period 2 and using the CUMVUE) also lead to power gains compared to the separate analysis, as they make use of the non-concurrent control data. As expected, the conditional power increases for increasing period 1 sample sizes (keeping the period 2 sample size fixed) and increasing sample sizes in Arm 1 (see Figures 7-B and 7-C, respectively). Like the other operating characteristics, the conditional power does not depend on the strength of the time trend (see Figure 7-D).

Figure 3: Marginal bias in the treatment effect estimator for Arm 2 for varying futility bound (\(\alpha_1\)) using the unadjusted estimator and the mean adjusted estimator with \(\theta_1\) estimated from both periods, only period 1, only period 2, or using the CUMVUE. This figure corresponds to a platform trial with sample sizes per arm and period set to 150. The strength of the time trend is \(\lambda=0.15\) for both linear and stepwise patterns.

Figure 4: Conditional bias in the treatment effect estimator for Arm 2 using the unadjusted estimator \(\tilde{\theta}_2\) and the mean adjusted estimator \(\tilde{\theta}^A_2\) with \(\theta_1\) estimated from both periods, only period 1, only period 2, or using the CUMVUE.
A) Varying futility bound (\(\alpha_1\)). This figure corresponds to a platform trial without time trends, with sample sizes per arm and period set to 150.
B) Varying ratio between the sample sizes in period 1 and period 2 (\(r\)), where the period 2 sample sizes per arm are fixed. Sample sizes per arm in period 1 are determined by the chosen ratio \(r\), i.e., increase or decrease with respect to the period 2 sample sizes \(r\)-times. This figure corresponds to a platform trial without time trends, with sample sizes per arm in period 2 set to 150, using a futility bound \(\alpha_1=0.5\).
C) Varying ratio between the sample sizes in Arm 1 and control (\(a\)), where the control arm sample sizes per period are fixed. Sample sizes in Arm 1 per period are determined by the chosen ratio \(a\), i.e., increase or decrease with respect to the control arm sample sizes \(a\)-times. This figure corresponds to a platform trial without time trends, with sample sizes in the control arm per period set to 150, using a futility bound \(\alpha_1=0.5\).
D) Varying strength of the time trend (\(\lambda\)). This figure corresponds to a platform trial with a linear time trend of strength \(\lambda\), with sample sizes per arm and period set to 150, using a futility bound \(\alpha_1=0.5\).

Figure 5: Conditional root mean squared error of the treatment effect estimator for Arm 2 using the unadjusted estimator \(\tilde{\theta}_2\), the mean adjusted estimator \(\tilde{\theta}^A_2\) with \(\theta_1\) estimated from both periods, only period 1, only period 2, or using the CUMVUE, as well as using the separate analysis. See the legend of Figure 4 for the explanation of the subfigures A)-D).

Figure 6: Conditional type I error rate for Arm 2 using the unadjusted estimator \(\tilde{\theta}_2\) and the mean adjusted estimator \(\tilde{\theta}^A_2\) with \(\theta_1\) estimated from both periods, only period 1, only period 2, or using the CUMVUE. All figures include a dashed reference line for the nominal significance level of 0.025 and the simulation error is shown as a gray area representing the 95% confidence interval of the simulated type I error rate. Note that in Figure 6-A, this area is not constant over different values of \(\alpha_1\). This is because for lower values of \(\alpha_1\), the conditional type I error rate is estimated based on fewer samples, since Arm 1 has a lower probability to continue after the interim analysis in these cases. See the legend of Figure 4 for the explanation of the subfigures A)-D).

Figure 7: Conditional power for Arm 2 using the unadjusted estimator \(\tilde{\theta}_2\), the mean adjusted estimator \(\tilde{\theta}^A_2\) with \(\theta_1\) estimated from both periods, only period 1, only period 2, or using the CUMVUE, as well as using the separate analysis. See the legend of Figure 4 for the explanation of the subfigures A)-D).

6 Discussion↩︎

Existing methods for NCC inclusion have not accounted for interim analyses. We demonstrated that model-based approaches used to adjust for time trends in platform trials can introduce a positive bias and inflate type I error rates once futility interim analyses are included, even in the absence of actual time trends. In particular, conducting an interim analysis in one arm affects the treatment effect estimation in other arms when non-concurrent controls are used. This occurs because the weight of the non-concurrent controls in the model-based estimator depends on the interim result of other arms.

To address this challenge in a platform trial with two experimental arms and two periods, we proposed mean adjusted treatment effect estimators for Arm 2, derived from the original model-based estimator by subtracting its estimated bias. This estimator adjusts for time trends while accounting for the interim analysis in Arm 1. Because the bias-adjusted estimator for Arm 2 depends on an estimate of the treatment effect \(\theta_1\) of Arm 1, we considered several estimators of \(\theta_1\). In a simulation study, we showed that, among all considered options, using the CUMVUE to estimate \(\theta_1\) minimized the bias of the bias-adjusted estimator for Arm 2. With this choice, we did not observe a positive bias, i.e., an overestimation of the treatment effect, in any of the settings.

Moreover, we proposed an adjusted hypothesis test for Arm 2, using the Wald test based on the mean adjusted estimator. This test controls the type I error rate when using the CUMVUE to estimate \(\theta_1\) and yields power gains compared to a separate analysis using only concurrent controls, due to the increased sample size from incorporating non-concurrent controls. However, these power gains are notably smaller than those achieved by the regression model in settings without interim analyses. This is because the mean adjusted estimator accounts simultaneously for time trends and the interim analysis. When using non-concurrent controls, the potential gains in power must be weighed against the added complexity and reduced robustness to model assumptions. The proposed estimator is robust to time trends, even when their shape deviates from the assumed stepwise pattern. However, since the time trend adjustment relies on the assumptions of equal time trends across all arms and additivity of the time trends on the model scale, deviations from these assumptions may lead to bias and inflation of the type I error rate [9].

In this work, we focused on futility interim analyses, where the unadjusted model-based estimator that does not account for the interim analysis is positively biased. In contrast, with interim analyses allowing early stopping for efficacy, the unadjusted model-based estimator is, by symmetry, negatively biased. This occurs because the mean response in the non-concurrent controls is overestimated when Arm 1 continues after the interim analysis, leading to underestimation of the treatment effect in Arm 2. When both futility and efficacy stopping rules are applied, the marginal bias of the model-based estimator is influenced by both mechanisms, and its direction depends on the treatment effect in Arm 1. For small treatment effects in Arm 1, early stopping occurs predominantly for futility, resulting in a positive marginal bias in \(\tilde{\theta}_2\). For large treatment effects in Arm 1, early stopping occurs predominantly for efficacy, resulting in a negative marginal bias in \(\tilde{\theta}_2\). For intermediate effect sizes, the opposing impacts of futility and efficacy stopping may partially offset one another, reducing the magnitude of the bias.

If, in addition to the interim analysis for Arm 1, an interim analysis is also performed for Arm 2, this introduces an additional bias in the treatment effect estimate for Arm 2. An efficacy interim analysis for Arm 2 will also impact the type I error rate of hypothesis tests, requiring the use of adjusted boundaries.

We considered a two-arm platform trial to illustrate the bias introduced by including non-concurrent control data, which is caused by the dependence of the weight of the NCC data in the treatment effect estimator on the interim results of previous arms. In our example, the interim analysis for Arm 1 is performed at the time Arm 2 is added. This timing of the interim analysis is the most extreme scenario, as the impact of the interim analysis on the weight of the non-concurrent control data is maximized: if Arm 1 stops early, the NCC data are not used at all in the estimate. If the interim analysis were performed at a later time point, the weight of the non-concurrent controls would remain strictly positive even if Arm 1 stops.

Extending the proposed methodology to incorporate efficacy stopping, more general timings of interim analyses, trials with more than two experimental arms, and interim analyses for all arms is a topic for future research.

Additional results from the simulation study. (pdf)

The GitHub repository (https://github.com/pavlakrotka/NCC_InterimAnalysis) contains the R code to reproduce the results of the simulation study.

M.B.R., M.P. and P.K. conceived this research. P.K. prepared the initial draft of the text and performed the simulations. All authors discussed the results, provided comments and reviewed the manuscript.

This publication is supported by the predoctoral program AGAUR-FI ajuts (2024 FI-1 00401) Joan Oró, which is backed by the Secretariat of Universities and Research of the Department of Research and Universities of the Generalitat of Catalonia, as well as the European Social Plus Fund.

This work was supported by Grant PID2023-148033OB-C21 funded by MICIU/AEI/10.13039/501100011033 and by FEDER/UE; and the Departament d’Empresa i Coneixement de la Generalitat de Catalunya (Spain) under Grant 2021 SGR 01421 (GRBIO).

Marta Bofill Roig is a Serra Húnter Fellow.

The authors declare no potential conflict of interests.

7 Derivation of the bias of the model-based treatment effect estimator for Arm 2↩︎

Let \(\bar y_{ks}\) denote the sample mean in arm \(k\) and period \(s\). Assume there is a stepwise time trend in the platform trial, where the strength of the trend is parameterized by \(\lambda\), i.e., the mean response increases by \(\lambda\) at the onset of period 2. Hence, the sample means in period 1 \(\bar y_{01}\) and \(\bar y_{11}\) follow normal distributions with means \(\mu_0\) and \(\mu_1\), respectively. In period 2, the sample means \(\bar y_{02}\), \(\bar y_{12}\), and \(\bar y_{22}\) follow normal distributions with means \(\mu_0 + \lambda\), \(\mu_1 + \lambda\), and \(\mu_2 + \lambda\), respectively.

Denote by \(Z_{11} = (\bar y_{11} - \bar y_{01}) / \left(\sigma \sqrt{1/n_{11} + 1/n_{01}}\right)\) the Z-statistic from the interim analysis of Arm 1. The arm is dropped for futility if the p-value from the z-test is larger than the futility boundary \(\alpha_1\), i.e., if \(Z_{11} < \Phi^{-1}(1-\alpha_1) = c_1\). The probability that Arm 1 is stopped at the interim analysis is \(\text{I\kern-0.15em P}(Z_{11} < c_1) = \Phi(c_1 - \theta_1 / (\sigma\sqrt{1/n_{11} + 1/n_{01}}))\). To simplify the notation, define \(\gamma = c_1 - \theta_1 / (\sigma\sqrt{1/n_{11} + 1/n_{01}})\).
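The interim decision rule and the stopping probability \(\Phi(\gamma)\) can be written down directly. The following is a minimal sketch assuming a known \(\sigma\); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def futility_stop(ybar11, ybar01, n11, n01, alpha1, sigma=1.0):
    """Return True if Arm 1 is dropped for futility at the interim analysis."""
    se = sigma * math.sqrt(1 / n11 + 1 / n01)
    z11 = (ybar11 - ybar01) / se
    p_value = 1 - NormalDist().cdf(z11)  # one-sided p-value of the z-test
    return p_value > alpha1              # equivalent to z11 < c1

def stop_probability(theta1, n11, n01, alpha1, sigma=1.0):
    """P(Z11 < c1) = Phi(gamma), with gamma = c1 - theta1 / se."""
    nd = NormalDist()
    se = sigma * math.sqrt(1 / n11 + 1 / n01)
    gamma = nd.inv_cdf(1 - alpha1) - theta1 / se
    return nd.cdf(gamma)
```

Under \(\theta_1 = 0\), `stop_probability` reduces to \(1 - \alpha_1\), so a futility bound of \(\alpha_1 = 0.5\) stops Arm 1 in half of the trials.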

The marginal bias of \(\tilde{\theta}_2\) relative to \(\theta_2\) is defined as: \[\begin{align} E \left[ \tilde{\theta}_2 - \theta_2 \right] = E \left[ \tilde{\theta}_2 \right] - \theta_2 \end{align}\] Based on the result of the interim analysis, the estimator \(\tilde{\theta}_2\) is either given by the separate analysis (if \(Z_{11} < c_1\)), or by the regression model (if \(Z_{11} \ge c_1\)). Let \(I(\cdot)\) denote the indicator function. The expected value \(E \left[ \tilde{\theta}_2 \right]\) is then given by:

\[\begin{align} & E \left[\tilde{\theta}_2 \right] = E \left[ \left[ \underbrace{\bar{y}_{22} - \bar{y}_{02}}_{(1)} \right] \cdot I\{Z_{11} < c_1\} \right] + E \left[ \left[\underbrace{\bar{y}_{22} - (1-\varrho) \bar{y}_{02}}_{(2)} - \underbrace{\varrho (\bar{y}_{01} + \bar{y}_{12} - \bar{y}_{11})}_{(3)} \right] \cdot I\{ Z_{11} \ge c_1\} \right] && \end{align}\]

Since the sample means from period 2 \(\bar y_{02}\), \(\bar y_{12}\), and \(\bar y_{22}\) are independent of \(Z_{11}\), and \(\theta_2 = \mu_2 - \mu_0\) it follows:

\[\begin{align} & \text{(i)} \;E[(1) \cdot I\{Z_{11} < c_1\}] = ((\mu_2 + \lambda) - (\mu_0 + \lambda)) \cdot \Phi(\gamma) \\ & \text{(ii)} \;E[(2) \cdot I\{ Z_{11} \ge c_1\}] = (\mu_2 + \lambda) (1-\Phi(\gamma)) - (1- \varrho) (\mu_0 + \lambda) (1-\Phi(\gamma)) \\ & \text{(iii)} \;E[-(3) \cdot I\{ Z_{11} \ge c_1\}] = - \varrho (\mu_1 + \lambda) (1-\Phi(\gamma)) + \varrho \mu_1 (1-\Phi(\gamma)) - \varrho \mu_0 (1 - \Phi(\gamma)) + \varrho \cdot \text{Cov} \left[ (\bar y_{11} - \bar y_{01}), I\{ Z_{11} \ge c_1\} \right] && \end{align}\]

where in (iii) we have also used \(E \left[ (\bar y_{11} - \bar y_{01}) \cdot I\{ Z_{11} \ge c_1\} \right] = E \left[ \bar y_{11} - \bar y_{01} \right] \cdot E \left[ I\{ Z_{11} \ge c_1\} \right] + \text{Cov} \left[ (\bar y_{11} - \bar y_{01}), I\{ Z_{11} \ge c_1\} \right]\).

The expression (i) + (ii) + (iii) simplifies to:

\[\begin{align} E \left[\tilde{\theta}_2 \right] = \theta_2 + \varrho \cdot \text{Cov} \left[ (\bar y_{11} - \bar y_{01}), I\{ Z_{11} \ge c_1\} \right] && \end{align}\]

Note that a covariance between a continuous variable \(Y\) and a binary variable \(X\), which takes the value 1 with probability \(p\), i.e., \(E[X]=p\), can be written as \(\text{Cov}[X, Y] = p(1-p) \cdot (E[Y \mid X=1] - E[Y \mid X=0])\). Moreover, note that \(I (Z_{11} \ge c_1) = I \left( \bar y_{11} - \bar y_{01} \ge c_1 \cdot \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \right)\). Hence, we can rewrite the covariance \(\text{Cov} \left[ (\bar y_{11} - \bar y_{01}), I\{ Z_{11} \ge c_1\} \right]\) as follows:

\[\begin{align} & E \left[\tilde{\theta}_2 \right] = \theta_2 \;+ \\ & \varrho \cdot (1-\Phi(\gamma)) \cdot \Phi(\gamma) \cdot \left(E \left[\bar y_{11} - \bar y_{01} \mid \bar y_{11} - \bar y_{01} \ge c_1 \cdot \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \right] - E \left[ \bar y_{11} - \bar y_{01} \mid \bar y_{11} - \bar y_{01} < c_1 \cdot \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \right] \right) && \end{align}\]

Since \(\bar y_{11} - \bar y_{01} \sim \mathcal{N} \left( \theta_1, \sigma^2 \left( \frac{1}{n_{01}} + \frac{1}{n_{11}} \right) \right)\), we can use the following formulae for the expectations of the truncated normal distribution: Let \(X \sim N(\mu, \sigma^2)\), then \(E[ X \mid X > a] = \mu + \sigma \frac{\phi((a - \mu) / \sigma)}{1 - \Phi((a - \mu) / \sigma)}\) and \(E[ X \mid X < b] = \mu - \sigma \frac{\phi((b - \mu) / \sigma)}{\Phi((b - \mu) / \sigma)}\). Therefore, it follows:

\[\begin{align} & E \left[\tilde{\theta}_2 \right] = \theta_2 \;+ \\ & \varrho \cdot \left(1 - \Phi(\gamma)\right) \cdot \Phi (\gamma) \cdot \left( \left(\theta_1 +\sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \cdot \frac{\phi \left(\frac{c_1 \cdot \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} - \theta_1}{\sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}}} \right)}{1 - \Phi \left(\frac{c_1 \cdot \sigma\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} - \theta_1}{\sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}}} \right)} \right) - \left( \theta_1 - \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \cdot \frac{\phi \left(\frac{c_1 \cdot \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} - \theta_1}{\sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}}} \right)}{\Phi \left( \frac{c_1 \cdot \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} - \theta_1}{\sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}}} \right) }\right) \right) && \end{align}\]
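To make the final simplification explicit, note that every argument of \(\phi\) and \(\Phi\) in the expression above reduces to \(\gamma\) and that the two \(\theta_1\) terms cancel. Writing \(s = \sigma \sqrt{1/n_{11} + 1/n_{01}}\) as a shorthand introduced only for this step, the factor multiplying \(\varrho\) becomes:

\[\begin{align} \left(1 - \Phi(\gamma)\right) \cdot \Phi(\gamma) \cdot \left( \frac{s \, \phi(\gamma)}{1 - \Phi(\gamma)} + \frac{s \, \phi(\gamma)}{\Phi(\gamma)} \right) = s \, \phi(\gamma) \cdot \left( \Phi(\gamma) + 1 - \Phi(\gamma) \right) = s \, \phi(\gamma) \end{align}\]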

Hence, the bias of \(\tilde{\theta}_2\) relative to \(\theta_2\) is given by:

\[\begin{align} E \left[ \tilde{\theta}_2 - \theta_2 \right] = \varrho \cdot \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \cdot \phi \left( \gamma \right) \end{align}\]

The expression for the conditional bias in equation 6 follows from the definition of the conditional expectation of a continuous random variable \(X\) conditional on an event \(A\):

\[\begin{align} E[X \mid A] = \frac{E[X \cdot I(A)]}{\text{I\kern-0.15em P}(A)} \end{align}\]

Thus, the expectation of \(\tilde{\theta}_2\) conditional on \(Z_{11} \ge c_1\) is defined as follows:

\[\begin{align} E[\tilde{\theta}_2 \mid Z_{11} \ge c_1] = \frac{E \left[ \left[\bar{y}_{22} - (1-\varrho) \bar{y}_{02} - \varrho (\bar{y}_{01} + \bar{y}_{12} - \bar{y}_{11}) \right] \cdot I\{ Z_{11} \ge c_1\} \right]}{\text{I\kern-0.15em P}(Z_{11} \ge c_1)} && \end{align}\]

Through analogous calculations as we used for the unconditional expectation, we get the following result for the conditional bias:

\[\begin{align} E \left[ \tilde{\theta}_2 - \theta_2 \mid Z_{11} \ge c_1 \right] = \varrho \cdot \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \cdot \frac{\phi \left( \gamma \right)}{1 - \Phi \left( \gamma \right)} \end{align}\]

Equation 7 follows from substituting 0 for the treatment effect \(\theta_1\):

\[E \left[ \tilde{\theta}_2 - \theta_2 \mid Z_{11} \ge c_1 \right]_{\theta_1 = 0} = \varrho \cdot \sigma \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{01}}} \cdot \frac{\phi \left( c_1 \right)}{\alpha_1}\]
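The closed-form expressions above are straightforward to evaluate and to check by simulation. The sketch below treats the weight \(\varrho\) as a given input (its definition appears earlier in the paper), assumes a known \(\sigma\), and uses hypothetical function names; the Monte Carlo routine only checks the covariance identity \(\text{Cov}[\bar y_{11} - \bar y_{01}, I\{Z_{11} \ge c_1\}] = \sigma \sqrt{1/n_{11} + 1/n_{01}} \cdot \phi(\gamma)\) and is not the authors' simulation code.

```python
import math
import random
from statistics import NormalDist

ND = NormalDist()

def _se(n11, n01, sigma):
    return sigma * math.sqrt(1 / n11 + 1 / n01)

def marginal_bias(rho, n11, n01, alpha1, theta1=0.0, sigma=1.0):
    """Marginal bias: rho * se * phi(gamma)."""
    s = _se(n11, n01, sigma)
    gamma = ND.inv_cdf(1 - alpha1) - theta1 / s
    return rho * s * ND.pdf(gamma)

def conditional_bias(rho, n11, n01, alpha1, theta1=0.0, sigma=1.0):
    """Bias conditional on Arm 1 continuing: rho * se * phi(gamma) / (1 - Phi(gamma))."""
    s = _se(n11, n01, sigma)
    gamma = ND.inv_cdf(1 - alpha1) - theta1 / s
    return rho * s * ND.pdf(gamma) / (1 - ND.cdf(gamma))

def mc_covariance(n11, n01, alpha1, theta1=0.0, sigma=1.0, n_draws=200_000, seed=1):
    """Monte Carlo estimate of Cov[ybar11 - ybar01, I{Z11 >= c1}]."""
    rng = random.Random(seed)
    s = _se(n11, n01, sigma)
    c1 = ND.inv_cdf(1 - alpha1)
    diffs = [rng.gauss(theta1, s) for _ in range(n_draws)]     # ybar11 - ybar01
    cont = [1.0 if d >= c1 * s else 0.0 for d in diffs]        # continuation indicator
    mean_d = sum(diffs) / n_draws
    mean_c = sum(cont) / n_draws
    return sum((d - mean_d) * (c - mean_c) for d, c in zip(diffs, cont)) / n_draws
```

For \(\theta_1 = 0\), `marginal_bias` is largest at \(\alpha_1 = 0.5\) over the futility bounds in Table 1, and `conditional_bias` equals \(\varrho \cdot \sigma \sqrt{1/n_{11} + 1/n_{01}} \cdot \phi(c_1) / \alpha_1\), in line with the last equation above.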

References↩︎

[1]

Elias Laurin Meyer, Peter Mesenbrink, Cornelia Dunger-Baldauf, Hans Jürgen Fülle, Ekkehard Glimm, Yuhan Li, Martin Posch, and Franz Koenig. Clinical Therapeutics, 42 (7): 1330–1360, 2020. ISSN 0149-2918.

[2]

Franz Koenig, Cécile Spiertz, Daniel Millar, Sarai Rodrı́guez-Navarro, Núria Machı́n, Ann Van Dessel, Joan Genescà, Juan M Pericàs, Martin Posch, Adrian Sánchez-Montalva, et al. . Eclinicalmedicine, 67, 2024.

[3]

Janet Woodcock and Lisa M. LaVange. . New England Journal of Medicine, 377 (1): 62–70, 2017. ISSN 0028-4793. URL https://www.nejm.org/doi/10.1056/NEJMra1510062.

[4]

Derek C. Angus, Brian M. Alexander, Scott Berry, Meredith Buxton, Roger Lewis, Melissa Paoloni, Steven A.R. Webb, Steven Arnold, Anna Barker, Donald A. Berry, Marc J.M. Bonten, Mary Brophy, Christopher Butler, Timothy F. Cloughesy, Lennie P.G. Derde, Laura J. Esserman, Ryan Ferguson, Louis Fiore, Sarah C. Gaffey, J. Michael Gaziano, Kathy Giusti, Herman Goossens, Stephane Heritier, Bradley Hyman, Michael Krams, Kay Larholt, Lisa M. LaVange, Philip Lavori, Andrew W. Lo, Alex John London, Victoria Manax, Colin McArthur, Genevieve O’Neill, Giovanni Parmigiani, Jane Perlmutter, Elizabeth A. Petzold, Craig Ritchie, Kathryn M. Rowan, Christopher W. Seymour, Nathan I. Shapiro, Diane M. Simeone, Bradley Smith, Bradley Spellberg, Ariel Dora Stern, Lorenzo Trippa, Mark Trusheim, Kert Viele, Patrick Y. Wen, and Janet Woodcock. . Nature Reviews Drug Discovery, 18 (10): 797–807, 2019. ISSN 1474-1784. URL https://www.nature.com/articles/s41573-019-0034-3.

[5]

Marta Bofill Roig, Cora Burgwinkel, Ursula Garczarek, Franz Koenig, Martin Posch, Quynh Nguyen, and Katharina Hees. On the use of non-concurrent controls in platform trials: A scoping review. Trials, 24 (1): 408, 2023.

[6]

Lori E. Dodd, Boris Freidlin, and Edward L. Korn. . New England Journal of Medicine, 384 (16): 1572–1573, 2021. ISSN 0028-4793. .

[7]

U.S. Food and Drug Administration. Master protocols for drug and biological product development - guidance for industry. 2023. https://www.fda.gov/media/174976/download.

[8]

Kim May Lee and James Wason. BMC Medical Research Methodology, 20 (1): 1–12, 2020. ISSN 14712288. .

[9]

Marta Bofill Roig, Pavla Krotka, Carl-Fredrik Burman, Ekkehard Glimm, Stefan M Gold, Katharina Hees, Peter Jacko, Franz Koenig, Dominic Magirr, Peter Mesenbrink, et al. On model-based time trend adjustments in platform trials with non-concurrent controls. BMC medical research methodology, 22 (1): 1–16, 2022.

[10]

Pavla Krotka, Martin Posch, Mohamed Gewily, Günter Höglinger, and Marta Bofill Roig. Statistical modeling to adjust for time trends in adaptive platform trials utilizing non-concurrent controls. Biometrical Journal, 67 (3): e70059, 2025.

[11]

Ian C Marschner and I Manjula Schou. Analysis of adaptive platform trials using a network approach. Clinical Trials, 19 (5): 479–489, 2022. . URL https://doi.org/10.1177/17407745221112001.

[12]

Beibei Guo, Li Wang, and Ying Yuan. Treatment comparisons in adaptive platform trials adjusting for temporal drift. Statistics in Biopharmaceutical Research, 16 (3): 361–370, 2024.

[13]

Benjamin R Saville, Donald A Berry, Nicholas S Berry, Kert Viele, and Scott M Berry. The bayesian time machine: Accounting for temporal drift in multi-arm platform trials. Clinical Trials, 19 (5): 490–501, 2022. . URL https://doi.org/10.1177/17407745221112013.

[14]

Heinz Schmidli, Sandro Gsteiger, Satrajit Roychoudhury, Anthony O’Hagan, David Spiegelhalter, and Beat Neuenschwander. . Biometrics, 70 (4): 1023–1032, 2014. .

[15]

Sebastian Weber, Yue Li, John W. Seaman III, Tomoyuki Kakizume, and Heinz Schmidli. . Journal of Statistical Software, 100 (19): 1–32, 2021. . URL https://www.jstatsoft.org/index.php/jss/article/view/v100i19.

[16]

Marta Bofill Roig, Pavla Krotka, Katharina Hees, Franz Koenig, Dominic Magirr, Peter Jacko, Tom Parke, and Martin Posch. Treatment-control comparisons in platform trials including non-concurrent controls. Statistics in Biopharmaceutical Research, (just-accepted): 1–29, 2025.

[17]

Peter Greenstreet, Thomas Jaki, Alun Bedding, Chris Harbron, and Pavel Mozgunov. A multi-arm multi-stage platform design that allows preplanned addition of arms while still controlling the family-wise error. Statistics in Medicine, 43 (19): 3613–3632, 2024.

[18]

Gernot Wassmer and Werner Brannath. Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. Springer Series in Pharmaceutical Statistics, 2016. ISBN 978-3-319-32560-6. URL http://link.springer.com/10.1007/978-3-319-32562-0.

[19]

Xiaoyin Fan, David L DeMets, and KK Gordon Lan. Conditional bias of point estimates following a group sequential test. Journal of Biopharmaceutical Statistics, 14 (2): 505–530, 2004.

[20]

Michael J Grayling and James MS Wason. Point estimation following a two-stage group sequential trial. Statistical Methods in Medical Research, 32 (2): 287–304, 2023.

[21]

David S Robertson, Babak Choodari-Oskooei, Munya Dimairo, Laura Flight, Philip Pallmann, and Thomas Jaki. . Statistics in medicine, 42 (2): 122–145, 2023.

[22]

John Whitehead. On the bias of maximum likelihood estimation following a sequential test. Biometrika, 73 (3): 573–581, 1986.

[23]

Scott S Emerson and Thomas R Fleming. Parameter estimation following group sequential hypothesis testing. Biometrika, 77 (4): 875–892, 1990.

[24]

Aiyi Liu and WJ Hall. Unbiased estimation following a group sequential test. Biometrika, 86 (1): 71–78, 1999.

[25]

Myron N Chang. Confidence intervals for a normal mean following a group sequential test. Biometrics, pages 247–254, 1989.

[26]

Gary L Rosner and Anastasios A Tsiatis. Exact confidence intervals following a group sequential trial: A comparison of methods. Biometrika, 75 (4): 723–729, 1988.

marta.bofillroig@meduniwien.ac.at

On the inclusion of non-concurrent controls in platform trials with a futility interim analysis