We propose a generalization of the synthetic control and synthetic interventions methods to settings with dynamic treatment effects. We consider the estimation of unit-specific treatment effects from panel data collected under a general treatment sequence. Here, each unit receives multiple treatments sequentially, according to an adaptive policy that depends on a latent, endogenously time-varying confounding state. Under a low-rank latent factor model assumption, we develop an identification strategy for any unit-specific mean outcome under any sequence of interventions. The latent factor model we propose admits linear time-varying and time-invariant dynamical systems as special cases. Our approach can be viewed as an identification strategy for structural nested mean models—a widely used framework for dynamic treatment effects—under a low-rank latent factor assumption on the blip effects. Unlike these models, however, our framework is more permissive in observational settings, thereby broadening its applicability. Our method, which we term synthetic blip effects, is a backwards induction process in which the blip effect of a treatment at each period and for a target unit is recursively expressed as a linear combination of the blip effects of a group of other units that received the designated treatment. This strategy avoids the combinatorial explosion in the number of units that would otherwise be required by a naive application of prior synthetic control and synthetic interventions methods in dynamic treatment settings. We provide estimation algorithms that are easy to implement in practice and yield estimators with desirable properties. Using unique Korean firm-level panel data, we demonstrate how the proposed framework can be used to estimate individualized dynamic treatment effects and to derive optimal treatment allocation rules in the context of financial support for exporting firms.
In many observational studies, units undergo multiple treatments sequentially over time—for example, patients receive multiple therapies, customers are exposed to multiple advertising campaigns, and governments implement multiple policies. The treatment sequence often follows a general pattern rather than being restricted to a staggered adoption design, and interventions typically occur in a data-adaptive manner, with treatment assignment depending on the current (potentially unobserved) state of the treated unit and its past treatments. Furthermore, temporal spillovers across treatments and intermediate outcomes make treatment effects inherently dynamic. A common policy question is what the expected outcome would have been under an alternative policy or course of action. Counterfactual analysis using observational data with multiple sequentially and adaptively assigned treatments is the focus of a long line of research in causal inference.
Typical approaches for identification with time-varying treatments require a strong sequential exogeneity assumption, where the treatment decision at each period is exogenous conditional on an observable state that comprises the history of outcomes and treatments. This assumption is a generalization of the standard conditional exogeneity assumption in static settings. However, most observational datasets are plagued with unobserved confounding, and endogeneity can take complex forms, especially in dynamic settings. Many techniques exist for addressing unobserved confounding in one-shot treatment settings, such as instrumental variables, difference-in-differences, regression discontinuity designs, and synthetic controls, some of which have been extended to dynamic contexts. For example, event studies and difference-in-differences have been generalized to accommodate sequences of treatments (see below), but most studies assume staggered designs (i.e., irreversible treatment sequences), with few exceptions [1]–[3]. Instrumental variables and regression discontinuity have also been extended to dynamic settings [4]–[7], which requires the existence of sequences of instruments or running variables over time. Beyond these contributions, methods for handling unobserved confounding in settings with general time-varying treatments remain largely underexplored.
In this work, we present the first extension of the synthetic controls literature to handle dynamic treatment effects. Synthetic controls [8], [9]—and their generalization to synthetic interventions [10]—are widely used empirical approaches for handling unobserved confounding in observational panel data. However, the existing literature assumes that units are treated only once or in a non-adaptive manner. This limits the applicability of the technique to policy-relevant settings where multiple interventions occur sequentially over time. We propose an extension of the synthetic controls and synthetic interventions framework that enables identification of mean counterfactual outcomes under arbitrary treatment sequences, even when the observational data arise from an adaptive dynamic treatment policy. As in the synthetic interventions framework, we assume that the panel data stem from a low-rank data generation model, with latent factors capturing unobserved confounding signals. In static settings, the low-rank assumption, together with a technical overlap condition, allows each unit’s mean outcomes under any sequence of interventions to be expressed as linear combinations of observed outcomes from a carefully chosen sub-group of other units. We generalize this idea to dynamic contexts under a low-rank linear structural nested mean model assumption. Our work can also be viewed as extending the g-estimation framework for structural nested mean models [11]–[13] to accommodate unobserved confounding under a low-rank structure. In doing so, our work helps connect the econometric literature on synthetic controls with the biostatistics literature on structural nested mean models.
The key idea of our identification strategy is to express the mean outcome for a unit under a sequence of interventions as an additive function of “blip” effects corresponding to that sequence. The blip effect of an intervention at a given period can be interpreted as the treatment effect of that intervention, relative to a baseline intervention for that specific period, assuming a common sequence of interventions for all other periods. Subsequently, under our low-rank assumption and by applying a recursive argument, we can identify the blip effect of each treatment for each unit and time period. Our procedure can be viewed as a dynamic programming approach, in which a synthetic-control-type procedure is used to compute “synthetic blip effects” at each step of the dynamic program. These step-specific causal quantities are then combined to build the overall counterfactual outcome of any unit under any sequence of interventions.
We illustrate the usefulness of the proposed framework by estimating individualized dynamic treatment effects and optimal treatment allocation rules in the context of providing financial support to exporting firms. Exporting is inherently risky, and thus government agencies play an important role in providing insurance and loans to promote export activities. Using novel Korean firm-level data, we first estimate the effects of insurance and loans, as two distinct treatments, on firm performance, such as export values. In particular, we recover individualized counterfactual outcomes for all hypothetical intervention sequences. Aggregating across firms yields average effects, which reveal that the sequencing of treatments matters for improving export values over time. For example, for both insurance and loans, we find that concentrating interventions in early or late periods is on average more effective than smoothing them across periods. We then use the individualized dynamic treatment effects to estimate allocation rules that maximize performance for each firm. We show that such targeting rules can significantly improve outcomes while requiring less public spending. Finally, we construct decision trees that can guide public officials in selecting new firms for financial support and determining the schedule of interventions.
The paper is organized as follows. We close this section by discussing related work and introducing the setting and notation. Section 2 presents the latent factor model for time-varying treatments, and Section 3 discusses the limitations of the synthetic interventions approach in our setting. Sections 4 and 5 introduce our main models—the time-varying and time-invariant latent factor models—which involve modeling trade-offs. Each section establishes identification, develops an estimation algorithm, and provides the asymptotic theory for the resulting estimator. Section 6 contains our empirical application, and Section 7 concludes. The appendix includes all proofs and additional remarks on the models and assumptions.
Panel data methods in econometrics. Consider a setting where one observes repeated measurements of multiple heterogeneous units over \(T\) time steps. Prominent approaches for this setting include difference-in-differences [14]–[16] and synthetic controls [8], [9], [17]–[32]. These frameworks estimate what would have happened to a unit that undergoes an intervention (i.e., a “treated” unit) had it remained under control (i.e., no intervention), potentially in the presence of unobserved confounding. That is, they estimate the counterfactual outcome of a treated unit if it had remained under control for all \(T\) time steps. Recently, the difference-in-differences literature has advanced by taking heterogeneity seriously under staggered designs [33]–[36]. Staggered intervention has also been examined in the synthetic controls literature [37]–[40]. These approaches typically estimate the counterfactual trajectory of treated units had they remained not-yet-treated.
Both one-shot and staggered designs can be viewed as special cases of the general problem we study in this paper: estimating counterfactual outcomes for a unit under any hypothetical sequence of interventions over the \(T\) time steps. A critical aspect underlying the above methods is the structure assumed between units and time under “control.” One elegant way of encoding this structure is through a latent factor model (also known as an interactive fixed effects model) [41]–[48]. In such models, it is posited that there exist low-dimensional latent unit and time factors that capture unit- and time-specific heterogeneity, respectively, in the potential outcomes. Since the goal in these works is to estimate outcomes under “control,” no structure is imposed on the potential outcomes under intervention.
In [10], [49], the latent factor model is extended to incorporate latent factorization across interventions as well, which allows for identification and estimation of counterfactual mean outcomes under intervention rather than just under control. In Section 3, we provide a detailed comparison with the synthetic interventions framework introduced in [10]. That framework, however, is designed for static regimes and faces two key limitations in the dynamic treatment setting: (i) it does not allow for adaptive treatment assignment over time, and (ii) if there are \(A\) possible interventions at each of the \(T\) time steps, the sample complexity of the synthetic interventions estimator scales as \(A^T\) in order to estimate all possible intervention sequences. The non-adaptivity requirement and the exponential dependence on \(T\) make this estimator ill-suited for dynamic treatments, especially as \(T\) grows. We show that by imposing that an intervention at a given time step has an additive effect on future outcomes—i.e., an additive latent factor model—we achieve significant gains in what can be identified and estimated. We study two variants, time-varying and time-invariant versions, which respectively nest the classical linear time-varying and linear time-invariant dynamical system models as special cases. We establish identification results and propose associated estimators to infer all \(A^T\) counterfactual trajectories per unit. Importantly, our identification result allows the interventions to be selected in an adaptive manner, and the sample complexity of the estimator no longer exhibits exponential dependence on \(T\); see Table 1.
| Linear Factor Models (LFM) | Donor Granularity | Donor Sample Complexity | Adaptivity of Intervention Policy |
|---|---|---|---|
| Naive LFM (Synthetic Interventions) | \(\bar d^T\) | \(O(A^{T})\) | Non-adaptive |
| Additive Time-Varying LFM (This Work) | \((d,t)\) | \(O(A \times T)\) | Adaptive after some periods (i.e., staggered adoption of adaptive policy) |
| Additive Time-Invariant LFM (This Work) | \(d\) | \(O(A)\) | Adaptive after period 1 |
Another extension of such factor models is the class of “dynamic factor models”, originally proposed in [50]. We refer the reader to [51], [52] for extensive surveys, and to [53] for a recent analysis of such time-varying factor models in the context of synthetic controls. These models are similar in spirit to our setting in that they allow outcomes for a given time period to be dependent on outcomes from lagged time periods in an autoregressive manner. To capture this phenomenon, dynamic factor models explicitly represent the time-varying factor as an autoregressive process. However, the target causal parameter in these works is significantly different—they focus on identifying the latent factors and/or forecasting. There is relatively less emphasis on estimating counterfactual mean outcomes for a given unit under different sequences of interventions.
Linear dynamical systems are an extensively studied class of models in the machine learning and applied mathematics literature, and are widely used as linear approximations to many nonlinear systems that nevertheless perform well in practice. A seminal work in this area is [54], which introduces the Kalman filter as a robust solution for identifying and estimating the linear parameters that define the system. We refer the reader to the classic survey in [55] and the more recent survey in [56]. Previous works typically assume that (i) the system is driven by independent and identically distributed (i.i.d.) mean-zero sub-Gaussian noise at each time step, and (ii) both the outcome variable and a meaningful per-time-step state are observed and used in estimation. In contrast, we allow for confounding—i.e., the per-time-step actions chosen can be correlated with the system’s state in an unknown manner—and we do not assume access to a per-time-step state, only to the outcome variable. To tackle this setting, we show that linear dynamical systems, both time-varying and time-invariant, are special cases of the latent factor model that we propose. Our recursive “synthetic blip effects” identification strategy enables estimation of mean counterfactual outcomes under any sequence of interventions without first performing system identification, and despite unobserved confounding.
Financial frictions play a central role in shaping firms’ export performance, particularly in times of crisis [57]–[59]. To mitigate financing barriers and sustain exports, governments provide public financial support via export credit agencies (ECAs), mainly in the form of insurance and loans. Public support can generate different effects depending not only on its scale but also on how it is allocated and structured [60], [61]. Empirical studies of ECAs are typically limited to a single treatment (mostly insurance) due to data constraints [62], leaving the broader impact of combined support largely unexplored. This paper considers the entire set of support programs and analyzes how the timing and sequencing of interventions influence firm performance. By going beyond estimating treatment effects, it provides evidence on allocation strategies that enhance the effectiveness of public funds.
Notation. \([R]\) denotes \(\{1, \dots, R\}\) for \(R \in \mathbb{N}\). \([R_1, R_2]\) denotes \(\{R_1, \dots, R_2\}\) for \(R_1, R_2 \in \mathbb{N}\) with \(R_1 < R_2\). \([R]_0\) denotes \(\{0, \dots, R\}\) for \(R \in \mathbb{N}\). For a vector \(a\), we define \(a^{\top}\) as its transpose. For vectors \(a, b \in \mathbb{R}^d\), we define the inner product of \(a\) and \(b\) as \(\left\langle a, b \right\rangle=a^{\top}b=\sum_{\ell=1}^{d}a_\ell b_\ell\). For a matrix \(M\in \mathbb{R}^{m \times n}\), we denote its Frobenius norm as \(\|M\|_F\). Let \(O_p\) and \(o_p\) denote the probabilistic versions of the deterministic big-\(O\) and little-\(o\) notations.
Let there be \(N\) heterogeneous units. We collect data over \(T\) time steps for each unit.
Observed outcomes. For each unit and time period \(n, t\), we observe \(Y_{n, t} \in \mathbb{R}\), which is the outcome of interest.
Treatments. For each \(n \in [N]\) and \(t \in [T]\), we observe treatment actions \(D_{n, t} \in [A]\), where \(A \in \mathbb{N}\). We allow \(D_{n, t}\) to be categorical, i.e., it can simply serve as a unique identifier for the action chosen. Denote a sequence of actions \((d_1, \dots, d_t)\) by \(\bar{d}^{t} \in [A]^t\); denote \((d_t, \dots, d_T)\) by \(\underline{d}^t \in [A]^{T - t + 1}\). Define \(\bar{D}_n^{t}, \underline{D}_n^t\) analogously to \(\bar{d}^{t}, \underline{d}^t\), respectively, but now with respect to the observed sequence of actions \(D_{n, t}\).
Control and interventional period. For each unit \(n\), we assume there exists \(t^*_n \in [T]\) before which it is in “control”. We denote the control action at time step \(t\) as \(0_t \in [A]\). Note that \(0_{\ell}\) and \(0_{t}\) for \(\ell \neq t\) do not necessarily equal each other. For \(t \in [T]\), denote \(\bar{0}^t = (0_1, \dots, 0_t)\) and \(\underline{0}^t = (0_t, \dots, 0_T)\). For \(t < t^*_n\), we assume \(D_{n, t} = 0_t\), i.e., \(\bar{D}_n^{t^*_n - 1} = \bar{0}^{t^*_n - 1}\). That is, during the control period all units are under a common sequence of actions, but for \(t \geq t^*_n\), each unit \(n\) can undergo a possibly different sequence of actions from all other units, denoted by \(\underline{D}^{t^*_n}_n\). Note that if \(t^*_n = 1\), then unit \(n\) is never in the control period.
Counterfactual outcomes. As stated earlier, for each unit and time period \(n, t\), we observe the outcome of interest \(Y_{n, t} \in \mathbb{R}\). We denote by \(Y_{n, t}^{(\bar{d}^t)}\) the potential outcome if unit \(n\) had instead undergone \(\bar{d}^t\). More generally, we denote by \(Y_{n, t}^{(\bar{D}^\ell_n, \underline{d}^{\ell + 1})}\) the potential outcome if unit \(n\) receives the observed sequence of actions \(\bar{D}^\ell_n\) till time step \(\ell\), and then instead undergoes \(\underline{d}^{\ell + 1}\) for the remaining \(t - \ell\) time steps.
We make the standard “stable unit treatment value assumption” (SUTVA) as follows.
Assumption 1 (Sequential Action SUTVA). For all \(n\in [N], t\in [T], \ell \in [t], \bar{d}^t \in [A]^{t}\): \[\begin{align} Y^{(\bar{D}^\ell_n, \underline{d}^{\ell + 1})}_{n, t} = \sum_{\bar{\delta}^\ell \in [A]^{\ell}} Y^{(\bar{\delta}^\ell, \underline{d}^{\ell + 1})}_{n, t} \cdot \mathbb{1}(\bar{D}^\ell_n = \bar{\delta}^\ell). \end{align}\] Further, for all \(\bar{D}^t_n \in [A]^t\): \[\begin{align} Y^{(\bar{D}^t_n)}_{n, t} = Y_{n, t}. \end{align}\] As an immediate implication, \(Y^{(\bar{d}^\ell, \bar{d}^{\ell + 1})}_{n, t} \mid \bar{D}^\ell_n = \bar{d}^\ell\) equals \(Y^{(\bar{D}^\ell_n, \underline{d}^{\ell + 1})}_{n, t} \mid \bar{D}^\ell_n = \bar{d}^\ell\), and \(Y^{(\bar{d}^t)}_{n, t} \mid \bar{D}^t_n = \bar{d}^t\) equals \(Y_{n, t} \mid \bar{D}^t_n = \bar{d}^t\).
Our goal is to accurately estimate the potential outcome if a given unit \(n\) had instead undergone \(\bar{d}^T\) (instead of the actually observed sequence \(\bar{D}_n^T\)), for any given sequence of actions \(\bar{d}^T\) over \(T\) time steps. That is, for all \(n \in [N]\) and \(\bar{d}^T \in [A]^T\), our goal is to estimate \(Y_{n, T}^{(\bar{d}^T)}.\) We define the target causal parameter more formally in Section 2.
We now present a novel latent factor model for causal inference with dynamic treatments. Towards that, we first define the collection of latent factors that are of interest.
Definition 1 (Latent factors). For a given unit \(n\) and time step \(t\), denote its latent factor as \(v_{n, t}\). For a given sequence of actions over \(t\) time steps, \(\bar{d}^t\), denote its associated latent factor as \(w_{\bar{d}^t}\). Denote the collection of latent factors as \[\begin{align} \mathcal{LF} \mathrel{\vcenter{:}}= \left\{ v_{n, t} \right\}_{n \in [N], t \in [T]} \cup \left\{ w_{\bar{d}^t} \right\}_{\bar{d}^t \in [A]^{t}, \;t \in [T]}. \end{align}\] Here \(v_{n, t}, w_{\bar{d}^t} \in \mathbb{R}^{m(t)}\), where \(m(t)\) is allowed to depend on \(t\).
Assumption 2 (General factor model). Assume \(\forall \;n \in [N]\), \(t \in [T], \bar{d}^t \in [A]^t\), \[\begin{align} Y^{(\bar{d}^t)}_{n, t} &= \left\langle v_{n, t}, w_{\bar{d}^t} \right\rangle + \varepsilon^{(\bar{d}^t)}_{n, t}. \label{eq:general_factor_model} \end{align}\tag{1}\] Further, \[\begin{align} \mathbb{E}[\varepsilon^{(\bar{d}^t)}_{n, t} \mid \mathcal{LF}] = 0. \end{align}\]
In 1 , the key assumption made is that \(v_{n, t}\) does not depend on the action sequence \(\bar{d}^t\), while \(w_{\bar{d}^t}\) does not depend on unit \(n\). That is, \(v_{n, t}\) captures the unit-\(n\)-specific latent heterogeneity in determining the expected conditional potential outcome \(\mathbb{E}[Y^{(\bar{d}^t)}_{n, t} \mid \mathcal{LF}]\); \(w_{\bar{d}^t}\) follows a similar intuition but with respect to the action sequence \(\bar{d}^t\). Importantly, the factors can be correlated with the treatment sequence \(\bar{D}^t_n\), making them unobserved confounders. This latent factorization will be key in all our identification and estimation algorithms and the associated theoretical results. An interpretation of \(\varepsilon^{(\bar{d}^t)}_{n, t}\) is that it represents the component of the potential outcome \(Y^{(\bar{d}^t)}_{n, t}\) that is not factorizable into the latent factors represented by \(\mathcal{LF}\); moreover, it helps model the inherent randomness in the potential outcomes. In Sections 4 and 5 below, we show how various standard models of dynamical systems are special cases of our proposed factor model in Assumption 2.
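To make the factor structure concrete, the following minimal sketch generates a panel satisfying Assumption 2. The dimensions, the constant factor dimension \(m\) (the model allows \(m(t)\) to vary with \(t\)), and the Gaussian distributions are all hypothetical choices for illustration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N, T, A, m = 20, 3, 2, 4  # hypothetical: units, horizon, actions, factor dim

# Unit-time latent factors v_{n,t} and action-sequence factors w_{d^t}.
v = rng.normal(size=(N, T, m))
w = {seq: rng.normal(size=m)
     for t in range(1, T + 1) for seq in product(range(A), repeat=t)}

def potential_outcome(n, seq, noise_sd=0.1):
    """Y_{n,t}^{(d^t)} = <v_{n,t}, w_{d^t}> + eps, as in Assumption 2."""
    t = len(seq)
    return v[n, t - 1] @ w[tuple(seq)] + noise_sd * rng.normal()

# Each unit has A^T counterfactual outcomes at the horizon T (here 2^3 = 8).
print(potential_outcome(0, (1, 0, 1)))
```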
Our target causal parameter to estimate is, for all units \(n \in [N]\) and any action sequence \(\bar{d}^T \in [A]^T\), \[\begin{align} \mathbb{E}[Y^{(\bar{d}^T)}_{n, T} ~|~ \mathcal{LF}],\label{eq:target_causal_param} \end{align}\tag{2}\] i.e., the expected potential outcome conditional on the latent factors, \(\mathcal{LF}\). In total this amounts to estimating \(N \times A^T\) different (expected) potential outcomes, which we note grows exponentially in \(T\).
Given that our goal is to bring a novel factor model perspective to the dynamic treatment effects literature, we first discuss some of the limitations of current methods from the factor model literature that were designed for the static interventions regime, i.e., where an intervention is done only once at a particular time step. We focus on the synthetic interventions (SI) framework [10], a recent generalization of the popular synthetic controls framework. In particular, we provide an identification argument that builds upon the SI framework [10] and then discuss its limitations.
Donor units. To explain the identification strategy, we first need to define a collection of subsets of units based on: (i) the action sequence they receive; (ii) the correlation between their potential outcomes and the chosen actions. These subsets are defined as follows.
Definition 2 (SI donor units). For \(\bar{d}^T \in [A]^T\), \[\begin{align} \mathcal{I}^{\bar{d}^T} &\mathrel{\vcenter{:}}= \{j \in [N]: (i) \;\bar{D}^T_j = \bar{d}^T, \;(ii) \;\;\forall \;\bar{\delta}^T \in [A]^T, \;\; \mathbb{E}[\varepsilon^{(\bar{\delta}^T)}_{j ,T} \mid \bar{D}^T_j, \mathcal{LF}] = 0 \}. \label{eq:general_factor_model_donor_units} \end{align}\tag{3}\]
The donor set \(\mathcal{I}^{\bar{d}^T}\) contains units that receive exactly the sequence \(\bar{d}^T\). Further, we require that for these particular units, the action sequence was chosen such that \(\forall \;\bar{\delta}^T \in [A]^T, \;\mathbb{E}[\varepsilon^{(\bar{\delta}^T)}_{j ,T} \mid \bar{D}^T_j, \mathcal{LF}] = \mathbb{E}[\varepsilon^{(\bar{\delta}^T)}_{j ,T} \mid \mathcal{LF}] = 0\), i.e., \(\varepsilon^{(\bar{\delta}^T)}_{j ,T}\) is conditionally mean independent of the action sequence \(\bar{D}^T_j\) unit \(j\) receives. Note a sufficient condition for property (ii) above is that \(\forall \;\bar{\delta}^T \in [A]^T, \;\;Y^{(\bar{\delta}^T)}_{j ,T} \perp \bar{D}^T_j \mid \mathcal{LF}\). That is, for these units, the action sequence for the entire time period \(T\) is chosen at \(t = 0\) conditional on the latent factors, i.e., the policy for these units is not adaptive (cannot depend on observed outcomes \(Y_{j, t}\) for \(t \in [T]\)).
Assumption 3. For all \(n\in [N]\) and \(\bar{d}^T \in [A]^T\), suppose that \(v_{n, T}\) satisfies a well-supported condition, i.e., there exist linear weights \(\beta^{n,\mathcal{I}^{\bar{d}^T}} \in \mathbb{R}^{|\mathcal{I}^{\bar{d}^T}|}\) such that: \[\begin{align} v_{n, T} =\sum_{j \in \mathcal{I}^{\bar{d}^T}} \beta_j^{n,\mathcal{I}^{\bar{d}^T}} v_{j, T}. \label{eq:general_well_supported_SI} \end{align}\tag{4}\]
Assumption 3 essentially states that for a given sequence of interventions \(\bar{d}^T \in [A]^T\), the latent factor for the target unit \(v_{n, T}\) lies in the linear span of the latent factors \(v_{j, T}\) associated with the “donor” units in \(\mathcal{I}^{\bar{d}^T}\). Note that, by Theorem 4.6.1 of [63], if the \(\{v_{j, T}\}_{j \in [N]}\) are sampled as independent, mean-zero, sub-Gaussian vectors, then \(\text{span}(\{v_{j,T} : j \in \mathcal{I}^{\bar{d}^T}\}) = \mathbb{R}^{m(T)}\) with high probability, provided \(|\mathcal{I}^{\bar{d}^T}| \gg m(T)\) (recall \(m(T)\) is the dimension of \(v_{n, T}\)).
We then have identification for the target parameter: the following theorem, an adaptation of the identification argument in [10], states that 2 can be expressed as a function of observed outcomes.
Theorem 1 (SI Identification Strategy). Let Assumptions 1, 2, and 3 hold. Then, for all \(n\in [N]\) and \(\bar{d}^T \in [A]^T\), the mean counterfactual outcome can be expressed as: \[\begin{align} \mathbb{E}[Y_{n,T}^{(\bar{d}^T)}\mid \mathcal{LF}] =~& \mathbb{E}\left[\sum_{j \in \mathcal{I}^{\bar{d}^T}} \beta_j^{n,\mathcal{I}^{\bar{d}^T}} Y_{j ,T} \mid \mathcal{LF}, \mathcal{I}^{\bar{d}^T}\right]. \end{align}\]
Theorem 1 establishes that, to estimate the mean counterfactual outcome of unit \(n\) under the action sequence \(\bar{d}^T\), one selects all donor units that received that exact sequence, i.e., \(\bar{D}^T_j = \bar{d}^T\), and whose action sequence is known to be non-adaptive. The target causal parameter is then simply a linear re-weighting of the observed outcomes \((Y_{j ,T})_{j \in \mathcal{I}^{\bar{d}^T}}\), where the linear weights \(\beta_j^{n,\mathcal{I}^{\bar{d}^T}}\) express the latent factor \(v_{n, T}\) of unit \(n\) as a linear combination of \(\{v_{j, T}\}_{j \in \mathcal{I}^{\bar{d}^T}}\).
Donor sample complexity. To estimate \(\mathbb{E}[Y^{(\bar{d}^T)}_{n, T} ~|~ \mathcal{LF}]\) for all units \(n \in [N]\) and any action sequence \(\bar{d}^T \in [A]^T\), this SI identification strategy requires the existence of a sufficiently large subset of donor units \(\mathcal{I}^{\bar{d}^T}\) for every \(\bar{d}^T \in [A]^T\). That is, the number of donor units we require will need to scale at the order of \(A^T\), which grows exponentially in \(T\).
Further, the actions picked for these donor units cannot be adaptive as we require \(\forall \;\bar{\delta}^T \in [A]^T, \;\; \mathbb{E}[\varepsilon^{(\bar{\delta}^T)}_{j ,T} \mid \bar{D}^T_j, \mathcal{LF}] = 0\) for them. See Figure 1 for a directed acyclic graph (DAG) that is consistent with the exogeneity conditions implied by the definition of \(\mathcal{I}^{\bar{d}^T}\) in 3 .
Given this combinatorial explosion in the number of donor units and the stringent non-adaptivity requirements placed on them, in the following sections we study how additional structure on the latent factor model gives rise to novel identification strategies that reduce the donor sample complexity and relax the exogeneity requirements on the donor units’ chosen actions.
Motivated by the limitation of the identification strategy in Section 3, we now impose additional structure on the latent factor model.
Assumption 4 (Linear time-varying (LTV) factor model). Assume \(\forall \;n \in [N]\), \(t \in [T], \bar{d}^t \in [A]^t\), \[\begin{align} Y^{(\bar{d}^t)}_{n, t} &= \sum^{t}_{\ell = 1} \left\langle \psi^{t, \ell}_{n}, w_{d_\ell} \right\rangle + \varepsilon^{(\bar{d}^t)}_{n, t},\label{eq:LTV_factor_model} \end{align}\tag{5}\] where \(\psi^{t, \ell}_{n}, w_{d_\ell} \in \mathbb{R}^m\) for \(\ell \in [t]\). Further, let \(\mathcal{LF}= \{\psi^{t, \ell}_{n} \}_{n \in [N], t \in [T], \ell \in [t]} \cup \{w_{d} \}_{d \in [A]}\). Assume \[\begin{align} \mathbb{E}[\varepsilon^{(\bar{d}^t)}_{n, t} \mid \mathcal{LF}] = 0. \end{align}\]
Remark 1. Note Assumption 4 implies Assumption 2 holds with \[\begin{align} v_{n, t} = [\psi^{t, 1}_{n}, \dots, \psi^{t, t}_{n}], \; w_{\bar{d}^t} = [w_{d_1}, \dots, w_{d_t}]. \end{align}\] Further \(m(t) = m \times t\) for \(m(t)\) in Definition 1.
We see that there is additional structure in the latent factors. In particular, the effect of action \(d_\ell\) on \(Y^{(\bar{d}^t)}_{n, t}\) for \(\ell \in [t]\) is additive, given by \(\langle \psi^{t, \ell}_{n}, w_{d_\ell} \rangle\). Intuitively, \(\psi_n^{t, \ell}\) captures the latent unit-specific heterogeneity in the potential outcome for unit \(n\) at a given time step \(t\) for an action taken at time step \(\ell \le t\); analogously, \(w_{d_\ell}\) captures the latent effect of action \(d_\ell\). This additional structure will be useful in the identification strategy we employ in Section 4.2.
A time-varying dynamical system is useful in modeling the dynamic evolution of treatment and outcome sequences. We show that the classical linear time-varying dynamical system model satisfies Assumption 4. Suppose for all \(t \in [T]\), all units \(n \in [N]\) obey the following dynamic triangular model for a sequence of actions \(\bar{D}^t_{n}\) and counterfactual outcomes \(Y^{(\bar{D}_n^t)}_{n, t}\): \[\begin{align} D_{n, t} &= f_n(w_{D_{n, t-1}}, \;z^{(\bar{D}_n^{t - 1})}_{n, t - 1}),\tag{6} \\ Y^{(\bar{D}_n^t)}_{n, t} &= \left\langle \theta_{n, t}, z^{(\bar{D}_n^t)}_{n, t} \right\rangle + \left\langle \tilde{\theta}_{n, t}, w_{D_{n, t}} \right\rangle + \tilde{\eta}_{n, t},\tag{7} \end{align}\] where \(z^{(\bar{D}_n^t)}_{n, t} = \boldsymbol{B}_{n, t} \;z^{(\bar{D}_n^{t - 1})}_{n, t - 1} + \boldsymbol{C}_{n, t} \;w_{D_{n, t}} + \eta_{n, t}\) and \(z_{n, 0}= w_{D_{n, 0}} = 0\). Here, \(z_{n, t} \in \mathbb{R}^{m}\) is the latent state associated with unit \(n\) at time \(t\), and \(w_{D_{n, t-1}} \in \mathbb{R}^{m}\) is the latent factor of the action chosen at time \(t - 1\). \(\eta_{n, t} \in \mathbb{R}^{m}\) and \(\tilde{\eta}_{n, t} \in \mathbb{R}\) represent independent mean-zero random innovations at each time step \(t\). \(\boldsymbol{B}_{n, t}, \boldsymbol{C}_{n, t} \in \mathbb{R}^{m \times m}\) are matrices governing the linear dynamics of \(z^{(\bar{D}_n^t)}_{n, t}\). Note that \(\boldsymbol{B}_{n, t}, \boldsymbol{C}_{n, t}\) are specific to time step \(t\), and this is what makes this model a time-varying dynamical system. In contrast, in the classic linear time-invariant dynamical system described in Section 5.1 below, \(\boldsymbol{B}_{n, t} = \boldsymbol{B}_n\) and \(\boldsymbol{C}_{n, t} = \boldsymbol{C}_n\) for all \(t \in [T]\). \(\theta_{n, t}, \tilde{\theta}_{n, t} \in \mathbb{R}^{m}\) are parameters governing how the outcome of interest \(Y^{(\bar{D}_n^t)}_{n, t}\) is a linear function of \(z^{(\bar{D}_n^t)}_{n, t}\) and \(w_{D_{n, t}}\), respectively. \(f_n(\cdot)\) is a function which decides how the next action \(D_{n, t}\) is chosen as a function of the previous action factor \(w_{D_{n, t - 1}}\) and the previous state \(z_{n, t - 1}\). We see that, due to the input of \(z^{(\bar{D}_n^{t-1})}_{n, t-1}\) into \(f_n(\cdot)\), the action sequence is adaptive. As a result, \(\eta_{n, \ell}\) is correlated with \(D_{n, t}\) for \(\ell < t\).
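As a concrete illustration, here is a minimal simulation sketch of the dynamic triangular model 6 –7 ; the dimensions, the particular threshold policy standing in for \(f_n\), and the Gaussian innovations are hypothetical choices, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(1)
T, m, A = 5, 3, 2                      # hypothetical horizon, state dim, actions
w = rng.normal(size=(A, m))            # latent action factors w_d
B = [rng.normal(scale=0.3, size=(m, m)) for _ in range(T)]  # B_{n,t}
C = [rng.normal(size=(m, m)) for _ in range(T)]             # C_{n,t}
theta = rng.normal(size=(T, m))        # theta_{n,t}
theta_tilde = rng.normal(size=(T, m))  # tilde theta_{n,t}

def f(prev_w, z):
    # Illustrative adaptive policy: the action depends on the current latent
    # state z, which makes the chosen sequence endogenous (confounded).
    return int(z.sum() + prev_w.sum() > 0)

z, prev_w = np.zeros(m), np.zeros(m)   # z_{n,0} = w_{D_{n,0}} = 0
for t in range(T):
    d = f(prev_w, z)                   # D_{n,t} adapts to the previous state
    z = B[t] @ z + C[t] @ w[d] + 0.1 * rng.normal(size=m)  # state update
    y = theta[t] @ z + theta_tilde[t] @ w[d] + 0.1 * rng.normal()
    prev_w = w[d]
    print(t + 1, d, round(float(y), 3))
```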
Proposition 1. Suppose the dynamic triangular model 6 –7 holds. Then we have the following representation, \[\begin{align} Y^{(\bar{d}^t)}_{n, t} = \sum^{t}_{\ell = 1} \Big(\left\langle \psi^{t, \ell}_{n}, w_{d_\ell} \right\rangle + \varepsilon_{n, t, \ell} \Big), \label{eq:LTV_factor_model_representation} \end{align}\qquad{(1)}\] where \(\psi^{t, \ell}_{n}, w_{d_\ell} \in \mathbb{R}^m\) for \(\ell \in [t]\); here, \[\begin{align} \psi^{t, \ell}_{n} &= \left( \left(\prod^t_{k = \ell + 1}\boldsymbol{B}_{n, k} \right) \boldsymbol{C}_{n, {\ell}}\right)^{\top} \theta_{n, t} \quad \text{for} \quad \ell \in [t-1], \\ \psi^{t, t}_{n} &= \boldsymbol{C}_{n, {t}}^{\top}\theta_{n, t} + \tilde{\theta}_{n, t}, \\ \varepsilon_{n, t, \ell} &= \left(\left(\prod^t_{k = \ell + 1}\boldsymbol{B}_{n, k}\right) \eta_{n, \ell} \right)^{\top} \theta_{n, t} \quad \text{for} \quad \ell \in [t-1], \\ \varepsilon_{n, t, t} &= \theta_{n, t}^{\top}\eta_{n, t} + \tilde{\eta}_{n, t}. \end{align}\] Therefore, Assumption 4 holds with the additional structure that \(\varepsilon^{(\bar{d}^t)}_{n, t}\) has an additive factorization as \(\sum^{t}_{\ell = 1} \varepsilon_{n, t, \ell}\), and it is not a function of \(d_\ell\).
In this example, our target parameter \(\mathbb{E}[Y^{(\bar{d}^T)}_{n, T} ~|~ \mathcal{LF}]\) defined in 2 translates to the expected potential outcome once we condition on the latent parameters \(\psi^{t, \ell}_{n}, w_{d_\ell}\), which are a function of \(\boldsymbol{B}_{n, t}, \boldsymbol{C}_{n, t}, \theta_{n, t}, \tilde{\theta}_{n, t}\). Here the expectation is taken with respect to the per-step independent mean-zero random innovations, \(\varepsilon_{n, t, \ell}\), which are a function of \(\{\eta_{n, q}, \tilde{\eta}_{n, q}\}_{q \ge \ell}\) (and \(\boldsymbol{B}_{n, t}, \boldsymbol{C}_{n, t}, \theta_{n, t}, \tilde{\theta}_{n, t}\)).
In this section we identify \(\mathbb{E}[Y_{n,T}^{(\bar{d}^T)}\mid \mathcal{LF}]\), that is, we represent this expected potential outcome for a target unit \(n\) and action sequence \(\bar{d}^T\) as some function of observed outcomes.
Notation. We define the following useful notation for any unit \(n \in [N]\): \[\begin{align} \gamma_{n, T, t}(d_t) := \left\langle \psi^{T, t}_{n}, w_{d_t} - w_{0_t} \right\rangle. \nonumber \end{align}\] Note that \(\gamma_{n, T, t}(d_t)\) can be interpreted as a “blip effect”—the expected difference in potential outcomes if unit \(n\) undergoes the sequence \((\bar{d}^t, \underline{0}^{t + 1})\) instead of \((\bar{d}^{t -1}, \underline{0}^{t})\). In particular, note that Assumption 4 implies \[\begin{align} \mathbb{E}\left[Y^{(\bar{d}^t, \underline{0}^{t + 1})}_{n, T} - Y^{(\bar{d}^{t -1}, \underline{0}^{t})}_{n, T} \mid \mathcal{LF}\right] &= \mathbb{E}\left[\left\langle \psi^{T, t}_{n}, w_{d_t} - w_{0_t} \right\rangle + \varepsilon^{(\bar{d}^t, \underline{0}^{t + 1})}_{n, T} - \varepsilon^{(\bar{d}^{t -1}, \underline{0}^{t})}_{n, T} \mid \mathcal{LF}\right] \nonumber \\&= \left\langle \psi^{T, t}_{n}, w_{d_t} - w_{0_t} \right\rangle \mid \mathcal{LF}. \nonumber \end{align}\] Further, let \[\begin{align} b_{n, T} &:= \sum^T_{t = 1} \left\langle \psi^{T, t}_{n}, w_{0_t} \right\rangle. \nonumber \end{align}\] This can be interpreted as the expected potential outcome if unit \(n\) remains under the control sequence \(\bar{0}^T\) till time step \(T\). Again, Assumption 4 implies \[\begin{align} \mathbb{E}\left[ Y^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}\right] &= \mathbb{E}\left[\sum^T_{t = 1} \left\langle \psi^{T, t}_{n}, w_{0_t} \right\rangle + \varepsilon^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}\right] = \sum^T_{t = 1} \left\langle \psi^{T, t}_{n}, w_{0_t} \right\rangle \mid \mathcal{LF}. \end{align}\]
We now state assumptions we need for the identification strategy that we propose.
We define different subsets of units based on the treatment sequence they receive: \[\begin{align} \mathcal{I}^d_t \mathrel{\vcenter{:}}= \{j \in [N]: & \;(i) \;\bar{D}^t_{j} = (0_1, \dots, 0_{t -1}, d), \nonumber \\ &(ii) \;\forall \;\bar{\delta}^T \in [A]^T, \;\mathbb{E}[Y^{(\bar{\delta}^T)}_{j, T} \mid \mathcal{LF}, \bar{D}^t_{j}] = \mathbb{E}[Y^{(\bar{\delta}^T)}_{j, T} \mid \mathcal{LF}] \}. \label{eq:LTV_donor} \end{align}\tag{8}\] The donor set \(\mathcal{I}^d_t\) contains units that remain under the control sequence \((0_1, \dots, 0_{t - 1})\) till time step \(t - 1\), and at time step \(t\) receive action \(d\) (i.e., \(t^*_j \ge t\)). Further, we require that for these particular units, the action sequence \(\bar{D}^t_{j}\) till time step \(t\) was chosen such that \(\mathbb{E}[Y^{(\bar{\delta}^T)}_{j, T} \mid \mathcal{LF}, \bar{D}^t_{j}] = \mathbb{E}[Y^{(\bar{\delta}^T)}_{j, T} \mid \mathcal{LF}]\), i.e., the potential outcomes are conditionally mean independent of the action sequence \(\bar{D}^t_j\) unit \(j\) receives till time step \(t\). Of course, a sufficient condition for property (ii) above is that \(\forall \;\bar{\delta}^T \in [A]^T, \;\;Y^{(\bar{\delta}^T)}_{j ,T} \perp \bar{D}^t_j \mid \mathcal{LF}\). That is, for these units, the action sequence till time step \(t\) is chosen at \(t = 0\) conditional on the latent factors, i.e., the policy for these units can only be adaptive from time step \(t + 1\) onwards. Note that, given Assumption 4, property (ii) can be equivalently stated as \(\mathbb{E}[\varepsilon^{(\bar{\delta}^T)}_{j, T} \mid \mathcal{LF}, \bar{D}^t_{j}] = \mathbb{E}[\varepsilon^{(\bar{\delta}^T)}_{j, T} \mid \mathcal{LF}] = 0\).
Assumption 5. For \(n \in [N]\), let \(v_{n, T} := [\psi^{T, 1}_{n}, \dots, \psi^{T, T}_{n}]\). We assume that for all \(n \in [N]\), \(v_{n, T}\) satisfies a well-supported condition with respect to the various donor sets, i.e., for all \(d \in [A]\) and \(t \in [T]\), there exists \(\beta^{n,\mathcal{I}_{t}^d} \in \mathbb{R}^{|\mathcal{I}_{t}^d|}\) such that \[\begin{align} v_{n, T} = \sum_{k \in \mathcal{I}_{t}^d} \beta_k^{n,\mathcal{I}_{t}^d} v_{k, T}. \label{eq:LTV_well_supported} \end{align}\tag{9}\]
Assumption 5 requires that, for each unit \(n \in [N]\), its latent factors \([\psi^{T, 1}_{n}, \dots, \psi^{T, T}_{n}]\) are expressible as a linear combination of those of the units in the donor sets \(\mathcal{I}_{t}^d\). See the discussion under Assumption 3 in Section 3 justifying such an assumption for settings where \(\mathcal{I}_{t}^d\) is sufficiently large.
Assumption 6. For all \(n \in \mathcal{I}_{t}^{d}, t \in [T], \bar{d}^t \in [A]^t\), \[\begin{align} \mathbb{E}\left[Y_{n, T}^{(\bar{d}^t, \underline{0}^{t + 1})} - Y_{n, T}^{(\bar{d}^{t - 1}, \underline{0}^{t})} \mid \bar{D}_n^{t} = \bar{d}^t, \mathcal{LF}\right] = \gamma_{n, T, t}(d_{t}) \mid \mathcal{LF}. \end{align}\] Note that given Assumption 4, this condition can be equivalently written as \[\begin{align} \mathbb{E}\left[\varepsilon_{n, T}^{(\bar{d}^t, \underline{0}^{t + 1})} - \varepsilon_{n, T}^{(\bar{d}^{t - 1}, \underline{0}^{t})} \mid \bar{D}_n^{t} = \bar{d}^t, \mathcal{LF}\right] = 0. \end{align}\]
Below we give two sufficient conditions under which Assumption 6 holds.
1. Sufficient condition: Non-action dependent noise. Assumption 6 holds if \(\varepsilon_{n, T}^{(\bar{d}^t, \underline{0}^{t + 1})} = \varepsilon_{n, T}^{(\bar{d}^{t - 1}, \underline{0}^{t})}\), which occurs if \(\varepsilon_{n, T}^{(\bar{d}^t, \underline{0}^{t + 1})}\) and \(\varepsilon_{n, T}^{(\bar{d}^{t - 1}, \underline{0}^{t})}\) are not a function of \((\bar{d}^t, \underline{0}^{t + 1})\), and \((\bar{d}^{t - 1}, \underline{0}^{t})\), respectively. The motivating example of a classic linear time-varying dynamical system given in Section 4.1 satisfies this property.
2. Sufficient condition: Additive action-dependent noise. We now relax the sufficient condition above that \(\varepsilon_{n, T}^{(\bar{d}^t, \underline{0}^{t + 1})}\) and \(\varepsilon_{n, T}^{(\bar{d}^{t - 1}, \underline{0}^{t})}\) are not a function of the action sequence. Instead, suppose for all \(\bar{d}^T \in [A]^T\), \(\varepsilon_{n, T}^{(\bar{d}^T)} = \sum^{T}_{t = 1} \eta^{(d_t)}_{n, t}\), where we assume that conditional on \(\mathcal{LF}\), \(\eta^{(d_t)}_{n, t}\) are mutually independent for all \(t \in [T]\) and \(d_t \in [A]\). Then \(\varepsilon_{n, T}^{(\bar{d}^t, \underline{0}^{t + 1})} - \varepsilon_{n, T}^{(\bar{d}^{t - 1}, \underline{0}^{t})} = \eta^{(d_t)}_{n, t} - \eta^{(0_t)}_{n, t}\). In this case, a sufficient condition for Assumption 6 is that \[\begin{align} \eta^{(d_t)}_{n, t}, \eta^{(0_t)}_{n, t} \perp D_{n, t} \mid \mathcal{LF}. \end{align}\] That is, conditional on the latent factors, the action \(D_{n, t}\) at time step \(t\) is independent of the additional noise \(\eta^{(d_t)}_{n, t}, \eta^{(0_t)}_{n, t}\) generated at time step \(t\). Note, however, that \(\varepsilon_{n, t}^{(\bar{d}^t)} \not\perp D_{n, t} \mid \mathcal{LF}\). This is because \(\varepsilon_{n, t-1}^{(\bar{d}^{t -1})}\) and \(\varepsilon_{n, t}^{(\bar{d}^{t})}\) remain auto-correlated, i.e., \(\varepsilon_{n, t-1}^{(\bar{d}^{t -1})} \not\perp \varepsilon_{n, t}^{(\bar{d}^{t})} \mid \mathcal{LF}\), and \(\varepsilon_{n, t-1}^{(\bar{d}^{t -1})} \not\perp D_{n, t} \mid \mathcal{LF}\), as the action \(D_{n, t}\) can be a function of the observed outcome \(Y_{n, t -1}\).
We now connect our assumptions more closely to the notation and assumptions used in the structural nested mean model (SNMM) and the marginal structural model (MSM) in the statistics literature on dynamic treatment effects. A typical assumption in these literatures is sequential conditional exogeneity, which states that for some sequence of random state variables \(S_{n,t}\), the treatments are sequentially conditionally exogenous, i.e.: \[\begin{align} \label{eq:blip_seq_exog} \forall \bar{d} \in [A]^T: Y_{n, T}^{(\bar{d})} \perp D_{n, t} \mid \bar{S}^{t -1}_{n}, \bar{D}^t_{n} = \bar{d}^t, \mathcal{LF}, \end{align}\tag{10}\] where \(\bar{S}^{t -1}_{n}=(S_{n,0}, \ldots, S_{n,t-1})\). Moreover, assume that the blip effects admit the following factor model representation: \[\begin{align} \label{eq:blip_effect} \mathbb{E}\left[Y_{n, T}^{(\bar{d}^t, \underline{0}^{t + 1})} - Y_{n, T}^{(\bar{d}^{t - 1}, \underline{0}^{t})} \mid \bar{S}^{t -1}_{n}, \bar{D}_n^{t} = \bar{d}^t, \mathcal{LF}\right] = \left\langle \psi^{T, t}_{n}, w_{d_t} - w_{0_t} \right\rangle \mid \mathcal{LF}. \end{align}\tag{11}\] 11 implies that the conditional mean of the blip effect is invariant to the past states and actions. Lastly, assume that the baseline potential outcome has a factor model representation, i.e.: \[\begin{align} \label{eq:blip_baseline_outcomes} \mathbb{E}\left[ Y^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}\right] = \sum^T_{t = 1} \left\langle \psi^{T, t}_{n}, w_{0_t} \right\rangle \mid \mathcal{LF}. \end{align}\tag{12}\] Then we have the following proposition.

Proposition 2. Suppose 10 –12 hold. Then Assumptions 4 and 6 hold.
The proof of Proposition 2 can be found in Appendix 8. The proof, which is an inductive argument, is in essence known in the literature: SNMMs whose blip functions do not depend on past actions and states also imply a marginal structural model, i.e., Assumption 4 (see, e.g., Technical Point 21.4 of [64]). We include it in our appendix for completeness and to conform to our notation. Thus, instead of Assumptions 4 and 6, one could impose 10 –12 , which are more in line with the dynamic treatment effects literature. Our identification argument would then immediately apply. However, our assumptions are more permissive and flexible in their current form. For instance, unlike a full SNMM specification, our blip definition in Assumption 6 only requires that the blip effect is not modified by past actions, but potentially allows for modification conditional on past states that confound the treatment. The full SNMM model presented above precludes such effect modifications.
Given these assumptions, we now present the main identification results. We first illustrate the key intuition behind the identification analysis in a simple two-period setting. Note that, for a given unit \(n\) and action sequence \((d_{1},d_{2})\in [A]\times[A]\), the expected potential outcome—the main causal object of interest—can be decomposed into two blip effects and a baseline outcome: \[\label{eq:illustration} \mathbb{E} \left[Y_{n}^{(d_{1},d_{2})}\right] = \underbrace{\mathbb{E} \left[Y_{n}^{(d_{1},d_{2})}\right] - \mathbb{E} \left[Y_{n}^{(0_{1},d_{2})}\right]}_{\mathrm{Blip}_{1}(d_{1})} + \underbrace{\mathbb{E} \left[Y_{n}^{(0_{1},d_{2})}\right] - \mathbb{E} \left[Y_{n}^{(0_{1},0_{2})}\right]}_{\mathrm{Blip}_{2}(d_{2})} + \underbrace{\mathbb{E} \left[Y_{n}^{(0_{1},0_{2})}\right]}_{\text{Baseline}},\tag{13}\] where, for simplicity, we suppress the terminal-period subscript (\(T=2\)) in \(Y_{n}^{(d_{1},d_{2})}\) and the conditioning on \(\mathcal{LF}\). Here \(\mathrm{Blip}_{t}(d_{t})\) is unit \(n\)’s treatment effect of intervention \(d_t\) compared to the baseline intervention \(0_t\) at time step \(t\), holding common interventions for all other time steps, and the baseline is unit \(n\)’s expected potential outcome had it remained under the control sequence for all \(T\) time steps. Under Assumption 4, \(\mathbb{E}[Y_{n}^{(d_{1},d_{2})}] = \sum_{\ell=1}^{2} \langle v^{\ell}_n,\, w_{d_\ell}\rangle\), where \(v^{\ell}_n := \psi^{2, \ell}_{n}\). In other words, Assumption 4 imposes latent factor structure on blip effects and baseline outcomes: \(\mathrm{Blip}_{2}(d_{2})= \langle v^{2}_n,\, w_{d_2}-w_{0_2} \rangle\), \(\mathrm{Blip}_{1}(d_{1})= \langle v^{1}_n,\, w_{d_1}-w_{0_1} \rangle\), and \(\text{Baseline}= \sum_{\ell=1}^{2} \langle v^{\ell}_n,\, w_{0_\ell} \rangle\). In this way, blip effects and baselines of units can be represented as linear combinations of one another, making the identification problem akin to those in synthetic controls and synthetic interventions.
| Donor Type | Donor Set | Treatment Sequence |
|---|---|---|
| 1 | \(\mathcal{I}^0_2\) | \((0_1, 0_2)\) |
| 2 | \(\mathcal{I}^{d_2}_2\) | \((0_1,d_2)\) |
| 3 | \(\mathcal{I}^{d_1}_1\) | \((d_1,D_2)\) |
Based on the latent factor structure, we now demonstrate how each component in 13 can be recursively identified by constructing synthetic units from an appropriate type of donors (Table 2). In Step 1, starting with the baseline \(\mathbb{E}[Y_{n}^{(0_{1},0_{2})}]\), we create a synthetic baseline for all units via a linear combination of the observed outcomes of Type 1 donors, which identifies the baseline. Next, in Step 2, for \(\mathrm{Blip}_{2}(d_{2})\), we create a synthetic \(\mathrm{Blip}_{2}(d_{2})\) for all units via a linear combination of \(\mathrm{Blip}_{2}(d_{2})= \mathbb{E}[Y_{n}^{(0_{1},d_{2})} - Y_{n}^{(0_{1},0_{2})}]\) for Type 2 donors. Note that the blip effects of these donors are “observed” (i.e., already identified), as \(\mathbb{E}[Y_{n}^{(0_{1},d_{2})}]=\mathbb{E}[Y_{n}]\) for these donors and their baselines \(\mathbb{E}[Y_{n}^{(0_{1},0_{2})}]\) are identified in Step 1. Finally, in Step 3, to identify \(\mathrm{Blip}_{1}(d_{1})\), we first construct the “observed” \(\mathrm{Blip}_{1}(d_{1})\) for Type 3 donors. Note that, under Assumption 4, and writing \(D_2\) for a donor’s observed second-period action (unit subscript suppressed), \(\mathrm{Blip}_{1}(d_{1})\) can be expressed as \[\mathbb{E} \left[Y_{n}^{(d_{1},D_{2})}\right] - \mathbb{E} \left[Y_{n}^{(0_{1},D_{2})}\right] = \mathbb{E} \left[Y_{n}^{(d_{1},D_{2})}\right] - \mathbb{E} \left[Y_{n}^{(0_{1},D_{2})} - Y_{n}^{(0_{1},0_{2})}\right] - \mathbb{E} \left[Y_{n}^{(0_{1},0_{2})}\right].\] On the right-hand side, the first term satisfies \(\mathbb{E}[Y_{n}^{(d_{1},D_{2})}]=\mathbb{E}[Y_{n}]\) for these donors, the second term is \(\mathrm{Blip}_{2}(D_{2})\), identified for these donors in Step 2, and the third term is the baseline, identified for these donors in Step 1. Now that \(\mathrm{Blip}_{1}(d_{1})\) is identified for Type 3 donors, a linear combination of them identifies \(\mathrm{Blip}_{1}(d_{1})\) for all units. Therefore, we identify \(\mathbb{E}[Y_{n}^{(d_{1},d_{2})}]\) for all units and any given \((d_{1},d_{2})\in [A]^2\).
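The following noiseless numerical sketch walks through Steps 1–3 above for a binary action set. It is illustrative only: the latent factors are hypothetical, and we let the observed covariates equal the unit factors themselves so that the weights in Assumption 5 can be recovered by least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
m, N = 3, 30
V = rng.normal(size=(N, 2, m))                      # psi^{2,1}_n, psi^{2,2}_n
w = {0: rng.normal(size=m), 1: rng.normal(size=m)}  # shared w_0, w_1

def Y(n, d1, d2):                          # noiseless outcome under (d1, d2)
    return V[n, 0] @ w[d1] + V[n, 1] @ w[d2]

# Ten donors of each type: sequences (0,0), (0,1), and (1, D_2).
D = [(0, 0)] * 10 + [(0, 1)] * 10 + [(1, int(rng.integers(2))) for _ in range(10)]
Yobs = np.array([Y(n, *D[n]) for n in range(N)])
Vflat = V.reshape(N, -1)                   # covariates standing in for v_{n,2}
T1, T2, T3 = list(range(0, 10)), list(range(10, 20)), list(range(20, 30))

def weights(n, donors):                    # v_n = sum_j beta_j v_j (Assumption 5)
    beta, *_ = np.linalg.lstsq(Vflat[donors].T, Vflat[n], rcond=None)
    return beta

# Step 1: synthetic baselines for every unit from Type 1 donors.
b = np.array([weights(n, T1) @ Yobs[T1] for n in range(N)])
# Step 2: Blip_2(1); for Type 2 donors it is Y - baseline, then re-weight.
blip2_d = Yobs[T2] - b[T2]
blip2 = np.array([weights(n, T2) @ blip2_d for n in range(N)])
# Step 3: Blip_1(1); for Type 3 donors it is Y - baseline - Blip_2(D_2).
blip1_d = np.array([Yobs[j] - b[j] - (blip2[j] if D[j][1] == 1 else 0.0)
                    for j in T3])
blip1 = np.array([weights(n, T3) @ blip1_d for n in range(N)])

# Counterfactual E[Y^{(1,1)}] for unit 0 vs. the ground truth.
print(b[0] + blip1[0] + blip2[0], Y(0, 1, 1))
```

In this noiseless setting the recursion recovers the counterfactual exactly; with noisy covariates and outcomes, the least-squares step would be replaced by the PCR-based weights introduced in the estimation section below.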
We now present the formal identification results for general \(T\).
Theorem 2. Let Assumptions 1, 4, 5, and 6 hold. Then, for any unit \(n \in [N]\) and action sequence \(\bar{d}^T \in [A]^T\), the expected counterfactual outcome can be expressed as: \[\begin{align} \mathbb{E}[Y_{n,T}^{(\bar{d}^T)}\mid \mathcal{LF}] =~& \sum_{t =1}^T \gamma_{n, T, t}(d_t) + b_{n,T} \mid \mathcal{LF},\label{eq:LTV_identification} \end{align}\tag{14}\] where quantities on the right-hand side are identified as follows:
(i) We have the following representations of the baseline outcomes \[\begin{align} &\forall \;j \in \mathcal{I}^0_T~:~ b_{j,T} \mid \mathcal{LF}= \mathbb{E}[Y_{j, T} \mid \mathcal{LF}, \;j \in \mathcal{I}^0_T], \tag{15} \\ &\forall \;i \notin \mathcal{I}^0_T~:~ b_{i,T}\mid \mathcal{LF}= \sum_{j \in \mathcal{I}^0_T} \beta_j^{i,\mathcal{I}^0_T} \;b_{j,T} \mid \mathcal{LF}, \mathcal{I}^0_T. \tag{16} \end{align}\] (ii) We have the following representations of the blip effect at time \(T\) for \(\forall d \in [A]\): \[\begin{align} \forall \;j \in \mathcal{I}^d_T~:~& \gamma_{j, T, T}(d) \mid \mathcal{LF}=~ \mathbb{E}[Y_{j, T} \mid \mathcal{LF}, \;j \in \mathcal{I}^d_T] - b_{j, T} \mid \mathcal{LF}, \tag{17} \\ \forall \;i \notin \mathcal{I}^d_T~:~& \gamma_{i, T, T}(d) \mid \mathcal{LF}=~ \sum_{j \in \mathcal{I}^d_T} \beta_j^{i, \mathcal{I}^d_T} \gamma_{j, T, T}(d) \mid \mathcal{LF}, \mathcal{I}^d_T. \tag{18} \end{align}\] (iii) We have the following recursive representations of the blip effect \(\forall \;t < T, \;d \in [A]\): \[\begin{align} \forall \;j \in \mathcal{I}^d_t&~:~ \gamma_{j, T, t}(d) \mid \mathcal{LF}=~ \mathbb{E}[Y_{j, T} \mid \mathcal{LF}, \mathcal{I}^d_t] - b_{j, T} \mid \mathcal{LF}- \sum_{\ell=t+1}^T \gamma_{j,T, \ell}(D_{j, \ell}) \mid \mathcal{LF}, \tag{19} \\ \forall \;i \notin \mathcal{I}^d_t&~:~ \gamma_{i, T, t}(d) \mid \mathcal{LF}=~ \sum_{j \in \mathcal{I}^d_t} \beta_j^{i, \mathcal{I}^d_t} \gamma_{j, T, t}(d) \mid \mathcal{LF}, \mathcal{I}^d_t \tag{20}. \end{align}\]
14 states that our target causal parameter of interest can be written as an additive function of \(b_{n, T}\) and \(\gamma_{n,T,t}(d_t)\) for \(t \in [T]\) and \(d_t \in [A]\). Theorem 2 establishes that these various quantities are expressible as functions of the observed outcomes \(\{Y_{j, T}\}_{j\in [N]}\). We give an interpretation below.
Identifying baseline outcomes. For units \(j \in \mathcal{I}^0_T\), 15 states that their baseline outcome \(b_{j, T}\) is simply their expected observed outcome at time step \(T\), i.e., \(Y_{j,T}\). For units \(i \notin \mathcal{I}^0_T\), 16 states that we can identify \(b_{i, T}\) by appropriately re-weighting the baseline outcomes \(b_{j, T}\) of the units \(j \in \mathcal{I}_T^0\) (identified via 15 ).
Identifying blip effects at time \(T\). For any given \(d \in [A]\): For units \(j \in \mathcal{I}^d_T\), 17 states that their blip effect \(\gamma_{j,T,T}(d)\) is equal to their observed outcome \(Y_{j,T}\) minus the baseline outcome \(b_{j, T}\) (identified via 16 ). For units \(i \notin \mathcal{I}^d_T\), 18 states that we can identify \(\gamma_{i,T,T}(d)\) by appropriately re-weighting the blip effects \(\gamma_{j,T,T}(d)\) of units \(j \in \mathcal{I}^d_T\) (identified via 17 ).
Identifying blip effects at time \(t < T\). Suppose by induction that \(\gamma_{n, T, \ell}(d)\) is identified for every \(\ell \in [t + 1, T]\), \(n \in [N]\), \(d \in [A]\), i.e., it can be expressed in terms of observed outcomes. Then for any given \(d \in [A]\): For units \(j \in \mathcal{I}^d_t\), 19 states that their blip effect \(\gamma_{j,T,t}(d)\) is equal to their observed outcome \(Y_{j,T}\) minus the baseline outcome \(b_{j, T}\) (identified via 16 ) minus the sum of blip effects \(\sum_{\ell=t+1}^T \gamma_{j,T, \ell}(D_{j, \ell})\) (identified via the inductive hypothesis). For units \(i \notin \mathcal{I}^d_t\), 20 states that we can identify \(\gamma_{i,T,t}(d)\) by appropriately re-weighting the blip effects \(\gamma_{j,T,t}(d)\) of units \(j \in \mathcal{I}^d_t\) (identified via 19 ).
Donor sample complexity. To estimate \(\mathbb{E}[Y^{(\bar{d}^T)}_{n, T} ~|~ \mathcal{LF}]\) for all units \(n \in [N]\) and any action sequence \(\bar{d}^T \in [A]^T\), the LTV identification strategy requires the existence of a sufficiently large subset of donor units \(\mathcal{I}^d_t\) for every \(d \in [A]\) and \(t \in [T]\). That is, the number of donor units we require needs to scale at the order of \(A \times T\), which grows linearly in both \(A\) and \(T\). Thus we see that the additional structure imposed by the time-varying factor model introduced in Assumption 4 leads to a decrease in sample complexity from \(A^T\) to \(A \times T\), when compared with the general factor model introduced in Assumption 2.
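For concreteness, with hypothetical values \(A = 2\) and \(T = 10\), the contrast is stark:

```python
A, T = 2, 10
# 1024 donor sequences for the naive SI strategy vs. 20 (action, period) donor groups.
print(A ** T, A * T)
```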
Remark 2. The additive structure in Assumption 4 can be relaxed to a hybrid structure that allows for flexible interaction among treatments for a fixed number (e.g., \(h\)) of consecutive periods, while maintaining additivity across the fixed windows: the latent factor \(w_{\bar{d}^t}\) in Assumption 2 satisfies \[\begin{align} w_{\bar{d}^t} = [w_{\tilde{d}_1}, \dots, w_{\tilde{d}_t}], \tilde{d}_{\ell} = (d_{\ell-h+1},...,d_{\ell}),\;\ell \ge h. \end{align}\] Then the sample complexity can be bounded by \(A^h \times T\). This remark highlights the trade-off between sample complexity and model flexibility.
Further, for \(j \in \mathcal{I}^d_t\), we require that \(\forall \;\bar{\delta}^T \in [A]^T, \;\mathbb{E}[\varepsilon^{(\bar{\delta}^T)}_{j, T} \mid \mathcal{LF}, \bar{D}^t_{j}] = 0\). That is, the actions picked for these donor units are only required to be non-adaptive till time step \(t\), as opposed to being non-adaptive for the entire time period \(T\), which was required for the SI identification strategy in Section 3. See Figure 2 for a DAG that is consistent with the exogeneity conditions implied by the definition of \(\mathcal{I}^d_t\) in 8 .
We have shown that this additional linear time-varying latent factor structure, motivated by a linear time-varying dynamical system, yields substantial gains in terms of the number of donor units required and the flexibility of their action sequences. This raises the question of how much more can be gained if we instead consider a linear time-invariant latent factor structure, motivated by a linear time-invariant dynamical system. In Section 5, we show that this additional structure surprisingly implies far better donor sample complexity and less stringent exogeneity conditions on the donor units.
SBE-PCR Estimator in LTV Setting

Here we detail the specific algorithm that yields the SBE-PCR estimator. To do so, we consider the following additional covariates.
Assumption 7 (Additional Covariates). For each unit \(n \in [N]\), we assume access to covariates \(X_n \in \mathbb{R}^p\) such that each element satisfies \[X_{n,k} = \langle v_{n, T}, \rho_k \rangle + \varepsilon_{n,k},\] where \(v_{n, T}\) is the unit latent factor defined in Assumption 5 and \(\varepsilon_{n,k}\) is independent mean-zero noise. Denote \(X = [X_1, \dots, X_N] \in \mathbb{R}^{p \times N}\). We can also allow more general time-varying covariates, as detailed in Appendix 10.3.
We make an additional assumption regarding control factors in order to develop an algorithm with consistent control estimators, as will be seen in later sections.
Assumption 8. For any donor set, i.e., any \(t \in [T]\), \(d \in [A]\), and unit \(n \in \mathcal{I}_t^d\) there exist weights \(\phi^{n, \mathcal{I}_t^d} \in \mathbb{R}^{|\mathcal{I}_t^d| - 1}\) such that \[v_{n,T} = \sum_{k \in \mathcal{I}_t^d \setminus n} \phi_k^{n, \mathcal{I}_t^d} v_{k,T}.\]
This assumption allows us to detail the algorithm for estimating weights using Principal Component Regression (PCR). Specifically, for each \(d \in [A]\), \(t \in [T]\), and unit \(n \in [N]\) we consider the donor set \(\mathcal{I}_t^d\) and estimate weights to express the response vector \(X_n \in \mathbb{R}^p\) as a linear combination of the covariates from other donor units. The corresponding matrix of covariates is \(X_{\mathcal{I}_t^d \setminus n} = X_{:,\mathcal{I}^d_t\setminus n} \in \mathbb{R}^{p \times |\mathcal{I}_t^d \setminus n|}\), which only chooses the relevant donor columns.
We apply PCR by regressing \(X_n \in \mathbb{R}^p\) on the rank-\(k_{\mathcal{I}_t^d \setminus n}\) approximation of \(X_{\mathcal{I}_t^d \setminus n}\), with \(k_{\mathcal{I}_t^d \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}_t^d \setminus n}])\), i.e., we conduct PCR with parameter \(k_{\mathcal{I}_t^d \setminus n}\). Denote the Singular Value Decomposition (SVD) of \(X_{\mathcal{I}_t^d \setminus n}\) as \[X_{\mathcal{I}_t^d \setminus n} = \sum_{l\geq 1}\sigma_l u_l v_l',\] where \(u_l \in \mathbb{R}^p\) and \(v_l \in \mathbb{R}^{|\mathcal{I}_t^d \setminus n|}\) are the left and right singular vectors, arranged in decreasing order of the corresponding singular values \(\sigma_l\).4 We then set, if \(n \in \mathcal{I}_t^d\), \[\hat{\phi}^{n, \mathcal{I}_t^d} = \left(\sum_{l =1}^{k_{\mathcal{I}_t^d \setminus n}}(1/\sigma_l)v_l u_l'\right)X_n \in \mathbb{R}^{|\mathcal{I}_t^d| - 1},\] and, if \(n \notin \mathcal{I}_t^d\), \[\hat{\beta}^{n, \mathcal{I}_t^d} = \left(\sum_{l =1}^{k_{\mathcal{I}_t^d}}(1/\sigma_l)v_l u_l'\right)X_n \in \mathbb{R}^{|\mathcal{I}_t^d|}.\] The distinction between \(\beta\) and \(\phi\) is to emphasize the difference in dimension. For justification of using PCR in our context, refer to [30], [31]. Given this weight-estimation procedure, we are ready to state our SBE-PCR algorithm.
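Before stating it, we note that the weight-estimation step above admits a very short implementation. The following is a minimal sketch, assuming the covariates are stored as NumPy arrays; the function name and interface are ours, purely for illustration.

```python
import numpy as np

def pcr_weights(X_donor: np.ndarray, x_target: np.ndarray, k: int) -> np.ndarray:
    """PCR weights expressing x_target (shape (p,)) as a linear combination
    of the columns of X_donor (shape (p, n_donors)), by regressing on the
    rank-k truncation of X_donor, as in the displayed formula above."""
    U, s, Vt = np.linalg.svd(X_donor, full_matrices=False)
    # Keep the top-k singular triplets and invert on that subspace:
    # weights = sum_{l <= k} (1/sigma_l) v_l (u_l' x_target).
    coefs = (U[:, :k].T @ x_target) / s[:k]
    return Vt[:k, :].T @ coefs
```

If the target unit \(n\) belongs to the donor set, its own column is removed from `X_donor` before calling the function, yielding \(\hat{\phi}^{n, \mathcal{I}_t^d}\); otherwise all donor columns are kept, yielding \(\hat{\beta}^{n, \mathcal{I}_t^d}\).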
Step 1: Estimate baseline outcomes.
For \(j \in \mathcal{I}^0_T\) \[\begin{align} \hat{b}_{j,T} = \sum_{k \in \mathcal{I}^0_T \setminus j}\hat{\phi}_k^{j, \mathcal{I}_T^0}Y_{k, T}. \end{align}\]
For \(i \notin \mathcal{I}^0_T\) \[\begin{align} \hat{b}_{i,T} = \sum_{j \in \mathcal{I}^0_T} \hat{\beta}_{j}^{i,\mathcal{I}^0_T} \hat{b}_{j,T}. \end{align}\]
Step 2: Estimate blip effects at time \(T\).
For \(d \in [A]\):
For \(j \in \mathcal{I}^d_T\) \[\begin{align} \hat{\gamma}_{j, T, T}(d) = \sum_{k \in \mathcal{I}^d_T \setminus j}\hat{\phi}_k^{j, \mathcal{I}_T^d}Y_{k, T} - \hat{b}_{j,T}. \end{align}\]
For \(i \notin \mathcal{I}^d_T\) \[\begin{align} \hat{\gamma}_{i, T, T}(d) = \sum_{j \in \mathcal{I}^d_T} \hat{\beta}_{j}^{i,\mathcal{I}^d_T} \hat{\gamma}_{j, T, T}(d). \end{align}\]
Step 3: Recursively estimate blip effects for time \(t < T\).
For \(d \in [A]\) and \(t \in \{T -1, \dots, 1\}\), recursively estimate as follows:
For \(j \in \mathcal{I}^d_t\) \[\begin{align} \hat{\gamma}_{j, T, t}(d) = \sum_{k \in \mathcal{I}^d_t \setminus j}\hat{\phi}_k^{j, \mathcal{I}_t^d}\left(Y_{k, T} - \hat{b}_{k,T} - \sum_{\ell=t+1}^T \hat{\gamma}_{k,T,\ell}(D_{k,\ell})\right). \end{align}\]
For \(i \notin \mathcal{I}^d_t\) \[\begin{align} \hat{\gamma}_{i, T, t}(d) = \sum_{j \in \mathcal{I}^d_t} \hat{\beta}_{j}^{i,\mathcal{I}^d_t} \hat{\gamma}_{j, T, t}(d). \end{align}\]
Step 4: Estimate target causal parameter. For \(n \in [N]\), and \(\bar{d}^T \in [A]^T\), estimate the causal parameter as follows: \[\begin{align} \label{eq:causal_estimator} \widehat{\mathbb{E}}[Y_{n, T}^{(\bar{d}^T)}\mid \mathcal{LF}] =~& \sum_{t =1}^T \hat{\gamma}_{n, T, t}(d_t) + \hat{b}_{n,T} . \end{align}\tag{21}\]
All the relevant weights in the above algorithm are computed via the PCR-based procedure described previously.
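The four steps above can also be read as a single backward-induction routine. Below is a schematic sketch, assuming the PCR weights have already been computed (e.g., with a routine like `pcr_weights` above) and passed in as dictionaries; the data structures and names are our own illustrative choices, not part of any released implementation.

```python
import numpy as np

def sbe_pcr_ltv(Y_T, D, donors, phi, beta, N, T, A, d_seq):
    """Backward-induction sketch of the SBE-PCR estimator (LTV setting).

    Y_T[n]       : outcome of unit n at time T
    D[n, t]      : observed action of unit n at time t, for t = 1, ..., T
    donors[d, t] : donor set I^d_t as a Python set (d = 0 is control)
    phi[n, d, t] : {donor k: weight}, defined when n is in I^d_t
    beta[n, d, t]: {donor j: weight}, defined when n is not in I^d_t
    d_seq        : target action sequence (d_1, ..., d_T)
    """
    # Step 1: baseline outcomes b_{n,T}.
    b = {}
    for j in donors[0, T]:
        b[j] = sum(w * Y_T[k] for k, w in phi[j, 0, T].items())
    for i in range(N):
        if i not in donors[0, T]:
            b[i] = sum(w * b[j] for j, w in beta[i, 0, T].items())

    # gamma[n][t][d]: blip of action d at time t on the time-T outcome;
    # the control action d = 0 has zero blip by construction.
    gamma = {n: {t: {0: 0.0} for t in range(1, T + 1)} for n in range(N)}

    # Step 2: terminal blips at t = T.
    for d in range(1, A + 1):
        for j in donors[d, T]:
            gamma[j][T][d] = sum(w * Y_T[k]
                                 for k, w in phi[j, d, T].items()) - b[j]
        for i in range(N):
            if i not in donors[d, T]:
                gamma[i][T][d] = sum(w * gamma[j][T][d]
                                     for j, w in beta[i, d, T].items())

    # Step 3: recurse backwards over t = T-1, ..., 1, stripping the baseline
    # and all later-period blips from each donor's observed outcome.
    for t in range(T - 1, 0, -1):
        for d in range(1, A + 1):
            for j in donors[d, t]:
                gamma[j][t][d] = sum(
                    w * (Y_T[k] - b[k]
                         - sum(gamma[k][l][D[k, l]]
                               for l in range(t + 1, T + 1)))
                    for k, w in phi[j, d, t].items())
            for i in range(N):
                if i not in donors[d, t]:
                    gamma[i][t][d] = sum(w * gamma[j][t][d]
                                         for j, w in beta[i, d, t].items())

    # Step 4: assemble the target causal parameter for every unit.
    return np.array([b[n] + sum(gamma[n][t][d_seq[t - 1]]
                                for t in range(1, T + 1))
                     for n in range(N)])
```

The backward order is essential: the blip at period \(t\) can only be peeled off the observed outcome once all blips at periods \(t + 1, \dots, T\) have been estimated.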
SBE-PCR Consistency in LTV Setting

We state additional assumptions required to establish the consistency of the SBE-PCR estimator.
Assumption 9 (Sub-Gaussian Noise). For all \(n \in [N]\) and \(\bar{d}^T \in [A]^T\), \(\varepsilon_{n,T}^{(\bar{d}^T)}\) are independent sub-Gaussian random variables with \(\mathrm{Var}(\varepsilon_{n,T}^{(\bar{d}^T)} \mid \mathcal{L}\mathcal{F}) = \sigma^2\) and \(\|\varepsilon_{n,T}^{(\bar{d}^T)} \mid \mathcal{L}\mathcal{F} \|_{\psi_2} \leq C\sigma\) for some constant \(C > 0\).
Assumption 10 (Bounded Expected Potential Outcomes).
For all \(n \in [N]\) and \(\bar{d}^T \in [A]^T\), we have \(\mathbb{E}[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{L}\mathcal{F}] \in [-1, 1]\).
Assumption 11 (Well-Balanced Singular Values). For all \(d \in [A]\) and \(t \in [T]\) we have \(\|\mathbb{E}[X_{\mathcal{I}^d_t}|\mathcal{LF}]\|_F \geq c'p|\mathcal{I}^d_t|\) where \(X_{\mathcal{I}^d_t} \in \mathbb{R}^{p \times |\mathcal{I}^d_t|}\) is the relevant data matrix of observed covariates and \(\kappa^{-1} \geq c\) where \(\kappa\) is the condition number of \(\mathbb{E}[X_{\mathcal{I}^d_t}|\mathcal{LF}]\) for constants \(c, c' > 0\).
Assumption 12 (Row-Space Inclusion). For all \(d \in [A]\) and \(t \in [T]\) there exist \(\{\xi_i^{(d,t)}\}_{i \in [p]}\) such that for any \(j \in \mathcal{I}^d_t\)
\[\mathbb{E}[Y_{j,T}|\mathcal{LF}, j \in \mathcal{I}^d_t] = \sum_{i = 1}^p \xi_i^{(d,t)} \cdot \mathbb{E}[(X_{\mathcal{I}_t^d})_{ij}|\mathcal{LF}, j \in \mathcal{I}^d_t],\]
where \(X_{\mathcal{I}_t^d} \in \mathbb{R}^{p \times |\mathcal{I}^d_t|}\) is the relevant data matrix of observed covariates.5
The first three assumptions are standard and identical to those presented in [10]. The final assumption facilitates consistency by ensuring that the test data lies within the subspace spanned by the training data (specifically, within its row space), thereby enabling generalization of SBE-PCR. This turns out not to be a very restrictive assumption, and it is standard within the literature. Appendix 10.3 lists a sufficient condition for it and an implication that will help us later on.
Theorem 3. Let Assumptions 1 to 12 hold. Consider the SBE-PCR estimator in Section 4.3 and suppose \(k = \max_{d \in [A], t\in [T]}\text{rank}(\mathbb{E}[X_{\mathcal{I}_t^d}])\). Then, conditional on the treatment assignments, \(\mathcal{LF}\), and \(\{\rho_i\}_{i \in [p]}\), we have:
(i) Baseline Consistency: For any \(n \in [N]\) \[\hat{b}_{n,T} - b_{n,T} \mid \mathcal{LF} = O_p\left(\sqrt{\log(p|\mathcal{I}_T^0|)}\left(\frac{k^{5/4}}{p^{1/4}} +k^{5/2}\max\left\{\frac{\sqrt{|\mathcal{I}_T^0|}}{p^{3/2}}, \frac{1}{\sqrt{|\mathcal{I}_T^0|-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right).\]
(ii) Terminal Blip Consistency: For any \(d \in [A]\) and unit \(n \in [N]\) \[\hat{\gamma}_{n,T,T}(d) - \gamma_{n,T,T}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{7/4}}{p^{1/4}} +k^{3}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_T^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_T^d|\}\).
(iii) Non-Terminal Blip Consistency: For any \(d \in [A]\), unit \(n \in [N]\), and \(t \in [1, \dots, T-1]\): \[\begin{align} \hat{\gamma}_{n, T, t}&(d) - \gamma_{n, T, t}(d) \mid \mathcal{LF}\\ &= O_p\left((T-t)\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{(T-t)}}{p^{1/4}} + k^{(T-t)}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right), \end{align}\] where \(\mathcal{C}= \{|\mathcal{I}_T^0|, |\mathcal{I}_t^d| ,(|\mathcal{I}_{q}^{D_{n,q}}|)_{n \in [N], q \in [t+1, \dots, T]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{C}, \alpha_{\mathcal{I}} = \min\mathcal{C}\).
(iv) Target Causal Parameter Consistency: For \(n \in [N]\), and \(\bar{d}^T \in [A]^T\): \[\widehat{\mathbb{E}}[Y_{n, T}^{(\bar{d}^T)}] - \mathbb{E}[Y_{n, T}^{(\bar{d}^T)}\mid \mathcal{LF}]=O_p\left(T\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{T}}{p^{1/4}} + k^{T}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\] where \(\mathcal{C}= \{|\mathcal{I}_T^0|, (|\mathcal{I}_{t}^{d_{t}}|)_{t \in [T]} ,(|\mathcal{I}_{t}^{D_{n,t}}|)_{n \in [N], t \in [2, \dots, T]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{C}\) and \(\alpha_{\mathcal{I}} = \min\mathcal{C}\). Here, each \(O_p(\cdot)\) is defined with respect to the sequence \(\min\{p, \alpha_{\mathcal{I}}\}\).6
Theorem 3 concludes that the SBE-PCR estimator is consistent for the causal estimand. More precisely, for fixed \(k\) and \(T\), the estimation error decays as the donor set cardinalities and \(p\) grow, provided \(p = \omega(\pi_{\mathcal{I}}^{1/3})\).7 Notably, the theorem establishes point-wise consistency, i.e., no averaging across units is needed to establish the result. The proof can be found in Appendix 10.4.
Assumption 13. Let the setup of Assumption 4 hold. We further assume that the counterfactual potential outcomes depend only on the most recent \(q\) blips; namely, for all units \(n \in [N]\) and \(t \in [q+1, T]\), we have \(\psi_n^{t, t-q-i} = 0\) for all \(i \in [t-q-1]\). Notably, this implies that for any \(n \in [N]\) and \(\bar{d}^T \in [A]^T\) we have \[Y_{n,T}^{(\bar{d}^T)} = \sum_{\ell = T- q }^T\langle\psi_n^{T,\ell}, w_{d_{\ell}}\rangle + \varepsilon^{(\bar{d}^T)}_{n,T}.\]
Theorem 4. Let the setup of Theorem 3 and Assumption 13 hold. Then, modifying the SBE-PCR estimator to estimate only the baseline, terminal blip, and previous \(q\) blips, we have for any \(n \in [N]\) and \(\bar{d}^T \in [A]^T\): \[\widehat{\mathbb{E}}[Y_{n, T}^{(\bar{d}^T)}] - \mathbb{E}[Y_{n, T}^{(\bar{d}^T)}\mid \mathcal{LF}]=O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{q}}{p^{1/4}} + k^{q}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\] where \(\mathcal{C}= \{|\mathcal{I}_T^0|, (|\mathcal{I}_{t}^{d_{t}}|)_{t \in [T-q, \dots, T]} ,(|\mathcal{I}_{t}^{D_{n,t}}|)_{n \in [N], t \in [T-q+1, \dots, T]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{C}\) and \(\alpha_{\mathcal{I}} = \min\mathcal{C}\).
Theorem 4 concludes that, upon modifying the SBE-PCR estimator to account for the system depending only on a constant \(q\) lags, we obtain a consistent estimate of the causal estimand. More precisely, for fixed \(k\), the estimation error decays as the donor set cardinalities and the number of covariates \(p\) grow, provided \(p = \omega(\pi_{\mathcal{I}}^{1/3})\). Once again we have established pointwise consistency. The key difference from the previous theorem, however, is that we now allow \(T \to \infty\) as well, which justifies a growing number of covariates obtained by including time-varying covariates, i.e., \(p\) can now depend on \(T\) asymptotically. In the empirical application of Section 6, we include time-varying covariates in estimation. The proof follows immediately from that of Theorem 3 and is included in Appendix 10.5.
Next, we introduce a linear time-invariant factor model, which is analogous to the factor model introduced in Assumption 4 in the previous section, but which further exploits the modeling trade-off discussed in Section 4.2.3.
Assumption 14 (Linear time-invariant (LTI) factor model). Assume \(\forall \;n \in [N]\), \(t \in [T], \bar{d}^t \in [A]^t\), \[\begin{align} Y^{(\bar{d}^t)}_{n, t} &= \sum^{t}_{\ell = 1} \left\langle \psi^{t - \ell}_{n}, w_{d_\ell} \right\rangle + \varepsilon^{(\bar{d}^t)}_{n, t}, \label{eq:LTI_factor_model} \end{align}\tag{22}\] where \(\psi^{t - \ell}_{n}, w_{d_\ell} \in \mathbb{R}^m\) for \(\ell \in [t]\). Further, let \(\mathcal{LF}= \{\psi^{t - \ell}_{n} \}_{n \in [N], t \in [T], \ell \in [0, t-1]} \cup \{w_{d} \}_{d \in [A]}\). Assume \[\begin{align} \mathbb{E}[\varepsilon^{(\bar{d}^t)}_{n, t} \mid \mathcal{LF}] = 0. \end{align}\]
Remark 3. Note Assumption 14 implies Assumption 2 holds with \[\begin{align} v_{n, t} = [\psi^{t - 1}_{n}, \dots, \psi^{t - t}_{n}], \; w_{\bar{d}^t} = [w_{d_1}, \dots, w_{d_t}]. \end{align}\] Further \(m(t) = m \times t\) for \(m(t)\) in Definition 1.
Note that the effect of action \(d_\ell\) on \(Y^{(\bar{d}^t)}_{n, t}\) for \(\ell \in [t]\) is additive, given by \(\langle \psi^{t - \ell}_{n}, w_{d_\ell} \rangle\). Intuitively, \(\psi_n^{t - \ell}\) captures the latent unit-specific heterogeneity in the potential outcome for unit \(n\), at a given time step \(t\), for an action taken at time step \(\ell \le t\); analogously, \(w_{d_\ell}\) captures the latent effect of action \(d_\ell\). Further, compared to Assumption 4, we now have the additional structure that \(\psi^{t - \ell}_{n}\) depends only on the lag \(t - \ell\), rather than on the specific time steps \(\ell\) and \(t\). As a result, the effect of an action taken at time \(\ell\) on the outcome at time \(t\) is only a function of the lag \(t - \ell\). Hence we call this a “time-invariant” latent factor model, as opposed to a “time-varying” latent factor model. This additional structure will be crucial in the identification strategy we employ in Section 5.2.
For this identification strategy, we make an additional assumption that the control sequence is also time invariant.
Assumption 15. There exists \(\tilde{0}\in [A]\) such that the control sequence \(0_t = \tilde{0}\) for all \(t \in [T]\).
We show that the classical linear time-invariant dynamical system model satisfies Assumption 14. Suppose for all \(t \in [T]\), all units \(n \in [N]\) obey the following dynamic triangular model for a sequence of actions \(\bar{D}^t_{n}\) and counterfactual outcomes \(Y^{(\bar{D}_n^t)}_{n, t}\): \[\begin{align} D_{n, t} &= f_n(w_{D_{n, t-1}}, \;z^{(\bar{D}_n^{t-1})}_{n, t - 1}),\tag{23} \\ Y^{(\bar{D}_n^t)}_{n, t} &= \left\langle \theta_{n}, z^{(\bar{D}_n^t)}_{n, t} \right\rangle + \left\langle \tilde{\theta}_{n}, w_{D_{n, t}} \right\rangle + \tilde{\eta}_{n, t},\tag{24} \end{align}\] where \(z^{(\bar{D}_n^t)}_{n, t} = \boldsymbol{B}_{n} \;z^{(\bar{D}_n^{t-1})}_{n, t - 1} + \boldsymbol{C}_{n} \;w_{D_{n, t}} + \eta_{n, t}\) and \(z_{n, 0} = w_{D_{n, 0}} = 0\). Here, \(z^{(\bar{D}_n^t)}_{n, t} \in \mathbb{R}^{m}\) is the latent state associated with unit \(n\) at time \(t\) and \(w_{D_{n, t-1}} \in \mathbb{R}^{m}\) is the chosen action at time \(t - 1\). \(\eta_{n, t} \in \mathbb{R}^{m}\) and \(\tilde{\eta}_{n, t} \in \mathbb{R}\) represent independent mean-zero random innovations at each time step \(t\). \(\boldsymbol{B}_{n}, \boldsymbol{C}_{n} \in \mathbb{R}^{m \times m}\) are matrices governing the linear dynamics of \(z^{(\bar{D}_n^t)}_{n, t}\). In contrast to the linear time-varying dynamical system described in Section 4.1 above, these transition matrices are invariant across all \(t \in [T]\). \(\theta_{n}, \tilde{\theta}_{n} \in \mathbb{R}^{m}\) are parameters governing how the outcome of interest \(Y^{(\bar{D}_n^t)}_{n, t}\) is a linear function of \(z^{(\bar{D}_n^t)}_{n, t}\) and \(w_{D_{n, t}}\), respectively. \(f_n(\cdot)\) is a function which decides how the next action \(w_{D_{n, t}}\) is chosen as a function of the previous action \(w_{D_{n, t - 1}}\) and the current state \(z^{(\bar{D}_n^{t-1})}_{n, t-1}\). Due to the input of the latent state in \(f_n(\cdot)\), the action sequence is adaptive. As a result, \(\eta_{n, \ell}\) is correlated with \(D_{n, t}\) for \(\ell < t\).
Proposition 3. Suppose the dynamic triangular model 23 –24 holds. Then we have the following representation, \[\begin{align} Y^{(\bar{d}^t)}_{n, t} = \sum^{t}_{\ell = 1} \Big(\left\langle \psi^{t - \ell}_{n}, w_{d_\ell} \right\rangle + \varepsilon_{n, t, \ell} \Big), \label{eq:LTI_factor_model_representation} \end{align}\qquad{(2)}\] where \(\psi^{t - \ell}_{n}, w_{d_\ell} \in \mathbb{R}^m\) for \(\ell \in [t]\); here, \[\begin{align} \psi^{t - \ell}_{n} &= \left( \boldsymbol{B}^{t - \ell}_n \boldsymbol{C}_n \right)' \theta_n \quad \text{for} \quad \ell \in [t-1], \\ \psi^{0}_{n} &= (\boldsymbol{C}_n)' \theta_n + \tilde{\theta}_n, \\ \varepsilon_{n, t, \ell} &= \left( \boldsymbol{B}^{t - \ell}_n \eta_{n, \ell} \right)' \theta_n \quad \text{for} \quad \ell \in [t-1], \\ \varepsilon_{n, t, t} &= \left( \eta_{n, t} \right)' \theta_n + \tilde{\eta}_{n, t}. \end{align}\] Therefore, Assumption 14 holds with the additional structure that \(\varepsilon^{(\bar{d}^t)}_{n, t}\) has an additive factorization as \(\sum^{t}_{\ell = 1} \varepsilon_{n, t, \ell}\), and it is not a function of \(d_\ell\).
In this example, our target parameter \(\mathbb{E}[Y^{(\bar{d}^T)}_{n, T} ~|~ \mathcal{LF}]\) defined in 2 translates to the expected potential outcome once we condition on the latent parameters \(\psi^{t - \ell}_{n}, w_{d_\ell}\), which are a function of \(\boldsymbol{B}_{n}, \boldsymbol{C}_{n}, \theta_{n}, \tilde{\theta}_{n}\), and average over the per-step independent mean-zero random innovations \(\varepsilon_{n, t, \ell}\), which are a function of \(\eta_{n, \ell}\) (and \(\boldsymbol{B}_{n}, \boldsymbol{C}_{n}, \theta_{n}\)).
Our goal in this section is to identify \(\mathbb{E}[Y_{n,T}^{(\bar{d}^T)}\mid \mathcal{LF}]\), namely represent this expected potential outcome for a target unit \(n\) and action sequence \(\bar{d}^T\) as some function of observed outcomes.
Notation. We define the following useful notation for any unit \(n \in [N]\) and \(t \in [T]\): \[\begin{align} \gamma_{n, T - t}(d) := \left\langle \psi^{T - t}_{n}, w_{d} - w_{\tilde{0}} \right\rangle. \nonumber \end{align}\] The quantity \(\gamma_{n, T - t}(d)\) can be interpreted as a “blip effect”: the expected difference in potential outcomes if unit \(n\) undergoes the sequence \((\bar{d}^{t}, \underline{0}^{t + 1})\) instead of \((\bar{d}^{t - 1}, \underline{0}^{t})\), with \(d_t = d\). This is because Assumptions 14 and 15 imply \[\begin{align} &\mathbb{E}\left[Y^{(\bar{d}^{t}, \underline{0}^{t + 1})}_{n, T} - Y^{(\bar{d}^{t - 1}, \underline{0}^{t})}_{n, T} \mid \mathcal{LF}\right] \nonumber \\&= \mathbb{E}\left[\left\langle \psi^{T - t}_{n}, w_{d_t} - w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{d}^{t}, \underline{0}^{t + 1})}_{n, T} - \varepsilon^{(\bar{d}^{t - 1}, \underline{0}^{t})}_{n, T} \mid \mathcal{LF}\right] \nonumber \\&= \left\langle \psi^{T - t}_{n}, w_{d_t} - w_{\tilde{0}} \right\rangle \mid \mathcal{LF}. \nonumber \end{align}\] Further, let \[\begin{align} b_{n, T} &:= \sum^{T}_{\ell = 1} \left\langle \psi^{T - \ell}_{n}, w_{\tilde{0}} \right\rangle. \nonumber \end{align}\] This can be interpreted as the expected potential outcome if unit \(n\) remains under the control sequence \(\bar{0}^T\) till time step \(T\). Again, Assumptions 14 and 15 imply \[\begin{align} \mathbb{E}\left[ Y^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}\right] &= \mathbb{E}\left[\sum^T_{t = 1} \left\langle \psi^{T - t}_{n}, w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}\right] = \sum^T_{t = 1} \left\langle \psi^{T - t}_{n}, w_{\tilde{0}} \right\rangle \mid \mathcal{LF}. \end{align}\]
We now state identifying assumptions.
We define two distinct subsets of units based on the treatment sequence they receive: \[\begin{align} \mathcal{I}^d \mathrel{\vcenter{:}}= \{j \in [N]: & \;(i) \;\bar{D}^{t^*_j}_{j} = (\tilde{0}, \dots, \tilde{0}, d), \nonumber \\ &(ii) \;\forall \;\bar{\delta}^t \in [A]^t, t \in [T], \;\mathbb{E}[Y^{(\bar{\delta}^t)}_{j, t} \mid \mathcal{LF}, \bar{D}^{t^*_j}_{j}] = \mathbb{E}[Y^{(\bar{\delta}^t)}_{j, t} \mid \mathcal{LF}] \}, \tag{25} \\ \mathcal{I}^{0}_t \mathrel{\vcenter{:}}= \{j \in [N]: & \;(i) \;\bar{D}^{t}_{j} = (\tilde{0}, \dots, \tilde{0}), \nonumber \\ &(ii) \;\forall \;\bar{\delta}^\ell \in [A]^\ell, \ell \in [T], \;\mathbb{E}[Y^{(\bar{\delta}^\ell)}_{j, \ell} \mid \mathcal{LF}, \bar{D}^{t}_{j}] = \mathbb{E}[Y^{(\bar{\delta}^\ell)}_{j, \ell} \mid \mathcal{LF}] \}. \tag{26} \end{align}\] The donor set \(\mathcal{I}^d\) contains units that remain under the control sequence \((\tilde{0}, \dots, \tilde{0})\) till time step \(t^*_j - 1\) and at time step \(t^*_j\) receive action \(d\). Further, we require that, for these particular units, the action sequence \(\bar{D}^{t^*_j}_{j}\) till time step \(t^*_j\) was chosen such that \(\mathbb{E}[Y^{(\bar{\delta}^t)}_{j, t} \mid \mathcal{LF}, \bar{D}^{t^*_j}_{j}] = \mathbb{E}[Y^{(\bar{\delta}^t)}_{j, t} \mid \mathcal{LF}]\) for all \(\bar{\delta}^t \in [A]^t\), i.e., the potential outcomes are conditionally mean independent of the action sequence that unit \(j\) receives till time step \(t^*_j\). Of course, a sufficient condition for property (ii) above is that \(\forall \;\bar{\delta}^t \in [A]^t, \;\;Y^{(\bar{\delta}^t)}_{j ,t} \perp \bar{D}^{t^*_j}_{j} \mid \mathcal{LF}\). That is, for these units, the action sequence till time step \(t^*_j\) is chosen at \(t = 0\) conditional on the latent factors, i.e., the policy for these units can only be adaptive from time step \(t^*_j + 1\) onwards. Note that, given Assumption 14, property (ii) in 25 can be equivalently stated as \(\mathbb{E}[\varepsilon^{(\bar{\delta}^t)}_{j, t} \mid \mathcal{LF}, \bar{D}^{t^*_j}_{j}] = \mathbb{E}[\varepsilon^{(\bar{\delta}^t)}_{j, t} \mid \mathcal{LF}] = 0\). The donor set \(\mathcal{I}^{0}_t\) follows a similar intuition to that of \(\mathcal{I}^d\).
Assumption 16. For \(n \in [N]\), let \(v_{n, T} := [\psi^{0}_{n}, \dots, \psi^{T - 1}_{n}]\). We assume that for all \(n \in [N]\), \(v_{n, T}\) satisfies a well-supported condition with respect to the various donor sets, i.e., for all \(d \in [A]\) there exist \(\beta^{n,\mathcal{I}^d} \in \mathbb{R}^{|\mathcal{I}^d|}\), and \(\beta^{n, \mathcal{I}^{0}_t} \in \mathbb{R}^{|\mathcal{I}^{0}_t|}\) such that \[\begin{align} v_{n, T} = \sum_{k \in \mathcal{I}^d} \beta_k^{n,\mathcal{I}^d} v_{k, T}, \quad v_{n, T} = \sum_{k \in \mathcal{I}^{0}_t} \beta_k^{n, \mathcal{I}^{0}_t} v_{k, T}. \label{eq:LTI_well_supported} \end{align}\tag{27}\]
Assumption 16 essentially states that, for each unit \(n \in [N]\), the latent factor \(v_{n, T} = [\psi^{0}_{n}, \dots, \psi^{T - 1}_{n}]\) is expressible as a linear combination of the latent factors of the units in the donor sets \(\mathcal{I}^d\) and \(\mathcal{I}^{0}_t\). See the discussion under Assumption 3 in Section 3 justifying such an assumption for settings where \(\mathcal{I}^d\) and \(\mathcal{I}^{0}_t\) are sufficiently large.
Assumption 17. For all \(n \in [N], t \in [T]\), \(\bar{\delta}^t \in [A]^t\), \[\begin{align} \mathbb{E}\left[\varepsilon_{n, t}^{(\bar{\delta}^t)} - \varepsilon_{n, t}^{(\bar{\delta}^{t - 1}, \tilde{0})} \mid \bar{D}^t_{n} = \bar{\delta}^t, \;\mathcal{LF}\right] = 0. \end{align}\]
For sufficient conditions under which Assumption 17 holds, see the discussion under Assumption 6 in Section 4.2; an analogous argument holds here. An analogous version of Proposition 2 likewise holds for the linear time-invariant setting, using an identical argument.
Given these assumptions, we now present our identification theorem.
Theorem 5. Let Assumptions 1, 14, 15, 16, and 17 hold. Then, for any unit \(n \in [N]\) and action sequence \(\bar{d}^T \in [A]^T\), the expected counterfactual outcome can be expressed as: \[\begin{align} \mathbb{E}[Y_{n,T}^{(\bar{d}^T)}\mid \mathcal{LF}] =~& \sum_{t =1}^T \gamma_{n, T - t}(d_t) + b_{n, T} \mid \mathcal{LF},\label{eq:LTI_identification} \end{align}\tag{28}\] where quantities on the right-hand side are identified as follows:
(i) We have the following representations of the baseline outcomes for all \(t \in [T]\) \[\begin{align} &\forall \;j \in \mathcal{I}^{0}_t~:~ b_{j,t} \mid \mathcal{LF}= \mathbb{E}[Y_{j, t} \mid \mathcal{LF}, \;j \in \mathcal{I}^{0}_t], \tag{29} \\ &\forall \;i \notin \mathcal{I}^{0}_t~:~ b_{i,t}\mid \mathcal{LF}= \sum_{j \in \mathcal{I}^{0}_t} \beta_j^{i,\mathcal{I}^{0}_t} \;b_{j,t} \mid \mathcal{LF}, \mathcal{I}^{0}_t. \tag{30} \end{align}\] (ii) We have the following representations of the blip effect with \(0\) lag, for all \(d \in [A]\): \[\begin{align} \forall \;j \in \mathcal{I}^d~:~& \gamma_{j, 0}(d) \mid \mathcal{LF}=~ \mathbb{E}[Y_{j, t^*_j} \mid \mathcal{LF}, \;j \in \mathcal{I}^d] - b_{j, t^*_j} \mid \mathcal{LF}, \tag{31} \\ \forall \;i \notin \mathcal{I}^d~:~& \gamma_{i, 0}(d) \mid \mathcal{LF}=~ \sum_{j \in \mathcal{I}^d} \beta_j^{i, \mathcal{I}^d} \gamma_{j, 0}(d) \mid \mathcal{LF}, \mathcal{I}^d. \tag{32} \end{align}\] (iii) We have the following recursive representations of the blip effect \(\forall \;t \in [T - 1], \;d \in [A]\): 8 \[\begin{align} \forall \;j \in \mathcal{I}^d&~:~ \gamma_{j, t}(d) \mid \mathcal{LF}=~ \mathbb{E}[Y_{j, t^*_j + t} \mid \mathcal{LF}, \mathcal{I}^d] - b_{j, t^*_j + t} \mid \mathcal{LF}- \sum^{t - 1}_{\ell = 0} \gamma_{j, \ell}(D_{j, t^*_j + t - \ell}) \mid \mathcal{LF}, \tag{33} \\ \forall \;i \notin \mathcal{I}^d&~:~ \gamma_{i, t}(d) \mid \mathcal{LF}=~ \sum_{j \in \mathcal{I}^d} \beta_j^{i, \mathcal{I}^d} \gamma_{j, t}(d) \mid \mathcal{LF}, \mathcal{I}^d \tag{34}. \end{align}\]
28 states that our target causal parameter of interest can be written as an additive function of \(b_{n, T}\) and \(\gamma_{n,T - t}(d_t)\) for \(t \in [T]\) and \(d_t \in [A]\). Theorem 5 establishes that these various quantities are expressible as functions of the observed outcomes \(\{Y_{j, t}\}_{j\in [N], t \in [T]}\). We give an interpretation below.
Identifying baseline outcomes. Similar to the intuition for Theorem 2, for units \(j \in \mathcal{I}^0_t\), 29 states that their baseline outcome \(b_{j, t}\) is simply their expected observed outcome at time step \(t\), i.e., \(Y_{j, t}\). For units \(i \notin \mathcal{I}^0_t\), 30 states that we can identify \(b_{i, t}\) by appropriately re-weighting the baseline outcomes \(b_{j, t}\) of the units \(j \in \mathcal{I}_t^0\) (identified via 29 ).
Identifying blip effects for lag \(0\). For any given \(d \in [A]\): For units \(j \in \mathcal{I}^d\), 31 states that their blip effect \(\gamma_{j,0}(d)\) is equal to their observed outcome \(Y_{j,t^*_j}\) minus the baseline outcome \(b_{j, t^*_j}\) (identified via 30 ). Recall \(t^*_j\) is equal to the first time step that unit \(j\) is no longer in the control sequence. For units \(i \notin \mathcal{I}^d\), 32 states that we can identify \(\gamma_{i,0}(d)\) by appropriately re-weighting the blip effects \(\gamma_{j,0}(d)\) of units \(j \in \mathcal{I}^d\) (identified via 31 ).
Identifying blip effects for lag \(t\) with \(t \in [T - 1]\). Suppose by induction \(\gamma_{n, \ell}(d)\) is identified for every lag \(\ell < t\), \(n \in [N]\), \(d \in [A]\), i.e., can be expressed in terms of observed outcomes. Then for any given \(d \in [A]\): For units \(j \in \mathcal{I}^d\), 33 states that their blip effect \(\gamma_{j, t}(d)\) is equal to their observed outcome at time step \(t^*_j + t\), \(Y_{j, t^*_j + t}\), minus the baseline outcome \(b_{j, t^*_j + t}\) (identified via 30 ), minus the sum of blip effects for smaller lags, \(\sum_{\ell= 0}^{t - 1} \gamma_{j, \ell}(D_{j, t^*_j + t - \ell})\) (identified via the inductive hypothesis). For units \(i \notin \mathcal{I}^d\), 34 states that we can identify \(\gamma_{i, t}(d)\) by appropriately re-weighting the blip effects \(\gamma_{j, t}(d)\) of units \(j \in \mathcal{I}^d\) (identified via 33 ).
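As a concrete instance of this recursion (a restatement of 33 in the simplest case, not an additional result), take \(T = 2\) and a donor unit \(j \in \mathcal{I}^d\) with \(t^*_j = 1\). At lag \(t = 1\), the sum over smaller lags contains the single term \(\ell = 0\), so \[\begin{align} \gamma_{j, 1}(d) \mid \mathcal{LF}= \mathbb{E}[Y_{j, 2} \mid \mathcal{LF}, \mathcal{I}^d] - b_{j, 2} \mid \mathcal{LF}- \gamma_{j, 0}(D_{j, 2}) \mid \mathcal{LF}, \end{align}\] where \(b_{j, 2}\) is identified via 29 –30 and \(\gamma_{j, 0}(D_{j, 2})\) via 31 –32 : the lag-one blip is whatever remains of the period-2 outcome after stripping the baseline and the lag-zero blip of the action observed in period 2.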
Donor sample complexity. To estimate \(\mathbb{E}[Y^{(\bar{d}^T)}_{n, T} ~|~ \mathcal{LF}]\) for all units \(n \in [N]\) and any action sequence \(\bar{d}^T \in [A]^T\), the LTI identification strategy requires the existence of a sufficiently large subset of donor units \(\mathcal{I}^d\) for every \(d \in [A]\) and \(\mathcal{I}^{0}_t\) for every \(t \in [T]\). The number of donor units thus needs to scale at the order of \(A\) to ensure sufficiently large donor sets \(\{\mathcal{I}^d\}_{d \in [A]}\). As for the control donor sets \(\mathcal{I}^{0}_t\), notice from the definition of \(\mathcal{I}^{0}_t\) that \(\mathcal{I}^{0}_T \subset \mathcal{I}^{0}_t\) for all \(t \in [T - 1]\): a unit that remains under control till time step \(T\) also remains under control till any earlier time step. Hence, it suffices that \(\mathcal{I}^{0}_T\) is sufficiently large. As a result, the total donor sample complexity needs to scale at the order of \(A + 1\). Thus the additional structure imposed by the time-invariant factor model introduced in Assumption 14 reduces the donor sample complexity from \(A \times T\) to \(A + 1\), when compared with the time-varying factor model introduced in Assumption 4. The other major assumption made is that the control sequence is also not time-varying; see Assumption 15.
Further, for \(j \in \mathcal{I}^d\), we require that \(\forall \;\bar{\delta}^t \in [A]^t, t \in [T], \;\mathbb{E}[Y^{(\bar{\delta}^t)}_{j, t} \mid \mathcal{LF}, \bar{D}^{t^*_j}_{j}] = \mathbb{E}[Y^{(\bar{\delta}^t)}_{j, t} \mid \mathcal{LF}]\). That is, the actions picked for these donor units are only required to be non-adaptive till time step \(t^*_j\). As a special case, if we restrict ourselves to units \[\begin{align} \tilde{\mathcal{I}}^d \mathrel{\vcenter{:}}= \{j \in \mathcal{I}^d: t^*_j = 1\}, \end{align}\] then we impose no exogeneity conditions at all: for these donor units, the entire action sequence can be adaptive. In contrast, for the identification strategy in Section 4, we required the donor units in \(\mathcal{I}^d_t\) to be non-adaptive till time step \(t\). See Figure 3 for a DAG that is consistent with the exogeneity conditions implied by the definition of \(\tilde{\mathcal{I}}^d\) above.
SBE-PCR Estimator in LTI Setting

Now we detail the specific algorithm that yields the SBE-PCR estimator within the linear time-invariant setting. Analogous to the LTV case, we consider additional covariates with the usual factor decomposition (Assumption 7) and make an additional well-supported assumption regarding control factors for consistency.
Assumption 18. For any donor set, i.e., any \(d \in [A]\) and \(n \in \mathcal{I}^d\) there exist weights \(\phi^{n, \mathcal{I}^d}\) such that \[v_{n, T} = \sum_{k \in \mathcal{I}^d \setminus n} \phi^{n, \mathcal{I}^d}_k \cdot v_{k, T},\] and for any \(t \in [T]\), there exist weights \(\phi^{n, \mathcal{I}^0_t}\) such that
\[v_{n, T} = \sum_{k \in \mathcal{I}^0_t \setminus n} \phi^{n, \mathcal{I}^0_t}_k \cdot v_{k, T},\] where \(v_{n, T} = \left[ \psi^{T-1}_n, \ldots, \psi^0_n \right].\)
This assumption allows us to detail the algorithm for estimating weights using PCR. Specifically, for each unit \(n \in [N]\) and each donor set \(\mathcal{I}\in \{\mathcal{I}^d, \mathcal{I}^0_t\}\), we estimate weights to express \(X_n\) as a linear combination of the covariates from other donor units. Let \(X_n \in \mathbb{R}^p\) be the observed covariate vector for unit \(n\), and let \(X_{\mathcal{I}\setminus n} \in \mathbb{R}^{p \times (|\mathcal{I}| - 1)}\) denote the matrix of covariates for the other units in the donor set.
We perform PCR by computing the rank-\(k\) approximation of \(X_{\mathcal{I}\setminus n}\), where \(k = \text{rank}(\mathbb{E}[X_{\mathcal{I}\setminus n}])\). Denote the SVD as \[X_{\mathcal{I}\setminus n} = \sum_{l \geq 1} \sigma_l u_l v_l^\top,\] where \(u_l \in \mathbb{R}^p\), \(v_l \in \mathbb{R}^{|\mathcal{I}|-1}\), and \(\sigma_l\) are sorted in descending order. If \(n \in \mathcal{I}\), \[\hat{\phi}^{n, \mathcal{I}} = \left( \sum_{l=1}^{k} (1/\sigma_l) v_l u_l^\top \right) X_n \in \mathbb{R}^{|\mathcal{I}| - 1},\] and if \(n \notin \mathcal{I}\), \[\hat{\beta}^{n, \mathcal{I}} = \left( \sum_{l=1}^{k} (1/\sigma_l) v_l u_l^\top \right) X_n \in \mathbb{R}^{|\mathcal{I}|}.\]
The distinction between \(\hat{\phi}^{n, \mathcal{I}}\) and \(\hat{\beta}^{n, \mathcal{I}}\) lies in whether the unit is part of the donor set (interpolation) or not (extrapolation), which has implications for estimator variance.
Step 1: Estimate baseline outcomes.
For \(t \in [T]\):
For \(j \in \mathcal{I}^0_t\) \[\begin{align} \hat{b}_{j, t} = \sum_{k \in \mathcal{I}^0_t \setminus j} \hat{\phi}_{k}^{j,\mathcal{I}^0_t} Y_{k,t}. \end{align}\]
For \(i \notin \mathcal{I}^0_t\) \[\begin{align} \hat{b}_{i, t} = \sum_{j \in \mathcal{I}^0_t} \hat{\beta}_{j}^{i,\mathcal{I}^0_t} Y_{j,t}. \end{align}\]
Step 2: Estimate blip effects for lag \(0\).
For \(d \in [A]\):
For \(j \in \mathcal{I}^d\) \[\begin{align} \hat{\gamma}_{j, 0}(d) &= \sum_{k \in \mathcal{I}^d \setminus j} \hat{\phi}_{k}^{j,\mathcal{I}^d} \left( Y_{k, t_k^*} - \hat{b}_{k, t_k^*} \right). \end{align}\]
For \(i \notin \mathcal{I}^d\) \[\begin{align} \hat{\gamma}_{i, 0}(d) = \sum_{j \in \mathcal{I}^d} \hat{\beta}_{j}^{i,\mathcal{I}^d} \hat{\gamma}_{j, 0}(d). \end{align}\]
Step 3: Recursively estimate blip effects for lags \(1 \le t \le T - 1\).
For \(d \in [A]\) and \(t \in \{1, \dots, T -1\}\), recursively estimate as follows:
For \(j \in \mathcal{I}^d\) \[\begin{align} \hat{\gamma}_{j, t}(d) &= \sum_{k \in \mathcal{I}^d \setminus j} \hat{\phi}_{k}^{j,\mathcal{I}^d} \left( Y_{k, t_k^* + t} - \hat{b}_{k, t_k^* + t} - \sum_{\ell = 0}^{t - 1} \hat{\gamma}_{k, \ell}(D_{k, t_k^* + t - \ell}) \right). \end{align}\]
For \(i \notin \mathcal{I}^d\) \[\begin{align} \hat{\gamma}_{i, t}(d) = \sum_{j \in \mathcal{I}^d} \hat{\beta}_{j}^{i,\mathcal{I}^d} \hat{\gamma}_{j, t}(d). \end{align}\]
Step 4: Estimate target causal parameter. For \(n \in [N]\), and \(\bar{d}^T \in [A]^T\), estimate the causal parameter as follows: \[\begin{align} \label{eq:causal_estimator2} \widehat{\mathbb{E}}[Y_{n, T}^{(\bar{d}^T)}\mid \mathcal{LF}] =~& \sum_{t =1}^T \hat{\gamma}_{n, T - t}(d_t) + \hat{b}_{n,T} . \end{align}\tag{35}\]
All the relevant weights in the above algorithm are computed via the PCR-based procedure described previously.
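For comparison with the LTV routine sketched in Section 4.3, the LTI variant differs only in that blips are indexed by lag and each action has a single donor set. The following schematic makes this explicit; as before, the weights are assumed precomputed, and all names and data structures are illustrative choices of ours.

```python
def sbe_pcr_lti(Y, t_star, D, donors_d, donors_0,
                phi_d, beta_d, phi0, beta0, N, T, A, d_seq):
    """Sketch of the SBE-PCR estimator in the LTI setting.

    Y[n, t]     : observed outcome of unit n at time t
    t_star[n]   : first period at which unit n leaves the control sequence
    donors_d[d] : donor set I^d; donors_0[t]: control donor set I^0_t
    phi_d/beta_d, phi0/beta0 : PCR weights for the treatment and control
    donor sets, respectively. We assume t_star[k] + lag <= T throughout.
    """
    # Step 1: baselines b_{n,t} for every period t.
    b = {}
    for t in range(1, T + 1):
        for j in donors_0[t]:
            b[j, t] = sum(w * Y[k, t] for k, w in phi0[j, t].items())
        for i in range(N):
            if i not in donors_0[t]:
                b[i, t] = sum(w * Y[j, t] for j, w in beta0[i, t].items())

    # Steps 2-3: blips by lag, built up from lag 0 (control blip is 0).
    gamma = {n: {lag: {0: 0.0} for lag in range(T)} for n in range(N)}
    for lag in range(T):
        for d in range(1, A + 1):
            for j in donors_d[d]:
                gamma[j][lag][d] = sum(
                    w * (Y[k, t_star[k] + lag] - b[k, t_star[k] + lag]
                         - sum(gamma[k][l][D[k, t_star[k] + lag - l]]
                               for l in range(lag)))
                    for k, w in phi_d[j, d].items())
            for i in range(N):
                if i not in donors_d[d]:
                    gamma[i][lag][d] = sum(w * gamma[j][lag][d]
                                           for j, w in beta_d[i, d].items())

    # Step 4: E[Y_{n,T}^{(d_1..d_T)} | LF] = b_{n,T} + sum_t gamma_{n,T-t}(d_t).
    return [b[n, T] + sum(gamma[n][T - t][d_seq[t - 1]]
                          for t in range(1, T + 1)) for n in range(N)]
```

Note that the recursion now runs forward over lags rather than backward over calendar time, and every unit's clock is re-centered at its own \(t^*_n\).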
SBE-PCR Consistency in LTI Setting

We now state the assumptions required for consistency of the SBE-PCR estimator under the LTI latent factor model. These assumptions parallel those in the LTV setting, with simplifications reflecting the time-invariant latent structure. We unify the donor set notation by writing \(\mathcal{I}\in \{\mathcal{I}^d : d \in [A]\} \cup \{\mathcal{I}^0_t : t \in [T]\}\), and refer to the relevant donor set generically as \(\mathcal{I}\). Assumptions 9 and 10 in Section 4.4 are maintained here.
Assumption 19 (Well-Balanced Singular Values). For each donor set \(\mathcal{I}\), the covariate matrix \(X_{\mathcal{I}} \in \mathbb{R}^{p \times |\mathcal{I}|}\) satisfies: \[\left\|\mathbb{E}[X_{\mathcal{I}} | \mathcal{LF}] \right\|_F \geq c' p |\mathcal{I}|, \quad \text{and} \quad \kappa^{-1} \geq c,\] where \(\kappa\) is the condition number of \(\mathbb{E}[X_{\mathcal{I}} | \mathcal{LF}]\), and \(c, c' > 0\) are constants.
Assumption 20 (Row-Space Inclusion). For any \(t \in [T]\), we require the existence of weights \(\xi^{(0, t)} \in \mathbb{R}^p\) such that for any \(j \in \mathcal{I}^0_t\) \[\mathbb{E}[Y_{j, t} |\mathcal{LF}, j \in \mathcal{I}^0_t] = \sum_{i = 1}^p \xi^{(0,t)}_i \cdot \mathbb{E}[(X_{\mathcal{I}^0_t})_{ij} | \mathcal{LF}, j \in \mathcal{I}^0_t];\] moreover, for any \(t \geq 0\) and \(j \in \mathcal{I}^d\), there exist \(\xi^{(d,t)} \in \mathbb{R}^p\) such that \[\mathbb{E}[Y_{j, t_j^* + t}|\mathcal{LF}, j \in \mathcal{I}^d] = \sum_{i = 1}^p \xi_i^{(d,t)}\cdot \mathbb{E}[(X_{\mathcal{I}^d})_{ij}|\mathcal{LF}, j \in \mathcal{I}^d];\] and, finally, for any \(t \geq 0\) and \(j \in \mathcal{I}^d\), there exists \(\alpha^{(0,t)} \in \mathbb{R}^p\) such that \[\mathbb{E}\left[Y_{j, t_j^* + t}^{(\mkern 1.3mu\overline{\mkern-1.3mu\tilde{0}\mkern-1.3mu}\mkern 1.3mu^{t_j^* + t})}\big|\mathcal{LF}, j \in \mathcal{I}^d\right] = \sum_{i = 1}^p \alpha_i^{(0,t)}\cdot \mathbb{E}[(X_{\mathcal{I}^d})_{ij}|\mathcal{LF}, j \in \mathcal{I}^d].\]
Assumption 20 is similar to Assumption 12, but using donor sets \(\mathcal{I}\in \{\mathcal{I}^d\} \cup \{\mathcal{I}^0_t\}\) relevant to the LTI setting.
We present only the main consistency theorem, which allows \(T\) to grow; consistency results for fixed \(T\), which serve as preliminaries for proving this theorem, are contained in Appendix 11.4.
Assumption 21. Let the setup of Assumption 14 hold. We further assume that the counterfactual potential outcomes depend only on the most recent \(q\) blips; namely, for all units \(n \in [N]\), we have \(\psi_n^{q+i} = 0\) for all \(i \in [T-q-1]\). Notably, this implies that for any \(n \in [N]\) and \(\bar{d}^T \in [A]^T\) we have \[Y_{n,T}^{(\bar{d}^T)} = \sum_{\ell = T- q }^T\langle\psi_n^{T-\ell}, w_{d_{\ell}}\rangle + \varepsilon^{(\bar{d}^T)}_{n,T}.\]
Theorem 6. Let Assumptions 1 to 7, 9, 10, and 14 to 21 hold. Consider the SBE-PCR estimator in Section 5.3, modified to only estimate the baseline, terminal blip, and previous \(q\) blips, and suppose \(k = \max_{\mathcal{I}\in \{\mathcal{I}^d\} \cup \{\mathcal{I}^0_t\}}\text{rank}(\mathbb{E}[X_{\mathcal{I}}])\). Then we have for any \(n \in [N]\) and \(\bar{d}^T \in [A]^T\): \[\widehat{\mathbb{E}}[Y_{n, T}^{(\bar{d}^T)}] - \mathbb{E}[Y_{n, T}^{(\bar{d}^T)}\mid \mathcal{LF}]=O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{q}}{p^{1/4}} + k^{q}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\] where \(\mathcal{C}= \{|\mathcal{I}^0_T|, |\mathcal{I}^0_1|,(|\mathcal{I}^{d_{t}}|)_{t \in [T-q, \dots, T]} ,(|\mathcal{I}^{D_{n,t_n^* + t}}|)_{n \in [N], t \in [1, \dots, q]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{C}\) and \(\alpha_{\mathcal{I}} = \min\mathcal{C}\).
Theorem 6 concludes that, upon modifying the SBE-PCR estimator to account for the system depending only on a constant \(q\) lags, we obtain a consistent estimator of the causal estimand. More precisely, for fixed \(k\), the estimation error decays as the donor set cardinalities and the number of covariates \(p\) grow, provided \(p = \omega(\pi_{\mathcal{I}}^{1/3})\). The growing number of covariates can be justified by including time-varying covariates as \(T \to \infty\). Again, we have established pointwise consistency, i.e., no averaging across units is needed to establish the result. The theorem’s proof is included in Appendix 11.5.
To gain some intuition about the gap between the minimum and maximum donor set cardinalities appearing in the error rate bound, observe the following. The control donor sets are nested, \(\mathcal{I}^0_T \subset \mathcal{I}^0_{T-1} \subset \cdots \subset \mathcal{I}^0_1\), since a unit that remains under control (and non-adaptive) till time step \(t\) also does so till any earlier time step; hence their cardinalities are non-increasing in \(t\). If the nesting is strict, the maximum cardinality exceeds the minimum by an additive gap of order at least \(T\): for instance, if the maximum \(\pi_{\mathcal{I}}\) is attained at \(|\mathcal{I}^0_1|\) and the minimum \(\alpha_{\mathcal{I}}\) at \(|\mathcal{I}^0_T|\), the two differ by at least \(T - 1\). Even if the extremal values are not realized at the time-dependent donor sets, strict nesting still guarantees a gap of order at least \(T\) between the minimum and maximum donor set cardinalities.
The goal of this section is to showcase the usefulness of our approach in understanding individual dynamic treatment effects and developing optimal allocation rules in a real-world application where panel data are available. We first introduce the background on financial credit support for exporting firms and the data (Sections 6.1 and 6.2) and report the synthetic blip estimates of support impacts (Section 6.3). We then investigate the extent of possible improvement of support allocation for each firm (Section 6.4.1) and develop an optimal targeting rule for allocating support based on firm characteristics (Section 6.4.2).
Exporting is inherently risky, requiring firms to secure upfront working capital, offer extended payment terms, and protect themselves against non-payment or foreign market shocks. When trade finance dried up during the Great Recession, the resulting contraction disproportionately hit firms reliant on weak banks or operating in finance-dependent sectors [57]–[59]. These vulnerabilities matter particularly in economies that rely heavily on international trade. The Korean economy is a case in point: exports accounted for 45–58% of Gross Domestic Product (GDP) between 2006 and 2015, making the economy highly sensitive to fluctuations in global trade and financial conditions. This dependence heightens exposure to geopolitical frictions, as Korea sits between China and the United States, where tariff disputes, supply-chain tensions, and restrictions on key sectors regularly generate uncertainty. In this environment, export credit agencies (ECAs) play a critical role by using public funds to provide insurance and loans that enable firms to sustain and expand their export activities.
Korea has two independent ECAs: the Korea Trade Insurance Corporation (K-SURE), which specializes in export insurance, and the Export-Import Bank of Korea (EXIM), which provides export loans. Firms seeking support apply through the relevant agency: K-SURE assesses the creditworthiness of exporters and their foreign buyers, while EXIM evaluates financial stability and contract documents. In practice, these agencies evaluate applications and select firms to support based on firm characteristics—an approach that connects to our later analyses of heterogeneity and optimal treatment allocation. Also, the agencies’ selections are independent of one another, leaving room for potential improvements through communication and collaboration.
Using Korean firm-level data described below, we empirically examine how export credit support shapes firm performance and how its allocation can be improved. Estimating treatment effects is a necessary first step, but the policy challenge goes further: agencies must decide which firms to support, through which instruments, and at what point in time. By comparing observed allocation patterns with the counterfactual benchmark predicted by our model, we demonstrate how more efficient targeting could deliver greater export growth with the same or fewer government resources. We further extend the optimal allocation analysis by allowing the rule to depend on firm characteristics. This motivation stems from heterogeneity in impacts across firm size, productivity, and financial constraints. Such a framework is particularly useful for making allocation decisions about newly entering firms and closely mirrors the agencies’ own selection processes.
Our empirical analysis relies on a novel Korean firm-level panel dataset for 2006–2015 that links three sources of firm-level data: (i) the Survey of Business Activities (SBA) from Statistics Korea,9 which provides detailed firm characteristics; (ii) export insurance data from K-SURE; and (iii) export loan data from EXIM. Combining these sources allows us to track which firms received support, the form and timing of support, and their subsequent performance. We define the treatment group as firms that did not receive support in the first five years (2006–2010) but received at least one form of support in the later period (2011–2015). The control group consists of firms that were never supported during the sample period. Out of 2,052 unique firms, 167 received support at least once in the later period.
The outcome of interest is the export value of firm \(n\) in year \(t\), \(Y_{n,t}\). The vector of firm-level covariates, \(X_{n,t}\), consists of eleven time-varying and two time-invariant variables. The time-varying covariates include exports relative to sales (export share), sales, number of workers, tangible capital stock, value-added, total factor productivity (TFP), total wage bill, R&D expenditure, debt-to-asset ratio, current assets over current liabilities (liquidity ratio), and a dummy for foreign direct investment (FDI).10 The time-invariant firm covariates include an indicator for parent-company affiliation and the firm’s age. At the industry level, we control for \(Z_{m}\), a vector of two indicators for whether industry \(m\) has an above average capital intensity and above average wage per worker.11
In each year \(t\in \{T_0 +1,...,T\}\), firm \(n\) receives treatment \(D_{n,t}\in[A]_0=\{0,1,2,3\}\): \(D_{n,t}=1\) if firm \(n\) receives insurance, \(D_{n,t}=2\) if it receives loans, \(D_{n,t}=3\) if it receives both, and \(D_{n,t}=0\) if it receives none. In this application, \(t_n^{*}=T_0 = 5\) for all \(n\) and \(T=10\) (i.e., no firm receives treatment until period \(T_0\)), and we redefine \(\bar{d}^t=(d_{T_0+1},...,d_{t})\) for notational simplicity. We assume the LTV latent factor model (Section 4) and set the number of lags to \(q=1\) in Assumption 13. Using the SBE-PCR algorithm, we estimate \(\mathbb{E}[Y_{n,t}^{(\bar{d}^{t})}\mid\mathcal{LF}]\) for given \(\bar{d}^t \in [A]_0^t\) and for each firm \(n\) and \(t\ge T_{0}+q+1=7\) (i.e., the last four periods) in the data. These quantities are the crucial ingredient for all the analyses below: they are used to calculate average counterfactual outcomes and average treatment effects and to conduct policy learning.
To understand time-varying effects of the financial support, we report the trajectories of various average counterfactual outcomes and average dynamic treatment effects. First, we estimate the average treatment effects of financial support relative to no intervention, where the average is taken across firms. Each support sequence is defined as \(\bar{d}^T=(d,d,d,d,d)\) for \(d=1\) or \(2\) (again, \(1\) being insurance support and \(2\) being loans support). Figure 4 shows distinct patterns between the two treatments over post-treatment periods. Insurance has little effect initially but generates sizable gains from the third year onward, consistent with insurance stabilizing performance and supporting longer-run growth. Evidence on export credit insurance similarly shows that risk-mitigation instruments help sustain trade by reducing uncertainty rather than generating immediate effects [65]. Loans show a negative effect in the first year, followed by positive effects that bring cumulative gains close to insurance. A natural interpretation is that early loan use covers input costs before output materializes, depressing short-run outcomes but enabling later expansion. This dynamic is consistent with the working-capital channel, where financing upfront input costs can depress short-run outcomes before revenues are realized [59], [66], [67]. The cumulative average treatment effect—i.e., the sum of effects across four years—amounts to 78.2 billion KRW for insurance and 65.6 billion KRW for loans.
We next turn to a sequence-specific analysis. We attempt to understand how timing (but not the total amount of support) affects the trajectory of outcomes by considering the following: (i) a front-loading treatment \(\bar{d}^T=(d,d,d,0,0)\) for either \(d=1\) or \(2\), which concentrates support at the start; (ii) an even-loading treatment \((d,0,d,0,d)\), which spreads it evenly; and (iii) a back-loading treatment \((0,0,d,d,d)\), which defers it to the end.12 Figure 5 reports average potential outcomes under these strategies, with panel (a) presenting insurance and panel (b) loans. For insurance, the cumulative average potential export value is 123.36 for front-, 173.09 for even-, and 225.21 for back-loading, all in billion KRW. For loans, the corresponding values are 158.06 for front-, 123.73 for even-, and 199.13 for back-loading. For insurance, back-loading produces the largest cumulative gains, while even-loading underperforms, suggesting that distributing support thinly is less effective than concentrating it. For loans, back-loading again dominates, with even-loading weaker than front-loading, consistent with credit being most valuable when timed around production peaks. Overall, the results indicate that not only timing but also spacing matters: smoothing interventions across periods is generally less effective than concentrating them, though the optimal pattern varies by treatment type.
In providing financial support, each ECA has its own rules for selecting export firms. It would be interesting to investigate (i) whether better (statistical) selection rules could have been used for each support program compared to the observed selections, (ii) whether collaboration among agencies in the selection process would have led to gains, and (iii) what selection rule could be implemented for new firms. Questions (i) and (ii) relate to retrospective policy learning, while (iii) corresponds to prospective policy learning.
For each firm \(n\) in the data, we consider the optimal treatment schedule \(\bar{d}^{T*}(n) \in \mathcal{D}\) that maximizes the aggregate outcome: \[\label{eq:opt_alloc} \bar{d}^{T*}(n) \in \mathop{\mathrm{\arg\!\max}}_{\bar{d}^{t}\in \mathcal{D}}\sum_{t=T_{0}+q+1}^{T}\mathbb{E}\left[Y_{n,t}^{(\bar{d}^t)}\mid\mathcal{LF}\right],\tag{36}\] where \(\mathbb{E}\left[Y_{n,t}^{(\bar{d}^t)}\mid\mathcal{LF}\right]\) is estimated using the SBE-PCR algorithm. Here the set of possible schedules \(\mathcal{D}\) can be restricted for institutional reasons or due to budget constraints. An example of the latter is \(\mathcal{D}=\{\bar{d}^{T}:\sum_{t=T_{0}+q+1}^{T}p_{d_t}\cdot d_t \le B\}\), where \(p_{d_t}\) is the price of treatment \(d_t\) and \(B\) is the budget. An example of the former is the independent selection process of each ECA, in which case K-SURE is equipped with \(\mathcal{D}=\{\bar{d}^{T}:d_t \in \{0,1\}\}\) and EXIM is equipped with \(\mathcal{D}=\{\bar{d}^{T}:d_t \in \{0,2\}\}\); this example is investigated below.
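Since the schedule classes used below are small, the optimization in 36 can be solved by direct enumeration. The following is a brute-force sketch, with the per-firm estimated counterfactual means supplied as a callable and with illustrative prices and budget; a practical implementation would exploit the restricted structure of \(\mathcal{D}\) rather than enumerate blindly.

```python
from itertools import product

def best_schedule(cf_mean, actions, horizon, price, budget):
    """Maximize the summed counterfactual means over feasible schedules.

    cf_mean(seq): estimated sum over t of E[Y_{n,t}^{(seq)} | LF] for one
                  firm, e.g., assembled from the SBE-PCR output.
    price[d]    : per-period cost of action d (with price[0] = 0.0).
    """
    best_seq, best_val = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        if sum(price[d] for d in seq) > budget:
            continue  # schedule violates the budget constraint
        val = cf_mean(seq)
        if val > best_val:
            best_seq, best_val = seq, val
    return best_seq, best_val

# Illustrative use mirroring K-SURE's restricted class D = {0,1}^T:
# schedule, value = best_schedule(cf_mean, actions=(0, 1), horizon=4,
#                                 price={0: 0.0, 1: 1.0}, budget=3.0)
```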
Using this policy learning framework, we calculate the best counterfactual allocation subject to the budget not exceeding the observed one. Figure 6 reports the average counterfactual trajectories across firms under the observed and optimal treatment schedules. Relative to the observed allocations, the optimal (cost-constrained) paths yield systematically higher outcomes in every post-intervention period. Across the four post-intervention periods, the optimal allocation raises average outcomes by roughly 25–40%. Moreover, these gains are achieved with lower resource use. The total cost of the optimal allocation is 354 supports, compared with 365 under the observed allocation.13 Overall, the findings indicate that the current allocation rules employed by the ECAs have substantial scope for improvement in terms of sequencing and timing of support. Our framework shows that policymakers can achieve better outcomes with fewer resources, highlighting the potential for more effective programs.
We next examine outcomes when insurance and loan are allocated independently, with the agencies acting separately, versus jointly, where decisions are coordinated as if by a single agency. As illustrated in Figure 7, average potential outcomes are higher under joint allocation in every period. Independent allocation uses 11,914 supports, while joint allocation requires 13,932. Despite the higher cost, efficiency is greater under joint allocation, with gains per unit cost rising from 88.6 to 124.6. These results indicate that coordination among agencies reduces misallocation across treatments and leverages complementarities, allowing resources to generate higher returns per cost.
Policymakers may want to estimate optimal allocation rules for new firms that are not observed in the data. To that end, we consider allocating support based on firms’ observed covariates. Specifically, we consider an allocation rule \(\bar{\delta}^{T}:\mathcal{X}\rightarrow[A]^{T}\), where \(\mathcal{X}\) is the support of the pre-treatment covariate vector \(X_n\), and \[\begin{align} \label{eq:opt_alloc_x} \bar{\delta}^{T*} & \in\mathop{\mathrm{\arg\!\max}}_{\bar{\delta}^{T}\in \tilde{\mathcal{D}}}\sum_{t=T_{0}+q+1}^{T}\mathbb{E}\left[Y_{n,t}^{(\bar{\delta}^{t}(X_n))}\right], \end{align}\tag{37}\] where \(\tilde{\mathcal{D}}\) is the (possibly restricted) class of allocation rules. Note that \[\begin{align} \mathbb{E}\left[Y_{n,t}^{(\bar{\delta}^{t}(X_n))}\right]=\mathbb{E}\left[\sum_{\bar{d}^{t}}1\{\bar{\delta}^{t}(X_{n})=\bar{d}^{t}\}Y_{n,t}^{(\bar{d}^{t})}\right]=\mathbb{E}\left[\sum_{\bar{d}^{t}}1\{\bar{\delta}^{t}(X_{n})=\bar{d}^{t}\}\mathbb{E}\left[Y_{n,t}^{(\bar{d}^{t})}\mid X_n\right]\right]. \end{align}\] Under Assumption 7 (that \(X_{n,k}\) has the latent factor structure), \[\begin{align} \mathbb{E}\left[\mathbb{E}\left[Y_{n,t}^{(\bar{d}^{t})}\mid\mathcal{LF}\right]\mid X_n\right] & =\mathbb{E}\left[Y_{n,t}^{(\bar{d}^{t})}\mid X_n\right], \end{align}\] and thus \[\begin{align} \mathbb{E}\left[Y_{n,t}^{(\bar{\delta}^{t}(X_n))}\right]=\mathbb{E}\left[\sum_{\bar{d}^{t}}1\{\bar{\delta}^{t}(X_{n})=\bar{d}^{t}\}\mathbb{E}\left[Y_{n,t}^{(\bar{d}^{t})}\mid\mathcal{LF}\right]\right]. \end{align}\] Therefore, \(\bar{\delta}^{T*}\) is identified as we identify \(\mathbb{E}[Y_{n,t}^{(\bar{d}^{t})}\mid\mathcal{LF}]\) for all \(n,t\) and \(\bar{d}^{t}\) from Theorem 2 or 5. This argument is also useful in estimating \(\bar{\delta}^{T*}\), as we can take \(\mathbb{E}[Y_{n,t}^{(\bar{d}^{t})}\mid\mathcal{LF}]\) as a pseudo-outcome variable for a prediction problem with predictors \(X_n\) and for subsequent policy learning.
Building on this framework, we implement a tree-based policy learning algorithm that yields interpretable decision rules. Based on fourteen firm characteristics,14 the algorithm selects the most predictive variables and thresholds, partitioning firms into subgroups with distinct optimal treatment sequences. Each leaf in a decision tree corresponds to one recommended sequence, providing a transparent mapping from firm characteristics to intervention timing. Since considering all possible allocation rules in \(\tilde{\mathcal{D}}\) is computationally and practically infeasible, we restrict \(\tilde{\mathcal{D}}\) in the analysis.
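One simple surrogate for this restricted search, sketched below under our own naming conventions, is to label each firm with its best candidate sequence according to the estimated pseudo-outcomes and then fit a shallow classification tree on covariates. This greedy two-stage procedure is illustrative only; it is not the exact policy-tree optimizer over \(\tilde{\mathcal{D}}\), which searches jointly over splits and assignments.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_policy_tree(X, pseudo, max_depth=3):
    """X     : (n_firms, n_covariates) array of pre-treatment covariates.
    pseudo: (n_firms, n_candidates) array; column c holds the estimated
            sum over t of the pseudo-outcome E[Y_{n,t}^{(d-bar_c)} | LF]
            for candidate sequence c.
    Returns a shallow tree mapping covariates to a candidate index."""
    labels = np.argmax(pseudo, axis=1)  # each firm's best candidate rule
    return DecisionTreeClassifier(max_depth=max_depth).fit(X, labels)
```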
First, we consider \(\tilde{\mathcal{D}}\) to be the set of early treatment \((d_1,d_2,0,0,0)\) and late treatment \((0,0,0,d_4,d_5)\) for \(d_t\in\{1,2\}\), yielding eight possible allocation rules in total. Figure 8 shows the optimal policy tree. We restrict attention to insurance or loan only, reflecting computational tractability and the budget constraints of policymakers, for whom providing multiple supports simultaneously is costly. The optimal decision tree suggests that firms with smaller wage bills are assigned to late support. Within this group, the number of workers determines the sequencing of instruments. Fewer workers lead to insurance then loan, while more workers lead to loan then insurance, reflecting payroll-driven liquidity needs. Among firms with larger wage bills, capital stock is decisive. Those with relatively low tangible assets are directed to early support, while those with stronger asset positions can defer to late support. It means firms with high labor costs but little collateralizable capital cannot easily finance wages internally and thus require earlier intervention.
Next, we consider allocation rules that concern not only timing but also spacing over time. In particular, we restrict \(\tilde{\mathcal{D}}\) to be the set of front-loaded \((d_1,d_2,d_3,0,0)\), evenly-spaced \((d_1,0,d_3,0,d_5)\), and back-loaded \((0,0,d_3,d_4,d_5)\) treatments for \(d_t\in\{1,2\}\), yielding twenty-four sequences in total. The resulting optimal policy tree follows a similar pattern: high wages lead to front-loading with more loans, while lower wages lead to back-loading. Firms with moderate debt (around 0.55–0.61) are also routed to back-loading, consistent with temporary liquidity management. Interestingly, some low-debt firms are also assigned to front-loading, reflecting the model’s prediction that these firms gain more from early expansionary financing than from delayed support.
Remark 4. Note that, by equations 21 and 35 , \(\mathbb{E}\left[Y_{n,t}^{(\bar{d}^{t})}\mid\mathcal{LF}\right]=\left\langle \omega(\bar{d}^{T}),Y_{n}\right\rangle\) for an appropriate vector \(Y_n\) of observed outcomes and a vector \(\omega(\bar{d}^{T})\) of parameters, which implies that our objective function has the outcome-weighted form [68]: \(\mathbb{E}[\sum_{\bar{d}^{T}\in[A]^{T}}1\{\bar{\delta}^{T}(X_{n})=\bar{d}^{T}\}\left\langle \omega(\bar{d}^{T}),Y_{n}\right\rangle ]\). Therefore, analogous to [69] among others, we can show consistency of and bounds on the excess risk of the estimated policy by (i) using the convex surrogate version of the objective function and (ii) under the condition that \(\hat{\omega}(d)\) converges to \(\omega(d)\) at a certain rate. The condition (ii) can be guaranteed by our convergence rates in Theorem 4 or 6. We omit this analysis for succinctness.
In this work, we formulate a causal framework for dynamic treatment effects under unobserved confounding using panel data. We propose a latent factor model, which admits linear time-varying and time-invariant dynamical systems as special cases. Depending on the structure placed on this factor model, we quantify the trade-off between the sample complexity and the level of adaptivity allowed in the intervention policy for estimating counterfactual mean outcomes. The estimated counterfactual outcomes are useful for estimating the impact of a particular treatment schedule relative to another, as well as optimal rules for allocating treatment schedules. We showcase this usefulness in the context of government financial support for exporting firms. We hope this work spurs further research connecting the growing fields of synthetic controls and panel data methods with dynamic treatment models studied in econometrics, and potentially with sequential learning methods such as reinforcement learning studied in computer science.
Verifying Assumption 4 holds. In what follows, all the conditional expectations are also conditioned on the latent factors \(\mathcal{LF}\); for shorthand, we omit that conditioning. Note that: \[\begin{align} \mathbb{E}\left[Y_{n,t}^{(\bar{d}^t)} - Y_{n,t}^{(\bar{0}^t)}\right] =~& \sum_{\ell=1}^t \mathbb{E}\left[Y_{n,t}^{(\bar{d}^\ell, \underline{0}^{\ell+1})} - Y_{n,t}^{(\bar{d}^{\ell-1}, \underline{0}^{\ell})}\right]. \label{eq:blip_telescoping} \end{align}\tag{38}\] We now prove that: \[\begin{align} Q_{n,t} \mathrel{\vcenter{:}}= \mathbb{E}\left[Y_{n,t}^{(\bar{d}^\ell, \underline{0}^{\ell+1})} - Y_{n,t}^{(\bar{d}^{\ell-1}, \underline{0}^{\ell})}\right] = \left\langle \psi_{n}^{t,\ell}, w_{d^{\ell}} - w_{0^\ell} \right\rangle. \end{align}\] We establish this via a nested mean argument. Note \[\begin{align} Q_{n,t} =~& \mathbb{E}\left[\mathbb{E}\left[Y_{n,t}^{(\bar{d}^\ell, \underline{0}^{\ell+1})} - Y_{n,t}^{(\bar{d}^{\ell-1}, \underline{0}^{\ell})} \mid S_n^0\right]\right] \nonumber \\=~& \mathbb{E}\left[\mathbb{E}\left[Y_{n,t}^{(\bar{d}^\ell, \underline{0}^{\ell+1})} - Y_{n,t}^{(\bar{d}^{\ell-1}, \underline{0}^{\ell})} \mid S_n^{0}, D_n^1=d^{1}\right]\right], \label{eq:using_seq_exog} \end{align}\tag{39}\] where in 39 , we have used 10 . Now as our inductive step, suppose that we have shown: \[\begin{align} Q_{n,t} =~& \mathbb{E}\left[\mathbb{E}\left[\ldots \mathbb{E}\left[Y_{n,t}^{(\bar{d}^\ell, \underline{0}^{\ell+1})} - Y_{n,t}^{(\bar{d}^{\ell-1}, \underline{0}^{\ell})} \mid \bar{S}_n^{q-1}, \bar{D}_n^{q} = \bar{d}^{q}\right]\ldots \mid S_n^0, D_n^1=d^1\right]\right]. \end{align}\] Then, \[\begin{align} Q_{n,t}=~& \mathbb{E}\left[\mathbb{E}\left[\ldots\mathbb{E}\left[\mathbb{E}\left[Y_{n,t}^{(\bar{d}^\ell, \underline{0}^{\ell+1})} - Y_{n,t}^{(\bar{d}^{\ell-1}, \underline{0}^{\ell})} \mid \bar{S}_n^{q}, \bar{D}_n^{q}=\bar{d}^q\right] \mid \bar{S}_n^{q-1}, \bar{D}_n^{q}=\bar{d}^{q}\right]\ldots \mid S_n^0, D_n^1=d^1\right]\right] \nonumber \\ =~& \mathbb{E}\left[\mathbb{E}\left[\ldots\mathbb{E}\left[\mathbb{E}\left[Y_{n,t}^{(\bar{d}^\ell, \underline{0}^{\ell+1})} - Y_{n,t}^{(\bar{d}^{\ell-1}, \underline{0}^{\ell})} \mid \bar{S}_n^{q}, \bar{D}_n^{q+1}=\bar{d}^{q+1}\right] \mid \bar{S}_n^{q-1}, \bar{D}_n^{q}=\bar{d}^{q}\right]\ldots\mid S_n^0, D_n^1=d^1\right]\right], \label{eq:blip_inductive_proof} \end{align}\tag{40}\] where in 40 , we have again used 10 . This concludes the inductive proof. Thus, we have \[\begin{align} Q_{n,t}=~& \mathbb{E}\left[\mathbb{E}\left[\ldots\mathbb{E}\left[Y_{n,t}^{(\bar{d}^\ell, \underline{0}^{\ell+1})} - Y_{n,t}^{(\bar{d}^{\ell-1}, \underline{0}^{\ell})} \mid \bar{S}_n^{\ell-1}, \bar{D}_n^{\ell}=\bar{d}^{\ell}\right]\ldots\mid S_n^0, D_n^1=d^1\right]\right] \nonumber \\=~& \mathbb{E}\left[\mathbb{E}\left[\ldots\gamma_{n, t, \ell}(d^\ell)\ldots\mid S_n^0, D_n^1=d^1\right]\right] = \left\langle \psi_{n}^{t,\ell}, w_{d^{\ell}} - w_{0^\ell} \right\rangle, \label{eq:blip_key_representation} \end{align}\tag{41}\] where in 41 , we have used 11 and the fact that \(\left\langle \psi_{n}^{t,\ell}, w_{d^{\ell}} - w_{0^\ell} \right\rangle\) is independent of \(S^{\ell}_n\) and \(\bar{D}^{\ell - 1}_n\). Re-arranging 38 and 41 , we have: \[\begin{align} \label{eqn:recurs} \mathbb{E}\left[ Y_{n,t}^{(\bar{d}^t)}\right] = \mathbb{E}\left[Y_{n,t}^{(\bar{0}^t)}\right] + \sum_{\ell=1}^t \gamma_{n,t,\ell}({d}^\ell). \end{align}\tag{42}\] Combining 42 and 12 implies Assumption 4 holds.
Verifying Assumption 6 holds. Assumption 6 is immediately implied by 11 and a simple application of the tower property of conditional expectations. In particular, we integrate \(\bar{S}_{n,t-1}\) out of both sides of 11.
By Assumption 2, \[\begin{align} \mathbb{E}\left[Y_{n, T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] &= \mathbb{E}\left[\left\langle v_{n, T}, \;w_{\bar{d}^T} \right\rangle + \varepsilon^{(\bar{d}^T)}_{n, T} \mid \mathcal{LF}\right] \nonumber \\ &= \mathbb{E}\left[\left\langle v_{n, T}, \;w_{\bar{d}^T} \right\rangle \mid \mathcal{LF}\right] \nonumber \\ &= \left\langle v_{n, T}, \;w_{\bar{d}^T} \right\rangle \mid \mathcal{LF}\tag{43} \\ &= \left\langle v_{n, T}, \;w_{\bar{d}^T} \right\rangle \mid \mathcal{LF}, \mathcal{I}^{\bar{d}^T}, \tag{44} \end{align}\] where 43 and 44 follow since \(v_{n, T}, w_{\bar{d}^T}\) are deterministic conditional on the latent factors.
Then by Assumption 3, \[\begin{align} \left\langle v_{n, T}, \;w_{\bar{d}^T} \right\rangle \mid \mathcal{LF}, \mathcal{I}^{\bar{d}^T} &= \sum_{j \in \mathcal{I}^{\bar{d}^T}} \beta^{n, \mathcal{I}^{\bar{d}^T}}_j\left\langle v_{j, T}, \;w_{\bar{d}^T} \right\rangle \mid \mathcal{LF}, \mathcal{I}^{\bar{d}^T} \nonumber \end{align}\]
Then by appealing to the conditional mean exogeneity of \(\varepsilon_{j, T}^{(\bar{d}^T)}\) in Definition 2, we have \[\begin{align} &\sum_{j \in \mathcal{I}^{\bar{d}^T}} \beta^{n, \mathcal{I}^{\bar{d}^T}}_j \left\langle v_{j, T}, \;w_{\bar{d}^T} \right\rangle \mid \mathcal{LF}, \mathcal{I}^{\bar{d}^T} \nonumber \\ &= \sum_{j \in \mathcal{I}^{\bar{d}^T}} \beta^{n, \mathcal{I}^{\bar{d}^T}}_j \left\langle v_{j, T}, \;w_{\bar{d}^T} \right\rangle \mid \mathcal{LF}, \mathcal{I}^{\bar{d}^T} + \sum_{j \in \mathcal{I}^{\bar{d}^T}} \beta^{n, \mathcal{I}^{\bar{d}^T}}_j \mathbb{E}[\varepsilon_{j, T}^{(\bar{d}^T)} \mid \mathcal{LF}, \mathcal{I}^{\bar{d}^T} ] \nonumber \\ &= \sum_{j \in \mathcal{I}^{\bar{d}^T}} \beta^{n, \mathcal{I}^{\bar{d}^T}}_j \mathbb{E}[Y_{j, T}^{(\bar{d}^T)} \mid \mathcal{LF}, \mathcal{I}^{\bar{d}^T} ], \tag{45} \\ &= \sum_{j \in \mathcal{I}^{\bar{d}^T}} \beta^{n, \mathcal{I}^{\bar{d}^T}}_j \mathbb{E}[Y_{j, T} \mid \mathcal{LF}, \mathcal{I}^{\bar{d}^T} ], \tag{46} \end{align}\] where 45 follows from Assumption 2; 46 follows from Assumption 1 and Definition 2.
This completes the proof.
Recall that \(z^{(\bar{d}^t)}_{n, t}\) denotes the latent state of unit \(n\) at time \(t\) if it undergoes action sequence \(\bar{d}^t\). By a simple recursion we have \[\begin{align} z^{(\bar{d}^t)}_{n, t} &= \sum^{t - 1}_{\ell = 1} \left(\prod^t_{k = \ell + 1}\boldsymbol{B}_{n, k} \right) \boldsymbol{C}_{n, {\ell}} \;w_{d_\ell} + \boldsymbol{C}_{n, {t}} \;w_{d_t} + \sum^{t - 1}_{\ell = 1} \left(\prod^t_{k = \ell + 1}\boldsymbol{B}_{n, k}\right) \eta_{n, \ell} + \eta_{n, t} \end{align}\] Hence, \[\begin{align} &Y^{(\bar{d}^t)}_{n, t} \\ &= \left\langle \theta_{n, t}, \; \sum^{t - 1}_{\ell = 1} \left(\prod^t_{k = \ell + 1}\boldsymbol{B}_{n, k} \right) \boldsymbol{C}_{n, {\ell}} \;w_{d_\ell} + \boldsymbol{C}_{n, {t}} \;w_{d_t} + \sum^{t - 1}_{\ell = 1} \left(\prod^t_{k = \ell + 1}\boldsymbol{B}_{n, k}\right) \eta_{n, \ell} + \eta_{n, t} \right\rangle + \langle \tilde{\theta}_{n, t}, w_{d_t} \rangle + \tilde{\eta}_{n, t} \\ &= \sum^{t}_{\ell = 1} \Big(\left\langle \psi^{t, \ell}_{n}, w_{d_\ell} \right\rangle + \varepsilon_{n, t, \ell} \Big), \end{align}\] where in the last line we use the definitions of \(\psi^{t, \ell}_{n}\) and \(\varepsilon_{n, t, \ell}\) in the proposition statement. This completes the proof.
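The following is a minimal sketch, with random stand-in matrices and dimensions of our own choosing, that numerically verifies the closed-form unrolling above on a toy linear dynamical system.

```python
import numpy as np

# Verify the unrolled expression for z_t = B_t z_{t-1} + C_t w_{d_t} + eta_t (z_0 = 0).
# All matrices, vectors, and dimensions are illustrative stand-ins.
rng = np.random.default_rng(0)
m, a, T = 3, 2, 5                                          # latent dim, action dim, horizon
B = [None] + [rng.normal(size=(m, m)) for _ in range(T)]   # B_{n,1..T}
C = [None] + [rng.normal(size=(m, a)) for _ in range(T)]   # C_{n,1..T}
w = [None] + [rng.normal(size=a) for _ in range(T)]        # w_{d_1..d_T}
eta = [None] + [rng.normal(size=m) for _ in range(T)]      # eta_{n,1..T}

z = np.zeros(m)                                            # forward recursion
for t in range(1, T + 1):
    z = B[t] @ z + C[t] @ w[t] + eta[t]

def prod_B(lo: int, hi: int) -> np.ndarray:                # B_hi @ ... @ B_lo
    out = np.eye(m)
    for k in range(lo, hi + 1):
        out = B[k] @ out
    return out

# Closed-form unrolling from the display above.
z_closed = sum(prod_B(l + 1, T) @ (C[l] @ w[l] + eta[l]) for l in range(1, T)) \
           + C[T] @ w[T] + eta[T]
assert np.allclose(z, z_closed)
```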
For simplicity, we omit the conditioning on \(\mathcal{LF}\) in all derivations; all expectations are conditioned on \(\mathcal{LF}\).
Verifying 14 holds. First, we verify that 14 holds, which allows us to express the counterfactual outcomes in terms of the blips and the baseline. For all \(n \in [N]\), using Assumption 4 we have: \[\begin{align} \mathbb{E}[Y^{(\bar{d}^T)}_{n, T} \mid \mathcal{LF}] \nonumber &= \mathbb{E}[Y^{(\bar{d}^T)}_{n, T} - Y^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}] + \mathbb{E}[Y^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}] \nonumber \\ &= \mathbb{E}\left[ \sum^{T}_{t = 1} \left\langle \psi^{T, t}_{n}, w_{d_t} - w_{0_t} \right\rangle + \varepsilon^{(\bar{d}^T)}_{n, T} - \varepsilon^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}\right] + \mathbb{E}\left[\sum^{T}_{t = 1} \left\langle \psi^{T, t}_{n}, w_{0_t} \right\rangle + \varepsilon^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}\right] \nonumber \\ &= \sum^{T}_{t = 1} \gamma_{n, T, t}(d_t) \mid \mathcal{LF}+ b_{n, T} \mid \mathcal{LF}\nonumber \end{align}\]
We first show 15 holds. For \(j \in \mathcal{I}^0_T\): \[\begin{align} b_{j, T} \mid \mathcal{LF} &= \sum^T_{t = 1} \left\langle \psi^{T, t}_{j}, w_{0_t} \right\rangle \mid \mathcal{LF} = \mathbb{E}\left[ \sum^T_{t = 1} \left\langle \psi^{T, t}_{j}, w_{0_t} \right\rangle + \varepsilon^{(\bar{0}^T)}_{j, T} \mid \mathcal{LF}\right] \tag{47} \\ &= \mathbb{E}\left[ \sum^T_{t = 1} \left\langle \psi^{T, t}_{j}, w_{0_t} \right\rangle + \varepsilon^{(\bar{0}^T)}_{j, T} \mid \mathcal{LF}, \mathcal{I}^0_T \right] \tag{48} \\ &= \mathbb{E}\left[Y_{j,T}^{(\bar{0}^T)} \mid \mathcal{LF}, j \in \mathcal{I}^0_T \right] \tag{49} \\&= \mathbb{E}\left[Y_{j,T} \mid \mathcal{LF}, j \in \mathcal{I}^0_T \right], \tag{50} \end{align}\] where 47 and 49 follow from Assumption 4; 48 follows from the fact that \(\left\langle \psi^{T, t}_{j}, w_{0_t} \right\rangle\) is deterministic conditional on \(\mathcal{LF}\), and that \(\mathbb{E}[\varepsilon^{(\bar{0}^T)}_{j, T} \mid \mathcal{LF}, \mathcal{I}^0_T] = \mathbb{E}[\varepsilon^{(\bar{0}^T)}_{j, T} \mid \mathcal{LF}]\) as seen in the definition of \(\mathcal{I}^0_T\); 50 follows from Assumption 1.
Next we show 16 holds. For \(i \notin \mathcal{I}^0_T\): \[\begin{align} b_{i, T} \mid \mathcal{LF} &= \sum^{T}_{t = 1} \left\langle \psi^{T, t}_{i}, w_{0_t} \right\rangle \mid \mathcal{LF}\nonumber \\&= \sum^{T}_{t = 1} \left\langle \psi^{T, t}_{i}, w_{0_t} \right\rangle \mid \mathcal{LF}, \mathcal{I}^0_T \tag{51} \\&= \sum^{T}_{t = 1} \sum_{j \in \mathcal{I}^0_T} \beta_j^{i,\mathcal{I}^0_T} \left\langle \psi^{T, t}_{j}, w_{0_t} \right\rangle \mid \mathcal{LF}, \mathcal{I}^0_T \tag{52} \\&= \sum_{j \in \mathcal{I}^0_T} \beta_j^{i,\mathcal{I}^0_T} b_{j, T} \mid \mathcal{LF}, \mathcal{I}^0_T \nonumber \end{align}\] where 51 follows from the fact that \(\left\langle \psi^{T, t}_{i}, w_{0_t} \right\rangle\) is deterministic conditional on \(\mathcal{LF}\); 52 follows from Assumption 5.
We first show 17 holds. For all \(d \in [A]\) and \(j \in \mathcal{I}^d_T\): \[\begin{align} & \gamma_{j, T, T}(d) \mid \mathcal{LF}\nonumber = \left\langle \psi^{T, T}_{j}, w_{d} - w_{0_T} \right\rangle \mid \mathcal{LF}\nonumber \\ &= \mathbb{E}\left[\left\langle \psi_{j}^{T,T}, w_{d} - w_{0_T} \right\rangle + \varepsilon^{(\underline{0}^{T - 1}, d)}_{j, T} \pm \sum^{T - 1}_{t = 1} \left\langle \psi_{j}^{T,t}, w_{0_t} \right\rangle \mid \mathcal{LF}\right] \tag{53} \\ &= \mathbb{E}\left[\left\langle \psi_{j}^{T,T}, w_{d} \right\rangle + \varepsilon^{(\underline{0}^{T - 1}, d)}_{j, T} + \sum^{T - 1}_{t = 1} \left\langle \psi_{j}^{T,t}, w_{0_t} \right\rangle \mid \mathcal{LF}\right] - \sum^{T}_{t = 1} \left\langle \psi_{j}^{T,t}, w_{0_t} \right\rangle \mid \mathcal{LF}\tag{54} \\ &= \mathbb{E}\left[\left\langle \psi_{j}^{T,T}, w_{d} \right\rangle + \varepsilon^{(\underline{0}^{T - 1}, d)}_{j, T} + \sum^{T - 1}_{t = 1} \left\langle \psi_{j}^{T,t}, w_{0_t} \right\rangle \mid \mathcal{LF}, j \in \mathcal{I}^d_T \right] - b_{j, T} \mid \mathcal{LF}\nonumber \\ &= \mathbb{E}[Y^{(\bar{D}^T_j)}_{j, T} \mid \mathcal{LF}, \;j \in \mathcal{I}^d_T] - b_{j, T} \mid \mathcal{LF}\tag{55} \\ &= \mathbb{E}[Y_{j, T} \mid \mathcal{LF}, \;j \in \mathcal{I}^d_T] - b_{j, T} \mid \mathcal{LF}\tag{56} \end{align}\] where 53 and 54 follow from Assumption 4; 55 follows from the definition of \(\mathcal{I}^d_T\) and Assumption 4; 56 follows from Assumption 1.
Next we show 18 holds. For \(i \notin \mathcal{I}^d_T\) \[\begin{align} \gamma_{i, T, T}(d) \mid \mathcal{LF} &= \left\langle \psi^{T, T}_{i}, w_{d} - w_{0_T} \right\rangle \mid \mathcal{LF} = \left\langle \psi^{T, T}_{i}, w_{d} - w_{0_T} \right\rangle \mid \mathcal{LF}, \mathcal{I}^d_T \tag{57} \\ &= \sum_{j \in \mathcal{I}^d_T} \beta^{i, \mathcal{I}^d_T}_{j}\left\langle \psi_{j}^{T,T}, w_{d} - w_{0_T} \right\rangle \mid \mathcal{LF}, \mathcal{I}^d_T \tag{58} \\ &= \sum_{j \in \mathcal{I}^d_T} \beta^{i, \mathcal{I}^d_T}_{j} \gamma_{j, T, T}(d) \mid \mathcal{LF}, \mathcal{I}^d_T \nonumber \end{align}\] where 57 follows from the fact that \(\left\langle \psi^{T, T}_{i}, w_{d} - w_{0_T} \right\rangle\) is deterministic conditional on \(\mathcal{LF}\); 58 follows from Assumption 5.
We first show 19 holds. For all \(d \in [A]\), \(t < T\), \(j \in \mathcal{I}^d_t\):
\[\begin{align} &\mathbb{E}\left[ Y_{j, T} - Y^{(\bar{0}^T)}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] = \mathbb{E}\left[ Y^{(\bar{D}^T_j)}_{j, T} - Y^{(\bar{0}^T)}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \tag{59} \\ &= \mathbb{E}\left[ Y^{(\bar{D}^T_j)}_{j, T} - Y^{(\bar{D}^{t - 1}_j, \underline{0}^t)}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \tag{60} \\ &= \sum^T_{\ell = t} \mathbb{E}\left[ Y^{(\bar{D}^\ell_j, \underline{0}^{\ell + 1})}_{j, T} - Y^{(\bar{D}^{\ell - 1}_j, \underline{0}^{\ell})}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \tag{61} \end{align}\] where 59 follows from Assumption 1; 60 uses that, for \(j \in \mathcal{I}^d_t\), \(\bar{D}^{t}_j = (0_1, \dots, 0_{t-1}, d)\), and Assumption 1; 61 is a telescoping decomposition, as in 38. Then, \[\begin{align} &\sum^T_{\ell = t} \mathbb{E}\left[ Y^{(\bar{D}^\ell_j, \underline{0}^{\ell + 1})}_{j, T} - Y^{(\bar{D}^{\ell - 1}_j, \underline{0}^{\ell})}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \nonumber \\ &= \sum^T_{\ell = t} \mathbb{E}\left[ \left\langle \psi^{T, \ell}_j, w_{D_{j, \ell}} -w_{0_\ell} \right\rangle + \varepsilon^{(\bar{D}^\ell_j, \underline{0}^{\ell + 1})}_{j, T} - \varepsilon^{(\bar{D}^{\ell - 1}_j, \underline{0}^{\ell})}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \tag{62} \\ &= \mathbb{E}\left[ \left\langle \psi^{T, t}_j, w_{D_{j, t}} -w_{0_t} \right\rangle + \varepsilon^{(\bar{D}^\ell_j, \underline{0}^{\ell + 1})}_{j, T} - \varepsilon^{(\bar{D}^{\ell - 1}_j, \underline{0}^{\ell})}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] + \sum^T_{\ell = t + 1} \mathbb{E}\left[\left\langle \psi^{T, \ell}_j, w_{D_{j, \ell}} -w_{0_\ell} \right\rangle \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \nonumber \\ &= \mathbb{E}\left[ \left\langle \psi^{T, t}_j, w_{d} -w_{0_t} \right\rangle + \varepsilon^{(\bar{D}^\ell_j, \underline{0}^{\ell + 1})}_{j, T} - \varepsilon^{(\bar{D}^{\ell - 1}_j, \underline{0}^{\ell})}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] + \sum^T_{\ell = t + 1}\mathbb{E}\left[\left\langle \psi^{T, \ell}_j, w_{D_{j, \ell}} -w_{0_\ell} \right\rangle \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \tag{63} \\ &= \left\langle \psi^{T, t}_j, w_{d} -w_{0_t} \right\rangle\mid \mathcal{LF}+ \sum^T_{\ell = t + 1}\mathbb{E}\left[\left\langle \psi^{T, \ell}_j, w_{D_{j, \ell}} -w_{0_\ell} \right\rangle \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \nonumber \\ & \quad \quad + \mathbb{E}\left[ \varepsilon^{(\bar{D}^\ell_j, \underline{0}^{\ell + 1})}_{j, T} - \varepsilon^{(\bar{D}^{\ell - 1}_j, \underline{0}^{\ell})}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \nonumber \\ &= \left\langle \psi^{T, t}_j, w_{d} -w_{0_t} \right\rangle\mid \mathcal{LF}+ \sum^T_{\ell = t + 1}\mathbb{E}\left[\left\langle \psi^{T, \ell}_j, w_{D_{j, \ell}} -w_{0_\ell} \right\rangle \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] \nonumber \\ & \quad \quad + \mathbb{E}\left[ \mathbb{E}\left[\varepsilon^{(\bar{\delta}^\ell, \underline{0}^{\ell + 1})}_{j, T} - \varepsilon^{(\bar{\delta}^{\ell - 1}, \underline{0}^{\ell})}_{j, T} \mid \bar{D}^{\ell}_{j} = \bar{\delta}^\ell, \mathcal{LF}, j \in \mathcal{I}^d_t \right] \right] \nonumber \\ &= \left\langle \psi^{T, t}_j, w_{d} -w_{0_t} \right\rangle\mid \mathcal{LF}+ \sum^T_{\ell = t + 1} \left\langle \psi^{T, \ell}_j, w_{D_{j, \ell}} -w_{0_\ell} \right\rangle \mid \mathcal{LF}, j \in \mathcal{I}^d_t \tag{64} \\ &= \gamma_{j, T, t}(d) \mid \mathcal{LF}+ \sum^T_{\ell = t + 1} \gamma_{j, T, \ell}(D_{j, \ell}) \mid \mathcal{LF}\tag{65} \end{align}\] where 62 follows from Assumption 4; 63 follows from the definition of \(\mathcal{I}^d_t\), i.e., for \(j \in \mathcal{I}^d_t\), \(\bar{D}^{t}_j = (\bar{0}^{t - 1}, d)\), and that \(\forall \;\delta \in [A], \ell \in [T], \;\mathbb{E}[\varepsilon^{(\delta)}_{j, T, \ell} \mid \mathcal{LF}, \bar{D}^t_{j}] = \mathbb{E}[\varepsilon^{(\delta)}_{j, T, \ell} \mid \mathcal{LF}]\); 64 follows from Assumption 6, where we require the last term on the l.h.s. of the equality to be zero only for \(j\in\mathcal{I}^d_t\).
Re-arranging 65, we have: \[\begin{align} \gamma_{j, T, t}(d) \mid \mathcal{LF} &= \mathbb{E}\left[ Y_{j, T} - Y^{(\bar{0}^T)}_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] - \sum^T_{\ell = t + 1} \gamma_{j, T, \ell}(D_{j, \ell}) \mid \mathcal{LF}\nonumber \\ &= \mathbb{E}\left[ Y_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] - \mathbb{E}\left[ Y^{(\bar{0}^T)}_{j, T} \mid \mathcal{LF}\right] - \sum^T_{\ell = t + 1} \gamma_{j, T, \ell}(D_{j, \ell}) \mid \mathcal{LF}\tag{66} \\ &= \mathbb{E}\left[ Y_{j, T} \mid \mathcal{LF}, j \in \mathcal{I}^d_t \right] - b_{j, T} \mid \mathcal{LF}- \sum^T_{\ell = t + 1} \gamma_{j, T, \ell}(D_{j, \ell}) \mid \mathcal{LF}\tag{67} \end{align}\] where 66 follows from the definition of \(\mathcal{I}^d_t\); 67 follows from Assumption 4.
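The recursion in 67 is the computational heart of the backward-induction estimator. Below is a hedged sketch of that recursion only: the donor means, baseline, and horizon are illustrative placeholders, whereas in the full procedure each of these quantities is itself a PCR estimate over the appropriate donor set.

```python
# Backward induction over blips, mirroring eqs. (56) and (67).
# y_bar[t] stands in for E[Y_{j,T} | j in I^d_t]; b_hat for the baseline b_{j,T}.
T = 4
y_bar = {t: 1.0 + 0.1 * t for t in range(1, T + 1)}   # placeholder donor means
b_hat = 0.3                                           # placeholder baseline estimate
gamma_hat = {T: y_bar[T] - b_hat}                     # terminal blip, as in (56)
for t in range(T - 1, 0, -1):                         # recurse backwards through periods
    gamma_hat[t] = y_bar[t] - b_hat - sum(gamma_hat[l] for l in range(t + 1, T + 1))
```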
Next we show 20 holds. For all \(d \in [A]\), \(t < T\), \(i \notin \mathcal{I}^d_t\): \[\begin{align} \gamma_{i, T, t}(d) \mid \mathcal{LF} &= \left\langle \psi^{T, t}_{i}, w_{d} - w_{0_t} \right\rangle \mid \mathcal{LF}\tag{68} \\ &= \left\langle \psi^{T, t}_{i}, w_{d} - w_{0_t} \right\rangle \mid \mathcal{LF}, \mathcal{I}^d_t \tag{69} \\ &= \sum_{j \in \mathcal{I}^d_t} \beta^{i, \mathcal{I}^d_t}_{j}\left\langle \psi_{j}^{T,t}, w_{d} - w_{0_t} \right\rangle \mid \mathcal{LF}, \mathcal{I}^d_t \tag{70} \\ &= \sum_{j \in \mathcal{I}^d_t} \beta^{i, \mathcal{I}^d_t}_{j} \gamma_{j, T, t}(d) \mid \mathcal{LF}, \mathcal{I}^d_t \nonumber \end{align}\] where 69 follows from the fact that \(\left\langle \psi^{T, t}_{i}, w_{d} - w_{0_t} \right\rangle\) is deterministic conditional on \(\mathcal{LF}\); 70 follows from Assumption 5.
We point out that Assumption 7, in its base form, does not allow for time-varying covariates. Specifically, it assumes access to \(p\) covariates for each unit, each associated with the unit's latent factor at the terminal time, i.e., \(v_{n,T}\). Furthermore, as seen in Theorem 3, we require \(p \to \infty\) with \(T\) fixed for consistency, which is rarely plausible in practice.
However, notice that in Theorem 4 we are able to send \(T \to \infty\) as well. As such, the inclusion of time-varying covariates justifies taking \(p \to \infty\). To that end, we now give a general construction of such covariates.
Assumption 22. For each unit \(n \in [N]\), we have covariates \(X_n = (X_{n,1}^{\top}, X_{n,2}^{\top}, \dots, X_{n,T}^{\top})^{\top} \in \mathbb{R}^{pT}\). Specifically, for any \(t \in [T]\) we have \(X_{n,t} \in \mathbb{R}^p\), where \[X_{n,t,k} = \langle v_{n,t}, \rho^t_k \rangle + \varepsilon_{n,t,k}\] for any \(k \in [p]\), where \(v_{n, t}\) is the unit latent factor defined in Assumption 2 and \(\varepsilon_{n,t,k}\) is mean-zero noise. That is, we collect \(p\) features at each time step \(t \in [T]\). Denote \(X = [X_1, \dots, X_N] \in \mathbb{R}^{pT \times N}\).
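A minimal sketch of this construction, with the factors, feature vectors \(\rho^t_k\), and noise scale all chosen as stand-ins for illustration:

```python
import numpy as np

# Construct X_n = (X_{n,1}^T, ..., X_{n,T}^T)^T in R^{pT}: p noisy linear
# measurements of the unit factor v_{n,t} at each period, stacked over time.
rng = np.random.default_rng(0)
m, p, T = 3, 4, 5
v = rng.normal(size=(T, m))                      # v_{n,t} for t = 1..T (stand-ins)
rho = rng.normal(size=(T, p, m))                 # rho^t_k stacked as rows of rho[t]
X_n = np.concatenate([rho[t] @ v[t] + 0.1 * rng.normal(size=p) for t in range(T)])
assert X_n.shape == (p * T,)
```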
Notice that, with the added flexibility of the above formulation, we can use observed values as covariates as well.
Here is a sufficient condition under which row-space inclusion holds.
Lemma 1. Let the setup of Assumption 12 hold and denote \[V_{\mathcal{I}^d_t} = ([v_{j,T}]_{j \in \mathcal{I}^d_t})^{\top} \in \mathbb{R}^{|\mathcal{I}^d_t| \times mT} \quad \text{and} \quad b = ([\langle v_{j,T}, w_{(\bar{D}_{j}^T)}\rangle]_{j \in \mathcal{I}^d_t})^{\top} \in \mathbb{R}^{|\mathcal{I}^d_t|}.\] If \(\text{span}(\{\rho_i\}_{i \in [p]}) \cap \{y \in \mathbb{R}^{mT}: V_{\mathcal{I}^d_t}y = b\}\) is non-empty, then Assumption 12 holds for \(\rho_i\) as defined in Assumption 7.
Proof. We require weights \(\xi_i^{(d,t)}\) such that the following holds for any \(j \in \mathcal{I}^d_t\): \[\mathbb{E}[Y_{j,T}|\mathcal{LF}] = \sum_{i = 1}^p \xi_i^{(d,t)} \cdot \mathbb{E}[(X_{\mathcal{I}^d_t})_{ij}|\mathcal{LF}].\]
Note that \(\mathbb{E}[Y_{j,T}|\mathcal{LF}] = \langle v_{j,T}, w_{(\bar{D}_{j}^T)}\rangle\) for any \(j \in \mathcal{I}^d_t\). Furthermore, \(\mathbb{E}[(X_{\mathcal{I}^d_t})_{ij}|\mathcal{LF}] = \langle v_{j,T}, \rho_i \rangle\) under the formulation presented in Assumption 7. As such, if we define \(B_{ij} = \langle v_{i,T}, \rho_j \rangle\) for all \(i \in \mathcal{I}^d_t\) and \(j \in [p]\), then our problem is equivalent to there being a solution \(\xi^{(d,t)} \in \mathbb{R}^p\) to the linear system \(B\xi^{(d,t)} = b\). To conclude, notice that \(B = V_{\mathcal{I}^d_t}[\rho_1, \dots, \rho_p]\). ◻
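A toy numeric check of this linear system, with random stand-in matrices and the right-hand side constructed to lie in the relevant column space so that a solution exists:

```python
import numpy as np

# Solve B xi = b with B = V_{I} [rho_1, ..., rho_p], as in the proof of Lemma 1.
# All matrices here are random stand-ins, not quantities from the paper's data.
rng = np.random.default_rng(0)
n_donors, mT, p = 8, 4, 6
V = rng.normal(size=(n_donors, mT))              # rows are donor factors v_{j,T}
rho = rng.normal(size=(mT, p))                   # columns are rho_1, ..., rho_p
B = V @ rho
b = B @ rng.normal(size=p)                       # in the column space by construction
xi, *_ = np.linalg.lstsq(B, b, rcond=None)
assert np.allclose(B @ xi, b)                    # least squares recovers an exact solution
```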
In general, the point is that if the covariates defined by \(\{\rho_i\}_{i \in [p]}\) are sufficiently expressive, then row-space inclusion holds. Specifically, we seek to maximize the dimension of their span, as occurs when they are linearly independent. The next result is a consequence of Assumption 12 and will be essential in establishing consistency.
Lemma 2. Let Assumption 12 hold. Then for any \(d \in [A]\) and \(t \in [T]\) there exists \(\alpha^{(d,t)} \in \mathbb{R}^p\) such that \[w_{(0_1, \dots, 0_{t-1}, d, 0_{t+1}, \dots, 0_T)} = \sum_{i = 1}^p \alpha_{i}^{(d,t)} \cdot \rho_i.\] That is \(w_{(0_1, \dots, 0_{t-1}, d, 0_{t+1}, \dots, 0_T)} \in \text{span}(\{\rho_i\}_{i \in [p]})\).
Proof. By Assumption 12, for any unit \(j \in \mathcal{I}^d_t\) and any collection of treatment sequences \((D_{j,t+1}, \dots, D_{j,T})_{j \in \mathcal{I}^d_t}\), there exists a solution to the following system: \[V_{\mathcal{I}^d_t}[\rho_1, \dots, \rho_p]\xi = [\langle v_{j,T}, w_{(0_1, \dots, 0_{t-1}, d, D_{j,t+1}, \dots, D_{j,T})}\rangle]^{\top}_{j \in \mathcal{I}_t^d}.\] As such, there exists a solution for the following set of sequences \((0_{t+1}, \dots, 0_{T})_{j \in \mathcal{I}^d_t}\) as well. In that case the system can be written as \[V_{\mathcal{I}^d_t}[\rho_1, \dots, \rho_p]\xi = V_{\mathcal{I}^d_t}w_{(0_1, \dots, 0_{t-1}, d,0_{t+1}, \dots, 0_{T})},\] which we know to have a solution \(\xi\). This implies \([\rho_1, \dots, \rho_p]\xi - w_{(0_1, \dots, 0_{t-1}, d,0_{t+1}, \dots, 0_{T})} \in \text{ker}(V_{\mathcal{I}^d_t})\).
By assumption, we know \(\text{rank}(V_{\mathcal{I}^d_t}) = mT\); equivalently, the matrix has full column rank. See the discussion under Assumption 3 in Section 3 justifying such an assumption when \(\mathcal{I}_{t}^d\) is sufficiently large. This implies that \(\text{ker}(V_{\mathcal{I}^d_t}) = \{0\}\) by the rank–nullity theorem.
Combining the above results, we know \([\rho_1, \dots, \rho_p]\xi - w_{(0_1, \dots, 0_{t-1}, d,0_{t+1}, \dots, 0_{T})} = 0\). Since this is true for any \(d \in [A]\) and \(t \in [T]\), we have the desired result upon rearranging. ◻
Lemma 3. Let Assumption 12 hold. Then for all \(d \in [A]\) and \(t \in [T]\) there exists \(\xi^{(d,t)'} \in \mathbb{R}^p\) such that \[\mathbb{E}[Y_{j,T}^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, -\ell } \cup 0_{\ell}\mkern-1.3mu}\mkern 1.3mu)}|\mathcal{LF}] = \sum_{i = 1}^p \xi_i^{(d,t)'}\cdot \mathbb{E}[(X_{\mathcal{I}_t^d})_{ij}|\mathcal{LF}],\] where \(\mkern 1.3mu\overline{\mkern-1.3muD_{j, -\ell } \cup 0_{\ell}\mkern-1.3mu}\mkern 1.3mu= (0_1,\dots, 0_{t-1}, d, D_{j, t+1}, \dots, D_{j, \ell-1}, 0_{\ell}, D_{j, \ell +1}, \dots, D_{j, T})\) for any \(\ell > t\).
Proof. This holds as an immediate consequence of Assumption 12 where we consider \(D_{j,\ell} = 0_{\ell}\) instead. ◻
Assumption 13 is not restrictive. Recalling the linear dynamical system setting from Proposition 1, we present a few sufficient conditions under which it holds.
Hard Memory Cutoff \[\exists q \in \mathbb{N}, \quad \forall T, \quad \prod_{j=T-q}^{T} \boldsymbol{B}_{n,j} = 0.\]
Exponential Forgetting (Spectral Decay Condition) \[\exists C > 0, \rho \in (0,1), \text{ such that for all } T, t, \quad \left\| \prod_{j=t}^{T} \boldsymbol{B}_{n,j} \right\|_2 \leq C \rho^{T-t}.\]
Soft Memory Cutoff (Higher-Order Markov Property) \[\mathbb{P}(z_{n,T} \mid z_{n,T-1}, z_{n,T-2}, \dots, z_{n,0}) = \mathbb{P}(z_{n,T} \mid z_{n,T-1}, \dots, z_{n,T-q}).\]
Clearly, the first condition is the strongest and implies the other two. In general, this shows that our assumption of fixed memory is a reasonable one, supporting the effectiveness of our methodology in the dynamic treatment regime from a statistical perspective.
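As a quick sanity check of the exponential-forgetting condition, the sketch below rescales stand-in matrices \(\boldsymbol{B}_{n,j}\) to spectral norm \(\rho < 1\); the bound then follows from submultiplicativity of the operator norm. All values are illustrative.

```python
import numpy as np

# Check ||B_T ... B_t||_2 <= rho^{T-t+1} when each ||B_{n,j}||_2 = rho < 1.
rng = np.random.default_rng(0)
m, T, rho = 3, 10, 0.8
Bs = [None] + [rng.normal(size=(m, m)) for _ in range(T)]
for j in range(1, T + 1):
    Bs[j] *= rho / np.linalg.norm(Bs[j], 2)      # force spectral norm exactly rho
for t in range(1, T + 1):
    P = np.eye(m)
    for j in range(t, T + 1):                    # accumulate P = B_T @ ... @ B_t
        P = Bs[j] @ P
    assert np.linalg.norm(P, 2) <= rho ** (T - t + 1) + 1e-9
```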
1. Verifying Baseline Consistency: We first consider units that remain under control for the entire horizon, i.e., the donor set \(\mathcal{I}_T^0\).
Donor Set Baseline Consistency: Consider unit \(n \in \mathcal{I}_T^0\). Denote \(X_{\mathcal{I}_T^0 \setminus n} = X_{:, \mathcal{I}^0_T\setminus n} \in \mathbb{R}^{ p \times |\mathcal{I}_T^0 \setminus n|}\). We know the baseline outcome admits the representation \[\hat{b}_{n,T} - b_{n,T} \mid \mathcal{LF}= \left\langle \hat{\phi}^{n, \mathcal{I}_T^0}, Y_{\mathcal{I}_T^0 \setminus n} \right\rangle - \left\langle \phi^{n, \mathcal{I}_T^0}, \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n} \mid \mathcal{LF}] \right\rangle,\] where \(\hat{\phi}^{n, \mathcal{I}_T^0}\) are the regression coefficients from regressing the additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}_T^0 \setminus n}\)-approximation of \(X_{\mathcal{I}_T^0 \setminus n}\) with \(k_{\mathcal{I}_T^0 \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}_T^0 \setminus n}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}_T^0 \setminus n}\).
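Since this PCR step recurs throughout the remaining arguments, here is a minimal sketch of it under stand-in data; the function name, shapes, and inputs are ours for illustration only.

```python
import numpy as np

# PCR weights: regress the target unit's covariate vector on the rank-k
# approximation of the donor covariate matrix (rows = covariates, cols = donors).
def pcr_weights(X_donors: np.ndarray, x_target: np.ndarray, k: int) -> np.ndarray:
    U, s, Vt = np.linalg.svd(X_donors, full_matrices=False)
    X_k = (U[:, :k] * s[:k]) @ Vt[:k]            # rank-k approximation of X_donors
    phi_hat, *_ = np.linalg.lstsq(X_k, x_target, rcond=None)
    return phi_hat                               # one weight per donor unit

rng = np.random.default_rng(0)
X_donors = rng.normal(size=(20, 8))              # p = 20 covariates, 8 donors (stand-ins)
x_target = X_donors @ rng.normal(size=8)         # target in the donors' span
phi_hat = pcr_weights(X_donors, x_target, k=3)
# The baseline estimate is then the weighted donor outcome, <phi_hat, Y_donors>.
```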
Lemma 4. We claim the following \[\left\langle \phi^{n, \mathcal{I}_T^0}, \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n}] \right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}_T^0}, \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n} ] \right\rangle,\] where \(\tilde{\phi}^{n, \mathcal{I}_T^0} = VV^{\top}\phi^{n, \mathcal{I}_T^0}\) and \(V \in \mathbb{R}^{|\mathcal{I}_T^0 \setminus n| \times k_{\mathcal{I}_T^0 \setminus n}}\) denotes the right singular vectors of \(\mathbb{E}[X_{\mathcal{I}_T^0 \setminus n}]\) with \(k_{\mathcal{I}_T^0 \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}_T^0 \setminus n}])\), i.e., \[\mathbb{E}[X_{\mathcal{I}_T^0 \setminus n}] = \sum_{\ell = 1}^{k_{\mathcal{I}_T^0 \setminus n}}\sigma_\ell u_\ell v_\ell^{\top} = U\Sigma V^{\top},\] where \(u_\ell \in \mathbb{R}^p\) and \(v_\ell \in \mathbb{R}^{|\mathcal{I}_T^0 \setminus n|}\).
Proof. By Assumption 12 there exists \(\xi^{(0,T)}\) such that for any \(j \in \mathcal{I}^0_T \setminus n\)
\[\mathbb{E}[Y_{j,T}|\mathcal{LF}, j \in \mathcal{I}^0_T \setminus n] = \sum_{i = 1}^p \xi^{(0,T)}_i \cdot \mathbb{E}[(X_{\mathcal{I}^0_T\setminus n})_{ij}|\mathcal{LF}, j \in \mathcal{I}^0_T \setminus n].\]
As such, the row space of \(\mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n}]^{\top} \in \mathbb{R}^{1 \times |\mathcal{I}_T^0 \setminus n|}\) is included in the row space of \(\mathbb{E}[X_{\mathcal{I}_T^0 \setminus n}] \in \mathbb{R}^{p \times |\mathcal{I}_T^0 \setminus n|}\). This yields \[\mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n}] = VV^{\top} \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n}],\] which gives us \[\left\langle \tilde{\phi}^{n, \mathcal{I}_T^0}, \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n} ] \right\rangle = \left\langle VV^{\top}\phi^{n, \mathcal{I}_T^0}, \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n}] \right\rangle = \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n} ]^{\top}VV^{\top} \cdot \phi^{n, \mathcal{I}_T^0} = \left\langle \phi^{n, \mathcal{I}_T^0}, \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n} ] \right\rangle\] proving the desired result. ◻
Using Lemma 4, we can now lift the proof technique in [10], Theorem \(2\) (Appendix C), to show consistency for \(n \in \mathcal{I}_T^0\): \[\begin{align} \label{eq:donor-baseline-consistency} \hat{b}_{n,T} - b_{n,T} \mid \mathcal{LF}&= \left\langle \hat{\phi}^{n, \mathcal{I}_T^0}, Y_{\mathcal{I}_T^0 \setminus n} \right\rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}_T^0}, \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n}] \right\rangle \nonumber\\ &= O_p \left( \sqrt{\log(p |\mathcal{I}_T^0|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}_T^0|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}_T^0|-1}} \right\} \right] \right), \end{align}\tag{71}\] where we set \(T_1 = 1\), \(\tilde{w}^{(i,d)} = \tilde{\phi}^{n, \mathcal{I}_T^0}\), \(\hat{w}^{(i,d)} = \hat{\phi}^{n, \mathcal{I}_T^0}\), \(Y_{t,\mathcal{I}^{(d)}} = Y_{\mathcal{I}_T^0 \setminus n}\), \(\mathbb{E}[Y_{t,\mathcal{I}^{(d)}}] = \mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n} \mid \mathcal{LF}]\), and \(\mathcal{P}_{V_{\text{pre}}} = VV^{\top}\). Furthermore, in the final rate we set \(T_0 = p\), \(N_d = |\mathcal{I}_T^0 \setminus n|\), and \(r_{\text{pre}} = k_{\mathcal{I}_T^0 \setminus n}\). To conclude, we used that \(|\mathcal{I}_T^0 \setminus n| = |\mathcal{I}_T^0| - 1\) and \(k_{\mathcal{I}_T^0 \setminus n} \leq k\), where \(k\) is the uniform upper bound on the rank of all possible expected covariate matrices, i.e., \(k = \max_{d \in [A], t\in [T]}\text{rank}(\mathbb{E}[X_{\mathcal{I}_t^d}])\).
Non-Donor Set Baseline Consistency: Consider unit \(n \notin \mathcal{I}_T^0\). Denote \(X_{\mathcal{I}_T^0} = X_{:, \mathcal{I}^0_T}\in \mathbb{R}^{p \times |\mathcal{I}_T^0|}\). We know the baseline outcome admits the representation \[\hat{b}_{n,T} - b_{n,T} \mid \mathcal{LF}= \left\langle \hat{\beta}^{n, \mathcal{I}_T^0}, \hat{b}_{\mathcal{I}_T^0 } \right\rangle - \left\langle \beta^{n, \mathcal{I}_T^0}, b_{\mathcal{I}_T^0 } \right\rangle,\] where \(\hat{\beta}^{n, \mathcal{I}_T^0}\) are the regression coefficients from regressing additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}_T^0 }\)-approximation of \(X_{\mathcal{I}_T^0 }\) with \(k_{\mathcal{I}_T^0 } = \text{rank}(\mathbb{E}[X_{\mathcal{I}_T^0}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}_T^0 }\).
Lemma 5. We have that \[\left\langle \beta^{n, \mathcal{I}_T^0}, b_{\mathcal{I}_T^0} \right\rangle = \left\langle \tilde{\beta}^{n, \mathcal{I}_T^0}, b_{\mathcal{I}_T^0} \right\rangle\] with \(\tilde{\beta}^{n, \mathcal{I}_T^0} = VV^{\top}{\beta}^{n, \mathcal{I}_T^0}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}_T^0}]\).
Proof. It would suffice to prove that \[VV^{\top}b_{\mathcal{I}_T^0} = b_{\mathcal{I}_T^0},\] which is equivalent to \(b_{\mathcal{I}_T^0}^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}_T^0}]\). By definition, for any \(j \in \mathcal{I}^0_T\) we know \(b_{j,T} = \mathbb{E}[Y_{j,T}|\mathcal{LF}, j \in \mathcal{I}_T^0]\). Lastly, by Assumption 12 there exists \(\xi^{(0,T)} \in \mathbb{R}^p\) such that \[\mathbb{E}[Y_{j,T}|\mathcal{LF}, j \in \mathcal{I}_T^0] = \sum_{i = 1}^p\xi_i^{(0,T)} \cdot \mathbb{E}[(X_{\mathcal{I}_T^0})_{ij}|\mathcal{LF}, j \in \mathcal{I}_T^0].\] This concludes the proof. ◻
Lemma 5 allows us to write \[\begin{align} \hat{b}_{n,T} - b_{n,T} \mid \mathcal{LF}&= \left\langle \hat{\beta}^{n, \mathcal{I}_T^0}, \hat{b}_{\mathcal{I}_T^0 } \right\rangle - \left\langle \tilde{\beta}^{n, \mathcal{I}_T^0}, b_{\mathcal{I}_T^0 } \right\rangle\\ &= \underbrace{\langle \tilde{\beta}^{n,\mathcal{I}_T^0}, \eta_{\mathcal{I}_T^0}\rangle}_{\text{Term 1a}} + \underbrace{\langle \Delta_{n, \mathcal{I}_T^0} , \eta_{\mathcal{I}_T^0}\rangle}_{\text{Term 1b}} + \underbrace{\langle \Delta_{n, \mathcal{I}_T^0}, b_{\mathcal{I}_T^0}\rangle}_{\text{Term 1c}}, \end{align}\] where \(\eta_{\mathcal{I}_T^0} = \hat{b}_{\mathcal{I}_T^0} - b_{\mathcal{I}_T^0}\) and \(\Delta_{n, \mathcal{I}_T^0} = \hat{\beta}^{n,\mathcal{I}_T^0} - \tilde{\beta}^{n,\mathcal{I}_T^0}\).
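This three-term split is exact, not an approximation; a quick numeric check on arbitrary stand-in vectors:

```python
import numpy as np

# With eta = b_hat - b and Delta = beta_hat - beta_tilde, the error
# <beta_hat, b_hat> - <beta_tilde, b> equals Terms 1a + 1b + 1c exactly.
rng = np.random.default_rng(0)
beta_tilde, b = rng.normal(size=5), rng.normal(size=5)
beta_hat, b_hat = beta_tilde + 0.1 * rng.normal(size=5), b + 0.1 * rng.normal(size=5)
eta, Delta = b_hat - b, beta_hat - beta_tilde
lhs = beta_hat @ b_hat - beta_tilde @ b
rhs = beta_tilde @ eta + Delta @ eta + Delta @ b
assert np.isclose(lhs, rhs)
```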
Bounding term 1a: For this term, we first state the following result without proof.
Lemma 6 (Appendix B.\(4\), Lemma \(8\) of [10]). Given any \(n \in [N]\), \(d\in [A]\), and \(t \in [T]\) let all relevant assumptions hold. Then conditioned on the latent factors and treatment assignments, we have that \[\|\tilde{\beta}^{n,\mathcal{I}_t^d}\|_2 \leq C \cdot \sqrt{\frac{k_{\mathcal{I}_t^d}}{|\mathcal{I}_t^d|}}\] for some constant \(C > 0\). This immediately implies that \(\|\tilde{\beta}^{n,\mathcal{I}_t^d}\|_1 \leq C \sqrt{k_{\mathcal{I}_t^d}}\).
Note that by Hölder's and the Cauchy–Schwarz inequalities, \[\begin{align} \langle \tilde{\beta}^{n,\mathcal{I}_T^0}, \eta_{\mathcal{I}_T^0}\rangle &\leq \|\tilde{\beta}^{n,\mathcal{I}_T^0}\|_1 \cdot \|\eta_{\mathcal{I}_T^0}\|_{\infty}\\ &\leq \sqrt{|\mathcal{I}_T^0|} \cdot \|\tilde{\beta}^{n,\mathcal{I}_T^0}\|_2 \cdot \|\eta_{\mathcal{I}_T^0}\|_{\infty}. \end{align}\] Lemma 6 gives us that \(\|\tilde{\beta}^{n,\mathcal{I}_T^0}\|_2 \leq C \sqrt{k/|\mathcal{I}_T^0|}\). Donor Set Baseline Consistency (Equation 71) yields \[\label{eq:baseline-consistency} \|\eta_{\mathcal{I}_T^0}\|_{\infty} = O_p \left( \sqrt{\log(p |\mathcal{I}_T^0|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}_T^0|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}_T^0|-1}} \right\} \right] \right).\tag{72}\] Combining both results, we know \[\label{eq:part1a} \langle \tilde{\beta}^{n,\mathcal{I}_T^0}, \eta_{\mathcal{I}_T^0}\rangle = O_p \left( \sqrt{k\log(p |\mathcal{I}_T^0|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}_T^0|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}_T^0|-1}} \right\} \right] \right).\tag{73}\]
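The norm comparison used above is elementary; a one-line check on a random stand-in vector:

```python
import numpy as np

# Cauchy-Schwarz: ||x||_1 <= sqrt(n) ||x||_2 for any x in R^n.
x = np.random.default_rng(0).normal(size=7)
assert np.abs(x).sum() <= np.sqrt(x.size) * np.linalg.norm(x) + 1e-12
```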
Bounding term 1b: Once again, we state the following result without proof.
Lemma 7 (Appendix B.\(4\), Lemma \(7\) of [10]). Let the setup of Lemma 6 hold. Then, w.p. at least \(1 - O(1/(p |\mathcal{I}_t^d|)^{10})\), \[\|\tilde{\beta}^{n,\mathcal{I}_t^d} - \hat{\beta}^{n,\mathcal{I}_t^d}\|_2^2 \leq C(\sigma) \cdot \log(p |\mathcal{I}_t^d|) \left( \frac{k^{3/2}}{p^{1/2} |\mathcal{I}_t^d|} + \frac{k^3}{\min\{p, |\mathcal{I}_t^d|\}^2} \right),\] where \(C(\sigma)\) is a constant that depends only on \(\sigma\), which appears in Assumption 9.
Once again note that \[\langle \Delta_{n, \mathcal{I}_T^0} , \eta_{\mathcal{I}_T^0}\rangle \leq \sqrt{|\mathcal{I}_T^0|} \cdot \|\Delta_{n, \mathcal{I}_T^0}\|_2 \cdot \|\eta_{\mathcal{I}_T^0}\|_{\infty},\] where using Lemma 7 and Equation 72 gives us \[\begin{align} \label{eq:part1b} &\langle \Delta_{n, \mathcal{I}_T^0} , \eta_{\mathcal{I}_T^0}\rangle\\ &= O_p\Bigg(\sqrt{|\mathcal{I}_T^0|} \cdot \sqrt{\log(p |\mathcal{I}_T^0|)} \left( \frac{k^{3/4}}{p^{1/4} |\mathcal{I}_T^0|^{1/2}} + \frac{k^{3/2}}{\min\{p, |\mathcal{I}_T^0|\}} \right)\cdot \sqrt{\log(p |\mathcal{I}_T^0|)}\Bigg( \frac{{k}^{3/4}}{p^{1/4}} \nonumber \\ &+ {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}_T^0|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}_T^0|-1}} \right\} \Bigg) \Bigg).\nonumber \end{align}\tag{74}\] Bounding term 1c: Lemma 5 and the Cauchy–Schwarz inequality give \[\langle \Delta_{n, \mathcal{I}_T^0}, b_{\mathcal{I}_T^0}\rangle = \langle b_{\mathcal{I}_T^0}, VV^{\top}\Delta_{n, \mathcal{I}_T^0} \rangle \leq \|b_{\mathcal{I}_T^0}\|_2 \cdot \|VV^{\top}\Delta_{n, \mathcal{I}_T^0}\|_2.\] We introduce the following result without proof.
Lemma 8 (Appendix C, Lemma \(9\) of [10]). Let the setup from Lemma 6 hold. Then \[\left\|VV^{\top}\Delta_{n, \mathcal{I}_t^d}\right\|_2 = O_p \left( \frac{\sqrt{k}}{\sqrt{|\mathcal{I}_t^d|} p^{1/4}} + \frac{k^{3/2} \sqrt{\log(p |\mathcal{I}_t^d|)}}{\sqrt{|\mathcal{I}_t^d|} \cdot \min\{\sqrt{p}, \sqrt{|\mathcal{I}_t^d|}\}} + \frac{k^2 \sqrt{\log(p |\mathcal{I}_t^d|)}}{\min\{p^{3/2}, |\mathcal{I}_t^d|^{3/2}\}} \right).\]
Assumption 10 gives, for any \(j \in \mathcal{I}_T^0\), \[|b_{j,T}| = \left|\mathbb{E}[Y_{j,T}^{(\bar{0}^T)}]\right| \leq 1,\] which lets us conclude \[\|b_{\mathcal{I}_T^0}\|_2 \leq \sqrt{|\mathcal{I}_T^0|}.\] Together, we know \[\label{eq:part1c} \langle \Delta_{n, \mathcal{I}_T^0}, b_{\mathcal{I}_T^0}\rangle = O_p \left( \frac{\sqrt{k}}{ p^{1/4}} + \frac{k^{3/2} \sqrt{\log(p |\mathcal{I}_T^0|)}}{\min\{\sqrt{p}, \sqrt{|\mathcal{I}_T^0|}\}} + \frac{k^2 \sqrt{|\mathcal{I}_T^0|\log(p |\mathcal{I}_T^0|)}}{\min\{p^{3/2}, |\mathcal{I}_T^0|^{3/2}\}} \right).\tag{75}\] Combining Equations 73, 74, and 75 gives the following final rate for units \(n \notin \mathcal{I}_T^0\): \[\begin{align} &\hat{b}_{n,T} - b_{n,T} \mid \mathcal{LF}\\ &= O_p\left(\sqrt{k\log(p|\mathcal{I}_T^0|)}\left(\frac{k^{3/4}}{p^{1/4}} + k^2 \max\left\{\frac{\sqrt{|\mathcal{I}_T^0|}}{p^{3/2}}, \frac{1}{\sqrt{|\mathcal{I}_T^0|-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right). \end{align}\]
Baseline Consistency: The above two sections allow us to conclude that for any \(n \in [N]\) \[\begin{align} \label{eq:baseline-consistency-rate} &\hat{b}_{n,T} - b_{n,T} \mid \mathcal{LF}\\ &= O_p\left(\sqrt{\log(p|\mathcal{I}_T^0|)}\left(\frac{k^{5/4}}{p^{1/4}} +k^{5/2}\max\left\{\frac{\sqrt{|\mathcal{I}_T^0|}}{p^{3/2}}, \frac{1}{\sqrt{|\mathcal{I}_T^0|-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right).\nonumber \end{align}\tag{76}\]
2. Verifying Terminal Blip Consistency:
For any \(d \in [A]\):
Donor Set Consistency: Consider unit \(n \in \mathcal{I}_T^d\). Denote \(X_{\mathcal{I}_T^d \setminus n} = X_{:, \mathcal{I}^d_T \setminus n}\in \mathbb{R}^{p \times |\mathcal{I}_T^d \setminus n|}\). We know the terminal blip admits the representation \[\hat{\gamma}_{n,T,T}(d) - \gamma_{n,T,T}(d) \mid \mathcal{LF}= \underbrace{\left\langle \hat{\phi}^{n, \mathcal{I}_T^d}, Y_{\mathcal{I}_T^d \setminus n} \right\rangle - \left\langle \phi^{n, \mathcal{I}_T^d}, \mathbb{E}[Y_{\mathcal{I}_T^d \setminus n} \mid \mathcal{LF}] \right\rangle}_{\text{Term 1}} + \underbrace{b_{n,T}\mid \mathcal{LF}- \hat{b}_{n,T}}_{\text{Term 2}},\] where \(\hat{\phi}^{n, \mathcal{I}_T^d}\) are the regression coefficients from regressing the additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}_T^d \setminus n}\)-approximation of \(X_{\mathcal{I}_T^d \setminus n}\) with \(k_{\mathcal{I}_T^d \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}_T^d \setminus n}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}_T^d \setminus n}\).
Bounding Term \(1\): This argument is nearly identical to that for Donor Set Baseline Consistency.
Lemma 9. We have that \[\left\langle \phi^{n, \mathcal{I}_T^d}, \mathbb{E}[Y_{\mathcal{I}_T^d \setminus n} ]\right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}_T^d}, \mathbb{E}[Y_{\mathcal{I}_T^d \setminus n} ] \right\rangle\] with \(\tilde{\phi}^{n, \mathcal{I}_T^d} = VV^{\top}{\phi}^{n, \mathcal{I}_T^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}_T^d\setminus n}]\).
Proof. It would suffice to prove that \[VV^{\top}\mathbb{E}[Y_{\mathcal{I}_T^d \setminus n} ] = \mathbb{E}[Y_{\mathcal{I}_T^d \setminus n} ],\] which is equivalent to \(\mathbb{E}[Y_{\mathcal{I}_T^d \setminus n}]^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}_T^d \setminus n}]\). By Assumption 12 there exists \(\xi^{(d,T)}\) such that for any \(j \in \mathcal{I}^d_T \setminus n\)
\[\mathbb{E}[Y_{j,T}|\mathcal{LF}, j \in \mathcal{I}^d_T \setminus n] = \sum_{i = 1}^p \xi^{(d,T)}_i \cdot \mathbb{E}[(X_{\mathcal{I}^d_T\setminus n})_{ij}|\mathcal{LF}, j \in \mathcal{I}^d_T \setminus n].\]
This concludes the proof. ◻
Using Lemma 9, we can once again use the proof technique in [10], Theorem \(2\) (Appendix C), to show consistency of \[\begin{align} \label{eq:donor-terminal-blip-consistency-term-1} \text{Term 1} &= \left\langle \hat{\phi}^{n, \mathcal{I}_T^d}, Y_{\mathcal{I}_T^d \setminus n} \right\rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}_T^d}, \mathbb{E}[Y_{\mathcal{I}_T^d \setminus n}] \right\rangle\\ &= O_p \left( \sqrt{\log(p |\mathcal{I}_T^d|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}_T^d|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}_T^d|-1}} \right\} \right] \right).\nonumber \end{align}\tag{77}\] Bounding Term 2: This rate is exactly as given in Equation 76.
Combining the Term \(1\) and Term \(2\) rates, we find, for any \(n \in \mathcal{I}^d_T\):
\[\label{eq:donor-terminal-rate} \hat{\gamma}_{n,T,T}(d) - \gamma_{n,T,T}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{5/4}}{p^{1/4}} +k^{5/2}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\tag{78}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_T^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_T^d|\}\).
Non-Donor Set Consistency: Consider unit \(n \notin \mathcal{I}_T^d\). Denote \(X_{\mathcal{I}_T^d} = X_{:, \mathcal{I}^d_T} \in \mathbb{R}^{p \times |\mathcal{I}_T^d|}\). We know the terminal blip admits the representation \[\hat{\gamma}_{n,T,T}(d) - \gamma_{n,T, T}(d) \mid \mathcal{LF}= \left\langle \hat{\beta}^{n, \mathcal{I}_T^d}, \hat{\gamma}_{\mathcal{I}_T^d, T, T}(d) \right\rangle - \left\langle \beta^{n, \mathcal{I}_T^d}, \gamma_{\mathcal{I}_T^d, T, T}(d) \right\rangle,\] where \(\hat{\beta}^{n, \mathcal{I}_T^d}\) are the regression coefficients from regressing the additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}_T^d }\)-approximation of \(X_{\mathcal{I}_T^d }\) with \(k_{\mathcal{I}_T^d } = \text{rank}(\mathbb{E}[X_{\mathcal{I}_T^d}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}_T^d }\).
We use an essentially identical argument to that established in Non-Donor Set Baseline Consistency.
Lemma 10. We have that \[\left\langle \beta^{n, \mathcal{I}_T^d}, \gamma_{\mathcal{I}_T^d, T, T}(d) \right\rangle = \left\langle \tilde{\beta}^{n, \mathcal{I}_T^d}, \gamma_{\mathcal{I}_T^d, T, T}(d) \right\rangle\] with \(\tilde{\beta}^{n, \mathcal{I}_T^d} = VV^{\top}{\beta}^{n, \mathcal{I}_T^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}_T^d}]\).
Proof. It would suffice to prove that \[VV^{\top}\gamma_{\mathcal{I}_T^d, T, T}(d) = \gamma_{\mathcal{I}_T^d, T, T}(d) ,\] which is equivalent to \(\gamma_{\mathcal{I}_T^d, T, T}(d) ^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}_T^d}]\). To that end, recall for any \(j \in \mathcal{I}_T^d\) \[\begin{align} \gamma_{j, T, T}(d) &= \mathbb{E}[Y_{j,T}^{(0_1, \dots, 0_{T-1},d)}|\mathcal{LF}, j \in \mathcal{I}_T^d]- \mathbb{E}[Y_{j,T}^{(\bar{0}^T)}|\mathcal{LF}, j \in \mathcal{I}_T^d]\\ &= \mathbb{E}[Y_{j,T}^{(0_1, \dots, 0_{T-1},d)}|\mathcal{LF}, j \in \mathcal{I}_T^d]- \mathbb{E}[\langle v_{j,T},w_{(\bar{0}^T)}\rangle + \varepsilon_{j,T}^{(\bar{0}^T)}|\mathcal{LF}, j \in \mathcal{I}_T^d]\\ &= \mathbb{E}[Y_{j,T}^{(0_1, \dots, 0_{T-1},d)}|\mathcal{LF}, j \in \mathcal{I}_T^d] - \langle v_{j,T},w_{(\bar{0}^T)}\rangle | \mathcal{LF}, j \in \mathcal{I}_T^d\\ &= \mathbb{E}[Y_{j,T}^{(0_1, \dots, 0_{T-1},d)}|\mathcal{LF}, j \in \mathcal{I}_T^d] - \langle v_{j,T},w_{(\bar{0}^T)}\rangle | \mathcal{LF},\{\rho_i\}_{i \in [p]}, j \in \mathcal{I}_T^d\\ &= \mathbb{E}[Y_{j,T}^{(0_1, \dots, 0_{T-1},d)}|\mathcal{LF}, j \in \mathcal{I}_T^d] - \left\langle v_{j,T},\sum_{i = 1}^p \alpha_i^{(0,T)} \cdot \rho_i\right\rangle | \mathcal{LF},\{\rho_i\}_{i \in [p]}, j \in \mathcal{I}_T^d\\ &= \mathbb{E}[Y_{j,T}^{(0_1, \dots, 0_{T-1},d)}|\mathcal{LF}, j \in \mathcal{I}_T^d] - \sum_{i = 1}^p \alpha_i^{(0,T)} \cdot \langle v_{j,T},\rho_i\rangle | \mathcal{LF},\{\rho_i\}_{i \in [p]}, j \in \mathcal{I}_T^d\\ &= \mathbb{E}[Y_{j,T}^{(0_1, \dots, 0_{T-1},d)}|\mathcal{LF}, j \in \mathcal{I}_T^d] - \sum_{i = 1}^p \alpha_i^{(0,T)} \cdot \mathbb{E}[\langle v_{j,T},\rho_i\rangle + \varepsilon_{ji}| \mathcal{LF},\{\rho_i\}_{i \in [p]}, j \in \mathcal{I}_T^d]\\ &=\mathbb{E}[Y_{j,T}^{(0_1, \dots, 0_{T-1},d)}|\mathcal{LF}, j \in \mathcal{I}_T^d] - \sum_{i = 1}^p \alpha_i^{(0,T)} \cdot \mathbb{E}[(X_{\mathcal{I}^d_T})_{ij}| \mathcal{LF},\{\rho_i\}_{i \in [p]}, j \in \mathcal{I}_T^d]\\ &=\sum_{i = 1}^p \xi_i^{(d,T)}\cdot \mathbb{E}[(X_{\mathcal{I}^d_T})_{ij}|\mathcal{LF}, j \in \mathcal{I}_T^d] - \sum_{i = 1}^p \alpha_i^{(0,T)} \cdot \mathbb{E}[(X_{\mathcal{I}^d_T})_{ij}| \mathcal{LF},\{\rho_i\}_{i \in [p]}, j \in \mathcal{I}_T^d]\\ &= \sum_{i = 1}^p (\xi_i^{(d,T)} - \alpha_i^{(0,T)})\cdot\mathbb{E}[(X_{\mathcal{I}^d_T})_{ij}|\mathcal{LF}, \{\rho_i\}_{i \in [p]},j \in \mathcal{I}_T^d]. \end{align}\] We use Lemma 2 in the fifth equality and Assumption 12 in the second to last equality. The remainder of the steps follows from relevant definitions and standard manipulations. This completes the proof. ◻
Lemma 10 allows us to write \[\begin{align} \hat{\gamma}_{n,T,T}(d) - \gamma_{n,T, T}(d) \mid \mathcal{LF}&= \left\langle \hat{\beta}^{n, \mathcal{I}_T^d}, \hat{\gamma}_{\mathcal{I}_T^d, T, T}(d) \right\rangle - \left\langle \beta^{n, \mathcal{I}_T^d}, \gamma_{\mathcal{I}_T^d, T, T}(d) \right\rangle\\ &= \underbrace{\langle \tilde{\beta}^{n,\mathcal{I}_T^d}, \eta_{\mathcal{I}_T^d}(d)\rangle}_{\text{Term 1a}} + \underbrace{\langle \Delta_{n, \mathcal{I}_T^d} , \eta_{\mathcal{I}_T^d}(d)\rangle}_{\text{Term 1b}} + \underbrace{\langle \Delta_{n, \mathcal{I}_T^d}, \gamma_{\mathcal{I}_T^d, T,T}(d)\rangle}_{\text{Term 1c}}, \end{align}\] where \(\eta_{\mathcal{I}_T^d}(d) = \hat{\gamma}_{\mathcal{I}_T^d, T, T}(d) - \gamma_{\mathcal{I}_T^d, T, T}(d)\) and \(\Delta_{n, \mathcal{I}_T^d} = \hat{\beta}^{n,\mathcal{I}_T^d} - \tilde{\beta}^{n,\mathcal{I}_T^d}\). Using the previously referenced argument and applying the appropriate version of Lemmas 6, 7, and 8 allows us to claim for \(n \notin \mathcal{I}_T^d\) \[\label{eq:non-donor-terminal-rate} \hat{\gamma}_{n,T,T}(d) - \gamma_{n,T,T}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{7/4}}{p^{1/4}} +k^{3}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\tag{79}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_T^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_T^d|\}\).
Terminal Blip Consistency: The above two sections allow us to conclude that for any \(n \in [N]\) \[\label{eq:terminal-rate} \hat{\gamma}_{n,T,T}(d) - \gamma_{n,T,T}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{7/4}}{p^{1/4}} +k^{3}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\tag{80}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_T^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_T^d|\}\).
3. Verifying Non-Terminal Blip Consistency:
For any unit \(n \in [N]\), treatment \(d \in [A]\), and \(t \in [1, \dots, T-1]\), consider the statement \(P_{d,n}(t)\): \[\begin{align} &\hat{\gamma}_{n, T, t}(d) - \gamma_{n, T, t}(d) \mid \mathcal{LF}\\ &= O_p\left((T-t)\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{(T-t)}}{p^{1/4}} + k^{(T-t)}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right), \end{align}\] where \(\mathcal{F}= \{|\mathcal{I}_T^0|, |\mathcal{I}_t^d| ,(|\mathcal{I}_{q}^{D_{n,q}}|)_{n\in [N],q \in [t+1, \dots, T]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{F}, \alpha_{\mathcal{I}} = \min\mathcal{F}\).
We proceed by strong induction.
To that end, consider the base case \(t = T-1\), i.e., proving \(P_{d,n}(T-1)\):
For any \(d \in [A]\):
Donor Set Consistency: Consider unit \(n \in \mathcal{I}_{T-1}^d\). Denote \(X_{\mathcal{I}_{T-1}^d \setminus n} = X_{:,\mathcal{I}^d_{T-1}\setminus n} \in \mathbb{R}^{p \times|\mathcal{I}_{T-1}^d \setminus n|}\). We know the blip admits the representation \[\begin{align} &\hat{\gamma}_{n,T,T-1}(d) - \gamma_{n,T,T-1}(d) \mid \mathcal{LF}= \underbrace{\left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d}, Y_{\mathcal{I}_{T-1}^d \setminus n} \right\rangle - \left\langle \phi^{n, \mathcal{I}_{T-1}^d}, \mathbb{E}[Y_{\mathcal{I}_{T-1}^d \setminus n} \mid \mathcal{LF}] \right\rangle}_{\text{Term 1}}\\ &+ \underbrace{\left\langle \phi^{n, \mathcal{I}_{T-1}^d},b_{\mathcal{I}^d_{T-1} \setminus n}\mid \mathcal{LF}\right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d},\hat{b}_{\mathcal{I}^d_{T-1}\setminus n} \right \rangle}_{\text{Term 2}}\\ &+ \underbrace{\left\langle \phi^{n, \mathcal{I}_{T-1}^d}, \gamma_{\mathcal{I}^d_{T-1}\setminus n,T,T}(D_{\mathcal{I}^d_{T-1}\setminus n,T})\mid \mathcal{LF}\right \rangle - \left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d}, \hat{\gamma}_{\mathcal{I}^d_{T-1}\setminus n,T,T}(D_{\mathcal{I}^d_{T-1}\setminus n,T}) \right \rangle}_{\text{Term 3}}. \end{align}\] where \(\gamma_{\mathcal{I}^d_{T-1}\setminus n,T,T}(D_{\mathcal{I}^d_{T-1}\setminus n,T}) = [(\gamma_{j, T, T}(D_{j, T}))_{j \in \mathcal{I}^d_{T-1}\setminus n}]^{\top}\) and \(\hat{\phi}^{n, \mathcal{I}_{T-1}^d}\) are the regression coefficients from regressing the additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}_{T-1}^d \setminus n}\)-approximation of \(X_{\mathcal{I}_{T-1}^d \setminus n}\) with \(k_{\mathcal{I}_{T-1}^d \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}_{T-1}^d \setminus n}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}_{T-1}^d \setminus n}\).
Bounding Term 1: We prove a similar row space result.
Lemma 11. We have for any \(t \in [T-1]\) \[\left\langle \phi^{n, \mathcal{I}_t^d}, \mathbb{E}[Y_{\mathcal{I}_t^d \setminus n} ]\right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}_t^d}, \mathbb{E}[Y_{\mathcal{I}_t^d \setminus n} ] \right\rangle\] with \(\tilde{\phi}^{n, \mathcal{I}_t^d} = VV^{\top}{\phi}^{n, \mathcal{I}_t^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}_t^d\setminus n}]\).
Proof. It would suffice to prove that \[VV^{\top}\mathbb{E}[Y_{\mathcal{I}_t^d \setminus n} ] = \mathbb{E}[Y_{\mathcal{I}_t^d \setminus n} ],\] which is equivalent to \(\mathbb{E}[Y_{\mathcal{I}_t^d \setminus n}]^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}_t^d \setminus n}]\). By Assumption 12 there exists \(\xi^{(d,t)}\) such that for any \(j \in \mathcal{I}^d_t \setminus n\) \[\mathbb{E}[Y_{j,T}|\mathcal{LF}, j \in \mathcal{I}^d_t \setminus n] = \sum_{i = 1}^p \xi^{(d,t)}_i \cdot \mathbb{E}[(X_{\mathcal{I}^d_t\setminus n})_{ij}|\mathcal{LF}, j \in \mathcal{I}^d_t \setminus n].\] This concludes the proof. ◻
Using Lemma 11 for \(t = T-1\), we use the proof technique in [10] Theorem \(2\) (Appendix C) to show consistency of \[\begin{align} \label{eq:donor-non-terminal-consistency-term-1} \text{Term 1} &= \left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d}, Y_{\mathcal{I}_{T-1}^d \setminus n} \right\rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}_{T-1}^d}, \mathbb{E}[Y_{\mathcal{I}_{T-1}^d \setminus n}] \right\rangle\\ &= O_p \left( \sqrt{\log(p |\mathcal{I}_{T-1}^d|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}_{T-1}^d|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}_{T-1}^d|-1}} \right\} \right] \right).\nonumber \end{align}\tag{81}\]
Bounding Term 2:
Lemma 12. We have for any \(t \in [T-1]\) \[\left\langle \phi^{n, \mathcal{I}_{t}^d},b_{\mathcal{I}^d_{t}\setminus n} \right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}_{t}^d},b_{\mathcal{I}^d_{t}\setminus n}\right\rangle\] with \(\tilde{\phi}^{n, \mathcal{I}_t^d} = VV^{\top}{\phi}^{n, \mathcal{I}_t^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}_t^d\setminus n}]\).
Proof. It would suffice to prove that \[VV^{\top}b_{\mathcal{I}^d_{t}\setminus n}= b_{\mathcal{I}^d_{t}\setminus n},\] which is equivalent to \((b_{\mathcal{I}^d_{t}\setminus n})^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}_t^d \setminus n}]\). Applying Lemma 2, we know for any \(j \in \mathcal{I}^d_t \setminus n\) \[b_{j,T} = \langle v_{j,T}, w_{(\mkern 1.3mu\overline{\mkern-1.3mu0\mkern-1.3mu}\mkern 1.3mu^T)} \rangle = \sum_{i = 1}^p \alpha_i^{(0_T, T)}\cdot \left\langle v_{j,T},\rho_i \right\rangle = \sum_{i = 1}^p \alpha_i^{(0_T, T)} \cdot \mathbb{E}[(X_{\mathcal{I}^d_t\setminus n})_{ij}|\mathcal{LF}, j \in \mathcal{I}^d_t \setminus n].\] This concludes the proof. ◻
Using Lemma 12 for \(t = T-1\), we can write \[\left\langle \phi^{n, \mathcal{I}_{T-1}^d},b_{\mathcal{I}^d_{T-1}\setminus n} \right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d},\hat{b}_{\mathcal{I}^d_{T-1}\setminus n} \right \rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}_{T-1}^d},b_{\mathcal{I}^d_{T-1}\setminus n}\right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d},\hat{b}_{\mathcal{I}^d_{T-1}\setminus n} \right \rangle\] Next we negate the RHS and decompose as follows: \[\begin{align} \left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d},\hat{b}_{\mathcal{I}^d_{T-1}\setminus n} \right \rangle&-\left\langle \tilde{\phi}^{n, \mathcal{I}_{T-1}^d},b_{\mathcal{I}^d_{T-1}\setminus n}\right\rangle \\ &= \underbrace{\left\langle \tilde{\phi}^{n, \mathcal{I}_{T-1}^d}, \eta_{\mathcal{I}^d_{T-1} \setminus n}\right\rangle}_{\text{Term 2a}} + \underbrace{\left\langle \Delta_{n, \mathcal{I}^d_{T-1}\setminus n}, \eta_{\mathcal{I}^d_{T-1}\setminus n}\right\rangle}_{\text{Term 2b}} + \underbrace{\left\langle \Delta_{n, \mathcal{I}^d_{T-1}\setminus n} , b_{\mathcal{I}^d_{T-1}\setminus n}\right\rangle}_{\text{Term 2c}}, \end{align}\] where \(\eta_{\mathcal{I}^d_{T-1}\setminus n} = \hat{b}_{\mathcal{I}^d_{T-1}\setminus n} - b_{\mathcal{I}^d_{T-1}\setminus n}\) and \(\Delta_{n, \mathcal{I}^d_{T-1}\setminus n} = \hat{\phi}^{n, \mathcal{I}_{T-1}^d} - \tilde{\phi}^{n, \mathcal{I}_{T-1}^d}\). Using the previously referenced argument and applying the appropriate version of Lemmas 6, 7, and 8 alongside Equation 76 for Terms 2a, 2b, and 2c, respectively, allows us to claim \[\begin{align} \label{eq:donor-non-terminal-consistency-term-2} \text{Term 2} &= \left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d},\hat{b}_{\mathcal{I}^d_{T-1}\setminus n} \right \rangle-\left\langle \tilde{\phi}^{n, \mathcal{I}_{T-1}^d},b_{\mathcal{I}^d_{T-1}\setminus n}\right\rangle\\ &= O_p \left( \sqrt{\log(p \pi_{\mathcal{I}})} \left[ \frac{{k}^{7/4}}{p^{1/4}} + {k}^3 \max \left\{ \frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}} \right\} \right] \right),\nonumber \end{align}\tag{82}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_{T-1}^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_{T-1}^d|\}\).
Bounding Term 3:
Lemma 13. We have for any \(t \in [T-1]\) and \(\ell > t\) \[\left\langle \phi^{n, \mathcal{I}_{t}^d}, \gamma_{\mathcal{I}^d_{t}\setminus n,T,\ell}(D_{\mathcal{I}^d_{t}\setminus n,\ell})\right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}_{t}^d}, \gamma_{\mathcal{I}^d_{t}\setminus n,T,\ell}(D_{\mathcal{I}^d_{t}\setminus n,\ell})\right\rangle\] with \(\tilde{\phi}^{n, \mathcal{I}_t^d} = VV^{\top}{\phi}^{n, \mathcal{I}_t^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}_t^d\setminus n}]\).
Proof. It would suffice to prove that \[VV^{\top} \gamma_{\mathcal{I}^d_{t}\setminus n,T,\ell}(D_{\mathcal{I}^d_{t}\setminus n,\ell})= \gamma_{\mathcal{I}^d_{t}\setminus n,T,\ell}(D_{\mathcal{I}^d_{t}\setminus n,\ell}),\] which is equivalent to \((\gamma_{\mathcal{I}^d_{t}\setminus n,T,\ell}(D_{\mathcal{I}^d_{t}\setminus n,\ell}))^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}_t^d \setminus n}]\). Assumption 12 and Lemma 2 give the existence of \(\xi^{(d,t)}\) and \(\xi^{(d,t)'}\) such that for any \(j \in \mathcal{I}^d_t \setminus n\) \[\begin{align} \gamma_{j,T,\ell}(D_{j,\ell}) &=\langle \psi_{j}^{T,\ell}, w_{D_{j,\ell}} - w_{0_{\ell}}\rangle \pm \sum_{q \neq \ell} \langle \psi_{j}^{T,q},w_{D_{j,q}} \rangle\\ &=\mathbb{E}[Y_{j,T}] - \mathbb{E}[Y_{j,T}^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, -\ell} \cup 0_{\ell}\mkern-1.3mu}\mkern 1.3mu)}]\\ &= \sum_{i = 1}^p (\xi^{(d,t)}_i - \xi^{(d,t)'}_i) \cdot \mathbb{E}[(X_{\mathcal{I}^d_t\setminus n})_{ij}|\mathcal{LF}, j \in \mathcal{I}^d_t \setminus n]. \end{align}\] This concludes the proof. ◻
Using Lemma 13 for \(t = T-1\) and \(\ell = T\), we can write \[\begin{align} &\left\langle \phi^{n, \mathcal{I}_{T-1}^d},\gamma_{\mathcal{I}^d_{T-1}\setminus n,T, T}(D_{\mathcal{I}^d_{T-1}\setminus n, T}) \right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d},\hat{\gamma}_{\mathcal{I}^d_{T-1}\setminus n,T, T}(D_{\mathcal{I}^d_{T-1}\setminus n, T}) \right \rangle\\ &= \left\langle \tilde{\phi}^{n, \mathcal{I}_{T-1}^d},\gamma_{\mathcal{I}^d_{T-1}\setminus n,T, T}(D_{\mathcal{I}^d_{T-1}\setminus n, T})\right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}_{T-1}^d},\hat{\gamma}_{\mathcal{I}^d_{T-1}\setminus n,T, T}(D_{\mathcal{I}^d_{T-1}\setminus n, T}) \right \rangle \end{align}\] At this point, we can follow the earlier approach for Term \(2\) by negating, using the same decomposition, and applying the appropriate version of Lemmas 6, 7, and 8 alongside Equation 80 to write
\[\begin{align} \text{Term }3 = O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{9/4}}{p^{1/4}} +k^{7/2}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right) \end{align}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_{T-1}^d|, (|\mathcal{I}_T^{D_{n,T}}|)_{n \in [N]}\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_{T-1}^d|, (|\mathcal{I}_T^{D_{n,T}}|)_{n \in [N]}\}\). Notice that this dominates the rates for Terms \(1\) and \(2\), and as such we also have, for any \(n \in \mathcal{I}_{T-1}^d\), \[\begin{align} \hat{\gamma}_{n,T,T-1}(d) &- \gamma_{n,T,T-1}(d) \mid \mathcal{LF}\label{eq:donor-base-blip-rate}\\ &= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{9/4}}{p^{1/4}} +k^{7/2}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\nonumber \end{align}\tag{83}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_{T-1}^d|, (|\mathcal{I}_T^{D_{n,T}}|)_{n \in [N]}\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_{T-1}^d|, (|\mathcal{I}_T^{D_{n,T}}|)_{n \in [N]}\}\).
Non-Donor Set Consistency: Consider any \(t \in [T-1]\) and unit \(n \notin \mathcal{I}_t^d\). Denote \(X_{\mathcal{I}_t^d} = X_{:,\mathcal{I}^d_t} \in \mathbb{R}^{p \times |\mathcal{I}_t^d|}\). We know the blip admits the representation \[\hat{\gamma}_{n,T,t}(d) - \gamma_{n,T,t}(d) \mid \mathcal{LF}= \left\langle \hat{\beta}^{n, \mathcal{I}_t^d}, \hat{\gamma}_{\mathcal{I}_t^d, T,t}(d) \right\rangle - \left\langle \beta^{n, \mathcal{I}_t^d}, \gamma_{\mathcal{I}_t^d, T,t}(d) \right\rangle,\] where \(\hat{\beta}^{n, \mathcal{I}_t^d}\) are the regression coefficients from regressing the additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}_t^d }\)-approximation of \(X_{\mathcal{I}_t^d }\) with \(k_{\mathcal{I}_t^d } = \text{rank}(\mathbb{E}[X_{\mathcal{I}_t^d}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}_t^d }\).
We use an identical argument to that established in Non-Donor Set Baseline Consistency.
Lemma 14. We have that \[\left\langle \beta^{n, \mathcal{I}_t^d}, \gamma_{\mathcal{I}_t^d, T,t}(d) \right\rangle = \left\langle \tilde{\beta}^{n, \mathcal{I}_t^d}, \gamma_{\mathcal{I}_t^d, T,t}(d) \right\rangle\] with \(\tilde{\beta}^{n, \mathcal{I}_t^d} = VV^{\top}{\beta}^{n, \mathcal{I}_t^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}_t^d}]\).
Proof. It would suffice to prove that \[VV^{\top}\gamma_{\mathcal{I}_t^d, T,t}(d) = \gamma_{\mathcal{I}_t^d, T,t}(d) ,\] which is equivalent to \(\gamma_{\mathcal{I}_t^d, T,t}(d) ^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}_t^d}]\). To that end, recall for any \(j \in \mathcal{I}_t^d\), \[\begin{align} \gamma_{j, T,t}(d) &= \langle \psi_j^{T,t}, w_d - w_{0_t}\rangle\mid \mathcal{LF}\\ &= \langle \psi_j^{T,t}, w_d - w_{0_t}\rangle \pm \sum_{\ell \neq t} \langle \psi_{j}^{T,\ell}, w_{0_{\ell}}\rangle\mid \mathcal{LF}\\ &=\mathbb{E}[\langle v_{j,T},w_{(\bar{0}^{t-1}, d, \underline{0}^{t+1})}\rangle + \varepsilon_{j,T}^{(\bar{0}^{t-1}, d, \underline{0}^{t+1})}|\mathcal{LF}, j \in \mathcal{I}_t^d]- \mathbb{E}[\langle v_{j,T},w_{(\bar{0}^{T})}\rangle + \varepsilon_{j,T}^{(\bar{0}^{T})}|\mathcal{LF}, j \in \mathcal{I}_t^d]\\ &= \langle v_{j,T},w_{(\bar{0}^{t-1}, d, \underline{0}^{t+1})}\rangle - \langle v_{j,T},w_{(\bar{0}^{T})}\rangle | \mathcal{LF}, j \in \mathcal{I}_t^d\\ &= \langle v_{j,T},w_{(\bar{0}^{t-1}, d, \underline{0}^{t+1})} -w_{(\bar{0}^{T})}\rangle | \mathcal{LF}, \{\rho_i\}_{i \in [p]}, j \in \mathcal{I}_t^d\\ &= \left\langle v_{j,T},\sum_{i =1}^p \alpha_i^{(d,t)} \rho_i- \sum_{i = 1}^p \alpha_i^{(0,t)} \rho_i\right\rangle | \mathcal{LF}, \{\rho_i\}_{i \in [p]}, j \in \mathcal{I}_t^d\\ &= \sum_{i = 1}^p (\alpha_i^{(d,t)} - \alpha_i^{(0,t)})\cdot\mathbb{E}[\langle v_{j,T}, \rho_i \rangle + \varepsilon_{ji}|\mathcal{LF},\{\rho_i\}_{i \in [p]}, j \in \mathcal{I}_t^d]\\ &= \sum_{i = 1}^p (\alpha_i^{(d,t)} - \alpha_i^{(0,t)})\cdot\mathbb{E}[(X_{\mathcal{I}^d_t})_{ij}|\mathcal{LF}, \{\rho_i\}_{i \in [p]},j \in \mathcal{I}_t^d]. \end{align}\] The sixth equality is due to Assumption 12 being applied to each term. The remainder of the steps follows from relevant definitions and standard manipulations. This completes the proof. ◻
Using the above framework and Lemma 14 with \(t = T-1\) allows us to write \[\begin{align} \hat{\gamma}_{n,T,T-1}(d) - \gamma_{n,T,T-1}(d) \mid \mathcal{LF}&= \left\langle \hat{\beta}^{n, \mathcal{I}_{T-1}^d}, \hat{\gamma}_{\mathcal{I}_{T-1}^d, T,T-1}(d) \right\rangle - \left\langle \tilde{\beta}^{n, \mathcal{I}_{T-1}^d}, \gamma_{\mathcal{I}_{T-1}^d, T,T-1}(d) \right\rangle\\ &= \underbrace{\langle \tilde{\beta}^{n,\mathcal{I}_{T-1}^d}, \eta_{\mathcal{I}_{T-1}^d}(d)\rangle}_{\text{Term 1a}} + \underbrace{\langle \Delta_{n, \mathcal{I}_{T-1}^d} , \eta_{\mathcal{I}_{T-1}^d}(d)\rangle}_{\text{Term 1b}} + \underbrace{\langle \Delta_{n, \mathcal{I}_{T-1}^d}, \gamma_{\mathcal{I}_{T-1}^d, T,T-1}(d)\rangle}_{\text{Term 1c}}, \end{align}\] where \(\eta_{\mathcal{I}_{T-1}^d}(d) = \hat{\gamma}_{\mathcal{I}_{T-1}^d, T,T-1}(d) - \gamma_{\mathcal{I}_{T-1}^d, T,T-1}(d)\) and \(\Delta_{n, \mathcal{I}_{T-1}^d} = \hat{\beta}^{n,\mathcal{I}_{T-1}^d} - \tilde{\beta}^{n,\mathcal{I}_{T-1}^d}\). Using the previously referenced argument and applying the appropriate version of Lemmas 6, 7, and 8 allows us to claim for \(n \notin \mathcal{I}_{T-1}^d\) \[\label{eq:non-donor-second-to-terminal-rate} \hat{\gamma}_{n,T,T-1}(d) - \gamma_{n,T,T-1}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{11/4}}{p^{1/4}} +k^{4}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\tag{84}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_{T-1}^d|, (|\mathcal{I}_T^{D_{n,T}}|)_{n \in [N]}\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_{T-1}^d|, (|\mathcal{I}_T^{D_{n,T}}|)_{n \in [N]}\}\). Combining Equations 83 and 84 yields the base case.
Inductive Step: We assume \(P_{d,n}(\ell)\) for \(\ell \in [t+1, \dots, T-1]\) and prove \(P_{d,n}(t)\).
For any \(d \in [A]\):
Donor Set Consistency: Consider unit \(n \in \mathcal{I}_t^d\). Denote \(X_{\mathcal{I}_t^d \setminus n} = X_{:,\mathcal{I}^d_t \setminus n} \in \mathbb{R}^{p \times |\mathcal{I}_t^d \setminus n|}\). We know the blip effect admits the representation \[\begin{align} \hat{\gamma}_{n,T,t}(d) - \gamma_{n,T,t}(d) \mid \mathcal{LF}&= \underbrace{\left\langle \hat{\phi}^{n, \mathcal{I}_t^d}, Y_{\mathcal{I}_t^d \setminus n} \right\rangle - \left\langle \phi^{n, \mathcal{I}_t^d}, \mathbb{E}[Y_{\mathcal{I}_t^d \setminus n} \mid \mathcal{LF}] \right\rangle}_{\text{Term 1}}\\ &+ \underbrace{ \left\langle \phi^{n, \mathcal{I}^d_t}, b_{\mathcal{I}^d_t \setminus n}\right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d_t} , \hat{b}_{\mathcal{I}^d_t \setminus n}\right\rangle}_{\text{Term 2}}\\ &+ \underbrace{\left\langle \phi^{n, \mathcal{I}^d_t}, \gamma_{\mathcal{I}^d_t \setminus n, T, T}(D_{\mathcal{I}^d_t \setminus n, T}) \right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d_t}, \hat{\gamma}_{\mathcal{I}^d_t \setminus n, T, T}(D_{\mathcal{I}^d_t \setminus n, T}) \right\rangle}_{\text{Term 3}}\\ &+ \sum_{\ell = t+1}^{T-1}\left( \underbrace{ \left\langle \phi^{n, \mathcal{I}^d_t}, \gamma_{\mathcal{I}^d_t \setminus n, T, \ell}(D_{\mathcal{I}^d_t \setminus n,\ell}) \right\rangle -\left\langle \hat{\phi}^{n, \mathcal{I}^d_t}, \hat{\gamma}_{\mathcal{I}^d_t \setminus n, T, \ell}(D_{\mathcal{I}^d_t \setminus n,\ell}) \right\rangle}_{\text{Term } \ell}\right), \end{align}\] where \(\hat{\phi}^{n, \mathcal{I}_t^d}\) are the regression coefficients from regressing additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}_t^d \setminus n}\)-approximation of \(X_{\mathcal{I}_t^d \setminus n}\) with \(k_{\mathcal{I}_t^d \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}_t^d \setminus n}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}_t^d \setminus n}\).
Bounding Term 1: We simply use Lemma 11, which holds for any \(t \in [T-1]\), to leverage the proof technique in [10] Theorem \(2\) (Appendix C) and show consistency of \[\begin{align} \label{eq:donor-non-terminal-general-consistency-term1} \text{Term 1} &= \left\langle \hat{\phi}^{n, \mathcal{I}_{t}^d}, Y_{\mathcal{I}_{t}^d \setminus n} \right\rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}_{t}^d}, \mathbb{E}[Y_{\mathcal{I}_{t}^d \setminus n}] \right\rangle\\ &= O_p \left( \sqrt{\log(p |\mathcal{I}_{t}^d|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}_{t}^d|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}_{t}^d|-1}} \right\} \right] \right).\nonumber \end{align}\tag{85}\]
Bounding Term 2: Using the previously referenced argument for Term \(2\) in the base case, applying the appropriate version of Lemmas 6, 7, and 8 alongside Equation 76 and Lemma 12, we know \[\begin{align} \label{eq:donor-baseline-inductive32step} \text{Term 2} &= \left\langle \hat{\phi}^{n, \mathcal{I}_{t}^d},\hat{b}_{\mathcal{I}^d_{t}\setminus n} \right \rangle-\left\langle \tilde{\phi}^{n, \mathcal{I}_{t}^d},b_{\mathcal{I}^d_{t}\setminus n}\right\rangle\\ &= O_p \left( \sqrt{\log(p \pi_{\mathcal{I}})} \left[ \frac{{k}^{7/4}}{p^{1/4}} + {k}^3 \max \left\{ \frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}} \right\} \right] \right),\nonumber \end{align}\tag{86}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_{t}^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_{t}^d|\}\).
Bounding Term 3: Using the previously referenced argument for Term \(3\) in the base case, we apply the appropriate version of Lemmas 6, 7, and 8 alongside Equation 80 for any \(d \in \{D_{n,T}\}_{n \in [N]}\) and Lemma 13 with \(\ell = T\) to write
\[\begin{align} \label{eq:donor-inductive-term3} \text{Term }3 &= \left\langle \phi^{n, \mathcal{I}^d_t}, \gamma_{\mathcal{I}^d_t \setminus n, T, T}(D_{\mathcal{I}^d_t \setminus n, T}) \right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d_t}, \hat{\gamma}_{\mathcal{I}^d_t \setminus n, T, T}(D_{\mathcal{I}^d_t \setminus n, T}) \right\rangle\\ &=O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{9/4}}{p^{1/4}} +k^{7/2}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\nonumber \end{align}\tag{87}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}_{t}^d|, (|\mathcal{I}_T^{D_{n,T}}|)_{n \in [N]}\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_T^0|,|\mathcal{I}_{t}^d|, (|\mathcal{I}_T^{D_{n,T}}|)_{n \in [N]}\}\).
Bounding Term \(\ell\) for \(\ell \in [t+1, \dots, T-1]\): For any such \(\ell\), we use an argument similar to Term \(3\) in the base case, applying the appropriate version of Lemmas 6, 7, and 8 alongside the inductive hypothesis \(P_{d, n}(\ell)\) for all \(d \in \{D_{n, \ell}\}_{n \in [N]}\) and Lemma 13 to write
\[\begin{align} \label{eq:donor-terml} &\text{Term }\ell = \left\langle \phi^{n, \mathcal{I}^d_t}, \gamma_{\mathcal{I}^d_t \setminus n, T, \ell}(D_{\mathcal{I}^d_t \setminus n,\ell}) \right\rangle -\left\langle \hat{\phi}^{n, \mathcal{I}^d_t}, \hat{\gamma}_{\mathcal{I}^d_t \setminus n, T, \ell}(D_{\mathcal{I}^d_t \setminus n,\ell}) \right\rangle\\ &= O_p\left((T-\ell)\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{(T-\ell)}}{p^{1/4}} + k^{(T-\ell)}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\nonumber \end{align}\tag{88}\] where \(\mathcal{F}= \{|\mathcal{I}_T^0|, |\mathcal{I}_{t}^d| ,(|\mathcal{I}_{q}^{D_{n,q}}|)_{n\in [N],q \in [\ell, \dots, T]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{F}, \alpha_{\mathcal{I}} = \min\mathcal{F}\).
Note that Terms \(1\)–\(3\) are dominated by the summation; as such, it suffices to analyze the latter. To that end, \[\sum_{\ell = t+1}^{T-1}\text{Term }\ell = O_p\left(\sum_{\ell = t+1}^{T-1}(T-\ell)\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{(T-\ell)}}{p^{1/4}} + k^{(T-\ell)}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\]
where \(\mathcal{F}= \{|\mathcal{I}_T^0|, |\mathcal{I}_{t}^d| ,(|\mathcal{I}_{q}^{D_{n,q}}|)_{n\in [N],q \in [t+1, \dots, T]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{F}, \alpha_{\mathcal{I}} = \min\mathcal{F}\). Notice we bounded the smaller donor set cardinalities by the largest one, i.e., the one at \(\ell = t+1\). We analyze the time-dependent terms and denote \[C := \sqrt{\log(p\pi_{\mathcal{I}})}, \quad C' := \max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}.\] Upon substitution and reindexing we have \[C \sum_{m = 1}^{T - t - 1} m \left( \frac{k^m}{p^{1/4}} + C'k^m \right) = C \left( \frac{1}{p^{1/4}} + C' \right) \sum_{m = 1}^{T - t - 1} m k^m.\]
We apply the geometric sum derivative trick for \(k > 1\): \[\sum_{m=1}^{M} m k^m = \frac{k(1 - (M+1)k^M + Mk^{M+1})}{(1 - k)^2} = \Theta(Mk^{M+1}).\] Taking \(M = T - t - 1\), we conclude \[\sum_{\ell = t+1}^{T-1}\text{Term }\ell = O_p\left((T - t)\sqrt{\log(p\pi_{\mathcal{I}})}\left( \frac{k^{(T - t)}}{p^{1/4}} + k^{(T - t)}\max\left\{ \frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}} - 1}}, \frac{1}{\sqrt{p}} \right\} \right) \right).\]
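As a quick numerical sanity check on the closed form above (illustrative only, not part of the proof; all names are ours), the identity and its \(\Theta(Mk^{M+1})\) growth can be verified directly:

```python
# Check sum_{m=1}^M m k^m = k(1 - (M+1)k^M + M k^{M+1}) / (1-k)^2 for k > 1.
def weighted_geom_sum(k: float, M: int) -> float:
    return sum(m * k**m for m in range(1, M + 1))

def closed_form(k: float, M: int) -> float:
    return k * (1 - (M + 1) * k**M + M * k**(M + 1)) / (1 - k) ** 2

for k in [2.0, 3.0, 5.0]:
    for M in [1, 4, 10]:
        lhs, rhs = weighted_geom_sum(k, M), closed_form(k, M)
        assert abs(lhs - rhs) <= 1e-9 * abs(rhs)
        # The ratio below stays bounded: Theta(M k^{M+1}) growth.
        print(k, M, lhs / (M * k ** (M + 1)))
```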
Combining every term yields for any \(n \in \mathcal{I}^d_t\) \[\begin{align} \label{eq:inductive-step-donor-blip-rate} &\hat{\gamma}_{n,T,t}(d) - \gamma_{n,T,t}(d) \mid \mathcal{LF}\\ &= O_p\left((T-t)\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{(T-t)}}{p^{1/4}} + k^{(T-t)}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\nonumber \end{align}\tag{89}\] where \(\mathcal{F}= \{|\mathcal{I}_T^0|, |\mathcal{I}_t^d| ,(|\mathcal{I}_{q}^{D_{n,q}}|)_{n\in[N], q \in [t+1, \dots, T]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{F}, \alpha_{\mathcal{I}} = \min\mathcal{F}\).
Non-Donor Set Consistency: Applying the Non-Donor Set Consistency argument written for the Base Case for general \(t\), specifically Lemma 14 for any \(t \in [T-1]\), proves \(P_{d,n}(t)\).
4. Verifying Target Causal Parameter Consistency: For any unit \(n \in [N]\) and \(\bar{d}^T \in [A]^T\) we recall the SBE-PCR estimator and the corresponding causal estimand: \[\hat{\mathbb{E}}\left[Y_{n,T}^{(\bar{d}^T)}\right] = \sum_{t = 1}^T \hat{\gamma}_{n,T,t}(d_t) + \hat{b}_{n,T} \quad \text{and} \quad \mathbb{E}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] = \sum_{t=1}^T \gamma_{n,T,t}(d_t) + b_{n,T} \mid \mathcal{LF}.\] The difference is exactly \[\hat{\mathbb{E}}\left[Y_{n,T}^{(\bar{d}^T)}\right] - \mathbb{E}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] = \left(\hat{b}_{n,T} - b_{n,T} \mid \mathcal{LF}\right) + \sum_{t=1}^T \left(\hat{\gamma}_{n,T,t}(d_t) - \gamma_{n,T,t}(d_t) \mid \mathcal{LF}\right).\]
We apply the known bound for each term, specifically Equation 76, Equation 80 with \(d = d_T\), and \(P_{d_{t},n}(t)\) for every \(t \in [T-1]\). Once again we encounter the same geometric sum, which gives the desired result upon noting that the baseline rate is dominated by that of the sum.
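For concreteness, here is a minimal sketch (with hypothetical variable names) of how the SBE-PCR point estimate above is assembled once the blip estimates \(\hat{\gamma}_{n,T,t}(d_t)\) and the baseline estimate \(\hat{b}_{n,T}\) have been computed by the preceding steps:

```python
import numpy as np

def sbe_point_estimate(gamma_hat: np.ndarray, b_hat: float) -> float:
    """Assemble sum_{t=1}^T gamma_hat[t] + b_hat, i.e. the SBE-PCR estimate
    of E[Y_{n,T}^{(d^T)} | LF] for a chosen action sequence d^T.

    gamma_hat : length-T array; entry t-1 holds the estimated blip of d_t.
    b_hat     : estimated baseline outcome for unit n at period T.
    """
    return float(np.sum(gamma_hat) + b_hat)

# Toy usage with T = 3 placeholder blip estimates.
print(sbe_point_estimate(np.array([0.4, -0.1, 0.25]), b_hat=1.7))
```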
We recall that for any unit \(n \in [N]\) and \(\bar{d}^T \in [A]^T\) \[\mathbb{E}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] = \sum_{t=1}^T \gamma_{n,T,t}(d_t) + b_{n,T} \mid \mathcal{LF}= \sum_{t=1}^T \langle \psi_n^{T,t}, w_{d_t} - w_{0_t}\rangle + b_{n,T} \mid \mathcal{LF}.\] Given Assumption 13 we know that \(\psi_{n}^{T,T-q-i} = 0\) for all \(i \geq 1\). As such, \[\mathbb{E}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] = \sum_{t=T-q}^T \langle \psi_n^{T,t}, w_{d_t} - w_{0_t}\rangle + b_{n,T} \mid \mathcal{LF}= \sum_{t=T-q}^T \gamma_{n,T,t}(d_t) + b_{n,T} \mid \mathcal{LF}.\]
We modify the SBE-PCR estimator accordingly: \[\hat{\mathbb{E}}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] := \sum_{t = T-q}^T \hat{\gamma}_{n,T,t}(d_t) + \hat{b}_{n,T}.\]
Applying the analysis from the proof of Theorem 3 yields the desired result.
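Under the memory cutoff, the assembly changes only in the summation range; a one-line variant of the earlier sketch (names again hypothetical, and we assume \(q + 1 \leq T\)):

```python
import numpy as np

def sbe_point_estimate_memory_q(gamma_hat: np.ndarray, b_hat: float, q: int) -> float:
    """Modified SBE-PCR estimate under Assumption 13: only the last q + 1
    blips (periods t = T - q, ..., T) contribute; earlier blips are zero."""
    return float(np.sum(gamma_hat[-(q + 1):]) + b_hat)
```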
Recall \(z_{n, t}\) is the latent state of unit \(n\) if it undergoes action sequence \(\bar{d}^t\). By a simple recursion (with \(z_{n,0} = 0\)) we have \[\begin{align} z^{(\bar{d}^t)}_{n, t} &= \sum^{t}_{\ell = 1} \boldsymbol{B}^{t - \ell}_{n} \boldsymbol{C}_{n} \;w_{d_\ell} + \sum^{t}_{\ell = 1} \boldsymbol{B}^{t - \ell}_{n} \eta_{n, \ell}. \end{align}\] Hence, \[\begin{align} Y^{(\bar{d}^t)}_{n, t} &= \left\langle \theta_{n}, \; \sum^{t}_{\ell = 1} \boldsymbol{B}^{t - \ell}_{n} \boldsymbol{C}_{n} \;w_{d_\ell} + \sum^{t}_{\ell = 1} \boldsymbol{B}^{t - \ell}_{n} \eta_{n, \ell} \right\rangle + \langle \tilde{\theta}_{n}, w_{d_t} \rangle + \tilde{\eta}_{n, t} \\ &= \sum^{t}_{\ell = 1} \Big(\left\langle \psi^{t - \ell}_{n}, w_{d_\ell} \right\rangle + \varepsilon_{n, t, \ell} \Big), \end{align}\] where in the last line we use the definitions of \(\psi^{t - \ell}_{n}\) and \(\varepsilon_{n, t, \ell}\) in the proposition statement. This completes the proof.
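The unrolled expression can be checked mechanically. The following simulation (dimensions and seed are arbitrary; we take \(z_{n,0} = 0\) as above) iterates the one-step recursion \(z_t = \boldsymbol{B}_n z_{t-1} + \boldsymbol{C}_n w_{d_t} + \eta_{n,t}\) and confirms it matches the displayed sums:

```python
import numpy as np

rng = np.random.default_rng(0)
r, m, T = 4, 3, 6                       # latent dim, action dim, horizon (arbitrary)
B = 0.5 * rng.standard_normal((r, r))   # transition matrix B_n
C = rng.standard_normal((r, m))         # input matrix C_n
w = rng.standard_normal((T + 1, m))     # action vectors w_{d_1..T} (row 0 unused)
eta = rng.standard_normal((T + 1, r))   # latent noise eta_{n,1..T} (row 0 unused)

# One-step recursion z_t = B z_{t-1} + C w_{d_t} + eta_t with z_0 = 0.
z = np.zeros(r)
for t in range(1, T + 1):
    z = B @ z + C @ w[t] + eta[t]

# Unrolled form: sum_{l=1}^{T} B^{T-l} (C w_{d_l} + eta_l).
z_unrolled = sum(np.linalg.matrix_power(B, T - l) @ (C @ w[l] + eta[l])
                 for l in range(1, T + 1))
assert np.allclose(z, z_unrolled)
```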
For simplicity, we omit the conditioning on \(\mathcal{LF}\) in all derivations; all expectations are conditioned on \(\mathcal{LF}\).
1. Verifying 28. First, we verify 28 holds, which allows us to express the counterfactual outcomes in terms of the blips and the baseline. For all \(n \in [N]\), using Assumption 14 we have: \[\begin{align} \mathbb{E}[Y^{(\bar{d}^T)}_{n, T} \mid \mathcal{LF}] \nonumber &= \mathbb{E}[Y^{(\bar{d}^T)}_{n, T} - Y^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}] + \mathbb{E}[Y^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}] \nonumber \\ &= \mathbb{E}\left[ \sum^{T}_{t = 1} \left\langle \psi^{T - t}_{n}, w_{d_t} - w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{d}^T)}_{n, T} - \varepsilon^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}\right] + \mathbb{E}\left[\sum^{T}_{t = 1} \left\langle \psi^{T - t}_{n}, w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{0}^T)}_{n, T} \mid \mathcal{LF}\right] \nonumber \\ &= \sum^{T}_{t = 1} \gamma_{n, T- t}(d_t) \mid \mathcal{LF}+ b_{n, T} \mid \mathcal{LF}\nonumber \end{align}\]
We first show 29 holds. For \(j \in \mathcal{I}^{0}_t\): \[\begin{align} b_{j, t} \mid \mathcal{LF} &= \sum^t_{\ell = 1} \left\langle \psi^{t - \ell}_{j}, w_{\tilde{0}} \right\rangle \mid \mathcal{LF} = \mathbb{E}\left[ \sum^t_{\ell = 1} \left\langle \psi^{t - \ell}_{j}, w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{0}^t)}_{j, t} \mid \mathcal{LF}\right] \tag{90} \\ &= \mathbb{E}\left[ \sum^t_{\ell = 1} \left\langle \psi^{t - \ell}_{j}, w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{0}^t)}_{j, t} \mid \mathcal{LF}, \mathcal{I}^{0}_t \right] \tag{91} \\ &= \mathbb{E}\left[Y_{j,t}^{(\bar{0}^t)} \mid \mathcal{LF}, j \in \mathcal{I}^{0}_t \right] \tag{92} \\&= \mathbb{E}\left[Y_{j,t} \mid \mathcal{LF}, j \in \mathcal{I}^{0}_t \right], \tag{93} \end{align}\] where 90 and 92 follow from Assumption 14; 91 follows from the fact that \(\left\langle \psi^{t - \ell}_{j}, w_{\tilde{0}} \right\rangle\) is deterministic conditional on \(\mathcal{LF}\), and that \(\mathbb{E}[\varepsilon^{(\bar{0}^t)}_{j, t} \mid \mathcal{LF}, \mathcal{I}^{0}_t] = \mathbb{E}[\varepsilon^{(\bar{0}^t)}_{j, t} \mid \mathcal{LF}]\) as seen in the definition of \(\mathcal{I}^{0}_t\); 93 follows from Assumption 1.
Next we show 30 holds. For \(i \notin \mathcal{I}^{0}_t\): \[\begin{align} b_{i, t} \mid \mathcal{LF} &= \sum^{t}_{\ell = 1} \left\langle \psi^{t - \ell}_{i}, w_{\tilde{0}} \right\rangle \mid \mathcal{LF}\nonumber \\&= \sum^{t}_{\ell = 1} \left\langle \psi^{t - \ell}_{i}, w_{\tilde{0}} \right\rangle \mid \mathcal{LF}, \mathcal{I}^{0}_t \tag{94} \\&= \sum^{t}_{\ell = 1} \sum_{j \in \mathcal{I}^{0}_t} \beta_j^{i,\mathcal{I}^{0}_t} \left\langle \psi^{t - \ell}_{j}, w_{\tilde{0}} \right\rangle \mid \mathcal{LF}, \mathcal{I}^{0}_t \tag{95} \\&= \sum_{j \in \mathcal{I}^{0}_t} \beta_j^{i,\mathcal{I}^{0}_t} b_{j, t} \mid \mathcal{LF}, \mathcal{I}^{0}_t \nonumber \end{align}\] where 94 follows from the fact that \(\left\langle \psi^{t - \ell}_{i}, w_{\tilde{0}} \right\rangle\) is deterministic conditional on \(\mathcal{LF}\); 95 follows from Assumption 16.
We first show 31 holds. For all \(d \in [A]\) and \(j \in \mathcal{I}^d\): \[\begin{align} & \gamma_{j, 0}(d) \mid \mathcal{LF}\nonumber = \left\langle \psi^{0}_{j}, w_{d} - w_{\tilde{0}} \right\rangle \mid \mathcal{LF}\nonumber \\ &= \mathbb{E}\left[\left\langle \psi_{j}^{0}, w_{d} - w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{0}^{t^*_j-1}, d)}_{j, t^*_j} \pm \sum^{t^*_j - 1}_{\ell = 1} \left\langle \psi_{j}^{\ell}, w_{\tilde{0}} \right\rangle \mid \mathcal{LF}\right] \tag{96} \\ &= \mathbb{E}\left[\left\langle \psi_{j}^{0}, w_{d} \right\rangle + \varepsilon^{(\bar{0}^{t^*_j-1}, d)}_{j, t^*_j} + \sum^{t^*_j - 1}_{\ell = 1} \left\langle \psi_{j}^{\ell}, w_{\tilde{0}} \right\rangle \mid \mathcal{LF}\right] - \sum^{t^*_j - 1}_{\ell = 0} \left\langle \psi_{j}^{\ell}, w_{\tilde{0}} \right\rangle \mid \mathcal{LF}\nonumber \\ &= \mathbb{E}\left[\left\langle \psi_{j}^{0}, w_{d} \right\rangle + \varepsilon^{(\bar{0}^{t^*_j-1}, d)}_{j, t^*_j} + \sum^{t^*_j - 1}_{\ell = 1} \left\langle \psi_{j}^{\ell}, w_{\tilde{0}} \right\rangle \mid \mathcal{LF}\right] - b_{j, t^*_j} \mid \mathcal{LF}\nonumber \\ &= \mathbb{E}[Y^{(\bar{D}^{t^*_j}_j)}_{j, t^*_j} \mid \mathcal{LF}, \;j \in \mathcal{I}^d] - b_{j, t^*_j} \mid \mathcal{LF}\tag{97} \\ &= \mathbb{E}[Y_{j, t^*_j} \mid \mathcal{LF}, \;j \in \mathcal{I}^d] - b_{j, t^*_j} \mid \mathcal{LF}\tag{98} \end{align}\] where 96 follows from Assumption 14; 97 follows from the definition of \(\mathcal{I}^d\) and Assumption 14; 98 follows from Assumption 1.
Next we show 32 holds. For \(i \notin \mathcal{I}^d\) \[\begin{align} \gamma_{i, 0}(d) \mid \mathcal{LF} &= \left\langle \psi^{0}_{i}, w_{d} - w_{\tilde{0}} \right\rangle \mid \mathcal{LF} = \left\langle \psi^{0}_{i}, w_{d} - w_{\tilde{0}} \right\rangle \mid \mathcal{LF}, \mathcal{I}^d \tag{99} \\ &= \sum_{j \in \mathcal{I}^d} \beta^{i, \mathcal{I}^d}_{j}\left\langle \psi_{j}^{0}, w_{d} - w_{\tilde{0}} \right\rangle \mid \mathcal{LF}, \mathcal{I}^d \tag{100} \\ &= \sum_{j \in \mathcal{I}^d} \beta^{i, \mathcal{I}^d}_{j} \gamma_{j, 0}(d) \mid \mathcal{LF}, \mathcal{I}^d \nonumber \end{align}\] 99 follows from the fact that \(\left\langle \psi^{0}_{i}, w_{d} - w_{\tilde{0}} \right\rangle\) is deterministic conditional on \(\mathcal{LF}\); 100 follows from Assumption 16.
We first show 33 holds. For all \(d \in [A]\), \(t \in [T - 1]\), \(j \in \mathcal{I}^d\):
\[\begin{align} &\mathbb{E}\left[ Y_{j, t^*_j + t} - Y^{(\bar{0}_{t^*_j + t})}_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] = \mathbb{E}\left[ Y^{(\bar{D}^{t^*_j + t}_j)}_{j, t^*_j + t} - Y^{(\bar{0}_{t^*_j + t})}_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] \tag{101} \\ &= \mathbb{E}\left[ Y^{(\bar{D}^{t^*_j + t}_j)}_{j, t^*_j + t} - Y^{(\bar{D}^{t^*_j - 1}_j, \underline{0}^{t^*_j})}_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] \tag{102} \\ &= \sum^{t}_{\ell = 0} \mathbb{E}\left[ Y^{(\bar{D}^{t^*_j + t - \ell}_j, \underline{0}^{t^*_j + t - \ell + 1})}_{j, t^*_j + t} - Y^{(\bar{D}^{t^*_j + t - \ell - 1}_j, \underline{0}^{t^*_j + t - \ell})}_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] \tag{103} \end{align}\] where 101 follows from Assumption 1; 102 uses that for \(j \in \mathcal{I}^d\), \(\bar{D}^{t^*_j - 1}_j = (\tilde{0}, \dots, \tilde{0})\), and Assumption 1; 103 is a telescoping decomposition. Then, \[\begin{align} &\sum^{t}_{\ell = 0} \mathbb{E}\left[ Y^{(\bar{D}^{t^*_j + t - \ell}_j, \underline{0}^{t^*_j + t - \ell + 1})}_{j, t^*_j + t} - Y^{(\bar{D}^{t^*_j + t - \ell - 1}_j, \underline{0}^{t^*_j + t - \ell})}_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] \nonumber \\ &= \sum^{t}_{\ell = 0} \mathbb{E}\left[ \left\langle \psi^{\ell}_j, w_{D_{j, t^*_j + t - \ell}} -w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{D}^{t^*_j + t - \ell}_j, \underline{0}^{t^*_j + t - \ell + 1})}_{j, t^*_j + t} - \varepsilon^{(\bar{D}^{t^*_j + t - \ell - 1}_j, \underline{0}^{t^*_j + t - \ell})}_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] \tag{104} \\ &= \mathbb{E}\left[\left\langle \psi^{t}_j, w_{D_{j, t^*_j + t}} -w_{\tilde{0}} \right\rangle\mid \mathcal{LF}, j \in \mathcal{I}^d \right] \nonumber \\ & \quad \quad + \sum^{t - 1}_{\ell = 0} \mathbb{E}\left[ \left\langle \psi^{\ell}_j, w_{D_{j, t^*_j + t - \ell}} -w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{D}^{t^*_j + t - \ell}_j, \underline{0}^{t^*_j + t - \ell + 1})}_{j, t^*_j + t} - \varepsilon^{(\bar{D}^{t^*_j + t - \ell - 1}_j, \underline{0}^{t^*_j + t - \ell})}_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] \nonumber \\ &= \left\langle \psi^{t}_j, w_{d} -w_{\tilde{0}} \right\rangle\mid \mathcal{LF}\nonumber \\ & \quad \quad + \sum^{t - 1}_{\ell = 0} \mathbb{E}\left[ \left\langle \psi^{\ell}_j, w_{D_{j, t^*_j + t - \ell}} -w_{\tilde{0}} \right\rangle + \varepsilon^{(\bar{D}^{t^*_j + t - \ell}_j, \underline{0}^{t^*_j + t - \ell + 1})}_{j, t^*_j + t} - \varepsilon^{(\bar{D}^{t^*_j + t - \ell - 1}_j, \underline{0}^{t^*_j + t - \ell})}_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] \tag{105} \\ &= \left\langle \psi^{t}_j, w_{d} -w_{\tilde{0}} \right\rangle\mid \mathcal{LF}+ \sum^{t - 1}_{\ell = 0} \mathbb{E}\left[ \left\langle \psi^{\ell}_j, w_{D_{j, t^*_j + t - \ell}} -w_{\tilde{0}} \right\rangle \mid \mathcal{LF}, j \in \mathcal{I}^d \right] \nonumber \\ & \quad \quad + \sum^{t - 1}_{\ell = 0} \mathbb{E}\left[ \mathbb{E}\left[ \varepsilon^{(\bar{\delta}^{t^*_j + t - \ell}, \underline{0}^{t^*_j + t - \ell + 1})}_{j, t^*_j + t} - \varepsilon^{(\bar{\delta}^{t^*_j + t - \ell - 1}, \underline{0}^{t^*_j + t - \ell})}_{j, t^*_j + t} \mid \bar{D}_j^{t^*_j + t - \ell} = \bar{\delta}^{t^*_j + t - \ell} \right] \mid \mathcal{LF}, j \in \mathcal{I}^d \right] \nonumber \\ &= \left\langle \psi^{t}_j, w_{d} -w_{\tilde{0}} \right\rangle\mid \mathcal{LF} + \sum^{t - 1}_{\ell = 0} \left\langle \psi^{\ell}_j, w_{D_{j, t^*_j + t - \ell}} -w_{\tilde{0}} \right\rangle \mid \mathcal{LF}, j \in \mathcal{I}^d \tag{106} \\ &= \gamma_{j, t}(d) \mid \mathcal{LF} + \sum^{t - 1}_{\ell = 0} \gamma_{j, \ell}(D_{j, t^*_j + t - \ell}) \mid \mathcal{LF}\tag{107} \end{align}\] where 104 follows from Assumption 14; 105 follows from the definition of \(\mathcal{I}^d\), i.e., for \(j \in \mathcal{I}^d\), \(\bar{D}^{t^*_j}_j = (\bar{0}^{t^*_j - 1}, d)\); 106 follows from Assumption 17. Re-arranging 107, we have \[\begin{align} \gamma_{j, t}(d) \mid \mathcal{LF} &= \mathbb{E}\left[ Y_{j, t^*_j + t} - Y^{(\bar{0}_{t^*_j + t})}_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] - \sum^{t - 1}_{\ell = 0} \gamma_{j, \ell}(D_{j, t^*_j + t - \ell}) \mid \mathcal{LF}\nonumber \\ &= \mathbb{E}\left[ Y_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] - \mathbb{E}\left[ Y^{(\bar{0}_{t^*_j + t})}_{j, t^*_j + t} \mid \mathcal{LF}\right] - \sum^{t - 1}_{\ell = 0} \gamma_{j, \ell}(D_{j, t^*_j + t - \ell}) \mid \mathcal{LF}\tag{108} \\ &= \mathbb{E}\left[ Y_{j, t^*_j + t} \mid \mathcal{LF}, j \in \mathcal{I}^d \right] - b_{j, t^*_j + t} \mid \mathcal{LF}- \sum^{t - 1}_{\ell = 0} \gamma_{j, \ell}(D_{j, t^*_j + t - \ell}) \mid \mathcal{LF}\tag{109} \end{align}\] where 108 follows from the definition of \(\mathcal{I}^d\); 109 follows from Assumption 14.
Next we show 34 holds. For all \(d \in [A]\), \(t < T\), \(i \notin \mathcal{I}^d\): \[\begin{align} \gamma_{i, t}(d) \mid \mathcal{LF} &= \left\langle \psi^{t}_{i}, w_{d} - w_{\tilde{0}} \right\rangle \mid \mathcal{LF}\tag{110} \\ &= \left\langle \psi^{t}_{i}, w_{d} - w_{\tilde{0}} \right\rangle \mid \mathcal{LF}, \mathcal{I}^d \tag{111} \\ &= \sum_{j \in \mathcal{I}^d} \beta^{i, \mathcal{I}^d}_{j}\left\langle \psi_{j}^{t}, w_{d} - w_{\tilde{0}} \right\rangle \mid \mathcal{LF}, \mathcal{I}^d \tag{112} \\ &= \sum_{j \in \mathcal{I}^d} \beta^{i, \mathcal{I}^d}_{j} \gamma_{j, t}(d) \mid \mathcal{LF}, \mathcal{I}^d \nonumber \end{align}\] where 111 follows from the fact that \(\left\langle \psi^{t}_{i}, w_{d} - w_{\tilde{0}} \right\rangle\) is deterministic conditional on \(\mathcal{LF}\); 112 follows from Assumption 16.
Assumption 14 is not restrictive. Recalling the Linear Dynamical System setting from Proposition 3, we present a few sufficient conditions for the above to hold true.
Hard Memory Cutoff \[\exists q \in \mathbb{N}, \boldsymbol{B}_{n}^{q+1} = 0.\]
Exponential Forgetting (Spectral Decay Condition) \[\exists C > 0, \rho \in (0,1), \text{ such that for all } t, \quad \left\| \boldsymbol{B}_{n}^{t} \right\|_2 \leq C \rho^{t}.\]
Soft Memory Cutoff (Higher-Order Markov Property) \[\mathbb{P}(z_{n,T} \mid z_{n,T-1}, z_{n,T-2}, \dots, z_{n,0}) = \mathbb{P}(z_{n,T} \mid z_{n,T-1}, \dots, z_{n,T-q}).\]
Clearly, the first condition is the strongest and implies the other two. This shows that our fixed-memory assumption is a reasonable one, and it supports the applicability of our methodology in the dynamic treatment regime from a statistical perspective.
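As a numerical illustration (not part of the formal argument), the exponential-forgetting condition is easy to observe for a stable transition matrix: by Gelfand's formula, \(\|\boldsymbol{B}_{n}^{t}\|_2^{1/t}\) converges to the spectral radius, so any \(\rho\) strictly between the spectral radius and one admits a finite \(C\). A toy check with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
B *= 0.9 / max(abs(np.linalg.eigvals(B)))   # rescale so the spectral radius is 0.9

# ||B^t||_2^{1/t} approaches 0.9, so ||B^t||_2 <= C rho^t for any rho in (0.9, 1).
for t in [5, 20, 50]:
    print(t, np.linalg.norm(np.linalg.matrix_power(B, t), 2) ** (1 / t))
```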
The next result is a consequence of Assumption 20 and will be essential in establishing consistency.
Lemma 15. Let Assumption 20 hold. Then for all \(d \in [A]\) and \(t \in [T]\) and \(\ell < t\) there exists \(\alpha^{(d, t, \ell)} \in \mathbb{R}^p\) such that \[\mathbb{E}\left[Y_{j, t_j^* + t}^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^* + t - \ell}\mkern-1.3mu}\mkern 1.3mu, \underline{\tilde{0}^{t_j^* + t - \ell+ 1}})} \big| \mathcal{LF}\right] = \sum_{i = 1}^p \alpha_i^{(d,t, \ell)}\cdot \mathbb{E}[(X_{\mathcal{I}^d})_{ij}|\mathcal{LF}]\] and there exists \(\alpha^{(d, t, \ell)'} \in \mathbb{R}^p\) such that \[\mathbb{E}\left[Y_{j, t_j^* + t}^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^* + t - \ell - 1}\mkern-1.3mu}\mkern 1.3mu, \underline{\tilde{0}^{t_j^* + t - \ell}})} \big| \mathcal{LF}\right] = \sum_{i = 1}^p \alpha_i^{(d,t, \ell)'}\cdot \mathbb{E}[(X_{\mathcal{I}^d})_{ij}|\mathcal{LF}].\]
Proof. This holds as an immediate consequence of Assumption 20: for the first term we take \(D_{j,t_j^* + t - \ell + i} = \tilde{0}\) for every \(i \in [\ell]\), and for the second term we additionally take \(D_{j,t_j^* + t - \ell} = \tilde{0}\); both substitutions are valid given our assumption that \(\ell < t\). ◻
We present consistency results (and their proofs) that serve as preliminaries for proving Theorem 6. This is analogous to Theorem 3, which serves as a preliminary result for Theorem 4.
Theorem 7. Let Assumptions 1 to 20 hold.19 Consider the SBE-PCR estimator in Section 5.3 and suppose \(k = \max_{\mathcal{I}\in \{\mathcal{I}^d\} \cup \{\mathcal{I}^0_t\}}\text{rank}(\mathbb{E}[X_{\mathcal{I}}])\). Then conditional on the treatment assignments, \(\mathcal{LF}\), and \(\{\rho_i\}_{i \in [p]}\) we have:
(i) Baseline Consistency: For any \(n \in [N]\) and \(t \in [T]\) \[\hat{b}_{n,t} - b_{n,t} \mid \mathcal{LF} = O_p\left(\sqrt{\log(p|\mathcal{I}_t^0|)}\left(\frac{k^{5/4}}{p^{1/4}} +k^{5/2}\max\left\{\frac{\sqrt{|\mathcal{I}_t^0|}}{p^{3/2}}, \frac{1}{\sqrt{|\mathcal{I}_t^0|-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right).\]
(ii) Terminal Blip Consistency: For any \(d \in [A]\) and unit \(n \in [N]\) \[\hat{\gamma}_{n,0}(d) - \gamma_{n,0}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{9/4}}{p^{1/4}} +k^{7/2}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|\}\).
(iii) Non-Terminal Blip Consistency: For any \(d \in [A]\), unit \(n \in [N]\), and \(t \in [1, \dots, T-1]\): \[\begin{align} &\hat{\gamma}_{n, t}(d) - \gamma_{n, t}(d) \mid \mathcal{LF}\\ &= O_p\left(t\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{t}}{p^{1/4}} + k^{t}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right), \end{align}\] where \(\mathcal{C}= \{|\mathcal{I}^d|, |\mathcal{I}_T^0|, |\mathcal{I}_1^0|, (\mathcal{I}^{D_{n, t_n^* + q}})_{n \in [N], q\in[1, \dots, t]} \}\) with \(\pi_{\mathcal{I}} = \max\mathcal{C}, \alpha_{\mathcal{I}} = \min\mathcal{C}\).
(iv) Target Causal Parameter Consistency: For \(n \in [N]\), and \(\bar{d}^T \in [A]^T\): \[\widehat{\mathbb{E}}[Y_{n, T}^{(\bar{d}^T)}] - \mathbb{E}[Y_{n, T}^{(\bar{d}^T)}\mid \mathcal{LF}]=O_p\left(T\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{T}}{p^{1/4}} + k^{T}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\] where \(\mathcal{C}= \{|\mathcal{I}_T^0|, |\mathcal{I}^0_1|, (|\mathcal{I}^{d_{t}}|)_{t \in [T]} ,(\mathcal{I}^{D_{n, t_n^* + t}})_{n \in [N], t \in [1, \dots, T-1]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{C}\) and \(\alpha_{\mathcal{I}} = \min\mathcal{C}\). Here, each \(O_p(\cdot)\) is defined with respect to the sequence \(\min\{p, \alpha_{\mathcal{I}}\}\).20
Below we provide a full proof of Theorem 7, which is quite similar to that of Theorem 3.
1. Verifying Baseline Consistency:
For any \(t \in [T]\):
Donor Set Baseline Consistency: Consider unit \(n \in \mathcal{I}_t^0\). Denote \(X_{\mathcal{I}_t^0 \setminus n} = X_{:, \mathcal{I}^0_t\setminus n} \in \mathbb{R}^{ p \times |\mathcal{I}_t^0 \setminus n|}\). We know the baseline outcome admits the representation \[\hat{b}_{n,t} - b_{n,t} \mid \mathcal{LF}= \left\langle \hat{\phi}^{n, \mathcal{I}_t^0}, Y_{\mathcal{I}_t^0 \setminus n,t} \right\rangle - \left\langle \phi^{n, \mathcal{I}_t^0}, \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n, t} \mid \mathcal{LF}] \right\rangle,\] where \(\hat{\phi}^{n, \mathcal{I}_t^0}\) are the regression coefficients from regressing additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}_t^0 \setminus n}\)-approximation \(X_{\mathcal{I}_t^0 \setminus n}\) with \(k_{\mathcal{I}_t^0 \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}_t^0 \setminus n}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}_t^0 \setminus n}\).
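A minimal sketch of this PCR step (our own illustrative implementation, not the paper's code): regress the target unit's covariates on the rank-\(k\) truncation of the donors' covariate matrix via a truncated SVD, then form the baseline estimate as an inner product with the donors' observed outcomes.

```python
import numpy as np

def pcr_coefficients(X_donor: np.ndarray, x_n: np.ndarray, k: int) -> np.ndarray:
    """Regress x_n (p,) on the rank-k approximation of X_donor (p, N_donor),
    i.e. PCR with parameter k: phi_hat = pinv((X_donor)_k) @ x_n."""
    U, s, Vt = np.linalg.svd(X_donor, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ x_n) / s[:k])

# Toy usage: p = 50 covariates, 10 donor units, expected covariates of rank 3.
rng = np.random.default_rng(2)
X_donor = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 10))
x_n = X_donor @ np.ones(10)              # target covariates in the donor span
phi_hat = pcr_coefficients(X_donor, x_n, k=3)
Y_donor = rng.standard_normal(10)        # donors' observed outcomes at period t
b_hat = float(phi_hat @ Y_donor)         # baseline estimate <phi_hat, Y_donor>
```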
Lemma 16. We claim the following \[\left\langle \phi^{n, \mathcal{I}_t^0}, \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n, t}] \right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}_t^0}, \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n,t} ] \right\rangle,\] with \(\tilde{\phi}^{n, \mathcal{I}_t^0} = VV^{\top}\phi^{n, \mathcal{I}_t^0}\), where \(V \in \mathbb{R}^{|\mathcal{I}_t^0 \setminus n| \times k_{\mathcal{I}_t^0 \setminus n}}\) denotes the right singular vectors of \(\mathbb{E}[X_{\mathcal{I}_t^0 \setminus n}]\).
Proof. By Assumption 20 there exists \(\xi^{(0,t)}\) such that for any \(j \in \mathcal{I}^0_t \setminus n\) \[\mathbb{E}[Y_{j,t}|\mathcal{LF}, j \in \mathcal{I}^0_t \setminus n] = \sum_{i = 1}^p \xi^{(0,t)}_i \cdot \mathbb{E}[(X_{\mathcal{I}^0_t\setminus n})_{ij}|\mathcal{LF}, j \in \mathcal{I}^0_t \setminus n].\]
As such, \[\mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n, t}] = VV^T \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n, t}],\] which gives us \[\left\langle \tilde{\phi}^{n, \mathcal{I}_t^0}, \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n, t} ] \right\rangle = \left\langle VV^{\top}\phi^{n, \mathcal{I}_t^0}, \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n, t}] \right\rangle = \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n, t} ]^{\top}VV^{\top} \cdot \phi^{n, \mathcal{I}_t^0} = \left\langle \phi^{n, \mathcal{I}_t^0}, \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n, t} ] \right\rangle\] proving the desired result. ◻
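The projection identity underlying Lemma 16 (and its later analogues) is easy to verify numerically; a toy check under the linear-span structure of Assumption 20, with all dimensions hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
EX = rng.standard_normal((50, 4)) @ rng.standard_normal((4, 10))  # E[X], rank 4
V = np.linalg.svd(EX, full_matrices=False)[2][:4].T  # right singular vectors (10 x 4)

xi = rng.standard_normal(50)
EY = EX.T @ xi                        # E[Y] a linear combination of the rows of E[X]
assert np.allclose(V @ V.T @ EY, EY)  # hence V V^T E[Y] = E[Y], as in Lemma 16
```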
Using Lemma 16, we can now lift the proof technique in [10] Theorem \(2\) (Appendix C) to show consistency for \(n \in \mathcal{I}_t^0\) \[\begin{align} \label{eq:donor-baseline-consistency-invariant} \hat{b}_{n,t} - b_{n,t} \mid \mathcal{LF}&= \left\langle \hat{\phi}^{n, \mathcal{I}_t^0}, Y_{\mathcal{I}_t^0 \setminus n, t} \right\rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}_t^0}, \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n, t}] \right\rangle \nonumber\\ &= O_p \left( \sqrt{\log(p |\mathcal{I}_t^0|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}_t^0|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}_t^0|-1}} \right\} \right] \right), \end{align}\tag{113}\] where we set \(T_1 = 1\), \(\tilde{w}^{(i,d)} = \tilde{\phi}^{n, \mathcal{I}_t^0}\), \(\hat{w}^{(i,d)} = \hat{\phi}^{n, \mathcal{I}_t^0}\), \(Y_{t,\mathcal{I}^{(d)}} = Y_{\mathcal{I}_t^0 \setminus n}\), \(\mathbb{E}[Y_{t,\mathcal{I}^{(d)}}] = \mathbb{E}[Y_{\mathcal{I}_t^0 \setminus n} \mid \mathcal{LF}]\), and \(\mathcal{P}_{V_{\text{pre}}} = VV^{\top}\). Furthermore, in the final rate we set \(T_0 = p\), \(N_d = |\mathcal{I}_t^0 \setminus n|\), and \(r_{\text{pre}} = k_{\mathcal{I}_t^0 \setminus n}\). To conclude, we used that \(|\mathcal{I}_t^0 \setminus n| = |\mathcal{I}_t^0| - 1\) and \(k_{\mathcal{I}_t^0 \setminus n} \leq k\) where \(k\) is the uniform upper bound on the rank on all possible expected covariate matrices, i.e., \(k = \max_{\mathcal{I}\in \{\mathcal{I}^d\} \cup \{\mathcal{I}^0_t\}}\text{rank}(\mathbb{E}[X_{\mathcal{I}}]).\)
Non-Donor Set Baseline Consistency: Consider unit \(n \notin \mathcal{I}_t^0\). Denote \(X_{\mathcal{I}_t^0} = X_{:, \mathcal{I}^0_t}\in \mathbb{R}^{p \times |\mathcal{I}_t^0|}\). We know the baseline outcome admits the representation \[\hat{b}_{n,t} - b_{n,t} \mid \mathcal{LF}= \left\langle \hat{\beta}^{n, \mathcal{I}_t^0}, \hat{b}_{\mathcal{I}_t^0,t } \right\rangle - \left\langle \beta^{n, \mathcal{I}_t^0}, b_{\mathcal{I}_t^0,t } \right\rangle,\] where \(\hat{\beta}^{n, \mathcal{I}_t^0}\) are the regression coefficients from regressing additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}_t^0 }\)-approximation of \(X_{\mathcal{I}_t^0 }\) with \(k_{\mathcal{I}_t^0 } = \text{rank}(\mathbb{E}[X_{\mathcal{I}_t^0}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}_t^0 }\).
Lemma 17. We have that \[\left\langle \beta^{n, \mathcal{I}_t^0}, b_{\mathcal{I}_t^0,t} \right\rangle = \left\langle \tilde{\beta}^{n, \mathcal{I}_t^0}, b_{\mathcal{I}_t^0,t} \right\rangle\] with \(\tilde{\beta}^{n, \mathcal{I}_t^0} = VV^{\top}{\beta}^{n, \mathcal{I}_t^0}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}_t^0}]\).
Proof. It suffices to prove that \[VV^{\top}b_{\mathcal{I}_t^0,t} = b_{\mathcal{I}_t^0,t},\] which is equivalent to \((b_{\mathcal{I}_t^0, t})^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}_t^0}]\). By definition, for any \(j \in \mathcal{I}^0_t\) we know \(b_{j,t} = \mathbb{E}[Y_{j,t}|\mathcal{LF}, j \in \mathcal{I}_t^0]\). Lastly, by Assumption 20 there exists \(\xi^{(0,t)} \in \mathbb{R}^p\) such that \[\mathbb{E}[Y_{j,t}|\mathcal{LF}, j \in \mathcal{I}_t^0] = \sum_{i = 1}^p\xi_i^{(0,t)} \cdot \mathbb{E}[(X_{\mathcal{I}_t^0})_{ij}|\mathcal{LF}, j \in \mathcal{I}_t^0].\] This concludes the proof. ◻
Lemma 17 allows us to write \[\begin{align} \hat{b}_{n,t} - b_{n,t} \mid \mathcal{LF}&= \left\langle \hat{\beta}^{n, \mathcal{I}_t^0}, \hat{b}_{\mathcal{I}_t^0,t } \right\rangle - \left\langle \tilde{\beta}^{n, \mathcal{I}_t^0}, b_{\mathcal{I}_t^0 ,t} \right\rangle\\ &= \underbrace{\langle \tilde{\beta}^{n,\mathcal{I}_t^0}, \eta_{\mathcal{I}_t^0}\rangle}_{\text{Term 1a}} + \underbrace{\langle \Delta_{n, \mathcal{I}_t^0} , \eta_{\mathcal{I}_t^0}\rangle}_{\text{Term 1b}} + \underbrace{\langle \Delta_{n, \mathcal{I}_t^0}, b_{\mathcal{I}_t^0}\rangle}_{\text{Term 1c}}, \end{align}\] where \(\eta_{\mathcal{I}_t^0} = \hat{b}_{\mathcal{I}_t^0,t} - b_{\mathcal{I}_t^0,t}\) and \(\Delta_{n, \mathcal{I}_t^0} = \hat{\beta}^{n,\mathcal{I}_t^0} - \tilde{\beta}^{n,\mathcal{I}_t^0}\). Using the previously referenced argument in Section 10.4 for any Non-Donor Set Component and applying the appropriate version of Lemmas 6, 7, and 8 allows us to claim for \(n \notin \mathcal{I}_t^0\) \[\begin{align} \hat{b}_{n,t} - b_{n,t} \mid \mathcal{LF}&= O_p\left(\sqrt{k\log(p|\mathcal{I}_t^0|)}\left(\frac{k^{3/4}}{p^{1/4}} + k^2 \max\left\{\frac{\sqrt{|\mathcal{I}_t^0|}}{p^{3/2}}, \frac{1}{\sqrt{|\mathcal{I}_t^0|-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right). \end{align}\]
Baseline Consistency: The donor and non-donor cases together imply that for any \(n \in [N]\) \[\begin{align} \label{eq:baseline-consistency-rate-invariant} \hat{b}_{n,t} - b_{n,t} \mid \mathcal{LF}&= O_p\left(\sqrt{\log(p|\mathcal{I}_t^0|)}\left(\frac{k^{5/4}}{p^{1/4}} +k^{5/2}\max\left\{\frac{\sqrt{|\mathcal{I}_t^0|}}{p^{3/2}}, \frac{1}{\sqrt{|\mathcal{I}_t^0|-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right). \end{align}\tag{114}\]
2. Verifying Terminal Blip Consistency:
For any \(d \in [A]\):
Donor Set Consistency: Consider unit \(n \in \mathcal{I}^d\). Denote \(X_{\mathcal{I}^d \setminus n} = X_{:, \mathcal{I}^d \setminus n}\in \mathbb{R}^{p \times |\mathcal{I}^d \setminus n|}\). We know the blip effect admits the representation \[\begin{align} \hat{\gamma}_{n,0}(d) - \gamma_{n,0}(d) \mid \mathcal{LF}&= \underbrace{\left\langle \hat{\phi}^{n, \mathcal{I}^d}, Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right\rangle - \left\langle \phi^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \mid \mathcal{LF}] \right\rangle}_{\text{Term 1}}\\ &+ \underbrace{\left\langle \phi^{n, \mathcal{I}^d}, b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right\rangle - \left\langle\hat{\phi}^{n, \mathcal{I}^d}, \hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}}\right\rangle}_{\text{Term 2}}, \end{align}\] where \(Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} = [(Y_{j, t_j^*})_{j \in \mathcal{I}^d \setminus n}]^{\top}\) and \(\hat{\phi}^{n, \mathcal{I}^d}\) are the regression coefficients from regressing additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}^d \setminus n}\)-approximation of \(X_{\mathcal{I}^d \setminus n}\) with \(k_{\mathcal{I}^d \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}^d \setminus n}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}^d \setminus n}\).21
Bounding Term \(1\): This argument is nearly identical to that for Donor Set Baseline Consistency.
Lemma 18. We have that \[\left\langle \phi^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}}] \right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}}] \right\rangle\] with \(\tilde{\phi}^{n, \mathcal{I}^d} = VV^{\top}{\phi}^{n, \mathcal{I}^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}^d\setminus n}]\).
Proof. It would suffice to prove that \[VV^{\top}\mathbb{E}[Y_{\mathcal{I}^d \setminus n} ] = \mathbb{E}[Y_{\mathcal{I}^d \setminus n} ],\] which is equivalent to \(\mathbb{E}[Y_{\mathcal{I}^d \setminus n}]^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}^d \setminus n}]\). By Assumption 20 there exists \(\xi^{(d,0)}\) such that for any \(j \in \mathcal{I}^d \setminus n\) \[\mathbb{E}[Y_{j,t_j^*}|\mathcal{LF}, j \in \mathcal{I}^d \setminus n] = \sum_{i = 1}^p \xi^{(d,0)}_i \cdot \mathbb{E}[(X_{\mathcal{I}^d\setminus n})_{ij}|\mathcal{LF}, j \in \mathcal{I}^d \setminus n].\] This concludes the proof. ◻
Using Lemma 18, we can once again use the proof technique in [10] Theorem \(2\) (Appendix C) to show consistency of \[\begin{align} \label{eq:donor-terminal-blip-consistency-term-1-invariant} \text{Term 1} &= \left\langle \hat{\phi}^{n, \mathcal{I}^d}, Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right\rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}}] \right\rangle\\ &= O_p \left( \sqrt{\log(p |\mathcal{I}^d|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}^d|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}^d|-1}} \right\} \right] \right).\nonumber \end{align}\tag{115}\]
Bounding Term 2:
Lemma 19. We have \[\left\langle \phi^{n, \mathcal{I}^d},b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}} \right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}^d},b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}}\right\rangle\] with \(\tilde{\phi}^{n, \mathcal{I}^d} = VV^{\top}\phi^{n, \mathcal{I}^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}^d\setminus n}]\).
Proof. It would suffice to prove that \[VV^{\top}b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}}= b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}},\] which is equivalent to \((b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}})^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}^d \setminus n}]\). Applying the third conclusion of Assumption 20 with \(t = 0\) we know for any \(j \in \mathcal{I}^d \setminus n\) \[b_{j,t_j^* } = \mathbb{E}\left[Y_{j, t_j^*}^{(\mkern 1.3mu\overline{\mkern-1.3mu\tilde{0}\mkern-1.3mu}\mkern 1.3mu^{t_j^*})}\big|\mathcal{LF}, j \in \mathcal{I}^d\right] = \sum_{i = 1}^p \alpha_i^{(0,0)}\cdot \mathbb{E}[(X_{\mathcal{I}^d})_{ij}\mid \mathcal{LF}, j \in \mathcal{I}^d \setminus n].\] This concludes the proof. ◻
Using Lemma 19 we can write \[\left\langle \phi^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}}\right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right \rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right \rangle\] Next we negate the RHS and decompose as follows:22 \[\begin{align} \left\langle \tilde{\phi}^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right\rangle& - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right \rangle \\ &= \underbrace{\left\langle \tilde{\phi}^{n, \mathcal{I}^d}, \eta_{\mathcal{I}^d \setminus n}\right\rangle}_{\text{Term 1a}} + \underbrace{\left\langle \Delta_{n, \mathcal{I}^d}, \eta_{\mathcal{I}^d \setminus n}\right\rangle}_{\text{Term 1b}} + \underbrace{\left\langle \Delta_{n, \mathcal{I}^d} , b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}}\right\rangle}_{\text{Term 1c}}, \end{align}\] where \(\eta_{\mathcal{I}^d \setminus n} = \hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} - b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}}\) and \(\Delta_{n, \mathcal{I}^d} = \hat{\phi}^{n, \mathcal{I}^d} - \tilde{\phi}^{n, \mathcal{I}^d}\). Using the previously referenced argument by applying the appropriate version of Lemmas 6, 7, and 8 alongside Equation 114 for Terms 1a, 1b, and 1c respectively allows us to claim \[\begin{align} \label{eq:donor-terminal-consistency-term-2} \text{Term 2} &= \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right \rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}} \right\rangle\\ &= O_p \left( \sqrt{\log(p \pi_{\mathcal{I}})} \left[ \frac{{k}^{7/4}}{p^{1/4}} + {k}^3 \max \left\{ \frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}} \right\} \right] \right),\nonumber \end{align}\tag{116}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|\}\). To be precise, both collections of donor sets above should include \((\mathcal{I}_t^0)_{t\in [T]}\), but note that \(\mathcal{I}^0_1 \subset \dots \subset \mathcal{I}^0_T\).
Combining the rates for Terms \(1\) and \(2\), we find for any \(n \in \mathcal{I}^d\)
\[\label{eq:donor-terminal-rate-invariant} \hat{\gamma}_{n,0}(d) - \gamma_{n,0}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{7/4}}{p^{1/4}} +k^{3}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\tag{117}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|\}\).
Non-Donor Set Consistency: Consider unit \(n \notin \mathcal{I}^d\). Denote \(X_{\mathcal{I}^d} = X_{:, \mathcal{I}^d} \in \mathbb{R}^{p \times |\mathcal{I}^d|}\). We know the blip effect admits the representation \[\hat{\gamma}_{n,0}(d) - \gamma_{n,0}(d) \mid \mathcal{LF}= \left\langle \hat{\beta}^{n, \mathcal{I}^d}, \hat{\gamma}_{\mathcal{I}^d, 0}(d) \right\rangle - \left\langle \beta^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, 0}(d) \right\rangle,\] where \(\hat{\beta}^{n, \mathcal{I}^d}\) are the regression coefficients from regressing additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}^d }\)-approximation of \(X_{\mathcal{I}^d }\) with \(k_{\mathcal{I}^d } = \text{rank}(\mathbb{E}[X_{\mathcal{I}^d}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}^d }\).
We use an essentially identical argument to that established in Non-Donor Set Baseline Consistency.
Lemma 20. We have that \[\left\langle \beta^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, 0}(d) \right\rangle = \left\langle \tilde{\beta}^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, 0}(d) \right\rangle\] with \(\tilde{\beta}^{n, \mathcal{I}^d} = VV^{\top}{\beta}^{n, \mathcal{I}^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}^d}]\).
Proof. It would suffice to prove that \[VV^{\top}\gamma_{\mathcal{I}^d, 0}(d) = \gamma_{\mathcal{I}^d, 0}(d),\] which is equivalent to \(\gamma_{\mathcal{I}^d, 0}(d)^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}^d}]\). To that end, recall for any \(j \in \mathcal{I}^d\) \[\begin{align} \gamma_{j, 0}(d) &= \langle \psi_j^0, w_d - w_{\tilde{0}}\rangle\\ &= \mathbb{E}\left[Y_{j, t_j^* }^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^*}\mkern-1.3mu}\mkern 1.3mu)}\right] - \mathbb{E}\left[Y_{j, t_j^*}^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^*-1}\mkern-1.3mu}\mkern 1.3mu, \underline{\tilde{0}^{t_j^* }})}\right]\\ &= \mathbb{E}\left[Y_{j, t_j^*} - Y_{j, t_j^*}^{(\mkern 1.3mu\overline{\mkern-1.3mu\tilde{0}\mkern-1.3mu}\mkern 1.3mu^{t_j^*})} \big| j \in \mathcal{I}^d\right]\\ &= \sum_{i = 1}^p (\xi_i^{(d,0)} - \alpha_i^{(0,0)})\cdot\mathbb{E}[(X_{\mathcal{I}^d})_{ij}|\mathcal{LF},j \in \mathcal{I}^d]. \end{align}\] The first two equalities follow from the definition of blips; the third follows from \(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^*}\mkern-1.3mu}\mkern 1.3mu = (\tilde{0}, \dots, \tilde{0}, d)\), where \(d\) occurs in the \(t_j^*\) index. The last equality is due to the second and third conclusions of Assumption 20 being applied to each term respectively. ◻
Lemma 20 allows us to write \[\begin{align} \hat{\gamma}_{n,0}(d) - \gamma_{n,0}(d) \mid \mathcal{LF}&= \left\langle \hat{\beta}^{n, \mathcal{I}^d}, \hat{\gamma}_{\mathcal{I}^d, 0}(d) \right\rangle - \left\langle \beta^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, 0}(d) \right\rangle\\ &= \underbrace{\langle \tilde{\beta}^{n,\mathcal{I}^d}, \eta_{\mathcal{I}^d}(d)\rangle}_{\text{Term 1a}} + \underbrace{\langle \Delta_{n, \mathcal{I}^d} , \eta_{\mathcal{I}^d}(d)\rangle}_{\text{Term 1b}} + \underbrace{\langle \Delta_{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, 0}(d)\rangle}_{\text{Term 1c}}, \end{align}\] where \(\eta_{\mathcal{I}^d}(d) = \hat{\gamma}_{\mathcal{I}^d, 0}(d) - \gamma_{\mathcal{I}^d, 0}(d)\) and \(\Delta_{n, \mathcal{I}^d} = \hat{\beta}^{n,\mathcal{I}^d} - \tilde{\beta}^{n,\mathcal{I}^d}\). Using the previously referenced argument and applying the appropriate version of Lemmas 6, 7, and 8 allows us to claim for \(n \notin \mathcal{I}^d\) \[\label{eq:non-donor-terminal-rate-invariant} \hat{\gamma}_{n,0}(d) - \gamma_{n,0}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{9/4}}{p^{1/4}} +k^{7/2}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\tag{118}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|\}\).
Terminal Blip Consistency: The donor and non-donor cases together allow us to conclude that for any \(n \in [N]\) \[\label{eq:terminal-rate-invariant} \hat{\gamma}_{n,0}(d) - \gamma_{n,0}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{9/4}}{p^{1/4}} +k^{7/2}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\tag{119}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|\}\).
3. Verifying Non-Terminal Blip Consistency:
For any unit \(n \in [N]\), treatment \(d \in [A]\), and \(t \in [1, \dots, T-1]\), consider the statement \(P_{d,n}(t)\): \[\begin{align} &\hat{\gamma}_{n, t}(d) - \gamma_{n, t}(d) \mid \mathcal{LF}\\ &= O_p\left(t\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{t}}{p^{1/4}} + k^{t}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right), \end{align}\] where \(\mathcal{F}= \{|\mathcal{I}^d|, |\mathcal{I}_T^0|, |\mathcal{I}_1^0|, (\mathcal{I}^{D_{n, t_n^* + q}})_{n \in [N], q\in[1, \dots, t]} \}\) with \(\pi_{\mathcal{I}} = \max\mathcal{F}, \alpha_{\mathcal{I}} = \min\mathcal{F}\).
We proceed by strong induction.
To that end, consider the base case \(t = 1\), i.e., proving \(P_{d,n}(1)\):
For any \(d \in [A]\):
Donor Set Consistency: Consider unit \(n \in \mathcal{I}^d\). Denote \(X_{\mathcal{I}^d \setminus n} = X_{:, \mathcal{I}^d \setminus n}\in \mathbb{R}^{p \times |\mathcal{I}^d \setminus n|}\). We know the blip admits the representation \[\begin{align} &\hat{\gamma}_{n,1}(d) - \gamma_{n,1}(d) \mid \mathcal{LF}= \underbrace{\left\langle \hat{\phi}^{n, \mathcal{I}^d}, Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right\rangle - \left\langle \phi^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \mid \mathcal{LF}] \right\rangle}_{\text{Term 1}}\\ &+ \underbrace{\left\langle \phi^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1}\mid \mathcal{LF}\right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right \rangle}_{\text{Term 2}}\\ &+ \underbrace{\left\langle \phi^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+1})\mid \mathcal{LF}\right \rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d}, \hat{\gamma}_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+1}) \right \rangle}_{\text{Term 3}}. \end{align}\] where \(\gamma_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+1}) = [(\gamma_{j, 0}(D_{j, t_j^*+1}))_{j \in \mathcal{I}^d\setminus n}]^{\top}\) and \(\hat{\phi}^{n, \mathcal{I}^d}\) are the regression coefficients from regressing additional covariates \(X_n \in \mathbb{R}^p\) on the rank \(k_{\mathcal{I}^d \setminus n}\)-approximation of \(X_{\mathcal{I}^d \setminus n}\) with \(k_{\mathcal{I}^d \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}^d \setminus n}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}^d \setminus n}\).
Bounding Term 1: We prove a similar row space result.
Lemma 21. We have for any \(t \in [T-1]\) \[\left\langle \phi^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} ]\right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} ] \right\rangle\] with \(\tilde{\phi}^{n, \mathcal{I}^d} = VV^{\top}\phi^{n, \mathcal{I}^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}^d\setminus n}]\).
Proof. It would suffice to prove that \[VV^{\top}\mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} ] = \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} ],\] which is equivalent to \(\mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} ]^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}^d \setminus n}]\). By Assumption 20 there exists \(\xi^{(d,t)}\) such that for any \(j \in \mathcal{I}^d \setminus n\) \[\mathbb{E}[Y_{j,t_j^* + t}|\mathcal{LF}, j \in \mathcal{I}^d \setminus n] = \sum_{i = 1}^p \xi^{(d,t)}_i \cdot \mathbb{E}[(X_{\mathcal{I}^d\setminus n})_{ij}|\mathcal{LF}, j \in \mathcal{I}^d \setminus n].\] This concludes the proof. ◻
Using Lemma 21 with \(t = 1\), we again apply the proof technique in [10] Theorem \(2\) (Appendix C) to show consistency of \[\begin{align} \label{eq:donor-non-terminal-consistency-term-1-invariant} \text{Term 1} &= \left\langle \hat{\phi}^{n, \mathcal{I}^d}, Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right\rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1}] \right\rangle\\ &= O_p \left( \sqrt{\log(p |\mathcal{I}^d|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}^d|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}^d|-1}} \right\} \right] \right).\nonumber \end{align}\tag{120}\]
Bounding Term 2:
Lemma 22. We have for any \(t \in [T-1]\) \[\left\langle \phi^{n, \mathcal{I}^d},b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t} \right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}^d},b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t}\right\rangle\] with \(\tilde{\phi}^{n, \mathcal{I}^d} = VV^{\top}\phi^{n, \mathcal{I}^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}^d\setminus n}]\).
Proof. It would suffice to prove that \[VV^{\top}b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t}= b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t},\] which is equivalent to \((b_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t})^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}^d \setminus n}]\). Applying the third conclusion of Assumption 20 we know for any \(j \in \mathcal{I}^d \setminus n\) \[b_{j,t_j^* +t } = \mathbb{E}\left[Y_{j, t_j^* + t}^{(\mkern 1.3mu\overline{\mkern-1.3mu\tilde{0}\mkern-1.3mu}\mkern 1.3mu^{t_j^* + t})}\big|\mathcal{LF}, j \in \mathcal{I}^d\right] = \sum_{i = 1}^p \alpha_i^{(0,t)}\cdot \mathbb{E}[(X_{\mathcal{I}^d})_{ij}\mid \mathcal{LF}, j \in \mathcal{I}^d \setminus n].\] This concludes the proof. ◻
Using Lemma 22 for \(t = 1\) we can write \[\left\langle \phi^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1}\right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right \rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right \rangle\] Next we negate the RHS and decompose as follows:23 \[\begin{align} \left\langle \tilde{\phi}^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right\rangle& - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right \rangle \\ &= \underbrace{\left\langle \tilde{\phi}^{n, \mathcal{I}^d}, \eta_{\mathcal{I}^d \setminus n}\right\rangle}_{\text{Term 1a}} + \underbrace{\left\langle \Delta_{n, \mathcal{I}^d}, \eta_{\mathcal{I}^d \setminus n}\right\rangle}_{\text{Term 1b}} + \underbrace{\left\langle \Delta_{n, \mathcal{I}^d} , b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1}\right\rangle}_{\text{Term 1c}}, \end{align}\] where \(\eta_{\mathcal{I}^d \setminus n} = \hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} - b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1}\) and \(\Delta_{n, \mathcal{I}^d} = \hat{\phi}^{n, \mathcal{I}^d} - \tilde{\phi}^{n, \mathcal{I}^d}\). Using the previously referenced argument by applying the appropriate version of Lemmas 6, 7, and 8 alongside Equation 114 for Terms 1a, 1b, and 1c respectively allows us to claim \[\begin{align} \label{eq:donor-non-terminal-consistency-term-2-invariant} \text{Term 2} &= \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right \rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+1} \right\rangle\\ &= O_p \left( \sqrt{\log(p \pi_{\mathcal{I}})} \left[ \frac{{k}^{7/4}}{p^{1/4}} + {k}^3 \max \left\{ \frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}} \right\} \right] \right),\nonumber \end{align}\tag{121}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|\}\).
Bounding Term 3:
Lemma 23. We have for any \(t \in [T-1]\) and \(\ell < t\) \[\left\langle \phi^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d\setminus n,\ell}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t - \ell})\right\rangle = \left\langle \tilde{\phi}^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d\setminus n,\ell}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t - \ell})\right\rangle\] with \(\tilde{\phi}^{n, \mathcal{I}^d} = VV^{\top}{\phi}^{n, \mathcal{I}^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{\mathcal{I}^d\setminus n}]\).
Proof. It suffices to prove that \[VV^{\top} \gamma_{\mathcal{I}^d\setminus n,\ell}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t - \ell})= \gamma_{\mathcal{I}^d\setminus n,\ell}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t - \ell}),\] which is equivalent to \((\gamma_{\mathcal{I}^d\setminus n,\ell}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t - \ell}))^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}^d \setminus n}]\). Notice that for any \(j \in \mathcal{I}^d \setminus n\)
\[\begin{align} \gamma_{j,\ell}(D_{j, t_j^*+t - \ell}) & = \langle \psi_j^{\ell}, w_{D_{j, t_j^* + t - \ell}} - w_{\tilde{0}}\rangle\\ &= \mathbb{E}\left[Y_{j, t_j^* + t}^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^* + t - \ell}\mkern-1.3mu}\mkern 1.3mu, \underline{\tilde{0}^{t_j^* + t - \ell+ 1}})}\right] - \mathbb{E}\left[Y_{j, t_j^* + t}^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^* + t - \ell-1}\mkern-1.3mu}\mkern 1.3mu, \underline{\tilde{0}^{t_j^* + t - \ell}})}\right]\\ &= \sum_{i = 1}^p (\alpha_i^{(0, t, \ell)} - \alpha_i^{(0, t, \ell)'}) \cdot \mathbb{E}[(X_{\mathcal{I}^d\setminus n})_{ij}\mid \mathcal{LF}, j \in \mathcal{I}^d \setminus n], \end{align}\] where we use the definition of blips in the first two equalities and both conclusions of Lemma 15 yield the last equality. ◻
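The first equality above is the factorization that drives the whole argument: each blip is an inner product between a unit-and-lag-specific latent factor \(\psi_j^{\ell}\) and the action factor difference \(w_d - w_{\tilde{0}}\). A toy instantiation of that structure (dimensions and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
r, A = 4, 3                      # latent dimension, number of actions

psi = rng.normal(size=r)         # psi_j^ell: factor of unit j at lag ell
W = rng.normal(size=(A, r))      # rows are action factors w_d; row 0 is w_0~

# gamma_{j,ell}(d) = <psi_j^ell, w_d - w_0~>; the control blip is zero
gamma = (W - W[0]) @ psi
assert np.isclose(gamma[0], 0.0)
```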
Using Lemma 23 for \(t = 1\) and \(\ell = 0\) we can write \[\begin{align} &\left\langle \phi^{n, \mathcal{I}^d},\gamma_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n, t^*_{\mathcal{I}^d}+1}) \right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{\gamma}_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n, t^*_{\mathcal{I}^d}+1}) \right \rangle\\ &= \left\langle \tilde{\phi}^{n, \mathcal{I}^d},\gamma_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n, t^*_{\mathcal{I}^d}+1}) \right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{\gamma}_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n, t^*_{\mathcal{I}^d}+1}) \right \rangle. \end{align}\] At this point, we can follow the earlier approach for Term \(2\) by negating, using the same decomposition, and applying the appropriate versions of Lemmas 6, 7, and 8 alongside Equation 119 to write
\[\begin{align} \text{Term }3 &= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{11/4}}{p^{1/4}} +k^{4}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right), \end{align}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|, (|\mathcal{I}^{D_{n,t^*_n + 1}}|)_{n \in [N]}\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|, (|\mathcal{I}^{D_{n,t^*_n + 1}}|)_{n \in [N]}\}\). Notice that this rate dominates those of Terms \(1\) and \(2\); as such, we also have for any \(n \in \mathcal{I}^d\) \[\begin{align} \label{eq:donor-base-blip-rate-invariant} \hat{\gamma}_{n,1}(d) - \gamma_{n,1}(d) \mid \mathcal{LF} &= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{11/4}}{p^{1/4}} +k^{4}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right), \end{align}\tag{122}\] with \(\pi_{\mathcal{I}}\) and \(\alpha_{\mathcal{I}}\) as above.
Non-Donor Set Consistency: Consider any \(t \in [T-1]\) and unit \(n \notin \mathcal{I}^d\). Denote \(X_{\mathcal{I}^d} = X_{:,\mathcal{I}^d} \in \mathbb{R}^{p \times |\mathcal{I}^d|}\). We know the blip effect admits the representation \[\hat{\gamma}_{n,t}(d) - \gamma_{n,t}(d) \mid \mathcal{LF}= \left\langle \hat{\beta}^{n, \mathcal{I}^d}, \hat{\gamma}_{\mathcal{I}^d, t}(d) \right\rangle - \left\langle \beta^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, t}(d) \right\rangle,\] where \(\hat{\beta}^{n, \mathcal{I}^d}\) are the regression coefficients from regressing the additional covariates \(X_n \in \mathbb{R}^p\) on the rank-\(k_{\mathcal{I}^d}\) approximation of \(X_{\mathcal{I}^d}\) with \(k_{\mathcal{I}^d} = \text{rank}(\mathbb{E}[X_{\mathcal{I}^d}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}^d}\).
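For concreteness, a minimal sketch of the PCR step described above: regress the target unit's covariates on the rank-\(k\) truncation of the donor covariate matrix, on synthetic low-rank data (function and variable names are ours, not from any released code):

```python
import numpy as np

def pcr_weights(X_donor, x_target, k):
    """Regress x_target (p,) on the rank-k truncation of X_donor (p, m):
    beta = V_k diag(1/s_k) U_k^T x_target."""
    U, s, Vt = np.linalg.svd(X_donor, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ x_target) / s[:k])

rng = np.random.default_rng(3)
p, m, k = 200, 15, 3
X_donor = rng.normal(size=(p, k)) @ rng.normal(size=(k, m))  # low-rank E[X]
x_target = X_donor @ rng.normal(size=m)                      # in donor span

beta_hat = pcr_weights(X_donor, x_target, k)
print("residual:", np.linalg.norm(X_donor @ beta_hat - x_target))  # ~ 0
```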
We use an identical argument to that established in Baseline Consistency – Non-Donor Set.
Lemma 24. We have that \[\left\langle \beta^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, t}(d) \right\rangle = \left\langle \tilde{\beta}^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, t}(d) \right\rangle\] with \(\tilde{\beta}^{n, \mathcal{I}^d} = VV^{\top}{\beta}^{n, \mathcal{I}^d}\), where \(V\) denotes the right singular vectors of \(\mathbb{E}[X_{ \mathcal{I}^d}]\).
Proof. It suffices to prove that \[VV^{\top} \gamma_{\mathcal{I}^d, t}(d) = \gamma_{\mathcal{I}^d, t}(d),\] which is equivalent to \(\gamma_{\mathcal{I}^d, t}(d)^{\top}\) being in the rowspace of \(\mathbb{E}[X_{\mathcal{I}^d}]\). To that end, recall for any \(j \in \mathcal{I}^d\), \[\begin{align} \gamma_{j, t}(d) &= \langle \psi_j^t, w_d - w_{\tilde{0}} \rangle\\ &= \mathbb{E}\left[Y_{j, t_j^* + t}^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^*}\mkern-1.3mu}\mkern 1.3mu, \underline{\tilde{0}^{t_j^* +1}})}\right] - \mathbb{E}\left[Y_{j, t_j^* + t}^{(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^*-1}\mkern-1.3mu}\mkern 1.3mu, \underline{\tilde{0}^{t_j^* }})}\right]\\ &= \mathbb{E}\left[Y_{j, t_j^* + t} - Y_{j, t_j^* +t}^{(\mkern 1.3mu\overline{\mkern-1.3mu\tilde{0}\mkern-1.3mu}\mkern 1.3mu^{t_j^* + t})} \big| j \in \mathcal{I}^d\right]\\ &= \sum_{i = 1}^p (\xi_i^{(d,t)} - \alpha_i^{(0,t)})\cdot\mathbb{E}[(X_{\mathcal{I}^d})_{ij}|\mathcal{LF},j \in \mathcal{I}^d]. \end{align}\] The first two equalities follow by the definition of blips, and the third follows from \(\mkern 1.3mu\overline{\mkern-1.3muD_{j, t_j^*+t}\mkern-1.3mu}\mkern 1.3mu = (\tilde{0}, \dots, \tilde{0}, d, \tilde{0}, \dots, \tilde{0})\), where \(d\) occurs in the \(t_j^*\) index. The last equality is due to the second and third conclusions of Assumption 20 being applied to each term respectively. ◻
Using the above framework and Lemma 24 with \(t = 1\) allows us to write \[\begin{align} \hat{\gamma}_{n,1}(d) - \gamma_{n,1}(d) \mid \mathcal{LF}&= \left\langle \hat{\beta}^{n, \mathcal{I}^d}, \hat{\gamma}_{\mathcal{I}^d, 1}(d) \right\rangle - \left\langle \tilde{\beta}^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, 1}(d) \right\rangle\\ &= \underbrace{\langle \tilde{\beta}^{n, \mathcal{I}^d}, \eta_{\mathcal{I}^d}(d)\rangle}_{\text{Term 1a}} + \underbrace{\langle \Delta_{n, \mathcal{I}^d} , \eta_{\mathcal{I}^d}(d)\rangle}_{\text{Term 1b}} + \underbrace{\langle \Delta_{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d, 1}(d)\rangle}_{\text{Term 1c}}, \end{align}\] where \(\eta_{\mathcal{I}^d}(d) = \hat{\gamma}_{\mathcal{I}^d, 1}(d) - \gamma_{\mathcal{I}^d, 1}(d)\) and \(\Delta_{n, \mathcal{I}^d} = \hat{\beta}^{n,\mathcal{I}^d} - \tilde{\beta}^{n,\mathcal{I}^d}\). Using the previously referenced argument, applying the appropriate versions of Lemmas 6, 7, and 8 allows us to claim for \(n \notin \mathcal{I}^d\) \[\label{eq:non-donor-second-to-terminal-rate-invariant} \hat{\gamma}_{n,1}(d) - \gamma_{n,1}(d) \mid \mathcal{LF}= O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{13/4}}{p^{1/4}} +k^{9/2}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\tag{123}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|, (|\mathcal{I}^{D_{n,t^*_n + 1}}|)_{n \in [N]}\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|, (|\mathcal{I}^{D_{n,t^*_n + 1}}|)_{n \in [N]}\}\). Combining Equations 122 and 123 yields the base case.
Inductive Step: We assume \(P_{d,n}(\ell)\) for \(\ell \in [1, \dots, t-1]\) and prove \(P_{d,n}(t)\).
For any \(d \in [A]\):
Donor Set Consistency: Consider unit \(n \in \mathcal{I}^d\). Denote \(X_{\mathcal{I}^d \setminus n} = X_{:,\mathcal{I}^d \setminus n} \in \mathbb{R}^{p \times |\mathcal{I}^d \setminus n|}\). We know the blip effect admits the representation \[\begin{align} \hat{\gamma}_{n,t}(d) &-\gamma_{n,t}(d) \mid \mathcal{LF}= \underbrace{\left\langle \hat{\phi}^{n, \mathcal{I}^d}, Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} \right\rangle - \left\langle \phi^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t}] \right\rangle}_{\text{Term 1}}\\ &+ \underbrace{\left\langle \phi^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} \right\rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} \right \rangle}_{\text{Term 2}}\\ &+ \underbrace{\left\langle \phi^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t}) \right \rangle - \left\langle \hat{\phi}^{n, \mathcal{I}^d}, \hat{\gamma}_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t}) \right \rangle}_{\text{Term 3}}\\ &+ \sum_{\ell = 1}^{t-1}\left( \underbrace{ \left\langle \phi^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d \setminus n, \ell}(D_{\mathcal{I}^d \setminus n,t^*_{\mathcal{I}^d} + t - \ell}) \right\rangle -\left\langle \hat{\phi}^{n, \mathcal{I}^d}, \hat{\gamma}_{\mathcal{I}^d \setminus n, \ell}(D_{\mathcal{I}^d \setminus n,t^*_{\mathcal{I}^d} + t - \ell}) \right\rangle}_{\text{Term } \ell}\right), \end{align}\] where \(\hat{\phi}^{n, \mathcal{I}^d}\) are the regression coefficients from regressing the additional covariates \(X_n \in \mathbb{R}^p\) on the rank-\(k_{\mathcal{I}^d \setminus n}\) approximation of \(X_{\mathcal{I}^d \setminus n}\) with \(k_{\mathcal{I}^d \setminus n} = \text{rank}(\mathbb{E}[X_{\mathcal{I}^d \setminus n}])\), i.e., doing PCR with parameter \(k_{\mathcal{I}^d \setminus n}\).
Bounding Term 1: We use Lemma 21, which holds for any \(t \in [T-1]\), to leverage the proof technique of [10, Theorem 2] (Appendix C) and show consistency: \[\begin{align} \label{eq:donor-non-terminal-general-consistency-term1-invariant} \text{Term 1} &=\left\langle \hat{\phi}^{n, \mathcal{I}^d}, Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} \right\rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}^d}, \mathbb{E}[Y_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t}] \right\rangle\\ &= O_p \left( \sqrt{\log(p |\mathcal{I}^d|)} \left[ \frac{{k}^{3/4}}{p^{1/4}} + {k}^2 \max \left\{ \frac{\sqrt{|\mathcal{I}^d|}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{|\mathcal{I}^d|-1}} \right\} \right] \right).\nonumber \end{align}\tag{124}\]
Bounding Term 2: Using the previously referenced argument for Term \(2\) in the base case, applying the appropriate versions of Lemmas 6, 7, and 8 alongside Equation 114 and Lemma 22, we know \[\begin{align} \label{eq:donor-baseline-inductive-step-invariant} \text{Term 2} &= \left\langle \hat{\phi}^{n, \mathcal{I}^d},\hat{b}_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} \right \rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}^d},b_{\mathcal{I}^d \setminus n, t^*_{\mathcal{I}^d}+t} \right\rangle \\ &= O_p \left( \sqrt{\log(p \pi_{\mathcal{I}})} \left[ \frac{{k}^{7/4}}{p^{1/4}} + {k}^3 \max \left\{ \frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{p}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}} \right\} \right] \right),\nonumber \end{align}\tag{125}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|\}\).
Bounding Term 3: We use the previously referenced argument for Term \(3\) in the base case, applying the appropriate versions of Lemmas 6, 7, and 8 alongside Equation 119 for any \(d \in \{D_{n,t_n^* + t}\}_{n \in [N]}\) and Lemma 23 with \(\ell = 0\), to write
\[\begin{align} \label{eq:donor-inductive-term3-invariant} \text{Term }3 &= \left\langle \hat{\phi}^{n, \mathcal{I}^d}, \hat{\gamma}_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t}) \right \rangle - \left\langle \tilde{\phi}^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d\setminus n,0}(D_{\mathcal{I}^d\setminus n,t^*_{\mathcal{I}^d}+t}) \right \rangle \\ &=O_p\left(\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{11/4}}{p^{1/4}} +k^{4}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\nonumber \end{align}\tag{126}\] where \(\pi_{\mathcal{I}} = \max\{|\mathcal{I}_T^0|,|\mathcal{I}^d|, (|\mathcal{I}^{D_{n,t^*_n + t}}|)_{n \in [N]}\}\) and \(\alpha_{\mathcal{I}} = \min\{|\mathcal{I}_1^0|,|\mathcal{I}^d|, (|\mathcal{I}^{D_{n,t^*_n + t}}|)_{n \in [N]}\}\).
Bounding Term \(\ell\) for \(\ell \in [1, \dots, t-1]\): For any such \(\ell\), we use an argument similar to that for Term \(3\) in the base case, applying the appropriate versions of Lemmas 6, 7, and 8 alongside the inductive hypothesis \(P_{d, n}(\ell)\) for all \(d \in \{D_{n, t^*_n + t - \ell}\}_{n \in [N]}\) and Lemma 23, to write
\[\begin{align} \label{eq:donor-terml-invariant} &\text{Term }\ell = \left\langle \phi^{n, \mathcal{I}^d}, \gamma_{\mathcal{I}^d \setminus n, \ell}(D_{\mathcal{I}^d \setminus n,t^*_{\mathcal{I}^d} + t - \ell}) \right\rangle -\left\langle \hat{\phi}^{n, \mathcal{I}^d}, \hat{\gamma}_{\mathcal{I}^d \setminus n, \ell}(D_{\mathcal{I}^d \setminus n,t^*_{\mathcal{I}^d} + t - \ell}) \right\rangle\\ &= O_p\left(\ell\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{\ell}}{p^{1/4}} + k^{\ell}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\nonumber \end{align}\tag{127}\] where \(\mathcal{F}= \{|\mathcal{I}_T^0|, |\mathcal{I}_1^0|, |\mathcal{I}^d| ,(|\mathcal{I}^{D_{n,t_n^* + q}}|)_{n\in [N],q \in [1, \dots, \ell]}, (|\mathcal{I}^{D_{n,t_n^* + t - \ell}}|)_{n\in [N]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{F}, \alpha_{\mathcal{I}} = \min\mathcal{F}\).
Note that Terms \(1\) and \(2\) are dominated by the summation; as such, it suffices to analyze the latter and Term \(3\). To that end, for the summation, \[\sum_{\ell = 1}^{t-1}\text{Term }\ell = O_p\left(\sum_{\ell = 1}^{t-1}\ell\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{\ell}}{p^{1/4}} + k^{\ell}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\]
where \(\mathcal{F}= \{|\mathcal{I}_T^0|, |\mathcal{I}_1^0|, |\mathcal{I}^d| ,(|\mathcal{I}^{D_{n,t_n^* + q}}|)_{n\in [N],q \in [1, \dots, t-1]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{F}, \alpha_{\mathcal{I}} = \min\mathcal{F}\). Notice that we bounded the smaller donor set cardinalities by the largest one, i.e., the one at \(\ell = t-1\). We analyze the time-dependent terms and denote \[C := \sqrt{\log(p\pi_{\mathcal{I}})}, \quad C' := \max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}.\] Upon substitution we have \[C \sum_{m = 1}^{t - 1} m \left( \frac{k^m}{p^{1/4}} + C'k^m \right) = C \left( \frac{1}{p^{1/4}} + C' \right) \sum_{m = 1}^{t - 1} m k^m.\]
We apply the geometric sum derivative trick: for \(k > 1\), \[\sum_{m=1}^{M} m k^m = \frac{k(1 - (M+1)k^M + Mk^{M+1})}{(1 - k)^2} = \Theta(Mk^{M+1}).\] Taking \(M = t - 1\), we conclude \[\sum_{\ell = 1}^{t-1}\text{Term }\ell = O_p\left(t\sqrt{\log(p\pi_{\mathcal{I}})}\left( \frac{k^{t}}{p^{1/4}} + k^{t}\max\left\{ \frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}} - 1}}, \frac{1}{\sqrt{p}} \right\} \right) \right).\]
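The closed form is the standard derivative of a geometric series, \(\sum_{m=1}^{M} m k^m = k \frac{d}{dk} \sum_{m=0}^{M} k^m\), and is exact for any \(k \neq 1\). A short exact-arithmetic check (illustrative only):

```python
from fractions import Fraction

def direct(k, M):
    return sum(m * k**m for m in range(1, M + 1))

def closed_form(k, M):
    # k * d/dk of the finite geometric sum; valid for k != 1
    return k * (1 - (M + 1) * k**M + M * k**(M + 1)) / (1 - k) ** 2

for k in (Fraction(2), Fraction(3), Fraction(7, 2)):
    for M in (1, 4, 9):
        assert direct(k, M) == closed_form(k, M)
```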
Combining the bound on the summation with Term \(3\) yields for any \(n \in \mathcal{I}^d\) \[\begin{align} \label{eq:inductive-step-donor-blip-rate-invariant} &\hat{\gamma}_{n,t}(d) - \gamma_{n,t}(d) \mid \mathcal{LF}\\ &= O_p\left(t\sqrt{\log(p\pi_{\mathcal{I}})}\left(\frac{k^{t}}{p^{1/4}} + k^{t}\max\left\{\frac{\sqrt{\pi_{\mathcal{I}}}}{p^{3/2}}, \frac{1}{\sqrt{\alpha_{\mathcal{I}}-1}}, \frac{1}{\sqrt{p}}\right\}\right)\right),\nonumber \end{align}\tag{128}\] where \(\mathcal{F}= \{|\mathcal{I}_T^0|, |\mathcal{I}_1^0|, |\mathcal{I}^d| ,(|\mathcal{I}^{D_{n,t_n^* + q}}|)_{n\in [N],q \in [1, \dots, t]}\}\) with \(\pi_{\mathcal{I}} = \max\mathcal{F}, \alpha_{\mathcal{I}} = \min\mathcal{F}\).
Non-Donor Set Consistency: Applying the Non-Donor Set Consistency argument written for the Base Case for general \(t\), specifically Lemma 24 for any \(t \in [T-1]\), proves \(P_{d,n}(t)\).
4. Verifying Target Causal Parameter Consistency: For any unit \(n \in [N]\) and \(\bar{d}^T \in [A]^T\) we recall the SBE-PCR estimator and the corresponding causal estimand. \[\hat{\mathbb{E}}\left[Y_{n,T}^{(\bar{d}^T)}\right] = \sum_{t = 1}^T \hat{\gamma}_{n,T-t}(d_t) + \hat{b}_{n,T} \quad \text{and} \quad \mathbb{E}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] = \sum_{t=1}^T \gamma_{n,T-t}(d_t) + b_{n,T} \mid \mathcal{LF}.\] The difference is exactly \[\hat{\mathbb{E}}\left[Y_{n,T}^{(\bar{d}^T)}\right] - \mathbb{E}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] = \left( \hat{b}_{n,T} - b_{n,T} \mid \mathcal{LF}\right) + \sum_{t=1}^T \left(\hat{\gamma}_{n,T-t}(d_t) - \gamma_{n,T-t}(d_t) \mid \mathcal{LF}\right).\]
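Operationally, the estimator above just sums the estimated blips along the counterfactual treatment path and adds the estimated baseline. A minimal sketch of that assembly step (array layout and names are our own, purely illustrative):

```python
import numpy as np

def sbe_pcr_counterfactual(gamma_hat, b_hat_T, d_bar):
    """E_hat[Y_{n,T}^{(d_bar^T)}] = sum_t gamma_hat_{n,T-t}(d_t) + b_hat_{n,T}.

    gamma_hat[lag, a] : estimated blip at lag `lag` for action `a`
    b_hat_T           : estimated baseline b_hat_{n,T}
    d_bar             : counterfactual path (d_1, ..., d_T)
    """
    T = len(d_bar)
    return sum(gamma_hat[T - t, d_bar[t - 1]] for t in range(1, T + 1)) + b_hat_T

rng = np.random.default_rng(4)
T, A = 5, 2
gamma_hat = rng.normal(size=(T, A))          # lags 0, ..., T-1
print(sbe_pcr_counterfactual(gamma_hat, b_hat_T=0.3, d_bar=(1, 0, 0, 1, 1)))
```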
We apply the known bound for each term: specifically, Equation 114, Equation 119 with \(d = d_T\), and \(P_{d_{t},n}(T-t)\) for every \(t \in [T-1]\). Once again we encounter the same geometric sum, which gives the desired result upon noting that the baseline rate is dominated by that of the sum.
We recall that for any unit \(n \in [N]\) and \(\bar{d}^T \in [A]^T\)
\[\mathbb{E}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] = \sum_{t=1}^T \gamma_{n,T-t}(d_t) + b_{n,T} \mid \mathcal{LF}= \sum_{t=1}^T \langle \psi_n^{T-t}, w_{d_t} - w_{0_t}\rangle + b_{n,T} \mid \mathcal{LF}.\]
Given Assumption 21, we know that \(\psi_{n}^{q+i} = 0\) for all \(i \in [T-q-1]\). As such, \[\mathbb{E}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] = \sum_{t=T-q}^T \langle \psi_n^{T-t}, w_{d_t} - w_{0_t}\rangle + b_{n,T} \mid \mathcal{LF}= \sum_{t=T-q}^T \gamma_{n,T-t}(d_t) + b_{n,T} \mid \mathcal{LF}.\]
We modify the SBE-PCR estimator accordingly: \[\hat{\mathbb{E}}\left[Y_{n,T}^{(\bar{d}^T)} \mid \mathcal{LF}\right] := \sum_{t = T-q}^T \hat{\gamma}_{n,T-t}(d_t) + \hat{b}_{n,T}.\]
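Under the \(q\)-lag restriction, the same assembly only needs the last \(q+1\) blips. A self-contained sketch of the modification (again with our own illustrative array layout):

```python
import numpy as np

def sbe_pcr_truncated(gamma_hat, b_hat_T, d_bar, q):
    """Truncated SBE-PCR sketch: psi_n^{lag} = 0 for lag > q (Assumption 21),
    so only treatments d_{T-q}, ..., d_T contribute to the counterfactual.
    gamma_hat[lag, a] holds the estimated blip at lag `lag` for action `a`."""
    T = len(d_bar)
    return sum(gamma_hat[T - t, d_bar[t - 1]] for t in range(T - q, T + 1)) + b_hat_T

rng = np.random.default_rng(5)
T, A, q = 5, 2, 2
gamma_hat = rng.normal(size=(T, A))
gamma_hat[q + 1:, :] = 0.0               # blips at lags beyond q vanish
print(sbe_pcr_truncated(gamma_hat, b_hat_T=0.3, d_bar=(1, 0, 0, 1, 1), q=q))
```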
Applying the analysis from the proof of Theorem 7 yields the desired result.
Vasilis Syrgkanis was supported by NSF Award IIS-2337916. Haeyeon Yoon was supported by the National Research Foundation of Korea (NRF) Grant 2024S1A5A8022044. We are grateful for the data provided by the Export-Import Bank of Korea, Korea Trade Insurance Corporation, Statistics Korea, and the Korea Statistics Promotion Institute. All the results have been reviewed to ensure that no confidential information is disclosed. All errors are ours.
The notation \(0_t\) is introduced to allow a general control action that is not necessarily “no treatment.”
We are slightly abusing notation, as the potential outcome \(Y_{n, t}^{(\bar{D}^\ell_n, \underline{d}^{\ell + 1})}\) is only a function of the first \(t - \ell\) components of \(\underline{d}^{\ell + 1}\), which is actually a vector of length \(T - \ell\).
Notice that, depending on whether unit \(n\) is in the donor set \(\mathcal{I}_t^d\), the size of our covariate matrix varies. This is intentional, in order to unify notation between units in donor sets and those not in donor sets, since \(\mathcal{I}_t^d \setminus n = \mathcal{I}_t^d\) if \(n \notin \mathcal{I}_t^d\).
Here the latent factors we condition upon include the feature vectors \(\{\rho_i\}_{i \in [p]}\). We also assume this for donor sets of the form \(\mathcal{I}_t^d \setminus n\).
Notice that \(\alpha_{\mathcal{I}} \leq \pi_{\mathcal{I}}\) by definition.
To be explicit, we take \(N \to \infty\) and \(p \to \infty\); with the additional assumption that each donor set grows at the same rate, there is a regime, i.e., a relationship between \(p\) and \(N\), in which the estimation error decays.
We implicitly assume that we have access to outcomes up to time step \(2T - 1\), which holds without loss of generality. To see why, consider \(t^*_n = T\) and \(t = T - 1\).
This annual survey provides detailed information on inputs, outputs, and trade activities of all firms with at least 50 employees and annual sales exceeding 300 million KRW (around 215K USD).
TFP is measured as value-added divided by \(K^{1/3}L^{2/3}\), where \(K\) denotes tangible capital stock and \(L\) the number of workers. Sales, tangible capital stock, value-added, total wage bill, and R&D expenditure are expressed in natural logarithms, with the underlying unit being million KRW (around 720 USD).
Capital intensity is defined as tangible capital stock per worker, and wage per worker is measured as the total wage bill divided by the number of workers.
The analysis with \(\bar{d}^T=(d,d,0,0,0)\), \((0,d,0,d,0)\), and \((0,0,0,d,d)\) produces a similar result, especially for insurance as support.
One support is counted as one unit (e.g., the cost of giving insurance or a loan is one, and giving both is two).
Pre-treatment averages of export share, sales, employment, tangible capital, value added, TFP, total wage bill, R&D expenditure, debt-to-asset ratio, liquidity ratio, and an indicator for FDI status, as well as parent-company affiliation, firm age, and industry dummies for above-average capital intensity and wage per worker.
This is equivalent to the column space of the right singular vectors of \(\mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n}]^{\top}\) being included in the column space of \(V\), or equivalently \(\mathbb{E}[Y_{\mathcal{I}_T^0 \setminus n}] \in \text{span}(\{v_1, \dots, v_{k_{\mathcal{I}_T^0 \setminus n}}\})\).
Notice that the analysis from [10] that resolved the donor unit analysis no longer applies since \(\eta_{\mathcal{I}_T^0}\) is not composed of independent \(\sigma^2\)-subgaussian random variables.
In order to get this final rate we made some assumptions on how \(|\mathcal{I}_T^0|\) and \(p\) grow relative to each other.
The negation is used primarily for convenience, as it makes no difference in the final rate.
To be precise, this theorem statement does not require any of the corresponding assumptions in Section 4.
Notice that \(\alpha_{\mathcal{I}} \leq \pi_{\mathcal{I}}\) by definition.
The vectorized baseline term is defined similarly to the outcome, as shown above.
The negation is used primarily for convenience, as it makes no difference in the final rate.
The negation is used primarily for convenience, as it makes no difference in the final rate.