September 17, 2025
In the classic quickest change detection problem, an observer performs only one experiment to monitor a stochastic process. This paper considers the case where, at each observation time, the decision-maker needs to choose between multiple experiments with different information qualities and costs. The goal is to minimize the worst-case average detection delay subject to false alarm and cost constraints. An algorithm called the 2E-CUSUM Algorithm has been developed to achieve this goal for the two-experiment case. Extensions to multiple-experiment designs are also studied, and 2E-CUSUM is extended accordingly. Data efficiency, where the observer has the choice not to perform an experiment, is explored as well. The proposed algorithms are analyzed and shown to be asymptotically optimal.
Controlled sensing, CUSUM, data-efficient, experiment design, multiple-experiment, quickest change detection, sampling control.
The problem of quickest change detection (QCD) involves a stochastic process whose statistical properties change at some point in time and which an observer (or decision-maker) wishes to detect as soon as possible. QCD algorithms have been developed for statistical process control, sensor network surveillance, computer network security, and network analysis, among other applications [1]–[4]. These algorithms optimize a metric on the detection delay, subject to a constraint on a metric on the rate of false alarms. We provide an overview of the relevant literature in Section 1.1.
The classical QCD problem involves observing outputs from one experiment that monitors the stochastic process under investigation. In this paper, we study QCD with experiment design where there are multiple experiments that are able to detect the change in the process. Specifically, we assume that when a change occurs, no matter which experiment is performed, the change can be detected with that experiment. The decision-maker can choose only one of the experiments during an observation time, and there is an associated cost with each experiment. We also study the case of data-efficiency, where there is a constraint on the fraction of time any experiment is used. There is extensive literature on sequential hypothesis testing and quickest change detection, particularly in the context of experimental design. We provide a brief overview of this literature in Section 1.1. To the best of our knowledge, the precise setup we study in this paper has not been investigated before.
As an example of our problem setup, consider a battery-operated intruder or anomaly detection system (e.g., an autonomous vehicle) with a motion sensor and a video sensor, where the latter gives more accurate information but is costlier to use than the former. In this setup, one can perform three experiments: 1) activating and observing just the motion sensor, 2) using only the video sensor, and 3) using both sensors at the same time. At any given time, the observer can choose to perform only one of the three experiments. Budget and energy-consumption constraints can limit the number of times an experiment is chosen, so the decision-maker has to carefully choose which experiment to perform, leading to a constraint on the fraction of time each sensor is used. In some situations, when the decision-maker believes that no change or anomaly can occur, it might decide not to employ any sensors, leading to energy efficiency or data efficiency. More generally, when there are a total of \(\ell\) sensors, there are \(2^{\ell}\) possible experiments one can perform, including the one where no experiment is chosen.
In this paper, we develop an algorithm or design a policy that can sequentially detect the change as soon as it occurs, while also deciding which experiment to perform at each time step. The policy avoids false alarms and also satisfies various constraints on the fraction of time each experiment is performed before the change occurs. We also show that this algorithm is second-order optimal, i.e., optimal within a constant of the best possible. While we discussed experiments involving sensors as examples, the mathematical discussion in the paper applies to an abstract concept of an experiment qualified by its Kullback-Leibler divergence.
We begin our mathematical discussion by providing a problem setup for the case of \(m\) experiments (Section 2). Before providing a solution and analysis for this general case, we first consider the special cases of two-experiment (Section 3) and three-experiment (Section 4) systems to understand the various issues involved. To minimize the worst-case average detection delay (WADD), the delay metric we use, in the two-experiment design case, we propose an algorithm that we call the Two-Experiment CUSUM (2E-CUSUM) Algorithm. A detection threshold is used to satisfy the average run length to false alarm (ARLFA) constraint, while a scaling factor and a truncated test are used to satisfy what we call the Pre-change Observation Ratio (POR) constraint, where the POR is the fraction of time an experiment is performed. The truncated test limits the number of times an experiment can be performed before the algorithm reverts to a higher-quality experiment. 2E-CUSUM is then extended to the three-experiment scenario. Motivated by the structure of the algorithm for three experiments, we then explore the data-efficient two-experiment design scenario (Section 5). We then extend this to the data-efficient three-experiment case. Finally, we use the insights from these simple cases to generalize to an arbitrary number of experiments (Section 6).
In the classical problem of QCD, there is only one experiment to be performed. Algorithms and optimality theories in this single-experiment setup are developed in [1]–[11].
The problem of sequential hypothesis testing with experiment design has been studied in [12]–[28]. A foundational idea in this type of analysis is that there are unknown parameters in the observation process, and the notion of the best experiment to perform depends on these unknown parameters. For example, in an intrusion detection problem, an unknown parameter could capture the true location of an intruder, and the optimal experiment would be to collect data near that location. The problem studied in our paper is a QCD problem, and we assume that the optimal experiment is known, i.e., there are no unknown parameters in the distributions of the observed data.
The problem of QCD with experiment design has been studied in [29]–[35]. In some of these papers [29], [30], there are unknown parameters in the post-change distribution, and the optimal experiment to perform depends on this unknown parameter. Also, there is no direct cost constraint on the experiments. In other works [31]–[35], the problem is studied in a multi-stream or multi-channel setup, where there is a constraint on the number of streams one can observe at a time. Our problem setup differs from those studied in these papers because we assume that there is a cost for each experiment, and there is a notion of the best experiment that can be performed. In this sense, there is an asymmetry in our problem that is absent from these papers.
The problem of QCD with experiment design, with cost constraints on the experiments, has been studied in [36]–[42] and [43]. The papers [36]–[42] study the problem where there is only one experiment, but there is a penalty associated with performing that experiment all the time, or a reward for data-efficiency or energy-efficiency. The present paper extends these works by allowing for multiple experiments. The nature of our results is similar in spirit to those in [36]–[42], but we provide a comprehensive study while addressing several issues that are encountered in such an extension.
The work in [43] extends the works in [36]–[42] by allowing for two experiments without resorting to data-efficiency. Our general algorithm for multiple experiments also applies to the two-experiment case. However, our two-experiment solution differs from that in [43] in several ways. First, in our algorithm, we begin processing data by using the experiment with higher quality and switch to the experiment with lower quality only when the data suggests we should do so. Second, our algorithm can be designed to satisfy every possible constraint value. Finally, our algorithm design is different and is amenable to extension to multiple experiments. We discuss our two-experiment solution in Section 3 and provide detailed justification for each of the design choices.
In this section, we formulate the problem of quickest change detection (QCD) with cost-constrained experiment design. Consider a process whose statistical properties change at time \(\nu\). At each time point, a decision-maker observes the process by performing one of \(m\) experiments, with experiment \(m\) having the highest information quality, experiment \(m - 1\) the next-highest, and so on, and experiment 1 having the lowest quality (to be made precise below). Let \(X_n^i\) be the observation at time \(n\) using experiment \(i\). The sequence of random variables \(\{ X_n^i \}\) has a probability density function \(f_0^i\) for \(n < \nu\), and \(f_1^i\) for \(n \geq \nu\). Moreover, the sequence \(\{ X_n^j \}\) has a higher information quality than \(\{ X_n^i \}\) for \(j > i\), as measured by their Kullback-Leibler divergence, i.e., \[\label{eq:KLDiverg_order} D(f_1^m \mathrel{\Vert} f_0^m) > D(f_1^{m - 1} \mathrel{\Vert} f_0^{m - 1}) > \dots > D(f_1^1 \mathrel{\Vert} f_0^1).\tag{1}\] As a consequence, in many practical situations, it is reasonable to assume that experiment \(j\) is costlier to perform than experiment \(i\), for \(j > i\).
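As a concrete illustration of the ordering in (1), suppose experiment \(i\) observes a unit-variance Gaussian whose mean shifts at the change point: \[f_0^i = \mathcal{N}(0, 1), \quad f_1^i = \mathcal{N}(\theta_i, 1), \quad D(f_1^i \mathrel{\Vert} f_0^i) = \frac{\theta_i^2}{2}.\] Choosing \(0 < \theta_1 < \theta_2 < \dots < \theta_m\) then yields the ordering in (1), with experiment \(m\) (the largest mean shift) the most informative. This Gaussian setup is only an illustrative example; the development below applies to general densities.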
At each time \(n \geq 0\), the decision-maker decides whether or not a change has occurred based on the information at time \(n\). If the information available suggests no change has occurred, the decision-maker decides which experiment to perform at time \(n + 1\). To formalize this, let \(S_n^i\) (“switch") be the indicator random variable such that \(S_n^i = 1\) if experiment \(i\) is used for decision-making, and \(S_n^i = 0\) otherwise. Also, let \[S_n = (S_n^1, \dots, S_n^m).\] The decision vector \(S_{n + 1}\) is a function of the information \(I_n\) available at time \(n\): \[\label{eq:Sn_def} S_{n+1} = \phi_n(I_n), \; n \geq 0.\tag{2}\] Here \(I_n = ( I_n^1, I_n^2, \dots, I_n^m )\) with \[I_n^i = \left[ S_1^i, \dots, S_n^i, (X_1^i)^{(S_1^i)}, \dots, (X_n^i)^{(S_n^i)} \right],\] and \(I_0^i = \varnothing\). For \(k = 1, \dots, n\), \((X_k^i)^{(S_k^i)}\) represents \(X_k^i\) if \(S_k^i = 1\); otherwise, \(X_k^i\) is not included in \(I_n^i\).
At any time \(n\), using the information \(I_n\) available at that time, the decision-maker decides whether a change has occurred using a stopping time \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont mE}}\). Therefore, a policy for the multiple-experiment QCD problem is \(\Psi = \{ \tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont mE}}, \phi_0, \dots, \phi_{\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont mE}} - 1} \}\).
Each experiment has a different cost, with higher-quality experiments possibly being costlier than lower-quality experiments. Thus, we seek control policies that meet a constraint on the fraction of time each experiment is performed before the change point. We propose the following observation cost metric, which we call the Pre-change Observation Ratio of experiment \(i\) (\(\mathrm{POR}_i\)): \[\label{eq:POR_def} \mathrm{POR}_i (\Psi) = \limsup_{n \to \infty} \mathsf{E}_n \left[ \frac{1}{n} \sum_{k = 1}^{n - 1} S_k^i\right] = \limsup_{n \to \infty} \mathsf{E}_\infty \left[ \frac{1}{n} \sum_{k = 1}^{n - 1} S_k^i\right].\tag{3}\] Clearly, \(\mathrm{POR}_i (\Psi) \leq 1\). If experiment \(i\) is performed all the time in a policy, then \(\mathrm{POR}_i (\Psi) = 1\). However, we avoid this case because this scenario would be too costly for the decision-maker. Hence, we consider the case where \(\mathrm{POR}_i (\Psi) < 1\). For example, a policy that deterministically cycles through one observation of experiment \(i\) followed by two observations of other experiments has \(\mathrm{POR}_i = 1/3\). We note that another way to define \(\mathrm{POR}_i\) is \[\limsup_{n \to \infty} \mathsf{E}_\infty \left[ \frac{1}{n} \sum_{k = 1}^{n - 1} S_k^i \; \Bigg\vert \; \tau \geq n \right].\] However, based on the analysis done in [36], [38], both expressions lead to similar analysis, with the unconditional metric being simpler to work with. The metric in (3) is computed under the scenario in which the observation process continues forever without stopping.
For the mean time to false alarm, also called the average run length to false alarm (ARLFA), we consider the metric used in [8] and [44], and constrain it to be at least a given constant \(\gamma\): \[\mathrm{ARLFA} (\Psi) = \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont mE}}] \geq \gamma.\]
For the detection delay, we consider Lorden’s criterion where we minimize the worst-case average detection delay (WADD) [8]: \[\mathrm{WADD} (\Psi) = \sup_{\nu \geq 1} \mathop{\mathrm{ess\,sup}}\mathsf{E}_\nu [(\tau - \nu + 1)^+ \mid I_{\nu - 1}].\]
Our optimization problem is a minimax formulation for the multiple-experiment QCD problem: \[\begin{align} \begin{aligned} \min_{\Psi} \quad & \mathrm{WADD} (\Psi) \\ \text{s.t. } \quad & \mathrm{ARLFA} (\Psi) \geq \gamma \\ & \mathrm{POR}_i (\Psi) \leq \beta_i \text{ for } i = 2, \dots, m \end{aligned} \label{eq:mE-qcd} \end{align}\tag{4}\] where \(\gamma \geq 0\), \(0 < \beta_m < 1\), and \(0 \leq \beta_i < 1\) for \(i = 2, \dots, m - 1\) are given constants. Note that \(\mathrm{POR}_1 (\Psi) \leq \beta_1\) where \(\displaystyle\beta_1 = 1 - \sum_{i = 2}^m \beta_i\).
Remark 1. In Section 5, we consider the case where \(\displaystyle\sum_{i = 1}^m \beta_i < 1\). In this instance, \(\displaystyle 1 - \sum_{i = 1}^m \beta_i\) is the fraction of time no experiment is performed. We call this the data-efficiency case of the multiple-experiment QCD problem.
The challenge is to come up with an asymptotically optimal algorithm that achieves the constraints on ARLFA and POR. The solution is given in Section 6. However, in order to appreciate and understand its analysis, we consider first the special cases of two- and three-experiment designs (and their data-efficiency cases).
In this section, we consider the special case of the two-experiment QCD problem and propose an algorithm to solve it. We then analyze the algorithm and show its asymptotic optimality. For simplicity of notation, we use variables \(X\) and \(Y\) to denote the observations from the two experiments.
Suppose a decision-maker wants to detect the change in a stochastic process by performing one of two experiments at any given time: experiment \(X\) or experiment \(Y\), with experiment \(Y\) having the higher information quality, i.e., KL-divergence. The pre- and post-change densities for \(X\) are given by \(f_0^X\) and \(f_1^X\), respectively. The corresponding densities for \(Y\) are \(f_0^Y\) and \(f_1^Y\).
We wish to minimize detection delay subject to an ARLFA constraint and constraints on the fraction of time each experiment is performed. Let \(\Psi\) be the policy for the two-experiment QCD. If \(\beta_Y\) is the fraction of time experiment \(Y\) is performed and \(\mathrm{POR}_Y\) is the POR of experiment \(Y\), then the two-experiment QCD problem is \[\begin{align} \begin{aligned} \min_{\Psi} \quad & \mathrm{WADD} (\Psi) \\ \text{s.t. } \quad & \mathrm{ARLFA} (\Psi) \geq \gamma \\ & \mathrm{POR}_Y (\Psi) \leq \beta_Y \end{aligned} \label{eq:2e-qcd} \end{align}\tag{5}\] where \(\gamma \geq 0\) and \(0 < \beta_Y < 1\) are given constants. Note that \(\mathrm{POR}_X (\Psi) \leq \beta_X\) where \(\beta_X = 1 - \beta_Y\).
The classical Cumulative Sum (CUSUM) Algorithm, proposed in [6], can be described as follows:
Algorithm 1: The CUSUM Algorithm \(\Psi_{\mathrm{C}}^X (A)\) for experiment \(X\).
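Since the algorithm listing is not reproduced here, the following minimal Python sketch shows the standard CUSUM recursion: the statistic \(C_n^X\) evolves as \(C_n^X = \max(0, C_{n-1}^X + \ell_X(X_n))\), where \(\ell_X(x) = \log(f_1^X(x)/f_0^X(x))\), and the test stops at \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^X(A) = \inf\{n \geq 1: C_n^X \geq A\}\). The Gaussian usage example is an illustrative assumption.

```python
import numpy as np

def cusum_stopping_time(observations, llr, A):
    """Classical CUSUM: C_n = max(0, C_{n-1} + llr(x_n)); stop when C_n >= A."""
    C = 0.0
    for n, x in enumerate(observations, start=1):
        C = max(0.0, C + llr(x))
        if C >= A:
            return n                      # tau_C(A): first crossing of A
    return None                           # threshold not crossed on this path

# Illustrative usage: f_0 = N(0,1), f_1 = N(theta,1), so llr(x) = theta*x - theta^2/2.
theta, A = 1.0, 4.0
rng = np.random.default_rng(1)
post_change_xs = rng.normal(theta, 1.0, size=1000)
tau = cusum_stopping_time(post_change_xs, lambda x: theta * x - theta**2 / 2, A)
```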
Remark 2. The corresponding variables for experiment \(Y\) are \(\Psi_{\mathrm{C}}^Y\), \(C_n^Y\), \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y\), \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y(A)\), and \(\ell_Y\).
We build on this algorithm and allow negative values for the statistic to develop what we call the Two-Experiment CUSUM (2E-CUSUM) Algorithm, which minimizes WADD for the two-experiment QCD problem. This approach of using negative values is inspired by the work in [36]. The algorithm proposed below needs two important parameters: a scale factor \(a_Y\) and the limit for a truncated test \(N_X\). We first discuss the algorithm and then the motivation behind its design, including comparisons with a similar algorithm developed in [43].
Algorithm 2: The 2E-CUSUM Algorithm \(\Psi_{\mathrm{2E}} (A, a_Y, N_X)\).
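Since the listing is not reproduced here, the following minimal Python sketch gives one reading of the 2E-CUSUM recursion: experiment \(Y\) drives the statistic \(D_n\) while it is nonnegative; an undershoot \(U_Y\) below zero is scaled by \(a_Y\), and experiment \(X\), reflected at the level \(-a_Y U_Y\), is used for at most \(N_X\) observations before the statistic is reset to zero and experiment \(Y\) resumes. The interfaces are illustrative, and an integer truncation \(N_X\) is assumed for simplicity (the designs in Table 1 allow non-integer values).

```python
def two_e_cusum(sample_Y, sample_X, llr_Y, llr_X, A, a_Y, N_X, max_steps=10**6):
    """Sketch of the 2E-CUSUM stopping rule Psi_2E(A, a_Y, N_X).

    sample_Y, sample_X : callables returning one observation of Y / X
    llr_Y, llr_X       : log-likelihood ratios log(f1/f0) for each experiment
    Returns (stopping time, list of experiments used at each step).
    """
    D, n, choices = 0.0, 0, []
    while n < max_steps:
        # Use the higher-quality experiment Y while the statistic is >= 0.
        n += 1
        choices.append('Y')
        D += llr_Y(sample_Y())                  # statistic may go negative
        if D >= A:
            return n, choices                   # stop: declare the change
        if D < 0:
            U_Y = -D                            # undershoot below zero
            floor = -a_Y * U_Y                  # scaled undershoot: new floor
            D = floor
            # Use the cheaper experiment X, reflected at the floor, at most N_X times.
            for _ in range(N_X):
                n += 1
                choices.append('X')
                D = max(D + llr_X(sample_X()), floor)
                if D >= 0:
                    break
            D = 0.0                             # reset; switch back to Y
    return None, choices
```

Setting \(N_X = 0\) skips the inner loop, recovering the CUSUM algorithm for experiment \(Y\), consistent with the special case noted in Remark 3 below.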
A typical evolution of the 2E-CUSUM algorithm is illustrated in Figure 3. Figure 4 shows the achievable \(\mathrm{POR}_Y\) values as a function of the scale factor \(a_Y\) and the truncation parameter \(N_X\); the \(\mathrm{POR}_Y\) value decreases rapidly as a function of \(a_Y\). This behavior arises because increasing the scale factor \(a_Y\) is equivalent to increasing the threshold for the CUSUM algorithm for \(X\) observations. Since the \(\mathrm{POR}_Y\) value is calculated under the pre-change hypothesis, the drift of the CUSUM for \(X\) is negative, and the time to reach the level \(0\) is exponential in \(a_Y U_Y\). The \(\mathrm{POR}_Y\) values were calculated using Monte Carlo simulations.
Remark 3. We now comment on the design choices in the 2E-CUSUM algorithm.
Special case: When \(N_X = 0\), the 2E-CUSUM algorithm reduces to the CUSUM algorithm for experiment \(Y\) (and experiment \(X\) is never performed).
Achieving high \(\mathrm{POR}_Y\) values: Unlike the algorithm proposed in [43], the 2E-CUSUM algorithm can be designed to achieve any \(\mathrm{POR}_Y\) value between \(0\) and \(1\). This is important in applications where a low value of delay is important, but experiment design is also desired. In these cases, one may choose a \(\mathrm{POR}_Y\) value between \(0.5\) and \(1.0\) without significantly affecting the delay. In short, the best trade-off is often achievable for higher \(\mathrm{POR}_Y\) values [36].
Starting the decision process with the \(Y\) experiment: 2E-CUSUM starts by choosing experiment \(Y\) first because we wish to use the better experiment to detect the change quickly if it occurs early. If the change occurs at a later time, then POR constraints imply that it will not matter which experiment is performed first. This is in contrast with the algorithm in [43], which starts with experiment \(X\).
In this section, we analyze the 2E-CUSUM algorithm and prove its optimality. We define some new notations to be used in the theorem. Define \[\sigma^Y (A) = \inf \{ n \geq 0: D_n < 0 \text{ without ever crossing } A \}.\] We let \(\sigma^Y (\infty) = \sigma^Y\), i.e., the first time the statistic falls below 0. We also define (recall Algorithm 1) \[\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^X (A, N_X) = \min\{\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^X(A), N_X\}.\] Also, recall from the definition of the 2E-CUSUM algorithm that \(U_Y\) is the variable we use for the undershoot of the CUSUM statistic for experiment \(Y\), when it falls below zero for the first time: \(U_Y = -D_{\sigma^Y}\). Figure 5 illustrates these notations.
Theorem 1. For the 2E-CUSUM Algorithm, let \(0 < D(f_1^X \mathrel{\Vert} f_0^X) < D(f_1^Y \mathrel{\Vert} f_0^Y) < \infty\).
For any fixed \(a_Y > 0\) and \(N_X > 0\), with \(A = \log \gamma\), \[\mathrm{ARLFA}(\Psi_{\mathrm{2E}}) \geq \gamma.\]
For fixed values of \(A\), \(a_Y > 0\), and \(N_X > 0\), \[\begin{align} & \mathrm{POR}_Y (\Psi_{\mathrm{2E}} (A, a_Y, N_X)) = \frac{\mathsf{E}_\infty [\sigma^Y]}{\mathsf{E}_\infty [\sigma^Y] + \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^X (a_Y U_Y, N_X)]}. \end{align}\] For any choice of \(A\), there exist \(a_Y^* > 0\) and \(N_X^* > 0\) that meet any given POR constraint \(\beta_Y \in [0,1]\) with equality: \[\mathrm{POR}_Y (\Psi_{\mathrm{2E}} (A, a_Y^*, N_X^*)) = \beta_Y.\]
For fixed values of \(a_Y > 0\) and \(N_X > 0\), and for each \(A\), we have \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}) = \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}] + N_X.\] Consequently, we have \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}) = \mathrm{WADD}(\Psi_{\mathrm{C}}^Y) + K_2,\] where \(\Psi_{\mathrm{C}}^Y\) is the CUSUM algorithm for experiment \(Y\) (see Remark 2), and \(K_2\) is a constant that is not a function of \(A\).
Proof.
It is well known that \[\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y (A)] \geq e^A,\] i.e., the expected length of time it takes for the CUSUM statistic (for \(X\), \(Y\), or any other experiment) to cross the threshold \(A\), under the pre-change model, is lower-bounded by \(e^A\) [8]. Because of the independence of observations, the sojourns of the statistic \(D_n\) below zero, and the time spent conducting experiment \(X\), only add to the time it takes to hit the threshold \(A\). Thus, \[\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}] \geq \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}} (S_n = 1 \;\; \forall n)] = \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y (A)] \geq e^A.\] Setting \(A = \log \gamma\), we get \[\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}] \geq e^{\log \gamma} = \gamma.\]
This result follows from the renewal reward theorem because every time the statistic of the 2E-CUSUM algorithm, \(D_n\), goes above zero from below, it is reset to zero. This creates a renewal cycle with the reward being the time spent by the statistic \(D_n\) above \(0\). Under the pre-change distribution, the expected number of observations of experiment \(Y\) is \(\mathsf{E}_\infty [\sigma^Y]\). Now, at time \(\sigma^Y\), the statistic is scaled to the value \(-a_Y U_Y\), so the CUSUM for \(X\) must climb \(a_Y U_Y\) to return to zero. Hence, the expected number of observations of experiment \(X\) is \(\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^X (a_Y U_Y, N_X)]\). Therefore, \[\mathrm{POR}_Y (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}) = \frac{\mathsf{E}_\infty [\sigma^Y]}{\mathsf{E}_\infty [\sigma^Y] + \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^X (a_Y U_Y, N_X)]}.\] Moreover, \[\mathrm{POR}_X (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}) = 1 - \mathrm{POR}_Y (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}).\] Finally, note that as \(N_X \to 0\), \[\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^X (a_Y U_Y, N_X)] \to 0, \quad \text{and} \quad \mathrm{POR}_Y \to 1.\] Also, as \(N_X \to \infty\) and \(a_Y \to \infty\), \[\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^X (a_Y U_Y, N_X)] \to \infty, \quad \text{and} \quad \mathrm{POR}_Y \to 0.\]
Suppose the change occurs at time \(\nu=2\). The essential supremum operation in Lorden’s delay metric allows the observation at time \(1\) to be such that the undershoot \(U_Y \to \infty\). Then the only way for the statistic \(D_n\) to go above zero would be to exhaust all \(N_X\) samples designated for \(X\). Similar arguments for \(\nu=3, 4, \dots\) give \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}) = \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}] + N_X = \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y] + K + N_X = \mathrm{WADD}(\Psi_{\mathrm{C}}^Y) + K + N_X,\] where \(K\) is a constant that is not a function of the threshold \(A\). The second equality follows because after the change occurs, and after a finite number of bounded excursions below \(0\), the statistic \(D_n\) never goes below zero again [36].
◻
The above theorem implies strong optimality properties of the 2E-CUSUM algorithm.
Corollary 1. Under the conditions of the theorem, suppose we set \(A = \log \gamma\) to satisfy \[\mathrm{ARLFA}(\Psi_{\mathrm{2E}}) \geq \gamma.\] Also, we set \(a_Y\) and \(N_X\) to values that satisfy \[\mathrm{POR}_Y (\Psi_{\mathrm{2E}}) \leq \beta_Y.\] Then \[\mathrm{WADD}(\Psi_{\mathrm{2E}} (A, a_Y, N_X)) \sim \mathrm{WADD}(\Psi_{\mathrm{C}}^Y) \sim \frac{\log \gamma}{D(f_1^Y \mathrel{\Vert} f_0^Y)}, \quad \gamma \to \infty.\] This shows that the 2E-CUSUM algorithm is asymptotically optimal for the two-experiment QCD problem stated in (5).
Proof. The asymptotic performance follows from the performance of the CUSUM algorithm and the theorem statement. For the statement on asymptotic optimality, we note that if experiment \(X\) is used for a fixed and positive fraction of time after the change, all the way to stopping, then the information number accumulated over time will be a convex combination of the KL-divergences for \(X\) and \(Y\). According to the general theory of change detection developed in [10], the delay performance will then be inversely proportional to this convex combination, resulting in a higher delay. Thus, any alternative solution to this problem must eventually rely on experiment \(Y\) alone, resulting in the same asymptotic delay as the 2E-CUSUM algorithm. ◻
In this subsection, we provide delay-false alarm trade-off curves for the 2E-CUSUM algorithm computed using Monte Carlo simulations. For ARLFA, each simulation of an algorithm is run using only pre-change distributions and, once the algorithm ends, the termination time is recorded. ARLFA is the average of the simulated termination times. WADD is computed in the same manner but using only post-change distributions. To get the value of the WADD metric, we add \(N_X\) to the computed average termination times. These values are reported in Figure 6 for different choices of the POR constraint. As can be seen from the figure, as the POR value goes to \(1\), the performance of the 2E-CUSUM algorithm approaches that of the CUSUM algorithm using experiment \(Y\).
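This procedure can be summarized in a short sketch (using the two_e_cusum sketch given earlier; the Gaussian parameters and run counts are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_X, theta_Y = 0.5, 1.0            # illustrative post-change mean shifts
A, a_Y, N_X = 4.0, 1.0, 5              # illustrative design parameters

def llr(theta):                        # log(f1/f0) for N(0,1) -> N(theta,1)
    return lambda x: theta * x - theta**2 / 2

def termination_time(post_change):
    mY = theta_Y if post_change else 0.0
    mX = theta_X if post_change else 0.0
    t, _ = two_e_cusum(lambda: rng.normal(mY, 1.0), lambda: rng.normal(mX, 1.0),
                       llr(theta_Y), llr(theta_X), A, a_Y, N_X)
    return t

arlfa_est = np.mean([termination_time(False) for _ in range(500)])  # pre-change runs
wadd_est = np.mean([termination_time(True) for _ in range(500)]) + N_X  # per Theorem 1
```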
We also compare the performance of our algorithm with an alternative way to solve the two-experiment problem, which we call the Random Switch Scheme (RSS). To evaluate the performance of 2E-CUSUM against that of CUSUM and RSS, we plot their WADD vs \(\log (\mathrm{ARLFA})\) graphs for different values of the detection threshold \(A\). The RSS works as follows. The statistic \(D_n\) starts at 0 and evolves according to CUSUM using observations from either experiment \(Y\) or experiment \(X\). The decision-maker performs experiment \(Y\) first. After updating \(D_n\), a (possibly biased) coin is tossed: if it lands on heads, experiment \(Y\) is performed next; otherwise, experiment \(X\) is performed. The process continues until \(D_n \geq A\), at which point the algorithm stops and the decision-maker declares that a change has occurred. Figure 7 shows the WADD vs \(\log (\mathrm{ARLFA})\) graphs of 2E-CUSUM, CUSUM, and RSS. To allow a proper comparison of 2E-CUSUM and RSS, we set their parameters to correspond to the same POR levels of \(\mathrm{POR}_Y = \mathrm{POR}_X = 0.50\). Among the three algorithms, since CUSUM uses the more informative observations all the time, it has the lowest detection delay. We also observe that 2E-CUSUM outperforms RSS, as evidenced by the former's lower WADD.
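A minimal sketch of RSS, under the same illustrative interfaces as before:

```python
def random_switch_scheme(sample_Y, sample_X, llr_Y, llr_X, A, p_Y, rng):
    """Sketch of the Random Switch Scheme (RSS) baseline.

    After each CUSUM update, a coin with P(heads) = p_Y selects the next
    experiment (heads -> Y, tails -> X). Pre-change, POR_Y is roughly p_Y,
    so p_Y = 0.5 matches the POR levels used in the comparison above.
    """
    D, n, use_Y = 0.0, 0, True                  # start with experiment Y
    while True:
        n += 1
        obs = sample_Y() if use_Y else sample_X()
        D = max(0.0, D + (llr_Y if use_Y else llr_X)(obs))   # CUSUM update
        if D >= A:
            return n                            # stop: declare the change
        use_Y = rng.random() < p_Y              # toss the (possibly biased) coin
```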
In this section, we consider the case where the decision-maker can choose between three experiments \(X\), \(Y\), or \(Z\), where experiment \(Z\) has the highest information quality while experiment \(X\) has the lowest, i.e., \[D(f_1^Z \mathrel{\Vert} f_0^Z) > D(f_1^Y \mathrel{\Vert} f_0^Y) > D(f_1^X \mathrel{\Vert} f_0^X).\] We first formulate the three-experiment QCD problem. We then present the Three-Experiment CUSUM (3E-CUSUM) Algorithm, which solves the problem and is an extension of the 2E-CUSUM algorithm. Finally, the mathematical analysis of the algorithm and its optimality is presented.
Motivated by the design and optimality of the 2E-CUSUM algorithm, the algorithm for the three-experiment QCD problem starts with the CUSUM statistic for experiment \(Z\), and when the statistic goes below zero, it switches to the next most informative experiment \(Y\). At this point, we need to address, among other issues, how to switch to \(X\), and, more importantly, how to truncate the use of \(Y\) and \(X\) precisely so that the POR constraints can be met with equality. At the same time, we need to make sure that the WADD of the stopping rule is finite. The goal of this section is to address these issues in a precise manner.
The three-experiment QCD problem is similar to the two-experiment case, except we impose an additional constraint on the fraction of time experiment \(Y\) is performed. By extending the definitions of the variables defined in the previous section, our optimization problem is a minimax formulation for the three-experiment QCD problem: \[\begin{align} \begin{aligned} \min_{\Psi} \quad & \mathrm{WADD} (\Psi) \\ \text{s.t. } \quad & \mathrm{ARLFA} (\Psi) \geq \gamma \\ & \mathrm{POR}_Z (\Psi) \leq \beta_Z \\ & \mathrm{POR}_Y (\Psi) \leq \beta_Y \end{aligned} \label{eq:3e-qcd} \end{align}\tag{6}\] where \(\gamma \geq 0\), \(0 < \beta_Z < 1\), and \(0 \leq \beta_Y < 1\) are given constants. Note that \(\mathrm{POR}_X (\Psi) \leq \beta_X\) where \(\beta_X = 1 - \beta_Y - \beta_Z\).
We now propose the 3E-CUSUM Algorithm. We use the important notation \[\Psi_{\mathrm{2E}} (b, b+A, a_Y, N_X),\] which is the 2E-CUSUM algorithm starting at \(b\). Here, the switch from \(Y\) to \(X\) occurs at \(b\) (instead of zero), and the stopping threshold is \(b+A\). When \(N_X=0\), this is the CUSUM algorithm for \(Y\) starting at and reflected at \(b\). The 2E-CUSUM algorithm defined in Algorithm 2 is \(\Psi_{\mathrm{2E}} (0, A, a_Y, N_X)\), i.e., the case \(b=0\). We also use \[\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(b, b+A, a_Y, N_X)\] to denote the stopping rule of \(\Psi_{\mathrm{2E}} (b, b+A, a_Y, N_X)\).
Next, we define a version of the truncated 2E-CUSUM algorithm. We note that there are many ways to truncate a test, including the classical approach of truncating at a fixed time. Our contribution here is to identify a specific way to introduce truncation that can be generalized to multiple experiments. The notation we use for the truncated 2E-CUSUM test is \[\Psi_{\mathrm{2E}} (b, b+A, a_Y, N_X, N_Y).\] This algorithm is identical to the 2E-CUSUM algorithm except that at most \(N_Y\) observations of the costlier experiment \(Y\) are allowed. If the \(N_Y\)th (last) observation of \(Y\) has been used and the statistic \(D_n\) goes below zero, the switch to experiment \(X\) still occurs. Experiment \(X\) is then performed until the statistic rises above \(0\) or \(N_X\) observations are used, at which point the test stops. If \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(b, b+A, a_Y, N_X, N_Y)\) is the stopping time for this truncated test, then, since each of the \(N_Y\) observations of \(Y\) can trigger at most \(N_X\) observations of \(X\), \[\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(b, b+A, a_Y, N_X, N_Y) \leq N_Y + N_Y N_X.\] We now discuss the 3E-CUSUM algorithm.
Algorithm 8: The 3E-CUSUM Algorithm \(\Psi_{\mathrm{3E}} (A, a_Y, a_Z, N_X, N_Y)\).
For completeness, we describe the 3E-CUSUM algorithm in words. In this algorithm, the statistic \(D_n\) starts at 0, and the decision-maker performs experiment \(Z\). Thus, initially, the statistic \(D_n\) evolves according to the CUSUM statistic from experiment \(Z\) until either it crosses the threshold \(A\) or it falls below \(0\). If it crosses \(A\), we stop and declare the change. If the statistic falls below \(0\), we stop using experiment \(Z\) and switch to experiment \(Y\).
The undershoot, say \(U_Z\), of the statistic \(D_n\) when it goes below \(0\) is scaled by a factor of \(a_Z > 0\). This scaled value of the undershoot \(a_Z U_Z\) becomes the new level zero for a CUSUM statistic update using experiment \(Y\). The statistic \(D_n\) is updated using the CUSUM statistic for \(Y\), until either it goes above zero or goes below the new negative level \(a_Z U_Z\). If the statistic goes above \(0\), we reset it to \(0\) and switch back to experiment \(Z\). At this point, the process renews itself.
If while updating \(D_n\) using the CUSUM update for \(Y\), the statistic goes below \(a_Z U_Z\) (without ever becoming positive), then we switch to experiment \(X\). The undershoot, say \(U_Y\), at this time is scaled by \(a_Y\), and \(a_Z U_Z + a_Y U_Y\) becomes the new level \(0\) for the CUSUM statistic using experiment \(X\).
The update with experiment \(X\) is continued, with statistic \(D_n\) reflected at \(a_Z U_Z + a_Y U_Y\), until the statistic either goes above \(a_Z U_Z\) or consumes \(N_X\) observations. In both cases, we switch back to \(Y\).
Every time we start using the data from experiment \(Y\), we start a counter at \(N_Y\) and count down each time we use an observation of \(Y\). While switching between \(Y\) and \(X\), if we consume \(N_Y\) observations of \(Y\), we switch back to \(Z\). We note, however, that when using the \(N_Y\)th observation, we follow the strategy employed by the truncated 2E-CUSUM algorithm, discussed before Algorithm 8.
When \(N_X = 0\), 3E-CUSUM simply becomes 2E-CUSUM for experiments \(Y\) and \(Z\) (and experiment \(X\) is never chosen). On the other hand, when \(N_Y = 0\), 3E-CUSUM reduces to CUSUM for experiment \(Z\).
We now show through simulations that we can design the 3E-CUSUM algorithm to achieve various choices of constraints on the fraction of time each experiment is performed. This fact is also established analytically in the next subsection using the renewal reward theorem to analyze the POR metrics.
We used Monte Carlo simulations to compute the PORs of 3E-CUSUM. Figure 10 and Table 1 show that various combinations of PORs can be achieved using 3E-CUSUM. Note that when \(\beta_Y = 0\), then \(\beta_X = 0\), i.e., if experiment \(Y\) cannot be performed, then experiment \(X\) cannot be chosen either. When \(\beta_X = 0\), then \(\beta_Y \geq 0\), and experiment \(Y\) can be performed even if experiment \(X\) is not utilized.
Target | | | Parameters | | | | Output | | |
---|---|---|---|---|---|---|---|---|---|
\(\beta_X\) | \(\beta_Y\) | \(\beta_Z\) | \(a_Y\) | \(a_Z\) | \(N_X\) | \(N_Y\) | \(\mathrm{POR}_X\) | \(\mathrm{POR}_Y\) | \(\mathrm{POR}_Z\)
0.00 | 0.99 | 0.01 | 1.00 | 100.00 | 0.00 | 200.00 | 0.0000 | 0.9904 | 0.0096 |
0.00 | 0.90 | 0.10 | 1.00 | 10.00 | 0.00 | 19.00 | 0.0000 | 0.9022 | 0.0978 |
0.00 | 0.80 | 0.20 | 1.00 | 1.00 | 0.00 | 13.50 | 0.0000 | 0.8029 | 0.1971 |
0.00 | 0.70 | 0.30 | 1.00 | 1.00 | 0.00 | 5.60 | 0.0000 | 0.6992 | 0.3008 |
0.00 | 0.60 | 0.40 | 1.00 | 1.00 | 0.00 | 3.30 | 0.0000 | 0.6006 | 0.3994 |
0.00 | 0.50 | 0.50 | 1.00 | 1.00 | 0.00 | 2.00 | 0.0000 | 0.5030 | 0.4970 |
0.00 | 0.40 | 0.60 | 1.00 | 1.00 | 0.00 | 1.30 | 0.0000 | 0.4031 | 0.5969 |
0.00 | 0.30 | 0.70 | 1.00 | 1.00 | 0.00 | 0.80 | 0.0000 | 0.2953 | 0.7047 |
0.00 | 0.20 | 0.80 | 1.00 | 1.00 | 0.00 | 0.46 | 0.0000 | 0.1959 | 0.8041 |
0.00 | 0.10 | 0.90 | 1.00 | 1.00 | 0.00 | 0.21 | 0.0000 | 0.0997 | 0.9003 |
0.00 | 0.00 | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 0.0000 | 0.0000 | 1.0000 |
0.10 | 0.01 | 0.89 | 1.00 | 1.00 | 29.50 | 0.03 | 0.0996 | 0.0144 | 0.8860 |
0.20 | 0.01 | 0.79 | 10.00 | 1.00 | 42.00 | 0.02 | 0.2003 | 0.0088 | 0.7910 |
0.30 | 0.01 | 0.69 | 10.00 | 1.00 | 75.00 | 0.02 | 0.3007 | 0.0075 | 0.6918 |
0.40 | 0.01 | 0.59 | 20.00 | 1.00 | 74.00 | 0.03 | 0.3976 | 0.0095 | 0.5929 |
0.50 | 0.01 | 0.49 | 20.00 | 1.00 | 100.10 | 0.03 | 0.5002 | 0.0082 | 0.4916 |
0.60 | 0.01 | 0.39 | 20.00 | 1.00 | 155.00 | 0.03 | 0.6034 | 0.0062 | 0.3903 |
0.70 | 0.01 | 0.29 | 20.00 | 1.00 | 199.00 | 0.04 | 0.7050 | 0.0062 | 0.2889 |
0.80 | 0.01 | 0.19 | 20.00 | 1.00 | 262.00 | 0.05 | 0.8009 | 0.0053 | 0.1939 |
0.90 | 0.01 | 0.09 | 30.00 | 1.00 | 300.10 | 0.11 | 0.9035 | 0.0053 | 0.0912 |
0.10 | 0.89 | 0.01 | 1.00 | 60.00 | 0.25 | 200.00 | 0.0966 | 0.8947 | 0.0087 |
0.20 | 0.79 | 0.01 | 1.00 | 60.00 | 0.60 | 200.00 | 0.2044 | 0.7881 | 0.0076 |
0.30 | 0.69 | 0.01 | 1.00 | 60.00 | 1.00 | 100.00 | 0.3001 | 0.6864 | 0.0134 |
0.40 | 0.59 | 0.01 | 1.00 | 60.00 | 1.65 | 100.00 | 0.4011 | 0.5875 | 0.0115 |
0.50 | 0.49 | 0.01 | 1.00 | 60.00 | 2.60 | 100.00 | 0.4993 | 0.4913 | 0.0094 |
0.60 | 0.39 | 0.01 | 1.00 | 60.00 | 4.40 | 100.00 | 0.6010 | 0.3914 | 0.0076 |
0.70 | 0.29 | 0.01 | 1.00 | 60.00 | 8.30 | 100.00 | 0.6998 | 0.2945 | 0.0057 |
0.80 | 0.19 | 0.01 | 10.00 | 60.00 | 10.00 | 70.00 | 0.8023 | 0.1923 | 0.0054 |
0.90 | 0.09 | 0.01 | 10.00 | 60.00 | 24.00 | 30.00 | 0.9028 | 0.0915 | 0.0057 |
0.99 | 0.005 | 0.005 | 100.00 | 60.00 | 310.00 | 1.90 | 0.9893 | 0.0054 | 0.0053 |
0.20 | 0.600 | 0.200 | 1.00 | 1.00 | 0.57 | 7.70 | 0.1974 | 0.5984 | 0.2042 |
0.20 | 0.400 | 0.400 | 1.00 | 1.00 | 0.80 | 2.00 | 0.1978 | 0.4023 | 0.3999 |
0.20 | 0.200 | 0.600 | 1.00 | 1.00 | 1.55 | 0.64 | 0.1967 | 0.2033 | 0.6000 |
0.40 | 0.400 | 0.200 | 1.00 | 1.00 | 1.80 | 4.50 | 0.4007 | 0.3982 | 0.2010 |
0.40 | 0.200 | 0.400 | 1.00 | 1.00 | 3.55 | 0.95 | 0.3973 | 0.2031 | 0.3997 |
0.60 | 0.200 | 0.200 | 2.00 | 1.00 | 5.50 | 2.00 | 0.5986 | 0.2009 | 0.2005 |
We now show that we can set the parameters of 3E-CUSUM to achieve any desired ARLFA and POR constraints. We also establish its asymptotic optimality. Define \[\sigma = \sigma^Z (A) = \inf \{ n \geq 0: D_n < 0 \text{ without ever crossing } A \}.\] We let \(\sigma^Z (\infty) = \sigma^Z\), i.e., the first time the statistic falls below 0. Note that in the 3E-CUSUM algorithm, the statistic \(D_n\) is updated using log-likelihood ratios for experiment \(Z\) while it is above \(0\). When \(D_\sigma < 0\), its magnitude \(-D_\sigma\) is the undershoot. Also, we recall that \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y (A)\) is the CUSUM stopping rule with observations from experiment \(Y\). Figure 11 illustrates these notations.
Theorem 2. For the 3E-CUSUM Algorithm, let \[0 < D(f_1^X \mathrel{\Vert} f_0^X) < D(f_1^Y \mathrel{\Vert} f_0^Y) < D(f_1^Z \mathrel{\Vert} f_0^Z) < \infty.\]
For any fixed \(a_Y > 0\), \(a_Z > 0\), \(N_X > 0\), and \(N_Y > 0\), with \(A = \log \gamma\), \[\mathrm{ARLFA}(\Psi_{\mathrm{3E}}) \geq \gamma.\]
For fixed values of \(A\), \(a_Y > 0\), \(a_Z > 0\), \(N_X > 0\), and \(N_Y > 0\), \[\begin{align} \mathrm{POR}_Z (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) &= \frac{\mathsf{E}_\infty [\sigma^Z]}{\mathsf{E}_\infty [\sigma^Z] + \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(a_Z D_{\sigma^Z}, 0, a_Y, N_X, N_Y)]} \\ \mathrm{POR}_Y (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) & = \frac{\mathsf{E}_\infty [\min \{\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y (-a_Z D_{\sigma^Z}), N_Y\}]}{\mathsf{E}_\infty [\sigma^Z] + \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(a_Z D_{\sigma^Z}, 0, a_Y, N_X, N_Y)]}. \end{align}\] The POR expressions are not a function of \(A\) (because of the way they are defined). Thus, for any \(A\), we can always choose values for \(a_Z\), \(a_Y\), \(N_X\), and \(N_Y\) to meet any given POR constraints \(\beta_Y\) and \(\beta_Z\): \[\mathrm{POR}_Z (\Psi_{\mathrm{3E}} (A, a_Y, a_Z, N_X, N_Y)) \leq \beta_Z\] and \[\mathrm{POR}_Y (\Psi_{\mathrm{3E}} (A, a_Y, a_Z, N_X, N_Y)) \leq \beta_Y.\]
For fixed values of \(a_Y > 0\), \(a_Z > 0\), \(N_X > 0\), and \(N_Y > 0\), and for each \(A\), \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) \leq \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}] + N_Y + N_Y N_X.\] Consequently, \[\mathrm{WADD}(\Psi_{\mathrm{3E}}) \leq \mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Z) + K_3,\] where \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Z\) is the CUSUM algorithm for the experiment \(Z\) and \(K_3\) is a constant that is not a function of threshold \(A\).
Proof.
The false alarm analysis for 3E-CUSUM is similar to that of 2E-CUSUM. Specifically, for any value of threshold \(A\), we have \[\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}] \geq \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Z (A)] \geq e^A.\]
The expression for \(\mathrm{POR}_Z\) follows again from the renewal reward theorem because under the pre-change distribution, the expected number of observations of experiment \(Z\) is \(\mathsf{E}_\infty [\sigma^Z]\) (without stopping). Also, the average time spent below zero is given by the time for the truncated 2E-CUSUM algorithm to stop and bring the statistic above \(0\): \(\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(a_Z D_{\sigma^Z}, 0, a_Y, N_X, N_Y)]\). This gives \[\mathrm{POR}_Z (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) = \frac{\mathsf{E}_\infty [\sigma^Z]}{\mathsf{E}_\infty [\sigma^Z] + \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(a_Z D_{\sigma^Z}, 0, a_Y, N_X, N_Y)]}.\]
When \(N_Y = 0\), i.e., experiment \(Y\) is never performed (and consequently, experiment \(X\) is never performed either), \(\mathrm{POR}_Z = 1\) and experiment \(Z\) is utilized all the time. To decrease the fraction of time experiment \(Z\) is chosen, we increase \(N_Y\) and/or \(N_X\) (and also \(a_Y\) and \(a_Z\)), the number of times experiments \(Y\) and \(X\), respectively, can be performed. Specifically, note that as \(a_Z \to \infty\) and \(N_Y \to \infty\), we have \(\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(a_Z D_{\sigma^Z}, 0, a_Y, N_X, N_Y)] \to \infty\) and, thus, \(\mathrm{POR}_Z \to 0\).
The expression for \(\mathrm{POR}_Y\) also follows from the renewal reward theorem because, in the renewal cycles starting with the statistic \(D_n=0\) and ending when \(D_n\) is again zero after being negative, the time spent using experiment \(Y\) is equal to the time for a truncated (at \(N_Y\)) CUSUM statistic for experiment \(Y\) to reach \(0\) after starting and getting reflected at the negative level \(a_Z U_Z\). This time is statistically equal to \(\min \{\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y (-a_Z D_{\sigma^Z}), N_Y\}\). Thus, \[\mathrm{POR}_Y (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) = \frac{\mathsf{E}_\infty [\min \{\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y (-a_Z D_{\sigma^Z}), N_Y\}]}{\mathsf{E}_\infty [\sigma^Z] + \mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(a_Z D_{\sigma^Z}, 0, a_Y, N_X, N_Y)]}.\] To control \(\mathrm{POR}_Y\), first note that as \(N_X \to 0\), \[\mathrm{POR}_Y (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) \quad \to \quad \frac{\mathsf{E}_\infty [\min \{\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y (-a_Z D_{\sigma^Z}), N_Y\}]}{\mathsf{E}_\infty [\sigma^Z] + \mathsf{E}_\infty [\min \{\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y (-a_Z D_{\sigma^Z}), N_Y\}]}.\] Thus, one way to increase \(\mathrm{POR}_Y\) is to bring \(N_X\) to \(0\) and increase \(a_Z\) and \(N_Y\). On the other hand, to decrease \(\mathrm{POR}_Y (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}})\), we increase \(N_X\) and \(a_Y\).
Finally, we have \[\mathrm{POR}_X (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) = 1 - \mathrm{POR}_Y (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) - \mathrm{POR}_Z (\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}).\] Note that the expected number of observations of experiment \(X\) in a cycle can be written as \[\mathsf{E}_\infty [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 2E}}(a_Z D_{\sigma^Z}, 0, a_Y, N_X, N_Y)] - \mathsf{E}_\infty [\min \{\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y (-a_Z D_{\sigma^Z}), N_Y\}].\]
For the delay analysis of the 3E-CUSUM algorithm, suppose the change occurs at time \(\nu=2\). Then the worst possible realization from experiment \(Z\) at time \(1\) would create a potentially infinite undershoot \(U_Z\). Thus, the only way for the statistic \(D_n\) to come back to \(0\) would be to exhaust all the \(N_Y\) samples from experiment \(Y\). Starting at time \(2\), each time an observation from experiment \(Y\) is chosen, up to \(N_X\) observations can be collected from experiment \(X\). Thus, one can bound the number of samples before \(D_n\) comes above zero by \[N_Y + N_Y N_X.\] Once the statistic \(D_n\) comes above \(0\), it is reset to \(0\), and the delay from that point onwards is \[\mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}].\] A similar argument for other change points gives us \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) \leq \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}] + N_Y + N_Y N_X.\] Using an argument similar to that used for the 2E-CUSUM algorithm, we have \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) \leq \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}] + N_Y + N_Y N_X = \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Z] + K + N_Y + N_Y N_X = \mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Z) + K + N_Y + N_Y N_X.\] Here \(K\) is a constant that does not depend on the threshold \(A\).
◻
The above theorem implies strong optimality properties of the 3E-CUSUM algorithm.
Corollary 2. Under the conditions of the theorem, for a given \(\gamma\), set \(A = \log \gamma\). Then for any choice of \(a_Y\), \(a_Z\), \(N_X\), and \(N_Y\), \[\mathrm{ARLFA}(\Psi_{\mathrm{3E}}) \geq \gamma.\] For a given \(\beta_Y\) and \(\beta_Z\), we can find parameters \(a_Y\), \(a_Z\), \(N_X\), and \(N_Y\) such that for all \(A\) (specifically, \(A = \log \gamma\)), \[\mathrm{POR}_Z (\Psi_{\mathrm{3E}}) \leq \beta_Z,\] and \[\mathrm{POR}_Y (\Psi_{\mathrm{3E}}) \leq \beta_Y.\] Moreover, for such a choice of these parameters, \[\begin{align} \mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont 3E}}) \sim \mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Z) \sim \frac{\log \gamma}{D(f_1^Z \mathrel{\Vert} f_0^Z)}, \quad \gamma \to \infty. \end{align}\]
Proof. The proof follows from Theorem 2 and the property of the CUSUM algorithm. ◻
In this section, we address the issue of data-efficiency in the two- and three-experiment QCD problems, wherein the decision-maker has the additional option not to perform any experiment.
In the data-efficient two-experiment QCD problem, the decision-maker has two experiments it can perform: the higher-quality experiment \(Y\) and the lower-quality experiment \(X\). It also has the option not to perform any experiment. The problem formulation is similar to the original two-experiment QCD problem: \[\begin{align} \begin{aligned} \min_{\Psi} \quad & \mathrm{WADD} (\Psi) \\ \text{s.t. } \quad & \mathrm{ARLFA} (\Psi) \geq \gamma \\ & \mathrm{POR}_Y (\Psi) \leq \beta_Y \\ & \mathrm{POR}_X (\Psi) \leq \beta_X \end{aligned} \label{eq:de2e-qcd} \end{align}\tag{7}\] where \(\gamma \geq 0\), \(0 \leq \beta_Y \leq 1\), and \(0 \leq \beta_X \leq 1\) are given constants. If \(\beta_Y + \beta_X = 1\), then this problem reduces to the problem in equation (5). Thus, in this problem, we assume that \(\beta_Y + \beta_X < 1\). The gap \(\beta_0 = 1 - \beta_Y - \beta_X\) is the constraint on the fraction of time no experiment (neither \(Y\) nor \(X\)) is performed.
Our solution to this problem is similar to the three-experiment design, except that we replace the experiment with the lowest information quality by the option of performing no experiment. Instead of running a truncated test at the lowest level with a third experiment, a truncated test is run at this level with no experiment at all, and the statistic is increased deterministically using a design parameter \(\mu\).
We now propose the Data-Efficient 2E-CUSUM (DE2E-CUSUM) Algorithm (\(\Psi_{\mathrm{DE2E}} (A, a_X, a_Y, N_0, N_X, \mu)\)). This algorithm is essentially the 3E-CUSUM Algorithm in which the lowest level corresponds to no experiment: where 3E-CUSUM would perform its lowest-quality experiment, the decision-maker performs no experiment at all. The statistic is then updated deterministically by adding the design parameter \(\mu\) at every time step in which no experiment is performed; the parameter \(\mu\) thus controls how quickly the statistic climbs back and, hence, the fraction of time no experiment is performed. A typical evolution of DE2E-CUSUM is illustrated in Figure 12.
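A minimal sketch of this deterministic update (the names and the resumption level are illustrative assumptions):

```python
def sleep_phase(D, resume_level, mu, N_0):
    """'No experiment' phase of DE2E-CUSUM (illustrative sketch).

    From the scaled-undershoot level D, the statistic climbs by the design
    parameter mu per idle time step (no observation is taken) until it
    reaches resume_level, where experiment X resumes, or N_0 idle steps pass.
    """
    steps = 0
    while D < resume_level and steps < N_0:
        D += mu                # deterministic increase; no experiment performed
        steps += 1
    return D, steps
```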
We now state a theorem, similar to the previous ones, which states that we can set the parameters of DE2E-CUSUM to achieve any desired ARLFA and POR constraints. Also, we show that the proposed algorithm is asymptotically optimal.
Theorem 3. For the DE2E-CUSUM Algorithm, let \(0 < D(f_0^X \mathrel{\Vert} f_1^X) < D(f_0^Y \mathrel{\Vert} f_1^Y) < \infty.\)
For any fixed \(a_X > 0\), \(a_Y > 0\), \(N_0 > 0\), and \(N_X > 0\), with \(A = \log \gamma\), \(\mathrm{ARLFA}(\Psi_{\mathrm{DE2E}}) \geq \gamma.\)
For any \(A\), we can choose the parameters such that \[\mathrm{POR}_Y (\Psi_{\mathrm{DE2E}} (A, a_X, a_Y, N_0, N_X, \mu)) \leq \beta_Y, \quad \mathrm{POR}_X (\Psi_{\mathrm{DE2E}} (A, a_X, a_Y, N_0, N_X, \mu)) \leq \beta_X.\] If \(\beta_X + \beta_Y < 1\), then the data-efficiency constraint is also satisfied.
For fixed values of \(a_X > 0\), \(a_Y > 0\), \(N_0 > 0\), \(N_X > 0\), and \(\mu\); and for each \(A\), \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont DE2E}}) \leq \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont DE2E}}] + N_X + N_X N_0.\] Consequently, \[\mathrm{WADD}(\Psi_{\mathrm{DE2E}}) \leq \mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y) + K_2,\] where \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Y\) is the CUSUM algorithm for the experiment \(Y\) and \(K_2\) is a constant that is not a function of threshold \(A\).
Hence, the DE2E-CUSUM Algorithm is asymptotically optimal.
Proof. The proof is similar to the proof of Theorem 2. ◻
The data-efficient three-experiment design starts by performing the highest-quality experiment and then the mid-quality one. After performing the lowest-quality experiment, the decision-maker has the option of not performing any experiment.
Extending the data-efficient two-experiment QCD problem, our optimization problem is a minimax formulation for the data-efficient three-experiment QCD problem: \[\begin{align} \begin{aligned} \min_{\Psi} \quad & \mathrm{WADD} (\Psi) \\ \text{s.t. } \quad & \mathrm{ARLFA} (\Psi) \geq \gamma \\ & \mathrm{POR}_Z (\Psi) \leq \beta_Z \\ & \mathrm{POR}_Y (\Psi) \leq \beta_Y \\ & \mathrm{POR}_X (\Psi) \leq \beta_X \end{aligned} \label{eq:de3e-qcd} \end{align}\tag{8}\] where \(\gamma \geq 0\), \(0 < \beta_Z < 1\), \(0 \leq \beta_Y < 1\), and \(0 \leq \beta_X < 1\) are given constants. If \(\beta_Z + \beta_Y + \beta_X = 1\), then this problem reduces to the problem in equation (6). Thus, in this problem, we assume that \(\beta_Z + \beta_Y + \beta_X < 1\). The gap \(\beta_0 = 1 - \beta_Z - \beta_Y - \beta_X\) is the constraint on the fraction of time no experiment is performed.
The algorithm that minimizes detection delay for this data-efficient case is an extension of DE2E-CUSUM with four levels: the first three levels correspond to the three experiments, while the fourth corresponds to the case where no experiment is performed.
We now propose the Data-Efficient 3E-CUSUM (DE3E-CUSUM) Algorithm (\(\Psi_{\mathrm{DE3E}} (A, a_X, a_Y, a_Z, N_0, N_X, N_Y, \mu)\)). This algorithm starts by using experiment \(Z\), the highest-quality experiment, while the statistic is above 0. Once the statistic falls below 0, we scale the undershoot and use this scaled value as the starting point of the truncated DE2E-CUSUM Algorithm for experiments \(Y\) and \(X\), in which the number of observations of experiment \(Y\) is limited by \(N_Y\) (recall that, in the DE2E-CUSUM Algorithm described in Section 5.1, we do not limit the number of times experiment \(Y\) is observed). This truncated algorithm runs until the statistic crosses 0 from below, at which point we start using experiment \(Z\) again. A typical evolution of DE3E-CUSUM is illustrated in Figure 13.
We now state a theorem on the design and optimality of the DE3E-CUSUM algorithm.
Theorem 4.
For the DE3E-CUSUM Algorithm, let \(0 < D(f_0^X \mathrel{\Vert} f_1^X) < D(f_0^Y \mathrel{\Vert} f_1^Y) < D(f_0^Z \mathrel{\Vert} f_1^Z) < \infty\).
For any fixed \(a_X > 0\), \(a_Y > 0\), \(a_Z > 0\), \(N_0 > 0\), \(N_X > 0\), \(N_Y > 0\), and \(\mu\), with \(A = \log \gamma\), \(\mathrm{ARLFA}(\Psi_{\mathrm{DE3E}}) \geq \gamma\).
For any \(A\), we can choose the parameters such that \[\mathrm{POR}_Z (\Psi_{\mathrm{DE3E}} (A, a_X, a_Y, a_Z, N_0, N_X, N_Y, \mu)) \leq \beta_Z,\] \[\mathrm{POR}_Y (\Psi_{\mathrm{DE3E}} (A, a_X, a_Y, a_Z, N_0, N_X, N_Y, \mu)) \leq \beta_Y\] and \[\mathrm{POR}_X (\Psi_{\mathrm{DE3E}} (A, a_X, a_Y, a_Z, N_0, N_X, N_Y, \mu)) \leq \beta_X.\] If \(\beta_X + \beta_Y + \beta_Z < 1\), then the data-efficiency constraint is also satisfied.
For fixed values of \(a_X > 0\), \(a_Y > 0\), \(a_Z > 0\), \(N_0 > 0\), \(N_X > 0\), \(N_Y > 0\), and \(\mu\); and for each \(A\), \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont DE3E}}) \leq \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont DE3E}}] + N_Y + N_Y N_X + N_Y N_X N_0.\] Consequently, \[\mathrm{WADD}(\Psi_{\mathrm{DE3E}}) \leq \mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Z) + K_3,\] where \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^Z\) is the CUSUM algorithm for the experiment \(Z\) and \(K_3\) is a constant that is not a function of threshold \(A\).
Hence, the DE3E-CUSUM Algorithm is asymptotically optimal.
Proof. The proof is similar to the proof of Theorem 2. ◻
At this point, we are finally ready to solve the multiple-experiment QCD problem formulated in Section 2. The analyses of the two- and three-experiment cases, including their data-efficient versions, will help us generalize our proposed algorithms to the \(m\)-experiment case.
Observe that the 3E-CUSUM Algorithm starts by using experiment \(Z\), the highest-quality experiment, while the statistic is above 0. Once it falls below 0, we scale the undershoot and use this scaled value as the starting point of the truncated 2E-CUSUM Algorithm for experiments \(Y\) and \(X\). Once the statistic crosses 0 from below, we use experiment \(Z\) again.
In a similar vein, we can propose the 4E-CUSUM Algorithm as follows: while the statistic is above 0, the highest-quality experiment is performed. When the statistic falls below 0, the scaled undershoot is used as the starting point of the truncated 3E-CUSUM Algorithm (similar to the 3E-CUSUM Algorithm described in Section 4.1, but with a limited number of allowed observations for its highest-quality experiment) for the remaining three experiments. Once the statistic goes above 0, the highest-quality experiment is used again.
Following this recursive description, we propose the \(m\)E-CUSUM Algorithm (\(\Psi_{\mathrm{mE}} (A, a_2, a_3, \dots, a_m, N_1, N_2, \dots, N_{m - 1})\)). This algorithm starts by using experiment \(m\), the highest-quality experiment, while the statistic is above 0. Once it falls below 0, we scale the undershoot and use this value as the starting point of the truncated \((m - 1)\)E-CUSUM Algorithm for the remaining \(m - 1\) experiments wherein the number of observations of experiment \(m - 1\) is limited by \(N_{m - 1}\). Once the statistic crosses 0 from below, we perform experiment \(m\) again.
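The recursion can be summarized in the following Python sketch (assuming \(m \geq 2\)). It reflects one reading of the scheme: experiment \(i\) receives a fresh budget of \(N_i\) observations per visit, recurses to experiment \(i - 1\) on undershoots (with the undershoot scaled by \(a_i\)), and an exhausted budget is treated as a return to the level above. These bookkeeping choices, the integer budgets, and the interfaces are illustrative assumptions, not a reference implementation.

```python
def climb(i, floor, target, a, N, llrs, samplers, clock):
    """Experiment i lifts the statistic from `floor` back up to `target`,
    recursing to experiment i-1 whenever the statistic undershoots `floor`.
    For i = 1 (the lowest-quality experiment) the statistic is simply
    reflected at `floor`. At most N[i] observations of experiment i are used;
    an exhausted budget is treated as a return to `target` (truncation).
    """
    D = floor
    for _ in range(N[i]):
        clock[0] += 1                           # one time step / observation
        D += llrs[i](samplers[i]())
        if D >= target:
            return                              # back at the level above
        if D < floor:
            if i > 1:
                U = floor - D                   # undershoot at level i
                climb(i - 1, floor - a[i] * U, floor, a, N, llrs, samplers, clock)
            D = floor                           # resume experiment i at its level

def mE_cusum(m, A, a, N, llrs, samplers):
    """Sketch of the mE-CUSUM stopping rule; returns the stopping time.
    a = {2,...,m: scale factors}, N = {1,...,m-1: truncation budgets},
    llrs / samplers = {1,...,m: per-experiment callables}.
    """
    clock, D = [0], 0.0
    while True:
        clock[0] += 1
        D += llrs[m](samplers[m]())             # experiment m while D >= 0
        if D >= A:
            return clock[0]                     # stop: declare the change
        if D < 0:
            # a[m] * D is the scaled (negative) undershoot level.
            climb(m - 1, a[m] * D, 0.0, a, N, llrs, samplers, clock)
            D = 0.0                             # renewal: experiment m again
```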
Theorem 5.
For the \(m\)E-CUSUM Algorithm, let \(0 < D(f_1^j \mathrel{\Vert} f_0^j) < D(f_1^i \mathrel{\Vert} f_0^i) < \infty\) for \(i > j\), where \(i, j = 1, 2, \dots, m\).
For any fixed \(a_i > 0\) (\(i = 2, 3, \dots, m\)) and \(N_j > 0\) (\(j = 1, 2, \dots, m - 1\)), with \(A = \log \gamma\), \(\mathrm{ARLFA}(\Psi_{\mathrm{mE}}) \geq \gamma\).
For any \(A\), we can choose the parameters such that \(\mathrm{POR}_i (\Psi_{\mathrm{mE}} (A, a_2, a_3, \dots, a_m, N_1, N_2, \dots, N_{m - 1})) \leq \beta_i\) for \(i = 2, 3, \dots, m\).
For fixed values of \(a_i > 0\) (\(i = 2, 3, \dots, m\)) and \(N_j > 0\) (\(j = 1, 2, \dots, m - 1\)), and for each \(A\), \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont mE}}) \leq \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont mE}}] + \sum_{r = 1}^{m - 1} \prod_{k = m - r}^{m - 1} N_k.\] Consequently, \[\mathrm{WADD}(\Psi_{\mathrm{mE}}) \leq \mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^m) + K_m,\] where \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^m\) is the CUSUM algorithm for the experiment \(m\) and \(K_m\) is a constant that is not a function of threshold \(A\).
Hence, the \(m\)E-CUSUM Algorithm is asymptotically optimal.
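As a quick consistency check on the delay bound, for \(m = 3\) (with \(N_1 = N_X\) and \(N_2 = N_Y\)) the sum evaluates to \[\sum_{r = 1}^{2} \prod_{k = 3 - r}^{2} N_k = N_2 + N_1 N_2 = N_Y + N_Y N_X,\] which recovers the bound of Theorem 2.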
In the data-efficient \(m\)-experiment QCD problem, the decision-maker has \(m\) experiments it can perform. It also has the option not to perform any experiment. The problem formulation is similar to the original \(m\)-experiment QCD problem: \[\begin{align} \begin{aligned} \min_{\Psi} \quad & \mathrm{WADD} (\Psi) \\ \text{s.t. } \quad & \mathrm{ARLFA} (\Psi) \geq \gamma \\ & \mathrm{POR}_i (\Psi) \leq \beta_i \text{ for } i = 1, \dots, m \end{aligned} \label{eq:DEmE-qcd} \end{align}\tag{9}\] where \(\gamma \geq 0\), \(0 < \beta_m < 1\), and \(0 \leq \beta_i < 1\) for \(i = 1, \dots, m - 1\) are given constants. If \(\displaystyle\sum_{i = 1}^m \beta_i = 1\), then this problem reduces to the problem in equation (4). Thus, in this problem, we assume that \[\sum_{i = 1}^m \beta_i < 1.\] The gap \[\beta_0 = 1 - \sum_{i = 1}^m \beta_i\] is the constraint on the fraction of time no experiment is performed.
Recall that we described the DE3E-CUSUM Algorithm as starting with the highest-quality experiment while the statistic is above 0. When it falls below 0, the truncated DE2E-CUSUM Algorithm (which starts at the scaled undershoot) is used until the statistic goes above 0, at which point the highest-quality experiment is used again. Similarly, while the statistic is above 0 in the DE4E-CUSUM Algorithm, the highest-quality experiment is performed, and the truncated DE3E-CUSUM Algorithm, starting at the scaled undershoot, is utilized when the statistic falls below 0. Once the statistic crosses 0 from below, we perform the highest-quality experiment again.
The Data-Efficient \(m\)E-CUSUM (DE\(m\)E-CUSUM) Algorithm (\(\Psi_{\mathrm{DEmE}} (A, a_1, a_2, \dots, a_m, N_0, N_1, \dots, N_{m - 1}, \mu)\)) is proposed as follows. The algorithm starts with the highest-quality experiment, which is used while the statistic is above 0. When it falls below 0, the undershoot is scaled and this scaled value is used as the starting point of the truncated DE\((m - 1)\)E-CUSUM Algorithm for experiments 1 to \(m - 1\) where the number of observations of experiment \(m - 1\) is limited by \(N_{m - 1}\). The highest-quality experiment is performed again once the statistic crosses 0 from below.
We have the following theorem which establishes the asymptotic optimality of the DE\(m\)E-CUSUM Algorithm.
Theorem 6.
For the DE\(m\)E-CUSUM Algorithm, let \(0 < D(f_0^j \mathrel{\Vert} f_1^j) < D(f_0^i \mathrel{\Vert} f_1^i) < \infty\) for \(i > j\) where \(i, j = 1, 2, \dots, m\).
For any fixed \(a_i > 0\) (\(i = 1, \dots, m\)) and \(N_j > 0\) (\(j = 0, \dots, m - 1\)), with \(A = \log \gamma\), \(\mathrm{ARLFA}(\Psi_{\mathrm{DEmE}}) \geq \gamma.\)
For any \(A\), we can choose the parameters such that \(\mathrm{POR}_i (\Psi_{\mathrm{DEmE}} (A, a_1, a_2, \dots, a_m, N_0, N_1, \dots, N_{m - 1}, \mu)) \leq \beta_i.\) If \(\displaystyle\sum_{i = 1}^m \beta_i < 1\), then the data-efficiency constraint is also satisfied.
For fixed values of \(a_i > 0\) (\(i = 1, 2, \dots, m\)) and \(N_j > 0\) (\(j = 0, 1, \dots, m - 1\)), and for each \(A\), \[\mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont DEmE}}) \leq \mathsf{E}_1 [\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont DEmE}}] + \sum_{r = 0}^{m - 1} \prod_{k = (m - 1) - r}^{m - 1} N_k.\] Consequently, \[\mathrm{WADD}(\Psi_{\mathrm{DEmE}}) \leq \mathrm{WADD}(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^m) + K_m,\] where \(\tau_{\mathrm{\fontsize{6 pt}{10 pt}\selectfont C}}^m\) is the CUSUM algorithm for the experiment \(m\) and \(K_m\) is a constant that is not a function of threshold \(A\).
Hence, the DE\(m\)E-CUSUM Algorithm is asymptotically optimal.
The QCD problem typically involves observing a stochastic process using a single experiment. In this paper, we extended the classical problem by considering the case where the observer chooses between multiple experiments with different information qualities to minimize WADD subject to ARLFA and POR constraints. We proposed the 2E-CUSUM Algorithm to solve the two-experiment QCD problem; it starts with the higher-quality experiment in order to detect a change quickly in case it occurs early. With 2E-CUSUM, the decision-maker is able to attain any desired POR level.
We then explored the three-experiment case and developed the 3E-CUSUM Algorithm, which is an extension of 2E-CUSUM. 3E-CUSUM begins by performing the highest-quality experiment. It then utilizes 2E-CUSUM for the mid- and lowest-quality experiments once the statistic falls below 0.
Data-efficiency for the two- and three-experiment cases, wherein we incorporate periods when no experiment is performed, was also studied. An algorithm for the data-efficient two-experiment scenario was developed by exploiting the structure of the three-experiment design: 3E-CUSUM was modified to obtain the DE2E-CUSUM Algorithm. Similarly, an algorithm incorporating data-efficiency for the three-experiment case was developed by extending DE2E-CUSUM.
After understanding the issues and gaining insights from the two- and three-experiment designs, we formulated the general problem and proposed the \(m\)E-CUSUM Algorithm, which solves the \(m\)-experiment design case.
For future studies, one can explore the Bayesian formulation of the multiple-experiment QCD problem and its data-efficient case. Algorithms similar to the ones presented here can also be developed to solve these problems.
Taposh Banerjee was supported in part by the U.S. National Science Foundation under grant 2427909.
The authors are with the University of Pittsburgh, Pittsburgh, PA 15260 USA (email: pnl8@pitt.edu, taposh.banerjee@pitt.edu).