Intrusion Tolerance for Networked Systems

through Two-Level Feedback Control

April 02, 2024

We study automated intrusion tolerance using feedback control. Following a novel approach, we formulate intrusion tolerance of a replicated system as a two-level optimal control problem. On the first control level, a distributed set of node controllers decide when to perform intrusion recovery and on the second control level, a system-wide controller decides when to increase the replication factor. We show that these two control problems correspond to two classical problems in operations research, namely the machine replacement problem and the inventory replenishment problem. Based on this insight, we prove structural properties of the optimal control strategies and develop tolerance: a novel intrusion-tolerant system based on feedback control. tolerance builds on previous generations of intrusion-tolerant systems and adds two levels of feedback-control to decide when to perform recovery and how to respond to intrusions. To assess the performance of tolerance we evaluate it in an emulation environment where we run real intrusions and response actions. The results show that tolerance significantly improves service availability and reduces costs when compared with the state-of-the-art.

Intrusion tolerance, fault tolerance, Byzantine fault tolerance, intrusion response, intrusion recovery, optimal control, reinforcement learning, pomdp, mdp, cmdp.

As society’s reliance on online services is growing, there is an increasing demand for reliable systems that are highly available and provide correct service without disruption. Traditionally, the main cause of service disruptions in networked systems
has been hardware failures and power outages [1], [2]. While tolerance against these types of failures is still important, a rapidly growing source of service disruptions is network intrusions [3]. Intrusions are fundamentally different from hardware failures as an attacker can behave in arbitrary (Byzantine) ways, often leading to unanticipated failure behavior. Due to the high costs of such hazardous failures
and the realization that it is likely impossible to completely prevent intrusions, the ability to *tolerate* and *recover* from intrusions becomes necessary [4]. Such abilities are particularly important for safety-critical applications, e.g., real-time control systems, where use of intrusion-tolerant systems has a long history [5]–[10].
Further, as the frequency of intrusions has escalated in recent years [3], intrusion-tolerant systems are increasingly being deployed in new application
areas, for example e-commerce applications [11] and control-planes for software defined networks [12].

Tolerating network intrusions requires four main functions: redundancy, detection, recovery, and response [4], [13], [14]. Redundancy comes down to creating protocols for service replication that guarantees correct service even when parts of the system is compromised. Such protocols are called “Byzantine fault-tolerant (bft) consensus protocols” and there is a large body of research devoted to designing them (see survey [15]). Under appropriate synchrony assumptions, bft protocols are able to overcome up to \(f\) compromised nodes by running \(3f+1\) replicas [16]–[19]. Furthermore, by appropriate use of cryptography, bft protocols can preserve data confidentiality and integrity in spite of an intrusion [20], [21].

While redundancy is necessary to tolerate any intrusion, it is not sufficient. No matter how many replicas a system has, as intrusions accumulate over time it becomes likely that the number of compromised replicas eventually passes the tolerance threshold. To avoid this scenario, the system must recover compromised replicas and take appropriate response actions to maintain service availability and prevent the attacker from repeating the intrusion. Current intrusion-tolerant systems either do not provide this functionality or rely on inefficient heuristics, such as periodic recoveries [16], [22], static rule-based recoveries [14], or manual recoveries by system administrators [23].

In this paper, we address the above limitations and present tolerance: __T__w__o__-__le__vel __r__ecovery __an__d response __c__ontrol with f__e__edback. tolerance is a
replicated system that adds two levels of feedback control on top of previous generations of intrusion-tolerant systems (see Fig. 1). On the first control level, each node in the system is equipped with a node controller
that estimates the node’s state and decides when recovery is required. On the second control level, a system-wide controller collects state estimates from node controllers and decides when to increase the replication factor. We show that these two control
problems correspond to two classical problems in operations research, namely the *machine replacement problem* [24] and the
*inventory replenishment problem* [25], which provides theoretical insight. Based on this insight,
we prove structural properties of the optimal control strategies and design an efficient stochastic approximation algorithm for computing them.

The key benefits of the two-level control architecture of tolerance are: (*i*) through feedback in terms of alerts and system measurements, the system can promptly adapt to network intrusions; (*ii*) by using
two levels of control rather than one, the control system can tolerate partial failures and network partitions; and (*iii*) through the connection with classical problems from operations research, the control system can leverage well-established
control techniques with strong theoretical guarantees [26], [27].

To assess the performance of tolerance, we implement it on top of an existing intrusion-tolerant system and evaluate it against emulated network intrusions. The results show that when compared against the state-of-the-art, tolerance significantly increases availability against several types of network intrusions while also reducing costs.

Our main contributions can be summarized as follows:

We develop tolerance, a novel intrusion-tolerant system that uses two levels of feedback control for automated intrusion recovery and response.

We prove structural properties of the optimal control strategies and develop an efficient stochastic approximation algorithm for computing them, which outperforms the state-of-the-art for our use case.

We show that the intrusion recovery and response problems correspond to two classical problems in operations research, which provides theoretical insight and allows to leverage well-established control techniques. To our knowledge, we are the first to make these connections.

We evaluate the performance of tolerance against emulated network intrusions in a realistic environment.

**Source code and documentation.** Documentation of the software framework that implements tolerance is available at [28], [29]; the source code is available at [30];
and a video demonstration is available at [31].

We consider an intrusion tolerance use case that involves a replicated system that offers a web service to a client population (see Fig. 2). The system is organized as a set of service replicas, each of which is located in a separate geographic location and consist of compute and storage nodes. Service replicas are synchronized through a replication protocol that is guaranteed to provide correct service as long as no more than \(f\) replicas are compromised, where \(f\) is a given threshold. (We use the term “node” to refer to a computer in a distributed system and we say that a node is “compromised” if it does not follow the system protocol.) Further, service replicas run different versions of the same software to minimize the risk that several replicas get compromised simultaneously (i.e., the replicas follow a software diversification scheme [32]).

Clients access the replicated service through a set of api gateways, which also are accessible to the attacker. The attacker’s goal is to intrude on the system, compromise replicas, and disrupt service. To achieve this, the attacker explores the system through reconnaissance and exploits vulnerabilities while avoiding detection. We assume that the attacker does not have physical access to any node in the system, which means that it has to access nodes through the client interface. We further assume that the attacker is computationally bound and hence cannot subvert cryptographic techniques. Apart from these standard assumptions, we allow for a very strong attacker that can control a compromised service replica in arbitrary ways, e.g., it can shut down the replica, delay the replica’s service responses, or coordinate with other compromised replicas to cause maximum damage to the replicated service.

To prevent the attacker from simultaneously compromising \(f\) replicas, the system can recover existing replicas and add new replicas. Both of these actions involve costs, e.g., recovering a replica may require re-imaging the operating system. The challenge for the system is to minimize costs while meeting service availability constraints.

This section provides an overview of fault-tolerant systems, distributed consensus, and intrusion tolerance.

Research on fault-tolerant systems has almost a century-long history with the seminal work being made by von Neumann [33] and Shannon [34] in 1956. The early work focused on tolerance against hardware failures. Since then the field has broadened to also include tolerance against software bugs, operator
mistakes, and malicious attacks [1], [2], [35]–[37]. Although many approaches to fault tolerance
have been proposed over the years, they all share a common strategy: *redundancy*. Redundancy can be implemented both in time and in space. Redundancy in space is achieved through service replication, whereby a service is provided by a set of
identical nodes, each of which can respond to service requests. Redundancy in time is achieved by timely recovery of nodes. Through these redundancies, the compromise of any one node is masked by the surviving nodes as long as a) the number of compromises
between two recoveries is below the tolerance threshold; and b) the surviving nodes are able to coordinate their responses. The latter coordination problem is known as the *consensus* problem.

Distributed consensus is the problem of reaching agreement on a single value among a set of distributed nodes subject to failures [38]. The feasibility of
solving this problem depends on synchrony and failure assumptions. The main synchrony assumptions used in the literature are: (*i*) the *synchronous model*, where there is an upper bound on the communication delay and clock drift between any
two nodes; (*ii*) the *partially synchronous model*, where an upper bound exists but the system may have periods of instability where the upper bound does not hold, and (*iii*) the *asynchronous model*, where there are no bounds
on either delays or clock drifts [38], [39]. Similarly, the three most common
failure assumptions are: (*i*) the *crash-stop failure model* where nodes fail by crashing; (*ii*) the *Byzantine failure model*, where failed nodes may behave arbitrarily; and (*iii*) the *hybrid failure model*,
where nodes may fail arbitrarily but are equipped with trusted components (e.g., trusted co-processors) that fail only by crashing.

It is well known that consensus is not solvable in the asynchronous model [40]. In the partially synchronous model, however, consensus is solvable with \(N\) nodes and up to a) \(f=\frac{N-1}{2}\) crash-stop failures [41]; b) \(f=\frac{N-1}{3}\) Byzantine failures [42]; and c) \(f=\frac{N-1}{2}\) hybrid failures [[43]][44][45]–[47].

(In this paper we use the term “consensus” to refer to the classical consensus problem in distributed computing [38]. We do not consider alternative forms of consensus, e.g., probabilistic consensus [48] and consensus on sparse graphs [49].)

Intrusion tolerance extends fault-tolerance with mechanisms specifically designed for tolerating network intrusions, e.g., mechanisms for intrusion detection, recovery, and response [13], [14], [50]. A system is said to be intrusion-tolerant if it remains secure and operational with a quantifiable probability when an intrusion occurs (see Fig. 3) [13], [14]. Several metrics have been proposed to quantify intrusion tolerance [4], [51]–[55]. In this paper we focus on the following three metrics:

*Tolerance threshold \(f\)*: the number of compromised nodes that the system can tolerate and still provide service.*Average time-to-recovery \(T^{(\mathrm{R})}\)*: the average time it takes from the moment a node is compromised until it is fully recovered.*Average availability \(T^{(\mathrm{A})}\)*: the fraction of time where the number of compromised nodes is less than \(f\). (To circumvent the cap theorem [56], in this paper we consider the system to be available when a network partition occurs as long as one partition can continue to provide service.)

In this section we describe the architecture of tolerance: __T__w__o__-__le__vel __r__ecovery __an__d response __c__ontrol with f__e__edback. The architecture is shown in Fig. 4 and extends the vm-fit architecture with two-levels of feedback control [22], [46]. It is a distributed system consisting of \(N\) nodes connected through a communication network that provides a replicated service to a client population. Clients
access the service through a public interface which also is accessible to a possible attacker.

tolerance builds on previous generations of intrusion-tolerant systems and follows standard best practices for tolerating intrusions, such as: (*i*) filtering of messages containing confidential data at the
virtualization layer [57] (*ii*) rate-limiting to defend against denial of service attacks [20]; (*iii*) software diversification to reduce the probability that an attacker can use the same attack to compromise all nodes in the system [22], [32]; (*iv*) periodic backups to persistent storage to support intrusion recovery; and (*v*)
data encryption with session keys to ensure data confidentiality and integrity.

Similar to the vm-fit architecture, each node in tolerance is segmented into two domains: a privileged domain and an application domain. The privileged domain runs a virtualization layer and provisions a virtual machine that runs a service replica in the application domain. The virtualization layer intercepts all network traffic to the virtual machine and runs a reconfigurable consensus protocol to coordinate client responses with the other nodes (e.g., the minbft protocol [43]). (Here “reconfigurable” means that nodes can be added to the system without restarting it [58].)

To tolerate network intrusions beyond the tolerance threshold of the underlying intrusion-tolerant consensus protocol, tolerance includes a two-level control system. On the first control level, each node runs a *node
controller* in the privileged domain that monitors the application domain through alerts from an Intrusion Detection System (ids). Based on these alerts, the controller estimates the service replica’s state (i.e., whether
it is compromised or not) and decides when it should be recovered. When it decides to recover the replica, the replica’s state is rolled back to a previous check point and its software is updated by creating a new virtual machine, where the operating
system of the machine is decided by a software diversification scheme [32].

The second level of control is performed by a *system controller* that collects state estimates from node controllers and makes system-wide scaling decisions to respond to intrusions. These state estimates provide the system controller with
global knowledge about the estimated number of compromised replicas, allowing it to make informed decisions about whether the replication factor needs to be increased to maintain service availability. Similar to the node controllers, the system controller
runs in a privileged domain and can only fail by crashing. Throughout this paper we assume that the probability that the system controller crashes is negligible (i.e., we assume it is implemented by a fault-tolerant system).

The main benefits of the control system described above are: (*i*) it makes the system self-stabilizing [59], i.e., it can detect intrusions
and take action to bring itself back to a safe state; and (*ii*), by using two levels of control rather than one, the control system tolerates partial failures and network partitions. That is, if any one controller becomes unresponsive, the other
controllers can continue to operate.

In designing the control strategies, the two key challenges are a) that the state estimates are imperfect; and b) that each control action involves a cost that has to be weighed against the potential security benefits. In the following, we provide a formal model of this trade-off and formulate the associated two-level optimal control problem. Through this formulation we show that the control problems of the node and system controllers correspond to two classical problems in operations research, namely the machine replacement problem [24] and the inventory replenishment problem [25].

We model tolerance as a partially synchronous system with hybrid failures. From standard results in distributed systems theory we know that such a system tolerates up to \(f = \frac{N-1}{2}\) compromised nodes [[43]][22]. Given this tolerance threshold, the goal of tolerance is to keep the number of compromised nodes below \(f\) at all times while minimizing operational costs. In the following, we formulate this problem as a constrained two-level optimal control problem.

**Notations.** Random variables are denoted with upper-case letters (e.g., \(X\)) and their values by lower-case letters (e.g., \(x\)). \(\mathbb{P}\) is the probability measure and the expectation of \(f\) with respect to \(X\) is denoted with \(\mathbb{E}_X[f]\).
When \(f\) includes many random variables that depend on \(\pi\) we simplify the notation to \(\mathbb{E}_{\pi}[f]\). We use \(x
\sim f\) to denote that \(x\) is sampled from \(f\) and sometimes write \(\mathbb{P}[x|z,y]\) instead of \(\mathbb{P}[X=x|Z=z,Y=y]\) when \(X,Z,Y\) are clear from the context. Upper-case calligraphic letters (e.g., \(\mathcal{V}\)) represent sets. \(\llbracket \cdot \rrbracket\) denotes the Iverson bracket and \(\lfloor \cdot \rfloor\) denotes the floor function. Symbols used throughout the paper are listed in Table 1.

Notation(s) |
Description |
---|---|

\(\mathcal{N}, N, f\) | Set of nodes, number of nodes, tolerance threshold |

\(T^{(\mathrm{R})}\) | Average time-to-recovery when an intrusion occurs |

\(F^{(\mathrm{R})}\) | Frequency of recoveries |

\(T^{(\mathrm{A})}\) | Average service availability |

\(\pi_{i,t}, J_i\) | Strategy and objective of node \(i\) at time \(t\) |

\(\Pi_{\mathrm{N}}\) | Strategy space of node controllers \(\pi_{i,t} \in \Pi_{\mathrm{N}}\) |

\(\pi_{i,t}^{*}, \alpha_i^{*}\) | Optimal strategy and threshold for node \(i\) |

\(s_{i,t},o_{i,t},b_{i,t}\) | State, observation, and belief of node \(i\) at time \(t\) |

\(S_{i,t},O_{i,t}, B_{i,t}\) | Random variables with realizations \(s_{i,t},o_{i,t},b_{i,t}\) |

\(a_{i,t},c_{i,t}\) | Action and cost of node controller \(i\) at time \(t\) |

\(A_{i,t},C_{i,t}\) | Random variables with realizations \(a_{i,t},c_{i,t}\) |

\(\tau_{i,k}\) | Time of the \(k\)th recovery of node \(i\) |

\(\mathcal{T}_{i,k}\) | Random variable with realization \(\tau_{i,k}\) |

\(\mathcal{S}_{\mathrm{N}},\mathcal{A}_{\mathrm{N}},\mathcal{O}_{\mathrm{N}}\) | State, action and observation spaces of nodes |

\(\mathfrak{W},\mathfrak{R} = 0,1\) | The (\(\mathfrak{W}\))ait and (\(\mathfrak{R}\))ecovery actions |

\(\mathfrak{H},\mathfrak{C}\) | The (\(\mathfrak{H}\))ealthy and (\(\mathfrak{C}\))compromised node states |

\(\emptyset\) | The crashed node state |

\(f_{\mathrm{N}}, Z\) | Transition and observation functions for a node |

\(c_{\mathrm{N}}(s_{i,t}, a_{i,t})\) | Cost function for a node |

\(p_{\mathrm{A}}\) | Probability that a node is compromised |

\(p_{\mathrm{C}_1}\) | Probability that a healthy node crashes |

\(p_{\mathrm{C}_2}\) | Probability that a compromised node crashes |

\(p_{\mathrm{U}}\) | Probability that a service replica is updated |

\(\Delta_{\mathrm{R}}\) | Maximum allowed time between node recoveries |

\(\mathcal{W},\mathcal{R}\) | Wait and recovery sets for node controllers |

\(s_{t},\delta_{t}\) | State and state delta of system controller at time \(t\) |

\(S_{t},\Delta_{t}\) | Random variables with realizations \(s_{t},\delta_{t}\) |

\(p_{\Delta}\) | Distribution of \(\Delta_{t}\) |

\(a_{t},c_t\) | Action and cost of system controller at time \(t\) |

\(A_{t},C_{t}\) | Random variables with realizations \(a_t,c_t\) |

\(f_{\mathrm{S}}\) | Transition function of the system controller |

\(\mathcal{S}_{\mathrm{S}},\mathcal{A}_{\mathrm{S}}\) | State and action spaces of the system controller |

\(s_{\mathrm{max}}\) | Maximum state of the system controller |

\(\pi, \Pi_{\mathrm{S}}\) | Strategy and strategy space of the system controller |

\(J\) | Objective of the system controller |

\(\epsilon_{\mathrm{A}}\) | Minimum allowed average service availability |

The first control level consists of node controllers that decide when to recover nodes from intrusions. We model intrusion recovery as an instance of the classical *machine replacement problem* from operations research [24], which is a special type of optimal stopping problem [60].

Denote with \(\mathcal{N} \triangleq \{1, 2, \ldots, N\}\) the set of nodes and with \(\pi_{1,t}, \pi_{2,t},\ldots, \pi_{N,t}\) the strategies of the corresponding node controllers at time \(t\). Each node \(i \in \mathcal{N}\) has a state \(s_{i,t}\) that evolves in time-steps \(t=1,2,\ldots\). It can take on three values: \(\emptyset\) if the node has crashed, \(\mathfrak{C}\) if the node is compromised, and \(\mathfrak{H}\) if the node is healthy. The state space is thus \(\mathcal{S}_{\mathrm{N}}=\{\mathfrak{H},\mathfrak{C},\emptyset\}\).

A node controller can invoke two actions: (\(\mathfrak{W}\))ait and (\(\mathfrak{R}\))ecover, which we encode with \(0\) and \(1\), respectively. The action space is thus \(\mathcal{A}_{\mathrm{N}}=\{\mathfrak{W},\mathfrak{R}\}\). Let \(a_{i,t}\) denote the action of \(\pi_{i,t}\) at time \(t\). The evolution of the node state \(s_{i,t}\) can then be written as \[\begin{align} s_{i,t+1} &\sim f_{\mathrm{N}}(\cdot \mid S_{i,t}=s_{i,t}, A_{i,t}=a_{i,t}) \end{align}\] where \(f\) is a Markovian transition function defined as \[\tag{1} \begin{align} &f_{\mathrm{N}}(\emptyset \mid \emptyset, \cdot) \triangleq 1 \tag{2}\\ &f_{\mathrm{N}}(\emptyset \mid \mathfrak{H}, \cdot) \triangleq p_{\mathrm{C}_1}\tag{3}\\ &f_{\mathrm{N}}(\emptyset \mid \mathfrak{C}, \cdot) \triangleq p_{\mathrm{C}_2}\tag{4}\\ &f_{\mathrm{N}}(\mathfrak{H} \mid \mathfrak{H}, \mathfrak{R})\triangleq(1-p_{\mathrm{A}})(1-p_{\mathrm{C}_1})\tag{5}\\ &f_{\mathrm{N}}(\mathfrak{H} \mid \mathfrak{C}, \mathfrak{R})\triangleq(1-p_{\mathrm{A}})(1-p_{\mathrm{C}_2})\tag{6}\\ &f_{\mathrm{N}}(\mathfrak{H} \mid \mathfrak{C}, \mathfrak{W}) \triangleq (1-p_{\mathrm{C}_2})p_{\mathrm{U}}\tag{7}\\ &f_{\mathrm{N}}(\mathfrak{H} \mid \mathfrak{H}, \mathfrak{W}) \triangleq(1-p_{\mathrm{C}_1})(1-p_{\mathrm{A}})\tag{8}\\ &f_{\mathrm{N}}(\mathfrak{C} \mid \mathfrak{H}, \mathfrak{W}) \triangleq f_{\mathrm{N}}(\mathfrak{C} \mid \mathfrak{H}, \mathfrak{R})=(1-p_{\mathrm{C_1}})p_{\mathrm{A}}\tag{9} \\ &f_{\mathrm{N}}(\mathfrak{C} \mid \mathfrak{C}, \mathfrak{R})\triangleq (1-p_{\mathrm{C_2}})p_{\mathrm{A}}\tag{10}\\ &f_{\mathrm{N}}(\mathfrak{C} \mid \mathfrak{C}, \mathfrak{W}) \triangleq (1-p_{\mathrm{C}_2})(1-p_{\mathrm{U}})\tag{11} \end{align}\] Here \(p_{\mathrm{A}}\) is the probability that the attacker compromises the node, \(p_{\mathrm{C}_1}\) is the probability that a healthy node crashes, \(p_{\mathrm{C}_2}\) is the probability that a compromised node crashes, and \(p_{\mathrm{U}}\) is the probability that the node’s software is updated as part of the software diversity scheme described in §4. These probabilities can be selected based on domain knowledge and based on failure models from reliability theory [61]. Figure 6 shows the effect of tuning \(p_{\mathrm{A}}\).

The evolution of a node’s state can be described with the state transition diagram in Fig. 5. Equations (2 )–(4 ) capture the transitions to \(\emptyset\), which is an absorbing state [62]. Next, (5 )–(8 ) define the transitions to \(\mathfrak{H}\), which is reached when the controller takes action \(\mathfrak{R}\) (7 ) or when its software is updated (8 ). Lastly, (9 )–(11 ) capture the transitions to \(\mathfrak{C}\), which occur when an intrusion starts.

The node states are hidden from the controllers, which observe \(o_{1,t},o_{t,2},\ldots, o_{N,t}\). Here \(o_{i,t} \in \mathcal{O}\) represents the weighted sum of ids alerts generated at node \(i\) at time \(t\), where \(\mathcal{O}\) is a finite set. Each such observation \(o_{i,t}\) is drawn from the random variable \(O_{i,t}\) with distribution \[\begin{align} Z(o_{i,t} \mid s_{i,t}) &\triangleq \mathbb{P}\left[O_{i}=o_{i,t} \mid S_{i,t}=s_{i,t}\right] \end{align}\] Based on these observations, a node controller \(i \in \mathcal{N}\) forms a belief about \(s_{i,t}\), which is expressed through the belief state \[\begin{align} b_{i,t} \triangleq \mathbb{P}[S_{i,t} = 1 \mid o_{i,1}, a_{i,1},\ldots, a_{i,t-1}, o_{i,t}] \end{align}\] which is a sufficient statistic for the Markov state \(s_{i,t}\) (see App. 11). Hence we can define the strategy of a node controller to be a function \(\pi_{i,t}: [0,1] \rightarrow \{\mathfrak{W}, \mathfrak{R}\}\).

When selecting the strategy \(\pi_{i,t}\) the controller balances two conflicting goals: minimize the average time-to-recovery when an intrusion occurs \(T_{i}^{(\mathrm{R})}\) and minimize the frequency of recoveries \(F^{(R)}_{i}\). The weight \(\eta \geq 1\) controls the trade-off between these two objectives, which results in the bi-objective \[\begin{align} \min J_i &\triangleq \lim_{T \rightarrow \infty}\left[\eta T_{i,T}^{(\mathrm{R})} + F^{(\mathrm{R})}_{i,T}\right]\label{eq:objective95recovery}\\ &=\lim_{T \rightarrow \infty}\Bigg[\frac{1}{T}\sum_{t=1}^T\underbrace{\eta s_{i,t} - a_{i,t}\eta s_{i,t} + a_{i,t}}_{c_{\mathrm{N}}(s_{i,t}, a_{i,t})}\Bigg]\nonumber \end{align}\tag{12}\] where \(T_{i,T}^{(\mathrm{R})}\) is the average time-to-recovery for node \(i\) at time \(T\), \(F^{(\mathrm{R})}_{i,T}\) is the frequency of recoveries of node \(i\) at time \(T\), and \(c_{\mathrm{N}}(s_{i,t}, a_{i,t})\) is the cost function.

We define intrusion recovery to be the problem of minimizing the above objective (12 ) subject to a bounded-time-to-recovery (btr) safety constraint for each node [63]. The intrusion recovery problem can then be stated as

**Problem 1** (Intrusion Recovery). *\[\label{eq:recovery95problem}
\begin{align}
\mathop{\mathrm{minimize}}_{\pi_{i,t} \in \Pi, i \in \mathcal{N}} \quad &\sum_{i \in \mathcal{N}} \mathbb{E}_{\pi_{i,t}}\left[J_i \mid B_{i,1}=p_{\mathrm{A}}\right]\label{eq:node95maximization}\\ \mathrm{subject}\text{ }\mathrm{to} \text{ }\quad &
a_{i,k\Delta_{\mathrm{R}}} = \mathfrak{R} &&\forall i,k \label{eq:recovery95constraint}\\ &s_{i,t+1} \sim f_{\mathrm{N}}(\cdot \mid s_{i,t}, a_{i,t}) &&\forall i,t \label{eq:dynamics95recov95constraint}\\ \quad & o_{i,t+1} \sim
Z(\cdot \mid s_{i,t}) &&\forall i,t \label{eq:obs95recov95constraint}\\ \quad & a_{i, t+1} \sim \pi_{i,t}(b_{i,t}) &&\forall i,t \label{eq:action95recov95constraint}
\end{align}\] {#eq: sublabel=eq:eq:recovery95problem,eq:eq:node95maximization,eq:eq:recovery95constraint,eq:eq:dynamics95recov95constraint,eq:eq:obs95recov95constraint,eq:eq:action95recov95constraint} *

where \(k=1,2,\ldots\), \(\Pi_{\mathrm{N}}\) is the strategy space; \(B_{i,1}\) is the initial belief of node \(i\), \(\mathbb{E}_{\pi_{i,t}}\) denotes the expectation of the random variables \((S_{i,t}, O_{i,t}, A_{i,t}, B_{i,t})_{t \in \{1,2,\ldots\}}\) under strategy \(\pi_{i,t}\); (?? ) is the btr constraint; (?? ) is the dynamics constraint; (?? ) capture the observations; and (?? ) capture the actions.

We say that a strategy \(\pi_{i,t}^{*}\) is optimal if it achieves the minimization in (?? ) and satisfies all of the constraints. We note that (?? ) corresponds to a special case of the classical machine replacement problem [24], which leads us to the following structural result.

**Theorem 1**. *If the following holds \[\begin{align}
&0 < p_{\mathrm{A}}, p_{\mathrm{U}}, p_{\mathrm{C}_1}, p_{\mathrm{C}_2}< 1 \label{eq:assumption95A}\\
& p_{\mathrm{A}} + p_{\mathrm{U}} \leq 1 \label{eq:assumption95B}\\
&\frac{p_{\mathrm{C}_1}(p_{\mathrm{U}}-1)}{p_{\mathrm{A}}(p_{\mathrm{C}_1}-1) + p_{\mathrm{C}_1}(p_{\mathrm{U}}-1)} \leq p_{\mathrm{C}_2} \label{eq:assumption95C}\\
&Z(o_{i,t} \mid s_{i,t}) > 0 &\forall o_{i,t},s_{i,t}\label{eq:assumption95D}\\
&Z\text{ }\text{is }\text{\mathrm{\small tp-2}}\text{ }\label{eq:assumption95E}\text{\cite{krishnamurthy_2016}}
\end{align}\] {#eq: sublabel=eq:eq:assumption95A,eq:eq:assumption95B,eq:eq:assumption95C,eq:eq:assumption95D,eq:eq:assumption95E} then there exists an optimal strategy \(\pi_{i,t}^{*}\) that solves Problem 1 for each node \(i \in \mathcal{N}\) and satisfies \[\begin{align}
\pi_{i,t}^{*}(b_{i,t}) = \mathfrak{R} \iff b_{i,t} &\geq \alpha_{t}^{*} && \forall t, \alpha_t^{*} \in [0,1]\label{eq:threhsold95structure}
\end{align}\qquad{(1)}\] *

*Proof.* See App. 12. ◻

**Corollary 1**. *The recovery thresholds satisfy \(\alpha_{t+1}^{*} \geq \alpha_{t}^{*}\) for all \(t \in [k\Delta_{R}, (k+1)\Delta_{R}]\) and as \(\Delta_{\mathrm{R}} \rightarrow \infty\), the thresholds converge to a time-independent threshold \(\alpha^{*}\).*

*Proof.* See App. 13. ◻

Theorem 1 states that under assumptions generally met in practice, there exists an optimal strategy for each node \(i \in \mathcal{N}\) that recovers the node when its belief state exceeds a threshold. Further, Cor. 1 says that a) the threshold increases as the time until the next periodic recovery decreases; and b) when there are no periodic recoveries (i.e., when \(\Delta_{R}=\infty\)), the threshold is independent of time.

The above statements rely on five assumptions (?? –?? ). Assumptions ?? –?? are mild assumptions which state that the attack, crash, and upgrade probabilities are small but non-zero and that the difference between \(p_{\mathrm{C}_2}\) and \(p_{\mathrm{C}_1}\) is sufficiently large. Examples of parameter values that satisfy assumptions ?? –?? are \(p_{\mathrm{U}}=10^{-1}\), \(p_{\mathrm{A}}=10^{-2}\), \(p_{\mathrm{C}_1}=10^{-6}\), and \(p_{\mathrm{C}_2}=10^{-4}\). Assumption ?? is a technical assumption that says that the probability of observing any element in the observation space is non-zero, which generally holds in practice. For example, the probability of observing \(k\) ids alerts for any \(k\) within a reasonable range is non-zero in most real systems. Lastly, assumption ?? means, informally, that the probability of high-priority ids alerts increases when an intrusion occurs. This is a pretty strong assumption but is necessary to make any statements regarding the optimal recovery strategies. If assumption ?? does not hold, simple periodic recoveries may be optimal as there is little available information of a possible intrusion.

Knowing that optimal strategies with the threshold properties of Thm. 1 and Cor. 1 exist has two benefits. First, it allows for concise and efficient implementations of the strategies. Second, the complexity of computing or learning the optimal strategies can be reduced by exploiting the threshold properties. In §8 we describe a stochastic approximation algorithm that exploits the structural result in Thm. 1 and Cor. 1.

At each time \(t\), the node controllers \(\pi_{1,t},\ldots,\pi_{N,t}\) send their belief states \(b_{1,t},\ldots,b_{N,t}\) to a system controller \(\pi\) (see Fig. 1). Given these belief states, the task of the system controller is to decide whether a new node should be added to the system or not, i.e., whether the replication factor
\(N\) should be increased. We model this task as an instance of the classical *inventory replenishment problem* from operations research [25].

We define the state \(s_{t} \in \mathbb{N}_{\leq s_{\mathrm{max}}}\) to represent the expected number of healthy nodes in the system at time \(t\). The initial number of nodes is \(s_{1}=N\) and \(s_{\mathrm{max}}\) is an upper bound on the number of nodes. Hence the state space is \(\mathcal{S}_{\mathrm{S}} = \{0,1,\ldots,s_{\mathrm{max}}\}\).

At each time \(t > 1\), the system controller takes an action \(a_t \in \{0,1\}=\mathcal{A}_{\mathrm{S}}\). \(a_t=1\) means that a new node is added to the system. Given these actions, the evolution of the state \(s_{t}\) can be expressed through the following equation \[\begin{align} s_{t+1} &= \underbrace{\min\left[\max\left[s_t + \delta_t + a_t,0\right], s_{\mathrm{max}}\right]}_{\triangleq f_{\mathrm{S}}(s_t,a_t,\delta_t)} \end{align}\] where \(\delta_t = \lfloor \sum_i b_{i,t}\rfloor - s_{t}\). (We will sometimes abuse notation and write \(f_{\mathrm{S}}(s_{t+1} \mid s_t, a_t)\) to mean \(\mathbb{P}[S_{t+1}=s_{t+1} \mid S_{t}=s_{t}, A_{t}=a_{t}]\).)

\(\delta_t\) realizes the random variable \(\Delta_t\) with distribution \(p_{\Delta}(s_t)\). This distribution depends on the state \(s_t\) and the strategies of the node controllers \(\pi_{1,t},\ldots,\pi_{N,t}\), as well as characteristics of the communication channels between the node controllers and the system controller.

When selecting the control strategy \(\pi\), the system controller balances two conflicting goals: maximize the average service availability \(T_T^{(\mathrm{A})}\) and minimize the number of nodes. We model these two goals with the following constrained objective \[\begin{align} \min \quad &J\triangleq \lim_{T \rightarrow \infty}\left[\frac{1}{T}\sum_{t=1}^Ts_t\right]\label{eq:objective95response}\\ \mathrm{subject}\text{ }\mathrm{to} \quad &T^{(\mathrm{A})} \geq \epsilon_{\mathrm{A}}\nonumber\\ \implies &\lim_{T \rightarrow \infty}\left[\frac{1}{T}\sum_{t=1}^{T}\llbracket N_t \geq 2f+1 \rrbracket \right] \geq \epsilon_{\mathrm{A}}\nonumber\\ \implies &\lim_{T \rightarrow \infty}\left[\frac{1}{T}\sum_{t=1}^{T}\llbracket s_t \geq f+1\rrbracket \right] \geq \epsilon_{\mathrm{A}}\nonumber \end{align}\tag{13}\] where \(\epsilon_{\mathrm{A}}\) is the minimum allowed average service availability.

Using (13 ), we define the intrusion response problem of the system controller as

**Problem 2** (Intrusion Response). *\[\label{eq:response95problem}
\begin{align} \mathop{\mathrm{minimize}}_{\pi \in \Pi_{\mathrm{S}}} \quad &\mathbb{E}_{\pi}\left[J \mid S_{1}=N\right]\label{eq:system95maximization}\\ \mathrm{subject}\text{ }\mathrm{to} \text{ }\quad &
\mathbb{E}_{\pi}\left[T^{(\mathrm{A})}\right] \geq \epsilon_{\mathrm{A}} && \forall t \label{eq:system95avail95constraint}\\ \quad & s_{t+1} = f_{(\mathrm{S})}(s_t,a_t,\delta_t) &&\forall t \label{eq:system95dynamics95constraint}\\
\quad & \delta_{t} \sim p_{\Delta}(s_t) &&\forall t \label{eq:system95delta95constraint}\\ \quad & a_{t+1} \sim \pi_{t}(s_{t}) &&\forall t \label{eq:action95response95constraint}
\end{align}\] {#eq: sublabel=eq:eq:response95problem,eq:eq:system95maximization,eq:eq:system95avail95constraint,eq:eq:system95dynamics95constraint,eq:eq:system95delta95constraint,eq:eq:action95response95constraint} *

where \(\Pi_{\mathrm{S}}\) is the strategy space; \(S_1\) is the initial state; \(\mathbb{E}_{\pi}\) denotes the expectation of the random variables \((\Delta_t)_{t=1,2,\ldots}\) under strategy \(\pi\); (?? ) is the availability constraint; (?? ) is the state dynamics constraint; (?? ) capture the belief updates; and (?? ) capture the actions.

We say that a strategy \(\pi^{*}\) is optimal if it achieves the minimization in (?? ) and satisfies all of the constraints. We note that (?? ) corresponds to a special case of the classical *inventory replenishment
problem* from operations research [25], which leads us to the following structural result.

**Theorem 2**. *If the following holds \[\begin{align}
& \exists \pi \in \Pi_{\mathrm{S}} \text{ such that }\mathbb{E}_{\pi}\left[T^{(\mathrm{A})}\right] \geq \epsilon_{\mathrm{A}}\label{eq:feasibility95assumption}\\
& f_{\mathrm{S}}(s^{\prime} \mid s, a) > 0 \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\text{ }\text{ }\text{ }\text{ }\text{ } \forall s^{\prime},s, a \label{eq:unichain95assumption}\\
& \sum_{s^{\prime}=s}^{s_{\mathrm{max}}} f_{\mathrm{S}}(s^{\prime}\mid\hat{s}+1, a) \geq \sum_{s^{\prime}=s}^{s_{\mathrm{max}}} f_{\mathrm{S}}(s^{\prime} \mid \hat{s}, a) \quad\quad\quad\forall s,\hat{s},a \label{eq:dominance95transitions}\\
& \sum_{s^{\prime}=s}^{s_{\mathrm{max}}} f_{\mathrm{S}}(s^{\prime}|\hat{s}, 1)-f_{\mathrm{S}}(s^{\prime}|\hat{s}, 0) \text{ non-decreasing in s} \text{ }\forall s,\hat{s}\label{eq:tail95sum}
\end{align}\] {#eq: sublabel=eq:eq:feasibility95assumption,eq:eq:unichain95assumption,eq:eq:dominance95transitions,eq:eq:tail95sum} then there exists two strategies \(\pi_{\lambda_1}\) and \(\pi_{\lambda_1}\) that satisfy \[\begin{align}
\pi_{\lambda_1}(s_{t}) &= 1 \iff s_{t} \leq \beta_1 && \forall t\\
\pi_{\lambda_2}(s_{t}) &= 1 \iff s_{t} \leq \beta_2 && \forall t
\end{align}\] and an optimal randomized threshold strategy \(\pi^{*}\) that solves Problem 2 and satisfies \[\begin{align}
\pi^{*}(s_{t}) &= \kappa \pi_{\lambda_1}(s_{t}) + (1-\kappa)\pi_{\lambda_2}(s_{t}) && \forall t \label{eq:randomized95threshold}
\end{align}\qquad{(2)}\] for some probability \(\kappa \in [0,1]\).*

*Proof.* See App. 14 ◻

Theorem 2 states that under assumptions generally met in practice, there exists an optimal intrusion response strategy that is a randomized threshold strategy (i.e., a randomized mixture of two threshold strategies). The implication of this structure is that the space of strategies that have to be considered is significantly reduced. This reduction in the number of strategies to consider allows to compute an optimal strategy more efficiently.

The theorem relies on four assumptions (?? –?? ). Assumption ?? states that Problem 2 is feasible, i.e., that there exists a strategy that satisfies the availability constraint (?? ). This assumption can always be fulfilled by tuning \(\epsilon_{\mathrm{A}}\). Assumptions ?? –?? are mild assumptions which generally hold in practice. More specifically, assumption ?? states that the probability of a large number of simultaneous intrusions and recoveries are non-zero. Further, assumption ?? states, informally, that the larger the current number of healthy nodes are, the more probable it is to have a large number of healthy nodes in the future, which is intuitively true. Lastly, assumption ?? is a technical assumption which states that the transition probabilities are tail-sum supermodular [65], i.e., the estimated number of healthy nodes when adding a new node to the system does not decrease as the current number of healthy nodes is increased, which is true in general.

In this section we describe our implementation of tolerance.

Intrusion tolerance is studied in several broad areas of research, including: Byzantine fault tolerance [15], dependability [67], [68], reliability [69], and cyber resilience [4], [51]–[55], [70]. This research effort has led to a range of mechanisms for implementing intrusion-tolerant systems, such as: intrusion-tolerant consensus protocols [15], [67]–[69], software diversification schemes [32], schemes for geographic distribution of nodes [71], cryptographic mechanisms for data confidentiality and integrity [20], [21], and rate limiting to reduce the impact of denial of service attacks [57]. All of these mechanisms provide the foundation for the system that we propose in this paper, which assumes an underlying intrusion-tolerant system and adds automated intrusion recovery and response capabilities on top of it.

While the system proposed in this paper (tolerance) builds on all of the above works, we limit the following discussion to explain how tolerance differs from current state-of-the-art intrusion-tolerant systems and how the control system of tolerance relates to prior work that uses feedback control.

Existing state-of-the-art intrusion-tolerant systems include pbft[16], zyzzyva[17], hq[18], vm-fit[22], [46], prrw[72], recover[73], scit[74], coca[75], [76], spire[77], itcis-prr[78], crutial[79], and skynet[80]. All of these systems are based on an intrusion-tolerant consensus protocols and support intrusion recovery. Our system (tolerance) differs from the above systems in two main ways. First, tolerance uses feedback control to decide when to perform intrusion recovery. This contrasts with all of the referenced systems, which either use periodic or heuristic recovery schemes. Second, tolerance implements adaptive redundancy in response to intrusions. In comparison, all of the referenced systems except the system described in [76] use fixed redundancy. Further, the adaptive redundancy scheme in [76] is based on time-outs as opposed to our scheme which is based on feedback control. The benefit of our approach is that it adapts the redundancy based on measurements obtained from the system, allowing the system to adapt promptly to intrusions, not having to wait for a time-out.

As opposed to the classical research on intrusion-tolerant systems, where most of the fundamental results were obtained in the early 2000s [15], [16], the problems of automated intrusion recovery and response are active areas of research that uses concepts and methods from various emergent and traditional fields. Most notably from reinforcement learning (examples [70], [81]–[88]), control theory (see survey [89]), causal inference (see example [90]), game theory (see examples [91]–[95]), natural language processing (see example [96]), and evolutionary computation (see example [97]). While these works have obtained promising results, none of them consider the integration with classical intrusion-tolerant systems as we do in this paper. Another drawback of the existing solutions is that many of them provide solutions based on neural networks that are difficult to interpret and lack safety guarantees. Finally, and most importantly, nearly all of the above works are limited to simulation environments and it is not clear how they generalize to practical systems. In contrast, the system that we propose in this paper is practical: it can be integrated with existing intrusion-tolerant systems, it satisfies safety constraints, and it incorporates several important optimizations that makes it computationally efficient.

This paper presents tolerance: a novel intrusion-tolerant system based on automated intrusion recovery and response. tolerance builds on previous generations of intrusion-tolerant systems and adds two levels of feedback-control to decide when to perform recovery and how to respond to intrusions. We show that the recovery and response problems correspond to two classical problems in operations research, namely the machine replacement problem and the inventory replenishment problem, which provides theoretical insight. Based on this insight we prove that the optimal control strategies have threshold structures, which we exploit to design an efficient stochastic approximation algorithm that approximates the optimal strategies. To assess the performance of tolerance, we implement it in an emulation environment where we run real intrusions and response actions. The results demonstrate that tolerance significantly improves availability and reduces costs against several types of network intrusions when compared against the state-of-the-art. Future work will focus on extending tolerance to withstand additional attack types, e.g., denial of service attacks. We also plan to extend our formal control-theoretic model to a game-theoretic one.

The belief state for each node \(i \in \mathcal{N}\) can be computed recursively as \[\begin{align} &b_{i,t}(s_{i,t}) \stackrel{\scriptscriptstyle(\mkern-1.5mua\mkern-1.5mu)}{=} \mathbb{P}\big[s_{i,t} \mid \overbrace{o_{i,t},a_{i,t-1},o_{i,t-1},\ldots,a_{i,1}, o_{i,1}}^{h_{i,t}}\big]\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mub\mkern-1.5mu)}{=}\frac{\mathbb{P}\big[o_{i,t} \mid s_{i,t},a_{i,t-1},h_{i,t-1}]\mathbb{P}[s_{i,t} \mid a_{i,t-1},h_{i,t-1}\big]}{\mathbb{P}\big[o_{i,t} \mid a_{i,t-1},h_{i,t-1}\big]}\nonumber\\ &\stackrel{\scriptscriptstyle(\mkern-1.5muc\mkern-1.5mu)}{=}\frac{Z(o_{i,t} \mid s_{i,t})\mathbb{P}[s_{i,t} \mid a_{i,t-1},h_{i,t-1}\big]}{\mathbb{P}\big[o_{i,t} \mid a_{i,t-1}h_{i,t-1}\big]}\nonumber\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mud\mkern-1.5mu)}{=}\frac{Z(o_{i,t} \mid s_{i,t})\sum_{s \in \mathcal{S}}b_{i,t-1}(s)f_{\mathrm{N}}(s_{i,t} \mid s, a_{t-1})}{\mathbb{P}\big[o_{i,t} \mid a_{i,t-1}h_{i,t-1}\big]}\nonumber\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mue\mkern-1.5mu)}{=}\frac{Z(o_{i,t} \mid s_{i,t})\sum_{s \in \mathcal{S}}b_{i,t-1}(s)f_{\mathrm{N}}(s_{i,t} \mid s, a_{t-1})}{\sum_{s^{\prime},s\in \mathcal{S}}Z(o_{i,t}\mid s^{\prime})f_{\mathrm{N}}(s^{\prime} \mid s, a_{i,t-1})b_{i,t-1}(s)}\nonumber \end{align}\] (a) follows from the definition of \(b_{i,t}\); (b) is an expansion of the conditional probability using Bayes rule; (c) follows from the Markov property of \(Z\); (d) follows from the Markov property of \(f_{\mathrm{N}}\); and (e) follows by definition of \(Z\) and \(f_{\mathrm{N}}\).

Solving (?? ) corresponds to solving \(N\) finite and stationary Partially Observed Markov Decision Processes (pomdps) with bounded costs and the average cost optimality criterion. Since the \(N\) pomdps are equivalent it suffices to prove the statement for a single pomdp.

It follows from (1 ) and assumption ?? that any of the above pomdps has the following ergodicity property: \(f_{\mathrm{N}}(s^{\prime} \mid s, a) > 0\) for all \(t,s^{\prime},s,a\). Given this property and assumption ?? , we know that there exists a deterministic optimal strategy \(\pi_{i,t}^{*}\) for which the limit in (12 ) exists and which satisfies the following Bellman equation [98] \[\begin{align} &\pi_{i,t}^{*}(b_{i,t}) \in \mathop{\mathrm{arg\,min}}_{a \in \{\mathfrak{W}, \mathfrak{R}\}}\bigg[c_{\mathrm{N}}(b_{i,t}, a) + \sum_{o \in \mathcal{O}}\mathbb{P}\left[o| b_{i,t}, a\right]V_{i,t}^{*}(b_{i,t+1})\bigg]\label{eq:belief95bellman} \end{align}\tag{14}\] for all \(b_{i,t}\). Here \(c_{\mathrm{N}}(b_{i,t}, a)\) is the expected immediate cost of taking action \(a\) in belief state \(b_{i,t}\) and \(V_{i,t}^{*}\) is the value function [65]. (See [99] for a full proof of (14 ).)

Next note that each of the constrained pomdps with infinite horizons defined in (?? ) can be converted into a sequence of unconstrained pomdps\((\mathcal{M}_{i,k})_{k=1,2,\ldots}\) with horizons defined by \(T=\Delta_{\mathrm{R}}\), where \(a_{i,T}=\mathfrak{R}\) ensures that the btr constraint (?? ) is satisfied. Solving this sequence of pomdps is equivalent to solving the original pomdp because \[\begin{align} &\mathop{\mathrm{arg\,min}}_{\pi_{i,t}}\left[\lim_{T\rightarrow \infty}\mathbb{E}_{\pi_{i,t}}\left[\frac{1}{T}\sum_{t=1}^{T}C_{i,t} \mid B_{i,1}=p_{\mathrm{A}}\right]\right]\nonumber\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mua\mkern-1.5mu)}{=}\mathop{\mathrm{arg\,min}}_{\pi_{i,t}}\Bigg[\lim_{T\rightarrow \infty}\frac{1}{T}\bigg(\mathbb{E}_{\pi_{i,t}}\bigg[\sum_{t=1}^{\Delta_{\mathrm{R}}}C_{i,t} \mid B_{i,1}=p_{\mathrm{A}}\bigg] + \nonumber\\ &\quad\quad\quad\quad\quad\mathbb{E}_{\pi_{i,t}}\bigg[\sum_{t=\Delta_{\mathrm{R}}}^{2\Delta_{\mathrm{R}}}C_{i,t} \mid B_{i,\Delta_{\mathrm{R}}}=p_{\mathrm{A}}\bigg] + \ldots\bigg)\Bigg]\nonumber\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mub\mkern-1.5mu)}{=}\mathop{\mathrm{arg\,min}}_{\pi_{i,t}}\left[\lim_{T\rightarrow \Delta_{\mathrm{R}}}\mathbb{E}_{\pi_{i,t}}\left[\frac{1}{T}\sum_{t=1}^{\Delta_{\mathrm{R}}}C_{i,t} \mid B_{i,1}=p_{\mathrm{A}}\right]\right]\label{eq:decomposed95recovery95prob} \end{align}\tag{15}\] where \(C_{i,t}\) is a random variable representing the cost of node \(i\) at time \(t\); (a) follows from linearity of \(\mathbb{E}\); (b) follows because all elements inside the parentheses are equivalent, which means that a strategy that minimizes one element minimizes the whole expression.

Now consider the threshold structure in (?? ). We know that any deterministic strategy that achieves the minimization in (15 ) induces a partition of the belief space \([0,1]\) into two regions at each time \(t\): a wait region \(\mathcal{W}_t\) where the prescribed action is \(\mathfrak{W}\) and
a recovery region \(\mathcal{R}_t\) where the prescribed action is \(\mathfrak{R}\). The idea behind the proof of (?? ) is to show that \(\mathcal{R}_t =
[\alpha_t^{*}, 1]\) for all \(t\) and some threshold values \((\alpha_t^{*})_{t=1,\ldots,T}\). Towards this deduction, we first note that \(\mathcal{W}_t\) and \(\mathcal{R}_t\) are connected sets [65]. This
follows because (*i*) the transition probability and observation probability matrices are tp-2(it is a consequence of assumptions ?? , ?? , ?? , and ?? ); (*ii*) \(c_{\mathrm{N}}(s_{i,t},a_{i,t})\) (12 ) is submodular [65]; and
(*iii*) \(c_{\mathrm{N}}(s_{i,t},a_{i,t})\) is weakly increasing in \(s_{i,t}\) for each \(a_{i,t}\).

Note that since \(\mathcal{R}_t\) is connected, \(\mathcal{R}_t=[\alpha_t^{*}, 1] \iff 1 \in \mathcal{R}_t\). From (14 ) we obtain \[\begin{align} 1 \in \mathcal{R}_t \iff& \mathbb{E}_{B_{i,t+1}}\left[V_{i,t+1}^{*}(B_{i,t+1}) \mid A_{i,t}=\mathfrak{R}, B_{i,t}=1\right] \leq\\ &\mathbb{E}_{B_{i,t+1}}\left[V_{i,t+1}^{*}(B_{i,t+1}) \mid A_{i,t}=\mathfrak{W}, B_{i,t}=1\right] \end{align}\] Clearly \[\begin{align} \mathbb{E}_{B_{i,t+1}}\left[B_{i,t+1} \mid A_{i,t}=\mathfrak{R}, B_{i,t}=1\right] \leq\\ \mathbb{E}_{B_{i,t+1}}\left[B_{i,t+1} \mid A_{i,t}=\mathfrak{W}, B_{i,t}=1\right] \end{align}\] Further \(B^{\prime} \leq B \implies V_{i,t+1}^{*}(B^{\prime}) \leq V_{i,t+1}^{*}(B)\) for all \(i \in \mathcal{N}\) and \(t \geq 1\) [65], which gives the result. We have thus shown that \(\mathcal{R}_t\) is connected and that \(1 \in \mathcal{R}_t\), which means that \(\mathcal{R}_t = [\alpha_t^{*}, 1]\) for some threshold value \(\alpha_t^{*}\) at all times \(t\). 0◻

When \(\Delta_{\mathrm{R}} \rightarrow \infty\), (?? ) reduces to a set of unconstrained stationary and finite pomdps, which means that there exists an optimal deterministic stationary strategy for each node [99]. Such a strategy partitions the belief space into two time-independent regions \(\mathcal{W}\) and \(\mathcal{R}\), which means that the recovery threshold \(\alpha^{*}\) is time-independent.

When \(\Delta_{\mathrm{R}} < \infty\), it follows from (14 ) that it is optimal to recover node \(i \in \mathcal{N}\) at time \(t\) iff \[\begin{align} & c_{\mathrm{N}}(b_{i,t}, \mathfrak{R}) + \mathbb{E}_{B_{i,t+1}}[V^{*}_{i,t}(B_{i,t+1}) \mid a_t=\mathfrak{R}, b_{i,t}] \leq\\ &c_{\mathrm{N}}(b_{i,t}, \mathfrak{W}) + \mathbb{E}_{B_{i,t+1}}[V^{*}_{i,t}(B_{i,t+1}) \mid a_t=\mathfrak{W},b_{i,t}]\\ &\iff 1 \leq \eta b_{i,t} -W_{i,t}(b_{i,t})\\ &\iff b_{i,t} \geq \underbrace{\frac{1 -W_{i,t}(b_{i,t})}{\eta}}_{\alpha^{*}_t} \end{align}\] where \(W_{i,t}(b_{i,t}) \triangleq \mathbb{E}_{B_{i,t+1}}[V_{i,t+1}^{*}(B_{i,t+1}) \mid a_{i,t}=\mathfrak{W}, b_{i,t}] - \mathbb{E}_{B_{i,t+1}}[V^{*}_{i,t+1}(B_{i,t+1}) \mid a_{i,t}=\mathfrak{R}, b_{i,t}]\)

Hence \(\alpha^{*}_{t}\leq \alpha^{*}_{t+1}\) iff \(W_{i,t}\) is non-increasing in \(t\) for all \(b_{i,t}\) and \(i\). We prove this using mathematical induction on \(k=T,T-2,\ldots,1\). For \(k=T-1\) we have \(W_{i,T-1}(b_{i,T-1})=0\) for all \(b\) and \(i\). Next, for \(k=T-2\) we have \[\begin{align} &W_{i,T-2}(b_{T-2}) = \min\big[1 + \mathbb{E}_{B_{i,T}}[V^{*}_{i,T}(B_{i,T}) \mid a_{T-1}=\mathfrak{R}], \\ &\mathbb{E}_{B_{i,T-1},B_{i,T}}[\eta B_{i,T-1} + V^{*}_{i,T}(B_{i,T}) \mid a_{T-2}=a_{T-1}=\mathfrak{W}, b_{T-2}]\big]\\ & \quad\quad\quad\quad\quad\quad-\min\big[1 -\mathbb{E}_{B_{i,T}}[V^{*}_{i,T}(B_{i,T}) \mid a_{T-1}=\mathfrak{R}], \\ &\mathbb{E}_{B_{i,T-1},B_{i,T}}[\eta B_{i,T-1} + V^{*}_{i,T}(B_{i,T}) \\ &\quad\quad\quad\quad\quad\quad\text{ }\mid a_{T-2}=\mathfrak{R},a_{T-1}=\mathfrak{W}, b_{T-2}]\big]\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mua\mkern-1.5mu)}{\geq} 0 = W_{i,T-1}(b_{i,T-2}) \end{align}\] where (a) follows because \(\mathbb{E}[B^{\prime} \mid a_t=\mathfrak{W}, b] \geq \mathbb{E}[B^{\prime} \mid a_t=\mathfrak{R}, b]\) by definition (1 ).

Now assume by induction that \(W_{i,k}(b) \geq W_{i,k+1}(b)\) for \(k=t,t+1,\ldots,T-3\) and all \(b\) and \(i\). We will show that this assumption implies \(W_{i,k-1}(b) \geq W_{i,k}(b)\) for all \(b\) and \(i\).

There are three cases to consider:

If \(B_k \in \mathcal{R}\) both when \(a_{i,k-1} = \mathfrak{W}\) and when \(a_{i,k-1} = \mathfrak{R}\), then \[\begin{align} &W_{i,k-1}(b_{i,k-1}) = \mathbb{E}_{B_k}[V_{i,k}^{*}(B_k) \mid a_{i,k-1}=\mathfrak{W}, b_{i,k-1}] -\\ &\quad\quad\quad\quad\quad\quad\quad\text{ }\text{ }\mathbb{E}_{B_{k}}[V_{i,k}^{*}(B_k) \mid a_{i,k-1}=\mathfrak{R}, b_{i,k-1}]\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mua\mkern-1.5mu)}{=} \mathbb{E}_{B_{k+1}}[1+ V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{R}] -\\ &\quad\text{ }\mathbb{E}_{B_k+1}[1+ V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{R}]\\ & \stackrel{\scriptscriptstyle(\mkern-1.5mub\mkern-1.5mu)}{=} \mathbb{E}_{B_{k+1}}[V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{R}] -\\ &\quad\text{ }\mathbb{E}_{B_{k+1}}[V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{R}]\\ &\stackrel{\scriptscriptstyle(\mkern-1.5muc\mkern-1.5mu)}{=} W_{i,k}(b_{i,k-1}) \end{align}\] where (a) follows from (14 ).

If \(B_k \in \mathcal{W}\) both when \(a_{i,k-1} = \mathfrak{W}\) and when \(a_{i,k-1} = \mathfrak{R}\), then \[\begin{align} &W_{i,k-1}(b_{i,k-1}) = \mathbb{E}_{B_k}[V_{i,k}^{*}(B_k) \mid a_{i,k-1}=\mathfrak{W}, b_{i,k-1}] -\\ &\quad\quad\quad\quad\quad\quad\quad\text{ }\text{ }\mathbb{E}_{B_{k}}[V_{i,k}^{*}(B_k) \mid a_{i,k-1}=\mathfrak{R}, b_{i,k-1}]\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mua\mkern-1.5mu)}{=} \mathbb{E}_{B_{k},B_{k+1}}[\eta B_{k} + V_{i,k+1}^{*}(B_{k+1}) \mid \\ &\quad\quad\quad\quad\quad\quad a_{i,k}=a_{i,k-1}=\mathfrak{W}, b_{i,k-1}] - \\ &\quad\text{ }\mathbb{E}_{B_k,B_{k+1}}[\eta B_{k} + V_{i,k+1}^{*}(B_{k+1})\mid\\ & \quad\quad\quad\quad\quad\quad a_{i,k}=\mathfrak{W},a_{i,k-1}=\mathfrak{R}, b_{i,k-1}]\\ & \stackrel{\scriptscriptstyle(\mkern-1.5mub\mkern-1.5mu)}{\geq} \mathbb{E}_{B_{k+1}}[V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=a_{i,k-1}=\mathfrak{W},b_{i,k-1}] -\\ &\quad\text{ }\mathbb{E}_{B_{k+1}}[V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{W},a_{i,k-1}=\mathfrak{R},b_{i,k-1}]\\ &\stackrel{\scriptscriptstyle(\mkern-1.5muc\mkern-1.5mu)}{=} W_{i,k}(b_{i,k-1}) \end{align}\] where (b) follows because \(\mathbb{E}[B^{\prime} \mid a=\mathfrak{W}, b] \geq \mathbb{E}[B^{\prime} \mid a=\mathfrak{R}, b]\) by definition (1 ).

If \(B_k \in \mathcal{R}\) when \(a_{i,k-1} = \mathfrak{W}\) and \(B_k \in \mathcal{W}\) when \(a_{i,k-1} = \mathfrak{R}\), then \[\begin{align} &W_{i,k-1}(b_{i,k-1}) = \mathbb{E}_{B_k}[V_{i,k}^{*}(B_k) \mid a_{i,k-1}=\mathfrak{W}, b_{i,k-1}] -\\ &\quad\quad\quad\quad\quad\quad\quad\text{ }\text{ }\mathbb{E}_{B_{k}}[V_{i,k}^{*}(B_k) \mid a_{i,k-1}=\mathfrak{R}, b_{i,k-1}]\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mua\mkern-1.5mu)}{=} \mathbb{E}_{B_{k+1}}[1+ V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{R}] -\\ &\quad\text{ }\mathbb{E}_{B_k, B_{k+1}}[\eta B_k + V_{i,k+1}^{*}(B_{k+1})\mid a_{i,k}=\mathfrak{W},a_{i,k-1}=\mathfrak{R}] \\ & \stackrel{\scriptscriptstyle(\mkern-1.5mub\mkern-1.5mu)}{\geq} \mathbb{E}_{B_{k+1}}[1+ V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{R}] -\\ &\quad\text{ }\mathbb{E}_{B_{k+1}}[1+ V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{R}]\\ & \stackrel{\scriptscriptstyle(\mkern-1.5muc\mkern-1.5mu)}{\geq} \mathbb{E}_{B_{k+1}}[V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{R}] -\\ &\quad\text{ }\mathbb{E}_{B_{k+1}}[V_{i,k+1}^{*}(B_{k+1}) \mid a_{i,k}=\mathfrak{R}]\\ &\stackrel{\scriptscriptstyle(\mkern-1.5mud\mkern-1.5mu)}{=} W_{i,k}(b_{i,k-1}) \end{align}\] where (b) follows from (14 ).

The case where \(B_k \in \mathcal{R}\) when \(a_{i,k-1} = \mathfrak{R}\) and where \(B_k \in \mathcal{W}\) when \(a_{i,k-1} = \mathfrak{W}\) can be discarded due to Thm 1 since \(\mathbb{E}[B^{\prime} \mid a=\mathfrak{W}, b] \geq \mathbb{E}[B^{\prime} \mid a=\mathfrak{R}, b]\), which means that if \(B_k \in \mathcal{R}\) when \(a_{i,k-1} = \mathfrak{R}\), then \(B_k \in \mathcal{W}\) also when \(a_{i,k-1} = \mathfrak{W}\).

It follows by induction that \(W_{i,t}(b) \geq W_{i,t+1}(b)\) for all \(t\), \(b\) and \(i\). 0◻

Solving (?? ) corresponds to solving a finite and stationary Constrained Markov Decision Process (cmdp) with bounded costs and the average cost optimality criterion. Assumption ?? implies that the cmdp is feasible and assumption ?? implies that the cmdp is unichain [65], which means that there exists an optimal stationary strategy \(\pi^{*}\) for which the limit in (13 ) exists [[100]][65].

By introducing the Lagrange multiplier \(\lambda\geq 0\) and defining the immediate cost to be \(c_{\lambda}(s_t) = s_t + \lambda \llbracket s_t \geq f+1 \rrbracket\) we can reformulate the cmdp as an unconstrained mdp through Lagrangian relaxation [101]. The optimal strategy in this unconstrained mdp satisfies the following Bellman equation: \[\begin{align} &\pi_{\lambda}^{*}(s_{t}) \in \mathop{\mathrm{arg\,min}}_{a \in \{0,1\}}\bigg[c_{\lambda}(s_t) + \mathbb{E}\left[V_{\lambda}^{*}(s_{t+1})\right]\bigg]\label{eq:bellman95eq95state} \end{align}\tag{16}\] where \(V_{\lambda}^{*}\) is the value function [101].

Since \(c_{\lambda}(s_t)\) is non-decreasing in \(s\) it follows from assumptions ?? and ?? that there exists an optimal threshold strategy for this mdp for any \(\lambda\) [[65]][100]. Further, we know from Lagrangian dynamic programming theory that there exists an optimal strategy in the cmdp which is a randomized mixture of two optimal deterministic strategies of the Lagrangian mdp with different Lagrange multipliers \(\lambda_1\) and \(\lambda_2\) (?? ) [65], [101]. When combined, these two properties imply the statement of the theorem. 0◻

[1]

.
[2]

.
[3]

.
[4]

.
[5]

.
[6]

.
[7]

.
[8]

.
[9]

.
[10]

.
[11]

.
[12]

.
[13]

.
[14]

.
[15]

.
[16]

.
[17]

.
[18]

.
[19]

M. Abd-El-Malek, G. R. Ganger, G. R. Goodson, M. K. Reiter, and J. J. Wylie, “Fault-scalable byzantine fault-tolerant services,” 2005 , isbn = {1595930795}, p. 59?74, doi:
10.1145/1095810.1095817.

[20]

.
[21]

.
[22]

.
[23]

.
[24]

.
[25]

.
[26]

J. A. H. Gooszen, “The use of control charts in the clinical laboratory,” *Clinica Chimica Acta*, vol. 5, no. 3, pp. 431–438, 1960, doi: https://doi.org/10.1016/0009-8981(60)90150-9.

[27]

D. J. White, “Real applications of markov decision processes,” *Interfaces*, vol. 15, no. 6, pp. 73–83, 1985, doi: 10.1287/inte.15.6.73.

[28]

.
[29]

K. Hammar and book Stadler Rolf, “NOMS 2023-2023 IEEE/IFIP network operations and management symposium , title=Digital Twins for Security Automation,” 2023, pp.
1–6, doi: 10.1109/NOMS56928.2023.10154288.

[30]

K. Hammar, “The cyber security learning environment (CSLE).” 2023, [Online]. Available: https://github.com/Limmen/csle , note = {\url{https://github.com/Limmen/csle}}.

[31]

K. Hammar, “A software framework for building self-learning security systems.” 2023, [Online]. Available: https://www.youtube.com/watch?v=iE2KPmtIs2A& , note = {\url{https://www.youtube.com/watch?v=iE2KPmtIs2A&}}.

[32]

.
[33]

.
[34]

[35]

.
[36]

.
[37]

.
[38]

.
[39]

.
[40]

.
[41]

.
[42]

.
[43]

.
[44]

W. Xu, “Hybrid fault tolerant consensus in wireless embedded systems,” PhD thesis, 2021.

[45]

M. Correia, N. F. Neves, and book Verissimo P., “Proceedings of the 23rd IEEE international symposium on reliable distributed systems, 2004. , title=How to tolerate half less
one Byzantine nodes in practical distributed systems,” 2004, pp. 174–183, doi: 10.1109/RELDIS.2004.1353018.

[46]

.
[47]

.
[48]

K. C. Li, X. Chen, H. Jiang, and isbn=9780429674440. Bertino E., *Essentials of blockchain technology*. CRC Press,
2019.

[49]

H. Ishii, Y. Wang, and keywords =. C. systems,. C. D. algorithms,. M. consensus,. F. injection attacks,. D. attacks Shuai Feng, “An overview on multi-agent consensus under
adversarial attacks,” *Annual Reviews in Control*, vol. 53, no. of extensions with enhanced resiliency will be established. As the second class of attacks, the effects of denial–of–service (DoS) attacks will be examined in the context of
multi–agent consensus. By employing a DoS model based on the energy constraints of the attacker, we will observe that robustness against such attacks may depend on system properties such as dynamics of the individual agents and network structures.
Applications of the algorithms will be further discussed for clock synchronization in wireless sensor networks and control of a group of mobile robots., pp. 252–272, 2022, doi: https://doi.org/10.1016/j.arcontrol.2022.01.004.

[50]

.
[51]

.
[52]

.
[53]

.
[54]

.
[55]

M. J. Weisman *et al.*, “Quantitative measurement of cyber resilience: Modeling and experimentation.” 2023 , eprint={2303.16307}, archivePrefix={arXiv},
primaryClass={cs.CR}.

[56]

.
[57]

.
[58]

L. Lamport, D. Malkhi, and L. Zhou, “Reconfiguring a state machine,” *SIGACT News*, vol. 41, no. 1, p. 63?73, 2010 , issue_date = {March 2010}, doi: 10.1145/1753171.1753191.

[59]

E. W. Dijkstra, “A belated proof of self-stabilization,” *Distrib. Comput.*, vol. 1, no. 1, p. 5?6, 1986 , issue_date = {Jan. 1986}, doi: 10.1007/BF01843566.

[60]

A. Wald, Wiley; Sons, New York, 1947.

[61]

.
[62]

.
[63]

.
[64]

.
[65]

.
[66]

.
[67]

.
[68]

.
[69]

.
[70]

.
[71]

.
[72]

.
[73]

.
[74]

.
[75]

.
[76]

.
[77]

.
[78]

.
[79]

.
[80]

vol. (DSN–S) , 2023, pp. 111–116.

[81]

.
[82]

.
[83]

.
[84]

.
[85]

.
[86]

.
[87]

.
[88]

.
[89]

.
[90]

.
[91]

.
[92]

.
[93]

.
[94]

.
[95]

.
[96]

.
[97]

.
[98]

bib Bellman Richard, “A markovian decision process , url = http://www.jstor.org/stable/24900506,” *Journal of Mathematics and Mechanics*,
vol. 6, no. 5, pp. 679–684, timestamp = 2017-12-20T14:47:54.000+0100, 1957, [Online]. Available: https://www.bibsonomy.org/bibtex/2c04c5f89b4e8445651eded5b56c67342/becker ,
interhash = {d7aa065c075b248c9980b4c45d635b66}, intrahash = {c04c5f89b4e8445651eded5b56c67342}.

[99]

.
[100]

.
[101]

.