Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems


Abstract

The rapid advance of deep reinforcement learning techniques enables the oversight of safety-critical systems through the utilization of Deep Neural Networks (DNNs). This underscores the pressing need to promptly establish certified safety guarantees for such DNN-controlled systems. Most of the existing verification approaches rely on qualitative approaches, predominantly employing reachability analysis. However, qualitative verification proves inadequate for DNN-controlled systems as their behaviors exhibit stochastic tendencies when operating in open and adversarial environments. In this paper, we propose a novel framework for unifying both qualitative and quantitative safety verification problems of DNN-controlled systems. This is achieved by formulating the verification tasks as the synthesis of valid neural barrier certificates (NBCs). Initially, the framework seeks to establish almost-sure safety guarantees through qualitative verification. In cases where qualitative verification fails, our quantitative verification method is invoked, yielding precise lower and upper bounds on probabilistic safety across both infinite and finite time horizons. To facilitate the synthesis of NBCs, we introduce their \(k\)-inductive variants. We also devise a simulation-guided approach for training NBCs, aiming to achieve tightness in computing precise certified lower and upper bounds. We prototype our approach into a tool called UniQQand showcase its efficacy on four classic DNN-controlled systems.

-8 6 =8em Introduction

The widespread adoption of deep reinforcement learning techniques has propelled advancements in autonomous systems, endowing them with adaptive decision-making capabilities by Deep Neural Networks (DNNs) [1]. Ensuring the safety of these DNN-controlled systems emerges as a critical concern, necessitating the provision of certified safety guarantees. Formal methods, renowned for their rigorousness and automaticity in delivering verified safety assurances, stand as a promising means to address this concern. However, most of the existing formal verification approaches rely on qualitative approaches, predominantly employing reachability analysis [2]. Despite their significance, qualitative results fall short for DNN-controlled systems due to the constant influence of various uncertainties from different sources, such as environment noises [3], unreliable sensors [4], and even malicious attacks [5]. When qualitative verification fails, it becomes both desirable and practical to obtain quantitative guarantees, including quantified lower and upper bounds on the safety probabilities of the systems. This necessitates the use of quantitative verification engines [2].

Quantitative verification has proven its efficacy in enhancing the design and deployment across a variety of applications, including autonomous systems [6], self-adaptive systems [7], distributed communication protocols [8], and probabilistic programs [9]. These applications are commonly modeled using automata-based quantitative formalisms [10], such as Markov chains, timed automata, and hybrid automata, and undergo verification using tools such as Prism [11] and Storm [12]. Nonetheless, the quantitative verification of DNN-controlled systems is challenging due to the incorporation of intricate and almost inexplicable decision-making models by DNNs [13]. Compounding the issue, the difficulty is amplified by the continuous and infinite state space, as well as the non-linear dynamics inherent in DNN-controlled systems. First, building a faithful automata-based probabilistic model for a DNN-controlled system is challenging. This difficulty arises as one cannot predict the action a DNN might take until a specific state is provided, and exhaustively enumerating all continuous states is impractical. Second, even if such a model is constructed under certain constraints, such as bounded steps [14] and state abstractions [15], verification is susceptible to state exploration issues—a well-known problem in model checking[16]. For instance, the verification process can take up to 50 minutes for just 7 steps [14].

Leveraging barrier certificates (BCs) for verification emerges as a promising technique for formally establishing the safety of non-linear and stochastic systems [17], [18]. A BC partitions the state space of the system into two parts, ensuring that all trajectories starting from a given initial set, located within one side of the BC, cannot reach a given set of states (deemed to be unsafe), located on the other side, almost surely (i.e., with probability \(1\)) or with probability at least \(p\in [0,1)\). Once a BC is computed, it can be used to certify systems’ safety properties either qualitatively or quantitatively. Recently, studies have shown that BCs can be implemented and trained in neural forms called Neural Barrier Certificates (NBCs). NBCs facilitate the synthesis of BCs and improve their expressiveness [19][23]. A relevant survey is delegated to [24].

In this paper, we propose a unified framework for both qualitatively and quantitatively verifying the safety of DNN-controlled systems by leveraging NBCs. The key idea is to reduce both qualitative and quantitative verification problems into a cohesive synthesis task of their respective NBCs. Specifically, we first seek to establish almost-sure safety guarantees through qualitative verification. In cases where qualitative verification fails, our quantitative verification method is invoked, yielding precise lower and upper bounds on probabilistic safety across both infinite and finite time horizons.

We also establish relevant theoretical results. In qualitative verification, we prove that an NBC satisfying corresponding conditions serves as a qualitative safety certificate. In the quantitative counterpart, we establish that valid NBCs can be utilized to calculate certified upper and lower bounds on the probabilistic safety of systems, encompassing both infinite and finite time horizons. For infinite time horizons, as the lower bounds on probabilistic safety approach zero, indicating a decreasing trend in safety probabilities along the time horizon, we provide both linearly and exponentially decreasing lower and upper bounds on the safety probabilities over finite time horizons.

To facilitate the synthesis of valid NBCs, we further relax their constraints by defining their \(k\)-inductive variants  [25]. This necessitates the conditions to be inductive for \(k\)-compositions of the transition relation within a specified bound \(k\) [26]. Consequently, synthesizing a qualified NBC becomes more manageable under these \(k\)-inductive conditions, while ensuring safety guarantees. As valid NBCs are not unique and yield different certified bounds, we devise a simulation-guided approach to train potential NBCs. This approach aims to enhance their capability to produce more precise certified bounds. Specifically, we estimate safety probabilities through simulation. The differences between the simulation results and the bounds provided by potential NBCs are incorporated into the loss function. This integration can yield more precise certified bounds after potential NBCs are successfully validated.

We prototype our approach into a tool, called UniQQ, and apply it to four classic DNN-controlled problems. The experimental results showcase the effectiveness of our unified verification approach in delivering both qualitative and quantitative safety guarantees across diverse noise scenarios. Additionally, the results underscore the efficacy of \(k\)-inductive variants in reducing verification overhead, by 25% on average, and that of our simulation-based training method in yielding tighter safety bounds, with an up to 47.5% improvement over ordinary training approaches.

Contributions. Overall, we make the following contributions.

  1. We present a novel framework that unifies both qualitative and quantitative safety verification of DNN-controlled systems by reducing these verification problems into the cohesive task of synthesizing NBCs.

  2. We establish relevant theoretical results, including new constraints of NBCs for both qualitative and quantitative safety verification and the associated lower and upper bounds for safety probabilities in both linear and exponential forms.

  3. To accelerate training, we relax the constraints of NBCs by introducing their \(k\)-inductive variants. We also present a simulation-guided approach designed to train potential NBCs to compute safety bounds as tightly as possible.

  4. We develop a prototype of our approach, showcasing its efficacy across four classic DNN-controlled systems.

All omitted proofs and supplementary experimental results are given in the Appendix.

-8 6 =8em Preliminaries

Let \({\mathbb{N}}\), \({\mathbb{Z}}\), and \({\mathbb{R}}\) be the sets of natural numbers, integers, and real numbers, respectively.

-8 6 =8em DNN-controlled Systems We consider DNN-controlled systems where the control policies are implemented by deep neural networks and suppose the networks are trained for specific tasks. Formally, a DNN-controlled system is a tuple \(M=(S,S_0,A,\pi,f,R)\), where \(S\subseteq {\mathbb{R}}^n\) is the set of (possibly continuous and infinite) system states, \(S_0\subseteq S\) is the set of initial states, \(A\) is the set of actions, \(\pi:S\rightarrow A\) is the trained policy implemented by a neural network, \(f:S\times A\rightarrow S\) is the system dynamics, and \(R:S\times A\times S\rightarrow {\mathbb{R}}\) is the reward function.

Trajectories. A trained DNN-controlled system \(M=(S,S_0,A,\pi,f,R)\) is a decision-making system that continuously interacts with the environment. At each time step \(t\in{\mathbb{N}}_0\), it observes a state \(s_t\) and feeds \(s_t\) into its planted NN to compute the optimal action \(a_t=\pi(s_t)\) that shall be taken. Action \(a_t\) is then performed, which transits \(s_t\) into the next state \(s_{t+1}=f(s_t,a_t)\) via the system dynamics \(f\) and earns a reward \(r_{t+1} = R(s_t,a_t,s_{t+1})\). Given an initial state \(s_0 \in S_0\), a sequence of states generated during interaction is called a trajectory, denoted as \(\omega=\{s_t\}_{t\in{\mathbb{N}}_0}\). To ease the notation, we denote by \(\omega_t\) the \(t\)-th element of \(\omega\), i.e., \(\omega_t=s_t\), and by \(\Omega\) the set of all trajectories.

State Perturbations. As DNN-controlled systems collect state information via sensors, uncertainties inevitably originate from sensor errors, equipment inaccuracy, or even adversarial attacks [3], [27]. Therefore, the observed states of the systems can be perturbed and actions are computed based on the perturbed states. Formally, an observed state at time step \(t\) is \(\hat{s}_t:=s_t+\delta_t\) where \(\delta_t\sim \mu\) is a random noise and \(\mu\) is a probability distribution over \({\mathbb{R}}^n\). We denote by \(W:=\textsf{supp}\left({\mu}\right)\) the support of \(\mu\). Due to perturbation, the actual successor state is \(s_{t+1}:=f(s_t,\hat{a}_t)\) with \(\hat{a}_t:=\pi(\hat{s}_t)\) and the reward is \(r_{t+1}:=R(s_t,\hat{a}_t,s_{t+1})\). Note that the successor state and the reward are calculated according to the actual state and the action on the perturbed state, and this update is common [3]. We then denote a DNN-controlled system \(M\) perturbed by a noise distribution \(\mu\) as \(M_\mu=(S,S_0,A,\pi,f,R,\mu)\).

Assumptions. Given a DNN-controlled system \(M=(S,S_0,A,\pi,f,R)\), we assume that the state space \(S\) is compact in the Euclidean topology of \({\mathbb{R}}^n\), its system dynamics \(f\) and trained policy \(\pi\) are Lipschitz continuous. This assumption is common in control theory [28]. For perturbation, we assume that the noise distribution \(\mu\) either has bounded support or is a product of independent univariate distributions.

Probability Space. Given a DNN-controlled system \(M_\mu=(S,S_0,A,\pi,f,R,\mu)\), for each initial state \(s_0\in S_0\), there exists a probability space \((\Omega_{s_0},\mathcal{F}_{s_0},{\mathbb{P}}_{s_0})\) such that \(\Omega_{s_0}\) is the set of all trajectories starting from \(s_0\) by the environmental interaction, \(\mathcal{F}_{s_0}\) is a \(\sigma\)-algebra over \(\Omega_{s_0}\) (i.e., a collection of subsets of \(\Omega_{s_0}\) that contains the empty set \(\emptyset\) and is closed under complementation and countable union), and \({\mathbb{P}}_{s_0}:\mathcal{F}_{s_0}\rightarrow [0,1]\) is a probability measure on \(\mathcal{F}_{s_0}\). We denote the expectation operator in this probability space by \({\mathbb{E}}_{s_0}\).

-8 6 =8em Barrier Certificate and its Neural Implementation

Barrier certificates (BCs) are powerful tools to certify the safety of continuous-time dynamical systems. In the following we describe the discrete-time BCs which this work is based upon. We refer readers to [29], [30] for details about continuous-time BCs.

Definition 1 (Discrete-time Barrier Certificates). Given a DNN-controlled system \(M=(S,S_0,A,f,\pi,R)\) with an unsafe set \(S_u\subseteq S\) such that \(S_u\cap S_0=\emptyset\). A discrete-time barrier certificate* is a real-valued function \(B:S\rightarrow {\mathbb{R}}\) such that for some constant \(\lambda\in (0,1]\), it holds that: \[\begin{align} & B(s)\le 0 \;&\text{for all }s\in S_0, \label[condition]{eq:dis-bc-initial} \\ & B(s)>0 \;& \text{for all } s\in S_u, \label[condition]{eq:dis-bc-unsafe} \\ & B(f(s,\pi(s)))-B(s)+\lambda\cdot B(s)\le 0 & \text{for all } s \in S. \label[condition]{eq:dis-bc-decrease} \end{align}\]*

If there exists such a BC for the system \(M\), then \(M\) is safe, i.e., the system cannot reach a state in the unsafe set \(S_u\) from the initial set \(S_0\). The intuition is that: implies that for any \(s\in S\) such that \(B(s)\le 0\), \(B(f(s,\pi(s)))\le 0\). Since asserts that the initial value of \(B\) is not greater than zero, any trajectory \(\omega\in \Omega_{s_0}\) starting from an initial state \(s_0\in S_0\) cannot enter the unsafe set \(S_u\), where \(B(s) > 0\) (see ), thereby ensuring the safety of the system.

Finding a BC is restricted to the expressiveness of templates. For example, even if there exists a function satisfying , it may be not found under polynomial forms. Recent work [19], [31], [32] proposes a neural implementation of BCs as deep neural networks, leveraging the expressiveness of neural networks. The neural implementation of a BC is called a neural barrier certificate (NBC), which consists of training and validation. First, a learner trains a neural network (NN) to fit over a finite set of samples the conditions for a BC. After training, an NBC is then checked whether it meets the conditions. This is achieved by a verifier using SMT solvers [19], [31] or other methods like Sum-of-Squares programming [32]. If the validation result is false, a set of counterexamples can be generated for future training. This iteration is repeated until a trained candidate is validated or a given timeout is reached. This training and validation iteration is called CounterExample-Guided Inductive Synthesis (CEGIS) [33].

-8 6 =8em Verification Problem and Our Framework

-8 6 =8em Problem Statement

We consider the safety of DNN-controlled systems from both qualitative and quantitative perspectives. Below we fix a DNN-controlled system \(M_\mu=(S,S_0,A,\pi,f,R,\mu)\) and an unsafe set \(S_u\subseteq S\) such that \(S_0\cap S_u =\emptyset\) throughout the paper.

Definition 2 (Almost-Sure Safety). The system \(M_\mu\) is almost-surely (a.s.) safe, if a.s. no trajectories starting from any initial state \(s_0\in S_0\) enter \(S_u\), i.e., \[\forall s_0\in S_0.\omega\in\Omega_{s_0} \Longrightarrow \omega_t \not\in S_u \;\forall t\in{\mathbb{N}}.\notag\]

This almost-sure safety is a qualitative property and we call it “almost-sure” due to the stochasticity from state perturbations. Since the almost-sure safety does not always exist with the increase of state perturbations, we propose the notion of probabilistic safety over infinite time horizons.

Definition 3 (Probabilistic Safety over Infinite Time Horizons). The system \(M_\mu\) is probabilistically safe over infinite time horizons* with \([l_{\mathrm{inf}},u_{\mathrm{inf}}]\), where \(0\leq l_{\mathrm{inf}}\leq u_{\mathrm{inf}}\leq 1\), if the probability of not entering \(S_u\) falls into \([l_{\mathrm{inf}},u_{\mathrm{inf}}]\) for all the trajectories from any initial state \(s_0\in S_0\), i.e., \[\forall s_0\in S_0.{\mathbb{P}}_{s_0}\left[\{\omega\in\Omega_{s_0} \mid \omega_t\not\in S_u\;\text{for all }t\in{\mathbb{N}}\}\right]\in [l_{\mathrm{inf}},u_{\mathrm{inf}}]. \notag\]*

The probabilistic safety is a quantitative property and \(l_{\mathrm{inf}},u_{\mathrm{inf}}\) are called lower and upper bounds on the safety probabilities over infinite time horizons, respectively. Once both bounds equal one, it implies the almost-sure safety. When the lower bound \(l_{\mathrm{inf}}=0\), indicating that the system reaches the unsafe region at some time step \(T<\infty\), it is significant to figure out how the safety probability decreases over the finite time horizon. Therefore, we present the probabilistic safety over finite time horizons as follows.

Definition 4 (Probabilistic Safety over Finite Time Horizons). The system \(M_\mu\) is probabilistically safe over a finite time horizon \(T\in [0,\infty)\)* with \([l_{\mathrm{fin}},u_{\mathrm{fin}}]\), where \(0\leq l_{\mathrm{fin}} \leq u_{\mathrm{fin}}\leq 1\), if the probability of not entering \(S_u\) within \(T\) falls into \([l_{\mathrm{fin}},u_{\mathrm{fin}}]\) for all the trajectories starting from any initial state \(s_0\in S_0\), \[\forall s_0\in S_0.{\mathbb{P}}_{s_0}\left[\{ \omega\in\Omega_{s_0} \mid \omega_t\not\in S_u\;\text{for all }t\le T \}\right]\in [l_{\mathrm{fin}},u_{\mathrm{fin}}]. \notag\]*

Safety Verification Problems of DNN-Controlled Systems. Consider a DNN-controlled system \(M_\mu=(S,S_0,A,\pi,f,R,\mu)\) with an unsafe set \(S_u\in S\) such that \(S_0\cap S_u=\emptyset\). We formulate the qualitative and quantitative safety verification problems of \(M_\mu\) as follows:

  1. Qualitative Verification (QV): To answer whether \(M_\mu\) is almost-surely safe.

  2. Quantitative Verification over Infinite Time Horizons (QVITH): To compute certified lower and upper bounds \(l_{\mathrm{inf}},u_{\mathrm{inf}}\) on the safety probability of \(M_\mu\) over infinite time horizons.

  3. Quantitative Verification over Finite Time Horizons (QVFTH): To compute certified lower and upper bounds \(l_{\mathrm{fin}},u_{\mathrm{fin}}\) on the safety probability of \(M_\mu\) over a finite time horizon \(T\).

-8 6 =8em Overview of Our Framework

Figure 1: Our unified verification framework.

We first provide an overview of our unified framework designed to address the three safety verification problems. Our framework builds on two fundamental results: (i) all the problems can be reduced to the task of defining BCs under specific conditions, and the defined BCs can be used to certify almost-sure safety for QV or safety bounds for QVITH and QVFTH, respectively, and (ii) these BCs can be implemented and trained in neural forms. The fundamental results are presented in , respectively.

The synthesis of NBCs has a preset timeout threshold, i.e., it will fail if NBCs cannot be successfully synthesized within the time threshold. The procedure of our framework is sketched in , which consists of the following three steps:

Step 1: QV. We try to synthesize an NBC satisfying conditions in . If such an NBC is successfully synthesized, we can conclude that the system \(M_\mu\) is almost-surely safe by  and finish the verification. Alternatively, we can resort to synthesizing a \(k\)-inductive NBC in whose conditions are weaker than those in . If the synthesis fails, we proceed to quantitative verification.

Step 2: QVITH. We try to synthesize two NBCs under the conditions in , respectively. If the synthesis fails, a timeout will be reported and the process will be terminated. Otherwise, we can obtain the lower bound \(l_{\mathrm{inf}}\) and the upper bound \(u_{\mathrm{inf}}\) on probabilistic safety over infinite time horizons. Alternatively, we can choose to synthesize the \(k\)-inductive variants of NBCs in . If the lower bound \(l_{\mathrm{inf}}\) is no less than some preset safety threshold \(\delta\in (0,1)\), we terminate the verification. The purpose of setting \(\delta\) is to prevent the verification from returning a meaningless lower bound such as \(0\). If \(l_{\mathrm{inf}}\) is less than \(\delta\), we resort to computing safety bounds over finite time horizons.

Step 3: QVFTH. We try to synthesize two NBCs satisfying conditions in , respectively. If the synthesis fails, a timeout will be reported and the verification will terminate. Otherwise, we can compute the linear lower and upper bounds on probabilistic safety over finite time horizons according to the synthesized NBCs. Alternatively, we can choose to synthesize two NBCs satisfying conditions in to achieve exponential bounds, which might be tighter than linear ones.

-8 6 =8em Qualitative and Quantitative Safety Verification

In this section, we reduce all three safety verification problems of DNN-controlled systems into a cohesive problem of defining corresponding BCs. We establish specific conditions for candidate BCs and provide formulas for computing lower and upper bounds for quantitative verification based on the defined BCs.

-8 6 =8em Qualitative Safety Verification

Theorem 1 (Almost-Sure Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that for some constant \(\lambda\in (0,1]\), the following conditions hold: \[\begin{align} & B(s)\le 0 \;&\text{for all }s\in S_0, \label{eq:thm-almost-initial} \\ & B(s)>0 \;& \text{for all } s\in S_u, \label{eq:thm-almost-unsafe} \\ & B(f(s,\pi(s+\delta)))-B(s)+\lambda\cdot B(s)\le 0 & \text{for all } (s,\delta)\in S\times W, \label[condition]{eq:thm-almost-decrease} \end{align}\] {#eq: sublabel=eq:eq:thm-almost-initial,eq:eq:thm-almost-unsafe} then \(M_\mu\) is almost-surely safe, i.e., \(\forall s_0\in S_0. \; \omega\in\Omega_{s_0} \Longrightarrow \omega_t \not\in S_u \;\forall t\in{\mathbb{N}}.\)

Intuition. The BC in is similar to that in except , in which we consider all stochastic behaviors of the system from state perturbations. We prove the theorem by contradiction and the proof resembles that in [17].

-8 6 =8em Quantitative Safety Verification over Infinite Time Horizon

Below we present the state-dependent lower and upper bounds on probabilistic safety over infinite time horizons.

Theorem 2 (Lower Bounds on Infinite-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that for some constant \(\epsilon\in [0,1]\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;& \text{for all } s\in S, \label[condition]{eq:thm-inf-nonnegative}\\ & B(s) \le \epsilon \;& \text{for all } s\in S_0, \label[condition]{eq:thm-inf-lower-initial}\\ & B(s)\ge 1 \;& \text{for all } s\in S_u, \label[condition]{eq:thm-inf-lower-unsafe}\\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le 0\; & \text{for all }s\in S \setminus S_u , \label[condition]{eq:thm-inf-lower-decrease} \end{align}\] then the safety probability over infinite time horizons is bounded from below by \[\forall s_0\in S_0.\; {\mathbb{P}}_{s_0}\left[\{\omega\in\Omega_{s_0}\mid \omega_t\not\in S_u \;\text{for all } t \in {\mathbb{N}}\} \right]\ge 1-B(s_0). \label{eq:thm-inf-lowerbound}\qquad{(1)}\]

Intuition. A BC under conditions in is a non-negative real-valued function satisfying the supermartingale property, i.e., the expected value of the function remains non-increasing at every time step for all states not in \(S_u\) (see ). We prove the theorem by Ville’s Inequality [34] and the proof resembles that in [17].

Theorem 3 (Upper Bounds on Infinite-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that for some constants \(\gamma\in (0,1)\), \(0\le \epsilon'<\epsilon\le 1\), the following conditions hold: \[\begin{align} & 0\le B(s)\le 1 \;& \text{for all } s\in S, \label[condition]{eq:thm-inf-upper-bound}\\ & B(s)\ge \epsilon \;&\text{ for all } s\in S_0,\label[condition]{eq:thm-inf-upper-initial} \\ & B(s)\le \epsilon' \;&\text{ for all } s\in S_u, \label[condition]{eq:thm-inf-upper-unsafe}\\ & B(s)-\gamma\cdot{\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]\le 0\; & \text{for all }s\in S \setminus S_u, \label[condition]{eq:thm-inf-upper-decrease} \end{align}\] then the safety probability over infinite time horizons is bounded from above by \[\forall s_0\in S_0.\; {\mathbb{P}}_{s_0}\left[\{\omega\in\Omega_{s_0}\mid \omega_t\not\in S_u \;\text{for all } t\in {\mathbb{N}}\} \right]\le 1-B(s_0). \label{eq:thm-inf-upperbound}\qquad{(2)}\]

Intuition. A BC under conditions in is a bounded non-negative function satisfying the \(\gamma\)-scaled submartingale property [35], i.e., the expected value of \(B\) is increasing at each time step for states not in \(S_u\) (). We prove the theorem by Optional Stopping Theorem [36], while former work is based on fix-point theory [37].

-8 6 =8em Quantitative Safety Verification over Finite Time Horizon

When the safety probability over infinite time horizons exhibits a decline, it becomes advantageous to analyze the decreasing changes over finite time horizons. In the following, we present our theoretical results on finite-time safety verification, starting with two results related to lower bounds.

Theorem 4 (Linear Lower Bounds on Finite-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier certificate \(B:S\to {\mathbb{R}}\) such that for some constants \(\lambda > \epsilon \ge 0\) and \(c \ge 0\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;& \text{ for all }s\in S, \label[condition]{eq:thm-fin-lower-linear-nonnegative} \\ & B(s)\le \epsilon \;&\text{for all }s\in S_0, \label[condition]{eq:thm-fin-lower-linear-initial}\\ & B(s)\ge \lambda \;& \text{for all } s\in S_u, \label[condition]{eq:thm-fin-lower-linear-unsafe}\\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le c\; & \text{for all }s\in S, \label[condition]{eq:thm-fin-lower-linear-decrease} \end{align}\] then the safety probability over a finite time horizon \(T\) is bounded from below by \[\label{eq:fin-lowerbound-linear} \forall s_0\in S_0.\; {\mathbb{P}}_{s_0}\left[\{\omega\in\Omega_{s_0}\mid \omega_t\not\in S_u \;\text{for all } t\le T \} \right]\ge 1-(B(s_0)+cT)/\lambda. \notag\qquad{(3)}\]

Intuition. A BC in  satisfies the c-martingale property [38], i.e., the expected value of \(B\) can increase at every time step as long as it is bounded by a constant \(c\) (), which is less conservative than the supermartingle property (), at the cost providing safety guarantees over finite time horizons. We prove the theorem by Ville’s Inequality [34] and the proof resembles that in [25].

Theorem 5 (Exponential Lower Bounds on Finite-time Safety). Given an \(M_\mu\) if there exists a function \(B:S\rightarrow {\mathbb{R}}\) such that for some constants \(\alpha> 0, \beta\in{\mathbb{R}}\), and \(\gamma\in [0,1)\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;& \text{ for all }s\in S, \label[condition]{eq:thm-fin-lower-exp-nonnegative} \\ & B(s)\le \gamma \;&\text{for all }s\in S_0,\label[condition]{eq:thm-fin-lower-exp-initial} \\ & B(s) \ge 1 \;& \text{for all } s\in S_u,\label[condition]{eq:thm-fin-lower-exp-unsafe} \\ & \alpha {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s) \le \alpha \beta \; & \text{for all }s\in S\setminus S_u.\label[condition]{eq:thm-fin-lower-exp-moni} \end{align}\] then the safety probability over a finite time horizon \(T\) is bounded from below by \[\label{eq:fin-lower-expbound} \forall s_0\in S_0.\; {\mathbb{P}}_{s_0}\left[\{ \omega\in\Omega_{s_0}\mid\omega_t\not\in S_u \;\text{for all } t\le T \} \right]\ge 1-\frac{\alpha\beta}{\alpha-1} + (\frac{\alpha\beta}{\alpha-1}-B(s_0))\cdot \alpha^{-T}.\notag\qquad{(4)}\]

Intuition. A BC in  satisfies that its \(\alpha\)-scaled expectation can increase at most \(\alpha\beta\) at every time step (Condition ([eq:thm-fin-lower-exp-moni])). We establish a new result in discrete-time DNN-controlled systems and prove it by the discrete version of Gronwall’s Inequality [39], which is inspired by former work [40] in continuous-time dynamical systems.

Then we propose our two results of upper bounds on safety probabilities.

Theorem 6 (Linear Upper Bounds on Finite-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier function \(B:S\rightarrow {\mathbb{R}}\) such that for some constants \(\beta\in (0,1), \beta<\alpha<1+\beta, c\ge 0\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;& \text{ for all }s\in S, \label[condition]{eq:thm-fin-upper-linear-nonnegative} \\ & B(s)\le \beta \;&\text{for all }s\in S\setminus S_u,\label[condition]{eq:thm-fin-upper-exp-initial} \\ & \alpha \le B(s) \le 1+\beta \;& \text{for all } s\in S_u,\label[condition]{eq:thm-fin-upper-linear-unsafe} \\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s) \ge c \; & \text{for all }s\in S\setminus S_u.\label[condition]{eq:thm-fin-upper-linear-moni} \end{align}\] then the safety probability over a finite time horizon \(T\) is bounded from above by \[\label{eq:fin-upper-linearbound} \forall s_0\in S_0.\; {\mathbb{P}}_{s_0}\left[\{\omega\in\Omega_{s_0}\mid \omega_t\not\in S_u \;\text{for all } t\le T \} \right]\le 1-B(s_0)-\frac{1}{2}c\cdot T+\beta.\notag\qquad{(5)}\]

Intuition. A BC in  is non-negative and its value is bounded when states are in \(S_u\) (). Moreover, is the inverse of the c-martingale property in , i.e., the expected value of \(B\) should increase at least \(c\) at every time step.

Theorem 7 (Exponential Upper Bounds on Finite-Time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier function \(B:S\to {\mathbb{R}}\) such that for some constants \(K'\le K<0\), \(\epsilon>0\) and a non-empty interval \([a,b]\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;&\text{for all }s\in S\setminus S_u,\label[condition]{eq:thm-fin-upper-exp-nonnegative}\\ & K'\le B(s) \le K \;& \text{for all } s\in S_u,\label[condition]{eq:thm-fin-upper-exp-unsafe}\\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le -\epsilon \; & \text{for all }s\in S\setminus S_u, \label[condition]{eq:thm-fin-upper-exp-moni}\\ & a\le B(f(s,\pi(s+\delta)))-B(s) \le b \;& \text{for all } s\in S\setminus S_u \text{ and } \delta \in W\label[condition]{eq:thm-fin-upper-exp-diffbound}, \end{align}\] then the safety probability over a finite time horizon \(T\) is bounded from above by \[\label{eq:fin-upper-expbound} \forall s_0\in S_0.\; {\mathbb{P}}_{s_0}\left[\{\omega\in\Omega_{s_0}\mid \omega_t\not\in S_u \;\text{for all } t\le T \} \right]\le exp(-\frac{2(\epsilon \cdot T- B(s_0))^2}{T\cdot (b-a)^2}).\notag\qquad{(6)}\]

Intuition. A BC under the is a difference-bounded ranking supermartingale [41]. is the supermartingale difference condition, i.e., the expectation of \(B\) should decrease at least \(\epsilon\) at each time step, while implies that the update of \(B\) should be bounded. We prove this theorem by Hoeffding’s Inequality on Supermartingales [42] and the proof resembles that in the work [41].

-8 6 =8em Relaxed \(k\)-Inductive Barrier Certificates

We now introduce \(k\)-inductive barrier certificates, capable of offering both qualitative and quantitative safety guarantees, while relaxing the strict conditions for safety through the utilization of the \(k\)-induction principle [26], [43]. Prior to presenting our theoretical results, we first define the notion of \(k\)-inductive update functions as follows.

Definition 5 (\(k\)-inductive Update Functions). Given an \(M_\mu=(S,S_0,A,\pi,f,R,\mu)\), a \(k\)-inductive update function \(g^k_{\pi,f}\) with respect to \(\pi,f\) is defined recursively, i.e., \[g^k_{\pi,f}(s_t,\Delta_t^k)= \begin{cases} g_{\pi,f}(g^{k-1}_{\pi,f}(s_t,\Delta_t^{k-1}), \delta_{t+k-1}) & \text{ if } k>1 \\ f(s_t,\pi(s_t+\delta_t))& \text{ if } k=1 \\ s_t & \text{ if } k=0 \end{cases}\] where \(\Delta_t^k=[ \delta_t,\delta_{t+1},\dots, \delta_{t+k-1} ]\) is a noise vector of length \(k\) with each \(\delta_t\sim\mu\), and \(g_{\pi,f}(s_t,\delta_t):=f(s_t,\pi(s_t+\delta_t))\).

Intuitively, \(g_{\pi,f}^k\) computes the value of a state after \(k\) steps given a \(k\)-dimensional noise vector \(\Delta^k\in W^k\subseteq{\mathbb{R}}^{n\times k}\), where \(W=\textsf{supp}\left({\mu}\right)\) is the support of \(\mu\). To calculate the expectation w.r.t. \(k\)-dimensional noises, we denote by \(\mu^k\) the product measure on \(W^k\).

-8 6 =8em \(k\)-Inductive Barrier Certificates for Qualitative Safety

Theorem 8 (\(k\)-inductive Variant of Almost-Sure Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a \(k\)-inductive barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that the following conditions hold: \[\begin{align} & \textstyle\bigwedge_{0\le i<k} B(g_{\pi,f}^{i}(s,\Delta^i))\le 0 \;& \forall (s,\Delta^i)\in S_0\times W^i, \label[condition]{eq:thm-almost-indc-initial} \\ & B(s)>0 \;& \forall s\in S_u, \label[condition]{eq:thm-almost-indc-unsafe} \\ &\textstyle\bigwedge_{0\le i<k} (B(g_{\pi,f}^{i}(s,\Delta^i))\le 0)\Longrightarrow B(g_{\pi,f}^k(s,\Delta^k))\le 0 & \forall (s,\Delta^i)\in S\times W^i, \label[condition]{eq:thm-almost-indc-decrease} \end{align}\] then the system \(M_\mu\) is almost-surely safe, i.e., \(\forall s_0\in S_0.\; \omega\in\Omega_{s_0} \Longrightarrow \omega_t \not\in S_u \;\forall t\in{\mathbb{N}}\).

Intuition. implies that the state sequences starting from the safe set will remain in the safe set for the next \(k-1\) consecutive time steps, while means that for any \(k\) consecutive time steps, if the system is safe, then the system will still be safe at the \((k+1)\)-th time step. We prove the theorem by contradiction.

Note that contains an implication, in order to compute the \(k\)-inductive BC, we replace it with its sufficient condition: \[-B(g_{\pi,f}^k(s,\Delta^k))-\textstyle\sum_{0\le i<k} \tau_i\cdot (-B(g_{\pi,f}^i(s,\Delta^i)))\ge 0,\;\forall (s,\Delta^i)\in S\times W^i. \label{eq:thm-almost-indc-decrease-replace}\tag{1}\] If there exist \(\tau_0,\dots,\tau_{k-1}\ge 0\) satisfying , is satisfied.

-8 6 =8em \(k\)-Inductive Barrier Certificates for Quantitative Safety

Theorem 9 (\(k\)-inductive Lower Bounds on Infinite-time Safety). Given an \(M_\mu\), if there exists a k-inductive barrier certificate \(B:S\to {\mathbb{R}}\) such that for some constants \(k\in{\mathbb{N}}_{\ge 1}\), \(\epsilon\in [0,1]\) and \(c\ge 0\), the following conditions hold: \[\begin{align} & B(s) \ge 0 \;& \text{for all } s\in S \label[condition]{eq:thm-inf-lower-induc-nonnegative} \\ & B(s)\le \epsilon \;&\text{for all }s\in S_0, \label{eq:thm-inf-lower-induc-initial} \\ & B(s)\ge 1 \;& \text{for all } s\in S_u, \label[condition]{eq:thm-inf-lower-induc-unsafe} \\ & {\mathbb{E}}_{\delta\sim\mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le c \; & \text{for all } s\in S , \label[condition]{eq:thm-inf-lower-induc-c-martin} \\ & {\mathbb{E}}_{\Delta^k\sim \mu^k}[B(f_{\pi,f}^k(s,\Delta^k))\mid s]-B(s)\le 0 \; & \text{for all } s\in S, \label[condition]{eq:thm-inf-lower-induc-k-martin} \end{align}\qquad{(7)}\] then the safety probability over infinite time horizons is bounded from below by \[\label{eq:inf-lower-induc-bound} \forall s_0\in S_0.\; {\mathbb{P}}_{s_0}\left[\{\omega_0\in\Omega_{s_0}\mid \omega_t\not\in S_u \;\text{ for all } t\in {\mathbb{N}}\} \right]\ge 1-kB(s_0)-\frac{k(k-1)c}{2}.\notag\qquad{(8)}\]

Intuition. requires the barrier certificate to be a c-martingale at every time step and requires the barrier certificate sampled after every \(k\)-th step to be a supermartingale. We prove the theorem by Ville’s Inequality [34].

Theorem 10 (\(k\)-inductive Upper Bounds on Infinite-time Safety). Given an \(M_\mu\), if there exists a barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that for some constant \(\gamma\in (0,1)\), \(0\le \epsilon'<\epsilon\le 1\), \(c\le 0\) the following conditions hold: \[\begin{align} & 0\le B(s)\le 1 \;& \text{for all } s\in S \label[condition]{eq:thm-inf-upper-induc-bound}\\ & B(s)\ge \epsilon \;& \text{ for all } s\in S_0, \label[condition]{eq:thm-inf-upper-induc-initial}\\ & B(s)\le \epsilon' \;& \text{ for all } s\in S_u, \label[condition]{eq:thm-inf-upper-induc-unsafe}\\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\ge c \; & \text{ for all } s\in S, \label[condition]{eq:thm-inf-upper-induc-c-martin} \\ & B(s)-\gamma^k \cdot{\mathbb{E}}_{\Delta^k\sim \mu^k}[B(g_{\pi,f}^k(s,\Delta^k))\mid s]\le 0\; & \text{for all }s\in S \setminus S_u, \label[condition]{eq:thm-inf-upper-induc-decrease} \end{align}\] then the safety probability over infinite time horizons is bounded from above by \[\label{eq:inf-upper-induc-bound} \forall s_0\in S_0.\; {\mathbb{P}}_{s_0}\left[\{\omega\in\Omega_{s_0}\mid \omega_t\not\in X_u \;\text{ for all } t\in {\mathbb{N}}\} \right]\le 1-k B(s_0)-\frac{k(k-1)c}{2}.\notag\qquad{(9)}\]

Intuition. This BC is non-negative and bounded (Condition [eq:thm-inf-upper-induc-bound]). Condition ([eq:thm-inf-upper-induc-c-martin]) is the inverse of the c-martingale property, while Condition ([eq:thm-inf-upper-induc-decrease]) requires the barrier certificate sampled after every \(k\)-th step to be a \(\gamma^k\)-scaled submartingale. We prove the theorem by the Optional Stopping Theorem [36].

To make the probabilistic bounds in  and non-trivial, the value of \(k\) should be bounded by \[1\le k \le \frac{(c-2B(s_0))+\sqrt{4B(s_0)^2+c^2-4c(B(s_0)-2)}}{2c} .\]

-8 6 =8em Synthesis of Neural Barrier Certificates

In this section, we show that the BCs defined in the previous sections for DNN-controlled systems can be implemented and synthesized in the form of DNNs, akin to those for linear or nonlinear stochastic systems [32].

We adopt the CEGIS-based method  [33] to train and validate target NBCs. sketches the workflow. In each loop iteration, we train a candidate BC in the form of a neural network which is then passed to the validation. If the validation result is false, we compute a set of counterexamples for future training. This iteration is repeated until a trained candidate is validated or a given timeout is reached. Moreover, we propose a simulation-guided training method by adding additional terms to the loss functions to improve the tightness of upper and lower bounds calculated by the trained NBCs.

We present the synthesis of NBCs in   for probabilistic safety over infinite time horizons, as an example. We defer to Appendix the synthesis of other NBCs.

Figure 2: CEGIS-based NBC synthesis  [33].

-8 6 =8em Training Candidate NBCs

Two pivotal factors in the training phase are the generation of training data and the construction of the loss function.

Training Data Discretization. As the state space \(S\) is possibly continuous and infinite, we choose a finite set of states for training candidate NBCs. This can be achieved by discretizing the state space \(S\) and constructing a discretization \(\tilde{S}\subseteq S\) such that for each \(s \in S\), there is a \(\tilde{s} \in \tilde{S}\) with \(||s-\tilde{s}||_1 < \tau\), where \(\tau>0\) is called the granularity of \(\tilde{S}\). As \(S\) is compact and thus bounded, this discretization can be computed by simply picking the vertices of a grid with sufficiently small cells. For the re-training after validation failure, \(\tilde{S}\) will be reconstructed with counterexamples and a smaller granularity \(\tau\). Once the discretization \(\tilde{S}\) is obtained, we construct two finite sets \(\tilde{S}_0:=\tilde{S}\cap S_0\) and \(\tilde{S}_u:=\tilde{S}\cap S_u\) used for the training process.

Loss Function Construction. A candidate NBC is initialized as a neural network \(h_\theta\) w.r.t. the network parameter \(\theta\). \(h_\theta\) is trained by minimizing the following loss function: \[{\mathcal{L}}(\theta):=k_1 \cdot {\mathcal{L}}_{1}(\theta)+k_2 \cdot {\mathcal{L}}_{2}(\theta)+k_3 \cdot {\mathcal{L}}_{3}(\theta)+k_4 \cdot {\mathcal{L}}_{4}(\theta)+k_5 \cdot {\mathcal{L}}_{5}(\theta)\notag\] where \(k_i\in{\mathbb{R}}, \;i=1,\cdots,5\) are the algorithmic parameters balancing the loss terms.

The first loss term is defined via the condition in  as:

\[{\mathcal{L}}_{1}(\theta)=\frac{1}{|\tilde{S}|} \textstyle\sum\limits_{s\in \tilde{S}} \left(\mathop{\mathrm{max}}\{ 0 - h_\theta(s),0\} \right)\notag\] Intuitively, a loss will incur if either \(h_\theta(s)\) is less than zero for any \(s\in \tilde{S}\).

Correspondingly, the second and third loss terms are defined via as: \[\begin{align} {\mathcal{L}}_{2}(\theta)=\frac{1}{|\tilde{S}_0|} \textstyle\sum\limits_{s\in \tilde{S}_0} \left(\mathop{\mathrm{max}}\{ h_\theta(s) - \epsilon ,0\} \right), and{\mathcal{L}}_{3}(\theta)=\frac{1}{|\tilde{S}_u|} \textstyle\sum\limits_{s\in \tilde{S}_u} \left(\mathop{\mathrm{max}}\{ 1 - h_\theta(s),0\} \right).\notag \end{align}\] The fourth loss term is defined via the condition in  as: \[{\mathcal{L}}_{4}(\theta)=\frac{1}{|\tilde{S} \setminus \tilde{S}_u|}\textstyle\sum\limits_{s\in \tilde{S} \setminus \tilde{S}_u} \left( \mathop{\mathrm{max}}\{ \textstyle\sum\limits_{s'\in \mathcal{D}_s}\frac{h_\theta(s')}{N} -h_\theta(s)+\zeta ,0\} \right)\notag\] where for each \(s\in \tilde{S} \setminus \tilde{S}_u\), \(\mathcal{D}_s\) is the set of its successor states such that \(\mathcal{D}_s:=\{s'\mid s'=f(s,\pi(s+\delta_i)), \delta_i\sim \mu, i\in [1,N] \}\), \(N>0\) is the sample number of successor states. We use the mean of \(h_\theta(\cdot)\) at the \(N\) successor states to approximate the expected value \({\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))]\) for each \(s\in \tilde{S} \setminus \tilde{S}_u\), and \(\zeta>0\) to tighten the condition.

Simulation-guided Loss Term. A trained BC that satisfies the above four conditions can provide lower bounds on probabilistic safety over infinite time horizons for the system. However, these conditions have nothing to do with the tightness of lower bounds and we may obtain a trivial zero-valued lower bound by the trained BC.

To assure the tightness of lower bounds from trained NBCs, we propose a simulation-guided method based on . For each \(s_0 \in \tilde{S}_0\), we execute the control system \(N'>0\) episodes, and calculate the safety frequency \(\varmathbb{f}_s\) of all the \(N'\) trajectories over infinite time horizons. Based on the statistical results, the last loss term is defined as: \[{\mathcal{L}}_{5}(\theta)=\frac{1}{|\tilde{S}_0|}\textstyle\sum\limits_{s\in \tilde{S}_0} \left( \mathop{\mathrm{max}}\{\varmathbb{f}_s + h_\theta(s) - 1,0\}\right)\notag\] Intuitively, this term is to enforce the value of the derived lower bound to approach the statistical result as closely as possible, ensuring its tightness.

We emphasize that our simulation-guided method plus the NBC validation (see next section) is sound, as we will validate the trained BC to ensure it satisfies all the BC conditions (see also ).

-8 6 =8em NBC Validation

A candidate NBC \(h_\theta\) is valid if it meets the . The first three conditions condition can be checked by the following constraint \[\mathop{\mathrm{inf}}\limits_{s\in S} h_\theta(s)\ge 0 \; \land \;\mathop{\mathrm{sup}}\limits_{s\in S_0} h_\theta(s)\le \epsilon \; \land \;\mathop{\mathrm{inf}}\limits_{s\in S_u} h_\theta(s)\ge 1 \notag\] using the interval bound propagation approach  [44], [45]. When any state violates the above equation, it is treated as a counterexample and added to \(\tilde{S}\) for future training.

For ,   reduces the validation from infinite states to finite ones, which are easier to check.

Theorem 11. Given an \(M_{\mu}\) and a function \(B:S\rightarrow{\mathbb{R}}\), we have \({\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le 0\) for any state \(s\in S \setminus S_u\) if the formula below \[\begin{align} {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))\mid \tilde{s}]\le B(\tilde{s})-\zeta \label{equ:validation95NBC} \end{align}\qquad{(10)}\] holds for any state \(\tilde{s}\in \tilde{S} \setminus \tilde{S}_u\), where \(\zeta= \tau\cdot L_B\cdot (1+L_f\cdot (1+L_\pi))\) with \(L_f,L_\pi, L_B\) being the Lipschitz constants of \(f,\pi\) and \(B\), respectively.

To check the satisfiablility of in \(h_\theta\) and a state \(\tilde{s}\), we need to compute the expected value \({\mathbb{E}}_{\delta\sim \mu}[h_\theta(f(\tilde{s},\pi(\tilde{s}+\delta)))\mid \tilde{s}]\). However, it is difficult to compute its closed form because \(h_\theta\) is provided in the form of neural networks. Hence, We bound the expected value \({\mathbb{E}}_{\delta\sim \mu}[h_\theta(f(\tilde{s},\pi(\tilde{s}+\delta)))\mid \tilde{s}]\) via interval arithmetic [44], [45] instead of computing it, which is inspired by the work [28], [46]. In particular, given the noise distribution \(\mu\) and its support \(W =\{\delta \in {\mathbb{R}}^n \;| \;\mu(\delta) > 0 \}\), we first partition \(W\) into finitely \(m\ge 1\) cells, i.e., \(\mathrm{cell(W)} = \{W_1,\cdots,W_m\}\), and use \(\mathrm{maxvol} = \mathop{\mathrm{max}}\limits_{W_i \in \mathrm{cell(W)}} \mathrm{vol}(W_i)\) to denote the maximal volume with respect to the Lebesgue measure of any cell in the partition, respectively. For the expected value in , we bound it from above: \[\begin{align} \label{eq:overes-up} {\mathbb{E}}_{\delta\sim \mu}[h_\theta(f(\tilde{s},\pi(\tilde{s}+\delta)))\mid \tilde{s}]\le \textstyle\sum\limits_{W_i\in \mathrm{cell(W)}} \mathop{\mathrm{maxvol}} \cdot \mathop{\mathrm{sup}}_{\delta}F(\delta) \notag \end{align}\tag{2}\] where, \(F(\delta)=h_\theta(f(\tilde{s},\pi(\tilde{s}+\delta)))\). The supremum can be calculated via interval arithmetic. We refer interested readers to [46] and [28] for more details.

Theorem 12 (Soundness). If a trained NBC is valid, it can certify the almost-sure safety for the qualitative verification, or the derived bound by the NBC is a certified lower/upper bound on the safety probability for the quantitative case.

The proof of soundness is straightforward by the NBC validation.

-8 6 =8em Evaluation

Our experimental goals encompass evaluating the effectiveness of (i) the qualitative and quantitative verification methods within our framework, (ii) the \(k\)-inductive BCs, and (iii) the simulation-guided training method, respectively.

-8 6 =8em Benchmarks and Experimental Setup

We assess the effectiveness of our approach on four classic DNN-controlled tasks from public benchmarks: Pendulum and Cartpole from the DRL training platform OpenAI’s Gym [47], while B1 and Tora commonly used by the state-of-the-art safety verification tools [48]. All experiments are executed on a workstation running Ubuntu 18.04, with a 32-core AMD Ryzen Threadripper CPU, 128GB RAM, and a single 24564MiB GPU .

For the safety verification of DNN-controlled systems, we consider state perturbations of uniform noises with zero means and different radii. Specifically, for each state \(s = (s_1,\ldots,s_n)\), we add noises \(X_1,\ldots,X_n\) to each dimension of \(s\) and obtain the perturbed state \((s_1 + X_1,\ldots,s_n + X_n)\), where \(X_i \sim \mathbf{U}(-r, r)\) (\(1\le i \le n\), \(r\ge 0\)).

r0.44

1pt

Table 1: Qualitative verification results.
Task Perturbation Verification \(k\). #Fail.
CP 0 \(✔\) 1 0
\(r=0.01\) Unknown 1 0
\(r=0.01\) \(✔\) 2 0
\(r=0.03\) Unknown 1 207
PD \(r=0\) \(✔\) 1 0
\(r=0.01\) Unknown 1 675
\(r=0.03\) Unknown 1 720
Tora \(r=0\) \(✔\) 1 0
\(r=0.02\) Unknown 1 0
\(r=0.02\) \(✔\) 2 0
\(r=0.04\) Unknown 1 1113
B1 \(r=0\) \(✔\) 1 0
\(r=0.1\) \(✔\) 1 0
\(r=0.2\) Unknown 1 43

For qualitative evaluations, the existence of an NBC in  can ensure the almost-sure safety of the whole system. Due to the data sparsity of an initial state, we randomly choose 10,000 initial states (instead of a single one) from the initial set \(S_0\). For quantitative evaluations, to measure the quantitative safety probabilities from the system level, we calculate the mean values of lower/upper bounds by NBCs on these 10,000 states under different perturbations. The correctness of such system-level safety bounds is witnessed by  as each lower/upper bound on a single state \(s_0\) is a certified bound for the exact safety probability from \(s_0\), and thus the same holds on the system level. We also simulate 10,000 episodes starting from each of these 10,000 initial states under different perturbations and use the statistical results as the baseline.

-8 6 =8em Effectiveness of Qualitative Safety Verification

shows the qualitative verification results under different perturbation radii \(r\)’s and induction bounds \(k\)’s. Given a perturbed DNN-controlled system, we verify its qualitative safety by training an NBC under the conditions in . Once such an NBC is trained and validated, the system is verified to be almost-surely safe, marked as \(✔\). If no valid barrier certificates are trained within a given timeout, the result is marked as Unknown.

Figure 3: The certified upper and lower bounds over infinite (a-d) and finite (e-h) time horizons, respectively, and their comparison with the simulation results.

As for simulation, we record the number of those episodes where the system enters the unsafe region, marked as the column #Fail. in the table. We can observe that for the systems that are successfully verified by NBCs, no failed episodes are detected by simulation. For systems with failed episodes by simulation, no corresponding NBCs can be trained and validated. The consistency experimentally reflects the effectiveness of our approach.

Furthermore, we note that for CP with \(r=0.01\) and Tora with \(r =0.02\), there are no failed episodes, but no NBCs in can be synthesized for these systems. By applying , we find the \(2\)-inductive NBCs, which ensures the safety of the systems. It demonstrates that \(k\)-inductive variants can relax the conditions of NBCs and thus ease the synthesis of valid NBCs for qualitative safety verification.

As the perturbation radius increases, ensuring almost-sure safety becomes challenging, and qualitative verification only results in the conclusion of Unknown. Consequently, we proceed to conduct quantitative verification over infinite time horizons.

-8 6 =8em Effectiveness of Quantitative Safety Verification over Infinite Time Horizon

(a-d) show the certified upper and lower bounds and simulation results (i.e., black lines marked with ‘\(\bullet\)’) over infinite time horizons. The red lines marked with ‘\(\blacksquare\)’ and blue lines marked with ‘+’ represent the mean values of the lower bounds in and the upper bounds in on the chosen 10,000 initial states calculated by the corresponding NBCs, respectively. The purple lines marked with ‘\(\blacktriangle\)’ and green lines marked with ‘\(\times\)’ represent the mean values of the \(2\)-inductive upper and lower bounds calculated by the corresponding NBCs in , respectively. We can find that the certified bounds enclose the simulation outcomes, demonstrating the effectiveness of our trained NBCs.

r0.45 2pt

Table 2: Synthesis time for different NBCs.
Task Lower \(2\)-Lower Upper \(2\)-Upper
CP 2318.5 1876.0 2891.9 2275.3
PD 1941.6 1524.0 2282.7 1491.5
Tora 280.3 218.5 895.1 650.7
B1 587.4 313.6 1127.3 840.1

shows a comparison of average synthesis time (in seconds) for different NBCs. We observe that the synthesis time of \(2\)-inductive NBCs is 25% faster than that of normal NBCs, at the sacrifice of tightness. Note that the tightness of certified bounds depends on specific systems and perturbations. Investigating what factors influence the tightness to yield tighter bounds is an interesting future work to explore.

Approaching zero for infinite time horizons, the lower bounds indicate a declining trend in safety probabilities over time. Therefore, we proceed to conduct quantitative verification over finite time horizons, providing both linear and exponential lower and upper bounds.

-8 6 =8em Effectiveness of Quantitative Safety Verification over Finite Time Horizon

(e-h) depict the certified upper and lower bounds and simulation results (i.e., black lines) over finite time horizons from the system level. Fix a sufficiently large noise level for each system, the x-axis represents the time horizon, while the y-axis corresponds to the safety probabilities. The purple lines and blue lines represent the mean values of the exponential lower and upper bounds calculated by the corresponding NBCs in   and , respectively. The red lines and green lines represent the mean values of the linear lower and upper bounds calculated by the corresponding NBCs in   and , respectively. The results indicate that our computed certified bounds encapsulate the statistical outcomes. Moreover, the exponential upper bounds are always tighter than the linear upper bounds, and the exponential lower bounds become tighter than the linear ones with the increase of time. It is worth exploring the factors to generate tighter results in future work.

-8 6 =8em Effectiveness of Simulation-guided Loss Term

Figure 4: The certified bounds w/ and w/o simulation-guided loss terms over infinite time horizons.

The simulation-guided loss term is proposed in to tighten the certified bounds calculated by NBCs. To evaluate its effectiveness, we choose NBCs in , and train them with and without the simulation-guided loss terms. The comparison between them is shown in . The red lines marked with ‘\(\blacksquare\)’ and blue lines marked with ‘+’ represent the mean values of the bounds in on initial states calculated by the corresponding NBCs trained with the simulation-guided loss terms, respectively. The purple lines with ‘\(\blacktriangle\)’ and green lines with ‘\(\times\)’ represent the mean values of the bounds calculated by the NBCs trained without the simulation-guided loss terms. Apparently, the upper and lower bounds derived by NBCs trained without the simulation-guided loss terms are looser than the bounds trained with these terms. Specifically, the results computed by NBCs with simulation-guided loss terms can achieve an average improvement of 47.5% for lower bounds and 31.7% for upper bounds, respectively. Hence, it is fair to conclude that accounting for simulation-guided loss terms is essential when conducting quantitative safety verification.

-8 6 =8em Related Work

Barrier Certificates for Stochastic Systems. Our unified safety verification framework draws inspiration from research on the formal verification of stochastic systems employing barrier certificates. Prajna et al. [17], [29], [30] propose the use of barrier certificates in the safety verification of stochastic systems. This idea has been further expanded through data-driven approaches [49] and \(k\)-inductive variants [25]. As the dual problem of computing safety probabilities, computing reachability probabilities in stochastic dynamical systems has been studied for both infinite [50][52] and finite time horizons [40], [53]. Probabilistic programs, viewed as stochastic models, have their reachability and termination probabilities investigated using proof rules [54] and Martingale-based approaches [41], [55], [56], where the latter are subsequently unified through order-theoretic fixed-point approaches [35], [37], [57].

Formal Verification of DNN-controlled Systems. Modeling DNN-controlled systems as Markov Decision Processes (MDPs) and verifying these models using probabilistic model checkers, such as PRISM [11] and Storm [12], constitutes a quantitative verification approach. Bacci and Parker [14], [58] employ abstract interpretation to construct interval MDPs and yield safety probabilities within bounded time. Carr et al. [59] propose probabilistic verification of DNN-controlled systems by constraining the analysis to partially observable finite-state models. Amir et al. propose a scalable approach based on DNN verification techniques to first support complex properties such as liveness [60].

Reachability analysis is a pivotal qualitative approach in the safety verification of DNN-controlled systems. Bacci et al. [61] introduce a linear over-approximation-based method for calculating reachable set invariants over an infinite time horizon for DNN-controlled systems. Other reachability analysis approaches, such as Verisig [48] and Polar [62], focus solely on bounded time. These approaches do not consider perturbations as they assume actions on states to be deterministic.

Barrier Certificates for Training and Verifying DNN Controllers. BC-based methods [23], [31] have recently been investigated for training and verifying DNN controllers. The key idea is to train a safe DNN controller through interactive computations of corresponding barrier certificates to ensure qualitative safety [63], [64]. Existing BC-based approaches for the verification of DNN-controlled systems focus solely on qualitative aspects but neglect the consideration of perturbations [65], [66]. Our approach complements them by accommodating the inherent uncertainty in DNN-controlled systems.

-8 6 =8em Conclusion and Future Work We have systematically studied the BC-based qualitative and quantitative safety verification of DNN-controlled systems. This involves unifying and transforming the verification problems into a general task of training corresponding neural certificate barriers. We have also defined the conditions that a trained certificate should satisfy, along with the corresponding lower and upper bounds presented in both linear and exponential forms and \(k\)-inductive variants. Through the unification of these verification problems, we have established a comprehensive framework for delivering various safety guarantees, whether qualitatively or quantitatively, in a unified manner.

Our framework sheds light on the quest for scalable and multipurpose safety verification of DNN-controlled systems. It accommodates both qualitative and quantitative aspects in verified results, spans both finite and infinite time horizons, and encompasses certified bounds presented in both linear and exponential forms. Our work also showcases the potential to circumvent verification challenges posed by DNN controllers. From our experiments, we acknowledge that both qualitative and quantitative verification results are significantly dependent on the quality of the trained NBCs. Our next step is to explore more sophisticated deep learning methods to train valid NBCs for achieving more precise verification results.

-8 6 =8em *AcknowledgmentsWe thank the anonymous reviewers of CAV 2024 for their valuable comments. The work has been supported by the NSFC Programs (62161146001, 62372176), Huawei Technologies Co., Ltd., the Shanghai International Joint Lab (22510750100), the Shanghai Trusted Industry Internet Software Collaborative Innovation Center, and the National Research Foundation, Singapore, under its RSS Scheme (NRF-RSS2022-009).

Appendix

-8 6 =8em Probability Theory

We start by reviewing some notions from probability theory. Random Variables and Stochastic Processes. A probability space is a triple (\(\Omega, {\cal F}, {\mathbb{P}}\)), where \(\Omega\) is a non-empty sample space, \({\cal F}\) is a \(\sigma\)-algebra over \(\Omega\), and \(\mathbb{P}(\cdot)\) is a probability measure over \(\cal F\), i.e. a function \(\mathbb{P}\): \({\cal F} \rightarrow [0,1]\) that satisfies the following properties: (1) \({\mathbb{P}}(\emptyset)=0\), (2)\({\mathbb{P}}(\Omega - A)=1-{\mathbb{P}}[A]\), and (3) \({\mathbb{P}} (\textstyle\cup_{i=0}^{\infty} A_i)= \textstyle\sum_{i=0}^{\infty}{\mathbb{P}}(A_i)\) for any sequence \(\{A_i\}_{i=0}^{\infty}\) of pairwise disjoint sets in \({\cal F}\).

Given a probability space (\(\Omega, {\cal F}, {\mathbb{P}}\)), a random variable is a function \(X: \Omega \rightarrow {\mathbb{R}} \cup \{\infty\}\) that is \(\cal F\)-measurable, i.e., for each \(a \in {\mathbb{R}}\) we have that \(\{\omega \in \Omega | X(\omega) \leq a \} \in {\cal F}\). Moreover, a discrete-time stochastic process is a sequence \(\{X_n\}_{n=0}^{\infty}\) of random variables in (\(\Omega, {\cal F}, {\mathbb{P}}\)).

Conditional Expectation. Let (\(\Omega, {\cal F}, {\mathbb{P}}\)) be a probability space and \(X\) be a random variable in (\(\Omega, {\cal F}, {\mathbb{P}}\)). The expected value of the random variable \(X\), denoted by \({\mathbb{E}}[X]\), is the Lebesgue integral of \(X\) wrt \(\mathbb{P}\). If the range of \(X\) is a countable set \(A\), then \({\mathbb{E}}[X]= \textstyle\sum_{\omega \in A}\omega \cdot {\mathbb{P}}(X=\omega)\). Given a sub-sigma-algebra \({\cal F}' \subseteq {\cal F}\), a conditional expectation of \(X\) for the given \({\cal F}'\) is a \({\cal F}'\)-measurable random variable \(Y\) such that, for any \(A \in {\cal F}'\), we have: \[\begin{align} {\mathbb{E}}[X \cdot {\mathbb{I}}_{A}]={\mathbb{E}}[Y \cdot {\mathbb{I}}_{A}] \end{align}\]

Here, \({\mathbb{I}}_{A}: \Omega \rightarrow \{0,1\}\) is an indicator function of \(A\), defined as \({\mathbb{I}}_{A}(\omega)=1\) if \(\omega \in A\) and \({\mathbb{I}}_{A}(\omega)=0\) if \(\omega \notin A\). Moreover, whenever the conditional expectation exists, it is also almost-surely unique, i.e., for any two \({\cal F}'\)-measurable random variables \(Y\) and \(Y'\) which are conditional expectations of \(X\) for given \({\cal F}'\), we have that \({\mathbb{P}}(Y=Y')=1\).

Filtrations and Stopping Times. A filtration of the probability space (\(\Omega, {\cal F}, {\mathbb{P}}\)) is an infinite sequence \(\{{{\cal F}_n}\}_{n=0}^{\infty}\) such that for every \(n\), the triple (\(\Omega, {\cal F}_n, {\mathbb{P}}\)) is a probability space and \({\cal F}_n \subseteq {\cal F}_{n+1} \subseteq {\cal F}\). A stopping time with respect to a filtration \(\{{{\cal F}_n}\}_{n=0}^{\infty}\) is a random variable \(T: \Omega \rightarrow {\mathbb{N}}_0 \cup \{\infty\}\) such that, for every \(i \in {\mathbb{N}}_0\), it holds that \(\{\omega \in \Omega |T(\omega) \leq i \} \in {\cal F}_i\). Intuitively, \(T\) returns the time step at which some stochastic process shows a desired behavior and should be “stopped”.

A discrete-time stochastic process \(\{X_n\}_{n=0}^{\infty}\) in (\(\Omega, {\cal F}, {\mathbb{P}}\)) is adapted to a filtration \(\{{{\cal F}_n}\}_{n=0}^{\infty}\), if for all \(n \geq 0\), \(X_n\) is a random variable in (\(\Omega, {\cal F}_n, {\mathbb{P}}\)).

Stopped Process. Let \(\{X_n\}_{n=0}^{\infty}\) be a stochastic process adapted to a filtration \(\{\mathcal{F}_n\}_{n=0}^\infty\) and let \(T\) be a stopping time w.r.t. \(\{\mathcal{F}_n\}_{n=0}^\infty\). The stopped process \(\{\tilde{X}_n\}_{n=0}^\infty\) is defined by \[\tilde{X}_n= \begin{cases} X_n, & \text{ for } n<T, \\ X_T, & \text{ for } n\ge T. \end{cases}\]

Martingales. A discrete-time stochastic process \(\{X_n\}_{n=0}^\infty\) to a filtration \(\{{{\cal F}_n}\}_{n=0}^{\infty}\) is a martingale (resp. supermartingale, submartingale) if for all \(n \geq 0\), \({\mathbb{E}}[|X_n|] < \infty\) and it holds almost surely (i.e., with probability 1) that \({\mathbb{E}}[X_{n+1}|{\cal F}_n] = X_n\) (resp. \({\mathbb{E}}[X_{n+1}|{\cal F}_n] \leq X_n\), \({\mathbb{E}}[X_{n+1}|{\cal F}_n] \geq X_n\)).

Ranking Supermartingales. Let \(T\) be a stopping time w.r.t. a filtration \(\{{{\cal F}_n}\}_{n=0}^{\infty}\). A discrete-time stochastic process \(\{X_n\}_{n=0}^\infty\) w.r.t. a stopping time \(T\) is a ranking supermartingale (RSM) if for all \(n \geq 0\), \({\mathbb{E}}[|X_n|] < \infty\) and there exists \(\epsilon >0\) such that it holds almost surely (i.e., with probability 1) that \(X_n \geq 0\) and \({\mathbb{E}}[X_{n+1}|{\cal F}_n] \leq X_n-\epsilon \cdot {\mathbb{I}}_{T>n}\).

Theorem 13 (Hoeffding’s Inequality on Supermartingales [42]). Let \(\{X_n\}_{n\in{\mathbb{N}}_0}\) be a supermartingale w.r.t. a filtration \(\{\mathcal{F}_n\}_{n\in{\mathbb{N}}_0}\), and \(\{[a_n,b_n]\}_{n\in{\mathbb{N}}_0}\) be a sequence of non-empty intervals in \({\mathbb{R}}\). If \(X_0\) is a constant random variable and \(X_{n+1}-X_{n}\in [a_n,b_n]\) a.s. for all \(n\in{\mathbb{N}}_0\), then \[{\mathbb{P}}(X_n-X_0\ge \lambda)\le \mathrm{exp}\left(-\frac{2\lambda^2}{\sum_{k=0}^{n-1}(b_k-a_k)^2} \right)\] for all \(n\in{\mathbb{N}}_0\) and \(\lambda>0\).

Theorem 14 (Optional Stopping Theorem (OST) [36]). Consider a stopping time \(T\) w.r.t. a filtration \(\{\mathcal{F}_n\}_{n=0}^{\infty}\) and a martingale (resp. supermartingale, submartingale) \(\{X_n\}_{n=0}^{\infty}\) adapted to \(\{\mathcal{F}_n\}_{n=0}^{\infty}\). Then \({\mathbb{E}}[|X_T|]<\infty\) and \({\mathbb{E}}[X_T]={\mathbb{E}}[X_0]\) (resp. \({\mathbb{E}}[X_T]\le {\mathbb{E}}[X_0]\), \({\mathbb{E}}[X_T]\ge {\mathbb{E}}[X_0]\)) if one of the following conditions holds:

  • \(T\) is bounded, i.e., \(T<c\) for some constant \(c\);

  • \({\mathbb{E}}[T]<\infty\), and there exists a constant \(c\) such that for all \(n\in {\mathbb{N}}\), \(|X_{n+1}-X_n|\le c\);

  • The stopped process \(\{\tilde{X}_n\}_{n=0}^{\infty}\) w.r.t. \(T\) is bounded, i.e., there exists some constant \(c\) such that \(|\tilde{X}_n|\le c\) for all \(n\in{\mathbb{N}}\).

Theorem 15 (Ville’s Inequality [34]). For any nonnegative supermartingale \(\{X_n\}_{n=0}^\infty\) and any real number \(c>0\), \[{\mathbb{P}}[\mathrm{sup}_{n\ge 0} X_n\ge c]\le \frac{{\mathbb{E}}[X_0]}{c}.\]

This theorem is also called Doob’s nonnegative supermartingale inequality.

Theorem 16 (Discrete Version of Gronwall’s Inequality [39]). Consider a real number sequence \(\{u_n\}_{n=0}^\infty\) such that \[u_{n+1}\le a\cdot u_n +b\;\forall n\ge 0\] where \(a>0\) and \(b\in{\mathbb{R}}\). Then \[u_{n+1}\le a^n \cdot u_0+\sum_{k=1}^n a^{n-k}\cdot b\;\forall n\ge 0\]

-8 6 =8em Supplementary Materials for

Below we fix a DNN-controlled system \(M_\mu=(S,S_0,A,\pi,f,R,\mu)\) with an unsafe set \(S_u\in S\) such that \(S_0\cap S_u=\emptyset\). We also define \(\mathrm{1}_A(x)=1\) if \(x\in A\) and \(\mathrm{1}_A(x)=0\) if \(x\not\in A\), where \(x\in{\mathbb{R}}\) is a random variable and \(A\subseteq {\mathbb{R}}\) is a set.

-8 6 =8em Proof for

(Almost-Surely Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that for some constant \(\lambda\in (0,1]\), the following conditions hold: \[\begin{align} & B(s)\le 0 \;&\text{for all }s\in S_0, & } \\ & B(s)>0 \;& \text{for all } s\in S_u, & } \\ & B(f(s,\pi(s+\delta)))-B(s)+\lambda\cdot B(s)\le 0 & \text{for all } (s,\delta)\in S\times W, & } \end{align}\] then the system \(M_\mu\) is almost-surely safe, i.e., \(\forall s_0\in S_0. \omega\in\Omega_{s_0} \Longrightarrow \omega_t \not\in S_u \;\forall t\in{\mathbb{N}}.\)

Proof. We prove it by contradiction. Assume that there exists a barrier certificate \(B\) satisfying conditions (?? )-([eq:thm-almost-decrease]), but the system is unsafe, i.e., there are a time step \(T>0\) and an initial state \(s_0\in S_0\) such that a trajectory \(\omega\in\Omega_{s_0}\) starting from \(s_0\) satisfies \(\omega_t \in S\) for all \(t\in [0,T]\) and \(\omega_T\in S_u\). Condition ([eq:thm-almost-decrease]) implies that for any state \(s\in S\) such that \(B(s)\le 0\) and a noise \(\delta \in W\), the value of \(B\) at the next step is no more than zero, i.e., \(B(f(s,\pi(s+\delta)))\le 0\). As a result, \(B(\omega_T)\) must be no more than zero, which is contradictory to Condition (?? ). Therefore, the system with a BC is almost-surely safe. 0◻ ◻

-8 6 =8em Proofs for

(Lower Bounds on Infinite-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that for some constant \(\epsilon\in [0,1]\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;& \text{for all } s\in S, }\\ & B(s) \le \epsilon \;& \text{for all } s\in S_0, }\\ & B(s)\ge 1 \;& \text{for all } s\in S_u, } \\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le 0\; & \text{for all }s\in S \setminus S_u , } \end{align}\] then the safety probability over infinite time horizons is bounded from below by \[\forall s-0\in S_0. {\mathbb{P}}_{s_0}\left[\{\omega\in\Omega_{s_0}\mid \omega_t\not\in S_u \;\text{for all } t\in \in {\mathbb{N}}\} \right]\ge 1-B(s_0). }\]

Proof. First, we construct a stochastic process \(\{X_t\}_{t\ge 0}\) over the set \(\Omega\) of all trajectories such that \(X_t(\omega):=B(\omega_t)\) for any \(\omega\in\Omega\). Let \(\kappa\) be the first time that a trajectory enters the unsafe set \(S_u\), i.e., \(\kappa(\omega):=\mathrm{inf} \{t\in{\mathbb{N}}\mid \omega_t \in S_u\}\). Then we define the stopped process \(\{\tilde{X}_t\}_{t\ge 0}\) by \[\tilde{X}_t= \begin{cases} X_t, & \text{ for } t<\kappa, \\ X_\kappa, & \text{ for } t\ge \kappa. \end{cases}\] Next, we want to prove that \(\{\tilde{X}_t\}_{t\ge 0}\) is a non-negative supermartingale. For \(t<\kappa\), we have that \[\begin{align} {\mathbb{E}}[\tilde{X}_{t+1}\mid \tilde{\mathcal{F}}_t] &= {\mathbb{E}}[X_{t+1}\mid \mathcal{F}_t] \le X_t = \tilde{X}_t, \end{align}\] where the inequality is obtained by Condition ([eq:thm-inf-lower-decrease]). For \(t\ge \kappa\), we have that \[\begin{align} {\mathbb{E}}[\tilde{X}_{t+1}\mid \tilde{\mathcal{F}}_t] &= X_\kappa=\tilde{X}_t. \end{align}\] Combined with Condition ([eq:thm-inf-nonnegative]), we can conclude that \(\{\tilde{X}_t\}_{t\ge 0}\) is a non-negative supermartingale. According to Condition ([eq:thm-inf-lower-unsafe]), we have that \(S_u\subseteq \{s\in S\mid B(s)\ge 1 \}\). Therefore, for any initial state \(s_0\in S_0\), it follows that \[\begin{align} & {\mathbb{P}}_{s_0}\left[ \{ \omega\in\Omega_{s_0}\mid \omega_t\in S_u \text{ for some } t\in{\mathbb{N}}\} \right] \\ \le \; & {\mathbb{P}}_{s_0}\left[ \{ \omega\in\Omega_{s_0}\mid \mathrm{sup}_{t\in{\mathbb{N}}} X_t(\omega)\ge 1 \} \right] \\ = \;& {\mathbb{P}}_{s_0}\left[ \{ \omega\in\Omega_{s_0}\mid \mathrm{sup}_{t\in{\mathbb{N}}} \tilde{X}_t(\omega)\ge 1 \} \right] \\ \le \;& \tilde{X}_0 (\omega)= X_0(\omega) \end{align}\] where the last inequality is derived by Ville’s Inequality (see ). Finally, by means of complementation, we have the lower bound of . 0◻ ◻

(Upper Bounds on Infinite-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that for some constants \(\gamma\in (0,1)\), \(0\le \epsilon'<\epsilon\le 1\), the following conditions hold: \[\begin{align} & 0\le B(s)\le 1 \;& \text{for all } s\in S, }\\ & B(s)\ge \epsilon \;&\text{ for all } s\in S_0, } \\ & B(s)\le \epsilon' \;&\text{ for all } s\in S_u, }\\ & B(s)-\gamma\cdot{\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]\le 0\; & \text{for all }s\in S \setminus S_u, } \end{align}\] then the safety probability over infinite time horizons is bounded from above by \[\forall s_0\in S_0. {\mathbb{P}}_{s_0}\left[\{ \omega\in\Omega_{s_0}\mid \omega_t\not\in S_u \;\text{for all } t\in {\mathbb{N}}\} \right]\le 1-B(s_0). }\]

Proof. Construct a stochastic process \(\{X_t\}_{t\ge 0}\) over the set \(\Omega\) of all trajectories such that \(X_t(\omega):=B(s_t)\) for any \(\omega\in\Omega\). Let \(\kappa\) be a stopping time at which the state first enters \(S_u\), i.e., \(\kappa(\omega):=\mathrm{inf} \{t\in{\mathbb{N}}\mid \omega_t \in S_u\}\). Then construct a new stochastic process \(\{Y_t\}_{t\ge 0}\) such that \(Y_t:=\gamma^t X_t\). The stopped process \(\{\tilde{Y}_t\}_{t\ge 0}\) is defined by \[\tilde{Y}_t= \begin{cases} Y_t, & \text{ for } t<\kappa, \\ Y_\kappa, & \text{ for } t\ge \kappa. \end{cases}\] Next, we want to prove that \(\{\tilde{Y}_t\}_{t\ge 0}\) is a submartingale. For \(t<\kappa\), we have that \[\begin{align} {\mathbb{E}}[\tilde{Y}_{t+1}\mid \tilde{\mathcal{F}}_t] &= {\mathbb{E}}[Y_{t+1}\mid \mathcal{F}_t] \\ &= {\mathbb{E}}[\gamma^{t+1}X_{t+1}\mid \mathcal{F}_t]=\gamma^{t+1} {\mathbb{E}}[X_{t+1}\mid \mathcal{F}_t] \\ &= \gamma^t\cdot \gamma{\mathbb{E}}[X_{t+1}\mid \mathcal{F}_t] \ge \gamma^t X_t=\tilde{Y}_t \end{align}\] where the inequality is obtained by Condition ([eq:thm-inf-upper-decrease]). For \(t\ge \kappa\), we have that \[\begin{align} {\mathbb{E}}[\tilde{Y}_{t+1}\mid \tilde{\mathcal{F}}_t]=Y_\kappa= \tilde{Y}_t \end{align}\] By Condition ([eq:thm-inf-upper-bound]), \(\tilde{Y}_t\in [0,1]\). Thus, \(\{\tilde{Y}_t\}_{t\ge 0}\) is a submartingale satisfying the third condition of the Optional Stopping Theorem (OST) (see ). By applying OST, we obtain that \[\begin{align} X_0&=&{\mathbb{E}}[\tilde{Y}_0]\le {\mathbb{E}}[\tilde{Y}_\kappa]={\mathbb{E}}[\gamma^{\kappa} X_{\kappa}] \\ &=& {\mathbb{E}}[\gamma^{\kappa}X_{\kappa}\mid \kappa<\infty] {\mathbb{P}}_{s_0}(\kappa<\infty)+{\mathbb{E}}[\gamma^{\kappa}X_{\kappa}\mid \kappa=\infty] {\mathbb{P}}_{s_0}(\kappa=\infty)\\ &=&{\mathbb{E}}[\gamma^{\kappa} X_\kappa\mid \kappa<\infty]{\mathbb{P}}_{s_0}(\kappa<\infty)+0\le {\mathbb{P}}_{s_0}(\kappa<\infty) \\ &=& {\mathbb{P}}_{s_0}\left[\{\omega\in \Omega_{s_0}\mid \omega_t\in S_u \text{ for some } t\in {\mathbb{N}}\}\right] \end{align}\] where the fourth equality is derived from the fact that \(\gamma^\kappa X_\kappa=0\) for \(\kappa=\infty\) as \(\gamma\in (0,1)\) and the second inequality stems from the fact that \(\gamma^\kappa X_\kappa\le 1\). Finally, by means of complementation, we obtain the upper bound of . 0◻ ◻

-8 6 =8em Proofs for

(Linear Lower Bounds on Finite-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier certificate \(B:S\to {\mathbb{R}}\) such that for some constants \(\lambda > \epsilon \ge 0\) and \(c \ge 0\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;& \text{ for all }s\in S,}\\ & B(s)\le \epsilon \;&\text{for all }s\in S_0, }\\ & B(s)\ge \lambda \;& \text{for all } s\in S_u, }\\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le c\; & \text{for all }s\in S, } \end{align}\] then the safety probability over a finite time horizon \(T\) is bounded from below by \[\forall s_0\in S_0. {\mathbb{P}}_{s_0}\left[\{\omega\mid \omega\in\Omega_{s_0},\omega_t\not\in S_u \;\text{for all } t\le T \} \right]\ge 1-(B(s_0)+cT)/\lambda .\]

Proof. Construct a stochastic process \(\{X_t\}_{t\ge 0}\) over the set \(\Omega\) of all trajectories such that \(X_t(\omega):=B(s_t)\) for any \(\omega\in\Omega\). Let \(\kappa\) be a stopping time at which the state first enters \(S_u\), i.e., \(\kappa(\omega):=\mathrm{inf} \{t\in{\mathbb{N}}\mid \omega_t \in S_u\}\). Then construct a new stochastic process \(\{Y_t\}_{t\ge 0}\) such that \(Y_t:=X_t+ (\kappa-t)c\). The stopped process \(\{\tilde{Y}_t\}_{t\ge 0}\) is defined by \[\tilde{Y}_t= \begin{cases} Y_t, & \text{ for } t<\kappa, \\ Y_\kappa, & \text{ for } t\ge \kappa. \end{cases}\] Next, we want to prove that \(\{\tilde{Y}_t\}_{t\ge 0}\) is a non-negative supermartingale. According to Condition ([eq:thm-fin-lower-linear-nonnegative]), \(\{\tilde{Y}_t\}_{t\ge 0}\) is non-neagtive. For \(t<\kappa\), we have that \[\begin{align} {\mathbb{E}}[\tilde{Y}_{t+1}\mid \tilde{\mathcal{F}}_t]&= {\mathbb{E}}[Y_{t+1}\mid \mathcal{F}_t]={\mathbb{E}}[X_{t+1}+(\kappa-t-1)c\mid \mathcal{F}_t] \\ &= {\mathbb{E}}[X_{t+1}\mid \mathcal{F}_t]+ (\kappa-t)c-c\\ &\le X_t+c+(\kappa-t)c-c\\ &= X_t+(\kappa-t)c=Y_t=\tilde{Y}_t \end{align}\] where the inequality is obtained by Condition ([eq:thm-fin-lower-linear-decrease]). For \(t\ge \kappa\), we have that \[\begin{align} {\mathbb{E}}[\tilde{Y}_{t+1}\mid \tilde{\mathcal{F}}_t]=Y_\kappa=\tilde{Y}_t \end{align}\] Therefore, we can conclude that \(\{\tilde{Y}_t\}_{t\ge 0}\) is a non-negative supermartingale. By Ville’s Inequality (), we have that for a constant \(\lambda'>0\), \[\begin{align} {\mathbb{P}}[\mathrm{sup}_{t\in [0,\kappa]} Y_t\ge \lambda'] ={\mathbb{P}}[\mathrm{sup}_{t\in{\mathbb{N}}} \tilde{Y}_t\ge \lambda']\le \frac{X_0+c \kappa}{\lambda'} \end{align}\] By the definition, \(Y_t\ge \lambda'\Leftrightarrow X_t\ge \lambda'-(\kappa-t)c:=r_t\), so we have that \({\mathbb{P}}[\mathrm{sup}_{t\in [0,\kappa]} X_t \ge \mathrm{sup}_{t\in [0,\kappa]}r_t]\le \frac{X_0+c\kappa}{\lambda'}\). As the maximum of \(r_t\) in \([0,\kappa]\) is \(\lambda'\), let \(\lambda=\lambda'\), we can derive that \[\begin{align} {\mathbb{P}}[\mathrm{sup}_{t\in [0,\kappa]} X_t\ge \lambda]\le \frac{X_0+c\kappa}{\lambda}, \end{align}\] which by Condition ([eq:thm-fin-lower-linear-unsafe]) implies that \[\begin{align} \forall s_0\in S_0. {\mathbb{P}}_{s_0}[\{ \omega\in\Omega_{s_0}\mid \omega_t\in S_u \text{ for some } t\le \kappa \}]\le \frac{X_0(\omega)+c\kappa}{\lambda} \end{align}\] Finally, by means of complementation, we obtain the lower bound of . 0◻ ◻

(Exponential Lower Bounds on Finite-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a function \(B:S\rightarrow {\mathbb{R}}\) such that for some constants \(\alpha> 0, \beta\in{\mathbb{R}}\), and \(\gamma\in [0,1)\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;& \text{ for all }s\in S,}\\ & B(s)\le \gamma \;&\text{for all }s\in S_0,}\\ & B(s) \ge 1 \;& \text{for all } s\in S_u,}\\ & \alpha {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s) \le \alpha \beta \; & \text{for all }s\in S\setminus S_u. } \end{align}\] then the safety probability over a finite time horizon \(T\) is bounded from below by \[\forall s_0\in S_0. {\mathbb{P}}_{s_0}\left[\{\omega\mid \omega\in\Omega_{s_0},\omega_t\not\in S_u \;\text{for all } t\le T \} \right]\ge 1-\frac{\alpha\beta}{\alpha-1} + (\frac{\alpha\beta}{\alpha-1}-B(s_0))\cdot \alpha^{-T}. }\]

Proof. Construct a stochastic process \(\{X_t\}_{t\ge 0}\) over the set \(\Omega\) of all trajectories such that \(X_t(\omega):=\omega_t\) for any \(\omega\in\Omega\). Let \(\kappa\) be the first time that a trajectory enters the unsafe set \(S_u\), i.e., \(\kappa(\omega):=\mathrm{inf} \{t\in{\mathbb{N}}\mid \omega_t \in S_u\}\). Define the stopped process \(\{\tilde{X}_t\}_{t\ge 0}\) by \[\tilde{X}_t= \begin{cases} X_t, & \text{ for } t<\kappa, \\ X_\kappa, & \text{ for } t\ge \kappa. \end{cases}\] According to Condition ([eq:thm-fin-lower-exp-moni]) and the law of total expectation, we have that \[\begin{align} & \alpha {\mathbb{E}}[B(\tilde{X}_{t+1})\mid \tilde{\mathcal{F}}_t]-B(\tilde{X}_t)\le \alpha\beta \nonumber \\ \Rightarrow \;& {\mathbb{E}}[B(\tilde{X}_{t+1})\mid \tilde{\mathcal{F}}_t] \le \frac{B(\tilde{X}_t)}{\alpha}+\beta \nonumber \\ \Rightarrow \;& {\mathbb{E}}[B(\tilde{X}_{t+1})] \le \frac{{\mathbb{E}}[B(\tilde{X}_t)]}{\alpha}+\beta \label{eq:gronwalltype} \end{align}\tag{3}\] Next, define a new stochastic process \(\{Y_{t}\}_{t\ge 0}\) such that \(Y_t:={\mathbb{E}}[B(\tilde{X}_t)]\). By , we obtain that \(Y_{t+1}\le \frac{Y_t}{\alpha}+\beta\). By the discrete version of Gronwall’s Inequality (see ), we have that \[\begin{align} Y_{t} \le\;& \alpha^{-t}\cdot Y_0+\sum_{k=1}^t\left(\prod_{i=k+1}^t \alpha^{-1} \right)\cdot \beta \nonumber =\alpha^{-t}\cdot Y_0+\sum_{k=1}^t \left[\alpha^{-(t-k)}\cdot \beta\right] \nonumber\\ =\;& \alpha^{-t}\cdot Y_0+ \alpha^{-t}\cdot \beta \cdot\sum_{k=1}^{t} \alpha^k \nonumber= \alpha^{-t}\cdot Y_0+ \alpha^{-t}\cdot \beta \cdot \frac{\alpha(1-\alpha^t)}{1-\alpha} \nonumber\\ =\;& \alpha^{-t}\cdot Y_0+ \frac{\alpha\beta}{\alpha-1}-\frac{\alpha\beta}{\alpha-1}\cdot \alpha^{-t} \label{eq:gronwallbound} \end{align}\tag{4}\] Pick a time step \(T>0\). It is observed that \(\{\omega\in\Omega \mid X_t(\omega)\in S_u \text{ for some } t\le T \}=\{\omega\in \Omega\mid \tilde{X}_T\in S_u\}\). Thus, we have that \[\begin{align} & {\mathbb{P}}[X_t\in S_u \text{ for some }t\le T] \\ =&{\mathbb{P}}[\tilde{X}_T\in S_u]={\mathbb{E}}[\mathrm{1}_{S_u} (\tilde{X}_T)] \\ \le & {\mathbb{E}}[ B(\tilde{X}_T)] \\ \le & \alpha^{-T}\cdot B(X_0)+ \frac{\alpha\beta}{\alpha-1}-\frac{\alpha\beta}{\alpha-1}\cdot \alpha^{-T} \end{align}\] where the first inequality is obtained by Conditions ([eq:thm-fin-lower-exp-nonnegative]) and ([eq:thm-fin-lower-exp-unsafe]), the second inequality is derived by . Finally, by means of complementation, we obtain the lower bound. ◻

(Linear Upper Bounds on Finite-time Safety) Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier function \(B:S\rightarrow {\mathbb{R}}\) such that for some constants \(\beta\in (0,1), \beta<\alpha<1+\beta, c\ge 0\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;& \text{ for all }s\in S, } \\ & B(s)\le \beta \;&\text{for all }s\in S\setminus S_u,}\\ & \alpha \le B(s) \le 1+\beta \;& \text{for all } s\in S_u,}\\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s) \ge c \; & \text{for all }s\in S\setminus S_u. } \end{align}\] then the safety probability over a finite time horizon is bounded from above by \[\forall s_0\in S_0.\; {\mathbb{P}}_{s_0}\left[\{\omega\mid \omega\in\Omega_{s_0},\omega_t\not\in S_u \;\text{for all } t\le T \} \right]\le 1-B(s_0)-\frac{1}{2}c\cdot T+\beta.\]

Proof. Construct a stochastic process \(\{X_t\}_{t\ge 0}\) over the set \(\Omega\) of all trajectories such that \(X_t(\omega):=\omega_t\) for any \(\omega\in\Omega\). Let \(\kappa\) be the first time that a trajectory enters the unsafe set \(S_u\), i.e., \(\kappa(\omega):=\mathrm{inf} \{t\in{\mathbb{N}}\mid \omega_t \in S_u\}\). Define the stopped process \(\{\tilde{X}_t\}_{t\ge 0}\) by \[\tilde{X}_t= \begin{cases} X_t, & \text{ for } t<\kappa, \\ X_\kappa, & \text{ for } t\ge \kappa. \end{cases}\] Pick a time step \(T>0\). It is observed that \(\{\omega\in\Omega \mid X_t(\omega)\in S_u \text{ for some } t\le T \}=\{\omega \in\Omega \mid \tilde{X}_T\in S_u\}\). Thus, we have that \[\begin{align} & {\mathbb{P}}[X_t\in S_u \text{ for some }t\le T] \nonumber \\ =\;&{\mathbb{P}}[\tilde{X}_T\in S_u]={\mathbb{E}}[\mathrm{1}_{S_u} (\tilde{X}_T)] \label{eq:stopped-observe1} \end{align}\tag{5}\] Moreover, let \(0\le T_1\le T_2\le T\), it is found that \(\{\omega\in\Omega\mid \tilde{X}_{T_1}\in S_u\}\subseteq \{\omega \in\Omega \mid \tilde{X}_{T_2}\in S_u\}\). Therefore, we have that \[\begin{align} \label{eq:stopped-observe2} {\mathbb{P}}[\tilde{X}_{T_1}\in S_u]\le {\mathbb{P}}[\tilde{X}_{T_2}\in S_u] \end{align}\tag{6}\] By Conditions ([eq:thm-fin-upper-exp-initial]) and ([eq:thm-fin-upper-linear-unsafe]), we have that \[\begin{align} \label{eq:newobserve} B(\tilde{X}_t)\le \mathrm{1}_{S_u}(\tilde{X}_t)+\beta \text{ for any } t\ge 0 \end{align}\tag{7}\] By Condition ([eq:thm-fin-upper-linear-moni]) and the law of total expectation, we obtain that \[\begin{align} & {\mathbb{E}}[B(\tilde{X}_{t+1})\mid \tilde{\mathcal{F}}_t]-B(\tilde{X}_t)\ge c \\ \Rightarrow \; & {\mathbb{E}}[B(\tilde{X}_{t+1})\mid \tilde{\mathcal{F}}_t] \ge B(\tilde{X}_t) +c \\ \Rightarrow \; & {\mathbb{E}}[B(\tilde{X}_{t+1})] \ge {\mathbb{E}}[B(\tilde{X}_t)] +c \end{align}\] By the iterative calculation, we have that \({\mathbb{E}}[B(\tilde{X}_{t})] \ge B(X_0)+c\cdot t\). Then we can derive that \[\begin{align} {\mathbb{P}}[\tilde{X}_T\in S_u]=&{\mathbb{E}}[\mathrm{1}_{S_u}(\tilde{X}_T)] \\ \ge & \frac{\sum_{t=0}^T {\mathbb{E}}[\mathrm{1}_{S_u}(\tilde{X}_t)] }{T+1} \\ \ge & \frac{\sum_{t=0}^T ({\mathbb{E}}[B(\tilde{X}_t)]-\beta)}{T+1} \\ \ge & \frac{\sum_{t=1}^T B(X_0)+c\cdot t}{T+1}-\beta \\ = & B(X_0)+\frac{1}{2}c\cdot T-\beta \end{align}\] where the first equality is obtained by , the first inequality stems from and the second inequality is obtained by . Finally, by means of complementation, we obtain the upper bound.0◻ ◻

(Exponential Upper Bounds on Finite-Time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier function \(B:S\to {\mathbb{R}}\) such that for some constants \(K'\le K<0\), \(\epsilon>0\) and a non-empty interval \([a,b]\), the following conditions hold: \[\begin{align} & B(s)\ge 0 \;&\text{for all }s\in S\setminus S_u, }\\ & K'\le B(s) \le K \;& \text{for all } s\in S_u, }\\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le -\epsilon \; & \text{for all }s\in S\setminus S_u, }\\ & a\le B(f(s,\pi(s+\delta)))-B(s) \le b \;& \text{for all } s\in S\setminus S_u \text{ and } \delta \sim \mu }, \end{align}\] then the safety probability over a finite time horizon \(T\) is bounded from above by \[\forall s_0\in S_0. {\mathbb{P}}_{s_0}\left[\{\omega\mid \omega\in\Omega_{s_0},\omega_t\not\in S_u \;\text{for all } t\le T \} \right]\le exp(-\frac{2(\epsilon \cdot T- B(s_0))^2}{T\cdot (b-a)^2}).\]

Proof. Let \(\kappa\) be a stopping time over the set \(\Omega\) of all trajectories at which the state first enters \(S_u\), i.e., \(\kappa(\omega):=\mathrm{inf} \{t\in{\mathbb{N}}\mid \omega_t \in S_u\}\). Define the stopped process \(\{\tilde{X}_t\}_{t\ge 0}\) by \[X_t= \begin{cases} B(s_t), & \text{ for } t<\kappa, \\ B(s_\kappa), & \text{ for } t\ge \kappa. \end{cases}\] Next, we prove that \(\{X_t\}_{t\ge 0}\) is ranking supermartingale w.r.t. \(\kappa\), i.e., \({\mathbb{E}}[X_{t+1}|{\cal F}_t] \leq X_t-\epsilon \cdot \mathrm{1}_{\kappa>t}\) for all \(t\ge 0\). To prove this inequality, we consider two cases.

  • When \(\kappa>t\), we have that \(X_t=B(s_t)\). By Condition ([eq:thm-fin-upper-exp-moni]), we obtain that \({\mathbb{E}}[X_{t+1}\mid \mathcal{F}_t]={\mathbb{E}}_{\delta\sim\mu} [B(f(s_t,\pi(s_t+\delta)))\mid s_t]\le B(s_t)-\epsilon=X_t-\epsilon\).

  • When \(\kappa\le t\), we have that \(X_t=B(s_\kappa)\). Then we can obtain that \({\mathbb{E}}[X_{t+1}\mid \mathcal{F}_t]=X_\kappa=X_t\).

According to the two cases, we can conclude that \({\mathbb{E}}[X_{t+1}|{\cal F}_t] \leq X_t-\epsilon \cdot \mathrm{1}_{\kappa>t}\) for all \(t \ge 0\). Hence, we prove that \(\{X_t\}_{t\ge 0}\) is a ranking supermartingale w.r.t. \(\kappa\).

Construct a new stochastic process \(\{Y_t\}_{t\ge 0}\) such that \(Y_t:= X_t+\epsilon\cdot \mathrm{min}\{\kappa,t\}\). Then we want to prove that \(\{Y_t\}_{t\ge 0}\) is a supermatirngale and \(Y_{t+1}-Y_t\in [a+\epsilon, b+\epsilon]\). First, we have that \[\begin{align} {\mathbb{E}}[Y_{t+1}\mid \mathcal{F}_t] &= {\mathbb{E}}[X_{t+1}+\epsilon \cdot \mathrm{min}\{t+1,\kappa\}\mid \mathcal{F}_t] \\ &= {\mathbb{E}}[X_{t+1} \mid \mathcal{F}_t ]+\epsilon \cdot {\mathbb{E}}[\mathrm{min}\{t+1,\kappa\} \mid\mathcal{F}_t] \\ &\le X_t-\epsilon\cdot \mathrm{1}_{\kappa> t} + \epsilon \cdot {\mathbb{E}}[\mathrm{min}\{t+1,\kappa\} \mid\mathcal{F}_t] \end{align}\] Second, we consider two cases:

  • If \(\kappa>t\), then \(\mathrm{min}\{t+1,\kappa\}=t+1\) and \({\mathbb{E}}[\mathrm{min}\{t+1,\kappa\} \mid\mathcal{F}_t]=t+1\). We can thus derive that \[\begin{align} {\mathbb{E}}[Y_{t+1}\mid \mathcal{F}_t] &\le X_t-\epsilon\cdot \mathrm{1}_{\kappa>t} + \epsilon \cdot {\mathbb{E}}[\mathrm{min}\{t+1,\kappa\} \mid\mathcal{F}_t] \\ &= X_t-\epsilon+\epsilon\cdot (t+1) \\ &= X_t+\epsilon \cdot t \\ &= Y_t \end{align}\]

  • If \(\kappa\le t\), then \(\mathrm{min}\{t+1,\kappa\}=\kappa=\kappa\cdot \mathrm{1}_{\kappa\le t}\). Since the event \(\kappa\cdot \mathrm{1}_{\kappa\le t}\) is measurable in \(\mathcal{F}_t\), \({\mathbb{E}}[\kappa\cdot \mathrm{1}_{\kappa\le t}\mid \mathcal{F}_t]=\kappa\cdot \mathrm{1}_{\kappa\le t}\). We can thus derive that \[\begin{align} {\mathbb{E}}[Y_{t+1}\mid \mathcal{F}_t] &\le X_t-\epsilon\cdot \mathrm{1}_{\kappa> t} + \epsilon \cdot {\mathbb{E}}[\mathrm{min}\{t+1,\kappa\} \mid\mathcal{F}_t] \\ &= X_t-0+\epsilon\cdot \kappa \\ &= Y_t \end{align}\]

Hence, \(\{Y_t\}_{t\ge 0}\) is a supermartingale. Moreover, since \(\kappa\le t\) implies \(X_{t+1}=X_t\), we have that \(X_{t+1}-X_t=\mathrm{1}_{\kappa> t}\cdot (X_{t+1}-X_t)\). Thus, we can obtain that \[\begin{align} Y_{t+1}-Y_t &= X_{t+1}-X_t+ \epsilon \cdot (\mathrm{min}\{t+1,\kappa\}-\mathrm{min}\{t,\kappa \}) \\ &= X_{t+1}-X_t+ \epsilon \cdot \mathrm{1}_{\kappa> t} \\ &= \mathrm{1}_{\kappa> t}\cdot (X_{t+1}-X_t+ \epsilon) \end{align}\] where the second equality is derived by the fact that \(\mathrm{min}\{t+1,\kappa\}-\mathrm{min}\{t,\kappa \}=\mathrm{1}_{\kappa> t}\). It follows that \(Y_{t+1}-Y_t \in [a+\epsilon, b+\epsilon]\).

We use Hoeffding’s Inequality on Supermartingales (see ) to prove our result. Pick a time step \(T>0\). Let \(\nu=\epsilon\cdot T-X_0\) and \(\hat{\nu}=\epsilon\cdot\mathrm{min}\{T,\kappa\}-X_0\). Note that \(\nu=\hat{\nu}\) whenever \(\kappa>T\). By Conditions ([eq:thm-fin-upper-exp-nonnegative]) and ([eq:thm-fin-upper-exp-unsafe]), we have that \(\kappa>T\) iff \(X_T\ge 0\). For any \(s_0\in S_0\), we can derive that \[\begin{align} {\mathbb{P}}_{s_0}(\kappa>T)&={\mathbb{P}}_{s_0}(X_T\ge 0\wedge \kappa>T) \\ &={\mathbb{P}}_{s_0}((X_T+\hat{\nu}\ge \nu)\wedge \kappa>T) \\ &\le {\mathbb{P}}_{s_0}(X_T+\hat{\nu}\ge \nu) \\ &= {\mathbb{P}}_{s_0}(Y_T-Y_0\ge \epsilon\cdot T-X_0) \\ &\le \mathrm{exp}\left(\frac{-2\cdot (\epsilon\cdot T-B(s_0))^2}{T\cdot (b-a)^2} \right) \end{align}\] for all \(T>\frac{B(s_0)}{\epsilon}\). Finally, as \(\{\omega\mid \kappa(\omega)>T \}=\{\omega\mid \omega_t\not\in S_u \text{ for all } t\le T \}\), we have that \[\begin{align} {\mathbb{P}}_{s_0}(\kappa>T)={\mathbb{P}}_{s_0}\left[\{ \omega\in\Omega_{s_0}\mid \omega_t\not\in S_u \;\text{for all } t\le T \} \right], \end{align}\] which implies the upper bound. 0◻ ◻

-8 6 =8em Supplementary Materials for

Fix a DNN-controlled system \(M_\mu=(S,S_0,A,\pi,f,R,\mu)\) with an unsafe set \(S_u\in S\) such that \(S_0\cap S_u=\emptyset\).

-8 6 =8em Proof for

(\(k\)-inductive Variant of Almost-Sure Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\). If there exists a \(k\)-inductive barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that the following conditions hold: \[\begin{align} & \bigwedge_{0\le i<k} B(g_{\pi,f}^{i}(s,\Delta^i))\le 0 \;& \forall (s,\Delta^i)\in S_0\times W^i, } \\ & B(s)>0 \;& \forall s\in S_u, } \\ &\bigwedge_{0\le i<k} (B(g_{\pi,f}^{i}(s,\Delta^i))\le 0)\Longrightarrow B(g_{\pi,f}^k(s,\Delta^k))\le 0 & \forall (s,\Delta^i)\in S\times W^i,} \end{align}\] then the system \(M_\mu\) is almost-surely safe, i.e., \(\forall s_0\in S_0. \omega\in\Omega_{s_0} \Longrightarrow \omega_t \not\in S_u \;\forall t\in{\mathbb{N}}.\)

Proof. We prove it by contradiction. Assume that there exists a \(k\)-inductive barrier certificate \(B\) satisfying Conditions ([eq:thm-almost-indc-initial]) to ([eq:thm-almost-indc-decrease]), but the system is unsafe, i.e., there exist a time step \(T>0\) and an initial state \(s_0\in S_0\) such that a trajectory starting from \(s_0\) enters the unsafe set \(S_u\) at \(T\), which means \(s_T\in S_u\). By Condition ([eq:thm-almost-indc-unsafe]), we have that \(B(s_T)>0\). Condition ([eq:thm-almost-indc-initial]) implies that \(B(s_0)\le 0, B(s_1)\le 0, \dots, B(s_{k-1})\le 0\), which means the state sequences starting from the safe set will remain in the safe set for the next \(k-1\) consecutive time steps. By Condition ([eq:thm-almost-indc-decrease]), we have that for any \(k\) consecutive time steps, if the system is safe, then the system will still be safe in the \((k+1)\)-th time step. Then by applying the \(k\) induction principle with Condition ([eq:thm-almost-indc-initial]) as the base case and Condition ([eq:thm-almost-indc-decrease]) as the inductive hypothesis, we can derive that \(B(s_t)\le 0\) for all \(t\in{\mathbb{N}}\). This is contradictory to Condition ([eq:thm-almost-indc-unsafe]). Therefore, the system with a \(k\)-inductive BC is almost-surely safe. 0◻ ◻

-8 6 =8em Proofs for

(\(k\)-inductive Lower Bounds on Unbounded-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\). If there exists a k-inductive barrier certificate \(B:S\to {\mathbb{R}}\) such that for some constants \(k\in{\mathbb{N}}_{\ge 1}\), \(\epsilon\in [0,1]\) and \(c\ge 0\), the following conditions hold: \[\begin{align} & B(s) \ge 0 \;& \text{for all } s\in S } \\ & B(s)\le \epsilon \;&\text{for all }s\in S_0, } \\ & B(s)\ge 1 \;& \text{for all } s\in S_u, } \\ & {\mathbb{E}}_{\delta\sim\mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le c \; & \text{for all } s\in S , } \\ & {\mathbb{E}}_{\Delta^k\sim \mu^k}[B(g_{\pi,f}^k(s,\Delta^k))\mid s]-B(s)\le 0 \; & \text{for all } s\in S, } \end{align}\] then the safety probability over infinite time horizons is bounded from below by \[\forall s_0\in S_0. {\mathbb{P}}_{s_0}\left[\{\omega\mid \omega_0\in\Omega_{s_0},\omega_t\not\in S_u \;\text{ for all } t\in {\mathbb{N}}\} \right]\ge 1-kB(s_0)-\frac{k(k-1)c}{2}.\]

Proof. Let \(\kappa\) be the first time that a trajectory enters the unsafe set \(S_u\), i.e., \(\kappa(\omega):=\mathrm{inf} \{t\in{\mathbb{N}}\mid \omega_t \in S_u\}\). Construct a stochastic process \(\{X_t\}_{t\ge 0}\) over the set \(\Omega\) of all trajectories such that for any \(\omega\in\Omega\). \[X_t(\omega)= \begin{cases} B(s_t), & \text{ for } t<\kappa, \\ B(s_\kappa), & \text{ for } t\ge \kappa. \end{cases}\] According to Condition ([eq:thm-inf-lower-induc-unsafe]), we have that \(S_u\subseteq \{s\in S\mid B(s)\ge 1 \}\). Therefore, for any initial state \(s_0\in S_0\), it follows that \[\begin{align} & {\mathbb{P}}_{s_0}[\{\omega\mid \omega\in \Omega_{s_0}, \omega_t\in S_u \text{ for some } t\in {\mathbb{N}}\}] \\ \le\; & {\mathbb{P}}_{s_0}[\{\omega\mid \omega\in\Omega_{s_0}, \mathrm{sup}_{t\in{\mathbb{N}}} X_t(\omega)\ge 1\}] \end{align}\] Then construct \(k\) new stochastic processes \(\{Y^i_t\}_{t\ge 0}\)’s for all \(i\in [0,k-1]\) such that \(Y^i_t:=X_{i+tk}\). Intuitively, given a process \(\{X_t\}_{t\ge 0}\), each new process \(\{Y^i_t\}_{t\ge 0}\) consists of the states in \(\{X_t\}_{t\ge 0}\) after every \(k\) steps, starting from \(X_i\). Next, we want to prove that each \(\{Y^i_t\}_{t\ge 0}\) is a non-negative supermartingale. For \(i+tk<\kappa\), we have that \[\begin{align} {\mathbb{E}}[Y^i_{t+1}\mid \mathcal{F}_t]={\mathbb{E}}[X_{i+(t+1)k}\mid \mathcal{F}_t] \le X_{i+tk}=Y^i_t \end{align}\] where the inequality is obtained by Condition ([eq:thm-inf-lower-induc-k-martin]). For \(i+tk\ge \kappa\), we have that \[\begin{align} {\mathbb{E}}[Y^i_{t+1}\mid \mathcal{F}_t]= X_{i+tk}=Y^i_t. \end{align}\] Combined with Condition ([eq:thm-inf-lower-induc-nonnegative]), we can conclude that \(\{Y^i_t\}_{t\ge 0}\) is a non-negative supermartingale for all \(i\in [0,k-1]\). Then by Boole’s inequality, we have that \[\begin{align} & {\mathbb{P}}_{s_0}[\{ \omega\in\Omega_{s_0}\mid \mathrm{sup}_{t\in{\mathbb{N}}} X_t(\omega)\ge 1\}] \\ \le\;& \sum_{i=0}^{k-1} {\mathbb{P}}_{s_0}[\{ \omega\in\Omega_{s_0}\mid \mathrm{sup}_{t\in{\mathbb{N}}} X_{i+tk}(\omega)\ge 1\}] \\ = \;& \sum_{i=0}^{k-1} {\mathbb{P}}_{s_0}[\{ \omega\in\Omega_{s_0}\mid \mathrm{sup}_{t\in{\mathbb{N}}} Y^i_t(\omega)\ge 1\}] \\ \le \;& \sum_{i=0}^{k-1} {\mathbb{E}}[X_i] \end{align}\] where the last inequality is obtained by Doob’s nonnegative supermartingale inequality. By applying Condition ([eq:thm-inf-lower-induc-c-martin]) and the law of total expectation recursively, we obtain that \[\begin{align} & {\mathbb{E}}[X_{t+1}\mid \mathcal{F}_t]\le X_t+c \\ \Rightarrow \;& {\mathbb{E}}[X_{t+1}]\le {\mathbb{E}}[X_t]+c \\ \Rightarrow \;& {\mathbb{E}}[X_{t+1}] \le X_0+c\cdot (t+1) \end{align}\] Therefore, we can derive that \[\begin{align} & {\mathbb{P}}_{s_0}[\{ \omega\in \Omega_{s_0}\mid \omega_t\in S_u \text{ for some } t\in {\mathbb{N}}\}] \\ \le\;& X_0+\sum_{i=1}^{k-1} (X_0+ic) = kX_0+\frac{k(k-1)c}{2} \end{align}\] Finally, by means of complementation, we obtain the upper bound.0◻ ◻

To make the probabilistic bound above non-trivial, the value of \(k\) should be bounded by \[1\le k \le \frac{(c-2B(s_0))+\sqrt{4B(s_0)^2+c^2-4c(2+B(s_0))^2}}{2c}\]

(\(k\)-inductive Upper Bounds on Unbounded-time Safety). Given an \(M_\mu\) with an initial set \(S_0\) and an unsafe set \(S_u\), if there exists a barrier certificate \(B:S\rightarrow {\mathbb{R}}\) such that for some constant \(\gamma\in (0,1)\), \(0\le \epsilon'<\epsilon\le 1\), \(c\le 0\) the following conditions hold: \[\begin{align} & 0\le B(s)\le 1 \;& \text{for all } s\in S }\\ & B(s)\ge \epsilon \;& \text{ for all } s\in S_0, }\\ & B(s)\le \epsilon' \;& \text{ for all } s\in S_u, }\\ & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\ge c \; & \text{ for all } s\in S}\\ & B(s)-\gamma^k \cdot{\mathbb{E}}_{\Delta^k\sim \mu^k}[B(g_{\pi,f}^k(s,\Delta^k))\mid s]\le 0\; & \text{for all }s\in S \setminus S_u } \end{align}\] then the safety probability over infinite time horizons is bounded from above by \[\forall s_0\in S_0. {\mathbb{P}}_{s_0}\left[\{ \omega\in\Omega_{s_0}\mid\omega_t\not\in X_u \;\text{ for all } t\in {\mathbb{N}}\} \right]\le 1-k B(s_0)-\frac{k(k-1)c}{2}.\]

Proof. Construct a stochastic process \(\{X_t\}_{t\ge 0}\) over the set \(\Omega\) of all trajectories such that \(X_t(\omega):=B(s_t)\) for any \(\omega\in\Omega\). Let \(\kappa\) be a stopping time at which the state first enters \(S_u\), i.e., \(\kappa(\omega):=\mathrm{inf} \{t\in{\mathbb{N}}\mid \omega_t \in S_u\}\). Then construct a new stochastic process \(\{Y_t\}_{t\ge 0}\) stopped at \(\kappa\) such that \[Y_t= \begin{cases} \gamma^t X_t, & \text{ for } t<\kappa, \\ \gamma^\kappa X_\kappa, & \text{ for } t\ge \kappa. \end{cases}\] It is observed that \(\{\omega\in\Omega \mid X_t(\omega)\in S_u \text{ for some } t\in{\mathbb{N}}\}=\{\omega \in\Omega \mid \kappa(\omega)<\infty\}\). Thus, we have that \[\begin{align} {\mathbb{P}}[X_t\in S_u \text{ for some }t\in{\mathbb{N}}] ={\mathbb{P}}[\kappa<\infty] \label{eq:stopped-observe-new} \end{align}\tag{8}\] Moreover, we can derive that \[\begin{align} {\mathbb{P}}[\kappa<\infty] \ge\; &{\mathbb{E}}[\gamma^\kappa X_{\kappa}\mid \kappa<\infty]{\mathbb{P}}[\kappa<\infty] +0 \\ = \;& {\mathbb{E}}[\gamma^\kappa X_{\kappa}\mid \kappa<\infty] {\mathbb{P}}[\kappa<\infty]+{\mathbb{E}}[\gamma^\kappa X_\kappa\mid \kappa=\infty] {\mathbb{P}}[\kappa=\infty] \\ = \;& {\mathbb{E}}[\gamma^\kappa X_\kappa] ={\mathbb{E}}[Y_\kappa] \end{align}\] where the first inequality is obtained by the fact \(\gamma\in (0,1)\) and Condition ([eq:thm-inf-upper-induc-bound]), and the second equality holds as \(\gamma^\kappa=0\) when \(\kappa=\infty\).

Next, we construct \(k\) new stochastic processes \(\{Z^i_t\}_{t\ge 0}\)’s for \(i \in [0,1]\) such that \(Z^i_t:=Y_{i+tk}\). Intuitively, given a process \(\{Y_t\}_{t\ge 0}\), each new process \(\{Z^i_t\}_{t\ge 0}\) consists of the states in \(\{Y_t\}_{t\ge 0}\) after every \(k\) steps, starting from \(Y_i\). We want to prove that each \(\{Z_t\}_{t\ge 0}\) is a submartingale. For \(i+tk<\kappa\), we have that \[\begin{align} {\mathbb{E}}[Z^i_{t+1}\mid \mathcal{F}_t]=\;& {\mathbb{E}}[\gamma^{i+(t+1)k} X_{i+(t+1)k} \mid \mathcal{F}_t] \\ =\;&\gamma^{i+tk}\cdot \gamma^k {\mathbb{E}}[X_{i+(t+1)k}\mid \mathcal{F}_t] \ge \gamma^{i+tk} X_{i+tk} =Z^i_t \end{align}\] where the inequality is obtained by Condition ([eq:thm-inf-upper-induc-decrease]). For \(i+tk\ge \kappa\), we have that \[\begin{align} {\mathbb{E}}[Z^i_{t+1}\mid \mathcal{F}_t]=Y_{i+tk}=Z^i_t \end{align}\] Hence, \(\{Z^i_t\}_{t\ge 0}\) is a submartingale for all \(i\in [0,k-1]\). By Condition ([eq:thm-inf-upper-induc-bound]), \(\{Z^i_t\}_{t\ge 0}\) satisfies the conditions of the Optional Stopping Theorem (OST). Then by applying OST, we obtain that \[\begin{align} {\mathbb{P}}[\kappa<\infty]\ge {\mathbb{E}}[Y_\kappa]=\sum_{i=0}^{k-1} {\mathbb{E}}[Z^i_\kappa]\ge \sum_{i=0}^{k-1} {\mathbb{E}}[Z_0^i]=\sum_{i=0}^{k-1} {\mathbb{E}}[X_i] \end{align}\] By applying Condition ([eq:thm-inf-upper-induc-c-martin]) and the law of total expectation recursively, we have that \[\begin{align} &{\mathbb{E}}[X_{t+1}\mid\mathcal{F}_t]\ge X_t+c \\ \Rightarrow \;& {\mathbb{E}}[X_{t+1}]\ge {\mathbb{E}}[X_t]+c \\ \Rightarrow \;& {\mathbb{E}}[X_{t+1}]\ge X_0 + (t+1)c \end{align}\] Therefore, we have that \[\begin{align} {\mathbb{P}}[\kappa<\infty]\ge X_0+\sum_{i=1}^{k-1} (X_0+ic)= kX_0+\frac{k(k-1)c}{2} \end{align}\] Finally, by means of complementation, we obtain the upper bound. 0◻ ◻

-8 6 =8em Supplementary Materials for

Similar to those in , the training phase of the other NBCs also involves two steps, i.e., training data construction and loss function definition. Training data construction can be achieved by discretizing the state space \(S\) and constructing a discretization \(\tilde{S}\subseteq S\) such that for each \(s \in S\), there is a \(\tilde{s} \in \tilde{S}\) with \(||s-\tilde{s}||_1 < \tau\), where \(\tau>0\) is called the granularity of \(\tilde{S}\). Once the discretization \(\tilde{S}\) is obtained, we construct two finite sets \(\tilde{S}_0:=\tilde{S}\cap S_0\) and \(\tilde{S}_u:=\tilde{S}\cap S_u\) used for the training process.

We provide the training loss functions and validation theorems for several barrier certificates, The fundamental idea for the others is consistent. The training loss functions are composed of the conditions of NBCs and the simulation-guided term for tightness. In the validation phase, we utilize Lipschitz continuity to reduce the validation of the infinite state space to a finite set of states , by applying stricter conditions.

-8 6 =8em Loss Functions of NBCs

0.0.0.1 The loss function of NBCs in .

Similarly, we define the loss function of NBCs in as:

\[\begin{align} &{\mathcal{L}}(\theta):=k_1 \cdot {\mathcal{L}}_{1}(\theta)+k_2 \cdot {\mathcal{L}}_{2}(\theta)+k_3 \cdot {\mathcal{L}}_{3}(\theta)+k_4 \cdot {\mathcal{L}}_{4}(\theta)+k_5 \cdot {\mathcal{L}}_{5}(\theta)\\ &{\mathcal{L}}_{1}(\theta)=\frac{1}{|\tilde{S}|} \sum\limits_{s\in \tilde{S}} \left(\mathop{\mathrm{max}}\{ h_\theta(s) - 1,0\} + \mathop{\mathrm{max}}\{ 0 - h_\theta(s),0\}\right)\\ &{\mathcal{L}}_{2}(\theta)=\frac{1}{|\tilde{S}_0|} \sum\limits_{s\in \tilde{S}_0} \left(\mathop{\mathrm{max}}\{\epsilon - h_\theta(s),0\} \right)\\ &{\mathcal{L}}_{3}(\theta)=\frac{1}{|\tilde{S}_u|} \sum\limits_{s\in \tilde{S}_u} \left(\mathop{\mathrm{max}}\{h_\theta(s)-\epsilon',0\} \right)\\ &{\mathcal{L}}_{4}(\theta)=\frac{1}{|\tilde{S} \setminus \tilde{S}_u|}\sum\limits_{s\in \tilde{S} \setminus \tilde{S}_u} \left( \mathop{\mathrm{max}}\{h_\theta(s) - \gamma\cdot\sum\limits_{s'\in \mathcal{D}_s}\frac{h_\theta(s')}{N}+\zeta ,0\} \right)\\ &{\mathcal{L}}_{5}(\theta)=\frac{1}{|\tilde{S}_0|}\sum\limits_{s\in \tilde{S}_0} \left( \mathop{\mathrm{max}}\{1 - \varmathbb{f}_s - h_\theta(s) ,0\}\right) \end{align}\] where \({\mathcal{L}}_{1}(\theta),\cdots, {\mathcal{L}}_{4}(\theta)\) are defined via the four conditions of the barrier certificate, correspondingly. \({\mathcal{L}}_{5}(\theta)\) is the regularization term to assure the tightness of upper bounds via the same simulation-based method and .

0.0.0.2 The loss function of NBCs in .

The loss function of NBCs in is:

\[\begin{align} &{\mathcal{L}}(\theta):=k_1 \cdot {\mathcal{L}}_{1}(\theta)+k_2 \cdot {\mathcal{L}}_{2}(\theta)+k_3 \cdot {\mathcal{L}}_{3}(\theta)+k_4 \cdot {\mathcal{L}}_{4}(\theta)+k_5 \cdot {\mathcal{L}}_{5}(\theta)\\ &{\mathcal{L}}_{1}(\theta)=\frac{1}{|\tilde{S}|} \sum\limits_{s\in \tilde{S}} \left(\mathop{\mathrm{max}}\{0 - h_\theta(s),0\}\right)\\ &{\mathcal{L}}_{2}(\theta)=\frac{1}{|\tilde{S}_0|} \sum\limits_{s\in \tilde{S}_0} \left( \mathop{\mathrm{max}}\{h_\theta(s)-\epsilon,0\}\right)\\ &{\mathcal{L}}_{3}(\theta)=\frac{1}{|\tilde{S}_u|} \sum\limits_{s\in \tilde{S}_u} \left(\mathop{\mathrm{max}}\{\lambda - h_\theta(s),0\} \right)\\ &{\mathcal{L}}_{4}(\theta)=\frac{1}{|\tilde{S}|}\sum\limits_{s\in \tilde{S}} \left( \mathop{\mathrm{max}}\{\sum\limits_{s'\in \mathcal{D}_s}\frac{h_\theta(s')}{N} - h_\theta(s) - c + \zeta ,0\} \right)\\ &{\mathcal{L}}_{5}(\theta)=\frac{1}{|\tilde{S}_0|} \sum\limits_{s\in \tilde{S}_0}\sum\limits_{t=1}^{T} \left( \mathop{\mathrm{max}}\{ \varmathbb{f}_t^{s} -1 +(h_\theta(s)+c \cdot t)/\lambda, 0 \}\right) \end{align}\] where the last term \({\mathcal{L}}_{5}(\theta)\) is to assure the tightness of bounds using the simulation-based method. Specifically, we execute the control system from each \(s\in \tilde{S}_0\), and calculate the safe frequency at each time step \(\varmathbb{f}_t^{s}, t=1,\cdots,T\) of all these trajectories starting from each initial state \(s\).

0.0.0.3 The loss function of NBCs in .

The loss function of NBCs in can be defined as:

\[\begin{align} &{\mathcal{L}}(\theta):=k_1 \cdot {\mathcal{L}}_{1}(\theta)+k_2 \cdot {\mathcal{L}}_{2}(\theta)+k_3 \cdot {\mathcal{L}}_{3}(\theta)+k_4 \cdot {\mathcal{L}}_{4}(\theta)+k_5 \cdot {\mathcal{L}}_{5}(\theta)\\ &{\mathcal{L}}_{1}(\theta)=\frac{1}{|\tilde{S}|} \sum\limits_{s\in \tilde{S}} \left(\mathop{\mathrm{max}}\{0 - h_\theta(s),0\} \right)\\ &{\mathcal{L}}_{2}(\theta)=\frac{1}{|\tilde{S}_0|} \sum\limits_{s\in \tilde{S}_0} \left(\mathop{\mathrm{max}}\{h_\theta(s)-\gamma,0\}\right)\\ &{\mathcal{L}}_{3}(\theta)=\frac{1}{|\tilde{S}_u|} \sum\limits_{s\in \tilde{S}_u} \left(\mathop{\mathrm{max}}\{1 - h_\theta(s),0\} \right)\\ &{\mathcal{L}}_{4}(\theta)=\frac{1}{|\tilde{S}\setminus \tilde{S}_u|} \sum\limits_{s\in \tilde{S}\setminus \tilde{S}_u} \left( \mathop{\mathrm{max}}\{\alpha\sum\limits_{s'\in \mathcal{D}_s}\frac{h_\theta(s')}{N} - h_\theta(s) - \alpha \beta \; + \zeta ,0\} \right)\\ & {\mathcal{L}}_{5}(\theta)=\frac{1}{|\tilde{S}_0|}\sum\limits_{s\in \tilde{S}_0} \sum\limits_{t=1}^{T} \left( \mathop{\mathrm{max}}\{ \varmathbb{f}_t^{s} - 1+\frac{\alpha\beta}{\alpha-1} - (\frac{\alpha\beta}{\alpha-1}-h_\theta(s))\cdot \alpha^{-t},0 \}\right) \end{align}\] where the last term \({\mathcal{L}}_{5}(\theta)\) is to assure the tightness of bounds using the simulation-based method. Specifically, we execute the control system from each \(s\in \tilde{S}_0\), and calculate the safe frequency at each time step \(\varmathbb{f}_t^s, t=1,\cdots,T\) of all these trajectories.

0.0.0.4 The loss function of NBCs in .

We define the loss function as: \[\begin{align} &{\mathcal{L}}(\theta):=k_1 \cdot {\mathcal{L}}_{1}(\theta)+k_2 \cdot {\mathcal{L}}_{2}(\theta)+k_3 \cdot {\mathcal{L}}_{3}(\theta)+k_4 \cdot {\mathcal{L}}_{4}(\theta)+k_5 \cdot {\mathcal{L}}_{5}(\theta)\\ &{\mathcal{L}}_{1}(\theta)=\frac{1}{|\tilde{S}|} \sum\limits_{s\in \tilde{S}} \left(\mathop{\mathrm{max}}\{0 - h_\theta(s),0\} \right)\\ &{\mathcal{L}}_{2}(\theta)=\frac{1}{|\tilde{S}\setminus \tilde{S}_u|} \sum\limits_{s\in \tilde{S}\setminus \tilde{S}_u} \left(\mathop{\mathrm{max}}\{ h_\theta(s) - \beta,0\} \right)\\ &{\mathcal{L}}_{3}(\theta)=\frac{1}{| \tilde{S}_u|} \sum\limits_{s\in \tilde{S}_u} \left( \mathop{\mathrm{max}}\{\alpha - h_\theta(s) ,0\} + \mathop{\mathrm{max}}\{ h_\theta(s) -1 - \beta ,0\} \right)\\ &{\mathcal{L}}_{4}(\theta)=\frac{1}{|\tilde{S}\setminus \tilde{S}_u|} \sum\limits_{s\in \tilde{S}\setminus \tilde{S}_u} \left( \mathop{\mathrm{max}}\{ c - \sum\limits_{s'\in \mathcal{D}_s}\frac{h_\theta(s')}{N} + h_\theta(s) + \zeta ,0\} \right)\\ & {\mathcal{L}}_{5}(\theta)=\frac{1}{|\tilde{S}_0|} \sum\limits_{s\in \tilde{S}_0}\sum\limits_{t=1}^{T} \left( 1-h_\theta(s)-\frac{1}{2}c\cdot t+\beta - \varmathbb{f}_t^s, 0 \}\right) \end{align}\]

0.0.0.5 The loss function of NBCs in .

We define the loss function as: \[\begin{align} &{\mathcal{L}}(\theta):=k_1 \cdot {\mathcal{L}}_{1}(\theta)+k_2 \cdot {\mathcal{L}}_{2}(\theta)+k_3 \cdot {\mathcal{L}}_{3}(\theta)+k_4 \cdot {\mathcal{L}}_{4}(\theta)+k_5 \cdot {\mathcal{L}}_{5}(\theta)\\ &{\mathcal{L}}_{1}(\theta)=\frac{1}{|\tilde{S}\setminus \tilde{S}_u|} \sum\limits_{s\in \tilde{S}\setminus \tilde{S}_u} \left(\mathop{\mathrm{max}}\{0 - h_\theta(s),0\} \right)\\ &{\mathcal{L}}_{2}(\theta)=\frac{1}{|\tilde{S}_u|} \sum\limits_{s\in \tilde{S}_u} \left(\mathop{\mathrm{max}}\{K' - h_\theta(s),0\} + \mathop{\mathrm{max}}\{h_\theta(s) -K,0\} \right)\\ &{\mathcal{L}}_{3}(\theta)=\frac{1}{|\tilde{S}\setminus \tilde{S}_u|} \sum\limits_{s\in \tilde{S}\setminus \tilde{S}_u} \left( \mathop{\mathrm{max}}\{\sum\limits_{s'\in \mathcal{D}_s}\frac{h_\theta(s')}{N} - h_\theta(s) + \epsilon \; + \zeta ,0\} \right)\\ &{\mathcal{L}}_{4}(\theta)=\frac{1}{|\tilde{S}\setminus \tilde{S}_u|} \sum\limits_{s\in \tilde{S}\setminus \tilde{S}_u} \left( \mathop{\mathrm{max}}\{h_\theta(s') - h_\theta(s) - b + \zeta ,0\} + \mathop{\mathrm{max}}\{a - h_\theta(s') + h_\theta(s) + \zeta ,0\} \right)\\ & {\mathcal{L}}_{5}(\theta)=\frac{1}{|\tilde{S}_0|} \sum\limits_{s\in \tilde{S}_0}\sum\limits_{t=1}^{T} \left( \mathop{\mathrm{max}}\{exp(-\frac{2(\epsilon t- h_\theta(s))^2}{t\cdot (b-a)^2}) - \varmathbb{f}_t^s, 0 \}\right) \end{align}\]

-8 6 =8em NBCs Validation

Given an \(M_{\mu}\) and a function \(B:S\rightarrow{\mathbb{R}}\), we have \({\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le 0\) for any state \(s\in S \setminus S_u\) if the formula below \[\begin{align} {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))\mid \tilde{s}]\le B(\tilde{s})-\zeta \nonumber \end{align}\] holds for any state \(\tilde{s}\in \tilde{S} \setminus \tilde{S}_u\), where \(\zeta= \tau\cdot L_B\cdot (1+L_f\cdot (1+L_\pi))\) with \(L_f,L_\pi, L_B\) being the Lipschitz constants of \(f,\pi\) and \(B\), respectively.

Proof. Let \(L_f,L_\pi,L_B\) be the Lipschitz constants for the system dynamics \(f\), the trained policy \(\pi\) and the neural network function \(B\), respectively. Given a state \(s\in S \setminus S_u\), let \(\tilde{s}\) be such that \(||\tilde{s}-s||_1\le \tau\). By the Lipschitz continuities, we have that \[\begin{align} &{\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] \\ \le & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] +||f(\tilde{s},\pi(\tilde{s}+\delta))-f(s,\pi(s+\delta))||_1\cdot L_B \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] +||(\tilde{s},\pi(\tilde{s}+\delta))-(s,\pi(s+\delta))||_1\cdot L_B\cdot L_f \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + ||\tilde{s}-s||_1\cdot L_B\cdot L_f \cdot (1+L_\pi) \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) \end{align}\] and \[\begin{align} -B(s)\le -B(\tilde{s})+||\tilde{s}-s||_1\cdot L_B\le -B(\tilde{s})+\tau\cdot L_B, \end{align}\] Thus, we can derive that \[\begin{align} & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))]-B(s)\\ \le &{\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) -B(\tilde{s})+\tau\cdot L_B\\ \le &{\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) -B(\tilde{s})+\tau\cdot L_B\\ \le & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - B(\tilde{s}) +\zeta \\ \le & 0 \end{align}\] ◻

0.0.0.6 Validating NBCs in .

A candidate NBC in can be validated if it meets the conditions in  . For the first three conditions, we can check

\[\mathop{\mathrm{inf}}\limits_{s\in S} B(s)\ge 0 \; \land \; \mathop{\mathrm{sup}}\limits_{s\in S} B(s)\le 1 \; \land \;\mathop{\mathrm{inf}}\limits_{s\in S_0} B(s)\ge \epsilon \; \land \;\mathop{\mathrm{sup}}\limits_{s\in S_u} B(s) \le \epsilon'\] using the interval bound propagation approach  [44], [45]. Similarly, for the condition in ,   reduces validation from infinite states to finite ones, which is easier to check.

Theorem 17. Given an \(M_{\mu}\) and a function \(B:S\rightarrow{\mathbb{R}}\), we have B(s)-\(\gamma\cdot{\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]\le 0\; \text{for all }s\in S \setminus S_u\) if the formula below \[\begin{align} B(\tilde{s}) - \gamma \cdot {\mathbb{E}}_{\delta\sim \mu }[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \zeta \le \; 0 \end{align}\] holds for any state \(\tilde{s}\in \tilde{S} \setminus \tilde{S}_u\), where \(\zeta=\tau\cdot L_B \cdot ( \gamma \cdot L_f \cdot (1+L_\pi) + 1)\) with \(L_f,L_\pi, L_B\) being the Lipschitz constants of \(f,\pi\) and \(B\), respectively.

Proof. By the Lipschitz continuities and \(\gamma\in (0,1)\), we have that \[\begin{align} & \gamma \cdot {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] \\ \ge & \gamma \cdot \{{\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] -||f(\tilde{s},\pi(\tilde{s}+\delta))-f(s,\pi(s+\delta))||_1\cdot L_B\} \\ \ge& \gamma \cdot \{{\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] -||(\tilde{s},\pi(\tilde{s}+\delta))-(s,\pi(s+\delta))||_1\cdot L_B\cdot L_f \} \\ \ge& \gamma \cdot \{{\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - ||\tilde{s}-s||_1\cdot L_B\cdot L_f \cdot (1+L_\pi) \}\\ \ge& \gamma \cdot \{{\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - \tau\cdot L_B\cdot L_f \cdot (1+L_\pi)\} \end{align}\] and \[\begin{align} -B(s)\ge -B(\tilde{s})-||s-\tilde{s}||_1\cdot L_B\ge -B(\tilde{s})-\tau\cdot L_B \end{align}\] Thus, we can derive that \[\begin{align} & \gamma \cdot {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] -B(s) \\ \ge \; & \gamma \cdot \{{\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - \tau\cdot L_B\cdot L_f \cdot (1+L_\pi)\} -B(\tilde{s})-\tau\cdot L_B \\ = \;& \gamma \cdot {\mathbb{E}}_{\delta\sim \mu }[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - B(\tilde{s}) - \gamma \cdot \tau \cdot L_B\cdot L_f \cdot (1+L_\pi)-\tau\cdot L_B \end{align}\] Finally, we have: \[\begin{align} & B(s) - \gamma \cdot {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] \\ \le \; & B(\tilde{s}) - \gamma \cdot {\mathbb{E}}_{\delta\sim \mu }[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \gamma \cdot \tau \cdot L_B\cdot L_f \cdot (1+L_\pi) + \tau\cdot L_B \\ = \;& B(\tilde{s}) - \gamma \cdot {\mathbb{E}}_{\delta\sim \mu }[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \zeta \\ \le \; & 0 \end{align}\] ◻

0.0.0.7 Validating NBCs in .

A candidate NBC can be validated if it meets the conditions in  . For the first three conditions, we can check

\[\mathop{\mathrm{inf}}\limits_{s\in S} B(s)\ge 0 \; \land \;\mathop{\mathrm{sup}}\limits_{s\in S_0} B(s)\le \epsilon \; \land \;\mathop{\mathrm{inf}}\limits_{s\in S_u} B(s) \ge \lambda\] using the interval bound propagation approach  [44], [45]. For the condition in ,   reduces validation from infinite states to finite ones, which is easier to check.

Theorem 18. Given an \(M_{\mu}\) and a function \(B:S\rightarrow{\mathbb{R}}\), we have \({\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le c\; \text{for all }s\in S\) if the formula below \[\begin{align} {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))\mid \tilde{s}]-B(\tilde{s}) + \zeta \le c \end{align}\] holds for any state \(\tilde{s}\in \tilde{S}\), where \(\zeta= \tau\cdot L_B\cdot (1+L_f\cdot (1+L_\pi))\) with \(L_f,L_\pi, L_B\) being the Lipschitz constants of \(f,\pi\) and \(B\), respectively.

Proof. Given a state \(s\in S\), let \(\tilde{s}\) be such that \(||\tilde{s}-s||_1\le \tau\). By the Lipschitz continuities, we have that \[\begin{align} &{\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] \\ \le & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] +||f(\tilde{s},\pi(\tilde{s}+\delta))-f(s,\pi(s+\delta))||_1\cdot L_B \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] +||(\tilde{s},\pi(\tilde{s}+\delta))-(s,\pi(s+\delta))||_1\cdot L_B\cdot L_f \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + ||\tilde{s}-s||_1\cdot L_B\cdot L_f \cdot (1+L_\pi) \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) \end{align}\] and \[\begin{align} -B(s)\le -B(\tilde{s})+||\tilde{s}-s||_1\cdot L_B\le -B(\tilde{s})+\tau\cdot L_B, \end{align}\] Thus, we can derive that \[\begin{align} & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))]-B(s) -c\\ \le &{\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] -B(\tilde{s}) + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) +\tau\cdot L_B -c\\ \le & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - B(\tilde{s}) +\zeta -c \\ \le & 0 \end{align}\] ◻

0.0.0.8 Validating NBCs in .

A candidate NBC can be validated if it meets the conditions in  . For the first three conditions, we can check

\[\mathop{\mathrm{inf}}\limits_{s\in S } B(s)\ge 0 \; \land \;\mathop{\mathrm{sup}}\limits_{s\in S_0} B(s)\le \gamma \; \land \;\mathop{\mathrm{inf}}\limits_{s\in S_u} B(s) \ge 1\] using the interval bound propagation approach  [44], [45]. For the condition in ,   reduces validation from infinite states to finite ones, which is easier to check.

Theorem 19. Given an \(M_{\mu}\) and a function \(B:S\rightarrow{\mathbb{R}}\), we have \(\alpha \cdot {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s) \le \alpha \beta\) if the formula below \[\begin{align} \alpha \cdot {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))\mid \tilde{s}]-B(\tilde{s})\le \alpha \beta \end{align}\] holds for any state \(\tilde{s}\in \tilde{S}\setminus \tilde{S}_u\), where \(\zeta= \alpha \cdot \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) + \tau\cdot L_B\) with \(L_f,L_\pi, L_B\) being the Lipschitz constants of \(f,\pi\) and \(B\), respectively.

Proof. Given a state \(s\in S\), let \(\tilde{s}\) be such that \(||\tilde{s}-s||_1\le \tau\). By the Lipschitz continuities, we have that \[\begin{align} &{\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] \\ \le & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] +||f(\tilde{s},\pi(\tilde{s}+\delta))-f(s,\pi(s+\delta))||_1\cdot L_B \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] +||(\tilde{s},\pi(\tilde{s}+\delta))-(s,\pi(s+\delta))||_1\cdot L_B\cdot L_f \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + ||\tilde{s}-s||_1\cdot L_B\cdot L_f \cdot (1+L_\pi) \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) \end{align}\] and \[\begin{align} -B(s)\le -B(\tilde{s})+||\tilde{s}-s||_1\cdot L_B\le -B(\tilde{s})+\tau\cdot L_B, \end{align}\] Thus, we can derive that \[\begin{align} &\alpha \cdot {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))]-B(s) - \alpha \beta \\ \le & \alpha \cdot {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] -B(\tilde{s}) - \alpha \beta + \alpha \cdot \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) + \tau\cdot L_B \\ \le & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - B(\tilde{s}) - \alpha \beta +\zeta \\ \le & 0 \end{align}\] ◻

0.0.0.9 Validating NBCs in .

A candidate NBC can be validated if it meets the conditions in  . For the first two conditions, we can check

\[\mathop{\mathrm{inf}}\limits_{s\in S} B(s)\ge 0 \; \land \;\mathop{\mathrm{sup}}\limits_{s\in S \setminus S_u } B(s) \le \beta\] using the interval bound propagation approach  [44], [45]. The condition in is satisfied straightforwardly, Because candidate NBCs are neural networks that are Lipschitz continuous [67]. For the condition in ,   reduces validation from infinite states to finite ones, which is easier to check.

Theorem 20. Given an \(M_{\mu}\) and a function \(B:S\rightarrow{\mathbb{R}}\), we have \({\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s) \ge c\) if the formula below \[\begin{align} {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))\mid \tilde{s}]-B(\tilde{s}) \ge c + \zeta \end{align}\] holds for any state \(\tilde{s}\in \tilde{S}\setminus \tilde{S}_u\), where \(\zeta= \tau \cdot L_B\cdot L_f \cdot (1+L_\pi) + \tau\cdot L_B\) with \(L_f,L_\pi, L_B\) being the Lipschitz constants of \(f,\pi\) and \(B\), respectively.

Proof. By the Lipschitz continuities and \(\gamma\in (0,1)\), we have that \[\begin{align} & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] \\ \ge & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] -||f(\tilde{s},\pi(\tilde{s}+\delta))-f(s,\pi(s+\delta))||_1\cdot L_B \\ \ge& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] -||(\tilde{s},\pi(\tilde{s}+\delta))-(s,\pi(s+\delta))||_1\cdot L_B\cdot L_f \\ \ge& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - ||\tilde{s}-s||_1\cdot L_B\cdot L_f \cdot (1+L_\pi)\\ \ge& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) \end{align}\] and \[\begin{align} -B(s)\ge -B(\tilde{s})-||s-\tilde{s}||_1\cdot L_B\ge -B(\tilde{s})-\tau\cdot L_B \end{align}\] Thus, we can derive that \[\begin{align} & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] -B(s) -c \\ \ge \; & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) -B(\tilde{s}) -\tau\cdot L_B -c\\ = \;& {\mathbb{E}}_{\delta\sim \mu }[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - B(\tilde{s}) - \tau \cdot L_B\cdot L_f \cdot (1+L_\pi)-\tau\cdot L_B -c \end{align}\] Finally, we have: \[\begin{align} & B(s) - {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] + c \\ \le \; & B(\tilde{s}) - {\mathbb{E}}_{\delta\sim \mu }[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \tau \cdot L_B\cdot L_f \cdot (1+L_\pi) + \tau\cdot L_B +c \\ = \;& B(\tilde{s}) - {\mathbb{E}}_{\delta\sim \mu }[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + c + \zeta \\ \le \; & 0 \end{align}\] ◻

0.0.0.10 Validating NBCs in .

A candidate NBC can be validated if it meets the conditions in  . For the first two conditions, we can check

\[\mathop{\mathrm{inf}}\limits_{s\in S \setminus S_u} B(s)\ge 0 \; \land \;\mathop{\mathrm{inf}}\limits_{s\in S_u} B(s)\ge K' \; \land \;\mathop{\mathrm{sup}}\limits_{s\in S_u} B(s) \le K\] using the interval bound propagation approach  [44], [45]. is satisfied straightforwardly, Because candidate NBCs are neural networks that are Lipschitz continuous [67]. For the condition in ,   reduces validation from infinite states to finite ones, which is easier to check.

Theorem 21. Given an \(M_{\mu}\) and a function \(B:S\rightarrow{\mathbb{R}}\), we have \({\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))\mid s]-B(s)\le -\epsilon\) if the formula below \[\begin{align} {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))\mid \tilde{s}]-B(\tilde{s})\le -\epsilon \end{align}\] holds for any state \(\tilde{s}\in \tilde{S}\setminus \tilde{S}_u\), where \(\zeta= \tau\cdot L_B\cdot (1+L_f\cdot (1+L_\pi))\) with \(L_f,L_\pi, L_B\) being the Lipschitz constants of \(f,\pi\) and \(B\), respectively.

Proof. Given a state \(s\in S\), let \(\tilde{s}\) be such that \(||\tilde{s}-s||_1\le \tau\). By the Lipschitz continuities, we have that \[\begin{align} &{\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))] \\ \le & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] +||f(\tilde{s},\pi(\tilde{s}+\delta))-f(s,\pi(s+\delta))||_1\cdot L_B \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] +||(\tilde{s},\pi(\tilde{s}+\delta))-(s,\pi(s+\delta))||_1\cdot L_B\cdot L_f \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + ||\tilde{s}-s||_1\cdot L_B\cdot L_f \cdot (1+L_\pi) \\ \le& {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) \end{align}\] and \[\begin{align} -B(s)\le -B(\tilde{s})+||\tilde{s}-s||_1\cdot L_B\le -B(\tilde{s})+\tau\cdot L_B, \end{align}\] Thus, we can derive that \[\begin{align} & {\mathbb{E}}_{\delta\sim \mu}[B(f(s,\pi(s+\delta)))]-B(s) +\epsilon \\ \le &{\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] -B(\tilde{s}) + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) +\tau\cdot L_B + \epsilon \\ \le & {\mathbb{E}}_{\delta\sim \mu}[B(f(\tilde{s},\pi(\tilde{s}+\delta)))] - B(\tilde{s}) +\zeta + \epsilon \\ \le & 0 \end{align}\] ◻

0.0.0.11 Validating NBCs in .

A candidate NBC can be validated if it meets the conditions in  . For the first two conditions, we can check

\[\mathop{\mathrm{sup}}\limits_{s\in S_0} B(s)\le 0 \; \land \;\mathop{\mathrm{inf}}\limits_{s\in S_u} B(s) > 0\] using the interval bound propagation approach  [44], [45]. For the condition in ,   reduces validation from infinite states to finite ones, which is easier to check.

Theorem 22. Given an \(M_{\mu}\) and a function \(B:S\rightarrow{\mathbb{R}}\), we have \(\textstyle\bigwedge_{0\le i<k} (B(g_{\pi,f}^{i}(s,\Delta^i))\le 0)\Longrightarrow B(g_{\pi,f}^k(s,\Delta^k))\le 0 \; \forall (s,\Delta^i)\in S\times W^i\) if the formula below \[\begin{align} B(f(\tilde{s},\pi(\tilde{s}+\delta)))) -B(\tilde{s}) + \zeta \le 0 \end{align}\] holds for any state \(\tilde{s}\in \tilde{S}\), where \(\zeta= \tau\cdot L_B\cdot (1+L_f\cdot (1+L_\pi))\) with \(L_f,L_\pi, L_B\) being the Lipschitz constants of \(f,\pi\) and \(B\), respectively.

Proof. Given a state \(s\in S\), let \(\tilde{s}\) be such that \(||\tilde{s}-s||_1\le \tau\). By the Lipschitz continuities, we have that \[\begin{align} &B(f(s,\pi(s+\delta))) \\ \le & B(f(\tilde{s},\pi(\tilde{s}+\delta))) +||f(\tilde{s},\pi(\tilde{s}+\delta))-f(s,\pi(s+\delta))||_1\cdot L_B \\ \le& B(f(\tilde{s},\pi(\tilde{s}+\delta))) +||(\tilde{s},\pi(\tilde{s}+\delta))-(s,\pi(s+\delta))||_1\cdot L_B\cdot L_f \\ \le& B(f(\tilde{s},\pi(\tilde{s}+\delta))) + ||\tilde{s}-s||_1\cdot L_B\cdot L_f \cdot (1+L_\pi) \\ \le& B(f(\tilde{s},\pi(\tilde{s}+\delta))) + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) \end{align}\] and \[\begin{align} -B(s)\le -B(\tilde{s})+||\tilde{s}-s||_1\cdot L_B\le -B(\tilde{s})+\tau\cdot L_B, \end{align}\] Thus, we can derive that \[\begin{align} & B(f(s,\pi(s+\delta)))-B(s)\\ \le &B(f(\tilde{s},\pi(\tilde{s}+\delta))) -B(\tilde{s}) + \tau\cdot L_B\cdot L_f \cdot (1+L_\pi) +\tau\cdot L_B \\ \le & B(f(\tilde{s},\pi(\tilde{s}+\delta))) - B(\tilde{s}) +\zeta \\ \le & 0 \end{align}\] ◻

-8 6 =8em Implementation Details and Additional Experimental Results

-8 6 =8em Benchmarks and Experimental Settings

To demonstrate the generality of our approach, we train systems with different activation functions and network structures of the planted NNs, using different DRL algorithms such as DQN [68] and DDPG [69]. gives the details of training settings.

13pt

Table 3: Experimental settings.
Task Dim. Alg. A.F. Size A.T. S.P.
CP 4 DQN ReLU \(3\times200\) Dis. Gym
PD 3 DDPG Sigmoid \(2\times200\) Cont. Gym
Tora 4 DDPG Sigmoid \(2\times200\) Cont. Verisig
B1 2 DDPG ReLU \(2\times100\) Cont. Verisig

Remarks. Dim.: dimension; Alg.: DRL algorithm; A.F.: activation function; A.T.: action type; S.P.: sources of problems; Dis.: discrete; Cont.: continuous.

  1. CartPole (CP). A pole is attached by an un-actuated joint to a cart. The goal of training is to learn a controller that prevents the pole from falling over by applying a force of \(+1\) or \(-1\) to the cart.

  2. Pendulum (PD). A pendulum that can rotate around an endpoint is delineated. Starting from a random position, the pendulum shall swing up and stay upright.

  3. Tora. A cart is attached to a wall with a spring. It is free to move on a frictionless surface. Inside the cart, there is an arm free to rotate about an axis. The controller’s goal is to stabilize the system at the equilibrium state where all the variables are 0.

  4. B1. A classic nonlinear systems, where agents in both systems aim to avoid dangerous areas from the preset initial state space.

2.5pt

Table 4: Synthesis time of different NBCs (in seconds).
Task L.B.i. \(k\)-L.B.i. U.B.i. \(k\)-U.B.i. \(l\)-L.B.f. \(e\)-L.B.f. \(l\)-U.B.f. \(e\)-U.B.f. Quali. \(k\)-Quali.
CP 2318.5 1876.0 2891.9 2275.3 3127.5 3359.0 3277.8 3509.1 1755.2 1059.3
PD 1941.6 1524.0 2282.7 1491.5 2015.5 2218.3 2076.8 2272.2 1433.7 811.9
Tora 280.3 218.5 895.1 650.7 396.4 429.6 340.6 501.7 623.0 415.8
B1 587.4 313.6 1127.3 840.1 775.9 824.7 894.2 1184.6 1022.3 566.7

-8 6 =8em Efficiency of Training and Validating NBCs.

shows synthesis time of different NBCs in seconds, i.e., L.B.i.(NBCs for lower bounds on infinite-time safety), \(k\)-L.B.i.(\(k\)-inductive lower bounds on infinite-time safety), U.B.i.(NBCs for upper bounds on infinite-time safety), \(k\)-U.B.i.(\(k\)-inductive upper bounds on infinite-time safety), \(l\)-L.B.f.(NBCs for linear lower bounds on finite-time safety), \(e\)-L.B.f.(NBCs for exponential lower bounds on finite-time safety), \(l\)-U.B.f.(NBCs for linear upper bounds on finite-time safety), \(e\)-U.B.f.(NBCs for exponential upper bounds on finite-time safety), Quali.(NBCs for almost-surely safety), and \(k\)-Quali.(NBCs for \(k\)-inductive almost-surely safety).

In general, high-dimensional systems e.g., CP (4-dimensional state space) cost more time than low-dimensional ones, e.g, B1 (2-dimensional state space). That is because the validation step suffers from the curse of high-dimensionality [70].

References↩︎

[1]
Lillicrap, T., Hunt, J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. CoRR abs/1509.02971(2015).
[2]
Seshia, S.A., Sadigh, D., Sastry, S.S.: Toward verified artificial intelligence. Communications of the ACM 65(7), 46–55 (2022).
[3]
Zhang, H., Chen, H., Xiao, C., Li, B., Liu, M., Boning, D.S., Hsieh, C.: Robust deep reinforcement learning against adversarial perturbations on state observations. In: NeurIPS’20. pp. 21024–21037 (2020).
[4]
Wan, X., Zeng, L., Sun, M.: Exploring the vulnerability of deep reinforcement learning-based emergency control for low carbon power systems. In: IJCAI’22. pp. 3954–3961 (2022).
[5]
Zhang, H., Chen, H., Boning, D.S., Hsieh, C.: Robust reinforcement learning on state observations with learned optimal adversary. In: ICLR’21 (2021).
[6]
Kwiatkowska, M., Norman, G., Parker, D.: Probabilistic model checking and autonomy. Annu. Rev. Control Robot. Auton. Syst. 5, 385–410 (2022).
[7]
Calinescu, R., Ghezzi, C., Kwiatkowska, M., Mirandola, R.: Self-adaptive software needs quantitative verification at runtime. Communications of the ACM 55(9), 69–77 (2012).
[8]
Hamers, R., Jongmans, S.: Discourje: Runtime verification of communication protocols in clojure. In: TACAS’20. pp. 266–284 (2020).
[9]
Winkler, T., Gehnen, C., Katoen, J.: Model checking temporal properties of recursive probabilistic programs. In: FOSSACS’22. pp. 449–469. Springer (2022).
[10]
Hahn, E.M., Hartmanns, A., Hensel, C., Klauck, M., Klein, J., Křetı́nsk, J., Parker, D., Quatmann, T., Ruijters, E., Steinmetz, M.: The 2019 comparison of tools for the analysis of quantitative formal models: (QComp 2019 competition report). In: TACAS’19. pp. 69–92. Springer (2019).
[11]
Kwiatkowska, M., Norman, G., Parker, D.: Prism 4.0: Verification of probabilistic real-time systems. In: CAV’11. pp. 585–591. Springer (2011).
[12]
Hensel, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: The probabilistic model checker storm. International Journal on Software Tools for Technology Transfer pp. 1–22 (2021).
[13]
Samek, W., Montavon, G., Lapuschkin, S., et al.: Explaining deep neural networks and beyond: A review of methods and applications. Proc. of the IEEE 109(3), 247–278 (2021).
[14]
Bacci, E., Parker, D.: Probabilistic guarantees for safe deep reinforcement learning. In: FORMATS’20. pp. 231–248. Springer (2020).
[15]
Jin, P., Tian, J., Zhi, D., et al.: Trainify: A CEGAR-driven training and verification framework for safe deep reinforcement learning. In: CAV’22. pp. 193–218. Springer (2022).
[16]
Tschaikowski, M., Tribastone, M.: Tackling continuous state-space explosion in a markovian process algebra. Theoretical Computer Science 517, 1–33 (2014).
[17]
Prajna, S., Jadbabaie, A., Pappas, G.J.: A framework for worst-case and stochastic safety verification using barrier certificates. IEEE Trans. Automat. Contr. 52(8), 1415–1428 (2007).
[18]
Lavaei, A., Soudjani, S., Frazzoli, E.: Safety barrier certificates for stochastic hybrid systems. In: ACC’22. pp. 880–885. IEEE (2022).
[19]
Zhao, H., Zeng, X., Chen, T., Liu, Z.: Synthesizing barrier certificates using neural networks. In: HSCC’20. pp. 1–11 (2020).
[20]
Mathiesen, F.B., Calvert, S.C., Laurenti, L.: Safety certification for stochastic systems via neural barrier functions. IEEE Control Syst. Lett. 7, 973–978 (2022).
[21]
Xia, J., Hu, M., Chen, X., Chen, M.: Accelerated synthesis of neural network-based barrier certificates using collaborative learning. In: Proceedings of the 59th ACM/IEEE Design Automation Conference. pp. 1201–1206 (2022).
[22]
Meng, Y., Qin, Z., Fan, C.: Reactive and safe road user simulations using neural barrier certificates. In: IROS’21. pp. 6299–6306. IEEE (2021).
[23]
Abate, A., Ahmed, D., Edwards, A., Giacobbe, M., Peruffo, A.: FOSSIL: a software tool for the formal synthesis of lyapunov functions and barrier certificates using neural networks. In: HSCC’21. pp. 24:1–24:11. ACM(2021).
[24]
Dawson, C., Gao, S., Fan, C.: Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods for robotics and control. IEEE Trans. Robot. 39, 1749 – 1767 (2023).
[25]
Anand, M., Murali, V., Trivedi, A., Zamani, M.: k-inductive barrier certificates for stochastic systems. In: HSCC’22. pp. 12:1–12:11. ACM(2022).
[26]
Brain, M., Joshi, S., Kroening, D., Schrammel, P.: Safety verification and refutation by k-invariants and k-induction. In: SAS’15. pp. 145–161. Springer (2015).
[27]
Zhang, H., Gu, J., Zhang, Z., Du, L., et al.: Backdoor attacks against deep reinforcement learning based traffic signal control systems. Peer Peer Netw. Appl. 16(1), 466–474 (2023).
[28]
Zikelic, D., Lechner, M., Henzinger, T.A., Chatterjee, K.: Learning control policies for stochastic systems with reach-avoid guarantees. In: AAAI’23. pp. 11926–11935 (2023).
[29]
Prajna, S., Jadbabaie, A.: Safety verification of hybrid systems using barrier certificates. In: Alur, R., Pappas, G.J. (eds.) HSCC’04. pp. 477–492. Springer (2004).
[30]
Prajna, S., Rantzer, A.: On the necessity of barrier certificates. IFAC Proceedings Volumes 38(1), 526–531 (2005).
[31]
Peruffo, A., Ahmed, D., Abate, A.: Automated and formal synthesis of neural barrier certificates for dynamical models. In: TACAS’21. pp. 370–388. Springer (2021).
[32]
Zhao, H., Qi, N., Dehbi, L., Zeng, X., Yang, Z.: Formal synthesis of neural barrier certificates for continuous systems via counterexample guided learning. ACM Trans. Embed. Comput. Syst. 22(5s), 146:1–146:21 (2023).
[33]
Abate, A., David, C., Kesseli, P., Kroening, D., Polgreen, E.: Counterexample guided inductive synthesis modulo theories. In: CAV’18. pp. 270–288. Springer (2018).
[34]
Ville, J.: Etude critique de la notion de collectif (1939).
[35]
Urabe, N., Hara, M., Hasuo, I.: Categorical liveness checking by corecursive algebras. In: LICS’17. pp. 1–12. IEEE(2017).
[36]
Williams, D.: Probability with martingales. Cambridge university press (1991).
[37]
Takisaka, T., Oyabu, Y., Urabe, N., Hasuo, I.: Ranking and repulsing supermartingales for reachability in probabilistic programs. In: ATVA’18. pp. 476–493. Springer (2018).
[38]
Steinhardt, J., Tedrake, R.: Finite-time regional verification of stochastic non-linear systems. Int. J. Robotics Res. 31(7), 901–923 (2012).
[39]
Gronwall, T.H.: Note on the derivatives with respect to a parameter of the solutions of a system of differential equations. Annals of Mathematics pp. 292–296 (1919).
[40]
Xue, B.: A new framework for bounding reachability probabilities of continuous-time stochastic systems. CoRR abs/2312.15843(2023).
[41]
Chatterjee, K., Fu, H., Novotný, P., Hasheminezhad, R.: Algorithmic analysis of qualitative and quantitative termination problems for affine probabilistic programs. In: POPL’16. pp. 327–342. ACM(2016).
[42]
Hoeffding, W.: Probability inequalities for sums of bounded random variables. The collected works of Wassily Hoeffding pp. 409–426 (1994).
[43]
Donaldson, A.F., Haller, L., Kroening, D., Rümmer, P.: Software verification using k-induction. In: SAS’11. pp. 351–368. Springer (2011).
[44]
Gowal, S., Dvijotham, K., Stanforth, R., Bunel, R., Qin, C., Uesato, J., Arandjelovic, R., Mann, T.A., Kohli, P.: On the effectiveness of interval bound propagation for training verifiably robust models. CoRR abs/1810.12715(2018).
[45]
Xu, K., Shi, Z., Zhang, H., Wang, Y., Chang, K., Huang, M., et al.: Automatic perturbation analysis for scalable certified robustness and beyond. In: NeurIPS’20 (2020).
[46]
Lechner, M., Zikelic, D., Chatterjee, K., Henzinger, T.A.: Stability verification in stochastic control systems via neural network supermartingales. In: AAAI’22. pp. 7326–7336 (2022).
[47]
Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI Gym(2016), arXiv:1606.01540.
[48]
Ivanov, R., Carpenter, T., Weimer, J., Alur, R., Pappas, G., Lee, I.: Verisig 2.0: Verification of neural network controllers using taylor model preconditioning. In: CAV’21. pp. 249–262. Springer (2021).
[49]
Salamati, A., Lavaei, A., Soudjani, S., Zamani, M.: Data-driven safety verification of stochastic systems via barrier certificates. In: ADHS’21. pp. 7–12. Elsevier (2021).
[50]
Feng, S., Chen, M., Xue, B., Sankaranarayanan, S., Zhan, N.: Unbounded-time safety verification of stochastic differential dynamics. In: CAV’20. pp. 327–348. Springer (2020).
[51]
Xue, B., Li, R., Zhan, N., Fränzle, M.: Reach-avoid analysis for stochastic discrete-time systems. In: ACC’21. pp. 4879–4885. IEEE(2021).
[52]
Xue, B., Zhan, N., Fränzle, M.: Reach-avoid analysis for polynomial stochastic differential equations. IEEE Trans. Autom. Control. (2023).
[53]
Xue, B., Fränzle, M., Zhan, N.: Inner-approximating reachable sets for polynomial systems with time-varying uncertainties. IEEE Trans. Autom. Control. 65(4), 1468–1483 (2020).
[54]
Feng, S., Chen, M., Su, H., Kaminski, B.L., Katoen, J., Zhan, N.: Lower bounds for possibly divergent probabilistic programs. Proc. ACM Program. Lang. 7(OOPSLA1), 696–726 (2023).
[55]
Chakarov, A., Sankaranarayanan, S.: Probabilistic program analysis with martingales. In: CAV’13. pp. 511–526. Springer (2013).
[56]
Asadi, A., Chatterjee, K., Fu, H., Goharshady, A.K., Mahdavi, M.: Polynomial reachability witnesses via stellensätze. In: PLDI’21. pp. 772–787. ACM(2021).
[57]
Takisaka, T., Oyabu, Y., Urabe, N., Hasuo, I.: Ranking and repulsing supermartingales for reachability in randomized programs. ACM Trans. Prog. Lang. Syst. 43(2), 5:1–5:46 (2021).
[58]
Bacci, E., Parker, D.: Verified probabilistic policies for deep reinforcement learning. In: NFM’22. pp. 193–212. Springer (2022).
[59]
Carr, S., Jansen, N., Topcu, U.: Task-aware verifiable RNN-based policies for partially observable markov decision processes. Artif. Intell. Res. 72, 819–847 (2021).
[60]
Amir, G., Schapira, M., Katz, G.: Towards scalable verification of deep reinforcement learning. In: FMCAD’21. pp. 193–203. IEEE (2021).
[61]
Bacci, E., Giacobbe, M., Parker, D.: Verifying reinforcement learning up to infinity. In: IJCAI’21. pp. 2154–2160 (2021).
[62]
Huang, C., Fan, J., Chen, X., Li, W., Zhu, Q.: Polar: A polynomial arithmetic framework for verifying neural-network controlled systems. In: ATVA’22. pp. 414–430. Springer (2022).
[63]
Yang, Z., Zhang, Y., Lin, W., Zeng, X., Tang, X., Zeng, Z., Liu, Z.: An iterative scheme of safe reinforcement learning for nonlinear systems via barrier certificate generation. In: CAV’21. pp. 467–490. Springer (2021).
[64]
Deshmukh, J., Kapinski, J., Yamaguchi, T., Prokhorov, D.: Learning deep neural network controllers for dynamical systems with safety guarantees. In: ICCAD’19. pp. 1–7. IEEE (2019).
[65]
Sha, M., Chen, X., Ji, Y., Zhao, Q., Yang, Z., Lin, W., Tang, E., Chen, Q., Li, X.: Synthesizing barrier certificates of neural network controlled continuous systems via approximations. In: DAC’21. pp. 631–636. IEEE (2021).
[66]
Zeng, X., Yang, Z., Zhang, L., Tang, X., Zeng, Z., Liu, Z.: Safety verification of nonlinear systems with bayesian neural network controllers. In: AAAI’23. pp. 15278–15286 (2023).
[67]
Ruan, W., Huang, X., Kwiatkowska, M.: Reachability analysis of deep neural networks with provable guarantees. In: IJCAI’18. pp. 2651–2659 (2018).
[68]
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.A.: Playing atari with deep reinforcement learning. CoRR abs/1312.5602(2013).
[69]
Lillicrap, T., Hunt, J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. In: ICLR (2016).
[70]
Berkenkamp, F., Turchetta, M., Schoellig, A.P., Krause, A.: Safe model-based reinforcement learning with stability guarantees. In: NeurIPS. pp. 908–918 (2017).