April 02, 2024
Deep neural networks (DNNs) are increasingly deployed in safety-critical domains, but their vulnerability to adversarial attacks poses serious safety risks. Existing neuron-level methods using limited data lack efficacy in fixing adversaries due to the inherent complexity of adversarial attack mechanisms, while adversarial training, leveraging a large number of adversarial samples to enhance robustness, lacks provability. In this paper, we propose AdvRepair, a novel approach for provable repair of adversarial attacks using limited data. By utilizing formal verification, AdvRepair constructs patch modules that, when integrated with the original network, deliver provable and specialized repairs within the robustness neighborhood. Additionally, our approach incorporates a heuristic mechanism for assigning patch modules, allowing this defense against adversarial attacks to generalize to other inputs. AdvRepair demonstrates superior efficiency, scalability and repair success rate. Different from existing DNN repair methods, our repair can generalize to general inputs, thereby improving the robustness of the neural network globally, which indicates a significant breakthrough in the generalization capability of AdvRepair.
In the last few years, deep neural networks (DNNs) have made significant advancements in various domains, including computer vision [1], natural language processing [2], and speech recognition [3]. However, the adoption of DNNs in safety-critical domains has been slow due to concerns regarding their dependability. One major concern is the vulnerability of DNNs to adversarial attacks [4], [5], where adversaries can manipulate input data in a way that is imperceptible to humans but can cause the model to make incorrect decisions. This vulnerability poses serious safety risks in applications such as autonomous vehicles [6] and medical diagnosis [7]. Therefore, it is crucial to develop and deploy DNN models with safety and reliability requirements in mind to fully realize their potential in safety-critical domains.
The repair of DNNs involves rectifying errors in the network by modifying its architecture or parameters. These errors can arise from various sources, including training errors [8], adversarial attacks [9], [10], backdoor attacks [11], and distribution drift [12], [13]. Regular and efficient repair during DNN deployment is crucial to mitigate the risks associated with these errors and ensure the performance and reliability of the network in practical applications. Traditional machine learning approaches for DNN repair include techniques such as adversarial training [14]–[16], input sanitization, fine-tuning, transfer learning [17], [18], and data augmentation [19]–[21]. These methods rely on a large amount of data and may exhibit suboptimal performance in situations with limited or low-quality data. Neuron-level fault localization and repair methods [22] aim to improve the accuracy and reliability of DNN repair by identifying and correcting errors at the individual neuron level. By leveraging these neuron-level techniques, researchers and practitioners can achieve more precise and reliable DNN repair with limited data.
At present, the efficacy of neuron-level repair methods utilizing only a limited number of samples for robust defense against adversarial attacks remains limited. These methods demonstrate effectiveness primarily in the context of backdoor attacks or individual safety properties. This limitation arises from the inherent complexity of adversarial attack mechanisms, which often exhibit a higher degree of intricacy compared to backdoor attacks. Moreover, diverse adversarial attacks targeting distinct samples correspond to a unique dense mixture of features [23], rendering it arduous to generalize parameter adjustments at the neuron level based on a small sample set. Furthermore, while techniques such as adversarial training leverage a large number of adversarial samples to bolster the robustness of neural networks, their efficacy lacks provability. Consequently, neural networks repaired using such techniques continue to harbor latent vulnerabilities in terms of robustness.
Our proposed methodology aims to address the issue of adversarial attacks by leveraging a limited set of adversarial samples. Specifically, our approach focuses on rectifying these erroneous samples while simultaneously maximizing accuracy and ensuring local robustness on these samples. Moreover, we aim to achieve this local robustness in a manner that generalizes across the entire dataset. To achieve these objectives, our method integrates an external indicator, a hard-coded characteristic function, which facilitates the allocation of patch modules in conjunction with the original network. This indicator identifies inputs within the local neighborhood of a specific sample and assigns the appropriate patch modules for repair. Additionally, for inputs outside the local neighborhood, a heuristic allocation mechanism is employed to enable generalization across the entire dataset.
Each patch module constitutes a fully connected neural network structure. The core concept of our approach lies in the utilization of reachability analysis through a linear approximation for the provable training of patch modules. Specifically, we employ the verification method DeepPoly [24] to establish a linear approximation of the output neurons. This approximation is leveraged to determine the distance between the target behavior and the current behavior, serving as the loss function. The patch modules are then trained to minimize this distance. Once the loss function reaches zero, the patch modules provide provable repairs for adversarial attacks within the perturbation region. By employing an external indicator to identify faulty inputs, adversarial attacks within the same perturbation region can be repaired using the same patch module. Inputs outside the perturbation range of limited adversarial samples are assigned by the indicator, leveraging a heuristic allocation mechanism, thereby ensuring the generalization of the repair. Through the comprehensive framework of our method, we effectively enhance the resilience of the network against adversarial attacks while preserving accuracy. By leveraging reachability analysis and patch modules, we provide provable repairs for adversarial samples, thereby improving the robustness of the network.
The contributions of this paper can be summarized as follows:
We propose a novel approach, named AdvRepair, specifically designed for provable repair of adversarial attacks using limited data. This work represents a significant breakthrough in achieving provable repair for adversarial robustness. Our method introduces patch modules that are seamlessly integrated into the original network architecture. By combining a loss function derived from formal methods, we train these patch modules to provide provable repairs for local robustness. Through a heuristic allocation mechanism, our approach successfully achieves the generalization of local robustness repair.
To address the efficiency challenges associated with formal verification in large-scale DNNs, we leverage patch modules to perform repairs at the feature space of the networks. This strategy enables our method to scale effectively across various network architectures, ensuring efficient and practical implementation.
We conduct a comprehensive evaluation of AdvRepair on three diverse datasets and multiple DNN architectures. Through extensive comparisons with state-of-the-art repair and adversarial training approaches, our method consistently demonstrates superior efficiency and scalability. Moreover, it exhibits remarkable generalization capabilities to handle general inputs, thereby significantly enhancing the overall robustness of the network. This advancement in generalization capabilities represents a substantial contribution of our method.
Overall, our proposed approach, AdvRepair, offers a pioneering solution for provable repair of adversarial attacks using limited data. It not only addresses the challenges posed by formal verification in large-scale DNNs but also achieves remarkable generalization capabilities, leading to improved network robustness.
In this section we recall some basic notations of DNN repair.
A deep neural network is a function \({N}\colon \mathbb{R}^{n_0} \rightarrow \mathbb{R}^{n_L}\) that maps an input \(x \in \mathbb{R}^{n_0}\) to an output \(y \in \mathbb{R}^{n_L}\). We usually visualize a DNN \(N\) as a sequence of \(L\) layers, where the \(i\)th layer contains \(n_i\) neurons each representing a real variable. Between two adjacent layers is typically a composition of an affine function and a non-linear activation function, and the DNN \(N\) is the composition of the functions between layers. In many applications, DNNs are serving for classification tasks. In such a classification DNN \({N}\colon \mathbb{R}^{n_0} \rightarrow \mathbb{R}^{n_L}\), every output dimension corresponds to a classification label, and the one with the maximum output value is the classification results that the DNN \(N\) gives, i.e., \(C_N(x)=\arg \max_{1 \le i \le n_L} N(x)_i\), where \(N(x)_i\) is the \(i\)th entry of the vector \(N(x)\).
The notion of safety properties pertains to assertions that guarantee the absence of undesirable behavior. Within the context of DNNs, a safety property demands that a DNN operates correctly within a specified input range. The local robustness of a DNN refers to its ability to maintain stable and consistent predictions in the vicinity of its training data points, even in the presence of small perturbations or variations in the input. Here a neighborhood of an input \(x_0\) is usually defined as a closed ball \(B(x_0,r):=\{x \in \mathbb{R}^{n_0} \mid \|x-x_0\|_\infty \le r\}\), where \(\|\cdot\|_\infty\) is the \(L_\infty\)-norm, and \(r>0\) is the radius. Formally, a local robustness property of a classification DNN \(N\) requires that for any input \(x\) in a given neighborhood \(B(x_0,r)\) of an input \(x_0\), its classification should always be consistent with \(x_0\), i.e., \(\forall x \in B(x_0,r), C_N(x)=C_N(x_0)\). We denote this local robustness property as \((N,B(x_0,r))\); it is obviously a safety property. An input \(x^* \in B(x_0,r)\) is an adversarial sample of the DNN \(N\), if \(C_N(x^*) \ne C_N(x_0)\), i.e., \(x^*\) is a counterexample which violates the local robustness property \((N,B(x_0,r))\).
In this work, we focus on the problem of repairing adversarial attacks with limited adversarial samples. Different from repairing backdoor attacks, not only does the buggy behavior of the given adversarial samples need fixing, but we also require that there should be no adversarial attacks around given adversarial samples, and that this enhancement of local robustness should generalize to samples across the whole dataset. Now we formally state the problem of repairing adversarial attacks as follows:
Given a DNN \(N\) and a set of adversarial samples \(\{x_i^*\}_{i=1}^n\), where \(n\) can be significantly smaller than the size of the dataset, and each \(x_i^*\) is obtained by an adversarial attack on an input \(x_i\) with a given radius \(r\), we need to construct a DNN \(F\) which is locally robust on every \(B(x_i,r)\) while the accuracy is maintained and local robustness of other inputs is potentially improved.
We focus on fixing adversarial attacks with limited data in this work, and the aims of the repair are at least threefold: For a given radius \(r>0\),
the buggy behaviors of \(\{x_i^*\}_{i=1}^n\) are fixed,
the DNN is locally robust in \(B(x_i,r)\) for \(1 \le i \le n\),
the accuracy of the DNN is maintained, and for as many input \(x\) in the dataset as possible, the DNN is locally robust in \(B(x,r)\).
Here the first objective is the fundamental requirement showing that the repair is successful on the given adversarial samples \(\{x_i^*\}_{i=1}^n\). The second objective aims to ensure the local generalizability of the repair, whereby the absence of adversarial examples is guaranteed within the vicinity of the generated adversarial examples, while the third objective focuses on the global generalizability of the repair, wherein it simultaneously improves the robustness against all potential inputs in the dataset without significantly compromising the accuracy.
The current neural network repair methods under limited data conditions primarily involve parameter adjustments at the neuron level. While these methods demonstrate high efficacy against straightforward backdoor attacks, they face significant challenges in effectively addressing complex adversarial attacks. Even if the erroneous behaviors of \(\{x_i^*\}_{i=1}^n\) are successfully repaired, such repairs lack the ability to generalize either locally or globally. Inspired by Reassure [25], we propose a patch-based repair method. A patch refers to a specific modification or alteration made to a software system or codebase; it is a discrete set of changes applied to fix a bug, enhance functionality, or address security vulnerabilities. Patch-based DNN repair involves the integration of an external indicator designed to identify buggy inputs, followed by the application of specialized patch modules to repair input sets which have similar behaviors. Each indicator here corresponds precisely to a robustness neighborhood \(B(x_i,r)\) that requires repair, and we can leverage verification tools to ensure that the repair within each robustness neighborhood is provable. Furthermore, for other inputs in the dataset, we can heuristically match patches that maximize their robustness, thereby endowing this repair approach with global generalization. In this section, we propose a patch-based repair method named AdvRepair to defend from adversarial attacks with limited data.
The main produce of the repaired network is depicted in Fig. 1. For an input \(x \in \mathbb{R}^{n_0}\), the role of the indicator is to select an appropriate patch that, when applied, yields the sum of the output of the patch module and the output of the original DNN for \(x\). This sum represents the output obtained after the repair process.
The definition of an indicator function is standard:
Definition 1. Let \(\mathcal{C}=(X_1,\ldots,X_m)\) be a finite sequence of input properties. The indicator function \(\mathcal{I}_{\mathcal{C}}\colon \mathbb{R}^{n_0} \to \{0,1\}^{m}\) outputs in the \(j\)th entry as \[\mathcal{I}_{\mathcal{C}}(x)_j = \left\{\begin{array}{ll} 1, & \text{ if } x \models X_j, \\ 0, & \text{ otherwise}, \end{array}\right.\] where \(j=1,\ldots,m\).
Upon classification by the indicator, a set of patch modules is deployed to perform the repair. Here each patch module is specifically tailored to address an input set with the same local robustness property. The implementation of a patch module is a fully connected neural network in this work. For an input \(x\) that is not within the input set of any indicator, i.e., \(x \notin \bigcup_{i=1}^n B(x_i,r)\), it cannot be identified by the indicators, but there is still potential adversarial attack in \(B(x,r)\). In this situation, we heuristically allocate a patch module to this input to defend from adversarial attacks. Formally, the structure of the repaired DNN is as follows:
Definition 2. A repaired DNN is a tuple \(F=(N,\mathcal{C},\mathcal{P},\tau)\), where
\(N \colon \mathbb{R}^{n_0} \to \mathbb{R}^{n_L}\) is the original DNN,
\(\mathcal{C}=(X_1,\ldots,X_m)\) be a finite sequence of input properties,
\(\mathcal{P} = (P_1, \ldots , P_{m})^\mathrm{T}\) is a finite sequence of patch modules, each of which is a fully connected neural network,
and \(\tau \colon \mathbb{R}^{n_0} \setminus \bigcup_{i=1}^n B(x_i,r) \to 2^{\{1,\ldots,m\}}\) is a patch allocation function.
The semantics of the repaired DNN \(F=(N,\mathcal{C},\mathcal{P},\tau)\) is a function \[\begin{align} F\colon\mathbb{R}^{n_0} &\to \mathbb{R}^{n_L}, \\ x &\mapsto \left\{\begin{array}{ll} \mathcal{I}_{\mathcal{C}}(x)^\mathrm{T} \mathcal{P}(x) + N(x), & \text{ if } x \in \bigcup_{i=1}^n B(x_i,r), \\ \sum_{j \in \tau(x)}P_j(x)+N(x), & \text{ otherwise}, \end{array}\right. \end{align}\] where \(\mathcal{P}(x) = (P_1(x), \ldots , P_{m}(x))^\mathrm{T}\).
Generally there are three ways to determine the number of the patch modules and define the indicator:
Property-guided. The property-guided way sets \(\mathcal{C}_{\mathrm{P}} = \{B(x_i,r) \mid i=1,\ldots,n\}\), i.e., the indicator judges which robustness region the input belongs to, and the corresponding patch module defends against adversarial attacks in this specific region.
Untargeted label. A simple label-guided way takes every classification label as an input property. i.e., \(\mathcal{C}_{\mathrm{L}} = \{1,\ldots,n_L\}\), where \(x \models \ell\) iff \(C_N(x)=\ell\). Assuming that \(x\) is correctly classified by \(N\), the patch module to defend against adversarial attacks in \(B(x,r)\) is determined by \(C_N(x)\), and it works for fixing potential adversarial attacks of all the robustness regions where the ground truth is \(C_N(x)\).
Targeted label. We consider the set of label pairs \(\mathcal{C}_{\mathrm{L}} = \{(\ell_0,\ell_1) \in \{1,\ldots,n_L\}^2 \mid \ell_0 \ne \ell_1\}\), where \(x \models (\ell_0,\ell_1)\) iff \(N(x)_{\ell_0} > N(x)_{\ell_1} > N(x)_\ell\) for all \(\ell \ne \ell_0,\ell_1\), i.e., the indicator determines which label \(\ell_1\) is the most likely when attacking the ground truth \(\ell_0\), assuming that \(x\) is correctly classified by \(N\), and the corresponding patch module fixes adversarial attacks of this specific pattern.
The property-guided way has advantages when the number \(n\) of the given adversarial samples is small, while the label-guided ways work well in the situation when the number \(n_L\) of the output labels is small. For the label-guided ways, the patch allocation function \(\tau\) can be simply defined as the corresponding indicator, since this indicator can naturally generalize to other input. In the following, we present our approach in the property-guided way, and it can be easily adapted to the label-guided ways.
The first challenge is to train the patch modules, which are each implemented as a fully connected neural network, to repair each robustness property \((N,B(x_i,r))\) with a provable guarantee. To achieve provability, we employ formal verification in training these patch modules. The result of verification on a robustness property can be transformed into a loss function; once this loss reaches zero, the property is verified to be true, and the patch module in this stage can defend against all possible adversarial attacks in this robustness region. This idea has been widely adopted in DNN repair [26], [27].
In this work, we employ DeepPoly [24] as the verification engine. DeepPoly uses abstract interpretation to give every neuron an upper/lower bound in the form of an affine function, where only variables in previous layers occur, and the numerical bound can be derived by propagating backwards these affine upper/lower bounds to the input layer. Recall that a local robustness property \((N,B(x_i,r))\) holds iff for any \(x \in B(x_i,r)\) and \(\ell \ne \ell_0 =C_N(x_i)\), \(N(x)_\ell<N(x)_{\ell_0}\), so DeepPoly calculates the abstraction of \(N\) on \(B(x_i,r)\) for every neuron and the expressions \(N(x)_\ell-N(x)_{\ell_0}\) for \(\ell \ne \ell_0\). From the abstraction, we can obtain an affine function in which there are only input variables as a sound upper bound of \(N(x)_\ell-N(x)_{\ell_0}\) on \(B(x_i,r)\), i.e., \[\begin{align} \label{eq:deeppoly} \forall x \in B(x_i,r), C_N(x)_\ell-C_N(x)_{\ell_0} \le \alpha_\ell^\mathrm{T} x + \beta_\ell, \end{align}\tag{1}\] where \(\alpha_\ell \in \mathbb{R}^{n_0}\) and \(\beta_\ell \in \mathbb{R}\) are constants. It is easy to obtain the numerical upper bound of \(\alpha_\ell^\mathrm{T} x + \beta_\ell\) on the box region \(B(x_i,r)\). If this upper bound is negative for every \(\ell \ne \ell_0\), then the local robustness property \((N,B(x_i,r))\) is verified to be true by DeepPoly. Therefore, it is natural to use this upper bound as the loss function to train the corresponding patch module.
Definition 3. For a local robustness property \(\varphi=(N,B(x_i,r))\), we define the safety violated loss function of \(\varphi\) as \[\begin{align} \mathcal{L}(\varphi)(x) = \sum_{\ell \ne \ell_0} \max (\alpha_\ell^\mathrm{T} x + \beta_\ell,0), \end{align}\] where \(\alpha_\ell^\mathrm{T} x + \beta_\ell\) is the upper bound of \(N(x)_\ell-N(x)_{\ell_0}\) on \(B(x_i,r)\) given by DeepPoly, as shown in Eq. 1 . For local robustness properties \(\varphi_1,\ldots,\varphi_k\) which share the same ground truth label \(\ell_0\), we define \(\mathcal{L}(\bigwedge_{i=1}^k\varphi_i) = \sum_{i=1}^k \mathcal{L}(\varphi_i)\).
For a local robustness property \(\varphi=(N,B(x_i,r))\), its safety violated loss function \(\mathcal{L}(\varphi)\) being \(0\) implying that \(\varphi\) is verified to be true by DeepPoly.
Theorem 1. Let \(\varphi=(N,B(x_i,r))\) be a local robustness property. If \(\mathcal{L}(\varphi) = 0\) on \(B(x_i,r)\), i.e., \[\mathcal{L}^*(\varphi):=\sum_{\ell \ne \ell_0} \max (\max(\alpha_{\ell}^\mathrm{T},\boldsymbol{0}) \cdot (x_i+r\cdot\boldsymbol{1})+ \min(\alpha_{\ell}^\mathrm{T},\boldsymbol{0}) \cdot (x_i-r\cdot \boldsymbol{1}) + \beta_\ell,0)=0,\] where \(\boldsymbol{0}\) and \(\boldsymbol{1}\) are the vector in \(\mathbb{R}^{n_0}\) with all the entries \(0\) and \(1\), respectively, then the property \(\varphi\) holds.
When we train a patch module \(P\) on the DNN \(F=(N,\mathcal{C},\mathcal{P},\tau)\), the maximum of the safety violated loss function of a local robustness property \(\varphi\), namely \(\mathcal{L}^*(\varphi)\), is parameterized with the weights \(w\) in \(P\), and this \(\mathcal{L}^*(\varphi)(w)\) is the loss function for training this patch \(P\). By employing the standard gradient descent method, we can train the patch \(P\), until the loss reaches \(0\) or the number of epochs reaches a threshold \(R\). We claim that, even if \(\mathcal{L}^*(\varphi)(w)\) does not reach \(0\), the repair does not necessarily fail, since the upper bound \(\mathcal{L}^*(\varphi)\) given by DeepPoly is not exact; as it decreases in the training process, \(F\) is growing more likely to satisfy \(\varphi\). Also, to improve the precision of \(\mathcal{L}^*(\varphi)\) obtained from DeepPoly, we employ the input interval partitioning technique from ART [27]. It selects the partition dimension by computing the multiplication of the partial derivative of the safety violated loss function \(\mathcal{L}(\varphi)(x)\) and the size of the input interval in the corresponding dimension, and bisects the box region over the dimension with the maximum score.
The main algorithm of AdvRepair is shown in Alg. 2. We construct a dictionary \(D\), where \(D[i]\), initialized as \(\{(F,B(x_i,r))\}\), stores the properties that the \(i\)th patch module \(P_i\) repairs (Line 3–5, Alg. 2). The repair process has at most \(M\) iterations. In each iteration, we train the patches which does not yet provide provable repair, i.e., those whose index is in \(E\). The algorithm of training a specific patch module \(P\) for repairing a set \(U\) of properties is shown in Alg. 3, which outputs the optimized patch module \(P\), the set \(T\) of properties to be refined, and a boolean label recording whether the repair of the patch \(P\) has a provable guarantee. A standard gradient descent procedure is run until a provable repair is achieved, i.e., \(\mathcal{L}^*(\bigwedge U)(w)=0\), or the number of epochs reaches a threshold \(R\) (Line 3–7, Alg. 3). After that, if it is still not provable, we sort \(\mathcal{L}^*(\varphi)(w)\) for \(\varphi \in U\) from the largest to the smallest, and extract the largest \(K\) ones (whose value is strictly larger than \(0\)) as the properties to be refined (Line 8, Alg. 3). For a property \(\varphi=(F,X)\) to be refined, we select an input dimension to bisect the input space \(X\). For an input dimension \(d\), we define \[\mathrm{score}_d = \partial_d \mathcal{L}(\varphi)(x) \cdot \sup_{x,x' \in X} |x_d-x_d'|,\] where \(\partial_d \mathcal{L}(\varphi)(x)\) is the partial derivative of \(\mathcal{L}(\varphi)(x)\) on the \(d\)th dimension. We choose the dimension with the largest score to bisect \(X\), and the property \(\varphi=(F,X)\) is refined to two properties \(\varphi_1=(F,X_1)\) and \(\varphi_2=(F,X_2)\), recorded in the dictionary \(D\) (Line 14–19, Alg. 2). At the end of each iteration, we check whether all the patches provide provable repair; if so, it terminates immediately and outputs the current repaired DNN \(F\), and this repair is provable (Line 20–21, Alg. 2). If \(E\) is still non-empty after \(M\) iterations, it outputs \(F\) without provable guarantee (Line 22, Alg. 2). Although we are focusing on fixing adversarial attacks in this work, AdvRepair also works for repair safety properties of DNNs by making minor changes in Alg. 2.
In AdvRepair, we follow a classical way of defining the loss function with the linear relaxation obtained from DeepPoly, and incorporate it into our patch-based repair framework. This combination has several advantages. First, every patch is responsible for repairing properties in a specific pattern, which makes it much easier to obtain a provable repair; even if the repair is not provable, the safety violated loss is significantly declining in the training process, and it is highly possible that the repaired DNN is locally robust. Different from the neuron-level repair, there is no modification on the original DNN \(N\) in AdvRepair, and due to the design of the indicator, the patch modules does not affect the behaviors of other input before \(\tau\) allocates a patch to it. This mechanism avoids the drawdown of accuracy, which is quite severe in many neuron-level repair methods. Also, by allocating patches with \(\tau\), we can achieve good generalization of our repair to other inputs, and it is the key to solving this essential challenge in fixing adversarial attacks. In the following, we will present how to construct the patch allocation function \(\tau\) to improve generalization.
For the property-guided method in AdvRepair, we need to define a patch allocation function \(\tau\) because the construction of patches in this way does not naturally generalize to other inputs. This patch allocation function \(\tau\) is the key to achieving good generalization of the patch modules trained locally to global inputs.
To match the best patch modules for an input \(x' \notin \bigcup_{i=1}^n B(x_i,r)\), we need to extract some specific patterns of the patch modules which generalize to other inputs and their adversarial robustness. Such patterns may include:
the ground truth of \(x'\), which can be represented with \(\ell_0=C_N(x')\),
the targeted label, i.e., the index of the second largest entry in \(N(x')\),
and the cosine similarity between \(N(x')\) and \(N(x_i)\), i.e, \(\tfrac{\langle N(x'),N(x_i) \rangle}{\|N(x')\|_2 \|N(x_i)\|_2}\), where \(\langle \cdot,\cdot \rangle\) is the canonical inner product on \(\mathbb{R}^{n_L}\), and \(\|\cdot\|_2\) is the \(L_2\)-norm.
In the implementation of \(\tau\), the patches with the same ground truth label as \(x'\) is alternative for \(x'\), and we allocate the ten with the maximum cosine similarity between \(N(x')\) and \(N(x_i)\) to \(x'\). Here we do not match with the targeted label because it is possible that more than one labels can potentially attack \(N\) in the robustness region \(B(x',r)\), and matching by the cosine similarity can cover this possibility.
We seemingly have problem when \(x'\) is not correctly classified by \(N\), because in this case we may allocate inappropriate patches. If \(x'\) is an adversarial example, we can employ sampling-based methods like [28] to detect it and recognize its correct classification with a high probability. Otherwise, the wrong classification of \(x'\) may result from backdoor attacks, biased training data, overfitting, etc, and such situations are beyond the scope of fixing adversarial attacks in this work.
Once a set of patch modules has been allocated to an input, it works in the whole robustness region of this input. For an input \(x' \notin \bigcup_{i=1}^n B(x_i,r)\), we allocate it with \(\tau(x')\), a set of patch modules. After that, all the inputs \(x \in B(x',r)\) share the same set of patch modules as \(x'\), i.e., \(F(x)=\sum_{j \in \tau(x')} P_j(x)+N(x)\), instead of allocating \(x\) with \(\tau(x)\). In this sense, our defense is constructed before adversarial attacks.
To repair an extremely large DNN with high input dimensionality, the memory and computational overheads associated with the linear relaxations obtained from formal verification are notably significant and, at times, prohibitive. Also, faced with a huge DNN, the precision of abstraction-based verification methods like DeepPoly is severely harmed during a large number of layer-by-layer propagation. Consequently, the key to adopting AdvRepair to large DNNs is to reduce the input dimensionality and the size of the neural network in formal verification.
In the popular architectures of convolutional neural networks, the convolutional layers play the role of extracting important features from data, followed by several fully connected layers for classification according to these extracted features, so we call such a fully connected layer a feature space.
Definition 4. Let \(N \colon \mathbb{R}^{n_0} \to \mathbb{R}^{n_L}\) be a DNN where the function from the \(i\)th layer to the \((i+1)\)th is \(f_i\), i.e., \(N=f_{L-1} \circ \cdots \circ f_0\). For \(l\in [1,L]\), the feature space of the \(l\)th layer is \(\mathbb{R}^{n_l}\), and the behavior of the input \(x \in \mathbb{R}^{n_0}\) in this feature space is \(N_{0,l}:=f_{l-1}\circ \cdots \circ f_0(x)\). The neural network starting with the feature space of the \(l\)th layer in \(N\) is \(N_{l,L}=f_{L-1} \circ \cdots \circ f_l\).
The dimensionality of a feature space is usually far smaller than that of the input layer. If we start with a feature space in employing DeepPoly, it does not need to calculate the abstraction of a great deal of neurons in the convolutional layers, thus getting its efficiency and precision enhanced.
To conduct repair in a given feature space with AdvRepair, we first give an approximation of the feature space on every \(B(x_i,r)\). Here we do not require that this approximation should be a sound abstraction of the real semantics of the feature space on \(B(x_i,r)\), because calculating such an over-approximation with high precision is quite time-consuming. Instead, we simply sample in \(B(x_i,r)\) and obtain the tightest box region that contains the behaviors of the samples in the feature space as the approximation. This approximation is not sound yet, but it does not mean that the inputs in \(B(x_i,r)\) whose behaviors in the feature space are not within this approximation will not be repaired by the corresponding patch module \(P_i\), because the indicator \(\mathcal{I}_{\mathcal{C}}\) still remains the same. Since the approximation of the feature space is not sound, the repair does not have a provable guarantee. It is also worth mentioning that, even if two properties with different ground truth labels have their feature spaces overlapping, our repair still works. In this situation, although two contradicted properties are involved in the training process, but they must correspond to two different robustness regions \(B(x_i,r)\), and thus different patch modules \(P_i\). The correspondence of the robustness regions \(B(x_i,r)\) and the patch \(P_i\) is always preserved, so \(P_i\) is always working for repairing on \(B(x_i,r)\) with the correct classification label \(\ell_0\).
After we obtain an approximation \(X_{(i)}\) for each \(B(x_i,r)\) in the feature space, we employ Alg. 2, where the initialization of \(D[i]\) in Line 5 is \((N_{l,L}+P_i,X_{(i)})\) instead. Since the repair in the feature space is not provable, the safety violation loss function \(\mathcal{L}(\varphi)\) for \(\varphi=(N_{l,L}+P_i,X_{(i)})\) in Def. 3 is modified to be \[\mathcal{L}(\varphi)(x) = \sum_{\ell \ne \ell_0} (\alpha_\ell^\mathrm{T} x + \beta_\ell),\] where \(\alpha_\ell^\mathrm{T} x + \beta_\ell\) for \(\ell \ne \ell_0\) is obtained by DeepPoly abstracting \(N_{l,L}+P_i\) on \(X_{(i)}\), and \(\mathcal{L}^*(\varphi)\) is modified accordingly. Also, we do not run Line 6–7 in Alg. 3 and skip Line 6, 11–12 and 20–21 in Alg. 2, so that the loss \(\mathcal{L}^*\) decreases as much as possible and we can achieve a better repair performance.
Since we are here considering the feature space in AdvRepair, it is natural to take it into consideration the cosine similarity and the distance of the behaviors of \(x'\) and \(x_i\) in the feature space. If the ground truth label of \(x'\) and \(x_i\) matches, a high cosine similarity and a small distance of the behaviors of \(x'\) and \(x_i\) in the feature space strongly indicate that the patch module \(P_i\) for \(x_i\) is effective in fixing adversarial attacks in \(B(x',r)\). In this case, the patches with the same ground truth label as \(x'\) are still alternative for \(x'\), but we allocate the ten patches with the maximum value of \[\frac{\langle N_{0,l}(x'),N_{0,l}(x_i) \rangle}{\|N_{0,l}(x')\|_2 \cdot \|N_{0,l}(x_i)\|_2} \cdot \|N_{0,l}(x')-N_{0,l}(x_i)\|_2.\]
Repair in a feature space is an effective way to make AdvRepair scale on large DNNs. Although we sacrifice provability, its repair performance and generalization are extremely ideal in practical scenarios, which we will see in our experimental evaluation.
In this section, we evaluate AdvRepair by answering the following research questions:
RQ1:What is the overall performance of AdvRepair in repairing local robustness and correcting safety property violations?
RQ2: Does the repaired DNN exhibit the capability to defend against new adversarial attacks?
RQ3: Does AdvRepair have scalability to repair large networks?
RQ4: What impact does the number of adversarial samples needed for repair have on the effectiveness of the repair process?
All the experiments are conducted on a machine equipped with an AMD EPYC 7763 64-Core Processor, 256 GB of memory, and an NVIDIA GeForce RTX 3090 with 24 GB of GPU memory. Each experiment has a timeout set to 10 000 seconds. ^{1}
Dataset. We conduct evaluations of our approach on three datasets: MNIST [29], CIFAR-10 [30] and ACAS Xu [31], [32]. On MNIST, we train two fully connected neural networks and one convolutional neural network. On CIFAR-10, we train a VGG19 [33] and a ResNet18 [34]. We assess our approach across 35 ACAS Xu DNNs, which comprises six hidden layers, each housing 50 neurons. As documented in [35], these DNNs are expected to satisfy Property-2 but exhibit violations. Detailed model structures and hyperparameters are in Appendix 9. For the local robustness repair task, we subject the DNNs trained on MNIST and CIFAR-10 to PGD [4], generating 50, 100, 200, 500 and 1 000 adversarial samples. For MNIST, the radius \(r\) is set to be 0.05, 0.1 and 0.3, while for CIFAR-10 it is set to be \(\frac{4}{255}\) and \(\frac{8}{255}\). On ACAS Xu, our objective is to repair the violation of Property-2 while preserving the original performance of the ACAS Xu DNNs. We sample 500 counterexamples as the buggy inputs for repair.
Metrics. To assess generalization, we sample ten adversarial examples on each \(B(x_i,r)\) different from \(x_i^*\) to form a generalization set \(D_\mathrm{g}\) for MNIST and CIFAR-10, and use an independent set of 5 000 counterexamples for ACAS Xu. Besides these, we have an independent test set \(D_\mathrm{t}\) of size 10 000, 10 000 and 5 000 for MNIST, CIFAR-10 and ACAS Xu, respectively. We employ AutoAttack [5] to attack the repaired DNN \(F\) on \(\{x_i\}_{i=1}^n\) and \(D_\mathrm{t}\) with the same radius \(r\). The metrics we use include repair success rate (\(\mathrm{RSR}\)), repair generalization rate (\(\mathrm{RGR}\)), drawdown (\(\mathrm{DD}\)), defense success rate (\(\mathrm{DSR}\)), and defense generalization success rate (\(\mathrm{DGSR}\)), defined as follows: \[\begin{align} \mathrm{RSR}&= \frac{ |\{\,i \mid C_{F}(x_i^*) = C_N(x_i)\,\}|}{n}, \qquad \mathrm{RGR}= \frac{|\{{\,x \in D_{\mathrm{g}}} \mid C_{F}(x) = \ell_x\,\}|}{\left|D_{\mathrm{g}}\right|}, \\ \mathrm{DD} &= \frac{|\{\,{x \in D_{\mathrm{t}}} \mid C_{N}(x) = \ell_x\,\}| - |\{\,{x \in D_{\mathrm{t}}} \mid C_{F}(x) = \ell_x\,\}|}{\left|D_{\mathrm{t}}\right|}, \\ \mathrm{DSR} &= \frac{|\{\,i \mid \exists x \in \mathrm{AA}(F,B(x_i,r)), C_{F}(x) \neq C_N(x_i)\,\}|}{n}, \\ \mathrm{DGSR} &= \frac{|\{\,x \in D_{\mathrm{t}} \mid \exists x' \in \mathrm{AA}(F,B(x,r)), C_{F}(x') \neq C_N(x)\,\}|}{\left|D_{\mathrm{t}}\right|}, \end{align}\] where \(\ell_x\) denotes the ground truth of \(x\), and \(\mathrm{AA}(\varphi)\) represents the set of potential adversarial samples obtained by attacking the local robustness property \(\varphi\) with AutoAttack. On ACAS Xu, the metric Drawdown is replaced with Fidelity Drawdown (\(\mathrm{FDD}\)) because there is no labelled ground truth for inputs on ACAS Xu: \[\mathrm{FDD} = \frac{|\{\,{x \in D_{\mathrm{t}}} \mid C_{F}(x) = C_{N}(x)\,\}|}{|D_{\mathrm{t}}|}.\] The metrics \(\mathrm{RSR}\) and \(\mathrm{DD}\) (or \(\mathrm{FDD}\)) evaluates the overall performance of DNN repair by measuring the percentage of buggy inputs successfully repaired and how much the accuracy decreases, while \(\mathrm{RGR}\), \(\mathrm{DSR}\) and \(\mathrm{DGSR}\) reflects how the repair generalizes to the robustness regions \(B(x_i,r)\) and to other inputs.
Baselines. We compare AdvRepair with the state-of-the-art repair methods including CARE [22], APRNN [36], PRDNN [37], REASSURE [25] and ART [27]. Since APRNN and PRDNN require selecting a layer to repair on, we traverse all eligible layers across each baseline, and report the layer that exhibit the best performance. Additionally, for the local robustness repair task, we compare our approach with an adversarial training method TRADES [38], where we select the trained model with the best performance in 200 epochs with their default parameters.
Model | \(r\) | \(n\) | RSR/% | DD/% | Time/s | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4-24 | CARE | PRDNN | APRNN | REASS | TRADE | ART | Ours | CARE | PRDNN | APRNN | REASS | TRADE | ART | Ours | CARE | PRDNN | APRNN | REASS | TRADE | ART | Ours | ||
FNN_ small | 0.05 | 50 | 8.0 | 100.0 | 100.0 | 100.0 | 54.0 | 10.0 | 100.0 | 1.0 | 30.9 | 19.7 | 0.0 | 3.3 | 86.8 | 0.0 | 2.0 | 35.9 | 3.5 | 381.3 | 5.2 | 257.7 | 14.3 |
100 | 2.0 | 100.0 | 100.0 | 100.0 | 2.0 | 10.0 | 100.0 | 0.1 | 25.8 | 29.6 | 0.0 | 86.8 | 87.0 | 0.0 | 2.3 | 80.8 | 9.6 | 762.5 | 13.4 | 318.5 | 24.8 | ||
200 | 4.5 | 100.0 | 0.0 | 100.0 | 3.5 | 11.0 | 100.0 | 0.4 | 39.5 | 66.2 | 0.0 | 86.8 | 86.5 | 0.0 | 3.2 | 268.2 | 82.8 | 1486.6 | 24.6 | 422.3 | 60.6 | ||
500 | 5.8 | 100.0 | 0.0 | 100.0 | 2.8 | 11.0 | 100.0 | 1.7 | 56.1 | 68.3 | 0.0 | 86.8 | 86.8 | 0.0 | 8.2 | 1358.4 | 73.0 | 3812.3 | 54.1 | 688.7 | 236.7 | ||
1000 | 11.4 | 100.0 | 0.0 | – | 3.9 | 11.0 | 100.0 | 16.6 | 43.1 | 69.2 | – | 86.8 | 87.0 | 0.0 | 17.3 | 3473.5 | 34.3 | – | 103.0 | 1247.5 | 756.7 | ||
0.1 | 50 | 2.0 | 100.0 | 100.0 | 100.0 | 8.0 | 14.0 | 100.0 | 0.0 | 71.3 | 50.9 | 0.0 | 86.8 | 86.5 | 0.0 | 2.2 | 51.6 | 3.8 | 358.8 | 8.2 | 393.0 | 15.5 | |
100 | 2.0 | 100.0 | 0.0 | 100.0 | 13.0 | 9.0 | 100.0 | 0.0 | 62.4 | 48.5 | 0.0 | 86.8 | 87.0 | 0.0 | 1.7 | 92.2 | 72.8 | 737.7 | 13.4 | 321.4 | 37.3 | ||
200 | 1.0 | 100.0 | 0.0 | 100.0 | 10.5 | 12.0 | 100.0 | 0.3 | 60.1 | 63.9 | 0.0 | 86.8 | 86.5 | 0.0 | 2.8 | 321.8 | 149.2 | 1403.8 | 22.7 | 416.7 | 62.5 | ||
500 | 0.4 | 100.0 | 0.0 | 100.0 | 10.0 | 14.0 | 100.0 | -0.1 | 51.6 | 71.8 | 0.0 | 86.8 | 86.8 | 0.0 | 5.1 | 1324.6 | 27.9 | 3482.9 | 53.7 | 564.4 | 193.2 | ||
1000 | 0.6 | 100.0 | 0.0 | – | 10.0 | 14.0 | 100.0 | 0.1 | 42.4 | 73.0 | – | 86.8 | 86.8 | 0.0 | 8.0 | 3844.8 | 33.9 | – | 104.5 | 1733.2 | 768.1 | ||
0.3 | 50 | 0.0 | 100.0 | 100.0 | 100.0 | 8.0 | 8.0 | 94.0 | 0.0 | 77.0 | 42.9 | 0.0 | 86.8 | 86.8 | 0.0 | 1.8 | 55.3 | 3.6 | 328.3 | 13.3 | 255.7 | 18.6 | |
100 | 6.0 | 100.0 | 100.0 | 100.0 | 13.0 | 11.0 | 99.0 | 7.0 | 74.1 | 61.5 | 0.0 | 86.8 | 87.0 | 0.0 | 2.6 | 115.7 | 9.5 | 670.3 | 13.4 | 322.3 | 52.0 | ||
200 | 0.0 | 100.0 | 0.0 | 100.0 | 10.5 | 8.0 | 99.5 | 0.0 | 62.8 | 54.2 | 0.0 | 86.8 | 87.0 | 0.0 | 3.2 | 364.9 | 132.1 | 1306.2 | 23.2 | 405.6 | 115.8 | ||
500 | 0.0 | 100.0 | 0.0 | 100.0 | 10.0 | 10.0 | 99.4 | -0.1 | 57.1 | 65.7 | 0.0 | 86.8 | 86.3 | 0.0 | 4.8 | 1934.4 | 147.9 | 3248.1 | 52.5 | 555.8 | 255.9 | ||
1000 | 0.0 | 100.0 | 0.0 | – | 9.8 | 11.0 | 99.6 | 0.0 | 58.3 | 83.9 | – | 86.8 | 86.8 | 0.0 | 7.4 | 3759.6 | 44.7 | – | 103.2 | 2116.1 | 935.8 | ||
FNN_ big | 0.05 | 50 | 6.0 | 100.0 | 100.0 | 100.0 | 98.0 | 24.0 | 100.0 | 2.0 | 0.7 | 3.6 | 0.0 | 0.4 | 85.8 | 0.0 | 1.7 | 89.5 | 11.2 | 1807.6 | 5.7 | 436.5 | 14.5 |
100 | 10.0 | 100.0 | 100.0 | 100.0 | 78.0 | 9.0 | 100.0 | 2.8 | 0.8 | 16.0 | 0.0 | -0.1 | 86.9 | 0.0 | 3.6 | 171.1 | 24.3 | 3673.7 | 14.3 | 503.5 | 29.1 | ||
200 | 10.0 | 100.0 | 100.0 | 100.0 | 73.0 | 6.0 | 100.0 | 4.8 | 1.0 | 28.1 | 0.0 | 0.1 | 88.3 | 0.0 | 4.5 | 419.0 | 92.3 | 7399.1 | 24.3 | 628.3 | 51.2 | ||
500 | 10.0 | 100.0 | 100.0 | – | 67.0 | 10.0 | 100.0 | 6.1 | 1.6 | 25.8 | – | -0.1 | 87.1 | 0.0 | 9.2 | 1390.1 | 657.6 | – | 54.4 | 1080.2 | 290.6 | ||
1000 | 11.2 | 100.0 | 100.0 | – | 71.9 | 20.0 | 100.0 | 6.2 | 1.9 | 28.0 | – | 0.1 | 85.8 | 0.0 | 17.9 | 4139.0 | 5499.2 | – | 108.1 | 2167.6 | 1033.2 | ||
0.1 | 50 | 18.0 | 100.0 | 100.0 | 100.0 | 8.0 | 8.0 | 100.0 | 13.7 | 6.7 | 13.9 | 0.0 | 0.8 | 87.4 | 0.0 | 2.7 | 88.2 | 10.5 | 1612.1 | 7.1 | 442.2 | 13.9 | |
100 | 2.0 | 100.0 | 100.0 | 100.0 | 27.0 | 11.0 | 100.0 | 0.1 | 5.6 | 25.8 | 0.0 | 8.8 | 87.4 | 0.0 | 2.9 | 173.7 | 22.8 | 3337.2 | 13.5 | 514.6 | 32.1 | ||
200 | 17.5 | 100.0 | 100.0 | 100.0 | 15.5 | 10.0 | 100.0 | 10.6 | 6.1 | 27.7 | 0.0 | 0.7 | 87.4 | 0.0 | 6.4 | 445.5 | 95.8 | 6801.2 | 23.3 | 639.2 | 75.4 | ||
500 | 0.6 | 100.0 | 100.0 | – | 15.4 | 15.0 | 100.0 | 1.3 | 7.4 | 33.7 | – | 0.5 | 88.3 | 0.0 | 6.3 | 1403.6 | 684.8 | – | 57.1 | 1048.6 | 356.8 | ||
1000 | 0.5 | 100.0 | 100.0 | – | 26.4 | 11.0 | 100.0 | 0.7 | 8.3 | 33.3 | – | 0.5 | 86.9 | 0.0 | 11.2 | 4321.6 | 4726.4 | – | 107.8 | 2148.3 | 735.9 | ||
0.3 | 50 | 0.0 | 100.0 | 100.0 | 100.0 | 0.0 | 14.0 | 100.0 | 0.0 | 27.5 | 33.3 | 0.0 | 0.5 | 85.8 | 0.0 | 1.8 | 119.5 | 11.6 | 1280.9 | 7.1 | 439.2 | 17.1 | |
100 | 1.0 | 100.0 | 100.0 | 100.0 | 4.0 | 14.0 | 100.0 | 0.0 | 31.8 | 56.6 | 0.0 | 1.8 | 85.8 | 0.0 | 2.1 | 253.1 | 27.0 | 2612.3 | 13.5 | 518.9 | 37.0 | ||
200 | 1.0 | 100.0 | 100.0 | 100.0 | 2.0 | 10.0 | 100.0 | 0.2 | 28.5 | 28.4 | 0.0 | 0.4 | 86.9 | 0.0 | 3.7 | 630.5 | 105.6 | 5186.7 | 23.9 | 643.3 | 100.4 | ||
500 | 0.4 | 100.0 | 100.0 | – | 2.2 | 9.0 | 100.0 | 0.2 | 23.9 | 29.5 | – | 0.6 | 86.9 | 0.0 | 4.9 | 2117.7 | 671.8 | – | 55.2 | 1047.6 | 365.4 | ||
1000 | 0.6 | 100.0 | 100.0 | – | 4.8 | 12.0 | 100.0 | 0.7 | 23.6 | 26.3 | – | 0.8 | 85.8 | 0.0 | 10.9 | 5480.3 | 4297.3 | – | 108.8 | 2158.5 | 1246.4 | ||
CNN | 0.05 | 50 | 0.0 | 100.0 | 100.0 | 100.0 | 100.0 | 0.0 | 100.0 | 2.4 | 0.8 | 0.0 | 0.0 | 0.9 | 88.5 | 0.0 | 3.5 | 6.2 | 57.9 | 509.3 | 4.3 | 789.9 | 14.1 |
100 | 0.0 | 100.0 | 100.0 | 100.0 | 92.0 | 60.0 | 100.0 | 2.4 | 0.7 | 0.1 | 0.0 | 0.2 | 88.5 | 0.0 | 6.0 | 5.1 | 99.0 | 910.7 | 10.3 | 879.0 | 23.4 | ||
200 | 0.0 | 100.0 | 100.0 | 100.0 | 88.0 | 0.0 | 100.0 | 2.7 | 1.0 | 0.0 | 0.0 | 0.0 | 88.5 | 0.0 | 9.8 | 19.0 | 182.9 | 1825.6 | 18.0 | 969.9 | 83.6 | ||
500 | 0.0 | 100.0 | 100.0 | – | 92.2 | – | 100.0 | 3.4 | 1.2 | -0.1 | – | 0.0 | – | 0.0 | 26.9 | 146.2 | 547.8 | – | 40.4 | – | 221.9 | ||
1000 | 0.0 | 100.0 | 100.0 | – | 91.2 | – | 100.0 | 7.5 | 1.8 | 0.0 | – | 0.0 | – | 0.0 | 38.1 | 549.1 | 1611.6 | – | 80.9 | – | 1001.9 | ||
0.1 | 50 | 0.0 | 100.0 | 100.0 | 100.0 | 94.0 | 0.0 | 100.0 | 0.6 | 1.0 | 0.1 | 0.0 | 0.4 | 88.0 | 0.0 | 2.6 | 45.6 | 56.6 | 456.0 | 5.7 | 938.0 | 12.7 | |
100 | 0.0 | 100.0 | 100.0 | 100.0 | 94.0 | 10.0 | 100.0 | 1.3 | 1.0 | 0.1 | 0.0 | 0.7 | 88.7 | 0.0 | 5.1 | 9.8 | 104.8 | 896.8 | 10.4 | 1022.8 | 29.5 | ||
200 | 0.0 | 100.0 | 100.0 | 100.0 | 91.5 | 20.0 | 100.0 | 2.8 | 2.2 | 0.3 | 0.0 | 0.4 | 88.5 | 0.0 | 8.1 | 41.1 | 178.5 | 1800.8 | 17.7 | 1108.1 | 75.1 | ||
500 | 0.0 | 100.0 | 100.0 | – | 91.6 | – | 100.0 | 9.1 | 1.7 | 0.4 | – | 0.5 | – | 0.0 | 24.9 | 139.2 | 458.5 | – | 40.6 | – | 337.0 | ||
1000 | 0.0 | 100.0 | 100.0 | – | 92.8 | – | 100.0 | 5.4 | 2.4 | 0.2 | – | 0.5 | – | 0.0 | 32.0 | 492.7 | 1526.0 | – | 80.4 | – | 700.2 | ||
0.3 | 50 | 0.0 | 100.0 | 100.0 | 100.0 | 0.0 | 10.0 | 100.0 | 0.0 | 43.2 | 7.2 | 0.0 | 0.8 | 88.5 | 0.0 | 2.2 | 3.8 | 59.9 | 453.8 | 5.6 | 808.2 | 15.0 | |
100 | 0.0 | 100.0 | 100.0 | 100.0 | 0.0 | 10.0 | 100.0 | 0.0 | 38.8 | 5.0 | 0.0 | 0.2 | 89.3 | 0.0 | 2.6 | 29.4 | 105.4 | 907.5 | 10.1 | 899.5 | 32.6 | ||
200 | 0.0 | 100.0 | 100.0 | 100.0 | 0.0 | 0.0 | 100.0 | 0.0 | 36.8 | 3.6 | 0.0 | 0.2 | 89.3 | 0.0 | 4.6 | 59.9 | 196.2 | 1793.7 | 17.8 | 1014.7 | 87.9 | ||
500 | 0.0 | 100.0 | 100.0 | – | 0.2 | – | 100.0 | 0.5 | 48.8 | 3.2 | – | 0.3 | – | 0.0 | 9.2 | 167.9 | 518.8 | – | 39.9 | – | 361.4 | ||
1000 | 0.0 | 100.0 | 100.0 | – | 0.6 | – | 100.0 | 0.1 | 56.5 | 2.2 | – | 0.4 | – | 0.0 | 16.8 | 423.8 | 1667.0 | – | 80.4 | – | 1282.6 |
For the overall repair performance, we focus on \(\mathrm{RSR}\), \(\mathrm{DD}\) and running time, which indicate whether the buggy behaviors of \(\{x_i^*\}_{i=1}^n\) can be fixed, whether the accuracy is maintained, and the efficiency, respectively.
The results of the local robustness repair task on MNIST are summarized in Table 1. In the experiments of repairing local robustness, PRDNN, REASSURE and AdvRepair all achieve a perfect 100% \(\mathrm{RSR}\), attributed to their provable constructions. APRNN, while provable, exhibits limitations in effectively repairing specific instances within the FNN_small model. Despite being a provable method, ART is less effective in repairing tasks involving high-dimensional inputs. CARE and TRADES, which do not have provable guarantee, demonstrate lower \(\mathrm{RSR}\) compared to the provable approaches. CARE exhibits suboptimal repair performance for all the three models, possibly due to the intricate relationship between local robustness repair and the numerous parameters in the DNNs. The challenge lies in resolving such errors through fault localization and modifying part of parameters. While TRADES exhibits better repair results on the convolutional neural network compared to the fully connected ones, its efficacy diminishes when addressing local robustness errors with a larger radius. Concerning \(\mathrm{DD}\), AdvRepair and REASSURE outperform the other methods on all the models.
Model | RSR/% | RGR/% | FDD/% | Time/s | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2-21 | CARE | PRDNN | REASS | ART | Ours | CARE | PRDNN | REASS | ART | Ours | CARE | PRDNN | REASS | ART | Ours | CARE | PRDNN | REASS | ART | Ours |
\(N_{2,1}\) | 59.2 | 100.0 | 100.0 | 100.0 | 100.0 | 60.1 | 97.5 | 100.0 | 100.0 | 100.0 | 39.1 | 50.3 | 64.2 | 9.2 | 0.0 | 28.1 | 4.3 | 572.3 | 18.3 | 25.3 |
\(N_{2,2}\) | 67.0 | 100.0 | 100.0 | 100.0 | 100.0 | 65.6 | 99.0 | 99.7 | 100.0 | 100.0 | 56.5 | 87.6 | 96.5 | 9.9 | 0.0 | 23.6 | 3.8 | 545.6 | 18.6 | 18.8 |
\(N_{2,3}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.6 | 100.0 | 100.0 | 100.0 | 11.1 | 92.0 | 80.1 | 9.3 | 0.0 | 14.2 | 2.9 | 522.8 | 17.8 | 18.2 |
\(N_{2,4}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.8 | 99.9 | 100.0 | 100.0 | 100.0 | 10.5 | 97.6 | 67.6 | 8.8 | 0.0 | 13.9 | 3.1 | 515.4 | 17.9 | 18.5 |
\(N_{2,5}\) | 98.4 | 100.0 | 100.0 | 100.0 | 100.0 | 97.8 | 98.9 | 64.1 | 100.0 | 100.0 | 9.6 | 90.9 | 91.9 | 5.4 | 0.0 | 28.8 | 3.7 | 522.0 | 16.8 | 19.0 |
\(N_{2,6}\) | 59.4 | 100.0 | 100.0 | 100.0 | 100.0 | 54.7 | 98.5 | 87.3 | 100.0 | 100.0 | 5.1 | 92.8 | 94.6 | 2.7 | 0.0 | 29.2 | 4.4 | 518.1 | 17.2 | 24.4 |
\(N_{2,7}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.3 | 92.5 | 100.0 | 100.0 | 5.9 | 97.0 | 88.0 | 1.7 | 0.0 | 11.7 | 5.0 | 520.6 | 16.1 | 30.4 |
\(N_{2,8}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.8 | 93.7 | 100.0 | 100.0 | 6.3 | 95.7 | 86.1 | 2.1 | 0.0 | 12.7 | 4.4 | 526.1 | 15.8 | 18.1 |
\(N_{2,9}\) | 97.8 | 100.0 | 100.0 | 100.0 | 100.0 | 98.1 | 90.5 | 92.6 | 100.0 | 100.0 | 13.5 | 94.9 | 88.9 | 1.5 | 0.0 | 24.4 | 5.1 | 512.2 | 15.7 | 24.2 |
\(N_{3,1}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.8 | 99.2 | 100.0 | 100.0 | 8.7 | 90.9 | 79.2 | 7.3 | 0.0 | 12.7 | 6.3 | 521.1 | 17.2 | 18.5 |
\(N_{3,2}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.3 | 95.3 | 100.0 | 100.0 | 8.9 | 12.2 | 48.6 | 8.6 | 0.0 | 12.2 | 3.0 | 510.9 | 16.7 | 18.3 |
\(N_{3,3}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.6 | 94.3 | 98.8 | 100.0 | 100.0 | 35.7 | 66.4 | 68.6 | 8.3 | 0.0 | 24.6 | 4.3 | 516.6 | 17.4 | 18.5 |
\(N_{3,4}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.1 | 97.7 | 100.0 | 100.0 | 18.9 | 97.5 | 96.0 | 9.6 | 0.0 | 23.3 | 3.1 | 513.9 | 15.6 | 23.5 |
\(N_{3,5}\) | 99.6 | 100.0 | 100.0 | 100.0 | 100.0 | 99.4 | 97.7 | 80.7 | 100.0 | 100.0 | 13.0 | 98.8 | 95.4 | 5.8 | 0.0 | 33.8 | 3.6 | 528.0 | 17.5 | 17.8 |
\(N_{3,6}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 98.7 | 97.2 | 100.0 | 100.0 | 9.8 | 94.3 | 96.6 | 2.7 | 0.0 | 16.1 | 4.6 | 639.5 | 16.8 | 17.5 |
\(N_{3,7}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 91.6 | 81.8 | 100.0 | 100.0 | 5.5 | 96.4 | 80.0 | 2.2 | 0.0 | 20.6 | 5.0 | 515.4 | 16.9 | 18.6 |
\(N_{3,8}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.8 | 98.3 | 100.0 | 100.0 | 5.0 | 97.5 | 88.8 | 1.6 | 0.0 | 15.2 | 4.1 | 579.9 | 17.9 | 24.8 |
\(N_{3,9}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.6 | 96.0 | 49.2 | 100.0 | 100.0 | 9.2 | 97.0 | 86.1 | 1.9 | 0.0 | 16.2 | 4.9 | 774.4 | 16.8 | 19.0 |
\(N_{4,1}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 98.3 | 98.8 | 100.0 | 100.0 | 40.1 | 96.2 | 93.2 | 7.8 | 0.0 | 14.5 | 3.5 | 549.4 | 17.6 | 24.5 |
\(N_{4,3}\) | 98.4 | 100.0 | 100.0 | 100.0 | 100.0 | 97.5 | 99.4 | 99.8 | 100.0 | 100.0 | 20.1 | 91.1 | 87.5 | 7.0 | 0.0 | 28.0 | 2.9 | 506.7 | 17.1 | 17.7 |
\(N_{4,4}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.7 | 99.9 | 100.0 | 100.0 | 4.3 | 82.0 | 85.2 | 8.4 | 0.0 | 9.5 | 3.1 | 535.8 | 16.9 | 17.9 |
\(N_{4,5}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 94.6 | 100.0 | 100.0 | 33.6 | 46.4 | 94.8 | 6.0 | 0.0 | 9.8 | 5.0 | 544.8 | 18.2 | 17.7 |
\(N_{4,6}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.2 | 97.9 | 100.0 | 100.0 | 11.2 | 89.6 | 86.0 | 2.6 | 0.0 | 19.9 | 5.4 | 548.2 | 15.9 | 24.1 |
\(N_{4,7}\) | 77.6 | 100.0 | 100.0 | 100.0 | 100.0 | 78.1 | 98.8 | 85.6 | 100.0 | 100.0 | 3.9 | 96.3 | 96.2 | 1.6 | 0.0 | 38.4 | 4.5 | 544.0 | 16.0 | 24.5 |
\(N_{4,8}\) | 99.8 | 100.0 | 100.0 | 100.0 | 100.0 | 99.7 | 97.4 | 77.1 | 100.0 | 100.0 | 4.7 | 97.9 | 88.2 | 1.6 | 0.0 | 36.0 | 4.9 | 542.7 | 14.8 | 30.2 |
\(N_{4,9}\) | 99.8 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 95.0 | 64.0 | 100.0 | 100.0 | 7.7 | 98.4 | 90.1 | 3.3 | 0.0 | 21.2 | 5.0 | 545.8 | 16.4 | 23.8 |
\(N_{5,1}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 90.5 | 97.6 | 100.0 | 100.0 | 14.0 | 98.4 | 96.9 | 6.9 | 0.0 | 19.8 | 3.5 | 541.3 | 17.1 | 18.6 |
\(N_{5,2}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.3 | 100.0 | 100.0 | 100.0 | 4.1 | 91.8 | 94.1 | 7.2 | 0.0 | 13.3 | 4.0 | 543.4 | 17.2 | 18.3 |
\(N_{5,3}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 88.6 | 99.7 | 100.0 | 100.0 | 8.8 | 88.3 | 94.8 | 6.0 | 0.0 | 17.4 | 3.5 | 542.9 | 16.8 | 18.7 |
\(N_{5,4}\) | 99.6 | 100.0 | 100.0 | 100.0 | 100.0 | 98.9 | 95.9 | 99.5 | 100.0 | 100.0 | 17.7 | 98.0 | 91.9 | 7.1 | 0.0 | 16.9 | 2.9 | 540.7 | 16.1 | 18.2 |
\(N_{5,5}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.0 | 99.7 | 100.0 | 100.0 | 5.4 | 94.8 | 94.4 | 3.9 | 0.0 | 8.6 | 3.4 | 518.9 | 15.9 | 18.3 |
\(N_{5,6}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.2 | 98.8 | 100.0 | 100.0 | 6.0 | 95.7 | 93.7 | 9.6 | 0.0 | 13.1 | 4.7 | 519.5 | 15.9 | 30.0 |
\(N_{5,7}\) | 97.2 | 100.0 | 100.0 | 100.0 | 100.0 | 97.7 | 99.1 | 98.6 | 100.0 | 100.0 | 4.9 | 95.3 | 87.7 | 1.8 | 0.0 | 21.2 | 5.0 | 517.3 | 16.3 | 24.2 |
\(N_{5,8}\) | 99.2 | 100.0 | 100.0 | 100.0 | 100.0 | 98.9 | 99.7 | 95.5 | 100.0 | 100.0 | 5.2 | 99.2 | 80.0 | 1.8 | 0.0 | 28.3 | 4.3 | 517.9 | 15.9 | 24.1 |
\(N_{5,9}\) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.9 | 91.7 | 100.0 | 100.0 | 5.4 | 97.2 | 27.0 | 2.0 | 0.0 | 22.6 | 4.8 | 515.8 | 15.4 | 24.3 |
Avg | 95.8 | 100.0 | 100.0 | 100.0 | 100.0 | 95.5 | 97.3 | 92.2 | 100.0 | 100.0 | 13.4 | 88.8 | 85.1 | 5.2 | 0.0 | 20.0 | 4.2 | 540.9 | 16.8 | 21.4 |
The results of correcting safety property violations on ACAS Xu are presented in Table 2. For the task of correcting safety property violation, AdvRepair, PRDNN, REASSURE, and ART successfully construct a provable repair, resulting in a 100% \(\mathrm{RSR}\). In contrast, CARE attains an average \(\mathrm{RSR}\) of 95.8%. Regarding \(\mathrm{RGR}\), all baseline methods exhibit notable performance, with ART and our approach showcasing the highest performance at 100%. In terms of \(\mathrm{FDD}\), our approach achieves the most favorable performance with zero, surpassing CARE, ART, PRDNN, and REASSURE. The results highlight the effectiveness of our approach in achieving provable repairs and maintaining high fidelity.
In this experiment, we evaluate the generalization of the repaired network and its defense capabilities against new adversarial attacks, as shown in Table 3. AdvRepair outperforms the other baselines in terms of \(\mathrm{RGR}\), showcasing superior performance. Notably, PRDNN, APRNN and TRADES also demonstrate some level of generalization, especially on the convolutional neural network, while the remaining tools exhibit relatively poor results. In terms of \(\mathrm{DSR}\), AdvRepair exhibits superior performance compared to the other tools. TRADES is the only alternative showing a certain defensive capability against adversarial attacks on FNN_big and CNN, particularly when the radius is small. When evaluating \(\mathrm{DGSR}\), we find that AdvRepair once again achieves the best performance. CARE, PRDNN, APRNN and TRADES demonstrate limited defense capabilities against FNN_big and CNN, especially when the radius is small. Notice that \(\mathrm{DGSR}\) is the most significant metric for assessing generalization to global inputs. These results are sufficient to show that AdvRepair achieves a breakthrough in repairing adversarial attacks.
Model | \(r\) | \(n\) | RGR/% | DSR/% | DGSR/% | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4-24 | CARE | PRDNN | APRNN | REASS | TRADE | ART | Ours | CARE | PRDNN | APRNN | REASS | TRADE | ART | Ours | CARE | PRDNN | APRNN | REASS | TRADE | ART | Ours | ||
FNN_ small | 0.05 | 50 | 7.6 | 44.4 | 47.2 | 10.8 | 80.0 | 10.0 | 100.0 | 0.0 | 0.0 | 2.0 | 0.0 | 26.0 | 6.0 | 100.0 | 8.9 | 3.2 | 2.3 | 0.0 | 9.4 | 8.9 | 90.0 |
100 | 2.2 | 48.7 | 46.7 | 9.1 | 0.0 | 10.0 | 100.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 20.0 | 100.0 | 9.3 | 5.1 | 0.8 | 0.0 | 9.8 | 11.3 | 96.4 | ||
200 | 5.0 | 42.6 | 16.1 | 8.5 | 0.0 | 11.0 | 100.0 | 2.0 | 0.0 | 0.5 | 0.0 | 3.5 | 10.0 | 100.0 | 8.6 | 1.1 | 1.3 | 0.0 | 9.8 | 10.1 | 96.6 | ||
500 | 6.6 | 39.0 | 10.3 | 7.2 | 0.0 | 11.0 | 100.0 | 1.4 | 0.0 | 3.8 | 0.0 | 2.8 | 9.6 | 100.0 | 8.3 | 0.5 | 5.3 | 0.0 | 9.8 | 8.9 | 96.6 | ||
1000 | 12.7 | 41.3 | 10.9 | – | 2.0 | 11.0 | 100.0 | 2.3 | 0.0 | 0.4 | – | 3.9 | 14.4 | 100.0 | 6.1 | 0.2 | 0.9 | 0.0 | 9.8 | 11.3 | 96.6 | ||
0.1 | 50 | 0.8 | 24.2 | 32.0 | 2.4 | 20.0 | 14.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.8 | 9.6 | 96.3 | |
100 | 1.7 | 29.8 | 21.9 | 1.8 | 10.0 | 9.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 | 9.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.8 | 9.6 | 95.0 | ||
200 | 1.1 | 32.7 | 12.9 | 1.4 | 5.0 | 12.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.5 | 11.5 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.8 | 10.1 | 95.2 | ||
500 | 1.7 | 37.5 | 17.3 | 1.4 | 8.0 | 14.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 10.0 | 100.0 | 0.0 | 0.0 | 0.1 | 0.0 | 9.8 | 9.8 | 95.3 | ||
1000 | 2.3 | 42.5 | 15.2 | – | 13.0 | 14.0 | 100.0 | 0.0 | 0.0 | 1.2 | – | 10.0 | 10.0 | 100.0 | 0.0 | 0.0 | 0.2 | 0.0 | 9.8 | 9.8 | 96.5 | ||
0.3 | 50 | 0.0 | 24.2 | 37.4 | 0.8 | 20.0 | 8.0 | 94.8 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 8.0 | 92.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.8 | 9.7 | 86.4 | |
100 | 7.7 | 22.0 | 18.4 | 0.4 | 10.0 | 11.0 | 99.0 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 | 11.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.8 | 9.6 | 96.2 | ||
200 | 0.0 | 26.1 | 8.8 | 0.5 | 5.0 | 8.0 | 99.5 | 0.0 | 0.0 | 0.0 | 0.0 | 10.5 | 9.5 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.8 | 9.6 | 90.8 | ||
500 | 0.4 | 30.4 | 12.8 | 0.3 | 8.0 | 10.0 | 99.5 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 10.4 | 99.8 | 0.0 | 0.0 | 0.0 | 0.0 | 9.8 | 10.3 | 92.0 | ||
1000 | 0.6 | 31.9 | 12.5 | – | 13.0 | 11.0 | 99.7 | 0.0 | 0.0 | 0.0 | – | 9.8 | 9.8 | 99.7 | 0.0 | 0.0 | 0.0 | 0.0 | 9.8 | 9.8 | 96.6 | ||
FNN_ big | 0.05 | 50 | 6.6 | 63.0 | 64.8 | 8.4 | 100.0 | 24.0 | 100.0 | 2.0 | 2.0 | 8.0 | 0.0 | 82.0 | 24.0 | 100.0 | 37.7 | 38.3 | 31.7 | 0.0 | 53.5 | 11.3 | 97.0 |
100 | 10.9 | 63.3 | 67.4 | 9.6 | 90.0 | 9.0 | 100.0 | 7.0 | 1.0 | 11.0 | 0.0 | 60.0 | 9.0 | 100.0 | 35.7 | 39.8 | 19.0 | 0.0 | 54.9 | 10.3 | 97.1 | ||
200 | 10.7 | 58.9 | 64.6 | 7.3 | 69.5 | 6.0 | 100.0 | 4.5 | 2.5 | 2.5 | 0.0 | 54.5 | 5.0 | 100.0 | 37.9 | 33.2 | 12.8 | 0.0 | 57.0 | 8.9 | 97.1 | ||
500 | 10.6 | 54.8 | 66.2 | – | 78.8 | 10.0 | 100.0 | 4.8 | 0.6 | 0.0 | – | 52.8 | 7.4 | 100.0 | 35.4 | 29.7 | 1.7 | 0.0 | 61.2 | 10.1 | 97.2 | ||
1000 | 11.8 | 54.6 | 68.4 | – | 79.2 | 20.0 | 100.0 | 7.1 | 0.4 | 0.0 | – | 56.1 | 19.2 | 100.0 | 36.8 | 25.5 | 1.8 | 0.0 | 66.2 | 11.3 | 97.2 | ||
0.1 | 50 | 20.2 | 52.6 | 44.2 | 3.4 | 0.0 | 8.0 | 100.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 100.0 | 2.6 | 0.7 | 1.1 | 0.0 | 1.8 | 9.7 | 97.1 | |
100 | 2.0 | 47.5 | 46.3 | 3.2 | 20.0 | 11.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 11.0 | 100.0 | 1.8 | 0.9 | 0.0 | 0.0 | 4.0 | 9.8 | 97.1 | ||
200 | 19.2 | 49.6 | 46.8 | 2.5 | 22.5 | 10.0 | 100.0 | 1.5 | 0.0 | 0.0 | 0.0 | 6.0 | 10.5 | 100.0 | 2.6 | 0.8 | 0.1 | 0.0 | 4.9 | 9.8 | 97.1 | ||
500 | 1.2 | 50.6 | 49.3 | – | 23.6 | 15.0 | 100.0 | 0.0 | 0.2 | 0.0 | – | 2.8 | 9.0 | 100.0 | 1.2 | 0.2 | 0.0 | 0.0 | 5.4 | 8.9 | 97.2 | ||
1000 | 1.0 | 51.1 | 49.9 | – | 42.1 | 11.0 | 100.0 | 0.0 | 0.0 | 0.0 | – | 7.2 | 10.5 | 100.0 | 1.4 | 0.1 | 0.0 | 0.0 | 9.2 | 10.3 | 97.2 | ||
0.3 | 50 | 0.0 | 42.2 | 20.4 | 2.8 | 2.0 | 14.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 14.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.3 | 96.0 | |
100 | 0.1 | 43.4 | 27.8 | 1.4 | 2.0 | 14.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 14.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.3 | 96.7 | ||
200 | 0.1 | 40.8 | 31.0 | 1.6 | 11.0 | 10.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.3 | 97.2 | ||
500 | 0.3 | 36.9 | 36.0 | – | 9.2 | 9.0 | 100.0 | 0.0 | 0.0 | 0.0 | – | 0.0 | 10.2 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.3 | 97.2 | ||
1000 | 0.4 | 36.5 | 37.1 | – | 15.0 | 12.0 | 100.0 | 0.0 | 0.0 | 0.0 | – | 0.0 | 11.8 | 99.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.3 | 97.2 | ||
CNN | 0.05 | 50 | 27.0 | 69.6 | 67.6 | 47.2 | 100.0 | 0.0 | 100.0 | 0.0 | 0.0 | 12.0 | 0.0 | 100.0 | 18.0 | 100.0 | 70.6 | 37.5 | 70.4 | 0.0 | 87.0 | 9.8 | 96.6 |
100 | 29.4 | 74.5 | 65.6 | 45.7 | 100.0 | 60.0 | 100.0 | 1.0 | 0.0 | 10.0 | 0.0 | 86.0 | 27.0 | 100.0 | 70.6 | 29.4 | 69.2 | 0.0 | 88.2 | 9.7 | 98.3 | ||
200 | 27.6 | 79.8 | 62.5 | 50.9 | 97.0 | 0.0 | 100.0 | 1.0 | 0.0 | 3.0 | 0.0 | 84.0 | 20.5 | 100.0 | 70.6 | 10.4 | 70.6 | 0.0 | 89.9 | 9.7 | 98.3 | ||
500 | 30.8 | 89.0 | 62.3 | – | 92.4 | – | 100.0 | 1.4 | 0.0 | 0.8 | – | 87.0 | – | 100.0 | – | – | 71.4 | 0.0 | 91.2 | – | 98.3 | ||
1000 | 32.9 | 92.2 | 63.8 | – | 90.8 | – | 100.0 | 1.5 | 0.0 | 0.2 | – | 86.8 | – | 100.0 | – | – | 65.7 | 0.0 | 92.1 | – | 98.3 | ||
0.1 | 50 | 12.2 | 53.0 | 41.0 | 10.4 | 80.0 | 0.0 | 100.0 | 2.0 | 0.0 | 2.0 | 0.0 | 50.0 | 8.0 | 100.0 | 8.3 | 0.0 | 8.7 | 0.0 | 44.9 | 10.3 | 98.2 | |
100 | 22.2 | 69.1 | 46.0 | 14.7 | 88.0 | 10.0 | 100.0 | 1.0 | 0.0 | 0.0 | 0.0 | 57.0 | 11.0 | 100.0 | 8.2 | 0.0 | 8.6 | 0.0 | 48.5 | 9.6 | 98.2 | ||
200 | 21.8 | 73.2 | 46.1 | 13.0 | 96.5 | 20.0 | 100.0 | 0.5 | 0.0 | 0.5 | 0.0 | 62.0 | 11.0 | 100.0 | 8.2 | 0.0 | 9.2 | 0.0 | 53.3 | 9.8 | 98.2 | ||
500 | 30.4 | 84.5 | 47.0 | – | 93.2 | – | 100.0 | 0.2 | 0.0 | 0.0 | – | 66.6 | – | 99.8 | – | – | 8.4 | 0.0 | 62.4 | – | 98.2 | ||
1000 | 24.6 | 90.3 | 48.7 | – | 94.7 | – | 100.0 | 0.6 | 0.0 | 0.0 | – | 70.1 | – | 99.9 | – | – | 4.6 | 0.0 | 67.8 | – | 98.3 | ||
0.3 | 50 | 0.0 | 44.8 | 70.0 | 4.0 | 0.0 | 10.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 12.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.8 | 98.3 | |
100 | 0.0 | 59.9 | 75.5 | 2.7 | 13.0 | 10.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.9 | 98.3 | ||
200 | 0.0 | 69.4 | 78.2 | 3.6 | 12.5 | 0.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.9 | 98.3 | ||
500 | 1.3 | 79.6 | 73.3 | – | 12.6 | – | 100.0 | 0.0 | 0.0 | 0.0 | – | 0.0 | – | 100.0 | – | – | 0.0 | 0.0 | 0.0 | – | 98.3 | ||
1000 | 1.1 | 85.9 | 70.6 | – | 16.0 | – | 100.0 | 0.0 | 0.0 | 0.0 | – | 0.0 | – | 100.0 | – | – | 0.0 | 0.0 | 0.0 | – | 98.2 |
We assess the scalability of AdvRepair on VGG19 and ResNet18 for local robustness repair using CIFAR-10 through the feature-space repair. The experimental results are presented in Table 4. Regarding \(\mathrm{RSR}\), AdvRepair, PRDNN and APRNN achieve 100%. TRADES also performs commendably, although it slightly trails behind the provable tools. Notably, CARE exhibits an inability to repair the local robustness properties on the CIFAR-10 dataset. In terms of \(\mathrm{RGR}\), AdvRepair demonstrates superior performance, with TRADES also achieving better results than other tools. As for \(\mathrm{DD}\), AdvRepair outperforms the others except in scenarios where the number of buggy input is 1 000. In this situation, TRADES attains the best performance. This phenomenon is reasonable, since TRADES tends to perform better in the DNN accuracy when the number of samples in adversarial training are getting large. Looking at \(\mathrm{DSR}\), we find that AdvRepair exhibits significantly better performance than the other tools, with the latter lacking the capability to defend against adversarial attacks on the original samples. For \(\mathrm{DGSR}\), AdvRepair achieves the same level as the accuracy, surpassing the other tools. Although TRADES also performs comparatively well, it remains notably lower than AdvRepair. The running time of AdvRepair is acceptable, regarding the far superior repair performance and generalization.
VGG19 | ResNet-18 | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3-22 | \(r = 4 / 255\) | \(r = 8/255\) | \(r = 4/255\) | \(r = 8/255\) | |||||||||||||||||
3-22 | 50 | 100 | 200 | 500 | 1000 | 50 | 100 | 200 | 500 | 1000 | 50 | 100 | 200 | 500 | 1000 | 50 | 100 | 200 | 500 | 1000 | |
RSR/% | CARE | 8.0 | 5.0 | 4.0 | 11.0 | 10.9 | 2.0 | 4.0 | 2.0 | 2.2 | 2.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
PRDNN | 100.0 | 100.0 | 100.0 | – | – | 100.0 | 100.0 | 100.0 | – | – | 100.0 | 100.0 | 100.0 | – | – | 100.0 | 100.0 | 100.0 | – | – | |
TRADE | 96.0 | 95.0 | 95.0 | 97.4 | 96.7 | 96.0 | 92.0 | 92.5 | 95.6 | 96.0 | 100.0 | 99.0 | 100.0 | 99.8 | 99.5 | 100.0 | 96.0 | 99.5 | 99.4 | 96.9 | |
APRNN | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | |
Ours | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | |
RGR/% | CARE | 6.6 | 4.2 | 3.6 | 11.7 | 11.8 | 2.0 | 3.6 | 2.7 | 2.4 | 2.9 | 2.6 | 2.7 | 9.1 | 12.2 | 6.7 | 3.8 | 2.6 | 2.3 | 2.7 | 2.3 |
PRDNN | 69.8 | 68.2 | 68.5 | – | – | 58.0 | 66.6 | 61.4 | – | – | 68.0 | 68.6 | 67.2 | – | – | 59.8 | 57.8 | 54.9 | – | – | |
TRADE | 84.0 | 90.0 | 85.0 | 97.8 | 98.0 | 80.0 | 90.0 | 100.0 | 98.0 | 99.0 | 100.0 | 90.0 | 100.0 | 100.0 | 99.1 | 100.0 | 90.0 | 100.0 | 100.0 | 97.3 | |
APRNN | 69.2 | 68.9 | 70.9 | 75.5 | 80.4 | 63.2 | 65.2 | 68.2 | 66.1 | 70.7 | 70.6 | 65.1 | 62.0 | 62.7 | 68.2 | 65.0 | 55.6 | 53.0 | 49.6 | 59.0 | |
Ours | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | |
DD/% | CARE | 0.1 | 0.2 | 0.2 | 0.2 | 0.3 | 0.1 | 0.2 | 0.2 | 0.1 | 0.3 | 0.1 | 0.1 | 0.2 | 0.3 | 0.2 | 0.1 | 0.0 | 0.0 | 0.1 | 0.1 |
PRDNN | 13.6 | 24.8 | 30.3 | – | – | 19.8 | 40.5 | 50.2 | – | – | 2.1 | 2.1 | 2.0 | – | – | 14.0 | 13.7 | 12.2 | – | – | |
TRADE | 46.0 | 35.3 | 29.8 | 14.7 | -3.3 | 43.9 | 37.8 | 29.7 | 11.8 | -2.7 | 43.0 | 51.7 | 33.0 | 12.9 | -11.2 | 50.4 | 51.4 | 29.2 | 13.7 | -8.9 | |
APRNN | 14.2 | 20.0 | 29.3 | 48.7 | 46.4 | 24.0 | 39.5 | 61.9 | 75.5 | 82.1 | 1.8 | 1.8 | 2.2 | 3.1 | 6.1 | 5.5 | 4.2 | 7.6 | 12.7 | 19.5 | |
Ours | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
DSR/% | CARE | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
PRDNN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
APRNN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
TRADE | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
Ours | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | |
DGSR/% | CARE | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
PRDNN | 0.0 | 0.0 | 0.0 | – | – | 0.0 | 0.0 | 0.0 | – | – | 0.0 | 0.0 | 0.0 | – | – | 0.0 | 0.0 | 0.0 | – | – | |
APRNN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
TRADE | 3.6 | 3.7 | 2.9 | 3.4 | 3.4 | 4.1 | 3.7 | 3.2 | 2.3 | 2.6 | 5.7 | 6.7 | 6.0 | 5.4 | 5.2 | 5.6 | 5.0 | 5.4 | 5.8 | 5.2 | |
Ours | 83.6 | 93.4 | 93.4 | 93.4 | 93.4 | 93.4 | 93.4 | 93.4 | 93.4 | 93.4 | 88.3 | 88.3 | 88.3 | 88.3 | 88.3 | 88.3 | 88.3 | 88.3 | 88.3 | 88.3 | |
Time/s | CARE | 47.8 | 64.4 | 106.5 | 380.4 | 928.2 | 36.9 | 48.2 | 96.0 | 234.1 | 443.0 | 45.5 | 58.9 | 105.5 | 243.5 | 384.9 | 37.1 | 75.4 | 101.6 | 180.7 | 285.7 |
PRDNN | 1731.8 | 2640.6 | 6307.4 | – | – | 1160.4 | 2210.2 | 5126.9 | – | – | 731.8 | 1794.9 | 3307.5 | – | – | 715.4 | 1300.2 | 5757.3 | – | – | |
TRADE | 68.7 | 114.8 | 194.6 | 415.4 | 814.6 | 67.2 | 111.1 | 187.2 | 417.6 | 806.8 | 57.1 | 107.8 | 186.7 | 405.0 | 794.3 | 57.9 | 107.8 | 184.1 | 408.1 | 790.6 | |
APRNN | 188.9 | 288.2 | 451.3 | 1645.6 | 6211.4 | 209.2 | 318.6 | 492.4 | 1750.4 | 7787.1 | 307.7 | 746.1 | 812.2 | 1338.1 | 1710.8 | 580.9 | 990.8 | 863.2 | 1556.0 | 1988.4 | |
Ours | 238.7 | 471.6 | 963.0 | 2516.9 | 5441.6 | 229.9 | 470.7 | 950.1 | 2485.0 | 5436.0 | 265.8 | 525.1 | 1050.3 | 2586.4 | 5262.3 | 264.7 | 523.1 | 1042.8 | 2627.8 | 5325.1 |
We systematically evaluate the performance of the repair results concerning the number of buggy behaviors in the local robustness repair task. We consider five distinct configurations: 50, 100, 200, 500 and 1000. The comparative analysis encompasses the performance of various baselines alongside our approach across these configurations. AdvRepair not only showcases effectiveness in limiting the number of buggy behaviors but also excels when faced with a growing number of buggy behaviors. In contrast, APRNN encounters challenges in successfully repairing the FNN_small model when the number of buggy behaviors exceeds 100, and REASSURE experiences memory overflow issues when confronted with 1000 buggy behaviors.
Provable DNN repair. Our approach falls within the category of provable DNN repair methods. The methods most closely related to ours are ART [27] and REASSURE [25]. In comparison to REASSURE, our approach fundamentally differs in its repair objectives. REASSURE focuses on fixing activation patterns of neural networks, while we directly guarantee the properties, whose input constraints may encompass a large number of activation patterns. In this way, our method exhibits superior generalization compared to REASSURE. Although ART also utilizes training in its repair process, it directly tunes parameters in the original neural network. In contrast, our approach involves training patch modules specifically for different properties, enhancing the specialization of our repairs and, consequently, their effectiveness. Other methods such as PRDNN [37] and APRNN [36] formulate the repair problem in linear programming. However, their effectiveness diminishes in properties with high-dimensional polytopes, such as the robustness of image classifications.
Heuristic DNN repair. Utilizing heuristic algorithms such as particle swarm optimization and differential evolution, CARE [22] and Arachne [39] aim to localize the neurons responsible for the fault. VeRe [40] focuses on providing formal verification guidance for fault localization and defining target intervals for repair synthesis. DeepRepair [19] and few-shot guided mix [20], for instance, expand the set of negative samples to generate additional training data. Tian et al. [41], on the other hand, augment the existing data by introducing real-world environmental effects, like fog, to the samples. To the best of our knowledge, DL2 [26] is the first paper to combine logical constraints and loss function, but it does not adopt convex approximation to give the loss function a certified guarantee.
DNN verification. In 2010, [42] proposed the first method for DNN verification. Over the past decade, various approaches of DNN verification has been proposed including techniques such as constraint solving [43]–[50], abstract interpretation [14], [51]–[55], linear relaxation [56]–[60], global optimisation [61]–[63], CEGAR [64]–[66], reduction to two-player games [67], [68], and star-set abstraction [69], [70]. These method offer provable estimations of DNN robustness. Moreover, statistical approaches, presented in [71]–[80], prove to be more efficient and scalable, particularly suited for intricate DNN structures, allowing for the establishment of quantifiable robustness at a specified confidence level. Certified training [81]–[83] uses convex approximation in training the loss function, but it is only used to enhance the robustness of DNN, without making it accessible to repair properties in the training procedure.
We introduce AdvRepair, a novel approach for addressing adversarial attacks using limited data. Our method provides provable and targeted repairs within the robustness neighborhood, enabling the generalization of this defense to other inputs. In terms of efficiency, scalability, and generalization, our approach surpasses existing methods. For future work, we plan to integrate BaB-based verification methods to enhance the performance of our approach, improving both precision and offering an alternative refinement approach. Additionally, our patch-based repair framework has the potential to evolve into a black-box repair method.
Theorem 2. Let \(\varphi=(N,B(x_i,r))\) be a local robustness property. If \(\mathcal{L}(\varphi) = 0\) on \(B(x_i,r)\), i.e., \[\mathcal{L}^*(\varphi):= \max (\max(\alpha_{\ell}^\mathrm{T},\boldsymbol{0}) \cdot (x_i+r\cdot\boldsymbol{1})+ \min(\alpha_{\ell}^\mathrm{T},\boldsymbol{0}) \cdot (x_i-r\cdot \boldsymbol{1}) + \beta_\ell,0),\] where \(\boldsymbol{0}\) and \(\boldsymbol{1}\) are the vector in \(\mathbb{R}^{n_0}\) with all the entries \(0\) and \(1\), respectively, then the property \(\varphi\) holds.
Proof. As \(\alpha_\ell^\mathrm{T} x + \beta_\ell\) is a linear function with respect to \(x\), we can calculate the maximum of \(\alpha_\ell^\mathrm{T} x + \beta_\ell\) on \({B(x_i,r)}\) as follows: \[\begin{align} &\max_{x\in B(x_i,r)}\alpha_\ell^\mathrm{T} x + \beta_\ell \\ = &\sum_{i \in \mathbb{R}^{n_1}}\mathbb{1}_{\{(\alpha_\ell)_{i} > 0\}}(\alpha_\ell)_{i} \max_{x\in B(x_i,r)}x + \sum_{i \in \mathbb{R}^{n_1}}\mathbb{1}_{\{(\alpha_\ell)_{i} < 0\}}(\alpha_\ell)_{i} \max_{x\in B(x_i,r)}x + \beta_\ell \\ = &\sum_{i \in \mathbb{R}^{n_1}}\mathbb{1}_{\{(\alpha_\ell)_{i} > 0\}}(\alpha_\ell)_{i} \cdot (x_i+r) + \sum_{i \in \mathbb{R}^{n_1}}\mathbb{1}_{\{(\alpha_\ell)_{i} < 0\}}(\alpha_\ell)_{i} \cdot (x_i-r) + \beta_\ell \end{align}\] Then we have \[\begin{align} & \mathcal{L}(\varphi)(x) = \sum_{\ell \ne \ell_0} \max (\alpha_\ell^\mathrm{T} x + \beta_\ell,0) = 0,\forall x \in B(x_i,r)\\ \iff &\sum_{\ell \ne \ell_0}\max(\max(\alpha_{\ell}^\mathrm{T},\boldsymbol{0}) \cdot (x+r\cdot\boldsymbol{1})+ \min(\alpha_{\ell}^\mathrm{T},\boldsymbol{0}) \cdot (x-r\cdot \boldsymbol{1}) + \beta_\ell,0) = 0\\ \iff &\mathcal{L}^*(\varphi) = 0.\\ \end{align}\] Therefore, if \(\mathcal{L}^*(\varphi) = 0\), we have \[\begin{align} \forall x \in B(x_i,r), C_N(x)_\ell-C_N(x)_{\ell_0} \le \alpha_\ell^\mathrm{T} x + \beta_\ell \leq 0, \end{align}\] then the property \(\varphi\) holds. 0◻ ◻
In this experiment, we assess the efficacy of size and quantity of patch modules in the context of repairing local robustness properties. We conduct a comparative analysis of AdvRepair’s performance across various sizes and quantities, presenting the experimental outcomes in Table 5. Two distinct patch module sizes, namely small and large, are considered. The small patch module consists of three hidden layers with 64 neurons in the first two layers and 16 neurons in the last layer, while the large patch module comprises three hidden layers with 256 neurons in the first two layers and 64 neurons in the last layer. Additionally, we explore two different numbers of patch modules: property-guided patch modules and untargeted label patch modules.
3pt
Model | \(r\) | \(n\) | RSR/% | RGR/% | DSR/% | DGSR/% | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4-19 | PS | PL | LS | LL | PS | PL | LS | LL | PS | PL | LS | LL | PS | PL | LS | LL | ||
FNN_ small | 0.05 | 50 | 100.0 | 100.0 | 100.0 | 99.3 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 90.0 | 90.0 | 89.9 | 89.8 |
100 | 100.0 | 100.0 | 100.0 | 99.3 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.4 | 96.4 | 96.1 | 96.0 | ||
200 | 100.0 | 100.0 | 100.0 | 99.3 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.6 | 96.6 | 96.6 | 96.6 | ||
500 | 100.0 | 100.0 | 100.0 | 99.3 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.6 | 96.6 | 96.6 | 96.6 | ||
1000 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.6 | 96.6 | 96.6 | 96.6 | ||
0.1 | 50 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.3 | 96.3 | 89.5 | 89.5 | |
100 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 95.0 | 96.3 | 94.7 | 94.8 | ||
200 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 95.2 | 95.1 | 94.9 | 94.8 | ||
500 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 95.3 | 95.2 | 96.3 | 96.3 | ||
1000 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.5 | 96.5 | 96.3 | 96.3 | ||
0.3 | 50 | 94.0 | 94.0 | 94.0 | 94.0 | 94.8 | 94.8 | 94.8 | 94.8 | 92.0 | 92.0 | 92.0 | 92.0 | 86.4 | 79.9 | 44.6 | 44.5 | |
100 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 100.0 | 99.0 | 99.0 | 100.0 | 96.2 | 96.2 | 85.8 | 86.1 | ||
200 | 99.5 | 99.5 | 99.5 | 99.5 | 99.5 | 99.5 | 99.5 | 99.5 | 100.0 | 99.0 | 97.5 | 98.0 | 90.8 | 91.2 | 85.9 | 85.9 | ||
500 | 99.4 | 99.4 | 99.8 | 99.8 | 99.5 | 99.5 | 99.8 | 99.8 | 99.8 | 99.6 | 100.0 | 100.0 | 92.0 | 92.1 | 89.5 | 89.7 | ||
1000 | 99.6 | 99.6 | 99.9 | 99.9 | 99.7 | 99.7 | 100.0 | 100.0 | 99.7 | 99.6 | 100.0 | 100.0 | 96.6 | 96.5 | 94.2 | 94.2 | ||
FNN_ big | 0.05 | 50 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.0 | 97.0 | 96.8 | 96.8 |
100 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.1 | 97.1 | 96.8 | 96.8 | ||
200 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.1 | 97.1 | 96.8 | 96.8 | ||
500 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.2 | 97.2 | 97.2 | 97.2 | ||
1000 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.2 | 97.2 | 97.2 | 97.2 | ||
0.1 | 50 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.1 | 97.1 | 96.8 | 96.8 | |
100 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.1 | 97.2 | 96.8 | 96.9 | ||
200 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.1 | 97.1 | 96.9 | 96.8 | ||
500 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.2 | 97.2 | 96.8 | 96.8 | ||
1000 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.2 | 97.2 | 97.2 | 97.0 | ||
0.3 | 50 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.0 | 96.1 | 89.1 | 89.1 | |
100 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.7 | 96.8 | 89.1 | 89.1 | ||
200 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.2 | 97.2 | 89.3 | 89.3 | ||
500 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 97.2 | 97.2 | 95.9 | 95.9 | ||
1000 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.9 | 100.0 | 100.0 | 97.2 | 97.2 | 97.1 | 97.1 | ||
CNN | 0.05 | 50 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 96.6 | 96.6 | 96.6 | 96.7 |
100 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.3 | 98.3 | 98.3 | 98.3 | ||
200 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.3 | 98.3 | 98.3 | 98.3 | ||
500 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.3 | 98.3 | 98.3 | 98.3 | ||
1000 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.3 | 98.3 | 98.3 | 98.3 | ||
0.1 | 50 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.2 | 98.2 | 98.0 | 98.0 | |
100 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.2 | 98.2 | 98.1 | 98.0 | ||
200 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.5 | 99.5 | 99.5 | 98.2 | 98.2 | 98.0 | 98.1 | ||
500 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.8 | 99.8 | 99.8 | 98.2 | 98.2 | 98.0 | 98.1 | ||
1000 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.8 | 99.8 | 99.8 | 98.3 | 98.3 | 98.0 | 98.0 | ||
0.3 | 50 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.3 | 98.3 | 98.3 | 98.3 | |
100 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.3 | 98.3 | 98.3 | 98.3 | ||
200 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.3 | 98.3 | 98.3 | 98.3 | ||
500 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.3 | 98.3 | 98.3 | 98.3 | ||
1000 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 98.2 | 98.2 | 98.3 | 98.3 |
The performance of AdvRepair is evaluated across different combinations of patch module scales and quantities, including small property-guided patch module (PS), large property-guided patch modules (PL), small untargeted label repairing patch modules (LS), and large untargeted label repairing patch modules (LL). Notably, the experimental findings indicate that both sizes of the patch modules exhibit commendable performance, with minimal impact observed on the repair performance due to the network size. Concerning the quantity of patch modules, the property-guided patch modules outperform the untargeted label repairing patch modules in defending against adversarial attacks within \(D_t\). We hypothesize that this superiority is attributed to the specialization inherent in property-guided repair.
Name | Accuracy/% | Model structure |
---|---|---|
FNN_small | 96.6 | linear layer of 50 hidden units |
linear layer of 50 hidden units | ||
linear layer of 50 hidden units | ||
linear layer of 50 hidden units | ||
linear layer of 32 hidden units | ||
linear layer of 10 hidden units | ||
FNN_big | 98.3 | linear layer of 200 hidden units |
linear layer of 200 hidden units | ||
linear layer of 200 hidden units | ||
linear layer of 200 hidden units | ||
linear layer of 32 hidden units | ||
linear layer of 10 hidden units | ||
CNN | 97.2 | Conv2d(1, 16, 4, stride=2, padding=1) |
linear layer of 100 hidden units | ||
linear layer of 10 hidden units |
The network architectures and accuracies of the DNN trained on MNIST are detailed in Table 6. For CIFAR-10, the accuracies of VGG19 and ResNet18 are 93.4% and 88.3%, respectively. In Algorithm 2, the maximum number of iterations \(M\) is set to 25. In Algorithm 3, the maximum number of epochs \(R\), learning rate \(\eta\), and selection number \(K\) are set to 10, 10, and 800, respectively. The attack methods in AutoAttack have a maximum of 10 iterations, and the number of restarts is set to 1.
AdvRepair is available at an anonymous link: https://drive.google.com/drive/folders/1NU170UZ0XmYKgg6diKZc-XMcqsUAVEib?usp=drive_link↩︎