Property Testing is a formal framework to study the computational power and complexity of sampling from combinatorial objects. A central goal in standard graph property testing is to understand which graph properties are testable with sublinear query complexity. Here, a graph property \(P\) is testable with a sublinear query complexity if there is an algorithm that makes a sublinear number of queries to the input graph and accepts with probability at least \(2/3\), if the graph has property \(P\), and rejects with probability at least \(2/3\) if it is \(\varepsilon\)-far from every graph that has property \(P\).
In this paper, we introduce a new variant of the bounded degree graph model. In this variant, in addition to the standard representation of a bounded degree graph, we assume that every vertex \(v\) has a unique label \(\mathop{\mathrm{\mathfrak{num}}}(v)\) from \(\{1, \dots, |V|\}\), and in addition to the standard queries in the bounded degree graph model, we also allow a property testing algorithm to query for the label of a vertex (but not for a vertex with a given label).
Our new model is motivated by certain graph processes such as a DFS traversal, which assign consecutive numbers (labels) to the vertices of the graph. We want to study which of these numberings can be tested in sublinear time. As a first step in understanding such a model, we develop a property testing algorithm for discovery times of a DFS traversal with query complexity \(O(n^{1/3}/\varepsilon)\) and for constant \(\varepsilon>0\) we give a matching lower bound.
Depth-first search (DFS) is one of the most useful and frequently used algorithmic primitives in graph algorithms. The DFS algorithm has been known for well over a century [1], [2] as a technique for threading mazes and has been widely used in the design of graph and network algorithms since the late 1950s. DFS is a basic tool to traverse a graph in a structured way and is central to solve textbook problems such as connectivity, topological sorting [3], determining strongly connected components in directed graphs [4], biconnected components in undirected graphs [4], and to test planarity [5]. Because of its versatility and usefulness for solving many graph problems, DFS has become one of the most important tools in the design of algorithms for graphs, taught in the first year of many computing science study programs (see, e.g., the classical textbooks [6], [7]). One of the most useful combinatorial properties of DFS is the DFS numbering of vertices, which is the order in which a DFS algorithm discovers all vertices of the input graph [4]. (See also Algorithms [alg:DFS-numbering-def] and [alg:DFS-numbering-def-rec], where \(N(v)\) denotes the set of neighbors of vertex \(v\).)
We remark that sometimes, e.g., in the widely used textbook by Cormen et. al.(see Chapter 20.3 in [6]), the DFS numbering is also used together with finishing numbers (cf. 8). We also remark that the labeling is not unique and depends on the ordering in which the vertices are traversed in the for-loops of Algorithms [alg:DFS-numbering-def] and [alg:DFS-numbering-def-rec].
Definition 1 (DFS numberings). Let \(G = (V,E)\) be a labeled undirected graph on \(n\) vertices. A labeling \(\mathop{\mathrm{\mathfrak{num}}}: V \rightarrow \{1,\dots,n\}\) is called a DFS numbering of \(G\)* if DFS gives rise to it for some order of processing of the vertices in the for-loops of Algorithms [alg:DFS-numbering-def] and [alg:DFS-numbering-def-rec].*
Given a wide applicability of DFS and DFS numbering, a natural problem is to verify whether a given graph is correctly DFS numbered, i.e., the labels assigned to the vertices are a DFS numbering for some ordering of vertices and neighbors. The latter problem is easy to solve by simulating a DFS using the given numbering and locally verifying that the DFS is performed correctly. However, is it also possible to approximately verify whether the graph is properly DFS numbered using a sampling based algorithm? An answer to this question sheds some light on how the local structure of a DFS numbered graph (as implicitly given by the sample distribution) is related to the global DFS numbering.
We will study this question in the framework of Property Testing (see, e.g., the monographs [8], [9]). Property Testing provides a formal setting to study approximate decision problems. The goal is to distinguish objects that satisfy the tested property from those that are \(\varepsilon\)-far from every object that satisfies the property. Here, \(\varepsilon\)-far means that one has to modify more than an \(\varepsilon\)-fraction of the object’s description to obtain an object that has the property. A property testing algorithm requires some form of sampling access to the object.
In this paper, we will introduce a variant of the standard bounded-degree graph model [10] that in addition to the standard setting, allows also label queries. We will assume that there exists some labeling that assigns unique labels from the set \(\{1, \dots, |V|\}\) to the vertex set \(V\), i.e., the labeling is a bijection \(\mathop{\mathrm{\mathfrak{num}}}: V \rightarrow \{1, \dots, |V|\}\). We say that a graph with maximum degree \(d\) is \(\varepsilon\)-far from a DFS numbered graph, if we have to modify (insert or delete) more than \(\varepsilon|V|\) edges to obtain a graph that is correctly DFS numbered5. We do not allow to modify labels, though we notice that allowing to modify labels would also be a valid model (see 2.1 for some discussion). Our motivation to not consider it here was merely to stick as closely to the standard testing model as possible.
In this new framework, we study a fundamental problem of whether the input labeled undirected graph is properly DFS numbered or it is \(\varepsilon\)-far from having a valid DFS numbering. Our main result is a tight complexity bound for this task (for constant \(d\) and \(\varepsilon\)). First, in 4, we show that any tester for the DFS numbering requires \(\Omega(n^{1/3})\) queries, and then, we complement this bound in 5 with an algorithm that tests DFS numbering with \(O(n^{1/3}/\varepsilon)\) queries.
The proof of the lower bound (2 in 4) is by constructing two families \((G_n)_{n \in \mathbb{N}}\) and \((B_n)_{n \in \mathbb{N}}\) of good and bad random labeled graphs for which we will show that distinguishing between these families is necessary for any DFS tester. Then we will show that distinguishing between these families requires \(\Omega(n^{1/3})\) queries.
The proof of the upper bound (3 in 5) relies on a characterization of labelings that are \(\varepsilon\)-far from a valid DFS numbering. The characterization is described in a form of some conflicts between the labelings and we design two algorithms detecting such conflicts: one for local conflicts and one for global conflicts. By combining these two algorithms we will obtain an algorithm that tests DFS numbering with \(O(n^{1/3}/\varepsilon)\) queries.
Property Testing was introduced by Rubinfeld and Sudan [11] and first studied in the dense graph setting by Goldreich and Ron [12]. Constant time testability in the dense graph model has been fairly well understood [13] and is closely related to the regularity lemma. In our paper we build upon the bounded degree graph model, as introduced by Goldreich and Ron [10]. First results in this model included testers for properties such as connectivity, \(k\)-connectivity and being Eulerian [10]; bipartiteness [14] and cycle freeness [15] can be tested in the bounded degree graph model in \(O(\frac{\sqrt{n}}{\varepsilon^{O(1)}})\) time. A series of papers proved that every hyperfinite property of bounded degree graphs is testable in constant time [16]–[19]. Also, every constant time testable property in bounded degree graphs is either finite or contains a hyperfinite property [20]. Furthermore, it is known that every first-order logic property of bounded degree graph with an \(\exists \forall\) quantification is constant time testable while some properties with a \(\forall \exists\) quantification are not [21]. A number of other questions have been studied in the bounded degree graph model such as testability using proximity oblivious testers [22]. Further works in the bounded degree graph model include, e.g., testability using proximity oblivious testers [22].
In this paper (except for 7), let \(G = (V,E)\) denote an undirected graph with \(n = |V|\) vertices and \(m = |E|\) edges. Let \(N(v)\) denote the set of neighbors of \(v\), i.e., \(N(v) : = \{u \in V: \{v,u\} \in E\}\), and \(d\) be an upper bound on the maximum number of edges incident to any vertex in \(V\). We will assume that \(G\) is a labeled graph, i.e., there is a bijection (permutation) \(\mathop{\mathrm{\mathfrak{num}}}: V \rightarrow [n]\) that assigns numbers to the vertices of \(G\) and we use \([n]:= \{1,\dots,n\}\).
In this section we describe how our algorithm can access the input graph \(G = (V,E)\) of maximum degree \(d\) with vertex labeling \(\mathop{\mathrm{\mathfrak{num}}}\). Our model is a slight modification of the standard bounded degree graph model [10] that in addition allows access to labels. As usual in this model, we assume that \(V=[n]\) and \(n\) as well as \(d\) are given to the algorithm. The algorithm can ask the following two types of queries and receives an answer in constant time:
Neighbor queries: for every vertex \(v \in V\) and every \(1 \le i \le d\), one can query the \(i\)th neighbor of vertex \(v\).
Label queries: for every vertex \(v \in V\), one can query the label of \(v\), \(\mathop{\mathrm{\mathfrak{num}}}(v)\).
Observe that we allow access to vertices and their neighbors, and we can check the label \(\mathop{\mathrm{\mathfrak{num}}}(v)\) of any vertex \(v\), but we have no access to vertices through their labels \(\mathop{\mathrm{\mathfrak{num}}}\). In particular, to access vertex \(v\) with a given label \(\mathop{\mathrm{\mathfrak{num}}}\), we have to query the oracle until such vertex is returned. This is a special feature distinguishing our model from the standard model used in graph property testing and making the problem challenging: we have a labeled graph, but we cannot access the graph (its vertices) through the labels! Instead, the only way to access the graph is by taking any vertex \(v \in V = [n]\) and either querying its label \(\mathop{\mathrm{\mathfrak{num}}}(v)\) (via the label query) or querying its \(i\)th neighbor (via a neighbor query).
The query complexity of a testing algorithm is the number of oracle queries.
A property testing algorithm (in short, property tester) for DFS numbered graphs is an algorithm that has access to an input graph as described above and that accepts the input with probability at least \(2/3\), if \(G\) is a graph with \(\mathop{\mathrm{\mathfrak{num}}}\) being a valid DFS numbering. The algorithm rejects \(G\) with probability at least \(2/3\) if \(G\) is \(\varepsilon\)-far from having a valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\) according to the following definition. (The algorithm developed in this paper is a property tester with one-sided error, i.e., it always accepts, if \(\mathop{\mathrm{\mathfrak{num}}}\) is a DFS numbering.)
Definition 2 (\(\varepsilon\)-far from DFS numberings in bounded degree graphs). Let \(G = (V,E)\) be an undirected graph on \(n\) vertices with maximum degree at most \(d\) and let \(\mathop{\mathrm{\mathfrak{num}}}: V \rightarrow [n]\) be a bijection that assigns labels to \(V\). We say the labeled graph \(G\) \(\mathop{\mathrm{\mathfrak{num}}}\) is \(\varepsilon\)-far from having a valid DFS numbering, if one has to modify6 more than \(\varepsilon n\) edges in \(G\) to obtain a graph \(G' = (V,E')\) of maximum degree at most \(d\) for which \(\mathop{\mathrm{\mathfrak{num}}}\) is a valid DFS numbering.
To simplify the presentation, we will often assume \(\mathop{\mathrm{\mathfrak{num}}}(v) = v\). However, we will not use this knowledge in the algorithm as the model prevents us from querying \(\mathop{\mathrm{\mathfrak{num}}}^{-1}\). Our tester will only ask for a random vertex or for a vertex that occurred as the neighbor of a previously queried vertex.
Observe that in general, it might be natural to consider a revised definition of a labeling \(\mathop{\mathrm{\mathfrak{num}}}\) being \(\varepsilon\)-far from a valid DFS numbering, where while defining graph \(G'\) in 2, in addition to edge modifications, one would allow also for modifications of the labels7. We observe that for bounded degree graphs, adding the labels modifications does not change the problem significantly.
Lemma 1. Let \(\mathop{\mathrm{\mathfrak{num}}}\) be a permutation of \(V\) to \(\{1, \dots, n\}\). For a given bounded degree graph \(G = (V,E)\), a labeling \(\mathop{\mathrm{\mathfrak{num}}}\) is \(\varepsilon\)-far from a valid DFS numbering according to the edges modifications if and only if \(\mathop{\mathrm{\mathfrak{num}}}\) is \(\Theta(\varepsilon)\)-far from a valid DFS numbering according to the edges and labels modifications.
Proof. Observe that the number of modifications of the edges to obtain a valid DFS numbering is not smaller than the number of modifications of the edges and the labels to obtain a valid DFS numbering. Therefore, if for a given bounded degree graph \(G\), a labeling \(\mathop{\mathrm{\mathfrak{num}}}\) is \(\varepsilon\)-far from a valid DFS numbering according to the edges and labels modifications then \(\mathop{\mathrm{\mathfrak{num}}}\) is also \(\varepsilon\)-far from a valid DFS numbering according to the edges modifications.
For the other direction, if \(\mathop{\mathrm{\mathfrak{num}}}\) is \(\varepsilon\)-far from a valid DFS numbering according to the edges modifications then we can simulate modification of the labels by modification of the edges: to assign a given label to a vertex we just remove all its incident edges and add all edges incident to the vertex with the label sought. (Observe that a vertex might have had some edges of its own that have to be removed but by correcting the labels one by one, we can ensure that once we fix vertex \(v\) by assigning it to label \(i\) with vertex \(u\) having label \(i\) before, then later we will have to fix the label of vertex \(u\), resolving the issue.) Observe that if we apply this operation to all vertices with the labels to be changed, then we do not increase the maximum degree, and hence, modifying of \(k\) labels can be simulated by modifying at most \(2dk\) edges. Therefore, if a labeling \(\mathop{\mathrm{\mathfrak{num}}}\) is \(\varepsilon\)-far from a valid DFS numbering according to the edges modifications then \(\mathop{\mathrm{\mathfrak{num}}}\) is also \((2d\varepsilon)\)-far from a valid DFS numbering according to the edges and labels modifications. ◻
Our model assumes that the input labeling \(\mathop{\mathrm{\mathfrak{num}}}\) is a permutation (bijection) of \(V\) to \(\{1\,\dots,n\}\), but it may be natural to consider the case when \(\mathop{\mathrm{\mathfrak{num}}}\) is not necessarily a permutation but rather an arbitrary function from \(V\) to \(\{1\,\dots,n\}\), allowing repetitions of labels. Observe that already the simple problem of testing if \(\mathop{\mathrm{\mathfrak{num}}}\) is a permutation or is \(\varepsilon\)-far from being a permutation (also known as the element distinctness problem) is known to require \(\Theta(\sqrt{n}/\varepsilon)\) queries (see, e.g., [23]). In view of that bound, the best what we could hope for without assuming that \(\mathop{\mathrm{\mathfrak{num}}}\) is a permutation of \(V\) to \(\{1\,\dots,n\}\) is to achieve query complexity of \(\Theta(\sqrt{n}/\varepsilon)\). And this can be easily obtained by combining our algorithm in 3 with the known algorithms testing element distinctness, leading to a testing algorithm with \(\Theta(\sqrt{n}/\varepsilon)\) queries.
We begin with a review of basic DFS terminology, introduce some notions of our own, and make a few useful observations. (We also refer to standard textbooks, e.g., [6], [7].)
The DFS algorithm, as given in Algorithm [alg:DFS-numbering-def], numbers every vertex even if \(G\) is disconnected due to the outer loop over all vertices. Every vertex is initially undiscovered. When DSFVisit(\(v\)) is called, \(v\) becomes discovered. We say \(v\) is active as long as the call DSFVisit(\(v\)) persists and is finished as soon as it ends. The vertex \(p\) that initiates the call of DSFVisit(\(v\)) is the parent of \(v\). If the call of DSFVisit(\(v\)) is initiated from the outer loop then we say that \(v\) is an orphan with virtual parent \(0\). The idea is that the outer loop iterating over all vertices is like a call to DSFVisit(\(0\)) where \(0\) is a virtual vertex connected to all other vertices. Each connected component of \(G\) has exactly one orphan, which receives the smallest number in its connected component. At any point during the execution the white path consists of all active vertices including the virtual vertex \(0\), with every active vertex (other than \(0\)) connected to its parent. The DFS numbers appear in increasing order along the white path. In an implicitly labeled graph \(G = (V, E)\), we define \(p(v)\) for \(v \in V\) as \[\begin{align} p(v) := \begin{cases} \max\{u \in N(v) \cap [v-1]\} & \text{ if } N(v) \cap [v-1] \ne \emptyset \enspace,\\ 0 & \text{ otherwise.} \end{cases} \end{align}\]
Claim 1. If the numbers in an implicitly labeled graph correspond to a DFS numbering then \(p(v)\) is the (possibly virtual) parent of \(v\).
Proof. If \(v\) is an orphan then it has the smallest number in its connected component so \(\{u \in N(v): u < v\} = \emptyset\). Hence \(p(v) = 0\), which is the virtual parent of orphans by definition.
Now assume \(v\) was discovered from a non-virtual parent \(x \in V\). Then \(x \in N(v)\) with \(x < v\) so \(x \in \{u \in N(v): u < v\}\). To see that \(x\) is the maximum of \(\{u \in N(v): u < v\}\), consider \(x' \in \{x+1,\dots,v-1\}\). Immediately before \(v\) is discovered, \(x\) is at the end of the white path, hence the largest active vertex. Since \(x' > x\) has been discovered, \(x'\) must be finished. Hence all neighbours of \(x'\) have been discovered. Since \(v\) is not discovered we have \(x \notin N(v)\) so \(x \notin \{u \in N(v): u < v\}\). ◻
Observe that any edge \(e=\{u,v\}\) with \(u < v\) in an implicitly labeled graph corresponds to an interval \([u,v]\) representing a range of elements with respect to the DFS numbering. The analysis of the interrelation between such intervals plays a central role in our analysis.
Definition 3. A pair \((v,\{u,w\}) \in V \times E\) is a conflicting pair if \(p(v) < u < v < w\).
The following central lemma shows that the absence of conflicting pairs is necessary and sufficient for the validity of a DFS numbering.
Lemma 2. Let \(G = (V,E)\) be an implicitly labeled graph. The following are equivalent.
\(G\) has a valid DFS numbering.
There exists no conflicting pair.
Proof.
Consider a DFS run that agrees with the DFS numbering and let \((v,\{u,w\}) \in V \subseteq E\) be a pair with \(p(v) < u < v\). By 1, \(v\) was discovered from \(p(v)\). Consider the time immediately before this happens. Vertex \(u\) was already discovered (due to \(u < v\)) and is not active (since \(p(v) < u\) is the end of the white path and \(p(v)\) is the largest active vertex), meaning \(u\) is finished. Thus all neighbors of \(u\) are already discovered, so \(w < v\). In particular \((v,\{u,w\})\) is not a conflicting pair.
We use induction on \(k\), showing that there is a DFS run assigning number \(v\) to each \(v \in \{1,\dots,k\}\). For \(k = 0\) the claim is trivial. Assume the claim holds for some \(k-1\). Consider a corresponding DFS run that has just discovered vertex \(k-1\); at that moment vertex \(k\) is undiscovered. We have \(0 \le p(k) \le k-1\). If \(p(k)\) does not appear on the white path then, because both \(0\) and \(k-1\) appear on the white path, there is a vertex \(x\) on the white path with \(p(x) < p(k) < x < k\). This makes \((x,\{p(k),k\})\) a conflicting pair, contradicting (ii). Therefore \(p(k)\) does appear on the white path. There might be a vertex \(u > p(k)\) further along on the white path. Let \(w\) be a neighbor of \(u\). Due to \(p(k) < u < k\) we must have \(w \le k\), for otherwise \((k,\{u,w\})\) would be a conflicting pair. Moreover we have \(w \neq k\) by definition of \(p(k)\). Thus \(w \le k-1\), meaning \(w\) was already discovered. In particular the DFS will backtrack from active vertices until it reaches \(p(k)\), from which it may legally choose to discover \(k\) next. This concludes the induction.
◻
In this section, we prove our first main result.
Theorem 2. Every property tester for the property of having a valid DFS-labeling has a query complexity of \(\Omega(n^{1/3})\).
Let \(\varepsilon> 0\) be sufficiently small (any \(\varepsilon\le \frac{1}{33}\) would do). The proof of 2 is by constructing two families \((G_n)_{n \in \mathbb{N}}\) and \((B_n)_{n \in \mathbb{N}}\) of good and bad random labeled graphs for which we will show that distinguishing between these families is necessary for any DFS-tester. Then we will show that distinguishing between these families requires \(\Omega(n^{1/3})\) queries.
Let \(n, N \in \mathbb{N}\). Each graph \(G_n\) and \(B_n\) consists of \(\lfloor\frac{n}{8N}\rfloor\) arms of \(8N\) vertices each. The roots of these arms are connected with some binary tree that is the same for \(G_n\) and \(B_n\).
Figure 2: Four types of arms \((G1)\), \((G2)\), \((B1)\), and \((B2)\) used in our construction of \((G_n)_{n \in \mathbb{N}}\) and \((B_n)_{n \in \mathbb{N}}\). Each arm starts with a root, denoted by \(\blacksquare\). Each link corresponds to a path on \(N\) vertices and labels \(iN+1, \dots, i(N+1)\) ascending in the direction of the arrow (see Figure 3 (a)). We also have a comb graph (see Figure 3 (b)), which is obtained from by adding \(N\) new vertices, adding a matching between the new vertices and the vertices from , and labeling the new vertices with \(jN+1, \dots, (j+1)N\), this time descending in the direction of the arrow (i.e., the vertex with label \(iN+k\) is adjacent to the vertex with label \(jN+N-k+1\))..
There are four types of arms, as described in details in Figure 2.
Figure 3: Graphs corresponding to (a) a path and (b) a comb graph ..
Each arm of \(G_n\) is a copy of \((G1)\) or \((G2)\) chosen independently and uniformly at random. Similarly, each arm of \(B_n\) is a copy of \((B1)\) or \((B2)\) chosen independently and uniformly at random.
The labeling of \(G_n\) is then obtained by starting in the root of the tree \(T\) and using within the arms the relative ordering defined by the numbering of the arms \((G1)\) or \((G2)\). For an example, see, e.g., Figure 4. (Since each arm has the same number of vertices, the choice of one arm, whether in \((G1)\) or \((G2)\), does not affect the numbering of other arms.)
Similarly (see Figure 5), the labeling of \(B_n\) is defined by starting at a root of \(T\) and using within the arms the relative ordering defined by the numbering of the arms \((B1)\) or \((B2)\).
Figure 4: An example of labeling of \(G_n\) with \(n=64\), \(N = 4\), using \(\lfloor\frac{n}{8N}\rfloor = 2\) arms of size \(8 \cdot 4 = 32\). Vertices with labels 1, 2, 34 are in the tree \(T\), the top branch (with labels 2–33) corresponds to arm \((G2)\) and the bottom branch (with labels 34–65) corresponds to arm \((G1)\). For each arm, we also marked the corresponding parts defining it..
Figure 5: An example of labeling of \(B_n\) with \(n=64\), \(N = 4\), using \(\lfloor\frac{n}{8N}\rfloor = 2\) arms of size \(8 \cdot 4 = 32\). Vertices with labels 1, 2, 34 are in the tree \(T\), the top branch (with labels 2–33) corresponds to arm \((B2)\) and the bottom branch (with labels 34–65) corresponds to arm \((B1)\). For each arm, we also marked the corresponding parts defining it..
Let us first notice that our construction of good random labeled graphs \((G_n)_{n \in \mathbb{N}}\) easily ensures that they have valid DFS numberings (with probability 1).
Lemma 3. \(G_n\) has a valid DFS numbering. 0◻
A situation is different for bad random labeled graphs \((B_n)_{n \in \mathbb{N}}\). While the arms of type \((B1)\) also locally maintain a valid DFS-order, the arms of type \((B2)\) do not (because of the two comb graphs). As the result, the construction of \((B_n)_{n \in \mathbb{N}}\) typically gives labelings that are \(\varepsilon\)-far from valid DFS numberings.
Lemma 4. Let \(0 \le \varepsilon\le \frac{1}{33}\) and let \(n\) be sufficiently large. Then \(B_n\) has a labeling that is \(\varepsilon\)-far from a valid DFS numbering with probability \(1 - o(1)\).
Proof. Consider the numbering for \((B2)\). Let us fix any \(k \in \{1,2,\dots,N\}\) and focus on vertices \(p_1, c_1, p_2, c_2\) with labels \(2N+k, 7N-k+1, 4N+k, 8N-k+1\), respectively. (For example, in Figure 5, in the top branch (where numbers have an offset of \(1\)), if we took \(k=2\) then we would have \(\langle p_1, c_1, p_2, c_2 \rangle\) to be vertices with labels \(\langle 11, 28, 19, 32 \rangle\) in \(B_{64}\).) Observe that the fact that \(\mathop{\mathrm{\mathfrak{num}}}(p_1) < \mathop{\mathrm{\mathfrak{num}}}(p_2) < \mathop{\mathrm{\mathfrak{num}}}(c_1) < \mathop{\mathrm{\mathfrak{num}}}(c_2)\) implies that this is not a valid DFS numbering as long as \(c_1\) is the DFS-child of \(p_1\) and \(c_2\) is the DFS-child of \(p_2\). Therefore, to turn the labeled graph into one consistent with DFS numbering, one has to add or delete at least one edge incident to a vertex from \(\{p_1,c_1,p_2,c_2\}\). Since such a quadruple is obtained for each \(k \in \{1,\dots,N\}\) and since each of these quadruples is disjoint (for example, in Figure 5, the there are four such quadruples formed by vertices with labels \(\langle 10, 29, 18, 33 \rangle\), \(\langle 11, 28, 19, 32 \rangle\), \(\langle 12, 27, 20, 31 \rangle\), and \(\langle 13, 26, 21, 30 \rangle\)), one needs to modify at least \(\frac{N}{2}\) edges to obtain a valid DFS numbering. Since the expected number of copies of \((B2)\) in \(B_n\) is \(\frac{1}{2} \cdot \lfloor\frac{n}{8N}\rfloor\), the expected number of changes required in \(B_n\) to obtain a valid DFS numbering is at least \(\frac{N}{4} \cdot \lfloor\frac{n}{8N}\rfloor\). For sufficiently large \(n\), a standard Chernoff bound implies that graph \(B_n\) is \(\frac{1}{33}\)-far from having a valid DFS numbering with high probability. ◻
Our next central lemma provides a lower bound for the number of queries of any algorithm ALG that distinguishes between the families of random labeled graphs \((G_n)_{n \in \mathbb{N}}\) and \((B_n)_{n \in \mathbb{N}}\). We will assume that the input \(I_n\) for \(n \in \mathbb{N}\) on which ALG requires \(q_n\) oracle queries is obtained by first selecting a random bit \(b \in \{0,1\}\) and then setting \(I_n = \begin{cases}G_n & \text{ if } b = 0, \\ B_n & \text{ if } b = 1.\end{cases}\)
Lemma 5. Let \(0 < \xi \le \frac{N^3}{n}\). For every randomized algorithm ALG’ that receives \(I_n\) as input, performs \(q_n \le \sqrt{\frac{\xi n}{4N}}\) queries, and outputs a bit \(b' \in \{0,1\}\), we have \(\Pr[b = b'] \le \frac{1}{2} + \xi\).
Proof. Let ALG’ be an algorithm that receives \(I_n\) as its input. We assume that ALG’ can query the oracle for any vertex \(u\) by submitting an ID of \(u\), and the oracle returns the label \(\mathop{\mathrm{\mathfrak{num}}}(u)\) of \(u\), and the IDs of all neighboring vertices. Further, without loss of generality, we assume that the IDs \(1, \dots, n\) are randomly assigned to the vertices of the graph. Therefore ALG’ is limited to querying for a random vertex (a random query) or for a vertex that has been previously found as a neighbor of an earlier queried vertex (an explorative query).
Without loss of generality, we can assume that the vertices selected by random queries are chosen in the order \(v_1, v_2, \dots\) before the run of ALG’; let \(V_R = \{v_1, \dots, v_{q_n}\}\) be the sequence of these first \(q_n\) random vertices (ALG can select only these vertices). Let \(\mathcal{E}\) be the random event that no two vertices \(u, u' \in V_R\) come from the same arm of \(I_n\). We prove the following: \[\begin{align} \Pr[\neg\mathcal{E}] \le \xi \tag{1} \\ \Pr[b = b' | \mathcal{E}] = \tfrac12 \tag{2} \end{align}\]
Observe that the two inequalities (1 )–(2 ) imply 5: \[\begin{align} \Pr[b = b'] &= \Pr[b = b' | \mathcal{E}] \cdot \Pr[\mathcal{E}] + \Pr[b = b' | \neg\mathcal{E}] \cdot \Pr[\neg\mathcal{E}] \le \tfrac12 \cdot 1 + 1 \cdot \xi \le \tfrac12 + \xi \enspace. \end{align}\]
In order to prove inequality (1 ), let us first define \(C\) to be the number of pairs \(i \ne j\) with \(i, j \le q_n\) such that both \(v_i\) and \(v_j\) come from the same arm of \(I_n\), or in other words, so that \(v_i\) and \(v_j\) belong to the same copy of arm \((B1)\). We have, \[\begin{align} \mathbb{E}[C] &= \!\!\!\!\!\sum_{1 \le i < j \le q_n} \!\!\!\!\!\Pr[\text{v_i and v_j are in the same arm}] \le \binom{q_n}{2} \cdot \frac{8N}{n} \le \frac{4 q_n^2 N}{n} \le \frac{4 (\frac{\xi n}{4N}) N}{n} \le \xi \enspace. \end{align}\] Therefore we can conclude (1 ) by Markov’s inequality: \(\Pr[\neg\mathcal{E}] = \Pr[C \ge 1] \le \mathbb{E}[C] \le \xi\).
Next, we want to prove inequality (2 ). Let \(V_R^*\) be the set of vertices reachable from vertices in \(V_R\) in at most \(q_n\) steps, that is, \(V_R^* = \{ u: \exists_{1 \le i \le q_n} \mathop{\mathrm{dist}}(v_i,u) \le q_n\}\). Let \(I_n(V_R^*)\) be the subgraph of \(I_n\) induced by the vertex set \(V_R^*\). We observe that ALG’ will make its decision solely on seeing some subgraph of \(I_n(V_R^*)\). Hence, the output \(b'\) of ALG’ is a (random) function of \(I_n(V_R^*)\). Now, we claim that conditioned on \(\mathcal{E}\), the random variables \(b\) and \(I_n(V_R^*)\) are stochastically independent, which in turn, would imply identity (2 ).
To prove that \(b\) and \(I_n(V_R^*)\) are independent, let us consider a single vertex \(v_i\) from \(V_R\) and the subgraph \(I_n(v_i)\) induced by vertices with distance at most \(q_n\) from \(v_i\). If \(I_n(v_i)\) contains at least one vertex from \(T\), then \(I_n(v_i)\) consists solely of some of the vertices from \(T\) and some vertices that are close to the roots of some of the arms (within parts denoted by \(\xrightarrow{0}\) of the arms, since each such part has length \(N\) and we have \(q_n \le \frac{N}{2}\) since \(q_n \le \sqrt{\frac{\xi n}{4N}}\) and \(\xi \le \frac{N^3}{n}\)). Since these parts are identical in \(G_n\) and \(B_n\), such paths share no information on \(b\).
Otherwise, if \(I_n(v_i)\) contains no vertex from \(T\), then \(I_n(v_i)\) is a part of an arm of \(I_n\). But then, due to \(q_n \le \frac{N}{2}\), at most one joint8 of this arm is contained in \(I_n(v_i)\). Since the labels of the roots of the arms are the same in \(G_n\) and \(B_n\), it is always well-defined what the offset of a label within its arm is. For more details, see 9 and Figure 8 therein. ◻
By 3 4, any algorithm ALG that accepts a labeled graph with a valid DFS numbering with probability at least \(\frac{2}{3}\) and rejects a labeled graph with a DFS numbering that is \(\varepsilon\)-far (for \(0 < \varepsilon\le \frac{1}{33}\)) from being valid DFS numbering with probability at least \(\frac{2}{3}\), must be able to distinguish with probability at least \(\frac{5}{8}\) between the families of good and bad random labeled graphs \((G_n)_{n \in \mathbb{N}}\) and \((B_n)_{n \in \mathbb{N}}\). However by 5, by setting \(\xi = \frac{1}{8}\) and \(N = \lfloor n^{1/3} \rfloor\), the tasks of distinguishing between these families requires \(\Omega(n^{1/3})\) queries. 0◻
Our second main result shows that the lower bound in 2 is asymptotically tight.
Theorem 3. Let \(0 < \varepsilon< 1\). There is an algorithm that with oracle access to a labeled bounded degree undirected graph \(G\) on \(n\) vertices, performs \(O(n^{1/3}/\varepsilon)\) queries to the oracle and accepts, if \(G\) has a valid DFS numbering, and rejects with probability at least \(\frac{2}{3}\), if \(G\) is \(\varepsilon\)-far from having a valid DFS numbering.
Our tester in 3 has one-sided error and always accepts a valid DFS numbering. It achieves not only \(O(n^{1/3}/\varepsilon)\) query complexity, but also it has \(O(n^{1/3}/\varepsilon)\) running time, see 5.5.
For the rest of the section, let \(G\) be a corresponding labeled graph equipped with a possibly inappropriate DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}: V \rightarrow [n]\). As in 3, we assume that \(\mathop{\mathrm{\mathfrak{num}}}\) is implicit so that we can, for instance, write \(v < w\) for \(v,w \in V\) instead of \(\mathop{\mathrm{\mathfrak{num}}}(v) < \mathop{\mathrm{\mathfrak{num}}}(w)\).
Our approach to prove 3 is first to extend 2 (which characterizes valid DFS numberings) to describe a simple and useful property of labelings that are \(\varepsilon\)-far from a valid DFS numbering. In 7 in 5.1, we will show that if the numbering is \(\varepsilon\)-far from a valid DFS numbering then not only we have \(\Omega(\varepsilon n)\) conflicting pairs (in the sense of 2), but in fact we have \(\Omega(\varepsilon n)\) conflicting pairs that are “unrelated.” Once we have this property, the task in hand will be to detect any of such conflicting pair. We observe that there are two types of conflicting pairs, local pairs involving vertices whose DFS numbers are close to each other, and global conflicts. In order to detect local conflicts, we first develop (in 5.2) some basic tools to traverse a given graph following DFS numbering. Once we know how to traverse the graph, we can design an algorithm that can determine if a given pair \((v,\{u,w\})\) is conflicting pair. Unfortunately, this algorithm is efficient only if vertex \(u\) or \(w\) is close to \(p(v)\) or \(v\) (that is, if one of \(\mathop{\mathrm{\mathfrak{num}}}(u) - \mathop{\mathrm{\mathfrak{num}}}(p(v))\), \(\mathop{\mathrm{\mathfrak{num}}}(v) - \mathop{\mathrm{\mathfrak{num}}}(u)\), \(\mathop{\mathrm{\mathfrak{num}}}(w) - \mathop{\mathrm{\mathfrak{num}}}(v)\) is small), and so we can use this approach to deal with local conflicts. In order to study global conflicts, we notice that if vertices \(u\) and \(w\) are far away from vertices \(p(v)\) and \(v\), then in fact a conflicting pair \((v,\{u,w\})\) can be also extended to multiple vertices \(v\). Once we have that property, we will show that if we sample randomly \(\Theta(n^{1/3}/\varepsilon)\) vertices and \(\Theta(n^{1/3}/\varepsilon)\) edges, then if there were many global conflicts, then there would be one that is determined by one of the sampled vertices and one of the sampled edges.
We begin with describing a useful property of labelings that are \(\varepsilon\)-far from a valid DFS numbering that they have \(\Omega(\varepsilon n)\) conflicting pairs that are “unrelated.” Recall that by 2 a numbering is a valid DFS numbering if and only if there is no conflicting pair \((v,\{u,w\}) \in V \times E\) with \(p(v) < u < v < w\). In that case, when \(p(v) < u < v < w\), we speak of a conflict involving vertex \(v\) and edge \(\{u,w\}\). While 2 characterizes valid and invalid DFS numberings, we will need a stronger claim about properties of numberings that are \(\varepsilon\)-far from valid DFS numberings. For that, we need to understand numberings that are not \(\varepsilon\)-far from valid DFS numberings because we can modify the input graph with at most \(\varepsilon n\) edge modifications to ensure that the resulting graph will have a valid DFS numbering.
Observe that 2 provides a simple tool to edit the input labeled graph to obtain a valid DFS numbering — to remove all conflicting pairs. With that in mind, the following claim provides a simple fix to remove all conflicts involving a specific vertex or all conflicts involving a specific edge (notice that the resulting graph may violate our degree bound \(d\), but one can address this issue using a framework developed for that in 10).
Lemma 6. Let \(v \in V\) and \(\{u,w\} \in E\) with \(u < w\).
(i) Adding the edge \(\{v-1,v\}\) to \(G\) (and doing nothing if \(\{v-1,v\}\) is already present) does not create any new conflicts and resolves all conflicts involving \(v\).
(ii) Removing \(\{u,w\}\) from \(G\) and adding \(\{w-1,w\}\) (if not already present) does not create any new conflicts and resolves all conflicts involving \(\{u,w\}\).
Proof.
(i) After the edit we have \(p(v) = v-1\) so no \(u \in V\) can satisfy \(p(v) < u < v\) any more, meaning all conflicts involving \(v\) are resolved. We do not create any new conflicts because \(p(v-1)\) does not change and the new edge \(\{v-1,v\}\) cannot be involved in a conflict because (again) no \(v'\) can satisfy \(v-1 < v' < v\).
(ii) Deleting \(\{u,w\}\) clearly resolves all conflicts of this edge. However, new conflicts involving \(w\) may arise since \(p(w)\) may change if we had \(p(w) = u\) before. By (i) we can fix these by adding \(\{w-1,w\}\).
◻
6 shows that adding or removing a few edges can resolve many related conflicts. We use this claim to show that in order for \(G\) to be \(\varepsilon\)-far from having a valid DFS numbering, not only do there have to be many conflicts, but in fact there have to be many mutually unrelated conflicts. To formalize this idea we consider the bipartite conflict graph \(\mathcal{C}= (V,E,C)\) where \(C \subseteq V \times E\) contains an edge \((v,\{u, w\})\) precisely if it is a conflicting pair. The intuition of mutually unrelated conflicts corresponds to a matching in \(\mathcal{C}\). We have the following.
Lemma 7 (\(\varepsilon\)-far DFS numberings). If \(G\) is \(\varepsilon\)-far from having a valid DFS numbering then there is a matching \(M \subseteq C\) of size \(|M| \ge \varepsilon n/5\) in \(\mathcal{C}\).
Proof. We will prove 7 by showing that if \(G\) is \(\varepsilon\)-far from having a valid DFS numbering then \(\mathcal{C}\) has a vertex cover of size at least \(\varepsilon n/5\), for if not, then we could modify at most \(\varepsilon n\) edges of \(G\) to make the labeling to be a valid DFS numbering of the resulting graph. Then 7 follows from König’s theorem.
Let \(M \subseteq C\) be a maximum matching in \(\mathcal{C}\). By Kőnig’s theorem there is a vertex cover \(\mathcal{VC}\subseteq V \cup E\) of size \(|\mathcal{VC}| = |M|\), i.e., a set of vertices and edges of \(G\) (vertices of \(\mathcal{C}\)) such that for every conflict pair \((v,\{u,w\})\) (for every edge of \(\mathcal{C}\)) \(v \in \mathcal{VC}\) or \(\{u,w\} \in \mathcal{VC}\). We can fix all conflicts in \(G\) in \(\le 2 |\mathcal{VC}|\) edits by applying for each \(v \in \mathcal{VC}\cap V\) the fix of 6 (i) (one edit) and for each \(\{u,w\} \in \mathcal{VC}\cap E\) the fix of 6 (ii) (two edits). We obtain a graph \(G^*\) with a valid DFS numbering. However, the vertices that appeared in a fix in the role of \(v\) or \(w\) may have degree \(d+1\) in \(G^*\), higher than permitted. By our “degree reduction framework” from 14 (see 10), another \(3|\mathcal{VC}|\) edits suffices to transform \(G^*\) into \(G^{**} = (V,E^{**})\) with maximum degree \(d\) while maintaining the validity of the DFS numbering. Overall \(|E^{**} \triangle E| \le 5|\mathcal{VC}|\) and since \(G\) is \(\varepsilon\)-far we have \(|E^{**} \triangle E| \ge \varepsilon n\). Hence \(|M| = |\mathcal{VC}| \ge \varepsilon n/5\). ◻
7 immediately implies a simple tester detecting \(\varepsilon\)-far instances: we randomly sample \(\Omega(\sqrt{n/\varepsilon})\) vertices and \(\Omega(\sqrt{n/\varepsilon})\) edges, and then with a constant probability one of the sampled vertices \(v\) and one of the sampled edges \(e\) will form a conflicting pair \((v,e)\).
Corollary 1. There is an algorithm that with oracle access to a labeled bounded degree graph \(G\) on \(n\) vertices, performs \(O(\sqrt{n/\varepsilon})\) queries to the oracle and accepts, if \(G\) has a valid DFS numbering, and rejects with probability \(\frac{2}{3}\), if \(G\) is \(\varepsilon\)-far from having a valid DFS numbering. 0◻
In what follows, we will show how to improve this result, as promised in 3.
In this section, we develop tools to find conflicting pairs for vertices. We first show how to traverse a graph using consecutive labels.
Proof. We give here a rather straightforward proof of 4 using traversal algorithms tree-next (Algorithm [alg:tree-next]) and tree-prev (Algorithm [alg:tree-prev]). It relies on the fact that finding all predecessors (or all successors) can be done with a single DFS traversal, and hence it can be done in \(O(|V|+|E|)\) time, which is, on average, \(O(1)\) time per vertex (for bounded degree graphs).
Consider tree-next (Algorithm [alg:tree-next]) and tree-prev (Algorithm [alg:tree-prev]). Calling tree-next(\(v\)) for all vertices \(v\) corresponds to a full DFS run on \(T\), which traverses each edge exactly twice. The average number of queries per call is therefore \(O(1)\). The same applies to Algorithm [alg:tree-prev] tree-prev. This immediately implies that for a random vertex \(v\), finding a successor and predecessor of \(v\) can be done in expected \(O(1)\) time. ◻
Using 4, we can prove the following lemma.
Lemma 8 (dfs-next). There is an algorithm dfs-next that for a given vertex \(v \in V\):
If the numbering of \(G\) is a valid DFS numbering then either vertex \(v+1\) is returned or end-of-component is returned if \(v\) is the largest vertex in its connected component.
The expected (over vertices in \(V\)) number of queries to \(G\) of algorithm dfs-next is \(O(1)\).
There also exists an algorithm dfs-prev with corresponding properties.
Proof. Let \(T\) be the ordered tree on \(V \cup \{0\}\) rooted at \(0\) with the edges \(\{p(v),v\}\) for \(v \in V\) and children ordered ascending by number. If the numbering of \(G\) is a valid DFS numbering then (by 1) \(T\) is precisely the DFS tree that gave rise to the numbering. We can navigate to successors and predecessors in \(T\) (by 4) with an expected number of \(O(1)\) queries to \(G\) per step. The only problem is that the virtual vertex \(0\) cannot be accessed and has unbounded degree. Whenever we would have to access it, we return end-of-component instead, which is appropriate because a valid DFS backtracks to \(0\) precisely if a connected component has been fully explored. (If the numbering of \(G\) is invalid, the produced answer may be meaningless, but it is still obtained within the claimed expected time budget.) ◻
7 implies that an \(\varepsilon\)-far instance has \(\Omega(\varepsilon n)\) pairwise distinct vertices and edges involved in conflicts. In this section, we will extend that approach and study separately local conflicting pairs and global conflicting pairs in order to improve the tester from 1. This notion depend on a locality parameter that we will later choose as \(\ell = n^{1/3}\).
Definition 4. Let \((v,\{u,w\}) \in V \subseteq E\) with \(p(v) < u < v < w\) be a conflicting pair. We speak of a local conflict of the following (not mutually exclusive) types
if \(u-p(v) \le \ell\) and \(p(v) \neq 0\),
if \(v-u \le \ell\),
if \(w-v \le \ell\).
We speak of a global conflict in all other cases i.e.,
if \(p(v) = 0\) and \(\max\{v-u,w-v\} > \ell\); or if \(\max\{u-p(v),v-u,w-v\} > \ell\).
Informally, local conflicts occur when \(u\) is close (in the sense of its DFS number) to \(p(v)\) or \(v\), or \(w\) is close \(v\), in which case one can traverse the input graph to efficiently detect the conflict. For global conflicts, some more global approach will be needed.
We can detect local conflicts by sampling vertices at random and walking forwards and backwards in the (supposed) DFS order using 8 as follows.
Lemma 9. There is an algorithm that with \(O(\ell/\varepsilon)\) queries in expectation accepts all valid DFS numberings and rejects with probability \(2/3\) if there is a matching \(M \subseteq V \subseteq E\) of size at least \(\frac{\varepsilon n}{30}\) consisting of conflicting pairs of type (L1). The same applies to types (L2) and (L3).
Proof. Consider the following procedure walk-from-\(p(v)\) that is given \(v \in V\) as input. It first checks if \(p(v) \neq 0\). If so, it attempts to locate vertices \(p(v)+1,p(v)+2,\dots\) using dfs-next from 8 until one of the following happens.
If dfs-next reaches vertices out of order, then report the error.
If dfs-next reaches a vertex \(u\) that has a neighbour \(w > v\) then report the error.
If \(v\) or \(p(v)+\ell\) is reached without finding an error, then conclude that \(v\) is not involved in a conflict of type (L1).
Clearly walk-from-\(p(v)\) finds an error if \(v\) is involved in a conflict of type (L1) and by 8 it performs \(O(\ell )\) queries in expectation (over all \(v \in V\)). Our algorithm to detect conflicts of type (L1) is now simply to repeat walk-from-\(p(v)\) for many \(v\) sampled uniformly at random. Since the \(\frac{\varepsilon n}{30}\) vertices matched in \(M\) are pairwise distinct, a random vertex from \(V\) is involved in a conflict of type (L1) with probability \(\frac{\varepsilon}{30}\). If we sample \(\frac{60}{\varepsilon}\) vertices at random then the probability of sampling at least one \(v\) involved in a conflict of type (L1) is at least \[\begin{align} 1-(1-\varepsilon/30)^{60/\varepsilon} \ge 1- \mathrm{e}^{-(\varepsilon/30) \cdot (60/\varepsilon)} = 1-\mathrm{e}^{-2} \ge 2/3 \enspace. \end{align}\] The expected total number of queries amounts to \(O(\ell /\varepsilon)\), which concludes the claim.
For conflicts of type (L2) a similar procedure walk-backwards-from-\(v\) works. For conflicts of type (L3) a procedure walk-forwards-from-\(v\) would not work because dfs-next might report end-of-component before \(w\) is reached (and without us being aware of \(w\)’s existence). A procedure walk-backwards-from-\(w\) does work, however. Note that we have to sample \(\frac{60d}{\varepsilon}\) edges uniformly at random which is still \(O(1/\varepsilon)\) because \(d\) is constant. ◻
9 provides an efficient tool to detect local conflicts but for global conflicts we use a different approach. We rely on 7 that promises that there is a matching \(M \subseteq V \subseteq E\) of size at least \(\varepsilon n/10\) consisting of conflicting pairs. Therefore, if most of conflicts defining the matching \(M\) are global, we can use the following lemma.
Lemma 10. There is an algorithm that performs \(O(\sqrt{n/\ell }/\varepsilon)\) queries in expectation and accepts all valid DFS numberings and that rejects with probability at least \(2/3\) if there is a matching \(M \subseteq V \subseteq E\) of size at least \(\varepsilon n/10\) consisting of conflicting pairs of type (G).
Proof. Let us partition matching \(M\) into strips \(M_1 \oplus M_2 \oplus \dots \oplus M_{\lceil \frac{n}{\ell} \rceil}\) according to the number of \(u\), so that \(M_j = \{(v,\{u,w\}) \in M: \lceil u/\ell\rceil = j\}\). Let us define sets \(V_j := \{v \in V: \exists_{e \in E} \;(v,e) \in M_j\}\) and \(E_j := \{e \in E: \exists_{v \in V} (v,e) \in M_j\}\). Let \(\bar{v}_j\) be the median of \(V_j\). Let \(V_j^-\) be those vertices from \(V_j\) that are at most \(\bar{v}_j\) and \(E_j^+\) those edges from \(E_j\) matched to a vertex that is at least \(\bar{v}_j\). Let \(m_j := |V_j^-|\) and note that \(|E_j^+| = m_j\) and that \(m_j \ge \frac{|M_j|}{2}\).
We claim that every pair in \(V_j^- \times E_j^+\) is a conflicting pair. Indeed, let \(v_1 \in V_j^-\), \(\{u_2,w_2\} \in E_j^+\) with \(u_2 < w_2\), and let \(p_1 = (v_1, \{u_1,w_1\})\) and \(p_2 = (v_2, \{u_2,w_2\})\) be the corresponding pairs in \(M_j\). Then we observe the following: \[\begin{align} \def\reason#1#2{\stackrel{\text{(#1)}}{#2}} p(v_1) \reason{1}{\le} \max\{0,u_1-\ell\} \reason{2}{<} u_2 \reason{2}{<} u_1+\ell \reason{1}{<} v_1 \reason{3}{\le} \bar{v}_j \reason{3}{\le} v_2 \reason{4}{<} w_2 \enspace. \end{align}\] To see this, we notice the following:
(1) \(p_1\) is of type (G). Hence \(u_1-p(v_1) > \ell\), unless \(p(v_1) = 0\); moreover \(v_1-u_1 > \ell\);
(2) since both \(p_1\) and \(p_2\) are in \(M_j\), we have \(|u_1 - u_2| < \ell\);
(3) by the definition of \(V_j^-\) and \(E_j^+\) the values of \(v_1\) and \(v_2\) fall on the respective side of the median \(\bar{v_j}\);
(4) \(p_2\) is a conflicting pair.
In view of the above, since every pair in \(V_j^- \times E_j^+\) is a conflicting pair, we observe that over all \(j\) we do not just have \(m = |M|\) conflicting pairs to work with, but a collection of bicliques with \(\sum_{i = 1}^{\lceil n/\ell\rceil} (m_j)^2\) conflicting pairs in total. If we happen to get hold of a \(v \in V_j^-\) and an \(e \in E_j^+\) for the same \(j\), then we have a conflicting pair \((v,e)\).
In order to find a conflicting pair using the approach above, let us sample (i.u.r.) \(s\) vertices from \(V\) and then (i.u.r.) \(s\) edges from \(E\), where \(s\) will be set up momentarily.
For any \(1 \le i, r \le s\), let \(X_{i,r}\) to be the indicator random variable that the \(i\)th sampled element from \(V\) and the \(r\)th sampled element from \(E\) are (respectively) from the sets \(V_j^-\) and \(E_j^+\) for the same \(j\). Observe that if we define random variable \(X\) as \(X = \sum_{i=1}^s\sum_{r=1}^s \mathbb{E}[X_{i,r}]\), then by the arguments above, if \(X\) is positive then we have detected a conflicting pair. Using the second moment method, we can prove the following lemma.
Lemma 11. Let \(s \ge c \sqrt{\frac{n^3}{\ell \cdot |M|^2}}\) for a sufficiency large constant \(c\) and let \(s = \omega((d/\varepsilon)^3)\). Then \(\Pr[X>0] \ge 0.99\).
Proof. The proof is a rather straightforward, though also rather lengthly application of the second moment method.
We will use the second moment method \[\begin{align} \label{ineq:final-analysis-2nd-moment} \Pr[X > 0] &\ge \frac{\mathbb{E}[X]^2}{\mathbb{E}[X^2]} \enspace. \end{align}\tag{3}\] We first give a lower bound (4 ) for \(\mathbb{E}[X]\) and then an upper bound (7 ) for \(\mathbb{E}[X^2]\) showing that \(\mathbb{E}[X^2] \le \mathbb{E}[X] + (1+o(1)) \cdot \mathbb{E}[X]^2\). By (3 ), combining these two results will yield 11.
We begin with the study of \(\mathbb{E}[X]\). Observe that \[\begin{align} \mathbb{E}[X_{i,r}] &= \sum_{j=1}^{\lceil n/\ell \rceil} \frac{|V_j^-| \cdot |E_j^+|}{n \cdot |E|} = \frac{1}{n \cdot |E|} \sum_{j=1}^{\lceil n/\ell \rceil} (m_j)^2 \enspace, \end{align}\] and therefore \[\begin{align} \mathbb{E}[X] &= \mathbb{E}[\sum_{i=1}^s\sum_{r=1}^s X_{i,r}] = \sum_{i=1}^s\sum_{r=1}^s \mathbb{E}[X_{i,r}] = \frac{s^2}{n \cdot |E|} \sum_{j=1}^{\lceil n/\ell \rceil} (m_j)^2 \enspace. \end{align}\] Next, by Cauchy-Schwarz inequality, if we set \(m := \sum_{j=1}^{\lceil n/\ell \rceil} m_j\), we can bound this as follows: \[\begin{align} \label{ineq:final-analysis-EX} \mathbb{E}[X] &= \frac{s^2}{n \cdot |E|} \sum_{j=1}^{\lceil n/\ell \rceil} (m_j)^2 \ge \frac{s^2}{n \cdot |E| \cdot \lceil n/\ell \rceil} \left(\sum_{j=1}^{\lceil n/\ell \rceil} m_j \right)^2 \ge \frac{s^2 \cdot m^2}{n \cdot |E| \cdot \lceil n/\ell \rceil} \enspace. \end{align}\tag{4}\]
Next, we will bound \(\mathbb{E}[X^2]\). We have the following, \[\begin{align} \notag \mathbb{E}[X^2] &= \sum_{\substack{1 \le i, i' \le s\\1\le r, r' \le s}} \mathbb{E}[X_{i,r} X_{i',r'}] \\ \notag & = \sum_{\substack{1 \le i \le s\\1\le r \le s}} \mathbb{E}[X_{i,r} X_{i,r}] + \!\!\!\!\!\sum_{\substack{1 \le i \ne i' \le s\\1\le r \le s}} \!\!\!\!\!\mathbb{E}[X_{i,r} X_{i',r}] + \!\!\!\!\!\sum_{\substack{1 \le i \le s\\1\le r \ne r' \le s}} \!\!\!\!\!\mathbb{E}[X_{i,r} X_{i,r'}] + \!\!\!\!\!\sum_{\substack{1 \le i \ne i' \le s\\1\le r \ne r' \le s}} \!\!\!\!\!\mathbb{E}[X_{i,r} X_{i',r'}] \\ \notag &= \mathbb{E}[X] + \sum_{\substack{1 \le i \ne i' \le s\\1\le r \le s}} \mathbb{E}[X_{i,r} X_{i',r}] + \sum_{\substack{1 \le i \le s\\1\le r \ne r' \le s}} \mathbb{E}[X_{i,r} X_{i,r'}] + \sum_{\substack{1 \le i \ne i' \le s\\1\le r \ne r' \le s}} \mathbb{E}[X_{i,r}] \mathbb{E}[X_{i',r'}] \\ &\le \mathbb{E}[X] + \sum_{\substack{1 \le i \ne i' \le s\\1\le r \le s}} \mathbb{E}[X_{i,r} X_{i',r}] + \sum_{\substack{1 \le i \le s\\1\le r \ne r' \le s}} \mathbb{E}[X_{i,r} X_{i,r'}] + \mathbb{E}[X]^2 \label{ineq:final-analysis-EX2} \enspace. \end{align}\tag{5}\]
Let \(v_i\) denote the \(i\)th sampled vertex and \(e_r\) the \(r\)th sampled edges. To bound the second and the third terms in the summation, we observe that \[\begin{align} \mathbb{E}[X_{i,r} X_{i',r}] &= \Pr[\exists _{1 \le k \le \lceil n/\ell \rceil} v_i, v_{i'} \in V_k^-, e_r \in E_k^+] = \sum_{k=1}^{\lceil n/\ell \rceil} \frac{|V_k^-|^2 \cdot |E_k^+|}{n^2 \cdot |E|} = \sum_{k=1}^{\lceil n/\ell \rceil} \frac{(m_k)^3}{n^2 \cdot |E|} \enspace. \\ \mathbb{E}[X_{i,r} X_{i,r'}] &= \Pr[\exists _{1 \le k \le \lceil n/\ell \rceil} v_i \in V_k^-, e_r, e_{r'} \in E_k^+] = \sum_{k=1}^{\lceil n/\ell \rceil} \frac{|V_k^-| \cdot |E_k^+|^2}{n \cdot |E|^2} = \sum_{k=1}^{\lceil n/\ell \rceil} \frac{(m_k)^3}{n \cdot |E|^2} \enspace. \end{align}\] Further, observe that since10 \(0 \le m_k \le d\ell\) for each \(m_k\), and since \(\sum_{k=1}^{\lceil n/\ell \rceil} m_k = m\), we have \[\begin{align} \sum_{k=1}^{\lceil n/\ell \rceil} (m_k)^3 &\le \sum_{k=1}^{\lceil m/d\ell \rceil} (m_k)^3 \le \lceil m/d\ell \rceil \cdot (d\ell)^3 \le 2 m d^2 \ell^2 \enspace. \end{align}\] Therefore, if we plug these bounds in our upper bound of \(\mathbb{E}[X^2]\) in (5 ) then we obtain \[\begin{align} \mathbb{E}[X^2] &\le \mathbb{E}[X] + \frac{2 m d^2 \ell^2 s^3}{n \cdot |E|} \cdot \left(\frac{1}{n} + \frac{1}{|E|}\right) + \mathbb{E}[X]^2 \label{ineq:final-analysis-key-bound} \enspace. \end{align}\tag{6}\] We claim that for \(s = \omega((d/\varepsilon)^3)\), the middle term is \(o(\mathbb{E}[X]^2)\). Observe the following,
\[\begin{align} \frac{2 m d^2 \ell^2 s^3}{n \cdot |E|} \cdot \left(\frac{1}{n} + \frac{1}{|E|}\right) &= \frac{2 m d^2 \ell^2 s^3 \cdot (n+|E|)}{n^2 \cdot |E|^2} \le \frac{2 m d^2 \ell^2 s^3 \cdot (dn)}{n^2 \cdot |E|^2} = \frac{4 s^4 m^4 \ell^2}{n^4 |E|^2} \cdot \frac{d^3 n^3}{2 s m^3} \enspace. \end{align}\] Therefore, if \(s = \omega((d/\varepsilon)^3)\), then, by (4 ), we obtain \[\begin{align} \frac{2 m d^2 \ell^2 s^3}{n \cdot |E|} \cdot \left(\frac{1}{n} + \frac{1}{|E|}\right) &\le \frac{4 s^4 m^4 \ell^2}{n^4 |E|^2} \cdot \frac{d^3 n^3}{2 s m^3} = o(\mathbb{E}[X]^2) \enspace. \end{align}\] This combined with (6 ) immediately implies that for \(s = \omega((d/\varepsilon)^3)\), it holds, \[\begin{align} \mathbb{E}[X^2] &\le \mathbb{E}[X] + \frac{2 m d^2 \ell^2 s^3}{n \cdot |E|} \cdot \left(\frac{1}{n} + \frac{1}{|E|}\right) + \mathbb{E}[X]^2 \le \mathbb{E}[X] + (1+o(1)) \cdot \mathbb{E}[X]^2 \label{ineq:final-analysis-very-key-bound} \enspace. \end{align}\tag{7}\]
To conclude the proof of 11, we take \(s = \omega((d/\varepsilon)^3)\) and put (7 ) in (3 ), to obtain \[\begin{align} \Pr[X > 0] &\ge \frac{\mathbb{E}[X]^2}{\mathbb{E}[X^2]} \ge \frac{\mathbb{E}[X]^2}{\mathbb{E}[X] + (1+o(1)) \cdot \mathbb{E}[X]^2} = \frac{1}{1 + o(1) + 1/\mathbb{E}[X]} \enspace. \end{align}\] Therefore, if \(\mathbb{E}[X] \ge 100\), then \(\Pr[X > 0] \ge 0.99\).
Now, by (4 ), we have the following, \[\begin{align} \mathbb{E}[X] &\ge \frac{s^2 \cdot m^2}{n \cdot |E| \cdot \lceil n/\ell \rceil} \ge \frac{s^2 \cdot |M|^2}{4 \cdot n \cdot |E| \cdot \lceil n/\ell \rceil} \ge \frac{s^2 \cdot |M|^2}{4 \cdot n \cdot (dn/2) \cdot (n/\ell)} = \frac{s^2 \cdot \ell \cdot |M|^2}{2 \cdot d \cdot n^3} \enspace. \end{align}\] Therefore, if \(s \ge \sqrt{\frac{200 \cdot d \cdot n^3}{\ell \cdot |M|^2}}\) and using the fact that \(s = \omega((d/\varepsilon)^3)\), then \(\mathbb{E}[X] \ge 100\), completing the proof of 11. ◻
Now, we are ready to complete the proof of 10. If we sample i.u.r. at least \(c \cdot \sqrt{\frac{n^3}{\ell |M|^2}}\) vertices from \(V\) and at least \(c \cdot \sqrt{\frac{n^3}{\ell |M|^2}}\) edges from \(E\) for a sufficiently large constant \(c\), then by 11, with a constant probability we will detect a conflicting vertex. At the same time, if the DFS numbering is valid then the algorithm will accept it. This yields 10 since in our setting \(|M| = \Omega(\varepsilon n)\) and \(\ell = n^{1/3}\). ◻
Now we are ready to complete the proof of 3. Our algorithm runs the algorithms from 9 (responsible or conflicts of type (L1), (L2) and (L3)) and from 10 one after the other. If any of them rejects \(G\) then we reject \(G\), otherwise we accept \(G\). The expected total running time is \(O(\ell /\varepsilon+\sqrt{n/\ell }/\varepsilon)\), which with \(\ell = \Theta(n^{1/3})\) gives \(O(n^{1/3}/\varepsilon)\) as claimed. (The query complexity can be made \(O(n^{1/3}/\varepsilon)\) using Markov inequality.)
Concerning correctness, it is clear that instances with valid DFS numberings are always accepted. If \(G\) is \(\varepsilon\)-far from a valid DFS numbering then by 7 there is a matching \(M \subseteq V \subseteq E\) of \(|M| \ge \varepsilon n/5\) conflicting pairs. Since each matching edge falls into (at least) one type, at least one of the following statements holds.
There is a matching \(M_{\boldsymbol{\upshape\sffamily\color{black} (L1)}}\) of \(|M_{\boldsymbol{\upshape\sffamily\color{black} (L1)}}|\ge \varepsilon n/30\) conflicting pairs of type (L1).
There is a matching \(M_{\boldsymbol{\upshape\sffamily\color{black} (L2)}}\) of \(|M_{\boldsymbol{\upshape\sffamily\color{black} (L2)}}|\ge \varepsilon n/30\) conflicting pairs of type (L2).
There is a matching \(M_{\boldsymbol{\upshape\sffamily\color{black} (L3)}}\) of \(|M_{\boldsymbol{\upshape\sffamily\color{black} (L3)}}|\ge \varepsilon n/30\) conflicting pairs of type (L3).
There is a matching \(M_{\boldsymbol{\upshape (G)}}\) of \(|M_{\boldsymbol{\upshape (G)}}|\ge \varepsilon n/10\) conflicting pairs of type (G).
In each case, the corresponding algorithm rejects \(G\) with probability \(2/3\) so overall we reject \(G\) with probability at least \(2/3\). 0◻
The following lemma implies immediately that our main algorithm from 3 does not merely have a query complexity of \(O(n^{1/3}/\varepsilon)\), but also a running time of \(O(n^{1/3}/\varepsilon)\).
Lemma 12. There is an algorithm that, given \(V' \subseteq V\) and \(E' \subseteq E\) returns
returns conflict if there exist \(v_1,v_2 \in V'\) such that \((v_1,\{p(v_2),v_2\})\) is a conflicting pair,
returns conflict if there is a conflicting pair in \(V' \subseteq E'\),
returns no conflict otherwise.
The algorithm runs in time \(\mathrm{sort}(O(|V'|+|E'|))\) where \(\mathrm{sort}(k)\) is the time needed to sort \(k\) elements of \([n]\).
Proof. In a manner resembling sweepline algorithms, we intend to “sweep” through the range \([0,n]\) from left to right handling relevant events in sorted order.
For each \(v \in V'\) we consider a vertex-interval \([p(v),v] \subseteq \mathbb{Z}\) and for each \(\{u<w\} \in E'\) an edge-interval \([u,w] \subseteq \mathbb{Z}\). For each such interval \([x,y]\) there is a corresponding start-event at \(x\) and an end-event \(y\), which we sort in increasing order.11
While iterating over the events we maintain a sorted list \(L\) of active \(v \in V'\) for which the start event but not end event of \([p(v),v]\)has been processed. Processing the end event of a vertex interval \([p(v),v]\) (at time \(v\)) just requires removing \(\min L\) from \(L\) (in \(O(1)\) time). When processing the start event of \([p(v),v]\) (at time \(p(v)\)) we check if \(v < \min L\). If so we can add \(v\) as the new minimum of \(L\) (in \(O(1)\) time). Otherwise we have \(p(\min(L)) < p(v) < \min(L) < v\) and can return conflict. Likewise, when processing the start event of an edge interval \([u,w]\) we check if \(w > \min(L)\) and if so can return conflict due to \(p(\min(L)) < u < \min(L) < w\). Otherwise we know that any \(v \in V'\) with \(p(v) < u < v\) must be active and therefore satisfy \(w \le \min(L) \le v\) so \(\{u,w\} \in E'\) is not in conflict with any \(v \in V'\). For end events of edge-intervals no action in required. If all events are processed and no conflict is reported, we can correctly report no conflict.
It is easy to see that the running time of our algorithm is dominated by sorting. ◻
In this paper we introduced a variant of the standard bounded-degree graph model in the property testing setting that works for labeled graphs and allows also label queries. We demonstrated the strength of the model on our new study of DFS numbering. Our main technical contribution is a tight analysis for detecting whether the input labeled graph is properly DFS numbered or it is \(\varepsilon\)-far from having a valid DFS numbering. We demonstrated that this task can be solved with \(\Omega(n^{1/3}/\varepsilon)\) queries and also \(\Omega(n^{1/3})\) queries are necessary.
We observe that while our analysis is presented for undirected graphs, similar arguments hold also for directed graphs. The lower bound from 2 trivially extends to directed graphs and a careful pass through the algorithm in 3 shows that the analysis can be extended accordingly. However, to implement this algorithm efficiently in our setting, we need to allow access to incoming and outgoing edges. (See Appendix 7 for more details.)
Our analysis can be also extended (but only for undirected graphs) to the DFS finishing numbers (FIN-numberings [6]). In Appendix 8, we show that (for undirected graphs) a numbering \(\mathop{\mathrm{\mathfrak{num}}}: V \rightarrow [n]\) is a valid FIN numbering iff the reverse numbering \(\smash{\stackrel{\raisebox{2.5pt}{\longleftarrow}}{\smash{\mathop{\mathrm{\mathfrak{num}}}}}}\) with \(\smash{\stackrel{\raisebox{2.5pt}{\longleftarrow}}{\smash{\mathop{\mathrm{\mathfrak{num}}}}}}(i) = n+1-i\) is a valid DFS numbering. This immediately implies that our results for testing valid DFS numberings extend to testing valid FIN numberings in undirected graphs.
As we mentioned in 6, our analysis in [section:DFS-lb] 5 naturally extends to directed graphs. Firstly, it is easy to see that the lower bound from 2 trivially extends to directed graphs. As for the upper bound, the key feature allowing us to consider directed graphs is that 3 of conflicting pairs extends to directed graphs. Then, a careful pass through the algorithm in 3 shows that the analysis can be extended accordingly. However, in order to implement it in our setting, for graph traversal, the tester requires access to all neighbors of any given vertex, both the endpoints of outgoing edges incident to a vertex and the endpoints of incoming edges incident to that vertex (in particular, in our analysis of local conflicts in 5.3.1).
For any vertex \(v \in V\), let \(\textsf{out}(v)\) denote the set of the endpoints of outgoing edges incident to \(v\), \(\textsf{out}(v) := \{u \in V: (v,u) \in E\}\), and \(\textsf{in}(v)\) denote the set of the endpoints of incoming edges incident to \(v\), \(\textsf{in}(v) := \{u \in V: (u,v) \in E\}\). Then DFS numbering for directed graphs is defined by the following algorithm (the labeling is not unique and depends on the ordering in which the vertices are traversed in the for-loops of Algorithms [alg:DFS-numbering-def] and [alg:DFS-numbering-def-rec]).
We consider a setting where the access to the input graph \(G=(V,E)\) with vertex labeling \(\mathop{\mathrm{\mathfrak{num}}}\) is through the oracle allowing the following three types of queries:
In-neighbor queries: for every vertex \(v \in V\) and every \(1 \le i \le d\), one can query the \(i\)th in-neighbor of vertex \(v\) (i.e., the \(i\)th vertex in \(\textsf{in}(v)\) with arbitrary but fixed order of vertices in set \(\textsf{in}(v)\)).
Out-neighbor queries: for every vertex \(v \in V\) and every \(1 \le i \le d\), one can query the \(i\)th out-neighbor of vertex \(v\).
Label queries: for every vertex \(v \in V\), one can query the label of \(v\), \(\mathop{\mathrm{\mathfrak{num}}}(v)\).
Remark 5. We notice that the requirement to access edges in both directions has been studied in the graph property testing setting (see, e.g., [9], [24]–[26]), though also the one-directional model has been considered (and testing in that model is known to be significantly more expensive, see, e.g., [24], [25]).
Then the following theorem extends 3 to directed graphs.
Theorem 6. Let \(0 < \varepsilon< 1\). There is an algorithm that with oracle access to a labeled bounded degree directed graph \(G\) on \(n\) vertices, performs \(O(n^{1/3}/\varepsilon)\) queries to the oracle and accepts, if \(G\) has a valid DFS numbering, and rejects with probability at least \(\frac{2}{3}\), if \(G\) is \(\varepsilon\)-far from having a valid DFS numbering.
The proof of 6 mimics the proof of 3, and so we will sketch it here. In order to extend our analysis from 5 to directed graphs, we have to revise the definition of \(p(v)\). For a directed implicitly labeled graph \(G = (V, E)\) with \(V = [n]\), we define \(p(v)\) for \(v \in V\) as \[\begin{align} p(v) := \begin{cases} \max\{u \in \textsf{in}(v) \cap [v-1]\} & \text{ if } \textsf{in}(v) \cap [v-1] \ne \emptyset \enspace,\\ 0 & \text{ otherwise.} \end{cases} \end{align}\] Then, with the same analysis as for undirected graphs, 1 holds for directed graphs and so does 2 with revised 3 of conflicting pairs \((v,(u,w)) \in V \times E\) if \(p(v) < u < v < w\). Next, 6 and 7 work for directed graphs as well. Then, using the fact that one can access both in- and out-neighbors, 4 and 8 hold without any change in the analysis, and so are 9 10. With all these claims at hand, as in 5.4, we obtain the proof of 6. 0◻
Finally, let us notice that the arguments used in 1 can be applied also for directed graphs. Since that approach does not need to use the neighbors queries in both directions, this is giving us a tester with \(O(\sqrt{n/\varepsilon})\) queries to the oracle that uses only the out-neighbor queries and label queries (and so, not using the in-neighbor queries).
In the context of a depth first search, the FIN numbering of vertices reflects the order in which vertices are marked as finished, i.e., \(v\) receives FIN number \(i\) if dfs-visit(\(v\)) is the \(i\)th call to dfs-visitto conclude12, see also Chapter 20.3 in [6]. In this section, we will show that in undirected graphs a numbering \(\mathop{\mathrm{\mathfrak{num}}}: V \rightarrow [n]\) is a valid FIN numbering if and only if the reverse numbering \(\smash{\stackrel{\raisebox{2.5pt}{\longleftarrow}}{\smash{\mathop{\mathrm{\mathfrak{num}}}}}}\) with \(\smash{\stackrel{\raisebox{2.5pt}{\longleftarrow}}{\smash{\mathop{\mathrm{\mathfrak{num}}}}}}(i) = n+1-i\) is a valid DFS numbering13, which immediately implies that our upper and lower bounds for testing valid DFS numberings (2 3) carry over to testing valid FIN numberings in undirected graph.
To prove the claim we consider “mirrored” notions of \(p(v)\) and conflicting pairs from 5. For any implicitly numbered graph we define \[p^{\mathrm{FIN}}(v) := \begin{cases} \min\{u \in N(v) \cap \{v+1, \dots ,n\}\} & \text{ if N(v) \cap \{v+1, \dots ,n\} \neq \emptyset,}\\ \infty & \text{ otherwise.} \end{cases}\] Note that the imagined virtual parent of all orphaned vertices is now a virtual vertex \(\infty\) which intuitively finishes last. We say a pair \((v,\{u,w\})\) for \(v \in V\) and \(\{u,w\} \in E\) is a fin-conflicting pair if \[w < v < u < p^{\mathrm{FIN}}(v).\] The following lemma summarizes the relationship between valid FIN-numberings and valid DFS numberings.
Lemma 13. Let \(G = (V,E)\) be an undirected implicitly labeled graph. The following claims are equivalent.
\(G\) has a valid FIN numbering.
There exists no FIN-conflicting pair.
There exists no conflicting pair in the reverse numbering.
The reverse numbering is a valid DFS numbering.
The equivalence of (ii) and (iii) is easy to see since the definitions of FIN-conflicting pair and conflicting pair coincide except for flipping the ordering (as well as the sentinels \(0\) and \(\infty\)). The equivalence of (iii) and (iv) was the content of 2. Before we proceed to show the equivalence of (i) and (ii) we prove a claim analogous to 1.
Claim 7. If the numbers of an implicitly labeled graph correspond to a FIN numbering then \(p^{\mathrm{FIN}}(v)\) is the (possibly virtual) parent of \(v\).
Proof. If \(v\) is an orphan then it has the highest FIN number in its connected component, hence \(\{u \in N(v): u > v\} = \emptyset\) and thus \(p^{\mathrm{FIN}}(v) = \infty\). This is also the virtual parent of \(v\) by definition.
Otherwise \(v\) has a non-virtual parent \(x\). Since we only return to processing \(x\) when \(v\) is finished we have \(x \in \{u \in N(v): u > v\}\). Now consider any \(x' \in \{u \in N(v): u > v\}\). Right after \(v\) finishes, \(x'\) must already be discovered but due to \(x' > v\) it has not yet finished. Thus \(x'\) is active and \(\{u \in N(v): u > v\}\) only contains vertices on the white path. Since FIN numbers appear in descending order on the white path \(x\) is the smallest among them as we defined it. ◻
Proof of 13..
Consider a DFS run that agrees with the FIN numbering and let \((v,\{u,w\})\) for \(v \in V\) and \(\{u,w\} \in E\) be a pair with \(w < v < p^{\mathrm{FIN}}(v)\) and \(u > v\). By 7 \(v\) was discovered from \(p^{\mathrm{FIN}}(v)\). Consider the time immediately after \(v\) has finished. Since \(w < v\) has previously finished its neighbor \(u \in N(w)\) has been discovered, but since \(u > v\) it has not yet finished. The active vertices \(u\) and \(p^{\mathrm{FIN}}(v)\) appear in descending order of FIN numbers along the white path with \(p^{\mathrm{FIN}}(v)\) being the last one we have just returned to, so \(p^{\mathrm{FIN}}(v) < u\). In particular \((v,\{u,w\})\) is not a FIN-conflicting pair.
Consider the tree \(T_{\mathrm{FIN}}= ([n] \cup \{\infty\}, \{(p^{\mathrm{FIN}}(v),v) : v \in [n]\}\) rooted at \(\infty\). It is acyclic because numbers are descending along edges. Note that \(\infty\) is the only vertex without incoming edges. We now establish two properties of \(T_{\mathrm{FIN}}\).
Firstly, \(G\) has no cross edges with respect to \(T_{\mathrm{FIN}}\), meaning for any \(\{u,w\} \in E\) either \(u\) is a decendant of \(w\) or vice versa. To see this, assume without loss of generality that \(w < u\). If \(p^{\mathrm{FIN}}(w) = u\) there is nothing to show, and \(p^{\mathrm{FIN}}(w) > u\) contradicts the definition of \(p^{\mathrm{FIN}}(w)\) so we may assume \(p^{\mathrm{FIN}}(w) < u\). Now consider the unique path \(P\) from \(p^{\mathrm{FIN}}(w)\) to \(\infty\) in \(T_{\mathrm{FIN}}\). Since \(p^{\mathrm{FIN}}(w) < u < \infty\) and numbers appear in increasing order on \(P\) there must be two adjacent \(v\) and \(p^{\mathrm{FIN}}(v)\) on \(P\) with \(v < u \le p^{\mathrm{FIN}}(v)\). Together we have \(w < v < u \le p^{\mathrm{FIN}}(v)\). The only way for this not to be a FIN-conflicting pair is \(u = p^{\mathrm{FIN}}(v)\). Hence \(u\) is on \(P\) and thus an ancestor of \(w\) as claimed.
Secondly, subtrees of \(T_{\mathrm{FIN}}\) use non-overlapping ranges of number, meaning the following. Consider a vertex \(r\) in \(T_{\mathrm{FIN}}\) with two children \(\ell\) and \(r\) and let \(T_\ell\) and \(T_r\) be the trees of all descendants of \(\ell\) and \(r\), respectively. If \(x_1\) and \(x_2\) appear in \(T_\ell\) and \(y\) appears in \(T_r\) then \(x_1 < y < x_2\) is impossible. To see this consider for contradiction a counterexample with maximum sum \(x_1 + y + x_2\). By maximality we have \(x_2 = \ell\) (otherwise replace \(x_2\) with \(\ell\)), \(p^{\mathrm{FIN}}(y) > x_2\) (otherwise replace \(y\) with \(p^{\mathrm{FIN}}(y)\)) and \(p^{\mathrm{FIN}}(x_1) > y\) (otherwise replace \(x_1\) with \(p^{\mathrm{FIN}}(x_1)\)). This gives \(x_1 < y < p^{\mathrm{FIN}}(x_1) \le \ell = x_2 < p^{\mathrm{FIN}}(y)\) and thus the FIN-conflicting pair \((y,\{p^{\mathrm{FIN}}(x_1),x_1\})\).
We can now argue by induction on \(k \in [n]\), that there exists a DFS run and a point in time such that the run has assigned the first \(k-1\) numbers as intended and where the white path is the unique path from \(\infty\) to \(k\) in \(T_{\mathrm{FIN}}\). For \(k = 1\) note that there is a unique path from \(\infty\) to \(1\) in \(T_{\mathrm{FIN}}\) and a DFS run may legally follow it. Now assume we have established the claim for \(k\) as witnessed by a DFS run currently stuck in \(k\) and we want to extend the claim to \(k+1\). First we show that the DFS may now legally backtrack from \(k\). To see this consider \(x \in N(k)\). If \(x < k\) then \(x\) is finished by induction, so not undiscovered. if \(x > k\) then by our first claim above \(x\) must be an ancestor of \(k\) in \(T_{\mathrm{FIN}}\). By induction \(x\) must lie on the white path. Thus \(x\) is already active and not undiscovered. After our DFS run has backtracked to \(p^{\mathrm{FIN}}(k)\) then, unless \(p^{\mathrm{FIN}}(k) = k+1\), we still have to show that it can descend within \(T_{\mathrm{FIN}}\) to \(k+1\). Since \(k < k+1 < p^{\mathrm{FIN}}(k)\) our second claim shows that \(k+1\) and \(p^{\mathrm{FIN}}(k)\) do not reside in disjoint subtrees, in other words \(p^{\mathrm{FIN}}(k)\) must be an ancestor of \(k+1\). All vertices on the \(T_{\mathrm{FIN}}\)-path from \(p^{\mathrm{FIN}}(k)\) to \(k+1\) are neither finished (because their numbers are \(\ge k+1\)) nor active (because all active vertices are on the white path and ancestors of \(p^{\mathrm{FIN}}(k)\)), so those vertices are undiscovered and may legally be chosen by the DFS.
◻
Unfortunately for directed graphs this close correspondence between valid DFS numberings and valid FIN numberings no longer holds. For instance for the graph \(G = (\{x,y\},\{(x,y)\})\) with a single edge both numberings \(\{ x \rightarrow 1, y \rightarrow 2\}\) and \(\{ x \rightarrow 2, y\rightarrow 1\}\) are valid DFS numberings, but only the second is a valid FIN numbering. (Another way of arguing about it follows easily from Exercise 20-3-5 in [6].)
In Figure 8, we present structural observations about an arm that can be revealed in \(I_n(v_i)\). Here a green solid dot indicates the joint that is contained in \(I_n(v_i)\) and blue circles indicate joints not contained in \(I_n(v_i)\). The central fact is that each observation is consistent with exactly one good arm (\((G1)\) or \((G2)\)) and exactly one bad arm (\((B1)\) or \((B2)\)). In general, when we also take into account the case when \(I_n(v_i)\) not contain any joint, we can say that \(I_n(v_i)\) is consistent with the same number of good and bad arms. Therefore for any labeled graph \(H\), we have \(\Pr[I_n(v_i) = H | b=0] = \Pr[I_n(v_i) = H | b=1]\). This is precisely what it means that \(I_n(v_i)\) and \(b\) are independent. Since \(G_n\) and \(B_n\) select their arms independently, conditioning on \(\mathcal{E}\), the same observation can be extended to hold for other vertices \(v_i' \in V_R\), and hence to the entire graph \(I_n(V_R^*)\). This implies that random variables \(b\) and \(I_n(V_R^*)\) are stochastically independent, which yields inequality (2 ) from the proof of 5 in 4.3.
Figure 8: A set of structural observations about an arm that can be revealed in \(I_n(v_i)\)..
In this section, we present a useful framework supporting our analysis in 5.1, showing that if a connected undirected graph \(G = (V,E)\) of maximum degree \(d\) has a valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\), then one can modify \(O(k)\) edges of \(G\) to obtain a connected graph on vertex set \(V\) whose maximum degree is smaller than \(d\), where \(k\) is the number of vertices with degree \(d\). (We observe that it is sufficient to prove the claim for connected graphs only, since if \(G\) is not connected, then the same construction can be applied to each connected component individually.)
Let us begin with two straightforward auxiliary claims that follow easily from the fact that the parent relation \(p(v)\) defines a DFS tree of a given connected graph equipped with a valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\).
Claim 8. Let \(G = (V,E)\) be an arbitrary connected graph equipped with a valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\). Let \(\{u,v\}\) be an arbitrary edge in \(E\) with \(\mathop{\mathrm{\mathfrak{num}}}(u) < \mathop{\mathrm{\mathfrak{num}}}(p(v)) < \mathop{\mathrm{\mathfrak{num}}}(v)\). Then the graph \(G' = (V,E \setminus \{u,v\})\) is connected and \(\mathop{\mathrm{\mathfrak{num}}}\) is its valid DFS numbering. 0◻
Claim 9. Let \(G = (V,E)\) be an arbitrary connected graph equipped with a valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\). Let \(v\) be an arbitrary vertex in \(V\) and let \(u_1, \dots, u_k\) be all neighbors of \(v\) with DFS numbers greater than \(\mathop{\mathrm{\mathfrak{num}}}(v)\), with \(\mathop{\mathrm{\mathfrak{num}}}(u_1) < \dots < \mathop{\mathrm{\mathfrak{num}}}(u_k)\). Let \(w\) be vertex with \(\mathop{\mathrm{\mathfrak{num}}}(w) = \mathop{\mathrm{\mathfrak{num}}}(u_k)-1\). If \(k > 1\) then the graph \(G' = (V,E \setminus \{v,u_k\} \cup \{w,u_k\})\) is connected and \(\mathop{\mathrm{\mathfrak{num}}}\) is its valid DFS numbering.
Furthermore, if \(k > 1\) then for every neighbor \(x\) of \(w\) we have \(\mathop{\mathrm{\mathfrak{num}}}(x) < \mathop{\mathrm{\mathfrak{num}}}(w)\). 0◻
Using Claims 8 and 9, we can easily prove the following.
Lemma 14. Let \(G = (V,E)\) be an arbitrary connected graph of maximum degree \(d \ge 3\) equipped with a valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\). Let \(V_d\) be the set of vertices of degree \(d\) in \(G\). Then one can construct a connected graph \(G' = (V,E')\) of maximum degree at most \(d-1\) for which \(\mathop{\mathrm{\mathfrak{num}}}\) is a valid DFS numbering and such that \(|E \triangle E'| \le 3 \cdot |V_d|\).
Proof. We process vertices in \(V_d\) one by one, in arbitrary order, and reduce their degrees without increasing the degree of any other vertex to more than \(d-1\). For that, iteratively, take a vertex \(v \in V_d\).
If \(v\) has at least two neighbors \(u\), \(w\) with \(\mathop{\mathrm{\mathfrak{num}}}(u) < \mathop{\mathrm{\mathfrak{num}}}(w) < \mathop{\mathrm{\mathfrak{num}}}(v)\) then remove edge \(\{u,v\}\) from \(G\). By 8, the resulting graph is connected with valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\) and has one less vertex of degree \(d\).
If \(v\) has no two neighbors \(u\), \(w\) with \(\mathop{\mathrm{\mathfrak{num}}}(u) < \mathop{\mathrm{\mathfrak{num}}}(w) < \mathop{\mathrm{\mathfrak{num}}}(v)\), then it has \(k \ge d-1 \ge 2\) neighbors \(u_1, \dots, u_k\) with DFS numbers greater than \(\mathop{\mathrm{\mathfrak{num}}}(v)\), with \(\mathop{\mathrm{\mathfrak{num}}}(u_1) < \dots < \mathop{\mathrm{\mathfrak{num}}}(u_k)\). Let \(w\) be the vertex with \(\mathop{\mathrm{\mathfrak{num}}}(w) = \mathop{\mathrm{\mathfrak{num}}}(u_k)-1\). Then replace in \(G\) edge \(\{v,u_k\}\) with a new edge \(\{w,u_k\}\). Furthermore, since this increases the degree of vertex \(w\), we want to remove one edge incident to \(w\) unless \(w\) has had initially a unique neighbor. If \(w\) had two or more neighbors, then 9 ensures that all of them have DFS numbers smaller than \(w\), and hence by 8, we can remove one of the incident edges to \(w\), maintaining the validity of DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\) and of the connectivity of the resulting graph.
By Claims 8 and 9, the resulting graph is connected with valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\) and has one less vertex of degree \(d\).
If we iterative proceed through all vertices from \(V_d\), then we obtain a graph \(G = (V,E')\) that is connected with valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\) and has no vertex of degree \(d\). Since we modify at most three edges per vertex from \(V_d\), we conclude that \(|E \triangle E'| \le 3 \cdot |V_d|\), as promised. ◻
Observe that it is not difficult to generalize 14 to obtain a bigger reduction of the maximum degree (though this lemma is not needed in our analysis).
Lemma 15 (DFS degree reduction). Let \(G = (V,E)\) be an arbitrary connected graph of maximum degree \(d\) equipped with a valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\). For any \(d'\), let \(V_{d'}\) be the set of vertices of degree \(d'\) in \(G\). Then for any \(d^* \ge 3\) we can construct a connected graph \(G = (V,E')\) of maximum degree at most \(d^*\) for which \(\mathop{\mathrm{\mathfrak{num}}}\) is a valid DFS numbering and such that \(|E \triangle E'| \le \sum_{k=d^*+1}^d 3 \cdot (k-d^*) \cdot |V_k|\).
Proof. This follows directly from 14 by first reducing the maximum degree from \(d\) to \(d-1\) using at most \(3 \cdot |V_d|\) edits of \(G\), then reducing the maximum degree from \(d-1\) to \(d-2\) using at most \(3 \cdot |V_d \cup V_{d-1}|\) further edits of \(G\), and so on so forth. ◻
Department of Computer Science and DIMAP, University of Warwick, Coventry, UK, and University of Cologne, Germany. Email: A.Czumaj@warwick.ac.uk. Research partially supported by the Centre for Discrete Mathematics and its Applications (DIMAP), by a Weizmann-UK Making Connections Grant, and by an IBM Award.↩︎
University of Cologne, Germany. Email: csohler@uni-koeln.de.↩︎
Karlsruhe Institute of Technology, Germany. Email: stefan.walzer@kit.edu. Research funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) — 465963632.↩︎
A preliminary version of the paper appeared in Proceedings of the 33rd European Symposium on Algorithms (ESA), pages 76:1—76:15, Warsaw, Poland, September 15–17, 2025.↩︎
While one often uses the bound of \(\varepsilon d |V|\) edges instead of \(\varepsilon|V|\) edges, in the setting when \(d=O(1)\) these terms differ only by a constant factor and as such there are essentially indistinguishable.↩︎
Modification of the edges means edge insertions and deletions, i.e., we require \(|E \triangle E'| > \varepsilon n\), where \(\triangle\) is the symmetric difference.↩︎
We use the following revised definition: a labeling \(\mathop{\mathrm{\mathfrak{num}}}\) is \(\varepsilon\)-far from a valid DFS numbering of \(G\) if for any graph \(G' = (V,E')\) of maximum degree at most \(d\) with a valid DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}'\) we have \(|E \triangle E'| + |\{ v \in V: \mathop{\mathrm{\mathfrak{num}}}(v) \ne \mathop{\mathrm{\mathfrak{num}}}'(v)\}| \ge \varepsilon n\).↩︎
By a joint we mean an endpoint of a path or comb from which the arm was stitched together.↩︎
The DFS starts at the root and respects the order of children when visiting them in pre-order traversal.↩︎
To see that, observe that our definition of sets \(M_j\) ensures that every edge in \(M_j\) is of the form \(\{u,w\}\) with \((j-1) \ell < u \le j \ell\) and each vertex \(u\) is of degree at most \(d\).↩︎
For the ensuing approach, ties need to be broken as follows: End events come before start events. Two end events or two start events are sorted in decreasing order of their sibling events.↩︎
To generate such numbers use Algorithm [alg:DFS-numbering-def] but within Algorithm [alg:DFS-numbering-def-rec] move lines 2 and 3 (\(\mathop{\mathrm{\mathfrak{num}}}(v) \leftarrow t\) and \(t \leftarrow t+1\)) to the end, after line 6.↩︎
This claim should not be confused with the mistaken claim that in a single DFS a vertex \(v\) receiving DFS number \(\mathop{\mathrm{\mathfrak{num}}}(v)\) receives FIN number \(\smash{\stackrel{\raisebox{2.5pt}{\longleftarrow}}{\smash{\mathop{\mathrm{\mathfrak{num}}}}}}(v)\). In general, the DFS run that produces the DFS numbering \(\mathop{\mathrm{\mathfrak{num}}}\) visits vertices in a different order than the DFS run that produces the FIN numbering \(\smash{\stackrel{\raisebox{2.5pt}{\longleftarrow}}{\smash{\mathop{\mathrm{\mathfrak{num}}}}}}\).↩︎