Safe Interval RRT\(^*\) for Scalable Multi-Robot

Path Planning in Continuous Space

April 02, 2024

In this paper, we consider the problem of Multi-Robot Path Planning (MRPP) in continuous space to find conflict-free paths. The difficulty of the problem arises from two primary factors. First, the involvement of multiple robots leads to combinatorial
decision-making, which escalates the search space exponentially. Second, the continuous space presents potentially infinite states and actions. For this problem, we propose a two-level approach where the low level is a sampling-based planner *Safe
Interval RRT\(^*\) (SI-RRT\(^*\))* that finds a collision-free trajectory for individual robots. The high level can use any method that can resolve inter-robot conflicts where we employ
two representative methods that are Prioritized Planning (SI-CPP) and Conflict Based Search (SI-CCBS). Experimental results show that SI-RRT\(^*\) can find a high-quality solution quickly with a small number of samples.
SI-CPP exhibits improved scalability while SI-CCBS produces higher-quality solutions compared to the state-of-the-art planners for continuous space. Compared to the most scalable existing algorithm, SI-CPP achieves a success rate that is up to 94% higher
with 100 robots while maintaining solution quality (i.e., *flowtime*, the sum of travel times of all robots) without significant compromise. SI-CPP also decreases the makespan up to 45%. SI-CCBS decreases the flowtime by 9% compared to the
competitor, albeit exhibiting a 14% lower success rate.

Multi-Robot Path Planning (MRPP) is the problem of finding collision-free paths of multiple robots while optimizing a given objective, such as the sum of costs or makespan [1]. It is a computationally challenging problem to solve optimally, as proven to be \(\mathcal{NP}\)-hard [2]. MRPP has been studied extensively in the context of multi-agent systems to develop algorithms that produce high-quality solutions quickly, even in congested spaces with many robots (like the challenging instances shown in Fig. 1).

Existing methods can be classified into *coupled*, *decoupled*, and *dynamically-coupled* as suggested in [3]. The coupled approach
(e.g., Operator Decomposition [4], Enhanced Partial Expansion A\(^*\) [5]) using a high-dimensional joint state space of all robots does not scale with the number of robots although they guarantee completeness and optimality. The decoupled approach (e.g., Prioritized
Planning [6], Hierarchical Cooperative A\(^*\) [7]) reduces the search space by considering robots separately, but their theoretical guarantees are not established. Dynamically-coupled methods (e.g., Conflict-Based Search (CBS) [8], M\(^*\) [3]) combine the advantages of the former two by coupling robots
only whenever necessary. As CBS is one of the most successful methods, there exist several variants, such as Enhanced CBS [9] and Explicit Estimation CBS [10].

While the existing methods are effective for instances in moderate-sized discrete space, they often do not scale with the problem size. Also, they often demand discretizing continuous environments, which unnecessarily simplifies the original problem. These limitations indicate the necessity for a fast continuous-space MRPP algorithm.

Our goal is to develop a scalable MRPP algorithm in continuous space that can handle up to \(100\) robots efficiently, even in dense clutters, without significantly compromising solution quality. We propose a two-level
approach consisting of a low-level single-robot planner and a high-level conflict-resolving method. For the low level, we propose SI-RRT\(^*\) where the concept of using safe time intervals is borrowed from Safe Interval
Path Planning (SIPP) [11]. Unlike SIPP, which requires discretization of the environment, SI-RRT\(^*\)
allows a robot to navigate in continuous space without the need for discrete time steps. The high-level can use any conflict resolution method, such as Prioritized Planning (PP) and Conflict Based Search (CBS), which will be reviewed with more details in
Sec. 2. We suggest two variants *Safe Interval Continuous space PP* (SI-CPP) and *Safe Interval Continuous space CBS* (SI-CCBS), which have their advantages in computational efficiency and solution quality,
respectively.

The main contributions of the proposed method include:

Scalability: Our methods can find a high-quality solution to an MRPP instance even in a cramped environment with \(100\) moving robots and static obstacles.

Theoretical analysis: We provide analytic results proving that SI-RRT\(^*\) is probabilistically complete and asymptotically optimal.

Extensive experiments: We test our methods in four different environments where comparisons are done with the state-of-the-art methods (such as SSSP [12], ST-RRT\(^*\)-PP [13], and Graph Transformer (GT) [14]).

We propose the single-robot planner SI-RRT\(^*\) in Sec. 4 and its multi-robot extension in Sec. 5.

There has been an extensive line of research on MRPP in the context of multi-agent pathfinding (MAPF). Among them, we are particularly interested in MRPP and MAPF methods in continuous space. We introduce several recent works that are closely related to our research.

Safe Interval Path Planning (SIPP) is introduced in [11] for efficient path planning of a single robot in environments with moving
obstacles. SIPP assumes the existence of another system responsible for predicting future trajectories of dynamic obstacles. A *safe interval* is defined as a contiguous period of time for a specific configuration during which there is no collision
with the trajectories of the obstacles. As the number of safe intervals is significantly smaller than the number of time steps, SIPP can plan more efficiently by reducing the dimensionality of the problem. Nevertheless, SIPP is designed for discrete space
so with limited flexibility to handle continuous changes in the environment.

Continuous-time Conflict-Based Search (CCBS) [15] introduces several modifications to CBS. To handle continuous time and robots with arbitrary geometric shapes and variable action durations, CCBS implements a geometry-aware collision detection and an unsafe-interval detection for resolving conflicts. At the high-level search operating on a Constraint Tree (CT), CCBS adds constraints over pairs of actions and time ranges instead of location-time pairs. For the low-level search, a version of SIPP is employed to handle CCBS constraints. While CCBS can manage a variety of robot shapes, variable action durations, and continuous time while preserving optimality and completeness, its scalability is hindered by the inherent limitations of CBS.

Simultaneous Sampling and Search Planning (SSSP) [12] is a unified approach that does not separate the roadmap construction and conflict resolution. SSSP not
only keeps the search space small through unified planning but also provides a guarantee for probabilistic completeness. By reducing the branching factor in the search and making roadmaps sparse, SSSP can find a solution quickly. However, the proposed
method has a critical issue that limits its applicability. The solution is *sequential*, meaning that only one robot can move at each time. Although a postprocessing method is provided, it requires additional computation which does not seem to be
negligible. In addition, the proof for probabilistic completeness does not hold as it is built upon the sequential solution.

In [14], a Graph Transformer (GT) is used as a heuristic function to accelerate CBS in non-grid settings, aiming for both completeness and bounded-suboptimality. A contrastive loss training objective is introduced for learning a heuristic that ranks search nodes, demonstrating the generalizability of the method with promising results in accelerating CBS. While the approach of learning the heuristic function is novel, the success rate of the search decreases significantly as the number of robots increases.

ST-RRT\(^*\) [13] is a bidirectional planner working in a space-time configuration space, which consists of Cartesian coordinates and a time dimension. Prioritized planning (PP) [16] is used to resolve conflicts such that the robot with a lower priority modifies its individual path generated by ST-RRT\(^*\). Since the search space has a time dimension, the robots can exploit temporality for more effective collision avoidance. Although PP does not guarantee deadlock-free paths, ST-RRT\(^*\)-PP exhibits remarkable scalability and high quality solutions.

We consider the MRPP problem of finding collision-free trajectories for \(k\) robots from their start locations to respective goal locations in a *workspace* \(\mathcal{W} \in
\mathbb{R}^d\) where \(d \in \{2, 3\}\). A point \(\boldsymbol{w} \in \mathcal{W}\) can be occupied by an obstacle where the sets of static and dynamic obstacles are \(\mathcal{O}_S\) and \(\mathcal{O}_D\), respectively. We assume that \(\mathcal{O}_S\) and \(\mathcal{O}_D\) are all known. The
free space is \(\mathcal{W}^{f} = \mathcal{W} \setminus \mathcal{O}\) where \(\mathcal{O} = \mathcal{O}_S \cup \mathcal{O}_D\). The *configuration space* (C-space) \(\mathcal{C}^i \subseteq \mathcal{W}\) is the set of configurations of robot \(i\). For a configuration \(\boldsymbol{q} \in \mathcal{C}^i\), the set of points
occupied by \(i\) at \(\boldsymbol{q}\) is denoted as \(h^i(\boldsymbol{q}) \subset \mathcal{W}\), where \(h^i(\boldsymbol{q})\) depends on the shape of robot \(i\). The free C-space (*free space*) is \(\mathcal{C}^i_\text{free} = \{ \boldsymbol{q} \in \mathcal{C}^i
\mid h^i(\boldsymbol{q}) \cap \mathcal{O} = \emptyset \}\). A trajectory of robot \(i\) is a continuous function \(\pi^i : [0, t^i_\text{final}] \rightarrow
\mathcal{C}^i_\text{free}\) where \(t^i_\text{final}\) means the last time of the trajectory of \(i\). Each configuration \(\boldsymbol{q}\) in a
trajectory is given by \(\pi^i(t)\) for \(t \in [0, t^i_\text{final}]\). We assume that \(i\) moves with a constant velocity denoted by \(vel^i\).

A *safe interval* of \(\boldsymbol{q}\) is a continuous period of time \(\iota = [low, high)\) while \(\boldsymbol{q}\) resides in \(\mathcal{C}^i_\text{free}\). With a slight abuse of notation, we denote these values as \(\iota.low\) and \(\iota.high\). For each \(\iota\) of \(\boldsymbol{q}\), a robot can be at \(\boldsymbol{q}\) from \(\iota.low\) until \(\iota.high\). Within this safe interval, the robot cannot arrive at \(\boldsymbol{q}\) earlier than \(\iota.low\) and must leave \(\boldsymbol{q}\) before \(\iota.high\) to avoid collision. While \(\boldsymbol{q} \notin \mathcal{C}^i_\text{free}\), \(\boldsymbol{q}\) has a *collision interval*. Depending on the trajectories of dynamic obstacles, a single configuration \(\boldsymbol{q}\) can have multiple safe intervals which are
separated by collision intervals.

In Fig. 2 all points occupied by robot \(i\) (i.e., \(h^i(\boldsymbol{q})\)) are depicted by a circle which can be an arbitrary shape. The green zone represents
the safe intervals as no obstacle has an overlap with \(h^i(\boldsymbol{q})\). The dynamic obstacle (in blue) crosses \(h^i(\boldsymbol{q})\) so the collision intervals (in red) occurs. In a
discrete space, each discrete state has its own safe intervals ([11], [17]).
Since the number of configurations in \(\mathcal{C}^i\) is not bounded, it is infeasible to maintain the safe intervals of all possible configurations in \(\mathcal{C}^i\). Thus, we
construct a *safe interval map* \(\mathcal{M}\) such that one or more safe intervals at \(\boldsymbol{q}\) are added to \(\mathcal{M}\) only if \(\boldsymbol{q}\) is considered in planning, instead of containing the safe intervals of all configurations in \(\mathcal{C}^i\).

A roadmap (or a tree) \(G = (\mathcal{V}, \mathcal{E})\) is a topological graph in a \((d+1)\) dimensional space. The roadmap is used to generate a trajectory and assumed to be connected.
The roadmap of \(i\) is denoted by \(G^i\). Vertex \(v \in \mathcal{V}\) corresponds to a point in the \((d+1)\)-dim space
where \(d\) dimensions are for the workspace and the one dimension is for time. A vertex is specified by its configuration \(\boldsymbol{q}\) and \(t_\text{low}\) so in the \((d+1)\)-dim space. The quantity \(v.t_\text{low}\) is the *earliest arrival time* of \(v\)
indicating that the robot cannot arrive at \(v\) earlier than \(t_\text{low}\). Edge \(e \in \mathcal{E}\) connects a pair of vertices in \(\mathcal{V}\) if there exists a trajectory between the configurations of the vertex pair given the safe intervals of their configurations.

A *conflict* \(\langle i, j, h^i(\pi^i(t)), h^j(\pi^j(t)) \rangle\) is defined for robots \(i\) and \(j\) for \(t \in
[t_s, t_e]\) where \(h^i(\pi^i(t))\) represents the area occupied by \(i\) at \(t\) while \(i\) moves along \(\pi^i\). Robots \(i\) and \(j\) collide within a collision interval \([t_s, t_e]\) so \(h^i(\pi^i(t))
\cap h^j(\pi^j(t)) \neq \emptyset\) for \(t \in [t_s, t_e]\). A *constraint* for robot \(i\) is defined as \(\langle i, h^j(\pi^j(t)), t
\rangle\) where \(t \in [t_s, t_e]\) which indicates that robot \(i\) must avoid \(h^j(\pi^j(t))\) during \([t_s,
t_e]\). In other words, robot \(i\) must generate a trajectory while satisfying \(h^i(\pi^i(t)) \cap h^j(\pi^j(t)) = \emptyset\) for \(t \in [t_s,
t_e]\).

For a single robot \(i\), we aim to find a conflict-free trajectory \(\pi^i\) from the start configuration \(\boldsymbol{q}^i_\text{start}\) to the goal
configuration \(\boldsymbol{q}^i_\text{goal}\) with the minimum arrival time \(t^i_\text{final}\). In other words, we aim to minimize the travel time from \(\boldsymbol{q}^i_\text{start}\) to \(\boldsymbol{q}^i_\text{goal}\) without collisions. For multiple robots, we aim to find conflict-free trajectories for all \(k\) robots from their respective start positions to their goal positions while minimizing the *flowtime* \(\kappa\) which is the sum of individual arrival times of all robots at their
goal configurations: \[\kappa = \sum_{i=1}^{k} t^i_\text{final}.\label{eqn:flow}\tag{1}\] The collision-free trajectories satisfy \[h^i(\pi^i(t)) \cap
h^j(\pi^j(t)) = \emptyset \;\forall i, j, i \neq j, \forall t \in [0, {t_\text{final}}]\] where \({t_\text{final}} = \max \{t^1_\text{final}, \cdots, t^k_\text{final}\}\).

We propose a sampling-based planner SI-RRT\(^*\) which can be used with a high-level planner for multiple robots. It also can be used as an independent single-robot planner in dynamic environments. SI-RRT\(^*\) is based on the concept of using the safe intervals of SIPP [11] but more flexible as it works in continuous space without discrete time steps. Also, SI-RRT\(^*\) can find a high-quality trajectory with a lesser number of samples. For a simpler explanation, we temporarily omit the robot index \(i\) (or \(j\)) throughout this section.

SI-RRT\(^*\) expands \(G\) up to \(iteration\) times to find a collision-free trajectory \(\pi(t)\) for \(t \in [t_\text{start}, t_\text{goal}]\). SI-RRT\(^*\) has an anytime feature. Once a solution is found, it keeps improving the solution with more samples until \(iteration\) expires. Here, \(t_\text{start}\) and \(t_\text{goal}\) represent the time at the start and goal configurations \(\boldsymbol{q}_\text{start}\) and \(\boldsymbol{q}_\text{goal}\) of the robot, respectively. The configurations \(\boldsymbol{q}_\text{start}\) and \(\boldsymbol{q}_\text{goal}\) belong to the vertices \(v_\text{start}\) and \(v_\text{goal}\) of \(G\), respectively. We introduce several subroutines of SI-RRT\(^*\).

Functions Sampling and Steer are used to grow \(G\). Sampling randomly selects a configuration \(\boldsymbol{q}_{\text{rand}}\) from \(\mathcal{C}\) with a probability of \(1-\lambda\) or the goal configuration \(\boldsymbol{q}_\text{goal}\) with a probability of \(\lambda\). Steer returns a new configuration \(\boldsymbol{q}_{\text{new}}\). It identifies \(v_\text{near}\) in \(G\) whose configuration \(v_\text{near}.\boldsymbol{q}\) is the nearest to \(\boldsymbol{q}_{\text{rand}}\). If the distance between them does not exceed \(d_\text{max}\), \(\boldsymbol{q}_{\text{rand}}\) becomes \(\boldsymbol{q}_{\text{new}}\). Otherwise, \(\boldsymbol{q}_{\text{new}}\) lies on the line between \(v_\text{near}.\boldsymbol{q}\) and \(\boldsymbol{q}_{\text{rand}}\) at a distance of \(d_\text{max}\).

Functions GetNeighbor, ChooseParent, and Rewire are used to optimize \(G\). GetNeighbor returns the set of adjacent vertices \(\mathcal{V}_\text{neighbor}\) which are within \(d_\text{max}\) from \(\boldsymbol{q}_{\text{new}}\). ChooseParent generates a new vertex of \(\boldsymbol{q}_\text{new}\) (i.e., \(v_\text{new}\)), finds the parent vertex of \(v_\text{new}\) from \(\mathcal{V}_\text{neighbor}\), and updates the earliest arrival time \(t_{\text{low}}\) of \(v_\text{new}\). The parent vertex is chosen such that \(t_{\text{low}}\) of \(v_\text{new}\) is minimized. Function Rewire modifies the structure of graph \(G\). It reassigns the parent vertex of each neighbor in the set \(\mathcal{V}_\text{neighbor}\). If assigning \(v_\text{new}\) as the new parent results in a smaller \(t_{\text{low}}\), the parent is updated accordingly. These functions help optimize the tree structure and find more efficient trajectories.

In ChooseParent (Alg. [alg:chooseparent]), three members of \(v_\text{new}\) are initialized (line [alg1line:initial]), which are the new configuration \(\boldsymbol{q}_\text{new}\), its earliest arrival time \(t_{\text{low}}\), and the parent vertex. Alg. [alg:chooseparent] iterates to check if each neighbor vertex \(v\) can be a parent (lines [alg1line:forv]–[alg1line:endforv]). For each safe interval \(\iota_{\text{new}}\) of \(v_{\text{new}}.\boldsymbol{q}\), Alg. [alg:chooseparent] checks if the robot can move from \(v\) to \(v_\text{new}\) without collisions. If there exists at least one collision-free trajectory, the earliest arrival time of them is assigned to \(low\) (line [alg1line:low]).

In the left of Fig. 3, the robot can stay \(v\) from \(v.t_{\text{low}}\) until \(\iota.high\) without collisions. The safe interval of \(v_\text{new}.\boldsymbol{q}\) (green) is \([\iota_\text{new}.low, \iota_\text{new}.high)\). Until, the earliest arrival time of \(v_\text{new}\) (i.e., \(v_\text{new}.t_{\text{low}}\)) is not found but initialized by infinity (line [alg1line:initial]). In the right of Fig. 3, the bold arrow represents a trajectory to move from \(v.\boldsymbol{q}\) to \(v_\text{new}.\boldsymbol{q}\) as early as possible with \(vel\), without colliding with the dynamic obstacle (blue). Thus, the arrival time at \(v_\text{new}.\boldsymbol{q}\) along the arrow is assigned to \(low\).

If there is no overlap between the safe interval of \(v_\text{new}.\boldsymbol{q}\) and the time range that the robot can arrive at \(v_\text{new}.\boldsymbol{q}\), no trajectory exists between \(v\) and \(v_\text{new}\). If the robot collides with an obstacle no matter when it starts toward \(v_\text{new}.\boldsymbol{q}\), no trajectory can be found. In both cases, \(low\) does not exist. If \(low\) is not found, the rest of the loop is skipped (line [alg1line:continue]) to move on to the next neighbor vertex in \(\mathcal{V}_\text{neighbor}\). If \(low\) is found and lower than any other previously found \(low\), the earliest arrival time and the parent of \(v_\text{new}\) are updated with \(low\) and \(v\), respectively (lines [alg1line:update1]–[alg1line:update2]). Iterating the outer loop (lines [alg1line:forv]–[alg1line:endforv]) ensures the lowest value of the earliest arrival time. Therefore, the robot can arrive at \(v_\text{new}\) as early as possible among all possible choices of the parent from \(\mathcal{V}_\text{neighbor}\). If no collision-free trajectory to \(v_\text{new}.\boldsymbol{q}\) is found from any of \(\mathcal{V}_\text{neighbor}\), \(v_\text{new}\) is rejected so no new vertex is returned (line [alg1line:null]).

**Input**: \(\boldsymbol{q}_{\text{new}}\), \(\mathcal{V}_{\text{neighbor}}\), \(G\), \(\mathcal{M}\), \(\mathcal{O}\)

**Output**: \(v_{\text{new}}\) or NULL

\(v_{\text{new}}.\boldsymbol{q} \leftarrow \boldsymbol{q}_{\text{new}}\), \(v_{\text{new}}.t_\text{low} \leftarrow \infty\), \(v_{\text{new}}.parent =
\text{NULL}\) \(low \leftarrow\) the earliest arrival time of \(v_{\text{new}}.\boldsymbol{q}\) within \([\iota_\text{new}.low, \iota_\text{new}.high)\) without colliding with \(\mathcal{O}\) when the robot moves from \(v\) **continue** \(v_{\text{new}}.t_{\text{low}} \leftarrow low\) \(v_{\text{new}}.parent
\leftarrow v\) **return** \(\mathcal{V} \leftarrow \mathcal{V} \cup v_\text{new}\) \(\mathcal{E} \leftarrow \mathcal{E} \cup (v_\text{new}, v_\text{new}.parent)\) **return**
\(v_{\text{new}}\)

Given \(v_\text{new}\) generated from ChooseParent, Rewire (Alg. [alg:rewire]) reconnects edges of \(G\) if doing so results in a better trajectory. Alg. [alg:rewire] examines if \(v_\text{new}\) can be the parent of one or more of the neighbor vertices such that \(v.t_\text{low}\) of the neighbors can decrease. While iterating for each neighbor vertex \(v\), Rewire attempts to update the earliest arrival time of \(v\). If \(low\) does not exist or \(low\) found does not optimize \(G\), no change is made. An update occurs if the earliest arrival time of \(v\) (from \(v_\text{new}\)) is within the interval \([\iota.low, \iota.high)\) (lines [alg2line:if95update1]–[alg2line:if95update4]). In Fig. 4 (a), \(low\) falls within the current safe interval so \(v.t_\text{low}\) is updated with \(low\). Also, the parent of \(v\) is updated by \(v_\text{new}\) since coming from \(v_\text{new}\) to \(v\) is faster than coming from the current parent (lines [alg2line:if95update2]–[alg2line:if95update4]). If \(v.t_\text{low} \notin [\iota.low, \iota.high)\) as shown in Fig. 4 (b), a new vertex \(v_\text{re}\) is generated where \(v_\text{re}.\boldsymbol{q}\) is identical to \(v.\boldsymbol{q}\) but their values of \(t_\text{low}\) differ (lines [alg2line:else95update1]–[alg2line:else95update2]). This generation of a new vertex is caused by the presence of the collision interval. If we update \(v.t_\text{low}\) with \(low\), the robot would remain at \(v\) during the collision interval so a collision is incurred. Accordingly, the earliest arrival time and the parent of \(v_\text{re}\) are updated (lines [alg2line:else95update3]–[alg2line:else95update5]).

Note that \(v_\text{re}\) and \(v\) have an identical configuration in the spatial dimension but different values in the temporal dimension. As discussed, \(G\) is a topological graph in a \((d+1)\)-dim space where sampling occurs in the spatial dimension for configurations. The value in the temporal dimension is determined by \(t_\text{low}\) but not sampled.

**Input**: \(v_{\text{new}}\), \(\mathcal{V}_{\text{neighbor}}\), \(G\), \(\mathcal{M}\), \(\mathcal{O}\)

**Output**:

\(low \leftarrow\) the earliest arrival time of \(v.\boldsymbol{q}\) within \([\iota.low, \iota.high)\) without colliding with \(\mathcal{O}\) when the robot moves from \(v_{\text{new}}\) **continue**
\(v.t_{\text{low}} \leftarrow low\) \(\mathcal{E} \leftarrow
\mathcal{E} \setminus (v, v.parent)\) \(v.parent \leftarrow v_{\text{new}}\)
\(\mathcal{E} \leftarrow \mathcal{E} \cup (v, v_\text{new})\) \(v_{\text{re}}.\boldsymbol{q} \leftarrow
v.\boldsymbol{q}\) \(v_{\text{re}}.t_{\text{low}} \leftarrow low\)
\(v_{\text{re}}.parent \leftarrow v_{\text{new}}\) \(\mathcal{V} \leftarrow \mathcal{V} \cup v_\text{re}\) \(\mathcal{E} \leftarrow \mathcal{E} \cup (v_\text{re}, v_\text{new})\)

Based on the subroutines described above, SI-RRT\(^*\) performs as described in the pseudocode Alg. 5. The safe interval map \(\mathcal{M}\) is initialized to be empty and then updated to add the safe intervals at \(\boldsymbol{q}_\text{start}\) and \(\boldsymbol{q}_\text{goal}\) (lines [alg3line:init1]–[alg3line:init5]). GetSafeIntervals searches \(\mathcal{C}\) over time at the input configuration to find all safe intervals.

A robot should leave its start configuration within the earliest safe interval of \(\boldsymbol{q}_\text{start}\) which begins from zero. Otherwise, it will collide with a dynamic obstacle. Thus, the safe interval at \(\boldsymbol{q}_\text{start}\) is upper bounded by \(high\) of the earliest safe interval (line [alg3line:init4]). Since a robot should be able to stay at its goal configuration, it should arrive at \(\boldsymbol{q}_\text{goal}\) after all dynamic obstacles move away from the goal. Therefore, the safe interval at \(\boldsymbol{q}_\text{goal}\) should start from the latest safe interval so it is lower bounded by \(low\) of the latest safe interval. The upper bound is initially set to be infinity as we assume the robot stays at \(\boldsymbol{q}_\text{goal}\) indefinitely once it arrives the goal (line [alg3line:init5]). Initialization of the vertices of the start and goal configurations follows (lines [alg3line:init95start]–[alg3line:init95goal]) where \(v_\text{goal}.t_\text{low}\) is set to be infinity as the arrival time at the goal has not been computed.

The main loop iterates for \(iteration\) times (lines [alg3line:while95begin]–[alg3line:while95end]). If \(\boldsymbol{q}_\text{new}\) is found from Sampling and Steer, the safe intervals at \(\boldsymbol{q}_\text{new}\) are added to \(\mathcal{M}\) (line [alg3line:update]). If a new vertex \(v_\text{new}\) for \(\boldsymbol{q}_\text{new}\) is found by ChooseParent, Rewire optimizes \(G\) (line [alg3line:rewire]).

If (i) the configuration of \(v_\text{new}\) is equal to the goal configuration (line [alg3line:goalcheck]) and (ii) the robot could arrive at the goal earlier than the current earliest arrival time (line [alg3line:intvcheck]), \(v_\text{goal}\) is updated by \(v_\text{new}\). If \(v_\text{goal}\) has a parent, \(v_\text{goal}\) is connected to \(G\) so \(\pi(t)\) is returned with the goal arriving time \(t_\text{final} = v_\text{goal}.t_\text{low}\).

Once a goal vertex is found, Alg. 5 keeps optimizing \(G\) until \(iteration\) reaches zero through ChooseParent and Rewire. This anytime capability of SI-RRT\(^*\) enables us to balance computational resources and solution quality.

We prove the probabilistic completeness and the asymptotical optimality of SI-RRT\(^*\).

**Theorem 1**. *SI-RRT\(^*\) is probabilistically complete.*

*Proof.* Let us consider \(v_\text{new}\) to connect it to a neighbor \(v\). In SI-RRT\(^*\), the rejection of \(v_\text{new}\) occurs only under the following two conditions:

(Condition 1) No overlap in safe intervals: A robot coming from \(v\) belongs to a safe interval. If the robot starts from \(v\) to \(v_\text{new}\) with velocity \(vel\), the safe interval is linearly translated by \(\frac{\|v.\boldsymbol{q} - v_\text{new}.\boldsymbol{q} \|}{vel}\) along the time axis. If this translated safe interval does not have any overlap with all safe intervals at \(\boldsymbol{q}_\text{new}\), the robot cannot reach \(v_\text{new}\) within any of the safe intervals. In other words, the robot must collide with an obstacle if it moves from \(v\) to \(v_\text{new}\) with \(vel\), regardless of its departure time from \(v\).

(Condition 2) Colliding with dynamic obstacles: Even though there is at least an overlap between the translated interval and the safe intervals at \(v_\text{new}\), the robot cannot reach \(v_\text{new}\) if a dynamic obstacle occupies the entire overlap.

From these conditions, we can ensure that SI-RRT\(^*\) finds a trajectory between a newly sampled vertex and its neighbor vertex if there exists an overlap of intervals between the pair and no dynamic obstacle obstructs such an overlap. In other words, SI-RRT\(^*\) will find a solution if both Conditions 1 and 2 are not satisfied.

If a collision-free trajectory exists between \(v\) and \(v_\text{new}\), the probability of finding it approaches one under infinite uniform sampling. Equivalently, one can think that SI-RRT\(^*\) samples new safe intervals until any of them has an overlap with the current translated safe interval. The probability of having an overlap approaches as the number of samples goes to infinity. Therefore, SI-RRT\(^*\) expands \(G\) progressively as more samples are added. This incremental growth of \(G\) enables SI-RRT\(^*\) to eventually connect \(G\) to \(v_\text{goal}\) as the number of sample approaches to infinity. ◻

**Theorem 2**. *SI-RRT\(^*\) is asymptotically optimal.*

*Proof.* SI-RRT\(^*\) is designed to maintain \(t_\text{low}\) in all vertices to be the earliest arrival time at the respective vertices. Thus, \(t_\text{low}\) of \(v_\text{goal}\), which is \(t_\text{final}\), is also the earliest arrival time to reach the goal configuration.

Once a solution is found, the trajectory \(\pi(t)\) for \(t \in [0, t_\text{final}]\) is representation-optimal

^{1}given \(G\) by the design of SI-RRT\(^*\).The configurations of vertices in \(\mathcal{V}\) approximate \(\mathcal{C}_\text{free}\). As the number of sample approaches to infinity, the configurations of the vertices converge to \(\mathcal{C}_\text{free}\).

Under infinite uniform sampling, \(G\) covers all configurations in \(\mathcal{C}_\text{free}\) so the representation converges to the solution space. Since SI-RRT\(^*\) is representation-optimal, the solution found from \(G\) is optimal with an infinite number of samples. ◻

SI-RRT\(^*\) enables each robot to find its own collision-free trajectory. With multiple robots, the trajectories could have conflicts between them. We present the bi-level framework where the high level resolves conflicts between the individual trajectories from the low level SI-RRT\(^*\). While any conflict resolving methods can be used in the high level, we show SI-CPP and SI-CCBS which employ PP and CBS. While SI-CPP is more efficient, SI-CCBS produces solutions with better quality so one can choose either of them according to the requirements of the task.

PP is a general approach based on a given, fixed priority of robots. Each robot finds its trajectory independently and then avoids collisions with the trajectories of higher-rank robots. PP is simple to implement and efficient. However, it is not complete so deadlocks could occur where robots get stuck. In SI-CPP, individual trajectories are found by SI-RRT\(^*\). Then the trajectory of a higher-rank robot is treated as a dynamic obstacle to a lower-rank robot. Thus, lower-rank robots avoid all trajectories of higher rankers until they arrive at their goals.

CBS is a two-level algorithm which has proven to be optimal and complete [8]. Like many CBS variants, the high-level of SI-CCBS performs the best-first search in Constraint Tree (CT). We adopt the cost function of the greedy variant of CBS proposed in [9], which is the number of conflicts between trajectories for efficient search. Other modifications to CBS are done on the definition of conflicts and constraints to handle continuous space as described in Sec. 3. A CT node in the CT contains the cost, the set of individual trajectories, and the set of constraints. SI-CCBS begins with inserting the root node to the search frontier \(OPEN\) where the root is with an empty constraint set. The search iterates by expanding the node with the minimum cost from \(OPEN\). The expanded node is eliminated from \(OPEN\). If the expanded node has no conflict, the search terminates with a conflict-free solution. If \(OPEN\) is empty, the search fails to find a solution.

When a CT node is expanded, conflicts among all trajectories are found. The number of conflicts is used as the cost of the node. The constraint set is constructed with a conflict (we choose the conflict with the smallest \(t\)) for the involved robots \(i\) and \(j\) with trajectories \(\pi^i(t)\) and \(\pi^j(t)\), respectively. Subsequently, SI-RRT\(^*\) finds the trajectories that abide by the constraint set. With the new trajectories, a new node is inserted to \(OPEN\).

The first set of experiments is done for single-robot planners. Then we compare MRPP and MAPF algorithms that can handle continuous space and time. We have four test environments with different obstacle profiles: Circ10, Circ20, Rect10, and Rect20 as shown in Fig. 1. The size of all environments is \(40\,m \times 40\,m\) where the radius of all robots is \(0.5\,m\). Circ10 and Circ20 are filled with circular obstacles with different densities of the obstacles (10% and 20% of the entire space, respectively). Likewise, Rect10 and Rect20 are with rectangular obstacles. For each environment, we generate a set of \(50\) unique instances while randomizing the start and goal locations as well as the size and location of the obstacles. The same set of instances are used to test all compared methods for fairness. We have parameter settings: \(\lambda = 0.05\), \(d_{\text{max}} = 5\,m\), \(iteration = 1500\), and \(vel = 0.5\,m/s\). Unless otherwise noted, the maximum velocity of the robots across all compared algorithms is \(0.5\,m/s\). The system has an AMD Ryzen 5800X 3.8GHz CPU and 32G RAM, and all algorithms except for GT are implemented in C++17. GT is implemented using Python 3.8 because of its dependency on PyTorch.

To the best of our knowledge, ST-RRT\(^*\) is the most efficient sampling-based planner for single robots that can handle continuous space and dynamic obstacles. In [13], ST-RRT\(^*\) outperforms RRT-Connect and RRT\(^*\) in dynamic environments. In conjunction with PP, it also works efficiently for multi-robot systems. For a comparison with ST-RRT\(^*\), we place \(30\) dynamic obstacles whose size and shape are the same as the robot. The trajectories of the obstacles are randomly generated but known to both of the algorithms. The performance metric is the earliest arrival time at \(\boldsymbol{q}_\text{goal}\), which shows the quality of the tree and trajectory. With a lower earliest arrival time, the robot can reach the goal faster. We measure the metric by increasing the number of samples because we want to compare the sample-efficiency of the algorithms.

Fig. 6 shows the results across the four environments. The earliest arrival time at \(\boldsymbol{q}_\text{goal}\) of SI-RRT\(^*\) decreases faster than ST-RRT\(^*\), showing that SI-RRT\(^*\) can find better trajectories faster. Although the gap between the two algorithms decreases with a sufficient number of samples, SI-RRT\(^*\) converges to a lower earliest arrival time. Since both methods have the anytime feature, using SI-RRT\(^*\) is more beneficial if the time budget for planning is tight.

Our analysis suggests that this result stems from the dimensionality of the space in which the algorithms generate samples. ST-RRT\(^*\) samples from the entire space including the spatial and temporal dimensions while SI-RRT\(^*\) samples in the spatial dimension. Thus, the samples of SI-RRT\(^*\) can cover a larger number of configurations with the same number of samples. On the other hand, ST-RRT\(^*\) tends to expand more toward the temporal dimension than the spatial dimension since samples spreading out in the spatial dimension are more likely rejected owing to obstacles. Thus, ST-RRT\(^*\) needs more samples for the tree to reach the goal configuration.

Environment | Circ20 | Rect20 | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

Metric | #robot | SI-CCBS | SI-CPP | ST-RRT\(^*\)-PP | GT | SSSP | SI-CCBS | SI-CPP | ST-RRT\(^*\)-PP | GT | SSSP |

Success rate (%) | 20 | 100 | 100 | 98 | 28 | 34 | 100 | 100 | 96 | 20 | 22 |

2-12 | 40 | 100 | 100 | 78 | 0 | 0 | 100 | 100 | 80 | 0 | 0 |

2-12 | 60 | 96 | 100 | 68 | 0 | 0 | 94 | 100 | 66 | 0 | 0 |

2-12 | 80 | 66 | 100 | 48 | 0 | 0 | 30 | 100 | 44 | 0 | 0 |

2-12 | 100 | 0 | 98 | 34 | 0 | 0 | 0 | 98 | 4 | 0 | 0 |

Flowtime (sec) | 20 | 839.6 | 858.0 | 858.1 | 2295.3 | 17142.9 | 839.0 | 874.6 | 872.5 | 2364.9 | 16696.4 |

2-12 | 40 | 1788.7 | 1848.0 | 1848.7 | - | - | 1837.2 | 1928.5 | 1893.7 | - | - |

2-12 | 60 | 2843.8 | 2942.9 | 2857.1 | - | - | 2945.6 | 3157.9 | 3092.6 | - | - |

2-12 | 80 | 3978.3 | 4287.3 | 4150.3 | - | - | 4129.9 | 4677.1 | 4538.5 | - | - |

2-12 | 100 | 0 | 5829 | 5471.9 | - | - | - | 6479.8 | 6051 | - | - |

Sum of distance (m) | 20 | 409.8 | 414.2 | 414.8 | 443.5 | 507.9 | 407.4 | 418.0 | 416.4 | 468.6 | 507.3 |

2-12 | 40 | 846.0 | 864.0 | 871.2 | - | - | 851.7 | 881.9 | 879.1 | - | - |

2-12 | 60 | 1285.4 | 1331.2 | 1319.1 | - | - | 1319.7 | 1401.5 | 1386.1 | - | - |

2-12 | 80 | 1747.1 | 1882.2 | 1852.8 | - | - | 1752.7 | 1984.8 | 1950.4 | - | - |

2-12 | 100 | 0 | 2480.6 | 2384.4 | - | - | - | 2644.2 | 2489.5 | - | - |

Makespan (sec) | 20 | 77.6 | 79.0 | 79.2 | 114.8 | 1286.1 | 79.7 | 84.8 | 84.1 | 118.2 | 1259.6 |

2-12 | 40 | 84.7 | 88.0 | 91.1 | - | - | 86.4 | 94.4 | 98.3 | - | - |

2-12 | 60 | 86.7 | 99.8 | 104.8 | - | - | 87.3 | 117.3 | 118.7 | - | - |

2-12 | 80 | 87.7 | 124.3 | 126.7 | - | - | 88.8 | 140.0 | 163.1 | - | - |

2-12 | 100 | 0 | 144 | 150.8 | - | - | - | 156.4 | 199.0 | - | - |

The main metrics for multi-robot tests are the *success rate* and *flowtime*. We measure the success rate by counting the number of instances (out of \(50\)) that can finish computation within five minutes.
The flowtime is our objective value to be minimized (Eq. 1 ), which is the sum of the travel times of the robots closely related to the energy consumption in the navigation task. In congested environments, the significance of
flowtime increases as robots are compelled to wait longer to circumvent conflicts with other robots. We also measure the *sum of distance* and *makespan* (i.e., the elapsed time until all robots reach their goals) as auxiliary metrics.

One of the main advantages of SI-CPP and SI-CCBS lies in the scalability, which can solve instances with \(50+\) robots. Although ST-RRT\(^*\) with PP (ST-RRT\(^*\)-PP) is the only such scalable algorithm among the recently proposed algorithms, we also compare SSSP [12] and GT [14] for an expanded comparison (all reviewed in Sec 2). We note that the solutions of SSSP and GT specify the locations of the robots for each discrete time step. In other words, all robots are synchronized with the time steps, so the robots which arrive at their next waypoint earlier than peer robots should wait until all of them reach the waypoints. On the other hand, ST-RRT\(^*\)-PP can run with continuous time without such a synchronization.

The experimental results are shown in Figs. 7 and 8 for varying numbers of robots and different environments. Since running the extensive set of experiments is cost-expensive, we vary the number of robots unevenly up to \(100\). Owing to the space constraint, we present values of the metrics in Table 1 selectively focusing on more difficult environments. Standard deviations are also omitted. The success rates in Fig. 7 show that only SI-CPP, SI-CCBS, and ST-RRT\(^*\)-PP can find solutions for \(40+\) robots within the time limit. The success rates of SSSP and GT drop significantly with more than \(15\) robots across all environments, especially in Circ20 and Rect20, which are with more static obstacles.

Even though the focus of SI-CCBS is on solution quality over scalability, its scalability is promising. It can find solutions for \(60\) robots in most instances, which significantly outperforms SSSP and GT. Up to \(60\) robots, SI-CCBS shows higher success rates than ST-RRT\(^*\)-PP. With \(80\) robots, its success rate in the most challenging environment Rect20 is outperformed by ST-RRT\(^*\)-PP (\(14\%\) lower). However, SI-CCBS succeeds more in other environments. With \(100\) robots, its success rates is outperformed by ST-RRT\(^*\)-PP. While having such high success rates, SI-CCBS also produces high-quality solutions as shown in Fig. 8. When compared to ST-RRT\(^*\)-PP with \(80\) robots, SI-CCBS decreases solution quality by \(4\%\) and \(9\%\) in Circ20 and Rect20, respectively.

On the other hand, SI-CPP demonstrates significantly higher performance than all compared methods in terms of scalability, while minimally sacrificing the solution quality. Even in Rect20 with \(100\) robots, the success rate is \(98\%\) while ST-RRT\(^*\)-PP succeeds to find a solution only in two instances out of \(50\) (so \(4\%\)) and all the other methods fail. Since the data points are limited to compare the solution quality in Rect20 with \(100\), we compare the results with \(80\) robots. The flowtime of SI-CPP is increased by \(3.1\%\) and \(2.96\%\) compared to ST-RRT\(^*\)-PP. The amount of quality drop is consistent as Fig. 8 shows similar flowtime between SI-CPP and ST-RRT\(^*\)-PP across different environments and numbers of robots.

The auxiliary metrics show an interesting result. The makespan of SI-CPP is significantly smaller than ST-RRT\(^*\)-PP with larger numbers of robots (Fig. 9). The reduction is up to \(30.8\%\) and \(45.6\%\) in Circ20 and Rect20, respectively. This result implies that SI-CPP generates high-quality trajectories evenly for all robots, which is a key benefit of using SI-CPP for tasks that finish when all robots reach their goals. The sum of distance (presented only in Table 1) shows similar results with the flowtime. This value of SI-CCBS is approximately \(10\%\) smaller than ST-RRT\(^*\)-PP while the value of SI-CPP is \(1\)–\(2\%\) higher than ST-RRT\(^*\)-PP. Other notes include the performance of SSSP and GT, which are unable to find solutions beyond \(30+\) robots. The flowtime of them is also greater than others. Since SSSP allows robots move one by one at each time step, its flowtime increases rapidly.

In summary, SI-CCBS and SI-CPP both show great scalability with high solution quality where SI-CPP seems to be more robust comprehensively. While ST-RRT\(^*\)-PP performs well in many instances, its scalability is challenged by dense obstacles and large team sizes.

We proposed a two-level approach for addressing the multi-robot path planning problem in continuous space to generate conflict-free trajectories efficiently. We proposed a single-robot planner SI-RRT\(^*\) that is sample-efficient. Based on SI-RRT\(^*\), two variants for multi-robot systems are presented where SI-CPP used prioritized planning and SI-CCBS used a search algorithm to resolve conflicts among the individual trajectories computed by SI-RRT\(^*\). Our experimental results demonstrate that SI-CPP outperforms a state-of-the-art method in terms of the success rate, which shows the scalability of SI-CPP given a time budget for computation. SI-CPP also shows significantly smaller makespan. SI-CCBS outperforms the state-of-the-art method in terms of the solution quality while not sacrificing the scalability significantly. As a future direction, we are in the progress of a physical multi-robot system using tens of Sony toio robots (https://toio.io).

[1]

R. Stern, N. Sturtevant, A. Felner, S. Koenig, H. Ma, T. Walker, J. Li, D. Atzmon, L. Cohen, T. Kumar *et al.*, “Multi-agent pathfinding: Definitions, variants, and
benchmarks,” in *Proceedings of the International Symposium on Combinatorial Search*, vol. 10, no. 1, 2019, pp. 151–158.

[2]

J. Yu and S. LaValle, “Structure and intractability of optimal multi-robot path planning on graphs,” in *Proceedings of the AAAI Conference on Artificial
Intelligence*, vol. 27, no. 1, 2013, pp. 1443–1449.

[3]

G. Wagner and H. Choset, “Subdimensional expansion for multirobot path planning,” *Artificial intelligence*, vol. 219, pp. 1–24, 2015.

[4]

T. Standley, “Finding optimal solutions to cooperative pathfinding problems,” in *Proceedings of the AAAI Conference on Artificial Intelligence*, vol. 24, no. 1, 2010,
pp. 173–178.

[5]

M. Goldenberg, A. Felner, R. Stern, G. Sharon, N. Sturtevant, R. C. Holte, and J. Schaeffer, “Enhanced partial expansion a,” *Journal of Artificial Intelligence
Research*, vol. 50, pp. 141–187, 2014.

[6]

P. Velagapudi, K. Sycara, and P. Scerri, “Decentralized prioritized planning in large multirobot teams,” in *Proceedings of IEEE/RSJ International Conference on
Intelligent Robots and Systems*.1em plus 0.5em minus 0.4emIEEE, 2010, pp. 4603–4609.

[7]

D. Silver, “Cooperative pathfinding,” in *Proceedings of the aaai conference on artificial intelligence and interactive digital entertainment*, vol. 1, no. 1, 2005,
pp. 117–122.

[8]

G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant, “Conflict-based search for optimal multi-agent pathfinding,” *Artificial Intelligence*, vol. 219, pp. 40–66,
2015.

[9]

M. Barer, G. Sharon, R. Stern, and A. Felner, “Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem,” in *Proceedings of the
International Symposium on Combinatorial Search*, vol. 5, no. 1, 2014, pp. 19–27.

[10]

J. Li, W. Ruml, and S. Koenig, “EECBS: A bounded-suboptimal search for multi-agent path finding,” in *Proceedings of the AAAI Conference on Artificial
Intelligence*, vol. 35, no. 14, 2021, pp. 12 353–12 362.

[11]

M. Phillips and M. Likhachev, “Sipp: Safe interval path planning for dynamic environments,” in *Proceedings of IEEE International Conference on Robotics and
Automation*, 2011, pp. 5628–5635.

[12]

K. Okumura and X. Défago, “Quick multi-robot motion planning by combining sampling and search,” in *Proceedings of International Joint Conferences on
Artificial Intelligence*, 2023.

[13]

F. Grothe, V. N. Hartmann, A. Orthey, and M. Toussaint, “ST-RRT\(^*\): Asymptotically-optimal bidirectional motion planning through
space-time,” in *Proceedings of International Conference on Robotics and Automation*.1em plus 0.5em minus 0.4emIEEE, 2022, pp. 3314–3320.

[14]

C. Yu, Q. Li, S. Gao, and A. Prorok, “Accelerating multi-agent planning using graph transformers with bounded suboptimality,” in *Proceedings of International Conference
on Robotics and Automation*, 2023.

[15]

A. Andreychuk, K. Yakovlev, D. Atzmon, and R. Sternr, “Multi-agent pathfinding with continuous time,” in *Proceedings of International Joint Conference on Artificial
Intelligence*, 2019, pp. 39–45.

[16]

A. Orthey, S. Akbar, and M. Toussaint, “Multilevel motion planning: A fiber bundle formulation,” *arXiv preprint arXiv:2007.09435*, 2020.

[17]

J. Li, Z. Chen, D. Harabor, P. J. Stuckey, and S. Koenig, “MAPF-LNS2: Fast repairing for multi-agent path finding via large neighborhood search,” in
*Proceedings of the AAAI Conference on Artificial Intelligence*, vol. 36, no. 9, 2022, pp. 10 256–10 265.

[18]

I. Solis, J. Motes, R. Sandström, and N. M. Amato, “Representation-optimal multi-robot motion planning using conflict-based search,” *IEEE Robotics and
Automation Letters*, vol. 6, no. 3, pp. 4608–4615, 2021.