A Multi-Threading Algorithm for Constrained

Path Optimization Problem on Road Networks

August 03, 2022

The constrained path optimization (CPO) problem takes the following input: (a) a road network represented as a directed graph, where each edge is associated with a “cost” and a “score” value; (b) a source-destination pair and; (c) a budget value, which denotes the maximum permissible cost of the solution. Given the input, the goal is to determine a path from source to destination, which maximizes the “score” while constraining the total “cost” of the path to be within the given budget value. CPO problem has applications in urban navigation. However, the CPO problem is computationally challenging as it can be reduced to an instance of the arc orienteering problem, which is known to be NP-hard. The current state-of-the-art algorithms for this problem are essentially serial in nature and cannot take full advantage (i.e., achieve good load balance) of the increasingly available multi-core systems to solve a CPO query. Our proposed parallel algorithm (with its intelligent task-assignment scheme) achieves both superior solution quality and very low execution times (via good load balancing). Moreover, our approach is also able to demonstrate an almost linear speed-up with an increase in the number of cores.

spatial network routing algorithms, road networks, transportation

The problem of finding a path in road networks has been of great importance. Given the significance of the problem area, several researchers (e.g., [1]–[6]) have explored it from different aspects. Among these, the most fundamental is computing a path between a source and destination under a given preference metric. The preference metric of choice has typically been the minimization of distance (e.g., [2]), time (e.g., [3], [5]), or fuel (e.g., [7]).

However, the increasing proliferation of mobility-based Big Data [8], [9] enables one to ask for much more nuanced routing queries such as: “Determine the path which goes through wide roads while constraining the total length of the path to be at most 5km longer than the shortest path?” or “Determine a path which can be easily followed (refer [10]) while constraining the total length of the path to be at most 5km longer than the shortest path.” The central theme of both these queries is that they have the notion of both an optimizing metric and a constraining metric. Here the aim is to determine a path that maximizes on the optimizing metric (e.g., road width, navigability) along with constraining the path according to the constraining metric (e.g., distance). We refer to such problems as *Constrained Path Optimization (CPO)* problems.

**Importance of Problem:** *Constrained Path Optimization* problems have recently gained interest in the database system community (e.g., [10]–[13]) from the perspective of developing scalable solutions for some “instantiations” of the constrained path optimization problem. For instance, in [12], [13], the goal was to determine a path that maximized the scenery (i.e., it has more beautiful viewpoints) while constraining the total length of the path in terms of distance. Whereas [10], [11] focused on maximizing the “navigability” of the route (in terms of easily identifiable landmarks or easily navigable roads) while constraining the total distance traveled. Thus, it is conceivable that several other potentials “instantiations” of the constrained path optimization problem can come forward in the modern age of Big Data, where different road parameters (e.g., road type, road quality, etc.) are increasingly being recorded.

An instance of the constrained path optimization (CPO) problem can be reduced to an instance of the arc orienteering problem (AOP), which is known to be an NP-hard problem [14]–[16]. Moreover, any typical urban road network would have hundreds of thousands of road segments and road intersections. Thus, scalability is vital for any potential algorithm for the CPO problem. More specifically, we believe that for any navigational system based on the CPO problem to be used in real-life, the underlying path-finding algorithm should have a running time of at most a few seconds.

**Challenges in adapting minimization based approach for CPO problem:** There are many noted works done in the area of minimization of preference metric (e.g., [17]–[19]). However, it is important to note that these approaches cannot be easily modified to solve the CPO problem. A straightforward approach to reducing a maximization problem such as the CPO problem into a minimization problem is to change the sign of score value (i.e., make it negative). However, given that most of the minimization-based approaches (e.g., [17]–[19]) inherently use a variant of Dijkstra’s algorithm for optimization, they would not be able to work with negative score values. Another approach is to use reciprocals of the score values on edges, i.e., replace each score value \(x\) with \(\frac{1}{x}\), and then use the minimization-based approach. However, with this reduction, we cannot get a one-to-one mapping (with the original maximization-based CPO problem instance). The challenge of this reciprocal-based approach was also detailed in [11].

This paper makes the following contributions:

(a) To the best of our knowledge, we believe this paper is the first to explore the challenges of developing a parallel approach for the Constrained Path Optimization (CPO) problem on road networks. (b) We propose a novel parallel approach, called *Parallel-Spatial-RG* algorithm, for the CPO problem on road networks. *Parallel-Spatial-RG* algorithm intelligently performs task assignment and obtains good CPU utilization while avoiding deadlocks.

(c) We experimentally evaluate the *Parallel-Spatial-RG* algorithm using real road networks and compare it with existing alternative techniques for the CPO problem.

(d) Our experimental results obtained significantly better solution quality than the current heuristic algorithm [10], [11] while maintaining comparable running times.

(e) Our experimental results also showed that our proposed algorithm was able to **load balance and demonstrate an almost linear speed-up** with an increase in the number of cores.

**Outline:** The rest of the paper is organized as follows: Section 3 presents some basic concepts and formally defines the CPO problem. Section 4 presents our proposed approach. We evaluate our proposed approach and compare it with alternatives in Section 5. Finally, we conclude this paper in Section 6.

Research literature most relevant to our problem consist of the following: (a) work done in the area of theoretical computer science (e.g., [16]), (b) work done in the area of database systems (e.g., [10]–[13]), (c) work done in the area of parallel algorithms for shortest paths (e.g., [20]–[22]), (d) work done in the area of constrained shortest path problem (e.g., [17], [19], [23], [24]).

Researchers from theoretical computer science have focused on developing approximation algorithms for the orienteering problem. Note that orienteering problem can be reduced to arc orienteering problem [25]. Researchers [16] proposed a quasi-polynomial algorithm that yields a \(O(\log OPT)\) approximation for the orienteering problem. While the approximation ratio is impressive, the execution time of this algorithm is very high due to its high time complexity. Our adaptation of [16] for the CPO problem yielded a running time in hours (or in some cases, days) on a typical road network dataset, which is not feasible in the real world.

Database system researchers have been working on variants of the CPO problem (e.g., [10]–[13]) from a scalability perspective. Their approaches already have running time in milliseconds, and parallelizing them can not produce a better solution quality. In contrast, our experimental analysis shows that our proposed approach obtains a significantly better solution quality while maintaining comparable running times (more details in Section 5).

Research work done in the area of parallel algorithms for shortest paths (e.g., [20]–[22], [26]) has focused on developing path-finding algorithms that optimize routes based on a single preference metric (e.g., distance or time). Hence, these cannot be used for our CPO problem, which has an inherently different computational structure as it uses more than one preference metric.

Lastly, it is important to acknowledge the work done in the area of constraint shortest path (CSP) problem [17], [19], [23], [24] while proposing a solution for the CPO problem. The CSP problem aims to determine the shortest path between a source-destination pair subject to certain constraints. Note that CSP problem is a *minimization based problem* and moreover the existing techniques [17], [19], [23], [24] are based on Dijkstra’s algorithm. Therefore, as mentioned in Section 1.1, we cannot use them to solve a CPO problem instance.

**Road network:** A road network is represented as a directed graph \(G=(V,E)\). Here, nodes (set \(V\)) represent the road intersections, whereas directed edges (set \(E\)) represent the road segments. Each edge in \(E\) is associated with a notion of a cost value (\(> 0\)) and a score value (\(\geq 0\)). In this paper, the cost of an edge \(e=(x,y)\) corresponds to the euclidean distance between \(x\) and \(y\).

**Optimizing Metric (\(\Gamma()\)):** Given any directed path \(P_i\), \(\Gamma(P_i)\) returns the total “score” collected by \(P_i\). In its simplest form, \(\Gamma(P_i)\) can be defined as a sum of the score values of all the edges constituting the path \(P_i\).

**Constraining Metric (\(\Phi()\)):** Given any directed path \(P_i\), \(\Phi(P_i)\) returns the total “cost” consumed by \(P_i\). We define \(\Phi(P_i)\) as the sum of the cost values of all the edges constituting the path \(P_i\).

**Input:** consists of the following:

(1) A road network, \(G=(V,E)\), where each node \(v \in V\) is associated with certain spatial coordinates.

(2) A source \(s\) \(\in\) \(V\) and a destination \(d\) \(\in\) \(V\).

(3) A positive value \(overhead\) corresponds to the maximum permissible cost allowed over the cost of the minimum cost path from \(s\) to \(d\). This paper uses the term \(budget\) to denote the sum of overhead and the cost of the minimum cost path between \(s\) and \(d\).

**Output:** A directed path \(P*\) between \(s\) and \(d\).

**Objective function:** Maximize \(\Gamma(P*)\)

**Constraint:** \(\Phi(P*) \leq budget\)

This section details our proposed parallel algorithm for solving the CPO problem. In Section 4.1, we present our Spatial-RG algorithm, inspired by the approximation algorithm proposed in [16] for the orienteering problem. Spatial-RG algorithm has been optimized to solve CPO problem instances on road networks and is much more amenable for a good **load-balanced** parallelization (details in Section 4.2). In Section 4.2, we develop a parallel version of the Spatial-RG algorithm.

The key idea over here is first to find an initial seed path (shortest path) between the given source and destination nodes and then recursively determine its better replacements. The initial seed path can be determined using any standard shortest path algorithm. In our implementation, we used A* algorithm with *euclidean distance* as the heuristic function [27]. The algorithm continues the recursion up to maximum recursion depth \(\theta\) (an input parameter given to the algorithm).

**Details of the Spatial-RG algorithm:** A pseudocode of the algorithm is presented in Algorithm 1. Each call to the algorithm primarily takes the following input: (i) “source node” \(u\), (ii) “destination node” \(v\) and (iii) remaining budget \(\beta\). In the first call to the Spatial-RG algorithm, \(u\), \(v\), and \(\beta\) would be set according to the input values given while defining the CPO query to be processed. Thereafter, \(u\), \(v\), and \(\beta\) change during the course of the recursion calls.

In each recursion, Algorithm 1 first iterates (outer loop on line 9) over all edges \(e\) which satisfy the following two filters: (a) \(\Gamma(e) > 0\) and, (b) \(e\) is inside the ellipse formed by \(u\) and \(v\) as the foci and \(\beta\) as the major axis length. It is important to note that the filter proposed in (a) may affect the quality of the final solution. Nevertheless, we use it to gain performance. Moreover, our experiments also reveal that Spatial-RG still has significantly better solution quality than the current heuristics for the CPO problem. In contrast, filter (b) maintains correctness. It was already explained in [11], [12] and thus, we do not include its correctness proof.

Suppose an edge \(e=(x,y)\in E\) is considered in the outer loop (line 9). This means that the current path between \(u\) and \(v\) would be replaced by a path which is the combination of the following three sub-paths: (1) a path \(P_1\) between \(u\) and \(x\), (2) edge \(e=(x,y)\) and, (3) a path \(P_2\) between \(y\) and \(v\). Both \(P_1\) and \(P_2\) are determined recursively inside the inner loop on line 11 as described next.

The inner loop of Algorithm 1 loop over a set of possible budget values. In each iteration of this inner loop, \(P_1\) and \(P_2\) are determined recursively for different budget values (line 12 and 13). For each pair of \(P_1\) and \(P_2\) (as returned by their respective recursive calls), we determine \(P_{new}\) as \(P_1 \cup e \cup P_2\) (joining \(P_1\), \(e\), and \(P_2\) in same order). We store \(P_{new}\) if it has a better score value than the original path between \(u\) and \(v\) (determined on line 1). Thus, at the termination of the inner while loop, we would have the “best possible” \(P_1\) and \(P_2\), which can be attached with \(e\) (current edge being considered on the outer loop) to obtain the “best possible” replacement for the current path between \(u\) and \(v\).

It is important to note that budget values sent into the recursive calls for determining \(P_1\) and \(P_2\) (line 12 and line 13) are not entirely independent of each other. More precisely, if a budget value of \(b\) is given for determine \(P_1\), then a maximum value of \(\beta-b-\Phi(e)\) budget can be given for determine \(P_2\).

And lastly, the set of feasible budget values (\(b\)) for the inner while loop range from \(b=Euclidean\_Distance(u,x)\) to \(b=\beta - \Phi(e)\) - \(Euclidean\_Distance(y,v)\). Correctness of these values can be easily proved by considering the fact *shortest network distance over a graph would always be greater than or equal to the Euclidean distance. Details omitted due to lack of space*.

**Spatial Indexing for implementing the Ellipse pruning:** We use a *Uniform Grid Indexing* [28] to efficiently determine the edges present in \(ellipse(u,v,\beta)\) on line 9 of Algorithm 1. This is done using the concept of spatial range query.

While the Spatial-RG algorithm uses several spatial filtering strategies and spatial indexing for gaining performance, its running time (as seen in our experiments) was still impracticable for meeting the real-world expectation of getting a solution for the CPO problem within a few seconds. To this end, a parallel version of Spatial-RG algorithm was developed, which can harness the increasingly available multi-core systems to **improve execution time while still maintaining the same solution quality as Spatial-RG.**

This section describes the proposed parallel algorithm *Parallel-Spatial-RG* for the CPO problem. On a close inspection of *Spatial-RG* (Algorithm 1), one may realize that the algorithm has some inherent parallelism at the following three places: (a) Outer loop (For loop on line 9 in Algorithm 1), (b) Recursion calls on line numbers 12 and 13 (in Algorithm 1), (c) Inner While loop (on line 11 in Algorithm 1). In the proposed algorithm, only options (a) and (b) are considered for parallelization. The while loop (option (c)) on line 11 in Algorithm 1 is not considered for parallelization as the thread which creates the tasks by unrolling the loop on line 9 in Algorithm 1 would have no other work except the creation of tasks corresponding to loop in line 11. As a result, this thread would be sitting idle while other threads undertake the job of iterating over the while loop. We first discuss our proposed technique for parallelizing the outer loop (For loop on line 9 in Algorithm 1), and then the recursion calls on line numbers 12 and 13. We start our discussion with an intuitive naive approach and highlight its limitations. Following that, we describe the proposed *Parallel-Spatial-RG* algorithm.

Note that *Spatial-RG* (Algorithm 1) has recursion calls inside a while loop. As a result, the search space of the algorithm would be non-linear. More specifically, the search space of the *Spatial-RG* algorithm would be similar to a *tree* structure. And the algorithm would essentially undertake a depth-first traversal of this *tree-structured* search space to obtain a solution.

Figure 2 illustrates this *tree-structured* search space for a hypothetical instance of Algorithm 1. In this instance, in the first level, the loop on line 9 in Algorithm 1 has 4 edges (\(e_1, e_2, e_3, e_4\)). For each of these edges, the algorithm would first have to execute the while loop (between lines 11 and 20 in Algorithm 1). And inside this while loop, the recursion calls are taking place.

An intuitive way of parallelizing the *Spatial-RG* algorithm would be to use multiple threads to explore its *tree-structured* search space. As a naive approach, one can employ independent threads to explore the sub-trees (in Figure 2) under the root in parallel and then determine the best solution amongst individual solutions obtained in different sub-trees. However, such an approach may not always guarantee good CPU utilization. For instance, consider an 8-core system (capable of running 16 threads). In the search space illustrated in Figure 2, one may assign the sub-trees underneath the root to 4 different threads. Such an approach would not be using the remaining 12 threads. Moreover, it may also happen that the work across different sub-trees may not be uniformly distributed due to variation in the properties (density of roads and their scores) of road segments across different spatial locations. Consequently, some threads may finish their work much sooner than others. Therefore, causing even worse CPU utilization.

A logical extension of this naive approach would be to determine an appropriate level in the *tree-structured* search space (at runtime) based on the number of threads available and then continue exploring in parallel. However, this approach also has its limitations. Firstly, on any typical road network, *Spatial-RG*’s search space tree would have a high fan-out factor. In other words, the number of “nodes” increases almost exponentially with each level. For instance, level 1 may have 10 nodes, and level 2 may have 80 nodes. Secondly, the search space of *Spatial-RG* (on any CPO problem instance) cannot be pre-determined precisely as it is heavily dependent on the spatial distribution of the edges in the input dataset.

As mentioned in the previous subsection, any approach which is solely dependent on unpacking the recursion (followed by simultaneous independent exploration by threads) would have poor CPU utilization.

To this end, we use the following key strategy: In each recursion call, the current thread is allowed to create further tasks for exploring its designated search space, but those tasks may not always be executed in parallel. More specifically, a separate task is created for each iteration of the outer loop (loop on line 9 in Algorithm 1) and the recursive calls on line numbers 12 and 13 of the *Spatial-RG* algorithm. These tasks are picked up by a thread only if it is idle or free. Otherwise, those tasks are executed serially by the thread which created those tasks (details provided later in the section).

It is important to note that the tasks created by different threads *should not be put in a global job pool*. Such an approach may lead to *deadlock*, as illustrated in the following example. Consider a case where the outer loop (line 9 of Algorithm 1) of *Spatial-RG* and the recursion calls (inside the inner loop) are parallelized. In each recursion call, the current thread would create a separate task for each outer loop iteration. Along with that, the parallelization of the recursion calls also takes place. So, the tasks created previously are further divided into two sub-tasks. All these tasks are put into a global job pool. And whenever a free thread is found, it is assigned a job from the global job pool.

Figure 3 illustrates a case of deadlock in such an approach. To simplify the example, we consider a very small instance and a limited number of resources (2 threads). Here in Figure 3 \(R\) is the recursion base. At the start of execution, the \(R\) is picked up by thread \(T0\). Then \(T0\) creates jobs corresponding to all the edges in the outer loop and puts those jobs into the global job pool. In this example, we assume that a single job is created corresponding to a single edge \(E1\) (in the outer loop) for the sake of simplicity. Now \(T0\) goes into the waiting stage and waits for \(E1\) to complete. The job corresponding to \(E1\) is then picked by the free thread available in the thread pool \(T1\). Following this, \(T1\) creates two tasks \(R1\) and \(R2\) (corresponding to two recursive calls in the inner loop) for \(E1\) and puts them into the same global job pool. Now \(T1\) also goes into the waiting stage and waits for its recursive tasks (\(R1\) and \(R2\)) to complete. At this stage, there are no free threads in the thread pool. Each thread is waiting for their respective jobs to finish, and jobs in the job pool are waiting for free threads. In Figure 3, this situation is explained using two *hold and wait* cycles, which is one of the well-known conditions for *deadlock*. One cycle is formed by the thread pool, \(E1\), and \(R1\), while the other is formed by the thread pool, \(E1\), and \(R2\). In each of the cases, \(E1\) holds one resource (thread) and waits for its recursive task (\(R1\) or \(R2\)) to complete, and the recursive tasks (\(R1\) or \(R2\)) are waiting for the \(R\) or \(E1\) to release one of the resources (thread), while \(R\) is waiting for \(E1\) to be finished. So the algorithm would not proceed further, resulting in a *deadlock*.

One may obtain slightly better performance by making each thread hold some tasks for itself and putting the remaining ones in the global job pool. However, this strategy can still not guarantee a deadlock-free execution. Moreover, for this approach to work, one would have to decide the “optimal number” of tasks needed to perform serially for each thread before any of their tasks starts execution. Determining this “optimal number” precisely would be challenging due to variation in properties (density of roads and their scores) of road segments across different spatial locations.

We address the problem of deadlocks through the following strategy: Each thread would maintain its *local job pool*. After creating jobs, the current thread would look for free threads. If no free threads are found, the current thread would start picking up jobs from its job pool (while actively looking for free threads).

The proposed parallel approach is primarily detailed across Algorithm 4 and Algorithm 5. Algorithm 5 is used inside Algorithm 4. We initialize Algorithm 4 by passing the source node, the destination node, budget, current level (set to \(0\)), and the maximum depth of recursion allowed. We also create a worker pool (referred to as \(threadpool\) in the pseudocodes) to be used in Algorithm 4 and Algorithm 5 according to the number of cores available. Typically, in modern multi-core processors with hyper-threading technology, we would set the number of workers (threads) to twice the number of cores available.

Overall, Algorithm 4 is structurally similar to the *Spatial-RG* algorithm. Algorithm 5 is used inside Algorithm 4 to do the work corresponding to the while loop between lines 11–20 in *Spatial-RG*. In each recursion call, Algorithm 4 creates a task for each iteration of the outer loop (for loop on line 9 in Algorithm 1). This is done in lines 10–13 of Algorithm 4. Each of these tasks essentially attempts to compute the solution while including a particular edge \(e=(x,y)\). With reference to the *Spatial-RG* algorithm (Algorithm 1), this is the work done corresponding to the while loop between lines 11–20.

The jobs created are put in a job pool \(JP\), which is *local* to the current recursion call of Algorithm 4. Jobs from \(JP\) are assigned to idle threads. These threads would report their respective results in the result set \(RS\), which is shared among them. In our implementation, each thread in \(JP\) is assigned a unique location in \(RS\). This allows all the threads to access the \(RS\) simultaneously and avoids a critical section. Consequently, we get better CPU utilization. If the number of idle threads is less than the number of jobs in \(JP\), the remaining jobs are picked up by the currently executing thread (which actually created the job queue) while actively looking for free threads. This is done in lines 10–25 in Algorithm 4. With the intent to simplify the notations, the term “primary thread” refers to the thread that creates the job pool in a recursion call. There would be only one “primary thread” per call. As mentioned earlier, one can trivially modify this algorithm to hold out one task (from \(JP\)) for the “primary thread.” This would give slightly better performance. After all the tasks in \(JP\) are completed, the “primary thread” chooses the best result from \(RS\) and returns it.

As mentioned earlier, Algorithm 5 focuses on the while loop between lines \(11\)–\(20\) of the *Spatial-RG* algorithm. It creates tasks for the two recursion calls to Algorithm 4 and puts them into a job pool \(JP2\). Similar to the previous case, this job pool is created locally by its “primary thread.” Tasks in \(JP2\) are assigned to free threads, which, in turn, would report their solutions in a shared result set. And if no free threads are found, the “primary thread” picks up the tasks in \(JP2\) for processing.

**Generalizing Parallel-Spatial-RG for Minimization:** Algorithm 4 and Algorithm 5 can be trivially generalized for a minimization case (e.g., minimize the #potholes in the path) by making two small changes. First, reverse the “if” conditions (between \(\Gamma(P_{new})\) and \(\Gamma(P)\)) on line 30 of Algorithm 4 and line 22 of Algorithm 5. Second, remove the restriction of using only the edges \(e=(x,y)\in E\) with \(\Gamma(e) > 0\) on line 10 of Algorithm 4. In general, we believe that our approach is suitable for both maximization and minimization cases. It is important to note that our generalized version of *Parallel-Spatial-RG* would still not use Dijkstra’s styled enumeration (as original *Parallel-Spatial-RG* does not use it) while optimizing the score values. This makes it robust to handle edges with negative scores as well. In fact, dependence on Dijkstra’s styled enumeration in the current state of art solutions (e.g., [17], [19], [23] makes them unsuitable for the CPO problem. As otherwise, one could reverse the sign of score values (i.e., make them negative) and run a minimization-based approach. We plan to investigate this generalization further in our future work and compare it with the current state-of-the-art in CSP problem (e.g., [17], [19], [23]).

**Comment on ForkJoinPool based implementation:** ForkJoinPool internally uses a *work-stealing* based scheduler and gives priority to the jobs created by lower levels in the recursion calls. Such a strategy would avoid deadlocks. However, this scheduler does not allow the programmer to control the threads created within a particular task. As a result, race conditions can not be controlled, which leads to low CPU utilization. Thus, a trivial parallelization of the *Spatial-RG* algorithm using the ForkJoinPool library would not give good CPU utilization due to the creation of race conditions at lines 15, 16, and 17 of Algorithm 1.

**Time complexity analysis of Parallel-Spatial-RG:** In the worst case, an instance of *Parallel-Spatial-RG* algorithm would iterate over all the edges \(e=(x,y)\in E\) (in the outer loop), and for each iteration, it would again iterate for \(\beta\) times (in the worst case) in the inner loop. Following this, it would have two recursion calls inside the inner loop. Thus, the time complexity for one recursion call is \(O(2m\beta)\) (\(m\) is the total number of edges present in the network). For a maximum recursion depth of \(\theta\), the total time complexity of *Parallel-Spatial-RG* would be \(O(2m\beta)^\theta\). Despite the high worst-case time complexity, the combination of spatial filters (Section 4.1) and good CPU utilization (via our parallel approach) helps *Parallel-Spatial-RG* to have low execution time in practice.

In this section, we experimentally evaluate *Parallel-Spatial-RG* and compare it with the current state-of-the-art.

Road Network |
#Nodes |
#Edges |
---|---|---|

Delhi | 52576 | 150488 |

Buenos Aires | 263783 | 864408 |

London | 285050 | 749382 |

Our experimental analysis is done on three real-world road network datasets from [29]. The details of the datasets are briefed in Table **¿tbl:tab?**::dataset. In each of these datasets, some edges were selected uniformly at random from across the road network and were assigned (randomly) a score value between 1 and 15. Other edges have a score value of zero. We also varied the number of edges, which had non-zero score values (details provided later).

We compare our proposed *Parallel-Spatial-RG* algorithm against the following candidates: *(a) ILS*(CEI)[12], (b) MSWBS[10]*. We also adapted and implemented [16] for our CPO problem. However, it showed an execution time in hours (sometimes even a day). Therefore, we did not include it in our experiments.

The following parameters were varied in our experiments in subsection 5.1.

*Budget (\(\beta\)):* Recall that budget has been defined as the sum of overhead and the cost of the shortest path. As the budget increases, the working space of *Parallel-Spatial-RG* also increases.

*Path Length:* Path length affects the running time of all the candidate algorithms. However, the notion of path length can have multiple interpretations in a weighted graph. To ensure the interpretability of the results, for our experiments, we define the path length as the number of edges in the shortest path between the given source and destination. We reported the average value of \(100\) source-destination pairs for each path length range.

*Recursion Depth (\(\theta\)):* Different recursion depths (\(\theta=1\), \(\theta=2\)) were tried to determine an “optimal” recursion depth for *Parallel-Spatial-RG*. This “optimal” recursion depth makes a good balance between solution quality and running time.

*Number of Threads (\(n\)):* *Parallel-Spatial-RG* is run for different number of threads (\(8, 16, 32, 64, 96, 128\)) to show the near linear speed-up with an increase in the number of cores.

*Number of edges with non-zero score:* The proposed algorithm’s runtime depends on the road network’s density, more specifically, the density of the edges with non-zero scores values in that road network.

We measured the following in our experiments: *(a) Execution time of the algorithm.* *(b) Score gain over the score of the shortest path.* Score gain is defined as the difference between the total score of the solution obtained by the candidate algorithm and the shortest path.

All the algorithms (including the candidate algorithm) were implemented in JAVA11. We used an Ubuntu machine with several Intel Xeon Platinum 8280M cores (total capacity of 128 threads) and 2048GB RAM. Also, note that our processor had a base frequency of 2.70GHz (with a max boost frequency of 4.00GHz).

In this experiment, *Parallel-Spatial-RG* is compared for three different values of recursion depth (\(\theta\)) (\(1\), \(2\), & \(3\)) at three different values of overhead \(30\%\), \(40\%\), and \(50\%\). The runtime of *Parallel-Spatial-RG* increases exponentially with an increase in the recursion depth. Therefore, we considered a smaller instance of the Delhi road network (\(9401\) nodes and \(25941\) edges) in which \(40\%\) of the edges had a non-zero score value for this experiment. In Fig. 6 (a) the run-time of *Parallel-Spatial-RG* is shown for different values of \(\theta\) at different overheads. Fig. 6 (b) shows the corresponding score gain. Overall, the figures show that as recursion depth is increased, the execution time of *Parallel-Spatial-RG* increases exponentially. Whereas we only get a very small improvement on the score gain aspect. To this end, we set the recursion depth to \(1\) in our remaining experiments.

In this experiment, the execution time of *Parallel-Spatial-RG* is analyzed for different number of cores (\(8, 16, 32, 64, 96, 128\)) and for different path length ranges (\(11\)–\(20\), \(21\)–\(30\), \(31\)–\(40\), \(41\)–\(50\)). As per Fig. 7, *Parallel-Spatial-RG* gives an almost linear scale up with the increasing number of cores. This shows the scalability of *Parallel-Spatial-RG* algorithm. Though we could not run *Parallel-Spatial-RG* on a bigger system due to resource constraints, the scalability demonstrated by *Parallel-Spatial-RG* guarantees a better performance on a system with a higher number of cores. Hence, we fix the number of cores to 128 for the later part of the experiments.

In the Delhi road network *Parallel-Spatial-RG* had a maximum runtime of \(1.32\) seconds and \(0.81\) seconds on an average across all the path length ranges. For both the Buenos Aires and London road networks, the performance of *Parallel-Spatial-RG* is quite similar as both have nearly the same network size. For the Buenos Aires dataset, *Parallel-Spatial-RG* had a runtime of \(2.7\) seconds on average and \(4.2\) seconds for a maximum case. And for the London network, those values were \(2.9\) seconds and \(4.15\) seconds, respectively.

To summarize, the proposed method *Parallel-Spatial-RG* produces high solution quality within an acceptable time limit for different datasets. For the bigger networks, it incurs a little bit higher execution time (more than \(3\) seconds). This is because of the multiple numbers of shortest path computation on-the-fly for a single CPO problem instance. In our implementation, we used A* algorithm (with Euclidean distance as the heuristic function) for computing shortest paths.

One may trivially improve the performance of *Parallel-Spatial-RG* by using the latest shortest path computation techniques such as [2], [30], [31]. However, experimental evaluation with these techniques is beyond the scope of this paper and will be investigated in the future. An interesting fact to note is that in all the datasets, for the shorter path lengths(\(11\)–\(20\)), the improvement with the increase in the number of cores gets saturated. This is due to the following two reasons: (a) runtime was already in around one second; (b) the number of sub-tasks created during the run was less than the total number of cores provided for the run.

The performance of *Parallel-Spatial-RG* depends on the budget value. To this end, we analyzed the behavior of *Parallel-Spatial-RG* with varying overhead values of \(30\%\), \(40\%\), and \(50\%\). Note that budget = cost of the shortest path + overhead over the shortest path cost. Fig. 8 reveals that the runtime, as well as the solution quality of *Parallel-Spatial-RG*, increases with the increase in overhead, i.e., increase in budget. As the behavior of *Parallel-Spatial-RG* is quite similar across the datasets, we only include the results for the London dataset for this experiment and fixed the overhead value to \(30\%\) over the cost of the shortest path for the later parts of the experiments.

The performance of *Parallel-Spatial-RG* also depends on the density of the edges with non-zero scores values. Therefore, we analyzed its behavior with varying density of the edges with non-zero score values (\(30\%\), \(40\%\), and \(50\%\)). Fig. 9 demonstrated that the runtime and the solution quality of *Parallel-Spatial-RG* increase with the increase in density of edges with non-zero scores values. Here also, we only consider the results for the London dataset and fixed the density of edges with non-zero score value to \(40\%\) for the later parts of the experiments.

This experiment compares *Parallel-Spatial-RG* with the candidate algorithms *MSWBS* and *ILS*(CEI)* algorithm for various path length ranges (\(11\)–\(20\), \(21\)–\(31\), \(31\)–\(40\), \(41\)–\(50\)) on each of our three road network datasets. Fig. 10 shows the average score comparison between *Parallel-Spatial-RG*, *MSWBS*, and *ILS*(CEI)*. In terms of achieved score, *Parallel-Spatial-RG* outperforms both *MSWBS* and *ILS*(CEI)* by a huge margin in all the three datasets.

With regards to execution time, *MSWBS* had an average (across different path lengths) run-time of \(0.12\)sec, \(0.13\)sec, and \(0.11\)sec on Delhi, Buenos Aires, and London datasets respectively. Whereas, *Parallel-Spatial-RG* demonstrated an average (across different path lengths) runtime of \(0.9\) seconds, \(2.7\) seconds, and \(2.9\) seconds respectively for the same parameters. *ILS*(CEI)* allows us to fix the execution time for each query and return the obtained path up to that particular threshold. In our experiments, we fixed this threshold to be \(3\) seconds (as *Parallel-Spatial-RG* had a maximum execution time of \(3\) seconds) in this experiment. Note that *Parallel-Spatial-RG* could have obtained a much lower execution time on a system with more available threads due to “almost” linear scale-up. *MSWBS* already has a running time in milliseconds, and *ILS*(CEI)* has an almost fixed runtime (due to its threshold). Therefore, due to lack of space, we didn’t include the runtime comparison between *Parallel-Spatial-RG* with the candidate algorithms.

This paper studied the problem of the Constrained Path Optimization (CPO) problem on road networks. CPO problem has value addition potential in the domain of urban navigation. However, the current state-of-the-art solutions (approximation algorithm or heuristic solutions) either fail to scale up to real-world road networks or have poor solution quality. In contrast, our proposed parallel algorithm *Parallel-Spatial-RG* shows promising results in terms of both scalability and solution quality. In the future, we will continue working on the *Parallel-Spatial-RG* algorithm to improve its scalability even further and also establish a formal approximation ratio for the algorithm. More specifically, we plan to explore the potential of hierarchical routing techniques for improving the scalability of our *Parallel-Spatial-RG* algorithm.

[1]

J. Yuan and et al., “T-drive: driving directions based on taxi trajectories,” in *Proc. of the ACM SIGSPATIAL*, ser. GIS ’10, 2010, pp. 99–108.

[2]

N. Jing, Y. Huang, and E. Rundensteiner, “Hierarchical encoded path views for path query processing: An optimal model and its performance evaluation,” *IEEE Trans. on Knowledge and Data Engg*, vol. 10, no. 3, pp. 409–432, 1998, iEEE.

[3]

U. Demiryurek and et al., “Online computation of fastest path in time-dependent spatial networks,” in *Proc. SSTD, LNCS Vol. no 6849*.1em plus 0.5em minus 0.4emSpringer, 2011, pp. 92–111.

[4]

V. Gunturi and et al., “A critical-time-point approach to all-departure-time lagrangian shortest paths,” *IEEE trans. KDE*, vol. 27, no. 10, pp. 2591–2603, 2015.

[5]

B. Yang and et al., “PACE: a path-centric paradigm for stochastic path finding,” *VLDB J.*, vol. 27, no. 2, pp. 153–178, 2018.

[6]

D. Delling, “Time-dependent sharc-routing,” in *Proc. of the 16th annual European symposium on Algorithms*.1em plus 0.5em minus 0.4emSpringer, 2008, pp. 332–343.

[7]

Y. Li and et. al., “Physics-guided energy-efficient path selection: a summary of results,” in *Proc. of the 26th ACM SIGSPATIAL*, 2018, pp. 99–108.

[8]

N. Henke and et al., “The age of analytics: Competing in a data-driven world,,” Mckinsey Global Institute https://tinyurl.com/yb7vytkg, Dec 2016.

[9]

R. Y. Ali and et al., “Future connected vehicles: challenges and opportunities for spatio-temporal computing,” in *Proc. of the 23rd ACM SIGSPATIAL*, 2015, pp. 14:1–14:4.

[10]

R. Kaur and et al., “Finding the most navigable path in road networks: A summary of results,” in *Proc. of DEXA*, ser. LNCS, vol. 11029.1em plus 0.5em minus 0.4emSpringer, 2018, pp. 440–456.

[11]

——, “Finding the most navigable path in road networks,” *Geoinformatica*, vol. 25, no. 1, pp. 207–240, 2021.

[12]

Y. Lu and C. Shahabi, “An arc orienteering algorithm to find the most scenic path on a large-scale road network,” in *Proc. of the 23rd SIGSPATIAL*.1em plus 0.5em minus 0.4em, 2015.

[13]

Y. Lu and et al., “Scenic routes now: Efficiently solving the time-dependent arc orienteering problem,” in *Proc. of the 2017 CKIM*.1em plus 0.5em minus 0.4em, 2017, p. 487–496.

[14]

C. Archetti and et al., “A branch-and-cut algorithm for the orienteering arc routing problem,” *Comput. Oper. Res.*, vol. 66, no. C, pp. 95–104, Feb. 2016.

[15]

D. Gavalas and et al., “Approximation algorithms for the arc orienteering problem,” *Information Processing Letters*, vol. 115, no. 2, pp. 313 – 315, 2015.

[16]

C. Chekuri and et al., “A recursive greedy algorithm for walks in directed graphs,” in *Proc. of Symp. on Foundations of Comp. Sci.*1em plus 0.5em minus 0.4emIEEE, 2005, pp. 245–253.

[17]

S. Wang and et al., “Effective indexing for approximate constrained shortest path queries on large road networks,” *Proc. VLDB Endow.*, vol. 10, no. 2, p. 61–72, Oct. 2016.

[18]

U. Demiryurek and et al., “Online computation of fastest path in time-dependent spatial networks,” in *Proc. of SSTD*, ser. LNCS, vol. 6849.1em plus 0.5em minus 0.4emSpringer, 2011, pp. 92–111.

[19]

Z. Liu and et al., “Efficient constrained shortest path query answering with forest hop labeling,” in *IEEE 37th ICDE*, 2021, pp. 1763–1774.

[20]

S. Maleki and et al., “Dsmr: A shared and distributed memory algorithm for single-source shortest path problem,” in *Proc. of the 21st PPoPP*, 2016, pp. 39:1–39:2.

[21]

Y. Simmhan and et al., “Distributed programming over time-series graphs,” in *2015 IEEE IPDPS*, 2015, pp. 809–818.

[22]

V. T. Chakaravarthy and et al., “Scalable single source shortest path algorithms for massively parallel systems,” *IEEE Trans on PDS*, vol. 28, no. 7, pp. 2031–2045, 2017.

[23]

S. Lu and et al., “Accelerating exact constrained shortest paths on gpus,” *Proc. VLDB Endow.*, vol. 14, no. 4, pp. 547–559, 2020.

[24]

P. Hansen, “Bicriterion path problems,” in *Multiple criteria decision making theory and application*.1em plus 0.5em minus 0.4emSpringer, 1980, pp. 109–127.

[25]

D. Gavalas and et al., “Approximation algorithms for the arc orienteering problem,” *Info. Processing Letters*, vol. 115, no. 2, pp. 313–315, 2015.

[26]

K. Vishwakarma and V. M. V. Gunturi, “Impromptu rendezvous based multi-threaded algorithm for shortest lagrangian path problem on road networks,” in *Proc. of the ICA3PP*, 2019, pp. 201–222.

[27]

V. Gunturi and S. Shekhar, *Spatio-Temporal Graph Data Analytics*.1em plus 0.5em minus 0.4emSpringer isbn 978-3-319-67770-5, 2017.

[28]

S. Shekhar and S. Chawla, *A tour of spatial databases*.1em plus 0.5em minus 0.4emPrentice Hall Upper Saddle River, 2003.

[29]

A. Karduni and et al., “A protocol to convert spatial polyline data to network formats and applications to world urban road networks,” *Scientific Data*, vol. 3, no. 160046, 2016.

[30]

D. Delling and et al., “Phast: Hardware-accelerated shortest path trees,” *Journal of Parallel and Distributed Computing*, vol. 73, no. 7, pp. 940 – 952, 2013.

[31]

P. Sanders and D. Schultes, “Highway hierarchies hasten exact shortest path queries,” in *Algorithms – ESA 2005*, 2005, pp. 568–579.