Tree-based formulation for the multi-commodity flow problem

Simon Spoorendonk
Flowty ApS
Denmark
simon@flowty.ai
Bjørn Petersen
Flowty ApS
Denmark
bjorn@flowty.ai


Abstract

We introduce a tree-based formulation for the minimum-cost multi-commodity flow problem that addresses large-scale instances. The method decomposes the source-based model by representing flows as convex combinations of trees rooted at source nodes, and solves the resulting formulation with column generation. The number of demand constraints now depends on the number of sources \(|S|\), not commodities \(|K|\), yielding a compact master problem when \(|S| \ll |K|\). We conduct a computational study comparing tree-based decomposition against path-based column generation and direct LP solving. The results show speed-ups of up to one order of magnitude over direct LP solving, and improved scalability compared to path-based formulations. Tree-based decomposition enables solving instances with millions of commodities and hundreds of thousands of nodes. This makes it well-suited for applications in transportation and logistics networks where multiple demands often share common origins.

1 Introduction↩︎

The minimum-cost multi-commodity flow (MCF) problem is a fundamental network optimization problem with applications in logistics, passenger transportation, telecommunication, and energy systems [1], [2]. It routes multiple origin–destination demands through a capacitated network at minimum cost. The MCF problem can be solved as a linear programming model, but the number of variables and constraints grows rapidly, making large-scale instances computationally challenging.

The classical edge-based formulation introduces \(O(|K||E|)\) variables and \(O(|K||V| + |E|)\) constraints. This becomes intractable when either the commodity set \(K\) or the network size grows. Source-based formulations aggregate by source nodes, reducing the model to \(O(|S||E|)\) variables and \(O(|S||V| + |E|)\) constraints. When \(|S| \ll |K|\) this yields savings, but commodity-level flows must then be reconstructed [1]. Path-based formulations, obtained via Dantzig–Wolfe decomposition, replace edge variables by path variables. Each commodity \(k \in K\) has an exponential path set \(\mathcal{P}_k\), and column generation dynamically adds paths [3], [4]. However, the master problem still contains \(O(|K|)\) demand constraints, and restricted master problems grow prohibitively large when \(|K|\) reaches millions.

Research has therefore focused on decomposition and relaxation approaches. Stabilized column generation methods [5], analytic center cutting-plane algorithms with active sets, and Lagrangian relaxation [6] have extended the tractable instance size. Recently, Zhang and Boyd [7] proposed a GPU–compatible primal–dual hybrid gradient method for the all-pair max-flow variant of the MCF problem, achieving orders-of-magnitude speed-ups over interior-point solvers. It remains to be investigated whether this is also true for the min-cost MCF problem. Destination-aggregated formulations have also been investigated in communication and optical networks [8]. These advances illustrate that scalability is based on exploiting the problem structure and avoiding the explicit introduction of unnecessary flow variables.

To address the master problem bottleneck, we study a tree-based decomposition of the MCF problem that are derived from the source-based formulation by representing flows as combinations of trees rooted at source nodes. This approach drastically reduces the size of the master problem in column generation while retaining the ability to model individual commodity demands. The master problem has now \(O(|S|)\) demand constraints and \(O(|E|)\) capacity constraints. The number of variables is still exponentially large but in practice much smaller than the path-based formulation since fewer tree variables are needed in an optimal basis. The approach pushes the scalability frontier of exact MCF optimization, solving instances with tens of thousands of nodes and millions of commodities.

The remainder of the paper is organized as follows. 2 introduces the formulations of the MCF problem including the tree-based formulation introduced in this paper. 3 describes the column generation algorithm used for the path- and tree-based formulations. 4 presents computational experiments on large-scale instances, and 5 concludes with directions for future research.

2 Problem statement↩︎

We present four formulations: the edge-based formulation, the source-based formulation, the path-based formulation commonly solved via column generation, and our novel tree-based formulation.

We consider a directed network \(G=(V,E)\) with node set \(V\) and edge set \(E\). Each edge \(e \in E\) has capacity \(u_e \geq 0\). The set of commodities is \(K\), with each commodity \(k \in K\) defined by source \(s_k \in V\), sink \(t_k \in V\), and demand \(d_k > 0\). Sending one unit of commodity over edge \(e\) incurs cost \(c_e\). The objective of the MCF problem is to route all demands at minimum cost without exceeding edge capacities.

2.1 Edge-based formulation↩︎

The classical edge-based linear program introduces variables \(f^k_e\) for the flow of commodity \(k\) on edge \(e\). The model is

\[\begin{align} \min \quad & \sum_{k \in K} \sum_{e \in E} c_e f^k_e \tag{1} \\ \text{s.t.} \quad & \sum_{e \in \delta^+(i)} f^k_e - \sum_{e \in \delta^-(i)} f^k_e = \begin{cases} d_k, & i = s_k, \\ -d_k, & i = t_k, \\ 0, & \text{otherwise}, \end{cases} && \forall i \in V, \; k \in K, \tag{2} \\ & \sum_{k \in K} f^k_e \leq u_e, && \forall e \in E, \\ & f^k_e \ge 0, && \forall e \in E,\; \forall k \in K. \tag{3} \end{align}\]

where \(\delta^+(i)\) and \(\delta^-(i)\) denote the outgoing and incoming edges of node \(i\), respectively. This formulation has \(O(|K||E|)\) variables and \(O(|K||V| + |E|)\) constraints.

2.2 Source-based formulation↩︎

Source-based formulations aggregate flows by source node. For each source \(s \in S\), let \(f^s_e\) denote the total flow on edge \(e\) originating at \(s\), and let \(d^s_i = \sum_{k \in K: s_k = s} d_k\) denote the total demand from \(s\) to node \(i\). The formulation is \[\begin{align} \min \quad & \sum_{s \in S} \sum_{e \in E} c_e f^s_e \tag{4} \\ \text{s.t.} \quad & \sum_{e \in \delta^+(i)} f^s_e - \sum_{e \in \delta^-(i)} f^s_e = \begin{cases} d^s_i & \text{if } i = s \\ -d_k & \text{if } i = t_k \forall k \in K: s_k = s \\ 0 & \text{otherwise} \end{cases} && \forall i \in V,\; \forall s \in S, \\ & \sum_{s \in S} f^s_e \leq u_e, && \forall e \in E, \\ & f^s_e \ge 0, && \forall e \in E,\; \forall s \in S. \tag{5} \end{align}\]

This model has \(O(|S||E|)\) variables and \(O(|S||V| + |E|)\) constraints, where \(|S|\) is the number of unique source nodes. When \(|S| \ll |K|\), this yields a significant reduction in model size compared to the edge-based formulation. The trade-off is that the commodity-level flow decomposition is lost and must be reconstructed in a post-processing step.

2.2.0.1 Remark (on reconstruction).

The source-aggregated flows can be disaggregated back into individual commodity flows through standard flow decomposition algorithms. For each source-sink pair \((s,t)\), we iteratively extract \(s\)\(t\) paths via augmenting-path search until the required demand is satisfied. This procedure yields a path-based decomposition with at most \(|E|\) paths per source-sink pair and runs in \(O(|E||V|)\) time per pair [1]. The resulting paths directly correspond to feasible commodity routings that respect capacity constraints [2].

2.3 Path-based formulation↩︎

Path-based formulations rely on a Dantzig–Wolfe decomposition of the edge-based formulation. For each commodity \(k \in K\), let \(\mathcal{P}_k\) be the set of all feasible paths from \(s_k\) to \(t_k\). Introduce variables \(x^k_p \geq 0\) denoting the flow of commodity \(k\) on path \(p \in \mathcal{P}_k\). The model is \[\begin{align} \min \quad & \sum_{k \in K} \sum_{p \in \mathcal{P}_k} c_p x^k_p \tag{6} \\ \text{s.t.} \quad & \sum_{p \in \mathcal{P}_k} x^k_p = d_k, && \forall k \in K, \tag{7} \\ & \sum_{k \in K} \sum_{p \in \mathcal{P}_k : e \in p} x^k_p \leq u_e, && \forall e \in E, \tag{8} \\ & x^k_p \geq 0, && \forall k \in K, \; p \in \mathcal{P}_k. \tag{9} \end{align}\] where \(c_p = \sum_{e \in p} c_e\) is the cost of path \(p\).

This formulation arises as a Dantzig–Wolfe decomposition of the edge-based model. Each commodity \(k\) defines a sub-problem whose feasible region is the set of all \(s_k\)\(t_k\) flows of value \(d_k\). The extreme points of this polytope correspond to simple \(s_k\)\(t_k\) paths. Thus, path variables \(x^k_p\) play the role of convex combination weights over extreme points in the decomposition [3], [4], [9], [10]. The formulation has \(O(|K|+|E|)\) constraints. The size of the path set \(|\mathcal{P}_k|\) is exponential in \(|V|\) and can be solved with column generation, see 3.

2.4 Tree-based formulation↩︎

The tree-based formulation is the equivalent of a Dantzig–Wolfe decomposition of the source-based formulation by representing flows as convex combinations of shortest-path trees rooted at each source. Instead of sending flow independently along paths, we aggregate commodities with the same source into tree structures that simultaneously provide feasible routes to all sinks of that source. This yields a compact master problem, since the number of trees required can be much smaller than the number of paths.

For each source \(s \in S\), let \(\mathcal{T}_s\) denote the set of trees rooted at \(s\) that contain all sinks of commodities with \(s_k = s\). We denote the sinks \(K_s = \{k \in K: s_k = s\}\). A decision variable \(x^s_\tau \geq 0\) is associated with each tree \(\tau \in \mathcal{T}_s\), indicating the fraction of flow routed through \(\tau\). Let \(e \in \tau\) be an edge in the tree \(\tau\) and let \(p_k\) be the path from \(s_k\) to \(t_k\) in the tree \(\tau\) for \(k \in K_s\). The model is

\[\begin{align} \min \quad & \sum_{s \in S} \sum_{\tau \in \mathcal{T}_s} c^\tau x^s_\tau \tag{10} \\ \text{s.t.} \quad & \sum_{\tau \in \mathcal{T}_s} x^s_\tau = 1, && \forall s \in S, \tag{11} \\ & \sum_{s \in S} \sum_{\tau \in \mathcal{T}_s:\, e \in \tau} \bar{f}^e_\tau x^s_\tau \leq u_e, && \forall e \in E, \tag{12} \\ & x^s_\tau \geq 0, && \forall s \in S,\; \tau \in \mathcal{T}_s. \tag{13} \end{align}\]

Here \(\bar{f}^e_\tau = \sum_{k \in K_s} \sum_{p_k \in \tau: e \in p_k} d_k\) is the total flow on edge \(e\) in tree \(\tau\), and \(c_\tau = \sum_{e \in \tau} \bar{f}^e_\tau c_e\) denotes the cost of tree \(\tau\). The formulation has \(O(|S|+|E|)\) constraints. The tree set \(|\mathcal{T}_s|\) is exponential in size.

3 Column Generation↩︎

Column generation is a powerful decomposition technique for solving large-scale linear programming problems that would be intractable to solve directly. The method has been extensively studied and successfully applied to many problem classes including vehicle routing, crew scheduling, and network flow problems [3], [4].

The approach decomposes the original problem into a restricted master problem (RMP) containing only a subset of variables, and one or more pricing sub-problems that generate new variables (columns) as needed. The algorithm iterates between solving the RMP to obtain dual values, and using these to find columns with negative reduced cost in the pricing problems. When no such columns exist, the current solution is optimal.

For the MCF problem, we apply column generation to both the path-based and tree-based formulations described above. The key difference lies in the structure of the pricing problems - finding shortest paths versus shortest path trees. Below we detail the specific column generation procedures for each formulation, followed by implementation techniques that exploit the network structure to accelerate convergence.

3.1 Path generation↩︎

For the path-based formulation the column generation procedure alternates between:

  • Restricted master problem (RMP): solve 69 using only a subset of paths from \(\cup_{k \in K} \mathcal{P}_k\).

  • Pricing sub-problem: for each commodity \(k\), compute a shortest \(s_k\)\(t_k\) path with respect to the reduced costs given by the dual variables of the RMP. If a path of negative reduced cost is found, it is added as a new column.

Let \(\pi_k\) denote the dual variable associated with the demand constraint of commodity \(k\) 7 , and let \(\mu_e\) denote the dual variable associated with the capacity constraint on edge \(e\) 8 . Then the reduced cost of a path \(p \in \mathcal{P}_k\) is \[\begin{align} \bar{c}_p = \sum_{e \in p} \big(c_e - \mu_e\big) - \pi_k. \label{eq:path-reduced-cost} \end{align}\tag{14}\] Thus, finding a column of negative reduced cost for commodity \(k\) amounts to computing an \(s_k\)\(t_k\) path that minimizes \(\sum_{e \in p} (c_e - \mu_e)\) and then subtracting the demand dual \(\pi_k\). Since all edge weights \(c_e \geq 0\) and \(\mu_e \leq 0\), the modified costs \(c_e - \mu_e \geq 0\), so the pricing problem reduces to a shortest path problem in a weighted graph.

This allows the use of efficient single-source shortest path algorithms such as Dijkstra’s method. In particular, when many commodities share the same source \(s\), a single run of Dijkstra’s algorithm yields shortest paths to all sinks \(t_k\) with \(s_k = s\), producing one pricing candidate for each commodity simultaneously. This reduces the number of Dijkstra runs per iteration from \(|K|\) to \(|S|\).

3.2 Tree generation↩︎

For the tree-based formulation the procedure is:

  • Restricted master problem (RMP): solve 10 13 using only a subset of trees from \(\cup_{k \in K} \mathcal{T}_s\).

  • Pricing sub-problem: for each source \(s \in S\), compute a shortest path tree rooted at \(s\) that covers all sinks \(t_k\) where \(s_k = s\).

Let \(\pi_s\) denote the dual variable of 11 and \(\mu_e\) the dual variable of 12 . The reduced cost of a tree \(\tau \in \mathcal{T}_s\) is \[\begin{align} \bar{c}_\tau = \sum_{e \in \tau} \bar{f}^e_\tau \big(c_e - \mu_e\big) - \pi_s. \label{eq:tree-reduced-cost} \end{align}\tag{15}\] Hence, pricing reduces to finding a minimum-cost tree rooted at \(s\) with respect to modified edge costs \(c^s_e - \mu_e\). Since all edge weights are non-negative, this corresponds to computing a shortest-path tree (SPT). A single-source shortest path computation from source \(s\) yields a set of shortest paths \(p_k\) to all sinks \(t_k\) with \(s_k = s\) where the unique set of edges \(\tau = \cap_{e \in p_k, k \in K_s} e\) is the tree. The cost of the paths corresponds to sending one unit of flow from \(s\) to all sinks \(t_k\) with \(s_k = s\). The edge flow values \(\bar{f}^e_\tau\) are calculated in a post-processing step as described in 2.4 which allows us to calculate the reduced cost of the tree.

Observe that every path in the tree is the minimum reduced cost path from \(s\) to any sink \(t_k\) with \(s_k = s\). Hence, the tree is the minimum reduced cost tree even when scaled by demands \(d_k\)

3.3 Implementation details↩︎

3.3.0.1 Bounded single-source pricing.

For a fixed source \(s\), define \(\Pi_s := \max_{k:\,s_k=s} \pi_k\). During a single run of Dijkstra on adjusted edge weights \(\tilde{c}_e := c_e - \mu_e\), extract-min keys are tentative distances \(d(v)\) from \(s\). If the current minimum key satisfies \(d(v) \ge \Pi_s\), then for every remaining destination \(t_k\) with \(s_k=s\) we have \[\underbrace{d(t_k)}_{\text{shortest}} - \pi_k \;\ge\; d(v) - \pi_k \;\ge\; \Pi_s - \pi_k \;\ge\; 0,\] so no path with negative reduced cost exists and the pricing for source \(s\) can terminate early.

3.3.0.2 A* with reverse distance bounds.

Let \(h(v)\) be an admissible and consistent lower bound on the remaining distance from \(v\) to any destination \(t_k\) with \(s_k=s\) under the adjusted costs \(\tilde{c}_e=c_e-\mu_e\) (e.g., obtained from a reverse single-source shortest path). Run A* on \(\tilde{c}\) with keys \(f(v)=g(v)+h(v)\), where \(g(v)\) is the tentative distance from \(s\). The same global stop test as above applies: if \(\min_v f(v)\ge \Pi_s\), no path of negative reduced cost exists for source \(s\). Exact reverse distances tighten \(h\) but may be memory-intensive to store for large graphs when \(|S|\) is large.

3.3.0.3 Precomputing reverse multi-target bounds.

We compute a lower bound \(h(v)\) from any node \(v\) to any destination \(t_k\) with \(s_k \in S\). Running multi-source Dijkstra on the reverse graph yields \[h(v)\;=\;\min_{k:\,s_k \in S}\;\text{dist}_{\text{rev}}(v,t_k),\] which is an admissible and consistent heuristic for all commodities with source \(s\). This provides lower bounds that are worse than if we considered a single fixed source \(s\). However, in this scenario only \(|V|\) lower bounds are stored which is much more memory efficient.

3.3.0.4 Lazy addition of capacity constraints.

In the RMP, the demand constraints, 7 and 11 respectively, are always enforced, while the capacity constraints, 8 and 12 respectively, constraints can be introduced lazily. At optimality, only the binding edge constraints matter, since the dual multiplier of a non-binding edge constraint is zero. Hence, adding all \(|E|\) capacity constraints upfront is unnecessary. Instead, one begins with a subset (potentially empty) and monitors edge utilizations during column capacity constraints upfront is unnecessary. Instead, one begins with a subset (potentially empty) and monitors edge utilizations during column generation. Whenever an edge \(e\) becomes violated, the corresponding capacity constraint is added to the RMP (row generation). This significantly reduces the initial model size and is essential for scalability on large networks [5], [6].

This mechanism is closely related to the active set strategy proposed in [6], where only those arc constraints that are likely to be saturated are kept in the RMP. Non-congested arcs are temporarily ignored until they become active, at which point their capacity constraints are inserted. Such strategies exploit the empirical fact that in optimal solutions of large multi-commodity problems, only a small fraction of edges are congested. By combining column generation (to handle the exponential number of paths) with active set row management (to handle the potentially huge set of edge capacities), the master problem is kept compact in both dimensions. This dual stabilization on rows and columns is crucial for solving instances with millions of commodities and tens of thousands of edges.

3.3.0.5 Balancing master and pricing problems.

The computational effort between solving the master problem and pricing problem can vary significantly based on instance characteristics and network structure. We adapt our strategy accordingly.

When the master RMP is computationally easier:

  • Prioritize re-optimizing the master problem whenever lazy cuts need to be added, before attempting any pricing

  • Apply the pricing filter technique from [11] - only price commodities whose previously generated columns use edges from newly added capacity constraints

  • Only remove the filter if fewer than \(\epsilon\) columns are found

  • In the final iteration, solve all pricing problems to prove optimality

When pricing is computationally easier:

  • In each iteration, attempt both lazy cut separation and pricing problem solutions

  • Continue solving pricing problems until finding \(N\) columns with negative reduced cost

  • If fewer than \(N\) columns are found, solve all remaining pricing problems to obtain valid lower bounds via reduced cost calculations

3.3.0.6 Slack variables.

To improve convergence we avoid infeasibility of the master problem by adding slack variables. When the number of demand constraints are less than the number of edges we add a slack variable to demand constraints 7 and 11 respectively. Otherwise we add a slack variable to edge capacity constraints 8 and 12 . The cost of the slack variables is a sufficiently large value, e.g., sum of all edges for the path-based formulation or sum of all edges times the sum of demand for tree-based formulation.

Note, that since capacity constraints are added dynamically, adding edge slack may result in small infeasibilities until all relevant constraints are added. In this case restore feasibility through a phase 1 by adding artificial variables. This can mostly be avoided by initially adding a column for each pricing problem. We do this when edge slack variables are used.

4 Experiments↩︎

We evaluate the performance of the path-based and tree-based formulations on a diverse set of instances. For comparison, we compare against a commercial LP solver (MOSEK) using the source-based formulation.

All experiments were performed on a machine with an AMD Ryzen 9 3950X 16-Core Processor and 128GB RAM running Ubuntu 24.04. The algorithms were implemented in C++. We used MOSEK 11.0.27 interior point method without crossover as the LP solver for both the full formulation and the restricted master problems. We use up to 32 threads.

Throughout the experiments we set a relative optimality tolerance of 1e-4 and use a timeout of 7200 seconds. For grid, planar and transportation networks the pricing problem is considered "easy" and for intermodal transport networks the master problem is considered "easy".

4.1 Instances↩︎

We consider 4 families of instances, grid, planar, transportation networks, and intermodal transport networks. The grid and planar instances are from the MCF instances benchmark [12]. The transportation networks are from the transportation networks benchmark [13]. The intermodal transport networks are from the intermodal transport networks benchmark [11].

The provided transportation networks are non-linear maximum flow MFC problems, and as in [6] we convert them to min-cost MCF problems. The data is converted such that "free flow time" is used as the unit edge cost, and we divide the demands of a problem by a coefficient. As previously proposed, we increase the coefficient until the problems becomes feasible. 3 in 6 shows the coefficients used for the transportation network instances.

[6] addresses these problems (with less accuracy 1e-3 vs 1e-4 relative). However, we were not able to reproduce their result using their coefficient values with the current version of transportation network instances [13]. Hence, we have slightly different coefficients and optimal objective values.

The intermodal transport networks are derived from the Munich public transport network. We have used the code of [11] to generate the instances with seed 0. Preliminary tests showed that instances with seeds 1 to 9 were computationally equivalent to seed 0, so due to lack of difference in behaviors we only consider seed 0. Additionally we have filtered out any trips that were not connected in the generated network.

1 shows the characteristics of the instance families. Grid and planar are relatively small but have many commodity sources per vertices. The transportation networks have a large number of commodities with many shared sources. The intermodal transport networks are large in number of vertices and edges but have a relatively small number of commodities with no shared sources. See the 4 in 6 for a full overview over instance characteristics.

Table 1: Instance family characteristics (mean values). Number of variables, constraints, and non-zeroes are based on the source-based formulation.
Family Nodes Edges Commodities Sources Variables Constraints Non-zeros
Grid 523 2,011 4,967 463 1.6e+06 4.1e+05 4.7e+06
Planar 607 3,230 13,920 607 4.8e+06 9.2e+05 1.4e+07
Transportation 10,833 27,108 943,682 1,114 5.0e+07 2.0e+07 1.5e+08
Intermodal 177,179 1,893,620 22,195 22,195 6.4e+10 4.6e+09 1.9e+11

4.2 Performance↩︎

The performance profile of the path-based, tree-based formulations, and the source-based formulation is shown in 1. The tree-based formulation outperforms the path-based formulation on most instances, the exception being planar2500. Both column generation formulations solves all instances (43) within the time limit compared to the source-based formulation (26) that cannot solve the largest instances due to memory limitations. It is noteworthy, that the source-based formulation with MOSEK did not timeout on any of the instances if it was possible to construct the instance in memory. Detailed results for the instances are shown in 5 in 6.

Figure 1: Performance profile of the path-based and tree-based formulations against the source-based formulation

The cactus plot of the path-based, tree-based formulations, and the source-based formulation is shown in 2 and 3 for runtime and memory respectively. From the plots it is seen that both column generation algorithms are on par with each other, with the tree-based formulation being slightly faster and using a little less memory. The source-based formulation is significantly slower and as expected it uses much more memory.

2 summarize the geometric mean results for instances for runtime and memory. Runtime means are shifted by 10 seconds to reduce the impact of very small runtimes. Memory means are not shifted. From the scaled results it is clear that both column generation formulations outperform the source-based formulation both in runtime and memory.

Figure 2: Cactus plot with runtime performance
Figure 3: Cactus plot with memory performance
Table 2: Shifted geometric mean runtime and memory results
Solver Time Memory Solved of 43
2-3 (lr)4-5 Unscaled Scaled Unscaled Scaled
Tree 15.0 1.00 0.7 1.00 43
Path 21.2 1.42 0.9 1.39 43
Source 313.4 20.96 12.6 18.57 26

4.2.0.1 Comparison to other work.

We compare our implementation against several recent approaches from the literature. For the planar2500 instance, which was the largest instance solved in [5], our implementation is approximately 7 times faster, solving it in under 900 seconds compared to their 6169 seconds on the same machine.

The transportation network instances from [6] include challenging cases like ChicagoRegional and Philadelphia that could not be solved to an accuracy of 1e-4 in their work. While our problem coefficients differ slightly, we successfully solve all instances, including the larger Sydney network, to 1e-4 accuracy in under 700 seconds.

For the intermodal transport networks, we compare against [11] who solved their largest instance (SBT-56295) in 2856 seconds on an Intel Core i9-9900 (3.1 GHz). Accounting for processor speed differences based on benchmark data, their solution time normalizes to approximately 1203 seconds on our AMD Ryzen 9 3950X (3.5 GHz). In comparison, our implementation solves the same instance in just 115 seconds on average - more than a 10-fold improvement in performance.

4.3 Path and tree-based formulations comparison↩︎

4 and 5 show the runtime and memory usage on the y-axis and size of the instance (in the number of non-zeroes in the LP) on the x-axis. The source-based formulation is included for completeness. The transportation network instances are located in the middle of the plots and the intermodal transport network instances are located most right in the plots.

It is seen that the tree- and path-based behave identical on the intermodal transport network instances. This is to be expected since all commodities have a unique source and destination with a demand of 1, hence the formulations are equivalent as all shortest paths trees are paths.

For the transportation network instances the tree-based formulation is faster than the path-based formulation. The plots indicate that high memory usage is correlated with high runtime, and the tree-based formulation is expected to use less memory than the path-based formulation due to fewer columns in the master problem.

Figure 4: Runtime vs size in number of non-zeroes of the source-based formulation
Figure 5: Memory vs size in number of non-zeroes of the source-based formulation

The scatter plot of the path-based and tree-based formulations is shown in 6. The formulations behave identical on the grid, planar, and intermodal transport network instances, except for the instance planar2500. The tree-based formulation is faster on the transportation network instances that have many shared sources.

Figure 6: Scatter plot of the path-based and tree-based formulations against each other

The relation between the number of commodities and the number of shared sources is explored via the heatmap in 7. It shows the relations between the number of commodities and the fraction of shared sources and is color coded according to the speed-up. A large speed-up (yellow) indicates that the tree-based formulation is faster than the path-based formulation. It is seen that the tree-based formulation is faster than the path-based formulation for instances with a large number of commodities and a small fraction of shared sources. This is expected as the number of demand constraints in the master problem is only \(O(|S|)\) for the tree-based formulation and \(O(|K|)\) for the path-based formulation.

Figure 7: Heatmap of the path-based and tree-based formulations against each other

We observe that the path-based formulation is faster on the largest planar instances. We expect this to be due to too slow column generation convergence and many pricing iterations for the tree-based formulation. The tree-based formulation generates at most \(|S|\) columns per iteration, while the path-based formulation generates up to \(|K|\) columns per iteration. This may stabilize the column generation process faster for the path-based formulation even if each master problem iteration is slower.

5 Conclusion↩︎

In this paper, we introduced a novel tree-based formulation for the multi-commodity flow problem. Rather than decomposing by individual commodity paths, our key innovation is to aggregate flows by shared sources and represent them using trees. This reduces the master problem size from O(|K|) to O(|S|) constraints, where |S| << |K| in many practical applications.

We demonstrated the advantages of this formulation through extensive computational experiments comparing it against traditional path-based decomposition and direct LP solving with MOSEK across diverse instance types including grid, planar, transportation, and intermodal networks. Our implementation using column generation showed that:

  • The tree-based formulation achieves up to 5x reduction in memory usage compared to path-based approaches by maintaining a more compact master problem

  • On large instances with many shared sources, the tree-based method scales significantly better, solving problems with millions of commodities with 5-30x speed-ups over the path-based formulation

  • Both decomposition approaches substantially outperform direct LP solving, with the tree-based formulation showing particular advantages on transportation networks where source sharing is common

The results establish that our tree-based formulation represents a significant advance in solving large-scale multi-commodity flow problems. By exploiting the natural structure of shared sources, it overcomes key scalability limitations of existing methods. The dramatic reduction in memory requirements and improved solution times make it possible to tackle problem sizes that were beyond the reach of previous approaches.

This innovation is particularly impactful for real-world applications in transportation and logistics networks where multiple commodities frequently share origins. The ability to efficiently handle millions of commodities while maintaining individual demand constraints opens new possibilities for detailed network optimization at scale.

Several promising directions could further enhance this approach. First-order optimization methods and GPU acceleration could improve master problem solve times. The pricing problem could benefit from edge filtering, path generation heuristics, and warm-starting techniques. Strategic selection of source nodes through pattern analysis and partitioning could also yield additional performance gains. These extensions could help realize the full potential of the tree-based paradigm for extremely large-scale network optimization.

6 Appendix↩︎

6.1 Instance details↩︎

3 shows the coefficients used for the generation of min-cost transportation network instances. 4 shows the instance details.

Table 3: Coefficients used for the generation of min-cost transportation network instances
Instance Coefficient
Transportation
Austin 6.0
Barcelona 5050.0
BerlinCenter 0.5
Birmingham 0.9
ChicagoRegional 4.1
ChicagoSketch 2.4
Philadelphia 7.0
Sydney 1.9
Winnipeg 2000.0
Table 4: Instance details. Number of variables, constraints, and non-zeroes are based on the source-based formulation.
Instance Nodes Edges Commodities Sources Variables Constraints Non-zeros
Grid
grid1 25 80 50 22 1.8e+03 6.3e+02 5.4e+03
grid2 25 80 100 25 2.0e+03 7.0e+02 6.1e+03
grid3 100 360 50 40 1.4e+04 4.4e+03 4.3e+04
grid4 100 360 100 64 2.3e+04 6.8e+03 6.9e+04
grid5 225 840 100 82 6.9e+04 1.9e+04 2.1e+05
grid6 225 840 200 138 1.2e+05 3.2e+04 3.5e+05
grid7 400 1,520 400 251 3.8e+05 1.0e+05 1.1e+06
grid8 625 2,400 500 356 8.5e+05 2.2e+05 2.6e+06
grid9 625 2,400 1,000 495 1.2e+06 3.1e+05 3.6e+06
grid10 625 2,400 2,000 603 1.4e+06 3.8e+05 4.3e+06
grid11 625 2,400 4,000 625 1.5e+06 3.9e+05 4.5e+06
grid12 900 3,480 6,000 898 3.1e+06 8.1e+05 9.4e+06
grid13 900 3,480 12,000 900 3.1e+06 8.1e+05 9.4e+06
grid14 1,225 4,760 16,000 1,225 5.8e+06 1.5e+06 1.8e+07
grid15 1,225 4,760 32,000 1,225 5.8e+06 1.5e+06 1.8e+07
planar30 30 150 92 29 4.4e+03 1.0e+03 1.3e+04
planar80 80 440 543 80 3.5e+04 6.8e+03 1.1e+05
planar100 100 532 1,085 100 5.3e+04 1.1e+04 1.6e+05
planar150 150 850 2,239 150 1.3e+05 2.3e+04 3.8e+05
planar300 300 1,680 3,584 300 5.0e+05 9.2e+04 1.5e+06
planar500 500 2,842 3,525 500 1.4e+06 2.5e+05 4.3e+06
planar800 800 4,388 12,756 800 3.5e+06 6.4e+05 1.1e+07
planar1000 1,000 5,200 20,026 1,000 5.2e+06 1.0e+06 1.6e+07
planar2500 2,500 12,990 81,430 2,500 3.2e+07 6.3e+06 9.8e+07
Austin 7,388 18,956 1,080,603 1,117 2.1e+07 8.3e+06 6.5e+07
Barcelona 1,020 2,522 7,922 108 2.7e+05 1.1e+05 8.3e+05
BerlinCenter 12,981 28,370 49,688 862 2.4e+07 1.1e+07 7.3e+07
Birmingham 14,639 33,937 470,805 898 3.0e+07 1.3e+07 9.2e+07
ChicagoRegional 12,982 39,018 2,296,227 1,768 6.9e+07 2.3e+07 2.1e+08
ChicagoSketch 933 2,950 93,135 386 1.1e+06 3.6e+05 3.5e+06
Philadelphia 13,389 40,003 1,149,795 1,489 6.0e+07 2.0e+07 1.8e+08
Sydney 33,113 75,379 3,340,619 3,257 2.5e+08 1.1e+08 7.4e+08
Winnipeg 1,052 2,836 4,344 138 3.9e+05 1.5e+05 1.2e+06
BUS-2632 119,857 397,233 2,628 2,628 1.0e+09 3.2e+08 3.1e+09
BUS-7896 130,367 710,464 7,883 7,883 5.6e+09 1.0e+09 1.7e+10
BUS-13160 140,871 1,022,654 13,135 13,135 1.3e+10 1.9e+09 4.0e+10
BUS-18424 151,377 1,336,774 18,388 18,388 2.5e+10 2.8e+09 7.4e+10
BUS-23688 161,887 1,651,215 23,643 23,643 3.9e+10 3.8e+09 1.2e+11
SBT-6255 163,479 809,333 6,251 6,251 5.1e+09 1.0e+09 1.5e+10
SBT-18765 188,467 1,782,451 18,745 18,745 3.3e+10 3.5e+09 1.0e+11
SBT-31275 213,481 2,760,176 31,252 31,252 8.6e+10 6.7e+09 2.6e+11
SBT-43785 238,491 3,741,011 43,757 43,757 1.6e+11 1.0e+10 4.9e+11
SBT-56295 263,515 4,724,888 56,269 56,269 2.7e+11 1.5e+10 8.0e+11

6.2 Runtime details↩︎

5 shows the runtime details for the instances.

Table 5: Detailed runtime comparison
Instance Tree Path Source Objective
2-3 (lr)4-5 (lr)6-7 Time(s) Mem(GB) Time(s) Mem(GB) Time(s) Mem(GB)
Grid
grid1 0.0 0.0 0.0 0.0 0.0 0.6 8.2732e+05
grid2 0.0 0.1 0.0 0.0 0.0 0.6 1.7054e+06
grid3 0.0 0.0 0.0 0.0 0.3 0.8 1.5246e+06
grid4 0.0 0.1 0.0 0.0 0.5 0.8 3.0317e+06
grid5 0.0 0.1 0.1 0.1 1.8 1.1 5.0497e+06
grid6 0.3 0.1 0.2 0.1 2.6 1.4 1.0402e+07
grid7 0.4 0.1 0.3 0.1 12.1 2.8 2.5864e+07
grid8 1.6 0.2 1.3 0.2 101.3 2.9 4.1711e+07
grid9 3.4 0.2 2.8 0.2 46.8 4.0 8.2653e+07
grid10 4.3 0.3 4.1 0.3 53.3 4.4 1.6411e+08
grid11 3.9 0.2 4.4 0.3 58.1 4.4 3.2926e+08
grid12 4.4 0.2 4.2 0.3 175.1 8.1 5.7719e+08
grid13 10.2 0.3 12.4 0.4 200.2 8.8 1.1593e+09
grid14 5.5 0.3 8.2 0.4 457.8 16.7 1.8027e+09
grid15 10.6 0.4 18.8 0.6 531.2 15.4 3.5935e+09
planar30 0.0 0.0 0.0 0.0 0.1 0.6 4.4351e+07
planar80 0.4 0.1 0.3 0.1 0.6 0.8 1.8244e+08
planar100 0.3 0.1 0.5 0.1 1.0 1.0 2.3134e+08
planar150 4.4 0.2 4.1 0.3 4.9 1.4 5.4809e+08
planar300 1.6 0.2 1.9 0.2 26.0 3.3 6.8998e+08
planar500 0.7 0.2 1.0 0.2 65.3 4.5 4.8198e+08
planar800 5.5 0.3 8.6 0.4 228.0 8.5 1.1674e+09
planar1000 75.3 0.6 53.5 0.9 426.2 25.7 3.4496e+09
planar2500 5894.7 10.0 866.9 19.3 - m 1.2662e+10
Austin 670.6 14.5 4967.0 48.1 - m 5.4329e+06
Barcelona 0.1 0.1 2.5 0.2 24.8 2.7 8.8588e+01
BerlinCenter 10.3 0.5 19.0 0.7 - m 2.7529e+07
Birmingham 196.2 3.3 399.3 16.7 - m 2.1565e+05
ChicagoRegional 68.2 6.1 981.8 33.4 - m 5.5131e+06
ChicagoSketch 0.5 0.2 3.1 0.6 34.4 4.1 6.7059e+06
Philadelphia 380.6 7.4 1471.2 17.3 - m 2.5578e+07
Sydney 66.4 11.1 2520.9 59.7 - m 4.9653e+06
Winnipeg 0.0 0.1 2.9 0.2 18.9 3.1 2.3767e+02
BUS-2632 1.5 0.7 1.6 0.7 - m 7.1027e+04
BUS-7896 4.7 2.3 4.3 2.3 - m 2.1060e+05
BUS-13160 8.4 4.6 8.2 4.6 - m 3.4836e+05
BUS-18424 12.9 7.8 13.7 7.8 - m 4.8503e+05
BUS-23688 20.9 11.7 20.5 11.7 - m 6.2488e+05
SBT-6255 3.7 2.2 3.6 2.2 - m 1.6061e+05
SBT-18765 15.4 10.4 15.5 10.4 - m 4.8111e+05
SBT-31275 38.0 24.5 37.3 24.5 - m 8.0133e+05
SBT-43785 68.5 44.5 68.5 44.5 - m 1.1236e+06
SBT-56295 114.3 70.6 112.6 70.6 - m 1.4479e+06

References↩︎

[1]
Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, Englewood Cliffs, NJ, 1993.
[2]
Khodakaram Salimifard and Sara Bigharaz. The multicommodity network flow problem: state of the art classification, applications, and solution methods. Operational Research, 22:1–47, 2022.
[3]
Jacques Desrosiers, Marco Lübbecke, Guy Desaulniers, and Jean Bertrand Gauthier. Branch-and-price. Les Cahiers du GERAD G-2024-36, Groupe d’études et de recherche en analyse des décisions, GERAD, Montréal QC H3T 2A7, Canada, June 2024.
[4]
Eduardo Uchoa, Artur Pessoa, and Lorenza Moreno. : Advanced branch-cut-and-price algorithms (Part I). Technical Report L-2024-3, Cadernos do LOGIS-UFF, Universidade Federal Fluminense, Engenharia de Produção, August 2024.
[5]
Jacek Gondzio, Pablo González-Brevis, and Pedro Munari. Large-scale optimization with the primal–dual column generation method. Mathematical Programming Computation, 8:47–82, 2016.
[6]
Frédéric Babonneau, Olivier du Merle, and Jean-Philippe Vial. Solving large-scale linear multicommodity flow problems with an active set strategy and proximal-accpm. Operations Research, 54(1):184–197, 2006.
[7]
Fangzhao Zhang and Stephen Boyd. Solving large multicommodity network flow problems on gpus. arXiv preprint, 2025.
[8]
Ping Yin, Steven Diamond, Bill Lin, and Stephen Boyd. Network optimization for unified packet and circuit switched networks. Optimization and Engineering, 21:159–180, 2020.
[9]
Cynthia Barnhart, Charles A. Hane, Ellis L. Johnson, and George Sigismondi. A column generation and partitioning approach for multi-commodity flow problems. Telecommunication Systems, 3:239–258, 1994.
[10]
Cynthia Barnhart, Charles A. Hane, and Patrick H. Vance. Using branch-and-price-and-cut to solve origin-destination integer multicommodity flow problems. Operations Research, 48(2):318–326, 2000.
[11]
Benedikt Lienkamp and Maximilian Schiffer. Column generation for solving large scale multi-commodity flow problems for passenger transportation. European Journal of Operational Research, 314(2):703–717, 2024.
[12]
University of Pisa CommadLab. Multicommodity flow problems, 2025. Accessed 2025-08-27, https://commalab.di.unipi.it/datasets/mmcf/.
[13]
Transportation Networks for Research Core Team. Transportation networks for research, 2025. GitHub repository. commit #d1639b4, Accessed 2025-08-27, https://github.com/bstabler/TransportationNetworks.