Dual-Regularized Riccati Recursions
for Interior-Point Optimal Control


Abstract

We derive closed-form extensions of Riccati’s recursions (both sequential [1] and parallel [2]) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.

Optimal Control, Numerical Optimization, Interior Point Method

49M37, 90C51, 93B45

1 Introduction

We start by introducing the core concepts motivating our work.

1.1 Optimal Control

A discrete-time optimal control problem is an optimization problem of the form

\[\begin{align} &\min_{x_0, u_0, \ldots, x_N} &\sum \limits_{i=0}^{N-1} f_i(x_i, u_i) + f_N(x_N) \\ &s.t. & x_0 = s_0, \\ & & \forall i \in \lbrace 0, \ldots, N-1 \rbrace, x_{i+1} = d_i(x_i, u_i), \\ & & \forall i \in \lbrace 0, \ldots, N-1 \rbrace, c_i(x_i, u_i) = 0, \\ & & \forall i \in \lbrace 0, \ldots, N-1 \rbrace, g_i(x_i, u_i) \leq 0, \\ & & c_N(x_N) = 0, \\ & & g_N(x_N) \leq 0 . \end{align}\]

The variables \(x_i, u_i\) represent the states and controls; the functions \(d_i, f_i, c_i, g_i\) are the dynamics, costs, equality constraints, and inequality constraints (respectively); \(s_0\) is the fixed initial state; \(N\) is the number of stages.

1.2 The Regularized Interior Point Method

The regularized interior point method (IPM) is a way of solving optimization problems of the form

\[\min\limits_{x} f(x) \qquad s.t. \quad c(x) = 0 \wedge g(x) + s = 0 \wedge s \geq 0,\]

where the functions \(f, c, g\) are required to be continuously differentiable.

We start by defining the Barrier-Lagrangian

\[\mathcal{L}(x, s, y, z; \mu) = f(x) - \mu \sum \limits_{i} \log(s_i) + y^T c(x) + z^T (g(x) + s)\]

and the Augmented Barrier-Lagrangian

\[\mathcal{A}(x, s, y, z; \mu, \eta) = \mathcal{L}(x, s, y, z; \mu) + \frac{\eta}{2} (\lVert c(x) \rVert^2 + \lVert g(x) + s\rVert^2).\]

At each iteration of the regularized interior point method, the search direction \((\Delta x, \Delta s, \Delta y, \Delta z)\) is computed by solving the linear system

\[\begin{align} \label{ipm-4x4-newton-kkt} \begin{bmatrix} P & 0 & C^T & G^T \\ 0 & S^{-1} Z & 0 & I \\ C & 0 & -\frac{1}{\eta} I & 0 \\ G & I & 0 & -\frac{1}{\eta} I \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta s \\ \Delta y \\ \Delta z \end{bmatrix} = - \nabla \mathcal{L}(x, s, y, z; \mu), \end{align}\tag{1}\]

where \(P\) is a positive-definite approximation of \(\nabla^2_{xx} \mathcal{L}(x, s, y, z; \mu)\), \(C = J(c)(x)\), \(G = J(g)(x)\), and \(S, Z\) denote the diagonal matrices with entries \(s, z\) respectively.

We call \(x, s\) the primal variables and \(y, z\) the dual variables. As usual, we require that \(s, z > 0\) always holds. Below, we will show that the primal search direction determined by the regularized interior point method is guaranteed to be a descent direction of \(\mathcal{A}(\cdot, \cdot, y, z; \mu, \eta)\) at \((x, s)\) unless the primal variables have already converged.

Lemma 1. The linear system \[\begin{align} \begin{bmatrix} P & 0 & C^T & G^T \\ 0 & S^{-1} Z & 0 & I \\ C & 0 & -\frac{1}{\eta} I & 0 \\ G & I & 0 & -\frac{1}{\eta} I \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta s \\ \Delta y - \eta c(x) \\ \Delta z - \eta (g(x) + s) \end{bmatrix} = - \begin{bmatrix} \nabla_x \mathcal{A}(x, s, y, z; \mu, \eta) \\ \nabla_s \mathcal{A}(x, s, y, z; \mu, \eta) \\ 0 \\ 0 \end{bmatrix} \end{align}\] is equivalent to 1 .

Below, we use \(D( \cdot ; \cdot )\) to represent the directional derivative operator.

Theorem 1. When \(\Delta x \neq 0\) or \(\Delta s \neq 0\), \(D(\mathcal{A}(\cdot, \cdot, y, z; \mu, \eta); (\Delta x, \Delta s))(x, s) < 0\).

Proof. \[\begin{align} & D(\mathcal{A}(\cdot, \cdot, y, z; \mu, \eta); (\Delta x, \Delta s)) (x, s) \\ = &\begin{bmatrix} \Delta x^T & \Delta s^T \end{bmatrix} \begin{bmatrix} \nabla_x \mathcal{A}(x, s, y, z; \mu, \eta) \\ \nabla_s \mathcal{A}(x, s, y, z; \mu, \eta) \\ \end{bmatrix} = \begin{bmatrix} \Delta x^T & \Delta s^T & 0 & 0 \end{bmatrix} \begin{bmatrix} \nabla_x \mathcal{A}(x, s, y, z; \mu, \eta) \\ \nabla_s \mathcal{A}(x, s, y, z; \mu, \eta) \\ 0 \\ 0 \end{bmatrix} \\ = &- \begin{bmatrix} \Delta x^T & \Delta s^T & 0 & 0 \end{bmatrix} \begin{bmatrix} P & 0 & C^T & G^T \\ 0 & S^{-1} Z & 0 & I \\ C & 0 & -\frac{1}{\eta} I & 0 \\ G & I & 0 & -\frac{1}{\eta} I \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta s \\ \Delta y - \eta c(x) \\ \Delta z - \eta (g(x) + s) \end{bmatrix} \\ = &- \begin{bmatrix} \Delta x^T & \Delta s^T \end{bmatrix} \begin{bmatrix} P & 0 \\ 0 & S^{-1} Z \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta s \end{bmatrix} - \begin{bmatrix} \Delta x^T & \Delta s^T \end{bmatrix} \begin{bmatrix} C^T & G^T \\ 0 & I \end{bmatrix} \begin{bmatrix} \Delta y - \eta c(x) \\ \Delta z - \eta (g(x) + s) \end{bmatrix} \\ = &- \begin{bmatrix} \Delta x^T & \Delta s^T \end{bmatrix} \begin{bmatrix} P & 0 \\ 0 & S^{-1} Z \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta s \end{bmatrix} - \begin{bmatrix} C \Delta x \\ G \Delta x + \Delta s \end{bmatrix}^T \begin{bmatrix} \eta (C \Delta x) \\ \eta (G \Delta x + \Delta s) \end{bmatrix} \\ = &- \begin{bmatrix} \Delta x^T & \Delta s^T \end{bmatrix} \begin{bmatrix} P & 0 \\ 0 & S^{-1} Z \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta s \end{bmatrix} - \eta (\lVert C \Delta x \rVert^2 + \lVert G \Delta x + \Delta s \rVert^2) < 0 \end{align}\] ◻

Thus, the Augmented Barrier-Lagrangian \(\mathcal{A}(\cdot, \cdot, y, z; \mu, \eta)\) can be used as the merit function for a line search over the primal variables \((x, s)\).

Finally, note that the variable \(\Delta s\) can be eliminated from 1 via \[\Delta s = -Z^{-1} S (\Delta z - \mu S^{-1} e + z) = -Z^{-1} S \Delta z + \mu Z^{-1} e - s,\] where \(e\) represents the all-\(1\) vector, resulting in the linear system

\[\begin{align} \label{ipm-3x3-newton-kkt} \begin{bmatrix} P & C^T & G^T \\ C & -\frac{1}{\eta} I & 0 \\ G & 0 & -(W + \frac{1}{\eta} I) \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta z \end{bmatrix} = - \begin{bmatrix} \nabla_x \mathcal{L}(x, s, y, z; \mu) \\ c(x) \\ g(x) + \mu Z^{-1} e \end{bmatrix}, \end{align}\tag{2}\] where \(W = Z^{-1} S\).
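For concreteness, the following JAX sketch assembles and densely solves a small instance of 2 . It is purely illustrative (our released implementations exploit problem structure instead); the function name and the dense representation are our own choices, and the right-hand side is assumed to be supplied as the residuals \(\nabla_x \mathcal{L}\), \(c(x)\), and \(g(x) + \mu Z^{-1} e\).

```python
import jax.numpy as jnp

def solve_reduced_kkt(P, C, G, w, eta, r_x, r_y, r_z):
    """Densely assembles and solves the reduced Newton-KKT system (2).

    P: (n, n) positive-definite Hessian approximation; C: (p, n) and G: (q, n)
    constraint Jacobians; w: (q,) diagonal of W = Z^{-1} S; eta > 0.
    r_x, r_y, r_z are the right-hand-side residuals of (2).
    """
    n, p, q = P.shape[0], C.shape[0], G.shape[0]
    K = jnp.block([
        [P, C.T, G.T],
        [C, -jnp.eye(p) / eta, jnp.zeros((p, q))],
        [G, jnp.zeros((q, p)), -jnp.diag(w) - jnp.eye(q) / eta],
    ])
    sol = jnp.linalg.solve(K, -jnp.concatenate([r_x, r_y, r_z]))
    return sol[:n], sol[n:n + p], sol[n + p:]   # (dx, dy, dz)
```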

1.3 Applying the regularized IPM to optimal control

As shown in 1.2, the line search directions used in the regularized interior point method are computed by solving a linear system of the form

\[\begin{align} \begin{bmatrix} P & C^T & G^T \\ C & -\frac{1}{\eta} I & 0 \\ G & 0 & -W - \frac{1}{\eta} I \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta z \end{bmatrix} = - \begin{bmatrix} r_x \\ r_y \\ r_z \end{bmatrix}, \end{align}\]

where \(\eta > 0\) and \(W\) is a diagonal matrix with positive elements.

Note that \(\Delta z\) can be eliminated via \(\Delta z = (W + \frac{1}{\eta} I)^{-1}(G \Delta x + r_z)\), resulting in

\[\begin{align} \begin{bmatrix} P + G^T (W + \frac{1}{\eta} I)^{-1} G & C^T \\ C & -\frac{1}{\eta} I \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = - \begin{bmatrix} r_x + G^T (W + \frac{1}{\eta} I)^{-1} r_z \\ r_y \end{bmatrix}. \end{align}\]
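This elimination is easy to sanity-check numerically. The sketch below (with illustrative names, assuming dense matrices and a diagonal \(W\) with positive entries) solves the condensed system for \((\Delta x, \Delta y)\) and recovers \(\Delta z\); its output should match a direct solve of the full system.

```python
import jax.numpy as jnp

def solve_condensed(P, C, G, w, eta, r_x, r_y, r_z):
    """Eliminates dz, solves the condensed (dx, dy) system, then recovers dz."""
    n, p = P.shape[0], C.shape[0]
    D = jnp.diag(w) + jnp.eye(G.shape[0]) / eta       # W + (1/eta) I
    D_inv_G = jnp.linalg.solve(D, G)
    D_inv_rz = jnp.linalg.solve(D, r_z)
    K = jnp.block([
        [P + G.T @ D_inv_G, C.T],
        [C, -jnp.eye(p) / eta],
    ])
    sol = jnp.linalg.solve(K, -jnp.concatenate([r_x + G.T @ D_inv_rz, r_y]))
    dx, dy = sol[:n], sol[n:]
    dz = jnp.linalg.solve(D, G @ dx + r_z)
    return dx, dy, dz
```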

Any component of \(\Delta y\) corresponding to an equality constraint other than the dynamics can be eliminated in the same fashion. Importantly, note that the only constraints that have cross-stage dependencies are the dynamics, so these eliminations preserve the stagewise nature of the problem.

This leaves us with a linear system that is a Dual-Regularized LQR problem.

1.4 Dual-Regularized LQR

A dual-regularized LQR problem is a linear system of the form \[\begin{align} \label{primal-dual-linear-system} \begin{bmatrix} P & C^T \\ C & -\delta I \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = - \begin{bmatrix} s \\ c \end{bmatrix}, \end{align}\tag{3}\] where \[\begin{align} \label{optimal-control-specialization} P &= \begin{bmatrix} P_0 & & \\ & \ddots & \\ & & P_N \end{bmatrix}, \quad P_i = \begin{cases} \begin{bmatrix} Q_i & M_i \\ M_i^T & R_i \end{bmatrix}, & \text{if } 0 \leq i < N, \\ Q_i, & \text{if } i = N, \end{cases} \\ C &= \begin{bmatrix} -I & & & & & & & & & & \\ A_0 & B_0 & -I & & & & & & & & \\ & & A_1 & B_1 & -I & & & & & & \\ & & & & A_2 & B_2 & -I & & & & \\ & & & & & & & \ddots & -I & & \\ & & & & & & & & A_{N-1} & B_{N-1} & -I \end{bmatrix}, \\ s &= \begin{bmatrix} q_0 \\ r_0 \\ \vdots \\ q_{N-1} \\ r_{N-1} \\ q_N \end{bmatrix}, \quad c = \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_N \end{bmatrix}, \quad x = \begin{bmatrix} x_0 \\ u_0 \\ \vdots \\ x_{N - 1} \\ u_{N - 1} \\ x_N \end{bmatrix}, \quad y = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_N \end{bmatrix}, \quad \delta > 0. \end{align}\tag{4}\]

Moreover, the matrices \(P_i\) are expected to be symmetric and positive semi-definite, and the matrices \(R_i\) are expected to be symmetric and positive definite.

Note that, when \(\delta = 0\), this linear system is the KKT system associated with a standard LQR problem (i.e. not dual-regularized, and viewed as an optimization problem).

The linear system 3 can be seen as the KKT system associated with the optimization problem \[\begin{align} \label{dual-regularized-lagrangian} \max \limits_{y} \min \limits_{x} \frac{1}{2} x^T P x + s^T x - \frac{\delta}{2} y^T y + y^T (C x + c) . \end{align}\tag{5}\] We call this the dual-regularized Lagrangian. By abuse of notation, when specializing it to 4 , we also call this a dual-regularized LQR problem.

Specializing 5 to the case of discrete-time optimal control problems, as in 4 , this becomes \[\begin{align} \label{min-max-with-ys} \max \limits_{y_0, \ldots, y_N} \min \limits_{x_0, u_0, \ldots, x_N} L(x_0, u_0, \ldots, x_N, y_0, \ldots, y_N), \end{align}\tag{6}\] where \[\begin{align} L(x_0, u_0, \ldots, x_N, y_0, \ldots, y_N) = & \sum \limits_{i=0}^{N-1} \left( \frac{1}{2} x_i^T Q_i x_i + x_i^T M_i u_i + \frac{1}{2} u_i^T R_i u_i + q_i^T x_i + r_i^T u_i \right) + \frac{1}{2} x_N^T Q_N x_N + q_N^T x_N \\ & + y_0^T (c_0 - x_0) + \sum \limits_{i=0}^{N-1} \left( y_{i+1}^T (A_i x_i + B_i u_i + c_{i+1} - x_{i + 1}) \right) - \frac{\delta}{2} \sum \limits_{i=0}^{N} y_i^T y_i . \end{align}\]
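For reference (and for testing structured solvers against a ground truth), the following sketch builds the dense KKT matrix of 3 specialized as in 4 from per-stage data. The function name and data layout are our own; it is only meant for small problems.

```python
import jax.numpy as jnp

def dense_lqr_kkt(Q, M, R, q, r, A, B, c, QN, qN, delta):
    """Assembles (3)-(4) densely. Per-stage lists Q, M, R, q, r, A, B have
    length N; c has length N + 1, with c[0] the initial-state offset."""
    N, n, m = len(Q), QN.shape[0], R[0].shape[0]
    nx, ny = N * (n + m) + n, (N + 1) * n
    P = jnp.zeros((nx, nx))
    s = jnp.zeros(nx)
    C = jnp.zeros((ny, nx))
    C = C.at[:n, :n].set(-jnp.eye(n))                 # initial-state row block
    for i in range(N):
        o = i * (n + m)
        P = P.at[o:o + n, o:o + n].set(Q[i])
        P = P.at[o:o + n, o + n:o + n + m].set(M[i])
        P = P.at[o + n:o + n + m, o:o + n].set(M[i].T)
        P = P.at[o + n:o + n + m, o + n:o + n + m].set(R[i])
        s = s.at[o:o + n].set(q[i])
        s = s.at[o + n:o + n + m].set(r[i])
        ro = (i + 1) * n                              # dynamics row block i -> i+1
        C = C.at[ro:ro + n, o:o + n].set(A[i])
        C = C.at[ro:ro + n, o + n:o + n + m].set(B[i])
        C = C.at[ro:ro + n, o + n + m:o + 2 * n + m].set(-jnp.eye(n))
    P = P.at[nx - n:, nx - n:].set(QN)
    s = s.at[nx - n:].set(qN)
    K = jnp.block([[P, C.T], [C, -delta * jnp.eye(ny)]])
    rhs = -jnp.concatenate([s, jnp.concatenate(c)])
    return K, rhs                                     # K @ [x; y] = rhs
```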

1.5 Associative Scans Overview

Associative scans are a common parallelization mechanism used in functional programming, first introduced in [3]. They were used in [2] to derive a simple method for solving (primal) LQR problems in \(O(\log^2(m) + \log(N) \log^2(n))\) parallel time, where \(N, n, m\) are respectively the number of stages, states, and controls. Note, however, that [2] states that the complexity of their algorithm is \(O(m + \log(N) n)\), apparently stemming from some confusion around the parallel time complexity of matrix inversion (which is \(O(\log^2(n))\) for \(n \times n\) matrices, as shown in [4]).

Given a set \(\mathcal{X}\), a function \(f: \mathcal{X} \times \mathcal{X} \rightarrow \mathcal{X}\) is said to be associative if \(\forall a, b, c \in \mathcal{X}, f(f(a, b), c) = f(a, f(b, c))\). The forward associative scan operation \(S_f(x_1, \ldots, x_n; f)\) can be inductively defined by \[\begin{align} S_f(x_1; f) &= x_1, \\ S_f(x_1, \ldots, x_{i + 1}; f) &= (y_1, \ldots, y_i, f(y_i, x_{i + 1})), \end{align}\] where \(y_1, \ldots, y_i = S_f(x_1, \ldots, x_i; f)\). Similarly, the reverse associative scan operation \(S_r(x_1, \ldots, x_n; f)\) can be inductively defined by \[\begin{align} S_r(x_1; f) &= x_1, \\ S_r(x_1, \ldots, x_{i + 1}; f) &= (f(x_1, y_2), y_2, \ldots, y_{i + 1}), \end{align}\] where \(y_2, \ldots, y_{i + 1} = S_r(x_2, \ldots, x_{i + 1}; f)\). [3] provides a method for performing associative scans of \(N\) elements in parallel time \(O(\log(N))\).
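In JAX, this primitive is exposed as jax.lax.associative_scan. As a small illustration (ours, not from [2] or [3]), using addition as the associative operator recovers prefix and suffix sums, matching the inductive definitions of \(S_f\) and \(S_r\) above.

```python
import jax
import jax.numpy as jnp

x = jnp.arange(1.0, 6.0)                                    # [1, 2, 3, 4, 5]

# Forward scan with the associative operator jnp.add: prefix sums.
print(jax.lax.associative_scan(jnp.add, x))                 # [ 1.  3.  6. 10. 15.]

# Reverse scan: suffix sums.
print(jax.lax.associative_scan(jnp.add, x, reverse=True))   # [15. 14. 12.  9.  5.]
```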

We will employ associative scans to derive the parallel dual-regularized Riccati recursion, effectively extending [2].

1.6 Motivation

The reader may ask why we wish to regularize the dynamics and initial state constraints, whose Jacobians always have full row rank due to the presence of the identity blocks. In fact, if we regularized all equality and inequality constraints except for the dynamics, and performed the variable eliminations described in 1.3, we would be left with a standard LQR problem. The issue with doing so is that the computed primal search direction would no longer be guaranteed to be a descent direction of the Augmented Barrier-Lagrangian with the pre-specified penalty parameter.

1.7 Related Work

The Differential Dynamic Programming (DDP) algorithm, introduced in [5], is still the most common method used in practice for numerical optimal control, in spite of the disadvantages associated with single-shooting methods (e.g. the impossibility of independently warm-starting state trajectories).

The Riccati recursion used for solving standard LQR problems was first derived in [1], and was first combined with an interior point method in [6].

More recently, FATROP [7] explores using a different extension of Riccati’s recursion (targeting stagewise equality constraint handling), combining it with an interior-point approach for inequality constraint handling.

An efficient parallel LQR algorithm with logarithmic complexity on the number of stages was first derived in [2], where it was also used to solve unconstrained non-convex discrete-time optimal control problems.

2 Sequential Algorithm

We start with some simple (but helpful) lemmas that we will employ to eliminate some of the variables in 6 .

Lemma 2. If \(\delta > 0\) and \(f(y) = k^T y - \frac{\delta}{2} y^T y\), then \(\max \limits_y f(y) = f(\frac{k}{\delta}) = \frac{1}{2 \delta} \lVert k \rVert^2\).

Proof. \(\nabla f(y) = k - \delta y = 0 \Rightarrow y = \frac{k}{\delta}\). Moreover, \(f(\frac{k}{\delta}) = k^T \frac{k}{\delta} - \frac{\delta}{2} \frac{k^T k}{\delta^2} = \frac{k^T k}{\delta} - \frac{k^T k}{2 \delta} = \frac{1}{2 \delta} \lVert k \rVert^2\). ◻

Lemma 3. If \(P\) is symmetric and positive semi-definite and \(\delta \geq 0\), \[I - (I + \delta P)^{-1} = (I + \delta P)^{-1} \delta P = \delta P (I + \delta P)^{-1} .\]

Proof. \[(I + \delta P)^{-1} \delta P = (I + \delta P)^{-1} (I + \delta P) - (I + \delta P)^{-1} = I - (I + \delta P)^{-1} .\] Similarly, \[\delta P (I + \delta P)^{-1} = (I + \delta P)(I + \delta P)^{-1} - (I + \delta P)^{-1} = I - (I + \delta P)^{-1} .\] ◻

Lemma 4. If \(P\) is positive semi-definite, \(\delta > 0\), and \(f(x) = \frac{1}{2} x^T P x + p^T x + \frac{1}{2 \delta} \lVert c - x \rVert^2\), then \[\begin{align} \min \limits_x f(x) = f \left( (I + \delta P)^{-1} (c - \delta p) \right) = \frac{1}{2} c^T \left( (I + \delta P)^{-1} P \right) c + p^T c - \delta p^T (I + \delta P)^{-1} P c - \frac{\delta}{2} p^T (I + \delta P)^{-1} p . \end{align}\]

Proof. \[\begin{align} \nabla f(x) = P x + p - \frac{c - x}{\delta} = 0 \Rightarrow (I + \delta P) x = c - \delta p \Rightarrow x = (I + \delta P)^{-1} (c - \delta p). \end{align}\]

Since we can also write \[\begin{align} f(x) = \frac{1}{\delta} \left( \frac{1}{2} x^T (I + \delta P) x + (\delta p - c)^T x + \frac{1}{2} c^T c \right), \end{align}\] and noting that we use 3 below, it follows that \[\begin{align} & f \left( (I + \delta P)^{-1} (c - \delta p) \right) \\ = & \frac{1}{\delta} \left( \frac{1}{2} (c - \delta p)^T (I + \delta P)^{-1} (I + \delta P) (I + \delta P)^{-1} (c - \delta p) + (\delta p - c)^T (I + \delta P)^{-1} (c - \delta p) + \frac{1}{2} c^T c \right) \\ = & \frac{1}{2 \delta} \left( c^T c - (c - \delta p)^T (I + \delta P)^{-1} (c - \delta p) \right) \\ = & \frac{1}{\delta} \left( \frac{1}{2} c^T \left( I - (I + \delta P)^{-1} \right) c + \delta p^T (I + \delta P)^{-1} c - \frac{\delta^2}{2} p^T (I + \delta P)^{-1} p \right) \\ = & \frac{1}{\delta} \left( \frac{1}{2} c^T \left( (I + \delta P)^{-1} \delta P \right) c + \delta p^T (I - \delta (I + \delta P)^{-1} P) c - \frac{\delta^2}{2} p^T (I + \delta P)^{-1} p \right) \\ = & \frac{1}{2} c^T \left( (I + \delta P)^{-1} P \right) c + p^T c - \delta p^T (I + \delta P)^{-1} P c - \frac{\delta}{2} p^T (I + \delta P)^{-1} p . \end{align}\] ◻
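As a quick numerical sanity check of 4 (our own, purely illustrative), one can compare the closed-form optimal value with a direct evaluation of \(f\) at the stated minimizer:

```python
import jax
import jax.numpy as jnp

n, delta = 4, 0.3
S = jax.random.normal(jax.random.PRNGKey(0), (n, n))
P = S @ S.T                                         # random symmetric PSD matrix
p = jnp.arange(1.0, n + 1.0)
c = jnp.linspace(-1.0, 1.0, n)
I = jnp.eye(n)

f = lambda x: 0.5 * x @ P @ x + p @ x + 0.5 / delta * jnp.sum((c - x) ** 2)
x_star = jnp.linalg.solve(I + delta * P, c - delta * p)     # the stated minimizer

W = jnp.linalg.solve(I + delta * P, P)                      # (I + delta P)^{-1} P
closed_form = (0.5 * c @ W @ c + p @ c - delta * p @ W @ c
               - 0.5 * delta * p @ jnp.linalg.solve(I + delta * P, p))

print(jnp.allclose(f(x_star), closed_form, atol=1e-5))      # expected: True
```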

For \(k \in \lbrace 0, \ldots, N \rbrace\), we define the cost-to-go functions \[\begin{align} \label{cost-to-go-value-functions} V_{k}(x_k) = \max \limits_{y_{k+1}, \ldots, y_N} \min \limits_{u_k, x_{k+1}, \ldots, u_{N-1}, x_N} & \sum \limits_{i=k}^{N-1} \left( \frac{1}{2} x_i^T Q_i x_i + x_i^T M_i u_i + \frac{1}{2} u_i^T R_i u_i + q_i^T x_i + r_i^T u_i \right) \\ & + \frac{1}{2} x_N^T Q_N x_N + q_N^T x_N \\ & + \sum \limits_{i=k}^{N-1} \left( y_{i+1}^T \left( A_i x_i + B_i u_i + c_{i + 1} - x_{i + 1} \right) \right) - \frac{\delta}{2} \sum \limits_{i=k}^{N-1} y_{i+1}^T y_{i+1} . \\ \end{align}\tag{7}\]

This brings us to the key theorem.

Theorem 2. For all \(k \in \lbrace 0, \ldots, N \rbrace\), there exist \(P_k, p_k\) such that \[\begin{align} V_k(x_k) = \frac{1}{2} x_k^T P_k x_k + p_k^T x_k + V_k(0), \end{align}\] where \(P_k\) is symmetric and positive semi-definite.

Moreover, there exist \(K_k, k_k\) such that 6 is optimal at \[\begin{align} u_k &= K_k x_k + k_k \\ x_0 &= (I + \delta P_0)^{-1} (c_0 - \delta p_0) \\ x_{k+1} &= (I + \delta P_{k+1})^{-1} (A_k x_k + B_k u_k + c_{k+1} - \delta p_{k+1}) \\ y_k &= P_k x_k + p_k . \end{align}\]

Proof. We proceed by induction, in decreasing order of \(k\). The base case, consisting of \(k = N\), holds trivially by setting \(P_N = Q_N\) and \(p_N = q_N\).

Assuming, by the induction hypothesis, that the statement holds true for \(k+1\), we will show it remains true for \(k\).

Noting that \[\begin{align} \label{value-function-recursive-definition} V_k(x_k) = \max \limits_{y_{k+1}} \min \limits_{u_k, x_{k+1}} & V_{k+1} (x_{k+1}) + \frac{1}{2} x_k^T Q_k x_k + x_k^T M_k u_k + \frac{1}{2} u_k^T R_k u_k + y_{k+1}^T (A_k x_k + B_k u_k + c_{k+1} - x_{k+1}) \\ & - \frac{\delta}{2} y_{k+1}^T y_{k+1} , \end{align}\tag{8}\] we start by eliminating \(y_{k+1}\) from 8 , by applying 2.

Note that \[\begin{align} \max \limits_{y_{k+1}} y_{k+1}^T \left( A_k x_k + B_k u_k + c_{k+1} - x_{k+1} \right) - \frac{\delta}{2} y_{k+1}^T y_{k+1} = \frac{1}{2 \delta} \lVert A_k x_k + B_k u_k + c_{k+1} - x_{k+1} \rVert^2 , \end{align}\] achieved at \[\begin{align} \label{recover-y} y_{k+1} = \frac{1}{\delta} (A_k x_k + B_k u_k + c_{k+1} - x_{k+1}) . \end{align}\tag{9}\]

Thus, \[\begin{align} V_{k} (x_{k}) = & \min \limits_{u_k, x_{k+1}} V_{k+1} (x_{k+1}) + \frac{1}{2} x_k^T Q_k x_k + x_k^T M_k u_k + \frac{1}{2} u_k^T R_k u_k + \frac{1}{2 \delta} \lVert A_k x_k + B_k u_k + c_{k+1} - x_{k+1} \rVert^2 \\ & + \text{const} . \end{align}\]

Applying the induction hypothesis, \[\begin{align} \label{before-eliminating-x} V_{k} (x_{k}) = \min \limits_{u_k, x_{k+1}} & \frac{1}{2} x_k^T Q_k x_k + x_k^T M_k u_k + \frac{1}{2} u_k^T R_k u_k + \frac{1}{2} x_{k+1}^T P_{k+1} x_{k+1} + p_{k+1}^T x_{k+1} \\ & + \frac{1}{2 \delta} \lVert A_k x_k + B_k u_k + c_{k+1} - x_{k+1} \rVert^2 + \text{const} . \end{align}\tag{10}\]

Next, we apply 4 to eliminate \(x_{k+1}\) from 10 .

The terms involving \(x_{k+1}\) are \[\begin{align} \min \limits_{x_{k+1}} \frac{1}{2} x_{k+1}^T P_{k+1} x_{k+1} + p_{k+1}^T x_{k+1} + \frac{1}{2 \delta} \lVert A_k x_k + B_k u_k + c_{k+1} - x_{k+1} \rVert^2. \end{align}\]

Letting \[\begin{align} W_{k+1} &= (I + \delta P_{k+1})^{-1} P_{k+1}, \end{align}\] applying 4, and discarding additive constant terms, this becomes \[\begin{align} & \frac{1}{2} (A_k x_k + B_k u_k + c_{k+1})^T W_{k+1} (A_k x_k + B_k u_k + c_{k+1}) \\ + & p_{k+1}^T (A_k x_k + B_k u_k + c_{k+1}) - \delta p_{k+1}^T W_{k+1} (A_k x_k + B_k u_k + c_{k+1}) , \end{align}\] achieved at \[\begin{align} \label{recover-x} x_{k+1} = (I + \delta P_{k+1})^{-1} (A_k x_k + B_k u_k + c_{k+1} - \delta p_{k+1}) . \end{align}\tag{11}\]

Next, we eliminate \(u_k\) from the resulting version of 10 after \(x_{k+1}\) has been eliminated. The terms involving \(u_k\) are \[\begin{align} \label{only-u-terms} \min \limits_{u_k} & (r_k + M_k^T x_k)^T u_k + \frac{1}{2} u_k^T R_k u_k + \frac{1}{2} (A_k x_k + B_k u_k + c_{k+1})^T W_{k+1} (A_k x_k + B_k u_k + c_{k+1}) \\ + & p_{k+1}^T B_k u_k - \delta p_{k+1}^T W_{k+1} B_k u_k \\ = \min \limits_{u_k} & \frac{1}{2} u_k^T (R_k + B_k^T W_{k+1} B_k) u_k \\ + & (r_k + M_k^T x_k + B_k^T W_{k+1} (A_k x_k + c_{k+1}) + B_k^T (p_{k+1} - \delta W_{k+1} p_{k+1}))^T u_k \\ + & \frac{1}{2} (A_k x_k + c_{k+1})^T W_{k+1} (A_k x_k + c_{k+1}) . \end{align}\tag{12}\]

Note that this leaves out the terms \[\begin{align} \label{left-out-x-terms} p_{k+1}^T (A_k x_k + c_{k+1}) - \delta p_{k+1}^T W_{k+1} (A_k x_k + c_{k+1}) , \end{align}\tag{13}\] which do not depend on \(u_k\). We will pick these back up later.

Letting \[\begin{align} G_k &= R_k + B_k^T W_{k+1} B_k \\ g_k &= p_k + W_k (c_k - \delta p_k) \\ H_k &= B_k^T W_{k+1} A_k + M_k^T \\ h_k &= r_k + B_k^T g_{k+1} \\ K_k &= -G_k^{-1} H_k \\ k_k &= -G_k^{-1} h_k , \end{align}\] 12 becomes \[\begin{align} \min \limits_{u_k} \frac{1}{2} u_k^T G_k u_k + (H_k x_k + h_k)^T u_k + \frac{1}{2} (A_k x_k + c_{k+1})^T W_{k+1} (A_k x_k + c_{k+1}) . \end{align}\]

Taking the gradient with respect to \(u_{k}\) and equating it to \(0\), we get \[\begin{align} \label{control-law} u_{k} = - G_k^{-1} (H_k x_k + h_k) = K_k x_k + k_k. \end{align}\tag{14}\]

Plugging this back in, and discarding additive constants, we get \[\begin{align} \label{after-eliminating-u} & -\frac{1}{2} (H_k x_k + h_k)^T G_k^{-1} (H_k x_k + h_k) + \frac{1}{2} (A_k x_k + c_{k+1})^T W_{k+1} (A_k x_k + c_{k+1}) \\ = & \frac{1}{2} x_k^T (A_k^T W_{k+1} A_k - H_k^T G_k^{-1} H_k) x_k + (A_k^T W_{k+1} c_{k+1} - H_k^T G_k^{-1} h_k)^T x_k \\ = & \frac{1}{2} x_k^T (A_k^T W_{k+1} A_k + H_k^T K_k) x_k + (A_k^T W_{k+1} c_{k+1} + H_k^T k_k)^T x_k . \end{align}\tag{15}\]

Collecting all remaining terms from 13 and 15 , discarding additive constants, and letting \[\begin{align} \label{value-fn-formulas} P_k &= A_k^T W_{k+1} A_k + Q_k + H_k^T K_k \\ p_k &= q_k + A_k^T g_{k+1} + H_k^T k_k , \\ \end{align}\tag{16}\]  10 becomes \[\begin{align} V_k(x_k) = \frac{1}{2} x_k^T P_k x_k + p_k^T x_k + \text{const} = \frac{1}{2} x_k^T P_k x_k + p_k^T x_k + V_k(0) . \end{align}\]

Next, we must show that \(P_k\) remains positive semi-definite.

Noting that \(P_k\) is the Schur complement of the block \(G_k\) in \[\begin{align} \label{psd-block-matrices} \begin{bmatrix} Q_k + A_k^T W_{k+1} A_k & H_k^T \\ H_k & G_k \end{bmatrix} & = \begin{bmatrix} Q_k & M_k \\ M_k^T & R_k \end{bmatrix} + \begin{bmatrix} A_k^T W_{k+1} A_k & A_k^T W_{k+1} B_k \\ B_k^T W_{k+1} A_k & B_k^T W_{k+1} B_k \end{bmatrix} \\ &= \begin{bmatrix} Q_k & M_k \\ M_k^T & R_k \end{bmatrix} + \begin{bmatrix} A_k^T \\ B_k^T \end{bmatrix} W_{k+1} \begin{bmatrix} A_k & B_k \end{bmatrix} , \end{align}\tag{17}\] it follows that \(P_k\) is positive semi-definite, as \(G_k\) is positive-definite and the block-matrix 17 is positive semi-definite.

Noting that 6 can be written as \[\begin{align} \max \limits_{y_0} \min \limits_{x_0} V_0(x_0) + y_0^T (c_0 - x_0) - \frac{\delta}{2} y_0^T y_0 , \end{align}\] we can apply 2 to re-write this as \[\begin{align} \min \limits_{x_0} V_0(x_0) + \frac{1}{2 \delta} \lVert c_0 - x_0 \rVert^2 = V_0(0) + \frac{1}{\delta} \left( \min \limits_{x_0} \frac{1}{2} x_0^T (I + \delta P_0) x_0 + (\delta p_0 - c_0)^T x_0 + \frac{1}{2} c_0^T c_0 \right), \end{align}\] achieved at \[\begin{align} \label{recover-y0} y_0 = \frac{1}{\delta} (c_0 - x_0) . \end{align}\tag{18}\]

Solving for \(x_0\), we get \[\begin{align} \label{recover-x0} x_0 = (I + \delta P_0)^{-1} (c_0 - \delta p_0). \end{align}\tag{19}\]

The remaining \(x_k, u_k\) can be recovered in a forward pass, i.e. in increasing order of \(k\) via 11 and 14 .

Finally, we can re-write 18 and 9 to remove any division by \(\delta\), in the interest of improving numerical stability when \(\delta \rightarrow 0\), as well as extending this method to the case of \(\delta = 0\) (recovering the backward and forward passes of the standard LQR algorithm).

Since, due to 19 , \[\begin{align} (I + \delta P_0) x_0 = -(\delta p_0 - c_0) \Leftrightarrow \frac{1}{\delta} (c_0 - x_0) = P_0 x_0 + p_0, \end{align}\] we can re-write 18 as \[\begin{align} y_0 = P_0 x_0 + p_0. \end{align}\]

Moreover, due to 11 , \[\begin{align} (I + \delta P_{k+1}) x_{k+1} = A_k x_k + B_k u_k + c_{k+1} - \delta p_{k+1} \Leftrightarrow \frac{1}{\delta} (A_k x_k + B_k u_k + c_{k+1} - x_{k+1}) = P_{k+1} x_{k+1} + p_{k+1} . \end{align}\] Thus, we can re-write 9 as \[\begin{align} y_{k+1} = P_{k+1} x_{k+1} + p_{k+1} . \end{align}\] ◻
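The resulting procedure is summarized by the following JAX sketch of the backward recursion 16 together with the forward recovery via 19 , 14 , and 11 , and the dual recovery \(y_k = P_k x_k + p_k\). It is illustrative rather than our released implementation [8]: names are ours, the loops are plain Python (a production version would use lax.scan or the parallel algorithm of 3), and the dense solves assume small state and control dimensions.

```python
import jax.numpy as jnp

def backward_pass(Q, M, R, q, r, A, B, c, QN, qN, delta):
    """Dual-regularized Riccati backward pass (Theorem 2); a sketch on Python lists.

    Per-stage data Q, M, R, q, r, A, B are lists of length N; c has length N + 1
    (c[0] is the initial-state offset). Returns value-function terms and gains.
    """
    N, n = len(Q), QN.shape[0]
    P, p = [None] * (N + 1), [None] * (N + 1)
    K, k = [None] * N, [None] * N
    P[N], p[N] = QN, qN
    for i in reversed(range(N)):
        I = jnp.eye(n)
        W = jnp.linalg.solve(I + delta * P[i + 1], P[i + 1])
        g = p[i + 1] + W @ (c[i + 1] - delta * p[i + 1])
        G = R[i] + B[i].T @ W @ B[i]
        H = B[i].T @ W @ A[i] + M[i].T
        h = r[i] + B[i].T @ g
        K[i] = -jnp.linalg.solve(G, H)
        k[i] = -jnp.linalg.solve(G, h)
        P[i] = Q[i] + A[i].T @ W @ A[i] + H.T @ K[i]
        p[i] = q[i] + A[i].T @ g + H.T @ k[i]
    return P, p, K, k

def forward_pass(P, p, K, k, A, B, c, delta):
    """Recovers x, u, y from the backward-pass quantities (19, 14, 11, and 9)."""
    N, n = len(K), P[0].shape[0]
    x = [jnp.linalg.solve(jnp.eye(n) + delta * P[0], c[0] - delta * p[0])]
    y = [P[0] @ x[0] + p[0]]
    u = []
    for i in range(N):
        u.append(K[i] @ x[i] + k[i])
        rhs = A[i] @ x[i] + B[i] @ u[i] + c[i + 1] - delta * p[i + 1]
        x.append(jnp.linalg.solve(jnp.eye(n) + delta * P[i + 1], rhs))
        y.append(P[i + 1] @ x[i + 1] + p[i + 1])
    return x, u, y
```

The recovered \((x, y)\) can be checked against a dense solve of 3 , e.g. using the sketch given after 4 .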

3 Parallel Algorithm

First, we can eliminate the control variables \(u_i\) from 6 . Computing the gradients with respect to \(u_i\), equating them to \(0\), and solving for \(u_i\) yields \[\begin{align} u_i = -R_i^{-1} \left( M_i^T x_i + r_i + B_i^T y_{i+1} \right). \end{align}\]

Noting that \[\begin{align} \frac{1}{2} u_i^T R_i u_i &= \frac{1}{2} x_i^T M_i R_i^{-1} M_i^T x_i + r_i^T R_i^{-1} M_i^T x_i + \frac{1}{2} y_{i+1}^T B_i R_i^{-1} B_i^T y_{i+1} + y_{i+1}^T B_i R_i^{-1} M_i^T x_i + r_i^T R_i^{-1} B_i^T y_{i+1} \\ & + \frac{1}{2} r_i^T R_i^{-1} r_i, \\ x_i^T M_i u_i &= -x_i^T M_i R_i^{-1} M_i^T x_i - r_i^T R_i^{-1} M_i^T x_i - y_{i+1}^T B_i R_i^{-1} M_i^T x_i, \\ r_i^T u_i &= -r_i^T R_i^{-1} M_i^T x_i - r_i^T R_i^{-1} r_i - y_{i+1}^T B_i R_i^{-1} r_i, \\ y_{i+1}^T B_i u_i &= -y_{i+1}^T B_i R_i^{-1} B_i^T y_{i+1} - y_{i+1}^T B_i R_i^{-1} M_i^T x_i - r_i^T R_i^{-1} B_i^T y_{i+1}, \end{align}\] and discarding constant terms, the optimization problem 6 becomes \[\begin{align} \label{min-max-without-us} \max \limits_{y_0, \ldots, y_N} \min \limits_{x_0, \ldots, x_N} & \sum \limits_{i=0}^{N-1} \left( \frac{1}{2} x_i^T (Q_i - M_i R_i^{-1} M_i^T) x_i + (q_i - M_i R_i^{-1} r_i)^T x_i \right) \\ & + \frac{1}{2} x_N^T Q_N x_N + q_N^T x_N + \sum \limits_{i=0}^{N-1} \left( y_{i+1}^T \left( \left( A_i - B_i R_i^{-1} M_i^T \right) x_i + \left( c_{i+1} - B_i R_i^{-1} r_i \right) - x_{i + 1} \right) \right) \\ & - \frac{1}{2} \sum \limits_{i=0}^{N-1} y_{i+1}^T \left( \delta I + B_i R_i^{-1} B_i^T \right) y_{i+1} + y_0^T (c_0 - x_0) - \frac{\delta}{2} y_0^T y_0 . \end{align}\tag{20}\]

For \(i \in \lbrace 0, \ldots, N - 1 \rbrace\), we define \[\begin{align} \label{lqr-gpu-init-running} P_{i \rightarrow i+1} &= Q_i - M_i R_i^{-1} M_i^T, \\ p_{i \rightarrow i+1} &= q_i - M_i R_i^{-1} r_i, \\ A_{i \rightarrow i+1} &= A_i - B_i R_i^{-1} M_i^T, \\ C_{i \rightarrow i+1} &= \delta I + B_i R_i^{-1} B_i^T, \\ c_{i \rightarrow i+1} &= c_{i+1} - B_i R_i^{-1} r_i, \\ \end{align}\tag{21}\]

Similarly, we define \[\begin{align} \label{lqr-gpu-init-terminal} P_{N \rightarrow N+1} &= Q_N, \\ p_{N \rightarrow N+1} &= q_N, \\ A_{N \rightarrow N+1} &= 0, \\ C_{N \rightarrow N+1} &= 0, \\ c_{N \rightarrow N+1} &= 0, \\ \end{align}\tag{22}\]
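The leaf elements 21 and 22 can be assembled in a batched fashion; the following sketch (illustrative, with our own naming, assuming constant state and control dimensions across stages and stacked per-stage arrays) does exactly that.

```python
import jax.numpy as jnp

def make_leaf_elements(Q, M, R, q, r, A, B, c, QN, qN, delta):
    """Builds the leaf elements (21)-(22), stacked along a leading stage axis.

    Q, M, R, q, r, A, B are stacked per-stage arrays with leading dimension N;
    c has leading dimension N + 1 (c[0] only enters via the initial stage, cf. (19)).
    """
    n = QN.shape[0]
    Rinv = jnp.linalg.inv(R)                                     # batched inverses
    MT = jnp.swapaxes(M, -1, -2)
    BT = jnp.swapaxes(B, -1, -2)
    P_e = Q - M @ Rinv @ MT
    p_e = q - jnp.einsum('nij,nj->ni', M @ Rinv, r)
    A_e = A - B @ Rinv @ MT
    C_e = delta * jnp.eye(n) + B @ Rinv @ BT
    c_e = c[1:] - jnp.einsum('nij,nj->ni', B @ Rinv, r)
    # Append the terminal element (22).
    P_e = jnp.concatenate([P_e, QN[None]], axis=0)
    p_e = jnp.concatenate([p_e, qN[None]], axis=0)
    A_e = jnp.concatenate([A_e, jnp.zeros((1, n, n))], axis=0)
    C_e = jnp.concatenate([C_e, jnp.zeros((1, n, n))], axis=0)
    c_e = jnp.concatenate([c_e, jnp.zeros((1, n))], axis=0)
    return P_e, p_e, A_e, C_e, c_e
```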

For \(0 \leq i < j \leq N+1\), we define the interval value functions \[\begin{align} V_{i \rightarrow j}(x_i, x_j) = \max \limits_{y_{i+1}, \ldots, y_j} \min \limits_{x_{i+1}, \ldots, x_{j-1}} & \sum \limits_{k=i}^{j-1} \left( \frac{1}{2} x_k^T P_{k \rightarrow k + 1} x_k + p_{k \rightarrow k+1}^T x_k \right) \\ & + \sum \limits_{k=i}^{j-1} \left( y_{k+1}^T \left( A_{k \rightarrow k+1} x_k + c_{k \rightarrow k+1} - x_{k + 1} \right) \right) \\ & - \frac{1}{2} \sum \limits_{k=i}^{j-1} y_{k+1}^T C_{k \rightarrow k+1} y_{k+1} \\ \end{align}\]

Note that, by definition, \[\begin{align} V_i(x_i) = \min \limits_{x_{N+1}} V_{i \rightarrow N+1} (x_i, x_{N+1}). \end{align}\]

Since \[\begin{align} \max \limits_{y_{N+1}} \min \limits_{x_{N+1}} -y_{N+1}^T x_{N+1} = 0, \end{align}\] achieved by forcing \(x_{N+1} = 0\), it follows that \[\begin{align} V_i(x_i) = V_{i \rightarrow N+1} (x_i, 0). \end{align}\]

Thus, 20 amounts to solving \[\begin{align} \max \limits_{y_0} \min \limits_{x_0} V_{0}(x_0) + y_0^T (c_0 - x_0) - \frac{\delta}{2} y_0^T y_0 . \end{align}\]

It was shown in [2] that the functions \(V_{i \rightarrow j}\) admit representations of the form \[\begin{align} V_{i \rightarrow j} (x_i, x_j) = \max \limits_{y} \left( \frac{1}{2} x_i^T P_{i \rightarrow j} x_i + p_{i \rightarrow j}^T x_i - \frac{1}{2} y^T C_{i \rightarrow j} y + y^T \left( A_{i \rightarrow j} x_i + c_{i \rightarrow j} - x_j \right) \right), \end{align}\] modulo constant additive terms that are independent of \(x_i, x_j\) and can thus be discarded.

The following combination rules, established in [2], can be applied to compute \(V_{i \rightarrow k}\) from \(V_{i \rightarrow j}\) and \(V_{j \rightarrow k}\): \[\begin{align} \label{lqr-gpu-combo} P_{i \rightarrow k} &= A_{i \rightarrow j}^T \left( I + P_{j \rightarrow k} C_{i \rightarrow j} \right)^{-1} P_{j \rightarrow k} A_{i \rightarrow j} + P_{i \rightarrow j}, \\ p_{i \rightarrow k} &= A_{i \rightarrow j}^T \left( I + P_{j \rightarrow k} C_{i \rightarrow j} \right)^{-1} \left( p_{j \rightarrow k} + P_{j \rightarrow k} c_{i \rightarrow j} \right) + p_{i \rightarrow j}, \\ A_{i \rightarrow k} &= A_{j \rightarrow k} \left( I + C_{i \rightarrow j} P_{j \rightarrow k} \right)^{-1} A_{i \rightarrow j}, \\ C_{i \rightarrow k} &= A_{j \rightarrow k} \left( I + C_{i \rightarrow j} P_{j \rightarrow k} \right)^{-1} C_{i \rightarrow j} A_{j \rightarrow k}^T + C_{j \rightarrow k}, \\ c_{i \rightarrow k} &= A_{j \rightarrow k} \left( I + C_{i \rightarrow j} P_{j \rightarrow k} \right)^{-1} \left( c_{i \rightarrow j} - C_{i \rightarrow j} p_{j \rightarrow k} \right) + c_{j \rightarrow k}. \\ \end{align}\tag{23}\]

Note that we depart from [2] in 21 , but not in 23 or 22 .

A reverse associative scan [3] can be used to compute the \(P_{i \rightarrow N+1}, p_{i \rightarrow N+1}\) (i.e. the \(P_i, p_i\) in 16 ) in \(O(\log(N) \log (n)^2)\) parallel time.
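Continuing the sketch given after 22 , the combination rule 23 can be expressed as an associative operator and handed to jax.lax.associative_scan. The argument ordering below (the later interval arriving as the first argument when reverse=True) reflects our reading of JAX's reverse-scan semantics and should be treated as an assumption of this sketch; names are again our own.

```python
import jax
import jax.numpy as jnp

def combine(later, earlier):
    """Combination rule (23): `earlier` holds V_{i->j}, `later` holds V_{j->k}."""
    Pij, pij, Aij, Cij, cij = earlier
    Pjk, pjk, Ajk, Cjk, cjk = later
    I = jnp.eye(Pij.shape[-1])
    L1 = jnp.linalg.inv(I + Pjk @ Cij)        # (I + P_{j->k} C_{i->j})^{-1}
    L2 = jnp.linalg.inv(I + Cij @ Pjk)        # (I + C_{i->j} P_{j->k})^{-1}
    AijT = jnp.swapaxes(Aij, -1, -2)
    AjkT = jnp.swapaxes(Ajk, -1, -2)
    mv = lambda X, v: jnp.einsum('...ij,...j->...i', X, v)
    P_ik = AijT @ L1 @ Pjk @ Aij + Pij
    p_ik = mv(AijT @ L1, pjk + mv(Pjk, cij)) + pij
    A_ik = Ajk @ L2 @ Aij
    C_ik = Ajk @ L2 @ Cij @ AjkT + Cjk
    c_ik = mv(Ajk @ L2, cij - mv(Cij, pjk)) + cjk
    return P_ik, p_ik, A_ik, C_ik, c_ik

# elements = make_leaf_elements(...)  # stacked leaves from (21)-(22)
# scanned = jax.lax.associative_scan(combine, elements, reverse=True)
# P_scan, p_scan = scanned[0], scanned[1]   # entry i holds the P_i, p_i of (16)
```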

Next, the \((I + \delta P_i)^{-1}\) can be computed in \(O(\log(n)^2)\) parallel time (i.e. \(O(1)\) with respect to \(N\)).

We can then compute \(x_0\) in \(O(\log(n))\) parallel time via 19 and the \(K_i, k_i\) from 14 in \(O(\log(n) + \log (m)^2)\) parallel time (i.e. \(O(1)\) with respect to \(N\) in both cases).

Finally, we wish to compute the \(u_i, x_{i+1}\) from the \(K_i, k_i\) via 11 and 14 . Note that the sequential LQR forward pass has \(O(N)\) parallel time complexity. However, as done in [2], we can reduce the computation of the \(x_i\) to a sequential composition of affine functions, which can also be parallelized with an associative scan [3] in \(O(\log(N) \log(n))\) parallel time. This will be described in 3.1. Due to 11 and 14 ,

\[\begin{align} x_{i+1} & = (I + \delta P_{i+1})^{-1} \left( A_i x_i + B_i u_i + c_{i+1} - \delta p_{i+1} \right) \\ &= (I + \delta P_{i+1})^{-1} \left( A_i x_i + B_i \left( K_i x_i + k_i \right) + c_{i+1} - \delta p_{i+1} \right) \\ &= (I + \delta P_{i+1})^{-1} \left( A_i + B_i K_i \right) x_i + (I + \delta P_{i+1})^{-1} ( B_i k_i + c_{i+1} - \delta p_{i+1}) . \end{align}\]

Once these affine functions have been composed, they can be independently applied to \(x_0\) to recover all the \(x_i\) in \(O(\log(n))\) parallel time. The \(u_i\) can then be computed in \(O(\log (\max (m, n)))\) parallel time by independently evaluating \(u_i = K_i x_i + k_i\). Note that both of these parallel times are \(O(1)\) with respect to \(N\).

Thus, the combined parallel time complexity of our method is \(O(\log(m)^2 + \log(N) \log(n)^2)\).

3.1 Composing Affine Functions with Associative Scans

In this section, we describe an algorithm for composing \(N\) affine functions \(F_i(x) = M_i x + m_i\), as done in [2], in \(O(\log(N) \log(n))\) parallel time.

Letting \(\mathcal{X} = \mathbb{R}^{n} \times \mathbb{R}^{n \times n}\) and letting \(f: \mathcal{X} \rightarrow \mathcal{X}\) be defined by \(f((a, B), (c, D)) = (Da + c, DB)\), we claim that \(f\) is associative. This is simple to verify: \[\begin{align} &f(f((a, B), (c, D)), (e, F)) \\ = &f((Da + c, DB), (e, F)) \\ = &(F(Da + c) + e, F(DB)) \\ = &((FD)a + Fc + e, (FD)B) \\ = &f((a, B), (Fc + e, FD)) \\ = &f((a, B), f((c, D), (e, F))). \end{align}\]

Moreover, note that \(f\) is the affine function composition operator, as \(D(Bx + a) + c = (DB)x + (Da + c)\). Thus, it suffices to apply a forward associative scan with \(f\) to obtain all compositions \(F_0, F_1 \circ F_0, \ldots, F_{N-1} \circ \cdots \circ F_0\) in \(O(\log(N) \log(n))\) parallel time.
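A minimal JAX sketch of this composition scan (illustrative; the operator is vectorized over the leading stage axis, as jax.lax.associative_scan requires):

```python
import jax
import jax.numpy as jnp

def compose_affine(m, M):
    """Given F_i(x) = M[i] @ x + m[i] (m: (N, n), M: (N, n, n)), returns the
    prefix compositions F_0, F_1 o F_0, ..., F_{N-1} o ... o F_0."""
    def op(first, second):
        # Composes `second` after `first`: D(Bx + a) + c = (DB)x + (Da + c).
        a, B = first
        c, D = second
        return jnp.einsum('...ij,...j->...i', D, a) + c, D @ B

    return jax.lax.associative_scan(op, (m, M))

# Applying all prefix compositions to the same initial point, in parallel:
N, n = 8, 3
M = 0.1 * jax.random.normal(jax.random.PRNGKey(0), (N, n, n))
m = jnp.ones((N, n))
x0 = jnp.zeros(n)
ms, Ms = compose_affine(m, M)
xs = jnp.einsum('nij,j->ni', Ms, x0) + ms
```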

4 Software Contributions

We provide JAX implementations of both the sequential and parallel dual-regularized LQR algorithms [8], including unit tests verifying the correctness of the solutions provided by our methods on a set of random examples satisfying the required definiteness properties.

Modern interior-point solvers, such as IPOPT [9], while supporting several generic linear system solvers, are not designed to easily accommodate linear solvers that specialize to specific problem types (e.g. optimal control problems). In fact, until recently, IPOPT installations did not include all of the headers required for users to integrate custom linear system solvers on their side, suggesting that this was not being done in practice.

In order to address this limitation, we released a simple regularized interior point solver in [10], which allows users to easily plug in callbacks specializing certain operations to their specific problem type, including:

  • a KKT system factorization callback;

  • a KKT system solve callback;

  • a KKT system residual computation callback.

This effectively splits the solver into a shared backend and a problem-specific frontend. We do exactly that for the case of optimal control in [11].

Adding to this, we also implemented an integration with QDLDL [12] supporting arbitrary user-provided KKT permutations (for optimal fill-in prevention) in [13], resulting in an efficient sparse interior point solver that avoids dense operations entirely. Alternatively, a sparse linear algebra code generation library, such as SLACG [14], can be used to solve the KKT systems and compute their residuals; this often achieves substantial speed-ups for solving sparse problems, as no index computations are performed at runtime.

Examples can be found in [15]. In all cases, with correct usage, dynamic memory allocation is entirely avoided.

All of the provided packages are free and open-source (MIT-licensed).

5 Conclusion

We derived extensions of the sequential and parallel Riccati recursions to dual-regularized LQR problems, allowing us to easily and efficiently solve constrained non-linear discrete-time optimal control problems numerically via the regularized interior point method. We also provided free, efficient, open-source implementations of the described algorithms, which are unit-tested on randomly generated examples satisfying the required definiteness properties, supporting the correctness of our derivations and implementations.

References

[1]
R. E. Kalman, Contributions to the Theory of Optimal Control, Boletín de la Sociedad Matemática Mexicana, 5 (1960), pp. 102–119.
[2]
S. Särkkä and Á. F. García-Fernández, Temporal parallelization of dynamic programming and linear quadratic control, IEEE Transactions on Automatic Control, 68 (2021), pp. 851–866.
[3]
G. Blelloch, Scans as primitive parallel operations, IEEE Transactions on Computers, 38 (1989), pp. 1526–1538.
[4]
L. Csanky, Fast parallel matrix inversion algorithms, SIAM Journal on Computing, 5 (1976), pp. 618–623.
[5]
D. Mayne, A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems, International Journal of Control, 3 (1966), pp. 85–95.
[6]
C. V. Rao, S. J. Wright, and J. B. Rawlings, Application of interior-point methods to model predictive control, Journal of Optimization Theory and Applications, 99 (1998), pp. 723–757.
[7]
L. Vanroye, A. Sathya, J. D. Schutter, and W. Decré, FATROP: A fast constrained optimal control problem solver for robot trajectory optimization and control, in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023, pp. 10036–10043.
[8]
J. Sousa-Pinto, Regularized LQR in JAX. https://github.com/joaospinto/regularized_lqr_jax, 2025.
[9]
A. Wächter and L. T. Biegler, On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming, Mathematical Programming, 106 (2006), pp. 25–57.
[10]
J. Sousa-Pinto, SIP. https://github.com/joaospinto/sip, 2024.
[11]
J. Sousa-Pinto, SIP Optimal Control. https://github.com/joaospinto/sip_optimal_control, 2025.
[12]
B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, OSQP: an operator splitting solver for quadratic programs, Mathematical Programming Computation, 12 (2020), pp. 637–672.
[13]
J. Sousa-Pinto, SIP QDLDL. https://github.com/joaospinto/sip_qdldl, 2024.
[14]
J. Sousa-Pinto, SLACG. https://github.com/joaospinto/slacg, 2024.
[15]
J. Sousa-Pinto, SIP Examples. https://github.com/joaospinto/sip_examples, 2024.