A Sensitivity-Based Method for Bilevel Optimization Problems: Theoretical Analysis and Computational Performance


Abstract

Bilevel optimization provides a powerful framework for modeling hierarchical decision-making systems. This work presents a novel sensitivity-based algorithm that directly addresses the bilevel structure by treating the lower-level optimal solution as an implicit function of the upper-level variables, thus avoiding classical single-level reformulations. This implicit problem is solved within a robust Augmented Lagrangian framework, where the inner subproblems are managed by a quasi-Newton (L-BFGS-B) solver to handle the ill-conditioned and non-smooth landscapes that can arise. The validity of the proposed method is established through both theoretical convergence guarantees and extensive computational experiments. These experiments demonstrate the algorithm’s efficiency and robustness and validate the use of a pragmatic dual-criterion stopping condition to address the practical challenge of asymmetric primal-dual convergence rates.

1 Introduction↩︎

Bilevel optimization problems (BLPs) are mathematical programs in which one optimization problem is constrained by the solution of a subordinate optimization problem. This hierarchical structure defines two levels: an upper-level (or leader) problem, whose decisions influence the feasible set or objective of a lower-level (or follower) problem. The solution to the lower-level problem, in turn, feeds back into the upper-level decision-making process, creating a coupled dependency between the two levels.

The origin of BLPs can be traced back to the leader-follower games introduced by [1] in an economic context. They were later introduced to the operations research community by [2] as optimization problems with an optimization problem in their constraints, and have since found widespread application across multiple disciplines. In chemical engineering, notable examples include the optimal design of processes involving thermodynamic equilibrium [3], parameter estimation in phase equilibrium problems [4], capacity planning [5], and supply chain management [6], among others.

Despite their practical relevance, BLPs pose significant computational challenges. Their feasible region is often discontinuous and non-differentiable, rendering the overall problem nonconvex, even when each level is convex. In addition, there exist multiple, non-equivalent formulations of BLPs, which complicate the derivation of general optimality conditions. A common approach is to reformulate the problem as a single-level optimization problem. However, such reformulations are not always faithful: specific regularity and structural assumptions must be satisfied to preserve equivalence with the original bilevel structure [7].

This work proposes a sensitivity-based descent algorithm for solving deterministic, continuous bilevel optimization problems in which the lower-level problem is convex. The method leverages parametric sensitivity analysis to compute the response of the lower-level solution with respect to the upper-level variables, thus enabling the construction of descent directions for the upper-level objective. By decoupling the bilevel structure into a sequence of sensitivity evaluations and directional updates, the approach circumvents the need for single-level reformulations. A primal-dual update scheme is employed to ensure convergence under mild regularity assumptions.

The remainder of the paper is structured as follows. Section 2 discusses alternative formulations of BLPs and briefly reviews classical solution approaches. Section 3 introduces the proposed sensitivity-based method and provides theoretical results ensuring its convergence to an optimal solution. In Section 4 the numerical implementation is presented and applied to benchmark problems from the literature. Section 5 concludes the paper and outlines directions for future research.

2 BLP formulation and classical solution approaches↩︎

2.1 Bilevel optimization problems formulation↩︎

The general, yet inherently ambiguous, formulation of a BLP can be expressed as: \[\label{eq:ambiguous} \begin{align} \min_{x \in X} & \;F(x,y) \\ \text{s.t.} & \;G(x,y) \leq 0, \\ & \;y \in \mathop{\mathrm{argmin}}_{y \in Y} \set{ f(x,y) ; g(x,y) \leq 0 }, \end{align}\tag{1}\] where \(x \in X \subset \mathbb{R}^n\) denotes the upper-level variables and \(y \in Y \subset \mathbb{R}^m\) the lower-level variables. The functions \(F, f:\mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}\) define the upper- and lower-level objectives, while \(G:\mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^r\) and \(g:\mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^s\) denote the vector-valued inequality constraint functions at the upper and lower level, respectively. To simplify the notation and exposition, equality constraints are omitted in this work, noting that they can be represented by pairs of inequalities. Throughout this work, uppercase letters denote upper-level functions, while lowercase letters denote lower-level functions.

Despite its apparent simplicity, the BLP 1 is not well-posed whenever the lower-level problem admits multiple optimal solutions. Consider the illustrative example due to [8] whose upper-level objective is \(F(x,y) = x^2 + y^2\) and lower-level problem is \(\mathop{\mathrm{argmin}}_{y} \set{-xy; 0 \leq y \leq 1}\); if the upper-level decision is \(x=0\), then any \(y \in [0,1]\) is optimal for the lower-level, thus rendering the BLP ill-defined.
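This non-uniqueness is easy to observe numerically. The following sketch (a coarse grid evaluation, for illustration only) exhibits the tie in the follower's response at \(x=0\) for the example above:

```python
import numpy as np

# Illustrative example from the text: lower level argmin_y { -x*y : 0 <= y <= 1 }.
def lower_level_minimizers(x, grid=np.linspace(0.0, 1.0, 101)):
    """Return all grid points attaining the lower-level minimum of -x*y."""
    vals = -x * grid
    return grid[np.isclose(vals, vals.min())]

print(lower_level_minimizers(1.0))        # unique follower response: [1.]
print(len(lower_level_minimizers(0.0)))   # every grid point ties: 101
```

For \(x=1\) the follower's response is the single point \(y=1\), but at \(x=0\) all grid points attain the minimum, which is exactly the ill-posedness discussed above.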

To appropriately formulate the BLP, we first introduce the parametric lower-level problem: \[\label{eq:ll} \begin{align} \min_{y \in Y} & \;f(x,y) \\ \text{s.t.} & \;g(x,y) \leq 0, \end{align}\tag{2}\] for a given value of the upper-level variable \(x\). The solution set of this problem defines a set-valued mapping \(\Psi: \mathbb{R}^n \rightrightarrows \mathbb{R}^m\) given by \[\label{eq:Psi40x41} \Psi(x) := \mathop{\mathrm{argmin}}_{y \in Y} \set{ f(x,y) ; g(x,y) \leq 0 }.\tag{3}\]

If the upper-level can influence the lower-level’s decision when multiple solutions exist, then one obtains the optimistic bilevel optimization problem: \[\label{eq:op95BLP} \begin{align} \min_{x, y} \; & \;F(x,y) \\ \text{s.t.} & \;G(x,y) \leq 0, \\ & \;x \in X, \\ & \;y \in \Psi(x). \end{align}\tag{4}\] It is important to note that the operator \(\min_{x, y}\) in this context does not imply a simultaneous optimization over the variables \(x\) and \(y\). Instead, the constraint \(y \in \Psi(x)\) dictates a sequential process where, for a given \(x\), the upper-level selects a specific \(y\) from the set \(\Psi(x)\) that is most favorable to its own objective. The sets \(X\) and \(Y\) are typically compact sets defined by box constraints on the variables which are treated as inequality constraints and included in \(G\) and \(g\).

If the leader cannot influence the lower-level decision making process, further assumptions or hierarchical selection rules must be imposed leading to the pessimistic formulation of BLPs [9], [10]. Throughout this work, we consider the optimistic case and assume that \(\Psi(x)\) is single-valued, ensuring that the BLP 4 is well defined.

Note that, by definition, the mapping \(\Psi\) in 3 denotes the set of global minimizers of the parametric lower-level problem 2 . Thus, any algorithm for solving bilevel problems must ensure global optimality at the lower-level. If local minimizers or stationary points of the lower-level problem are admitted, then the solution of the resulting relaxed problem will generally differ from that of the original problem [11].

2.2 Classical solution methods↩︎

Bilevel optimization problems are intrinsically difficult to analyze and solve. In particular, optimality conditions based on classical nonlinear programming concepts (stationarity, constraint qualifications, or duality) are not readily available for the bilevel case. Therefore, the usual approach for solving the BLP 4 is to reformulate it as a single-level optimization problem.

One such reformulation replaces the lower-level problem 2 with its Karush-Kuhn-Tucker (KKT) optimality conditions, which are then included as constraints in the upper-level problem. If the functions \(f, g\) are differentiable and convex, and if a suitable constraint qualification holds for all \(x\) at \(y \in \Psi(x)\), then the bilevel problem 4 can be reformulated as its KKT transformation: \[\label{eq:BLP32MPCC} \begin{align} \min_{x, y, \lambda} & \;F(x,y) \\ \text{s.t.} & \;G(x,y) \leq 0 \\ & \;\nabla_y f(x,y) + \lambda^\top \nabla_y g(x,y) =0, \\ & \;g(x,y) \leq 0 \\ & \;0 \leq \lambda, \;\lambda^\top g(x,y)=0 \\ & \;y \in Y, \;x \in X. \end{align}\tag{5}\]

Problem 5 is a mathematical program with complementarity constraints (MPCC). Both bilevel optimization problems and MPCCs are special cases of the broader class of mathematical programs with equilibrium constraints (MPECs) [12]. Due to the complementarity constraint, MPCCs violate standard constraint qualifications at every feasible point, which makes the derivation of optimality conditions a challenging task. To address these challenges, several generalized stationarity concepts have been developed within the MPEC framework [13], [14]. The strongest among these is strong stationarity (S-stationarity): a feasible point of the MPCC reformulation 5 is called S-stationary if it satisfies the standard KKT conditions, i.e., there exist Lagrange multipliers such that the gradient of the MPEC Lagrangian vanishes and the multipliers associated with the inequality constraints are non-negative. S-stationary points are the most desirable solutions, as S-stationarity is the strongest of the MPCC stationarity concepts. This MPCC reformulation has been studied in [15].

Another classical reformulation of the bilevel problem 4 is the so-called optimal value function reformulation, originally introduced by [13]. In this formulation, the optimal value function associated with the lower-level problem 2 is defined as \[\varphi(x) = \min_{y \in Y} \set{f(x,y) ; g(x,y) \leq 0},\] and is subsequently introduced as a constraint in the upper-level problem. By combining this with the feasible set of the lower-level problem, the bilevel problem is reformulated as the following single-level problem: \[\label{eq:optimal32value32BLP} \begin{align} \min_{x, y} \; & \;F(x,y) \\ \text{s.t.} \;& \;G(x,y) \leq 0 \\ & \;f(x,y) \leq \varphi(x) \\ & \;g(x,y) \leq 0 \\ & \;y \in Y \\ & \;x \in X. \end{align}\tag{6}\] Problem 6 is a nonsmooth, nonconvex optimization program that is fully equivalent to the original bilevel problem 4 , both in terms of local and global solutions [7].

Due to its equivalence to bilevel optimization problems, problem 6 has attracted the attention of the global optimization community. [16] introduced a global algorithm for a class of bilevel problems with nonconvex lower levels and upper-level constraints that couple both decision levels, yielding a mixed-integer nonlinear programming (MINLP) problem. This approach was subsequently extended to handle mixed-integer variables [17], and later adapted to accommodate lower-level equality constraints [18]. [19] proposed a branch-and-sandwich approach that maintains the bilevel structure while exploring both the upper- and lower-level solution spaces. This method was later extended to the case of mixed-integer bilevel problems [20].

3 The Sensitivity-Based Solution Method↩︎

This section details the proposed sensitivity-based algorithm for solving the optimistic bilevel problem 4 . Our method belongs to the class of gradient-based algorithms and follows a nested approach where an outer loop updates the upper-level variables and an inner loop solves the lower-level problem for a given upper-level decision. The main idea is to circumvent the single-level reformulations discussed previously by treating the lower-level problem as a parametric optimization problem and the upper-level as an implicit problem.

3.1 The Implicit Upper-Level Problem↩︎

The algorithm is based on the insight that the lower-level’s optimal decision can be viewed as an implicit function of \(x\), denoted \(\bar{y}(x)\). This allows us to transform the bilevel problem 4 into an equivalent single-level problem: \[\label{eq:Implicit95BLP} \begin{align} \min_{x \in X} \quad & F(x, \bar{y}(x))\\ \text{s.t.} \quad & G(x, \bar{y}(x)) \le 0. \end{align}\tag{7}\]

While this problem cannot be solved directly, as \(\bar{y}(x)\) is not known in closed form, this formulation enables a gradient-based solution strategy. The total derivatives of the functions in the above problem can be computed via sensitivity analysis, as detailed in the subsequent sections.

For this transformation to be valid and for the implicit function \(\bar{y}(x)\) to be locally unique and continuously differentiable, we impose the following standard assumptions.

Assumption 1 (Regularity Conditions). Let \((\bar{x}, \bar{y})\) be a feasible point of the bilevel problem 4 .

  a. Smoothness: The functions \(F, G, f,\) and \(g\) are twice continuously differentiable in a neighborhood of \((\bar{x}, \bar{y})\).

  b. Lower-Level Convexity: For any feasible \(x\) in the neighborhood of \(\bar{x}\), the lower-level problem 2 is strictly convex in \(y\).

  c. Lower-Level Regularity: For any feasible \(x\) in the neighborhood of \(\bar{x}\), the lower-level solution \(\bar{y}(x)\) satisfies the Linear Independence Constraint Qualification (LICQ), the Second-Order Sufficient Condition (SOSC), and the Strict Complementarity Condition (SCC).

The above assumptions ensure, via the Implicit Function Theorem, that the solution map \(\bar{y}(x)\) and its associated Lagrange multipliers \(\bar{\lambda}(x)\) are continuously differentiable functions in the neighborhood of \(\bar{x}\). This differentiability is fundamental to compute derivatives of the upper-level problem 7 . For notational clarity throughout the remainder of this work, we omit the explicit dependency of the optimal lower-level solution \(\bar{y}\) and its associated multipliers \(\bar{\lambda}\) on \(x\).

The strict convexity assumption in Assumption 1(b) excludes the important class of bilevel optimization problems whose lower-level is a linear program (LP). To accommodate this class of problems within our framework, we employ a standard regularization technique. The linear objective \(f(x,y) = c(x)^T y\) is replaced by a strongly convex objective \(f(x,y) = c(x)^T y + \epsilon \|y\|^2\), where \(\epsilon\) is a small, positive constant (e.g., \(10^{-6}\)). This ensures the LP has a unique solution and satisfies the necessary regularity conditions for sensitivity analysis, allowing the rest of the algorithm to be applied without modification.
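For a box-constrained lower level, this regularization can be sketched in closed form: the regularized objective is separable across coordinates, so each component of the unique minimizer is the unconstrained minimizer clipped to the box. The cost vector below is a hypothetical example of our choosing:

```python
import numpy as np

# Regularized LP lower level: min_y  c^T y + eps*||y||^2  over the box [lo, hi]^m.
# The objective separates across coordinates, so each y_i equals the
# unconstrained minimizer -c_i/(2*eps) clipped to the box.
def regularized_lp_solution(c, eps=1e-6, lo=0.0, hi=1.0):
    return np.clip(-np.asarray(c) / (2.0 * eps), lo, hi)

# Hypothetical cost c = (0, -1): the LP leaves y1 undetermined (any value in
# [0, 1] is optimal), while the regularization uniquely selects y1 = 0.
print(regularized_lp_solution([0.0, -1.0]))  # [0. 1.]
```

The tie in the zero-cost coordinate is broken toward the smallest-norm value, while the binding coordinate is left essentially unchanged for small \(\epsilon\).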

3.2 Sensitivity Analysis of the Lower Level↩︎

The optimality of the lower-level problem 2 for a fixed \(x\) is characterized by its Karush-Kuhn-Tucker (KKT) conditions. These are derived from the problem’s Lagrangian function: \[\label{eq:ll95lagrangian} \mathcal{L}_f(x, y, \lambda) = f(x, y) + \lambda^{\top} g(x, y),\tag{8}\] where \(\lambda \in \mathbb{R}^s\) are the Lagrange multipliers. The KKT conditions must hold at the optimal solution \((\bar{y}, \bar{\lambda})\). The lower-level KKT conditions are: \[\tag{9} \begin{align} \nabla_{y} \mathcal{L}_f(x, \bar{y}, \bar{\lambda}) &= 0, \tag{10} \\ g(x, \bar{y}) &\le 0, \tag{11}\\ \bar{\lambda} &\ge 0, \tag{12}\\ \bar{\lambda}_i g_i(x, \bar{y}) &= 0, \quad \forall i=1, \dots, s. \tag{13} \end{align}\]

Because of the complementarity constraints 13 , inactive constraints do not play a role in the optimization process. For sensitivity analysis, we only consider the constraints that are binding at the solution. We define the active set at the solution \(\bar{y}\) as \(A(x,\bar{y}) = \set{i; g_i(x,\bar{y}) = 0}\).

If the conditions in Assumption 1 are satisfied, we can differentiate the stationarity condition 10 and the complementarity constraints 13 for the active constraints. This yields the following linear system for the sensitivities: \[\label{eq:sensitivity95system} \begin{bmatrix} H_{\mathcal{L}_f} & \nabla_{y} g_A^{\top} \\ \Lambda_A \nabla_{y} g_A & 0 \end{bmatrix} \begin{bmatrix} \frac{d\bar{y}}{dx} \\ \frac{d\bar{\lambda}_A}{dx} \end{bmatrix} = \begin{bmatrix} - \nabla_{yx}^2 \mathcal{L}_f \\ - \Lambda_A \nabla_{x} g_A \end{bmatrix},\tag{14}\] where \(H_{\mathcal{L}_f}\) is the Hessian of the Lagrangian 8 with respect to \(y\) evaluated at \((x, \bar{y}, \bar{\lambda})\); \(g_A\) is the vector of active constraints, \(g_i\) for \(i \in A(x,\bar{y})\); and \(\Lambda_A\) is a diagonal matrix of the corresponding active multipliers \(\bar{\lambda}_i\).

3.3 Total Gradient Computation for the Upper Level↩︎

To solve the implicit upper-level problem 7 via a gradient-based method, we need the total derivatives of its objective and constraint functions with respect to \(x\). Using the sensitivity term \(d\bar{y}/dx\) from the linear system 14 , these gradients are computed via the chain rule: \[\tag{15} \begin{align} \nabla_{x} F(x, \bar{y}) &= \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y} \frac{d \bar{y}}{dx} \tag{16} \\ \nabla_{x} G(x, \bar{y}) &= \frac{\partial G}{\partial x} + \frac{\partial G}{\partial y} \frac{d \bar{y}}{dx} \tag{17} \end{align}\] These gradients are essential for the iterative optimization procedure, which is designed to find a point satisfying the problem’s KKT conditions. The Lagrangian for the upper-level problem is: \[\label{eq:up95lagrangian} \mathcal{L}_{F} (x, \mu) = F(x, \bar{y}) + \mu^{\top} G(x, \bar{y}),\tag{18}\] where \(\mu \in \mathbb{R}^r\) are the Lagrange multipliers associated with the upper-level constraints. The goal of the upper-level solver is to find a point \((\bar{x}, \bar{\mu})\) that satisfies the following KKT conditions: \[\tag{19} \begin{align} \nabla_{x} \mathcal{L}_F (\bar{x},\bar{\mu}) &=0,\tag{20} \\ G(\bar{x}, \bar{y}) &\leq 0, \\ \bar{\mu} &\geq 0, \\ \bar{\mu}_i G_i(\bar{x}, \bar{y}) &= 0, \quad \forall i=1, \dots, r. \end{align}\]

3.4 The Proposed Algorithm↩︎

We use the residuals of the upper-level KKT conditions 19 to define a stopping criterion for our algorithm. At the end of each iteration, using the new iterate \((x_{k+1}, \mu_{k+1})\), we define the residuals as: \[\tag{21} \begin{align} r_{stat} &= \Vert \nabla_x \mathcal{L}_{F} (x_{k+1}, \mu_{k+1}) \Vert_{\infty}, \tag{22}\\ r_{feas} &= \Vert \max\{0, G(x_{k+1}, \bar{y}_{k+1})\} \Vert_{\infty}, \tag{23}\\ r_{comp} &= \Vert \text{diag}(\mu_{k+1}) G(x_{k+1}, \bar{y}_{k+1}) \Vert_{\infty}, \tag{24} \end{align}\] and the overall KKT residual is \[\label{eq:res95KKT} r_{KKT} = \max\{r_{stat}, r_{feas}, r_{comp}\}.\tag{25}\] The algorithm terminates when the KKT residual \(r_{KKT}\) falls below the prescribed tolerance \(\epsilon > 0\).

The complete procedure to solve the bilevel problem 4 via sensitivity analysis is presented as a general framework in Algorithm 2, which relies on an NLP solver step to update the upper-level variables. A high-level schematic of the overall framework is provided in Figure 1.

Figure 1: High-level schematic of the overall sensitivity-based Augmented Lagrangian framework, corresponding to Algorithm 2.
Figure 2: Sensitivity-Based Method for Bilevel Optimization

The upper-level update step, step [alg:UL95step] in Algorithm 2, can be performed using various methods. In this work, we implemented an Augmented Lagrangian Method (ALM) based on the Powell-Hestenes-Rockafellar (PHR) augmented Lagrangian function [21][23]: \[\label{eq:aug95Lagrangian} \mathcal{L}_{\rho} (x,\mu; \rho) = F(x, \bar{y}) + \frac{1}{2 \rho} \sum_{i =1}^{r} \left[ \left( \max \{ 0, \mu_i + \rho G_i(x, \bar{y}) \} \right)^2 - \mu_i^2 \right],\tag{26}\] where \(\rho >0\) is the penalty parameter. Its first derivative with respect to \(x\) is \[\label{eq:diff95aug95lagr} \nabla_{x} \mathcal{L}_{\rho} (x,\mu; \rho) = \nabla_{x}F(x,\bar{y}) + \sum_{i =1}^{r} \left( \max \{ 0, \mu_i + \rho G_i(x, \bar{y}) \} \right) \nabla_{x}G_i(x,\bar{y}),\tag{27}\] which is a continuous function.
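Both the PHR function 26 and its gradient 27 are straightforward to implement once \(F\), \(G\), and their derivatives are available as arrays; the signatures below are our choice:

```python
import numpy as np

def phr_value(F_val, G_val, mu, rho):
    """PHR augmented Lagrangian (26)."""
    t = np.maximum(0.0, mu + rho * G_val)     # shifted multiplier estimate
    return F_val + (t @ t - mu @ mu) / (2.0 * rho)

def phr_gradient(gradF, jacG, G_val, mu, rho):
    """Gradient (27); jacG stacks the rows nabla_x G_i."""
    t = np.maximum(0.0, mu + rho * G_val)
    return gradF + jacG.T @ t

# With the shifted constraint inactive, the penalty term vanishes and the
# value reduces to F:
v = phr_value(3.0, np.array([-1.0]), np.array([0.0]), rho=10.0)
print(v)  # 3.0
# An active constraint contributes max{0, mu + rho*G_i} * grad G_i:
print(phr_gradient(np.array([1.0]), np.array([[2.0]]), np.array([0.5]),
                   np.array([1.0]), rho=2.0))  # [5.]
```

The shared `max` term is what makes the gradient continuous across the active/inactive boundary, as noted in the text.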

At each outer iteration \(k\), the ALM forms the augmented Lagrangian subproblem \[\label{eq:subproblem} \min_{x} \mathcal{L}_{\rho} (x,\mu_k; \rho_k).\tag{28}\] In our framework, this subproblem is solved inexactly using a gradient-based method: \[\label{eq:x95update} x_{new} \gets x_{current} + \alpha p,\tag{29}\] where \(p = - \nabla_{x} \mathcal{L}_{\rho} (x_{current},\mu_k; \rho_k)\) is the search direction and \(\alpha\) is the step size determined via a line search that satisfies the Wolfe conditions. These inner iterations continue until the infinity norm of the gradient, \(\nabla_{x} \mathcal{L}_{\rho}\), falls below a prescribed inner tolerance, \(\epsilon_{inner} > 0\). The resulting point is then denoted \(x_{k+1}\). A general schematic of the upper-level variable update process is presented in Figure 3. The most computationally intensive part of this process is the evaluation of this gradient, which is an implicit function of \(x\). The multi-step workflow for this gradient evaluation, which involves solving the lower-level NLP and the sensitivity system, is illustrated in Figure 4.

Figure 3: Workflow for the ALM subproblem solution (Algorithm 5). At each outer iteration, an NLP solver is used to find an approximate minimizer of the implicit Augmented Lagrangian function.
Figure 4: Detailed workflow for the implicit objective and gradient evaluation. This multi-step process, which includes solving the lower-level NLP and the linear sensitivity system, is called at each iteration of the inner-loop.

Following the primal update, the dual variables are updated using the new iterate \(x_{k+1}\): \[\label{eq:mu95update} \mu_{k+1,i} \gets \max \{0, \mu_{k,i} + \rho_k G_i(x_{k+1}, \bar{y}_{k+1}) \}.\tag{30}\] This step uses the constraint violation at the new point \((x_{k+1}, \bar{y}_{k+1})\) to update the multipliers effectively.

Finally, the penalty parameter \(\rho\) is managed by using an adaptive scheme that balances the minimization of the objective function and the satisfaction of the constraints. The penalty is increased by a factor \(\gamma >1\) only when the improvement in the primal feasibility between iterations is deemed insufficient. This approach ensures that the iterates are driven towards feasibility without unnecessarily large penalty values, which could lead to ill-conditioning of the method. The detailed steps of the upper-level update are given in Algorithm 5.
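One outer iteration of this bookkeeping, i.e., the multiplier update 30 followed by the adaptive penalty rule, can be sketched as follows (the constants \(\gamma\) and \(c\) are tunable; the values below are our choices):

```python
import numpy as np

def outer_update(mu, rho, G_new, G_old, gamma=10.0, c=0.5):
    """Multiplier update (30) plus the adaptive penalty increase."""
    mu_new = np.maximum(0.0, mu + rho * G_new)
    infeas_new = np.linalg.norm(np.maximum(0.0, G_new), np.inf)
    infeas_old = np.linalg.norm(np.maximum(0.0, G_old), np.inf)
    if infeas_new > c * infeas_old:     # insufficient feasibility progress
        rho = gamma * rho               # increase penalty by factor gamma > 1
    return mu_new, rho

# Feasibility only improved from 0.25 to 0.2 (less than the required 50%
# reduction), so the multiplier is updated and the penalty is increased:
mu, rho = outer_update(np.array([1.0]), 10.0,
                       G_new=np.array([0.2]), G_old=np.array([0.25]))
print(mu, rho)  # [3.] 100.0
```

Had the infeasibility dropped below \(c\) times its previous value, \(\rho\) would have been left unchanged, avoiding the ill-conditioning that overly large penalties cause.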

Figure 5: Upper-Level Update via Augmented Lagrangian Method

3.5 Convergence Analysis↩︎

For completeness, we provide a sketch of the convergence proof of Algorithm 2 using the upper-level update methodology presented in Algorithm 5. The proof adapts the standard convergence analysis for the Augmented Lagrangian method [24] to our sensitivity-based framework for bilevel optimization.

Theorem 1 (Convergence to a KKT point). Let \(\{x_k, \mu_k \}\) be a sequence of iterates generated by Algorithm 2, with the upper-level update step given by Algorithm 5. Assume that:

  a. The regularity conditions in Assumption 1 hold for every point in the sequence.

  b. The sequence of iterates \(\{x_k\}\) is contained within a compact set \(X\), and the corresponding sequence of generated multipliers \(\{\mu_k\}\) is bounded.

Then any limit point of the sequence \(\{x_k, \mu_k \}\) is a KKT point of 7 .

Proof. First, we establish the existence of a limit point for the sequence of iterates. By Assumption (b), the sequence \(\{x_k\}\) is contained within a compact set \(X\), and the sequence of multipliers \(\{\mu_k\}\) is bounded. This implies that the joint sequence \(\{x_k, \mu_k\}\) is also contained within a compact set. Therefore, by the Bolzano-Weierstrass theorem, there exists at least one convergent subsequence. Let \((\bar{x}, \bar{\mu})\) be the limit point of such a subsequence, indexed by \(\mathcal{K} \subseteq \mathbb{N}\), such that: \[\label{eq:limit95point} \lim_{k \to \infty, k \in \mathcal{K}} (x_k, \mu_k) = (\bar{x}, \bar{\mu})\tag{31}\]

We now show that \((\bar{x}, \bar{\mu})\) satisfies the KKT conditions of the upper-level problem 19 .

1. Dual feasibility: The multiplier update rule in Algorithm 5, given by 30 , ensures that every component of \(\mu_k\) is non-negative for all \(k >1\). Since the terms of the convergent subsequence are non-negative, their limit must also be non-negative, i.e., \(\bar{\mu} \geq 0\).

2. Primal feasibility: Consider the two possible behaviors of the penalty parameter sequence \(\{\rho_k\}\).

(a) The sequence remains bounded. This implies that there exists a sufficiently large \(\bar{k} \in \mathcal{K}\) after which the penalty parameter is no longer increased, i.e., \(\rho_{k} = \rho_{\bar{k}}\) for \(k > \bar{k}\). By Algorithm 5, this happens if the condition \[\Vert \max\{0, G(x_{k+1}, \bar{y}_{k+1})\} \Vert_{\infty} \leq c \cdot \Vert \max\{0, G(x_{k}, \bar{y}_{k})\} \Vert_{\infty},\] with \(c \in (0,1)\), is satisfied for all \(k > \bar{k}\). This implies that the sequence of feasibility residuals \(\{r_{feas,k}\}\) converges to zero, and hence \(G(\bar{x}, \bar{y}) \leq 0\) by continuity.

(b) The sequence diverges to infinity, i.e., \(\rho_k \to \infty\). Suppose for contradiction that the limit point \(\bar{x}\) is infeasible, meaning there is at least one constraint \(j\) such that \(G_j (\bar{x},\bar{y}) > 0\). By continuity of \(G_j\) and \(\bar{y}(x)\) (Assumption (a)), \(G_j (x_k, \bar{y}_k) > 0\) for sufficiently large \(k\). Since \(\{x_k\}\) and \(\{\mu_k\}\) are bounded, all other terms in \(\mathcal{L}_\rho\) remain bounded. However, the penalty term for constraint \(j\) grows asymptotically like \(\frac{\rho_{k}}{2}G_j(x_k, \bar{y}_k)^2 \rightarrow \infty\) as \(\rho_k \rightarrow \infty\).

The Wolfe conditions ensure that the sequence of augmented Lagrangian values is non-increasing, \[\mathcal{L}_{\rho}(x_{k+1}, \mu_k; \rho_k) \le \mathcal{L}_{\rho}(x_k, \mu_k; \rho_k),\] and hence bounded above. This contradicts the divergence just established. Therefore, the initial assumption of infeasibility is false, and the limit point must be feasible.

3. Stationarity: By construction of Algorithm 5, the primal update step satisfies the Wolfe conditions. Standard results in optimization theory imply that for a continuously differentiable function, such a line search ensures the gradient norm of the minimized function converges to zero [24]. Therefore: \[\lim_{k \to \infty} \Vert \nabla_{x} \mathcal{L}_{\rho} (x_k,\mu_k; \rho_k) \Vert = 0.\]

We can substitute the dual variable update rule 30 into the gradient of the augmented Lagrangian function 27 . Since the sequence of gradient norms converges to zero, the limit of the subsequence indexed by \(\mathcal{K}\) must also be zero. As \(k \to \infty\) for \(k \in \mathcal{K}\), we have \(x_k \to \bar{x}\), \(\mu_k \to \bar{\mu}\), and \(\mu_{k+1} \to \bar{\mu}\). By the continuity of the gradient functions (Assumption (a)), we can take the limit: \[\begin{align} 0 &= \lim_{k \to \infty, k \in \mathcal{K}} \left\| \nabla_{x}F(x_k,\bar{y}_k) + \mu_{k+1}^{\top} \nabla_{x}G(x_k,\bar{y}_k) \right\| \\ &= \left\| \nabla_{x}F(\bar{x},\bar{y}) + \bar{\mu}^{\top} \nabla_{x}G(\bar{x},\bar{y}) \right\| \\ &= \| \nabla_{x} \mathcal{L}_F (\bar{x},\bar{\mu}) \|. \end{align}\] This directly implies that the stationarity condition, \(\nabla_{x} \mathcal{L}_F (\bar{x},\bar{\mu}) = 0\), is satisfied.

4. Complementarity: Having established primal feasibility, we consider the two cases for any constraint \(j\) at the limit point.

(a) The constraint is inactive, i.e., \(G_j(\bar{x},\bar{y}) < 0\). By continuity, there exists a \(\bar{k}\) such that for all \(k \in \mathcal{K}\) with \(k > \bar{k}\) we have \(G_j(x_k,\bar{y}_k) <0\). By Assumption (b), the sequence \(\{\mu_k\}\) is bounded. Regardless of whether \(\{\rho_k\}\) is bounded or diverges, the negative term \(\rho_k G_j(x_k,\bar{y}_k)\) is guaranteed to eventually dominate the bounded, non-negative \(\mu_{k,j}\). This ensures that the term \(\mu_{k,j} + \rho_k G_j(x_k,\bar{y}_k)\) will be negative for all \(k \in \mathcal{K}\) with \(k > \bar{k}\), forcing \(\mu_{k+1,j}\) to be zero via the \(\max\) operator in 30 . Therefore, the limit \(\bar{\mu}_j\) must be zero.

(b) The constraint is active, i.e., \(G_j(\bar{x},\bar{y}) = 0\). In this case, the condition \(\bar{\mu}_j G_j(\bar{x},\bar{y}) = 0\) is trivially satisfied.

Since all KKT conditions are satisfied, any limit point \((\bar{x}, \bar{\mu})\) is a KKT point of the implicit problem 7 . ◻

3.6 Equivalence to S-Stationarity↩︎

We now establish the equivalence between a KKT point of the implicit problem 7 and a stationary point of the MPCC reformulation 5 . A constraint qualification tailored for these problems is MPEC-LICQ, which requires that the gradients of the active upper-level constraints, the active lower-level constraints, and the lower-level stationarity equations are linearly independent. This condition ensures that the multipliers of the MPCC are well-defined [25].

The Lagrangian for the MPCC 5 is given by: \[\label{eq:MPCC95Lagrangian} \mathcal{L}_{MPCC}(x,y,\lambda,\mu,\nu,\pi,\xi) = F(x,y) + \mu^{\top}G(x,y) + \nu^{\top}\nabla_y \mathcal{L}_f(x,y,\lambda) + \pi^{\top}g(x,y) - \xi^{\top} \lambda\tag{32}\] where \(\mu, \nu, \pi, \xi\) are the Lagrange multipliers, which follow the standard sign convention: the equality-constraint multiplier \(\nu\) is free, while the inequality-constraint multipliers \(\mu, \pi, \xi\) are non-negative. Note that this Lagrangian does not include the complementarity slackness condition \(\bar\lambda_i g_i(\bar x,\bar y)=0\) for all \(i\); this condition is handled through index-set dependent rules on the multipliers.

A feasible point \((\bar{x},\bar{y},\bar{\lambda})\) of the MPCC 5 is S–stationary if there exist multipliers \((\bar{\mu}, \bar{\nu}, \bar{\pi}, \bar{\xi})\) such that the following hold:

  1. Stationarity: \[\tag{33} \begin{align} \nabla_x \mathcal{L}_{MPCC} = &\nabla_x F + \nabla_x G^\top\bar\mu + (\nabla_{yx}^2 \mathcal{L}_f)^\top\bar\nu + \nabla_x g^\top\bar\pi = 0,\tag{34}\\ \nabla_y \mathcal{L}_{MPCC} = &\nabla_y F + \nabla_y G^\top\bar\mu + (\nabla_{y}^2 \mathcal{L}_f)^\top\bar\nu + \nabla_y g^\top\bar\pi = 0,\tag{35}\\ \nabla_{\lambda} \mathcal{L}_{MPCC} = &(\nabla_y g)\,\bar\nu - \bar\xi = 0.\tag{36} \end{align}\]

  2. Primal feasibility: \[G(\bar{x},\bar{y}) \leq 0,\quad \nabla_y \mathcal{L}_f(\bar{x},\bar{y},\bar{\lambda})= 0, \quad g(\bar{x},\bar{y}) \leq 0,\quad \bar{\lambda} \geq 0,\quad \bar{\lambda}_i g_i(\bar{x},\bar{y})=0\;\forall i.\]

  3. Dual feasibility and complementarity slackness: \[\bar{\mu} \geq 0, \quad \bar{\pi} \geq 0, \quad \bar{\xi} \geq 0,\] \[\bar{\mu}^{\top} G(\bar{x},\bar{y})=0, \quad \bar{\pi}^{\top} g(\bar{x},\bar{y})=0, \quad \bar{\xi}^{\top} \bar{\lambda} =0.\]

  4. Sign rules: Define index sets \[\begin{align} \mathcal{I}^{+}&=\{i:\;g_i(\bar x,\bar y)=0,\;\bar{\lambda}_i>0\},\\ \mathcal{I}^{-}&=\{i:\;g_i(\bar x,\bar y) < 0,\;\bar{\lambda}_i = 0\},\\ \mathcal{I}^{0}&=\{i:\;g_i(\bar x,\bar y) = 0,\;\bar{\lambda}_i = 0\}. \end{align}\] Then for every \(i\): \[\begin{align} \begin{cases} i\in\mathcal{I}^{+}: & \bar{\pi}_i \ge 0,\;\bar{\xi}_i = 0,\\ i\in\mathcal{I}^{-}: & \bar{\pi}_i = 0,\;\bar{\xi}_i \ge 0,\\ i\in\mathcal{I}^{0}: & \bar{\pi}_i \ge 0,\;\bar{\xi}_i \ge 0. \end{cases} \end{align}\]

Theorem 2 (Equivalence to S-stationarity). Let \((\bar{x}, \bar{y})\) be a feasible point for the bilevel problem 4 , and let \(\bar{\lambda}\) be a multiplier such that \((\bar{x}, \bar{y}, \bar{\lambda})\) satisfies the KKT conditions of the lower–level problem 2 . Assume that the lower-level regularity conditions (Assumption 1) hold in a neighborhood of \((\bar{x}, \bar{y})\). Assume moreover that MPEC–LICQ holds for the MPCC 5 at \((\bar{x}, \bar{y}, \bar{\lambda})\).

Then, there exists \(\bar{\mu}\) such that \((\bar{x}, \bar{\mu})\) is a KKT point of the implicit problem 7 if and only if \((\bar{x}, \bar{y}, \bar{\lambda})\) is an S–stationary point of the MPCC reformulation 5 .

Proof. The idea for the forward direction is to construct an adjoint system that allows us to bypass the sensitivity term present in the stationarity condition of the implicit problem and recover the stationarity conditions of the MPCC. The converse is shown by algebraic manipulation of the adjoint system. We first define the expressions that will be used throughout this proof.

Define the sets \(A = \set{i; g_i(\bar{x}, \bar{y}) = 0,\;\bar{\lambda}_i > 0 }\) and \(I = \set{i; g_i(\bar{x}, \bar{y}) < 0,\;\bar{\lambda}_i = 0}\) as the active and inactive index sets at \((\bar x,\bar y)\), respectively. By SCC (Assumption 1), the biactive case is ruled out. The sensitivity system 14 at \((\bar{x}, \bar{y}, \bar{\lambda})\) is given by the linear system: \[\label{eq:sensitivity95system95compact} \underbrace{\begin{bmatrix} H_{\mathcal{L}_f} & \nabla_{y} g_A^{\top} \\ \bar{\Lambda}_A \nabla_{y} g_A & 0 \end{bmatrix}}_{=M} \begin{bmatrix} \frac{d\bar{y}}{dx} \\ \frac{d\bar{\lambda}_A}{dx} \end{bmatrix} = \underbrace{\begin{bmatrix} - \nabla_{yx}^2 \mathcal{L}_f \\ - \bar{\Lambda}_A \nabla_{x} g_A \end{bmatrix}}_{=b},\tag{37}\] where \(H_{\mathcal{L}_f} = \nabla_y^2 \mathcal{L}_f(\bar{x}, \bar{y}, \bar{\lambda})\) and \(\bar{\Lambda}_A = \text{diag}(\bar{\lambda}_A)\). By Assumption 1, the matrix \(M\) is nonsingular; therefore, the function \(x \mapsto \bar y(x)\) is single-valued and differentiable in a neighborhood of \(\bar x\).

The stationarity condition of the implicit problem 7 at \((\bar{x},\bar{\mu})\), obtained via the chain rule, is: \[\label{eq:UL-stationarity} \nabla_{x} F(\bar{x}, \bar{y}) + \nabla_{x} G(\bar{x}, \bar{y})^{\top} \bar{\mu} + \left(\frac{d\bar{y}}{dx}\right)^{\top} \Big( \nabla_{y} F(\bar{x}, \bar{y}) + \nabla_{y} G(\bar{x}, \bar{y})^{\top} \bar{\mu} \Big)=0.\tag{38}\]

\((\Rightarrow)\) Choose adjoint variables \((\bar{\nu},\bar{w})\) as the solution of the following adjoint system \[\label{eq:adjoint95system} M^{\top} \begin{bmatrix} \bar\nu\\[2pt] \bar w \end{bmatrix} = -\begin{bmatrix} \nabla_y F(\bar x,\bar y)+\nabla_y G(\bar x,\bar y)^{\top}\bar\mu\\[2pt] 0 \end{bmatrix}.\tag{39}\] Since \(M\) is nonsingular, \((\bar\nu,\bar w)\) is uniquely defined. Multiplying 37 on the left by \([\bar{\nu}^{\top},\bar{w}^{\top}]\) and using 39 gives: \[\label{eq:adjoint-equality} \left(\frac{d\bar{y}}{dx}\right)^{\top} \Big( \nabla_{y} F(\bar{x}, \bar{y}) + \nabla_{y} G(\bar{x}, \bar{y})^{\top} \bar{\mu} \Big) = \big(\nabla^2_{yx} \mathcal{L}_f\big)^{\top}\bar{\nu} + \left(\nabla_x g_A \right)^{\top}\bar{\Lambda}_A \bar{w}.\tag{40}\]

Substituting 40 into the stationarity condition 38 gives \[\label{eq:MPCC95stat95x95adjoint} \nabla_{x} F(\bar{x}, \bar{y}) + \nabla_{x} G(\bar{x}, \bar{y})^{\top} \bar{\mu} + \big(\nabla^2_{yx} \mathcal{L}_f\big)^{\top}\bar{\nu} + \left(\nabla_x g_A \right)^{\top}\bar{\Lambda}_A \bar{w} = 0.\tag{41}\] Define the MPCC multipliers \[\label{eq:MPCC95multipliers} \bar{\pi}_I =0, \quad \bar{\pi}_A = \bar{\Lambda}_A \bar{w}, \quad \bar{\xi} = \nabla_{y}g(\bar{x},\bar{y}) \bar{\nu}.\tag{42}\] Since \(\bar{\Lambda}_A\) is diagonal with strictly positive entries, this definition preserves nonnegativity whenever \(\bar{w} \geq 0\). Then 41 is precisely the MPCC stationarity condition with respect to \(x\), 34 .

Stationarity with respect to \(y\) follows from the first block of 39 : \[H_{\mathcal{L}_f}\bar{\nu} + \nabla_y g_A(\bar{x},\bar{y})^{\top}\bar{\Lambda}_A \bar{w} = - \Big(\nabla_{y} F(\bar{x}, \bar{y}) + \nabla_{y} G(\bar{x}, \bar{y})^{\top} \bar{\mu}\Big),\] and recalling that \(H_{\mathcal{L}_f}=\nabla^2_{yy}\mathcal{L}_f\) and \(\bar{\pi}_I =0\), we obtain 35 . From the second block of 39 we obtain \(\nabla_y g_A \bar{\nu} =0\), which implies \(\bar{\xi}_A =0\). Hence, stationarity with respect to \(\lambda\) is satisfied: \[\nabla_{\lambda} \mathcal{L}_{MPCC} = (\nabla_y g)\,\bar{\nu} - \bar{\xi} = 0.\]

Primal feasibility of \((\bar{x},\bar{y},\bar{\lambda})\) for the MPCC problem holds by construction, while dual feasibility and complementarity for \(\bar{\mu}\) \((\bar{\mu} \geq 0,\; \bar{\mu}^{\top}G =0)\) are inherited from the KKT conditions of the implicit problem.

To verify the S–stationarity sign conditions, we first note that \(\bar{\xi}_A =0\) is a direct consequence of the adjoint system, and \(\bar{\pi}_I =0\) holds by construction. The remaining conditions, \(\bar{\pi}_A \geq 0\) and \(\bar{\xi}_I \geq 0\), follow from the optimality of the KKT point \((\bar{x},\bar{\mu})\). The MPEC-LICQ ensures that the multipliers have a valid sensitivity interpretation. For \(i \in A\), the multiplier \(\bar{\pi}_i\) represents the sensitivity of the upper-level objective \(F\) to the lower-level constraint \(g_i\). Since \(\bar{x}\) is a local minimum, relaxing this constraint cannot improve the objective, thus \(\bar{\pi}_A \geq 0\). Similarly, for \(i \in I\), \(\bar{\xi}_i\) is the sensitivity to the condition \(\lambda_i = 0\). A negative \(\bar{\xi}_i\) would imply that the upper-level objective could be improved if \(\lambda_i > 0\), that is, by making the lower-level constraint \(g_i\) active, contradicting the optimality of \(\bar{x}\). We collect the S–stationarity sign rules here: \[i\in A:\;\bar\pi_i\ge 0,\;\bar\xi_i=0; \qquad i\in I:\;\bar\pi_i=0,\;\bar\xi_i\ge 0.\] This proves that \((\bar{x},\bar{y},\bar{\lambda})\) is S–stationary for 5 .

\((\Leftarrow)\) Conversely, suppose \((\bar x,\bar y,\bar\lambda)\) is S–stationary for the MPCC, i.e., there exist \((\bar\mu,\bar\nu,\bar\pi,\bar\xi)\) satisfying primal feasibility, dual feasibility, the S–stationarity sign rules, complementarity, and the stationarity conditions 33 .

Under SCC (Assumption 1), \(\bar{\Lambda}_A\) is invertible, so we can define \[\bar{w} = \bar{\Lambda}_A^{-1}\bar{\pi}_A.\] Then the MPCC stationarity conditions with respect to \(y\) 35 and with respect to \(\lambda\) 36 are equivalent to the adjoint system 39 . The MPCC stationarity condition with respect to \(x\) 34 , combined with the identity 40 , which follows from the other stationarity conditions, directly yields the implicit problem’s stationarity condition 38 . Complementarity and dual feasibility are inherited from the MPCC S–stationarity sign rules. Hence \((\bar{x}, \bar{\mu})\) is a KKT point of the implicit problem 7 . ◻

4 Computational Tests↩︎

To demonstrate the validity and performance of the proposed sensitivity-based method, a series of computational experiments were conducted. The algorithm, as outlined in Algorithm 2 and implemented using the Augmented Lagrangian framework from Algorithm 5, was tested on a suite of standard benchmark problems. This section details the implementation, the test problems, and an analysis of the results.

4.1 Implementation Details↩︎

The method was implemented in Python, leveraging CasADi [26] for its automatic differentiation capabilities. The lower-level parametric NLPs were solved using IPOPT [27]. A key architectural feature is the method used to solve the inner ALM subproblem. As this is an implicit and potentially ill-conditioned optimization problem, the robust L-BFGS-B algorithm [28] from the SciPy library was employed. This quasi-Newton method is well-suited for the non-smooth landscapes that can arise.
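The core linear-algebra step of the sensitivity computation is the solve of the structured system 37. The following sketch assembles and solves it for a deliberately tiny, hand-checkable lower-level problem; the problem data below are an illustrative assumption (not part of the implementation or the benchmark suite), and only the structure of the solve is the point.

```python
import numpy as np

# Toy lower level: min_y 0.5*(y - x)^2  s.t.  g(y) = y - 1 <= 0.
# At x = 2 the solution is y = 1 with multiplier lambda = x - y = 1
# (constraint active, strict complementarity holds).
H = np.array([[1.0]])        # Hessian block  \nabla^2_{yy} L_f
dg_dy = np.array([[1.0]])    # \nabla_y g_A
dg_dx = np.array([[0.0]])    # \nabla_x g_A
L_yx = np.array([[-1.0]])    # cross term     \nabla^2_{yx} L_f
Lam = np.diag([1.0])         # diag of active multipliers

# Assemble M and b as in the sensitivity system (eq. 37) and solve.
M = np.block([[H, dg_dy.T], [Lam @ dg_dy, np.zeros((1, 1))]])
b = np.concatenate([-L_yx, -Lam @ dg_dx])
dy_dx, dlam_dx = np.linalg.solve(M, b).ravel()

# Analytically y(x) = 1 and lambda(x) = x - 1 for x > 1,
# so dy/dx = 0 and dlambda/dx = 1.
print(dy_dx, dlam_dx)  # 0.0 1.0
```

The same block assembly scales directly to vector-valued \(y\) and multiple active constraints, which is why a single factorization of \(M\) yields all the sensitivities needed by the upper-level gradient.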

To ensure efficient termination, the algorithm uses a dual stopping criterion. A well-known practical feature of the Augmented Lagrangian method is that the primal variables often converge to a high-accuracy solution much more rapidly than the dual variables, whose first-order update scheme can exhibit slow linear convergence [24]. Therefore, while the primary criterion is the KKT residual 25 falling below a tolerance of \(\epsilon = 10^{-5}\), a secondary criterion is also employed. This pragmatic condition terminates the algorithm if the change in the primal variables \(x\) and the upper-level objective function \(F\) between consecutive iterations falls below a stall tolerance, also set to \(10^{-5}\). This prevents the solver from performing an excessive number of iterations to slowly refine the dual variables when no further meaningful improvement to the solution is being made.
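The outer loop just described can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it applies the Powell–Hestenes–Rockafellar multiplier update and an L-BFGS-B inner solve to a toy single-level inequality-constrained problem, with the dual stopping test (KKT residual or primal stall). The problem, the fixed penalty value, and the helper names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: min (x1-2)^2 + (x2-1)^2  s.t.  g(x) = x1 + x2 - 1 <= 0.
# Solution: x* = (1, 0) with multiplier mu* = 2.
def f(x): return (x[0] - 2.0)**2 + (x[1] - 1.0)**2
def g(x): return x[0] + x[1] - 1.0
def grad_f(x): return np.array([2*(x[0] - 2.0), 2*(x[1] - 1.0)])
grad_g = np.array([1.0, 1.0])

def augmented_lagrangian(x, mu, rho):
    # PHR term for a single inequality constraint
    return f(x) + (max(0.0, mu + rho*g(x))**2 - mu**2) / (2.0*rho)

def solve(x0, mu=0.0, rho=10.0, tol=1e-5, max_outer=50):
    x, F_prev = np.asarray(x0, float), np.inf
    for k in range(max_outer):
        # inner subproblem handled by the quasi-Newton L-BFGS-B solver
        res = minimize(augmented_lagrangian, x, args=(mu, rho), method="L-BFGS-B")
        x_new = res.x
        mu = max(0.0, mu + rho*g(x_new))          # first-order dual update
        kkt = np.linalg.norm(grad_f(x_new) + mu*grad_g, np.inf) + max(0.0, g(x_new))
        stalled = (np.linalg.norm(x_new - x, np.inf) < tol
                   and abs(f(x_new) - F_prev) < tol)
        x, F_prev = x_new, f(x_new)
        if kkt < tol or stalled:                  # dual-criterion termination
            return x, mu, k + 1
    return x, mu, max_outer

x_opt, mu_opt, iters = solve([0.0, 0.0])
print(x_opt, mu_opt)  # approximately [1, 0] with multiplier 2
```

Note the slow linear dual convergence is visible here: the multiplier error contracts by a constant factor per outer iteration, which is exactly the behavior the stall criterion guards against on harder problems.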

4.2 Test Problems↩︎

As an illustrative example, we first conduct a detailed analysis of the classic ClarkWesterberg problem [3], a well-known benchmark in the chemical engineering literature. The problem is defined as: \[\begin{align} \min_x & \;(x - 3)^2 + (y - 2)^2\\ \text{s.t.} & \;0 \leq x \leq 8\\ & \;y \in \Psi(x) = \mathop{\mathrm{argmin}}_{y}\set*{(y - 5)^2;\begin{aligned} &-2x +y-1 \leq 0,\\ &x-2y+2 \leq 0, \\ & x+2y-14 \leq 0\end{aligned}}. \end{align}\]

The geometry of this problem is illustrated in Figure 6. The implicit upper-level objective, \(F(x,\bar{y}(x))\), is continuous but non-smooth, with non-differentiable kinks at \(x=2\) and \(x=4\) that correspond to changes in the lower-level active set. This non-convex landscape gives rise to multiple optima, including a global minimum at \(x=1\) and two distinct local minima. The resulting sensitivity of the computed solution to the initial point underscores the necessity of a multi-start strategy to adequately explore the solution space, a characteristic feature of non-convex bilevel problems.

Figure 6: The implicit upper-level objective \(F(x,\bar{y}(x))\) and the lower-level optimal response \(\bar{y}(x)\) as a function of the upper-level variable \(x\) for the ClarkWesterberg1990 problem. The local and global optima are highlighted.
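This landscape is easy to reproduce because the follower's response has a closed form: the quadratic \((y-5)^2\) is minimized by projecting \(y=5\) onto the interval defined by the three linear constraints (note the second and third constraints together imply the interval is nonempty only for \(x \le 6\)). The following scan is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

# Lower-level response for ClarkWesterberg1990: minimize (y - 5)^2 subject to
# y <= 2x + 1, y >= (x + 2)/2, y <= (14 - x)/2, i.e. project y = 5 onto the
# interval [(x + 2)/2, min(2x + 1, (14 - x)/2)], nonempty for 0 <= x <= 6.
def y_bar(x):
    lo = (x + 2.0) / 2.0
    hi = min(2.0*x + 1.0, (14.0 - x) / 2.0)
    return min(max(5.0, lo), hi)   # clip 5 into [lo, hi]

def F(x):  # implicit upper-level objective
    return (x - 3.0)**2 + (y_bar(x) - 2.0)**2

# Scanning the implicit objective recovers the kinks at x = 2 and x = 4
# and the three optima described in the text.
xs = np.linspace(0.0, 6.0, 6001)
x_star = min(xs, key=F)
print(x_star, F(x_star))   # global minimum at x = 1 with F = 5
```

Piecewise minimization over the three active-set branches confirms the scan: \(F\) attains 5 at \(x=1\) (first branch), 9 at \(x=3\) (middle branch), and 9.8 at \(x=4.4\) (last branch).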

To demonstrate the algorithm’s performance on a problem with active upper-level constraints, we present the convergence results for the Outrata_Cervinka_2009 problem in Figure 7. Unlike cases that terminate due to stalling, this problem demonstrates convergence via the primary KKT criterion. The plot shows the KKT residual decreasing by several orders of magnitude to meet the tolerance, while the upper-level multiplier \(\mu_1\) converges rapidly to its optimal value. This provides strong numerical evidence that the algorithm performs as theoretically intended.

Figure 7: Convergence behavior for the Outrata_Cervinka_2009 problem. The algorithm converges in a few iterations as the KKT residual drops below the tolerance \(\epsilon = 10^{-5}\).

To demonstrate the broader applicability and robustness of the proposed method, the algorithm was tested on a suite of benchmark problems from the BOLIB library [29]. A summary of these computational experiments is presented in Table 1. For each problem, a multi-start strategy was employed, and the best solution found is reported. The table details the problem dimensions \((n,m)\), the initial point (\(x_0\)) that led to the best solution, the known optimal value from the literature (\(\bar{F}_r\)), the value found by our method (\(\bar{F}_c\)), the total outer iterations, and the solution time.

For problems with relatively simple landscapes, the algorithm often converges in very few outer iterations, which shows the efficiency of the quasi-Newton method used for the inner subproblem. For more challenging problems, such as AiyoshiShimizu1984Ex2, the algorithm’s dual-criterion termination proves essential. The solver correctly identifies the optimal primal solution rapidly, but convergence is achieved via the ‘stalled’ criterion due to the slow convergence of the dual variables. This demonstrates the robustness of the implementation in finding high-quality optima even when the strict KKT tolerance is not met in a practical number of iterations. Overall, the results found were consistent with the best-reported solutions in the literature.

Table 1: Performance of the Sensitivity-Based ALM on Selected Benchmark Problems.
| Problem | (\(n\), \(m\)) | \(x_0\) | \(\bar{F}_r\) | \(\bar{F}_c\) | Iters | Time (s) |
|---|---|---|---|---|---|---|
| AiyoshiShimizu1984Ex2 | (2, 2) | (20.0, 20.0) | 5.0 | 5.0 | 12 | 2.20 |
| AllendeStill2013 | (2, 2) | (2.0, 2.0) | -1.0 | -1.0 | 11 | 1.11 |
| Bard_1988_ex1 | (1, 1) | 2.0 | 17.0 | 17.0 | 11 | 0.91 |
| Bard_1991_ex1 | (1, 2) | 4.0 | 2.0 | 2.0 | 11 | 0.48 |
| Bard_Book_1998 | (2, 2) | (15, 15) | 0.0 | 0.0 | 6 | 0.28 |
| ClarkWesterberg1990 | (1, 1) | 1.7 | 5.0 | 5.0 | 11 | 0.33 |
| DempeEtal2012 | (1, 1) | 0.9 | -1.0 | -1.0 | 11 | 0.22 |
| Dempe_Franke_2011_ex42 | (2, 2) | \((-0.9,\;0.9)\) | 3.0 | 3.0 | 11 | 2.75 |
| Dempe_Lohse_2011_ex31a | (2, 2) | \((-0.4,\;-0.4)\) | -5.5 | -5.5 | 11 | 5.35 |
| Dempe_Lohse_2011_ex31b | (3, 3) | (4.0, 4.0, 4.0) | -12.0 | -12.0 | 12 | 1.13 |
| FloudasEtal2013 | (2, 2) | (10.0, 10.0) | 0.0 | 0.0 | 11 | 0.48 |
| Outrata_Cervinka_2009 | (2, 2) | \((-10.0,\;-1.0)\) | 0.0 | 0.0 | 12 | 0.67 |
| Shimizu_Aiyoshi_1981_ex2 | (2, 2) | (10.0, 1.0) | 225.0 | 225.0 | 39 | 4.20 |

5 Conclusions↩︎

In this work, a novel sensitivity-based algorithm for solving continuous, optimistic bilevel optimization problems was developed. By treating the lower-level problem as an implicit function of the upper-level variables, this approach addresses the hierarchical structure of BLPs, avoiding the need for classical KKT or value-function reformulations. The method was embedded within a robust Augmented Lagrangian framework, providing a theoretically sound and practical tool for solving this challenging class of optimization problems.

Computational experiments on a suite of benchmark problems demonstrated the effectiveness and efficiency of the proposed method. The results highlighted the critical role of the architectural choice for the inner-loop (Algorithm 5) solver; the use of a robust quasi-Newton method (L-BFGS-B) proved essential for handling the ill-conditioned and non-smooth subproblems that arise. The analysis of the problem landscapes confirmed the non-convexity inherent in BLPs, underscoring the necessity of a multi-start strategy. Furthermore, the implemented dual-criterion stopping condition proved to be a robust and efficient solution to the practical challenge of asymmetric convergence rates between primal and dual variables in the Augmented Lagrangian method.

Future research is focused on extending this framework to handle non-convex lower-level problems, potentially through the integration of global optimization techniques or branching strategies for the follower’s problem. Additionally, exploring second-order update schemes for the dual variables could accelerate convergence and warrants further investigation. Finally, the application of this method to larger-scale chemical engineering problems remains a promising avenue for future work.

Acknowledgments↩︎

The first author acknowledges Dr. Ehecatl Antonio del Río Chanona for helpful discussions on improving the presentation of this work.

References↩︎

[1]
Heinrich von Stackelberg. . Technical report, Berlin, 1934.
[2]
Jerome Bracken and James T. McGill. . Mathematical Programming, 21 (1): 37–44, 1973.
[3]
P. A. Clark and A. W. Westerberg. . Computers and Chemical Engineering, 14 (1): 87–97, jan 1990. ISSN 00981354. . URL https://linkinghub.elsevier.com/retrieve/pii/009813549087007C.
[4]
Alexander Mitsos, George M. Bollas, and Paul I. Barton. . Chemical Engineering Science, 64 (3): 548–559, feb 2009. ISSN 00092509. . URL https://linkinghub.elsevier.com/retrieve/pii/S0009250908005113.
[5]
Pablo Garcia-Herreros, Lei Zhang, Pratik Misra, Erdem Arslan, Sanjay Mehta, and Ignacio E. Grossmann. . Computers and Chemical Engineering, 86: 33–47, 2016. ISSN 00981354. .
[6]
Dajun Yue and Fengqi You. . Computers and Chemical Engineering, 71: 347–361, 2014. ISSN 00981354. .
[7]
Stephan Dempe, Vyacheslav Kalashnikov, Gerardo A. Pérez-Valdés, and Nataliya Kalashnykova. Bilevel Programming Problems. Springer Berlin Heidelberg, 2015.
[8]
R. Lucchetti, F. Mignanego, and G. Pieri. . Optimization, 18 (6): 857–866, jan 1987. ISSN 0233-1934. . URL http://www.tandfonline.com/doi/abs/10.1080/02331938708843300.
[9]
Stephan Dempe. Foundations of Bilevel Programming, volume 61 of Nonconvex Optimization and Its Applications. Kluwer Academic Publishers, Dordrecht, 2002.
[10]
Wolfram Wiesemann, Angelos Tsoukalas, Polyxeni-Margarita Kleniati, and Berç Rustem. . SIAM Journal on Optimization, 23 (1): 353–380, 2013.
[11]
J. A. Mirrlees. . Review of Economic Studies, 66 (1): 3–21, jan 1999. ISSN 0034-6527.
[12]
Michal Kočvara and Jiří V. Outrata. . Mathematical Programming, 101 (1): 119–149, 2004. ISSN 0025-5610.
[13]
J. V. Outrata. . ZOR-Math. Operations Research Methods and Models of Operations Research, 34 (4): 255–277, 1990.
[14]
Holger Scheel and Stefan Scholtes. . Mathematics of Operations Research, 25 (1): 1–22, 2000.
[15]
Z.H. Gümüş and C.A. Floudas. . Journal of Global Optimization, 20: 1–31, 2001.
[16]
Alexander Mitsos, Panayiotis Lemonidis, and Paul I. Barton. . Journal of Global Optimization, 42 (4): 475–513, 2008.
[17]
Alexander Mitsos. . Journal of Global Optimization, 47 (4): 557–582, aug 2010. ISSN 0925-5001.
[18]
Hatim Djelassi, Moll Glass, and Alexander Mitsos. . Journal of Global Optimization, 75 (2): 341–392, 2019.
[19]
Polyxeni-Margarita Kleniati and Claire S. Adjiman. . Journal of Global Optimization, 60 (3): 425–458, 2014.
[20]
Polyxeni M. Kleniati and Claire S. Adjiman. . Computers and Chemical Engineering, 72: 373–386, 2015. ISSN 00981354. .
[21]
M. J. D. Powell. . In R. Fletcher, editor, Optimization, pages 283–298. Academic Press, New York, 1969.
[22]
Magnus R. Hestenes. . Journal of Optimization Theory and Applications, 4 (5): 303–320, nov 1969. ISSN 0022-3239. . URL http://link.springer.com/10.1007/BF00927673.
[23]
R. T. Rockafellar. . Journal of Optimization Theory and Applications, 12 (6): 555–562, dec 1973. ISSN 0022-3239. . URL http://link.springer.com/10.1007/BF00934777.
[24]
Jorge Nocedal and Stephen Wright. . Springer Series in Operations Research and Financial Engineering, chapter 13, page 664. Springer New York, 2 edition, 2006. ISBN 978-0-387-30303-1. .
[25]
Zhi-Quan Luo, Jong-Shi Pang, and Daniel Ralph. Mathematical Programs with Equilibrium Constraints. Cambridge University Press, nov 1996. ISBN 9780521572903. . URL https://www.cambridge.org/core/product/identifier/9780511983658/type/book.
[26]
Joel A. E. Andersson, Joris Gillis, Greg Horn, James B. Rawlings, and Moritz Diehl. . Mathematical Programming Computation, 11 (1): 1–36, mar 2019. ISSN 1867-2949. . URL http://link.springer.com/10.1007/s12532-018-0139-4.
[27]
Andreas Wächter and Lorenz T. Biegler. . Mathematical Programming, 106 (1): 25–57, mar 2006. ISSN 0025-5610. . URL http://link.springer.com/10.1007/s10107-004-0559-y.
[28]
Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. . SIAM Journal on Scientific Computing, 16 (5): 1190–1208, sep 1995. ISSN 1064-8275. . URL http://epubs.siam.org/doi/10.1137/0916069.
[29]
Samuel Ward, Alain Zemkoho, Jordan Fisher, David Benfield, and Yaru Qian. . Technical report, University of Southampton, 2025.

  1. Corresponding author: en307@cam.ac.uk↩︎