Modified Bregman Golden Ratio Algorithm for Mixed Variational Inequality Problems


Abstract

In this article, we provide a modification to the Bregman Golden Ratio Algorithm (B-GRAAL). We analyze the B-GRAAL algorithm with a new step size rule, where the step size increases after a certain number of iterations and does not require prior knowledge of the global Lipschitz constant of the cost operator. Under suitable assumptions, we establish the global iterate convergence as well as the R-linear rate of convergence of the modified algorithm. The numerical performance of the proposed approach is validated for the matrix game problem and the sparse logistic regression problem in machine learning.

Keywords Variational inequality; Bregman distance; KL divergence; Sparse logistic regression; Matrix games

MSC 47J20, 49J40, 65K15, 65Y20

1 Introduction

Let \(\mathcal{H}\) be a real finite-dimensional Hilbert space. A mixed variational inequality problem (MVIP) is the problem of finding \(w^*\in\mathcal{H}\) such that \[\label{eqn95defvi} \langle A(w^*),w-w^*\rangle+g(w)-g(w^*)\geq 0~\forall w\in \mathcal{H},\tag{1}\] where \(A:\mathcal{H}\to\mathcal{H}\) is the cost operator, and \(g:\mathcal{H}\to\mathbb{R}\cup\{+\infty\}\) is an extended real-valued function. We denote the set of solutions of (1 ) by \(S\). Mixed variational inequality problems (MVIPs) of the form (1 ) generalize various optimization problems encountered in nonlinear programming and variational analysis, including minimization problems, linear complementarity problems, and variational inequalities; see [1][4]. These problems have widespread applications in diverse fields, such as data science, image processing, mechanics, control theory, economics, structural engineering, and many more; see [1], [5][11], and the references therein.

Many algorithms for solving the variational inequality (1 ) (see, for example, [3], [12], [13]) require the exact value of the global Lipschitz constant of the cost operator \(A\). This assumption is often overly restrictive, as determining the Lipschitz constant of \(A\) can be more challenging than solving the original problem itself. Furthermore, even when the Lipschitz constant \(L\) is known, using a constant step size rule that depends inversely on \(L\) can be highly restrictive. In particular, when \(L\) is large, the step size \(\lambda\) is restricted to a very small value, which greatly limits the possible choices for \(\lambda\) and may slow down convergence for a given number of iterations. To overcome these issues, several researchers proposed step size sequences which do not require the value of the Lipschitz constant; instead, these algorithms approximate the local Lipschitz constant at each iterate. To generate such step size sequences, the authors used backtracking procedures (see [14][18]). Although these methods address the aforementioned drawbacks, they can become costly in terms of overall run time due to the potentially large number of steps required by the backtracking procedure in each iteration.

To address this issue, Malitsky [19] proposed an adaptive strategy for step size selection. Subsequently, several authors adopted the adaptive step size technique to address the limitations of the backtracking procedure [20][22]. To build upon earlier methods, Hoai [23] recently proposed an algorithm to solve (1 ), in which the step size sequence increases after a finite number of iterations. One advantage of using an increasing step size is that it allows for adaptation when the initial step size is too small, particularly in cases where the Lipschitz constant is very large. If the step size begins at a low value, an increasing sequence can gradually adjust it over iterations, improving efficiency. This flexibility provides a key advantage over fixed or nonincreasing step size strategies.

Another approach to improving algorithms for (1 ) is to replace the Euclidean distance by other nonlinear distances such as the Bregman distance. The use of Bregman distances and Bregman proximal operators provides flexibility when deciding which projection to compute. For example, in convex-concave matrix games, the Bregman proximal operator appears as a projection onto the probability simplex, which has a straightforward closed-form expression with respect to the Kullback-Leibler (KL) divergence but not with respect to the conventional Euclidean distance. For further advantages of the Bregman distance over the Euclidean distance, we refer the reader to [24][27]. The extant literature contains several approaches for solving (1 ) with Bregman proximal operators [15], [28][36].

Most of these methods assume a Lipschitz condition without requiring the explicit value of the Lipschitz constant. Nevertheless, they often rely on a backtracking procedure. Recently, Tam and Uteda [27] studied Bregman modifications of the Golden Ratio Algorithm, which iterates as follows.

Figure 1: The Bregman-Golden Ratio Algorithm (B-GRAAL)

This method does not require a backtracking procedure. However, to implement the algorithm, one needs to know the exact value of the global Lipschitz constant. To overcome this issue, in the same article [27], the authors proposed the Bregman adaptive Golden Ratio Algorithm (B-aGRAAL). Implementation of the B-aGRAAL algorithm requires neither backtracking nor the value of the Lipschitz constant.

Motivated by the work of Tam and Uteda [27], in this article we propose a modification to the B-GRAAL algorithm. We study the B-GRAAL algorithm with a new step size rule. Our choice of step size sequence is motivated by the step size used by Hoai in [23]. This choice of step size sequence eliminates the need to know the value of the global Lipschitz constant in order to implement the B-GRAAL algorithm. Furthermore, unlike previous approaches that utilize the Bregman divergence to solve MVIP (1 ), our step size sequence increases after a finite number of iterations. We present a comprehensive convergence analysis of the proposed algorithm. Further, we establish the R-linear rate of convergence of the algorithm. The numerical results given in Section 4 reveal that the proposed algorithm significantly improves the performance of the B-GRAAL algorithm.

This paper is structured as follows. Section 2 begins with the essential definitions and fundamental results as a foundation for the convergence analysis. Section 3 introduces the main results, including the proposed algorithm and its convergence analysis. Section 4 presents numerical results to illustrate the effectiveness of the proposed method.

2 Preliminaries

In this article, \(\mathcal{H}\) represents a real, finite-dimensional Hilbert space equipped with an inner product \(\langle \cdot, \cdot\rangle\) and the corresponding norm \(\lVert\cdot\rVert\). Given \(g:\mathcal{H}\rightarrow (-\infty, +\infty]\), the domain of \(g\) is denoted by \(\mathop{\mathrm{dom}}g:=\{x\in\mathcal{H}:g(x)<+\infty\}.\) The subdifferential of \(g\) at \(x\in\mathop{\mathrm{dom}}g\) is given by \[\partial g(x):=\{w\in\mathcal{H}:g(x)-g(y)-\langle w, x-y\rangle\leq 0~\forall y\in\mathcal{H}\}.\]

Lemma 1. [23]Let \((x_k)\) and \((y_k)\) be two sequences of nonnegative numbers satisfying: \[x_{k+1}\leq x_k-y_k~\forall k\in\mathbb{N}.\] Then, \((x_k)\) is convergent and \(\sum_{k=0}^{+\infty} y_k<+\infty.\)

For \(h:\mathcal{H}\rightarrow (-\infty, +\infty]\), the Bregman distance with respect to \(h\) is the function \(B_{h}:\mathcal{H}\times \mathop{\mathrm{int}}\mathop{\mathrm{dom}}h\rightarrow (-\infty,+\infty]\) defined as \[B_{h}(x,y):=h(x)-h(y)-\langle\nabla h(y), x-y\rangle.\] For an \(\alpha\)-strongly convex function \(h\), the strong convexity inequality \(h(x)\geq h(y)+\langle\nabla h(y),x-y\rangle+\frac{\alpha}{2}\lVert x-y\rVert^2\) immediately gives \[B_{h}(x,y)\geq\frac{\alpha}{2}\lVert x-y\rVert^{2}~\forall (x,y)\in \mathcal{H}\times \mathop{\mathrm{int}}\mathop{\mathrm{dom}}h.\] The Bregman distances corresponding to two particular choices of \(h\) are given as follows [30], [37]. Let \(z=(z_1,z_2,\ldots,z_n)^\top\) and \(w=(w_1,w_2,\ldots,w_n)^\top\) be two points in \({\mathbb{R}}^n.\)

(i) The Kullback-Leibler distance is given as \[B^{KL}_h(z,w):=\sum\limits_{{i=1}}^n z_i\left(\log\left(\frac{z_i}{w_i}\right)-1\right)+\sum\limits_{i=1}^n w_i,\] which is induced by the function \(h^{KL}(z)=\sum\limits_{i=1}^n z_i\log z_i\), with domain \(\mathop{\mathrm{dom}}h^{KL}=\{z\in{\mathbb{R}}^n:z_i>0~\forall i\}.\)

(ii) The squared Mahalanobis distance is given as \[B^{SM}_h(z,w)=\frac{1}{2}(z-w)^\top Q(z-w),\] which is induced by the function \(h^{SM}(z)=\frac{1}{2}z^\top Q z\), with \(\mathop{\mathrm{dom}}h^{SM}={\mathbb{R}}^n\), where \(Q=\text{diag}(1,2,\ldots,n).\)

Remark 1. If we take \(h=\frac{1}{2}\lVert \cdot\rVert^2\), then the Bregman distance simplifies to \(B_h(z,w)=\frac{1}{2}\lVert z-w\rVert^2\), which is the squared Euclidean distance.
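To make the two examples above concrete, the short Python sketch below (ours, purely illustrative) evaluates both divergences directly from their definitions.

```python
import numpy as np

def kl_distance(z, w):
    """Kullback-Leibler Bregman distance for h(z) = sum_i z_i log z_i;
    both arguments must have strictly positive entries."""
    return np.sum(z * (np.log(z / w) - 1.0)) + np.sum(w)

def mahalanobis_sq(z, w, Q):
    """Squared Mahalanobis distance for h(z) = 0.5 * z^T Q z."""
    d = z - w
    return 0.5 * d @ Q @ d

# Example: two points in R^3 and Q = diag(1, 2, 3).
z = np.array([0.2, 0.3, 0.5])
w = np.array([0.4, 0.4, 0.2])
print(kl_distance(z, w), mahalanobis_sq(z, w, np.diag([1.0, 2.0, 3.0])))
```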

In this work, we use the following properties of the Bregman distance.

Proposition 1. [27] For the Legendre function \(h:\mathcal{H}\rightarrow (-\infty, +\infty]\), the statements listed below are true.

(i) For all \(x,y\in\mathop{\mathrm{int}}\mathop{\mathrm{dom}}h\) and \(w\in\mathop{\mathrm{dom}}h\), \[B_{h}(w,x)-B_{h}(w,y)-B_{h}(y,x)=\langle\nabla h(x)-\nabla h(y),y-w\rangle.\]

(ii) Let \(x\in\mathop{\mathrm{dom}}h, y,z, w\in\mathop{\mathrm{int}}\mathop{\mathrm{dom}}h\) with \(\beta\in\mathbb{R}\) and \(\nabla h(y)=\beta\nabla h(z)+(1-\beta)\nabla h(w)\). Then, \[B_{h}(x,y)=\beta\Big(B_{h}(x,z)-B_{h}(y,z)\Big)+(1-\beta)\Big(B_{h}(x,w)-B_{h}(y,w)\Big).\]

The (left) Bregman proximal operator of a function \(f:\mathcal{H}\rightarrow (-\infty,\infty]\) is the operator given by \[{\mathop{\mathrm{{prox}}}}^h_f(y):=\mathop{\mathrm{argmin}}_{x\in\mathcal{H}}\{f(x)+B_{h}(x,y)\}~\forall y\in\mathop{\mathrm{int}}\mathop{\mathrm{dom}}h.\] Since this work exclusively requires the left Bregman proximal operator, we will henceforth refer to it simply as the Bregman proximal operator. Next, we state some properties of the Bregman proximal operator useful in our algorithm analysis.

Proposition 2. [27] Let \(f:\mathcal{H}\rightarrow (-\infty, +\infty]\) be proper, convex, lower semicontinuous, and let \(h:\mathcal{H}\rightarrow (-\infty, +\infty]\) be Legendre with \(\mathop{\mathrm{int}}\mathop{\mathrm{dom}}h\cap \mathop{\mathrm{dom}}f\neq\emptyset\).

(i) \(\mathop{\mathrm{range}}(\mathop{\mathrm{{prox}}}^{h}_{f})\subseteq\mathop{\mathrm{int}}\mathop{\mathrm{dom}}h\cap\mathop{\mathrm{dom}}f.\)

(ii) \(\mathop{\mathrm{{prox}}}^{h}_{f}\) is single-valued on \(\mathop{\mathrm{dom}}(\mathop{\mathrm{{prox}}}^{h}_{f})\subseteq\mathop{\mathrm{int}}\mathop{\mathrm{dom}}h\). In addition, if \(h+f\) is super-coercive, then \(\mathop{\mathrm{dom}}(\mathop{\mathrm{{prox}}}^{h}_{f})=\mathop{\mathrm{int}}\mathop{\mathrm{dom}}h.\)

(iii) Let \(y\in\mathop{\mathrm{dom}}(\mathop{\mathrm{{prox}}}^{h}_{f})\). Then, \(x=\mathop{\mathrm{{prox}}}^{h}_{f}(y)\) if and only if, for all \(z\in \mathcal{H}\), we have \[f(x)-f(z)\leq\langle\nabla h(y)-\nabla h(x), x-z \rangle.\]

3 The Modified Bregman-Golden Ratio Algorithm

In this section, we present the modified Bregman Golden Ratio algorithm and discuss its convergence. We make the following assumptions for the subsequent convergence analysis.

  A1. \(g:\mathcal{H}\rightarrow (-\infty, +\infty]\) is proper, convex, and lower semicontinuous.

  A2. \(h:\mathcal{H}\rightarrow (-\infty, +\infty]\) is continuously differentiable, Legendre, and \(\alpha\)-strongly convex. Further, we assume that for any sequence \((w_n)\subseteq \mathop{\mathrm{int}}\mathop{\mathrm{dom}}h\) converging to some \(w\in \mathop{\mathrm{dom}}h,\) \(B_{h}(w, w_n)\rightarrow 0\).

  A3. \(A:\mathcal{H}\rightarrow \mathcal{H}\) is monotone and \(L\)-Lipschitz continuous on \(\mathop{\mathrm{dom}}g \cap \mathop{\mathrm{int}}\mathop{\mathrm{dom}}h\neq\emptyset\).

  A4. \(\mathcal{S}:=S\cap\mathop{\mathrm{dom}}h\neq\emptyset\).

Let \(\phi:= \frac{\sqrt{5}+1}{2}\) denote the Golden Ratio, which satisfies the identity \({\phi}^2=\phi +1.\) The proposed algorithm is formally outlined in Algorithm 2.

Figure 2: The Modified B-GRAAL for (1 )

It is interesting to note that the update step for \(w_{k+1}\) in Algorithm 2 can be equivalently written in the following forms [27]: \[\begin{align} w_{k+1} =& \mathop{\mathrm{{prox}}}^{h}_{\lambda_k f}(\bar{w}_k), ~\text{where}~f(w) = \langle A(w_k),w - w_k \rangle + g(w)\\=&\mathop{\mathrm{{prox}}}^{h}_{\lambda_k g} \left( (\nabla h)^{-1} (\nabla h(\bar{w}_k) - \lambda_k A(w_k)) \right). \end{align}\] By a proof analogous to that of Lemma 3.1 in [23], it can be established that the step size sequence \((\lambda_k)\) satisfies the following properties.
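When \(h\) is the entropy kernel \(h(z)=\sum_i z_i\log z_i\) and \(g\) is the indicator of the unit simplex (the setting of the matrix games in Section 4.1), the second proximal form above has a closed-form multiplicative expression. The following Python sketch is our illustration of one such update; the variable names are not taken from the paper.

```python
import numpy as np

def entropic_prox_step(w_bar, a, lam):
    """w_+ = prox^h_{lam*g}((grad h)^{-1}(grad h(w_bar) - lam * a)) with
    h the entropy kernel and g the indicator of the unit simplex.
    Closed form: w_+ is proportional to w_bar * exp(-lam * a)."""
    u = w_bar * np.exp(-lam * a)
    return u / np.sum(u)

# Toy usage with a = A(w_k), the linearized cost at the current iterate.
w_bar = np.array([0.25, 0.25, 0.5])
a = np.array([1.0, -0.5, 0.2])
print(entropic_prox_step(w_bar, a, lam=0.1))
```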

Lemma 2. Suppose \(A\) is \(L\)-Lipschitz continuous and \((\lambda_k)\) is the step size sequence produced by Algorithm 2. Then,

(i) \(\lambda_k\geq\min\left\{\frac{\eta_1\alpha}{L},\lambda_0\right\}>0~\forall k\in\mathbb{N}\);

(ii) \((\lambda_k)\) is convergent;

(iii) there exists \(k_1\in\mathbb{N}\) satisfying \(\lambda_{k+1}\geq\lambda_k\) for all \(k\geq k_1.\) Equivalently, there exists \(k_1\in\mathbb{N}\) such that \[\lVert A(w_k)-A(w_{k-1})\rVert\leq\frac{\eta_0\alpha}{\lambda_{k-1}}\lVert w_k-w_{k-1}\rVert~\forall k\geq k_1.\]

Remark 2. Since the series of positive terms \(\sum_{k=0}^{+\infty}\gamma_k\) in Algorithm 2 converges and \(0<\eta_1<\eta_0<\frac{\phi}{2}\), there exists \(\bar{k}\in\mathbb{N}\) such that \[\gamma_{k-1}<\frac{\phi}{2\eta_0}-1~\forall k\geq \bar{k}.\]
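The exact step size rule is part of Algorithm 2 (Fig. 2) and is not reproduced in the text; purely for illustration, the sketch below shows a step size update that is consistent with Lemma 2 and with the two cases distinguished in the proof of Lemma 3. The function name, the default values \(\eta_0=0.80\) and \(\eta_1=0.75\) (taken from Section 4), and the choice \(\alpha=1\) are our assumptions.

```python
import numpy as np

def next_step_size(lam_prev, gamma_prev, w_k, w_prev, Aw_k, Aw_prev,
                   eta0=0.80, eta1=0.75, alpha=1.0):
    """Hypothetical step size update consistent with Lemma 2 / Lemma 3:
    grow the step by the factor (1 + gamma_{k-1}) whenever the local
    Lipschitz estimate permits it, and otherwise fall back to the
    locally adaptive value eta1 * alpha * ||dw|| / ||dA||."""
    dw = np.linalg.norm(w_k - w_prev)
    dA = np.linalg.norm(Aw_k - Aw_prev)
    if dA <= (eta0 * alpha / lam_prev) * dw:
        return (1.0 + gamma_prev) * lam_prev      # increasing branch
    return eta1 * alpha * dw / dA                 # safeguarded branch
```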

Next, we present two key lemmas that play a fundamental role in proving the convergence of the algorithm.

Lemma 3. Suppose that Assumption A2 holds. If \((w_k)\) is generated by Algorithm 2, then there exists \(\bar{k}\in\mathbb{N}\) such that \[\lambda_k\langle A(w_k)-A(w_{k-1}), w_k-w_{k+1}\rangle<\frac{\phi}{2}\Big(B_{h}(w_k,w_{k-1})+B_{h}(w_{k+1},w_k)\Big)~\forall k\geq \bar{k}.\]

Proof We consider the following two possible cases.

  1. \(\lVert A(w_k)-A(w_{k-1})\rVert>\frac{\eta_0\alpha}{\lambda_{k-1}}\lVert w_k-w_{k-1}\rVert.\)
    By the Cauchy-Schwarz inequality, the definition of \(\lambda_k\) in Algorithm 2 (which in this case gives \(\lambda_k\lVert A(w_k)-A(w_{k-1})\rVert=\eta_1\alpha\lVert w_k-w_{k-1}\rVert\)), and the \(\alpha\)-strong convexity of \(h\), we get \[\begin{align} &\lambda_k\langle A(w_k)-A(w_{k-1}),w_k-w_{k+1}\rangle\\& \leq \lambda_k\lVert A(w_k)-A(w_{k-1})\rVert~ \lVert w_k-w_{k+1}\rVert\\&=\eta_1 \alpha\lVert w_k-w_{k-1}\rVert ~\lVert w_k-w_{k+1}\rVert\\&<\frac{\phi}{2}\alpha\lVert w_k-w_{k-1}\rVert~ \lVert w_k-w_{k+1}\rVert\\&\leq\frac{\phi}{2}\frac{\alpha}{2}\Big(\lVert w_k-w_{k-1}\rVert^2+\lVert w_k-w_{k+1}\rVert^2\Big)\\ &\leq\frac{\phi}{2}\Big(B_{h}(w_k,w_{k-1})+B_{h}(w_{k+1},w_k)\Big). \end{align}\]

  2. \(\lVert A(w_k)-A(w_{k-1})\rVert\leq\frac{\eta_0\alpha}{\lambda_{k-1}}\lVert w_k-w_{k-1}\rVert\).
    By the Cauchy-Schwarz inequality, the definition of \(\lambda_k\) in Algorithm 2 (which in this case gives \(\lambda_k=(1+\gamma_{k-1})\lambda_{k-1}\)), Remark 2, and the \(\alpha\)-strong convexity of \(h\), we obtain the following for all \(k\geq\bar{k}\): \[\begin{align} &\lambda_k\langle A(w_k)-A(w_{k-1}),w_k-w_{k+1}\rangle\\&\leq \lambda_k\lVert A(w_k)-A(w_{k-1})\rVert~ \lVert w_k-w_{k+1}\rVert\\ &\leq \lambda_k\frac{\eta_0\alpha}{\lambda_{k-1}}\lVert w_k-w_{k-1}\rVert~\lVert w_k-w_{k+1}\rVert\\&=\eta_0(1+\gamma_{k-1})\alpha\lVert w_k-w_{k-1}\rVert~\lVert w_k-w_{k+1}\rVert\\& <\frac{\phi}{2}\frac{\alpha}{2}\Big(\lVert w_k-w_{k-1}\rVert^2+\lVert w_k-w_{k+1}\rVert^2\Big)\\&\leq\frac{\phi}{2}\Big(B_{h}(w_k,w_{k-1})+B_{h}(w_{k+1},w_k)\Big). \end{align}\]

Thus, from Cases 1 and 2, we have \[\lambda_k\langle A(w_k)-A(w_{k-1}), w_k-w_{k+1}\rangle<\frac{\phi}{2}\Big(B_{h}(w_k,w_{k-1})+B_{h}(w_{k+1},w_k)\Big)~\forall k\geq \bar{k}.\] ◻

Lemma 4. Suppose that Assumptions A1–A4 hold. Let \(w^*\in \mathcal{S}\) be arbitrary. Then, there exists \(k'\in\mathbb{N}\) such that for all \(k\geq k'\) the sequences \((w_k)\) and \((\bar{w}_k)\) generated by Algorithm 2 satisfy \[\begin{align} \label{eqn95lmprf95main} 0 &\leq (\phi+1)B_{h}(w^*,\bar{w}_{k+1})+\frac{\phi}{2}B_{h}(w_{k+1},w_k)-\phi B_{h}(w_{k+1},\bar{w}_{k+1})\nonumber\\ &\leq (\phi+1)B_{h}(w^*,\bar{w}_k)+\frac{\phi}{2}B_{h}(w_k,w_{k-1})-\phi B_{h}(w_k,\bar{w}_k)\nonumber\\&~~~~~-\left(1-\frac{3}{2\phi}\right) B_{h}({w}_{k+1},\bar{w}_k). \end{align}\tag{2}\]

Proof From the update step for \(w_{k+1}\) in Algorithm 2, we have \[\label{eqn95lmprf951} w_{k+1}=\mathop{\mathrm{argmin}}_{w\in \mathcal{H}}\left\{\langle A(w_k),w-w_k\rangle + g(w)+\frac{1}{\lambda_k}B_{h}(w,\bar{w}_k)\right\}.\tag{3}\] Equation (3 ) can be equivalently written in terms of the Bregman proximal operator as follows: \[w_{k+1}=\mathop{\mathrm{{prox}}}^{h}_{\lambda_{k}f}(\bar{w}_k), \text{where}~f(w)=\langle A(w_k),w-w_k\rangle+g(w).\] By applying Proposition 2(iii) with \(z:=w\in \mathop{\mathrm{dom}}h\cap\mathop{\mathrm{dom}}g\) arbitrary, \(x:=w_{k+1}\), and \(y:=\bar{w}_k\), followed by Proposition 1(i), we get \[\begin{align} &\lambda_k (\langle A(w_k), w_{k+1}-w\rangle + g(w_{k+1})-g(w))\nonumber\\&\leq\langle\nabla h(\bar{w}_k)-\nabla h(w_{k+1}), w_{k+1}-w\rangle\nonumber\\& =B_{h}(w,\bar{w}_k)-B_{h}(w,w_{k+1})-B_{h}(w_{k+1}, \bar{w}_k).\label{eqn95lmprf952} \end{align}\tag{4}\] By replacing \(k\) with \(k-1\) and putting \(w=w_{k+1}\) in (4 ), and using \(\phi\nabla h(\bar{w}_k)=(\phi-1)\nabla h(w_k)+\nabla h(\bar{w}_{k-1})\) followed by Proposition 1(i), we get \[\begin{align} &\lambda_{k-1}(\langle A(w_{k-1}), w_k-w_{k+1}\rangle + g(w_k)-g(w_{k+1}))\nonumber \\&\leq \langle \nabla h(\bar{w}_{k-1})-\nabla h(w_k), w_k-w_{k+1}\rangle\nonumber\\&=\phi\langle\nabla h(\bar{w}_k)-\nabla h(w_k), w_k-w_{k+1}\rangle\nonumber\\&=\phi\Big(B_{h}(w_{k+1},\bar{w}_k)-B_{h}(w_{k+1},w_k)-B_{h}(w_k,\bar{w}_k)\Big).\label{eqn95lmprf953} \end{align}\tag{5}\] On multiplying both sides of (5 ) by \(\frac{\lambda_k}{\lambda_{k-1}}\), we get \[\begin{align} &\lambda_k(\langle A(w_{k-1}),w_k-w_{k+1}\rangle +g(w_k)-g(w_{k+1}))\nonumber\\&\leq \frac{\lambda_k}{\lambda_{k-1}}\phi\Big(B_h(w_{k+1},\bar{w}_k)-B_h(w_{k+1},w_k)-B_h(w_k,\bar{w}_k)\Big).\label{eqn95mp95new951} \end{align}\tag{6}\] By Lemma 2(iii), there exists \(k_1\in\mathbb{N}\) such that for all \(k\geq k_1\) \[\lambda_k=(1+\gamma_{k-1})\lambda_{k-1}.\] Also, since \(\displaystyle\sum_{k=0}^\infty \gamma_k < +\infty\), there exists \(k_2\in\mathbb{N}\) such that \[\gamma_{k-1}\leq \frac{1}{2\phi^2}~\forall k\geq k_2.\] Therefore, by taking \(k''=\max\{k_1,k_2\}\), we get \[\frac{\lambda_k}{\lambda_{k-1}}\phi\leq \left(\phi+\frac{1}{2\phi}\right)~\forall k\geq k''.\] Thus, we can rewrite (6 ) as \[\begin{align} &\lambda_k(\langle A(w_{k-1}),w_k-w_{k+1}\rangle +g(w_k)-g(w_{k+1}))\nonumber\\&\leq \left(\phi + \frac{1}{2\phi}\right)\Big(B_h(w_{k+1},\bar{w}_k)-B_h(w_{k+1},w_k)-B_h(w_k,\bar{w}_k)\Big)~\forall k\geq k''.\label{eqn95mp95new952} \end{align}\tag{7}\] Let \(w^{*}\in \mathcal{S}\). Setting \(w=w^*\) in (4 ), adding the result to (7 ), and rearranging yields \[\begin{align} &\lambda_k(\langle A(w_k),w_{k}-w^*\rangle +g(w_{k})-g(w^*))\nonumber\\ &\leq B_h(w^*,\bar{w}_k)-B_h(w^*,w_{k+1})-B_h(w_{k+1},\bar{w}_k)\nonumber\\&~~~~+\left(\phi + \frac{1}{2\phi}\right)\Big(B_h(w_{k+1},\bar{w}_k)-B_h(w_{k+1},w_k)-B_h(w_k,\bar{w}_k)\Big)\nonumber\\&~~~~+\lambda_k\langle A(w_k)-A(w_{k-1}),w_k-w_{k+1}\rangle~\forall k\geq k''\label{eqn95mp95new953}. \end{align}\tag{8}\] By (1 ) and the monotonicity of \(A\), we get \[\label{eqn95lmprf955} \begin{align} 0&\leq \langle A(w^*),w_k-w^*\rangle+g(w_k)-g(w^*)\leq \langle A(w_k),w_k-w^*\rangle + g(w_k)-g(w^*). 
\end{align}\tag{9}\] Combining (8 ) and (9 ) followed by Lemma 3, we get \[\begin{align} B_h(w^*,w_{k+1})&\leq B_h(w^*,\bar{w}_k)+\left(\phi+\frac{1}{2\phi}-1\right)B_h(w_{k+1},\bar{w}_k)\nonumber\\&~~~-\left(\phi+\frac{1}{2\phi}\right)B_h(w_k,\bar{w}_k)-\left(\frac{\phi}{2}+\frac{1}{2\phi}\right)B_h(w_{k+1},w_k)\nonumber\\&~~~+\frac{\phi}{2}B_h(w_k,w_{k-1})~\forall k\geq k':=\max\{\bar{k},k''\}.\label{eqn95mp95new954} \end{align}\tag{10}\] From the update for \(\bar{w}_{k+1}\) in Algorithm 2, we have \[\nabla h(w_{k+1})=\frac{\phi}{\phi-1}\nabla h(\bar{w}_{k+1})-\frac{1}{\phi-1}\nabla h(\bar{w}_k)=(\phi+1)\nabla h(\bar{w}_{k+1})-\phi\nabla h(\bar{w}_k).\] Now, by applying Proposition 1(ii), we get \[\label{eqn95lmprf95895new} \begin{align} (\phi +1)B_{h}(w^*,\bar{w}_{k+1})&=B_{h}(w^*,w_{k+1})+(\phi +1)B_{h}(w_{k+1},\bar{w}_{k+1})\\&~~~+\phi\Big(B_{h}(w^*,\bar{w}_k)-B_{h}(w_{k+1},\bar{w}_k)\Big). \end{align}\tag{11}\] On combining (10 ) and (11 ), the following is true for all \(k\geq k'\) \[\begin{align} &(\phi +1)B_h(w^*,\bar{w}_{k+1})+\frac{\phi}{2}B_h(w_{k+1},w_k)-\phi B_h(w_{k+1},\bar{w}_{k+1})\nonumber\\&\leq (\phi +1)B_h(w^*,\bar{w}_k)+\frac{\phi}{2}B_h(w_k,w_{k-1})-\phi B_h(w_k,\bar{w}_k)+B_h(w_{k+1},\bar{w}_{k+1})\nonumber\\&~~~~+\left(\frac{1}{2\phi}-1\right)B_h(w_{k+1},\bar{w}_k).\label{eqn95mp95new955} \end{align}\tag{12}\] From the update for \(\bar{w}_{k+1}\) in Algorithm 2, \(\nabla h(\bar{w}_{k+1})=\frac{\phi-1}{\phi}\nabla h(w_{k+1})+\frac{1}{\phi}\nabla h(\bar{w}_k)\). Proposition 1(ii) gives \[\begin{align} B_{h}(w_{k+1},\bar{w}_{k+1})&=\frac{\phi-1}{\phi}\Big(B_{h}(w_{k+1},w_{k+1})-B_{h}(\bar{w}_{k+1},w_{k+1})\Big)\nonumber\\&~~~+\frac{1}{\phi}\Big(B_{h}(w_{k+1},\bar{w}_k)-B_{h}(\bar{w}_{k+1},\bar{w}_k)\Big)\nonumber\\&\leq\frac{1}{\phi} B_{h}(w_{k+1},\bar{w}_k). \label{eqn95lmprf9510} \end{align}\tag{13}\] On combining (12 ) and (13 ), we get the second inequality in (2 ).
Next, we show the first inequality in (2 ). Applying Proposition 1(i) with \(w=w^*, x=\bar{w}_{k+1}, y=w_{k+1}\), followed by (4 ), we get \[\begin{align} &B_{h}(w^*,w_{k+1})+B_{h}(w_{k+1},\bar{w}_{k+1})\nonumber\\&=B_{h}(w^*,\bar{w}_{k+1})+\langle\nabla h(\bar{w}_{k+1})-\nabla h(w_{k+1}),w^*-w_{k+1}\rangle\nonumber\\&=B_{h}(w^*,\bar{w}_{k+1})+\frac{1}{\phi}\langle\nabla h(\bar{w}_k)-\nabla h(w_{k+1}),w^*-w_{k+1}\rangle\nonumber\\&\leq B_{h}(w^*,\bar{w}_{k+1})+\frac{\lambda_k}{\phi}\Big(\langle A(w_k),w^*-w_{k+1}\rangle-g(w_{k+1})+g(w^*)\Big)\nonumber\\&=B_{h}(w^*,\bar{w}_{k+1})+\frac{\lambda_k}{\phi}\langle A(w_k)-A(w_{k+1}),w^*-w_{k+1}\rangle\nonumber\\&~~~~+\frac{\lambda_k}{\phi}\Big(\langle A(w_{k+1}),w^*-w_{k+1}\rangle-g(w_{k+1})+g(w^*)\Big)\nonumber\\&\leq B_{h}(w^*,\bar{w}_{k+1})+\frac{\lambda_k}{\phi}\langle A(w_k)-A(w_{k+1}),w^*-w_{k+1}\rangle.\label{eqn95lmprf9511} \end{align}\tag{14}\] By the Cauchy-Schwarz inequality, Lemma 2(iii), and the \(\alpha\)-strong convexity of \(h\), it follows that for all \(k\geq k_1\), \[\label{eqn95lmprf9512} \begin{align} &\frac{\lambda_k}{\phi}\langle A(w_k)-A(w_{k+1}),w^*-w_{k+1}\rangle\\&\leq \frac{\lambda_k}{\phi}\lVert A(w_k)-A(w_{k+1})\rVert~\lVert w^*-w_{k+1}\rVert\\&\leq \frac{\lambda_{k}}{\phi}\frac{\eta_0\alpha}{\lambda_k}\lVert w_{k+1}-w_k\rVert~\lVert w^*-w_{k+1}\rVert\\&\leq\frac{1}{2}\frac{\alpha}{2}\Big(\lVert w_{k+1}-w_k\rVert^{2}+\lVert w^*-w_{k+1}\rVert^{2}\Big)\\&\leq \frac{1}{2}\Big(B_{h}(w_{k+1},w_k)+B_{h}(w^*,w_{k+1})\Big). \end{align}\tag{15}\] From (14 ) and (15 ), for all \(k\geq k'\), the following inequality holds \[\label{eqn95roc958} \begin{align} B_{h}(w_{k+1},\bar{w}_{k+1})&\leq B_{h}(w^*,\bar{w}_{k+1})+\frac{1}{2}B_{h}(w_{k+1},w_k)-\frac{1}{2}B_{h}(w^*,w_{k+1})\\&\leq B_{h}(w^*,\bar{w}_{k+1})+\frac{1}{2}B_{h}(w_{k+1},w_k), \end{align}\tag{16}\] and thus for all \(k \geq k'\), we get \[\begin{align} 0\leq&~B_{h}(w^*,\bar{w}_{k+1})+\frac{1}{2}B_{h}(w_{k+1},w_k)-B_{h}(w_{k+1},\bar{w}_{k+1})\\\implies 0 \leq&~ (\phi+1)B_{h}(w^*,\bar{w}_{k+1})+\frac{\phi}{2}B_{h}(w_{k+1},w_k)-\phi B_{h}(w_{k+1},\bar{w}_{k+1}), \end{align}\] which is the first inequality of (2 ), and thus the proof is complete. ◻

Theorem 3. Suppose that Assumptions A1–A4 hold. Then, the sequences \((w_k)\) and \((\bar{w}_k)\) generated by Algorithm 2 converge to a point in \(\mathcal{S}\).

Proof Let \(w^*\in\mathcal{S}\) be arbitrary and let \((\mu_k)\) denote the sequence given by \[\label{eqn95roc956} \mu_k:=(\phi +1)B_{h}(w^*,\bar{w}_k)+\frac{\phi}{2}B_{h}(w_k,w_{k-1})-\phi B_{h}(w_k,\bar{w}_k)~\forall k\in\mathbb{N}.\tag{17}\] From Lemma 4, it is evident that there exists \(k'\in\mathbb{N}\) such that the following is true for all \(k\geq k'\) \[0\leq \mu_{k+1}\leq\mu_k -\left(1-\frac{3}{2\phi}\right)B_h(w_{k+1},\bar{w}_k).\] Therefore, by Lemma 1, it follows that \(\lim_{k\rightarrow\infty}\mu_k\) exists and \(B_{h}(w_{k+1},\bar{w}_k)\rightarrow 0\) as \(k\rightarrow\infty.\) Then, (13 ) implies that \(B_{h}(w_{k+1},\bar{w}_{k+1})\rightarrow 0\). Also, by Proposition 1(ii) with \(\nabla h(w_k)=(\phi +1)\nabla h(\bar{w}_k)-\phi\nabla h(\bar{w}_{k-1})\), we get \[\begin{align} B_{h}(w_{k+1},w_k)&=(\phi +1)\Big(B_{h}(w_{k+1}, \bar{w}_k)-B_{h}(w_k,\bar{w}_k)\Big)-\phi \Big(B_{h}(w_{k+1},\bar{w}_{k-1})\\&~~~-B_{h}(w_k,\bar{w}_{k-1})\Big)\\&\leq (\phi +1)B_{h}(w_{k+1},\bar{w}_k)+\phi B_{h}(w_k,\bar{w}_{k-1})\rightarrow 0. \end{align}\] Thus, we get \[\lim_{k\rightarrow\infty}\mu_k=(\phi +1)\lim_{k\rightarrow \infty}B_{h}(w^*,\bar{w}_k),\] and, in particular, \(\lim_{k\rightarrow\infty}B_{h}(w^*,\bar{w}_k)\) exists.
Using the \(\alpha\)-strong convexity of \(h\), we conclude that \(w_{k+1}-\bar{w}_k\rightarrow 0\) and that \((w_k)\) and \((\bar{w}_k)\) are bounded. Thus, let \(\bar{w}\in \mathcal{H}\) be a cluster point of \((\bar{w}_k)\). Then, there exists a subsequence \((\bar{w}_{{k}_{j}})\) such that \(\bar{w}_{{k}_{j}}\rightarrow \bar{w}\) and \(w_{k_{j}+1}\rightarrow\bar{w}\) as \(j\rightarrow\infty\). Referring back to (4 ), for all \(w\in \mathcal{H}\), we obtain \[\begin{gather} \lambda_{k_j}(\langle A(w_{{k}_{j}}), w_{k_{j}+1}-w\rangle+ g(w_{k_{j}+1})-g(w))\\\leq \langle\nabla h(\bar{w}_{{k}_{j}})-\nabla h(w_{k_{j}+1}), w_{k_{j}+1}-w\rangle, \end{gather}\] and taking the limit inferior on both sides as \(j\rightarrow \infty\) shows that \(\bar{w}\in\mathcal{S}\). Since \(w^*\in\mathcal{S}\) in Lemma 4 was arbitrary, we now set \(w^*=\bar{w}\). This implies that \(\lim_{j\rightarrow\infty}B_{h}(w^*,\bar{w}_{{k}_{j}})=0\), leading to \(\lim_{j\rightarrow\infty}\mu_{{k}_{j}}=0\). Also, from Lemma 4, for \(n\geq k_j\), we have \(\mu_{n}\leq\mu_{{k}_{j}}\), and thus, \[(\phi+1)\lim_{n\rightarrow\infty}B_{h}(w^*,\bar{w}_n)=\lim_{n\rightarrow\infty}\mu_n\leq\lim_{j\rightarrow\infty}\mu_{{k}_{j}}=0,\] and thus \(\bar{w}_k\rightarrow w^*\) from the strong convexity of \(h\). The convergence \(w_k\rightarrow w^*\) follows from the fact that \(w_k-\bar{w}_k\rightarrow 0.\) ◻

3.1 Linear Convergence of Modified B-GRAAL

We next establish the R-linear convergence rate of Algorithm 2. For this, we assume that for some \(\sigma>0\), \[\label{eqn95roc951} \langle A(z)-A(w),z-w\rangle\geq \sigma B_h(z,w)~\forall (z,w)\in\mathop{\mathrm{dom}}h\times \mathop{\mathrm{int}}\mathop{\mathrm{dom}}h.\tag{18}\] Observe that when \(h=\frac{1}{2}\lVert \cdot\rVert^2\), (18 ) simplifies to the standard definition of \(\sigma\)-strong monotonicity.

Theorem 4. Suppose that Assumptions A1–A4 hold and that condition (18 ) is satisfied. Then, the sequences \((w_k)\) and \((\bar{w}_k)\) generated by Algorithm 2 converge R-linearly to the unique point in \(\mathcal{S}\).

Proof  Without loss of generality, we assume that \(\sigma<\frac{1}{2}\). Also, by multiplying the right-hand sides of both branches of the step size rule in Algorithm 2 by \(\sqrt{1-\sigma}\), and by using these updated values of \(\lambda_k\) in the proof of Lemma 3, we get the following updated bound. \[\begin{align} \label{eqn95roc954} &\lambda_k\langle A(w_k)-A(w_{k-1}), w_k-w_{k+1}\rangle\nonumber\\&\leq \frac{\phi}{2}(1-\sigma)B_h(w_k,w_{k-1})+\frac{\phi}{2}B_h(w_{k+1},w_k)~\forall k\geq \bar{k}. \end{align}\tag{19}\] Let \(w^*\in\mathcal{S}\). Since (18 ) holds, we get \[\begin{align} \label{eqnn} 0&\leq \langle A(w^*), w_k-w^*\rangle + g(w_k)-g(w^*)\nonumber\\&\leq \langle A(w_k),w_k-w^*\rangle+g(w_k)-g(w^*)-\sigma B_h(w^*,w_k). \end{align}\tag{20}\] Following a similar approach to the proof of Lemma 4, with (20 ) replacing (9 ) and (19 ) replacing Lemma 3, the corresponding form of (10 ) becomes \[\begin{align} \label{eqn95roc957} &B_h(w^*,w_{k+1})\leq B_h(w^*,\bar{w}_k)+(\phi-1)B_h(w_{k+1},\bar{w}_k)-\phi B_h(w_k,\bar{w}_k)\nonumber\\&-\sigma B_h(w^*,w_k) -\frac{\phi}{2}B_h(w_{k+1},w_k)+\frac{(1-\sigma)\phi}{2}B_h(w_k,w_{k-1})~\forall k\geq k'. \end{align}\tag{21}\] Proposition 1(ii) with \(\nabla h(w_{k+1})=(\phi+1)\nabla h(\bar{w}_{k+1})-\phi\nabla h(\bar{w}_k)\) gives \[\label{eqn95roc955} \begin{align} B_h(w^*,w_{k+1})=&(\phi+1)\Big(B_h(w^*,\bar{w}_{k+1})-B_h(w_{k+1},\bar{w}_{k+1})\Big)\\&-\phi\Big(B_h(w^*,\bar{w}_k)-B_h(w_{k+1},\bar{w}_k)\Big). \end{align}\tag{22}\] Let \(\mu_k\) represent the sequence defined by (17 ). Then, applying (22 ) to substitute \(B_h(w^*,w_{k+1})\) and \(B_h(w^*,w_{k})\) in (21 ), and subsequently using (13 ), we get the following for all \(k\geq k'\), \[\begin{align} \mu'_{k+1}:=&~(\phi+1)B_h(w^*,\bar{w}_{k+1})+\frac{\phi}{2}B_h(w_{k+1},w_k)-B_h(w_{k+1},\bar{w}_{k+1})\\\leq&~ (1-\sigma)(\phi+1)B_h(w^*,\bar{w}_k)+\frac{(1-\sigma)\phi}{2}B_h(w_k,w_{k-1})+\\&~(\sigma(1+\phi)-\phi)B_h(w_k,\bar{w}_k)+\sigma\phi B_h(w^*,\bar{w}_{k-1})-\sigma\phi B_h(w_k,\bar{w}_{k-1})\\&~+\phi B_h(w_{k+1},\bar{w}_{k+1})+\left(\frac{1}{2\phi}-1\right)B_h(w_{k+1},\bar{w}_k)\\\leq&~ (1-\sigma)\mu_k+\sigma\phi B_h(w^*,\bar{w}_{k-1}). \end{align}\] Since \((\mu_k)\) is nonincreasing by Lemma 4 and \(\frac{(1-\sigma)\phi}{1-\sigma(2-\phi)}\geq 1\), we get \[\begin{align} \mu'_{k+1}\leq&~(1-\sigma)\mu_{k-1}+\sigma(\phi-1)(\phi+1)B_h(w^*,\bar{w}_{k-1})\\=&~(1-\sigma(2-\phi))(\phi+1)B_h(w^*,\bar{w}_{k-1})+(1-\sigma)\frac{\phi}{2}B_h(w_{k-1},w_{k-2})\\&-(1-\sigma)\phi B_h(w_{k-1},\bar{w}_{k-1})\\\leq&~ (1-\sigma(2-\phi))\Big((\phi +1)B_h(w^*,\bar{w}_{k-1})+\frac{\phi}{2}B_h(w_{k-1},w_{k-2})\\&~-B_h(w_{k-1},\bar{w}_{k-1})\Big)\\=&~q\mu'_{k-1},~\text{where}~q=(1-\sigma(2-\phi))\in(0,1). \end{align}\] Therefore, \(\mu'_k\) converges Q-linearly to \(0\). By (16 ) and the \(\alpha\)-strong convexity of \(h\), it follows that \[\begin{align} \mu'_{k+1}=&~\phi B_h(w^*,\bar{w}_{k+1})+\frac{\phi-1}{2}B_h(w_{k+1},w_k)\\&+\left(B_h(w^*,\bar{w}_{k+1})+\frac{1}{2}B_h(w_{k+1},w_k)-B_h(w_{k+1},\bar{w}_{k+1})\right)\\\geq&~\frac{\alpha\phi}{2}\lVert w^*-\bar{w}_{k+1}\rVert^2+\frac{\alpha(\phi-1)}{4}\lVert w_{k+1}-w_k\rVert^2. \end{align}\] This shows that \(\bar{w}_k\) converges to \(w^*\) with R-linear convergence, and the difference \(\lVert w_{k+1} - w_k \rVert\) also converges to zero at the same rate. As a result, the sequence \((w_k)\) exhibits R-linear convergence to \(w^*\). Since the choice of \(w^*\in\mathcal{S}\) was arbitrary and both sequences converge to it, the point \(w^*\) is the unique element of \(\mathcal{S}\).
◻

4 Numerical Results

In this section, we compare our Algorithm 2 with B-GRAAL [27] and B-aGRAAL [27]. Specifically, in Section 4.1, we present the comparison results for the matrix game problem. In Section 4.2, we compare the algorithms in the context of the sparse logistic regression problem. We ran all the experiments in Python 3 on a MacBook Air with an M1 chip and 8GB RAM.

Both of the considered problems can be formulated as the MVIP (1 ) for different choices of the operator \(A\) and the function \(g\). Note that the MVIP (1 ) can also be formulated as the monotone inclusion: find \(w^*\in\mathcal{H}\) such that \[0\in A(w^*)+\partial g(w^*).\] Thus, by the update step for \(w_{k+1}\) in Algorithm 2, we get \[0\in A(w_k)+\partial g(w_{k+1})+\frac{1}{\lambda_k}(\nabla h(w_{k+1})-\nabla h(\bar{w}_k)),\] and we track the quantity \[\begin{gather} J_k:=\frac{1}{\lambda_k}(\nabla h(\bar{w}_k)-\nabla h(w_{k+1}))+A(w_{k+1})-A(w_k)\in A(w_{k+1})+\partial g(w_{k+1}) \end{gather}\] as a natural residual for Algorithm 2 (a small sketch of this computation appears after the parameter list below). The parameters used for the execution of each algorithm are as follows.

  1. B-GRAAL: \(\phi= \frac{\sqrt{5}+1}{2}, \lambda= \frac{\phi}{2L}.\)

  2. B-aGRAAL: \(\varphi=1.5\), \(\lambda_0=\frac{\varphi}{2}\frac{\lVert w_1-w_0 \rVert}{\lVert A(w_1)-A(w_0)\rVert}\), \(\rho= \frac{1}{\varphi}+\frac{1}{{\varphi}^2}\), \(\lambda_{\max}=10^6.\)

  3. Algorithm 2: \(\eta_1=0.75, \eta_0=0.80, \lambda_0=\frac{\phi}{2}\frac{\lVert w_1-w_0 \rVert}{\lVert A(w_1)-A(w_0)\rVert}\), and the sequence of positive terms \(\gamma_{k}=\frac{r(\log (k+1))^s}{(k+1)^t}\) for all \(k\geq 1\), where \(r, s>0\) and \(t>1\) (see the sketch after this list).
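As an illustration of the quantities just introduced, the sketch below (ours) computes the residual \(J_k\) and the summable sequence \(\gamma_k\) from item 3; the default values of \(r,s,t\) are those reported for the matrix game experiments.

```python
import numpy as np

def gamma(k, r=0.0007, s=7.5, t=1.1):
    """Summable growth factors gamma_k = r * (log(k+1))^s / (k+1)^t with t > 1."""
    return r * np.log(k + 1.0) ** s / (k + 1.0) ** t

def residual(grad_h, A, w_next, w_k, w_bar_k, lam_k):
    """Natural residual J_k = (grad h(w_bar_k) - grad h(w_{k+1})) / lambda_k
    + A(w_{k+1}) - A(w_k); its norm is what we monitor across iterations."""
    return (grad_h(w_bar_k) - grad_h(w_next)) / lam_k + A(w_next) - A(w_k)

# Quick numerical check that the gamma series has a finite sum (t > 1).
print(sum(gamma(k) for k in range(1, 100000)))
```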

4.1 Matrix Games

In this subsection, we consider the following matrix game: \[\label{eqn95app3951} \min_{x\in\triangle ^k}\max_{y\in\triangle^k}\langle Px, y\rangle,\tag{23}\] where \(\triangle^k:= \left\{x\in {\mathbb{R}}^{k}_+: \sum_{i=1}^k x_i=1\right\}\) denotes the unit simplex and \(P\in {\mathbb{R}}^{k\times k}\) is a given matrix. Note that (23 ) can be formulated as an MVIP of the form (1 ) (a sketch of this formulation is given below). Specifically, we examine the problem in the form (23 ), which involves positioning a server within a network \(G = (U, E)\) consisting of \(k\) vertices, with the objective of minimizing its response time. In this scenario, a request arises from an unknown vertex \(u_j \in U\), and the goal is to position the server at a vertex \(u_i \in U\) in a manner that minimizes the response time, which is determined by the graphical distance \(d(u_i, u_j)\). We analyze the scenario where the request location \(u_j \in U\) is strategically determined by an adversary. The decision variable \(x \in \triangle^k\) (similarly, \(y \in \triangle^k\)) represents mixed strategies for server placement (and request origin, respectively). More precisely, \(x_t\) (respectively, \(y_t\)) denotes the probability of the server (or request origin) being positioned at node \(u_t\) for \(t = 1, 2, \ldots, k\). The matrix \(P\) is defined as the distance matrix of the graph \(G\), where each entry is given by \(P_{ij} = d(u_i, u_j)\) for all \(u_i, u_j \in U\). Consequently, the objective function in (23 ) quantifies the expected response time, which we aim to minimize, while the adversary attempts to maximize it.
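In our implementation, (23 ) is cast in the form (1 ) by stacking \(w=(x,y)\), taking the monotone operator \(A(w)=(P^{\top}y,\,-Px)\), and letting \(g\) be the indicator of \(\triangle^k\times\triangle^k\), so that the KL-based proximal step acts as a blockwise simplex projection. The sketch below is our illustration of this setup; the function names are not from the paper.

```python
import numpy as np

def matrix_game_operator(P):
    """Saddle point operator of min_x max_y <Px, y>:
    A(x, y) = (P^T y, -P x), acting on the stacked vector w = (x, y)."""
    k = P.shape[0]
    def A(w):
        x, y = w[:k], w[k:]
        return np.concatenate([P.T @ y, -P @ x])
    return A

def kl_prox_product_simplex(w_bar, a, lam, k):
    """KL Bregman proximal step for g = indicator of Delta^k x Delta^k:
    each block is rescaled multiplicatively and renormalized."""
    u = w_bar * np.exp(-lam * a)
    return np.concatenate([u[:k] / u[:k].sum(), u[k:] / u[k:].sum()])
```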

We conducted three experiments with \(k = 10\), \(k = 20\), and \(k = 100\), with the corresponding results presented in Figs. 3, 4, and 5. The initial points are selected as \(x_0 = y_0 = \left(\frac{1}{k}, \frac{1}{k}, \dots, \frac{1}{k}\right) \in \triangle^k\), with \(w_0 = (x_0, y_0)\). Subsequently, \(w_1\) is set as a randomly perturbed version of \(w_0\). The Lipschitz constant of \(A\) is computed as \(L=\lVert P \rVert_2\), and we set \(r=0.0007, s=7.5, t=1.1\).
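The text does not specify how the network \(G\) is generated; assuming, for illustration, a connected small-world graph, the distance matrix \(P\) and the constant \(L=\lVert P\rVert_2\) can be built as follows (the graph model and its parameters are our assumptions).

```python
import networkx as nx
import numpy as np

k = 20                                                      # number of vertices
G = nx.connected_watts_strogatz_graph(k, 4, 0.3, seed=0)    # assumed graph model

# P_{ij} = d(u_i, u_j): all-pairs shortest path (graphical) distances.
P = nx.floyd_warshall_numpy(G)
L = np.linalg.norm(P, 2)                                    # L = ||P||_2
```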

Figure 3: Matrix game results for \(k=10\).

Figure 4: Matrix game results for \(k=20\).

Figure 5: Matrix game results for \(k=100\).

4.2 Sparse Logistic Regression

In this problem, we are interested in finding \(x\in\mathbb{R}^n\) that minimizes the loss function \[k(x)=\displaystyle\sum_{i=1}^M \log(1+\exp(-c_i\langle d_i,x\rangle))+\bar{\beta}\lVert x\rVert_1,\] where \(d_i\in{\mathbb{R}}^n, c_i\in\{-1,1\}, i=1,2,\ldots, M\) are observations, and \(\bar{\beta}>0\) is a regularization parameter. It can be seen (for details, see [23]) that the problem \(\min_{x\in{\mathbb{R}}^n} k(x)\) can be formulated as problem (1 ) with \[A(x)=\nabla f(x);~ f(x)=\displaystyle\sum_{i=1}^M \log(1+\exp(-c_i\langle d_i,x\rangle)); ~g(x)=\bar{\beta}\lVert x\rVert_1.\] The tested datasets are sourced from the LIBSVM1 library, including ijcnn1.bz2, a9a, and duke.bz2. We choose \(w_0=zeros(n), w_1=w_0+10^{-9}random(n), \bar{\beta}=0.005\lVert C^{T}c\rVert_{\infty}\) (where the rows of \(C\) are the observations \(d_i^\top\), \(i=1,2,\ldots,M\), and \(c=(c_i)_{i=1}^M\)) for all algorithms. For the datasets ijcnn1 and a9a, we take \(r=0.0001, s=7.2,\) and \(t=1.01\). For duke.bz2, we take \(r=0.0005, s=7,\) and \(t=1.1\). The numerical results for the datasets ijcnn1.bz2, a9a, and duke.bz2 are presented in Figs. 6, 7, and 8, respectively.
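For reference, the cost operator is the gradient of the logistic loss, and with the Euclidean kernel \(h=\frac{1}{2}\lVert\cdot\rVert^2\) the Bregman proximal step for \(g=\bar{\beta}\lVert\cdot\rVert_1\) reduces to componentwise soft thresholding. The sketch below is ours and assumes this Euclidean kernel; the section does not state which kernel was used in the reported runs.

```python
import numpy as np

def logistic_operator(C, c):
    """A(x) = grad f(x) for f(x) = sum_i log(1 + exp(-c_i <d_i, x>)),
    where the rows of C are the observations d_i and c_i is in {-1, 1}."""
    def A(x):
        margins = c * (C @ x)
        sig = 1.0 / (1.0 + np.exp(margins))   # equals sigmoid(-margins)
        return -C.T @ (c * sig)
    return A

def soft_threshold(v, tau):
    """Euclidean Bregman prox of tau * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
```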

Figure 6: Results for the ijcnn1.bz2 dataset (\(M=35000, n=22\)).

Figure 7: Results for the a9a dataset (\(M=32561, n=123\)).

Figure 8: Results for the duke.bz2 dataset (\(M=44, n=7129\)).

5 Conclusion

In this article, we explored a modification of the B-GRAAL algorithm for solving mixed variational inequality problems. Under standard assumptions, we established the global convergence and the R-linear rate of convergence of the modified algorithm. Unlike the B-GRAAL algorithm, the implementation of the modified algorithm does not require knowledge of the Lipschitz constant \(L\). Numerical experiments on the sparse logistic regression problem and the matrix game problem demonstrate that this modification significantly enhances the performance of the B-GRAAL algorithm.

Acknowledgements

The authors wish to thank Dr. P.T. Hoai and Dr. D. Uteda for sharing the Python codes of their papers [23] and [27], respectively. This helped the authors write the Python codes for this paper.

References

[1]
Addi, K., Goeleven, D.: Complementarity and variational inequalities in electronics. Operations Research, Engineering, and Cyber Security: Trends in Applied Mathematics and Technology, 1–43 (2017).
[2]
Ansari, Q. H., Lalitha, C. S., Mehta, M.: Generalized Convexity, Nonsmooth Variational Inequalities, and Nonsmooth Optimization, CRC Press (2013).
[3]
Fukushima, M.: Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems. Math. Program. 53, 99–110 (1992).
[4]
Popov, L. D.: A modification of the Arrow-Hurwitz method of search for saddle points. Mat. Zametki, 28(5), 777–784 (1980).
[5]
Daniele, P., Giannessi, F., Maugeri, A. (Eds.).: Equilibrium Problems and Variational Models, vol. 68. Dordrecht: Kluwer Academic Publishers (2003).
[6]
Grad, S. M., Lara, F.: Solving mixed variational inequalities beyond convexity. J. Optim. Theory Appl. 190(2), 565–580 (2021).
[7]
Han, W., Reddy, B. D.: On the finite element method for mixed variational inequalities arising in elastoplasticity. SIAM J. Numer. Anal. 32(6), 1778–1807 (1995).
[8]
Jolaoso, L. O., Shehu, Y., Yao, J. C.: Inertial extragradient type method for mixed variational inequalities without monotonicity. Math. Comput. Simul. 192, 353–369 (2022).
[9]
Ju, X., Che, H., Li, C., He, X.: Solving mixed variational inequalities via a proximal neurodynamic network with applications. Neural Process. Lett. 54(1), 207–226 (2022).
[10]
Ju, X., Hu, D., Li, C., He, X., Feng, G.: A novel fixed-time converging neurodynamic approach to mixed variational inequalities and applications. IEEE Trans. Cybern. 52(12), 12942–12953 (2021).
[11]
Ju, X., Li, C., Dai, Y. H., Chen, J.: A new dynamical system with self-adaptive dynamical stepsize for pseudomonotone mixed variational inequalities. Optimization 73(1), 159–188 (2024).
[12]
Marcotte, P., Wu, J. H.: On the convergence of projection methods: application to the decomposition of affine variational inequalities. J. Optim. Theory Appl. 85(2), 347–362 (1995).
[13]
Pang, J. S., Chan, D.: Iterative methods for variational and complementarity problems. Math. Program. 24(1), 284–313 (1982).
[14]
Bello Cruz, J. Y., Diaz Millan, R.: A variant of forward-backward splitting method for the sum of two monotone operators with a new search strategy. Optimization 64(7), 1471–1486 (2015).
[15]
Censor, Y., Iusem, A. N., Zenios, S. A.: An interior point method with Bregman functions for the variational inequality problem with paramonotone operators. Math. Program. 81, 373–400 (1998).
[16]
Iusem, A. N., Svaiter, B. F.: A variant of Korpelevich’s method for variational inequalities with a new search strategy. Optimization 42(4), 309–321 (1997).
[17]
Malitsky, Y.: Proximal extrapolated gradient methods for variational inequalities. Optim. Methods Softw. 33(1), 140–164 (2018).
[18]
Malitsky, Y., Tam, M. K.: A forward-backward splitting method for monotone inclusions without cocoercivity. SIAM J. Optim. 30(2), 1451–1472 (2020).
[19]
Malitsky, Y.: Golden ratio algorithms for variational inequalities. Math. Program. 184(1), 383–410 (2020).
[20]
Alacaoglu, A., Malitsky, Y., Cevher, V.: Convergence of adaptive algorithms for weakly convex constrained optimization. In: Advances in Neural Information Processing Systems (2021).
[21]
Soe, S., Tam, M. K., Vetrivel, V.: The golden ratio primal-dual algorithm with two new stepsize rules for convex-concave saddle point problems (2025). arXiv:2502.17918.
[22]
Zhou, X., Cai, G., Cholamjiak, P., Kesornprom, S.: A generalized proximal point algorithm with new step size update for solving monotone variational inequalities in real Hilbert spaces. J. Comput. Appl. Math. 438, 115518 (2024).
[23]
Hoai, P. T.: A new proximal gradient method for solving mixed variational inequality problems with a novel explicit stepsize and applications. Math. Comput. Simul. 229, 594–610 (2025). https://doi.org/10.1016/j.matcom.2024.10.008.
[24]
Chen, Y., Ye, X.: Projection onto a simplex (2011). arXiv:1101.6081.
[25]
Dai, Y., Chen, C.: Distributed projections onto a simplex (2022). arXiv:2204.08153.
[26]
Wang, W., Carreira-Perpinán, M. A.: Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application (2013). arXiv:1309.1541.
[27]
Tam, M. K., Uteda, D. J.: Bregman-golden ratio algorithms for variational inequalities. J. Optim. Theory Appl. 199(3), 993–1021 (2023). https://doi.org/10.1007/s10957-023-02320-2.
[28]
Gibali, A.: A new Bregman projection method for solving variational inequalities in Hilbert spaces. Pure Appl. Funct. Anal. 3(3), 403–415 (2018).
[29]
Gibali, A., Jolaoso, L. O., Mewomo, O. T., Taiwo, A.: Fast and simple Bregman projection methods for solving variational inequalities and related problems in Banach spaces. Results Math. 75, 1–36 (2020).
[30]
Van Hieu, D., Cholamjiak, P.: Modified extragradient method with Bregman distance for variational inequalities. Appl. Anal. 101(2), 655–670 (2022).
[31]
Hieu, D. V., Reich, S.: Two Bregman projection methods for solving variational inequalities. Optimization 71(7), 1777–1802 (2022).
[32]
Jolaoso, L. O., Shehu, Y.: Single Bregman projection method for solving variational inequalities in reflexive Banach spaces. Appl. Anal. 101(14), 4807–4828 (2022).
[33]
Jolaoso, L. O., Aphane, M.: Weak and strong convergence Bregman extragradient schemes for solving pseudo-monotone and non-Lipschitz variational inequalities. J. Inequal. Appl. 2020(1) 1–25 (2020).
[34]
Jolaoso, L. O., Aphane, M., Khan, S. H.: Two Bregman projection methods for solving variational inequality problems in Hilbert spaces with applications to signal processing. Symmetry 12(12), 2007 (2020).
[35]
Nomirovskii, D. A., Rublyov, B. V., Semenov, V. V.: Convergence of two-stage method with Bregman divergence for solving variational inequalities. Cybernet. Syst. Anal. 55, 359–368 (2019).
[36]
Semenov, V. V., Denisov, S. V., Kravets, A. V.: Adaptive two-stage Bregman method for variational inequalities. Cybernet. Syst. Anal. 57(6), 959–967 (2021).
[37]
Bauschke, H. H., Wang, X., Ye, J., Yuan, X.: Bregman distances and Chebyshev sets. J. Approx. Theory 159(1), 3–25 (2009).

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/