The Mirror Descent Algorithm

To read this you should be familiar with the notes on projected gradient descent.
Ideally you will also be familiar with the material from the ‘‘more convex analysis’’ page; it is not necessary, but otherwise you will have to take a few facts for granted.

1 From Euclidean to arbitrary projections

In the notes on projected gradient descent we showed that a way to solve the original constrained minimization problem is to follow the iteration \[ \begin{eqnarray} x_{k+1} &=& \pi_{C}(x_{k}-\alpha_{k}\nabla f(x_{k})), \end{eqnarray} \] where \(\pi_{C}\) denotes the Euclidean projection onto \(C\). Using its definition, this can be rewritten as \[ \begin{eqnarray} x_{k+1} &=& \arg\min_{x\in C}\quad \|x_{k}-\alpha_{k}\nabla f(x_{k}) - x\|^{2}_{2}\nonumber\\ &=& \arg\min_{x\in C}\quad \|(x-x_{k})+\alpha_{k}\nabla f(x_{k})\|^{2}_{2}. \end{eqnarray} \] Now, since the \(\ell^{2}\)-norm is induced by the inner product, we can use the decomposition \[ \begin{eqnarray} \|(x-x_{k}) +\alpha_{k}\nabla f(x_{k}) \|_{2}^{2} &=& \|x-x_{k}\|^{2}_{2} + 2\alpha_{k}\langle x,\nabla f(x_{k})\rangle + M_{k},\nonumber \end{eqnarray} \] where \(M_{k}\) does not depend on \(x\) (and can thus be ignored). Rearranging terms, we are left with the equivalent iteration: \[ \begin{eqnarray} x_{k+1} &=& \arg\min_{x\in C} \quad \left\{ \langle x,\nabla f(x_{k})\rangle + {1\over \alpha_{k}}{\|x-x_{k}\|^{2}_{2}\over 2} \right\}. \end{eqnarray} \]
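As a concrete sanity check of this equivalence, here is a minimal Python sketch of PGD on a box constraint \(C=[0,1]^n\), where the Euclidean projection is a coordinatewise clip. The function names and the quadratic objective are illustrative choices, not from the notes.

```python
import numpy as np

# Projected gradient descent on the box C = [0, 1]^n.
# For a box, the Euclidean projection pi_C is a coordinatewise clip,
# which is exactly the arg-min form derived above with the squared
# Euclidean distance as the proximity term.

def pgd_box(grad_f, x0, alpha, n_iters=500):
    x = x0.copy()
    for _ in range(n_iters):
        # x_{k+1} = pi_C(x_k - alpha * grad f(x_k))
        x = np.clip(x - alpha * grad_f(x), 0.0, 1.0)
    return x

# Example: f(x) = 0.5 * ||x - b||^2 with b partly outside the box,
# so the constrained minimizer is clip(b, 0, 1).
b = np.array([1.5, -0.3, 0.4])
x_star = pgd_box(lambda x: x - b, x0=np.zeros(3), alpha=0.1)
# x_star ≈ [1.0, 0.0, 0.4]
```

For this separable objective and constraint the fixed point of the clipped iteration coincides with the projection of the unconstrained minimizer, which makes the check easy to verify by hand.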
Given this equivalent formulation of the PGD and the illustration above, one could then suggest keeping the exact same structure but replacing the \(\ell^{2}\)-distance by another distance-like function. The generalized projected gradient descent (GPGD) is defined for any positive-definite function \(d:X\times X\to \mathbb R^+\) by the following iteration: \[ \begin{eqnarray} x_{k+1} &=& \arg\min_{x\in C} \quad \left\{ \langle x,\nabla f(x_{k})\rangle + {1\over \alpha_{k}}d(x,x_{k}) \right\}\quad \text{with}\quad \alpha_{k}>0. \end{eqnarray} \] We shall call such positive-definite functions divergences. Recall that such a function verifies \(d(u,v)>0\) for all \(u\neq v\) and \(d(u,u)=0\).

2 Bregman divergences and MDA

Recall that for a strictly convex function \(\psi\) we can define a positive-definite function \(B_{\psi}:X\times X\to \mathbb R^{+}\) with \[ \begin{eqnarray} B_{\psi}(x,y) &:=& \psi(x)-\psi(y)-\langle x-y, \psi'(y)\rangle, \quad \text{with}\quad \psi'(y)\,\in\,\partial \psi(y),\nonumber \end{eqnarray} \] called the Bregman divergence (associated with \(\psi\)). Taking the divergence in the GPGD to be the Bregman divergence of a differentiable, strictly convex function \(\varphi\) gives the iteration \[ \begin{eqnarray} x_{k+1} &\in& \arg\min_{x\in C} \quad \left\{ \langle x,\nabla f(x_{k})\rangle + {1\over \alpha_{k}}B_{\varphi}(x,x_{k}) \right\}\quad \text{with}\quad \alpha_{k}>0.\nonumber \end{eqnarray} \] The first-order optimality condition (FOC) for the definition of \(x_{k+1}\) reads: \[ \begin{eqnarray} 0&\in& \alpha_{k}\nabla f(x_{k}) + \nabla \varphi(x_{k+1}) - \nabla\varphi(x_{k}) + N_{C}(x_{k+1}),\nonumber \end{eqnarray} \] which, after rearranging terms, reads \[ \begin{eqnarray} x_{k+1} &\in& (\nabla \varphi + N_{C})^{-1}(\nabla \varphi(x_{k})-\alpha_{k}\nabla f(x_{k})).\nonumber \end{eqnarray} \] To complete this, observe that \(\nabla\varphi+N_{C}=\partial (\varphi+i_{C})\) so that, letting \(\phi:=\varphi+i_{C}\) and using that for \(h\in\Gamma_{0}(X)\), \((\partial h)^{-1}=\partial h^{\star}\) (cf. ‘‘more convex analysis’’), we are finally left with the MDA.
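To make the Bregman divergence concrete, the following Python sketch evaluates \(B_{\psi}\) numerically for two classic generators: \(\psi(z)=\tfrac12\|z\|_2^2\), which recovers the squared Euclidean distance (and hence the PGD as a special case of the iteration above), and the negative entropy, which yields the Kullback-Leibler divergence on the simplex. The helper names are illustrative, not from the notes.

```python
import numpy as np

# Numerically evaluate B_psi(x, y) = psi(x) - psi(y) - <x - y, grad psi(y)>
# for a differentiable generator psi.

def bregman(psi, grad_psi, x, y):
    return psi(x) - psi(y) - np.dot(x - y, grad_psi(y))

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.4, 0.4, 0.2])

# psi(z) = 0.5 ||z||^2  ->  B_psi(x, y) = 0.5 ||x - y||^2  (PGD case)
d_euc = bregman(lambda z: 0.5 * (z @ z), lambda z: z, x, y)

# psi(z) = sum_i z_i log z_i (negative entropy)  ->  B_psi(x, y) is the
# KL divergence sum_i x_i log(x_i / y_i) when x, y lie on the simplex
d_kl = bregman(lambda z: np.sum(z * np.log(z)),
               lambda z: np.log(z) + 1.0, x, y)
```

Both identities can be checked by expanding the definition of \(B_{\psi}\) term by term; the entropy case uses that \(\sum_i x_i = \sum_i y_i = 1\) on the simplex.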
The Mirror Descent Algorithm (MDA) is the iteration \[ \begin{eqnarray} x_{k+1} &=& \nabla \phi^{\star} (\nabla \varphi(x_{k})-\alpha_{k}\nabla f(x_{k})), \end{eqnarray} \] where \(\phi^{\star}(y):=\max_{z\in C}\,[\langle z,y\rangle - \varphi (z)]\).
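A classic instance of this iteration takes \(\varphi\) to be the negative entropy and \(C\) the probability simplex, in which case \(\nabla\varphi(x)=\log x\) (up to an additive constant that the normalization absorbs) and \(\nabla\phi^{\star}\) is the softmax map, giving the exponentiated-gradient update. The sketch below, with illustrative names and objective not taken from the notes, runs this entropic MDA on a simple quadratic whose constrained minimizer lies inside the simplex.

```python
import numpy as np

# Entropic mirror descent on the probability simplex:
# grad phi(x) = log x and grad phi*(y) = softmax(y), so the MDA update
# x_{k+1} = grad phi*( grad phi(x_k) - alpha * grad f(x_k) )
# becomes a multiplicative ("exponentiated gradient") update.

def softmax(y):
    z = np.exp(y - y.max())      # shift for numerical stability
    return z / z.sum()

def mirror_descent(grad_f, x0, alpha, n_iters=2000):
    x = x0.copy()
    for _ in range(n_iters):
        x = softmax(np.log(x) - alpha * grad_f(x))
    return x

# Example: minimize f(x) = 0.5 ||x - b||^2 over the simplex, with b
# itself a probability vector, so the constrained minimizer is b.
b = np.array([0.5, 0.3, 0.2])
x = mirror_descent(lambda x: x - b, x0=np.full(3, 1.0 / 3), alpha=0.5)
# x ≈ b
```

Note that the iterates stay strictly inside the simplex (softmax outputs are positive and sum to one), so \(\log x\) is always well defined; this interior-point behaviour is one practical appeal of the entropic mirror map.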
If \(z^{+}\) denotes the maximizer in the definition of \(\phi^{\star}(y)\), then \[ \phi^{\star}(y) \,\,=\,\, \langle z^{+},y\rangle - \varphi(z^{+})\quad\text{and hence}\quad \nabla\phi^{\star}(y)\,\,=\,\, z^{+}. \]
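This envelope identity can be verified numerically in the entropic case: for the negative entropy on the simplex, \(\phi^{\star}\) is the log-sum-exp function and the maximizer \(z^{+}\) is \(\mathrm{softmax}(y)\). The sketch below, an illustrative check rather than part of the notes, compares \(\nabla\phi^{\star}(y)\) computed by finite differences against the maximizer.

```python
import numpy as np

# For phi = negative entropy + i_C on the simplex:
#   phi*(y) = log sum_i exp(y_i)   (log-sum-exp)
#   z^+     = softmax(y)            (the maximizer)
# so grad phi*(y) should equal softmax(y).

def phi_star(y):
    return np.log(np.sum(np.exp(y)))

def softmax(y):
    z = np.exp(y - y.max())
    return z / z.sum()

y = np.array([0.3, -1.2, 0.7])
eps = 1e-6
# Central finite-difference approximation of grad phi*(y)
fd_grad = np.array([
    (phi_star(y + eps * e) - phi_star(y - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
# fd_grad ≈ softmax(y)
```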
3 Some references
