## Projected gradient descent

To read this you should be familiar with the notions of subdifferential and first-order optimality conditions (FOC).

## 1. Introduction and FOC

We consider here the general minimization problem over a convex set \(C\), written in its unconstrained form:
\[
\begin{eqnarray}
\min_{x}\quad f(x)+i_{C}(x),
\end{eqnarray}
\]
where \(f\) is continuously differentiable on \(C\) and \(i_{C}\) is the indicator function of \(C\) (zero on \(C\) and \(+\infty\) outside). Following the FOC, \(x^{\sharp}\) is a minimizer if \(0\in \nabla f(x^{\sharp})+\partial i_{C}(x^{\sharp})\).

To obtain the subdifferential of \(i_{C}\) at a point \(x\in C\), note that if \(y\in\partial i_{C}(x)\) then, by definition,
\[
\begin{eqnarray}
i_{C}(z) &\ge & i_{C}(x) + \langle z-x,y \rangle, \quad \forall z.\nonumber
\end{eqnarray}
\]
For any \(z\notin C\), the left-hand side is \(+\infty\) and the inequality is trivially verified. We can thus restrict to \(z\in C\) and, since \(x\in C\), both indicator terms vanish and we are left with
\[
\begin{eqnarray}
0 &\ge & \langle z-x,y \rangle, \quad \forall z\in C.\nonumber
\end{eqnarray}
\]
This inequality defines the subdifferential of interest, but it also happens to be the definition of the normal cone to \(C\) at \(x\):
\[
\begin{eqnarray}
\partial i_{C}(x) &=& \{y \,|\, \langle z-x,y\rangle \le 0, \,\forall z\in C\} \,\,=:\,\, N_{C}(x).
\end{eqnarray}
\]
We can thus re-write the FOC for the problem as follows:
\[
\begin{eqnarray}
x^{\sharp} \,\in\arg\min_{x\in C} \, f(x) &\Longleftrightarrow & 0 \,\in\, \nabla f(x^{\sharp})+N_{C}(x^{\sharp}) \label{FOC basic}
\end{eqnarray}
\]
Note also that \(N_{C}(x)\) is, as the name suggests, a cone, and is therefore closed under multiplication by non-negative scalars:
\[
\begin{eqnarray}
\text{if}\,\, u \,\in\, N_{C}(x) \,\,\text{then}\,\, \forall\alpha\ge0,\,\, \alpha u \,\in\, N_{C}(x) \label{normal cone multiplication}.
\end{eqnarray}
\]
This simple property will be useful below.

## 2. Normal cone and Euclidean projection

The Euclidean projection onto a convex set \(C\), which we shall denote \(\pi_{C}\), is defined as follows for any \(z\in X\):
\[
\begin{eqnarray}
\pi_{C}(z) &=& \arg \min_{x\in C}\quad{1\over 2} \|x-z\|^{2}_{2}.
\end{eqnarray}
\]
The gradient of the objective function is trivial to compute (\(\nabla f(x)=x-z\)); writing \(x^{\sharp}\) for the minimizer (so that \(x^{\sharp}=\pi_{C}(z)\)), the FOC \(\eqref{FOC basic}\) for this particular problem simply reduces to
\[
\begin{eqnarray}
0 \,\in\, (x^{\sharp}-z) + N_{C}(x^{\sharp}) \quad\!\!\text{or equivalently}\quad\!\! z \in x^{\sharp}+N_{C}(x^{\sharp})\label{reduced FOC}.
\end{eqnarray}
\]
Using operator notation, the right-hand side can be written \(z\in (\mathbb I+N_{C})(x^{\sharp})\), where \(\mathbb I\) denotes the identity operator. Replacing \(x^{\sharp}\) by \(\pi_{C}(z)\) and re-arranging terms then gives
\[
\begin{eqnarray}
\pi_{C}(z) &=& (\mathbb I + N_{C})^{-1}(z).\nonumber
\end{eqnarray}
\]
We have thus shown the classical relationship:
\[
\begin{eqnarray}
\pi_{C} &\equiv& (\mathbb I+N_{C})^{-1}.
\end{eqnarray}
\]
Conversely, take \(x\in C\), \(u\in N_{C}(x)\) and \(\alpha\ge 0\). By \(\eqref{normal cone multiplication}\), \(\alpha u\in N_{C}(x)\), i.e.,
\[
\begin{eqnarray}
(x+\alpha u) - x &\in & N_{C}(x)\nonumber
\end{eqnarray}
\]
then, letting \(z:=(x+\alpha u)\), we have precisely \(z\in x+N_{C}(x)\), which is the form in \(\eqref{reduced FOC}\) and is therefore equivalent to \(x=\pi_{C}(z)\), so that
\[
\begin{eqnarray}
x &=& \pi_{C}(x + \alpha u), \quad (\alpha \ge 0, u\in N_{C}(x)).\nonumber
\end{eqnarray}
\]
The last thing to observe is that the FOC \(\eqref{FOC basic}\) indicates that \(-\nabla f(x^{\sharp})\in N_{C}(x^{\sharp})\). We have thus found a useful fixed-point form for the minimizer: for any \(\alpha \ge 0\), a minimizer \(x^{\sharp}\) of the problem verifies
\[
\begin{eqnarray}
x^{\sharp} &=& \pi_{C}(x^{\sharp}-\alpha \nabla f(x^{\sharp})).
\end{eqnarray}
\]

## 3. The algorithm

As mentioned in the introduction of this blog, a big part of convex optimization is based on working out clever fixed-point equations verified by the minimizers and then applying a fixed-point iteration. In this case it leads to the following algorithm.
The projected gradient descent iteration reads
\[
\begin{eqnarray}
x_{k+1} &=& \pi_{C}(x_k-\alpha_{k} \nabla f(x_{k}))
\end{eqnarray}
\]
where \(\alpha_{k}>0\) is the step size at iteration \(k\). Roughly speaking, assuming that the \(\alpha_k\) are picked sensibly and under some regularity conditions on the problem, the method enjoys a rate \((f(x_k)-f(x^\sharp))=\mathcal O(k^{-1/2})\). This convergence in the square root of the number of iterations is a standard result for gradient-descent-type methods.

## 4. Steepest descent

Beyond the fixed-point equation, it is well known that following the direction of the gradient is also a good idea, since it is the local direction of maximum decrease of the function. Indeed, let us consider the unconstrained case and a small step from \(x\) to \(x-\delta\). A Taylor expansion of \(f\) around \(x\) gives
\[
\begin{eqnarray}
f(x)-f(x-\delta ) &=& \langle \delta,\nabla f(x)\rangle + o(\|\delta\|)
\end{eqnarray}
\]
and, for a sufficiently small step of fixed length \(\|\delta\|\), this difference is maximized when \(\delta\) is aligned with \(\nabla f(x)\).
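As a concrete illustration of the iteration \(x_{k+1}=\pi_{C}(x_k-\alpha_k\nabla f(x_k))\), here is a minimal Python sketch. The choices of objective (a quadratic) and of the set \(C\) (the unit Euclidean ball, whose projection has a simple closed form) are illustrative assumptions, not taken from the discussion above; a constant step size is used for simplicity.

```python
import numpy as np

def project_ball(z, radius=1.0):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(z)
    return z if norm <= radius else (radius / norm) * z

def projected_gradient_descent(grad, project, x0, step=0.1, iters=500):
    """Fixed-point iteration x_{k+1} = pi_C(x_k - alpha * grad(x_k))."""
    x = x0
    for _ in range(iters):
        x = project(x - step * grad(x))
    return x

# Illustrative problem: minimize f(x) = (1/2)||x - b||^2 over the unit ball,
# with b chosen outside the ball; the minimizer is then the projection of b.
b = np.array([2.0, 0.0])
grad = lambda x: x - b           # gradient of the quadratic objective
x_star = projected_gradient_descent(grad, project_ball, np.zeros(2))
```

Note that the limit of the iteration is a fixed point of \(x\mapsto\pi_C(x-\alpha\nabla f(x))\), consistent with the fixed-point characterization of the minimizer derived in Section 2.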