Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization

*International Conference on Machine Learning (ICML), JMLR W&CP 28 (1): 427–435, 2013.*

(This article presents a slightly improved and more compact version of the first part of my thesis.)

**Abstract:**

Sparse greedy optimization techniques based on the Frank-Wolfe algorithm (also known as *conditional gradient*) have seen a surge of interest recently, driven by many new promising applications in machine learning, signal processing and other fields. For constrained convex optimization problems, an iteration of the Frank-Wolfe algorithm only requires the solution of a linear subproblem over the same domain, and does not need any projection steps.
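As a minimal sketch of the iteration described above, the following code runs Frank-Wolfe over the ℓ1-ball, one of the atomic domains discussed later. The function name `frank_wolfe_l1`, the quadratic objective, and the chosen radius are illustrative assumptions, not part of the paper; the step size 2/(t+2) is the standard choice.

```python
import numpy as np

def frank_wolfe_l1(grad, x0, tau=1.0, iters=1000):
    """Frank-Wolfe over the l1-ball {x : ||x||_1 <= tau}.

    Each iteration solves only the linear subproblem
        s = argmin_{||s||_1 <= tau} <grad f(x), s>,
    whose minimizer is a signed, scaled coordinate vector
    (a 1-sparse atom) -- no projection step is needed.
    """
    x = x0.copy()
    for t in range(iters):
        g = grad(x)
        i = np.argmax(np.abs(g))           # linear minimizer picks one coordinate
        s = np.zeros_like(x)
        s[i] = -tau * np.sign(g[i])        # vertex of the l1-ball
        gamma = 2.0 / (t + 2.0)            # standard step size
        x = (1 - gamma) * x + gamma * s    # convex combination stays feasible
    return x

# illustrative example: minimize f(x) = 0.5 ||x - b||^2 over ||x||_1 <= 1
b = np.array([0.9, -0.5, 0.1, 0.0])
x_star = frank_wolfe_l1(lambda x: x - b, np.zeros(4), tau=1.0)
```

Because each atom `s` is 1-sparse, the iterate after `t` steps is supported on at most `t` coordinates, which is the source of the sparsity guarantees the abstract refers to.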

Here we provide stronger and more general primal-dual convergence results for the Frank-Wolfe algorithm for arbitrary domains, enabled by a simple new framework of duality gap certificates for constrained optimization. Our analysis also holds if the linear subproblems are only solved approximately, and is proven to be worst-case optimal in the sparsity of the obtained solutions.

On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization domains given by the convex hull of an atomic set, such as optimizing over sparse (or group sparse) vectors and matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices.

We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.
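To illustrate why every iteration in the matrix setting is a low-rank update, here is a hedged sketch of the linear minimization oracle over the nuclear-norm ball: the linear subproblem there is solved by the top singular vector pair of the gradient, yielding a rank-1 atom. The function name `nuclear_norm_lmo` and the use of a full SVD (rather than power iteration, which suffices in practice) are illustrative choices, not the paper's prescribed implementation.

```python
import numpy as np

def nuclear_norm_lmo(grad, tau=1.0):
    """Linear minimization oracle over {X : ||X||_* <= tau}.

    argmin_{||S||_* <= tau} <grad, S> is attained at
    S = -tau * u1 v1^T, where (u1, v1) is the top singular
    vector pair of grad -- a rank-1 matrix, so each
    Frank-Wolfe step adds at most one to the iterate's rank.
    """
    U, s, Vt = np.linalg.svd(grad)
    return -tau * np.outer(U[:, 0], Vt[0, :])

# illustrative example: the atom aligns with the top singular direction
G = np.array([[3.0, 0.0],
              [0.0, 1.0]])
S = nuclear_norm_lmo(G, tau=2.0)   # rank-1, <G, S> = -tau * sigma_max(G)
```

After `t` iterations the iterate is a convex combination of `t` rank-1 atoms, i.e. a rank-at-most-`t` matrix factorization, which is the structure the framework above exploits.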