Frank-Wolfe with Subsampling Oracle

Thomas Kerdreux, Fabian Pedregosa, Alexandre d’Aspremont
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2591-2600, 2018.

Abstract

We analyze two novel randomized variants of the Frank-Wolfe (FW) or conditional gradient algorithm. While classical FW algorithms require solving a linear minimization problem over the domain at each iteration, the proposed methods only require solving a linear minimization problem over a small subset of the original domain. The first algorithm is a randomized variant of the original FW algorithm and achieves an $\mathcal{O}(1/t)$ sublinear convergence rate, as does its deterministic counterpart. The second algorithm is a randomized variant of the Away-step FW algorithm and, like its deterministic counterpart, reaches a linear (i.e., exponential) convergence rate, making it the first provably convergent randomized variant of Away-step FW. In both cases, while subsampling reduces the convergence rate by a constant factor, the linear minimization step can cost a fraction of that of the deterministic versions, especially when the data is streamed. We illustrate the computational gains of both algorithms on regression problems involving $\ell_1$ and latent group lasso penalties.
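
To make the subsampling idea concrete, below is a minimal Python sketch of FW with a subsampled linear minimization oracle (LMO) over the $\ell_1$ ball, applied to a least-squares objective. The function name, the uniform sampling of coordinates, and the classical $2/(t+2)$ step size are illustrative assumptions, not the exact scheme analyzed in the paper.

import numpy as np

def subsampled_fw_lasso(A, b, radius=1.0, n_samples=20, n_iters=500, seed=0):
    """Sketch of Frank-Wolfe with a subsampled LMO for
    min_{||x||_1 <= radius} 0.5 * ||A x - b||^2.

    The vertices of the l1 ball are {+/- radius * e_i}. A full LMO scores
    all d gradient coordinates; the subsampled LMO below only scores a
    random subset of them at each iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    r = A @ x - b                        # residual A x - b
    for t in range(n_iters):
        # Subsampled LMO: gradient coordinates only for the sampled columns.
        idx = rng.choice(d, size=min(n_samples, d), replace=False)
        g_sub = A[:, idx].T @ r          # partial gradient on the sample
        j = idx[np.argmax(np.abs(g_sub))]
        s_j = -radius * np.sign(A[:, j] @ r)
        # FW update x <- (1 - gamma) x + gamma * s, with s = s_j * e_j.
        gamma = 2.0 / (t + 2)
        x *= (1 - gamma)
        x[j] += gamma * s_j
        # Incremental residual update: A x_new - b.
        r = (1 - gamma) * r + gamma * (s_j * A[:, j] - b)
    return x

In this sketch, each iteration costs O(n |S|) for the subsampled LMO plus O(n) for the residual update, rather than the O(nd) cost of scoring every vertex of the $\ell_1$ ball.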

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-kerdreux18a,
  title     = {Frank-{W}olfe with Subsampling Oracle},
  author    = {Kerdreux, Thomas and Pedregosa, Fabian and d'Aspremont, Alexandre},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {2591--2600},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/kerdreux18a/kerdreux18a.pdf},
  url       = {https://proceedings.mlr.press/v80/kerdreux18a.html}
}
APA
Kerdreux, T., Pedregosa, F. & d'Aspremont, A. (2018). Frank-Wolfe with Subsampling Oracle. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2591-2600. Available from https://proceedings.mlr.press/v80/kerdreux18a.html.
