Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features

Aleksandr Beznosikov, David Dobre, Gauthier Gidel
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:3725-3750, 2024.

Abstract

The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints that arise in machine learning applications. In recent years, stochastic versions of FW have gained popularity, motivated by large datasets for which the computation of the full gradient is prohibitively expensive. In this paper, we present two new variants of the FW algorithms for stochastic finite-sum minimization. Our algorithms have the best convergence guarantees of existing stochastic FW approaches for both convex and non-convex objective functions. Our methods do not have the issue of permanently collecting large batches, which is common to many stochastic projection-free approaches. Moreover, our second approach does not require either large batches or full deterministic gradients, which is a typical weakness of many techniques for finite-sum problems. The faster theoretical rates of our approaches are confirmed experimentally.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-beznosikov24a, title = {Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features}, author = {Beznosikov, Aleksandr and Dobre, David and Gidel, Gauthier}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {3725--3750}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/beznosikov24a/beznosikov24a.pdf}, url = {https://proceedings.mlr.press/v235/beznosikov24a.html}, abstract = {The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints that arise in machine learning applications. In recent years, stochastic versions of FW have gained popularity, motivated by large datasets for which the computation of the full gradient is prohibitively expensive. In this paper, we present two new variants of the FW algorithms for stochastic finite-sum minimization. Our algorithms have the best convergence guarantees of existing stochastic FW approaches for both convex and non-convex objective functions. Our methods do not have the issue of permanently collecting large batches, which is common to many stochastic projection-free approaches. Moreover, our second approach does not require either large batches or full deterministic gradients, which is a typical weakness of many techniques for finite-sum problems. The faster theoretical rates of our approaches are confirmed experimentally.} }
Endnote
%0 Conference Paper %T Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features %A Aleksandr Beznosikov %A David Dobre %A Gauthier Gidel %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-beznosikov24a %I PMLR %P 3725--3750 %U https://proceedings.mlr.press/v235/beznosikov24a.html %V 235 %X The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints that arise in machine learning applications. In recent years, stochastic versions of FW have gained popularity, motivated by large datasets for which the computation of the full gradient is prohibitively expensive. In this paper, we present two new variants of the FW algorithms for stochastic finite-sum minimization. Our algorithms have the best convergence guarantees of existing stochastic FW approaches for both convex and non-convex objective functions. Our methods do not have the issue of permanently collecting large batches, which is common to many stochastic projection-free approaches. Moreover, our second approach does not require either large batches or full deterministic gradients, which is a typical weakness of many techniques for finite-sum problems. The faster theoretical rates of our approaches are confirmed experimentally.
APA
Beznosikov, A., Dobre, D. & Gidel, G.. (2024). Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:3725-3750 Available from https://proceedings.mlr.press/v235/beznosikov24a.html.

Related Material