Efficient Dictionary Learning with Gradient Descent

Dar Gilboa, Sam Buchanan, John Wright
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2252-2259, 2019.

Abstract

Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of poor objective value. For some highly structured nonconvex problems, however, the success of gradient descent can be understood by studying the geometry of the objective. We study one such problem, complete orthogonal dictionary learning, and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. The resulting rates scale as low-order polynomials in the dimension even though the objective possesses an exponential number of saddle points. This efficient convergence can be viewed as a consequence of negative curvature normal to the stable manifolds associated with saddle points, and we provide evidence that this feature is shared by other nonconvex problems of importance as well.
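
To make the setting concrete, the sketch below illustrates the kind of procedure the abstract describes: randomly initialized (Riemannian) gradient descent over the sphere applied to a smooth sparsity surrogate for complete orthogonal dictionary learning. It is a minimal illustration under assumed choices, not the authors' implementation: the log-cosh surrogate h_mu(t) = mu * log cosh(t / mu), the planted model Y = A X with A orthogonal and X sparse, and all function names and parameter values are hypothetical.

import numpy as np

def sphere_grad_descent(Y, mu=1e-2, step=0.05, iters=2000, seed=0):
    # Minimize f(q) = (1/m) * sum_i h_mu(q^T y_i), with h_mu(t) = mu*log(cosh(t/mu)),
    # over the unit sphere via projected (Riemannian) gradient descent.
    # Y is an (n, m) data matrix, ideally Y = A X with A orthogonal and X sparse.
    n, m = Y.shape
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)                    # random initialization on the sphere
    for _ in range(iters):
        z = Y.T @ q                           # inner products q^T y_i
        g = Y @ np.tanh(z / mu) / m           # Euclidean gradient of the surrogate
        g_tan = g - (q @ g) * q               # project onto the tangent space at q
        q = q - step * g_tan                  # gradient step
        q /= np.linalg.norm(q)                # retract back to the sphere
    return q

# Toy usage on a planted model: q should align with a column of A (up to sign).
n, m, theta = 30, 5000, 0.1
rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.standard_normal((n, n)))                # orthogonal dictionary
X = rng.standard_normal((n, m)) * (rng.random((n, m)) < theta)  # sparse coefficients
Y = A @ X
q_hat = sphere_grad_descent(Y)
print("max |correlation| with a dictionary column:", np.max(np.abs(A.T @ q_hat)))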

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-gilboa19a,
  title = {Efficient Dictionary Learning with Gradient Descent},
  author = {Gilboa, Dar and Buchanan, Sam and Wright, John},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {2252--2259},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/gilboa19a/gilboa19a.pdf},
  url = {https://proceedings.mlr.press/v97/gilboa19a.html},
  abstract = {Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of poor objective value. For some highly structured nonconvex problems, however, the success of gradient descent can be understood by studying the geometry of the objective. We study one such problem, complete orthogonal dictionary learning, and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. The resulting rates scale as low-order polynomials in the dimension even though the objective possesses an exponential number of saddle points. This efficient convergence can be viewed as a consequence of negative curvature normal to the stable manifolds associated with saddle points, and we provide evidence that this feature is shared by other nonconvex problems of importance as well.}
}
Endnote
%0 Conference Paper
%T Efficient Dictionary Learning with Gradient Descent
%A Dar Gilboa
%A Sam Buchanan
%A John Wright
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-gilboa19a
%I PMLR
%P 2252--2259
%U https://proceedings.mlr.press/v97/gilboa19a.html
%V 97
%X Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of poor objective value. For some highly structured nonconvex problems, however, the success of gradient descent can be understood by studying the geometry of the objective. We study one such problem, complete orthogonal dictionary learning, and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. The resulting rates scale as low-order polynomials in the dimension even though the objective possesses an exponential number of saddle points. This efficient convergence can be viewed as a consequence of negative curvature normal to the stable manifolds associated with saddle points, and we provide evidence that this feature is shared by other nonconvex problems of importance as well.
APA
Gilboa, D., Buchanan, S. & Wright, J. (2019). Efficient Dictionary Learning with Gradient Descent. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2252-2259. Available from https://proceedings.mlr.press/v97/gilboa19a.html.