A Progressive Batching L-BFGS Method for Machine Learning

Raghu Bollapragada; Jorge Nocedal; Dheevatsa Mudigere; Hao-Jun Shi; Ping Tak Peter Tang

A Progressive Batching L-BFGS Method for Machine Learning

Raghu Bollapragada, Jorge Nocedal, Dheevatsa Mudigere, Hao-Jun Shi, Ping Tak Peter Tang

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:620-629, 2018.

Abstract

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-bollapragada18a,
  title = 	 {A Progressive Batching L-{BFGS} Method for Machine Learning},
  author =       {Bollapragada, Raghu and Nocedal, Jorge and Mudigere, Dheevatsa and Shi, Hao-Jun and Tang, Ping Tak Peter},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {620--629},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/bollapragada18a/bollapragada18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/bollapragada18a.html},
  abstract = 	 {The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.}
}

Endnote

%0 Conference Paper
%T A Progressive Batching L-BFGS Method for Machine Learning
%A Raghu Bollapragada
%A Jorge Nocedal
%A Dheevatsa Mudigere
%A Hao-Jun Shi
%A Ping Tak Peter Tang
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause	
%F pmlr-v80-bollapragada18a
%I PMLR
%P 620--629
%U https://proceedings.mlr.press/v80/bollapragada18a.html
%V 80
%X The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.

APA


Bollapragada, R., Nocedal, J., Mudigere, D., Shi, H. & Tang, P.T.P.. (2018). A Progressive Batching L-BFGS Method for Machine Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:620-629 Available from https://proceedings.mlr.press/v80/bollapragada18a.html.

A Progressive Batching L-BFGS Method for Machine Learning

Abstract

Cite this Paper

Related Material