Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

Si Yi Meng; Sharan Vaswani; Issam Hadj Laradji; Mark Schmidt; Simon Lacoste-Julien

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1375-1386, 2020.

Abstract

We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.
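The abstract describes R-SSN only at a high level. The following is a minimal sketch of a "Hessian-free" regularized subsampled Newton step under interpolation. It is not the authors' code, and it rests on several assumptions of ours. First, it swaps the paper's binary-classification experiments for a consistent least-squares problem (b = A w_star), where every per-example loss is exactly minimized at w_star, so interpolation holds trivially. Second, one mini-batch is shared by the subsampled gradient and Hessian. Third, the regularized Newton system (H_S + tau I) d = g is solved by conjugate gradient using only Hessian-vector products. Fourth, the adaptive step-size is a backtracking Armijo search on the same mini-batch. All function and parameter names (`r_ssn`, `conjugate_gradient`, `tau`, `make_interpolating_problem`, and so on) are hypothetical.

```python
import random

# Illustrative sketch only, not the authors' implementation; all names are hypothetical.
# Plain-Python vector helpers so the example needs no third-party packages.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def axpy(alpha, x, y):
    """Return alpha * x + y as a new list."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def batch_loss(A, b, w, idx):
    """Mean of f_i(w) = 0.5 * (a_i^T w - b_i)^2 over the indices in idx."""
    idx = list(idx)
    return sum(0.5 * (dot(A[i], w) - b[i]) ** 2 for i in idx) / len(idx)

def batch_grad(A, b, w, idx):
    """Subsampled gradient: mean of (a_i^T w - b_i) * a_i over the batch."""
    g = [0.0] * len(w)
    for i in idx:
        g = axpy(dot(A[i], w) - b[i], A[i], g)
    return [gj / len(idx) for gj in g]

def batch_hvp(A, v, idx):
    """Subsampled Hessian-vector product mean(a_i a_i^T) v, never forming the matrix."""
    out = [0.0] * len(v)
    for i in idx:
        out = axpy(dot(A[i], v), A[i], out)
    return [oj / len(idx) for oj in out]

def conjugate_gradient(matvec, rhs, max_iters=50, rel_tol=1e-12):
    """Solve M x = rhs for symmetric positive-definite M, given only v -> M v."""
    x = [0.0] * len(rhs)
    r = list(rhs)
    p = list(r)
    rs = dot(r, r)
    stop = (rel_tol ** 2) * rs  # relative stopping threshold on ||r||^2
    for _ in range(max_iters):
        if rs <= stop:
            break
        Mp = matvec(p)
        alpha = rs / dot(p, Mp)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Mp, r)
        rs_new = dot(r, r)
        p = axpy(rs_new / rs, p, r)  # p <- r + beta * p
        rs = rs_new
    return x

def r_ssn(A, b, w0, batch_size=10, tau=1e-3, iters=30,
          eta_max=1.0, c=0.5, shrink=0.7, seed=0):
    """Regularized subsampled Newton with a constant batch size and Armijo backtracking."""
    rng = random.Random(seed)
    w = list(w0)
    for _ in range(iters):
        # Assumption: one mini-batch is shared by the gradient and the Hessian.
        idx = rng.sample(range(len(A)), batch_size)
        g = batch_grad(A, b, w, idx)
        # Hessian-free direction: solve (H_S + tau I) d = g by CG.
        d = conjugate_gradient(lambda v: axpy(tau, v, batch_hvp(A, v, idx)), g)
        # Assumption: stochastic Armijo search on the same batch,
        # f_S(w - eta d) <= f_S(w) - c * eta * g^T d, with a floor on eta.
        f0, gd, eta = batch_loss(A, b, w, idx), dot(g, d), eta_max
        while eta > 1e-10 and batch_loss(A, b, axpy(-eta, d, w), idx) > f0 - c * eta * gd:
            eta *= shrink
        w = axpy(-eta, d, w)
    return w

def make_interpolating_problem(n=50, dim=5, seed=1):
    """Consistent least squares b = A w_star, so every f_i is minimized at w_star."""
    rng = random.Random(seed)
    w_star = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    A = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    b = [dot(a, w_star) for a in A]
    return A, b, w_star

if __name__ == "__main__":
    A, b, w_star = make_interpolating_problem()
    w = r_ssn(A, b, [0.0] * len(w_star))
    print("full-data loss:", batch_loss(A, b, w, range(len(A))))
```

Because interpolation makes every sampled loss vanish at w_star, the subsampled gradient shrinks to zero near the solution. This is why a fixed batch size can still drive the full-data loss to zero in this sketch. The growing-batch schedule that the abstract links to local quadratic convergence is not shown here.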

Cite this Paper

BibTeX


@InProceedings{pmlr-v108-meng20a,
  title     = {Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation},
  author    = {Meng, Si Yi and Vaswani, Sharan and Laradji, Issam Hadj and Schmidt, Mark and Lacoste-Julien, Simon},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {1375--1386},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/meng20a/meng20a.pdf},
  url       = {https://proceedings.mlr.press/v108/meng20a.html},
  abstract  = {We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.}
}

Endnote

%0 Conference Paper
%T Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
%A Si Yi Meng
%A Sharan Vaswani
%A Issam Hadj Laradji
%A Mark Schmidt
%A Simon Lacoste-Julien
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-meng20a
%I PMLR
%P 1375--1386
%U https://proceedings.mlr.press/v108/meng20a.html
%V 108
%X We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.

APA


Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M. & Lacoste-Julien, S. (2020). Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1375-1386. Available from https://proceedings.mlr.press/v108/meng20a.html.