Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

Si Yi Meng, Sharan Vaswani, Issam Hadj Laradji, Mark Schmidt, Simon Lacoste-Julien
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1375-1386, 2020.

Abstract

We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.
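To make the algorithmic idea concrete, below is a minimal sketch (not the authors' implementation) of one "Hessian-free" regularized subsampled Newton step for binary logistic regression, in the spirit of the R-SSN method described above. The function name rssn_step and the fixed batch size, regularizer tau, and step size eta are illustrative assumptions; the paper analyzes an adaptive step-size and, for the local results, growing batch sizes.

# Minimal sketch of one "Hessian-free" R-SSN step for binary logistic regression.
# Hyperparameters below are placeholders, not the paper's adaptive choices.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def rssn_step(w, X, y, batch_size=128, tau=1e-3, eta=1.0, rng=None):
    """One R-SSN update w <- w - eta * d, where (H_S + tau I) d = grad_S."""
    rng = np.random.default_rng() if rng is None else rng
    n, dim = X.shape
    idx = rng.choice(n, size=min(batch_size, n), replace=False)
    Xb, yb = X[idx], y[idx]              # subsampled batch, labels in {0, 1}

    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))  # sigmoid predictions
    grad = Xb.T @ (p - yb) / len(idx)    # subsampled gradient of the logistic loss

    s = p * (1.0 - p)                    # per-example Hessian weights
    def hvp(v):                          # Hessian-vector product; Hessian never formed
        return Xb.T @ (s * (Xb @ v)) / len(idx) + tau * v

    H = LinearOperator((dim, dim), matvec=hvp)
    d, _ = cg(H, grad, maxiter=50)       # inexact solve of the regularized Newton system
    return w - eta * d

The conjugate-gradient solve only requires Hessian-vector products, which is what makes the step "Hessian-free": the subsampled Hessian is applied implicitly through matrix-vector products rather than being built or inverted explicitly.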

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-meng20a,
  title = {Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation},
  author = {Meng, Si Yi and Vaswani, Sharan and Laradji, Issam Hadj and Schmidt, Mark and Lacoste-Julien, Simon},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {1375--1386},
  year = {2020},
  editor = {Chiappa, Silvia and Calandra, Roberto},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/meng20a/meng20a.pdf},
  url = {https://proceedings.mlr.press/v108/meng20a.html},
  abstract = {We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.}
}
Endnote
%0 Conference Paper
%T Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
%A Si Yi Meng
%A Sharan Vaswani
%A Issam Hadj Laradji
%A Mark Schmidt
%A Simon Lacoste-Julien
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-meng20a
%I PMLR
%P 1375--1386
%U https://proceedings.mlr.press/v108/meng20a.html
%V 108
%X We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.
APA
Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M. & Lacoste-Julien, S. (2020). Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1375-1386. Available from https://proceedings.mlr.press/v108/meng20a.html.

Related Material