Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

Mark Eisen, Aryan Mokhtari, Alejandro Ribeiro
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1447-1455, 2018.

Abstract

Most second order methods are inapplicable to large scale empirical risk minimization (ERM) problems because both the number of samples N and the number of parameters p are large. Large N makes it costly to evaluate Hessians and large p makes it costly to invert Hessians. This paper proposes a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. Although the sample size is grown at a geometric rate, it is shown that it is sufficient to run a single iteration in each growth stage to track the optimal classifier to within its statistical accuracy. This results in convergence to the optimal classifier associated with the whole training set in a number of iterations that scales with $\log(N)$. The use of a truncated eigenvalue decomposition results in the cost of each iteration being of order $p^2$. Theoretical performance gains manifest in practical implementations.
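The abstract describes two ingredients: a geometrically growing sample size with a single Newton-type step per growth stage, and a Hessian inverse approximated through a truncated eigenvalue decomposition. The sketch below is only an illustration of that scheme under stated assumptions: an L2-regularized logistic-regression ERM objective, a fixed truncation rank, and hypothetical helper names (`erm_grad_hess`, `truncated_newton_step`, `adaptive_sample_newton`). It uses a full `numpy.linalg.eigh` for simplicity, whereas the order-$p^2$ per-iteration cost in the paper relies on computing only the leading eigenpairs.

```python
import numpy as np

def erm_grad_hess(w, X, y, lam):
    """Gradient and Hessian of an L2-regularized logistic loss on the given samples (y in {-1, +1})."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-y * z))            # per-sample probability of the correct label
    g = X.T @ ((p - 1.0) * y) / len(y) + lam * w
    s = p * (1.0 - p)                           # per-sample Hessian weights
    H = (X.T * s) @ X / len(y) + lam * np.eye(len(w))
    return g, H

def truncated_newton_step(w, g, H, rank):
    """Newton step with the Hessian inverse approximated by a rank-`rank` truncated eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)              # eigenvalues in ascending order
    V, d = vecs[:, -rank:], vals[-rank:]        # keep the `rank` largest eigenpairs
    sigma = vals[-rank - 1] if rank < len(vals) else vals[0]  # level substituted for the discarded tail
    coeff = V.T @ g
    # Exact inverse on the retained subspace, 1/sigma on its orthogonal complement
    direction = V @ (coeff / d) + (g - V @ coeff) / max(sigma, 1e-12)
    return w - direction

def adaptive_sample_newton(X, y, m0=128, growth=2.0, rank=20, lam=1e-3):
    """Grow the sample size geometrically and take one truncated Newton step per stage."""
    N, p = X.shape
    w = np.zeros(p)
    m = min(m0, N)
    while True:
        g, H = erm_grad_hess(w, X[:m], y[:m], lam)
        w = truncated_newton_step(w, g, H, min(rank, p))
        if m == N:                              # reached the full training set
            break
        m = min(int(growth * m), N)             # geometric growth of the sample size
    return w
```

In this sketch, replacing the discarded eigenvalues with the largest truncated one keeps the approximate Hessian positive definite, so the step stays well defined even for small truncation ranks; the growth factor and rank shown are illustrative, not the paper's exact choices.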

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-eisen18a,
  title = {Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method},
  author = {Eisen, Mark and Mokhtari, Aryan and Ribeiro, Alejandro},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages = {1447--1455},
  year = {2018},
  editor = {Storkey, Amos and Perez-Cruz, Fernando},
  volume = {84},
  series = {Proceedings of Machine Learning Research},
  month = {09--11 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v84/eisen18a/eisen18a.pdf},
  url = {https://proceedings.mlr.press/v84/eisen18a.html},
  abstract = {Most second order methods are inapplicable to large scale empirical risk minimization (ERM) problems because both the number of samples N and the number of parameters p are large. Large N makes it costly to evaluate Hessians and large p makes it costly to invert Hessians. This paper proposes a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. Although the sample size is grown at a geometric rate, it is shown that it is sufficient to run a single iteration in each growth stage to track the optimal classifier to within its statistical accuracy. This results in convergence to the optimal classifier associated with the whole training set in a number of iterations that scales with $\log(N)$. The use of a truncated eigenvalue decomposition results in the cost of each iteration being of order $p^2$. Theoretical performance gains manifest in practical implementations.}
}
Endnote
%0 Conference Paper
%T Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method
%A Mark Eisen
%A Aryan Mokhtari
%A Alejandro Ribeiro
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-eisen18a
%I PMLR
%P 1447--1455
%U https://proceedings.mlr.press/v84/eisen18a.html
%V 84
%X Most second order methods are inapplicable to large scale empirical risk minimization (ERM) problems because both the number of samples N and the number of parameters p are large. Large N makes it costly to evaluate Hessians and large p makes it costly to invert Hessians. This paper proposes a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. Although the sample size is grown at a geometric rate, it is shown that it is sufficient to run a single iteration in each growth stage to track the optimal classifier to within its statistical accuracy. This results in convergence to the optimal classifier associated with the whole training set in a number of iterations that scales with $\log(N)$. The use of a truncated eigenvalue decomposition results in the cost of each iteration being of order $p^2$. Theoretical performance gains manifest in practical implementations.
APA
Eisen, M., Mokhtari, A. & Ribeiro, A. (2018). Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1447-1455. Available from https://proceedings.mlr.press/v84/eisen18a.html.
