Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

Mark Eisen; Aryan Mokhtari; Alejandro Ribeiro

Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1447-1455, 2018.

Abstract

Most second-order methods are inapplicable to large scale empirical risk minimization (ERM) problems because both the number of samples N and the number of parameters p are large. Large N makes it costly to evaluate Hessians and large p makes it costly to invert Hessians. This paper proposes a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. Although the sample size is grown at a geometric rate, it is shown that it is sufficient to run a single iteration in each growth stage to track the optimal classifier to within its statistical accuracy. This results in convergence to the optimal classifier associated with the whole set in a number of iterations that scales with $\log(N)$. The use of a truncated eigenvalue decomposition results in the cost of each iteration being of order $p^2$. Theoretical performance gains manifest in practical implementations.
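
The $\log(N)$ iteration count follows from the geometric growth of the sample size combined with one iteration per growth stage. The short sketch below only illustrates that arithmetic; it is not the paper's algorithm. The function name `sample_size_schedule`, the growth factor of 2, and the initial subset size of 1,000 are assumptions chosen for illustration and are not taken from the abstract.

```python
import math


def sample_size_schedule(n_total, n_initial, growth=2.0):
    """Return the sample sizes visited when growing geometrically up to n_total.

    Hypothetical helper: growth factor and initial size are illustrative
    assumptions, not values stated in the abstract.
    """
    if n_initial <= 0 or n_total < n_initial or growth <= 1.0:
        raise ValueError("need 0 < n_initial <= n_total and growth > 1")
    sizes = [n_initial]
    while sizes[-1] < n_total:
        # Grow by the chosen factor, capping at the full dataset size.
        sizes.append(min(n_total, math.ceil(growth * sizes[-1])))
    return sizes


schedule = sample_size_schedule(n_total=1_000_000, n_initial=1_000)
# One iteration per growth stage, so the stage count is the iteration count.
stages = len(schedule) - 1
print(schedule)
print(stages)
```

Under these assumptions, the number of stages equals the ceiling of $\log_2(N / m_0)$, where $m_0$ is the initial subset size, which is the logarithmic dependence on N that the abstract refers to.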

Cite this Paper

BibTeX

@InProceedings{pmlr-v84-eisen18a,
  title = 	 {Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method},
  author = 	 {Eisen, Mark and Mokhtari, Aryan and Ribeiro, Alejandro},
  booktitle = 	 {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1447--1455},
  year = 	 {2018},
  editor = 	 {Storkey, Amos and Perez-Cruz, Fernando},
  volume = 	 {84},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v84/eisen18a/eisen18a.pdf},
  url = 	 {https://proceedings.mlr.press/v84/eisen18a.html},
  abstract = 	 {Most second-order methods are inapplicable to large scale empirical risk minimization (ERM) problems because both the number of samples N and the number of parameters p are large. Large N makes it costly to evaluate Hessians and large p makes it costly to invert Hessians. This paper proposes a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. Although the sample size is grown at a geometric rate, it is shown that it is sufficient to run a single iteration in each growth stage to track the optimal classifier to within its statistical accuracy. This results in convergence to the optimal classifier associated with the whole set in a number of iterations that scales with $\log(N)$. The use of a truncated eigenvalue decomposition results in the cost of each iteration being of order $p^2$. Theoretical performance gains manifest in practical implementations.}
}

Endnote

%0 Conference Paper
%T Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method
%A Mark Eisen
%A Aryan Mokhtari
%A Alejandro Ribeiro
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-eisen18a
%I PMLR
%P 1447--1455
%U https://proceedings.mlr.press/v84/eisen18a.html
%V 84
%X Most second-order methods are inapplicable to large scale empirical risk minimization (ERM) problems because both the number of samples N and the number of parameters p are large. Large N makes it costly to evaluate Hessians and large p makes it costly to invert Hessians. This paper proposes a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. Although the sample size is grown at a geometric rate, it is shown that it is sufficient to run a single iteration in each growth stage to track the optimal classifier to within its statistical accuracy. This results in convergence to the optimal classifier associated with the whole set in a number of iterations that scales with $\log(N)$. The use of a truncated eigenvalue decomposition results in the cost of each iteration being of order $p^2$. Theoretical performance gains manifest in practical implementations.

APA

Eisen, M., Mokhtari, A. & Ribeiro, A. (2018). Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1447-1455. Available from https://proceedings.mlr.press/v84/eisen18a.html.
