Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1447-1455, 2018.
Most second order methods are inapplicable to large scale empirical risk minimization (ERM) problems because both, the number of samples N and number of parameters p are large. Large N makes it costly to evaluate Hessians and large p makes it costly to invert Hessians. This paper propose a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. Although the sample size is grown at a geometric rate, it is shown that it is sufficient to run a single iteration in each growth stage to track the optimal classifier to within its statistical accuracy. This results in convergence to the optimal classifier associated with the whole set in a number of iterations that scales with $\log(N)$. The use of a truncated eigenvalue decomposition result in the cost of each iteration being of order $p^2$. Theoretical performance gains manifest in practical implementations.