Preconditioned Conjugate Gradient Methods in Truncated Newton Frameworks for Large-scale Linear Classification
Proceedings of The 10th Asian Conference on Machine Learning, PMLR 95:312-326, 2018.
Truncated Newton method is one of the most effective optimization methods for large-scale linear classification. The main computational task at each Newton iteration is to approximately solve a quadratic sub-problem by an iterative procedure such as the conjugate gradient (CG) method. It is known that CG has slow convergence if the sub-problem is ill-conditioned. Preconditioned CG (PCG) methods have been used to improve the convergence of the CG method, but it is difficult to find a preconditioner that performs well in most situations. Further, because Hessian-free optimization techniques are incorporated for handling large data, many existing preconditioners are not directly applicable. In this work, we detailedly study some preconditioners that have been considered in past works for linear classification. We show that these preconditioners may not help to improve the training speed in some cases. After some investigation, we propose simple and effective techniques to make the PCG method more robust in a truncated Newton framework. The idea is to avoid the situation when a preconditioner leads to a much worse condition number than when it is not applied. We provide theoretical justification. Through carefully designed experiments, we demonstrate that our method can effectively reduce the training time for large-scale problems.