Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy

Majid Jahani, Xi He, Chenxin Ma, Aryan Mokhtari, Dheevatsa Mudigere, Alejandro Ribeiro, Martin Takac
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2634-2644, 2020.

Abstract

In this paper, we propose a Distributed Accumulated Newton Conjugate gradiEnt (DANCE) method in which the sample size gradually increases in order to quickly obtain a solution whose empirical loss is within a satisfactory statistical accuracy. The proposed method is multistage: the solution of each stage serves as a warm start for the next stage, which operates on a larger sample set that includes all samples from the previous stage. This multistage scheme reduces the number of passes over the data required to reach the statistical accuracy of the full training set. Moreover, the algorithm is naturally suited to distributed implementation and exhibits strong scaling, meaning that additional computing nodes are expected to yield further acceleration. Iteration complexity results regarding descent direction computation, communication efficiency, and stopping criteria are analyzed in the convex setting. Our numerical results illustrate that the proposed method outperforms comparable methods on learning problems, including the training of neural networks.
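To make the accumulating-sample idea concrete, below is a minimal single-machine Python sketch of a Newton-CG loop over a geometrically growing sample, using regularized logistic regression as the loss. All function names, the growth factor, and the stopping heuristics here are illustrative assumptions; this is not the authors' DANCE implementation, which is distributed and comes with the complexity guarantees described above.

    # Minimal sketch of accumulating-sample Newton-CG (not the authors' DANCE code).
    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def logistic_loss_grad(w, X, y, lam):
        """L2-regularized logistic loss and gradient on the sample (X, y), y in {-1, +1}."""
        z = X @ w
        loss = np.mean(np.logaddexp(0.0, -y * z)) + 0.5 * lam * (w @ w)
        p = 1.0 / (1.0 + np.exp(y * z))            # sigmoid(-y * z)
        grad = -(X.T @ (y * p)) / len(y) + lam * w
        return loss, grad

    def hessian_vec(w, X, y, lam, v):
        """Hessian-vector product without forming the Hessian (Hessian-free)."""
        z = X @ w
        s = 1.0 / (1.0 + np.exp(-z))
        d = s * (1.0 - s)                           # diagonal weights of the GLM Hessian
        return X.T @ (d * (X @ v)) / len(y) + lam * v

    def accumulating_newton_cg(X, y, lam=1e-4, m0=128, growth=2.0, newton_steps=5):
        """Solve ERM on samples of geometrically growing size, warm-starting each stage."""
        n, dim = X.shape
        w = np.zeros(dim)
        m = min(m0, n)
        while True:
            Xs, ys = X[:m], y[:m]                   # current accumulated sample
            for _ in range(newton_steps):           # inexact Newton steps for this stage
                _, g = logistic_loss_grad(w, Xs, ys, lam)
                Hop = LinearOperator(
                    (dim, dim),
                    matvec=lambda v: hessian_vec(w, Xs, ys, lam, v))
                # CG approximately solves H p = -g; maxiter is a heuristic cap.
                p, _ = cg(Hop, -g, maxiter=50)
                w = w + p                           # unit step; line search omitted
            if m == n:
                break
            m = min(int(growth * m), n)             # next stage: more samples, warm start w
        return w

Because the sample grows geometrically, the total work across all stages is dominated by the final stage, and warm-starting each stage from the previous solution is what keeps the number of Newton steps per stage small.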

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-jahani20a,
  title     = {Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy},
  author    = {Jahani, Majid and He, Xi and Ma, Chenxin and Mokhtari, Aryan and Mudigere, Dheevatsa and Ribeiro, Alejandro and Takac, Martin},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {2634--2644},
  year      = {2020},
  editor    = {Silvia Chiappa and Roberto Calandra},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR}
}
APA
Jahani, M., He, X., Ma, C., Mokhtari, A., Mudigere, D., Ribeiro, A., & Takac, M. (2020). Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2634-2644.