Communication-Efficient Distributed PCA by Riemannian Optimization

Long-Kai Huang; Sinno Pan

Communication-Efficient Distributed PCA by Riemannian Optimization

Long-Kai Huang, Sinno Pan

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4465-4474, 2020.

Abstract

In this paper, we study the leading eigenvector problem in a statistically distributed setting and propose a communication-efficient algorithm based on Riemannian optimization, which trades local computation for global communication. Theoretical analysis shows that the proposed algorithm linearly converges to the centralized empirical risk minimization solution regarding the number of communication rounds. When the number of data points in local machines is sufficiently large, the proposed algorithm achieves a significant reduction of communication cost over existing distributed PCA algorithms. Superior performance in terms of communication cost of the proposed algorithm is verified on real-world and synthetic datasets.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-huang20e,
  title = 	 {Communication-Efficient Distributed {PCA} by {R}iemannian Optimization},
  author =       {Huang, Long-Kai and Pan, Sinno},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {4465--4474},
  year = 	 {2020},
  editor = 	 {III, Hal Daumé and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/huang20e/huang20e.pdf},
  url = 	 {https://proceedings.mlr.press/v119/huang20e.html},
  abstract = 	 {In this paper, we study the leading eigenvector problem in a statistically distributed setting and propose a communication-efficient algorithm based on Riemannian optimization, which trades local computation for global communication. Theoretical analysis shows that the proposed algorithm linearly converges to the centralized empirical risk minimization solution regarding the number of communication rounds. When the number of data points in local machines is sufficiently large, the proposed algorithm achieves a significant reduction of communication cost over existing distributed PCA algorithms. Superior performance in terms of communication cost of the proposed algorithm is verified on real-world and synthetic datasets.}
}

Endnote

%0 Conference Paper
%T Communication-Efficient Distributed PCA by Riemannian Optimization
%A Long-Kai Huang
%A Sinno Pan
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh	
%F pmlr-v119-huang20e
%I PMLR
%P 4465--4474
%U https://proceedings.mlr.press/v119/huang20e.html
%V 119
%X In this paper, we study the leading eigenvector problem in a statistically distributed setting and propose a communication-efficient algorithm based on Riemannian optimization, which trades local computation for global communication. Theoretical analysis shows that the proposed algorithm linearly converges to the centralized empirical risk minimization solution regarding the number of communication rounds. When the number of data points in local machines is sufficiently large, the proposed algorithm achieves a significant reduction of communication cost over existing distributed PCA algorithms. Superior performance in terms of communication cost of the proposed algorithm is verified on real-world and synthetic datasets.

APA

Huang, L. & Pan, S.. (2020). Communication-Efficient Distributed PCA by Riemannian Optimization. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4465-4474 Available from https://proceedings.mlr.press/v119/huang20e.html.

Communication-Efficient Distributed PCA by Riemannian Optimization

Abstract

Cite this Paper

Related Material