Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9217-9228, 2020.
Abstract
Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms in applications involving highly non-convex problems, such as training deep neural networks. In this work, we propose a decentralized stochastic algorithm for certain smooth non-convex problems in which there are $m$ nodes in the system and each node has a large number of samples (denoted $n$). Unlike the majority of existing decentralized learning algorithms for either stochastic or finite-sum problems, our focus is on \emph{both} reducing the total number of communication rounds among the nodes and accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) \emph{and} gradient tracking (which tracks the global full gradient using the local estimates). We show that to achieve an $\epsilon$-stationary solution of the deterministic finite-sum problem, the proposed algorithm achieves an $O(m n^{1/2} \epsilon^{-1})$ sample complexity and an $O(\epsilon^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $O(m n \epsilon^{-1})$ and $O(\epsilon^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $O(m \epsilon^{-3/2})$ sample complexity and an $O(\epsilon^{-1})$ communication complexity.
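To make the joint estimation-and-tracking idea concrete, the following is a minimal, hypothetical NumPy sketch of one round of a D-GET-style update. All names here (W, X, V, Y, grad_batch, grad_full, sample_batch, alpha, refresh) are placeholders introduced for illustration, not the paper's notation, and the exact update order, step sizes, and minibatch schedule are those of the paper rather than this sketch.

```python
import numpy as np

def dget_style_step(X, V, Y, W, grad_batch, grad_full, sample_batch, alpha, refresh):
    """One illustrative round of joint gradient estimation and tracking.

    X : (m, d) current iterates, one row per node
    V : (m, d) local (variance-reduced) gradient estimates
    Y : (m, d) tracking variables approximating the global gradient
    W : (m, m) doubly stochastic mixing matrix of the communication graph
    """
    m, _ = X.shape

    # Consensus + descent: average with neighbors, step along the tracked gradient.
    X_new = W @ X - alpha * Y

    if refresh:
        # Periodically refresh the estimate with a full pass over local samples.
        V_new = np.stack([grad_full(i, X_new[i]) for i in range(m)])
    else:
        # Otherwise apply a variance-reduced correction using a small minibatch,
        # evaluated at both the new and the old iterate on the same samples.
        rows = []
        for i in range(m):
            batch = sample_batch(i)
            rows.append(V[i] + grad_batch(i, X_new[i], batch) - grad_batch(i, X[i], batch))
        V_new = np.stack(rows)

    # Gradient tracking: mix neighbors' trackers and add the change in the local
    # estimates, so that the node-average of Y keeps tracking the average gradient.
    Y_new = W @ Y + V_new - V

    return X_new, V_new, Y_new
```

In this sketch the tracking update is the standard dynamic-average-consensus form, which preserves the invariant that the node-average of Y equals the node-average of V; the periodic full-gradient refresh is what keeps the per-node sample cost low between refreshes.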