Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1662-1672, 2020.
Due to the imminent need to alleviate the communication burden in multi-agent and federated learning, the investigation of communication-efficient distributed optimization algorithms for empirical risk minimization has flourished recently. A large fraction of existing algorithms are developed for the master/slave setting, relying on the presence of a central parameter server. This paper focuses on distributed optimization in the network setting (also known as the decentralized setting), where each agent is only allowed to aggregate information from its neighbors over a graph. By properly adjusting the global gradient estimate via a tracking term, we first develop a communication-efficient approximate Newton-type method, called Network-DANE, which generalizes the attractive DANE algorithm to decentralized networks. Our key algorithmic ideas can be applied, in a systematic manner, to obtain decentralized versions of other master/slave distributed algorithms. Notably, we develop Network-SVRG/SARAH, which employ stochastic variance reduction at each agent to accelerate local computations. We establish linear convergence of Network-DANE and Network-SVRG for strongly convex losses, and Network-SARAH for quadratic losses, which shed light on the impact of data homogeneity, network connectivity, and local averaging upon the rate of convergence. Numerical evidence is provided to demonstrate the appealing performance of our algorithms over competitive baselines, in terms of both communication and computation efficiency.