On the Convergence of Decentralized Adaptive Gradient Methods
Proceedings of The 14th Asian Conference on Machine
Learning, PMLR 189:217-232, 2023.
Abstract
Adaptive gradient methods, including Adam, AdaGrad,
and their variants, have been very successful in
training deep learning models such as neural
networks. Meanwhile, given the growing need for
distributed computing, distributed optimization
algorithms are rapidly becoming a focal point of
research. With the growth of computing power and the
need to deploy machine learning models on mobile
devices, the communication cost of distributed
training algorithms requires careful
consideration. In this paper, we introduce
novel convergent decentralized adaptive gradient
methods and rigorously incorporate adaptive gradient
methods into decentralized training
procedures. Specifically, we propose a general
algorithmic framework that can convert existing
adaptive gradient methods to their decentralized
counterparts. In addition, we thoroughly analyze the
convergence behavior of the proposed algorithmic
framework and show that, under certain conditions,
if a given adaptive gradient method converges, then
its decentralized counterpart converges as
well. We illustrate the benefit of our generic
decentralized framework on two prototype methods,
AMSGrad and AdaGrad.
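
The paper itself provides no code; the following is a minimal
illustrative sketch, under assumed details, of the general structure the
abstract describes: each node first mixes its iterate with its neighbors
through a doubly stochastic gossip matrix W and then applies a local
AMSGrad-style adaptive step using only its own stochastic gradient. The
function name decentralized_amsgrad_step, the ring-topology W, and the
toy quadratic losses are all assumptions for illustration, not the
authors' actual algorithm or conditions.

# Illustrative sketch only: one round of a gossip-based, AMSGrad-style
# decentralized update for n nodes whose iterates are mixed through a
# doubly stochastic matrix W (assumed setup, not the paper's algorithm).
import numpy as np

def decentralized_amsgrad_step(x, m, v, v_hat, grads, W,
                               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """x, m, v, v_hat, grads: arrays of shape (n_nodes, dim); W: (n_nodes, n_nodes)."""
    # Consensus (gossip) step: every node averages its iterate with its neighbors.
    x_mixed = W @ x
    # Local adaptive step: AMSGrad-style moment updates with the node's own gradient.
    m = beta1 * m + (1 - beta1) * grads
    v = beta2 * v + (1 - beta2) * grads ** 2
    v_hat = np.maximum(v_hat, v)          # keep second-moment estimate non-decreasing
    x_new = x_mixed - lr * m / (np.sqrt(v_hat) + eps)
    return x_new, m, v, v_hat

# Toy usage: 4 nodes on a ring, local quadratic losses f_i(x) = 0.5 * ||x - c_i||^2.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4, 5
    # Symmetric, doubly stochastic mixing matrix for a ring topology.
    W = 0.5 * np.eye(n) + 0.25 * (np.roll(np.eye(n), 1, axis=0)
                                  + np.roll(np.eye(n), -1, axis=0))
    targets = rng.normal(size=(n, d))
    x = np.zeros((n, d))
    m = np.zeros((n, d)); v = np.zeros((n, d)); v_hat = np.zeros((n, d))
    for _ in range(2000):
        grads = x - targets               # gradient of each node's local loss
        x, m, v, v_hat = decentralized_amsgrad_step(x, m, v, v_hat, grads, W)
    print("consensus error:", np.max(np.abs(x - x.mean(axis=0))))
    print("distance to minimizer of average loss:",
          np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0)))

In this sketch the same structure would accommodate other base methods
(e.g. an AdaGrad-style accumulator in place of the AMSGrad moments),
which is the sense in which a generic framework can convert an adaptive
method into a decentralized counterpart; the paper's convergence
guarantees, of course, rest on its own stated conditions rather than on
this toy example.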