Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency

Yuyang Deng, Mehrdad Mahdavi
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1387-1395, 2021.

Abstract

Local SGD is a promising approach to overcome the communication overhead in distributed learning by reducing the synchronization frequency among worker nodes. Despite the recent theoretical advances of local SGD in empirical risk minimization, the efficiency of its counterpart in minimax optimization remains unexplored. Motivated by large-scale minimax learning problems, such as adversarially robust learning and GANs, we propose local Stochastic Gradient Descent Ascent (local SGDA), where the primal and dual variables are trained locally and averaged periodically to significantly reduce the number of communications. We show that local SGDA can provably optimize distributed minimax problems on both homogeneous and heterogeneous data with a reduced number of communications, and we establish convergence rates under strongly-convex-strongly-concave and nonconvex-strongly-concave settings. In addition, we propose a novel variant, dubbed local SGDA+, to solve nonconvex-nonconcave problems. We also give corroborating empirical evidence on different distributed minimax problems.
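The abstract's core mechanism is simple: each worker runs descent steps on the primal variable and ascent steps on the dual variable locally, and the iterates are averaged only once every few local steps. The following is a minimal sketch of that loop structure under our own assumptions; the quadratic saddle objective, the hypothetical `grad_x_fns`/`grad_y_fns` gradient oracles, and the step sizes are illustrative only and are not the authors' implementation or experimental setup.

```python
import numpy as np

def local_sgda(grad_x_fns, grad_y_fns, x0, y0, eta_x=0.05, eta_y=0.05,
               rounds=100, local_steps=10):
    """Sketch of local SGDA: each worker takes `local_steps` descent/ascent steps,
    then primal and dual iterates are averaged (one communication per round)."""
    K = len(grad_x_fns)
    xs = [x0.copy() for _ in range(K)]
    ys = [y0.copy() for _ in range(K)]
    x_avg, y_avg = x0.copy(), y0.copy()
    for _ in range(rounds):
        for k in range(K):                      # local updates, no communication
            for _ in range(local_steps):
                gx = grad_x_fns[k](xs[k], ys[k])
                gy = grad_y_fns[k](xs[k], ys[k])
                xs[k] = xs[k] - eta_x * gx      # descent step on the primal variable
                ys[k] = ys[k] + eta_y * gy      # ascent step on the dual variable
        x_avg = np.mean(xs, axis=0)             # periodic averaging = one sync round
        y_avg = np.mean(ys, axis=0)
        xs = [x_avg.copy() for _ in range(K)]
        ys = [y_avg.copy() for _ in range(K)]
    return x_avg, y_avg

# Toy usage: K workers share the saddle objective
# f(x, y) = 0.5*||x||^2 + x^T y - 0.5*||y||^2, whose saddle point is (0, 0);
# noisy gradient oracles stand in for stochastic data on each worker.
rng = np.random.default_rng(0)
K, d = 4, 5
gx_fns = [lambda x, y: x + y + 0.01 * rng.normal(size=x.shape) for _ in range(K)]
gy_fns = [lambda x, y: x - y + 0.01 * rng.normal(size=y.shape) for _ in range(K)]
x_out, y_out = local_sgda(gx_fns, gy_fns, np.ones(d), np.ones(d))
print(np.linalg.norm(x_out), np.linalg.norm(y_out))  # both norms should be near zero
```

With `local_steps=10`, the sketch communicates once per 10 local updates rather than after every step, which is the communication saving the paper analyzes.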

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-deng21a,
  title     = {Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency},
  author    = {Deng, Yuyang and Mahdavi, Mehrdad},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {1387--1395},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/deng21a/deng21a.pdf},
  url       = {https://proceedings.mlr.press/v130/deng21a.html},
  abstract  = {Local SGD is a promising approach to overcome the communication overhead in distributed learning by reducing the synchronization frequency among worker nodes. Despite the recent theoretical advances of local SGD in empirical risk minimization, the efficiency of its counterpart in minimax optimization remains unexplored. Motivated by large scale minimax learning problems, such as adversarial robust learning and GANs, we propose local Stochastic Gradient Descent Ascent (local SGDA), where the primal and dual variables can be trained locally and averaged periodically to significantly reduce the number of communications. We show that local SGDA can provably optimize distributed minimax problems in both homogeneous and heterogeneous data with reduced number of communications and establish convergence rates under strongly-convex-strongly-concave and nonconvex-strongly-concave settings. In addition, we propose a novel variant, dubbed as local SGDA+, to solve nonconvex-nonconcave problems. We also give corroborating empirical evidence on different distributed minimax problems.}
}
Endnote
%0 Conference Paper
%T Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency
%A Yuyang Deng
%A Mehrdad Mahdavi
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-deng21a
%I PMLR
%P 1387--1395
%U https://proceedings.mlr.press/v130/deng21a.html
%V 130
%X Local SGD is a promising approach to overcome the communication overhead in distributed learning by reducing the synchronization frequency among worker nodes. Despite the recent theoretical advances of local SGD in empirical risk minimization, the efficiency of its counterpart in minimax optimization remains unexplored. Motivated by large scale minimax learning problems, such as adversarial robust learning and GANs, we propose local Stochastic Gradient Descent Ascent (local SGDA), where the primal and dual variables can be trained locally and averaged periodically to significantly reduce the number of communications. We show that local SGDA can provably optimize distributed minimax problems in both homogeneous and heterogeneous data with reduced number of communications and establish convergence rates under strongly-convex-strongly-concave and nonconvex-strongly-concave settings. In addition, we propose a novel variant, dubbed as local SGDA+, to solve nonconvex-nonconcave problems. We also give corroborating empirical evidence on different distributed minimax problems.
APA
Deng, Y. & Mahdavi, M. (2021). Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1387-1395. Available from https://proceedings.mlr.press/v130/deng21a.html.