On Generalization of Decentralized Learning with Separable Data

Hossein Taheri, Christos Thrampoulidis
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:4917-4945, 2023.

Abstract

Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work that studies the generalization performance and the implicit bias of gradient descent over separable data, but has thus far been limited to centralized learning scenarios. Notably, our generalization bounds approximately match in order their centralized counterparts. Critical to these results, and of independent interest, are novel bounds on the training loss and the rate of consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate orders-of-magnitude speed-ups in both training and generalization performance.
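
To make the setting concrete, below is a minimal Python sketch of decentralized gradient descent (DGD) with the logistic loss on separable data. This is an illustrative assumption of the standard DGD update, not the authors' implementation; the names decentralized_gd, X_local, y_local, W, lr, and iters are hypothetical. Each of m agents holds a local data shard, averages its neighbors' models through a doubly stochastic mixing matrix W induced by the communication graph, and then takes a local gradient step.

```python
import numpy as np

def decentralized_gd(X_local, y_local, W, lr=0.1, iters=1000):
    """Sketch of decentralized gradient descent (DGD).

    X_local: list of (n_i, d) arrays, one data shard per agent
    y_local: list of (n_i,) arrays with labels in {-1, +1}
    W:       (m, m) doubly stochastic mixing matrix of the graph
    """
    m = len(X_local)                 # number of agents
    d = X_local[0].shape[1]          # feature dimension
    models = np.zeros((m, d))        # one local model per agent

    for _ in range(iters):
        mixed = W @ models           # consensus step: average neighbors' models
        grads = np.zeros_like(models)
        for i in range(m):
            margins = y_local[i] * (X_local[i] @ models[i])
            # gradient of the logistic loss ell(t) = log(1 + exp(-t))
            coef = -y_local[i] / (1.0 + np.exp(margins))
            grads[i] = (X_local[i] * coef[:, None]).mean(axis=0)
        models = mixed - lr * grads  # local gradient step
    return models
```

Under the conditions studied in the paper (separable data and losses, such as the logistic loss, that decay to zero at infinity), the local training losses are driven toward zero while the agents' models approach consensus.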

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-taheri23a,
  title     = {On Generalization of Decentralized Learning with Separable Data},
  author    = {Taheri, Hossein and Thrampoulidis, Christos},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {4917--4945},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/taheri23a/taheri23a.pdf},
  url       = {https://proceedings.mlr.press/v206/taheri23a.html}
}
Endnote
%0 Conference Paper
%T On Generalization of Decentralized Learning with Separable Data
%A Hossein Taheri
%A Christos Thrampoulidis
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-taheri23a
%I PMLR
%P 4917--4945
%U https://proceedings.mlr.press/v206/taheri23a.html
%V 206
APA
Taheri, H., & Thrampoulidis, C. (2023). On Generalization of Decentralized Learning with Separable Data. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:4917-4945. Available from https://proceedings.mlr.press/v206/taheri23a.html.