Federated Optimization with Doubly Regularized Drift Correction

Xiaowen Jiang, Anton Rodomanov, Sebastian U Stich
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:21912-21945, 2024.

Abstract

Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown consistently improved communication-computation trade-offs over vanilla gradient descent across all standard function classes. In this work, we revisit DANE, an established method in distributed optimization. We show that (i) DANE can achieve the desired communication reduction under Hessian similarity constraints. Furthermore, (ii) we present an extension, DANE+, which supports arbitrary inexact local solvers and has more freedom to choose how to aggregate the local updates. We propose (iii) a novel method, FedRed, which has improved local computational complexity and retains the same communication complexity compared to DANE/DANE+. This is achieved by doubly regularized drift correction.
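As an illustration of the drift-correction idea that the abstract builds on, the following is a minimal sketch of one DANE-style round. It is not taken from the paper and covers neither DANE+ nor FedRed. It assumes scalar quadratic client objectives f_i(x) = 0.5*a_i*x^2 - b_i*x, a regularization parameter `lam`, and plain averaging on the server. The function names `dane_round` and `run_dane` and all numeric values are hypothetical. Each client minimizes its own objective plus a linear term that corrects for the gap between the global and local gradients, plus a proximal term that keeps the update close to the current server point.

```python
def dane_round(x, a, b, lam):
    """One DANE-style round for scalar quadratics f_i(y) = 0.5*a_i*y**2 - b_i*y."""
    n = len(a)
    # Gradient of the average objective at the current server point x.
    g = sum(a_i * x - b_i for a_i, b_i in zip(a, b)) / n
    # Client i solves:
    #   min_y f_i(y) + (g - f_i'(x)) * y + (lam / 2) * (y - x)**2
    # For a quadratic f_i the minimizer is y = x - g / (a_i + lam).
    local = [x - g / (a_i + lam) for a_i in a]
    # Server aggregation: plain averaging of the local solutions (an assumption).
    return sum(local) / n


def run_dane(a, b, lam=1.0, rounds=50, x0=0.0):
    """Run several communication rounds starting from x0."""
    x = x0
    for _ in range(rounds):
        x = dane_round(x, a, b, lam)
    return x


if __name__ == "__main__":
    # Hypothetical client data; the average objective is minimized at
    # mean(b) / mean(a).
    a = [1.0, 2.0, 3.0]
    b = [1.0, 0.0, 5.0]
    print(run_dane(a, b))
```

In this toy setting the correction term makes every client step along the global gradient rather than its own, so the averaged iterate approaches the minimizer of the average objective instead of drifting toward the local minimizers.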

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-jiang24e,
  title = {Federated Optimization with Doubly Regularized Drift Correction},
  author = {Jiang, Xiaowen and Rodomanov, Anton and Stich, Sebastian U},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {21912--21945},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/jiang24e/jiang24e.pdf},
  url = {https://proceedings.mlr.press/v235/jiang24e.html},
  abstract = {Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown consistently improved communication-computation trade-offs over vanilla gradient descent across all standard function classes. In this work, we revisit DANE, an established method in distributed optimization. We show that (i) DANE can achieve the desired communication reduction under Hessian similarity constraints. Furthermore, (ii) we present an extension, DANE+, which supports arbitrary inexact local solvers and has more freedom to choose how to aggregate the local updates. We propose (iii) a novel method, FedRed, which has improved local computational complexity and retains the same communication complexity compared to DANE/DANE+. This is achieved by doubly regularized drift correction.}
}
Endnote
%0 Conference Paper
%T Federated Optimization with Doubly Regularized Drift Correction
%A Xiaowen Jiang
%A Anton Rodomanov
%A Sebastian U Stich
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-jiang24e
%I PMLR
%P 21912--21945
%U https://proceedings.mlr.press/v235/jiang24e.html
%V 235
%X Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown consistently improved communication-computation trade-offs over vanilla gradient descent across all standard function classes. In this work, we revisit DANE, an established method in distributed optimization. We show that (i) DANE can achieve the desired communication reduction under Hessian similarity constraints. Furthermore, (ii) we present an extension, DANE+, which supports arbitrary inexact local solvers and has more freedom to choose how to aggregate the local updates. We propose (iii) a novel method, FedRed, which has improved local computational complexity and retains the same communication complexity compared to DANE/DANE+. This is achieved by doubly regularized drift correction.
APA
Jiang, X., Rodomanov, A., & Stich, S. U. (2024). Federated Optimization with Doubly Regularized Drift Correction. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:21912-21945. Available from https://proceedings.mlr.press/v235/jiang24e.html.
