Projection-free Distributed Online Convex Optimization with $O(\sqrt{T})$ Communication Complexity

Yuanyu Wan, Wei-Wei Tu, Lijun Zhang
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9818-9828, 2020.

Abstract

To deal with complicated constraints via locally light computations in distributed online learning, a recent study has presented a projection-free algorithm called distributed online conditional gradient (D-OCG), and achieved an $O(T^{3/4})$ regret bound, where $T$ is the number of prediction rounds. However, in each round, the local learners of D-OCG need to communicate with their neighbors to share the local gradients, which results in a high communication complexity of $O(T)$. In this paper, we first propose an improved variant of D-OCG, namely D-BOCG, which enjoys an $O(T^{3/4})$ regret bound with only $O(\sqrt{T})$ communication complexity. The key idea is to divide the total prediction rounds into $\sqrt{T}$ equally-sized blocks, and only update the local learners at the beginning of each block by performing iterative linear optimization steps. Furthermore, to handle the more challenging bandit setting, in which only the loss value is available, we incorporate the classical one-point gradient estimator into D-BOCG, and obtain similar theoretical guarantees.
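The abstract's bandit extension relies on the classical one-point gradient estimator (Flaxman et al., 2005), which builds an unbiased gradient estimate of a smoothed loss from a single function evaluation. Below is a minimal sketch of that estimator alone (not of D-BOCG itself), assuming a Euclidean setting; the function names and the quadratic test loss are illustrative, not from the paper.

```python
import numpy as np

def one_point_gradient_estimate(f, x, delta, rng):
    """One-point gradient estimator: query the loss once at a randomly
    perturbed point and return (d / delta) * f(x + delta * u) * u, where
    u is uniform on the unit sphere.  Its expectation is the gradient of
    a delta-smoothed version of f, which is what bandit algorithms can
    use in place of the true gradient."""
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)            # uniform direction on the unit sphere
    return (d / delta) * f(x + delta * u) * u

# Sanity check on a quadratic loss f(x) = ||x||^2 / 2, whose gradient is x.
# For a quadratic, the smoothing adds no bias, so averaging many
# single-query estimates recovers the gradient.
rng = np.random.default_rng(0)
f = lambda x: 0.5 * x @ x
x = np.array([1.0, -2.0, 0.5])
est = np.mean([one_point_gradient_estimate(f, x, delta=0.3, rng=rng)
               for _ in range(50000)], axis=0)
```

Each call costs exactly one loss evaluation, which is what makes the estimator usable in the bandit setting where only loss values are observed.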

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-wan20b,
  title = {Projection-free Distributed Online Convex Optimization with $O(\sqrt{T})$ Communication Complexity},
  author = {Wan, Yuanyu and Tu, Wei-Wei and Zhang, Lijun},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {9818--9828},
  year = {2020},
  editor = {III, Hal Daumé and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/wan20b/wan20b.pdf},
  url = {https://proceedings.mlr.press/v119/wan20b.html},
  abstract = {To deal with complicated constraints via locally light computations in distributed online learning, a recent study has presented a projection-free algorithm called distributed online conditional gradient (D-OCG), and achieved an $O(T^{3/4})$ regret bound, where $T$ is the number of prediction rounds. However, in each round, the local learners of D-OCG need to communicate with their neighbors to share the local gradients, which results in a high communication complexity of $O(T)$. In this paper, we first propose an improved variant of D-OCG, namely D-BOCG, which enjoys an $O(T^{3/4})$ regret bound with only $O(\sqrt{T})$ communication complexity. The key idea is to divide the total prediction rounds into $\sqrt{T}$ equally-sized blocks, and only update the local learners at the beginning of each block by performing iterative linear optimization steps. Furthermore, to handle the more challenging bandit setting, in which only the loss value is available, we incorporate the classical one-point gradient estimator into D-BOCG, and obtain similar theoretical guarantees.}
}
Endnote
%0 Conference Paper
%T Projection-free Distributed Online Convex Optimization with $O(\sqrt{T})$ Communication Complexity
%A Yuanyu Wan
%A Wei-Wei Tu
%A Lijun Zhang
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-wan20b
%I PMLR
%P 9818--9828
%U https://proceedings.mlr.press/v119/wan20b.html
%V 119
%X To deal with complicated constraints via locally light computations in distributed online learning, a recent study has presented a projection-free algorithm called distributed online conditional gradient (D-OCG), and achieved an $O(T^{3/4})$ regret bound, where $T$ is the number of prediction rounds. However, in each round, the local learners of D-OCG need to communicate with their neighbors to share the local gradients, which results in a high communication complexity of $O(T)$. In this paper, we first propose an improved variant of D-OCG, namely D-BOCG, which enjoys an $O(T^{3/4})$ regret bound with only $O(\sqrt{T})$ communication complexity. The key idea is to divide the total prediction rounds into $\sqrt{T}$ equally-sized blocks, and only update the local learners at the beginning of each block by performing iterative linear optimization steps. Furthermore, to handle the more challenging bandit setting, in which only the loss value is available, we incorporate the classical one-point gradient estimator into D-BOCG, and obtain similar theoretical guarantees.
APA
Wan, Y., Tu, W. & Zhang, L. (2020). Projection-free Distributed Online Convex Optimization with $O(\sqrt{T})$ Communication Complexity. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9818-9828. Available from https://proceedings.mlr.press/v119/wan20b.html.

Related Material