Decentralised Learning with Random Features and Distributed Gradient Descent

Dominic Richards, Patrick Rebeschini, Lorenzo Rosasco
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8105-8115, 2020.

Abstract

We investigate the generalisation performance of Distributed Gradient Descent with implicit regularisation and random features in the homogeneous setting, where a network of agents are given data sampled independently from the same unknown distribution. Along with reducing the memory footprint, random features are particularly convenient in this setting as they provide a common parameterisation across agents that makes it possible to overcome previous difficulties in implementing decentralised kernel regression. Under standard source and capacity assumptions, we establish high-probability bounds on the predictive performance of each agent as a function of the step size, number of iterations, inverse spectral gap of the communication matrix and number of random features. By tuning these parameters, we obtain statistical rates that are minimax optimal with respect to the total number of samples in the network. The algorithm provides a linear improvement over single-machine gradient descent in memory cost and, when agents hold enough data with respect to the network size and inverse spectral gap, a linear speed-up in computational run-time for any network topology. We present simulations that show how the number of random features, iterations and samples impact predictive performance.
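As an illustrative sketch only (not the authors' implementation), the scheme the abstract describes — a shared random-feature parameterisation plus gossip-averaged local gradient steps with early stopping as the implicit regulariser — might look like the following NumPy toy example. The data-generating function, ring topology, step size, horizon and feature count are all hypothetical choices made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n agents, m samples each, d input dimensions, M random features.
n_agents, m, d, M = 4, 50, 3, 100

# Random Fourier features approximating a Gaussian kernel. One shared draw of
# (W, b) gives every agent the same finite-dimensional parameterisation.
W = rng.normal(size=(d, M))
b = rng.uniform(0.0, 2.0 * np.pi, size=M)

def features(X):
    """Map inputs to M random Fourier features."""
    return np.sqrt(2.0 / M) * np.cos(X @ W + b)

# Homogeneous setting: every agent samples i.i.d. from the same distribution.
def sample_agent_data():
    X = rng.normal(size=(m, d))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=m)  # hypothetical target
    return features(X), y

data = [sample_agent_data() for _ in range(n_agents)]

# Doubly stochastic communication matrix P for a ring topology; its spectral
# gap governs how fast the agents reach consensus.
P = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    P[i, i] = 0.5
    P[i, (i - 1) % n_agents] = 0.25
    P[i, (i + 1) % n_agents] = 0.25

# Distributed Gradient Descent: each agent averages its neighbours' iterates
# via P, then takes a gradient step on its own least-squares loss. There is
# no explicit penalty; regularisation is implicit in the stopping time T.
step, T = 0.1, 200
Theta = np.zeros((n_agents, M))  # one weight vector per agent
for _ in range(T):
    Theta = P @ Theta  # gossip / consensus step
    for i, (Phi, y) in enumerate(data):
        grad = Phi.T @ (Phi @ Theta[i] - y) / m
        Theta[i] -= step * grad
```

After enough iterations the agents' iterates are close to consensus, and each local predictor `features(x) @ Theta[i]` approximates the kernel regression fit on the pooled data; in the paper, the step size, iteration count, feature count and spectral gap are the quantities tuned to obtain the minimax-optimal rates.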

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-richards20a,
  title     = {Decentralised Learning with Random Features and Distributed Gradient Descent},
  author    = {Richards, Dominic and Rebeschini, Patrick and Rosasco, Lorenzo},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {8105--8115},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/richards20a/richards20a.pdf},
  url       = {https://proceedings.mlr.press/v119/richards20a.html},
  abstract  = {We investigate the generalisation performance of Distributed Gradient Descent with implicit regularisation and random features in the homogenous setting where a network of agents are given data sampled independently from the same unknown distribution. Along with reducing the memory footprint, random features are particularly convenient in this setting as they provide a common parameterisation across agents that allows to overcome previous difficulties in implementing decentralised kernel regression. Under standard source and capacity assumptions, we establish high probability bounds on the predictive performance for each agent as a function of the step size, number of iterations, inverse spectral gap of the communication matrix and number of random features. By tuning these parameters, we obtain statistical rates that are minimax optimal with respect to the total number of samples in the network. The algorithm provides a linear improvement over single-machine gradient descent in memory cost and, when agents hold enough data with respect to the network size and inverse spectral gap, a linear speed up in computational run-time for any network topology. We present simulations that show how the number of random features, iterations and samples impact predictive performance.}
}
Endnote
%0 Conference Paper
%T Decentralised Learning with Random Features and Distributed Gradient Descent
%A Dominic Richards
%A Patrick Rebeschini
%A Lorenzo Rosasco
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-richards20a
%I PMLR
%P 8105--8115
%U https://proceedings.mlr.press/v119/richards20a.html
%V 119
%X We investigate the generalisation performance of Distributed Gradient Descent with implicit regularisation and random features in the homogenous setting where a network of agents are given data sampled independently from the same unknown distribution. Along with reducing the memory footprint, random features are particularly convenient in this setting as they provide a common parameterisation across agents that allows to overcome previous difficulties in implementing decentralised kernel regression. Under standard source and capacity assumptions, we establish high probability bounds on the predictive performance for each agent as a function of the step size, number of iterations, inverse spectral gap of the communication matrix and number of random features. By tuning these parameters, we obtain statistical rates that are minimax optimal with respect to the total number of samples in the network. The algorithm provides a linear improvement over single-machine gradient descent in memory cost and, when agents hold enough data with respect to the network size and inverse spectral gap, a linear speed up in computational run-time for any network topology. We present simulations that show how the number of random features, iterations and samples impact predictive performance.
APA
Richards, D., Rebeschini, P. &amp; Rosasco, L. (2020). Decentralised Learning with Random Features and Distributed Gradient Descent. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8105-8115. Available from https://proceedings.mlr.press/v119/richards20a.html.
