Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent

Pedro J Soto, Ilia Ilmer, Haibin Guan, Jun Li
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20444-20458, 2022.

Abstract

Coded distributed computation has become common practice for performing gradient descent on large datasets to mitigate stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and furthermore optimizes the codes by performing lossy compression on the derivative codewords by maximizing the information contained in the codewords while minimizing the information between the codewords. The utility of this application of coding theory is a geometrical consequence of the observed fact in optimization research that noise is tolerable, sometimes even helpful, in gradient descent based learning algorithms since it helps avoid overfitting and local minima. This stands in contrast with much current conventional work on distributed coded computation which focuses on recovering all of the data from the workers. A second further contribution is that the low-weight nature of the coding scheme allows for asynchronous gradient updates since the code can be iteratively decoded; i.e., a worker’s task can immediately be updated into the larger gradient. The directional derivative is always a linear function of the direction vectors; thus, our framework is robust since it can apply linear coding techniques to general machine learning frameworks such as deep neural networks.
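The decoding idea rests on the linearity noted at the end of the abstract: since the directional derivative satisfies D_v f(w) = ⟨∇f(w), v⟩, it is linear in the direction v, so a master can recover (an approximation of) the gradient from coded directional derivatives by solving a linear system. The following is a minimal sketch of that principle only, not the paper's actual encoding or compression scheme; the quadratic loss, the random encoding matrix `G`, and the straggler pattern are all illustrative assumptions.

```python
# Sketch: directional derivatives are linear in the direction vector,
# so coded directions can be decoded with linear algebra.
# This is NOT the paper's scheme, just the underlying linearity.
import numpy as np

rng = np.random.default_rng(0)
d = 5                                  # parameter dimension (assumed)
A = rng.standard_normal((20, d))       # toy least-squares data
b = rng.standard_normal(20)

def grad(w):
    # Gradient of the loss f(w) = ||A w - b||^2 / 2.
    return A.T @ (A @ w - b)

def directional_derivative(w, v):
    # D_v f(w) = <grad f(w), v>: linear in v.
    return grad(w) @ v

w = rng.standard_normal(d)
G = rng.standard_normal((8, d))        # one coded direction per worker

# Each worker i returns the scalar codeword D_{G[i]} f(w).
codewords = np.array([directional_derivative(w, G[i]) for i in range(8)])

# Decode from a straggler-free subset S by solving G[S] g ~= codewords[S].
S = [0, 1, 2, 4, 5, 7]                 # workers 3 and 6 are stragglers
g_hat, *_ = np.linalg.lstsq(G[S], codewords[S], rcond=None)

# Because the codewords are exact inner products and G[S] has full
# column rank, the decoded vector matches the true gradient.
print(np.allclose(g_hat, grad(w)))
```

With fewer than `d` surviving codewords the system is underdetermined and the decoded gradient is only approximate, which is exactly the regime where the abstract's observation applies: gradient descent tolerates, and sometimes benefits from, such noise.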

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-soto22a,
  title     = {Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent},
  author    = {Soto, Pedro J and Ilmer, Ilia and Guan, Haibin and Li, Jun},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {20444--20458},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/soto22a/soto22a.pdf},
  url       = {https://proceedings.mlr.press/v162/soto22a.html},
  abstract  = {Coded distributed computation has become common practice for performing gradient descent on large datasets to mitigate stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and furthermore optimizes the codes by performing lossy compression on the derivative codewords by maximizing the information contained in the codewords while minimizing the information between the codewords. The utility of this application of coding theory is a geometrical consequence of the observed fact in optimization research that noise is tolerable, sometimes even helpful, in gradient descent based learning algorithms since it helps avoid overfitting and local minima. This stands in contrast with much current conventional work on distributed coded computation which focuses on recovering all of the data from the workers. A second further contribution is that the low-weight nature of the coding scheme allows for asynchronous gradient updates since the code can be iteratively decoded; i.e., a worker’s task can immediately be updated into the larger gradient. The directional derivative is always a linear function of the direction vectors; thus, our framework is robust since it can apply linear coding techniques to general machine learning frameworks such as deep neural networks.}
}
Endnote
%0 Conference Paper
%T Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent
%A Pedro J Soto
%A Ilia Ilmer
%A Haibin Guan
%A Jun Li
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-soto22a
%I PMLR
%P 20444--20458
%U https://proceedings.mlr.press/v162/soto22a.html
%V 162
%X Coded distributed computation has become common practice for performing gradient descent on large datasets to mitigate stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and furthermore optimizes the codes by performing lossy compression on the derivative codewords by maximizing the information contained in the codewords while minimizing the information between the codewords. The utility of this application of coding theory is a geometrical consequence of the observed fact in optimization research that noise is tolerable, sometimes even helpful, in gradient descent based learning algorithms since it helps avoid overfitting and local minima. This stands in contrast with much current conventional work on distributed coded computation which focuses on recovering all of the data from the workers. A second further contribution is that the low-weight nature of the coding scheme allows for asynchronous gradient updates since the code can be iteratively decoded; i.e., a worker’s task can immediately be updated into the larger gradient. The directional derivative is always a linear function of the direction vectors; thus, our framework is robust since it can apply linear coding techniques to general machine learning frameworks such as deep neural networks.
APA
Soto, P.J., Ilmer, I., Guan, H. & Li, J. (2022). Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:20444-20458. Available from https://proceedings.mlr.press/v162/soto22a.html.