Chebyshev Polynomial Codes: Task Entanglement-based Coding for Distributed Matrix Multiplication

Sangwoo Hong, Heecheol Yang, Youngseok Yoon, Taehyun Cho, Jungwoo Lee
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:4319-4327, 2021.

Abstract

Distributed computing has been a prominent solution for efficiently processing massive datasets in parallel. However, stragglers are a major concern, as they slow down the overall speed of distributed computing. To deal with this problem, we consider a distributed matrix multiplication scenario in which a master assigns multiple tasks to each worker in order to exploit stragglers’ computing ability (which is typically wasted in conventional distributed computing). We propose Chebyshev polynomial codes, which achieve an order-wise improvement in the encoding complexity at the master and in the communication load of distributed matrix multiplication through task entanglement. The key idea of task entanglement is to reduce the number of encoded matrices for the multiple tasks assigned to each worker by intertwining the encoded matrices. We experimentally demonstrate that, in cloud environments, Chebyshev polynomial codes provide a significant reduction in the overall processing time of distributed matrix multiplication, which is a key computational component in modern deep learning.
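
To make the setting concrete, below is a minimal sketch of the baseline polynomial-coding idea for straggler-tolerant distributed matrix multiplication that this line of work builds on, not the paper's Chebyshev construction itself. Everything here is an illustrative assumption (block counts m and n, worker count N, the NumPy simulation of workers and stragglers): the master encodes row blocks of A and column blocks of B as polynomial evaluations, each worker computes one small encoded product, and the master recovers A·B by interpolation from any m·n completed workers.

import numpy as np

# Toy sketch of baseline polynomial coding (not the paper's
# Chebyshev polynomial codes). A is split into m row blocks,
# B into n column blocks; worker i receives one encoded pair
# and computes a single small product. Any m*n finished
# workers suffice to recover A @ B by interpolation.

m, n = 2, 2                      # block partitions of A (rows) and B (cols)
N = 6                            # number of workers (N >= m*n for recovery)
rng = np.random.default_rng(0)

A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))
A_blocks = np.split(A, m, axis=0)          # A_0, ..., A_{m-1}
B_blocks = np.split(B, n, axis=1)          # B_0, ..., B_{n-1}

xs = np.arange(1, N + 1, dtype=float)      # distinct evaluation points

# Encoding: worker i gets A~(x_i) = sum_j A_j x_i^j and
#           B~(x_i) = sum_k B_k x_i^(k*m), so that the
# coefficient of x^(j + k*m) in A~(x) B~(x) is A_j B_k.
A_enc = [sum(Aj * x**j for j, Aj in enumerate(A_blocks)) for x in xs]
B_enc = [sum(Bk * x**(k * m) for k, Bk in enumerate(B_blocks)) for x in xs]

# Each worker computes one product; simulate stragglers by
# keeping only the first m*n completed results.
results = [Ae @ Be for Ae, Be in zip(A_enc, B_enc)]
done = list(range(m * n))                  # indices of non-straggling workers

# Decoding: each entry of A~(x) B~(x) is a degree-(m*n - 1)
# polynomial in x; solve a Vandermonde system to interpolate
# it from m*n evaluations, then read off the coefficients.
V = np.vander(xs[done], m * n, increasing=True)
stacked = np.stack([results[i] for i in done])    # (m*n, rows, cols)
coeffs = np.linalg.solve(V, stacked.reshape(m * n, -1))
coeffs = coeffs.reshape(m * n, *results[0].shape)

# The coefficient of x^(j + k*m) is the (j, k) block of A @ B.
C_blocks = [[coeffs[j + k * m] for k in range(n)] for j in range(m)]
C = np.block(C_blocks)
assert np.allclose(C, A @ B)

In this baseline, every task assigned to a worker needs its own encoded pair. In the paper's setting, where each worker receives multiple tasks, task entanglement lets those tasks share encoded matrices by intertwining them, which is what yields the order-wise savings in encoding complexity and communication load claimed in the abstract.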

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-hong21b,
  title     = {Chebyshev Polynomial Codes: Task Entanglement-based Coding for Distributed Matrix Multiplication},
  author    = {Hong, Sangwoo and Yang, Heecheol and Yoon, Youngseok and Cho, Taehyun and Lee, Jungwoo},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {4319--4327},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/hong21b/hong21b.pdf},
  url       = {https://proceedings.mlr.press/v139/hong21b.html}
}
Endnote
%0 Conference Paper
%T Chebyshev Polynomial Codes: Task Entanglement-based Coding for Distributed Matrix Multiplication
%A Sangwoo Hong
%A Heecheol Yang
%A Youngseok Yoon
%A Taehyun Cho
%A Jungwoo Lee
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-hong21b
%I PMLR
%P 4319--4327
%U https://proceedings.mlr.press/v139/hong21b.html
%V 139
APA
Hong, S., Yang, H., Yoon, Y., Cho, T., & Lee, J. (2021). Chebyshev Polynomial Codes: Task Entanglement-based Coding for Distributed Matrix Multiplication. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:4319-4327. Available from https://proceedings.mlr.press/v139/hong21b.html.