On Efficient Constructions of Checkpoints

Yu Chen, Zhenming Liu, Bin Ren, Xin Jin
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1627-1636, 2020.

Abstract

Efficient construction of checkpoints/snapshots is a critical tool for training and diagnosing deep learning models. In this paper, we propose a lossy compression scheme for checkpoint constructions (called LC-Checkpoint). LC-Checkpoint simultaneously maximizes the compression rate and optimizes the recovery speed, under the assumption that SGD is used to train the model. LC-Checkpoint uses quantization and priority promotion to store the most crucial information for SGD to recover, and then uses a Huffman coding to leverage the non-uniform distribution of the gradient scales. Our extensive experiments show that LC-Checkpoint achieves a compression rate up to 28× and recovery speedup up to 5.77× over a state-of-the-art algorithm (SCAR).
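
The final step named in the abstract, Huffman coding over a non-uniform distribution of quantized values, can be illustrated with a minimal sketch. This is not the paper's implementation: the (sign, exponent) bucketing, the Gaussian test data, and every function name below are assumptions made for illustration, and priority promotion is left out entirely.

```python
import heapq
import math
import random
from collections import Counter


def quantize(delta):
    """Map each value to a (sign, exponent) bucket; zero gets its own bucket.

    Assumption: a simple exponent-based bucketing chosen for illustration.
    """
    symbols = []
    for x in delta:
        if x == 0.0:
            symbols.append((0, 0))
        else:
            symbols.append((1 if x > 0 else -1, math.frexp(abs(x))[1]))
    return symbols


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def average_bits(symbols):
    """Average Huffman code length per symbol, weighted by frequency."""
    lengths = huffman_code_lengths(symbols)
    counts = Counter(symbols)
    return sum(counts[s] * lengths[s] for s in counts) / len(symbols)


# Hypothetical "parameter update" vector: small Gaussian noise.
random.seed(0)
delta = [random.gauss(0.0, 1e-3) for _ in range(10_000)]
symbols = quantize(delta)
fixed_bits = math.ceil(math.log2(len(set(symbols))))
print(f"fixed-length: {fixed_bits} bits/value, "
      f"Huffman: {average_bits(symbols):.2f} bits/value")
```

Because a few exponent buckets hold most of the values, the frequent buckets receive short codes, which is the effect the abstract attributes to the non-uniform distribution of gradient scales.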

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-chen20m,
  title = {On Efficient Constructions of Checkpoints},
  author = {Chen, Yu and Liu, Zhenming and Ren, Bin and Jin, Xin},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {1627--1636},
  year = {2020},
  editor = {Hal Daumé III and Aarti Singh},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/chen20m/chen20m.pdf},
  url = {http://proceedings.mlr.press/v119/chen20m.html},
  abstract = {Efficient construction of checkpoints/snapshots is a critical tool for training and diagnosing deep learning models. In this paper, we propose a lossy compression scheme for checkpoint constructions (called LC-Checkpoint). LC-Checkpoint simultaneously maximizes the compression rate and optimizes the recovery speed, under the assumption that SGD is used to train the model. LC-Checkpoint uses quantization and priority promotion to store the most crucial information for SGD to recover, and then uses a Huffman coding to leverage the non-uniform distribution of the gradient scales. Our extensive experiments show that LC-Checkpoint achieves a compression rate up to 28{\texttimes} and recovery speedup up to 5.77{\texttimes} over a state-of-the-art algorithm (SCAR).}
}
Endnote
%0 Conference Paper
%T On Efficient Constructions of Checkpoints
%A Yu Chen
%A Zhenming Liu
%A Bin Ren
%A Xin Jin
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-chen20m
%I PMLR
%P 1627--1636
%U http://proceedings.mlr.press/v119/chen20m.html
%V 119
%X Efficient construction of checkpoints/snapshots is a critical tool for training and diagnosing deep learning models. In this paper, we propose a lossy compression scheme for checkpoint constructions (called LC-Checkpoint). LC-Checkpoint simultaneously maximizes the compression rate and optimizes the recovery speed, under the assumption that SGD is used to train the model. LC-Checkpoint uses quantization and priority promotion to store the most crucial information for SGD to recover, and then uses a Huffman coding to leverage the non-uniform distribution of the gradient scales. Our extensive experiments show that LC-Checkpoint achieves a compression rate up to 28× and recovery speedup up to 5.77× over a state-of-the-art algorithm (SCAR).
APA
Chen, Y., Liu, Z., Ren, B. & Jin, X. (2020). On Efficient Constructions of Checkpoints. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:1627-1636. Available from http://proceedings.mlr.press/v119/chen20m.html.
