GACT: Activation Compressed Training for Generic Network Architectures

Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:14139-14152, 2022.

Abstract

Training large neural network (NN) models requires extensive memory resources, and Activation Compression Training (ACT) is a promising approach to reduce training memory footprint. This paper presents GACT, an ACT framework to support a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT’s approximate gradient, we prove the convergence of GACT without prior knowledge on operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size, with negligible accuracy loss.
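To illustrate the activation-compression idea the abstract describes, the sketch below wraps a linear layer in a custom torch.autograd.Function that quantizes the saved activation to int8 in the forward pass and dequantizes it before computing gradients in the backward pass. This is a minimal, hypothetical example of the general ACT pattern under stated assumptions, not GACT's actual API; the class and variable names are invented for illustration.

# Minimal sketch of the activation-compression idea (not GACT's actual API).
# The activation saved for backward is quantized to int8 per tensor, then
# dequantized when the gradient is computed, trading a little gradient
# accuracy for roughly 4x less activation memory (int8 vs. fp32).
import torch

class CompressedLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        # Per-tensor symmetric quantization of the input activation.
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        x_q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
        ctx.save_for_backward(x_q, weight)
        ctx.scale = scale
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_q, weight = ctx.saved_tensors
        # Dequantize the stored activation; the weight gradient is computed
        # from this approximation, which is the source of ACT's gradient error.
        x_hat = x_q.to(grad_out.dtype) * ctx.scale
        grad_x = grad_out @ weight
        grad_w = grad_out.t() @ x_hat
        return grad_x, grad_w

Calling CompressedLinear.apply(x, w) in place of an ordinary linear layer applies a fixed int8 compression to one tensor; GACT's contribution, per the abstract, is to choose the compression ratio for every tensor adaptively at run time based on its estimated impact on the gradient.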

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-liu22v,
  title     = {{GACT}: Activation Compressed Training for Generic Network Architectures},
  author    = {Liu, Xiaoxuan and Zheng, Lianmin and Wang, Dequan and Cen, Yukuo and Chen, Weize and Han, Xu and Chen, Jianfei and Liu, Zhiyuan and Tang, Jie and Gonzalez, Joey and Mahoney, Michael and Cheung, Alvin},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {14139--14152},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/liu22v/liu22v.pdf},
  url       = {https://proceedings.mlr.press/v162/liu22v.html},
  abstract  = {Training large neural network (NN) models requires extensive memory resources, and Activation Compression Training (ACT) is a promising approach to reduce training memory footprint. This paper presents GACT, an ACT framework to support a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT’s approximate gradient, we prove the convergence of GACT without prior knowledge on operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size, with negligible accuracy loss.}
}
Endnote
%0 Conference Paper
%T GACT: Activation Compressed Training for Generic Network Architectures
%A Xiaoxuan Liu
%A Lianmin Zheng
%A Dequan Wang
%A Yukuo Cen
%A Weize Chen
%A Xu Han
%A Jianfei Chen
%A Zhiyuan Liu
%A Jie Tang
%A Joey Gonzalez
%A Michael Mahoney
%A Alvin Cheung
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-liu22v
%I PMLR
%P 14139--14152
%U https://proceedings.mlr.press/v162/liu22v.html
%V 162
%X Training large neural network (NN) models requires extensive memory resources, and Activation Compression Training (ACT) is a promising approach to reduce training memory footprint. This paper presents GACT, an ACT framework to support a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT’s approximate gradient, we prove the convergence of GACT without prior knowledge on operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size, with negligible accuracy loss.
APA
Liu, X., Zheng, L., Wang, D., Cen, Y., Chen, W., Han, X., Chen, J., Liu, Z., Tang, J., Gonzalez, J., Mahoney, M. & Cheung, A. (2022). GACT: Activation Compressed Training for Generic Network Architectures. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:14139-14152. Available from https://proceedings.mlr.press/v162/liu22v.html.