Collage: Light-Weight Low-Precision Strategy for LLM Training

Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith R Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Luke Huan
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:57459-57479, 2024.

Abstract

Training large models is plagued by intense compute cost and limited hardware memory. A practical solution is low-precision representation, but it is troubled by loss in numerical accuracy and unstable training, rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at critical locations in the training process. We propose Collage, which utilizes a multi-component float representation in low precision to accurately perform operations with numerical errors accounted for. To understand the impact of imprecision on training, we propose a simple and novel metric that tracks the lost information during training and differentiates various precision strategies. Our method works with commonly used low-precision formats such as half-precision ($16$-bit floating points) and can be naturally extended to even lower precision, such as $8$-bit. Experimental results show that pre-training using Collage removes the requirement of keeping $32$-bit floating-point copies of the model and attains similar or better training performance than the $(16, 32)$-bit mixed-precision strategy, with up to $3.7\times$ speedup and $\sim 15\%$ to $23\%$ less memory usage in practice. The code is available at https://github.com/amazon-science/collage.
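A note on the core mechanism: the multi-component float idea described in the abstract amounts to representing a quantity as a low-precision value plus an explicit low-precision error term, so that rounding error from each arithmetic step is carried along rather than discarded. The sketch below illustrates only that general error-compensation idea, via a Kahan-style compensated accumulator in bfloat16; it is not the paper's Collage algorithm, and the function compensated_add_ and the toy accumulation loop are assumptions made for this example.

import torch

def compensated_add_(value: torch.Tensor, error: torch.Tensor, update: torch.Tensor) -> None:
    # In-place compensated addition: the pair (value, error) acts as a
    # two-component float kept entirely in low precision (bfloat16 here).
    # Illustrative sketch only, not the paper's exact update rule.
    corrected = update + error                     # fold previously lost error into the update
    new_value = value + corrected                  # low-precision add (may round)
    error.copy_((value - new_value) + corrected)   # recover what the add just rounded away
    value.copy_(new_value)

# Toy comparison: accumulate 10,000 updates of 1e-4 (true sum = 1.0) in bfloat16.
updates = torch.full((10000,), 1e-4)
naive = torch.zeros((), dtype=torch.bfloat16)
comp_val = torch.zeros((), dtype=torch.bfloat16)
comp_err = torch.zeros((), dtype=torch.bfloat16)
reference = torch.zeros((), dtype=torch.float32)

for u in updates:
    naive += u.to(torch.bfloat16)   # plain bf16 accumulator: stalls once updates round to zero
    compensated_add_(comp_val, comp_err, u.to(torch.bfloat16))
    reference += u

print(f"fp32 reference   : {reference.item():.4f}")
print(f"naive bf16       : {naive.float().item():.4f}")
print(f"compensated bf16 : {(comp_val.float() + comp_err.float()).item():.4f}")

Reading the state as value + error recovers information that a plain bfloat16 accumulator silently loses, which is the sense in which error can be compensated at critical locations without keeping a 32-bit master copy.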

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-yu24d,
  title     = {Collage: Light-Weight Low-Precision Strategy for {LLM} Training},
  author    = {Yu, Tao and Gupta, Gaurav and Gopalswamy, Karthick and Mamidala, Amith R and Zhou, Hao and Huynh, Jeffrey and Park, Youngsuk and Diamant, Ron and Deoras, Anoop and Huan, Luke},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {57459--57479},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/yu24d/yu24d.pdf},
  url       = {https://proceedings.mlr.press/v235/yu24d.html}
}
Endnote
%0 Conference Paper
%T Collage: Light-Weight Low-Precision Strategy for LLM Training
%A Tao Yu
%A Gaurav Gupta
%A Karthick Gopalswamy
%A Amith R Mamidala
%A Hao Zhou
%A Jeffrey Huynh
%A Youngsuk Park
%A Ron Diamant
%A Anoop Deoras
%A Luke Huan
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-yu24d
%I PMLR
%P 57459--57479
%U https://proceedings.mlr.press/v235/yu24d.html
%V 235
APA
Yu, T., Gupta, G., Gopalswamy, K., Mamidala, A. R., Zhou, H., Huynh, J., Park, Y., Diamant, R., Deoras, A. & Huan, L. (2024). Collage: Light-Weight Low-Precision Strategy for LLM Training. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:57459-57479. Available from https://proceedings.mlr.press/v235/yu24d.html.