Flora: Low-Rank Adapters Are Secretly Gradient Compressors

Yongchang Hao, Yanshuai Cao, Lili Mou
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:17554-17571, 2024.

Abstract

Although large neural networks demonstrate remarkable abilities across different tasks, training them requires excessive memory to store the optimization states. To alleviate this, low-rank adaptation (LoRA) has been proposed to reduce the optimization states by training fewer parameters. However, LoRA restricts the overall weight update matrices to be low-rank, which limits model performance. In this work, we investigate the dynamics of LoRA and identify that it can be approximated by a random projection. Based on this observation, we propose Flora, which achieves high-rank updates by resampling the projection matrices while retaining the sublinear space complexity of the optimization states. We conduct experiments across different tasks and model architectures to verify the effectiveness of our approach.
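The idea sketched in the abstract lends itself to a short illustration. Below is a minimal sketch in plain NumPy (not the authors' implementation, and all names such as train_flora_sketch, grad_fn, and resample_every are illustrative assumptions) of momentum SGD whose momentum is kept in a compressed form: gradients are down-projected with a random matrix, the optimizer state lives in the compressed space, and the projection is resampled periodically so that the accumulated update is not confined to a single low-rank subspace.

import numpy as np

def train_flora_sketch(grad_fn, W, steps, lr=1e-2, beta=0.9,
                       rank=8, resample_every=100, seed=0):
    """Momentum SGD with a compressed momentum state (illustrative sketch).

    The full momentum matrix (same shape as W, i.e. m x n) is never stored.
    Only a down-projected momentum of shape (m x rank) and a random
    projection of shape (n x rank) are kept, so the extra optimizer memory
    is O((m + n) * rank) rather than O(m * n). Resampling the projection
    lets the accumulated weight update become high-rank over time.
    """
    m, n = W.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, rank)) / np.sqrt(rank)  # random projection
    M = np.zeros((m, rank))                             # compressed momentum

    for t in range(steps):
        if t > 0 and t % resample_every == 0:
            # Resample the projection and carry the old momentum over to the
            # new subspace; A.T @ A_new is only rank x rank, so the full
            # m x n momentum is never materialized.
            A_new = rng.standard_normal((n, rank)) / np.sqrt(rank)
            M = M @ (A.T @ A_new)
            A = A_new

        G = grad_fn(W)                  # full gradient, shape m x n
        M = beta * M + G @ A            # accumulate in compressed space
        W = W - lr * (M @ A.T)          # decompress only for the update
    return W

For a toy check, grad_fn can be any function returning an m x n gradient, e.g. lambda W: 2 * (W - target) for a least-squares objective; this is only meant to convey the gradient-compression view, not the exact update rule of the paper.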

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-hao24a,
  title     = {Flora: Low-Rank Adapters Are Secretly Gradient Compressors},
  author    = {Hao, Yongchang and Cao, Yanshuai and Mou, Lili},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {17554--17571},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hao24a/hao24a.pdf},
  url       = {https://proceedings.mlr.press/v235/hao24a.html},
  abstract  = {Despite large neural networks demonstrating remarkable abilities to complete different tasks, they require excessive memory usage to store the optimization states for training. To alleviate this, the low-rank adaptation (LoRA) is proposed to reduce the optimization states by training fewer parameters. However, LoRA restricts overall weight update matrices to be low-rank, limiting the model performance. In this work, we investigate the dynamics of LoRA and identify that it can be approximated by a random projection. Based on this observation, we propose Flora, which is able to achieve high-rank updates by resampling the projection matrices while enjoying the sublinear space complexity of optimization states. We conduct experiments across different tasks and model architectures to verify the effectiveness of our approach.}
}
Endnote
%0 Conference Paper
%T Flora: Low-Rank Adapters Are Secretly Gradient Compressors
%A Yongchang Hao
%A Yanshuai Cao
%A Lili Mou
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-hao24a
%I PMLR
%P 17554--17571
%U https://proceedings.mlr.press/v235/hao24a.html
%V 235
%X Despite large neural networks demonstrating remarkable abilities to complete different tasks, they require excessive memory usage to store the optimization states for training. To alleviate this, the low-rank adaptation (LoRA) is proposed to reduce the optimization states by training fewer parameters. However, LoRA restricts overall weight update matrices to be low-rank, limiting the model performance. In this work, we investigate the dynamics of LoRA and identify that it can be approximated by a random projection. Based on this observation, we propose Flora, which is able to achieve high-rank updates by resampling the projection matrices while enjoying the sublinear space complexity of optimization states. We conduct experiments across different tasks and model architectures to verify the effectiveness of our approach.
APA
Hao, Y., Cao, Y. & Mou, L. (2024). Flora: Low-Rank Adapters Are Secretly Gradient Compressors. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:17554-17571. Available from https://proceedings.mlr.press/v235/hao24a.html.