LoRA Training Provably Converges to a Low-Rank Global Minimum Or It Fails Loudly (But it Probably Won’t Fail)

Junsu Kim, Jaeyeon Kim, Ernest K. Ryu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:30224-30247, 2025.

Abstract

Low-rank adaptation (LoRA) has become a standard approach for fine-tuning large foundation models. However, our theoretical understanding of LoRA remains limited as prior analyses of LoRA’s training dynamics either rely on linearization arguments or consider highly simplified setups. In this work, we analyze the LoRA loss landscape without such restrictive assumptions. We define two regimes: a "special regime", which includes idealized setups where linearization arguments hold, and a "generic regime" representing more realistic setups where linearization arguments do not hold. In the generic regime, we show that LoRA training converges to a global minimizer with low rank and small magnitude, or a qualitatively distinct solution with high rank and large magnitude. Finally, we argue that the zero-initialization and weight decay in LoRA training induce an implicit bias toward the low-rank, small-magnitude region of the parameter space—where global minima lie—thus shedding light on why LoRA training usually succeeds in finding global minima.
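For readers less familiar with the setup the abstract refers to, the following is a minimal sketch of the LoRA parametrization being analyzed: a frozen weight W is perturbed by a trainable low-rank product BA, with B initialized to zero (so training starts exactly at the pretrained model) and weight decay applied only to the adapter factors. The class name, shapes, and hyperparameter values below are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, d_in, d_out, r=8, alpha=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)  # frozen pretrained W
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # small random init for A
        self.B = nn.Parameter(torch.zeros(d_out, r))         # zero init for B, so BA = 0 at start
        self.scale = alpha / r

    def forward(self, x):
        # Adapted weight: W + (alpha/r) * B A, a rank-at-most-r perturbation of W
        return x @ (self.weight + self.scale * self.B @ self.A).T

# Weight decay is applied to the LoRA factors A and B only, matching the
# zero-initialization-plus-weight-decay training regime discussed in the abstract.
layer = LoRALinear(d_in=512, d_out=512, r=8)
opt = torch.optim.AdamW([layer.A, layer.B], lr=1e-4, weight_decay=0.01)
```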

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-kim25n,
  title     = {{L}o{RA} Training Provably Converges to a Low-Rank Global Minimum Or It Fails Loudly ({B}ut it Probably Won’t Fail)},
  author    = {Kim, Junsu and Kim, Jaeyeon and Ryu, Ernest K.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {30224--30247},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/kim25n/kim25n.pdf},
  url       = {https://proceedings.mlr.press/v267/kim25n.html},
  abstract  = {Low-rank adaptation (LoRA) has become a standard approach for fine-tuning large foundation models. However, our theoretical understanding of LoRA remains limited as prior analyses of LoRA’s training dynamics either rely on linearization arguments or consider highly simplified setups. In this work, we analyze the LoRA loss landscape without such restrictive assumptions. We define two regimes: a "special regime", which includes idealized setups where linearization arguments hold, and a "generic regime" representing more realistic setups where linearization arguments do not hold. In the generic regime, we show that LoRA training converges to a global minimizer with low rank and small magnitude, or a qualitatively distinct solution with high rank and large magnitude. Finally, we argue that the zero-initialization and weight decay in LoRA training induce an implicit bias toward the low-rank, small-magnitude region of the parameter space—where global minima lie—thus shedding light on why LoRA training usually succeeds in finding global minima.}
}
Endnote
%0 Conference Paper
%T LoRA Training Provably Converges to a Low-Rank Global Minimum Or It Fails Loudly (But it Probably Won’t Fail)
%A Junsu Kim
%A Jaeyeon Kim
%A Ernest K. Ryu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-kim25n
%I PMLR
%P 30224--30247
%U https://proceedings.mlr.press/v267/kim25n.html
%V 267
%X Low-rank adaptation (LoRA) has become a standard approach for fine-tuning large foundation models. However, our theoretical understanding of LoRA remains limited as prior analyses of LoRA’s training dynamics either rely on linearization arguments or consider highly simplified setups. In this work, we analyze the LoRA loss landscape without such restrictive assumptions. We define two regimes: a "special regime", which includes idealized setups where linearization arguments hold, and a "generic regime" representing more realistic setups where linearization arguments do not hold. In the generic regime, we show that LoRA training converges to a global minimizer with low rank and small magnitude, or a qualitatively distinct solution with high rank and large magnitude. Finally, we argue that the zero-initialization and weight decay in LoRA training induce an implicit bias toward the low-rank, small-magnitude region of the parameter space—where global minima lie—thus shedding light on why LoRA training usually succeeds in finding global minima.
APA
Kim, J., Kim, J. & Ryu, E.K. (2025). LoRA Training Provably Converges to a Low-Rank Global Minimum Or It Fails Loudly (But it Probably Won’t Fail). Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:30224-30247. Available from https://proceedings.mlr.press/v267/kim25n.html.