LoRA Training in the NTK Regime has No Spurious Local Minima

Uijeong Jang; Jason D. Lee; Ernest K. Ryu

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:21306-21328, 2024.

Abstract

Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLM), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.
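The setting in the abstract can be illustrated with a small toy sketch. It assumes the standard NTK linearization $f(x; W_0 + \Delta W) \approx f(x; W_0) + \langle \nabla_W f(x; W_0), \Delta W \rangle$ and the usual LoRA factorization $\Delta W = BA$ with $B \in \mathbb{R}^{m \times r}$ and $A \in \mathbb{R}^{r \times n}$, so that fitting $N$ data points becomes a least-squares problem that is linear in $\Delta W$ but bilinear in $(B, A)$. The function name `linearized_lora_fit`, the random feature matrices `G` (stand-ins for real network gradients), the synthetic targets, and all sizes and step sizes are hypothetical choices for illustration; the rank $r = \lceil \sqrt{N} \rceil$ is only an order-of-magnitude choice, since the abstract's bounds hold up to constants. This is not the authors' code or experimental setup.

```python
import numpy as np


def linearized_lora_fit(N=16, m=8, n=8, r=None, steps=10000, lr=0.01, seed=0):
    """Fit a toy linearized (NTK-style) model with a rank-r LoRA update.

    Hypothetical setup for illustration only: G[i] stands in for the gradient
    of the network output at the pretrained weights W0 on data point i, so the
    linearized prediction is <G[i], dW> with dW = B @ A.
    """
    rng = np.random.default_rng(seed)
    if r is None:
        # Rank on the order of sqrt(N); the paper's bounds hold only up to constants.
        r = int(np.ceil(np.sqrt(N)))
    # Synthetic features scaled so that ||G[i]||_F is about 1, and synthetic targets.
    G = rng.standard_normal((N, m, n)) / np.sqrt(m * n)
    y = rng.standard_normal(N)
    # LoRA-style initialization: B = 0, so dW = B @ A = 0 at the start.
    B = np.zeros((m, r))
    A = rng.standard_normal((r, n)) / np.sqrt(n)

    def loss(B, A):
        # Squared loss of the linearized predictions <G[i], B @ A>.
        preds = np.einsum("imn,mn->i", G, B @ A)
        return 0.5 * np.sum((preds - y) ** 2)

    initial = loss(B, A)
    for _ in range(steps):
        resid = np.einsum("imn,mn->i", G, B @ A) - y
        grad_W = np.einsum("i,imn->mn", resid, G)  # gradient w.r.t. dW
        # Chain rule through dW = B @ A; both gradients use the current B and A.
        grad_B, grad_A = grad_W @ A.T, B.T @ grad_W
        B = B - lr * grad_B
        A = A - lr * grad_A
    return r, initial, loss(B, A)


r, initial, final = linearized_lora_fit()
print(f"rank={r} initial loss={initial:.4f} final loss={final:.2e}")
```

Because the loss is nonconvex in the factors $(B, A)$, plain gradient descent has no general convergence guarantee here; claim (ii) of the abstract is that, in the NTK regime and for $r\gtrsim\sqrt{N}$, this landscape has no spurious local minima. The toy run only shows gradient descent reducing the loss on one synthetic instance and does not verify the theorem.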

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-jang24d,
  title     = {{L}o{RA} Training in the {NTK} Regime has No Spurious Local Minima},
  author    = {Jang, Uijeong and Lee, Jason D. and Ryu, Ernest K.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {21306--21328},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/jang24d/jang24d.pdf},
  url       = {https://proceedings.mlr.press/v235/jang24d.html},
  abstract  = {Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLM), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.}
}

Endnote

%0 Conference Paper
%T LoRA Training in the NTK Regime has No Spurious Local Minima
%A Uijeong Jang
%A Jason D. Lee
%A Ernest K. Ryu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-jang24d
%I PMLR
%P 21306--21328
%U https://proceedings.mlr.press/v235/jang24d.html
%V 235
%X Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLM), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.

APA


Jang, U., Lee, J.D. & Ryu, E.K. (2024). LoRA Training in the NTK Regime has No Spurious Local Minima. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:21306-21328. Available from https://proceedings.mlr.press/v235/jang24d.html.