SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:29768-29783, 2025.

Abstract

Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. Also, we systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to $2.2\times$ and achieves a measured speedup of up to $1.6\times$ while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
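The core idea in the abstract, a training-free SVD estimator that cheaply predicts which output channels will be active so that only a sparse subset of weight rows is used, can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's actual estimator: the dimensions, rank, and selection rule (`rank`, `keep`, top-k by estimated magnitude) are illustrative choices, and a real implementation would operate on transformer projections during fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, keep = 64, 128, 8, 32  # toy sizes (assumptions)

W = rng.normal(size=(d_out, d_in))  # a dense weight matrix
x = rng.normal(size=d_in)           # one input activation vector

# Offline, training-free step: build a low-rank SVD proxy of W.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
U_k = U[:, :rank] * S[:rank]        # (d_out, rank), singular values folded in
V_k = Vt[:rank]                     # (rank, d_in)

# Online step: rank-k estimate of per-channel output magnitude.
# Costs O(rank * (d_in + d_out)) instead of the full O(d_in * d_out).
est = np.abs(U_k @ (V_k @ x))

# Contextual sparsity: keep only the channels predicted most active
# for this input, and compute just those rows of W.
idx = np.argsort(est)[-keep:]
y_sparse = W[idx] @ x               # sparse subset of the full product W @ x
```

The selected rows `y_sparse` match the corresponding entries of the dense product `W @ x` exactly; the approximation lies only in which channels the cheap estimator chooses to skip.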

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-khaki25a,
  title     = {{S}parse{L}o{RA}: Accelerating {LLM} Fine-Tuning with Contextual Sparsity},
  author    = {Khaki, Samir and Li, Xiuyu and Guo, Junxian and Zhu, Ligeng and Plataniotis, Konstantinos N. and Yazdanbakhsh, Amir and Keutzer, Kurt and Han, Song and Liu, Zhijian},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {29768--29783},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/khaki25a/khaki25a.pdf},
  url       = {https://proceedings.mlr.press/v267/khaki25a.html},
  abstract  = {Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. Also, we systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to $2.2\times$ and a measured speedup of up to $1.6\times$ while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.}
}
Endnote
%0 Conference Paper
%T SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
%A Samir Khaki
%A Xiuyu Li
%A Junxian Guo
%A Ligeng Zhu
%A Konstantinos N. Plataniotis
%A Amir Yazdanbakhsh
%A Kurt Keutzer
%A Song Han
%A Zhijian Liu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-khaki25a
%I PMLR
%P 29768--29783
%U https://proceedings.mlr.press/v267/khaki25a.html
%V 267
%X Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. Also, we systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to $2.2\times$ and a measured speedup of up to $1.6\times$ while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
APA
Khaki, S., Li, X., Guo, J., Zhu, L., Plataniotis, K. N., Yazdanbakhsh, A., Keutzer, K., Han, S. & Liu, Z. (2025). SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:29768-29783. Available from https://proceedings.mlr.press/v267/khaki25a.html.