BSLoRA: Enhancing the Parameter Efficiency of LoRA with Intra-Layer and Inter-Layer Sharing

Yuhua Zhou, Ruifeng Li, Changhai Zhou, Fei Yang, Aimin Pan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:78883-78902, 2025.

Abstract

Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient method for fine-tuning large language models (LLMs) on downstream tasks. However, when multiple LoRA models are deployed simultaneously, standard LoRA introduces substantial trainable parameters, resulting in significant memory overhead and inference latency, particularly when a single server must support thousands of downstream tasks. While existing methods reduce stored parameters via parameter sharing, they fail to capture both local and global information simultaneously. To address this issue, we propose Bi-Share LoRA (BSLoRA), which extends local LoRA with intra-LoRA and inter-LoRA parameter sharing to better capture local and global information. This approach reduces trainable parameters while maintaining or even enhancing model performance. Additionally, we design three transformation methods to improve the compatibility and collaborative efficiency of shared parameters with varying shapes, enhancing overall adaptability. Experiments on the 7B, 8B, and 13B versions of Llama show that BSLoRA, with only 44.59% of the parameters of standard LoRA, outperforms LoRA by approximately 0.33% on commonsense reasoning and 2.08% on the MMLU benchmark. Code is available at https://github.com/yuhua-zhou/BSLoRA.git.
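For a concrete picture of the sharing scheme described above, the following is a minimal, illustrative PyTorch sketch based only on the abstract: a frozen projection whose update sums a local LoRA pair, a pair shared within the layer, and a pair shared across all layers. It assumes every adapted projection has the same input and output width, so the shared pairs apply directly and the paper's shape-transformation methods are not needed; all names here are hypothetical and are not taken from the authors' released code.

import torch
import torch.nn as nn

def lora_pair(d_out: int, d_in: int, r: int):
    # One low-rank pair (B, A); B starts at zero so the initial update is zero.
    A = nn.Parameter(torch.randn(r, d_in) * 0.01)
    B = nn.Parameter(torch.zeros(d_out, r))
    return A, B

class BiShareLoRALinear(nn.Module):
    """Frozen linear projection whose weight update is the sum of
    (i) a local LoRA pair owned by this projection,
    (ii) a pair shared by all projections in the same layer (intra-layer), and
    (iii) a pair shared across every layer of the model (inter-layer)."""

    def __init__(self, base: nn.Linear, r_local: int, intra_pair, inter_pair):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # pretrained weights stay frozen
            p.requires_grad_(False)
        d_in, d_out = base.in_features, base.out_features
        self.A_local, self.B_local = lora_pair(d_out, d_in, r_local)
        # Shared pairs are created once per layer / once per model and passed in,
        # so their parameters are stored and counted only a single time.
        self.A_intra, self.B_intra = intra_pair   # intra-layer sharing
        self.A_inter, self.B_inter = inter_pair   # inter-layer sharing

    def forward(self, x):
        delta = (x @ self.A_local.T) @ self.B_local.T \
              + (x @ self.A_intra.T) @ self.B_intra.T \
              + (x @ self.A_inter.T) @ self.B_inter.T
        return self.base(x) + delta

# Example: one model-wide pair plus one per-layer pair, reused by two projections.
d_model, r = 4096, 8
inter_pair = lora_pair(d_model, d_model, r)       # shared across all layers
intra_pair = lora_pair(d_model, d_model, r)       # shared within one layer
q_proj = BiShareLoRALinear(nn.Linear(d_model, d_model), r_local=4,
                           intra_pair=intra_pair, inter_pair=inter_pair)
v_proj = BiShareLoRALinear(nn.Linear(d_model, d_model), r_local=4,
                           intra_pair=intra_pair, inter_pair=inter_pair)
y = q_proj(torch.randn(2, 16, d_model))           # (batch, seq, d_model)

In this sketch only the local pairs scale with the number of adapted projections; the intra-layer and inter-layer pairs are stored once, which is the kind of sharing that lets BSLoRA use a fraction of standard LoRA's trainable parameters. The paper's actual transformation methods, which reconcile shared parameters with projections of differing shapes, are omitted here.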

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhou25k,
  title     = {{BSL}o{RA}: Enhancing the Parameter Efficiency of {L}o{RA} with Intra-Layer and Inter-Layer Sharing},
  author    = {Zhou, Yuhua and Li, Ruifeng and Zhou, Changhai and Yang, Fei and Pan, Aimin},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {78883--78902},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhou25k/zhou25k.pdf},
  url       = {https://proceedings.mlr.press/v267/zhou25k.html}
}
APA
Zhou, Y., Li, R., Zhou, C., Yang, F. & Pan, A. (2025). BSLoRA: Enhancing the Parameter Efficiency of LoRA with Intra-Layer and Inter-Layer Sharing. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:78883-78902. Available from https://proceedings.mlr.press/v267/zhou25k.html.
