Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation

Zhan Zhuang, Xiequn Wang, Wei Li, Yulong Zhang, Qiushi Huang, Shuhao Chen, Xuehao Wang, Yanbin Wei, Yuhe Nie, Kede Ma, Yu Zhang, Ying Wei
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:80533-80550, 2025.

Abstract

Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters’ activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter’s marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.
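The abstract describes CoTo only at a high level: each LoRA adapter is kept active with a probability that grows over the course of fine-tuning, and deactivated adapters are simply skipped in the forward pass. The snippet below is a minimal PyTorch-style sketch of that idea under stated assumptions; the LoRALinear wrapper, the linear ramp schedule, and the names coto_activation_prob and sample_adapter_gates are illustrative inventions, not the authors' implementation (the official code is at the linked repository).

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank adapter that can be stochastically
    deactivated (gated off) during training."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # keep the pretrained weight frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank
        self.active = True                               # gate toggled by the schedule below

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if self.active:                                  # adapter contributes only when active
            out = out + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return out

def coto_activation_prob(step: int, total_steps: int,
                         p0: float = 0.0, warmup_frac: float = 0.75) -> float:
    """Assumed schedule: ramp the activation probability linearly from p0 to 1
    over the first warmup_frac of training, then keep every adapter active."""
    progress = min(1.0, step / max(1.0, total_steps * warmup_frac))
    return p0 + (1.0 - p0) * progress

def sample_adapter_gates(model: nn.Module, p: float) -> None:
    """Independently activate each LoRA adapter with probability p."""
    for module in model.modules():
        if isinstance(module, LoRALinear):
            module.active = bool(torch.rand(()) < p)

In a training loop, one would call sample_adapter_gates(model, coto_activation_prob(step, total_steps)) before each forward pass, so early steps train random subsets of adapters and later steps train them jointly; at evaluation time all adapters are left active. The particular ramp shape and warmup fraction here are assumptions for illustration only.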

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhuang25c,
  title     = {Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation},
  author    = {Zhuang, Zhan and Wang, Xiequn and Li, Wei and Zhang, Yulong and Huang, Qiushi and Chen, Shuhao and Wang, Xuehao and Wei, Yanbin and Nie, Yuhe and Ma, Kede and Zhang, Yu and Wei, Ying},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {80533--80550},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhuang25c/zhuang25c.pdf},
  url       = {https://proceedings.mlr.press/v267/zhuang25c.html},
  abstract  = {Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters’ activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter’s marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.}
}
Endnote
%0 Conference Paper
%T Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation
%A Zhan Zhuang
%A Xiequn Wang
%A Wei Li
%A Yulong Zhang
%A Qiushi Huang
%A Shuhao Chen
%A Xuehao Wang
%A Yanbin Wei
%A Yuhe Nie
%A Kede Ma
%A Yu Zhang
%A Ying Wei
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhuang25c
%I PMLR
%P 80533--80550
%U https://proceedings.mlr.press/v267/zhuang25c.html
%V 267
%X Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters’ activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter’s marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.
APA
Zhuang, Z., Wang, X., Li, W., Zhang, Y., Huang, Q., Chen, S., Wang, X., Wei, Y., Nie, Y., Ma, K., Zhang, Y. & Wei, Y. (2025). Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:80533-80550. Available from https://proceedings.mlr.press/v267/zhuang25c.html.
