Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters

Yuhang Zhou, Zihua Zhao, Siyuan Du, Haolin Li, Jiangchao Yao, Ya Zhang, Yanfeng Wang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62294-62306, 2024.

Abstract

Training a unified model to take multiple targets into account is a trend towards artificial general intelligence. However, how to efficiently mitigate training conflicts among heterogeneous data collected from different domains or tasks remains under-explored. In this study, we explore leveraging a Mixture of Low-rank Adapters (MoLA) to mitigate conflicts in heterogeneous data training, which requires jointly training multiple low-rank adapters and their shared backbone. Specifically, we introduce two variants of MoLA, namely MoLA-Grad and MoLA-Router, to handle the target-aware and target-agnostic scenarios during inference, respectively. The former uses task identifiers to assign a personalized low-rank adapter to each task, disentangling task-specific knowledge into its adapter and thereby mitigating heterogeneity conflicts. The latter uses a novel Task-wise Decorrelation (TwD) loss to steer the router toward learning oriented weight combinations of adapters for homogeneous tasks, achieving a similar effect. We conduct comprehensive experiments to verify the superiority of MoLA over previous state-of-the-art methods and present an in-depth analysis of its working mechanism. Source code is available at: https://github.com/MediaBrain-SJTU/MoLA
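
As a concrete illustration of the architecture the abstract describes, below is a minimal PyTorch sketch of a MoLA-style linear layer covering both inference modes: a shared backbone weight plus several LoRA-style low-rank adapters, selected by a task identifier (MoLA-Grad) or mixed by a learned router (MoLA-Router). Everything here is an assumption for illustration, not the authors' implementation; the class name MoLALinear, the rank, the initialization, and the softmax router are hypothetical, and the Task-wise Decorrelation (TwD) loss is not reproduced.

    # Hypothetical sketch of a Mixture-of-Low-rank-Adapters (MoLA) linear layer.
    # Details (rank, scaling, router design) are illustrative assumptions, not
    # the authors' exact implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoLALinear(nn.Module):
        def __init__(self, in_dim, out_dim, num_adapters, rank=8):
            super().__init__()
            self.shared = nn.Linear(in_dim, out_dim)        # shared backbone weights
            # One low-rank adapter (B @ A) per expert, LoRA-style:
            # A is small random, B starts at zero so adapters begin as identity updates.
            self.A = nn.Parameter(torch.randn(num_adapters, rank, in_dim) * 0.01)
            self.B = nn.Parameter(torch.zeros(num_adapters, out_dim, rank))
            self.router = nn.Linear(in_dim, num_adapters)   # used by MoLA-Router

        def forward(self, x, task_id=None):
            base = self.shared(x)
            # Per-adapter low-rank update: delta[:, k] = x @ A[k]^T @ B[k]^T
            delta = torch.einsum('bi,kri->bkr', x, self.A)
            delta = torch.einsum('bkr,kor->bko', delta, self.B)
            if task_id is not None:
                # MoLA-Grad: the task identifier picks a personalized adapter.
                return base + delta[:, task_id]
            # MoLA-Router: a learned router mixes the adapters; the paper's
            # TwD loss would additionally regularize these weights (omitted here).
            w = F.softmax(self.router(x), dim=-1)           # (batch, num_adapters)
            return base + torch.einsum('bk,bko->bo', w, delta)

    layer = MoLALinear(in_dim=128, out_dim=128, num_adapters=4)
    x = torch.randn(32, 128)
    y_aware = layer(x, task_id=2)   # target-aware: adapter chosen by task identifier
    y_agnostic = layer(x)           # target-agnostic: adapters mixed by the router

In the target-aware mode, each task's gradient flows only through its own adapter, which is what disentangles task-specific knowledge from the shared backbone; in the target-agnostic mode, per the abstract, the TwD loss regularizes the routing weights so that homogeneous tasks receive similar adapter combinations.
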

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-zhou24w,
  title     = {Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters},
  author    = {Zhou, Yuhang and Zhao, Zihua and Du, Siyuan and Li, Haolin and Yao, Jiangchao and Zhang, Ya and Wang, Yanfeng},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {62294--62306},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhou24w/zhou24w.pdf},
  url       = {https://proceedings.mlr.press/v235/zhou24w.html}
}
Endnote
%0 Conference Paper
%T Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
%A Yuhang Zhou
%A Zihua Zhao
%A Siyuan Du
%A Haolin Li
%A Jiangchao Yao
%A Ya Zhang
%A Yanfeng Wang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zhou24w
%I PMLR
%P 62294--62306
%U https://proceedings.mlr.press/v235/zhou24w.html
%V 235
APA
Zhou, Y., Zhao, Z., Du, S., Li, H., Yao, J., Zhang, Y., & Wang, Y. (2024). Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:62294-62306. Available from https://proceedings.mlr.press/v235/zhou24w.html.