On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists

Dongyang Fan, Bettina Messmer, Nikita Doikov, Martin Jaggi
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:15833-15861, 2025.

Abstract

On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate private learning with scarce data, Federated Learning has become a standard approach. However, it faces challenges such as computational resource heterogeneity and data heterogeneity among end users. We propose CoMiGS ($\textbf{Co}$llaborative learning with a $\textbf{Mi}$xture of $\textbf{G}$eneralists and $\textbf{S}$pecialists), the first approach to address both challenges. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, where the router is optimized using a separate validation set to ensure alignment with the target distribution. We solve our objective with alternating minimization, for which we provide a theoretical analysis. Our method shares generalist experts across users while localizing a varying number of specialist experts, thereby adapting to users’ computational resources and preserving privacy. Through extensive experiments, we show CoMiGS effectively balances general and personalized knowledge for each token generation. We demonstrate that CoMiGS remains robust against overfitting—due to the generalists’ regularizing effect—while adapting to local data through specialist expertise. We open source our codebase for collaborative LLMs.
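The abstract's key algorithmic ingredients can be read as a bi-level objective: router parameters $\phi$ minimize a validation loss $\mathcal{L}_{\text{val}}(\phi, \theta^\star(\phi))$, while expert parameters $\theta^\star(\phi) \in \arg\min_\theta \mathcal{L}_{\text{train}}(\phi, \theta)$ minimize the training loss, and the two levels are solved by alternating minimization. The sketch below is a minimal illustration of these ideas in PyTorch, not the authors' released codebase: the names (MoGSLayer, alternating_step, loss_fn) are hypothetical, and details such as top-$k$ routing, the number of shared generalists, and the federated aggregation of generalist weights across users are omitted.

# Minimal sketch (illustrative only, not the authors' implementation):
# one shared "generalist" expert plus local "specialist" experts, with a
# per-token router trained on a separate validation batch (bi-level view).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoGSLayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_specialists: int):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.generalist = ffn()      # would be shared/aggregated across users
        self.specialists = nn.ModuleList([ffn() for _ in range(n_specialists)])  # kept local
        self.router = nn.Linear(d_model, 1 + n_specialists)  # per-token mixture weights

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        weights = F.softmax(self.router(x), dim=-1)                # (B, T, 1+S)
        expert_outs = [self.generalist(x)] + [e(x) for e in self.specialists]
        expert_outs = torch.stack(expert_outs, dim=-1)             # (B, T, D, 1+S)
        return (expert_outs * weights.unsqueeze(-2)).sum(dim=-1)   # (B, T, D)

def alternating_step(model, loss_fn, train_batch, val_batch, expert_opt, router_opt):
    # Lower level: expert parameters fit the local training data.
    expert_opt.zero_grad()
    loss_fn(model, train_batch).backward()
    expert_opt.step()
    # Upper level: router parameters are updated on a held-out validation batch,
    # keeping routing decisions aligned with the target distribution.
    router_opt.zero_grad()
    loss_fn(model, val_batch).backward()
    router_opt.step()

In such a sketch, expert_opt would be built over the generalist and specialist parameters and router_opt over the router's parameters only; in the collaborative setting described above, the generalist's weights would additionally be averaged across users after local updates, while specialists and routers stay on-device.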

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-fan25h,
  title     = {On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists},
  author    = {Fan, Dongyang and Messmer, Bettina and Doikov, Nikita and Jaggi, Martin},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {15833--15861},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/fan25h/fan25h.pdf},
  url       = {https://proceedings.mlr.press/v267/fan25h.html},
  abstract  = {On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate private learning with scarce data, Federated Learning has become a standard approach. However, it faces challenges such as computational resource heterogeneity and data heterogeneity among end users. We propose CoMiGS ($\textbf{Co}$llaborative learning with a $\textbf{Mi}$xture of $\textbf{G}$eneralists and $\textbf{S}$pecialists), the first approach to address both challenges. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, where the router is optimized using a separate validation set to ensure alignment with the target distribution. We solve our objective with alternating minimization, for which we provide a theoretical analysis. Our method shares generalist experts across users while localizing a varying number of specialist experts, thereby adapting to users’ computational resources and preserving privacy. Through extensive experiments, we show CoMiGS effectively balances general and personalized knowledge for each token generation. We demonstrate that CoMiGS remains robust against overfitting—due to the generalists’ regularizing effect—while adapting to local data through specialist expertise. We open source our codebase for collaborative LLMs.}
}
Endnote
%0 Conference Paper
%T On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
%A Dongyang Fan
%A Bettina Messmer
%A Nikita Doikov
%A Martin Jaggi
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-fan25h
%I PMLR
%P 15833--15861
%U https://proceedings.mlr.press/v267/fan25h.html
%V 267
%X On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate private learning with scarce data, Federated Learning has become a standard approach. However, it faces challenges such as computational resource heterogeneity and data heterogeneity among end users. We propose CoMiGS ($\textbf{Co}$llaborative learning with a $\textbf{Mi}$xture of $\textbf{G}$eneralists and $\textbf{S}$pecialists), the first approach to address both challenges. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, where the router is optimized using a separate validation set to ensure alignment with the target distribution. We solve our objective with alternating minimization, for which we provide a theoretical analysis. Our method shares generalist experts across users while localizing a varying number of specialist experts, thereby adapting to users’ computational resources and preserving privacy. Through extensive experiments, we show CoMiGS effectively balances general and personalized knowledge for each token generation. We demonstrate that CoMiGS remains robust against overfitting—due to the generalists’ regularizing effect—while adapting to local data through specialist expertise. We open source our codebase for collaborative LLMs.
APA
Fan, D., Messmer, B., Doikov, N. & Jaggi, M. (2025). On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:15833-15861. Available from https://proceedings.mlr.press/v267/fan25h.html.
