Improving Model Alignment Through Collective Intelligence of Open-Source Models

Junlin Wang, Roy Xie, Shang Zhu, Jue Wang, Ben Athiwaratkun, Bhuwan Dhingra, Shuaiwen Leon Song, Ce Zhang, James Zou
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:64976-64997, 2025.

Abstract

Building helpful and harmless large language models (LLMs) requires an effective model alignment approach based on human instructions and feedback, which in turn necessitates high-quality human-labeled data. Constructing such datasets is expensive and hard to scale, and the resulting data may be limited in diversity and generalization. To address these challenges, we introduce Mixture of Agents Alignment (MoAA), which leverages the collective strengths of various language models to provide high-quality data for model alignment. By employing MoAA, we enhance both supervised fine-tuning and preference optimization, leading to improved performance compared to using a single model alone (e.g., GPT-4o) to generate alignment data. Evaluation results show that our approach improves the win rate of LLaMA-3.1-8B-Instruct from 19.5 to 48.3 on Arena-Hard and from 22.33 to 57.23 on AlpacaEval2, highlighting a promising direction for model alignment through this scalable and diverse synthetic data recipe. Furthermore, we demonstrate that MoAA enables a self-improvement pipeline in which models fine-tuned on MoA-generated data surpass their own initial capabilities, providing evidence that our approach can push the frontier of open-source LLMs without relying on stronger external supervision. Data and code will be released.
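
For concreteness, below is a minimal sketch of the MoA-style data-generation loop the abstract alludes to: several open-source "proposer" models each answer an instruction, and an "aggregator" model synthesizes their drafts into a single training target. The model names, the aggregation prompt, and the assumption of an OpenAI-compatible endpoint are illustrative choices, not the paper's exact recipe.

    # Minimal sketch of MoA-style synthetic data generation.
    # Model names, the aggregation prompt, and the endpoint are
    # illustrative assumptions, not the paper's exact recipe.
    from openai import OpenAI

    client = OpenAI()  # assumes an OpenAI-compatible endpoint serving these models

    PROPOSERS = [  # hypothetical pool of open-source proposer models
        "llama-3.1-70b-instruct",
        "qwen2-72b-instruct",
        "mixtral-8x22b-instruct",
    ]
    AGGREGATOR = "llama-3.1-70b-instruct"  # hypothetical aggregator choice

    AGG_TEMPLATE = (
        "You are given several candidate responses to the same instruction. "
        "Synthesize them into one response that is more helpful and harmless "
        "than any individual candidate.\n\n"
        "Instruction:\n{instruction}\n\nCandidates:\n{candidates}"
    )

    def chat(model: str, prompt: str) -> str:
        """Single-turn chat completion against the assumed endpoint."""
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    def moa_target(instruction: str) -> str:
        """One MoA layer: collect proposer drafts, then aggregate them."""
        drafts = [chat(m, instruction) for m in PROPOSERS]
        candidates = "\n\n".join(f"[{i + 1}] {d}" for i, d in enumerate(drafts))
        return chat(AGGREGATOR, AGG_TEMPLATE.format(
            instruction=instruction, candidates=candidates))

    # (instruction, moa_target(instruction)) pairs would serve as SFT data;
    # ranking the proposer drafts against the aggregate (e.g., with an LLM
    # judge) could then yield chosen/rejected pairs for preference optimization.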

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wang25dr,
  title     = {Improving Model Alignment Through Collective Intelligence of Open-Source Models},
  author    = {Wang, Junlin and Xie, Roy and Zhu, Shang and Wang, Jue and Athiwaratkun, Ben and Dhingra, Bhuwan and Song, Shuaiwen Leon and Zhang, Ce and Zou, James},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {64976--64997},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wang25dr/wang25dr.pdf},
  url       = {https://proceedings.mlr.press/v267/wang25dr.html}
}
Endnote
%0 Conference Paper
%T Improving Model Alignment Through Collective Intelligence of Open-Source Models
%A Junlin Wang
%A Roy Xie
%A Shang Zhu
%A Jue Wang
%A Ben Athiwaratkun
%A Bhuwan Dhingra
%A Shuaiwen Leon Song
%A Ce Zhang
%A James Zou
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wang25dr
%I PMLR
%P 64976--64997
%U https://proceedings.mlr.press/v267/wang25dr.html
%V 267
APA
Wang, J., Xie, R., Zhu, S., Wang, J., Athiwaratkun, B., Dhingra, B., Song, S.L., Zhang, C. & Zou, J. (2025). Improving Model Alignment Through Collective Intelligence of Open-Source Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:64976-64997. Available from https://proceedings.mlr.press/v267/wang25dr.html.
