CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

Xintao Wang; Heng Wang; Yifei Zhang; Xinfeng Yuan; Rui Xu; Jen-Tse Huang; Siyu Yuan; Haoran Guo; Jiangjie Chen; Shuchang Zhou; Wei Wang; Yanghua Xiao

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:64822-64858, 2025.

Abstract

Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters presents a challenging task for RPLAs, due to the lack of authentic character datasets and nuanced evaluation methods using such data. In this paper, we present CoSER, a collection of a high-quality dataset, open models, and an evaluation protocol towards effective RPLAs of established characters. The CoSER dataset covers 17,966 characters from 771 renowned books. It provides authentic dialogues with real-world intricacies, as well as diverse data types such as character experiences and internal thoughts. Drawing from acting methodology, we introduce given-circumstance acting for training and evaluating role-playing LLMs, where LLMs sequentially portray multiple characters in book scenes. Using our dataset, we develop CoSER 8B and CoSER 70B, i.e., advanced open role-playing LLMs built on LLaMA-3.1 models. Extensive experiments demonstrate the value of the CoSER dataset for RPLA training, evaluation and retrieval. Moreover, CoSER 70B exhibits state-of-the-art performance surpassing or matching GPT-4o on our evaluation and three existing benchmarks, i.e., achieving 75.80% and 93.47% accuracy on the InCharacter and LifeChoice benchmarks respectively. Our code, dataset and models are available at: https://github.com/Neph0s/CoSER.

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-wang25dk,
  title = 	 {{C}o{SER}: Coordinating {LLM}-Based Persona Simulation of Established Roles},
  author =       {Wang, Xintao and Wang, Heng and Zhang, Yifei and Yuan, Xinfeng and Xu, Rui and Huang, Jen-Tse and Yuan, Siyu and Guo, Haoran and Chen, Jiangjie and Zhou, Shuchang and Wang, Wei and Xiao, Yanghua},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {64822--64858},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wang25dk/wang25dk.pdf},
  url = 	 {https://proceedings.mlr.press/v267/wang25dk.html},
  abstract = 	 {Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters presents a challenging task for RPLAs, due to the lack of authentic character datasets and nuanced evaluation methods using such data. In this paper, we present CoSER, a collection of a high-quality dataset, open models, and an evaluation protocol towards effective RPLAs of established characters. The CoSER dataset covers 17,966 characters from 771 renowned books. It provides authentic dialogues with real-world intricacies, as well as diverse data types such as character experiences and internal thoughts. Drawing from acting methodology, we introduce given-circumstance acting for training and evaluating role-playing LLMs, where LLMs sequentially portray multiple characters in book scenes. Using our dataset, we develop CoSER 8B and CoSER 70B, i.e., advanced open role-playing LLMs built on LLaMA-3.1 models. Extensive experiments demonstrate the value of the CoSER dataset for RPLA training, evaluation and retrieval. Moreover, CoSER 70B exhibits state-of-the-art performance surpassing or matching GPT-4o on our evaluation and three existing benchmarks, i.e., achieving 75.80% and 93.47% accuracy on the InCharacter and LifeChoice benchmarks respectively. Our code, dataset and models are available at: https://github.com/Neph0s/CoSER.}
}

Endnote

%0 Conference Paper
%T CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
%A Xintao Wang
%A Heng Wang
%A Yifei Zhang
%A Xinfeng Yuan
%A Rui Xu
%A Jen-Tse Huang
%A Siyu Yuan
%A Haoran Guo
%A Jiangjie Chen
%A Shuchang Zhou
%A Wei Wang
%A Yanghua Xiao
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wang25dk
%I PMLR
%P 64822--64858
%U https://proceedings.mlr.press/v267/wang25dk.html
%V 267
%X Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters presents a challenging task for RPLAs, due to the lack of authentic character datasets and nuanced evaluation methods using such data. In this paper, we present CoSER, a collection of a high-quality dataset, open models, and an evaluation protocol towards effective RPLAs of established characters. The CoSER dataset covers 17,966 characters from 771 renowned books. It provides authentic dialogues with real-world intricacies, as well as diverse data types such as character experiences and internal thoughts. Drawing from acting methodology, we introduce given-circumstance acting for training and evaluating role-playing LLMs, where LLMs sequentially portray multiple characters in book scenes. Using our dataset, we develop CoSER 8B and CoSER 70B, i.e., advanced open role-playing LLMs built on LLaMA-3.1 models. Extensive experiments demonstrate the value of the CoSER dataset for RPLA training, evaluation and retrieval. Moreover, CoSER 70B exhibits state-of-the-art performance surpassing or matching GPT-4o on our evaluation and three existing benchmarks, i.e., achieving 75.80% and 93.47% accuracy on the InCharacter and LifeChoice benchmarks respectively. Our code, dataset and models are available at: https://github.com/Neph0s/CoSER.

APA

Wang, X., Wang, H., Zhang, Y., Yuan, X., Xu, R., Huang, J., Yuan, S., Guo, H., Chen, J., Zhou, S., Wang, W. & Xiao, Y. (2025). CoSER: Coordinating LLM-Based Persona Simulation of Established Roles. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:64822-64858. Available from https://proceedings.mlr.press/v267/wang25dk.html.
