SPRI: Aligning Large Language Models with Context-Situated Principles

Hongli Zhan; Muneeza Azmat; Raya Horesh; Junyi Jessy Li; Mikhail Yurochkin

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:74370-74405, 2025.

Abstract

Aligning Large Language Models to integrate and reflect human values, especially for tasks that demand intricate human oversight, is arduous since it is resource-intensive and time-consuming to depend on human expertise for context-specific guidance. Prior work has utilized predefined sets of rules or principles to steer the behavior of models (Bai et al., 2022; Sun et al., 2023). However, these principles tend to be generic, making it challenging to adapt them to each individual input query or context. In this work, we present Situated-PRInciples (SPRI), a framework requiring minimal or no human effort that is designed to automatically generate guiding principles in real-time for each input query and utilize them to align each response. We evaluate SPRI on three tasks, and show that 1) SPRI can derive principles in a complex domain-specific task that leads to on-par performance as expert-crafted ones; 2) SPRI-generated principles lead to instance-specific rubrics that outperform prior LLM-as-a-judge frameworks; 3) using SPRI to generate synthetic SFT data leads to substantial improvement on truthfulness.

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-zhan25a,
  title = 	 {{SPRI}: Aligning Large Language Models with Context-Situated Principles},
  author =       {Zhan, Hongli and Azmat, Muneeza and Horesh, Raya and Li, Junyi Jessy and Yurochkin, Mikhail},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {74370--74405},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhan25a/zhan25a.pdf},
  url = 	 {https://proceedings.mlr.press/v267/zhan25a.html},
  abstract = 	 {Aligning Large Language Models to integrate and reflect human values, especially for tasks that demand intricate human oversight, is arduous since it is resource-intensive and time-consuming to depend on human expertise for context-specific guidance. Prior work has utilized predefined sets of rules or principles to steer the behavior of models (Bai et al., 2022; Sun et al., 2023). However, these principles tend to be generic, making it challenging to adapt them to each individual input query or context. In this work, we present Situated-PRInciples (SPRI), a framework requiring minimal or no human effort that is designed to automatically generate guiding principles in real-time for each input query and utilize them to align each response. We evaluate SPRI on three tasks, and show that 1) SPRI can derive principles in a complex domain-specific task that leads to on-par performance as expert-crafted ones; 2) SPRI-generated principles lead to instance-specific rubrics that outperform prior LLM-as-a-judge frameworks; 3) using SPRI to generate synthetic SFT data leads to substantial improvement on truthfulness.}
}

Endnote

%0 Conference Paper
%T SPRI: Aligning Large Language Models with Context-Situated Principles
%A Hongli Zhan
%A Muneeza Azmat
%A Raya Horesh
%A Junyi Jessy Li
%A Mikhail Yurochkin
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhan25a
%I PMLR
%P 74370--74405
%U https://proceedings.mlr.press/v267/zhan25a.html
%V 267
%X Aligning Large Language Models to integrate and reflect human values, especially for tasks that demand intricate human oversight, is arduous since it is resource-intensive and time-consuming to depend on human expertise for context-specific guidance. Prior work has utilized predefined sets of rules or principles to steer the behavior of models (Bai et al., 2022; Sun et al., 2023). However, these principles tend to be generic, making it challenging to adapt them to each individual input query or context. In this work, we present Situated-PRInciples (SPRI), a framework requiring minimal or no human effort that is designed to automatically generate guiding principles in real-time for each input query and utilize them to align each response. We evaluate SPRI on three tasks, and show that 1) SPRI can derive principles in a complex domain-specific task that leads to on-par performance as expert-crafted ones; 2) SPRI-generated principles lead to instance-specific rubrics that outperform prior LLM-as-a-judge frameworks; 3) using SPRI to generate synthetic SFT data leads to substantial improvement on truthfulness.

APA

Zhan, H., Azmat, M., Horesh, R., Li, J.J. & Yurochkin, M. (2025). SPRI: Aligning Large Language Models with Context-Situated Principles. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:74370-74405. Available from https://proceedings.mlr.press/v267/zhan25a.html.
