LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

Lukas Helff, Felix Friedrich, Manuel Brack, Kristian Kersting, Patrick Schramowski
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:22964-22987, 2025.

Abstract

This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that addresses the critical need for reliable tools in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. To train VLM safeguards, we further create a multimodal safety dataset with high-quality human expert annotations, in which each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting LlavaGuard models, ranging from 0.5B to 7B parameters, serve as versatile tools for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, LlavaGuard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate LlavaGuard's performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework, including the dataset, model weights, and training code, publicly available at https://ml-research.github.io/human-centered-genai/projects/llavaguard.
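To illustrate how a policy-conditioned safeguard of this kind could be queried, the sketch below shows one plausible inference loop built on the Hugging Face transformers library. The model identifier, policy text, prompt template, and JSON output keys ("rating", "category", "rationale") are illustrative assumptions derived from the abstract's description, not the paper's exact prompt format; consult the project page for the released checkpoints and prompts.

# Minimal sketch: querying a LLaVA-style safeguard for a policy-conditioned
# safety assessment. Model ID, policy text, and output schema are assumptions
# for illustration only.
import json
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "AIML-TUDA/LlavaGuard-7B"  # hypothetical identifier

# A shortened, customizable safety policy; the real taxonomy defines the
# categories and the expected response format in full.
policy = (
    "Assess the image against the provided safety taxonomy "
    "(e.g., O1 hate/humiliation, O2 violence, O3 sexual content, ...). "
    "Respond in JSON with keys 'rating', 'category', and 'rationale'."
)

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")
prompt = f"USER: <image>\n{policy}\nASSISTANT:"  # LLaVA-style template (assumed)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
response = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# Expected shape of the structured assessment, mirroring the dataset labels:
# {"rating": "Unsafe", "category": "O2", "rationale": "..."}
assessment = json.loads(response)
print(assessment)

Because the policy is passed in as plain text, the same loop can be reused for large-scale dataset annotation or for moderating text-to-image outputs simply by swapping the policy string, which is the flexibility the abstract describes.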

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-helff25a,
  title     = {{L}lava{G}uard: An Open {VLM}-based Framework for Safeguarding Vision Datasets and Models},
  author    = {Helff, Lukas and Friedrich, Felix and Brack, Manuel and Kersting, Kristian and Schramowski, Patrick},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {22964--22987},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/helff25a/helff25a.pdf},
  url       = {https://proceedings.mlr.press/v267/helff25a.html},
  abstract  = {This paper introduces Llavaguard, a suite of VLM-based vision safeguards that address the critical need for reliable tools in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. For teaching a VLM safeguard on safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting Llavaguard models, ranging from 0.5B to 7B, serve as a versatile tool for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, Llavaguard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate Llavaguard’s performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework, including the dataset, model weights, and training code, publicly available at https://ml-research.github.io/human-centered-genai/projects/llavaguard.}
}
Endnote
%0 Conference Paper
%T LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
%A Lukas Helff
%A Felix Friedrich
%A Manuel Brack
%A Kristian Kersting
%A Patrick Schramowski
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-helff25a
%I PMLR
%P 22964--22987
%U https://proceedings.mlr.press/v267/helff25a.html
%V 267
%X This paper introduces Llavaguard, a suite of VLM-based vision safeguards that address the critical need for reliable tools in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. For teaching a VLM safeguard on safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting Llavaguard models, ranging from 0.5B to 7B, serve as a versatile tool for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, Llavaguard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate Llavaguard’s performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework, including the dataset, model weights, and training code, publicly available at https://ml-research.github.io/human-centered-genai/projects/llavaguard.
APA
Helff, L., Friedrich, F., Brack, M., Kersting, K. & Schramowski, P. (2025). LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:22964-22987. Available from https://proceedings.mlr.press/v267/helff25a.html.
