Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization

Emiliano Penaloza, Tianyue H. Zhang, Laurent Charlin, Mateo Espinosa Zarlenga
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:48762-48788, 2025.

Abstract

Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions to a set of human-understandable concepts. However, CBMs typically rely on datasets whose concept labels are assumed to be accurate, an assumption that is often violated in practice and that, as we show, can significantly degrade performance. To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization that effectively mitigates the negative impact of concept mislabeling on CBM performance. We analyze key properties of the CPO objective, showing that it directly optimizes the concept posterior distribution, and contrast it with Binary Cross-Entropy (BCE), showing that CPO is inherently less sensitive to concept noise. Our experiments confirm this analysis: CPO consistently outperforms BCE on three real-world datasets, both with and without added label noise.
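The paper's exact CPO objective is not reproduced here; as an illustrative sketch only, a DPO-style preference loss adapted to binary concepts might treat the annotated concept vector as the "preferred" outcome, its bitwise flip as the "dispreferred" one, and a frozen reference concept predictor as the DPO reference policy. The function name, the pairing scheme, and the use of a reference predictor below are assumptions for illustration, not the published formulation:

```python
import numpy as np

def cpo_style_concept_loss(logits, ref_logits, concept_labels, beta=1.0):
    """Hypothetical DPO-style preference loss over binary concepts.

    Assumed-for-illustration setup: the labeled concept vector is the
    preferred sample, its bitwise flip is the dispreferred sample, and
    ref_logits come from a frozen reference concept predictor.

    logits, ref_logits: (batch, n_concepts) raw concept logits
    concept_labels:     (batch, n_concepts) binary labels in {0, 1}
    """
    def log_sigmoid(x):
        # numerically stable log(sigmoid(x)) = -log(1 + exp(-x))
        return -np.logaddexp(0.0, -x)

    y = concept_labels
    # log-probability the model assigns to the labeled (preferred) value
    log_p_pref = y * log_sigmoid(logits) + (1 - y) * log_sigmoid(-logits)
    # ... and to the flipped (dispreferred) value
    log_p_disp = y * log_sigmoid(-logits) + (1 - y) * log_sigmoid(logits)
    # same quantities under the frozen reference predictor
    log_ref_pref = y * log_sigmoid(ref_logits) + (1 - y) * log_sigmoid(-ref_logits)
    log_ref_disp = y * log_sigmoid(-ref_logits) + (1 - y) * log_sigmoid(ref_logits)
    # DPO-shaped objective: -log sigmoid(beta * difference of log-ratios)
    margin = beta * ((log_p_pref - log_ref_pref) - (log_p_disp - log_ref_disp))
    return float(-log_sigmoid(margin).mean())
```

Note that with a uniform reference (all-zero reference logits) and beta = 1, the margin collapses to the signed concept logit and this sketch coincides with ordinary BCE; the preference structure enters through the reference log-ratios and the beta scaling.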

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-penaloza25a,
  title     = {Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization},
  author    = {Penaloza, Emiliano and Zhang, Tianyue H. and Charlin, Laurent and Espinosa Zarlenga, Mateo},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {48762--48788},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/penaloza25a/penaloza25a.pdf},
  url       = {https://proceedings.mlr.press/v267/penaloza25a.html},
  abstract  = {Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions on a set of human understandable concepts. However, CBMs typically rely on datasets with assumedly accurate concept labels—an assumption often violated in practice which we show can significantly degrade performance. To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization, which effectively mitigates the negative impact of concept mislabeling on CBM performance. We provide an analysis on some key properties of the CPO objective showing it directly optimizes for the concept’s posterior distribution, and contrast it against Binary Cross Entropy (BCE) where we show CPO is inherently less sensitive to concept noise. We empirically confirm our analysis finding that CPO consistently outperforms BCE in three real-world datasets with and without added label noise.}
}
Endnote
%0 Conference Paper
%T Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
%A Emiliano Penaloza
%A Tianyue H. Zhang
%A Laurent Charlin
%A Mateo Espinosa Zarlenga
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-penaloza25a
%I PMLR
%P 48762--48788
%U https://proceedings.mlr.press/v267/penaloza25a.html
%V 267
%X Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions on a set of human understandable concepts. However, CBMs typically rely on datasets with assumedly accurate concept labels—an assumption often violated in practice which we show can significantly degrade performance. To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization, which effectively mitigates the negative impact of concept mislabeling on CBM performance. We provide an analysis on some key properties of the CPO objective showing it directly optimizes for the concept’s posterior distribution, and contrast it against Binary Cross Entropy (BCE) where we show CPO is inherently less sensitive to concept noise. We empirically confirm our analysis finding that CPO consistently outperforms BCE in three real-world datasets with and without added label noise.
APA
Penaloza, E., Zhang, T.H., Charlin, L. &amp; Espinosa Zarlenga, M. (2025). Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:48762-48788. Available from https://proceedings.mlr.press/v267/penaloza25a.html.