Concept Bottleneck Model with Zero Performance Loss

Zhenzhen Wang, Aleksander Popel, Jeremias Sulam
Conference on Parsimony and Learning, PMLR 280:433-461, 2025.

Abstract

Interpreting machine learning models with high-level, human-understandable concepts has gained increasing importance. The concept bottleneck model (CBM) is a popular approach for providing such explanations but typically sacrifices some prediction power compared with standard black-box models. In this work, we propose an approach to turn an off-the-shelf black-box model into a CBM without changing its predictions or compromising prediction power. Through an invertible mapping from the model’s latent space to a concept space, predictions are decomposed into a linear combination of concepts. This provides concept-based explanations for the complex model and allows us to intervene in its predictions manually. Experiments across benchmarks demonstrate that CBM-zero provides comparable explainability and better accuracy than other CBM methods.
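
To make the stated mechanism concrete, below is a minimal NumPy sketch of the idea in the abstract, not the authors' implementation. It assumes the black box factors as a linear head over a latent embedding, f(x) = W h(x) + b, and uses a square, invertible matrix M as the latent-to-concept map; all names (h, W, b, M) are illustrative.

import numpy as np

rng = np.random.default_rng(0)
d, k, n_classes = 16, 16, 5          # latent dim, concept dim (square case), classes

W = rng.normal(size=(n_classes, d))  # black-box head weights
b = rng.normal(size=n_classes)       # black-box head bias
M = rng.normal(size=(k, d))          # latent-to-concept map; a random square
                                     # Gaussian matrix is invertible almost surely

h = rng.normal(size=d)               # latent embedding h(x) for one input

c = M @ h                            # concept activations
W_c = W @ np.linalg.inv(M)           # per-concept weights W M^{-1}
logits_concepts = W_c @ c + b        # prediction as a linear combination of concepts
logits_blackbox = W @ h + b          # original black-box prediction

# Since M is invertible, the two predictions coincide exactly: zero performance loss.
assert np.allclose(logits_concepts, logits_blackbox)

Because the concept scores are an invertible function of the latent code, editing entries of c and mapping back through M^{-1} also gives the manual intervention on predictions that the abstract mentions.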

Cite this Paper


BibTeX
@InProceedings{pmlr-v280-wang25b,
  title     = {Concept Bottleneck Model with Zero Performance Loss},
  author    = {Wang, Zhenzhen and Popel, Aleksander and Sulam, Jeremias},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {433--461},
  year      = {2025},
  editor    = {Chen, Beidi and Liu, Shijia and Pilanci, Mert and Su, Weijie and Sulam, Jeremias and Wang, Yuxiang and Zhu, Zhihui},
  volume    = {280},
  series    = {Proceedings of Machine Learning Research},
  month     = {24--27 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v280/main/assets/wang25b/wang25b.pdf},
  url       = {https://proceedings.mlr.press/v280/wang25b.html}
}
Endnote
%0 Conference Paper
%T Concept Bottleneck Model with Zero Performance Loss
%A Zhenzhen Wang
%A Aleksander Popel
%A Jeremias Sulam
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Beidi Chen
%E Shijia Liu
%E Mert Pilanci
%E Weijie Su
%E Jeremias Sulam
%E Yuxiang Wang
%E Zhihui Zhu
%F pmlr-v280-wang25b
%I PMLR
%P 433--461
%U https://proceedings.mlr.press/v280/wang25b.html
%V 280
APA
Wang, Z., Popel, A., & Sulam, J. (2025). Concept Bottleneck Model with Zero Performance Loss. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 280:433-461. Available from https://proceedings.mlr.press/v280/wang25b.html.