Identifying Interpretable Subspaces in Image Representations

Neha Kalibhat, Shweta Bhardwaj, C. Bayan Bruss, Hamed Firooz, Maziar Sanjabi, Soheil Feizi
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:15623-15638, 2023.

Abstract

We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset (such as LAION-400m) and a pre-trained vision-language model such as CLIP. Each word in the captions is scored and ranked, yielding a small number of shared, human-understandable concepts that closely describe the target feature. FALCON also applies contrastive interpretation using lowly activating (counterfactual) images to eliminate spurious concepts. Although many existing approaches interpret features independently, we observe, in state-of-the-art self-supervised and supervised models, that less than 20% of the representation space can be explained by individual features. We show that features in larger spaces become more interpretable when studied in groups and can be explained with high-order scoring concepts through FALCON. We discuss how extracted concepts can be used to explain and debug failures in downstream tasks. Finally, we present a technique to transfer concepts from one (explainable) representation space to another, unseen representation space by learning a simple linear transformation.
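The abstract compresses the full pipeline, so the following is a minimal sketch (not the authors' reference implementation) of the contrastive concept-scoring idea it describes, written against the Hugging Face CLIP API. The highly and lowly activating crops (`high_crops`, `low_crops`, as PIL images) and the candidate words mined from retrieved captions (`candidate_words`) are assumed to be prepared beforehand, and the scoring rule below is a simplification rather than FALCON's exact formulation.

import torch
from transformers import CLIPModel, CLIPProcessor

# Pre-trained vision-language model used to compare crops against candidate words.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_concepts(high_crops, low_crops, candidate_words, top_k=5):
    """Rank candidate words by how much better they match the highly
    activating crops than the lowly activating (counterfactual) crops.
    This is an illustrative simplification of the contrastive scoring
    described in the abstract, not the paper's exact procedure."""
    with torch.no_grad():
        inputs = processor(text=candidate_words,
                           images=list(high_crops) + list(low_crops),
                           return_tensors="pt", padding=True)
        # logits_per_image: (n_images, n_words) image-text similarity scores
        sims = model(**inputs).logits_per_image
    n_high = len(high_crops)
    high_sim = sims[:n_high].mean(dim=0)   # affinity to the target feature's crops
    low_sim = sims[n_high:].mean(dim=0)    # affinity to counterfactual crops
    contrastive = high_sim - low_sim       # down-weights spurious, shared words
    best = contrastive.topk(min(top_k, len(candidate_words))).indices
    return [candidate_words[i] for i in best]

Words that score highly on the target feature's crops but also on the counterfactual crops receive a low contrastive score, which is how spurious concepts get filtered out in this sketch.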

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-kalibhat23a,
  title     = {Identifying Interpretable Subspaces in Image Representations},
  author    = {Kalibhat, Neha and Bhardwaj, Shweta and Bruss, C. Bayan and Firooz, Hamed and Sanjabi, Maziar and Feizi, Soheil},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {15623--15638},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/kalibhat23a/kalibhat23a.pdf},
  url       = {https://proceedings.mlr.press/v202/kalibhat23a.html},
  abstract  = {We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset (like LAION-400m) and a pre-trained vision-language model like CLIP. Each word among the captions is scored and ranked leading to a small number of shared, human-understandable concepts that closely describe the target feature. FALCON also applies contrastive interpretation using lowly activating (counterfactual) images, to eliminate spurious concepts. Although many existing approaches interpret features independently, we observe in state-of-the-art self-supervised and supervised models, that less than 20% of the representation space can be explained by individual features. We show that features in larger spaces become more interpretable when studied in groups and can be explained with high-order scoring concepts through FALCON. We discuss how extracted concepts can be used to explain and debug failures in downstream tasks. Finally, we present a technique to transfer concepts from one (explainable) representation space to another unseen representation space by learning a simple linear transformation.}
}
Endnote
%0 Conference Paper
%T Identifying Interpretable Subspaces in Image Representations
%A Neha Kalibhat
%A Shweta Bhardwaj
%A C. Bayan Bruss
%A Hamed Firooz
%A Maziar Sanjabi
%A Soheil Feizi
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-kalibhat23a
%I PMLR
%P 15623--15638
%U https://proceedings.mlr.press/v202/kalibhat23a.html
%V 202
%X We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset (like LAION-400m) and a pre-trained vision-language model like CLIP. Each word among the captions is scored and ranked leading to a small number of shared, human-understandable concepts that closely describe the target feature. FALCON also applies contrastive interpretation using lowly activating (counterfactual) images, to eliminate spurious concepts. Although many existing approaches interpret features independently, we observe in state-of-the-art self-supervised and supervised models, that less than 20% of the representation space can be explained by individual features. We show that features in larger spaces become more interpretable when studied in groups and can be explained with high-order scoring concepts through FALCON. We discuss how extracted concepts can be used to explain and debug failures in downstream tasks. Finally, we present a technique to transfer concepts from one (explainable) representation space to another unseen representation space by learning a simple linear transformation.
APA
Kalibhat, N., Bhardwaj, S., Bruss, C.B., Firooz, H., Sanjabi, M. & Feizi, S. (2023). Identifying Interpretable Subspaces in Image Representations. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:15623-15638. Available from https://proceedings.mlr.press/v202/kalibhat23a.html.
