Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts

Mateo Espinosa Zarlenga, Gabriele Dominici, Pietro Barbiero, Zohreh Shams, Mateja Jamnik
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:15564-15595, 2025.

Abstract

In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., "stripes", "black") and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM’s mispredicted concepts at test time) on CMs’ task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.
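To make the setup concrete, below is a minimal, illustrative sketch (not the authors' code) of a concept-based model with a test-time concept intervention, in the spirit of the architectures described in the abstract. The layer sizes, class name, and the intervention arguments are assumptions for illustration only; MixCEM itself additionally gates leaked information and is not reproduced here.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Hypothetical minimal CM: input -> concepts -> task label."""

    def __init__(self, input_dim, n_concepts, n_classes):
        super().__init__()
        # x -> concept logits (e.g., "stripes", "black")
        self.concept_encoder = nn.Sequential(
            nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        # concept probabilities -> task label
        self.label_predictor = nn.Linear(n_concepts, n_classes)

    def forward(self, x, intervention_mask=None, true_concepts=None):
        c_hat = torch.sigmoid(self.concept_encoder(x))  # predicted concepts
        if intervention_mask is not None:
            # Concept intervention: a human expert overrides the model's
            # concept predictions with ground-truth values where mask == 1.
            c_hat = torch.where(intervention_mask.bool(), true_concepts, c_hat)
        return c_hat, self.label_predictor(c_hat)

# Usage sketch: intervene on the first two concepts of a (possibly OOD) input.
model = ConceptBottleneckModel(input_dim=32, n_concepts=5, n_classes=3)
x = torch.randn(1, 32)
mask = torch.tensor([[1., 1., 0., 0., 0.]])
true_c = torch.tensor([[1., 0., 0., 0., 0.]])
_, task_logits = model(x, intervention_mask=mask, true_concepts=true_c)
```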

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-espinosa-zarlenga25a,
  title     = {Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts},
  author    = {Espinosa Zarlenga, Mateo and Dominici, Gabriele and Barbiero, Pietro and Shams, Zohreh and Jamnik, Mateja},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {15564--15595},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/espinosa-zarlenga25a/espinosa-zarlenga25a.pdf},
  url       = {https://proceedings.mlr.press/v267/espinosa-zarlenga25a.html},
  abstract  = {In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., "stripes", "black") and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM’s mispredicted concepts at test time) on CMs’ task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.}
}
Endnote
%0 Conference Paper
%T Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
%A Mateo Espinosa Zarlenga
%A Gabriele Dominici
%A Pietro Barbiero
%A Zohreh Shams
%A Mateja Jamnik
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-espinosa-zarlenga25a
%I PMLR
%P 15564--15595
%U https://proceedings.mlr.press/v267/espinosa-zarlenga25a.html
%V 267
%X In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., "stripes", "black") and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM’s mispredicted concepts at test time) on CMs’ task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.
APA
Espinosa Zarlenga, M., Dominici, G., Barbiero, P., Shams, Z. & Jamnik, M. (2025). Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:15564-15595. Available from https://proceedings.mlr.press/v267/espinosa-zarlenga25a.html.