Counterfactual Explanations for Conformal Prediction Sets
Proceedings of the Fourteenth Symposium on Conformal and Probabilistic Prediction with Applications, PMLR 266:405-424, 2025.
Abstract
Conformal classification outputs prediction sets with formal guarantees, making it suitable for uncertainty-aware decision support. However, explaining such prediction sets remains an open challenge, as most existing explanation methods, including counterfactual ones, are tailored to point predictions. In this paper, we introduce a novel form of counterfactual explanations for conformal classifiers. These counterfactuals identify minimal changes that modify the conformal prediction set at a fixed significance level, thereby explaining how and why certain classes are included or excluded. To guide the generation of informative counterfactuals, we consider proximity, sparsity, and plausibility. While proximity and sparsity are commonly used in the literature, we introduce credibility as a new measure of how well a counterfactual conforms to the underlying data distribution, and hence of its plausibility. We empirically evaluate our method across multiple tabular datasets and optimization criteria. The findings demonstrate the potential of counterfactual explanations for conformal classification as informative and trustworthy explanations of conformal prediction sets.
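To illustrate the idea described in the abstract, the following is a minimal sketch, not the authors' method, of searching for a counterfactual that changes a split-conformal prediction set at a fixed significance level. It assumes scikit-learn, uses a simple random perturbation search in place of whatever optimization the paper employs, and the helper names `prediction_set`, `credibility`, and `find_counterfactual` are illustrative only.

```python
# Sketch: counterfactuals for a split-conformal classifier (illustrative, not the paper's code).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Nonconformity score: 1 - predicted probability of the candidate class.
cal_scores = 1.0 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

def p_value(x, label):
    """Split-conformal p-value of `label` for instance `x`."""
    score = 1.0 - clf.predict_proba(x.reshape(1, -1))[0, label]
    return (np.sum(cal_scores >= score) + 1) / (len(cal_scores) + 1)

def prediction_set(x, epsilon):
    """All labels whose p-value exceeds the significance level epsilon."""
    return {c for c in clf.classes_ if p_value(x, c) > epsilon}

def credibility(x):
    """Largest p-value over all labels: how typical x is of the calibration data."""
    return max(p_value(x, c) for c in clf.classes_)

def find_counterfactual(x, epsilon, n_samples=2000, scale=0.5):
    """Random-search sketch: sample sparse perturbations of x and keep the
    closest one whose prediction set differs at significance level epsilon."""
    original_set = prediction_set(x, epsilon)
    best, best_dist = None, np.inf
    std = X_tr.std(axis=0)
    for _ in range(n_samples):
        mask = rng.random(x.shape) < 0.2             # encourage sparsity
        x_cf = x + mask * rng.normal(0, scale * std)
        if prediction_set(x_cf, epsilon) != original_set:
            dist = np.linalg.norm((x_cf - x) / std)  # proximity in scaled units
            if dist < best_dist:
                best, best_dist = x_cf, dist
    return best

x0, eps = X_te[0], 0.1
x_cf = find_counterfactual(x0, eps)
if x_cf is not None:
    print("original set:        ", prediction_set(x0, eps))
    print("counterfactual set:  ", prediction_set(x_cf, eps))
    print("credibility of counterfactual: %.3f" % credibility(x_cf))
```

In this sketch, proximity is the scaled distance to the original instance, sparsity is encouraged by perturbing only a random subset of features, and credibility is taken as the maximum conformal p-value of the counterfactual, a plausible reading of the plausibility criterion mentioned in the abstract rather than the paper's exact definition.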