# Trust Regions for Explanations via Black-Box Probabilistic Certification

*Proceedings of the 41st International Conference on Machine Learning*, PMLR 235:10736-10764, 2024.

#### Abstract

Given the black box nature of machine learning models, a plethora of explainability methods have been developed to decipher the factors behind individual decisions. In this paper, we introduce a novel problem of black box (probabilistic) explanation certification. We ask the question: Given a black box model with only query access, an explanation for an example and a quality metric (viz. fidelity, stability), can we find the largest hypercube (i.e., $\ell_{\infty}$ ball) centered at the example such that when the explanation is applied to all examples within the hypercube, (with high probability) a quality criterion is met (viz. fidelity greater than some value)? Being able to efficiently find such a *trust region* has multiple benefits: i) insight into model behavior in a *region*, with a *guarantee*; ii) ascertained *stability* of the explanation; iii) *explanation reuse*, which can save time, energy and money by not having to find explanations for every example; and iv) a possible *meta-metric* to compare explanation methods. Our contributions include formalizing this problem, proposing solutions, providing theoretical guarantees for these solutions that are computable, and experimentally showing their efficacy on synthetic and real data.
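To make the question posed in the abstract concrete, the following is a minimal illustrative sketch and not the certification procedure proposed in the paper. It assumes a hypothetical query-only function `quality` (for example, the fidelity of a fixed explanation at a point), samples uniformly inside an $\ell_{\infty}$ ball, and bisects on the half-width. The names `min_quality_in_cube` and `largest_trusted_half_width`, the sample counts, and the bisection itself are assumptions made for illustration. Plain sampling of this kind carries no formal guarantee, and bisection implicitly assumes that violations become more likely as the cube grows.

```python
import numpy as np


def min_quality_in_cube(quality, x, half_width, n_samples, rng):
    """Worst observed quality over uniform samples from the l_inf ball around x."""
    x = np.asarray(x, dtype=float)
    # Each row is one query point inside the hypercube [x - w, x + w).
    samples = x + rng.uniform(-half_width, half_width, size=(n_samples, x.size))
    return min(quality(s) for s in samples)


def largest_trusted_half_width(quality, x, threshold, w_max=1.0,
                               n_samples=500, n_steps=20, seed=0):
    """Bisect on the half-width w; keep w if no sampled point violates the threshold.

    Illustrative only: this is an empirical check, not a probabilistic certificate.
    """
    rng = np.random.default_rng(seed)
    lo, hi = 0.0, w_max
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if min_quality_in_cube(quality, x, mid, n_samples, rng) >= threshold:
            lo = mid   # no violation observed: try a larger cube
        else:
            hi = mid   # violation observed: shrink the cube
    return lo


# Toy quality function (an assumption): quality decays with l_inf distance from x.
x0 = np.zeros(2)
toy_quality = lambda z: 1.0 - np.max(np.abs(z - x0))
w = largest_trusted_half_width(toy_quality, x0, threshold=0.5)
```

In this toy setting the quality criterion holds exactly on the cube of half-width 0.5, so the estimate `w` should land at or slightly above that value; the overshoot reflects the fact that a finite sample can miss rare violations near the boundary, which is the gap a probabilistic certificate is meant to quantify.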