Heavy Sets with Applications to Interpretable Machine Learning Diagnostics

Dmitry Malioutov, Sanjeeb Dash, Dennis Wei
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:5918-5930, 2023.

Abstract

ML models take on a new life after deployment and raise a host of new challenges: data drift, model recalibration and monitoring. If performance erodes over time, engineers in charge may ask what changed – did the data distribution change, did the model get worse after retraining? We propose a flexible paradigm for answering a variety of model diagnosis questions by finding heaviest-weight interpretable regions, which we call heavy sets. We associate a local weight describing model mismatch at each datapoint, and find a simple region maximizing the sum (or average) of these weights. Specific choices of weights can find regions where two models differ the most, where a single model makes unusually many errors, or where two datasets have large differences in densities. The premise is that a region with overall elevated errors (weights) may reveal statistically significant effects even when individual errors do not stand out from the noise. We focus on interpretable regions defined by sparse AND-rules (conjunctions of conditions on a small subset of the available features). We first describe an exact integer programming (IP) formulation applicable to smaller datasets. As the exact IP is NP-hard, we develop a greedy, coordinate-wise, dynamic-programming-based heuristic. On smaller datasets the heuristic often comes close to the IP in objective value, yet it scales to datasets with millions of examples and thousands of features. We also address the statistical significance of the detected regions, taking care of the multiple hypothesis testing and spatial dependence challenges that arise in model diagnostics. We evaluate our proposed approach both on synthetic data (with known ground truth) and on well-known public ML datasets.
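To make the heavy-set idea concrete, the sketch below greedily builds a sparse AND-rule over binary features so as to maximize the total weight of the points the rule covers (e.g. weight +1 on misclassified points and a small negative weight elsewhere). This is a simplified illustration of the objective only, not the paper's IP or coordinate-wise dynamic-programming algorithm; the function name and interface are hypothetical.

```python
import numpy as np

def greedy_heavy_set(X, w, max_literals=3):
    """Greedily grow an AND-rule (conjunction of feature == value
    literals over binary features) maximizing the summed weight of
    covered points. Illustrative only; the paper uses an exact IP
    and a coordinate-wise DP heuristic instead of this greedy loop."""
    n, d = X.shape
    mask = np.ones(n, dtype=bool)   # points covered by the rule so far
    rule = []                       # chosen literals: (feature, value)
    best = w.sum()                  # weight of the trivial (empty) rule
    for _ in range(max_literals):
        cand = None
        for j in range(d):
            if any(f == j for f, _ in rule):
                continue            # feature already used in the rule
            for v in (0, 1):
                new_mask = mask & (X[:, j] == v)
                s = w[new_mask].sum()
                if s > best:
                    best, cand = s, (j, v, new_mask)
        if cand is None:            # no literal improves the objective
            break
        j, v, mask = cand
        rule.append((j, v))
    return rule, best
```

For example, with four points and weights `[2, 3, -1, -1]` concentrated where feature 0 equals 1, the greedy search recovers the single-literal rule "feature 0 == 1" with total weight 5.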

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-malioutov23a,
  title     = {Heavy Sets with Applications to Interpretable Machine Learning Diagnostics},
  author    = {Malioutov, Dmitry and Dash, Sanjeeb and Wei, Dennis},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {5918--5930},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/malioutov23a/malioutov23a.pdf},
  url       = {https://proceedings.mlr.press/v206/malioutov23a.html},
  abstract  = {ML models take on a new life after deployment and raise a host of new challenges: data drift, model recalibration and monitoring. If performance erodes over time, engineers in charge may ask what changed – did the data distribution change, did the model get worse after retraining? We propose a flexible paradigm for answering a variety of model diagnosis questions by finding heaviest-weight interpretable regions, which we call heavy sets. We associate a local weight describing model mismatch at each datapoint, and find a simple region maximizing the sum (or average) of these weights. Specific choices of weights can find regions where two models differ the most, where a single model makes unusually many errors, or where two datasets have large differences in densities. The premise is that a region with overall elevated errors (weights) may discover statistically significant effects despite individual errors not standing out in the noise. We focus on interpretable regions defined by sparse AND-rules (conjunctive rule using a small subset of available features). We first describe an exact integer programming (IP) formulation applicable to smaller data-sets. As the exact IP is NP-hard, we develop a greedy coordinate-wise dynamic-programming based formulation. For smaller datasets the heuristic often comes close in accuracy to the IP in objective, but it can scale to datasets with millions of examples and thousands of features. We also address statistical significance of the detected regions, taking care of multiple hypothesis testing and spatial dependence challenges that arise in model diagnostics. We evaluate our proposed approach both on synthetic data (with known ground-truth), and on well-known public ML datasets.}
}
Endnote
%0 Conference Paper
%T Heavy Sets with Applications to Interpretable Machine Learning Diagnostics
%A Dmitry Malioutov
%A Sanjeeb Dash
%A Dennis Wei
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-malioutov23a
%I PMLR
%P 5918--5930
%U https://proceedings.mlr.press/v206/malioutov23a.html
%V 206
%X ML models take on a new life after deployment and raise a host of new challenges: data drift, model recalibration and monitoring. If performance erodes over time, engineers in charge may ask what changed – did the data distribution change, did the model get worse after retraining? We propose a flexible paradigm for answering a variety of model diagnosis questions by finding heaviest-weight interpretable regions, which we call heavy sets. We associate a local weight describing model mismatch at each datapoint, and find a simple region maximizing the sum (or average) of these weights. Specific choices of weights can find regions where two models differ the most, where a single model makes unusually many errors, or where two datasets have large differences in densities. The premise is that a region with overall elevated errors (weights) may discover statistically significant effects despite individual errors not standing out in the noise. We focus on interpretable regions defined by sparse AND-rules (conjunctive rule using a small subset of available features). We first describe an exact integer programming (IP) formulation applicable to smaller data-sets. As the exact IP is NP-hard, we develop a greedy coordinate-wise dynamic-programming based formulation. For smaller datasets the heuristic often comes close in accuracy to the IP in objective, but it can scale to datasets with millions of examples and thousands of features. We also address statistical significance of the detected regions, taking care of multiple hypothesis testing and spatial dependence challenges that arise in model diagnostics. We evaluate our proposed approach both on synthetic data (with known ground-truth), and on well-known public ML datasets.
APA
Malioutov, D., Dash, S. & Wei, D. (2023). Heavy Sets with Applications to Interpretable Machine Learning Diagnostics. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:5918-5930. Available from https://proceedings.mlr.press/v206/malioutov23a.html.
