Total Variation Floodgate for Variable Importance Inference in Classification

Wenshuo Wang, Lucas Janson, Lihua Lei, Aaditya Ramdas
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:50711-50725, 2024.

Abstract

Inferring variable importance is the key goal of many scientific studies, where researchers seek to learn the effect of a feature $X$ on the outcome $Y$ in the presence of confounding variables $Z$. Focusing on classification problems, we define the expected total variation (ETV), which is an intuitive and deterministic measure of variable importance that does not rely on any model assumption. We then introduce algorithms for statistical inference on the ETV under design-based/model-X assumptions. We name our method Total Variation Floodgate in reference to its shared high-level structure with the Floodgate method of Zhang & Janson (2020). The algorithms we introduce can leverage any user-specified regression function and produce asymptotic lower confidence bounds for the ETV. We show the effectiveness of our algorithms with simulations and a case study in conjoint analysis on the US general election.
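For context, one natural formalization consistent with the abstract's description (the paper's exact definition may differ in its details) takes, for a binary outcome $Y$ with $\mu(x, z) = \mathbb{P}(Y = 1 \mid X = x, Z = z)$ and $\bar{\mu}(z) = \mathbb{P}(Y = 1 \mid Z = z)$,

$$\mathrm{ETV} = \mathbb{E}\left[d_{\mathrm{TV}}\big(\mathcal{L}(Y \mid X, Z),\; \mathcal{L}(Y \mid Z)\big)\right] = \mathbb{E}\big[\lvert \mu(X, Z) - \bar{\mu}(Z) \rvert\big],$$

where the second equality uses the fact that the total variation distance between two Bernoulli laws equals the absolute difference of their success probabilities. Under this reading, $\mathrm{ETV} = 0$ exactly when $Y$ is conditionally independent of $X$ given $Z$, so a positive lower confidence bound certifies that $X$ carries information about $Y$ beyond $Z$.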

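To make the high-level recipe concrete (fit any user-specified regression function, then convert a plug-in functional into an asymptotic lower confidence bound via a normal approximation), here is a minimal Python sketch under the model-X assumption that the law of $X \mid Z$ is known and can be sampled from. The function names and the specific plug-in construction are illustrative assumptions, not the paper's exact algorithm; the key property is that each per-sample term has mean at most the ETV for any fitted $\hat{\mu}$, with the bound tightening as $\hat{\mu}$ improves.

import numpy as np
from scipy.stats import norm

def one_sided_lcb(values, alpha=0.05):
    # Asymptotic level-(1 - alpha) lower confidence bound for the mean of
    # i.i.d. values, via the central limit theorem.
    values = np.asarray(values, dtype=float)
    return values.mean() - norm.ppf(1 - alpha) * values.std(ddof=1) / np.sqrt(len(values))

def tv_floodgate_lcb(y, x, z, mu_hat, sample_x_given_z, n_resamples=100, alpha=0.05):
    # Hypothetical floodgate-style sketch (NOT the paper's exact algorithm).
    # y: binary outcomes in {0, 1}; x, z: feature and confounder values.
    # mu_hat(x, z): any regression estimate of P(Y = 1 | X = x, Z = z),
    #   fit on a separate data split so the terms below are i.i.d.
    # sample_x_given_z(z, k): draws k samples from the known law of X | Z = z,
    #   which is the design-based/model-X assumption.
    h = np.empty(len(y))
    for i in range(len(y)):
        x_tilde = sample_x_given_z(z[i], n_resamples)
        mu_tilde = np.array([mu_hat(xt, z[i]) for xt in x_tilde])
        nu = mu_tilde.mean()                      # estimates E[mu_hat(X, Z) | Z = z_i]
        s_obs = np.sign(mu_hat(x[i], z[i]) - nu)  # fitted "witness" with values in [-1, 1]
        s_tilde = np.sign(mu_tilde - nu)
        # E[h_i] = E[s(X, Z) * (mu(X, Z) - mu_bar(Z))] <= ETV for any s in [-1, 1];
        # the inequality tightens as mu_hat approaches the true mu.
        h[i] = y[i] * (s_obs - s_tilde.mean())
    return max(one_sided_lcb(h, alpha=alpha), 0.0)  # the ETV is nonnegative

For instance, with mu_hat from a logistic regression fit on held-out data and sample_x_given_z given by a randomized conjoint design, tv_floodgate_lcb(y, x, z, mu_hat, sample_x_given_z) would return an asymptotic 95% lower confidence bound on the ETV.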
Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24ad,
  title     = {Total Variation Floodgate for Variable Importance Inference in Classification},
  author    = {Wang, Wenshuo and Janson, Lucas and Lei, Lihua and Ramdas, Aaditya},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {50711--50725},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24ad/wang24ad.pdf},
  url       = {https://proceedings.mlr.press/v235/wang24ad.html}
}
Endnote
%0 Conference Paper
%T Total Variation Floodgate for Variable Importance Inference in Classification
%A Wenshuo Wang
%A Lucas Janson
%A Lihua Lei
%A Aaditya Ramdas
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wang24ad
%I PMLR
%P 50711--50725
%U https://proceedings.mlr.press/v235/wang24ad.html
%V 235
APA
Wang, W., Janson, L., Lei, L. & Ramdas, A. (2024). Total Variation Floodgate for Variable Importance Inference in Classification. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:50711-50725. Available from https://proceedings.mlr.press/v235/wang24ad.html.