Density Ratio Estimation and Neyman Pearson Classification with Missing Data

Josh Givens, Song Liu, Henry W. J. Reeve
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:8645-8681, 2023.

Abstract

Density Ratio Estimation (DRE) is an important machine learning technique with many downstream applications. We consider the challenge of DRE with missing not at random (MNAR) data. In this setting, we show that using standard DRE methods leads to biased results while our proposal (M-KLIEP), an adaptation of the popular DRE procedure KLIEP, restores consistency. Moreover, we provide finite sample estimation error bounds for M-KLIEP, which demonstrate minimax optimality with respect to both sample size and worst-case missingness. We then adapt an important downstream application of DRE, Neyman-Pearson (NP) classification, to this MNAR setting. Our procedure both controls Type I error and achieves high power, with high probability. Finally, we demonstrate promising empirical performance both synthetic data and real-world data with simulated missingness.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-givens23a, title = {Density Ratio Estimation and Neyman Pearson Classification with Missing Data}, author = {Givens, Josh and Liu, Song and Reeve, Henry W. J.}, booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics}, pages = {8645--8681}, year = {2023}, editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem}, volume = {206}, series = {Proceedings of Machine Learning Research}, month = {25--27 Apr}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v206/givens23a/givens23a.pdf}, url = {https://proceedings.mlr.press/v206/givens23a.html}, abstract = {Density Ratio Estimation (DRE) is an important machine learning technique with many downstream applications. We consider the challenge of DRE with missing not at random (MNAR) data. In this setting, we show that using standard DRE methods leads to biased results while our proposal (M-KLIEP), an adaptation of the popular DRE procedure KLIEP, restores consistency. Moreover, we provide finite sample estimation error bounds for M-KLIEP, which demonstrate minimax optimality with respect to both sample size and worst-case missingness. We then adapt an important downstream application of DRE, Neyman-Pearson (NP) classification, to this MNAR setting. Our procedure both controls Type I error and achieves high power, with high probability. Finally, we demonstrate promising empirical performance both synthetic data and real-world data with simulated missingness.} }
Endnote
%0 Conference Paper %T Density Ratio Estimation and Neyman Pearson Classification with Missing Data %A Josh Givens %A Song Liu %A Henry W. J. Reeve %B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2023 %E Francisco Ruiz %E Jennifer Dy %E Jan-Willem van de Meent %F pmlr-v206-givens23a %I PMLR %P 8645--8681 %U https://proceedings.mlr.press/v206/givens23a.html %V 206 %X Density Ratio Estimation (DRE) is an important machine learning technique with many downstream applications. We consider the challenge of DRE with missing not at random (MNAR) data. In this setting, we show that using standard DRE methods leads to biased results while our proposal (M-KLIEP), an adaptation of the popular DRE procedure KLIEP, restores consistency. Moreover, we provide finite sample estimation error bounds for M-KLIEP, which demonstrate minimax optimality with respect to both sample size and worst-case missingness. We then adapt an important downstream application of DRE, Neyman-Pearson (NP) classification, to this MNAR setting. Our procedure both controls Type I error and achieves high power, with high probability. Finally, we demonstrate promising empirical performance both synthetic data and real-world data with simulated missingness.
APA
Givens, J., Liu, S. & Reeve, H.W.J.. (2023). Density Ratio Estimation and Neyman Pearson Classification with Missing Data. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:8645-8681 Available from https://proceedings.mlr.press/v206/givens23a.html.

Related Material