A label efficient two-sample test

Weizhi Li; Gautam Dasarathy; Karthikeyan Natesan Ramamurthy; Visar Berisha

Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1168-1177, 2022.

Abstract

Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are easily measured whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework in service of performing an effective two-sample test with only a small number of sample label queries: first, a classifier is trained with samples uniformly labeled to model the posterior probabilities of the labels; second, a novel query scheme dubbed bimodal query is used to query labels of samples from both classes, and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments performed on several datasets demonstrate that the proposed test controls the Type I error and has decreased Type II error relative to uniform querying and certainty-based querying. Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Label-Efficient-Two-Sample.
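The third stage above relies on the classical Friedman-Rafsky test. As a rough illustration only (not the authors' implementation, which is in the linked repository), the sketch below builds a Euclidean minimum spanning tree over the pooled samples, counts the edges that join points with different labels, and calibrates that count with a label-permutation null. The function names (`mst_edges`, `cross_edge_count`, `fr_permutation_test`) and the parameters `n_perm` and `seed` are assumptions made for this example; the classifier training and bimodal query stages are not covered.

```python
import math
import random


def mst_edges(points):
    """Return the edges (i, j) of a Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known distance from the tree to each node
    best_from = [-1] * n        # tree node that achieves that distance
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the cheapest connection for every node still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_edge_count(edges, labels):
    """Count tree edges whose endpoints carry different labels."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


def fr_permutation_test(points, labels, n_perm=2000, seed=0):
    """Permutation version of the FR test (hypothetical helper for this sketch).

    Few cross-label edges suggest the two samples occupy different regions,
    so the p-value is the share of label shuffles with a count at most the
    observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    edges = mst_edges(points)
    observed = cross_edge_count(edges, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cross_edge_count(edges, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For example, calling `fr_permutation_test(points, labels)` on a list of coordinate tuples and a matching list of 0/1 labels returns the observed cross-edge count and a p-value. Under the null, the expected count is 2mn/(m+n) for sample sizes m and n, so a count far below that value points toward the alternative.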

Cite this Paper

BibTeX


@InProceedings{pmlr-v180-li22f,
  title = 	 {A label efficient two-sample test},
  author =       {Li, Weizhi and Dasarathy, Gautam and Ramamurthy, Karthikeyan Natesan and Berisha, Visar},
  booktitle = 	 {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {1168--1177},
  year = 	 {2022},
  editor = 	 {Cussens, James and Zhang, Kun},
  volume = 	 {180},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {01--05 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v180/li22f/li22f.pdf},
  url = 	 {https://proceedings.mlr.press/v180/li22f.html},
  abstract = 	 {Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are easily measured whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework in service of performing an effective two-sample test with only a small number of sample label queries: first, a classifier is trained with samples uniformly labeled to model the posterior probabilities of the labels; second, a novel query scheme dubbed bimodal query is used to query labels of samples from both classes, and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments performed on several datasets demonstrate that the proposed test controls the Type I error and has decreased Type II error relative to uniform querying and certainty-based querying. Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Label-Efficient-Two-Sample.}
}

Endnote

%0 Conference Paper
%T A label efficient two-sample test
%A Weizhi Li
%A Gautam Dasarathy
%A Karthikeyan Natesan Ramamurthy
%A Visar Berisha
%B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2022
%E James Cussens
%E Kun Zhang
%F pmlr-v180-li22f
%I PMLR
%P 1168--1177
%U https://proceedings.mlr.press/v180/li22f.html
%V 180
%X Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are easily measured whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework in service of performing an effective two-sample test with only a small number of sample label queries: first, a classifier is trained with samples uniformly labeled to model the posterior probabilities of the labels; second, a novel query scheme dubbed bimodal query is used to query labels of samples from both classes, and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments performed on several datasets demonstrate that the proposed test controls the Type I error and has decreased Type II error relative to uniform querying and certainty-based querying. Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Label-Efficient-Two-Sample.

APA


Li, W., Dasarathy, G., Ramamurthy, K. N., & Berisha, V. (2022). A label efficient two-sample test. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:1168-1177. Available from https://proceedings.mlr.press/v180/li22f.html.
