On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives

Sashank Reddi; Aaditya Ramdas; Barnabas Poczos; Aarti Singh; Larry Wasserman

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:772-780, 2015.

Abstract

Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests - those which are consistent without any assumptions about how the distributions may differ (general alternatives), and those which are designed to specifically test easier alternatives, like a difference in means (mean-shift alternatives). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we explicitly derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test’s power goes to one if the number of samples increases faster than the dimension increases. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and the first analysis of how tests designed for general alternatives perform against easier ones.
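For readers unfamiliar with the statistic named in the abstract, the following is a minimal sketch of a linear-time Maximum Mean Discrepancy test with a Gaussian kernel under a mean-shift alternative. It follows the standard paired-sample estimator with a normal-approximation threshold; whether this matches the paper's exact normalization and bandwidth choice is an assumption, since the abstract does not specify them. The function names, the bandwidth of sqrt(d), and the sample sizes are illustrative choices, not taken from the paper.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def gaussian_kernel(u, v, bandwidth):
    """Gaussian kernel exp(-||u - v||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def linear_time_mmd_test(xs, ys, bandwidth, alpha=0.05):
    """One-sided linear-time MMD test.

    Each sample point is used in exactly one disjoint pair, so the cost is
    linear in the sample size. Returns (mmd2_estimate, z_score, reject).
    """
    m = min(len(xs), len(ys)) // 2  # number of disjoint pairs
    h = []
    for i in range(m):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        h.append(
            gaussian_kernel(x1, x2, bandwidth)
            + gaussian_kernel(y1, y2, bandwidth)
            - gaussian_kernel(x1, y2, bandwidth)
            - gaussian_kernel(x2, y1, bandwidth)
        )
    mmd2 = mean(h)                       # unbiased estimate of squared MMD
    std_err = stdev(h) / math.sqrt(m)    # standard error of that mean
    z = mmd2 / std_err
    threshold = NormalDist().inv_cdf(1.0 - alpha)  # normal approximation
    return mmd2, z, z > threshold


def sample_gaussian(n, d, shift, rng):
    """Hypothetical helper: n draws from N(shift * 1_d, I_d)."""
    return [[rng.gauss(shift, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 5
    xs = sample_gaussian(2000, d, 0.0, rng)
    ys = sample_gaussian(2000, d, 1.0, rng)  # mean-shift alternative
    print(linear_time_mmd_test(xs, ys, bandwidth=math.sqrt(d)))
```

Because the summands are independent and bounded, the studentized mean is approximately standard normal under the null, which is what justifies the normal quantile used as the rejection threshold in this sketch.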

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-reddi15,
  title = 	 {{On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives}},
  author = 	 {Reddi, Sashank and Ramdas, Aaditya and Poczos, Barnabas and Singh, Aarti and Wasserman, Larry},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {772--780},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/reddi15.pdf},
  url = 	 {https://proceedings.mlr.press/v38/reddi15.html},
  abstract = 	 {Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests - those which are consistent without any assumptions about how the distributions may differ (\textit{general alternatives}), and those which are designed to specifically test easier alternatives, like a difference in means (\textit{mean-shift alternatives}). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we explicitly derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test’s power goes to one if the number of samples increases faster than the dimension increases. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and the first analysis of how tests designed for general alternatives perform against easier ones.}
}

Endnote

%0 Conference Paper
%T On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives
%A Sashank Reddi
%A Aaditya Ramdas
%A Barnabas Poczos
%A Aarti Singh
%A Larry Wasserman
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-reddi15
%I PMLR
%P 772--780
%U https://proceedings.mlr.press/v38/reddi15.html
%V 38
%X Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests - those which are consistent without any assumptions about how the distributions may differ (general alternatives), and those which are designed to specifically test easier alternatives, like a difference in means (mean-shift alternatives). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we explicitly derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test’s power goes to one if the number of samples increases faster than the dimension increases. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and the first analysis of how tests designed for general alternatives perform against easier ones.

RIS


TY  - CPAPER
TI  - On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives
AU  - Sashank Reddi
AU  - Aaditya Ramdas
AU  - Barnabas Poczos
AU  - Aarti Singh
AU  - Larry Wasserman
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan
ID  - pmlr-v38-reddi15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 772
EP  - 780
L1  - http://proceedings.mlr.press/v38/reddi15.pdf
UR  - https://proceedings.mlr.press/v38/reddi15.html
AB  - Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests - those which are consistent without any assumptions about how the distributions may differ (general alternatives), and those which are designed to specifically test easier alternatives, like a difference in means (mean-shift alternatives). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we explicitly derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test’s power goes to one if the number of samples increases faster than the dimension increases. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and the first analysis of how tests designed for general alternatives perform against easier ones.
ER  -

APA


Reddi, S., Ramdas, A., Poczos, B., Singh, A. & Wasserman, L. (2015). On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:772-780. Available from https://proceedings.mlr.press/v38/reddi15.html.
