Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits

Martin Zhang, James Zou, David Tse
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7512-7522, 2019.

Abstract

The Monte Carlo (MC) permutation test is considered the gold standard for statistical hypothesis testing, especially when standard parametric assumptions are unclear or likely to fail. However, in modern data science settings where a large number of hypothesis tests must be performed simultaneously, it is rarely used due to its prohibitive computational cost. In genome-wide association studies, for example, the number of hypothesis tests $m$ is around $10^6$, while the number of MC samples $n$ for each test can exceed $10^8$, totaling more than $nm = 10^{14}$ samples. In this paper, we propose \texttt{A}daptive \texttt{M}C multiple \texttt{T}esting (\texttt{AMT}) to estimate MC p-values and control the false discovery rate in multiple testing. With high probability, the algorithm outputs the same result as the standard full MC approach while requiring only $\tilde{O}(\sqrt{n}m)$ samples, and this sample complexity is shown to be optimal. On a Parkinson's GWAS dataset, the algorithm reduces the running time from two months for full MC to an hour. The \texttt{AMT} algorithm is derived from the theory of multi-armed bandits.
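The core bandit idea behind this kind of adaptive MC testing can be illustrated with a minimal sketch: for a single hypothesis, sample permutation-exceedance indicators in batches and maintain a Hoeffding confidence interval around the empirical p-value; stop as soon as the interval separates from the decision threshold. This is a simplified single-threshold illustration, not the paper's full \texttt{AMT} procedure (which couples the confidence bounds with BH-style FDR control across all $m$ tests); the function and parameter names are hypothetical.

```python
import math
import random

def adaptive_mc_pvalue(exceed_indicator, n_max, alpha_thresh, delta=1e-3, batch=100):
    """Adaptively estimate an MC p-value against a fixed threshold.

    exceed_indicator: callable returning True when a fresh permutation
        statistic exceeds the observed statistic (a Bernoulli(p) draw,
        where p is the true permutation p-value).
    Stops once the Hoeffding interval of half-width
        eps = sqrt(log(2/delta) / (2n))
    lies entirely on one side of alpha_thresh, or the budget n_max runs out.
    Returns (p_hat, n_used, decision), decision in {"reject", "accept", "undecided"}.
    """
    hits, n = 0, 0
    while n < n_max:
        take = min(batch, n_max - n)
        hits += sum(bool(exceed_indicator()) for _ in range(take))
        n += take
        p_hat = hits / n
        eps = math.sqrt(math.log(2 / delta) / (2 * n))  # Hoeffding half-width
        if p_hat + eps < alpha_thresh:
            return p_hat, n, "reject"    # p-value confidently below threshold
        if p_hat - eps > alpha_thresh:
            return p_hat, n, "accept"    # p-value confidently above threshold
    return hits / n_max, n_max, "undecided"

if __name__ == "__main__":
    random.seed(0)
    # A null test (true p ~ 0.5) resolves after very few samples...
    print(adaptive_mc_pvalue(lambda: random.random() < 0.5, 10**5, 0.05))
    # ...while a strong signal (true p ~ 0.001) needs more, but far fewer than n_max.
    print(adaptive_mc_pvalue(lambda: random.random() < 0.001, 10**5, 0.05))
```

The savings come from exactly this asymmetry: hypotheses whose p-values sit far from the decision boundary are settled with few samples, and only borderline ones consume a budget close to $n$.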

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-zhang19t,
  title     = {Adaptive {M}onte {C}arlo Multiple Testing via Multi-Armed Bandits},
  author    = {Zhang, Martin and Zou, James and Tse, David},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {7512--7522},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/zhang19t/zhang19t.pdf},
  url       = {https://proceedings.mlr.press/v97/zhang19t.html},
  abstract  = {Monte Carlo (MC) permutation test is considered the gold standard for statistical hypothesis testing, especially when standard parametric assumptions are not clear or likely to fail. However, in modern data science settings where a large number of hypothesis tests need to be performed simultaneously, it is rarely used due to its prohibitive computational cost. In genome-wide association studies, for example, the number of hypothesis tests $m$ is around $10^6$ while the number of MC samples $n$ for each test could be greater than $10^8$, totaling more than $nm$=$10^{14}$ samples. In this paper, we propose \texttt{A}daptive \texttt{M}C multiple \texttt{T}esting (\texttt{AMT}) to estimate MC p-values and control false discovery rate in multiple testing. The algorithm outputs the same result as the standard full MC approach with high probability while requiring only $\tilde{O}(\sqrt{n}m)$ samples. This sample complexity is shown to be optimal. On a Parkinson GWAS dataset, the algorithm reduces the running time from 2 months for full MC to an hour. The \texttt{AMT} algorithm is derived based on the theory of multi-armed bandits.}
}
Endnote
%0 Conference Paper
%T Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits
%A Martin Zhang
%A James Zou
%A David Tse
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-zhang19t
%I PMLR
%P 7512--7522
%U https://proceedings.mlr.press/v97/zhang19t.html
%V 97
%X Monte Carlo (MC) permutation test is considered the gold standard for statistical hypothesis testing, especially when standard parametric assumptions are not clear or likely to fail. However, in modern data science settings where a large number of hypothesis tests need to be performed simultaneously, it is rarely used due to its prohibitive computational cost. In genome-wide association studies, for example, the number of hypothesis tests $m$ is around $10^6$ while the number of MC samples $n$ for each test could be greater than $10^8$, totaling more than $nm$=$10^{14}$ samples. In this paper, we propose \texttt{A}daptive \texttt{M}C multiple \texttt{T}esting (\texttt{AMT}) to estimate MC p-values and control false discovery rate in multiple testing. The algorithm outputs the same result as the standard full MC approach with high probability while requiring only $\tilde{O}(\sqrt{n}m)$ samples. This sample complexity is shown to be optimal. On a Parkinson GWAS dataset, the algorithm reduces the running time from 2 months for full MC to an hour. The \texttt{AMT} algorithm is derived based on the theory of multi-armed bandits.
APA
Zhang, M., Zou, J. & Tse, D. (2019). Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7512-7522. Available from https://proceedings.mlr.press/v97/zhang19t.html.