Sequential Kernelized Independence Testing

Aleksandr Podkopaev, Patrick Blöbaum, Shiva Kasiviswanathan, Aaditya Ramdas
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:27957-27993, 2023.

Abstract

Independence testing is a classical statistical problem that has been extensively studied in the batch setting, where one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of the problem at hand instead of setting the sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing, which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d. time-varying settings. We demonstrate the power of our approaches on both simulated and real data.
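
To make the testing-by-betting template concrete, the sketch below (a toy illustration, not the paper's exact construction) tracks a wealth process that is multiplied by 1 + lambda * payoff each round, where the payoff is a bounded kernel statistic with conditional mean zero under independence; rejecting once the wealth reaches 1/alpha controls the false alarm rate at level alpha by Ville's inequality. The RBF kernel, the fixed betting fraction lambda, and this particular payoff are illustrative assumptions; the paper's bets are built from kernelized dependence measures such as HSIC with more refined, data-adaptive betting strategies.

    import numpy as np

    def rbf(a, b, bw=1.0):
        """Gaussian (RBF) kernel; values lie in (0, 1]."""
        a, b = np.atleast_1d(a), np.atleast_1d(b)
        return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * bw ** 2)))

    def sequential_betting_test(stream, alpha=0.05, lam=0.5):
        """Toy sequential independence test by betting.

        `stream` yields four fresh (x, y) pairs per round. The payoff below is
        bounded in [-1, 1] and has conditional mean zero under independence, so
        the wealth process is a nonnegative martingale under H0 and Ville's
        inequality bounds the probability of ever reaching 1/alpha by alpha.
        """
        wealth = 1.0
        for t, ((x1, y1), (x2, y2), (x3, y3), (x4, y4)) in enumerate(stream, 1):
            # Dependence signal: paired kernel product minus a cross-block product
            # that always factorizes; zero-mean under independence, and positive
            # in expectation for the kind of dependence in the example below.
            payoff = rbf(x1, x2) * rbf(y1, y2) - rbf(x1, x2) * rbf(y3, y4)
            wealth *= 1.0 + lam * payoff      # lam in (0, 1) keeps wealth positive
            if wealth >= 1.0 / alpha:
                return t, wealth              # stop early: dependence detected
        return None, wealth                   # evidence never crossed the threshold

    # Usage: a dependent stream Y = X + noise; the test typically stops early.
    rng = np.random.default_rng(0)
    def blocks(n_rounds):
        for _ in range(n_rounds):
            x = rng.normal(size=4)
            y = x + 0.3 * rng.normal(size=4)
            yield list(zip(x, y))
    print(sequential_betting_test(blocks(2000)))
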

Cite this Paper

BibTeX
@InProceedings{pmlr-v202-podkopaev23a,
  title     = {Sequential Kernelized Independence Testing},
  author    = {Podkopaev, Aleksandr and Bl\"{o}baum, Patrick and Kasiviswanathan, Shiva and Ramdas, Aaditya},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {27957--27993},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/podkopaev23a/podkopaev23a.pdf},
  url       = {https://proceedings.mlr.press/v202/podkopaev23a.html},
  abstract  = {Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d. time-varying settings. We demonstrate the power of our approaches on both simulated and real data.}
}
Endnote
%0 Conference Paper
%T Sequential Kernelized Independence Testing
%A Aleksandr Podkopaev
%A Patrick Blöbaum
%A Shiva Kasiviswanathan
%A Aaditya Ramdas
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-podkopaev23a
%I PMLR
%P 27957--27993
%U https://proceedings.mlr.press/v202/podkopaev23a.html
%V 202
%X Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d. time-varying settings. We demonstrate the power of our approaches on both simulated and real data.
APA
Podkopaev, A., Blöbaum, P., Kasiviswanathan, S. & Ramdas, A. (2023). Sequential Kernelized Independence Testing. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:27957-27993. Available from https://proceedings.mlr.press/v202/podkopaev23a.html.
