Why Adaptively Collected Data Have Negative Bias and How to Correct for It

Xinkun Nie, Xiaoying Tian, Jonathan Taylor, James Zou
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1261-1269, 2018.

Abstract

From scientific experiments to online A/B testing, the previously observed data often affects how future experiments are performed, which in turn affects which data will be collected. Such adaptivity introduces complex correlations between the data and the collection procedure. In this paper, we prove that when the data collection procedure satisfies natural conditions, then sample means of the data have systematic negative biases. As an example, consider an adaptive clinical trial where additional data points are more likely to be tested for treatments that show initial promise. Our surprising result implies that the average observed treatment effects would underestimate the true effects of each treatment. We quantitatively analyze the magnitude and behavior of this negative bias in a variety of settings. We also propose a novel debiasing algorithm based on selective inference techniques. In experiments, our method can effectively reduce bias and estimation error.
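The negative bias described above can be seen in a small simulation. The sketch below is illustrative only and is not the paper's debiasing algorithm: it assumes a simple greedy collection rule (always sample the arm with the higher current sample mean), unit-variance Gaussian noise, and two arms with equal true means. The function names `greedy_run` and `estimate_bias` and all parameter values are hypothetical choices made for this example.

```python
import random


def greedy_run(true_means, horizon, rng):
    """Collect `horizon` samples adaptively and return each arm's sample mean.

    Assumed collection rule (illustrative): pull every arm once, then
    always pull the arm whose current sample mean is highest.
    """
    k = len(true_means)
    sums = [0.0] * k
    counts = [0] * k

    # One initial pull per arm so every sample mean is defined.
    for arm in range(k):
        sums[arm] += rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1

    # Remaining pulls go to the arm that currently looks best.
    for _ in range(horizon - k):
        arm = max(range(k), key=lambda i: sums[i] / counts[i])
        sums[arm] += rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1

    return [sums[arm] / counts[arm] for arm in range(k)]


def estimate_bias(true_means, horizon=10, n_runs=20000, seed=0):
    """Monte Carlo estimate of E[sample mean] - true mean for each arm."""
    rng = random.Random(seed)
    k = len(true_means)
    totals = [0.0] * k
    for _ in range(n_runs):
        sample_means = greedy_run(true_means, horizon, rng)
        for arm in range(k):
            totals[arm] += sample_means[arm]
    return [totals[arm] / n_runs - true_means[arm] for arm in range(k)]


# Two arms with identical true means of zero.
bias = estimate_bias([0.0, 0.0])
print("estimated bias per arm:", bias)
```

Under these assumptions the estimated bias of both arms comes out below zero. The intuition is that an arm with an unluckily low early draw stops being sampled, so its low sample mean is never corrected, while an arm with a luckily high early draw keeps being sampled and regresses toward its true mean.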

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-nie18a,
  title     = {Why Adaptively Collected Data Have Negative Bias and How to Correct for It},
  author    = {Nie, Xinkun and Tian, Xiaoying and Taylor, Jonathan and Zou, James},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages     = {1261--1269},
  year      = {2018},
  editor    = {Storkey, Amos and Perez-Cruz, Fernando},
  volume    = {84},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v84/nie18a/nie18a.pdf},
  url       = {https://proceedings.mlr.press/v84/nie18a.html},
  abstract  = {From scientific experiments to online A/B testing, the previously observed data often affects how future experiments are performed, which in turn affects which data will be collected. Such adaptivity introduces complex correlations between the data and the collection procedure. In this paper, we prove that when the data collection procedure satisfies natural conditions, then sample means of the data have systematic negative biases. As an example, consider an adaptive clinical trial where additional data points are more likely to be tested for treatments that show initial promise. Our surprising result implies that the average observed treatment effects would underestimate the true effects of each treatment. We quantitatively analyze the magnitude and behavior of this negative bias in a variety of settings. We also propose a novel debiasing algorithm based on selective inference techniques. In experiments, our method can effectively reduce bias and estimation error.}
}
Endnote
%0 Conference Paper
%T Why Adaptively Collected Data Have Negative Bias and How to Correct for It
%A Xinkun Nie
%A Xiaoying Tian
%A Jonathan Taylor
%A James Zou
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-nie18a
%I PMLR
%P 1261--1269
%U https://proceedings.mlr.press/v84/nie18a.html
%V 84
%X From scientific experiments to online A/B testing, the previously observed data often affects how future experiments are performed, which in turn affects which data will be collected. Such adaptivity introduces complex correlations between the data and the collection procedure. In this paper, we prove that when the data collection procedure satisfies natural conditions, then sample means of the data have systematic negative biases. As an example, consider an adaptive clinical trial where additional data points are more likely to be tested for treatments that show initial promise. Our surprising result implies that the average observed treatment effects would underestimate the true effects of each treatment. We quantitatively analyze the magnitude and behavior of this negative bias in a variety of settings. We also propose a novel debiasing algorithm based on selective inference techniques. In experiments, our method can effectively reduce bias and estimation error.
APA
Nie, X., Tian, X., Taylor, J., & Zou, J. (2018). Why Adaptively Collected Data Have Negative Bias and How to Correct for It. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1261-1269. Available from https://proceedings.mlr.press/v84/nie18a.html.
