Leveraging covariate adjustments at scale in online A/B testing

Lorenzo Masoero, Doug Hains, James McQueen
Proceedings of The KDD'23 Workshop on Causal Discovery, Prediction and Decision, PMLR 218:25-48, 2023.

Abstract

Companies offering web services routinely run randomized online experiments to estimate the “causal impact” associated with the adoption of new features and policies on key performance metrics of interest. These experiments are used to estimate a variety of effects: the increase in click rate due to the repositioning of a banner, the impact on subscription rate as a consequence of a discount or special offer, etc. In these settings, even effects whose sizes are very small can have large downstream impacts. The simple difference in means estimator (Splawa-Neyman et al., 1923/1990) is still the standard estimator of choice for many online A/B testing platforms due to its simplicity. This method, however, can fail to detect small effects, even when the experiment contains thousands or millions of observational units. As a byproduct of these experiments, however, large amounts of additional data (covariates) are collected. In this paper, we discuss benefits, costs and risks of allowing experimenters to leverage more complicated estimators that make use of covariates when estimating causal effects of interest. We adapt a recently proposed general-purpose algorithm for the estimation of causal effects with covariates to the setting of online A/B testing. Through this paradigm, we implement several covariate-adjusted causal estimators. We thoroughly evaluate their performance at scale, highlighting benefits and shortcomings of different methods. We show on real experiments how “covariate-adjusted” estimators can (i) lead to more precise quantification of the causal effects of interest and (ii) fix issues related to imbalance across treatment arms — a practical concern often overlooked in the literature. In turn, (iii) these more precise estimates can reduce experimentation time, cutting cost and helping to streamline decision-making processes, allowing for faster adoption of beneficial interventions.
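
To illustrate the contrast the abstract draws between the difference-in-means estimator and covariate-adjusted alternatives, here is a minimal sketch on simulated A/B data. It uses a simple regression-adjustment estimator with a pre-experiment covariate; the data-generating process and all names are hypothetical, and this is not the paper's implementation, which evaluates several richer covariate-adjusted estimators at scale.

# Illustrative sketch (not from the paper): difference-in-means vs. a simple
# covariate-adjusted (regression-adjustment) estimator on simulated A/B data.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 0.05

# Pre-experiment covariate (e.g., last month's activity) that predicts the outcome.
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)          # random 50/50 assignment to control/treatment
y = 1.0 + 2.0 * x + true_effect * t + rng.normal(scale=1.0, size=n)

# 1) Difference-in-means estimator.
dim = y[t == 1].mean() - y[t == 0].mean()

# 2) Covariate-adjusted estimator: OLS of the outcome on treatment, the centered
#    covariate, and their interaction (arm-specific slopes).
xc = x - x.mean()
X = np.column_stack([np.ones(n), t, xc, t * xc])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = beta[1]

print(f"difference in means: {dim:.4f}")
print(f"covariate-adjusted : {adjusted:.4f} (true effect = {true_effect})")

Because the covariate explains much of the outcome variance, the adjusted estimate typically has a substantially smaller standard error than the difference in means for the same sample size, which is the mechanism behind the shorter experimentation times discussed in the abstract.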

Cite this Paper


BibTeX
@InProceedings{pmlr-v218-masoero23a,
  title     = {Leveraging covariate adjustments at scale in online A/B testing},
  author    = {Masoero, Lorenzo and Hains, Doug and McQueen, James},
  booktitle = {Proceedings of The KDD'23 Workshop on Causal Discovery, Prediction and Decision},
  pages     = {25--48},
  year      = {2023},
  editor    = {Le, Thuc and Li, Jiuyong and Ness, Robert and Triantafillou, Sofia and Shimizu, Shohei and Cui, Peng and Kuang, Kun and Pei, Jian and Wang, Fei and Prosperi, Mattia},
  volume    = {218},
  series    = {Proceedings of Machine Learning Research},
  month     = {07 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v218/masoero23a/masoero23a.pdf},
  url       = {https://proceedings.mlr.press/v218/masoero23a.html}
}
Endnote
%0 Conference Paper
%T Leveraging covariate adjustments at scale in online A/B testing
%A Lorenzo Masoero
%A Doug Hains
%A James McQueen
%B Proceedings of The KDD'23 Workshop on Causal Discovery, Prediction and Decision
%C Proceedings of Machine Learning Research
%D 2023
%E Thuc Le
%E Jiuyong Li
%E Robert Ness
%E Sofia Triantafillou
%E Shohei Shimizu
%E Peng Cui
%E Kun Kuang
%E Jian Pei
%E Fei Wang
%E Mattia Prosperi
%F pmlr-v218-masoero23a
%I PMLR
%P 25--48
%U https://proceedings.mlr.press/v218/masoero23a.html
%V 218
APA
Masoero, L., Hains, D. & McQueen, J. (2023). Leveraging covariate adjustments at scale in online A/B testing. Proceedings of The KDD'23 Workshop on Causal Discovery, Prediction and Decision, in Proceedings of Machine Learning Research 218:25-48. Available from https://proceedings.mlr.press/v218/masoero23a.html.