Strategic A/B testing via Maximum Probability-driven Two-armed Bandit

Yu Zhang, Shanshan Zhao, Bokui Wan, Jinjuan Wang, Xiaodong Yan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:77069-77089, 2025.

Abstract

Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their inability to handle small discrepancies with sufficient sensitivity. This work leverages a counterfactual outcome framework and proposes a maximum probability-driven two-armed bandit (TAB) process by weighting the mean volatility statistic, which controls Type I error. The implementation of permutation methods further enhances robustness and efficacy. The established strategic central limit theorem (SCLT) demonstrates that our approach yields a more concentrated distribution under the null hypothesis and a less concentrated one under the alternative hypothesis, greatly improving statistical power. The experimental results indicate a significant improvement in A/B testing, highlighting the potential to reduce experimental costs while maintaining high statistical power.
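The permutation calibration mentioned in the abstract can be illustrated with a minimal sketch. Note the assumptions: the function name `tab_permutation_test` is hypothetical, and a plain difference in means stands in for the paper's bandit-weighted mean-volatility statistic, which is not specified here; only the permutation device itself is shown.

```python
import random
from statistics import mean

def tab_permutation_test(y_a, y_b, n_perm=500, seed=0):
    """Permutation-calibrated two-sample test (illustrative sketch).

    Uses a plain difference in means as the test statistic, not the
    paper's bandit-weighted mean-volatility statistic; the permutation
    step mirrors the robustness device described in the abstract.
    """
    rng = random.Random(seed)
    obs = mean(y_a) - mean(y_b)          # observed effect estimate
    pooled = list(y_a) + list(y_b)
    n_a = len(y_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # re-randomize arm labels
        stat = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(stat) >= abs(obs):
            hits += 1
    # add-one smoothing keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1)
```

Under the null, relabeling the pooled outcomes leaves the statistic's distribution unchanged, so the resulting p-value is valid without any normal-approximation assumption, which is what makes permutation calibration attractive for detecting small effects.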

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhang25dj,
  title     = {Strategic {A}/{B} testing via Maximum Probability-driven Two-armed Bandit},
  author    = {Zhang, Yu and Zhao, Shanshan and Wan, Bokui and Wang, Jinjuan and Yan, Xiaodong},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {77069--77089},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhang25dj/zhang25dj.pdf},
  url       = {https://proceedings.mlr.press/v267/zhang25dj.html},
  abstract  = {Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their inability to handle small discrepancies with sufficient sensitivity. This work leverages a counterfactual outcome framework and proposes a maximum probability-driven two-armed bandit (TAB) process by weighting the mean volatility statistic, which controls Type I error. The implementation of permutation methods further enhances the robustness and efficacy. The established strategic central limit theorem (SCLT) demonstrates that our approach yields a more concentrated distribution under the null hypothesis and a less concentrated one under the alternative hypothesis, greatly improving statistical power. The experimental results indicate a significant improvement in the A/B testing, highlighting the potential to reduce experimental costs while maintaining high statistical power.}
}
Endnote
%0 Conference Paper
%T Strategic A/B testing via Maximum Probability-driven Two-armed Bandit
%A Yu Zhang
%A Shanshan Zhao
%A Bokui Wan
%A Jinjuan Wang
%A Xiaodong Yan
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhang25dj
%I PMLR
%P 77069--77089
%U https://proceedings.mlr.press/v267/zhang25dj.html
%V 267
%X Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their inability to handle small discrepancies with sufficient sensitivity. This work leverages a counterfactual outcome framework and proposes a maximum probability-driven two-armed bandit (TAB) process by weighting the mean volatility statistic, which controls Type I error. The implementation of permutation methods further enhances the robustness and efficacy. The established strategic central limit theorem (SCLT) demonstrates that our approach yields a more concentrated distribution under the null hypothesis and a less concentrated one under the alternative hypothesis, greatly improving statistical power. The experimental results indicate a significant improvement in the A/B testing, highlighting the potential to reduce experimental costs while maintaining high statistical power.
APA
Zhang, Y., Zhao, S., Wan, B., Wang, J. & Yan, X. (2025). Strategic A/B testing via Maximum Probability-driven Two-armed Bandit. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:77069-77089. Available from https://proceedings.mlr.press/v267/zhang25dj.html.