Bootstrap Your Conversions: Thompson Sampling for Partially Observable Delayed Rewards

Marco Gigli, Fabio Stella
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:1438-1452, 2024.

Abstract

This paper presents a novel approach to address contextual bandit problems with partially observable, delayed feedback by introducing an approximate Thompson sampling technique. This is a common setting, with applications ranging from online marketing to vaccine trials. Leveraging Bootstrapped Thompson sampling (BTS), we obtain an approximate posterior distribution over delay distributions and conversion probabilities, thereby extending an Expectation-Maximisation (EM) model to the Bayesian domain. Unlike prior methodologies, our approach does not overlook uncertainty on delays. Within the EM framework, we employ the Kaplan-Meier estimator to place no restriction on delay distributions. Through extensive benchmarking against state-of-the-art techniques, our approach demonstrates superior performance across the majority of tested environments, with comparable performance in the remaining cases. Furthermore, our method offers practical implementation using off-the-shelf libraries, facilitating broader adoption. Our technique lays a foundation for extending to other bandit settings, such as non-contextual bandits or action-dependent delay distributions, promising wider applicability and versatility in real-world applications.
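The core idea behind Bootstrapped Thompson sampling (BTS) can be illustrated on a plain Bernoulli bandit with immediate feedback. The sketch below is not the paper's contextual, delayed-feedback algorithm — the function names `bts_choose` and `run` are invented for illustration — it only shows the BTS mechanism: resample each arm's reward history with replacement and act greedily on the bootstrap means, so the resampling noise plays the role of the posterior draw in exact Thompson sampling.

```python
import random

def bts_choose(histories, rng):
    """Bootstrapped Thompson sampling step: pick an arm by
    resampling each arm's reward history with replacement and
    acting greedily on the bootstrap means."""
    # Untried arms are explored first (an empty history cannot be resampled).
    for i, h in enumerate(histories):
        if not h:
            return i
    means = [sum(rng.choices(h, k=len(h))) / len(h) for h in histories]
    return max(range(len(histories)), key=means.__getitem__)

def run(true_probs, horizon=2000, seed=0):
    """Simulate BTS on a Bernoulli bandit; return pull counts per arm."""
    rng = random.Random(seed)
    histories = [[] for _ in true_probs]
    for _ in range(horizon):
        a = bts_choose(histories, rng)
        histories[a].append(1 if rng.random() < true_probs[a] else 0)
    return [len(h) for h in histories]
```

Over a long enough horizon, the bootstrap means of the better arm dominate and pulls concentrate on it; the paper extends this resampling idea to joint uncertainty over conversion probabilities and delay distributions.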

Cite this Paper

BibTeX
@InProceedings{pmlr-v244-gigli24a,
  title     = {Bootstrap Your Conversions: Thompson Sampling for Partially Observable Delayed Rewards},
  author    = {Gigli, Marco and Stella, Fabio},
  booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence},
  pages     = {1438--1452},
  year      = {2024},
  editor    = {Kiyavash, Negar and Mooij, Joris M.},
  volume    = {244},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/gigli24a/gigli24a.pdf},
  url       = {https://proceedings.mlr.press/v244/gigli24a.html}
}
Endnote
%0 Conference Paper
%T Bootstrap Your Conversions: Thompson Sampling for Partially Observable Delayed Rewards
%A Marco Gigli
%A Fabio Stella
%B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Negar Kiyavash
%E Joris M. Mooij
%F pmlr-v244-gigli24a
%I PMLR
%P 1438--1452
%U https://proceedings.mlr.press/v244/gigli24a.html
%V 244
APA
Gigli, M. & Stella, F. (2024). Bootstrap Your Conversions: Thompson Sampling for Partially Observable Delayed Rewards. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:1438-1452. Available from https://proceedings.mlr.press/v244/gigli24a.html.
