Nonstationary Bandit Learning via Predictive Sampling

Yueyang Liu, Benjamin Van Roy, Kuang Xu
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6215-6244, 2023.

Abstract

Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to nonstationary environments. We show that such failures are attributed to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to nonstationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. Theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulation, we demonstrate that predictive sampling outperforms Thompson sampling in all nonstationary environments examined.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-liu23e, title = {Nonstationary Bandit Learning via Predictive Sampling}, author = {Liu, Yueyang and Van Roy, Benjamin and Xu, Kuang}, booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics}, pages = {6215--6244}, year = {2023}, editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem}, volume = {206}, series = {Proceedings of Machine Learning Research}, month = {25--27 Apr}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v206/liu23e/liu23e.pdf}, url = {https://proceedings.mlr.press/v206/liu23e.html}, abstract = {Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to nonstationary environments. We show that such failures are attributed to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to nonstationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. Theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulation, we demonstrate that predictive sampling outperforms Thompson sampling in all nonstationary environments examined.} }
Endnote
%0 Conference Paper %T Nonstationary Bandit Learning via Predictive Sampling %A Yueyang Liu %A Benjamin Van Roy %A Kuang Xu %B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2023 %E Francisco Ruiz %E Jennifer Dy %E Jan-Willem van de Meent %F pmlr-v206-liu23e %I PMLR %P 6215--6244 %U https://proceedings.mlr.press/v206/liu23e.html %V 206 %X Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to nonstationary environments. We show that such failures are attributed to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to nonstationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. Theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulation, we demonstrate that predictive sampling outperforms Thompson sampling in all nonstationary environments examined.
APA
Liu, Y., Van Roy, B. & Xu, K.. (2023). Nonstationary Bandit Learning via Predictive Sampling. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:6215-6244 Available from https://proceedings.mlr.press/v206/liu23e.html.

Related Material