Learning to Optimize under Non-Stationarity

Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1079-1087, 2019.

Abstract

We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm, which attains the optimal dynamic regret, and the tuning-free bandit-over-bandit (BOB) framework built on top of SW-UCB, which attains the best dynamic regret compared to the existing literature.
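For intuition, here is a minimal sketch of the sliding-window idea behind SW-UCB: at each round, a ridge-regression estimate and its optimism bonus are computed from only the most recent w observations, so stale data from an old parameter regime is forgotten. Everything below (the function name, the toy drifting environment, the constant bonus multiplier beta) is illustrative and not the authors' code; in the paper the window length is tuned using the total variation budget, and the exploration radius comes from a self-normalized confidence bound rather than a fixed constant.

```python
import numpy as np

rng = np.random.default_rng(0)

def sw_ucb(T, arms, theta_at, w, lam=1.0, beta=1.0, noise=0.1):
    """Sliding-window linear UCB sketch: only the last w observations are used.

    arms:     (K, d) array of feasible action vectors.
    theta_at: function t -> true (possibly drifting) parameter in R^d.
    w:        window length; older data is discarded from the estimate.
    """
    d = arms.shape[1]
    xs, ys = [], []                    # full history of chosen actions / rewards
    regret = 0.0
    for t in range(T):
        # Ridge regression restricted to the sliding window of size w.
        X = np.array(xs[-w:]) if xs else np.zeros((0, d))
        y = np.array(ys[-w:]) if ys else np.zeros(0)
        V = lam * np.eye(d) + X.T @ X
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ (X.T @ y)
        # Optimistic index: estimated reward plus exploration bonus per arm.
        bonus = beta * np.sqrt(np.einsum('kd,de,ke->k', arms, V_inv, arms))
        a = int(np.argmax(arms @ theta_hat + bonus))
        theta = theta_at(t)            # current (drifting) true parameter
        xs.append(arms[a])
        ys.append(arms[a] @ theta + noise * rng.standard_normal())
        regret += np.max(arms @ theta) - arms[a] @ theta
    return regret

# Toy non-stationary environment: the parameter rotates slowly over time.
K, d, T = 20, 2, 2000
arms = rng.standard_normal((K, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)
theta_at = lambda t: np.array([np.cos(2e-3 * t), np.sin(2e-3 * t)])
print("dynamic regret:", sw_ucb(T, arms, theta_at, w=200))
```

The tuning-free BOB layer described in the abstract would sit on top of this sketch: an adversarial bandit algorithm (EXP3 in the paper) treats a grid of candidate window lengths as its arms and learns which one to hand to SW-UCB, removing the need to know the variation budget in advance.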

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-cheung19b,
  title     = {Learning to Optimize under Non-Stationarity},
  author    = {Cheung, Wang Chi and Simchi-Levi, David and Zhu, Ruihao},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {1079--1087},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/cheung19b/cheung19b.pdf},
  url       = {https://proceedings.mlr.press/v89/cheung19b.html}
}
Endnote
%0 Conference Paper
%T Learning to Optimize under Non-Stationarity
%A Wang Chi Cheung
%A David Simchi-Levi
%A Ruihao Zhu
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-cheung19b
%I PMLR
%P 1079--1087
%U https://proceedings.mlr.press/v89/cheung19b.html
%V 89
APA
Cheung, W.C., Simchi-Levi, D. & Zhu, R. (2019). Learning to Optimize under Non-Stationarity. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1079-1087. Available from https://proceedings.mlr.press/v89/cheung19b.html.
