Learning to Optimize under Non-Stationarity

Wang Chi Cheung; David Simchi-Levi; Ruihao Zhu

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1079-1087, 2019.

Abstract

We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ad allocation in a changing environment. We show how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm, which achieves optimal dynamic regret, and the tuning-free bandit-over-bandit (BOB) framework, built on top of the SW-UCB algorithm, which achieves the best dynamic regret compared to the existing literature.

Cite this Paper

BibTeX


@InProceedings{pmlr-v89-cheung19b,
  title = 	 {Learning to Optimize under Non-Stationarity},
  author =       {Cheung, Wang Chi and Simchi-Levi, David and Zhu, Ruihao},
  booktitle = 	 {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1079--1087},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = 	 {89},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v89/cheung19b/cheung19b.pdf},
  url = 	 {https://proceedings.mlr.press/v89/cheung19b.html},
  abstract = 	 {We introduce algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary linear stochastic bandit setting. It captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandits learning algorithms. Our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm with optimal dynamic regret, and the tuning free bandit-over-bandit (BOB) framework built on top of the SW-UCB algorithm with best (compared to existing literature) dynamic regret.}
}

Endnote

%0 Conference Paper
%T Learning to Optimize under Non-Stationarity
%A Wang Chi Cheung
%A David Simchi-Levi
%A Ruihao Zhu
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-cheung19b
%I PMLR
%P 1079--1087
%U https://proceedings.mlr.press/v89/cheung19b.html
%V 89
%X We introduce algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary linear stochastic bandit setting. It captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandits learning algorithms. Our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm with optimal dynamic regret, and the tuning free bandit-over-bandit (BOB) framework built on top of the SW-UCB algorithm with best (compared to existing literature) dynamic regret.

APA


Cheung, W.C., Simchi-Levi, D. & Zhu, R. (2019). Learning to Optimize under Non-Stationarity. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1079-1087. Available from https://proceedings.mlr.press/v89/cheung19b.html.
