Learning to Optimize under Non-Stationarity

Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Proceedings of Machine Learning Research, PMLR 89:1079-1087, 2019.

Abstract

We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ad allocation in a changing environment. We show how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm, which achieves optimal dynamic regret, and the tuning-free bandit-over-bandit (BOB) framework, which is built on top of the SW-UCB algorithm and achieves the best dynamic regret compared to the existing literature.
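
As a rough illustration of the sliding-window idea named in the abstract, the sketch below fits a regularized least-squares estimate on only the most recent observations and picks the action with the largest upper confidence bound. It is a minimal sketch under stated assumptions, not the paper's exact algorithm: the function name `sw_ucb_choose`, the regularizer `lam`, and the fixed confidence multiplier `beta` are hypothetical, and the paper's tuned window length and confidence radius are not reproduced here.

```python
import numpy as np

def sw_ucb_choose(actions, past_x, past_y, window, lam=1.0, beta=1.0):
    """Pick an action index using only the last `window` observations.

    actions: (K, d) array of candidate feature vectors.
    past_x, past_y: lists of played feature vectors and observed rewards.
    window: number of most recent rounds to keep (assumed >= 1).
    lam, beta: illustrative regularizer and confidence multiplier (assumptions).
    """
    d = actions.shape[1]
    # Discard everything older than the window, so stale data is forgotten.
    xs = np.asarray(past_x[-window:], dtype=float).reshape(-1, d)
    ys = np.asarray(past_y[-window:], dtype=float)
    # Regularized least-squares estimate on the windowed data.
    V = lam * np.eye(d) + xs.T @ xs
    theta_hat = np.linalg.solve(V, xs.T @ ys)
    # Confidence width of each action: sqrt(x^T V^{-1} x).
    V_inv = np.linalg.inv(V)
    widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
    # Optimistic choice: estimated reward plus exploration bonus.
    return int(np.argmax(actions @ theta_hat + beta * widths))
```

A short window reacts quickly to a changing environment but yields noisier estimates, while a long window does the reverse. The abstract's BOB framework addresses this choice without manual tuning.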

Related Material