Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:159-163, 2019.
This joint extended abstract introduces and compares the results of (Auer et al., 2019) and (Chen et al., 2019), both of which resolve the problem of achieving optimal dynamic regret for non-stationary bandits without prior information on the non-stationarity. Specifically, Auer et al. (2019) resolve the problem for the traditional multi-armed bandits setting, while Chen et al. (2019) give a solution for the more general contextual bandits setting. Both works extend the key idea of (Auer et al., 2018) developed for a simpler two-armed setting.