Adaptivity to Smoothness in X-armed bandits
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1463-1492, 2018.
Abstract
We study the stochastic continuum-armed bandit problem from the angle of adaptivity to \emph{unknown regularity} of the reward function $f$. We prove that there exists no strategy for the cumulative regret that adapts optimally to the \emph{smoothness} of $f$. We show however that such minimax optimal adaptive strategies exist if the learner is given \emph{extra-information} about $f$. Finally, we complement our positive results with matching lower bounds.