Adaptivity to Smoothness in X-armed bandits

Andrea Locatelli, Alexandra Carpentier
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1463-1492, 2018.

Abstract

We study the stochastic continuum-armed bandit problem from the angle of adaptivity to \emph{unknown regularity} of the reward function $f$. We prove that there exists no strategy for the cumulative regret that adapts optimally to the \emph{smoothness} of $f$. We show however that such minimax optimal adaptive strategies exist if the learner is given \emph{extra-information} about $f$. Finally, we complement our positive results with matching lower bounds.

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-locatelli18a, title = {Adaptivity to Smoothness in X-armed bandits}, author = {Locatelli, Andrea and Carpentier, Alexandra}, booktitle = {Proceedings of the 31st Conference On Learning Theory}, pages = {1463--1492}, year = {2018}, editor = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe}, volume = {75}, series = {Proceedings of Machine Learning Research}, month = {06--09 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v75/locatelli18a/locatelli18a.pdf}, url = {https://proceedings.mlr.press/v75/locatelli18a.html}, abstract = {We study the stochastic continuum-armed bandit problem from the angle of adaptivity to \emph{unknown regularity} of the reward function $f$. We prove that there exists no strategy for the cumulative regret that adapts optimally to the \emph{smoothness} of $f$. We show however that such minimax optimal adaptive strategies exist if the learner is given \emph{extra-information} about $f$. Finally, we complement our positive results with matching lower bounds.} }
Endnote
%0 Conference Paper %T Adaptivity to Smoothness in X-armed bandits %A Andrea Locatelli %A Alexandra Carpentier %B Proceedings of the 31st Conference On Learning Theory %C Proceedings of Machine Learning Research %D 2018 %E Sébastien Bubeck %E Vianney Perchet %E Philippe Rigollet %F pmlr-v75-locatelli18a %I PMLR %P 1463--1492 %U https://proceedings.mlr.press/v75/locatelli18a.html %V 75 %X We study the stochastic continuum-armed bandit problem from the angle of adaptivity to \emph{unknown regularity} of the reward function $f$. We prove that there exists no strategy for the cumulative regret that adapts optimally to the \emph{smoothness} of $f$. We show however that such minimax optimal adaptive strategies exist if the learner is given \emph{extra-information} about $f$. Finally, we complement our positive results with matching lower bounds.
APA
Locatelli, A. & Carpentier, A.. (2018). Adaptivity to Smoothness in X-armed bandits. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:1463-1492 Available from https://proceedings.mlr.press/v75/locatelli18a.html.

Related Material