Best of both worlds: Stochastic & adversarial best-arm identification

Yasin Abbasi-Yadkori, Peter Bartlett, Victor Gabillon, Alan Malek, Michal Valko
Proceedings of the 31st Conference On Learning Theory, PMLR 75:918-949, 2018.

Abstract

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-abbasi-yadkori18a,
  title = {{Best of both worlds: Stochastic \& adversarial best-arm identification}},
  author = {Abbasi-Yadkori, Yasin and Bartlett, Peter and Gabillon, Victor and Malek, Alan and Valko, Michal},
  booktitle = {Proceedings of the 31st Conference On Learning Theory},
  pages = {918--949},
  year = {2018},
  editor = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = {75},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v75/abbasi-yadkori18a/abbasi-yadkori18a.pdf},
  url = {https://proceedings.mlr.press/v75/abbasi-yadkori18a.html},
  abstract = {We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: \emph{Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards?} First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.}
}
Endnote
%0 Conference Paper
%T Best of both worlds: Stochastic & adversarial best-arm identification
%A Yasin Abbasi-Yadkori
%A Peter Bartlett
%A Victor Gabillon
%A Alan Malek
%A Michal Valko
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet
%F pmlr-v75-abbasi-yadkori18a
%I PMLR
%P 918--949
%U https://proceedings.mlr.press/v75/abbasi-yadkori18a.html
%V 75
%X We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
APA
Abbasi-Yadkori, Y., Bartlett, P., Gabillon, V., Malek, A. & Valko, M. (2018). Best of both worlds: Stochastic & adversarial best-arm identification. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:918-949. Available from https://proceedings.mlr.press/v75/abbasi-yadkori18a.html.
