Bandits with Adversarial Scaling

Thodoris Lykouris, Vahab Mirrokni, Renato Paes Leme
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:6511-6521, 2020.

Abstract

We study "adversarial scaling", a multi-armed bandit model where rewards have a stochastic and an adversarial component. Our model captures display advertising where the "click-through-rate" can be decomposed to a (fixed across time) arm-quality component and a non-stochastic user-relevance component (fixed across arms). Despite the relative stochasticity of our model, we demonstrate two settings where most bandit algorithms suffer. On the positive side, we show that two algorithms, one from the action elimination and one from the mirror descent family are adaptive enough to be robust to adversarial scaling. Our results shed light on the robustness of adaptive parameter selection in stochastic bandits, which may be of independent interest.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-lykouris20a, title = {Bandits with Adversarial Scaling}, author = {Lykouris, Thodoris and Mirrokni, Vahab and Leme, Renato Paes}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {6511--6521}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/lykouris20a/lykouris20a.pdf}, url = {https://proceedings.mlr.press/v119/lykouris20a.html}, abstract = {We study "adversarial scaling", a multi-armed bandit model where rewards have a stochastic and an adversarial component. Our model captures display advertising where the "click-through-rate" can be decomposed to a (fixed across time) arm-quality component and a non-stochastic user-relevance component (fixed across arms). Despite the relative stochasticity of our model, we demonstrate two settings where most bandit algorithms suffer. On the positive side, we show that two algorithms, one from the action elimination and one from the mirror descent family are adaptive enough to be robust to adversarial scaling. Our results shed light on the robustness of adaptive parameter selection in stochastic bandits, which may be of independent interest.} }
Endnote
%0 Conference Paper %T Bandits with Adversarial Scaling %A Thodoris Lykouris %A Vahab Mirrokni %A Renato Paes Leme %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-lykouris20a %I PMLR %P 6511--6521 %U https://proceedings.mlr.press/v119/lykouris20a.html %V 119 %X We study "adversarial scaling", a multi-armed bandit model where rewards have a stochastic and an adversarial component. Our model captures display advertising where the "click-through-rate" can be decomposed to a (fixed across time) arm-quality component and a non-stochastic user-relevance component (fixed across arms). Despite the relative stochasticity of our model, we demonstrate two settings where most bandit algorithms suffer. On the positive side, we show that two algorithms, one from the action elimination and one from the mirror descent family are adaptive enough to be robust to adversarial scaling. Our results shed light on the robustness of adaptive parameter selection in stochastic bandits, which may be of independent interest.
APA
Lykouris, T., Mirrokni, V. & Leme, R.P.. (2020). Bandits with Adversarial Scaling. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:6511-6521 Available from https://proceedings.mlr.press/v119/lykouris20a.html.

Related Material