Open Problem: Risk of Ruin in Multiarmed Bandits

Filipo S. Perotto, Mathieu Bourgais, Bruno C. Silva, Laurent Vercouter
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:3194-3197, 2019.

Abstract

We formalize a particular class of problems called survival multiarmed bandits (S-MAB), a modified version of budgeted multiarmed bandits (B-MAB) in which a true risk of ruin must be considered, bringing it closer to risk-averse multiarmed bandits (RA-MAB). In an S-MAB, pulling an arm can yield either positive or negative rewards. The agent starts with an initial budget that evolves over time with the received rewards. The goal is to find a good exploration-exploitation-safety trade-off, maximizing rewards while minimizing the probability of ruin (i.e. the budget becoming negative). This simple and so far neglected modification of the MAB statement changes how the problem must be approached, calling for adapted algorithms and specific analytical tools, and it also brings the problem closer to important real-world applications. We are interested in the following open problems, which stem from this new MAB definition: (a) how can regret be meaningfully defined in formal terms for an S-MAB, given its multiobjective optimization nature? (b) can an S-MAB be reduced to an RA-MAB or a B-MAB, transferring their theoretical guarantees? (c) what kind of method or strategy must an agent follow to optimally solve an S-MAB?
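To make the setup concrete, the following is a minimal, hypothetical sketch in Python of the S-MAB setting described above: arms may return negative rewards, the budget is updated after every pull, and the episode ends in ruin as soon as the budget becomes negative. All class, parameter, and distribution choices here (e.g. Gaussian arm rewards, the epsilon-greedy agent) are illustrative assumptions, not the authors' formulation or algorithm.

```python
import random


class SurvivalBandit:
    """Sketch of a survival multiarmed bandit (S-MAB) environment.

    Each arm returns a possibly negative reward; the budget starts at
    `initial_budget` and is updated with every received reward. The run
    ends in ruin as soon as the budget becomes negative.
    Hypothetical illustration only: arm rewards are assumed Gaussian.
    """

    def __init__(self, arm_means, arm_stds, initial_budget):
        self.arm_means = arm_means
        self.arm_stds = arm_stds
        self.budget = initial_budget
        self.ruined = False

    def pull(self, arm):
        if self.ruined:
            raise RuntimeError("agent is ruined; no more pulls allowed")
        reward = random.gauss(self.arm_means[arm], self.arm_stds[arm])
        self.budget += reward
        if self.budget < 0:  # ruin: the budget has become negative
            self.ruined = True
        return reward


# Usage: a naive epsilon-greedy agent that ignores the risk of ruin,
# illustrating why exploration alone can be fatal in an S-MAB.
if __name__ == "__main__":
    env = SurvivalBandit(arm_means=[-0.2, 0.1], arm_stds=[1.0, 1.0],
                         initial_budget=5.0)
    estimates, counts = [0.0, 0.0], [0, 0]
    t = 0
    while not env.ruined and t < 1000:
        arm = (random.randrange(2) if random.random() < 0.1
               else max(range(2), key=lambda a: estimates[a]))
        r = env.pull(arm)
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]
        t += 1
    print("ruined" if env.ruined else "survived", "after", t, "pulls,",
          "final budget:", round(env.budget, 2))
```

In this sketch the agent optimizes only the expected reward; an S-MAB strategy would additionally have to account for the probability that the budget ever goes negative, which is exactly the exploration-exploitation-safety trade-off the abstract refers to.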

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-perotto19a,
  title     = {Open Problem: Risk of Ruin in Multiarmed Bandits},
  author    = {Perotto, Filipo S. and Bourgais, Mathieu and Silva, Bruno C. and Vercouter, Laurent},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {3194--3197},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/perotto19a/perotto19a.pdf},
  url       = {https://proceedings.mlr.press/v99/perotto19a.html}
}
APA
Perotto, F.S., Bourgais, M., Silva, B.C., & Vercouter, L. (2019). Open Problem: Risk of Ruin in Multiarmed Bandits. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:3194-3197. Available from https://proceedings.mlr.press/v99/perotto19a.html.
