Open Problem: Risk of Ruin in Multiarmed Bandits


Filipo S. Perotto, Mathieu Bourgais, Bruno C. Silva, Laurent Vercouter ;
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:3194-3197, 2019.


We formalize a particular class of problems called \textit{survival multiarmed bandits} (S-MAB), which constitutes a modified version of \textit{budgeted multiarmed bandits} (B-MAB) where a true \textit{risk of ruin} must be considered, bringing it closer to \textit{risk-averse multiarmed bandits} (RA-MAB). In a S-MAB, pulling an arm can result in both positive and negative rewards. The agent has an initial budget that evolves in time with the received rewards. The goal is finding a good \textit{exploration-exploitation-safety} trade-off, maximizing rewards while minimizing the probability of getting ruined (i.e. hitting a negative budget). Such simple and until now neglected modification in the MAB statement changes the way to approach the problem, asking for adapted algorithms and specific analytical tools, and also make it more likely related to some important real-world applications. We are interested in the following open problems which stem from such new MAB definition: (a) how can the regret be meaningfully defined in formal terms for a S-MAB given its multiobjective optimization nature? (b) can a S-MAB be reduced to a RA-MAB or a B-MAB, transferring their theoretical guarantees? (c) what kind of method or strategy must an agent follow to optimally solve a S-MAB?

Related Material