Open Problem: Risk of Ruin in Multiarmed Bandits
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:3194-3197, 2019.
Abstract
We formalize a particular class of problems called survival multiarmed bandits (S-MAB), a modified version of budgeted multiarmed bandits (B-MAB) in which a true risk of ruin must be considered, bringing it closer to risk-averse multiarmed bandits (RA-MAB). In an S-MAB, pulling an arm can result in both positive and negative rewards. The agent has an initial budget that evolves in time with the received rewards. The goal is to find a good exploration-exploitation-safety trade-off, maximizing rewards while minimizing the probability of ruin (i.e. of hitting a negative budget). This simple and so far neglected modification of the MAB statement changes how the problem must be approached, calling for adapted algorithms and specific analytical tools, and also relates it more closely to some important real-world applications. We are interested in the following open problems, which stem from this new MAB definition: (a) how can the regret be meaningfully defined in formal terms for an S-MAB, given its multi-objective optimization nature? (b) can an S-MAB be reduced to an RA-MAB or a B-MAB, transferring their theoretical guarantees? (c) what kind of method or strategy must an agent follow to optimally solve an S-MAB?
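The setting described above (an agent with a budget, arms that can pay positive or negative rewards, and ruin when the budget goes negative) can be illustrated with a minimal simulation sketch. This is not code from the paper: the class name, the choice of rewards in {-1, +1}, and the per-arm success probabilities are all hypothetical assumptions made for illustration only.

```python
import random


class SurvivalBandit:
    """Minimal sketch of the survival MAB setting (illustrative, not from the paper).

    Assumed reward model: pulling arm i pays +1 with probability probs[i]
    and -1 otherwise. The budget evolves with the received rewards, and the
    agent is ruined as soon as the budget becomes negative.
    """

    def __init__(self, probs, initial_budget, seed=None):
        self.probs = probs              # per-arm success probabilities (assumed model)
        self.budget = initial_budget    # initial budget, as in the S-MAB statement
        self.ruined = False
        self.rng = random.Random(seed)

    def pull(self, arm):
        """Pull an arm, update the budget, and flag ruin if the budget goes negative."""
        if self.ruined:
            raise RuntimeError("agent is ruined; no more pulls allowed")
        reward = 1 if self.rng.random() < self.probs[arm] else -1
        self.budget += reward
        if self.budget < 0:
            self.ruined = True
        return reward
```

Under this toy model, the exploration-exploitation-safety trade-off shows up directly: a policy that explores a low-mean arm early, while the budget is still small, faces a much higher probability of ruin than one that explores after the budget has grown, which is exactly the tension the open problem asks to formalize.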