Bandit Regret Scaling with the Effective Loss Range
Proceedings of Algorithmic Learning Theory, PMLR 83:128-151, 2018.
Abstract
We study how the regret guarantees of nonstochastic multi-armed
bandits can be improved if the effective range of the losses in each round is
small (for example, the maximal difference between two losses in a given
round). Despite a recent impossibility result, we show how this can be made
possible under certain mild additional assumptions, such as availability of
rough estimates of the losses, or knowledge of the loss of a single, possibly
unspecified arm, at the end of each round. Along the way, we develop a novel
technique, which might be of independent interest, for converting any multi-armed
bandit algorithm with regret depending on the loss range into an algorithm with
regret depending only on the effective range, while attaining better regret
bounds than existing approaches.
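The sketch below is an illustration of two elementary facts behind the abstract, not the authors' algorithm. It computes the effective range of a round (the maximal difference between two losses) and shows that subtracting a common per-round reference value, such as the loss of a single arm, shrinks every loss into [-ε, ε], where ε is that round's effective range, while leaving the regret against the best fixed arm unchanged. The helper names `effective_range`, `shift_losses`, and `expected_regret`, as well as the toy loss sequence, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def effective_range(losses: Sequence[float]) -> float:
    """Maximal difference between two arms' losses in a single round."""
    return max(losses) - min(losses)


def shift_losses(losses: Sequence[float], reference: float) -> List[float]:
    """Subtract one common reference value from every arm's loss in a round.

    Hypothetical helper: if `reference` is the loss of any single arm, each
    shifted loss lies in [-eps, eps], with eps = effective_range(losses).
    """
    return [x - reference for x in losses]


def expected_regret(probs: Sequence[Sequence[float]],
                    losses: Sequence[Sequence[float]]) -> float:
    """Expected regret of a sequence of play distributions vs. the best fixed arm."""
    n_arms = len(losses[0])
    # Expected cumulative loss of the learner: sum_t <p_t, l_t>.
    learner = sum(sum(p * l for p, l in zip(p_t, l_t))
                  for p_t, l_t in zip(probs, losses))
    # Cumulative loss of the best single arm in hindsight.
    best = min(sum(l_t[i] for l_t in losses) for i in range(n_arms))
    return learner - best


if __name__ == "__main__":
    # Hypothetical toy data: losses span most of [0, 1] across rounds,
    # but within each round the arms differ by at most about 0.05.
    losses = [[0.90, 0.95, 0.92], [0.10, 0.12, 0.08], [0.50, 0.55, 0.53]]
    probs = [[1 / 3] * 3 for _ in range(3)]  # uniform play, for illustration
    # Shift each round by the loss of arm 0 (the single revealed arm).
    shifted = [shift_losses(l_t, l_t[0]) for l_t in losses]
    print([round(effective_range(l_t), 3) for l_t in losses])
    print(expected_regret(probs, losses), expected_regret(probs, shifted))  # both approximately 0.05
```

Because the same value is subtracted from every arm in a round, both the learner's expected loss and the best arm's loss drop by the same amount, so the regret is unchanged; what changes is the magnitude of the losses the algorithm actually sees, which is the quantity a range-dependent regret bound would pick up.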