Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:3330-3350, 2021.
Abstract
We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). We show that in adversarial regimes with a $(\Delta, C, T)$ self-bounding constraint the algorithm achieves a regret bound of
$$O\left(\left(\sum_{i\neq i^*}\frac{1}{\Delta_i}\right)\log_+\left(\frac{(K-1)T}{\left(\sum_{i\neq i^*}\frac{1}{\Delta_i}\right)^2}\right)+\sqrt{C\left(\sum_{i\neq i^*}\frac{1}{\Delta_i}\right)\log_+\left(\frac{(K-1)T}{C\sum_{i\neq i^*}\frac{1}{\Delta_i}}\right)}\right),$$
where $T$ is the time horizon, $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x)=\max(1,\log x)$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. Additionally, we provide a general analysis that achieves the same kind of improvement for generalizations of Tsallis-INF to settings beyond multi-armed bandits.
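To get a feel for the magnitude of the bound, the following sketch evaluates the expression inside the $O(\cdot)$ (ignoring the hidden constant) for sample problem parameters. The function names and the example values of the gaps, $K$, $T$, and $C$ are illustrative assumptions, not taken from the paper.

```python
import math

def log_plus(x):
    # log_+(x) = max(1, log x), as defined in the abstract
    return max(1.0, math.log(x))

def tsallis_inf_bound(deltas, K, T, C):
    """Evaluate the expression inside the O(.) of the regret bound.

    deltas: suboptimality gaps Delta_i of the K-1 suboptimal arms
    K: number of arms, T: time horizon, C: corruption magnitude
    (Hypothetical helper; the O(.) constant is dropped.)
    """
    S = sum(1.0 / d for d in deltas)  # sum over i != i* of 1/Delta_i
    stochastic_term = S * log_plus((K - 1) * T / S**2)
    corruption_term = (
        math.sqrt(C * S * log_plus((K - 1) * T / (C * S))) if C > 0 else 0.0
    )
    return stochastic_term + corruption_term

# Illustrative example: K = 10 arms, uniform gaps of 0.1,
# horizon T = 10^6, corruption budget C = 100
print(tsallis_inf_bound([0.1] * 9, K=10, T=10**6, C=100.0))
```

With $C = 0$ the corruption term vanishes and only the logarithmic stochastic term remains, matching the stochastic special case mentioned in the abstract.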