Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds

Shinji Ito; Taira Tsuchiya; Junya Honda

Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:1421-1422, 2022.

Abstract

This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i: \Delta_i>0} \frac{\log T}{\Delta_i})$ for suboptimality gap $\Delta_i$ of arm $i$ and time horizon $T$. On the other hand, Audibert et al. (2007) show that the regret bound can be tightened to $O(\sum_{i: \Delta_i>0} (\frac{\sigma_i^2}{\Delta_i} + 1) \log T )$ using the loss variance $\sigma_i^2$ of each arm $i$ in stochastic environments. In this paper, we propose an algorithm based on the follow-the-regularized-leader method, which employs adaptive learning rates that depend on the empirical prediction error of the loss. This is the first BOBW algorithm with gap-variance-dependent bounds, showing that the variance information can be used even in possibly adversarial environments. Further, the leading constant factor in our gap-variance-dependent bound is only (almost) twice the value for the lower bound. In addition, the proposed algorithm enjoys multiple data-dependent regret bounds in adversarial settings and works well in stochastic settings with adversarial corruptions. Table 1 summarizes the achievable bounds in comparison with UCB-V (Audibert et al., 2007), Tsallis-INF (Zimmert and Seldin, 2021) and LB-INF (Ito, 2021).
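To make the difference between the two stochastic bounds concrete, the following minimal sketch evaluates the expressions inside the $O(\cdot)$ terms quoted above on a small instance. It is illustrative only: the function names, the example gaps, variances, and horizon are assumptions chosen for this sketch, the constants hidden by the $O(\cdot)$ notation are dropped, and nothing here implements the proposed algorithm.

```python
import math


def gap_bound(gaps, horizon):
    """Evaluate sum over arms with Delta_i > 0 of log(T) / Delta_i.

    Constants hidden by the O(.) notation are dropped.
    """
    return sum(math.log(horizon) / g for g in gaps if g > 0)


def gap_variance_bound(gaps, variances, horizon):
    """Evaluate sum over arms with Delta_i > 0 of (sigma_i^2 / Delta_i + 1) * log(T).

    Constants hidden by the O(.) notation are dropped.
    """
    return sum(
        (v / g + 1.0) * math.log(horizon)
        for g, v in zip(gaps, variances)
        if g > 0
    )


# Hypothetical instance: one optimal arm (gap 0) and two suboptimal arms
# with small gaps and small loss variances.
gaps = [0.0, 0.01, 0.02]
variances = [1e-4, 1e-4, 1e-4]
horizon = 10_000

gap_only = gap_bound(gaps, horizon)
gap_var = gap_variance_bound(gaps, variances, horizon)
print(f"gap-dependent expression:          {gap_only:.2f}")
print(f"gap-variance-dependent expression: {gap_var:.2f}")
```

On this assumed instance the gap-variance-dependent expression is much smaller, because each variance $\sigma_i^2$ is small relative to the corresponding gap $\Delta_i$. When the variances are comparable to a constant, the two expressions are of the same order.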

Cite this Paper

BibTeX


@InProceedings{pmlr-v178-ito22a,
  title = 	 {Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds},
  author =       {Ito, Shinji and Tsuchiya, Taira and Honda, Junya},
  booktitle = 	 {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages = 	 {1421--1422},
  year = 	 {2022},
  editor = 	 {Loh, Po-Ling and Raginsky, Maxim},
  volume = 	 {178},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--05 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v178/ito22a/ito22a.pdf},
  url = 	 {https://proceedings.mlr.press/v178/ito22a.html},
  abstract = 	 {This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i: \Delta_i>0} \frac{\log T}{\Delta_i})$ for suboptimality gap $\Delta_i$ of arm $i$ and time horizon $T$. On the other hand, it is shown in Audibert et al. (2007) that the regret bound can be tightened to $O(\sum_{i: \Delta_i>0} (\frac{\sigma_i^2}{\Delta_i} + 1) \log T )$ using the loss variance $\sigma_i^2$ of each arm $i$ in the stochastic environments. In this paper, we propose an algorithm based on the follow-the-regularized-leader method, which employs adaptive learning rates that depend on the empirical prediction error of the loss. This is the first BOBW algorithm with gap-variance-dependent bounds, showing that the variance information can be used even in the possibly adversarial environment. Further, the leading constant factor in our gap-variance dependent bound is only (almost) twice the value for the lower bound. In addition, the proposed algorithm enjoys multiple data-dependent regret bounds in adversarial settings and works well in stochastic settings with adversarial corruptions. Table 1 summarizes the achievable bounds in comparison with UCB-V Audibert et al. (2007), Tsallis-INF (Zimmert and Seldin, 2021) and LB-INF (Ito, 2021).}
}

Endnote

%0 Conference Paper
%T Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds
%A Shinji Ito
%A Taira Tsuchiya
%A Junya Honda
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-ito22a
%I PMLR
%P 1421--1422
%U https://proceedings.mlr.press/v178/ito22a.html
%V 178
%X This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i: \Delta_i>0} \frac{\log T}{\Delta_i})$ for suboptimality gap $\Delta_i$ of arm $i$ and time horizon $T$. On the other hand, it is shown in Audibert et al. (2007) that the regret bound can be tightened to $O(\sum_{i: \Delta_i>0} (\frac{\sigma_i^2}{\Delta_i} + 1) \log T )$ using the loss variance $\sigma_i^2$ of each arm $i$ in the stochastic environments. In this paper, we propose an algorithm based on the follow-the-regularized-leader method, which employs adaptive learning rates that depend on the empirical prediction error of the loss. This is the first BOBW algorithm with gap-variance-dependent bounds, showing that the variance information can be used even in the possibly adversarial environment. Further, the leading constant factor in our gap-variance dependent bound is only (almost) twice the value for the lower bound. In addition, the proposed algorithm enjoys multiple data-dependent regret bounds in adversarial settings and works well in stochastic settings with adversarial corruptions. Table 1 summarizes the achievable bounds in comparison with UCB-V Audibert et al. (2007), Tsallis-INF (Zimmert and Seldin, 2021) and LB-INF (Ito, 2021).

APA


Ito, S., Tsuchiya, T. & Honda, J. (2022). Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:1421-1422. Available from https://proceedings.mlr.press/v178/ito22a.html.
