Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances

Ruida Zhou; Chao Tian

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:7483-7504, 2022.

Abstract

We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $\sigma^2_i$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify $m$ arms with the largest means within error $\epsilon$ out of the $n$ arms, with probability at least $1-\delta$. We show that the worst-case sample complexity of this problem is

$$\Theta\left( \sum_{i =1}^n \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in G^{m}} \frac{\sigma_i^2}{\epsilon^2} \ln(m) + \sum_{j \in G^{l}} \frac{\sigma_j^2}{\epsilon^2} \text{Ent}(\sigma^2_{G^{r}}) \right),$$

where $G^{m}, G^{l}, G^{r}$ are certain specific subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\text{Ent}(\cdot)$ is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound of the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation.
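To make the first term of the bound concrete, the following is a minimal Python sketch that evaluates only $\sum_{i=1}^n \frac{\sigma_i^2}{\epsilon^2}\ln\frac{1}{\delta}$ for given variance proxies. The other two terms are omitted because the subsets $G^{m}, G^{l}, G^{r}$ and the function $\text{Ent}(\cdot)$ are not defined on this page. The function names and example values are hypothetical, and the constants hidden by $\Theta$ are dropped, so the result is an order-of-magnitude quantity rather than an exact pull count.

```python
import math
from typing import Sequence


def log_delta_term(sigma2: Sequence[float], eps: float, delta: float) -> float:
    """Evaluate sum_i sigma_i^2 / eps^2 * ln(1/delta), the first term inside Theta(.).

    Hypothetical helper: constants hidden by Theta are dropped, so this is an
    order of magnitude only, not the number of pulls any specific algorithm makes.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    if any(s <= 0 for s in sigma2):
        raise ValueError("variance proxies sigma_i^2 must be positive")
    return sum(s / eps**2 for s in sigma2) * math.log(1 / delta)


def log_delta_term_max_variance(sigma2: Sequence[float], eps: float, delta: float) -> float:
    """Same expression with every sigma_i^2 replaced by max_i sigma_i^2 (comparison only)."""
    worst = max(sigma2)
    return log_delta_term([worst] * len(sigma2), eps, delta)


if __name__ == "__main__":
    sigma2 = [1.0, 1.0, 4.0]  # hypothetical variance proxies sigma_i^2 for n = 3 arms
    print(log_delta_term(sigma2, eps=0.5, delta=0.05))
    print(log_delta_term_max_variance(sigma2, eps=0.5, delta=0.05))
```

Comparing the two printed values shows, for this single term only, how much the variance-weighted sum can differ from the same expression evaluated with the largest variance proxy for every arm.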

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-zhou22c,
  title = 	 { Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances },
  author =       {Zhou, Ruida and Tian, Chao},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {7483--7504},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/zhou22c/zhou22c.pdf},
  url = 	 {https://proceedings.mlr.press/v151/zhou22c.html},
  abstract = 	 { We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $\sigma^2_i$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify $m$ arms with the largest means within error $\epsilon$ out of the $n$ arms, with probability at least $1-\delta$. We show that the worst-case sample complexity of this problem is $$\Theta\left( \sum_{i =1}^n \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in G^{m}} \frac{\sigma_i^2}{\epsilon^2} \ln(m) + \sum_{j \in G^{l}} \frac{\sigma_j^2}{\epsilon^2} \text{Ent}(\sigma^2_{G^{r}}) \right), $$where $G^{m}, G^{l}, G^{r}$ are certain specific subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\text{Ent}(\cdot)$ is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound of the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation. }
}

Endnote

%0 Conference Paper
%T Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances
%A Ruida Zhou
%A Chao Tian
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-zhou22c
%I PMLR
%P 7483--7504
%U https://proceedings.mlr.press/v151/zhou22c.html
%V 151
%X  We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $\sigma^2_i$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify $m$ arms with the largest means within error $\epsilon$ out of the $n$ arms, with probability at least $1-\delta$. We show that the worst-case sample complexity of this problem is $$\Theta\left( \sum_{i =1}^n \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in G^{m}} \frac{\sigma_i^2}{\epsilon^2} \ln(m) + \sum_{j \in G^{l}} \frac{\sigma_j^2}{\epsilon^2} \text{Ent}(\sigma^2_{G^{r}}) \right), $$where $G^{m}, G^{l}, G^{r}$ are certain specific subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\text{Ent}(\cdot)$ is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound of the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation.

APA


Zhou, R. & Tian, C. (2022). Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:7483-7504. Available from https://proceedings.mlr.press/v151/zhou22c.html.
