[edit]
Approximate Top-m Arm Identification with Heterogeneous Reward Variances
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:7483-7504, 2022.
Abstract
We study the effect of reward variance heterogeneity in the approximate top-m arm identification setting. In this setting, the reward for the i-th arm follows a σ2i-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify m arms with the largest means within error ϵ out of the n arms, with probability at least 1−δ. We show that the worst-case sample complexity of this problem is Θ(n∑i=1σ2iϵ2ln1δ+∑i∈Gmσ2iϵ2ln(m)+∑j∈Glσ2jϵ2Ent(σ2Gr)),where Gm,Gl,Gr are certain specific subsets of the overall arm set {1,2,…,n}, and Ent(⋅) is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound of the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation.