Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:482-534, 2017.
Abstract
We study the combinatorial pure exploration problem \textsc{BestSet} in a stochastic multi-armed bandit game. In a \textsc{BestSet} instance, we are given $n$ stochastic arms with unknown reward distributions, as well as a family $\mathcal{F}$ of feasible subsets over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify the feasible subset in $\mathcal{F}$ with the maximum total weight, using as few samples as possible. The problem generalizes the classical best arm identification problem and the top-$k$ arm identification problem, both of which have attracted significant attention in recent years. We provide a novel \textit{instance-wise} lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of $\ln|\mathcal{F}|$. For an important class of combinatorial families (including spanning trees, matchings, and path constraints), we also provide a polynomial-time implementation of the sampling algorithm, using the equivalence of separation and optimization for convex programs, and the notion of approximate Pareto curves in multi-objective optimization (note that $|\mathcal{F}|$ can be exponential in $n$). We also show that the $\ln|\mathcal{F}|$ factor is inevitable in general, through a nontrivial lower bound construction utilizing a combinatorial structure resembling the Nisan-Wigderson design. Our results significantly improve several previous results for several important combinatorial constraints, and provide a tighter understanding of the general \textsc{BestSet} problem. We further introduce an even more general problem, formulated in geometric terms. We are given $n$ Gaussian arms with unknown means and unit variance. Consider the $n$-dimensional Euclidean space $\mathbb{R}^n$, and a collection $\mathcal{O}$ of disjoint subsets. Our goal is to determine the subset in $\mathcal{O}$ that contains the mean profile (which is the $n$-dimensional vector of the means), using as few samples as possible. The problem generalizes most pure exploration bandit problems studied in the literature. We provide the first nearly optimal sample complexity upper and lower bounds for the problem.
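To make the \textsc{BestSet} setup concrete, the following is a minimal sketch of a toy instance together with a naive uniform-sampling baseline. It is not the sampling algorithm of the paper, and it makes no claim about sample complexity. The names `best_set`, `uniform_sampling_best_set`, `pull`, and `samples_per_arm`, as well as the arm means, are hypothetical choices made for illustration. The feasible family is enumerated explicitly, which is only viable when $|\mathcal{F}|$ is small.

```python
import itertools
import random


def best_set(weights, feasible_sets):
    """Return the feasible set (frozenset of arm indices) with maximum total weight."""
    return max(feasible_sets, key=lambda s: sum(weights[i] for i in s))


def uniform_sampling_best_set(pull, n, feasible_sets, samples_per_arm):
    """Naive baseline (an assumption, not the paper's method):
    pull every arm equally often, then optimize over the empirical means."""
    empirical_means = [
        sum(pull(i) for _ in range(samples_per_arm)) / samples_per_arm
        for i in range(n)
    ]
    return best_set(empirical_means, feasible_sets)


# Hypothetical instance: 4 unit-variance Gaussian arms; the feasible family
# is all subsets of size 2, i.e. the top-k problem with k = 2.
rng = random.Random(0)
true_means = [0.9, 0.8, 0.3, 0.1]
feasible = [frozenset(c) for c in itertools.combinations(range(4), 2)]


def pull(i):
    """Draw one reward sample from arm i."""
    return rng.gauss(true_means[i], 1.0)


answer = uniform_sampling_best_set(pull, 4, feasible, samples_per_arm=2000)
```

Uniform sampling spends the same budget on every arm regardless of how hard each arm is to classify. An instance-wise bound of the kind described in the abstract instead measures the number of samples needed on the specific instance at hand.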