Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons

[edit]

Arpit Agarwal, Shivani Agarwal, Sepehr Assadi, Sanjeev Khanna ;
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:39-75, 2017.

Abstract

In many learning settings, active/adaptive querying is possible, but the number of rounds of adaptivity is limited. We study the relationship between query complexity and adaptivity in identifying the $k$ most biased coins among a set of $n$ coins with unknown biases. This problem is a common abstraction of many well-studied problems, including the problem of identifying the $k$ best arms in a stochastic multi-armed bandit, and the problem of top-$k$ ranking from pairwise comparisons. An $r$-round adaptive algorithm for the $k$ most biased coins problem specifies in each round the set of coin tosses to be performed based on the observed outcomes in earlier rounds, and outputs the set of $k$ most biased coins at the end of $r$ rounds. When $r=1$, the algorithm is known as \em non-adaptive; when $r$ is unbounded, the algorithm is known as \em fully adaptive. While the power of adaptivity in reducing query complexity is well known, full adaptivity requires repeated interaction with the coin tossing (feedback generation) mechanism, and is highly sequential, since the set of coins to be tossed in each round can only be determined after we have observed the outcomes of the coin tosses from the previous round. In contrast, algorithms with only few rounds of adaptivity require fewer rounds of interaction with the feedback generation mechanism, and offer the benefits of parallelism in algorithmic decision-making. Motivated by these considerations, we consider the question of how much adaptivity is needed to realize the optimal worst case query complexity for identifying the $k$ most biased coins. Given any positive integer $r$, we derive essentially matching upper and lower bounds on the query complexity of $r$-round algorithms. We then show that $Θ(\log^*n)$ rounds are both necessary and sufficient for achieving the optimal worst case query complexity for identifying the $k$ most biased coins. In particular, our algorithm achieves the optimal query complexity in at most $\log^*n$ rounds, which implies that on any realistic input, $5$ parallel rounds of exploration suffice to achieve the optimal worst-case sample complexity. The best known algorithm prior to our work required $Θ(\log n)$ rounds to achieve the optimal worst case query complexity even for the special case of $k=1$.

Related Material