Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

Marc Jourdan, Mojmír Mutný, Johannes Kirschner, Andreas Krause
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:805-849, 2021.

Abstract

Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.

Cite this Paper


BibTeX
@InProceedings{pmlr-v132-jourdan21a, title = {Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback}, author = {Jourdan, Marc and Mutn\'y, Mojm\'ir and Kirschner, Johannes and Krause, Andreas}, booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory}, pages = {805--849}, year = {2021}, editor = {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan}, volume = {132}, series = {Proceedings of Machine Learning Research}, month = {16--19 Mar}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v132/jourdan21a/jourdan21a.pdf}, url = {https://proceedings.mlr.press/v132/jourdan21a.html}, abstract = {Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.} }
Endnote
%0 Conference Paper %T Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback %A Marc Jourdan %A Mojmír Mutný %A Johannes Kirschner %A Andreas Krause %B Proceedings of the 32nd International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2021 %E Vitaly Feldman %E Katrina Ligett %E Sivan Sabato %F pmlr-v132-jourdan21a %I PMLR %P 805--849 %U https://proceedings.mlr.press/v132/jourdan21a.html %V 132 %X Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
APA
Jourdan, M., Mutný, M., Kirschner, J. & Krause, A.. (2021). Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:805-849 Available from https://proceedings.mlr.press/v132/jourdan21a.html.

Related Material