Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

Marc Jourdan; Mojmír Mutný; Johannes Kirschner; Andreas Krause

Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

Marc Jourdan, Mojmír Mutný, Johannes Kirschner, Andreas Krause

Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:805-849, 2021.

Abstract

Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.

Cite this Paper

BibTeX


@InProceedings{pmlr-v132-jourdan21a,
  title = 	 {Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback},
  author =       {Jourdan, Marc and Mutn\'y, Mojm\'ir and Kirschner, Johannes and Krause, Andreas},
  booktitle = 	 {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  pages = 	 {805--849},
  year = 	 {2021},
  editor = 	 {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan},
  volume = 	 {132},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--19 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v132/jourdan21a/jourdan21a.pdf},
  url = 	 {https://proceedings.mlr.press/v132/jourdan21a.html},
  abstract = 	 {Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.}
}

Endnote

%0 Conference Paper
%T Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
%A Marc Jourdan
%A Mojmír Mutný
%A Johannes Kirschner
%A Andreas Krause
%B Proceedings of the 32nd International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Vitaly Feldman
%E Katrina Ligett
%E Sivan Sabato	
%F pmlr-v132-jourdan21a
%I PMLR
%P 805--849
%U https://proceedings.mlr.press/v132/jourdan21a.html
%V 132
%X Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.

APA


Jourdan, M., Mutný, M., Kirschner, J. & Krause, A.. (2021). Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:805-849 Available from https://proceedings.mlr.press/v132/jourdan21a.html.

Related Material

Download PDF