An explore-then-commit algorithm for submodular maximization under full-bandit feedback
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1541-1551, 2022.
Abstract
We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no information other than the reward of the selected action at each time step t is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG), and prove that it achieves a (1−1/e)-regret upper bound of $O(n^{1/3} k^{4/3} T^{2/3} \log(T)^{1/2})$ for a horizon T, number of base elements n, and cardinality constraint k. We also show in experiments with synthetic and real-world data that ETCG empirically outperforms other full-bandit methods.
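To make the explore-then-commit structure concrete, below is a minimal Python sketch of an ETCG-style procedure: k greedy phases that estimate each candidate augmentation from repeated plays, followed by committing to the selected set for the rest of the horizon. The per-candidate exploration budget m, the helper play, and the toy reward model are illustrative assumptions, not the paper's exact construction; the paper chooses m as a function of n, k, and T to obtain the stated regret bound.

```python
import random


def etcg(play, n, k, T, m):
    """Explore-then-commit greedy sketch (illustrative, not the paper's
    exact algorithm).

    play(S) -> float : plays action S (a set of base elements) once and
                       returns its stochastic reward; this is the only
                       feedback observed (full-bandit feedback).
    n : number of base elements, indexed 0..n-1
    k : cardinality constraint
    T : horizon (total number of plays)
    m : assumed per-candidate exploration budget in each greedy phase
    """
    S = set()          # current greedy solution
    t = 0              # plays used so far
    rewards = []       # all observed rewards (exploration counts too)

    # Explore: k greedy phases, each picking one base element.
    for _ in range(k):
        best_arm, best_mean = None, float("-inf")
        for a in range(n):
            if a in S:
                continue
            # Estimate the mean reward of S u {a} from m noisy plays.
            total = 0.0
            for _ in range(m):
                r = play(S | {a})
                rewards.append(r)
                total += r
                t += 1
            if total / m > best_mean:
                best_mean, best_arm = total / m, a
        S.add(best_arm)  # greedily keep the empirically best augmentation

    # Commit: play the selected set for the remaining horizon.
    while t < T:
        rewards.append(play(S))
        t += 1
    return S, rewards


if __name__ == "__main__":
    # Toy stochastic monotone submodular objective: a concave function
    # of a modular set value, plus zero-mean noise (purely illustrative).
    random.seed(0)
    n, k, T = 8, 3, 5000
    values = [random.random() for _ in range(n)]

    def play(S):
        mean = sum(values[a] for a in S) ** 0.5
        return mean + random.gauss(0.0, 0.1)

    S, rewards = etcg(play, n, k, T, m=50)
    print("selected set:", sorted(S))
    print("mean reward over last 100 plays: %.3f" % (sum(rewards[-100:]) / 100))
```

The sketch counts exploration plays against the horizon, which is what makes the explore/commit trade-off matter: a larger m gives better marginal-gain estimates but leaves fewer rounds to exploit the chosen set.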