Open Problem: First-Order Regret Bounds for Contextual Bandits
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:4-7, 2017.
Abstract
We describe two open problems related to first-order regret bounds for contextual bandits. The first asks for an algorithm with a regret bound of $\tilde{\mathcal{O}}(\sqrt{L_\star K \ln N})$, where there are $K$ actions, $N$ policies, and $L_\star$ is the cumulative loss of the best policy. The second asks for an optimization-oracle-efficient algorithm with regret $\tilde{\mathcal{O}}(L_\star^{2/3}\,\mathrm{poly}(K, \ln(N/\delta)))$. We describe some positive results, such as an inefficient algorithm for the second problem, and some partial negative results.
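For readers unfamiliar with the notation, the following LaTeX sketch spells out the standard adversarial contextual-bandit regret quantity presumably intended here; the symbols $T$, $\Pi$, $x_t$, $a_t$, and $\ell_t$ are not defined in the abstract and are introduced only for illustration.

% Minimal, self-contained LaTeX sketch of the (assumed) standard notation.
% The symbols T, \Pi, x_t, a_t, \ell_t are illustrative, not taken from the abstract.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Over $T$ rounds, the learner observes a context $x_t$, picks an action
$a_t \in \{1,\dots,K\}$, and suffers a loss $\ell_t(a_t) \in [0,1]$.
With a finite policy class $\Pi$ of size $N$, the benchmark and regret are
\begin{align*}
  L_\star &= \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t\bigl(\pi(x_t)\bigr), &
  \mathrm{Regret} &= \sum_{t=1}^{T} \ell_t(a_t) - L_\star,
\end{align*}
so a first-order bound of $\tilde{\mathcal{O}}(\sqrt{L_\star K \ln N})$ replaces the
worst-case dependence on $T$ in $\tilde{\mathcal{O}}(\sqrt{T K \ln N})$ with the
(possibly much smaller) cumulative loss of the best policy.
\end{document}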