Open Problem: Adversarial Multiarmed Bandits with Limited Advice
Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:1067-1072, 2013.
Adversarial multiarmed bandits with expert advice is one of the fundamental problems in studying the exploration-exploitation trade-off. It is known that if we observe the advice of all experts on every round we can achieve O\left(\sqrtKT \ln N\right) regret, where K is the number of arms, T is the number of game rounds, and N is the number of experts. It is also known that if we observe the advice of just one expert on every round, we can achieve regret of order O\left(\sqrtNT\right). Our open problem is what can be achieved by asking M experts on every round, where 1