Open Problem: First-Order Regret Bounds for Contextual Bandits

Alekh Agarwal, Akshay Krishnamurthy, John Langford, Haipeng Luo, Robert E. Schapire

Proceedings of the 2017 Conference on Learning Theory, PMLR 65:4-7, 2017.

Abstract

We describe two open problems related to first-order regret bounds for contextual bandits. The first asks for an algorithm with a regret bound of $\tilde{\mathcal{O}}(\sqrt{L_\star}K \ln N)$, where there are $K$ actions and $N$ policies, and $L_\star$ is the cumulative loss of the best policy. The second asks for an optimization-oracle-efficient algorithm with regret $\tilde{\mathcal{O}}(L_\star^{2/3}\,\mathrm{poly}(K, \ln(N/\delta)))$. We describe some positive results, such as an inefficient algorithm for the second problem, and some partial negative results.

Cite this Paper

BibTeX


@InProceedings{pmlr-v65-agarwal17a,
  title = 	 {Open Problem: First-Order Regret Bounds for Contextual Bandits},
  author = 	 {Agarwal, Alekh and Krishnamurthy, Akshay and Langford, John and Luo, Haipeng and Schapire, Robert E.},
  booktitle = 	 {Proceedings of the 2017 Conference on Learning Theory},
  pages = 	 {4--7},
  year = 	 {2017},
  editor = 	 {Kale, Satyen and Shamir, Ohad},
  volume = 	 {65},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v65/agarwal17a/agarwal17a.pdf},
  url = 	 {https://proceedings.mlr.press/v65/agarwal17a.html},
  abstract = 	 {We describe two open problems related to first-order regret bounds for contextual bandits. The first asks for an algorithm with a regret bound of $\tilde{\mathcal{O}}(\sqrt{L_\star}K \ln N)$, where there are $K$ actions and $N$ policies, and $L_\star$ is the cumulative loss of the best policy. The second asks for an optimization-oracle-efficient algorithm with regret $\tilde{\mathcal{O}}(L_\star^{2/3}\,\mathrm{poly}(K, \ln(N/\delta)))$. We describe some positive results, such as an inefficient algorithm for the second problem, and some partial negative results.}
}

Endnote

%0 Conference Paper
%T Open Problem: First-Order Regret Bounds for Contextual Bandits
%A Alekh Agarwal
%A Akshay Krishnamurthy
%A John Langford
%A Haipeng Luo
%A Robert E. Schapire
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-agarwal17a
%I PMLR
%P 4--7
%U https://proceedings.mlr.press/v65/agarwal17a.html
%V 65
%X We describe two open problems related to first-order regret bounds for contextual bandits. The first asks for an algorithm with a regret bound of $\tilde{\mathcal{O}}(\sqrt{L_\star}K \ln N)$, where there are $K$ actions and $N$ policies, and $L_\star$ is the cumulative loss of the best policy. The second asks for an optimization-oracle-efficient algorithm with regret $\tilde{\mathcal{O}}(L_\star^{2/3}\,\mathrm{poly}(K, \ln(N/\delta)))$. We describe some positive results, such as an inefficient algorithm for the second problem, and some partial negative results.

APA


Agarwal, A., Krishnamurthy, A., Langford, J., Luo, H., & Schapire, R. E. (2017). Open Problem: First-Order Regret Bounds for Contextual Bandits. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:4-7. Available from https://proceedings.mlr.press/v65/agarwal17a.html.
