Contextual bandits with continuous actions: Smoothing, zooming, and adapting

Akshay Krishnamurthy; John Langford; Aleksandrs Slivkins; Chicheng Zhang

Contextual bandits with continuous actions: Smoothing, zooming, and adapting

Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:2025-2027, 2019.

Abstract

We study contextual bandit learning for any competitor policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming" behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.

Cite this Paper

BibTeX


@InProceedings{pmlr-v99-krishnamurthy19a,
  title = 	 {Contextual bandits with continuous actions: Smoothing, zooming, and adapting},
  author =       {Krishnamurthy, Akshay and Langford, John and Slivkins, Aleksandrs and Zhang, Chicheng},
  booktitle = 	 {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages = 	 {2025--2027},
  year = 	 {2019},
  editor = 	 {Beygelzimer, Alina and Hsu, Daniel},
  volume = 	 {99},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--28 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v99/krishnamurthy19a/krishnamurthy19a.pdf},
  url = 	 {https://proceedings.mlr.press/v99/krishnamurthy19a.html},
  abstract = 	 {We study contextual bandit learning for any competitor policy class and continuous action space.  We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming" behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.}
}

Endnote

%0 Conference Paper
%T Contextual bandits with continuous actions: Smoothing, zooming, and adapting
%A Akshay Krishnamurthy
%A John Langford
%A Aleksandrs Slivkins
%A Chicheng Zhang
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu	
%F pmlr-v99-krishnamurthy19a
%I PMLR
%P 2025--2027
%U https://proceedings.mlr.press/v99/krishnamurthy19a.html
%V 99
%X We study contextual bandit learning for any competitor policy class and continuous action space.  We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming" behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.

APA


Krishnamurthy, A., Langford, J., Slivkins, A. & Zhang, C.. (2019). Contextual bandits with continuous actions: Smoothing, zooming, and adapting. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:2025-2027 Available from https://proceedings.mlr.press/v99/krishnamurthy19a.html.

Related Material

Download PDF