Contextual bandits with continuous actions: Smoothing, zooming, and adapting

Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:2025-2027, 2019.

Abstract

We study contextual bandit learning for any competitor policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming” behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-krishnamurthy19a,
  title     = {Contextual bandits with continuous actions: Smoothing, zooming, and adapting},
  author    = {Krishnamurthy, Akshay and Langford, John and Slivkins, Aleksandrs and Zhang, Chicheng},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {2025--2027},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/krishnamurthy19a/krishnamurthy19a.pdf},
  url       = {https://proceedings.mlr.press/v99/krishnamurthy19a.html},
  abstract  = {We study contextual bandit learning for any competitor policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming” behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.}
}
Endnote
%0 Conference Paper
%T Contextual bandits with continuous actions: Smoothing, zooming, and adapting
%A Akshay Krishnamurthy
%A John Langford
%A Aleksandrs Slivkins
%A Chicheng Zhang
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-krishnamurthy19a
%I PMLR
%P 2025--2027
%U https://proceedings.mlr.press/v99/krishnamurthy19a.html
%V 99
%X We study contextual bandit learning for any competitor policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming” behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.
APA
Krishnamurthy, A., Langford, J., Slivkins, A. & Zhang, C. (2019). Contextual bandits with continuous actions: Smoothing, zooming, and adapting. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:2025-2027. Available from https://proceedings.mlr.press/v99/krishnamurthy19a.html.