Adaptive Discretization for Adversarial Lipschitz Bandits

Chara Podimata, Alex Slivkins
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:3788-3805, 2021.

Abstract

Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the [0,1] interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions thereof. The goal is to take advantage of "nicer" problem instances, while retaining near-optimal worst-case performance. While the stochastic version of the problem is well-understood, the general version with adversarial rewards is not. We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds. In particular, we recover the worst-case optimal regret bound for the adversarial version, and the instance-dependent regret bound for the stochastic version. A version with full proofs (and additional results) appears at arxiv.org/abs/2006.12367v2.
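To make the "adaptive discretization" theme concrete, below is a minimal Python sketch of the classic zooming idea for the *stochastic* Lipschitz bandit on [0,1] (in the style of prior work on the stochastic version), not the paper's adversarial algorithm. The uniform candidate sampling, the confidence-radius constant, and the `reward_fn` oracle interface are illustrative assumptions.

```python
import math
import random

def zooming_bandit(reward_fn, T, seed=0):
    """Illustrative zooming algorithm for stochastic Lipschitz bandits on [0,1].

    A sketch of adaptive discretization: arms are activated only where the
    confidence balls of existing arms do not cover the space, so the active
    set gradually "zooms in" on the more promising regions.
    `reward_fn(x)` is assumed to return a noisy reward in [0,1].
    """
    rng = random.Random(seed)
    active = []            # currently active arms (points in [0,1])
    pulls, means = {}, {}  # per-arm pull counts and empirical mean rewards

    def radius(x):
        # Confidence radius; shrinks as arm x is pulled more often.
        return math.sqrt(2 * math.log(T) / (pulls.get(x, 0) + 1))

    for t in range(1, T + 1):
        # Activation rule: if a candidate point is not covered by any
        # active arm's confidence ball, activate it. (Sampling a uniform
        # candidate each round is a simplification for this sketch.)
        y = rng.random()
        if all(abs(y - x) > radius(x) for x in active):
            active.append(y)
            pulls[y], means[y] = 0, 0.0
        # Selection rule: play the arm with the highest optimistic index.
        x = max(active, key=lambda a: means[a] + 2 * radius(a))
        r = reward_fn(x)
        pulls[x] += 1
        means[x] += (r - means[x]) / pulls[x]  # running-mean update
    return active

if __name__ == "__main__":
    # Example: expected reward peaks at x = 0.7 (Lipschitz with constant 1).
    f = lambda x: float(random.random() < max(0.0, 1 - abs(x - 0.7)))
    arms = zooming_bandit(f, T=5000)
    print(f"{len(arms)} active arms; they cluster near the optimum 0.7")
```

The active set ends up denser near near-optimal arms, since well-pulled arms there have small confidence balls and leave more uncovered points; handling adversarial rewards with this kind of zooming is the paper's contribution.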

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-podimata21a,
  title     = {Adaptive Discretization for Adversarial Lipschitz Bandits},
  author    = {Podimata, Chara and Slivkins, Alex},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages     = {3788--3805},
  year      = {2021},
  editor    = {Belkin, Mikhail and Kpotufe, Samory},
  volume    = {134},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v134/podimata21a/podimata21a.pdf},
  url       = {https://proceedings.mlr.press/v134/podimata21a.html},
  abstract  = {Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the [0,1] interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions thereof. The goal is to take advantage of "nicer" problem instances, while retaining near-optimal worst-case performance. While the stochastic version of the problem is well-understood, the general version with adversarial rewards is not. We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds. In particular, we recover the worst-case optimal regret bound for the adversarial version, and the instance-dependent regret bound for the stochastic version. A version with full proofs (and additional results) appears at arxiv.org/abs/2006.12367v2.}
}
Endnote
%0 Conference Paper
%T Adaptive Discretization for Adversarial Lipschitz Bandits
%A Chara Podimata
%A Alex Slivkins
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-podimata21a
%I PMLR
%P 3788--3805
%U https://proceedings.mlr.press/v134/podimata21a.html
%V 134
%X Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the [0,1] interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions thereof. The goal is to take advantage of "nicer" problem instances, while retaining near-optimal worst-case performance. While the stochastic version of the problem is well-understood, the general version with adversarial rewards is not. We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds. In particular, we recover the worst-case optimal regret bound for the adversarial version, and the instance-dependent regret bound for the stochastic version. A version with full proofs (and additional results) appears at arxiv.org/abs/2006.12367v2.
APA
Podimata, C. & Slivkins, A. (2021). Adaptive Discretization for Adversarial Lipschitz Bandits. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:3788-3805. Available from https://proceedings.mlr.press/v134/podimata21a.html.