Smooth Bandit Optimization: Generalization to Holder Space

Yusha Liu; Yining Wang; Aarti Singh

Smooth Bandit Optimization: Generalization to Holder Space

Yusha Liu, Yining Wang, Aarti Singh

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2206-2214, 2021.

Abstract

We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for $\alpha$-Holder continuous (including Lipschitz) functions with $0<\alpha\leq 1$. Our main result is in generalization of the reward function to Holder space with exponent $\alpha>1$ to bridge the gap between Lipschitz bandits and infinitely-differentiable models such as linear bandits. For Holder continuous functions, approaches based on random sampling in bins of a discretized domain suffices as optimal. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T^\frac{d+\alpha}{d+2\alpha})$ for when $\alpha>1$, which matches existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of Holder spaces indexed by $\alpha$, with a bandit model selection approach applied with our proposed two-layer algorithms. We show that it achieves regret rate that matches the existing lower bound for adaptation within the $\alpha\leq 1$ subset.

Cite this Paper

BibTeX


@InProceedings{pmlr-v130-liu21f,
  title = 	 { Smooth Bandit Optimization: Generalization to Holder Space },
  author =       {Liu, Yusha and Wang, Yining and Singh, Aarti},
  booktitle = 	 {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2206--2214},
  year = 	 {2021},
  editor = 	 {Banerjee, Arindam and Fukumizu, Kenji},
  volume = 	 {130},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v130/liu21f/liu21f.pdf},
  url = 	 {https://proceedings.mlr.press/v130/liu21f.html},
  abstract = 	 { We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for $\alpha$-Holder continuous (including Lipschitz) functions with $0<\alpha\leq 1$. Our main result is in generalization of the reward function to Holder space with exponent $\alpha>1$ to bridge the gap between Lipschitz bandits and infinitely-differentiable models such as linear bandits. For Holder continuous functions, approaches based on random sampling in bins of a discretized domain suffices as optimal. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T^\frac{d+\alpha}{d+2\alpha})$ for when $\alpha>1$, which matches existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of Holder spaces indexed by $\alpha$, with a bandit model selection approach applied with our proposed two-layer algorithms. We show that it achieves regret rate that matches the existing lower bound for adaptation within the $\alpha\leq 1$ subset. }
}

Endnote

%0 Conference Paper
%T  Smooth Bandit Optimization: Generalization to Holder Space 
%A Yusha Liu
%A Yining Wang
%A Aarti Singh
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu	
%F pmlr-v130-liu21f
%I PMLR
%P 2206--2214
%U https://proceedings.mlr.press/v130/liu21f.html
%V 130
%X  We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for $\alpha$-Holder continuous (including Lipschitz) functions with $0<\alpha\leq 1$. Our main result is in generalization of the reward function to Holder space with exponent $\alpha>1$ to bridge the gap between Lipschitz bandits and infinitely-differentiable models such as linear bandits. For Holder continuous functions, approaches based on random sampling in bins of a discretized domain suffices as optimal. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T^\frac{d+\alpha}{d+2\alpha})$ for when $\alpha>1$, which matches existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of Holder spaces indexed by $\alpha$, with a bandit model selection approach applied with our proposed two-layer algorithms. We show that it achieves regret rate that matches the existing lower bound for adaptation within the $\alpha\leq 1$ subset.

APA


Liu, Y., Wang, Y. & Singh, A.. (2021).  Smooth Bandit Optimization: Generalization to Holder Space . Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2206-2214 Available from https://proceedings.mlr.press/v130/liu21f.html.

Smooth Bandit Optimization: Generalization to Holder Space

Abstract

Cite this Paper

Related Material