Pareto Optimal Model Selection in Linear Bandits

Yinglun Zhu, Robert Nowak
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:6793-6813, 2022.

Abstract

We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by $d_\star$) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers provide various guarantees for this model selection problem, but have limitations: either the analysis requires favorable conditions that allow for inexpensive statistical testing to locate the right hypothesis class, or the guarantees are based on the idea of “corralling” multiple base algorithms, which often performs relatively poorly in practice. These works also focus mainly on upper bounds. In this paper, we establish the first lower bound for the model selection problem. Our lower bound implies that, even with a fixed action set, adaptation to the unknown dimension $d_\star$ comes at a cost: there is no algorithm that can achieve the regret bound $\widetilde{O}(\sqrt{d_\star T})$ simultaneously for all values of $d_\star$. We propose Pareto optimal algorithms that match the lower bound. Empirical evaluations show that our algorithm enjoys superior performance compared to existing ones.
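For readers skimming the abstract, the quantity being bounded is the standard cumulative regret of a stochastic linear bandit. The following display is a notational sketch under the usual setup; the symbols $\mathcal{X}$, $\theta_\star$, and $x_t$ are illustrative conventions, not notation taken from the paper:

$$ R_T \;=\; \sum_{t=1}^{T} \Big( \max_{x \in \mathcal{X}} \langle \theta_\star, x \rangle \;-\; \langle \theta_\star, x_t \rangle \Big), $$

where $x_t \in \mathcal{X}$ is the action chosen at round $t$ and $\theta_\star$ is the unknown parameter whose smallest containing hypothesis class has dimension $d_\star$. When the relevant dimension $d$ is known, classical algorithms such as OFUL attain $\widetilde{O}(d\sqrt{T})$ regret; the oracle rate $\widetilde{O}(\sqrt{d_\star T})$ quoted in the abstract is the benchmark one would hope to match without knowing $d_\star$, and the paper's lower bound shows that no single algorithm can achieve it simultaneously for all values of $d_\star$.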

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-zhu22g,
  title     = {Pareto Optimal Model Selection in Linear Bandits},
  author    = {Zhu, Yinglun and Nowak, Robert},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {6793--6813},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/zhu22g/zhu22g.pdf},
  url       = {https://proceedings.mlr.press/v151/zhu22g.html},
  abstract  = {We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by $d_\star$) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers provide various guarantees for this model selection problem, but have limitations: either the analysis requires favorable conditions that allow for inexpensive statistical testing to locate the right hypothesis class, or the guarantees are based on the idea of “corralling” multiple base algorithms, which often performs relatively poorly in practice. These works also focus mainly on upper bounds. In this paper, we establish the first lower bound for the model selection problem. Our lower bound implies that, even with a fixed action set, adaptation to the unknown dimension $d_\star$ comes at a cost: there is no algorithm that can achieve the regret bound $\widetilde{O}(\sqrt{d_\star T})$ simultaneously for all values of $d_\star$. We propose Pareto optimal algorithms that match the lower bound. Empirical evaluations show that our algorithm enjoys superior performance compared to existing ones.}
}
Endnote
%0 Conference Paper
%T Pareto Optimal Model Selection in Linear Bandits
%A Yinglun Zhu
%A Robert Nowak
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-zhu22g
%I PMLR
%P 6793--6813
%U https://proceedings.mlr.press/v151/zhu22g.html
%V 151
%X We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by $d_\star$) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers provide various guarantees for this model selection problem, but have limitations: either the analysis requires favorable conditions that allow for inexpensive statistical testing to locate the right hypothesis class, or the guarantees are based on the idea of “corralling” multiple base algorithms, which often performs relatively poorly in practice. These works also focus mainly on upper bounds. In this paper, we establish the first lower bound for the model selection problem. Our lower bound implies that, even with a fixed action set, adaptation to the unknown dimension $d_\star$ comes at a cost: there is no algorithm that can achieve the regret bound $\widetilde{O}(\sqrt{d_\star T})$ simultaneously for all values of $d_\star$. We propose Pareto optimal algorithms that match the lower bound. Empirical evaluations show that our algorithm enjoys superior performance compared to existing ones.
APA
Zhu, Y. & Nowak, R. (2022). Pareto Optimal Model Selection in Linear Bandits. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:6793-6813. Available from https://proceedings.mlr.press/v151/zhu22g.html.