On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

Weitong Zhang, Jiafan He, Zhiyuan Fan, Quanquan Gu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:41111-41132, 2023.

Abstract

We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $\zeta$ is dominated by $\tilde O(\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O ({d^2} /{\Delta})$ as in the well-specified setting up to logarithmic factors. Given this result, we show that the existing SupLinUCB algorithm (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without the knowledge of sub-optimality gap $\Delta$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between the misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $\zeta \leq \tilde O({\Delta} / \sqrt{d})$; and (2) it is not efficiently learnable when $\zeta \geq \tilde \Omega({\Delta} / {\sqrt{d}})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.
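
The data selection idea in the abstract, feeding only high-uncertainty contextual vectors to the online regression, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how uncertainty is measured or thresholded, so the elliptical norm $\|x\|_{A^{-1}}$ used below, the ridge regularizer `reg`, the cutoff `threshold`, and the class name `UncertaintyFilteredRidge` are all assumptions made for illustration.

```python
import numpy as np


class UncertaintyFilteredRidge:
    """Online ridge regression that only learns from high-uncertainty contexts.

    Hypothetical illustration; names and the selection rule are assumptions.
    """

    def __init__(self, dim, reg=1.0, threshold=0.6):
        self.cov = reg * np.eye(dim)   # regularized Gram matrix A
        self.b = np.zeros(dim)         # running sum of reward * context
        self.threshold = threshold     # assumed uncertainty cutoff
        self.num_selected = 0          # how many samples entered the regression

    def uncertainty(self, x):
        # Elliptical norm ||x||_{A^{-1}} = sqrt(x^T A^{-1} x).
        return float(np.sqrt(x @ np.linalg.solve(self.cov, x)))

    def estimate(self):
        # Ridge estimate computed from the selected samples only.
        return np.linalg.solve(self.cov, self.b)

    def update(self, x, reward):
        # Skip contexts whose uncertainty is already below the cutoff.
        if self.uncertainty(x) < self.threshold:
            return False
        self.cov += np.outer(x, x)
        self.b += reward * x
        self.num_selected += 1
        return True


model = UncertaintyFilteredRidge(dim=2)
x = np.array([1.0, 0.0])
decisions = [model.update(x, 1.0) for _ in range(3)]
```

Repeating the same context shrinks its uncertainty, so later copies are dropped from the regression. Under this sketch, the number of retained samples is governed by the cutoff rather than by the number of rounds.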

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-zhang23n,
  title = {On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits},
  author = {Zhang, Weitong and He, Jiafan and Fan, Zhiyuan and Gu, Quanquan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {41111--41132},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/zhang23n/zhang23n.pdf},
  url = {https://proceedings.mlr.press/v202/zhang23n.html},
  abstract = {We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $\zeta$ is dominated by $\tilde O(\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O ({d^2} /{\Delta})$ as in the well-specified setting up to logarithmic factors. Given this result, we show that the existing SupLinUCB algorithm (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without the knowledge of sub-optimality gap $\Delta$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between the misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $\zeta \leq \tilde O({\Delta} / \sqrt{d})$; and (2) it is not efficiently learnable when $\zeta \geq \tilde \Omega({\Delta} / {\sqrt{d}})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.}
}
Endnote
%0 Conference Paper
%T On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits
%A Weitong Zhang
%A Jiafan He
%A Zhiyuan Fan
%A Quanquan Gu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-zhang23n
%I PMLR
%P 41111--41132
%U https://proceedings.mlr.press/v202/zhang23n.html
%V 202
%X We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $\zeta$ is dominated by $\tilde O(\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O ({d^2} /{\Delta})$ as in the well-specified setting up to logarithmic factors. Given this result, we show that the existing SupLinUCB algorithm (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without the knowledge of sub-optimality gap $\Delta$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between the misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $\zeta \leq \tilde O({\Delta} / \sqrt{d})$; and (2) it is not efficiently learnable when $\zeta \geq \tilde \Omega({\Delta} / {\sqrt{d}})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.
APA
Zhang, W., He, J., Fan, Z., & Gu, Q. (2023). On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:41111-41132. Available from https://proceedings.mlr.press/v202/zhang23n.html.
