No-Regret Linear Bandits beyond Realizability

Chong Liu; Ming Yin; Yu-Xiang Wang

Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1294-1303, 2023.

Abstract

We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $\epsilon > 0$. We describe a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$. It captures the intuition that, for optimization problems, near-optimal regions should matter more and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm (designed for the realizable case) is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal $\sqrt{T}$ regret for problems where the best-known regret is almost linear in the time horizon $T$. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
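
The contrast between the two misspecification models can be sketched as follows. The symbols $f$ (true reward function), $\mathcal{X}$ (action set), $w$ (parameter of the linear approximation), $f^*$ (optimal reward value) and $\rho$ (proportionality constant) are notation assumed here for illustration; they do not appear in the abstract, and the paper's formal definition may differ in detail.

$$
\underbrace{\sup_{x \in \mathcal{X}} \bigl| f(x) - w^\top x \bigr| \le \epsilon}_{\text{uniform misspecification}}
\qquad \text{versus} \qquad
\underbrace{\bigl| f(x) - w^\top x \bigr| \le \rho \, \bigl( f^* - f(x) \bigr) \quad \forall x \in \mathcal{X}}_{\text{gap-adjusted misspecification}},
\qquad f^* = \max_{x \in \mathcal{X}} f(x).
$$

Under the second condition the permitted error vanishes at a maximizer of $f$ and grows with the suboptimality gap $f^* - f(x)$, which matches the intuition that near-optimal regions should matter more.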

Cite this Paper

BibTeX


@InProceedings{pmlr-v216-liu23c,
  title = 	 {No-Regret Linear Bandits beyond Realizability},
  author =       {Liu, Chong and Yin, Ming and Wang, Yu-Xiang},
  booktitle = 	 {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {1294--1303},
  year = 	 {2023},
  editor = 	 {Evans, Robin J. and Shpitser, Ilya},
  volume = 	 {216},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {31 Jul--04 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v216/liu23c/liu23c.pdf},
  url = 	 {https://proceedings.mlr.press/v216/liu23c.html},
  abstract = 	 {We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $\epsilon > 0$. We describe a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$.  It captures the intuition that, for optimization problems, near-optimal regions should matter more and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm — designed for the realizable case — is automatically robust against such gap-adjusted misspecification.  It achieves a near-optimal $\sqrt{T}$ regret for problems that the best-known regret is almost linear in time horizon $T$. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.}
}

Endnote

%0 Conference Paper
%T No-Regret Linear Bandits beyond Realizability
%A Chong Liu
%A Ming Yin
%A Yu-Xiang Wang
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-liu23c
%I PMLR
%P 1294--1303
%U https://proceedings.mlr.press/v216/liu23c.html
%V 216
%X We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $\epsilon > 0$. We describe a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$.  It captures the intuition that, for optimization problems, near-optimal regions should matter more and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm — designed for the realizable case — is automatically robust against such gap-adjusted misspecification.  It achieves a near-optimal $\sqrt{T}$ regret for problems that the best-known regret is almost linear in time horizon $T$. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.

APA


Liu, C., Yin, M. & Wang, Y. (2023). No-Regret Linear Bandits beyond Realizability. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:1294-1303. Available from https://proceedings.mlr.press/v216/liu23c.html.
