Optimistic Information Directed Sampling

Gergely Neu; Matteo Papini; Ludovic Schwartz

Optimistic Information Directed Sampling

Gergely Neu, Matteo Papini, Ludovic Schwartz

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:3970-4006, 2024.

Abstract

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy (2018) and the worst-case theory of Foster et al. (2021) based on the decision-estimation coefficient. Drawing from both lines of work, we propose a algorithmic template called Optimistic Information-Directed Sampling and show that it can achieve instance-dependent regret guarantees similar to the ones achievable by the classic Bayesian IDS method, but with the major advantage of not requiring any Bayesian assumptions. The key technical innovation of our analysis is introducing an optimistic surrogate model for the regret and using it to define a frequentist version of the Information Ratio of Russo and Van Roy (2018), and a less conservative version of the Decision Estimation Coefficient of Foster et al. (2021).

Cite this Paper

BibTeX


@InProceedings{pmlr-v247-neu24a,
  title = 	 {Optimistic Information Directed Sampling},
  author =       {Neu, Gergely and Papini, Matteo and Schwartz, Ludovic},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {3970--4006},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/neu24a/neu24a.pdf},
  url = 	 {https://proceedings.mlr.press/v247/neu24a.html},
  abstract = 	 {We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy (2018) and the worst-case theory of Foster et al. (2021) based on the decision-estimation coefficient. Drawing from both lines of work, we propose a algorithmic template called Optimistic Information-Directed Sampling and show that it can achieve instance-dependent regret guarantees similar to the ones achievable by the classic Bayesian IDS method, but with the major advantage of not requiring any Bayesian assumptions. The key technical innovation of our analysis is introducing an optimistic surrogate model for the regret and using it to define a frequentist version of the Information Ratio of Russo and Van Roy (2018), and a less conservative version of the Decision Estimation Coefficient of Foster et al. (2021).}
}

Endnote

%0 Conference Paper
%T Optimistic Information Directed Sampling
%A Gergely Neu
%A Matteo Papini
%A Ludovic Schwartz
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth	
%F pmlr-v247-neu24a
%I PMLR
%P 3970--4006
%U https://proceedings.mlr.press/v247/neu24a.html
%V 247
%X We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy (2018) and the worst-case theory of Foster et al. (2021) based on the decision-estimation coefficient. Drawing from both lines of work, we propose a algorithmic template called Optimistic Information-Directed Sampling and show that it can achieve instance-dependent regret guarantees similar to the ones achievable by the classic Bayesian IDS method, but with the major advantage of not requiring any Bayesian assumptions. The key technical innovation of our analysis is introducing an optimistic surrogate model for the regret and using it to define a frequentist version of the Information Ratio of Russo and Van Roy (2018), and a less conservative version of the Decision Estimation Coefficient of Foster et al. (2021).

APA


Neu, G., Papini, M. & Schwartz, L.. (2024). Optimistic Information Directed Sampling. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:3970-4006 Available from https://proceedings.mlr.press/v247/neu24a.html.

Related Material

Download PDF