PAC-Bayesian Offline Contextual Bandits With Guarantees

Otmane Sakhi; Pierre Alquier; Nicolas Chopin

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:29777-29799, 2023.

Abstract

This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to optimize them. We prove that the derived bounds are tighter than their competitors, and can be optimized directly to confidently improve upon the logging policy offline. Our approach learns policies with guarantees, uses all available data and does not require tuning additional hyperparameters on held-out sets. We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-sakhi23a,
  title = 	 {{PAC}-{B}ayesian Offline Contextual Bandits With Guarantees},
  author =       {Sakhi, Otmane and Alquier, Pierre and Chopin, Nicolas},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {29777--29799},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/sakhi23a/sakhi23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/sakhi23a.html},
  abstract = 	 {This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to optimize them. We prove that the derived bounds are tighter than their competitors, and can be optimized directly to confidently improve upon the logging policy offline. Our approach learns policies with guarantees, uses all available data and does not require tuning additional hyperparameters on held-out sets. We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios.}
}

Endnote

%0 Conference Paper
%T PAC-Bayesian Offline Contextual Bandits With Guarantees
%A Otmane Sakhi
%A Pierre Alquier
%A Nicolas Chopin
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-sakhi23a
%I PMLR
%P 29777--29799
%U https://proceedings.mlr.press/v202/sakhi23a.html
%V 202
%X This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to optimize them. We prove that the derived bounds are tighter than their competitors, and can be optimized directly to confidently improve upon the logging policy offline. Our approach learns policies with guarantees, uses all available data and does not require tuning additional hyperparameters on held-out sets. We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios.

APA


Sakhi, O., Alquier, P., & Chopin, N. (2023). PAC-Bayesian Offline Contextual Bandits With Guarantees. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:29777-29799. Available from https://proceedings.mlr.press/v202/sakhi23a.html.
