Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

Imad Aouali, Victor-Emmanuel Brunel, David Rohde, Anna Korba
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:136-144, 2025.

Abstract

In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.
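The abstract's core idea — exploiting correlations between actions via a structured prior so that reward observations for one action inform estimates for correlated ones — can be illustrated with a generic Bayesian direct-method (DM) value estimate. This is a minimal sketch, not the paper's sDM algorithm: the action embeddings, the linear-Gaussian reward model, the softmax target policy, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: K actions share structure through low-dimensional embeddings
# (hypothetical), so observing one action informs correlated actions.
K, d, n = 50, 5, 2000
A = rng.normal(size=(K, d))         # known action embeddings
theta_true = rng.normal(size=d)     # unknown reward parameter

# Logged data from a uniform logging policy (illustrative).
actions = rng.integers(K, size=n)
rewards = A[actions] @ theta_true + rng.normal(scale=0.5, size=n)

# Bayesian linear reward model: prior theta ~ N(0, I), Gaussian noise
# with variance 0.25 -> closed-form Gaussian posterior N(mu, Sigma).
X = A[actions]
Sigma_inv = np.eye(d) + X.T @ X / 0.25      # prior + data precision
mu = np.linalg.solve(Sigma_inv, X.T @ rewards / 0.25)   # posterior mean

# Direct-method estimate of a target policy's value: plug the
# posterior-mean rewards into the policy's action distribution.
pi = np.exp(A @ mu)
pi /= pi.sum()                  # softmax target policy (illustrative)
value_hat = pi @ (A @ mu)       # estimated expected reward under pi
print(f"estimated policy value: {value_hat:.3f}")
```

Because the posterior over `theta` is shared across all actions, data logged for any action tightens the reward estimates of every correlated action — the sample-efficiency effect the abstract highlights, without any per-action importance weights.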

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-aouali25a,
  title     = {Bayesian Off-Policy Evaluation and Learning for Large Action Spaces},
  author    = {Aouali, Imad and Brunel, Victor-Emmanuel and Rohde, David and Korba, Anna},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {136--144},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/aouali25a/aouali25a.pdf},
  url       = {https://proceedings.mlr.press/v258/aouali25a.html},
  abstract  = {In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.}
}
Endnote
%0 Conference Paper
%T Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
%A Imad Aouali
%A Victor-Emmanuel Brunel
%A David Rohde
%A Anna Korba
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-aouali25a
%I PMLR
%P 136--144
%U https://proceedings.mlr.press/v258/aouali25a.html
%V 258
%X In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.
APA
Aouali, I., Brunel, V.-E., Rohde, D. & Korba, A. (2025). Bayesian Off-Policy Evaluation and Learning for Large Action Spaces. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:136-144. Available from https://proceedings.mlr.press/v258/aouali25a.html.