Bayesian Counterfactual Risk Minimization

Ben London, Ted Sandler
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4125-4133, 2019.

Abstract

We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
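The truncated inverse propensity score (IPS) estimator referenced in the abstract is the standard clipped importance-weighted risk estimate from the CRM literature. As a rough illustration (the function name, variable names, and default truncation level `m` below are assumptions for this sketch, not the paper's implementation):

```python
import numpy as np

def truncated_ips(pi_probs, mu_probs, losses, m=10.0):
    """Truncated IPS estimate of a target policy's risk from logged bandit data.

    pi_probs: probability the target policy assigns to each logged action
    mu_probs: probability the logging policy assigned to each logged action
    losses:   observed loss for each logged (context, action) pair
    m:        truncation level capping the importance weights
    """
    weights = np.minimum(pi_probs / mu_probs, m)  # cap large importance weights
    return float(np.mean(weights * losses))

# Example: three logged interactions
pi = np.array([0.9, 0.2, 0.5])
mu = np.array([0.3, 0.4, 0.5])
loss = np.array([1.0, 0.0, 1.0])
truncated_ips(pi, mu, loss)  # = (3*1 + 0.5*0 + 1*1)/3 ≈ 1.333
```

Truncation trades a small bias for a large reduction in variance when the target policy places much more probability on an action than the logging policy did; the paper's PAC-Bayesian bound is derived for this estimator.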

Cite this Paper

BibTeX
@InProceedings{pmlr-v97-london19a,
  title = {{B}ayesian Counterfactual Risk Minimization},
  author = {London, Ben and Sandler, Ted},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {4125--4133},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/london19a/london19a.pdf},
  url = {https://proceedings.mlr.press/v97/london19a.html},
  abstract = {We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.}
}
Endnote
%0 Conference Paper
%T Bayesian Counterfactual Risk Minimization
%A Ben London
%A Ted Sandler
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-london19a
%I PMLR
%P 4125--4133
%U https://proceedings.mlr.press/v97/london19a.html
%V 97
%X We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
APA
London, B. & Sandler, T. (2019). Bayesian Counterfactual Risk Minimization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4125-4133. Available from https://proceedings.mlr.press/v97/london19a.html.