Bayesian Counterfactual Risk Minimization

Ben London, Ted Sandler
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4125-4133, 2019.

Abstract

We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard L2 regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
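For context, the truncated inverse propensity score (IPS) estimator the abstract refers to can be sketched as follows. This is a minimal illustration of the general estimator, not code from the paper; the function name and the truncation level `m` are illustrative assumptions.

```python
import numpy as np

def truncated_ips(losses, pi_new, pi_log, m=10.0):
    """Truncated IPS estimate of a new policy's risk from logged bandit data.

    losses: observed losses for the logged actions
    pi_new: probabilities the new policy assigns to the logged actions
    pi_log: propensities of the logging policy for those actions
    m:      truncation level capping the importance weights
    """
    weights = np.minimum(pi_new / pi_log, m)  # cap importance weights at m
    return float(np.mean(losses * weights))
```

Capping the importance weights at `m` trades a small bias for a large reduction in variance, which is what makes the truncated estimator amenable to the generalization analysis described above.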

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-london19a,
  title     = {{B}ayesian Counterfactual Risk Minimization},
  author    = {London, Ben and Sandler, Ted},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4125--4133},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/london19a/london19a.pdf},
  url       = {https://proceedings.mlr.press/v97/london19a.html},
  abstract  = {We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.}
}
Endnote
%0 Conference Paper
%T Bayesian Counterfactual Risk Minimization
%A Ben London
%A Ted Sandler
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-london19a
%I PMLR
%P 4125--4133
%U https://proceedings.mlr.press/v97/london19a.html
%V 97
%X We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
APA
London, B. & Sandler, T. (2019). Bayesian Counterfactual Risk Minimization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4125-4133. Available from https://proceedings.mlr.press/v97/london19a.html.