Bayesian Counterfactual Risk Minimization

Ben London; Ted Sandler

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4125-4133, 2019.

Abstract

We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
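For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of one common form of the truncated inverse propensity score (IPS) estimator, in which each logged propensity is clipped from below at a threshold before reweighting. The function name, argument names, and the threshold symbol `tau` are illustrative assumptions and are not taken from the paper; the paper's exact definition (for example, whether it is stated in terms of risk rather than reward) may differ.

```python
from typing import Sequence


def truncated_ips_reward(
    rewards: Sequence[float],
    target_probs: Sequence[float],
    logged_propensities: Sequence[float],
    tau: float,
) -> float:
    """Estimate a target policy's expected reward from logged bandit feedback.

    Hypothetical helper for illustration only.
    rewards[i]             : reward observed for the logged action
    target_probs[i]        : probability the target policy assigns to that action
    logged_propensities[i] : probability the logging policy assigned to it
    tau                    : truncation threshold in (0, 1]
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    n = len(rewards)
    if n == 0 or n != len(target_probs) or n != len(logged_propensities):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for r, pi, p in zip(rewards, target_probs, logged_propensities):
        # Clipping the propensity from below caps the importance weight at
        # pi / tau, trading a small bias for lower variance.
        total += r * pi / max(p, tau)
    return total / n
```

With a larger `tau`, samples that the logging policy chose only rarely contribute less extreme weights, which is the bias-variance trade-off that truncation is generally meant to control.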

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-london19a,
  title     = {{B}ayesian Counterfactual Risk Minimization},
  author    = {London, Ben and Sandler, Ted},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4125--4133},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/london19a/london19a.pdf},
  url       = {https://proceedings.mlr.press/v97/london19a.html},
  abstract  = {We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.}
}

Endnote

%0 Conference Paper
%T Bayesian Counterfactual Risk Minimization
%A Ben London
%A Ted Sandler
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-london19a
%I PMLR
%P 4125--4133
%U https://proceedings.mlr.press/v97/london19a.html
%V 97
%X We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.

APA

London, B. & Sandler, T. (2019). Bayesian Counterfactual Risk Minimization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4125-4133. Available from https://proceedings.mlr.press/v97/london19a.html.
