Balanced Off-Policy Evaluation for Personalized Pricing

Adam Elmachtoub, Vishal Gupta, Yunfan Zhao
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10901-10917, 2023.

Abstract

We consider a personalized pricing problem in which we have data consisting of feature information, historical pricing decisions, and binary realized demand. The goal is to perform off-policy evaluation for a new personalized pricing policy that maps features to prices. Methods based on inverse propensity weighting (including doubly robust methods) for off-policy evaluation may perform poorly when the logging policy has little exploration or is deterministic, which is common in pricing applications. Building on the balanced policy evaluation framework of Kallus (2018), we propose a new approach tailored to pricing applications. The key idea is to compute an estimate that minimizes the worst-case mean squared error or maximizes a worst-case lower bound on policy performance, where in both cases the worst-case is taken with respect to a set of possible revenue functions. We establish theoretical convergence guarantees and empirically demonstrate the advantage of our approach using a real-world pricing dataset.
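To illustrate the limitation of inverse propensity weighting mentioned above, here is a minimal sketch of the standard IPW revenue estimator for a deterministic target policy over discrete prices. It is not the balanced estimator proposed in the paper. The function name `ipw_revenue`, its argument names, and the toy data are hypothetical, chosen only for illustration.

```python
def ipw_revenue(prices, purchases, logging_probs, target_prices):
    """Standard IPW estimate of a target policy's expected revenue per customer.

    prices[i]        -- price the logging policy charged customer i
    purchases[i]     -- 1 if customer i bought, else 0 (binary demand)
    logging_probs[i] -- probability the logging policy gave to prices[i]
    target_prices[i] -- price the (deterministic) target policy would charge

    All names are illustrative assumptions, not taken from the paper.
    """
    total = 0.0
    for p, d, q, t in zip(prices, purchases, logging_probs, target_prices):
        # Only samples whose logged price matches the target price contribute,
        # each reweighted by the inverse of its logging probability.
        if p == t:
            total += p * d / q
    return total / len(prices)


# Well-explored logging policy: each price is charged with probability 0.5.
balanced = ipw_revenue([10, 20], [1, 1], [0.5, 0.5], [10, 10])

# Little exploration: price 20 is logged with probability 0.01, so the single
# matching sample receives a weight of 1 / 0.01 and dominates the estimate.
rare = ipw_revenue([10, 10, 10, 20], [1, 1, 0, 1],
                   [0.99, 0.99, 0.99, 0.01], [20, 20, 20, 20])

# Deterministic logging policy that never charges the target price:
# no sample matches, so the estimate is 0 regardless of true demand.
deterministic = ipw_revenue([10, 10], [1, 0], [1.0, 1.0], [20, 20])
```

In the second call the estimate rests on one heavily weighted observation, and in the third it carries no information at all, which is the failure mode the abstract attributes to logging policies with little or no exploration.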

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-elmachtoub23a,
  title     = {Balanced Off-Policy Evaluation for Personalized Pricing},
  author    = {Elmachtoub, Adam and Gupta, Vishal and Zhao, Yunfan},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {10901--10917},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/elmachtoub23a/elmachtoub23a.pdf},
  url       = {https://proceedings.mlr.press/v206/elmachtoub23a.html},
  abstract  = {We consider a personalized pricing problem in which we have data consisting of feature information, historical pricing decisions, and binary realized demand. The goal is to perform off-policy evaluation for a new personalized pricing policy that maps features to prices. Methods based on inverse propensity weighting (including doubly robust methods) for off-policy evaluation may perform poorly when the logging policy has little exploration or is deterministic, which is common in pricing applications. Building on the balanced policy evaluation framework of Kallus (2018), we propose a new approach tailored to pricing applications. The key idea is to compute an estimate that minimizes the worst-case mean squared error or maximizes a worst-case lower bound on policy performance, where in both cases the worst-case is taken with respect to a set of possible revenue functions. We establish theoretical convergence guarantees and empirically demonstrate the advantage of our approach using a real-world pricing dataset.}
}
Endnote
%0 Conference Paper
%T Balanced Off-Policy Evaluation for Personalized Pricing
%A Adam Elmachtoub
%A Vishal Gupta
%A Yunfan Zhao
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-elmachtoub23a
%I PMLR
%P 10901--10917
%U https://proceedings.mlr.press/v206/elmachtoub23a.html
%V 206
%X We consider a personalized pricing problem in which we have data consisting of feature information, historical pricing decisions, and binary realized demand. The goal is to perform off-policy evaluation for a new personalized pricing policy that maps features to prices. Methods based on inverse propensity weighting (including doubly robust methods) for off-policy evaluation may perform poorly when the logging policy has little exploration or is deterministic, which is common in pricing applications. Building on the balanced policy evaluation framework of Kallus (2018), we propose a new approach tailored to pricing applications. The key idea is to compute an estimate that minimizes the worst-case mean squared error or maximizes a worst-case lower bound on policy performance, where in both cases the worst-case is taken with respect to a set of possible revenue functions. We establish theoretical convergence guarantees and empirically demonstrate the advantage of our approach using a real-world pricing dataset.
APA
Elmachtoub, A., Gupta, V., & Zhao, Y. (2023). Balanced Off-Policy Evaluation for Personalized Pricing. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:10901-10917. Available from https://proceedings.mlr.press/v206/elmachtoub23a.html.
