Clustering Context in Off-Policy Evaluation

Daniel Guzman Olivares; Philipp Schmidt; Jacek Golebiowski; Artur Bekasov

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:5194-5202, 2025.

Abstract

Off-policy evaluation can leverage logged data to estimate the effectiveness of new policies in e-commerce, search engines, media streaming services, or automatic diagnostic tools in healthcare. However, the performance of baseline off-policy estimators like IPS deteriorates when the logging policy significantly differs from the evaluation policy. Recent work proposes sharing information across similar actions to mitigate this problem. In this work, we propose an alternative estimator that shares information across similar contexts using clustering. We study the theoretical properties of the proposed estimator, characterizing its bias and variance under different conditions. We also compare the performance of the proposed estimator and existing approaches in various synthetic problems, as well as a real-world recommendation dataset. Our experimental results confirm that clustering contexts improves estimation accuracy, especially in deficient information settings.
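The sketch below makes the abstract's terminology concrete. It is a minimal Python/NumPy implementation of the standard IPS estimator, V_IPS = (1/n) Σ_i [π_e(a_i | x_i) / π_0(a_i | x_i)] r_i, shown next to a hypothetical context-clustering variant. The variant only illustrates the general idea of sharing information across similar contexts and is not the estimator proposed in the paper. It assumes cluster labels are already available, for example from k-means on context features, and it replaces each per-context propensity with that propensity's average over the cluster. The function names, array layouts, and pooling rule are all illustrative assumptions.

```python
import numpy as np


def ips_estimate(rewards, actions, pi_e, pi_0):
    """Standard inverse propensity scoring (IPS) estimate of the evaluation policy's value.

    rewards: (n,) observed rewards r_i
    actions: (n,) integer logged actions a_i
    pi_e:    (n, K) evaluation-policy probabilities pi_e(a | x_i)
    pi_0:    (n, K) logging-policy probabilities pi_0(a | x_i)
    """
    idx = np.arange(len(actions))
    weights = pi_e[idx, actions] / pi_0[idx, actions]  # per-sample importance weights
    return float(np.mean(weights * rewards))


def cluster_pooled_ips_estimate(rewards, actions, pi_e, pi_0, clusters):
    """HYPOTHETICAL cluster-pooled variant (not the paper's estimator).

    clusters: (n,) integer cluster label of each logged context, e.g. from k-means.
    Action probabilities are averaged over all logged contexts in the same cluster
    before the importance weight is formed.
    """
    weights = np.empty(len(actions))
    for c in np.unique(clusters):
        members = clusters == c
        # Average each policy's action probabilities over the contexts in cluster c.
        pe_bar = pi_e[members].mean(axis=0)
        p0_bar = pi_0[members].mean(axis=0)
        # p0_bar is positive for every logged action, because that action had
        # positive logging probability in at least one context of the cluster.
        weights[members] = pe_bar[actions[members]] / p0_bar[actions[members]]
    return float(np.mean(weights * rewards))


# Toy example: the logging policy chose each logged action with probability 0.25.
rewards = np.array([1.0, 0.0])
actions = np.array([0, 1])
pi_e = np.array([[0.5, 0.5], [0.5, 0.5]])
pi_0 = np.array([[0.25, 0.75], [0.75, 0.25]])
print(ips_estimate(rewards, actions, pi_e, pi_0))  # 1.0
print(cluster_pooled_ips_estimate(rewards, actions, pi_e, pi_0, np.array([0, 0])))  # 0.5
```

In the toy example, IPS gives both samples a weight of 2. Pooling the two contexts into one cluster averages the logging propensities to 0.5, so both weights drop to 1. Pooling of this kind can lower variance when contexts in a cluster behave alike, but it introduces bias when they do not. The paper characterizes the bias and variance of its own estimator, which this sketch does not reproduce.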

Cite this Paper

BibTeX

@InProceedings{pmlr-v258-olivares25a,
  title     = {Clustering Context in Off-Policy Evaluation},
  author    = {Olivares, Daniel Guzman and Schmidt, Philipp and Golebiowski, Jacek and Bekasov, Artur},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {5194--5202},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/olivares25a/olivares25a.pdf},
  url       = {https://proceedings.mlr.press/v258/olivares25a.html},
  abstract  = {Off-policy evaluation can leverage logged data to estimate the effectiveness of new policies in e-commerce, search engines, media streaming services, or automatic diagnostic tools in healthcare. However, the performance of baseline off-policy estimators like IPS deteriorates when the logging policy significantly differs from the evaluation policy. Recent work proposes sharing information across similar actions to mitigate this problem. In this work, we propose an alternative estimator that shares information across similar contexts using clustering. We study the theoretical properties of the proposed estimator, characterizing its bias and variance under different conditions. We also compare the performance of the proposed estimator and existing approaches in various synthetic problems, as well as a real-world recommendation dataset. Our experimental results confirm that clustering contexts improves estimation accuracy, especially in deficient information settings.}
}

Endnote

%0 Conference Paper
%T Clustering Context in Off-Policy Evaluation
%A Daniel Guzman Olivares
%A Philipp Schmidt
%A Jacek Golebiowski
%A Artur Bekasov
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-olivares25a
%I PMLR
%P 5194--5202
%U https://proceedings.mlr.press/v258/olivares25a.html
%V 258
%X Off-policy evaluation can leverage logged data to estimate the effectiveness of new policies in e-commerce, search engines, media streaming services, or automatic diagnostic tools in healthcare. However, the performance of baseline off-policy estimators like IPS deteriorates when the logging policy significantly differs from the evaluation policy. Recent work proposes sharing information across similar actions to mitigate this problem. In this work, we propose an alternative estimator that shares information across similar contexts using clustering. We study the theoretical properties of the proposed estimator, characterizing its bias and variance under different conditions. We also compare the performance of the proposed estimator and existing approaches in various synthetic problems, as well as a real-world recommendation dataset. Our experimental results confirm that clustering contexts improves estimation accuracy, especially in deficient information settings.

APA

Olivares, D.G., Schmidt, P., Golebiowski, J. & Bekasov, A. (2025). Clustering Context in Off-Policy Evaluation. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:5194-5202. Available from https://proceedings.mlr.press/v258/olivares25a.html.
