Rethinking Causal Ranking: A Balanced Perspective on Uplift Model Evaluation

Minqin Zhu, Zexu Sun, Ruoxuan Xiong, Anpeng Wu, Baohong Li, Caizhi Tang, Jun Zhou, Fei Wu, Kun Kuang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:80134-80154, 2025.

Abstract

Uplift modeling is crucial for identifying individuals likely to respond to a treatment in applications like marketing and customer retention, but evaluating these models is challenging due to the inaccessibility of counterfactual outcomes in real-world settings. In this paper, we identify a fundamental limitation in existing evaluation metrics, such as the uplift and Qini curves, which fail to rank individuals with binary negative outcomes accurately. This can lead to biased evaluations, where biased models receive higher curve values than unbiased ones, resulting in suboptimal model selection. To address this, we propose the Principled Uplift Curve (PUC), a novel evaluation metric that assigns equal curve values to individuals with positive and negative binary outcomes, offering a more balanced and unbiased assessment. We then derive the Principled Uplift Loss (PUL) function from the PUC and integrate it into a new uplift model, the Principled Treatment and Outcome Network (PTONet), to reduce bias during uplift model training. Experiments on both simulated and real-world datasets demonstrate that the PUC provides less biased evaluations, while PTONet outperforms existing methods. The source code is available at: https://github.com/euzmin/PUC.
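For context on the metrics the abstract critiques: a Qini curve sorts individuals by predicted uplift (descending) and, at each prefix of the ranking, compares cumulative positive outcomes in the treated group against the control group scaled by group size. The sketch below assumes the common absolute-number Qini definition, Q(k) = Y_t(k) − Y_c(k) · N_t(k)/N_c(k); the function and variable names are illustrative and not from the paper, and this is the baseline metric being critiqued, not the proposed PUC.

```python
def qini_curve(scores, treatment, outcome):
    """Standard Qini curve sketch.

    scores:    predicted uplift per individual (higher = ranked earlier)
    treatment: 1 if treated, 0 if control
    outcome:   binary observed outcome

    Returns Q(k) for each prefix k of the ranking, where
    Q(k) = Y_t(k) - Y_c(k) * N_t(k) / N_c(k).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_t = n_c = y_t = y_c = 0
    curve = []
    for i in order:
        if treatment[i] == 1:
            n_t += 1
            y_t += outcome[i]
        else:
            n_c += 1
            y_c += outcome[i]
        # Scale control successes to the treated-group size; with no
        # controls seen yet, the control term is conventionally 0.
        scaled_control = y_c * n_t / n_c if n_c > 0 else 0.0
        curve.append(y_t - scaled_control)
    return curve
```

Note that this construction rewards placing treated positives early in the ranking but is not symmetric in how it handles negative outcomes, which is the asymmetry the paper's PUC is designed to correct.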

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhu25s,
  title     = {Rethinking Causal Ranking: A Balanced Perspective on Uplift Model Evaluation},
  author    = {Zhu, Minqin and Sun, Zexu and Xiong, Ruoxuan and Wu, Anpeng and Li, Baohong and Tang, Caizhi and Zhou, Jun and Wu, Fei and Kuang, Kun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {80134--80154},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhu25s/zhu25s.pdf},
  url       = {https://proceedings.mlr.press/v267/zhu25s.html},
  abstract  = {Uplift modeling is crucial for identifying individuals likely to respond to a treatment in applications like marketing and customer retention, but evaluating these models is challenging due to the inaccessibility of counterfactual outcomes in real-world settings. In this paper, we identify a fundamental limitation in existing evaluation metrics, such as the uplift and Qini curves, which fail to rank individuals with binary negative outcomes accurately. This can lead to biased evaluations, where biased models receive higher curve values than unbiased ones, resulting in suboptimal model selection. To address this, we propose the Principled Uplift Curve (PUC), a novel evaluation metric that assigns equal curve values to individuals with positive and negative binary outcomes, offering a more balanced and unbiased assessment. We then derive the Principled Uplift Loss (PUL) function from the PUC and integrate it into a new uplift model, the Principled Treatment and Outcome Network (PTONet), to reduce bias during uplift model training. Experiments on both simulated and real-world datasets demonstrate that the PUC provides less biased evaluations, while PTONet outperforms existing methods. The source code is available at: https://github.com/euzmin/PUC.}
}
Endnote
%0 Conference Paper
%T Rethinking Causal Ranking: A Balanced Perspective on Uplift Model Evaluation
%A Minqin Zhu
%A Zexu Sun
%A Ruoxuan Xiong
%A Anpeng Wu
%A Baohong Li
%A Caizhi Tang
%A Jun Zhou
%A Fei Wu
%A Kun Kuang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhu25s
%I PMLR
%P 80134--80154
%U https://proceedings.mlr.press/v267/zhu25s.html
%V 267
%X Uplift modeling is crucial for identifying individuals likely to respond to a treatment in applications like marketing and customer retention, but evaluating these models is challenging due to the inaccessibility of counterfactual outcomes in real-world settings. In this paper, we identify a fundamental limitation in existing evaluation metrics, such as the uplift and Qini curves, which fail to rank individuals with binary negative outcomes accurately. This can lead to biased evaluations, where biased models receive higher curve values than unbiased ones, resulting in suboptimal model selection. To address this, we propose the Principled Uplift Curve (PUC), a novel evaluation metric that assigns equal curve values to individuals with positive and negative binary outcomes, offering a more balanced and unbiased assessment. We then derive the Principled Uplift Loss (PUL) function from the PUC and integrate it into a new uplift model, the Principled Treatment and Outcome Network (PTONet), to reduce bias during uplift model training. Experiments on both simulated and real-world datasets demonstrate that the PUC provides less biased evaluations, while PTONet outperforms existing methods. The source code is available at: https://github.com/euzmin/PUC.
APA
Zhu, M., Sun, Z., Xiong, R., Wu, A., Li, B., Tang, C., Zhou, J., Wu, F. & Kuang, K. (2025). Rethinking Causal Ranking: A Balanced Perspective on Uplift Model Evaluation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:80134-80154. Available from https://proceedings.mlr.press/v267/zhu25s.html.

Related Material