When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations

Hyungrok Do, Yuxin Chang, Yoon Sang Cho, Padhraic Smyth, Judy Zhong
Proceedings of the 8th Machine Learning for Healthcare Conference, PMLR 219:128-149, 2023.

Abstract

Survival analysis is a general framework for predicting the time until a specific event occurs, often in the presence of censoring. Although this framework is widely used in practice, few studies to date have considered fairness for time-to-event outcomes, despite recent significant advances in the algorithmic fairness literature more broadly. In this paper, we propose a framework to achieve demographic parity in survival analysis models by minimizing the mutual information between predicted time-to-event and sensitive attributes. We show that our approach effectively minimizes mutual information to encourage statistical independence of time-to-event predictions and sensitive attributes. Furthermore, we propose four types of disparity assessment metrics based on common survival analysis metrics. Through experiments on multiple benchmark datasets, we demonstrate that by minimizing the dependence between the prediction and the sensitive attributes, our method can systematically improve the fairness of survival predictions and is robust to censoring.
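The abstract frames demographic parity as statistical independence between predicted time-to-event and a sensitive attribute, measured by their mutual information, which is zero exactly when the two are independent. As a rough illustration only, the sketch below computes a plug-in (empirical) mutual information estimate on discretized predictions. It is not the authors' estimator or training objective, which the abstract does not specify. It ignores censoring and is not differentiable. The helper names `plugin_mutual_information` and `discretize` and the bin edges are hypothetical.

```python
import math
from collections import Counter


def plugin_mutual_information(pred_bins, groups):
    """Empirical estimate of I(T_hat; A) in nats (hypothetical helper).

    pred_bins: discretized predicted times, one hashable label per subject.
    groups:    sensitive-attribute value for each subject.
    Returns 0.0 when the empirical joint distribution factorizes,
    i.e. predictions are (empirically) independent of the attribute.
    """
    if len(pred_bins) != len(groups):
        raise ValueError("inputs must have the same length")
    n = len(pred_bins)
    joint = Counter(zip(pred_bins, groups))  # counts of (t, a) pairs
    n_t = Counter(pred_bins)                 # marginal counts of predictions
    n_a = Counter(groups)                    # marginal counts of the attribute
    mi = 0.0
    for (t, a), c in joint.items():
        # p(t,a) * log(p(t,a) / (p(t) p(a))), written with raw counts
        mi += (c / n) * math.log(c * n / (n_t[t] * n_a[a]))
    return mi


def discretize(times, edges):
    """Map each continuous predicted time to the index of its bin.

    edges: sorted bin boundaries (an assumption of this sketch).
    """
    return [sum(t >= e for e in edges) for t in times]


if __name__ == "__main__":
    groups = ["a", "a", "b", "b"]
    # Same prediction distribution in both groups: MI is 0.
    print(plugin_mutual_information([0, 1, 0, 1], groups))
    # Predictions fully determined by group: MI equals log 2.
    print(plugin_mutual_information([0, 0, 1, 1], groups))
    print(discretize([1.0, 5.0, 12.0], [3.0, 10.0]))
```

In a fairness-regularized model, a quantity of this kind would be added as a penalty to the survival loss. The counting estimator above only demonstrates what "zero mutual information" means; gradient-based training would need a smooth estimator instead.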

Cite this Paper


BibTeX
@InProceedings{pmlr-v219-do23a,
  title = {When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations},
  author = {Do, Hyungrok and Chang, Yuxin and Cho, Yoon Sang and Smyth, Padhraic and Zhong, Judy},
  booktitle = {Proceedings of the 8th Machine Learning for Healthcare Conference},
  pages = {128--149},
  year = {2023},
  editor = {Deshpande, Kaivalya and Fiterau, Madalina and Joshi, Shalmali and Lipton, Zachary and Ranganath, Rajesh and Urteaga, Iñigo and Yeung, Serene},
  volume = {219},
  series = {Proceedings of Machine Learning Research},
  month = {11--12 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v219/do23a/do23a.pdf},
  url = {https://proceedings.mlr.press/v219/do23a.html},
  abstract = {Survival analysis is a general framework for predicting the time until a specific event occurs, often in the presence of censoring. Although this framework is widely used in practice, few studies to date have considered fairness for time-to-event outcomes, despite recent significant advances in the algorithmic fairness literature more broadly. In this paper, we propose a framework to achieve demographic parity in survival analysis models by minimizing the mutual information between predicted time-to-event and sensitive attributes. We show that our approach effectively minimizes mutual information to encourage statistical independence of time-to-event predictions and sensitive attributes. Furthermore, we propose four types of disparity assessment metrics based on common survival analysis metrics. Through experiments on multiple benchmark datasets, we demonstrate that by minimizing the dependence between the prediction and the sensitive attributes, our method can systematically improve the fairness of survival predictions and is robust to censoring.}
}
Endnote
%0 Conference Paper
%T When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations
%A Hyungrok Do
%A Yuxin Chang
%A Yoon Sang Cho
%A Padhraic Smyth
%A Judy Zhong
%B Proceedings of the 8th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Kaivalya Deshpande
%E Madalina Fiterau
%E Shalmali Joshi
%E Zachary Lipton
%E Rajesh Ranganath
%E Iñigo Urteaga
%E Serene Yeung
%F pmlr-v219-do23a
%I PMLR
%P 128--149
%U https://proceedings.mlr.press/v219/do23a.html
%V 219
%X Survival analysis is a general framework for predicting the time until a specific event occurs, often in the presence of censoring. Although this framework is widely used in practice, few studies to date have considered fairness for time-to-event outcomes, despite recent significant advances in the algorithmic fairness literature more broadly. In this paper, we propose a framework to achieve demographic parity in survival analysis models by minimizing the mutual information between predicted time-to-event and sensitive attributes. We show that our approach effectively minimizes mutual information to encourage statistical independence of time-to-event predictions and sensitive attributes. Furthermore, we propose four types of disparity assessment metrics based on common survival analysis metrics. Through experiments on multiple benchmark datasets, we demonstrate that by minimizing the dependence between the prediction and the sensitive attributes, our method can systematically improve the fairness of survival predictions and is robust to censoring.
APA
Do, H., Chang, Y., Cho, Y.S., Smyth, P. & Zhong, J. (2023). When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations. Proceedings of the 8th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 219:128-149. Available from https://proceedings.mlr.press/v219/do23a.html.
