When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction

Vinith Menon Suriyakumar, Marzyeh Ghassemi, Berk Ustun
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:33209-33228, 2023.

Abstract

Machine learning models are often personalized with categorical attributes that define groups. In this work, we show that personalization with group attributes can inadvertently reduce performance at a group level – i.e., groups may receive unnecessarily inaccurate predictions by sharing their personal characteristics. We present formal conditions to ensure the fair use of group attributes in a prediction task, and describe how they can be checked by training one additional model. We characterize how fair use conditions can be violated due to standard practices in model development, and study the prevalence of fair use violations in clinical prediction tasks. Our results show that personalization often fails to produce a tailored performance gain for every group who reports personal data, and underscore the need to evaluate fair use when personalizing models with characteristics that are protected, sensitive, self-reported, or costly to acquire.
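
The fair use check described in the abstract can be illustrated with a short sketch. The code below is a minimal illustration under assumed inputs, not the authors' implementation: it takes numpy arrays X (features), group (a categorical group attribute), and y (labels), trains a personalized model alongside one additional generic model that omits the group attribute, and flags every group whose test error increases when its attribute is used.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def fair_use_check(X, group, y, seed=0):
    """Flag groups whose error is higher with personalization than without.

    X: (n, d) float array; group: (n,) categorical array; y: (n,) binary labels.
    """
    X_tr, X_te, g_tr, g_te, y_tr, y_te = train_test_split(
        X, group, y, test_size=0.3, random_state=seed)
    levels = np.unique(group)

    # Personalized inputs: features plus a one-hot encoding of the group attribute.
    def with_group(X_, g_):
        return np.hstack([X_, (g_[:, None] == levels[None, :]).astype(float)])

    personalized = LogisticRegression(max_iter=1000).fit(with_group(X_tr, g_tr), y_tr)
    generic = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # the one additional model

    flagged = []
    for g in levels:
        mask = g_te == g
        err_p = np.mean(personalized.predict(with_group(X_te[mask], g_te[mask])) != y_te[mask])
        err_g = np.mean(generic.predict(X_te[mask]) != y_te[mask])
        if err_p > err_g:  # this group is harmed by reporting its attribute
            flagged.append((g, err_p, err_g))
    return flagged

A non-empty result signals a fair use violation: at least one group would receive more accurate predictions from the generic model than from the personalized one. A fuller check would account for sampling variability (e.g., with a paired test on per-example errors) rather than comparing point estimates.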

Cite this Paper

BibTeX
@InProceedings{pmlr-v202-suriyakumar23a,
  title     = {When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction},
  author    = {Suriyakumar, Vinith Menon and Ghassemi, Marzyeh and Ustun, Berk},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {33209--33228},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/suriyakumar23a/suriyakumar23a.pdf},
  url       = {https://proceedings.mlr.press/v202/suriyakumar23a.html},
  abstract  = {Machine learning models are often personalized with categorical attributes that define groups. In this work, we show that personalization with group attributes can inadvertently reduce performance at a group level – i.e., groups may receive unnecessarily inaccurate predictions by sharing their personal characteristics. We present formal conditions to ensure the fair use of group attributes in a prediction task, and describe how they can be checked by training one additional model. We characterize how fair use conditions can be violated due to standard practices in model development, and study the prevalence of fair use violations in clinical prediction tasks. Our results show that personalization often fails to produce a tailored performance gain for every group who reports personal data, and underscore the need to evaluate fair use when personalizing models with characteristics that are protected, sensitive, self-reported, or costly to acquire.}
}
Endnote
%0 Conference Paper
%T When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction
%A Vinith Menon Suriyakumar
%A Marzyeh Ghassemi
%A Berk Ustun
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-suriyakumar23a
%I PMLR
%P 33209--33228
%U https://proceedings.mlr.press/v202/suriyakumar23a.html
%V 202
%X Machine learning models are often personalized with categorical attributes that define groups. In this work, we show that personalization with group attributes can inadvertently reduce performance at a group level – i.e., groups may receive unnecessarily inaccurate predictions by sharing their personal characteristics. We present formal conditions to ensure the fair use of group attributes in a prediction task, and describe how they can be checked by training one additional model. We characterize how fair use conditions can be violated due to standard practices in model development, and study the prevalence of fair use violations in clinical prediction tasks. Our results show that personalization often fails to produce a tailored performance gain for every group who reports personal data, and underscore the need to evaluate fair use when personalizing models with characteristics that are protected, sensitive, self-reported, or costly to acquire.
APA
Suriyakumar, V.M., Ghassemi, M. & Ustun, B. (2023). When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:33209-33228. Available from https://proceedings.mlr.press/v202/suriyakumar23a.html.
