When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:33209-33228, 2023.
Abstract
Machine learning models are often personalized with categorical attributes that define groups. In this work, we show that personalization with group attributes can inadvertently reduce performance at a group level – i.e., groups may receive unnecessarily inaccurate predictions by sharing their personal characteristics. We present formal conditions to ensure the fair use of group attributes in a prediction task, and describe how they can be checked by training one additional model. We characterize how fair use conditions can be violated due to standard practices in model development, and study the prevalence of fair use violations in clinical prediction tasks. Our results show that personalization often fails to produce a tailored performance gain for every group that reports personal data, and underscore the need to evaluate fair use when personalizing models with characteristics that are protected, sensitive, self-reported, or costly to acquire.
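The abstract notes that fair use can be checked by training one additional model. The sketch below illustrates one plausible reading of that check on synthetic data: train a personalized model that uses the group attribute and a generic model that omits it, then compare their accuracy within each group. This is a minimal sketch, assuming scikit-learn, a binary classification task, and a single binary group attribute; the data, column layout, model class, and metric here are illustrative, not the paper's exact protocol.

```python
# Hypothetical group-level fair use check on synthetic data.
# Assumption: "fair use" is read as each group gaining (not losing)
# performance by reporting its group attribute.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))                # generic features
g = rng.integers(0, 2, size=n)             # group attribute (illustrative)
y = (X[:, 0] + 0.5 * g + rng.normal(size=n) > 0).astype(int)

X_tr, X_te, g_tr, g_te, y_tr, y_te = train_test_split(
    X, g, y, test_size=0.5, random_state=0
)

# Personalized model: trained with the group attribute as an extra feature.
personalized = LogisticRegression().fit(np.column_stack([X_tr, g_tr]), y_tr)
# Generic model: the "one additional model", trained without the attribute.
generic = LogisticRegression().fit(X_tr, y_tr)

for grp in np.unique(g_te):
    mask = g_te == grp
    acc_pers = personalized.score(
        np.column_stack([X_te[mask], g_te[mask]]), y_te[mask]
    )
    acc_gen = generic.score(X_te[mask], y_te[mask])
    # A negative gain flags a potential fair use violation: the group
    # would have received better predictions from the generic model.
    print(f"group={grp}: personalized={acc_pers:.3f} "
          f"generic={acc_gen:.3f} gain={acc_pers - acc_gen:+.3f}")
```

In this reading, a group with a negative gain is paying a performance cost for sharing its attribute, which is exactly the failure mode the abstract describes.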