Medicine After Death: XAI and Algorithmic Fairness Under Label Bias
Proceedings of Fourth European Workshop on Algorithmic Fairness, PMLR 294:171-186, 2025.
Abstract
Trustworthy AI methods such as algorithmic fairness and explainable artificial intelligence (XAI) are becoming increasingly important in machine learning and artificial intelligence (AI). Yet, while we use trustworthy AI tools to investigate the robustness of AI models, we rarely consider that these tools are themselves models and can likewise fail. In this paper, we present a case study highlighting how algorithmic fairness and XAI can lead to incorrect interpretation and bias mitigation when the underlying data suffers from systematic label bias.
Label bias is common in crucial application domains such as healthcare or welfare AI; a well-known example is diseases that are underdiagnosed in certain demographic groups. Moreover, in practice the true labels are often inaccessible, for instance for mental disorders such as major depressive disorder. Without access to true labels, it becomes challenging to estimate the magnitude of the bias. Prior work has documented well how label bias can propagate into biased predictive models, but the question of how (undetected) label bias affects trustworthy AI tools remains unexplored. We design a case study using the well-known COMPAS dataset, which comes with two real sets of labels: one that is known to be highly biased, and one that is a well-accepted proxy for the underlying effect. This enables us to study label bias in a realistic way. We show how label bias leads to incorrect diagnosis of algorithmic bias, as well as incorrect mitigation. We also show that applying XAI to models trained on biased labels highlights different features as important than applying it to the same models trained on unbiased labels. When the label bias is unknown to the user, this can lead to incorrect interpretation of what causes different outcomes.
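To make the comparison concrete, the following is a minimal sketch (not the authors' pipeline) of how one might train the same model on the two COMPAS label sets and compare a fairness diagnostic and feature importances. The column names (decile_score, two_year_recid, race, etc.) are assumed from the public ProPublica COMPAS release, and the specific choices of metric (demographic parity gap) and XAI method (permutation importance) are illustrative stand-ins rather than those used in the paper.

```python
# Illustrative sketch: compare fairness and feature-importance diagnostics when
# the same model is trained on a biased label set vs. a proxy for the true label.
# Assumes the ProPublica COMPAS csv with columns 'decile_score' (biased label),
# 'two_year_recid' (accepted proxy), 'race', and a few numeric features.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

df = pd.read_csv("compas-scores-two-years.csv")
features = ["age", "priors_count", "juv_fel_count", "juv_misd_count"]
X = df[features]
labels = {
    "biased (COMPAS decile_score >= 5)": (df["decile_score"] >= 5).astype(int),
    "proxy (two_year_recid)": df["two_year_recid"],
}

for name, y in labels.items():
    X_tr, X_te, y_tr, y_te, race_tr, race_te = train_test_split(
        X, y, df["race"], test_size=0.3, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = model.predict(X_te)

    # Fairness diagnostic: difference in positive prediction rates between two
    # demographic groups (demographic parity gap).
    rates = pd.Series(pred, index=race_te.values).groupby(level=0).mean()
    gap = rates.get("African-American", float("nan")) - rates.get("Caucasian", float("nan"))

    # XAI diagnostic: permutation feature importance on the held-out split.
    imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)

    print(f"\nLabels: {name}")
    print(f"  demographic parity gap: {gap:.3f}")
    for f, m in sorted(zip(features, imp.importances_mean), key=lambda t: -t[1]):
        print(f"  importance {f}: {m:.3f}")
```

Under label bias, the two runs can disagree both on the size of the fairness gap and on which features appear most important, which is the failure mode the case study examines.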
In conclusion, we find that trustworthy AI in the face of label bias acts as a "medicine-after-death" (MAD) process that addresses symptoms rather than the root causes of bias and is therefore ineffective at solving the problem.