Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise

Donna Tjandra, Jenna Wiens
Proceedings of the Conference on Health, Inference, and Learning, PMLR 209:477-497, 2023.

Abstract

Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent from the input features. In practice, however, label noise is often feature or \textit{instance-dependent}, and therefore biased (i.e., some instances are more likely to be mislabeled than others). E.g., in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence instance-dependent label noise. Our approach utilizes \textit{\anchor points}, a small subset of data for which we know the observed and ground truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.

Cite this Paper


BibTeX
@InProceedings{pmlr-v209-tjandra23a, title = {Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise}, author = {Tjandra, Donna and Wiens, Jenna}, booktitle = {Proceedings of the Conference on Health, Inference, and Learning}, pages = {477--497}, year = {2023}, editor = {Mortazavi, Bobak J. and Sarker, Tasmie and Beam, Andrew and Ho, Joyce C.}, volume = {209}, series = {Proceedings of Machine Learning Research}, month = {22 Jun--24 Jun}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v209/tjandra23a/tjandra23a.pdf}, url = {https://proceedings.mlr.press/v209/tjandra23a.html}, abstract = {Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent from the input features. In practice, however, label noise is often feature or \textit{instance-dependent}, and therefore biased (i.e., some instances are more likely to be mislabeled than others). E.g., in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence instance-dependent label noise. Our approach utilizes \textit{\anchor points}, a small subset of data for which we know the observed and ground truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.} }
Endnote
%0 Conference Paper %T Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise %A Donna Tjandra %A Jenna Wiens %B Proceedings of the Conference on Health, Inference, and Learning %C Proceedings of Machine Learning Research %D 2023 %E Bobak J. Mortazavi %E Tasmie Sarker %E Andrew Beam %E Joyce C. Ho %F pmlr-v209-tjandra23a %I PMLR %P 477--497 %U https://proceedings.mlr.press/v209/tjandra23a.html %V 209 %X Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent from the input features. In practice, however, label noise is often feature or \textit{instance-dependent}, and therefore biased (i.e., some instances are more likely to be mislabeled than others). E.g., in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence instance-dependent label noise. Our approach utilizes \textit{\anchor points}, a small subset of data for which we know the observed and ground truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.
APA
Tjandra, D. & Wiens, J.. (2023). Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise. Proceedings of the Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 209:477-497 Available from https://proceedings.mlr.press/v209/tjandra23a.html.

Related Material