Unbiased Multi-Label Learning from Crowdsourced Annotations

Mingxuan Xia, Zenan Huang, Runze Wu, Gengyu Lyu, Junbo Zhao, Gang Chen, Haobo Wang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:54064-54081, 2024.

Abstract

This work studies the novel Crowdsourced Multi-Label Learning (CMLL) problem, where each instance is related to multiple true labels but the model only receives unreliable labels from different annotators. Although a few Crowdsourced Multi-Label Inference (CMLI) methods have been developed, they require both the training and testing sets to be assigned crowdsourced labels and focus on inferring true labels rather than prediction, making them less practical. In this paper, by excavating the generation process of crowdsourced labels, we establish the first unbiased risk estimator for CMLL based on the crowdsourced transition matrices. To facilitate transition matrix estimation, we upgrade our unbiased risk estimator by aggregating crowdsourced labels and transition matrices from all annotators while guaranteeing its theoretical characteristics. Integrating with the unbiased risk estimator, we further propose a decoupled autoencoder framework to exploit label correlations and boost performance. We also provide a generalization error bound to ensure the convergence of the empirical risk estimator. Experiments on various CMLL scenarios demonstrate the effectiveness of our proposed method. The source code is available at https://github.com/MingxuanXia/CLEAR.
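
The abstract's central idea is that losses computed on noisy crowdsourced labels can be corrected with transition matrices so that, in expectation over the annotation noise, the corrected risk matches the risk on clean labels. The sketch below illustrates the generic backward loss-correction recipe for a single binary label slot in plain NumPy; the function names, the per-label binary decomposition, and the simple averaging over annotators are illustrative assumptions, not the exact CLEAR estimator or its aggregation scheme from the paper.

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy of predicted probability p against label y in {0, 1}."""
    eps = 1e-12
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def backward_corrected_loss(p, noisy_label, T):
    """Transition-matrix ("backward") corrected loss for one label slot.

    p           : predicted probability that the true label is 1
    noisy_label : the crowdsourced label actually observed, in {0, 1}
    T           : 2x2 matrix with T[i, j] = P(noisy label = j | true label = i)

    Solving T @ corrected = clean ensures that the expectation of the returned
    value over the annotation noise equals the loss on the clean label, which
    is the sense in which such a risk estimator is unbiased.
    """
    clean_losses = np.array([bce(p, 0), bce(p, 1)])  # losses vs. each possible true label
    corrected = np.linalg.solve(T, clean_losses)     # corrected losses, indexed by noisy label
    return corrected[noisy_label]

# Toy example: one instance, one label slot, two annotators with different
# (assumed known) flip rates; their corrected losses are simply averaged here.
T1 = np.array([[0.9, 0.1],   # a fairly reliable annotator
               [0.2, 0.8]])
T2 = np.array([[0.7, 0.3],   # a noisier annotator
               [0.4, 0.6]])
p_hat = 0.85                 # model's predicted probability that the label is positive
loss = np.mean([backward_corrected_loss(p_hat, 1, T1),
                backward_corrected_loss(p_hat, 0, T2)])
```

In practice the transition matrices are unknown and must be estimated, which is why the paper aggregates crowdsourced labels and transition matrices across annotators before applying the correction.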

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-xia24a,
  title     = {Unbiased Multi-Label Learning from Crowdsourced Annotations},
  author    = {Xia, Mingxuan and Huang, Zenan and Wu, Runze and Lyu, Gengyu and Zhao, Junbo and Chen, Gang and Wang, Haobo},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {54064--54081},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xia24a/xia24a.pdf},
  url       = {https://proceedings.mlr.press/v235/xia24a.html},
  abstract  = {This work studies the novel Crowdsourced Multi-Label Learning (CMLL) problem, where each instance is related to multiple true labels but the model only receives unreliable labels from different annotators. Although a few Crowdsourced Multi-Label Inference (CMLI) methods have been developed, they require both the training and testing sets to be assigned crowdsourced labels and focus on true label inferring rather than prediction, making them less practical. In this paper, by excavating the generation process of crowdsourced labels, we establish the first unbiased risk estimator for CMLL based on the crowdsourced transition matrices. To facilitate transition matrix estimation, we upgrade our unbiased risk estimator by aggregating crowdsourced labels and transition matrices from all annotators while guaranteeing its theoretical characteristics. Integrating with the unbiased risk estimator, we further propose a decoupled autoencoder framework to exploit label correlations and boost performance. We also provide a generalization error bound to ensure the convergence of the empirical risk estimator. Experiments on various CMLL scenarios demonstrate the effectiveness of our proposed method. The source code is available at https://github.com/MingxuanXia/CLEAR.}
}
Endnote
%0 Conference Paper
%T Unbiased Multi-Label Learning from Crowdsourced Annotations
%A Mingxuan Xia
%A Zenan Huang
%A Runze Wu
%A Gengyu Lyu
%A Junbo Zhao
%A Gang Chen
%A Haobo Wang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-xia24a
%I PMLR
%P 54064--54081
%U https://proceedings.mlr.press/v235/xia24a.html
%V 235
%X This work studies the novel Crowdsourced Multi-Label Learning (CMLL) problem, where each instance is related to multiple true labels but the model only receives unreliable labels from different annotators. Although a few Crowdsourced Multi-Label Inference (CMLI) methods have been developed, they require both the training and testing sets to be assigned crowdsourced labels and focus on true label inferring rather than prediction, making them less practical. In this paper, by excavating the generation process of crowdsourced labels, we establish the first unbiased risk estimator for CMLL based on the crowdsourced transition matrices. To facilitate transition matrix estimation, we upgrade our unbiased risk estimator by aggregating crowdsourced labels and transition matrices from all annotators while guaranteeing its theoretical characteristics. Integrating with the unbiased risk estimator, we further propose a decoupled autoencoder framework to exploit label correlations and boost performance. We also provide a generalization error bound to ensure the convergence of the empirical risk estimator. Experiments on various CMLL scenarios demonstrate the effectiveness of our proposed method. The source code is available at https://github.com/MingxuanXia/CLEAR.
APA
Xia, M., Huang, Z., Wu, R., Lyu, G., Zhao, J., Chen, G. & Wang, H. (2024). Unbiased Multi-Label Learning from Crowdsourced Annotations. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:54064-54081. Available from https://proceedings.mlr.press/v235/xia24a.html.
