Label Distribution Propagation-based Label Completion for Crowdsourcing

Tong Wu, Liangxiao Jiang, Wenjun Zhang, Chaoqun Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:67369-67381, 2025.

Abstract

In real-world crowdsourcing scenarios, most workers annotate only a few instances, which results in a highly sparse crowdsourced label matrix and subsequently harms the performance of label integration algorithms. A recent algorithm called worker similarity-based label completion (WSLC) has proven effective for addressing this issue. However, WSLC considers solely the correlation of the labels annotated by different workers on each individual instance while totally ignoring the correlation of the labels annotated by different workers among similar instances. To fill this gap, we propose a novel label distribution propagation-based label completion (LDPLC) algorithm. First, we use worker similarity weighted majority voting to initialize a label distribution for each missing label. Then, we design a label distribution propagation algorithm to enable each missing label of each instance to iteratively absorb its neighbors’ label distributions. Finally, we complete each missing label based on its converged label distribution. Experimental results on both real-world and simulated crowdsourced datasets show that LDPLC significantly outperforms WSLC in enhancing the performance of label integration algorithms. Our code and datasets are available at https://github.com/jiangliangxiao/LDPLC.
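
The three steps named in the abstract (initialize, propagate, complete) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the repository linked above; the abstract does not give the formulas, so every concrete choice here is an assumption made for illustration: the `MISSING` marker, the agreement-rate similarity in `worker_similarity`, Euclidean nearest neighbors on instance features, the mixing update controlled by `alpha`, the fixed iteration count `iters` in place of a convergence test, and all function and parameter names.

```python
MISSING = -1  # hypothetical marker for an unlabeled (instance, worker) cell


def worker_similarity(labels, a, b):
    """Agreement rate of workers a and b on instances both labeled (assumed measure)."""
    shared = [(row[a], row[b]) for row in labels
              if row[a] != MISSING and row[b] != MISSING]
    return sum(x == y for x, y in shared) / len(shared) if shared else 0.0


def init_distribution(labels, i, r, n_classes):
    """Step 1: similarity-weighted vote of the other workers on instance i."""
    votes = [0.0] * n_classes
    for w, lab in enumerate(labels[i]):
        if w != r and lab != MISSING:
            votes[lab] += worker_similarity(labels, r, w)
    total = sum(votes)
    if total == 0:  # no usable votes: fall back to a uniform distribution
        return [1.0 / n_classes] * n_classes
    return [v / total for v in votes]


def nearest_neighbors(features, i, k):
    """Indices of the k instances closest to instance i (squared Euclidean distance)."""
    def sq_dist(j):
        return sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    others = [j for j in range(len(features)) if j != i]
    return sorted(others, key=sq_dist)[:k]


def complete_labels(labels, features, n_classes, k=1, alpha=0.5, iters=20):
    """Fill every MISSING cell; assumes at least k + 1 instances."""
    n, m = len(labels), len(labels[0])
    missing = [(i, r) for i in range(n) for r in range(m)
               if labels[i][r] == MISSING]
    # Observed cells keep a fixed one-hot distribution.
    dist = {(i, r): [1.0 if c == labels[i][r] else 0.0 for c in range(n_classes)]
            for i in range(n) for r in range(m) if labels[i][r] != MISSING}
    # Missing cells start from the step-1 distribution.
    init = {(i, r): init_distribution(labels, i, r, n_classes) for (i, r) in missing}
    dist.update(init)
    neighbors = {i: nearest_neighbors(features, i, k) for i in range(n)}
    # Step 2: each missing cell repeatedly absorbs its neighbors' distributions
    # for the same worker, mixed with its own initial distribution.
    for _ in range(iters):
        updated = dict(dist)
        for (i, r) in missing:
            absorbed = [sum(dist[j, r][c] for j in neighbors[i]) / len(neighbors[i])
                        for c in range(n_classes)]
            updated[i, r] = [alpha * p0 + (1 - alpha) * p
                             for p0, p in zip(init[i, r], absorbed)]
        dist = updated
    # Step 3: complete each missing label with its most probable class.
    completed = [list(row) for row in labels]
    for (i, r) in missing:
        completed[i][r] = max(range(n_classes), key=dist[i, r].__getitem__)
    return completed
```

Under these assumptions, calling `complete_labels(labels, features, n_classes)` on a list-of-lists label matrix (rows are instances, columns are workers) returns a new matrix with every `MISSING` cell filled and the observed labels left unchanged. Because each update is a convex combination of probability vectors, every cell's distribution stays normalized throughout the iteration.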

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wu25n,
  title = {Label Distribution Propagation-based Label Completion for Crowdsourcing},
  author = {Wu, Tong and Jiang, Liangxiao and Zhang, Wenjun and Li, Chaoqun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {67369--67381},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wu25n/wu25n.pdf},
  url = {https://proceedings.mlr.press/v267/wu25n.html},
  abstract = {In real-world crowdsourcing scenarios, most workers annotate only a few instances, which results in a highly sparse crowdsourced label matrix and subsequently harms the performance of label integration algorithms. A recent algorithm called worker similarity-based label completion (WSLC) has proven effective for addressing this issue. However, WSLC considers solely the correlation of the labels annotated by different workers on each individual instance while totally ignoring the correlation of the labels annotated by different workers among similar instances. To fill this gap, we propose a novel label distribution propagation-based label completion (LDPLC) algorithm. First, we use worker similarity weighted majority voting to initialize a label distribution for each missing label. Then, we design a label distribution propagation algorithm to enable each missing label of each instance to iteratively absorb its neighbors’ label distributions. Finally, we complete each missing label based on its converged label distribution. Experimental results on both real-world and simulated crowdsourced datasets show that LDPLC significantly outperforms WSLC in enhancing the performance of label integration algorithms. Our code and datasets are available at https://github.com/jiangliangxiao/LDPLC.}
}
Endnote
%0 Conference Paper
%T Label Distribution Propagation-based Label Completion for Crowdsourcing
%A Tong Wu
%A Liangxiao Jiang
%A Wenjun Zhang
%A Chaoqun Li
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wu25n
%I PMLR
%P 67369--67381
%U https://proceedings.mlr.press/v267/wu25n.html
%V 267
%X In real-world crowdsourcing scenarios, most workers annotate only a few instances, which results in a highly sparse crowdsourced label matrix and subsequently harms the performance of label integration algorithms. A recent algorithm called worker similarity-based label completion (WSLC) has proven effective for addressing this issue. However, WSLC considers solely the correlation of the labels annotated by different workers on each individual instance while totally ignoring the correlation of the labels annotated by different workers among similar instances. To fill this gap, we propose a novel label distribution propagation-based label completion (LDPLC) algorithm. First, we use worker similarity weighted majority voting to initialize a label distribution for each missing label. Then, we design a label distribution propagation algorithm to enable each missing label of each instance to iteratively absorb its neighbors’ label distributions. Finally, we complete each missing label based on its converged label distribution. Experimental results on both real-world and simulated crowdsourced datasets show that LDPLC significantly outperforms WSLC in enhancing the performance of label integration algorithms. Our code and datasets are available at https://github.com/jiangliangxiao/LDPLC.
APA
Wu, T., Jiang, L., Zhang, W. & Li, C. (2025). Label Distribution Propagation-based Label Completion for Crowdsourcing. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:67369-67381. Available from https://proceedings.mlr.press/v267/wu25n.html.
