Exploiting Worker Correlation for Label Aggregation in Crowdsourcing

Yuan Li, Benjamin Rubinstein, Trevor Cohn
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3886-3895, 2019.

Abstract

Crowdsourcing has emerged as a core component of data science pipelines. From collected noisy worker labels, aggregation models that incorporate worker reliability parameters aim to infer a latent true annotation. In this paper, we argue that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; we propose in response an enhanced Bayesian classifier combination (EBCC) model, with inference based on a mean-field variational approach. An introduced mixture of intra-class reliabilities—connected to tensor decomposition and item clustering—induces inter-worker correlation. EBCC does not suffer the limitations of existing correlation models: intractable marginalisation of missing labels and poor scaling to large worker cohorts. Extensive empirical comparison on 17 real-world datasets sees EBCC achieving the highest mean accuracy across 10 benchmark crowdsourcing methods.
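The abstract builds on the family of aggregation models that learn a per-worker reliability and use it to infer each item's latent true label. The sketch below illustrates that family only. It implements the classic Dawid-Skene confusion-matrix model fitted by expectation-maximisation, not EBCC. EBCC adds a mixture of intra-class reliabilities and uses mean-field variational inference, and neither is reproduced here. The function name `dawid_skene`, the `(item, worker, label)` triple format, and the `smoothing` pseudo-count are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Classic Dawid-Skene EM label aggregation (an illustrative baseline, NOT EBCC).

    labels: iterable of (item, worker, label) triples with label in range(n_classes).
    Returns (predictions, posteriors, confusion), where confusion[w][k][l] estimates
    the probability that worker w answers l when the true class is k.
    The `smoothing` pseudo-count is a hypothetical choice to avoid log(0).
    """
    labels = list(labels)
    by_item = defaultdict(list)
    for item, worker, lab in labels:
        by_item[item].append((worker, lab))
    items = sorted(by_item)
    workers = sorted({w for _, w, _ in labels})

    # Initialise each item's class posterior with a soft majority vote.
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, lab in by_item[i]:
            counts[lab] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: smoothed class prior and one confusion matrix per worker.
        prior = [(sum(post[i][k] for i in items) + smoothing)
                 / (len(items) + smoothing * n_classes) for k in range(n_classes)]
        confusion = {w: [[smoothing] * n_classes for _ in range(n_classes)]
                     for w in workers}
        for i, w, lab in labels:
            for k in range(n_classes):
                confusion[w][k][lab] += post[i][k]
        for w in workers:
            for k in range(n_classes):
                row_sum = sum(confusion[w][k])
                confusion[w][k] = [v / row_sum for v in confusion[w][k]]

        # E-step: posterior over each item's true class, computed in log space.
        for i in items:
            logp = [math.log(prior[k]) for k in range(n_classes)]
            for w, lab in by_item[i]:
                for k in range(n_classes):
                    logp[k] += math.log(confusion[w][k][lab])
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            z = sum(unnorm)
            post[i] = [u / z for u in unnorm]

    predictions = {i: max(range(n_classes), key=lambda k: post[i][k]) for i in items}
    return predictions, post, confusion


# Toy usage: workers 0 and 1 are accurate; worker 2 systematically flips binary labels.
truth = {0: 0, 1: 1, 2: 0, 3: 1}
votes = []
for item, y in truth.items():
    votes += [(item, 0, y), (item, 1, y), (item, 2, 1 - y)]
pred, post, conf = dawid_skene(votes, n_classes=2)
print(pred)  # {0: 0, 1: 1, 2: 0, 3: 1}
```

Because each worker gets a full confusion matrix, the consistently wrong worker 2 ends up as an informative (flipped) signal rather than noise. Like other models of this kind, it treats workers as conditionally independent given the true label. That independence assumption is the limitation the paper targets by modelling inter-worker correlation.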

Cite this Paper

BibTeX
@InProceedings{pmlr-v97-li19i,
  title = {Exploiting Worker Correlation for Label Aggregation in Crowdsourcing},
  author = {Li, Yuan and Rubinstein, Benjamin and Cohn, Trevor},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {3886--3895},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/li19i/li19i.pdf},
  url = {https://proceedings.mlr.press/v97/li19i.html},
  abstract = {Crowdsourcing has emerged as a core component of data science pipelines. From collected noisy worker labels, aggregation models that incorporate worker reliability parameters aim to infer a latent true annotation. In this paper, we argue that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; we propose in response an enhanced Bayesian classifier combination (EBCC) model, with inference based on a mean-field variational approach. An introduced mixture of intra-class reliabilities—connected to tensor decomposition and item clustering—induces inter-worker correlation. EBCC does not suffer the limitations of existing correlation models: intractable marginalisation of missing labels and poor scaling to large worker cohorts. Extensive empirical comparison on 17 real-world datasets sees EBCC achieving the highest mean accuracy across 10 benchmark crowdsourcing methods.}
}
Endnote
%0 Conference Paper
%T Exploiting Worker Correlation for Label Aggregation in Crowdsourcing
%A Yuan Li
%A Benjamin Rubinstein
%A Trevor Cohn
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-li19i
%I PMLR
%P 3886--3895
%U https://proceedings.mlr.press/v97/li19i.html
%V 97
%X Crowdsourcing has emerged as a core component of data science pipelines. From collected noisy worker labels, aggregation models that incorporate worker reliability parameters aim to infer a latent true annotation. In this paper, we argue that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; we propose in response an enhanced Bayesian classifier combination (EBCC) model, with inference based on a mean-field variational approach. An introduced mixture of intra-class reliabilities—connected to tensor decomposition and item clustering—induces inter-worker correlation. EBCC does not suffer the limitations of existing correlation models: intractable marginalisation of missing labels and poor scaling to large worker cohorts. Extensive empirical comparison on 17 real-world datasets sees EBCC achieving the highest mean accuracy across 10 benchmark crowdsourcing methods.
APA
Li, Y., Rubinstein, B., & Cohn, T. (2019). Exploiting Worker Correlation for Label Aggregation in Crowdsourcing. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3886-3895. Available from https://proceedings.mlr.press/v97/li19i.html.
