Exploiting Worker Correlation for Label Aggregation in Crowdsourcing

Yuan Li; Benjamin Rubinstein; Trevor Cohn

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3886-3895, 2019.

Abstract

Crowdsourcing has emerged as a core component of data science pipelines. From collected noisy worker labels, aggregation models that incorporate worker reliability parameters aim to infer a latent true annotation. In this paper, we argue that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; we propose in response an enhanced Bayesian classifier combination (EBCC) model, with inference based on a mean-field variational approach. An introduced mixture of intra-class reliabilities—connected to tensor decomposition and item clustering—induces inter-worker correlation. EBCC does not suffer the limitations of existing correlation models: intractable marginalisation of missing labels and poor scaling to large worker cohorts. Extensive empirical comparison on 17 real-world datasets sees EBCC achieving the highest mean accuracy across 10 benchmark crowdsourcing methods.
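
The aggregation setting described in the abstract can be illustrated with a small sketch. This is not EBCC and not the authors' code: it is plain majority voting, followed by a crude per-worker agreement rate standing in for a "worker reliability parameter". The function names, the `(item, worker, label)` triple format, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Aggregate (item, worker, label) triples by per-item majority vote."""
    votes = defaultdict(Counter)
    for item, _worker, label in labels:
        votes[item][label] += 1
    # Ties are broken towards the smallest label so the result is deterministic.
    return {
        item: min(counts, key=lambda lab: (-counts[lab], lab))
        for item, counts in votes.items()
    }


def worker_agreement(labels, consensus):
    """Fraction of each worker's labels that match the consensus label."""
    hits, totals = Counter(), Counter()
    for item, worker, label in labels:
        totals[worker] += 1
        hits[worker] += int(consensus[item] == label)
    return {worker: hits[worker] / totals[worker] for worker in totals}


# Hypothetical toy data: three workers labelling three binary items.
toy_labels = [
    ("item1", "w1", 1), ("item1", "w2", 1), ("item1", "w3", 0),
    ("item2", "w1", 0), ("item2", "w2", 0), ("item2", "w3", 0),
    ("item3", "w1", 1), ("item3", "w2", 1), ("item3", "w3", 0),
]

consensus = majority_vote(toy_labels)
reliability = worker_agreement(toy_labels, consensus)
print(consensus)
print(reliability)
```

Majority voting treats workers as independent and equally reliable. The abstract's argument is that models of this kind do not capture the correlations between workers seen in practice, which is the gap EBCC is proposed to address.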

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-li19i,
  title = 	 {Exploiting Worker Correlation for Label Aggregation in Crowdsourcing},
  author =       {Li, Yuan and Rubinstein, Benjamin and Cohn, Trevor},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {3886--3895},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/li19i/li19i.pdf},
  url = 	 {https://proceedings.mlr.press/v97/li19i.html},
  abstract = 	 {Crowdsourcing has emerged as a core component of data science pipelines. From collected noisy worker labels, aggregation models that incorporate worker reliability parameters aim to infer a latent true annotation. In this paper, we argue that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; we propose in response an enhanced Bayesian classifier combination (EBCC) model, with inference based on a mean-field variational approach. An introduced mixture of intra-class reliabilities—connected to tensor decomposition and item clustering—induces inter-worker correlation. EBCC does not suffer the limitations of existing correlation models: intractable marginalisation of missing labels and poor scaling to large worker cohorts. Extensive empirical comparison on 17 real-world datasets sees EBCC achieving the highest mean accuracy across 10 benchmark crowdsourcing methods.}
}

Endnote

%0 Conference Paper
%T Exploiting Worker Correlation for Label Aggregation in Crowdsourcing
%A Yuan Li
%A Benjamin Rubinstein
%A Trevor Cohn
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-li19i
%I PMLR
%P 3886--3895
%U https://proceedings.mlr.press/v97/li19i.html
%V 97
%X Crowdsourcing has emerged as a core component of data science pipelines. From collected noisy worker labels, aggregation models that incorporate worker reliability parameters aim to infer a latent true annotation. In this paper, we argue that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; we propose in response an enhanced Bayesian classifier combination (EBCC) model, with inference based on a mean-field variational approach. An introduced mixture of intra-class reliabilities—connected to tensor decomposition and item clustering—induces inter-worker correlation. EBCC does not suffer the limitations of existing correlation models: intractable marginalisation of missing labels and poor scaling to large worker cohorts. Extensive empirical comparison on 17 real-world datasets sees EBCC achieving the highest mean accuracy across 10 benchmark crowdsourcing methods.

APA

Li, Y., Rubinstein, B. & Cohn, T. (2019). Exploiting Worker Correlation for Label Aggregation in Crowdsourcing. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3886-3895. Available from https://proceedings.mlr.press/v97/li19i.html.
