Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model

Hideaki Imamura, Issei Sato, Masashi Sugiyama
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2147-2156, 2018.

Abstract

While crowdsourcing has become an important means to label data, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoretical error analysis for the DS model has been conducted only under restrictive assumptions on class priors, confusion matrices, or the number of labels each worker provides. In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models including the DS model as a special case. We further propose the worker clustering model, which is more practical than the DS model under real crowdsourcing settings. The wide applicability of our theoretical analysis allows us to immediately investigate the behavior of this proposed model, which cannot be analyzed by existing studies. Experimental results showed that there is a strong similarity between the lower bound of the minimax error rate derived by our theoretical analysis and the empirical error of the estimated value.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-imamura18a,
  title = {Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model},
  author = {Imamura, Hideaki and Sato, Issei and Sugiyama, Masashi},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {2147--2156},
  year = {2018},
  editor = {Dy, Jennifer and Krause, Andreas},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/imamura18a/imamura18a.pdf},
  url = {https://proceedings.mlr.press/v80/imamura18a.html},
  abstract = {While crowdsourcing has become an important means to label data, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoretical error analysis for the DS model has been conducted only under restrictive assumptions on class priors, confusion matrices, or the number of labels each worker provides. In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models including the DS model as a special case. We further propose the worker clustering model, which is more practical than the DS model under real crowdsourcing settings. The wide applicability of our theoretical analysis allows us to immediately investigate the behavior of this proposed model, which cannot be analyzed by existing studies. Experimental results showed that there is a strong similarity between the lower bound of the minimax error rate derived by our theoretical analysis and the empirical error of the estimated value.}
}
Endnote
%0 Conference Paper
%T Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
%A Hideaki Imamura
%A Issei Sato
%A Masashi Sugiyama
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-imamura18a
%I PMLR
%P 2147--2156
%U https://proceedings.mlr.press/v80/imamura18a.html
%V 80
%X While crowdsourcing has become an important means to label data, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoretical error analysis for the DS model has been conducted only under restrictive assumptions on class priors, confusion matrices, or the number of labels each worker provides. In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models including the DS model as a special case. We further propose the worker clustering model, which is more practical than the DS model under real crowdsourcing settings. The wide applicability of our theoretical analysis allows us to immediately investigate the behavior of this proposed model, which cannot be analyzed by existing studies. Experimental results showed that there is a strong similarity between the lower bound of the minimax error rate derived by our theoretical analysis and the empirical error of the estimated value.
APA
Imamura, H., Sato, I. & Sugiyama, M. (2018). Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2147-2156. Available from https://proceedings.mlr.press/v80/imamura18a.html.
