Exact Exponent in Optimal Rates for Crowdsourcing

Chao Gao; Yu Lu; Dengyong Zhou

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:603-611, 2016.

Abstract

Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.
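The sample size requirement in the abstract can be illustrated numerically. The following is only a minimal sketch under a simplifying assumption that is not taken from this page: a binary "one-coin" setting in which worker i labels correctly with probability p_i, so that the worker's Chernoff information is the standard quantity between Bernoulli(p_i) and Bernoulli(1 - p_i), namely -log(2 sqrt(p_i (1 - p_i))). The function names are hypothetical, and the bound is treated as exact although the paper's rate is asymptotic.

```python
import math


def chernoff_information_one_coin(p: float) -> float:
    """Chernoff information between Bernoulli(p) and Bernoulli(1 - p).

    Assumption (not from the page): a binary one-coin worker with accuracy p.
    By symmetry the minimizing exponent is t = 1/2, which gives
    -log(2 * sqrt(p * (1 - p))).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("accuracy p must lie strictly between 0 and 1")
    return -math.log(2.0 * math.sqrt(p * (1.0 - p)))


def average_chernoff_information(accuracies: list[float]) -> float:
    """Average the per-worker Chernoff information over all workers."""
    if not accuracies:
        raise ValueError("need at least one worker")
    total = sum(chernoff_information_one_coin(p) for p in accuracies)
    return total / len(accuracies)


def workers_needed(avg_info: float, eps: float) -> int:
    """Smallest integer m with m >= (1 / avg_info) * log(1 / eps)."""
    if avg_info <= 0.0:
        raise ValueError("average Chernoff information must be positive")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 / eps) / avg_info)


if __name__ == "__main__":
    # Hypothetical crowd: every worker is correct 80% of the time.
    info = average_chernoff_information([0.8] * 10)
    print(workers_needed(info, eps=0.01))
```

Under these assumptions, more accurate workers carry more Chernoff information, so fewer of them are needed for the same target error; a worker with p = 0.5 contributes zero information.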

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-gaoa16,
  title = 	 {Exact Exponent in Optimal Rates for Crowdsourcing},
  author = 	 {Gao, Chao and Lu, Yu and Zhou, Dengyong},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {603--611},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/gaoa16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/gaoa16.html},
  abstract = 	 {Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.}
}

Endnote

%0 Conference Paper
%T Exact Exponent in Optimal Rates for Crowdsourcing
%A Chao Gao
%A Yu Lu
%A Dengyong Zhou
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger	
%F pmlr-v48-gaoa16
%I PMLR
%P 603--611
%U https://proceedings.mlr.press/v48/gaoa16.html
%V 48
%X Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.

RIS


TY  - CPAPER
TI  - Exact Exponent in Optimal Rates for Crowdsourcing
AU  - Chao Gao
AU  - Yu Lu
AU  - Dengyong Zhou
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger	
ID  - pmlr-v48-gaoa16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 603
EP  - 611
L1  - http://proceedings.mlr.press/v48/gaoa16.pdf
UR  - https://proceedings.mlr.press/v48/gaoa16.html
AB  - Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.
ER  -

APA


Gao, C., Lu, Y. & Zhou, D. (2016). Exact Exponent in Optimal Rates for Crowdsourcing. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:603-611. Available from https://proceedings.mlr.press/v48/gaoa16.html.
