Exact Exponent in Optimal Rates for Crowdsourcing

Chao Gao, Yu Lu, Dengyong Zhou
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:603-611, 2016.

Abstract

Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.
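
As a rough illustration of the sample size requirement stated in the abstract, the sketch below evaluates $\frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ numerically. It makes an assumption that the abstract does not spell out: each worker is modeled by a single accuracy $p_i$ (a "one-coin" special case), so that the worker's Chernoff information is that between Bernoulli($p_i$) and Bernoulli($1-p_i$), namely $-\log\bigl(2\sqrt{p_i(1-p_i)}\bigr)$. The function names are hypothetical, and the result ignores lower-order terms in the exponent, so it should be read as a back-of-the-envelope estimate rather than the paper's procedure.

```python
import math


def average_chernoff_information(accuracies):
    """Average Chernoff information over workers (assumed one-coin model).

    For a worker who is correct with probability p, the Chernoff information
    between Bernoulli(p) and Bernoulli(1 - p) is -log(2 * sqrt(p * (1 - p))).
    """
    if not accuracies:
        raise ValueError("need at least one worker accuracy")
    total = 0.0
    for p in accuracies:
        if not 0.0 < p < 1.0:
            raise ValueError("each accuracy must lie strictly between 0 and 1")
        total += -math.log(2.0 * math.sqrt(p * (1.0 - p)))
    return total / len(accuracies)


def workers_needed(info, eps):
    """Smallest integer m with m >= (1 / info) * log(1 / eps)."""
    if info <= 0.0:
        raise ValueError("average Chernoff information must be positive")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 / eps) / info)


if __name__ == "__main__":
    # Hypothetical crowd: every worker is right 80% of the time.
    info = average_chernoff_information([0.8] * 10)
    print(workers_needed(info, eps=0.01))
```

A worker with accuracy exactly 0.5 contributes zero information under this formula, which matches the intuition that random guessers do not help the aggregate.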

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-gaoa16,
  title     = {Exact Exponent in Optimal Rates for Crowdsourcing},
  author    = {Gao, Chao and Lu, Yu and Zhou, Dengyong},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {603--611},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/gaoa16.pdf},
  url       = {https://proceedings.mlr.press/v48/gaoa16.html},
  abstract  = {Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.}
}
Endnote
%0 Conference Paper
%T Exact Exponent in Optimal Rates for Crowdsourcing
%A Chao Gao
%A Yu Lu
%A Dengyong Zhou
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-gaoa16
%I PMLR
%P 603--611
%U https://proceedings.mlr.press/v48/gaoa16.html
%V 48
%X Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.
RIS
TY - CPAPER
TI - Exact Exponent in Optimal Rates for Crowdsourcing
AU - Chao Gao
AU - Yu Lu
AU - Dengyong Zhou
BT - Proceedings of The 33rd International Conference on Machine Learning
DA - 2016/06/11
ED - Maria Florina Balcan
ED - Kilian Q. Weinberger
ID - pmlr-v48-gaoa16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 48
SP - 603
EP - 611
L1 - http://proceedings.mlr.press/v48/gaoa16.pdf
UR - https://proceedings.mlr.press/v48/gaoa16.html
AB - Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.
ER -
APA
Gao, C., Lu, Y. & Zhou, D. (2016). Exact Exponent in Optimal Rates for Crowdsourcing. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:603-611. Available from https://proceedings.mlr.press/v48/gaoa16.html.
