No Oops, You Won’t Do It Again: Mechanisms for Self-correction in Crowdsourcing

Nihar Shah, Dengyong Zhou
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1-10, 2016.

Abstract

Crowdsourcing is a very popular means of obtaining the large amounts of labeled data that modern machine learning methods require. Although cheap and fast to obtain, crowdsourced labels suffer from significant amounts of error, thereby degrading the performance of downstream machine learning tasks. With the goal of improving the quality of the labeled data, we seek to mitigate the many errors that occur due to silly mistakes or inadvertent errors by crowdsourcing workers. We propose a two-stage setting for crowdsourcing where the worker first answers the questions, and is then allowed to change her answers after looking at a (noisy) reference answer. We mathematically formulate this process and develop mechanisms to incentivize workers to act appropriately. Our mathematical guarantees show that our mechanism incentivizes the workers to answer honestly in both stages, and refrain from answering randomly in the first stage or simply copying in the second. Numerical experiments reveal a significant boost in performance that such "self-correction" can provide when using crowdsourcing to train machine learning algorithms.
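The two-stage setting described above can be sketched as a small simulation. This is a hypothetical illustration, not the paper's actual mechanism: the parameters `worker_acc`, `ref_acc`, and `change_prob` are made up for the sketch, and the payment rule that provides the incentive guarantees is not modeled here.

```python
import random

def stage_one(truth, worker_acc, rng):
    """Stage 1: the worker answers each binary question on her own,
    getting it right with probability worker_acc."""
    return [t if rng.random() < worker_acc else 1 - t for t in truth]

def noisy_reference(truth, ref_acc, rng):
    """The (noisy) reference answers shown to the worker, each correct
    with probability ref_acc."""
    return [t if rng.random() < ref_acc else 1 - t for t in truth]

def stage_two(answers, reference, change_prob, rng):
    """Stage 2: where the reference disagrees with her stage-1 answer,
    the worker reconsiders and switches to the reference with probability
    change_prob. change_prob = 1 would be blind copying; change_prob = 0
    keeps the stage-1 answers unchanged."""
    return [r if a != r and rng.random() < change_prob else a
            for a, r in zip(answers, reference)]

def accuracy(labels, truth):
    """Fraction of labels matching the ground truth."""
    return sum(l == t for l, t in zip(labels, truth)) / len(truth)

if __name__ == "__main__":
    rng = random.Random(0)
    truth = [rng.randint(0, 1) for _ in range(10_000)]
    first = stage_one(truth, worker_acc=0.8, rng=rng)
    ref = noisy_reference(truth, ref_acc=0.9, rng=rng)
    final = stage_two(first, ref, change_prob=0.5, rng=rng)
    print(f"stage 1 accuracy: {accuracy(first, truth):.3f}")
    print(f"stage 2 accuracy: {accuracy(final, truth):.3f}")
```

In this toy model, revising disagreements toward a reference that is more accurate than the worker improves label quality; the paper's contribution is the incentive mechanism ensuring workers actually behave this way rather than copying or answering randomly.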

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-shaha16,
  title     = {No Oops, You Won't Do It Again: Mechanisms for Self-correction in Crowdsourcing},
  author    = {Shah, Nihar and Zhou, Dengyong},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {1--10},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/shaha16.pdf},
  url       = {https://proceedings.mlr.press/v48/shaha16.html},
  abstract  = {Crowdsourcing is a very popular means of obtaining the large amounts of labeled data that modern machine learning methods require. Although cheap and fast to obtain, crowdsourced labels suffer from significant amounts of error, thereby degrading the performance of downstream machine learning tasks. With the goal of improving the quality of the labeled data, we seek to mitigate the many errors that occur due to silly mistakes or inadvertent errors by crowdsourcing workers. We propose a two-stage setting for crowdsourcing where the worker first answers the questions, and is then allowed to change her answers after looking at a (noisy) reference answer. We mathematically formulate this process and develop mechanisms to incentivize workers to act appropriately. Our mathematical guarantees show that our mechanism incentivizes the workers to answer honestly in both stages, and refrain from answering randomly in the first stage or simply copying in the second. Numerical experiments reveal a significant boost in performance that such "self-correction" can provide when using crowdsourcing to train machine learning algorithms.}
}
Endnote
%0 Conference Paper
%T No Oops, You Won’t Do It Again: Mechanisms for Self-correction in Crowdsourcing
%A Nihar Shah
%A Dengyong Zhou
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-shaha16
%I PMLR
%P 1--10
%U https://proceedings.mlr.press/v48/shaha16.html
%V 48
%X Crowdsourcing is a very popular means of obtaining the large amounts of labeled data that modern machine learning methods require. Although cheap and fast to obtain, crowdsourced labels suffer from significant amounts of error, thereby degrading the performance of downstream machine learning tasks. With the goal of improving the quality of the labeled data, we seek to mitigate the many errors that occur due to silly mistakes or inadvertent errors by crowdsourcing workers. We propose a two-stage setting for crowdsourcing where the worker first answers the questions, and is then allowed to change her answers after looking at a (noisy) reference answer. We mathematically formulate this process and develop mechanisms to incentivize workers to act appropriately. Our mathematical guarantees show that our mechanism incentivizes the workers to answer honestly in both stages, and refrain from answering randomly in the first stage or simply copying in the second. Numerical experiments reveal a significant boost in performance that such "self-correction" can provide when using crowdsourcing to train machine learning algorithms.
RIS
TY - CPAPER
TI - No Oops, You Won’t Do It Again: Mechanisms for Self-correction in Crowdsourcing
AU - Nihar Shah
AU - Dengyong Zhou
BT - Proceedings of The 33rd International Conference on Machine Learning
DA - 2016/06/11
ED - Maria Florina Balcan
ED - Kilian Q. Weinberger
ID - pmlr-v48-shaha16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 48
SP - 1
EP - 10
L1 - http://proceedings.mlr.press/v48/shaha16.pdf
UR - https://proceedings.mlr.press/v48/shaha16.html
AB - Crowdsourcing is a very popular means of obtaining the large amounts of labeled data that modern machine learning methods require. Although cheap and fast to obtain, crowdsourced labels suffer from significant amounts of error, thereby degrading the performance of downstream machine learning tasks. With the goal of improving the quality of the labeled data, we seek to mitigate the many errors that occur due to silly mistakes or inadvertent errors by crowdsourcing workers. We propose a two-stage setting for crowdsourcing where the worker first answers the questions, and is then allowed to change her answers after looking at a (noisy) reference answer. We mathematically formulate this process and develop mechanisms to incentivize workers to act appropriately. Our mathematical guarantees show that our mechanism incentivizes the workers to answer honestly in both stages, and refrain from answering randomly in the first stage or simply copying in the second. Numerical experiments reveal a significant boost in performance that such "self-correction" can provide when using crowdsourcing to train machine learning algorithms.
ER -
APA
Shah, N., & Zhou, D. (2016). No Oops, You Won’t Do It Again: Mechanisms for Self-correction in Crowdsourcing. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1-10. Available from https://proceedings.mlr.press/v48/shaha16.html.