Efficient PAC Learning from the Crowd

Pranjal Awasthi; Avrim Blum; Nika Haghtalab; Yishay Mansour

Proceedings of the 2017 Conference on Learning Theory, PMLR 65:127-150, 2017.

Abstract

In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise that exists in crowdsourced data, and efforts to eliminate noise through voting often require a large number of queries per example. In this paper, we show how by interleaving the process of labeling and learning, we can attain computational efficiency with much less overhead in the labeling cost. In particular, we consider the realizable setting where there exists a true target function in $\mathcal{F}$ and consider a pool of labelers. When a noticeable fraction of the labelers are perfect, and the rest behave arbitrarily, we show that any $\mathcal{F}$ that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses. Moreover, we show that this can be done while each labeler only labels a constant number of examples and the number of labels requested per example, on average, is a constant. When no perfect labelers exist, a related task is to find a set of the labelers which are good but not perfect. We show that we can identify all good labelers, when at least the majority of labelers are good.

Cite this Paper

BibTeX

@InProceedings{pmlr-v65-awasthi17a,
  title = 	 {Efficient PAC Learning from the Crowd},
  author = 	 {Awasthi, Pranjal and Blum, Avrim and Haghtalab, Nika and Mansour, Yishay},
  booktitle = 	 {Proceedings of the 2017 Conference on Learning Theory},
  pages = 	 {127--150},
  year = 	 {2017},
  editor = 	 {Kale, Satyen and Shamir, Ohad},
  volume = 	 {65},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v65/awasthi17a/awasthi17a.pdf},
  url = 	 {https://proceedings.mlr.press/v65/awasthi17a.html},
  abstract = 	 {In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise that exists in crowdsourced data, and efforts to eliminate noise through voting often require a large number of queries per example. In this paper, we show how by interleaving the process of labeling and learning, we can attain computational efficiency with much less overhead in the labeling cost. In particular, we consider the \emph{realizable} setting where there exists a true target function in $\mathcal{F}$ and consider a pool of labelers. When a noticeable fraction of the labelers are \emph{perfect}, and the rest behave arbitrarily, we show that any $\mathcal{F}$ that can be efficiently learned in the traditional \emph{realizable} PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses. Moreover, we show that this can be done while each labeler only labels a constant number of examples and the number of labels requested per example, on average, is a constant. When no perfect labelers exist, a related task is to find a set of the labelers which are \emph{good} but not perfect. We show that we can identify all good labelers, when at least the majority of labelers are good.}
}

Endnote

%0 Conference Paper
%T Efficient PAC Learning from the Crowd
%A Pranjal Awasthi
%A Avrim Blum
%A Nika Haghtalab
%A Yishay Mansour
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-awasthi17a
%I PMLR
%P 127--150
%U https://proceedings.mlr.press/v65/awasthi17a.html
%V 65
%X In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise that exists in crowdsourced data, and efforts to eliminate noise through voting often require a large number of queries per example. In this paper, we show how by interleaving the process of labeling and learning, we can attain computational efficiency with much less overhead in the labeling cost. In particular, we consider the realizable setting where there exists a true target function in $\mathcal{F}$ and consider a pool of labelers. When a noticeable fraction of the labelers are perfect, and the rest behave arbitrarily, we show that any $\mathcal{F}$ that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses. Moreover, we show that this can be done while each labeler only labels a constant number of examples and the number of labels requested per example, on average, is a constant. When no perfect labelers exist, a related task is to find a set of the labelers which are good but not perfect. We show that we can identify all good labelers, when at least the majority of labelers are good.

APA

Awasthi, P., Blum, A., Haghtalab, N., & Mansour, Y. (2017). Efficient PAC Learning from the Crowd. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:127-150. Available from https://proceedings.mlr.press/v65/awasthi17a.html.
