Semi-Verified PAC Learning from the Crowd

Shiwei Zeng; Jie Shen

Semi-Verified PAC Learning from the Crowd

Shiwei Zeng, Jie Shen

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:505-522, 2023.

Abstract

We study the problem of crowdsourced PAC learning of threshold functions. This is a challenging problem and only recently have query-efficient algorithms been established under the assumption that a noticeable fraction of the workers are perfect. In this work, we investigate a more challenging case where the majority may behave adversarially and the rest behave as the Massart noise – a significant generalization of the perfectness assumption. We show that under the semi-verified model of Charikar et al. (2017), where we have (limited) access to a trusted oracle who always returns correct annotations, it is possible to PAC learn the underlying hypothesis class with a manageable amount of label queries. Moreover, we show that the labeling cost can be drastically mitigated via the more easily obtained comparison queries. Orthogonal to recent developments in semi-verified or list-decodable learning that crucially rely on data distributional assumptions, our PAC guarantee holds by exploring the wisdom of the crowd.

Cite this Paper

BibTeX


@InProceedings{pmlr-v206-zeng23a,
  title = 	 {Semi-Verified PAC Learning from the Crowd},
  author =       {Zeng, Shiwei and Shen, Jie},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {505--522},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/zeng23a/zeng23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/zeng23a.html},
  abstract = 	 {We study the problem of crowdsourced PAC learning of threshold functions. This is a challenging problem and only recently have query-efficient algorithms been established under the assumption that a noticeable fraction of the workers are perfect. In this work, we investigate a more challenging case where the majority may behave adversarially and the rest behave as the Massart noise – a significant generalization of the perfectness assumption. We show that under the semi-verified model of Charikar et al. (2017), where we have (limited) access to a trusted oracle who always returns correct annotations, it is possible to PAC learn the underlying hypothesis class with a manageable amount of label queries. Moreover, we show that the labeling cost can be drastically mitigated via the more easily obtained comparison queries. Orthogonal to recent developments in semi-verified or list-decodable learning that crucially rely on data distributional assumptions, our PAC guarantee holds by exploring the wisdom of the crowd.}
}

Endnote

%0 Conference Paper
%T Semi-Verified PAC Learning from the Crowd
%A Shiwei Zeng
%A Jie Shen
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent	
%F pmlr-v206-zeng23a
%I PMLR
%P 505--522
%U https://proceedings.mlr.press/v206/zeng23a.html
%V 206
%X We study the problem of crowdsourced PAC learning of threshold functions. This is a challenging problem and only recently have query-efficient algorithms been established under the assumption that a noticeable fraction of the workers are perfect. In this work, we investigate a more challenging case where the majority may behave adversarially and the rest behave as the Massart noise – a significant generalization of the perfectness assumption. We show that under the semi-verified model of Charikar et al. (2017), where we have (limited) access to a trusted oracle who always returns correct annotations, it is possible to PAC learn the underlying hypothesis class with a manageable amount of label queries. Moreover, we show that the labeling cost can be drastically mitigated via the more easily obtained comparison queries. Orthogonal to recent developments in semi-verified or list-decodable learning that crucially rely on data distributional assumptions, our PAC guarantee holds by exploring the wisdom of the crowd.

APA


Zeng, S. & Shen, J.. (2023). Semi-Verified PAC Learning from the Crowd. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:505-522 Available from https://proceedings.mlr.press/v206/zeng23a.html.

Related Material

Download PDF