Firebolt: Weak Supervision Under Weaker Assumptions

Zhaobin Kuang; Chidubem G. Arachie; Bangyong Liang; Pradyumna Narayana; Giulia Desalvo; Michael S. Quinn; Bert Huang; Geoffrey Downs; Yang Yang

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:8214-8259, 2022.

Abstract

Modern machine learning demands a large amount of training data. Weak supervision is a promising approach to meet this demand. It aggregates multiple labeling functions (LFs), which are noisy, user-provided labeling heuristics, to rapidly and cheaply curate probabilistic labels for large-scale unlabeled data. However, standard assumptions in weak supervision (such as user-specified class balance, similar accuracy of an LF in classifying different classes, and full knowledge of LF dependency at inference time) might be undesirable in practice. In response, we present Firebolt, a new weak supervision framework that seeks to operate under weaker assumptions. In particular, Firebolt learns the class balance and class-specific accuracy of LFs jointly from unlabeled data. It carries out inference in an efficient and interpretable manner. We analyze the parameter estimation error of Firebolt and characterize its impact on downstream model performance. Furthermore, we show that on five publicly available datasets, Firebolt outperforms a state-of-the-art weak supervision method by up to 5.8 points in AUC. We also provide a case study in the production setting of a tech company, where a Firebolt-supervised model outperforms the existing weakly-supervised production model by 1.3 points in AUC and speeds up label model training and inference from one hour to three minutes.

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-kuang22a,
  title = 	 {Firebolt: Weak Supervision Under Weaker Assumptions},
  author =       {Kuang, Zhaobin and Arachie, Chidubem G. and Liang, Bangyong and Narayana, Pradyumna and Desalvo, Giulia and Quinn, Michael S. and Huang, Bert and Downs, Geoffrey and Yang, Yang},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {8214--8259},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/kuang22a/kuang22a.pdf},
  url = 	 {https://proceedings.mlr.press/v151/kuang22a.html},
  abstract = 	 {Modern machine learning demands a large amount of training data. Weak supervision is a promising approach to meet this demand. It aggregates multiple labeling functions (LFs), which are noisy, user-provided labeling heuristics, to rapidly and cheaply curate probabilistic labels for large-scale unlabeled data. However, standard assumptions in weak supervision (such as user-specified class balance, similar accuracy of an LF in classifying different classes, and full knowledge of LF dependency at inference time) might be undesirable in practice. In response, we present Firebolt, a new weak supervision framework that seeks to operate under weaker assumptions. In particular, Firebolt learns the class balance and class-specific accuracy of LFs jointly from unlabeled data. It carries out inference in an efficient and interpretable manner. We analyze the parameter estimation error of Firebolt and characterize its impact on downstream model performance. Furthermore, we show that on five publicly available datasets, Firebolt outperforms a state-of-the-art weak supervision method by up to 5.8 points in AUC. We also provide a case study in the production setting of a tech company, where a Firebolt-supervised model outperforms the existing weakly-supervised production model by 1.3 points in AUC and speeds up label model training and inference from one hour to three minutes.}
}

Endnote

%0 Conference Paper
%T Firebolt: Weak Supervision Under Weaker Assumptions
%A Zhaobin Kuang
%A Chidubem G. Arachie
%A Bangyong Liang
%A Pradyumna Narayana
%A Giulia Desalvo
%A Michael S. Quinn
%A Bert Huang
%A Geoffrey Downs
%A Yang Yang
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-kuang22a
%I PMLR
%P 8214--8259
%U https://proceedings.mlr.press/v151/kuang22a.html
%V 151
%X Modern machine learning demands a large amount of training data. Weak supervision is a promising approach to meet this demand. It aggregates multiple labeling functions (LFs), which are noisy, user-provided labeling heuristics, to rapidly and cheaply curate probabilistic labels for large-scale unlabeled data. However, standard assumptions in weak supervision (such as user-specified class balance, similar accuracy of an LF in classifying different classes, and full knowledge of LF dependency at inference time) might be undesirable in practice. In response, we present Firebolt, a new weak supervision framework that seeks to operate under weaker assumptions. In particular, Firebolt learns the class balance and class-specific accuracy of LFs jointly from unlabeled data. It carries out inference in an efficient and interpretable manner. We analyze the parameter estimation error of Firebolt and characterize its impact on downstream model performance. Furthermore, we show that on five publicly available datasets, Firebolt outperforms a state-of-the-art weak supervision method by up to 5.8 points in AUC. We also provide a case study in the production setting of a tech company, where a Firebolt-supervised model outperforms the existing weakly-supervised production model by 1.3 points in AUC and speeds up label model training and inference from one hour to three minutes.

APA


Kuang, Z., Arachie, C. G., Liang, B., Narayana, P., Desalvo, G., Quinn, M. S., Huang, B., Downs, G., & Yang, Y. (2022). Firebolt: Weak Supervision Under Weaker Assumptions. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:8214-8259. Available from https://proceedings.mlr.press/v151/kuang22a.html.
