Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision

Jieyu Zhang, Linxin Song, Alex Ratner
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:157-171, 2023.

Abstract

Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently. The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources abstracted as labeling functions (LFs). Existing statistical label models typically rely only on the outputs of the LFs, ignoring the instance features when modeling the underlying generative process. In this paper, we incorporate instance features into a statistical label model via the proposed FABLE. In particular, FABLE is built on a mixture of Bayesian label models, each corresponding to a global pattern of correlation, and the coefficients of the mixture components are predicted by a Gaussian Process classifier based on instance features. We adopt an auxiliary-variable-based variational inference algorithm to tackle the non-conjugacy between the Gaussian Process and the Bayesian label models. In an extensive empirical comparison on eleven benchmark datasets, FABLE achieves the highest average performance against nine baselines. Our implementation of FABLE can be found at https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/fable.py.
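The linked file lives inside the WRENCH benchmark library, so FABLE can in principle be used like any other WRENCH label model. The following is a minimal usage sketch, not a verified recipe: it assumes FABLE follows WRENCH's common label-model interface (fit / predict_proba / test), and the class name, dataset name, and default hyperparameters shown here are assumptions; consult the linked fable.py for the actual API.

```python
# Usage sketch (assumptions, not verified against the linked source):
# assumes FABLE is exposed as a standard WRENCH label model and that the
# dataset provides both labeling-function outputs and instance features.
from wrench.dataset import load_dataset
from wrench.labelmodel.fable import Fable  # assumed class name

# Load a WRENCH-format dataset; extract_feature=True keeps instance features,
# which FABLE needs in addition to the labeling-function outputs.
train_data, valid_data, test_data = load_dataset(
    '../data', 'youtube', extract_feature=True
)

label_model = Fable()  # hyperparameters omitted; assumed defaults
label_model.fit(dataset_train=train_data, dataset_valid=valid_data)

# Posterior over classes per training instance, usable as soft training labels
# for a downstream end model.
soft_labels = label_model.predict_proba(train_data)
acc = label_model.test(test_data, 'acc')
print(f'test accuracy: {acc:.3f}')
```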

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-zhang23a,
  title     = {Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision},
  author    = {Zhang, Jieyu and Song, Linxin and Ratner, Alex},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {157--171},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/zhang23a/zhang23a.pdf},
  url       = {https://proceedings.mlr.press/v206/zhang23a.html}
}
Endnote
%0 Conference Paper
%T Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision
%A Jieyu Zhang
%A Linxin Song
%A Alex Ratner
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-zhang23a
%I PMLR
%P 157--171
%U https://proceedings.mlr.press/v206/zhang23a.html
%V 206
APA
Zhang, J., Song, L. & Ratner, A. (2023). Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:157-171. Available from https://proceedings.mlr.press/v206/zhang23a.html.