Bayesian Online Learning for Consensus Prediction

Samuel Showalter, Alex J Boyd, Padhraic Smyth, Mark Steyvers
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2539-2547, 2024.

Abstract

Given a pre-trained classifier and multiple human experts, we investigate the task of online classification where model predictions are provided for free but querying humans incurs a cost. In this practical but under-explored setting, oracle ground truth is not available. Instead, the prediction target is defined as the consensus vote of all experts. Given that querying full consensus can be costly, we propose a general framework for online Bayesian consensus estimation, leveraging properties of the multivariate hypergeometric distribution. Based on this framework, we propose a family of methods that dynamically estimate expert consensus from partial feedback by producing a posterior over expert and model beliefs. Analyzing this posterior induces an interpretable trade-off between querying cost and classification performance. We demonstrate the efficacy of our framework against a variety of baselines on CIFAR-10H and ImageNet-16H, two large-scale crowdsourced datasets.
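The abstract's core idea is estimating the consensus of the full expert pool from a partial set of queried votes. The sketch below illustrates this under stated assumptions and is not the paper's exact construction:

- Expert votes are treated as exchangeable draws from a Dirichlet-distributed class distribution.
- Given the queried votes, the unqueried votes therefore follow a Dirichlet-multinomial. This Pólya-urn view is consistent with sampling experts without replacement from a fixed pool, the multivariate hypergeometric setting the abstract mentions.
- Using the classifier's probabilities as prior pseudo-counts (scaled by `strength`) is a hypothetical choice.
- The names `consensus_posterior` and `strength`, and the lowest-index tie-break, are illustrative only.

```python
from math import exp, lgamma


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative ints that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def dirmult_logpmf(x, alpha):
    """Log-pmf of a Dirichlet-multinomial over counts x with m = sum(x) draws."""
    m, a0 = sum(x), sum(alpha)
    out = lgamma(m + 1) + lgamma(a0) - lgamma(m + a0)
    for xc, ac in zip(x, alpha):
        out += lgamma(xc + ac) - lgamma(xc + 1) - lgamma(ac)
    return out


def consensus_posterior(observed, n_experts, model_probs, strength=2.0):
    """Return P(plurality label of all n_experts == c | observed votes) per class c.

    Hypothetical modelling choices (not taken from the paper):
      * prior pseudo-counts alpha = strength * model_probs, so the classifier's
        probabilities act as a Dirichlet prior over the expert vote distribution;
      * ties in the final vote go to the lowest class index.
    """
    remaining = n_experts - sum(observed)
    if remaining < 0:
        raise ValueError("more observed votes than experts")
    # Conjugate update: queried votes are added to the prior pseudo-counts.
    alpha_post = [strength * p + k for p, k in zip(model_probs, observed)]
    if any(a <= 0 for a in alpha_post):
        raise ValueError("posterior pseudo-counts must be positive")
    probs = [0.0] * len(observed)
    # Exact enumeration of every way the unqueried experts could vote.
    for x in compositions(remaining, len(observed)):
        totals = [k + xc for k, xc in zip(observed, x)]
        winner = totals.index(max(totals))  # first max => lowest-index tie-break
        probs[winner] += exp(dirmult_logpmf(x, alpha_post))
    return probs


# Five experts in total; two have been queried and both chose class 0.
print(consensus_posterior([2, 0, 0], n_experts=5, model_probs=[0.6, 0.3, 0.1]))
```

Exact enumeration grows combinatorially with the number of classes and unqueried experts, so it only suits small pools; larger settings would need sampling or another approximation. A hypothetical stopping rule could query another expert only while `max(probs)` stays below a chosen threshold. Raising that threshold trades more queries for a more confident consensus estimate, which mirrors the cost/performance trade-off the abstract describes.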

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-showalter24a,
  title     = {Bayesian Online Learning for Consensus Prediction},
  author    = {Showalter, Samuel and Boyd, Alex J and Smyth, Padhraic and Steyvers, Mark},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2539--2547},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/showalter24a/showalter24a.pdf},
  url       = {https://proceedings.mlr.press/v238/showalter24a.html},
  abstract  = {Given a pre-trained classifier and multiple human experts, we investigate the task of online classification where model predictions are provided for free but querying humans incurs a cost. In this practical but under-explored setting, oracle ground truth is not available. Instead, the prediction target is defined as the consensus vote of all experts. Given that querying full consensus can be costly, we propose a general framework for online Bayesian consensus estimation, leveraging properties of the multivariate hypergeometric distribution. Based on this framework, we propose a family of methods that dynamically estimate expert consensus from partial feedback by producing a posterior over expert and model beliefs. Analyzing this posterior induces an interpretable trade-off between querying cost and classification performance. We demonstrate the efficacy of our framework against a variety of baselines on CIFAR-10H and ImageNet-16H, two large-scale crowdsourced datasets.}
}
Endnote
%0 Conference Paper
%T Bayesian Online Learning for Consensus Prediction
%A Samuel Showalter
%A Alex J Boyd
%A Padhraic Smyth
%A Mark Steyvers
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-showalter24a
%I PMLR
%P 2539--2547
%U https://proceedings.mlr.press/v238/showalter24a.html
%V 238
%X Given a pre-trained classifier and multiple human experts, we investigate the task of online classification where model predictions are provided for free but querying humans incurs a cost. In this practical but under-explored setting, oracle ground truth is not available. Instead, the prediction target is defined as the consensus vote of all experts. Given that querying full consensus can be costly, we propose a general framework for online Bayesian consensus estimation, leveraging properties of the multivariate hypergeometric distribution. Based on this framework, we propose a family of methods that dynamically estimate expert consensus from partial feedback by producing a posterior over expert and model beliefs. Analyzing this posterior induces an interpretable trade-off between querying cost and classification performance. We demonstrate the efficacy of our framework against a variety of baselines on CIFAR-10H and ImageNet-16H, two large-scale crowdsourced datasets.
APA
Showalter, S., Boyd, A. J., Smyth, P. & Steyvers, M. (2024). Bayesian Online Learning for Consensus Prediction. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2539-2547. Available from https://proceedings.mlr.press/v238/showalter24a.html.
