Thresholding Bandit Problem with Both Duels and Pulls

Yichong Xu, Xi Chen, Aarti Singh, Artur Dubrawski
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2591-2600, 2020.

Abstract

The Thresholding Bandit Problem (TBP) aims to find the set of arms with mean rewards greater than a given threshold. We consider a new setting of TBP, where in addition to pulling arms, one can also duel two arms and learn which of them has the greater mean. In our motivating application from crowdsourcing, dueling two arms can be more cost-effective and time-efficient than direct pulls. We refer to this problem as TBP with Dueling Choices (TBP-DC). This paper provides an algorithm called Rank-Search (RS) for solving TBP-DC by alternating between ranking and binary search. We prove theoretical guarantees for RS, and also give lower bounds to show its optimality. Experiments show that RS outperforms previous baseline algorithms that use only pulls or only duels.
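To make the alternation between duels and pulls concrete, below is a minimal Python sketch of the rank-then-binary-search idea described in the abstract. It is not the paper's Rank-Search algorithm: it assumes noiseless duels, Gaussian pull noise, and a fixed pull budget per visited arm, and the names and parameters here (duel, pull, estimate_mean, rank_then_search, num_pulls) are hypothetical illustrations rather than the authors' interface.

import random
from functools import cmp_to_key

def duel(i, j, means):
    # Duel oracle: report which of the two arms has the greater mean.
    # Assumed noiseless here, purely for illustration.
    return i if means[i] > means[j] else j

def pull(i, means, noise=0.1):
    # Pull oracle: return one noisy reward sample from arm i.
    return means[i] + random.gauss(0.0, noise)

def estimate_mean(i, means, num_pulls=200):
    # Estimate arm i's mean by averaging a fixed budget of pulls.
    return sum(pull(i, means) for _ in range(num_pulls)) / num_pulls

def rank_then_search(means, threshold):
    n = len(means)
    # Ranking phase (duels only): order arms by pairwise comparisons, so the
    # above-threshold arms form a suffix of the ranked list.
    order = sorted(range(n), key=cmp_to_key(
        lambda i, j: 1 if duel(i, j, means) == i else -1))
    # Search phase (pulls only): binary search for the first arm in the
    # ranking whose estimated mean clears the threshold.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if estimate_mean(order[mid], means) >= threshold:
            hi = mid
        else:
            lo = mid + 1
    # Return the indices of the arms judged to be above the threshold.
    return {order[k] for k in range(lo, n)}

if __name__ == "__main__":
    random.seed(0)
    means = [0.1, 0.35, 0.5, 0.65, 0.9]
    print(rank_then_search(means, threshold=0.6))  # likely output: {3, 4}

In this idealized sketch the duels establish the ordering once, after which the binary search pulls only O(log n) arms to locate the threshold boundary; this is one way to see why cheap comparisons can reduce the number of expensive direct pulls, as the abstract suggests.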

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-xu20c,
  title     = {Thresholding Bandit Problem with Both Duels and Pulls},
  author    = {Xu, Yichong and Chen, Xi and Singh, Aarti and Dubrawski, Artur},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {2591--2600},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/xu20c/xu20c.pdf},
  url       = {https://proceedings.mlr.press/v108/xu20c.html},
  abstract  = {The Thresholding Bandit Problem (TBP) aims to find the set of arms with mean rewards greater than a given threshold. We consider a new setting of TBP, where in addition to pulling arms, one can also duel two arms and get the arm with a greater mean. In our motivating application from crowdsourcing, dueling two arms can be more cost-effective and time-efficient than direct pulls. We refer to this problem as TBP with Dueling Choices (TBP-DC). This paper provides an algorithm called Rank-Search (RS) for solving TBP-DC by alternating between ranking and binary search. We prove theoretical guarantees for RS, and also give lower bounds to show the optimality of it. Experiments show that RS outperforms previous baseline algorithms that only use pulls or duels.}
}
Endnote
%0 Conference Paper
%T Thresholding Bandit Problem with Both Duels and Pulls
%A Yichong Xu
%A Xi Chen
%A Aarti Singh
%A Artur Dubrawski
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-xu20c
%I PMLR
%P 2591--2600
%U https://proceedings.mlr.press/v108/xu20c.html
%V 108
%X The Thresholding Bandit Problem (TBP) aims to find the set of arms with mean rewards greater than a given threshold. We consider a new setting of TBP, where in addition to pulling arms, one can also duel two arms and get the arm with a greater mean. In our motivating application from crowdsourcing, dueling two arms can be more cost-effective and time-efficient than direct pulls. We refer to this problem as TBP with Dueling Choices (TBP-DC). This paper provides an algorithm called Rank-Search (RS) for solving TBP-DC by alternating between ranking and binary search. We prove theoretical guarantees for RS, and also give lower bounds to show the optimality of it. Experiments show that RS outperforms previous baseline algorithms that only use pulls or duels.
APA
Xu, Y., Chen, X., Singh, A. & Dubrawski, A. (2020). Thresholding Bandit Problem with Both Duels and Pulls. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2591-2600. Available from https://proceedings.mlr.press/v108/xu20c.html.