Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources

Rohan Deb, Aadirupa Saha, Arindam Banerjee
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4546-4554, 2024.

Abstract

We consider the problem of reward maximization in the dueling bandit setup along with constraints on resource consumption. As in classic dueling bandits, at each round the learner has to choose a pair of items from a set of $K$ items and observe relative feedback for the current pair. Additionally, for both items, the learner also observes a vector of resource consumptions. The objective of the learner is to maximize the cumulative reward while ensuring that the total consumption of any resource stays within the allocated budget. We show that due to the relative nature of the feedback, the problem is more difficult than its bandit counterpart, and that without further assumptions the problem is not learnable from a regret minimization perspective. Thereafter, by exploiting assumptions on the available budget, we provide an EXP3-based dueling algorithm that also accounts for the associated consumptions, and show that it achieves an $\tilde{\mathcal{O}}\left(\big({\frac{OPT^{(b)}}{B}}+1\big)K^{1/3}T^{2/3}\right)$ regret, where $OPT^{(b)}$ is the optimal value and $B$ is the available budget. Finally, we provide numerical simulations to demonstrate the efficacy of our proposed method.
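The interaction protocol described above (choose a pair of items, observe which one wins, observe each item's resource consumption, and stop once the budget is exhausted) can be sketched in a few lines. This is a minimal, illustrative simulation only, not the paper's algorithm: the exponential-weights update, the deterministic per-item costs, and all parameter names here are simplifying assumptions.

```python
import numpy as np

def exp3_dueling_with_budget(pref_matrix, costs, budget, horizon,
                             gamma=0.1, seed=0):
    """Toy EXP3-style dueling bandit under a single resource budget.

    pref_matrix[i, j] : probability that item i beats item j.
    costs[i]          : per-round resource cost of playing item i
                        (assumed deterministic here for simplicity).
    Play stops when the next round would exceed `budget`.
    """
    rng = np.random.default_rng(seed)
    K = pref_matrix.shape[0]
    weights = np.ones(K)
    wins = np.zeros(K)
    spent = 0.0
    for _ in range(horizon):
        # EXP3-style sampling distribution with uniform exploration.
        probs = (1 - gamma) * weights / weights.sum() + gamma / K
        i, j = rng.choice(K, size=2, p=probs, replace=True)
        round_cost = costs[i] + costs[j]
        if spent + round_cost > budget:
            break  # budget constraint: stop before overspending
        spent += round_cost
        # Relative (dueling) feedback: only the winner is observed.
        winner = i if rng.random() < pref_matrix[i, j] else j
        wins[winner] += 1
        # Importance-weighted update for the observed winner.
        weights[winner] *= np.exp(gamma / (K * probs[winner]))
        weights /= weights.max()  # keep weights numerically bounded
    return wins, spent
```

With a preference matrix in which one item dominates, the win counts concentrate on that item over time while total consumption never exceeds the budget; the cost vector makes the budget, rather than the horizon, the binding constraint when resources are scarce.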

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-deb24a,
  title     = {Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources},
  author    = {Deb, Rohan and Saha, Aadirupa and Banerjee, Arindam},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {4546--4554},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/deb24a/deb24a.pdf},
  url       = {https://proceedings.mlr.press/v238/deb24a.html},
  abstract  = {We consider the problem of reward maximization in the dueling bandit setup along with constraints on resource consumption. As in the classic dueling bandits, at each round the learner has to choose a pair of items from a set of $K$ items and observe a relative feedback for the current pair. Additionally, for both items, the learner also observes a vector of resource consumptions. The objective of the learner is to maximize the cumulative reward, while ensuring that the total consumption of any resource is within the allocated budget. We show that due to the relative nature of the feedback, the problem is more difficult than its bandit counterpart and that without further assumptions the problem is not learnable from a regret minimization perspective. Thereafter, by exploiting assumptions on the available budget, we provide an EXP3 based dueling algorithm that also considers the associated consumptions and show that it achieves an $\tilde{\mathcal{O}}\left(\big({\frac{OPT^{(b)}}{B}}+1\big)K^{1/3}T^{2/3}\right)$ regret, where $OPT^{(b)}$ is the optimal value and $B$ is the available budget. Finally, we provide numerical simulations to demonstrate the efficacy of our proposed method.}
}
Endnote
%0 Conference Paper
%T Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources
%A Rohan Deb
%A Aadirupa Saha
%A Arindam Banerjee
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-deb24a
%I PMLR
%P 4546--4554
%U https://proceedings.mlr.press/v238/deb24a.html
%V 238
%X We consider the problem of reward maximization in the dueling bandit setup along with constraints on resource consumption. As in the classic dueling bandits, at each round the learner has to choose a pair of items from a set of $K$ items and observe a relative feedback for the current pair. Additionally, for both items, the learner also observes a vector of resource consumptions. The objective of the learner is to maximize the cumulative reward, while ensuring that the total consumption of any resource is within the allocated budget. We show that due to the relative nature of the feedback, the problem is more difficult than its bandit counterpart and that without further assumptions the problem is not learnable from a regret minimization perspective. Thereafter, by exploiting assumptions on the available budget, we provide an EXP3 based dueling algorithm that also considers the associated consumptions and show that it achieves an $\tilde{\mathcal{O}}\left(\big({\frac{OPT^{(b)}}{B}}+1\big)K^{1/3}T^{2/3}\right)$ regret, where $OPT^{(b)}$ is the optimal value and $B$ is the available budget. Finally, we provide numerical simulations to demonstrate the efficacy of our proposed method.
APA
Deb, R., Saha, A. & Banerjee, A. (2024). Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:4546-4554. Available from https://proceedings.mlr.press/v238/deb24a.html.