Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization

Andi Nika, Sepehr Elahi, Cem Tekin
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1486-1496, 2020.

Abstract

We consider the contextual combinatorial volatile multi-armed bandit (CCV-MAB), in which, at each round, the learner observes a set of available base arms and their contexts and then selects a super arm that contains $K$ base arms in order to maximize its cumulative reward. We work under the semi-bandit feedback setting. We assume that the contexts lie in a space ${\cal X}$ endowed with the Euclidean norm, that the expected base arm outcomes are Lipschitz continuous in the contexts, and that the expected super arm rewards are Lipschitz continuous in the expected base arm outcomes. Under these assumptions, we propose an algorithm called Adaptive Contextual Combinatorial Upper Confidence Bound (ACC-UCB). This algorithm adaptively discretizes ${\cal X}$ to form estimates of base arm outcomes and uses an $\alpha$-approximation oracle as a subroutine to select a super arm in each round. It achieves $\tilde{O} ( T^{(\bar{D}+1)/(\bar{D}+2) + \epsilon} )$ regret for any $\epsilon>0$, where $\bar{D}$ is the approximate optimality dimension related to ${\cal X}$. This dimension captures both the benignness of the base arm arrivals and the structure of the expected reward. In addition, we provide a recipe for obtaining more optimistic regret bounds that takes into account the volatility of the base arms. We also show that ACC-UCB achieves significant performance gains over the state of the art for worker selection in mobile crowdsourcing.
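The abstract names the ingredients of ACC-UCB only at a high level:

- adaptive discretization of the context space;
- optimistic (UCB) estimates of base arm outcomes;
- an oracle that picks $K$ base arms;
- semi-bandit updates.

The following is a minimal Python sketch of that general pattern. It is not the authors' algorithm. The following are all hypothetical choices made for illustration:

- the context space $[0,1]^D$;
- a 1-Lipschitz expected outcome;
- the confidence radius $\sqrt{2 \ln T / N}$;
- the rule "split a cell into $2^D$ halves once its confidence radius falls below its Euclidean diameter";
- a super arm reward equal to the sum of base arm outcomes, so that greedy top-$K$ selection is an exact ($\alpha = 1$) oracle;
- every name used (`Cell`, `ucb_index`, `update`, `greedy_top_k`, `run`).

```python
# Hypothetical sketch of a contextual combinatorial UCB with adaptive
# discretization; the index, split rule and oracle are illustrative
# assumptions, NOT the exact ACC-UCB from the paper.
import math
import random
from itertools import product


class Cell:
    """Axis-aligned cube (lower corner `lo`, edge `side`) with its own statistics."""

    def __init__(self, lo, side):
        self.lo = lo          # lower corner of the cube (tuple of floats)
        self.side = side      # edge length of the cube
        self.count = 0        # outcomes observed from contexts in this cell
        self.mean = 0.0       # running mean of those outcomes
        self.children = None  # dict: bit-tuple -> sub-cell, once split

    def diameter(self):
        # Euclidean diameter; with an (assumed) 1-Lipschitz expected outcome it
        # bounds how much the expected outcome can vary inside the cell.
        return self.side * math.sqrt(len(self.lo))

    def split(self):
        # Halve every coordinate, creating 2^D child cells with fresh statistics.
        half = self.side / 2
        self.children = {}
        for bits in product((0, 1), repeat=len(self.lo)):
            lo = tuple(lo_i + b * half for lo_i, b in zip(self.lo, bits))
            self.children[bits] = Cell(lo, half)

    def leaf_for(self, x):
        # Descend to the active (unsplit) cell whose cube contains context x.
        node = self
        while node.children is not None:
            half = node.side / 2
            bits = tuple(int(xi >= lo_i + half) for xi, lo_i in zip(x, node.lo))
            node = node.children[bits]
        return node


def ucb_index(cell, horizon):
    # Optimistic estimate: empirical mean + confidence radius + discretization bias.
    if cell.count == 0:
        return math.inf
    radius = math.sqrt(2 * math.log(horizon) / cell.count)
    return cell.mean + radius + cell.diameter()


def update(cell, outcome, horizon):
    # Record one semi-bandit observation; refine the cell once its statistical
    # uncertainty drops below its discretization bias (assumed split rule).
    cell.count += 1
    cell.mean += (outcome - cell.mean) / cell.count
    radius = math.sqrt(2 * math.log(horizon) / cell.count)
    if cell.children is None and radius <= cell.diameter():
        cell.split()


def greedy_top_k(indices, k):
    # Exact (alpha = 1) oracle when the super-arm reward is the sum of outcomes.
    return sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)[:k]


def run(expected_outcome, dim, k, horizon, max_arms, seed=0):
    rng = random.Random(seed)
    root = Cell(tuple(0.0 for _ in range(dim)), 1.0)
    total = 0.0
    for _ in range(horizon):
        # Volatile arms: a random number of base arms arrive with random contexts.
        n_avail = rng.randint(k, max_arms)
        contexts = [tuple(rng.random() for _ in range(dim)) for _ in range(n_avail)]
        cells = [root.leaf_for(x) for x in contexts]
        chosen = greedy_top_k([ucb_index(c, horizon) for c in cells], k)
        # Semi-bandit feedback: a Bernoulli outcome for every chosen base arm.
        for i in chosen:
            outcome = 1.0 if rng.random() < expected_outcome(contexts[i]) else 0.0
            update(cells[i], outcome, horizon)
            total += outcome
    return root, total


root, total = run(lambda x: (x[0] + x[1]) / 2, dim=2, k=2, horizon=2000, max_arms=6)
print(total)
```

In this sketch, new child cells start with empty statistics, so each one is tried at least once before its estimate is trusted. Letting children inherit the parent's estimate is an equally plausible design, and the paper should be consulted for the rule ACC-UCB actually uses.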

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-nika20a,
  title     = {Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization},
  author    = {Nika, Andi and Elahi, Sepehr and Tekin, Cem},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {1486--1496},
  year      = {2020},
  editor    = {Silvia Chiappa and Roberto Calandra},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/nika20a/nika20a.pdf},
  url       = {http://proceedings.mlr.press/v108/nika20a.html},
  abstract  = {We consider contextual combinatorial volatile multi-armed bandit (CCV-MAB), in which at each round, the learner observes a set of available base arms and their contexts, and then, selects a super arm that contains $K$ base arms in order to maximize its cumulative reward. Under the semi-bandit feedback setting and assuming that the contexts lie in a space ${\cal X}$ endowed with the Euclidean norm and that the expected base arm outcomes (expected rewards) are Lipschitz continuous in the contexts (expected base arm outcomes), we propose an algorithm called Adaptive Contextual Combinatorial Upper Confidence Bound (ACC-UCB). This algorithm, which adaptively discretizes ${\cal X}$ to form estimates of base arm outcomes and uses an $\alpha$-approximation oracle as a subroutine to select a super arm in each round, achieves $\tilde{O} ( T^{(\bar{D}+1)/(\bar{D}+2) + \epsilon} )$ regret for any $\epsilon>0$, where $\bar{D}$ represents the approximate optimality dimension related to ${\cal X}$. This dimension captures both the benignness of the base arm arrivals and the structure of the expected reward. In addition, we provide a recipe for obtaining more optimistic regret bounds by taking into account the volatility of the base arms and show that ACC-UCB achieves significant performance gains compared to the state-of-the-art for worker selection in mobile crowdsourcing.}
}
Endnote
%0 Conference Paper
%T Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization
%A Andi Nika
%A Sepehr Elahi
%A Cem Tekin
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-nika20a
%I PMLR
%P 1486--1496
%U http://proceedings.mlr.press/v108/nika20a.html
%V 108
%X We consider contextual combinatorial volatile multi-armed bandit (CCV-MAB), in which at each round, the learner observes a set of available base arms and their contexts, and then, selects a super arm that contains $K$ base arms in order to maximize its cumulative reward. Under the semi-bandit feedback setting and assuming that the contexts lie in a space ${\cal X}$ endowed with the Euclidean norm and that the expected base arm outcomes (expected rewards) are Lipschitz continuous in the contexts (expected base arm outcomes), we propose an algorithm called Adaptive Contextual Combinatorial Upper Confidence Bound (ACC-UCB). This algorithm, which adaptively discretizes ${\cal X}$ to form estimates of base arm outcomes and uses an $\alpha$-approximation oracle as a subroutine to select a super arm in each round, achieves $\tilde{O} ( T^{(\bar{D}+1)/(\bar{D}+2) + \epsilon} )$ regret for any $\epsilon>0$, where $\bar{D}$ represents the approximate optimality dimension related to ${\cal X}$. This dimension captures both the benignness of the base arm arrivals and the structure of the expected reward. In addition, we provide a recipe for obtaining more optimistic regret bounds by taking into account the volatility of the base arms and show that ACC-UCB achieves significant performance gains compared to the state-of-the-art for worker selection in mobile crowdsourcing.
APA
Nika, A., Elahi, S., & Tekin, C. (2020). Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1486-1496. Available from http://proceedings.mlr.press/v108/nika20a.html
