On Interpolating Experts and Multi-Armed Bandits

Houshuang Chen, Yuchen He, Chihao Zhang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:6776-6802, 2024.

Abstract

Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,…,m_K)\in \mathbb N^K$, an instance of $\mathbf m$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf m$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf m$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf m$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(\varepsilon,0.05)$-PAC algorithm of $\mathbf m$-BAI is $\Theta\left(\frac{1}{\varepsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf m$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-chen24p, title = {On Interpolating Experts and Multi-Armed Bandits}, author = {Chen, Houshuang and He, Yuchen and Zhang, Chihao}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {6776--6802}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24p/chen24p.pdf}, url = {https://proceedings.mlr.press/v235/chen24p.html}, abstract = {Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,…,m_K)\in \mathbb N^K$, an instance of $\mathbf m$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf m$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf m$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf m$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(\varepsilon,0.05)$-PAC algorithm of $\mathbf m$-BAI is $\Theta\left(\frac{1}{\varepsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf m$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.} }
Endnote
%0 Conference Paper %T On Interpolating Experts and Multi-Armed Bandits %A Houshuang Chen %A Yuchen He %A Chihao Zhang %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-chen24p %I PMLR %P 6776--6802 %U https://proceedings.mlr.press/v235/chen24p.html %V 235 %X Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,…,m_K)\in \mathbb N^K$, an instance of $\mathbf m$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf m$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf m$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf m$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(\varepsilon,0.05)$-PAC algorithm of $\mathbf m$-BAI is $\Theta\left(\frac{1}{\varepsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf m$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.
APA
Chen, H., He, Y. & Zhang, C.. (2024). On Interpolating Experts and Multi-Armed Bandits. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:6776-6802 Available from https://proceedings.mlr.press/v235/chen24p.html.

Related Material