Multinomial Logit Bandit with Low Switching Cost

Kefan Dong, Yingkai Li, Qin Zhang, Yuan Zhou
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2607-2615, 2020.

Abstract

We study multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible when achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching the lower bound $\Omega(\frac{N \log T}{ \log \log T})$. In the fixed-horizon setting, our algorithm FH-DUCB incurs $O(N \log \log T)$ assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost $O(N \log^2 T)$.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-dong20b, title = {Multinomial Logit Bandit with Low Switching Cost}, author = {Dong, Kefan and Li, Yingkai and Zhang, Qin and Zhou, Yuan}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {2607--2615}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/dong20b/dong20b.pdf}, url = {https://proceedings.mlr.press/v119/dong20b.html}, abstract = {We study multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible when achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching the lower bound $\Omega(\frac{N \log T}{ \log \log T})$. In the fixed-horizon setting, our algorithm FH-DUCB incurs $O(N \log \log T)$ assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost $O(N \log^2 T)$.} }
Endnote
%0 Conference Paper %T Multinomial Logit Bandit with Low Switching Cost %A Kefan Dong %A Yingkai Li %A Qin Zhang %A Yuan Zhou %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-dong20b %I PMLR %P 2607--2615 %U https://proceedings.mlr.press/v119/dong20b.html %V 119 %X We study multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible when achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching the lower bound $\Omega(\frac{N \log T}{ \log \log T})$. In the fixed-horizon setting, our algorithm FH-DUCB incurs $O(N \log \log T)$ assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost $O(N \log^2 T)$.
APA
Dong, K., Li, Y., Zhang, Q. & Zhou, Y.. (2020). Multinomial Logit Bandit with Low Switching Cost. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2607-2615 Available from https://proceedings.mlr.press/v119/dong20b.html.

Related Material