Multi-armed Bandit Problem with Lock-up Periods

Junpei Komiyama, Issei Sato, Hiroshi Nakagawa
Proceedings of the 5th Asian Conference on Machine Learning, PMLR 29:100-115, 2013.

Abstract

We investigate a stochastic multi-armed bandit problem in which the forecaster's choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem, their use under such restricted conditions is not obvious. We extend the application range of these algorithms by proposing a natural conversion of algorithms for the stochastic bandit problem (index-based algorithms and greedy algorithms) into algorithms for the multi-armed bandit problem with lock-up periods. We prove that the regret of the converted algorithms is O(log T + L_max), where T is the total number of rounds and L_max is the maximum size of the lock-up periods. This regret is preferable, except when the maximum size of the lock-up periods is large. For that case, we propose a meta-algorithm that achieves a smaller regret by using the empirically best arm during large periods. We empirically compare and discuss these algorithms.
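The conversion described in the abstract can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the paper's exact algorithm: a UCB1-style index policy adapted so that an arm is chosen once at the start of each lock-up period and then played for the entire period, with reward statistics updated on every pull. The function name, Bernoulli reward model, and parameters are assumptions for illustration.

```python
import math
import random

def ucb1_with_lockups(arm_means, period_lengths, seed=0):
    """Sketch of an index-based bandit algorithm under lock-up periods.

    At the start of each period, pick the arm with the highest UCB1 index
    and play it for the whole period (Bernoulli rewards). Illustrative
    only; not the authors' exact converted algorithm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    t = 0                 # total pulls so far
    total_reward = 0.0

    for length in period_lengths:
        # Select once per period: untried arms first, else the max UCB index.
        untried = [i for i in range(k) if counts[i] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        # Locked in: the same arm is played for every round of the period.
        for _ in range(length):
            r = 1.0 if rng.random() < arm_means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += r
            total_reward += r
            t += 1
    return total_reward, counts

reward, counts = ucb1_with_lockups([0.9, 0.5], [5] * 40)
```

The key difference from the unrestricted setting is that exploration happens at period granularity: a poor arm, once selected, costs up to L_max rounds of regret, which is the intuition behind the additive L_max term in the bound.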

Cite this Paper


BibTeX
@InProceedings{pmlr-v29-Komiyama13,
  title     = {Multi-armed Bandit Problem with Lock-up Periods},
  author    = {Komiyama, Junpei and Sato, Issei and Nakagawa, Hiroshi},
  booktitle = {Proceedings of the 5th Asian Conference on Machine Learning},
  pages     = {100--115},
  year      = {2013},
  editor    = {Ong, Cheng Soon and Ho, Tu Bao},
  volume    = {29},
  series    = {Proceedings of Machine Learning Research},
  address   = {Australian National University, Canberra, Australia},
  month     = {13--15 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v29/Komiyama13.pdf},
  url       = {https://proceedings.mlr.press/v29/Komiyama13.html},
  abstract  = {We investigate a stochastic multi-armed bandit problem in which the forecaster's choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem, their use under such restricted conditions is not obvious. We extend the application range of these algorithms by proposing a natural conversion of algorithms for the stochastic bandit problem (index-based algorithms and greedy algorithms) into algorithms for the multi-armed bandit problem with lock-up periods. We prove that the regret of the converted algorithms is O(\log T + L_max), where T is the total number of rounds and L_max is the maximum size of the lock-up periods. This regret is preferable, except when the maximum size of the lock-up periods is large. For that case, we propose a meta-algorithm that achieves a smaller regret by using the empirically best arm during large periods. We empirically compare and discuss these algorithms.}
}
Endnote
%0 Conference Paper
%T Multi-armed Bandit Problem with Lock-up Periods
%A Junpei Komiyama
%A Issei Sato
%A Hiroshi Nakagawa
%B Proceedings of the 5th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Cheng Soon Ong
%E Tu Bao Ho
%F pmlr-v29-Komiyama13
%I PMLR
%P 100--115
%U https://proceedings.mlr.press/v29/Komiyama13.html
%V 29
%X We investigate a stochastic multi-armed bandit problem in which the forecaster's choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem, their use under such restricted conditions is not obvious. We extend the application range of these algorithms by proposing a natural conversion of algorithms for the stochastic bandit problem (index-based algorithms and greedy algorithms) into algorithms for the multi-armed bandit problem with lock-up periods. We prove that the regret of the converted algorithms is O(log T + L_max), where T is the total number of rounds and L_max is the maximum size of the lock-up periods. This regret is preferable, except when the maximum size of the lock-up periods is large. For that case, we propose a meta-algorithm that achieves a smaller regret by using the empirically best arm during large periods. We empirically compare and discuss these algorithms.
RIS
TY - CPAPER
TI - Multi-armed Bandit Problem with Lock-up Periods
AU - Junpei Komiyama
AU - Issei Sato
AU - Hiroshi Nakagawa
BT - Proceedings of the 5th Asian Conference on Machine Learning
DA - 2013/10/21
ED - Cheng Soon Ong
ED - Tu Bao Ho
ID - pmlr-v29-Komiyama13
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 29
SP - 100
EP - 115
L1 - http://proceedings.mlr.press/v29/Komiyama13.pdf
UR - https://proceedings.mlr.press/v29/Komiyama13.html
AB - We investigate a stochastic multi-armed bandit problem in which the forecaster's choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem, their use under such restricted conditions is not obvious. We extend the application range of these algorithms by proposing a natural conversion of algorithms for the stochastic bandit problem (index-based algorithms and greedy algorithms) into algorithms for the multi-armed bandit problem with lock-up periods. We prove that the regret of the converted algorithms is O(log T + L_max), where T is the total number of rounds and L_max is the maximum size of the lock-up periods. This regret is preferable, except when the maximum size of the lock-up periods is large. For that case, we propose a meta-algorithm that achieves a smaller regret by using the empirically best arm during large periods. We empirically compare and discuss these algorithms.
ER -
APA
Komiyama, J., Sato, I. & Nakagawa, H. (2013). Multi-armed Bandit Problem with Lock-up Periods. Proceedings of the 5th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 29:100-115. Available from https://proceedings.mlr.press/v29/Komiyama13.html.
