Multi-armed Bandit Problem with Lock-up Periods

Junpei Komiyama; Issei Sato; Hiroshi Nakagawa

Multi-armed Bandit Problem with Lock-up Periods

Junpei Komiyama, Issei Sato, Hiroshi Nakagawa

Proceedings of the 5th Asian Conference on Machine Learning, PMLR 29:100-115, 2013.

Abstract

We investigate a stochastic multi-armed bandit problem in which the forecaster’s choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem, their use under restricted conditions is not obvious. We extend the application ranges of these algorithms by proposing their natural conversion from ones for the stochastic bandit problem (index-based algorithms and greedy algorithms) to ones for the multi-armed bandit problem with lock-up periods. We prove that the regret of the converted algorithms is O(\logT + L_max ), where T is the total number of rounds and L_max is the maximum size of the lock-up periods. The regret is preferable, except for the case when the maximum size of the lock-up periods is large. For these cases, we propose a meta-algorithm that results in a smaller regret by using a empirical best arm for large periods. We empirically compare and discuss these algorithms.

Cite this Paper

BibTeX

@InProceedings{pmlr-v29-Komiyama13,
  title = 	 {Multi-armed Bandit Problem with Lock-up Periods},
  author = 	 {Komiyama, Junpei and Sato, Issei and Nakagawa, Hiroshi},
  booktitle = 	 {Proceedings of the 5th Asian Conference on Machine Learning},
  pages = 	 {100--115},
  year = 	 {2013},
  editor = 	 {Ong, Cheng Soon and Ho, Tu Bao},
  volume = 	 {29},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Australian National University, Canberra, Australia},
  month = 	 {13--15 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v29/Komiyama13.pdf},
  url = 	 {https://proceedings.mlr.press/v29/Komiyama13.html},
  abstract = 	 {We investigate a stochastic multi-armed bandit problem in which the forecaster’s choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem,  their use under restricted conditions is not obvious. We extend the application ranges of these algorithms by proposing their natural conversion from ones for the stochastic bandit problem (index-based algorithms and greedy algorithms) to ones for the multi-armed bandit problem with lock-up periods. We prove that the regret of the converted algorithms is O(\logT + L_max ), where T is the total number of rounds and L_max is the maximum size of the lock-up periods. The regret is preferable, except for the case when the maximum size of the lock-up periods is large. For these cases, we propose a meta-algorithm that results in a smaller regret by using a empirical best arm for large periods. We empirically compare and discuss these algorithms.}
}

Endnote

%0 Conference Paper
%T Multi-armed Bandit Problem with Lock-up Periods
%A Junpei Komiyama
%A Issei Sato
%A Hiroshi Nakagawa
%B Proceedings of the 5th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Cheng Soon Ong
%E Tu Bao Ho	
%F pmlr-v29-Komiyama13
%I PMLR
%P 100--115
%U https://proceedings.mlr.press/v29/Komiyama13.html
%V 29
%X We investigate a stochastic multi-armed bandit problem in which the forecaster’s choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem,  their use under restricted conditions is not obvious. We extend the application ranges of these algorithms by proposing their natural conversion from ones for the stochastic bandit problem (index-based algorithms and greedy algorithms) to ones for the multi-armed bandit problem with lock-up periods. We prove that the regret of the converted algorithms is O(\logT + L_max ), where T is the total number of rounds and L_max is the maximum size of the lock-up periods. The regret is preferable, except for the case when the maximum size of the lock-up periods is large. For these cases, we propose a meta-algorithm that results in a smaller regret by using a empirical best arm for large periods. We empirically compare and discuss these algorithms.

RIS

TY  - CPAPER
TI  - Multi-armed Bandit Problem with Lock-up Periods
AU  - Junpei Komiyama
AU  - Issei Sato
AU  - Hiroshi Nakagawa
BT  - Proceedings of the 5th Asian Conference on Machine Learning
DA  - 2013/10/21
ED  - Cheng Soon Ong
ED  - Tu Bao Ho	
ID  - pmlr-v29-Komiyama13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 29
SP  - 100
EP  - 115
L1  - http://proceedings.mlr.press/v29/Komiyama13.pdf
UR  - https://proceedings.mlr.press/v29/Komiyama13.html
AB  - We investigate a stochastic multi-armed bandit problem in which the forecaster’s choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem,  their use under restricted conditions is not obvious. We extend the application ranges of these algorithms by proposing their natural conversion from ones for the stochastic bandit problem (index-based algorithms and greedy algorithms) to ones for the multi-armed bandit problem with lock-up periods. We prove that the regret of the converted algorithms is O(\logT + L_max ), where T is the total number of rounds and L_max is the maximum size of the lock-up periods. The regret is preferable, except for the case when the maximum size of the lock-up periods is large. For these cases, we propose a meta-algorithm that results in a smaller regret by using a empirical best arm for large periods. We empirically compare and discuss these algorithms.
ER  -

APA

Komiyama, J., Sato, I. & Nakagawa, H.. (2013). Multi-armed Bandit Problem with Lock-up Periods. Proceedings of the 5th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 29:100-115 Available from https://proceedings.mlr.press/v29/Komiyama13.html.

Related Material

Download PDF