An Information-Theoretic Analysis of Nonstationary Bandit Learning

Seungki Min; Daniel Russo

An Information-Theoretic Analysis of Nonstationary Bandit Learning

Seungki Min, Daniel Russo

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:24831-24849, 2023.

Abstract

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem’s information structure through its information-ratio.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-min23c,
  title = 	 {An Information-Theoretic Analysis of Nonstationary Bandit Learning},
  author =       {Min, Seungki and Russo, Daniel},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {24831--24849},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/min23c/min23c.pdf},
  url = 	 {https://proceedings.mlr.press/v202/min23c.html},
  abstract = 	 {In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem’s information structure through its information-ratio.}
}

Endnote

%0 Conference Paper
%T An Information-Theoretic Analysis of Nonstationary Bandit Learning
%A Seungki Min
%A Daniel Russo
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-min23c
%I PMLR
%P 24831--24849
%U https://proceedings.mlr.press/v202/min23c.html
%V 202
%X In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem’s information structure through its information-ratio.

APA


Min, S. & Russo, D.. (2023). An Information-Theoretic Analysis of Nonstationary Bandit Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:24831-24849 Available from https://proceedings.mlr.press/v202/min23c.html.

An Information-Theoretic Analysis of Nonstationary Bandit Learning

Abstract

Cite this Paper

Related Material