An Information-Theoretic Analysis of Nonstationary Bandit Learning

Seungki Min, Daniel Russo
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:24831-24849, 2023.

Abstract

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem’s information structure through its information-ratio.
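As a rough sketch of the quantities named in the abstract (the notation $A_t^*$, $A_t$, $\bar{H}$, and $\Gamma$ is introduced here for illustration; the paper's exact statement, constants, and regularity conditions may differ), the entropy rate of the optimal action process is

$$ \bar{H} \;=\; \lim_{T \to \infty} \frac{1}{T}\, H\!\left(A_1^*, A_2^*, \dots, A_T^*\right), $$

and an information-ratio bound in the spirit of the stationary analysis of Russo and Van Roy controls long-run per-period regret schematically by

$$ \limsup_{T \to \infty}\; \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\!\left[\, R_t(A_t^*) - R_t(A_t) \,\right] \;\le\; \sqrt{\Gamma\, \bar{H}}, $$

where $A_t$ is the action selected in period $t$, $R_t(\cdot)$ denotes the period-$t$ expected reward under the latent environment state, and $\Gamma$ bounds the algorithm's per-period information ratio.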

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-min23c,
  title     = {An Information-Theoretic Analysis of Nonstationary Bandit Learning},
  author    = {Min, Seungki and Russo, Daniel},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {24831--24849},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/min23c/min23c.pdf},
  url       = {https://proceedings.mlr.press/v202/min23c.html},
  abstract  = {In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem’s information structure through its information-ratio.}
}
Endnote
%0 Conference Paper
%T An Information-Theoretic Analysis of Nonstationary Bandit Learning
%A Seungki Min
%A Daniel Russo
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-min23c
%I PMLR
%P 24831--24849
%U https://proceedings.mlr.press/v202/min23c.html
%V 202
%X In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem’s information structure through its information-ratio.
APA
Min, S. & Russo, D. (2023). An Information-Theoretic Analysis of Nonstationary Bandit Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:24831-24849. Available from https://proceedings.mlr.press/v202/min23c.html.
