Dynamic Balancing for Model Selection in Bandits and RL

Ashok Cutkosky, Christoph Dann, Abhimanyu Das, Claudio Gentile, Aldo Pacchiano, Manish Purohit
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2276-2285, 2021.

Abstract

We propose a framework for model selection by combining base algorithms in stochastic bandits and reinforcement learning. We require a candidate regret bound for each base algorithm that may or may not hold. We select which base algorithm to play in each round using a “balancing condition” on the candidate regret bounds. Our approach simultaneously recovers previous worst-case regret bounds, while also obtaining much smaller regret in natural scenarios where some base learners perform significantly better than their candidate bounds suggest. Our framework applies in many settings, including linear bandits and MDPs with nested function classes, linear bandits with unknown misspecification, and tuning confidence parameters of algorithms such as LinUCB. Moreover, unlike recent efforts in model selection for linear stochastic bandits, our approach extends to adversarial rather than stochastic contexts.
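
To make the “balancing condition” concrete, the following Python sketch shows one way such a model-selection wrapper could choose among base learners. It is an illustration under stated assumptions, not the paper's exact algorithm: the class name DynamicBalancer, the square-root-shaped candidate bounds, the Hoeffding-style confidence widths, and the bound-doubling response to a failed test are all choices made here for exposition.

import math

class DynamicBalancer:
    """Minimal sketch of a balancing-based model-selection wrapper.

    Hypothetical illustration only: the sqrt-shaped candidate bounds
    and the doubling-based misspecification test below are assumptions
    made for exposition, not the paper's exact rule or constants.
    """

    def __init__(self, base_learners, bound_scales, delta=0.05):
        self.learners = base_learners          # base algorithms to choose among
        self.scales = list(bound_scales)       # c_i in candidate bound c_i * sqrt(n)
        self.plays = [0] * len(base_learners)  # times each learner was played
        self.reward = [0.0] * len(base_learners)
        self.delta = delta                     # confidence parameter

    def candidate_bound(self, i):
        # Candidate regret bound R_i(n) = c_i * sqrt(n); may or may not hold.
        return self.scales[i] * math.sqrt(max(self.plays[i], 1))

    def select(self):
        # Balancing condition: play the learner whose candidate regret
        # bound (its "spent" regret budget so far) is currently smallest.
        return min(range(len(self.learners)), key=self.candidate_bound)

    def step(self, i, r):
        # Record the reward of the learner just played, then test whether
        # its realized performance is consistent with its candidate bound.
        self.plays[i] += 1
        self.reward[i] += r

        def conf(j):
            # Hoeffding-style confidence width (an assumption of this sketch).
            return math.sqrt(2 * math.log(1 / self.delta) / max(self.plays[j], 1))

        best_lcb = max(self.reward[j] / max(self.plays[j], 1) - conf(j)
                       for j in range(len(self.learners)))
        mean_i = self.reward[i] / self.plays[i]
        # If learner i's average reward plus its claimed per-round regret
        # allowance falls below the best learner's lower confidence bound,
        # its candidate bound was too optimistic: inflate the bound
        # (dynamic balancing) rather than eliminating the learner.
        if mean_i + self.candidate_bound(i) / self.plays[i] + conf(i) < best_lcb:
            self.scales[i] *= 2

The design point the sketch tries to convey is that a misspecified candidate bound is detected by comparison against the best learner's lower confidence bound, and is then inflated rather than trusted or discarded outright.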

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-cutkosky21a,
  title     = {Dynamic Balancing for Model Selection in Bandits and RL},
  author    = {Cutkosky, Ashok and Dann, Christoph and Das, Abhimanyu and Gentile, Claudio and Pacchiano, Aldo and Purohit, Manish},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {2276--2285},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/cutkosky21a/cutkosky21a.pdf},
  url       = {https://proceedings.mlr.press/v139/cutkosky21a.html}
}
Endnote
%0 Conference Paper
%T Dynamic Balancing for Model Selection in Bandits and RL
%A Ashok Cutkosky
%A Christoph Dann
%A Abhimanyu Das
%A Claudio Gentile
%A Aldo Pacchiano
%A Manish Purohit
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-cutkosky21a
%I PMLR
%P 2276--2285
%U https://proceedings.mlr.press/v139/cutkosky21a.html
%V 139
APA
Cutkosky, A., Dann, C., Das, A., Gentile, C., Pacchiano, A., & Purohit, M. (2021). Dynamic Balancing for Model Selection in Bandits and RL. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research, 139:2276-2285. Available from https://proceedings.mlr.press/v139/cutkosky21a.html.