Feature and Parameter Selection in Stochastic Linear Bandits

Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:15927-15958, 2022.

Abstract

We study two model selection settings in stochastic linear bandits (LB). In the first setting, which we refer to as feature selection, the expected reward of the LB problem is in the linear span of at least one of $M$ feature maps (models). In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$. However, the agent only has access to misspecified models, i.e., estimates of the centers and radii of the balls. We refer to this setting as parameter selection. For each setting, we develop and analyze a computationally efficient algorithm that is based on a reduction from bandits to full-information problems. This allows us to obtain regret bounds that are not worse (up to a $\sqrt{\log M}$ factor) than the case where the true model is known. This is the best reported dependence on the number of models $M$ in these settings. Finally, we empirically show the effectiveness of our algorithms using synthetic and real-world experiments.
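
To make the two settings concrete, the following is a minimal formalization in standard stochastic-linear-bandit notation; the symbols $\phi_m$, $\theta_*$, $\eta_t$, $c_j$, $r_j$, and the regret definition below do not appear on this page and are introduced here only for illustration. In the feature-selection setting, the reward at round $t$ is realizable under at least one of the $M$ feature maps,
$$ y_t = \langle \phi_{m_*}(x_t), \theta_* \rangle + \eta_t, \qquad m_* \in \{1,\dots,M\} \ \text{unknown}, $$
while in the parameter-selection setting the reward is linear in the action itself,
$$ y_t = \langle x_t, \theta_* \rangle + \eta_t, \qquad \theta_* \in \mathcal{B}(c_j, r_j) \ \text{for some } j \in \{1,\dots,M\}, $$
and the agent only observes estimates $(\hat c_j, \hat r_j)$ of the centers and radii. In both cases the goal is to keep the cumulative regret
$$ R(T) = \sum_{t=1}^{T} \Big( \max_{x \in \mathcal{X}} \mathbb{E}[y \mid x] - \mathbb{E}[y_t \mid x_t] \Big) $$
within a $\sqrt{\log M}$ factor of the regret attainable when the true model is known in advance.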

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-moradipari22a,
  title =     {Feature and Parameter Selection in Stochastic Linear Bandits},
  author =    {Moradipari, Ahmadreza and Turan, Berkay and Abbasi-Yadkori, Yasin and Alizadeh, Mahnoosh and Ghavamzadeh, Mohammad},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages =     {15927--15958},
  year =      {2022},
  editor =    {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume =    {162},
  series =    {Proceedings of Machine Learning Research},
  month =     {17--23 Jul},
  publisher = {PMLR},
  pdf =       {https://proceedings.mlr.press/v162/moradipari22a/moradipari22a.pdf},
  url =       {https://proceedings.mlr.press/v162/moradipari22a.html}
}
APA
Moradipari, A., Turan, B., Abbasi-Yadkori, Y., Alizadeh, M. & Ghavamzadeh, M. (2022). Feature and Parameter Selection in Stochastic Linear Bandits. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:15927-15958. Available from https://proceedings.mlr.press/v162/moradipari22a.html.
