Bilinear Bandits with Low-rank Structure

Kwang-Sung Jun, Rebecca Willett, Stephen Wright, Robert Nowak
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3163-3172, 2019.

Abstract

We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. The unknown in the problem is a $d_1$ by $d_2$ matrix $\mathbf{\Theta}^*$ that defines the reward, and has low rank $r \ll \min\{d_1,d_2\}$. Determination of $\mathbf{\Theta}^*$ with this low-rank structure poses a significant challenge in finding the right exploration-exploitation tradeoff. In this work, we propose a new two-stage algorithm called “Explore-Subspace-Then-Refine” (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called “almost-low-dimensional OFUL” (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is $\widetilde{\mathcal{O}}((d_1+d_2)^{3/2} \sqrt{r T})$ where $\widetilde{\mathcal{O}}$ hides logarithmic factors and $T$ is the time horizon, which improves upon the regret of $\widetilde{\mathcal{O}}(d_1d_2\sqrt{T})$ attained for a naïve linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors, and our preliminary experiment shows that ESTR outperforms a naïve linear bandit reduction.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-jun19a, title = {Bilinear Bandits with Low-rank Structure}, author = {Jun, Kwang-Sung and Willett, Rebecca and Wright, Stephen and Nowak, Robert}, booktitle = {Proceedings of the 36th International Conference on Machine Learning}, pages = {3163--3172}, year = {2019}, editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan}, volume = {97}, series = {Proceedings of Machine Learning Research}, month = {09--15 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v97/jun19a/jun19a.pdf}, url = {https://proceedings.mlr.press/v97/jun19a.html}, abstract = {We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. The unknown in the problem is a $d_1$ by $d_2$ matrix $\mathbf{\Theta}^*$ that defines the reward, and has low rank $r \ll \min\{d_1,d_2\}$. Determination of $\mathbf{\Theta}^*$ with this low-rank structure poses a significant challenge in finding the right exploration-exploitation tradeoff. In this work, we propose a new two-stage algorithm called “Explore-Subspace-Then-Refine” (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called “almost-low-dimensional OFUL” (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is $\widetilde{\mathcal{O}}((d_1+d_2)^{3/2} \sqrt{r T})$ where $\widetilde{\mathcal{O}}$ hides logarithmic factors and $T$ is the time horizon, which improves upon the regret of $\widetilde{\mathcal{O}}(d_1d_2\sqrt{T})$ attained for a naïve linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors, and our preliminary experiment shows that ESTR outperforms a naïve linear bandit reduction.} }
Endnote
%0 Conference Paper %T Bilinear Bandits with Low-rank Structure %A Kwang-Sung Jun %A Rebecca Willett %A Stephen Wright %A Robert Nowak %B Proceedings of the 36th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Ruslan Salakhutdinov %F pmlr-v97-jun19a %I PMLR %P 3163--3172 %U https://proceedings.mlr.press/v97/jun19a.html %V 97 %X We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. The unknown in the problem is a $d_1$ by $d_2$ matrix $\mathbf{\Theta}^*$ that defines the reward, and has low rank $r \ll \min\{d_1,d_2\}$. Determination of $\mathbf{\Theta}^*$ with this low-rank structure poses a significant challenge in finding the right exploration-exploitation tradeoff. In this work, we propose a new two-stage algorithm called “Explore-Subspace-Then-Refine” (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called “almost-low-dimensional OFUL” (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is $\widetilde{\mathcal{O}}((d_1+d_2)^{3/2} \sqrt{r T})$ where $\widetilde{\mathcal{O}}$ hides logarithmic factors and $T$ is the time horizon, which improves upon the regret of $\widetilde{\mathcal{O}}(d_1d_2\sqrt{T})$ attained for a naïve linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors, and our preliminary experiment shows that ESTR outperforms a naïve linear bandit reduction.
APA
Jun, K., Willett, R., Wright, S. & Nowak, R.. (2019). Bilinear Bandits with Low-rank Structure. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3163-3172 Available from https://proceedings.mlr.press/v97/jun19a.html.

Related Material