Multi-task Representation Learning with Stochastic Linear Bandits

Leonardo Cella, Karim Lounici, Grégoire Pacreau, Massimiliano Pontil
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:4822-4847, 2023.

Abstract

We study the problem of transfer learning in the setting of stochastic linear contextual bandit tasks. We assume that a low-dimensional linear representation is shared across the tasks, and study the benefit of learning the tasks jointly. Following recent results on the design of Lasso stochastic bandit policies, we propose an efficient greedy policy based on trace norm regularization. It implicitly learns a low-dimensional representation by encouraging the matrix formed by the task regression vectors to be of low rank. Unlike previous work in the literature, our policy does not need to know the rank of the underlying matrix, nor does it require the covariance of the arms distribution to be invertible. We derive an upper bound on the multi-task regret of our policy, which is, up to logarithmic factors, of order $T\sqrt{rN}+\sqrt{rNTd}$, where $T$ is the number of tasks, $r$ the rank, $d$ the number of variables and $N$ the number of rounds per task. We show the benefit of our strategy over an independent-task learning baseline, which has a worse regret of order $T\sqrt{dN}$. We also argue that our policy is minimax optimal and, when $T\geq d$, has a multi-task regret comparable to the regret of an oracle policy that knows the true underlying representation.
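
To make the approach concrete, the following is a minimal sketch of a greedy multi-task linear bandit policy built on a trace-norm-regularized least-squares estimator. It is not the authors' exact algorithm: the solver (proximal gradient descent with singular-value thresholding), the synthetic Gaussian arm distribution, the problem dimensions, the periodic refit schedule, and the regularization weight lam are all illustrative assumptions.

# Illustrative sketch (not the paper's exact algorithm): greedy multi-task
# linear bandits with a trace-norm-regularized least-squares estimate of the
# d x T matrix whose columns are the task regression vectors.
import numpy as np

rng = np.random.default_rng(0)
d, T, r, N, K = 20, 30, 3, 200, 15   # dims, tasks, rank, rounds, arms/round
lam = 0.1                            # trace-norm weight (assumed, not tuned)

# Ground truth: B* = U V^T has rank r; column t is task t's regression vector.
B_star = rng.normal(size=(d, r)) @ rng.normal(size=(r, T))
B_star /= np.linalg.norm(B_star, axis=0)          # unit-norm task vectors

def svt(M, tau):
    """Singular-value thresholding: the prox of tau * ||.||_* at M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def fit_trace_norm(Xs, ys, lam, n_iter=200):
    """Proximal gradient for  sum_t ||X_t b_t - y_t||^2 / (2 n_t) + lam ||B||_*."""
    B = np.zeros((d, T))
    # Step size: inverse of a Lipschitz bound on the gradient of the smooth part.
    lr = 1.0 / max(np.linalg.norm(X, 2) ** 2 / len(y) for X, y in zip(Xs, ys))
    for _ in range(n_iter):
        G = np.column_stack([X.T @ (X @ B[:, t] - y) / len(y)
                             for t, (X, y) in enumerate(zip(Xs, ys))])
        B = svt(B - lr * G, lr * lam)                # gradient step, then prox
    return B

Xs = [np.empty((0, d)) for _ in range(T)]            # per-task design matrices
ys = [np.empty(0) for _ in range(T)]                 # per-task rewards
B_hat = np.zeros((d, T))
regret = 0.0
for n in range(N):
    for t in range(T):
        arms = rng.normal(size=(K, d)) / np.sqrt(d)  # fresh stochastic arm set
        a = arms[np.argmax(arms @ B_hat[:, t])]      # greedy w.r.t. estimate
        reward = a @ B_star[:, t] + 0.1 * rng.normal()
        regret += np.max(arms @ B_star[:, t]) - a @ B_star[:, t]
        Xs[t] = np.vstack([Xs[t], a])
        ys[t] = np.append(ys[t], reward)
    if n % 10 == 9:                                  # refit periodically
        B_hat = fit_trace_norm(Xs, ys, lam)
print(f"cumulative multi-task regret after {N} rounds: {regret:.1f}")

Singular-value thresholding is the proximal operator of the trace norm, so each iterate shrinks the spectrum of the estimate and biases it toward low rank. This is how the shared low-dimensional representation is learned implicitly, without knowing the rank $r$ or imposing an explicit rank constraint.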

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-cella23a,
  title     = {Multi-task Representation Learning with Stochastic Linear Bandits},
  author    = {Cella, Leonardo and Lounici, Karim and Pacreau, Gr\'egoire and Pontil, Massimiliano},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {4822--4847},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/cella23a/cella23a.pdf},
  url       = {https://proceedings.mlr.press/v206/cella23a.html},
  abstract  = {We study the problem of transfer learning in the setting of stochastic linear contextual bandit tasks. We assume that a low-dimensional linear representation is shared across the tasks, and study the benefit of learning the tasks jointly. Following recent results on the design of Lasso stochastic bandit policies, we propose an efficient greedy policy based on trace norm regularization. It implicitly learns a low-dimensional representation by encouraging the matrix formed by the task regression vectors to be of low rank. Unlike previous work in the literature, our policy does not need to know the rank of the underlying matrix, nor does it require the covariance of the arms distribution to be invertible. We derive an upper bound on the multi-task regret of our policy, which is, up to logarithmic factors, of order $T\sqrt{rN}+\sqrt{rNTd}$, where $T$ is the number of tasks, $r$ the rank, $d$ the number of variables and $N$ the number of rounds per task. We show the benefit of our strategy over an independent-task learning baseline, which has a worse regret of order $T\sqrt{dN}$. We also argue that our policy is minimax optimal and, when $T\geq d$, has a multi-task regret comparable to the regret of an oracle policy that knows the true underlying representation.}
}
Endnote
%0 Conference Paper
%T Multi-task Representation Learning with Stochastic Linear Bandits
%A Leonardo Cella
%A Karim Lounici
%A Grégoire Pacreau
%A Massimiliano Pontil
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-cella23a
%I PMLR
%P 4822--4847
%U https://proceedings.mlr.press/v206/cella23a.html
%V 206
%X We study the problem of transfer learning in the setting of stochastic linear contextual bandit tasks. We assume that a low-dimensional linear representation is shared across the tasks, and study the benefit of learning the tasks jointly. Following recent results on the design of Lasso stochastic bandit policies, we propose an efficient greedy policy based on trace norm regularization. It implicitly learns a low-dimensional representation by encouraging the matrix formed by the task regression vectors to be of low rank. Unlike previous work in the literature, our policy does not need to know the rank of the underlying matrix, nor does it require the covariance of the arms distribution to be invertible. We derive an upper bound on the multi-task regret of our policy, which is, up to logarithmic factors, of order $T\sqrt{rN}+\sqrt{rNTd}$, where $T$ is the number of tasks, $r$ the rank, $d$ the number of variables and $N$ the number of rounds per task. We show the benefit of our strategy over an independent-task learning baseline, which has a worse regret of order $T\sqrt{dN}$. We also argue that our policy is minimax optimal and, when $T\geq d$, has a multi-task regret comparable to the regret of an oracle policy that knows the true underlying representation.
APA
Cella, L., Lounici, K., Pacreau, G. & Pontil, M. (2023). Multi-task Representation Learning with Stochastic Linear Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:4822-4847. Available from https://proceedings.mlr.press/v206/cella23a.html.