Asymptotically Optimal Problem-Dependent Bandit Policies for Transfer Learning

Adrien Prevost, Timothée Mathieu, Odalric-Ambrym Maillard
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:319-334, 2025.

Abstract

We study the non-contextual multi-armed bandit problem in a transfer learning setting: before any pulls, the learner is given $N'_k$ i.i.d. samples from each source distribution $\nu'_k$, and the true target distributions $\nu_k$ lie within a known distance bound $d_k(\nu_k,\nu'_k)\le L_k$. In this framework, we first derive a problem-dependent asymptotic lower bound on cumulative regret that extends the classical Lai–Robbins result to incorporate the transfer parameters $(d_k,L_k,N'_k)$. We then propose KL-UCB-Transfer, a simple index policy that matches this new bound in the Gaussian case. Finally, we validate our approach via simulations, showing that KL-UCB-Transfer significantly outperforms the no-prior baseline when source and target distributions are sufficiently close.
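For context, the classical Lai–Robbins bound that the abstract refers to can be recalled as follows. This is the standard no-transfer statement for one-parameter families, valid for any uniformly efficient policy; it is not the paper's new bound, and the symbols $R_T$ (cumulative regret after $T$ pulls), $\mu_k$, $\mu^\star$, $\nu^\star$ (means and optimal-arm distribution), $\Delta_k$ and $\sigma^2$ are notation assumed here rather than taken from the paper:

$$\liminf_{T\to\infty}\frac{\mathbb{E}[R_T]}{\log T}\;\ge\;\sum_{k:\,\Delta_k>0}\frac{\Delta_k}{\mathrm{KL}(\nu_k,\nu^\star)},\qquad \Delta_k=\mu^\star-\mu_k.$$

In the Gaussian case with a common known variance $\sigma^2$, $\mathrm{KL}(\nu_k,\nu^\star)=\Delta_k^2/(2\sigma^2)$, so the right-hand side becomes $\sum_{k:\,\Delta_k>0} 2\sigma^2/\Delta_k$.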

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-prevost25a,
  title = {Asymptotically Optimal Problem-Dependent Bandit Policies for Transfer Learning},
  author = {Prevost, Adrien and Mathieu, Timoth\'{e}e and Maillard, Odalric-Ambrym},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages = {319--334},
  year = {2025},
  editor = {Lee, Hung-yi and Liu, Tongliang},
  volume = {304},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/prevost25a/prevost25a.pdf},
  url = {https://proceedings.mlr.press/v304/prevost25a.html},
  abstract = {We study the non-contextual multi-armed bandit problem in a transfer learning setting: before any pulls, the learner is given $N'_k$ i.i.d.\ samples from each source distribution $\nu'_k$, and the true target distributions $\nu_k$ lie within a known distance bound $d_k(\nu_k,\nu'_k)\le L_k$. In this framework, we first derive a problem-dependent asymptotic lower bound on cumulative regret that extends the classical Lai--Robbins result to incorporate the transfer parameters $(d_k,L_k,N'_k)$. We then propose \textsc{KL-UCB-Transfer}, a simple index policy that matches this new bound in the Gaussian case. Finally, we validate our approach via simulations, showing that \textsc{KL-UCB-Transfer} significantly outperforms the no-prior baseline when source and target distributions are sufficiently close.}
}
Endnote
%0 Conference Paper
%T Asymptotically Optimal Problem-Dependent Bandit Policies for Transfer Learning
%A Adrien Prevost
%A Timothée Mathieu
%A Odalric-Ambrym Maillard
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-prevost25a
%I PMLR
%P 319--334
%U https://proceedings.mlr.press/v304/prevost25a.html
%V 304
%X We study the non-contextual multi-armed bandit problem in a transfer learning setting: before any pulls, the learner is given $N'_k$ i.i.d. samples from each source distribution $\nu'_k$, and the true target distributions $\nu_k$ lie within a known distance bound $d_k(\nu_k,\nu'_k)\le L_k$. In this framework, we first derive a problem-dependent asymptotic lower bound on cumulative regret that extends the classical Lai–Robbins result to incorporate the transfer parameters $(d_k,L_k,N'_k)$. We then propose KL-UCB-Transfer, a simple index policy that matches this new bound in the Gaussian case. Finally, we validate our approach via simulations, showing that KL-UCB-Transfer significantly outperforms the no-prior baseline when source and target distributions are sufficiently close.
APA
Prevost, A., Mathieu, T. & Maillard, O. (2025). Asymptotically Optimal Problem-Dependent Bandit Policies for Transfer Learning. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:319-334. Available from https://proceedings.mlr.press/v304/prevost25a.html.
