Transfer of Samples in Policy Search via Multiple Importance Sampling

Andrea Tirinzoni, Mattia Salvini, Marcello Restelli
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6264-6274, 2019.

Abstract

We consider the transfer of experience samples in reinforcement learning. Most of the previous works in this context focused on value-based settings, where transferring instances conveniently reduces to the transfer of (s,a,s',r) tuples. In this paper, we consider the more complex case of reusing samples in policy search methods, in which the agent is required to transfer entire trajectories between environments with different transition models. By leveraging ideas from multiple importance sampling, we propose robust gradient estimators that effectively achieve this goal, along with several techniques to reduce their variance. In the case where the transition models are known, we theoretically establish the robustness to negative transfer of our estimators. In the case of unknown models, we propose a method to efficiently estimate them when the target task belongs to a finite set of possible tasks and when it belongs to some reproducing kernel Hilbert space. We provide empirical results to show the effectiveness of our estimators.
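The core tool behind the estimators described in the abstract, multiple importance sampling with the balance heuristic, weights each sample by the ratio of the target density to the mixture of all sampling densities, which keeps the weights bounded when any one proposal covers the target. The sketch below is a toy illustration of that weighting scheme on a one-dimensional Gaussian example; the function names and the specific distributions are illustrative assumptions, not code or setup from the paper:

```python
import numpy as np

def balance_heuristic_estimate(target_pdf, proposal_pdfs, samples_per_proposal, f):
    """Estimate E_p[f(X)] from samples drawn under several proposals.

    A sample x drawn from proposal j receives the balance-heuristic weight
        w(x) = p(x) / sum_k (n_k / N) * q_k(x),
    i.e. the target density over the effective mixture of all proposals.
    The estimator stays unbiased, and the mixture denominator bounds the
    weights whenever at least one proposal covers the target well.
    """
    counts = np.array([len(s) for s in samples_per_proposal], dtype=float)
    N = counts.sum()
    total = 0.0
    for samples in samples_per_proposal:
        x = np.asarray(samples)
        # Mixture density evaluated at these samples (shared denominator).
        mixture = sum((n / N) * q(x) for n, q in zip(counts, proposal_pdfs))
        total += np.sum(target_pdf(x) / mixture * f(x))
    return total / N

# Toy example: estimate E[X^2] under N(0,1) using trajectories ("samples")
# collected under two shifted source distributions, N(-1,1) and N(1,1).
rng = np.random.default_rng(0)

def gauss_pdf(mu):
    return lambda x: np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

proposals = [gauss_pdf(-1.0), gauss_pdf(1.0)]
samples = [rng.normal(-1.0, 1.0, 5000), rng.normal(1.0, 1.0, 5000)]
est = balance_heuristic_estimate(gauss_pdf(0.0), proposals, samples, lambda x: x ** 2)
# E[X^2] under N(0,1) equals 1, so est should land close to 1.
```

In the paper's setting the densities are trajectory distributions induced by policies and transition models rather than scalar Gaussians, but the weighting structure is the same.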

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-tirinzoni19a,
  title     = {Transfer of Samples in Policy Search via Multiple Importance Sampling},
  author    = {Tirinzoni, Andrea and Salvini, Mattia and Restelli, Marcello},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {6264--6274},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/tirinzoni19a/tirinzoni19a.pdf},
  url       = {https://proceedings.mlr.press/v97/tirinzoni19a.html},
  abstract  = {We consider the transfer of experience samples in reinforcement learning. Most of the previous works in this context focused on value-based settings, where transferring instances conveniently reduces to the transfer of (s,a,s',r) tuples. In this paper, we consider the more complex case of reusing samples in policy search methods, in which the agent is required to transfer entire trajectories between environments with different transition models. By leveraging ideas from multiple importance sampling, we propose robust gradient estimators that effectively achieve this goal, along with several techniques to reduce their variance. In the case where the transition models are known, we theoretically establish the robustness to negative transfer of our estimators. In the case of unknown models, we propose a method to efficiently estimate them when the target task belongs to a finite set of possible tasks and when it belongs to some reproducing kernel Hilbert space. We provide empirical results to show the effectiveness of our estimators.}
}
Endnote
%0 Conference Paper
%T Transfer of Samples in Policy Search via Multiple Importance Sampling
%A Andrea Tirinzoni
%A Mattia Salvini
%A Marcello Restelli
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-tirinzoni19a
%I PMLR
%P 6264--6274
%U https://proceedings.mlr.press/v97/tirinzoni19a.html
%V 97
%X We consider the transfer of experience samples in reinforcement learning. Most of the previous works in this context focused on value-based settings, where transferring instances conveniently reduces to the transfer of (s,a,s',r) tuples. In this paper, we consider the more complex case of reusing samples in policy search methods, in which the agent is required to transfer entire trajectories between environments with different transition models. By leveraging ideas from multiple importance sampling, we propose robust gradient estimators that effectively achieve this goal, along with several techniques to reduce their variance. In the case where the transition models are known, we theoretically establish the robustness to negative transfer of our estimators. In the case of unknown models, we propose a method to efficiently estimate them when the target task belongs to a finite set of possible tasks and when it belongs to some reproducing kernel Hilbert space. We provide empirical results to show the effectiveness of our estimators.
APA
Tirinzoni, A., Salvini, M. & Restelli, M. (2019). Transfer of Samples in Policy Search via Multiple Importance Sampling. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:6264-6274. Available from https://proceedings.mlr.press/v97/tirinzoni19a.html.