Directed Exploration in Reinforcement Learning with Transferred Knowledge

Timothy A. Mann, Yoonsuck Choe
Proceedings of the Tenth European Workshop on Reinforcement Learning, PMLR 24:59-76, 2013.

Abstract

Experimental results suggest that transfer learning (TL), compared to learning from scratch, can decrease exploration by reinforcement learning (RL) algorithms. Most existing TL algorithms for RL are heuristic and may result in worse performance than learning from scratch (i.e., negative transfer). We introduce a theoretically grounded and flexible approach that transfers action-values via an intertask mapping and, based on those, explores the target task systematically. We characterize positive transfer as (1) decreasing sample complexity in the target task compared to the sample complexity of the base RL algorithm (without transferred action-values) and (2) guaranteeing that the algorithm converges to a near-optimal policy (i.e., negligible optimality loss). The sample complexity of our approach is no worse than the base algorithm's, and our analysis reveals that positive transfer can occur even with highly inaccurate and partial intertask mappings. Finally, we empirically test directed exploration with transfer in a multijoint reaching task, which highlights the value of our analysis and the robustness of our approach under imperfect conditions.
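To make the transfer-then-explore idea in the abstract concrete, below is a minimal Python sketch (not the paper's actual algorithm) of initializing target-task action-values from a source task through a possibly partial intertask mapping, with unmapped state-action pairs defaulting to an optimistic bound so they are still explored. All names here (transfer_action_values, V_MAX, intertask_map) are hypothetical illustrations.

    # A minimal sketch (not the paper's algorithm): build initial action-values
    # for a target task from a source task via a possibly partial intertask
    # mapping. All names are hypothetical illustrations.

    V_MAX = 1.0  # assumed optimistic upper bound on action-values


    def transfer_action_values(source_q, intertask_map, target_states, target_actions):
        """Return a dict of initial target-task action-values.

        source_q      : dict {(source_state, source_action): value} learned in the source task
        intertask_map : dict {(target_state, target_action): (source_state, source_action)};
                        may cover only part of the target task (partial mapping)
        Unmapped pairs default to V_MAX, so they remain attractive to an
        optimism-driven learner and are explored systematically.
        """
        q = {}
        for s in target_states:
            for a in target_actions:
                src = intertask_map.get((s, a))
                if src is not None and src in source_q:
                    q[(s, a)] = source_q[src]   # transferred estimate
                else:
                    q[(s, a)] = V_MAX           # optimistic default
        return q


    # Example usage with toy states/actions:
    if __name__ == "__main__":
        source_q = {("s0", "left"): 0.2, ("s0", "right"): 0.9}
        mapping = {("t0", "left"): ("s0", "left")}   # partial: only one pair mapped
        print(transfer_action_values(source_q, mapping, ["t0"], ["left", "right"]))
        # {('t0', 'left'): 0.2, ('t0', 'right'): 1.0}

In this sketch, unmapped or missing entries simply fall back to the optimistic default, which loosely mirrors the abstract's point that a partial or inaccurate intertask mapping need not prevent positive transfer.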

Cite this Paper


BibTeX
@InProceedings{pmlr-v24-mann12a,
  title     = {Directed Exploration in Reinforcement Learning with Transferred Knowledge},
  author    = {Mann, Timothy A. and Choe, Yoonsuck},
  booktitle = {Proceedings of the Tenth European Workshop on Reinforcement Learning},
  pages     = {59--76},
  year      = {2013},
  editor    = {Deisenroth, Marc Peter and Szepesvári, Csaba and Peters, Jan},
  volume    = {24},
  series    = {Proceedings of Machine Learning Research},
  address   = {Edinburgh, Scotland},
  month     = {30 Jun--01 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v24/mann12a/mann12a.pdf},
  url       = {https://proceedings.mlr.press/v24/mann12a.html},
  abstract  = {Experimental results suggest that transfer learning (TL), compared to learning from scratch, can decrease exploration by reinforcement learning (RL) algorithms. Most existing TL algorithms for RL are heuristic and may result in worse performance than learning from scratch (i.e., negative transfer). We introduce a theoretically grounded and flexible approach that transfers action-values via an intertask mapping and, based on those, explores the target task systematically. We characterize positive transfer as (1) decreasing sample complexity in the target task compared to the sample complexity of the base RL algorithm (without transferred action-values) and (2) guaranteeing that the algorithm converges to a near-optimal policy (i.e., negligible optimality loss). The sample complexity of our approach is no worse than the base algorithm's, and our analysis reveals that positive transfer can occur even with highly inaccurate and partial intertask mappings. Finally, we empirically test directed exploration with transfer in a multijoint reaching task, which highlights the value of our analysis and the robustness of our approach under imperfect conditions.}
}
APA
Mann, T.A. & Choe, Y. (2013). Directed Exploration in Reinforcement Learning with Transferred Knowledge. Proceedings of the Tenth European Workshop on Reinforcement Learning, in Proceedings of Machine Learning Research 24:59-76. Available from https://proceedings.mlr.press/v24/mann12a.html.