Directed Exploration in Reinforcement Learning with Transferred Knowledge
Proceedings of the Tenth European Workshop on Reinforcement Learning, PMLR 24:59-76, 2013.
Experimental results suggest that transfer learning (TL), compared to learning from scratch, can reduce the amount of exploration required by reinforcement learning (RL) algorithms. Most existing TL algorithms for RL are heuristic and may result in worse performance than learning from scratch (i.e., negative transfer). We introduce a theoretically grounded and flexible approach that transfers action-values via an intertask mapping and, based on them, explores the target task systematically. We characterize positive transfer as (1) decreasing sample complexity in the target task compared to the sample complexity of the base RL algorithm (without transferred action-values) and (2) guaranteeing that the algorithm converges to a near-optimal policy (i.e., negligible optimality loss). The sample complexity of our approach is no worse than the base algorithm's, and our analysis reveals that positive transfer can occur even with highly inaccurate and partial intertask mappings. Finally, we empirically test directed exploration with transfer in a multi-joint reaching task, which highlights the value of our analysis and the robustness of our approach under imperfect conditions.
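The core idea of transferring action-values through a possibly partial intertask mapping can be sketched as follows. This is a hypothetical illustration, not the paper's exact algorithm: the function `transfer_q_values`, the mapping format, and the optimistic default `v_max` (which lets directed exploration still cover unmapped state-action pairs) are all assumptions made for the sake of the example.

```python
import numpy as np

def transfer_q_values(source_q, mapping, n_target_states, n_target_actions, v_max):
    """Initialize target-task Q-values from a learned source task.

    source_q: (S_src, A_src) array of learned source Q-values.
    mapping:  dict from (target_state, target_action) to
              (source_state, source_action); may be partial or inaccurate.
    v_max:    optimistic default for unmapped pairs, so that a directed
              (optimism-driven) explorer still visits them.
    """
    # Start fully optimistic; transferred values overwrite mapped entries.
    q = np.full((n_target_states, n_target_actions), float(v_max))
    for (ts, ta), (ss, sa) in mapping.items():
        q[ts, ta] = source_q[ss, sa]
    return q

# Toy usage: 2-state source, 3-state target, only two pairs mapped.
source_q = np.array([[0.5, 0.9],
                     [0.1, 0.3]])
mapping = {(0, 0): (0, 0), (1, 1): (1, 1)}
q_init = transfer_q_values(source_q, mapping,
                           n_target_states=3, n_target_actions=2, v_max=1.0)
print(q_init)
```

Unmapped entries keep the optimistic value `v_max`, so the target-task learner is driven to try them; mapped entries start from the (possibly wrong) transferred estimates, which a sound base algorithm can later correct.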