Hierarchical Relative Entropy Policy Search

Christian Daniel, Gerhard Neumann, Jan Peters
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:273-281, 2012.

Abstract

Many real-world problems are inherently hierarchically structured. The use of this structure in an agent’s policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy - the ‘mixed option’ policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend the Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also showing an increased performance in terms of learning speed and quality of the found policy in comparison to the non-hierarchical approach.
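The two-stage structure of the mixed option policy described in the abstract can be illustrated with a minimal Python sketch. It only shows how an action is sampled (gating first, then the chosen option's policy); it is not the Hierarchical REPS update itself. The softmax gating, the linear-Gaussian option policies, and all names and parameters (`gating_weights`, `option_params`, `sample_action`) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gating_probs(state, gating_weights):
    """Probability of each option given the state.

    Assumption: a linear-softmax gating with one weight vector per option.
    """
    scores = [sum(w * s for w, s in zip(weights, state))
              for weights in gating_weights]
    return softmax(scores)


def sample_action(state, gating_weights, option_params, rng):
    """Sample an option from the gating, then an action from that option.

    Assumption: each option is a linear-Gaussian policy (gain, bias, std)
    producing a scalar action.
    """
    probs = gating_probs(state, gating_weights)
    option = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    gain, bias, std = option_params[option]
    mean = sum(g * s for g, s in zip(gain, state)) + bias
    action = rng.gauss(mean, std)
    return option, action


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.5, -1.0]
    gating_weights = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical values
    option_params = [([1.0, 0.0], 0.0, 0.1),           # hypothetical option 0
                     ([0.0, -1.0], 0.5, 0.1)]          # hypothetical option 1
    print(sample_action(state, gating_weights, option_params, rng))
```

In this sketch the executed option plays the role of the latent variable: an observer sees only the state and the action, while the option index that generated the action stays hidden.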

Cite this Paper


BibTeX
@InProceedings{pmlr-v22-daniel12,
  title = {Hierarchical Relative Entropy Policy Search},
  author = {Daniel, Christian and Neumann, Gerhard and Peters, Jan},
  booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages = {273--281},
  year = {2012},
  editor = {Lawrence, Neil D. and Girolami, Mark},
  volume = {22},
  series = {Proceedings of Machine Learning Research},
  address = {La Palma, Canary Islands},
  month = {21--23 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v22/daniel12/daniel12.pdf},
  url = {https://proceedings.mlr.press/v22/daniel12.html},
  abstract = {Many real-world problems are inherently hierarchically structured. The use of this structure in an agent’s policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy - the ‘mixed option’ policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend the Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also showing an increased performance in terms of learning speed and quality of the found policy in comparison to the non-hierarchical approach.}
}
Endnote
%0 Conference Paper
%T Hierarchical Relative Entropy Policy Search
%A Christian Daniel
%A Gerhard Neumann
%A Jan Peters
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-daniel12
%I PMLR
%P 273--281
%U https://proceedings.mlr.press/v22/daniel12.html
%V 22
%X Many real-world problems are inherently hierarchically structured. The use of this structure in an agent’s policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy - the ‘mixed option’ policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend the Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also showing an increased performance in terms of learning speed and quality of the found policy in comparison to the non-hierarchical approach.
RIS
TY  - CPAPER
TI  - Hierarchical Relative Entropy Policy Search
AU  - Christian Daniel
AU  - Gerhard Neumann
AU  - Jan Peters
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA  - 2012/04/21
ED  - Neil D. Lawrence
ED  - Mark Girolami
ID  - pmlr-v22-daniel12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 22
SP  - 273
EP  - 281
L1  - http://proceedings.mlr.press/v22/daniel12/daniel12.pdf
UR  - https://proceedings.mlr.press/v22/daniel12.html
AB  - Many real-world problems are inherently hierarchically structured. The use of this structure in an agent’s policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy - the ‘mixed option’ policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend the Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also showing an increased performance in terms of learning speed and quality of the found policy in comparison to the non-hierarchical approach.
ER  -
APA
Daniel, C., Neumann, G. & Peters, J. (2012). Hierarchical Relative Entropy Policy Search. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:273-281. Available from https://proceedings.mlr.press/v22/daniel12.html.
