Gradient-free Policy Architecture Search and Adaptation

Sayna Ebrahimi, Anna Rohrbach, Trevor Darrell
Proceedings of the 1st Annual Conference on Robot Learning, PMLR 78:505-514, 2017.

Abstract

We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent’s lifetime as it learns to drive in a realistic simulated environment.

Cite this Paper

BibTeX
@InProceedings{pmlr-v78-ebrahimi17a,
  title     = {Gradient-free Policy Architecture Search and Adaptation},
  author    = {Ebrahimi, Sayna and Rohrbach, Anna and Darrell, Trevor},
  booktitle = {Proceedings of the 1st Annual Conference on Robot Learning},
  pages     = {505--514},
  year      = {2017},
  editor    = {Levine, Sergey and Vanhoucke, Vincent and Goldberg, Ken},
  volume    = {78},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v78/ebrahimi17a/ebrahimi17a.pdf},
  url       = {https://proceedings.mlr.press/v78/ebrahimi17a.html},
  abstract  = {We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent’s lifetime as it learns to drive in a realistic simulated environment.}
}
Endnote
%0 Conference Paper
%T Gradient-free Policy Architecture Search and Adaptation
%A Sayna Ebrahimi
%A Anna Rohrbach
%A Trevor Darrell
%B Proceedings of the 1st Annual Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Sergey Levine
%E Vincent Vanhoucke
%E Ken Goldberg
%F pmlr-v78-ebrahimi17a
%I PMLR
%P 505--514
%U https://proceedings.mlr.press/v78/ebrahimi17a.html
%V 78
%X We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent’s lifetime as it learns to drive in a realistic simulated environment.
APA
Ebrahimi, S., Rohrbach, A. & Darrell, T. (2017). Gradient-free Policy Architecture Search and Adaptation. Proceedings of the 1st Annual Conference on Robot Learning, in Proceedings of Machine Learning Research 78:505-514. Available from https://proceedings.mlr.press/v78/ebrahimi17a.html.