Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation

Chenyang Zhao; Timothy Hospedales

Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1237-1252, 2021.

Abstract

In reinforcement learning, domain randomisation is a popular technique for learning general policies that are robust to new environments and domain-shifts at deployment. However, naively aggregating information from randomised domains may lead to high variances in gradient estimation and sub-optimal policies. To address this issue, we present a peer-to-peer online distillation strategy for reinforcement learning termed P2PDRL, where multiple learning agents are each assigned to a different environment, and then exchange knowledge through mutual regularisation based on Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation performance to new environments at testing.
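The mutual regularisation described above can be illustrated with a minimal sketch. It assumes each agent's policy outputs a diagonal Gaussian over actions for the same state, and that each agent adds the average KL divergence from its own distribution to its peers' distributions to its usual reinforcement learning loss. The KL direction, the averaging over peers, the weight `alpha`, and the names `gaussian_kl`, `peer_regulariser`, and `regularised_loss` are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# A policy output for one state: (means, standard deviations) per action dimension.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def gaussian_kl(mu_p: Sequence[float], std_p: Sequence[float],
                mu_q: Sequence[float], std_q: Sequence[float]) -> float:
    """KL(p || q) between two diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def peer_regulariser(own: Gaussian, peers: List[Gaussian], alpha: float = 1.0) -> float:
    """Average KL from this agent's distribution to each peer's, scaled by alpha.

    The direction KL(own || peer) and the uniform average are assumptions.
    """
    mu, std = own
    kls = [gaussian_kl(mu, std, peer_mu, peer_std) for peer_mu, peer_std in peers]
    return alpha * sum(kls) / len(kls)


def regularised_loss(rl_loss: float, own: Gaussian, peers: List[Gaussian],
                     alpha: float = 1.0) -> float:
    """Hypothetical per-agent objective: the agent's own RL loss plus the peer term."""
    return rl_loss + peer_regulariser(own, peers, alpha)


if __name__ == "__main__":
    # Two agents trained in different randomised domains, evaluated on the same state.
    agent_a: Gaussian = ([0.0, 1.0], [1.0, 1.0])
    agent_b: Gaussian = ([0.5, 1.0], [1.0, 1.0])
    print(regularised_loss(rl_loss=0.3, own=agent_a, peers=[agent_b], alpha=0.5))
```

In this sketch the penalty is zero when peers agree on a state and grows as their action distributions diverge, which is one way agents assigned to different environments could exchange knowledge without pooling their gradients.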

Cite this Paper

BibTeX


@InProceedings{pmlr-v157-zhao21b,
  title     = {Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation},
  author    = {Zhao, Chenyang and Hospedales, Timothy},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {1237--1252},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/zhao21b/zhao21b.pdf},
  url       = {https://proceedings.mlr.press/v157/zhao21b.html},
  abstract = 	 {In reinforcement learning, domain randomisation is a popular technique for learning general policies that are robust to new environments and domain-shifts at deployment. However, naively aggregating information from randomised domains may lead to high variances in gradient estimation and sub-optimal policies. To address this issue, we present a peer-to-peer online distillation strategy for reinforcement learning termed P2PDRL, where multiple learning agents are each assigned to a different environment, and then exchange knowledge through mutual regularisation based on Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation performance to new environments at testing.}
}

Endnote

%0 Conference Paper
%T Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation
%A Chenyang Zhao
%A Timothy Hospedales
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-zhao21b
%I PMLR
%P 1237--1252
%U https://proceedings.mlr.press/v157/zhao21b.html
%V 157
%X In reinforcement learning, domain randomisation is a popular technique for learning general policies that are robust to new environments and domain-shifts at deployment. However, naively aggregating information from randomised domains may lead to high variances in gradient estimation and sub-optimal policies. To address this issue, we present a peer-to-peer online distillation strategy for reinforcement learning termed P2PDRL, where multiple learning agents are each assigned to a different environment, and then exchange knowledge through mutual regularisation based on Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation performance to new environments at testing.

APA


Zhao, C. & Hospedales, T. (2021). Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:1237-1252. Available from https://proceedings.mlr.press/v157/zhao21b.html.
