Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation

Chenyang Zhao, Timothy Hospedales
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1237-1252, 2021.

Abstract

In reinforcement learning, domain randomisation is a popular technique for learning general policies that are robust to new environments and domain-shifts at deployment. However, naively aggregating information from randomised domains may lead to high variance in gradient estimates and sub-optimal policies. To address this issue, we present a peer-to-peer online distillation strategy for reinforcement learning termed P2PDRL, where multiple learning agents are each assigned to a different environment and exchange knowledge through mutual regularisation based on Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation to new environments at test time.
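The abstract describes the mechanism only at a high level. As a rough illustration, the sketch below shows one way the mutual KL regularisation could look in PyTorch. The Gaussian policy parameterisation, the helper name peer_kl_penalty, the weight alpha, and the direction of the KL term are all assumptions made for illustration, not details taken from the paper.

    import torch
    from torch.distributions import Normal, kl_divergence

    def peer_kl_penalty(mus, stds, k):
        # mus, stds: one [batch, action_dim] tensor per worker, all
        # evaluated on worker k's own batch of states.
        # Returns the KL from worker k's Gaussian policy to each peer's
        # policy, averaged over peers (direction is an assumption).
        pi_k = Normal(mus[k], stds[k])
        kls = [
            kl_divergence(pi_k, Normal(mus[j], stds[j])).sum(-1).mean()
            for j in range(len(mus)) if j != k
        ]
        return torch.stack(kls).mean()

    # Hypothetical update for worker k (rl_loss_k is the worker's usual
    # actor-critic loss on its own randomised domain):
    #   loss_k = rl_loss_k + alpha * peer_kl_penalty(mus, stds, k)

Under this reading, each worker minimises its ordinary RL loss on its own randomised domain plus a penalty pulling it toward its peers' policies, so the workers distil knowledge into one another online rather than through a separate teacher policy.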

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-zhao21b,
  title     = {Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation},
  author    = {Zhao, Chenyang and Hospedales, Timothy},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {1237--1252},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/zhao21b/zhao21b.pdf},
  url       = {https://proceedings.mlr.press/v157/zhao21b.html},
  abstract  = {In reinforcement learning, domain randomisation is a popular technique for learning general policies that are robust to new environments and domain-shifts at deployment. However, naively aggregating information from randomised domains may lead to high variances in gradient estimation and sub-optimal policies. To address this issue, we present a peer-to-peer online distillation strategy for reinforcement learning termed P2PDRL, where multiple learning agents are each assigned to a different environment, and then exchange knowledge through mutual regularisation based on Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation performance to new environments at testing.}
}
Endnote
%0 Conference Paper
%T Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation
%A Chenyang Zhao
%A Timothy Hospedales
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-zhao21b
%I PMLR
%P 1237--1252
%U https://proceedings.mlr.press/v157/zhao21b.html
%V 157
%X In reinforcement learning, domain randomisation is a popular technique for learning general policies that are robust to new environments and domain-shifts at deployment. However, naively aggregating information from randomised domains may lead to high variances in gradient estimation and sub-optimal policies. To address this issue, we present a peer-to-peer online distillation strategy for reinforcement learning termed P2PDRL, where multiple learning agents are each assigned to a different environment, and then exchange knowledge through mutual regularisation based on Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation performance to new environments at testing.
APA
Zhao, C. & Hospedales, T. (2021). Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:1237-1252. Available from https://proceedings.mlr.press/v157/zhao21b.html.
