Graph Policy Gradients for Large Scale Robot Control

Arbaaz Khan, Ekaterina Tolstaya, Alejandro Ribeiro, Vijay Kumar
Proceedings of the Conference on Robot Learning, PMLR 100:823-834, 2020.

Abstract

In this paper, the problem of learning policies to control a large number of homogeneous robots is considered. To this end, we propose a new algorithm we call Graph Policy Gradients (GPG) that exploits the underlying graph symmetry among the robots. The curse of dimensionality one encounters when working with a large number of robots is mitigated by employing a graph convolutional network (GCN) to parametrize policies for the robots. The GCN reduces the dimensionality of the problem by learning filters that aggregate information among robots locally, similar to how a convolutional neural network is able to learn local features in an image. Through experiments on formation flying, we show that our proposed method is able to scale better than existing reinforcement learning methods that employ fully connected networks. More importantly, we show that by using our locally learned filters we are able to zero-shot transfer policies trained on just three robots to over a hundred robots. A video demonstrating our results can be found here.
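
The idea of filters that aggregate information locally can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain polynomial graph filter, and the names `ring_adjacency` and `graph_filter`, the ring topology, the number of taps, and the feature sizes are all hypothetical choices made for illustration. It shows only that weights shared across robots do not depend on how many robots there are.

```python
import numpy as np


def ring_adjacency(n):
    """Adjacency matrix of a ring graph (hypothetical topology): each robot
    is linked to its two nearest neighbours."""
    S = np.zeros((n, n))
    idx = np.arange(n)
    S[idx, (idx + 1) % n] = 1.0
    S[idx, (idx - 1) % n] = 1.0
    return S


def graph_filter(S, X, H):
    """Polynomial graph filter: sum over k of S^k @ X @ H[k].

    S: (n, n) graph matrix, X: (n, f_in) per-robot features,
    H: list of K weight matrices of shape (f_in, f_out), shared by all robots.
    """
    out = np.zeros((X.shape[0], H[0].shape[1]))
    Z = X
    for Hk in H:
        out += Z @ Hk  # mix features gathered from k hops away
        Z = S @ Z      # aggregate one more hop from direct neighbours
    return out


rng = np.random.default_rng(0)
# K = 3 taps mapping 4 input features to 2 outputs (illustrative sizes).
H = [rng.normal(size=(4, 2)) for _ in range(3)]

small = graph_filter(ring_adjacency(3), rng.normal(size=(3, 4)), H)
large = graph_filter(ring_adjacency(100), rng.normal(size=(100, 4)), H)
print(small.shape, large.shape)  # (3, 2) (100, 2)
```

The same list `H` is applied to a 3-robot and a 100-robot graph because the weights act on features rather than on robot indices. The sketch does not reproduce the training procedure or the transfer results reported in the paper.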

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-khan20a,
  title     = {Graph Policy Gradients for Large Scale Robot Control},
  author    = {Khan, Arbaaz and Tolstaya, Ekaterina and Ribeiro, Alejandro and Kumar, Vijay},
  pages     = {823--834},
  year      = {2020},
  editor    = {Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  address   = {},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/khan20a/khan20a.pdf},
  url       = {http://proceedings.mlr.press/v100/khan20a.html},
  abstract  = {In this paper, the problem of learning policies to control a large number of homogeneous robots is considered. To this end, we propose a new algorithm we call Graph Policy Gradients (GPG) that exploits the underlying graph symmetry among the robots. The curse of dimensionality one encounters when working with a large number of robots is mitigated by employing a graph convolutional neural (GCN) network to parametrize policies for the robots. The GCN reduces the dimensionality of the problem by learning filters that aggregate information among robots locally, similar to how a convolutional neural network is able to learn local features in an image. Through experiments on formation flying, we show that our proposed method is able to scale better than existing reinforcement methods that employ fully connected networks. More importantly, we show that by using our locally learned filters we are able to zero-shot transfer policies trained on just three robots to over hundred robots. A video demonstrating our results can be found here.}
}
Endnote
%0 Conference Paper
%T Graph Policy Gradients for Large Scale Robot Control
%A Arbaaz Khan
%A Ekaterina Tolstaya
%A Alejandro Ribeiro
%A Vijay Kumar
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-khan20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 823--834
%U http://proceedings.mlr.press
%V 100
%W PMLR
%X In this paper, the problem of learning policies to control a large number of homogeneous robots is considered. To this end, we propose a new algorithm we call Graph Policy Gradients (GPG) that exploits the underlying graph symmetry among the robots. The curse of dimensionality one encounters when working with a large number of robots is mitigated by employing a graph convolutional neural (GCN) network to parametrize policies for the robots. The GCN reduces the dimensionality of the problem by learning filters that aggregate information among robots locally, similar to how a convolutional neural network is able to learn local features in an image. Through experiments on formation flying, we show that our proposed method is able to scale better than existing reinforcement methods that employ fully connected networks. More importantly, we show that by using our locally learned filters we are able to zero-shot transfer policies trained on just three robots to over hundred robots. A video demonstrating our results can be found here.
APA
Khan, A., Tolstaya, E., Ribeiro, A. & Kumar, V. (2020). Graph Policy Gradients for Large Scale Robot Control. Proceedings of the Conference on Robot Learning, in PMLR 100:823-834.
