Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal

Jake Bruce, Niko Sünderhauf, Piotr Mirowski, Raia Hadsell, Michael Milford
Proceedings of The 2nd Conference on Robot Learning, PMLR 87:346-361, 2018.

Abstract

Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment consisting of more than 2 km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders whose embeddings can be precomputed, achieving a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer and allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation that enable the learned policy to generalise beyond these precomputed embeddings, and we demonstrate successful deployment of the learned policy on the real robot without fine-tuning, despite environmental appearance differences at test time. The dataset and code required to reproduce these results and to apply the technique to other datasets and robots are made publicly available at rl-navigation.github.io/deployable.
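As a concrete illustration of the precomputation step, the sketch below embeds every frame of the recorded traversal once with a frozen pretrained encoder, so no convolutional forward pass is needed during policy training. The choice of an ImageNet-pretrained ResNet-50 and the function name precompute_embeddings are illustrative assumptions; the paper compares several pretrained encoders, and the released code at rl-navigation.github.io/deployable is the authoritative implementation.

```python
import numpy as np
import torch
import torchvision

# Frozen, ImageNet-pretrained encoder. Assumption: ResNet-50 stands in for
# the pretrained visual encoders the paper compares.
encoder = torchvision.models.resnet50(weights="IMAGENET1K_V1")
encoder.fc = torch.nn.Identity()  # drop the classifier, keep 2048-d features
encoder.eval()

@torch.no_grad()
def precompute_embeddings(frames: torch.Tensor, batch_size: int = 64) -> np.ndarray:
    """frames: (N, 3, H, W) tensor of traversal images, already normalised.
    Returns an (N, 2048) array that can be cached to disk once and reused
    for every subsequent training run."""
    chunks = [encoder(frames[i:i + batch_size])
              for i in range(0, len(frames), batch_size)]
    return torch.cat(chunks).cpu().numpy()
```

Because the encoder never appears in the training loop, each training-time observation costs only an array lookup rather than a forward pass, which is what makes tens of thousands of transitions per second feasible on a commodity desktop.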
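To keep the policy from overfitting to the exact cached vectors, augmentation is applied to the embeddings at training time. The sketch below uses additive Gaussian noise as one cheap, plausible form of such augmentation; the paper proposes multiple forms, so this particular choice (and the helper name sample_augmented_batch) is an assumption for illustration, not the authors' exact scheme.

```python
import numpy as np

def sample_augmented_batch(embeddings: np.ndarray, batch_size: int = 256,
                           sigma: float = 0.1,
                           rng: np.random.Generator = np.random.default_rng()):
    """Sample (obs, next_obs) transition pairs from the precomputed embedding
    cache along the traversal, perturbing each vector with Gaussian noise so
    the policy sees a slightly different input on every visit."""
    idx = rng.integers(0, len(embeddings) - 1, size=batch_size)
    dim = embeddings.shape[1]
    obs = embeddings[idx] + sigma * rng.standard_normal((batch_size, dim))
    next_obs = embeddings[idx + 1] + sigma * rng.standard_normal((batch_size, dim))
    return obs.astype(np.float32), next_obs.astype(np.float32)
```

Since each sample is an index lookup plus a noise draw, throughput is bounded by memory bandwidth rather than by the visual encoder, and the stochasticity encourages the policy to generalise beyond the fixed set of precomputed vectors.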

Cite this Paper


BibTeX
@InProceedings{pmlr-v87-bruce18a,
  title     = {Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal},
  author    = {Bruce, Jake and S{\"u}nderhauf, Niko and Mirowski, Piotr and Hadsell, Raia and Milford, Michael},
  booktitle = {Proceedings of The 2nd Conference on Robot Learning},
  pages     = {346--361},
  year      = {2018},
  editor    = {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume    = {87},
  series    = {Proceedings of Machine Learning Research},
  month     = {29--31 Oct},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v87/bruce18a/bruce18a.pdf},
  url       = {https://proceedings.mlr.press/v87/bruce18a.html}
}
APA
Bruce, J., Sünderhauf, N., Mirowski, P., Hadsell, R. & Milford, M. (2018). Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:346-361. Available from https://proceedings.mlr.press/v87/bruce18a.html.
