Learning Object-conditioned Exploration using Distributed Soft Actor Critic

Ayzaan Wahid, Austin Stone, Kevin Chen, Brian Ichter, Alexander Toshev
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:1684-1695, 2021.

Abstract

Object navigation is the task of navigating to an object of a given label in a complex, unexplored environment. In its general form, this problem poses two challenges for robotics: semantic exploration of an unknown environment in search of the object, and low-level control. In this work we study object-guided exploration together with low-level control, and present an end-to-end trained navigation policy that achieves a success rate of 0.68 and an SPL of 0.58 on unseen, visually complex scans of real homes. We propose a highly scalable implementation of an off-policy reinforcement learning algorithm, distributed Soft Actor Critic, which lets the system consume 98M experience steps in 24 hours on 8 GPUs. Our system learns to control a differential-drive mobile base in simulation from a stack of high-dimensional observations of the kind commonly available on robotic platforms. The learned policy exhibits object-guided exploratory behavior and low-level control learned purely from experience in realistic environments.
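For reference, SPL is Success weighted by Path Length (Anderson et al., 2018), the standard efficiency metric for navigation:

\[
\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N} S_i \,\frac{\ell_i}{\max(p_i,\ \ell_i)},
\]

where \(S_i \in \{0, 1\}\) marks success on episode \(i\), \(\ell_i\) is the shortest-path distance from start to goal, and \(p_i\) is the length of the path the agent actually took. Soft Actor Critic (Haarnoja et al., 2018) maximizes the entropy-regularized return

\[
J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[\, r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\right],
\]

which is what makes it off-policy and hence amenable to the distributed actor/learner setup the abstract describes. The page ships no code; the sketch below is a minimal, hypothetical illustration of that pattern, in which asynchronous actors stream transitions into a shared replay buffer while a learner draws off-policy batches from it. All names are invented for illustration, the environment and policy are stand-ins, and the SAC gradient step is reduced to a placeholder; this is not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code) of the asynchronous
# actor/learner pattern behind distributed SAC: actors fill a shared
# replay buffer; a learner samples off-policy batches from it.
import random
import threading
import time
from collections import deque

BUFFER_CAPACITY = 100_000
BATCH_SIZE = 256
NUM_ACTORS = 4

buffer = deque(maxlen=BUFFER_CAPACITY)  # shared replay buffer
lock = threading.Lock()
stop = threading.Event()


def actor() -> None:
    """Roll out a stand-in policy and push transitions to the buffer."""
    state = 0.0
    while not stop.is_set():
        action = random.uniform(-1.0, 1.0)   # stand-in for pi(a | s)
        next_state = state + action          # stand-in for an env step
        reward = -abs(next_state)            # stand-in reward
        with lock:
            buffer.append((state, action, reward, next_state))
        state = next_state


def learner(num_updates: int) -> None:
    """Draw off-policy batches and run placeholder SAC updates."""
    for _ in range(num_updates):
        while True:
            with lock:
                if len(buffer) >= BATCH_SIZE:
                    batch = random.sample(list(buffer), BATCH_SIZE)
                    break
            time.sleep(0.01)  # wait for actors to fill the buffer
        # Placeholder for the real SAC step: critic targets with the
        # entropy bonus, policy update, and temperature (alpha) update.
        _ = sum(r for (_, _, r, _) in batch) / BATCH_SIZE
    stop.set()


threads = [threading.Thread(target=actor) for _ in range(NUM_ACTORS)]
for t in threads:
    t.start()
learner(num_updates=1_000)
for t in threads:
    t.join()
```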

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-wahid21a,
  title     = {Learning Object-conditioned Exploration using Distributed Soft Actor Critic},
  author    = {Wahid, Ayzaan and Stone, Austin and Chen, Kevin and Ichter, Brian and Toshev, Alexander},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {1684--1695},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/wahid21a/wahid21a.pdf},
  url       = {https://proceedings.mlr.press/v155/wahid21a.html}
}
Endnote
%0 Conference Paper
%T Learning Object-conditioned Exploration using Distributed Soft Actor Critic
%A Ayzaan Wahid
%A Austin Stone
%A Kevin Chen
%A Brian Ichter
%A Alexander Toshev
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-wahid21a
%I PMLR
%P 1684--1695
%U https://proceedings.mlr.press/v155/wahid21a.html
%V 155
APA
Wahid, A., Stone, A., Chen, K., Ichter, B. & Toshev, A. (2021). Learning Object-conditioned Exploration using Distributed Soft Actor Critic. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:1684-1695. Available from https://proceedings.mlr.press/v155/wahid21a.html.
