Learning Object-conditioned Exploration using Distributed Soft Actor Critic

Ayzaan Wahid, Austin Stone, Kevin Chen, Brian Ichter, Alexander Toshev
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:1684-1695, 2021.

Abstract

Object navigation is the task of navigating to an object of a given label in a complex, unexplored environment. In its general form, this problem poses two challenges for robotics: semantic exploration of an unknown environment in search of the object, and low-level control. In this work we study object-guided exploration together with low-level control, and present an end-to-end trained navigation policy that achieves a success rate of 0.68 and an SPL of 0.58 on unseen, visually complex scans of real homes. We propose a highly scalable implementation of an off-policy reinforcement learning algorithm, distributed Soft Actor Critic, which lets the system consume 98M experience steps in 24 hours on 8 GPUs. Our system learns to control a differential-drive mobile base in simulation from a stack of high-dimensional observations of the kind commonly available on robotic platforms. The learned policy exhibits object-guided exploratory behavior and low-level control learned purely from experience in realistic environments.
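For reference, SPL is Success weighted by Path Length (Anderson et al., 2018), the standard efficiency metric for navigation:

\[
\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N} S_i \,\frac{\ell_i}{\max(p_i,\ \ell_i)},
\]

where \(S_i \in \{0, 1\}\) marks success on episode \(i\), \(\ell_i\) is the shortest-path distance from start to goal, and \(p_i\) is the length of the path the agent actually took. Soft Actor Critic (Haarnoja et al., 2018) maximizes the entropy-regularized return

\[
J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[\, r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\right],
\]

which is what makes it off-policy and hence amenable to the distributed actor/learner setup the abstract describes. The page ships no code; the sketch below is a minimal, hypothetical illustration of that pattern, in which asynchronous actors stream transitions into a shared replay buffer while a learner draws off-policy batches from it. All names are invented for illustration, the environment and policy are stand-ins, and the SAC gradient step is reduced to a placeholder; this is not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code) of the asynchronous
# actor/learner pattern behind distributed SAC: actors fill a shared
# replay buffer; a learner samples off-policy batches from it.
import random
import threading
import time
from collections import deque

BUFFER_CAPACITY = 100_000
BATCH_SIZE = 256
NUM_ACTORS = 4

buffer = deque(maxlen=BUFFER_CAPACITY)  # shared replay buffer
lock = threading.Lock()
stop = threading.Event()


def actor() -> None:
    """Roll out a stand-in policy and push transitions to the buffer."""
    state = 0.0
    while not stop.is_set():
        action = random.uniform(-1.0, 1.0)   # stand-in for pi(a | s)
        next_state = state + action          # stand-in for an env step
        reward = -abs(next_state)            # stand-in reward
        with lock:
            buffer.append((state, action, reward, next_state))
        state = next_state


def learner(num_updates: int) -> None:
    """Draw off-policy batches and run placeholder SAC updates."""
    for _ in range(num_updates):
        while True:
            with lock:
                if len(buffer) >= BATCH_SIZE:
                    batch = random.sample(list(buffer), BATCH_SIZE)
                    break
            time.sleep(0.01)  # wait for actors to fill the buffer
        # Placeholder for the real SAC step: critic targets with the
        # entropy bonus, policy update, and temperature (alpha) update.
        _ = sum(r for (_, _, r, _) in batch) / BATCH_SIZE
    stop.set()


threads = [threading.Thread(target=actor) for _ in range(NUM_ACTORS)]
for t in threads:
    t.start()
learner(num_updates=1_000)
for t in threads:
    t.join()
```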

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-wahid21a,
  title     = {Learning Object-conditioned Exploration using Distributed Soft Actor Critic},
  author    = {Wahid, Ayzaan and Stone, Austin and Chen, Kevin and Ichter, Brian and Toshev, Alexander},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {1684--1695},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/wahid21a/wahid21a.pdf},
  url       = {https://proceedings.mlr.press/v155/wahid21a.html}
}
Endnote
%0 Conference Paper
%T Learning Object-conditioned Exploration using Distributed Soft Actor Critic
%A Ayzaan Wahid
%A Austin Stone
%A Kevin Chen
%A Brian Ichter
%A Alexander Toshev
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-wahid21a
%I PMLR
%P 1684--1695
%U https://proceedings.mlr.press/v155/wahid21a.html
%V 155
APA
Wahid, A., Stone, A., Chen, K., Ichter, B. & Toshev, A. (2021). Learning Object-conditioned Exploration using Distributed Soft Actor Critic. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:1684-1695. Available from https://proceedings.mlr.press/v155/wahid21a.html.
