Online Inverse Reinforcement Learning with Learned Observation Model

Saurabh Arora, Prashant Doshi, Bikramjit Banerjee
Proceedings of The 6th Conference on Robot Learning, PMLR 205:1468-1477, 2023.

Abstract

With the motivation of extending incremental inverse reinforcement learning (I2RL) to real-world robotics applications with noisy observations as well as an unknown observation model, we introduce a new method (RIMEO) that approximates the observation model in order to best estimate the noise-free ground truth underlying the observations. It learns a maximum entropy distribution over the observation features governing the perception process, and then uses the inferred observation model to learn the reward function. Experimental evaluation is performed in two robotics tasks: (1) post-harvest vegetable sorting with a Sawyer arm based on human demonstration, and (2) breaching a perimeter patrol by two Turtlebots. Our experiments reveal that RIMEO learns a more accurate policy compared to (a) a state-of-the-art IRL method that does not directly learn an observation model, and (b) a custom baseline that learns a less sophisticated observation model. Furthermore, we show that RIMEO admits formal guarantees of monotonic convergence and a sample complexity bound.
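As a rough sketch of the kind of observation model the abstract describes (the paper's exact formulation is not reproduced on this page; the symbols below are illustrative assumptions), a maximum entropy distribution over observation features \phi could take the exponential-family form

P_{\omega}(o \mid s) = \frac{\exp\left(\omega^{\top} \phi(o, s)\right)}{\sum_{o'} \exp\left(\omega^{\top} \phi(o', s)\right)},

where o is a noisy observation of the underlying state s and \omega are weights fit from the observed data. An inferred model of this sort can then stand in for a known sensor model when the reward function is subsequently estimated from the (noisy) demonstrations.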

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-arora23a,
  title     = {Online Inverse Reinforcement Learning with Learned Observation Model},
  author    = {Arora, Saurabh and Doshi, Prashant and Banerjee, Bikramjit},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {1468--1477},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/arora23a/arora23a.pdf},
  url       = {https://proceedings.mlr.press/v205/arora23a.html},
  abstract  = {With the motivation of extending incremental inverse reinforcement learning (I2RL) to real-world robotics applications with noisy observations as well as an unknown observation model, we introduce a new method (RIMEO) that approximates the observation model in order to best estimate the noise-free ground truth underlying the observations. It learns a maximum entropy distribution over the observation features governing the perception process, and then uses the inferred observation model to learn the reward function. Experimental evaluation is performed in two robotics tasks: (1) post-harvest vegetable sorting with a Sawyer arm based on human demonstration, and (2) breaching a perimeter patrol by two Turtlebots. Our experiments reveal that RIMEO learns a more accurate policy compared to (a) a state-of-the-art IRL method that does not directly learn an observation model, and (b) a custom baseline that learns a less sophisticated observation model. Furthermore, we show that RIMEO admits formal guarantees of monotonic convergence and a sample complexity bound.}
}
Endnote
%0 Conference Paper
%T Online Inverse Reinforcement Learning with Learned Observation Model
%A Saurabh Arora
%A Prashant Doshi
%A Bikramjit Banerjee
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-arora23a
%I PMLR
%P 1468--1477
%U https://proceedings.mlr.press/v205/arora23a.html
%V 205
%X With the motivation of extending incremental inverse reinforcement learning (I2RL) to real-world robotics applications with noisy observations as well as an unknown observation model, we introduce a new method (RIMEO) that approximates the observation model in order to best estimate the noise-free ground truth underlying the observations. It learns a maximum entropy distribution over the observation features governing the perception process, and then uses the inferred observation model to learn the reward function. Experimental evaluation is performed in two robotics tasks: (1) post-harvest vegetable sorting with a Sawyer arm based on human demonstration, and (2) breaching a perimeter patrol by two Turtlebots. Our experiments reveal that RIMEO learns a more accurate policy compared to (a) a state-of-the-art IRL method that does not directly learn an observation model, and (b) a custom baseline that learns a less sophisticated observation model. Furthermore, we show that RIMEO admits formal guarantees of monotonic convergence and a sample complexity bound.
APA
Arora, S., Doshi, P. & Banerjee, B. (2023). Online Inverse Reinforcement Learning with Learned Observation Model. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:1468-1477. Available from https://proceedings.mlr.press/v205/arora23a.html.
