Marginal MAP estimation for inverse RL under occlusion with observer noise

Prasanth Sengadu Suresh, Prashant Doshi
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1907-1916, 2022.

Abstract

We consider the problem of learning the behavioral preferences of an expert engaged in a task from noisy and partially-observable demonstrations. This is motivated by real-world applications such as a line robot learning from observing a human worker, where some observations are occluded by environmental elements. Furthermore, robotic perception tends to be imperfect and noisy. Previous techniques for inverse reinforcement learning (IRL) take the approach of either omitting the missing portions or inferring it as part of expectation-maximization, which tends to be slow and prone to local optima. We present a new method that generalizes the well-known Bayesian maximum-a-posteriori (MAP) IRL method by marginalizing the occluded portions of the trajectory. This is then extended with an observation model to account for perception noise. This novel application of marginal MAP (MMAP) to IRL significantly improves on the previous IRL technique under occlusion in both formative evaluations on a toy problem and in a summative evaluation on a produce sorting line task by a physical robot.

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-suresh22a, title = {Marginal {MAP} estimation for inverse {RL} under occlusion with observer noise}, author = {Suresh, Prasanth Sengadu and Doshi, Prashant}, booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence}, pages = {1907--1916}, year = {2022}, editor = {Cussens, James and Zhang, Kun}, volume = {180}, series = {Proceedings of Machine Learning Research}, month = {01--05 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v180/suresh22a/suresh22a.pdf}, url = {https://proceedings.mlr.press/v180/suresh22a.html}, abstract = {We consider the problem of learning the behavioral preferences of an expert engaged in a task from noisy and partially-observable demonstrations. This is motivated by real-world applications such as a line robot learning from observing a human worker, where some observations are occluded by environmental elements. Furthermore, robotic perception tends to be imperfect and noisy. Previous techniques for inverse reinforcement learning (IRL) take the approach of either omitting the missing portions or inferring it as part of expectation-maximization, which tends to be slow and prone to local optima. We present a new method that generalizes the well-known Bayesian maximum-a-posteriori (MAP) IRL method by marginalizing the occluded portions of the trajectory. This is then extended with an observation model to account for perception noise. This novel application of marginal MAP (MMAP) to IRL significantly improves on the previous IRL technique under occlusion in both formative evaluations on a toy problem and in a summative evaluation on a produce sorting line task by a physical robot.} }
Endnote
%0 Conference Paper %T Marginal MAP estimation for inverse RL under occlusion with observer noise %A Prasanth Sengadu Suresh %A Prashant Doshi %B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2022 %E James Cussens %E Kun Zhang %F pmlr-v180-suresh22a %I PMLR %P 1907--1916 %U https://proceedings.mlr.press/v180/suresh22a.html %V 180 %X We consider the problem of learning the behavioral preferences of an expert engaged in a task from noisy and partially-observable demonstrations. This is motivated by real-world applications such as a line robot learning from observing a human worker, where some observations are occluded by environmental elements. Furthermore, robotic perception tends to be imperfect and noisy. Previous techniques for inverse reinforcement learning (IRL) take the approach of either omitting the missing portions or inferring it as part of expectation-maximization, which tends to be slow and prone to local optima. We present a new method that generalizes the well-known Bayesian maximum-a-posteriori (MAP) IRL method by marginalizing the occluded portions of the trajectory. This is then extended with an observation model to account for perception noise. This novel application of marginal MAP (MMAP) to IRL significantly improves on the previous IRL technique under occlusion in both formative evaluations on a toy problem and in a summative evaluation on a produce sorting line task by a physical robot.
APA
Suresh, P.S. & Doshi, P.. (2022). Marginal MAP estimation for inverse RL under occlusion with observer noise. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:1907-1916 Available from https://proceedings.mlr.press/v180/suresh22a.html.

Related Material