X-MEN: guaranteed XOR-maximum entropy constrained inverse reinforcement learning

Fan Ding; Yexiang Xue

Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:589-598, 2022.

Abstract

Inverse Reinforcement Learning (IRL) is a powerful way of learning from demonstrations. In this paper, we address IRL problems in which prior knowledge is available that optimal policies will never violate certain constraints. Conventional approaches that ignore these constraints need many demonstrations to converge. We propose XOR-Maximum Entropy Constrained Inverse Reinforcement Learning (X-MEN), which is guaranteed to converge to the globally optimal reward function at a linear rate w.r.t. the number of learning iterations. X-MEN embeds XOR-sampling – a provable sampling approach which transforms the #P-complete sampling problem into queries to NP oracles – into the framework of maximum entropy IRL. X-MEN also guarantees that the learned IRL agent will never generate trajectories that violate constraints. Empirical results in navigation demonstrate that X-MEN converges faster to the optimal rewards than baseline approaches and always generates trajectories that satisfy multi-state combinatorial constraints.
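
As a rough illustration of the constrained maximum entropy IRL setting the abstract describes, the sketch below fits a linear reward on a toy problem. It is not the authors' algorithm: exhaustive enumeration of constraint-satisfying trajectories stands in for XOR-sampling, which is feasible only for tiny state spaces, and every name in it (`fit_reward`, `constrained_expected_features`, `satisfies`, the feature dictionary) is a hypothetical choice made for this example. The model assumed here is P(τ) ∝ exp(θ · f(τ)) restricted to valid trajectories, so the gradient of the log-likelihood is the demonstrated feature mean minus the model's expected features.

```python
import itertools
import math


def trajectory_features(traj, state_features):
    """Sum the per-state feature vectors along one trajectory."""
    dim = len(next(iter(state_features.values())))
    total = [0.0] * dim
    for state in traj:
        for k, value in enumerate(state_features[state]):
            total[k] += value
    return total


def constrained_expected_features(theta, state_features, horizon, satisfies):
    """Expected features under P(traj) ~ exp(theta . f(traj)) over valid trajectories.

    Assumption: brute-force enumeration replaces XOR-sampling here.
    """
    states = sorted(state_features)
    valid = [t for t in itertools.product(states, repeat=horizon) if satisfies(t)]
    feats = [trajectory_features(t, state_features) for t in valid]
    scores = [sum(a * b for a, b in zip(theta, f)) for f in feats]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [sum(w * f[k] for w, f in zip(weights, feats)) / z
            for k in range(len(theta))]


def fit_reward(demos, state_features, horizon, satisfies, lr=0.5, iters=200):
    """Gradient ascent on the constrained maximum entropy log-likelihood."""
    demo_feats = [trajectory_features(t, state_features) for t in demos]
    dim = len(demo_feats[0])
    empirical = [sum(f[k] for f in demo_feats) / len(demo_feats) for k in range(dim)]
    theta = [0.0] * dim
    for _ in range(iters):
        expected = constrained_expected_features(theta, state_features, horizon, satisfies)
        # Gradient = demonstrated feature mean - model feature expectation.
        theta = [t + lr * (e - x) for t, e, x in zip(theta, empirical, expected)]
    return theta
```

Because the model's support contains only trajectories for which `satisfies` returns true, any trajectory drawn from it respects the constraint by construction, which mirrors the guarantee stated in the abstract. A call might pass a predicate such as "never visit state `b` twice in a row" as an example of a multi-state constraint.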

Cite this Paper

BibTeX


@InProceedings{pmlr-v180-ding22a,
  title     = {{X-MEN}: guaranteed {XOR}-maximum entropy constrained inverse reinforcement learning},
  author    = {Ding, Fan and Xue, Yexiang},
  booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages     = {589--598},
  year      = {2022},
  editor    = {Cussens, James and Zhang, Kun},
  volume    = {180},
  series    = {Proceedings of Machine Learning Research},
  month     = {01--05 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v180/ding22a/ding22a.pdf},
  url       = {https://proceedings.mlr.press/v180/ding22a.html},
  abstract  = {Inverse Reinforcement Learning (IRL) is a powerful way of learning from demonstrations. In this paper, we address IRL problems in which prior knowledge is available that optimal policies will never violate certain constraints. Conventional approaches that ignore these constraints need many demonstrations to converge. We propose XOR-Maximum Entropy Constrained Inverse Reinforcement Learning (X-MEN), which is guaranteed to converge to the globally optimal reward function at a linear rate w.r.t. the number of learning iterations. X-MEN embeds XOR-sampling – a provable sampling approach which transforms the \#P-complete sampling problem into queries to NP oracles – into the framework of maximum entropy IRL. X-MEN also guarantees that the learned IRL agent will never generate trajectories that violate constraints. Empirical results in navigation demonstrate that X-MEN converges faster to the optimal rewards than baseline approaches and always generates trajectories that satisfy multi-state combinatorial constraints.}
}

Endnote

%0 Conference Paper
%T X-MEN: guaranteed XOR-maximum entropy constrained inverse reinforcement learning
%A Fan Ding
%A Yexiang Xue
%B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2022
%E James Cussens
%E Kun Zhang
%F pmlr-v180-ding22a
%I PMLR
%P 589--598
%U https://proceedings.mlr.press/v180/ding22a.html
%V 180
%X Inverse Reinforcement Learning (IRL) is a powerful way of learning from demonstrations. In this paper, we address IRL problems in which prior knowledge is available that optimal policies will never violate certain constraints. Conventional approaches that ignore these constraints need many demonstrations to converge. We propose XOR-Maximum Entropy Constrained Inverse Reinforcement Learning (X-MEN), which is guaranteed to converge to the globally optimal reward function at a linear rate w.r.t. the number of learning iterations. X-MEN embeds XOR-sampling – a provable sampling approach which transforms the #P-complete sampling problem into queries to NP oracles – into the framework of maximum entropy IRL. X-MEN also guarantees that the learned IRL agent will never generate trajectories that violate constraints. Empirical results in navigation demonstrate that X-MEN converges faster to the optimal rewards than baseline approaches and always generates trajectories that satisfy multi-state combinatorial constraints.

APA


Ding, F. & Xue, Y. (2022). X-MEN: guaranteed XOR-maximum entropy constrained inverse reinforcement learning. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:589-598. Available from https://proceedings.mlr.press/v180/ding22a.html.
