Learning Constraints from Offline Demonstrations via Superior Distribution Correction Estimation

Guorui Quan, Zhiqiang Xu, Guiliang Liu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:41787-41803, 2024.

Abstract

An effective approach for learning both safety constraints and control policies is Inverse Constrained Reinforcement Learning (ICRL). Previous ICRL algorithms commonly employ an online learning framework that permits unlimited sampling from an interactive environment. This setting, however, is infeasible in many realistic applications where data collection is dangerous and expensive. To address this challenge, we propose Inverse Constrained Superior Distribution Correction Estimation (ICSDICE) as an offline ICRL solver. ICSDICE extracts feasible constraints from superior distributions, thereby highlighting policies whose reward-maximization ability exceeds that of the expert. To estimate these distributions, ICSDICE solves a regularized dual optimization problem for safe control by exploiting the observed reward signals and expert preferences. Striving for transferable constraints and unbiased estimates, ICSDICE actively encourages sparsity and incorporates a discounting effect within the learned and observed distributions. Empirical studies show that ICSDICE outperforms baseline methods by accurately recovering the constraints and adapting to high-dimensional environments. The code is available at https://github.com/quangr/ICSDICE.
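For readers unfamiliar with the DICE (Distribution Correction Estimation) family, the sketch below illustrates the kind of regularized dual objective the abstract alludes to. It is a minimal, hypothetical example in the style of the general DICE family with a chi-squared regularizer; the function dice_dual_loss, the choice of f-divergence, and the parameter alpha are assumptions for illustration, not ICSDICE's actual objective (which further incorporates constraint signals and expert preferences; see the paper and linked repository for the real formulation).

    # Hedged sketch: a generic DICE-family dual loss (numpy), NOT ICSDICE's
    # actual objective. The names, the chi^2 regularizer, and alpha are
    # illustrative assumptions.
    import numpy as np

    def dice_dual_loss(nu, s, s_next, s0, r, gamma=0.99, alpha=1.0):
        """Generic DICE-style dual objective over a value-like function nu.

        nu     : callable mapping a batch of states to a (N,) array
        s      : states of offline transitions, shape (N, d)
        s_next : successor states, shape (N, d)
        s0     : samples from the initial-state distribution, shape (M, d)
        r      : rewards for each transition, shape (N,)
        """
        # Bellman residual of nu on the offline data.
        e = r + gamma * nu(s_next) - nu(s)
        # Convex conjugate of f(x) = 0.5*(x-1)^2 (chi^2 divergence):
        # f*(y) = 0.5*y^2 + y, applied to the scaled residual.
        f_star = 0.5 * (e / alpha) ** 2 + (e / alpha)
        # Dual objective: initial-value term plus regularized residual term.
        return (1.0 - gamma) * nu(s0).mean() + alpha * f_star.mean()

    # Toy usage with a linear nu on random data.
    rng = np.random.default_rng(0)
    w = rng.normal(size=4)
    nu = lambda states: states @ w
    s = rng.normal(size=(64, 4))
    s_next = rng.normal(size=(64, 4))
    s0 = rng.normal(size=(16, 4))
    r = rng.normal(size=64)
    print(dice_dual_loss(nu, s, s_next, s0, r))

Roughly speaking, in this family the gradient of the conjugate evaluated at the residual recovers the stationary distribution correction ratio between the target distribution and the offline data, which is the kind of quantity the abstract's "superior distributions" are estimated through.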

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-quan24a,
  title     = {Learning Constraints from Offline Demonstrations via Superior Distribution Correction Estimation},
  author    = {Quan, Guorui and Xu, Zhiqiang and Liu, Guiliang},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {41787--41803},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/quan24a/quan24a.pdf},
  url       = {https://proceedings.mlr.press/v235/quan24a.html},
  abstract  = {An effective approach for learning both safety constraints and control policies is Inverse Constrained Reinforcement Learning (ICRL). Previous ICRL algorithms commonly employ an online learning framework that permits unlimited sampling from an interactive environment. This setting, however, is infeasible in many realistic applications where data collection is dangerous and expensive. To address this challenge, we propose Inverse Constrained Superior Distribution Correction Estimation (ICSDICE) as an offline ICRL solver. ICSDICE extracts feasible constraints from superior distributions, thereby highlighting policies with expert-exceeding rewards maximization ability. To estimate these distributions, ICSDICE solves a regularized dual optimization problem for safe control by exploiting the observed reward signals and expert preferences. Striving for transferable constraints and unbiased estimations, ICSDICE actively encourages sparsity and incorporates a discounting effect within the learned and observed distributions. Empirical studies show that ICSDICE outperforms other baselines by accurately recovering the constraints and adapting to high-dimensional environments. The code is available at https://github.com/quangr/ICSDICE.}
}
Endnote
%0 Conference Paper
%T Learning Constraints from Offline Demonstrations via Superior Distribution Correction Estimation
%A Guorui Quan
%A Zhiqiang Xu
%A Guiliang Liu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-quan24a
%I PMLR
%P 41787--41803
%U https://proceedings.mlr.press/v235/quan24a.html
%V 235
%X An effective approach for learning both safety constraints and control policies is Inverse Constrained Reinforcement Learning (ICRL). Previous ICRL algorithms commonly employ an online learning framework that permits unlimited sampling from an interactive environment. This setting, however, is infeasible in many realistic applications where data collection is dangerous and expensive. To address this challenge, we propose Inverse Constrained Superior Distribution Correction Estimation (ICSDICE) as an offline ICRL solver. ICSDICE extracts feasible constraints from superior distributions, thereby highlighting policies with expert-exceeding rewards maximization ability. To estimate these distributions, ICSDICE solves a regularized dual optimization problem for safe control by exploiting the observed reward signals and expert preferences. Striving for transferable constraints and unbiased estimations, ICSDICE actively encourages sparsity and incorporates a discounting effect within the learned and observed distributions. Empirical studies show that ICSDICE outperforms other baselines by accurately recovering the constraints and adapting to high-dimensional environments. The code is available at https://github.com/quangr/ICSDICE.
APA
Quan, G., Xu, Z. & Liu, G. (2024). Learning Constraints from Offline Demonstrations via Superior Distribution Correction Estimation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:41787-41803. Available from https://proceedings.mlr.press/v235/quan24a.html.