Improving Behavioural Cloning with Positive Unlabeled Learning

Qiang Wang, Robert McCarthy, David Cordova Bulens, Kevin McGuinness, Noel E. O’Connor, Francisco Roldan Sanchez, Nico Gürtler, Felix Widmaier, Stephen J. Redmond
Proceedings of The 7th Conference on Robot Learning, PMLR 229:3851-3869, 2023.

Abstract

Learning control policies offline from pre-recorded datasets is a promising avenue for solving challenging real-world problems. However, available datasets are typically of mixed quality, containing only a limited number of trajectories that we would consider positive examples, i.e., high-quality demonstrations. We therefore propose a novel iterative learning algorithm that identifies expert trajectories in unlabeled mixed-quality robotics datasets given a minimal set of positive examples, surpassing existing algorithms in accuracy. We show that applying behavioural cloning to the resulting filtered dataset outperforms several competitive offline reinforcement learning and imitation learning baselines. We perform experiments on a range of simulated locomotion tasks and on two challenging manipulation tasks on a real robotic system; in these experiments, our method achieves state-of-the-art performance. Our website: https://sites.google.com/view/offline-policy-learning-pubc.
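For intuition, the sketch below illustrates the general recipe the abstract describes: iteratively train a classifier to separate the small positive set from the unlabeled pool, promote trajectories the classifier scores as confidently expert-like, and then behaviourally clone the filtered data. This is a minimal illustrative sketch, not the authors' implementation; the mean state-action feature summary, the logistic-regression classifier, the promotion threshold, and the linear policy are all assumptions made here for brevity.

import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def trajectory_features(traj):
    # Assumption: a trajectory is a list of (state, action) numpy-array pairs,
    # summarised here by the mean concatenated state-action vector.
    return np.mean([np.concatenate([s, a]) for s, a in traj], axis=0)

def pu_filter(positive_trajs, unlabeled_trajs, n_iters=5, threshold=0.9):
    # Iteratively grow the positive set with unlabeled trajectories the
    # classifier scores as confidently expert-like (threshold is illustrative).
    positives, unlabeled = list(positive_trajs), list(unlabeled_trajs)
    for _ in range(n_iters):
        if not unlabeled:
            break
        X = np.stack([trajectory_features(t) for t in positives + unlabeled])
        y = np.array([1] * len(positives) + [0] * len(unlabeled))
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        scores = clf.predict_proba(
            np.stack([trajectory_features(t) for t in unlabeled]))[:, 1]
        promoted = [t for t, p in zip(unlabeled, scores) if p >= threshold]
        if not promoted:
            break  # nothing new looks expert-like; stop early
        unlabeled = [t for t, p in zip(unlabeled, scores) if p < threshold]
        positives += promoted
    return positives  # filtered dataset handed to behavioural cloning

def behavioural_cloning(filtered_trajs):
    # Stand-in for the neural-network policy used in practice: fit a simple
    # ridge-regression policy pi(state) -> action on the filtered trajectories.
    S = np.stack([s for t in filtered_trajs for s, _ in t])
    A = np.stack([a for t in filtered_trajs for _, a in t])
    return Ridge().fit(S, A)

A policy fitted this way can be rolled out with policy.predict(state); the paper's actual filtering criterion and policy class differ, so treat the above only as a schematic of the PU-filter-then-clone pipeline.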

Cite this Paper

BibTeX
@InProceedings{pmlr-v229-wang23f,
  title     = {Improving Behavioural Cloning with Positive Unlabeled Learning},
  author    = {Wang, Qiang and McCarthy, Robert and Bulens, David Cordova and McGuinness, Kevin and O'Connor, Noel E. and Sanchez, Francisco Roldan and G\"{u}rtler, Nico and Widmaier, Felix and Redmond, Stephen J.},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {3851--3869},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/wang23f/wang23f.pdf},
  url       = {https://proceedings.mlr.press/v229/wang23f.html}
}
Endnote
%0 Conference Paper
%T Improving Behavioural Cloning with Positive Unlabeled Learning
%A Qiang Wang
%A Robert McCarthy
%A David Cordova Bulens
%A Kevin McGuinness
%A Noel E. O’Connor
%A Francisco Roldan Sanchez
%A Nico Gürtler
%A Felix Widmaier
%A Stephen J. Redmond
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-wang23f
%I PMLR
%P 3851--3869
%U https://proceedings.mlr.press/v229/wang23f.html
%V 229
APA
Wang, Q., McCarthy, R., Bulens, D.C., McGuinness, K., O’Connor, N.E., Sanchez, F.R., Gürtler, N., Widmaier, F. & Redmond, S.J. (2023). Improving Behavioural Cloning with Positive Unlabeled Learning. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:3851-3869. Available from https://proceedings.mlr.press/v229/wang23f.html.
