Towards Uniformly Superhuman Autonomy via Subdominance Minimization

Brian Ziebart, Sanjiban Choudhury, Xinyan Yan, Paul Vernaza
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:27654-27670, 2022.

Abstract

Prevalent imitation learning methods seek to produce behavior that matches or exceeds average human performance. This often prevents achieving expert-level or superhuman performance when identifying the better demonstrations to imitate is difficult. We instead assume demonstrations are of varying quality and seek to induce behavior that is unambiguously better (i.e., Pareto dominant or minimally subdominant) than all human demonstrations. Our minimum subdominance inverse optimal control training objective is primarily defined by high quality demonstrations; lower quality demonstrations, which are more easily dominated, are effectively ignored instead of degrading imitation. With increasing probability, our approach produces superhuman behavior incurring lower cost than demonstrations on the demonstrator's unknown cost function, even if that cost function differs for each demonstration. We apply our approach on a computer cursor pointing task, producing behavior that is 78% superhuman, while minimizing demonstration suboptimality provides 50% superhuman behavior, and only 72% even after selective data cleaning.
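The core notions in the abstract can be illustrated concretely. A behavior Pareto-dominates a demonstration when its cost-feature vector is no worse on every dimension and strictly better on at least one; subdominance is a hinge-style penalty that is zero only when the behavior beats the demonstration by a margin on every feature. The sketch below is a minimal, hypothetical illustration of those two ideas, not the paper's implementation; the `alpha` weight and unit `margin` are illustrative assumptions.

```python
import numpy as np

def pareto_dominates(f_agent, f_demo):
    """True if the agent's cost-feature vector is no worse on every
    dimension and strictly better on at least one (Pareto dominance)."""
    f_agent, f_demo = np.asarray(f_agent), np.asarray(f_demo)
    return bool(np.all(f_agent <= f_demo) and np.any(f_agent < f_demo))

def subdominance(f_agent, f_demo, alpha=1.0, margin=1.0):
    """Illustrative hinged penalty: per-feature loss whenever the agent
    fails to beat the demonstration by `margin`; zero only when the
    agent dominates with margin on every cost feature."""
    gap = alpha * (np.asarray(f_agent, float) - np.asarray(f_demo, float)) + margin
    return float(np.maximum(gap, 0.0).sum())

# An agent strictly better on both cost features dominates the demo
# and incurs zero subdominance at unit margin:
print(pareto_dominates([2.0, 1.0], [3.0, 2.0]))   # → True
print(subdominance([2.0, 1.0], [3.0, 2.0]))       # → 0.0
```

Minimizing such a hinged penalty over all demonstrations concentrates the objective on the hardest-to-dominate (highest quality) demonstrations, matching the abstract's claim that easily dominated, lower quality demonstrations are effectively ignored.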

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-ziebart22a,
  title     = {Towards Uniformly Superhuman Autonomy via Subdominance Minimization},
  author    = {Ziebart, Brian and Choudhury, Sanjiban and Yan, Xinyan and Vernaza, Paul},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {27654--27670},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/ziebart22a/ziebart22a.pdf},
  url       = {https://proceedings.mlr.press/v162/ziebart22a.html},
  abstract  = {Prevalent imitation learning methods seek to produce behavior that matches or exceeds average human performance. This often prevents achieving expert-level or superhuman performance when identifying the better demonstrations to imitate is difficult. We instead assume demonstrations are of varying quality and seek to induce behavior that is unambiguously better (i.e., Pareto dominant or minimally subdominant) than all human demonstrations. Our minimum subdominance inverse optimal control training objective is primarily defined by high quality demonstrations; lower quality demonstrations, which are more easily dominated, are effectively ignored instead of degrading imitation. With increasing probability, our approach produces superhuman behavior incurring lower cost than demonstrations on the demonstrator's unknown cost function---even if that cost function differs for each demonstration. We apply our approach on a computer cursor pointing task, producing behavior that is 78% superhuman, while minimizing demonstration suboptimality provides 50% superhuman behavior---and only 72% even after selective data cleaning.}
}
Endnote
%0 Conference Paper
%T Towards Uniformly Superhuman Autonomy via Subdominance Minimization
%A Brian Ziebart
%A Sanjiban Choudhury
%A Xinyan Yan
%A Paul Vernaza
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-ziebart22a
%I PMLR
%P 27654--27670
%U https://proceedings.mlr.press/v162/ziebart22a.html
%V 162
%X Prevalent imitation learning methods seek to produce behavior that matches or exceeds average human performance. This often prevents achieving expert-level or superhuman performance when identifying the better demonstrations to imitate is difficult. We instead assume demonstrations are of varying quality and seek to induce behavior that is unambiguously better (i.e., Pareto dominant or minimally subdominant) than all human demonstrations. Our minimum subdominance inverse optimal control training objective is primarily defined by high quality demonstrations; lower quality demonstrations, which are more easily dominated, are effectively ignored instead of degrading imitation. With increasing probability, our approach produces superhuman behavior incurring lower cost than demonstrations on the demonstrator's unknown cost function, even if that cost function differs for each demonstration. We apply our approach on a computer cursor pointing task, producing behavior that is 78% superhuman, while minimizing demonstration suboptimality provides 50% superhuman behavior, and only 72% even after selective data cleaning.
APA
Ziebart, B., Choudhury, S., Yan, X. & Vernaza, P. (2022). Towards Uniformly Superhuman Autonomy via Subdominance Minimization. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:27654-27670. Available from https://proceedings.mlr.press/v162/ziebart22a.html.