Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation

Andi Peng, Aviv Netanyahu, Mark K Ho, Tianmin Shu, Andreea Bobu, Julie Shah, Pulkit Agrawal
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:27630-27641, 2023.

Abstract

Policies often fail at test-time due to distribution shifts—changes in the state and reward that occur when an end user deploys the policy in environments different from those seen in training. Data augmentation can help models be more robust to such shifts by varying specific concepts in the state, e.g. object color, that are task-irrelevant and should not impact desired actions. However, designers training the agent don’t often know which concepts are irrelevant a priori. We propose a human-in-the-loop framework to leverage feedback from the end user to quickly identify and augment task-irrelevant visual state concepts. Our framework generates counterfactual demonstrations that allow users to quickly isolate shifted state concepts and identify if they should not impact the desired task, and can therefore be augmented using existing actions. We present experiments validating our full pipeline on discrete and continuous control tasks with real human users. Our method better enables users to (1) understand agent failure, (2) improve sample efficiency of demonstrations required for finetuning, and (3) adapt the agent to their desired reward.
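To make the adaptation loop described in the abstract concrete, the following Python sketch illustrates one way the counterfactual-augmentation step could look: each demonstrated state is varied along one visual concept at a time, the end user labels whether the shifted concept is task-irrelevant, and counterfactual states paired with the original action are added to the finetuning set. This is a minimal illustrative sketch, not the authors' implementation; the names Demo, CONCEPT_VALUES, and user_says_irrelevant are assumptions introduced here for exposition.

# Hedged sketch of counterfactual state augmentation with human feedback.
# All identifiers below are illustrative assumptions, not the paper's API.
from dataclasses import dataclass

@dataclass
class Demo:
    state: dict          # factored visual state, e.g. {"object_color": "red", "background": "plain"}
    action: str          # the demonstrated action taken in that state

# Hypothetical values each visual concept can take at test time.
CONCEPT_VALUES = {
    "object_color": ["red", "green", "blue"],
    "background": ["plain", "textured"],
}

def counterfactuals(demo: Demo, concept: str):
    """Yield copies of the demo state with only `concept` varied."""
    for value in CONCEPT_VALUES[concept]:
        if value != demo.state[concept]:
            yield {**demo.state, concept: value}

def augment(demos, user_says_irrelevant):
    """Collect (counterfactual state, original action) pairs for every concept
    the user marks as task-irrelevant; these can then be added to the
    finetuning data for the policy."""
    augmented = []
    for demo in demos:
        for concept in CONCEPT_VALUES:
            for cf_state in counterfactuals(demo, concept):
                if user_says_irrelevant(concept, demo.state, cf_state):
                    augmented.append(Demo(cf_state, demo.action))
    return augmented

if __name__ == "__main__":
    demos = [Demo({"object_color": "red", "background": "plain"}, "pick_up")]
    # Stand-in for real human feedback: treat object color as task-irrelevant.
    augmented = augment(demos, lambda concept, s, cf: concept == "object_color")
    print(augmented)

In the actual framework, the feedback function is a real user inspecting generated counterfactual demonstrations rather than a hard-coded rule, and the augmented pairs are used to finetune the deployed policy.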

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-peng23c,
  title     = {Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation},
  author    = {Peng, Andi and Netanyahu, Aviv and Ho, Mark K and Shu, Tianmin and Bobu, Andreea and Shah, Julie and Agrawal, Pulkit},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {27630--27641},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/peng23c/peng23c.pdf},
  url       = {https://proceedings.mlr.press/v202/peng23c.html},
  abstract  = {Policies often fail at test-time due to distribution shifts—changes in the state and reward that occur when an end user deploys the policy in environments different from those seen in training. Data augmentation can help models be more robust to such shifts by varying specific concepts in the state, e.g. object color, that are task-irrelevant and should not impact desired actions. However, designers training the agent don’t often know which concepts are irrelevant a priori. We propose a human-in-the-loop framework to leverage feedback from the end user to quickly identify and augment task-irrelevant visual state concepts. Our framework generates counterfactual demonstrations that allow users to quickly isolate shifted state concepts and identify if they should not impact the desired task, and can therefore be augmented using existing actions. We present experiments validating our full pipeline on discrete and continuous control tasks with real human users. Our method better enables users to (1) understand agent failure, (2) improve sample efficiency of demonstrations required for finetuning, and (3) adapt the agent to their desired reward.}
}
Endnote
%0 Conference Paper
%T Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation
%A Andi Peng
%A Aviv Netanyahu
%A Mark K Ho
%A Tianmin Shu
%A Andreea Bobu
%A Julie Shah
%A Pulkit Agrawal
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-peng23c
%I PMLR
%P 27630--27641
%U https://proceedings.mlr.press/v202/peng23c.html
%V 202
%X Policies often fail at test-time due to distribution shifts—changes in the state and reward that occur when an end user deploys the policy in environments different from those seen in training. Data augmentation can help models be more robust to such shifts by varying specific concepts in the state, e.g. object color, that are task-irrelevant and should not impact desired actions. However, designers training the agent don’t often know which concepts are irrelevant a priori. We propose a human-in-the-loop framework to leverage feedback from the end user to quickly identify and augment task-irrelevant visual state concepts. Our framework generates counterfactual demonstrations that allow users to quickly isolate shifted state concepts and identify if they should not impact the desired task, and can therefore be augmented using existing actions. We present experiments validating our full pipeline on discrete and continuous control tasks with real human users. Our method better enables users to (1) understand agent failure, (2) improve sample efficiency of demonstrations required for finetuning, and (3) adapt the agent to their desired reward.
APA
Peng, A., Netanyahu, A., Ho, M.K., Shu, T., Bobu, A., Shah, J. & Agrawal, P. (2023). Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:27630-27641. Available from https://proceedings.mlr.press/v202/peng23c.html.
