Learning Behaviors with Uncertain Human Feedback

Xu He; Haipeng Chen; Bo An

Learning Behaviors with Uncertain Human Feedback

Xu He, Haipeng Chen, Bo An

Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:131-140, 2020.

Abstract

Human feedback is widely used to train agents in many domains. However, previous works rarely consider the uncertainty when humans provide feedback, especially in cases that the optimal actions are not obvious to the trainers. For example, the reward of a sub-optimal action can be stochastic and sometimes exceeds that of the optimal action, which is common in games or real-world. Trainers are likely to provide positive feedback to sub-optimal actions, negative feedback to the optimal actions and even do not provide feedback in some confusing situations. Existing works, which utilize the Expectation Maximization (EM) algorithm and treat the feedback model as hidden parameters, do not consider uncertainties in the learning environment and human feedback. To address this challenge, we introduce a novel feedback model that considers the uncertainty of human feedback. However, this incurs intractable calculus in the EM algorithm. To this end, we propose a novel approximate EM algorithm, in which we approximate the expectation step with the Gradient Descent method. Experimental results in both synthetic scenarios and two real-world scenarios with human participants demonstrate the superior performance of our proposed approach.

Cite this Paper

BibTeX

@InProceedings{pmlr-v124-he20a,
  title = 	 {Learning Behaviors with Uncertain Human Feedback},
  author =       {He, Xu and Chen, Haipeng and An, Bo},
  booktitle = 	 {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages = 	 {131--140},
  year = 	 {2020},
  editor = 	 {Peters, Jonas and Sontag, David},
  volume = 	 {124},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {03--06 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v124/he20a/he20a.pdf},
  url = 	 {https://proceedings.mlr.press/v124/he20a.html},
  abstract = 	 {Human feedback is widely used to train agents in many domains. However, previous works rarely consider the uncertainty when humans provide feedback, especially in cases that the optimal actions are not obvious to the trainers. For example, the reward of a sub-optimal action can be stochastic and sometimes exceeds that of the optimal action, which is common in games or real-world. Trainers are likely to provide positive feedback to sub-optimal actions, negative feedback to the optimal actions and even do not provide feedback in some confusing situations. Existing works, which utilize the Expectation Maximization (EM) algorithm and treat the feedback model as hidden parameters, do not consider uncertainties in the learning environment and human feedback. To address this challenge, we introduce a novel feedback model that considers the uncertainty of human feedback. However, this incurs intractable calculus in the EM algorithm. To this end, we propose a novel approximate EM algorithm, in which we approximate the expectation step with the Gradient Descent method. Experimental results in both synthetic scenarios and two real-world scenarios with human participants demonstrate the superior performance of our proposed approach.}
}

Endnote

%0 Conference Paper
%T Learning Behaviors with Uncertain Human Feedback
%A Xu He
%A Haipeng Chen
%A Bo An
%B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)
%C Proceedings of Machine Learning Research
%D 2020
%E Jonas Peters
%E David Sontag	
%F pmlr-v124-he20a
%I PMLR
%P 131--140
%U https://proceedings.mlr.press/v124/he20a.html
%V 124
%X Human feedback is widely used to train agents in many domains. However, previous works rarely consider the uncertainty when humans provide feedback, especially in cases that the optimal actions are not obvious to the trainers. For example, the reward of a sub-optimal action can be stochastic and sometimes exceeds that of the optimal action, which is common in games or real-world. Trainers are likely to provide positive feedback to sub-optimal actions, negative feedback to the optimal actions and even do not provide feedback in some confusing situations. Existing works, which utilize the Expectation Maximization (EM) algorithm and treat the feedback model as hidden parameters, do not consider uncertainties in the learning environment and human feedback. To address this challenge, we introduce a novel feedback model that considers the uncertainty of human feedback. However, this incurs intractable calculus in the EM algorithm. To this end, we propose a novel approximate EM algorithm, in which we approximate the expectation step with the Gradient Descent method. Experimental results in both synthetic scenarios and two real-world scenarios with human participants demonstrate the superior performance of our proposed approach.

APA

He, X., Chen, H. & An, B.. (2020). Learning Behaviors with Uncertain Human Feedback. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:131-140 Available from https://proceedings.mlr.press/v124/he20a.html.

Related Material

Download PDF