Interaction-Grounded Learning

Tengyang Xie; John Langford; Paul Mineiro; Ida Momennejad

Interaction-Grounded Learning

Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11414-11423, 2021.

Abstract

Consider a prosthetic arm, learning to adapt to its user’s control signals. We propose \emph{Interaction-Grounded Learning} for this novel setting, in which a learner’s goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional \emph{context vector}, takes an \emph{action}, and then observes a multidimensional \emph{feedback vector}. This multidimensional feedback vector has \emph{no} explicit reward information. In order to succeed, the algorithm must learn how to evaluate the feedback vector to discover a latent reward signal, with which it can ground its policies without supervision. We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction. We provide theoretical guarantees and a proof-of-concept empirical evaluation to demonstrate the effectiveness of our proposed approach.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-xie21e,
  title = 	 {Interaction-Grounded Learning},
  author =       {Xie, Tengyang and Langford, John and Mineiro, Paul and Momennejad, Ida},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {11414--11423},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/xie21e/xie21e.pdf},
  url = 	 {https://proceedings.mlr.press/v139/xie21e.html},
  abstract = 	 {Consider a prosthetic arm, learning to adapt to its user’s control signals. We propose \emph{Interaction-Grounded Learning} for this novel setting, in which a learner’s goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional \emph{context vector}, takes an \emph{action}, and then observes a multidimensional \emph{feedback vector}. This multidimensional feedback vector has \emph{no} explicit reward information. In order to succeed, the algorithm must learn how to evaluate the feedback vector to discover a latent reward signal, with which it can ground its policies without supervision. We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction. We provide theoretical guarantees and a proof-of-concept empirical evaluation to demonstrate the effectiveness of our proposed approach.}
}

Endnote

%0 Conference Paper
%T Interaction-Grounded Learning
%A Tengyang Xie
%A John Langford
%A Paul Mineiro
%A Ida Momennejad
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-xie21e
%I PMLR
%P 11414--11423
%U https://proceedings.mlr.press/v139/xie21e.html
%V 139
%X Consider a prosthetic arm, learning to adapt to its user’s control signals. We propose \emph{Interaction-Grounded Learning} for this novel setting, in which a learner’s goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional \emph{context vector}, takes an \emph{action}, and then observes a multidimensional \emph{feedback vector}. This multidimensional feedback vector has \emph{no} explicit reward information. In order to succeed, the algorithm must learn how to evaluate the feedback vector to discover a latent reward signal, with which it can ground its policies without supervision. We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction. We provide theoretical guarantees and a proof-of-concept empirical evaluation to demonstrate the effectiveness of our proposed approach.

APA

Xie, T., Langford, J., Mineiro, P. & Momennejad, I.. (2021). Interaction-Grounded Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11414-11423 Available from https://proceedings.mlr.press/v139/xie21e.html.

Interaction-Grounded Learning

Abstract

Cite this Paper

Related Material