Walking the Values in Bayesian Inverse Reinforcement Learning

Ondrej Bajgar; Alessandro Abate; Konstantinos Gatsis; Michael Osborne

Walking the Values in Bayesian Inverse Reinforcement Learning

Ondrej Bajgar, Alessandro Abate, Konstantinos Gatsis, Michael Osborne

Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:273-287, 2024.

Abstract

The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q values: vanilla Bayesian IRL needs to solve the costly forward planning problem – going from rewards to the Q values – at every step of the algorithm, which may need to be done thousands of times. We propose to solve this by a simple change: instead of focusing on primarily sampling in the space of rewards, we can focus on primarily working in the space of Q-values, since the computation required to go from Q-values to reward is radically cheaper. Furthermore, this reversion of the computation makes it easy to compute the gradient allowing efficient sampling using Hamiltonian Monte Carlo. We propose ValueWalk – a new Markov chain Monte Carlo method based on this insight – and illustrate its advantages on several tasks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v244-bajgar24a,
  title = 	 {Walking the Values in Bayesian Inverse Reinforcement Learning},
  author =       {Bajgar, Ondrej and Abate, Alessandro and Gatsis, Konstantinos and Osborne, Michael},
  booktitle = 	 {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {273--287},
  year = 	 {2024},
  editor = 	 {Kiyavash, Negar and Mooij, Joris M.},
  volume = 	 {244},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v244/main/assets/bajgar24a/bajgar24a.pdf},
  url = 	 {https://proceedings.mlr.press/v244/bajgar24a.html},
  abstract = 	 {The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q values: vanilla Bayesian IRL needs to solve the costly forward planning problem – going from rewards to the Q values – at every step of the algorithm, which may need to be done thousands of times. We propose to solve this by a simple change: instead of focusing on primarily sampling in the space of rewards, we can focus on primarily working in the space of Q-values, since the computation required to go from Q-values to reward is radically cheaper. Furthermore, this reversion of the computation makes it easy to compute the gradient allowing efficient sampling using Hamiltonian Monte Carlo. We propose ValueWalk – a new Markov chain Monte Carlo method based on this insight – and illustrate its advantages on several tasks.}
}

Endnote

%0 Conference Paper
%T Walking the Values in Bayesian Inverse Reinforcement Learning
%A Ondrej Bajgar
%A Alessandro Abate
%A Konstantinos Gatsis
%A Michael Osborne
%B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Negar Kiyavash
%E Joris M. Mooij	
%F pmlr-v244-bajgar24a
%I PMLR
%P 273--287
%U https://proceedings.mlr.press/v244/bajgar24a.html
%V 244
%X The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q values: vanilla Bayesian IRL needs to solve the costly forward planning problem – going from rewards to the Q values – at every step of the algorithm, which may need to be done thousands of times. We propose to solve this by a simple change: instead of focusing on primarily sampling in the space of rewards, we can focus on primarily working in the space of Q-values, since the computation required to go from Q-values to reward is radically cheaper. Furthermore, this reversion of the computation makes it easy to compute the gradient allowing efficient sampling using Hamiltonian Monte Carlo. We propose ValueWalk – a new Markov chain Monte Carlo method based on this insight – and illustrate its advantages on several tasks.

APA


Bajgar, O., Abate, A., Gatsis, K. & Osborne, M.. (2024). Walking the Values in Bayesian Inverse Reinforcement Learning. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:273-287 Available from https://proceedings.mlr.press/v244/bajgar24a.html.

Walking the Values in Bayesian Inverse Reinforcement Learning

Abstract

Cite this Paper

Related Material