Adaptive Trade-Offs in Off-Policy Learning

Mark Rowland; Will Dabney; Remi Munos

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:34-44, 2020.

Abstract

A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives on existing methods, and also naturally yields novel algorithms for off-policy evaluation and control. We develop one such algorithm, C-trace, demonstrating that it is able to more efficiently make these trade-offs than existing methods in use, and that it can be scaled to yield state-of-the-art performance in large-scale environments.

Cite this Paper

BibTeX

@InProceedings{pmlr-v108-rowland20a,
  title = {Adaptive Trade-Offs in Off-Policy Learning},
  author = {Rowland, Mark and Dabney, Will and Munos, Remi},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {34--44},
  year = {2020},
  editor = {Chiappa, Silvia and Calandra, Roberto},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/rowland20a/rowland20a.pdf},
  url = {https://proceedings.mlr.press/v108/rowland20a.html},
  abstract = {A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives on existing methods, and also naturally yields novel algorithms for off-policy evaluation and control. We develop one such algorithm, C-trace, demonstrating that it is able to more efficiently make these trade-offs than existing methods in use, and that it can be scaled to yield state-of-the-art performance in large-scale environments.}
}

Endnote

%0 Conference Paper
%T Adaptive Trade-Offs in Off-Policy Learning
%A Mark Rowland
%A Will Dabney
%A Remi Munos
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-rowland20a
%I PMLR
%P 34--44
%U https://proceedings.mlr.press/v108/rowland20a.html
%V 108
%X A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives on existing methods, and also naturally yields novel algorithms for off-policy evaluation and control. We develop one such algorithm, C-trace, demonstrating that it is able to more efficiently make these trade-offs than existing methods in use, and that it can be scaled to yield state-of-the-art performance in large-scale environments.

APA

Rowland, M., Dabney, W. & Munos, R. (2020). Adaptive Trade-Offs in Off-Policy Learning. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:34-44. Available from https://proceedings.mlr.press/v108/rowland20a.html.
