Distributional reinforcement learning with linear function approximation

Marc G. Bellemare; Nicolas Le Roux; Pablo Samuel Castro; Subhodeep Moitra

Distributional reinforcement learning with linear function approximation

Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2203-2211, 2019.

Abstract

Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)’s analysis of the C51 algorithm in terms of the Cramer distance, but their results only apply to the tabular setting and ignore C51’s use of a softmax to produce normalized distributions. In this paper we adapt the Cramer distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramer-based and can be combined to linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model’s prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramer-based distributional methods may perform worse than directly approximating the value function.

Cite this Paper

BibTeX

@InProceedings{pmlr-v89-bellemare19a,
  title = 	 {Distributional reinforcement learning with linear function approximation},
  author =       {Bellemare, Marc G. and Roux, Nicolas Le and Castro, Pablo Samuel and Moitra, Subhodeep},
  booktitle = 	 {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2203--2211},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = 	 {89},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v89/bellemare19a/bellemare19a.pdf},
  url = 	 {https://proceedings.mlr.press/v89/bellemare19a.html},
  abstract = 	 {Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)’s analysis of the C51 algorithm in terms of the Cramer distance, but their results only apply to the tabular setting and ignore C51’s use of a softmax to produce normalized distributions. In this paper we adapt the Cramer distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramer-based and can be combined to linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model’s prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramer-based distributional methods may perform worse than directly approximating the value function.}
}

Endnote

%0 Conference Paper
%T Distributional reinforcement learning with linear function approximation
%A Marc G. Bellemare
%A Nicolas Le Roux
%A Pablo Samuel Castro
%A Subhodeep Moitra
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama	
%F pmlr-v89-bellemare19a
%I PMLR
%P 2203--2211
%U https://proceedings.mlr.press/v89/bellemare19a.html
%V 89
%X Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)’s analysis of the C51 algorithm in terms of the Cramer distance, but their results only apply to the tabular setting and ignore C51’s use of a softmax to produce normalized distributions. In this paper we adapt the Cramer distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramer-based and can be combined to linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model’s prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramer-based distributional methods may perform worse than directly approximating the value function.

APA

Bellemare, M.G., Roux, N.L., Castro, P.S. & Moitra, S.. (2019). Distributional reinforcement learning with linear function approximation. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:2203-2211 Available from https://proceedings.mlr.press/v89/bellemare19a.html.

Distributional reinforcement learning with linear function approximation

Abstract

Cite this Paper

Related Material