Stochastically Dominant Distributional Reinforcement Learning

John Martin; Michal Lyskawinski; Xiaohu Li; Brendan Englot

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:6745-6754, 2020.

Abstract

We describe a new approach for managing aleatoric uncertainty in the Reinforcement Learning (RL) paradigm. Instead of selecting actions according to a single statistic, we propose a distributional method based on the second-order stochastic dominance (SSD) relation. This compares the inherent dispersion of random returns induced by actions, producing a comprehensive evaluation of the environment’s uncertainty. The necessary conditions for SSD require estimators to predict accurate second moments. To accommodate this, we map the distributional RL problem to a Wasserstein gradient flow, treating the distributional Bellman residual as a potential energy functional. We propose a particle-based algorithm for which we prove optimality and convergence. Our experiments characterize the algorithm’s performance and demonstrate how uncertainty and performance are better balanced using an SSD policy than with other risk measures.
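The SSD relation mentioned in the abstract can be made concrete for particle-based return estimates. A random return X dominates Y in the second-order sense if and only if, for every p in (0, 1], the integral of the quantile function of X from 0 to p is at least the corresponding integral for Y. For two sets of n equally weighted particles, both integrated quantile functions are piecewise linear with breakpoints at p = k/n, so it suffices to compare partial sums of the sorted particles. SSD also implies E[X] >= E[Y], and when the means are equal it implies Var[X] <= Var[Y], which is why accurate second-moment estimates matter. The following is a minimal sketch under stated assumptions (equal particle counts per action, uniform particle weights, NumPy available). The helper names `ssd_dominates` and `ssd_undominated_actions` are hypothetical. This is not the authors' implementation: it only illustrates the dominance test and says nothing about how the particles are learned.

```python
import numpy as np


def ssd_dominates(x, y, tol=1e-12):
    """Hypothetical helper: True if particles x second-order dominate particles y.

    Assumes x and y are 1-D arrays with the same number of equally
    weighted particles. The integrated quantile function at p = k/n is
    (1/n) * (sum of the k smallest particles), so comparing cumulative
    sums of the sorted particles checks SSD exactly.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("expected 1-D particle sets of equal size")
    # tol absorbs floating-point noise in the partial sums.
    return bool(np.all(np.cumsum(x) >= np.cumsum(y) - tol))


def ssd_undominated_actions(particles):
    """Hypothetical helper: indices of actions not strictly SSD-dominated by another.

    particles[a] holds the return particles for action a. Action a is
    discarded only if some other action dominates it while a does not
    dominate that action back (strict dominance).
    """
    keep = []
    for a in range(len(particles)):
        dominated = any(
            ssd_dominates(particles[b], particles[a])
            and not ssd_dominates(particles[a], particles[b])
            for b in range(len(particles))
            if b != a
        )
        if not dominated:
            keep.append(a)
    return keep


if __name__ == "__main__":
    safe = [1.0, 1.0, 1.0, 1.0]    # mean 1, no spread
    risky = [-1.0, 0.0, 2.0, 3.0]  # mean 1, wider spread
    print(ssd_dominates(safe, risky))              # True
    print(ssd_undominated_actions([safe, risky]))  # [0]
```

In the usage example both actions have the same mean return, yet the constant one is preferred because its returns are less dispersed, which is the kind of distinction a single-statistic (mean-only) policy cannot make. Because SSD is only a partial order, several actions can survive the filter, so some tie-breaking rule (for example, the highest mean) would still be needed; that choice is an assumption here. The pairwise loop costs O(A² · n log n) for A actions with n particles each.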

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-martin20a,
  title = 	 {Stochastically Dominant Distributional Reinforcement Learning},
  author =       {Martin, John and Lyskawinski, Michal and Li, Xiaohu and Englot, Brendan},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {6745--6754},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/martin20a/martin20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/martin20a.html},
  abstract = 	 {We describe a new approach for managing aleatoric uncertainty in the Reinforcement Learning (RL) paradigm. Instead of selecting actions according to a single statistic, we propose a distributional method based on the second-order stochastic dominance (SSD) relation. This compares the inherent dispersion of random returns induced by actions, producing a comprehensive evaluation of the environment’s uncertainty. The necessary conditions for SSD require estimators to predict accurate second moments. To accommodate this, we map the distributional RL problem to a Wasserstein gradient flow, treating the distributional Bellman residual as a potential energy functional. We propose a particle-based algorithm for which we prove optimality and convergence. Our experiments characterize the algorithm’s performance and demonstrate how uncertainty and performance are better balanced using an SSD policy than with other risk measures.}
}

Endnote

%0 Conference Paper
%T Stochastically Dominant Distributional Reinforcement Learning
%A John Martin
%A Michal Lyskawinski
%A Xiaohu Li
%A Brendan Englot
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-martin20a
%I PMLR
%P 6745--6754
%U https://proceedings.mlr.press/v119/martin20a.html
%V 119
%X We describe a new approach for managing aleatoric uncertainty in the Reinforcement Learning (RL) paradigm. Instead of selecting actions according to a single statistic, we propose a distributional method based on the second-order stochastic dominance (SSD) relation. This compares the inherent dispersion of random returns induced by actions, producing a comprehensive evaluation of the environment’s uncertainty. The necessary conditions for SSD require estimators to predict accurate second moments. To accommodate this, we map the distributional RL problem to a Wasserstein gradient flow, treating the distributional Bellman residual as a potential energy functional. We propose a particle-based algorithm for which we prove optimality and convergence. Our experiments characterize the algorithm’s performance and demonstrate how uncertainty and performance are better balanced using an SSD policy than with other risk measures.

APA

Martin, J., Lyskawinski, M., Li, X. & Englot, B. (2020). Stochastically Dominant Distributional Reinforcement Learning. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:6745-6754. Available from https://proceedings.mlr.press/v119/martin20a.html.
