Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

Tanmay Gangwani, Jian Peng, Yuan Zhou
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:2206-2215, 2021.

Abstract

Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning. It facilitates learning a population of agents where each member is optimized to simultaneously accumulate high task-returns and exhibit behavioral diversity compared to other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on $f$-divergence between the stationary distributions of policies, we convert the problem to that of efficient estimation of the ratio of these stationary distributions. We then study various distribution ratio estimators used previously for off-policy evaluation and imitation and re-purpose them to compute the gradients for policies in an ensemble such that the resultant population is diverse and of high-quality.
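To illustrate the kernel-based update the paper builds on, the following is a minimal, self-contained sketch of vanilla Stein variational gradient descent (SVGD) on 1-D particles with an RBF kernel. This is only the generic SVGD mechanic; the paper's contribution replaces the kernel with one based on the $f$-divergence between policy stationary distributions, estimated via distribution ratio estimators, which is not shown here.

```python
import numpy as np

def rbf_kernel(x, h=1.0):
    # Pairwise differences and RBF kernel matrix for 1-D particles.
    diff = x[:, None] - x[None, :]          # diff[j, i] = x_j - x_i
    K = np.exp(-diff**2 / (2 * h**2))
    # Gradient of k(x_j, x_i) with respect to its first argument x_j.
    gradK = -diff / h**2 * K
    return K, gradK

def svgd_step(x, grad_logp, eps=0.05, h=1.0):
    n = len(x)
    K, gradK = rbf_kernel(x, h)
    # Driving term (kernel-weighted score) plus repulsive term
    # (kernel gradient), averaged over the particle ensemble.
    phi = (K @ grad_logp(x) + gradK.sum(axis=0)) / n
    return x + eps * phi

# Toy target: standard normal, whose score function is -x.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=50)
for _ in range(500):
    x = svgd_step(x, lambda z: -z)

print(f"mean={x.mean():.2f} std={x.std():.2f}")  # particles approximate N(0, 1)
```

In the paper's setting, each particle is a policy rather than a scalar, and the driving term is a policy-gradient for task return, so the repulsive term is what enforces behavioral diversity across the ensemble.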

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-gangwani21a,
  title     = {Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity},
  author    = {Gangwani, Tanmay and Peng, Jian and Zhou, Yuan},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {2206--2215},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/gangwani21a/gangwani21a.pdf},
  url       = {https://proceedings.mlr.press/v155/gangwani21a.html},
  abstract  = {Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning. It facilitates learning a population of agents where each member is optimized to simultaneously accumulate high task-returns and exhibit behavioral diversity compared to other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on $f$-divergence between the stationary distributions of policies, we convert the problem to that of efficient estimation of the ratio of these stationary distributions. We then study various distribution ratio estimators used previously for off-policy evaluation and imitation and re-purpose them to compute the gradients for policies in an ensemble such that the resultant population is diverse and of high-quality.}
}
Endnote
%0 Conference Paper
%T Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
%A Tanmay Gangwani
%A Jian Peng
%A Yuan Zhou
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-gangwani21a
%I PMLR
%P 2206--2215
%U https://proceedings.mlr.press/v155/gangwani21a.html
%V 155
%X Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning. It facilitates learning a population of agents where each member is optimized to simultaneously accumulate high task-returns and exhibit behavioral diversity compared to other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on $f$-divergence between the stationary distributions of policies, we convert the problem to that of efficient estimation of the ratio of these stationary distributions. We then study various distribution ratio estimators used previously for off-policy evaluation and imitation and re-purpose them to compute the gradients for policies in an ensemble such that the resultant population is diverse and of high-quality.
APA
Gangwani, T., Peng, J. & Zhou, Y. (2021). Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:2206-2215. Available from https://proceedings.mlr.press/v155/gangwani21a.html.