DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning

Hassam Sheikh, Kizza Frisbee, Mariano Phielipp
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:19731-19746, 2022.

Abstract

Ensembles of neural networks are becoming an increasingly important tool for advancing state-of-the-art deep reinforcement learning algorithms. However, training the large number of neural networks in an ensemble carries an exceedingly high computation cost, which can become a hindrance when training large-scale systems. In this paper, we propose DNS: a Determinantal Point Process based Neural Network Sampler that uses a k-DPP to sample a subset of neural networks for backpropagation at every training step, significantly reducing training time and computation cost. We integrated DNS into REDQ for continuous control tasks and evaluated it on MuJoCo environments. Our experiments show that DNS-augmented REDQ matches baseline REDQ in average cumulative reward while using less than 50% of the computation, measured in FLOPS. The code is available at https://github.com/IntelLabs/DNS
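
To make the sampling step concrete, the sketch below illustrates, under stated assumptions and not as the authors' implementation, how a k-DPP can pick a diverse subset of ensemble members to backpropagate through at a single training step: a positive semi-definite similarity kernel L is built over the N Q-networks (here, hypothetically, from their predictions on a training batch), and one exact k-DPP sample of size k selects which members receive a gradient update. The kernel construction, shapes, and names (sample_k_dpp, q_preds, subset_size) are illustrative; the actual DNS sampler is defined in the paper and the repository above.

# Minimal NumPy-only sketch of k-DPP member selection for an ensemble (illustrative).
# Exact k-DPP sampling follows the standard eigendecomposition-based procedure.
import numpy as np

def elementary_symmetric(eigvals, k):
    """E[j, n] = e_j(lambda_1, ..., lambda_n), elementary symmetric polynomials."""
    N = len(eigvals)
    E = np.zeros((k + 1, N + 1))
    E[0, :] = 1.0
    for j in range(1, k + 1):
        for n in range(1, N + 1):
            E[j, n] = E[j, n - 1] + eigvals[n - 1] * E[j - 1, n - 1]
    return E

def sample_k_dpp(L, k, rng):
    """Draw one exact sample of size k from the k-DPP with likelihood kernel L."""
    eigvals, eigvecs = np.linalg.eigh(L)
    N = len(eigvals)
    E = elementary_symmetric(eigvals, k)

    # Phase 1: choose which eigenvectors define the elementary DPP.
    chosen, j = [], k
    for n in range(N, 0, -1):
        if j == 0:
            break
        if rng.random() < eigvals[n - 1] * E[j - 1, n - 1] / E[j, n]:
            chosen.append(n - 1)
            j -= 1
    V = eigvecs[:, chosen]

    # Phase 2: sample k items from the projection DPP spanned by V.
    selected = []
    while V.shape[1] > 0:
        probs = np.sum(V ** 2, axis=1)
        probs /= probs.sum()
        i = rng.choice(N, p=probs)
        selected.append(int(i))
        # Project the remaining columns onto the subspace with i-th coordinate zero.
        col = np.argmax(np.abs(V[i, :]))
        v = V[:, col].copy()
        V = np.delete(V, col, axis=1)
        V -= np.outer(v, V[i, :] / v[i])
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return selected

# Illustrative use inside an ensemble-RL training step: q_preds[i] stands in for the
# i-th Q-network's predictions on the current batch (shapes are hypothetical).
rng = np.random.default_rng(0)
num_members, batch_size, subset_size = 10, 256, 5
q_preds = rng.normal(size=(num_members, batch_size))
feats = q_preds / np.linalg.norm(q_preds, axis=1, keepdims=True)
L = feats @ feats.T + 1e-6 * np.eye(num_members)   # PSD similarity kernel over members
to_update = sample_k_dpp(L, subset_size, rng)
# ...backpropagate only through ensemble members listed in `to_update`...
print("members selected for this step's gradient update:", sorted(to_update))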

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-sheikh22a, title = {{DNS}: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning}, author = {Sheikh, Hassam and Frisbee, Kizza and Phielipp, Mariano}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, pages = {19731--19746}, year = {2022}, editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan}, volume = {162}, series = {Proceedings of Machine Learning Research}, month = {17--23 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v162/sheikh22a/sheikh22a.pdf}, url = {https://proceedings.mlr.press/v162/sheikh22a.html}, abstract = {The application of an ensemble of neural networks is becoming an imminent tool for advancing state-of-the-art deep reinforcement learning algorithms. However, training these large numbers of neural networks in the ensemble has an exceedingly high computation cost which may become a hindrance in training large-scale systems. In this paper, we propose DNS: a Determinantal Point Process based Neural Network Sampler that specifically uses k-DPP to sample a subset of neural networks for backpropagation at every training step thus significantly reducing the training time and computation cost. We integrated DNS in REDQ for continuous control tasks and evaluated on MuJoCo environments. Our experiments show that DNS augmented REDQ matches the baseline REDQ in terms of average cumulative reward and achieves this using less than 50% computation when measured in FLOPS. The code is available at https://github.com/IntelLabs/DNS} }
Endnote
%0 Conference Paper %T DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning %A Hassam Sheikh %A Kizza Frisbee %A Mariano Phielipp %B Proceedings of the 39th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Kamalika Chaudhuri %E Stefanie Jegelka %E Le Song %E Csaba Szepesvari %E Gang Niu %E Sivan Sabato %F pmlr-v162-sheikh22a %I PMLR %P 19731--19746 %U https://proceedings.mlr.press/v162/sheikh22a.html %V 162 %X The application of an ensemble of neural networks is becoming an imminent tool for advancing state-of-the-art deep reinforcement learning algorithms. However, training these large numbers of neural networks in the ensemble has an exceedingly high computation cost which may become a hindrance in training large-scale systems. In this paper, we propose DNS: a Determinantal Point Process based Neural Network Sampler that specifically uses k-DPP to sample a subset of neural networks for backpropagation at every training step thus significantly reducing the training time and computation cost. We integrated DNS in REDQ for continuous control tasks and evaluated on MuJoCo environments. Our experiments show that DNS augmented REDQ matches the baseline REDQ in terms of average cumulative reward and achieves this using less than 50% computation when measured in FLOPS. The code is available at https://github.com/IntelLabs/DNS
APA
Sheikh, H., Frisbee, K. & Phielipp, M. (2022). DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:19731-19746. Available from https://proceedings.mlr.press/v162/sheikh22a.html.
