Distilling Policy Distillation

Wojciech M. Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant Jayakumar, Grzegorz Swirszcz, Max Jaderberg
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1331-1340, 2019.

Abstract

The transfer of knowledge from one policy to another is an important tool in Deep Reinforcement Learning. This process, referred to as distillation, has been used to great success, for example, by enhancing the optimisation of agents, leading to stronger performance, achieved faster, on harder domains. Despite the widespread use and conceptual simplicity of distillation, many different formulations are used in practice, and the subtle variations between them can often drastically change the performance and the resulting objective that is being optimised. In this work, we rigorously explore the entire landscape of policy distillation, comparing the motivations and strengths of each variant through theoretical and empirical analysis. Our results point to three distillation techniques that are preferred depending on the specifics of the task. Specifically, a newly proposed expected entropy regularised distillation allows for quicker learning in a wide range of situations, while still guaranteeing convergence.
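
For readers unfamiliar with the setup, the sketch below illustrates one common form of entropy-regularised policy distillation for a single state with a discrete action space: the student is pulled towards the teacher's action distribution by a KL term, while an entropy bonus keeps the student policy stochastic. The function name, coefficient, and per-state objective are illustrative assumptions for exposition only; they are not the paper's specific expected-entropy-regularised formulation.

import numpy as np

def softmax(logits):
    """Numerically stable softmax over action logits."""
    z = logits - np.max(logits)
    p = np.exp(z)
    return p / p.sum()

def distillation_loss(teacher_logits, student_logits, entropy_coeff=0.01):
    """Illustrative per-state distillation objective (not the paper's exact form):
    KL(teacher || student) minus an entropy bonus on the student policy."""
    p_t = softmax(np.asarray(teacher_logits, dtype=np.float64))
    p_s = softmax(np.asarray(student_logits, dtype=np.float64))

    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))   # pull student towards teacher
    entropy = -np.sum(p_s * np.log(p_s))             # keep student stochastic

    # Minimising this over visited states distils the teacher while the entropy
    # term discourages premature collapse to a deterministic policy.
    return kl - entropy_coeff * entropy

# Example: a confident teacher versus an untrained (uniform) student.
print(distillation_loss([2.0, 0.5, -1.0], [0.0, 0.0, 0.0]))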

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-czarnecki19a,
  title     = {Distilling Policy Distillation},
  author    = {Czarnecki, Wojciech M. and Pascanu, Razvan and Osindero, Simon and Jayakumar, Siddhant and Swirszcz, Grzegorz and Jaderberg, Max},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {1331--1340},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/czarnecki19a/czarnecki19a.pdf},
  url       = {https://proceedings.mlr.press/v89/czarnecki19a.html}
}
Endnote
%0 Conference Paper
%T Distilling Policy Distillation
%A Wojciech M. Czarnecki
%A Razvan Pascanu
%A Simon Osindero
%A Siddhant Jayakumar
%A Grzegorz Swirszcz
%A Max Jaderberg
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-czarnecki19a
%I PMLR
%P 1331--1340
%U https://proceedings.mlr.press/v89/czarnecki19a.html
%V 89
APA
Czarnecki, W.M., Pascanu, R., Osindero, S., Jayakumar, S., Swirszcz, G. & Jaderberg, M. (2019). Distilling Policy Distillation. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1331-1340. Available from https://proceedings.mlr.press/v89/czarnecki19a.html.