Interference and Generalization in Temporal Difference Learning

Emmanuel Bengio, Joelle Pineau, Doina Precup
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:767-777, 2020.

Abstract

We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product of two different gradients, representing their alignment; this quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD($\lambda$) and regression tasks such as Monte-Carlo policy evaluation. We hope that these new findings can guide the future discovery of better bootstrapping methods.
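The central quantity in the abstract can be made concrete. Below is a minimal sketch, in our own notation rather than the paper's exact setup: it assumes a linear value function V(s) = w·s and measures interference between two TD(0) semi-gradient updates as their inner product. The function name, transitions, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)   # value-function parameters
gamma = 0.99             # discount factor

def td0_semi_gradient(w, s, r, s_next):
    """Semi-gradient of the TD(0) loss for V(s) = w @ s.
    The bootstrapped target r + gamma * V(s') is held fixed."""
    delta = r + gamma * (w @ s_next) - (w @ s)  # TD error
    return -delta * s   # gradient of 0.5 * delta**2 w.r.t. w, target treated as constant

# Two transitions (s, r, s') sampled arbitrarily for illustration.
t1 = (rng.normal(size=4), 1.0, rng.normal(size=4))
t2 = (rng.normal(size=4), 0.0, rng.normal(size=4))

g1 = td0_semi_gradient(w, *t1)
g2 = td0_semi_gradient(w, *t2)

# Interference: positive means the two updates reinforce each other,
# negative means they conflict, near zero means they barely interact.
rho = g1 @ g2
print(f"interference rho = {rho:.4f}")
```

Under this reading, the paper's "low-interference, under-generalizing" regime corresponds to rho hovering near zero across pairs of states: updates stop sharing information, so learning on one state tells the network little about others.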

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-bengio20a,
  title     = {Interference and Generalization in Temporal Difference Learning},
  author    = {Bengio, Emmanuel and Pineau, Joelle and Precup, Doina},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {767--777},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/bengio20a/bengio20a.pdf},
  url       = {https://proceedings.mlr.press/v119/bengio20a.html},
  abstract  = {We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product of two different gradients, representing their alignment; this quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD($\lambda$) and regression tasks such as Monte-Carlo policy evaluation. We hope that these new findings can guide the future discovery of better bootstrapping methods.}
}
Endnote
%0 Conference Paper
%T Interference and Generalization in Temporal Difference Learning
%A Emmanuel Bengio
%A Joelle Pineau
%A Doina Precup
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-bengio20a
%I PMLR
%P 767--777
%U https://proceedings.mlr.press/v119/bengio20a.html
%V 119
%X We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product of two different gradients, representing their alignment; this quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD($\lambda$) and regression tasks such as Monte-Carlo policy evaluation. We hope that these new findings can guide the future discovery of better bootstrapping methods.
APA
Bengio, E., Pineau, J. & Precup, D. (2020). Interference and Generalization in Temporal Difference Learning. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:767-777. Available from https://proceedings.mlr.press/v119/bengio20a.html.