Highway Value Iteration Networks

Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:50807-50821, 2024.

Abstract

Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration—a recent algorithm designed to facilitate long-term credit assignment—into the structure of VINs. This improvement augments the "planning module" of the VIN with three additional components: 1) an "aggregate gate," which constructs skip connections to improve information flow across many layers; 2) an "exploration module," crafted to increase the diversity of information and gradient flow in spatial dimensions; 3) a "filter gate" designed to ensure safe exploration. The resulting novel highway VIN can be trained effectively with hundreds of layers using standard backpropagation. In long-term planning tasks requiring hundreds of planning steps, deep highway VINs outperform both traditional VINs and several advanced, very deep NNs.
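For orientation, the value iteration algorithm that the abstract says the "planning module" approximates can be sketched as below. This is plain tabular value iteration, not the highway VIN proposed in the paper, and it has no learned parameters. The grid world, the four deterministic moves, the rule that a move off the grid leaves the agent in place, and the names `value_iteration`, `reward`, `gamma`, and `num_iters` are all illustrative assumptions rather than details taken from the paper.

```python
def value_iteration(reward, gamma=0.9, num_iters=50):
    """Tabular value iteration on a 2D grid (illustrative assumptions only).

    reward[i][j] is a hypothetical reward collected on landing in cell (i, j).
    Returns the value table after num_iters Bellman backups.
    """
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    for _ in range(num_iters):
        new_value = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                best = float("-inf")
                for di, dj in moves:
                    # A move off the grid leaves the agent where it is.
                    ni = min(max(i + di, 0), rows - 1)
                    nj = min(max(j + dj, 0), cols - 1)
                    # Bellman backup: reward of the next cell plus its
                    # discounted value, maximized over the four moves.
                    best = max(best, reward[ni][nj] + gamma * value[ni][nj])
                new_value[i][j] = best
        value = new_value
    return value


if __name__ == "__main__":
    # One-row grid with a single rewarding cell at the right end.
    print(value_iteration([[0.0, 0.0, 1.0]], gamma=0.5, num_iters=60))
```

Each sweep of the outer loop moves value information by only one cell, so a reward that lies hundreds of cells away needs hundreds of sweeps to reach the start state. That is one way to read the abstract's link between long-term planning and very deep planning modules.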

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24ai,
  title     = {Highway Value Iteration Networks},
  author    = {Wang, Yuhui and Li, Weida and Faccio, Francesco and Wu, Qingyuan and Schmidhuber, J\"{u}rgen},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {50807--50821},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24ai/wang24ai.pdf},
  url       = {https://proceedings.mlr.press/v235/wang24ai.html},
  abstract  = {Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration—a recent algorithm designed to facilitate long-term credit assignment—into the structure of VINs. This improvement augments the "planning module" of the VIN with three additional components: 1) an "aggregate gate," which constructs skip connections to improve information flow across many layers; 2) an "exploration module," crafted to increase the diversity of information and gradient flow in spatial dimensions; 3) a "filter gate" designed to ensure safe exploration. The resulting novel highway VIN can be trained effectively with hundreds of layers using standard backpropagation. In long-term planning tasks requiring hundreds of planning steps, deep highway VINs outperform both traditional VINs and several advanced, very deep NNs.}
}
Endnote
%0 Conference Paper
%T Highway Value Iteration Networks
%A Yuhui Wang
%A Weida Li
%A Francesco Faccio
%A Qingyuan Wu
%A Jürgen Schmidhuber
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wang24ai
%I PMLR
%P 50807--50821
%U https://proceedings.mlr.press/v235/wang24ai.html
%V 235
%X Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration—a recent algorithm designed to facilitate long-term credit assignment—into the structure of VINs. This improvement augments the "planning module" of the VIN with three additional components: 1) an "aggregate gate," which constructs skip connections to improve information flow across many layers; 2) an "exploration module," crafted to increase the diversity of information and gradient flow in spatial dimensions; 3) a "filter gate" designed to ensure safe exploration. The resulting novel highway VIN can be trained effectively with hundreds of layers using standard backpropagation. In long-term planning tasks requiring hundreds of planning steps, deep highway VINs outperform both traditional VINs and several advanced, very deep NNs.
APA
Wang, Y., Li, W., Faccio, F., Wu, Q., & Schmidhuber, J. (2024). Highway Value Iteration Networks. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:50807-50821. Available from https://proceedings.mlr.press/v235/wang24ai.html.
