Programming with a Differentiable Forth Interpreter

Matko Bošnjak, Tim Rocktäschel, Jason Naradowsky, Sebastian Riedel
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:547-556, 2017.

Abstract

Given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. In this paper, we consider the case of prior procedural knowledge for neural networks, such as knowing how a program should traverse a sequence, but not what local actions should be performed at each step. To this end, we present an end-to-end differentiable interpreter for the programming language Forth which enables programmers to write program sketches with slots that can be filled with behaviour trained from program input-output data. We can optimise this behaviour directly through gradient descent techniques on user-specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex behaviours such as sequence sorting and addition. When connected to outputs of an LSTM and trained jointly, our interpreter achieves state-of-the-art accuracy for end-to-end reasoning about quantities expressed in natural language stories.
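To make the notion of a "program sketch with slots" concrete, below is a minimal bubble-sort sketch in the style of the paper's ∂4 interpreter: an ordinary Forth word in which a slot of the form { observe ... -> choose ... } replaces the hand-written comparison. The slot reads the top two data-stack elements (D0 and D-1), and a trained network decides whether to apply NOP or SWAP; all remaining control flow is fixed by the programmer. This listing is adapted from the paper's sorting example and should be read as indicative of the slot notation rather than a verbatim reproduction.

    \ One bubble pass over the top of the data stack.
    \ The { observe ... -> choose ... } slot is the trainable part;
    \ everything else is fixed, hand-written Forth.
    : BUBBLE ( a1 ... an n -- one pass )
      DUP IF >R
        { observe D0 D-1 -> choose NOP SWAP }  \ learned comparator
        R> SWAP >R 1- BUBBLE R>
      ELSE
        DROP
      THEN ;

    \ Repeated passes yield a full sort once the slot is trained.
    : SORT ( a1 ... an n -- sorted )
      1- DUP 0 DO >R R@ BUBBLE R> LOOP DROP ;

Because the slot's output is a distribution over the listed words (here NOP and SWAP), the interpreter state remains differentiable end to end, so the slot can be trained by backpropagating a loss on the program's output, or jointly with an upstream network such as the LSTM mentioned in the abstract.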

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-bosnjak17a,
  title     = {Programming with a Differentiable Forth Interpreter},
  author    = {Matko Bo{\v{s}}njak and Tim Rockt{\"a}schel and Jason Naradowsky and Sebastian Riedel},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {547--556},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/bosnjak17a/bosnjak17a.pdf},
  url       = {https://proceedings.mlr.press/v70/bosnjak17a.html}
}
Endnote
%0 Conference Paper
%T Programming with a Differentiable Forth Interpreter
%A Matko Bošnjak
%A Tim Rocktäschel
%A Jason Naradowsky
%A Sebastian Riedel
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-bosnjak17a
%I PMLR
%P 547--556
%U https://proceedings.mlr.press/v70/bosnjak17a.html
%V 70
APA
Bošnjak, M., Rocktäschel, T., Naradowsky, J. & Riedel, S. (2017). Programming with a Differentiable Forth Interpreter. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:547-556. Available from https://proceedings.mlr.press/v70/bosnjak17a.html.
