Graph neural networks extrapolate out-of-distribution for shortest paths

Robert R. Nerem, Samantha Chen, Sanjoy Dasgupta, Yusu Wang
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5273-5331, 2026.

Abstract

Neural networks (NNs), despite their success and wide adoption, still struggle to extrapolate out-of-distribution (OOD), i.e., to inputs that are not well-represented by their training dataset. Addressing the OOD generalization gap is crucial when models are deployed in environments significantly different from the training set, such as applying Graph Neural Networks (GNNs) trained on small graphs to large, real-world graphs. One promising approach for achieving robust OOD generalization is the framework of neural algorithmic alignment, which incorporates ideas from classical algorithms by designing neural architectures that resemble specific algorithmic paradigms (e.g. dynamic programming). The hope is that trained models of this form would have superior OOD capabilities, in much the same way that classical algorithms work for all instances. We employ sparsity regularization as a tool for analyzing the role of algorithmic alignment in achieving OOD generalization, focusing on graph neural networks (GNNs) applied to the canonical shortest path problem. We prove that if a trained GNN minimizes a sparsity-regularized loss over a small set of shortest-path instances, then the GNN implements $K$ steps of the Bellman-Ford algorithm for shortest paths. In fact, if a trained GNN minimizes this loss within an error of $\epsilon$, it computes $K$-step shortest path distances up to error $O(\epsilon)$. Our empirical results support our theory by showing that NNs trained by gradient descent are able to minimize this loss and extrapolate in practice.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-nerem26a, title = {Graph neural networks extrapolate out-of-distribution for shortest paths}, author = {Nerem, Robert R. and Chen, Samantha and Dasgupta, Sanjoy and Wang, Yusu}, booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory}, pages = {5273--5331}, year = {2026}, editor = {Hanneke, Steve and Lattimore, Tor}, volume = {336}, series = {Proceedings of Machine Learning Research}, month = {29 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/nerem26a/nerem26a.pdf}, url = {https://proceedings.mlr.press/v336/nerem26a.html}, abstract = {Neural networks (NNs), despite their success and wide adoption, still struggle to extrapolate out-of-distribution (OOD), i.e., to inputs that are not well-represented by their training dataset. Addressing the OOD generalization gap is crucial when models are deployed in environments significantly different from the training set, such as applying Graph Neural Networks (GNNs) trained on small graphs to large, real-world graphs. One promising approach for achieving robust OOD generalization is the framework of neural algorithmic alignment, which incorporates ideas from classical algorithms by designing neural architectures that resemble specific algorithmic paradigms (e.g. dynamic programming). The hope is that trained models of this form would have superior OOD capabilities, in much the same way that classical algorithms work for all instances. We employ sparsity regularization as a tool for analyzing the role of algorithmic alignment in achieving OOD generalization, focusing on graph neural networks (GNNs) applied to the canonical shortest path problem. We prove that if a trained GNN minimizes a sparsity-regularized loss over a small set of shortest-path instances, then the GNN implements $K$ steps of the Bellman-Ford algorithm for shortest paths. In fact, if a trained GNN minimizes this loss within an error of $\epsilon$, it computes $K$-step shortest path distances up to error $O(\epsilon)$. Our empirical results support our theory by showing that NNs trained by gradient descent are able to minimize this loss and extrapolate in practice.} }
Endnote
%0 Conference Paper %T Graph neural networks extrapolate out-of-distribution for shortest paths %A Robert R. Nerem %A Samantha Chen %A Sanjoy Dasgupta %A Yusu Wang %B Proceedings of Thirty Ninth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2026 %E Steve Hanneke %E Tor Lattimore %F pmlr-v336-nerem26a %I PMLR %P 5273--5331 %U https://proceedings.mlr.press/v336/nerem26a.html %V 336 %X Neural networks (NNs), despite their success and wide adoption, still struggle to extrapolate out-of-distribution (OOD), i.e., to inputs that are not well-represented by their training dataset. Addressing the OOD generalization gap is crucial when models are deployed in environments significantly different from the training set, such as applying Graph Neural Networks (GNNs) trained on small graphs to large, real-world graphs. One promising approach for achieving robust OOD generalization is the framework of neural algorithmic alignment, which incorporates ideas from classical algorithms by designing neural architectures that resemble specific algorithmic paradigms (e.g. dynamic programming). The hope is that trained models of this form would have superior OOD capabilities, in much the same way that classical algorithms work for all instances. We employ sparsity regularization as a tool for analyzing the role of algorithmic alignment in achieving OOD generalization, focusing on graph neural networks (GNNs) applied to the canonical shortest path problem. We prove that if a trained GNN minimizes a sparsity-regularized loss over a small set of shortest-path instances, then the GNN implements $K$ steps of the Bellman-Ford algorithm for shortest paths. In fact, if a trained GNN minimizes this loss within an error of $\epsilon$, it computes $K$-step shortest path distances up to error $O(\epsilon)$. Our empirical results support our theory by showing that NNs trained by gradient descent are able to minimize this loss and extrapolate in practice.
APA
Nerem, R.R., Chen, S., Dasgupta, S. & Wang, Y.. (2026). Graph neural networks extrapolate out-of-distribution for shortest paths. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:5273-5331 Available from https://proceedings.mlr.press/v336/nerem26a.html.

Related Material