Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs

Mustafa Munir, Alex Zhang, Radu Marculescu
Proceedings of the Third Learning on Graphs Conference, PMLR 269:37:1-37:13, 2025.

Abstract

Vision graph neural networks (ViG) have demonstrated promise in vision tasks as a competitive alternative to conventional convolutional neural networks (CNNs) and vision transformers (ViTs); however, common graph construction methods, such as k-nearest neighbor (KNN), can be expensive on larger images. While methods such as Sparse Vision Graph Attention (SVGA) have shown promise, SVGA's fixed step scale can lead to over-squashing and to requiring multiple short-range connections to recover the information that a single long-range link would provide. Based on this observation, we propose a new graph construction method, Logarithmic Scalable Graph Construction (LSGC), which enhances performance while limiting the number of long-range links. To this end, we propose LogViG, a novel hybrid CNN-GNN model that utilizes LSGC. Furthermore, inspired by the successes of multi-scale and high-resolution architectures, we introduce a high-resolution branch and fuse features between the high-resolution and low-resolution branches, yielding a multi-scale, high-resolution Vision GNN network. Extensive experiments show that LogViG outperforms existing ViG, CNN, and ViT architectures in terms of accuracy, GMACs, and parameters on image classification and semantic segmentation tasks. Our smallest model, Ti-LogViG, achieves an average top-1 accuracy on ImageNet-1K of 79.9% (standard deviation ±0.2%), 1.7% higher average accuracy than Vision GNN with a 24.3% reduction in parameters and a 35.3% reduction in GMACs. Our work shows that leveraging long-range links in graph construction for ViGs through our proposed LSGC can exceed the performance of current state-of-the-art ViGs.
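To make the idea of logarithmically scaled graph construction concrete, the following minimal Python sketch (our own illustration under stated assumptions, not the paper's LSGC or SVGA implementation; the function name log_scaled_edges, the row-major indexing, and the border handling are assumptions) connects each token on an H x W grid to tokens at power-of-two offsets along its row and column, so the number of long-range links per node grows logarithmically with grid size instead of linearly with a fixed step.

    # Illustrative sketch (not the authors' exact LSGC): each token on an
    # H x W grid links to tokens at power-of-two offsets along its row and
    # column, giving O(log H + log W) long-range links per token.
    import math

    def log_scaled_edges(height: int, width: int):
        """Return directed (source, target) index pairs on an H x W grid.

        Tokens are indexed row-major: idx = y * width + x.
        """
        edges = []
        row_offsets = [2 ** k for k in range(int(math.log2(max(width - 1, 1))) + 1)]
        col_offsets = [2 ** k for k in range(int(math.log2(max(height - 1, 1))) + 1)]
        for y in range(height):
            for x in range(width):
                src = y * width + x
                for d in row_offsets:  # horizontal links at offsets 1, 2, 4, ...
                    if x + d < width:
                        edges.append((src, y * width + (x + d)))
                for d in col_offsets:  # vertical links at offsets 1, 2, 4, ...
                    if y + d < height:
                        edges.append((src, (y + d) * width + x))
        return edges

    if __name__ == "__main__":
        # Example: a 14 x 14 token grid (e.g., a 224 x 224 image with 16 x 16 patches)
        e = log_scaled_edges(14, 14)
        print(len(e), "directed edges,", len(e) / (14 * 14), "per token on average")

On a 14 x 14 grid this yields at most eight outgoing links per token (four horizontal, four vertical), consistent with the abstract's goal of capturing long-range information while limiting the number of long-range links; the paper's LSGC may choose offsets and handle borders differently.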

Cite this Paper


BibTeX
@InProceedings{pmlr-v269-munir25a,
  title     = {Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs},
  author    = {Munir, Mustafa and Zhang, Alex and Marculescu, Radu},
  booktitle = {Proceedings of the Third Learning on Graphs Conference},
  pages     = {37:1--37:13},
  year      = {2025},
  editor    = {Wolf, Guy and Krishnaswamy, Smita},
  volume    = {269},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--29 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v269/main/assets/munir25a/munir25a.pdf},
  url       = {https://proceedings.mlr.press/v269/munir25a.html},
  abstract  = {Vision graph neural networks (ViG) have demonstrated promise in vision tasks as a competitive alternative to conventional convolutional neural nets (CNN) and transformers (ViTs); however, common graph construction methods, such as k-nearest neighbor (KNN), can be expensive on larger images. While methods such as Sparse Vision Graph Attention (SVGA) have shown promise, SVGA's fixed step scale can lead to over-squashing and missing multiple connections to gain the same information that could be gained from a long-range link. Through this observation, we propose a new graph construction method, Logarithmic Scalable Graph Construction (LSGC) to enhance performance by limiting the number of long-range links. To this end, we propose LogViG, a novel hybrid CNN-GNN model that utilizes LSGC. Furthermore, inspired by the successes of multi-scale and high-resolution architectures, we introduce and apply a high-resolution branch and fuse features between our high-resolution and low-resolution branches for a multi-scale high-resolution Vision GNN network. Extensive experiments show that LogViG beats existing ViG, CNN, and ViT architectures in terms of accuracy, GMACs, and parameters on image classification and semantic segmentation tasks. Our smallest model, Ti-LogViG, achieves an average top-1 accuracy on ImageNet-1K of 79.9\% with a standard deviation of $\pm$0.2\%, 1.7\% higher average accuracy than Vision GNN with a 24.3\% reduction in parameters and 35.3\% reduction in GMACs. Our work shows that leveraging long-range links in graph construction for ViGs through our proposed LSGC can exceed the performance of current state-of-the-art ViGs.}
}
Endnote
%0 Conference Paper
%T Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs
%A Mustafa Munir
%A Alex Zhang
%A Radu Marculescu
%B Proceedings of the Third Learning on Graphs Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Guy Wolf
%E Smita Krishnaswamy
%F pmlr-v269-munir25a
%I PMLR
%P 37:1--37:13
%U https://proceedings.mlr.press/v269/munir25a.html
%V 269
%X Vision graph neural networks (ViG) have demonstrated promise in vision tasks as a competitive alternative to conventional convolutional neural nets (CNN) and transformers (ViTs); however, common graph construction methods, such as k-nearest neighbor (KNN), can be expensive on larger images. While methods such as Sparse Vision Graph Attention (SVGA) have shown promise, SVGA's fixed step scale can lead to over-squashing and missing multiple connections to gain the same information that could be gained from a long-range link. Through this observation, we propose a new graph construction method, Logarithmic Scalable Graph Construction (LSGC) to enhance performance by limiting the number of long-range links. To this end, we propose LogViG, a novel hybrid CNN-GNN model that utilizes LSGC. Furthermore, inspired by the successes of multi-scale and high-resolution architectures, we introduce and apply a high-resolution branch and fuse features between our high-resolution and low-resolution branches for a multi-scale high-resolution Vision GNN network. Extensive experiments show that LogViG beats existing ViG, CNN, and ViT architectures in terms of accuracy, GMACs, and parameters on image classification and semantic segmentation tasks. Our smallest model, Ti-LogViG, achieves an average top-1 accuracy on ImageNet-1K of 79.9% with a standard deviation of ±0.2%, 1.7% higher average accuracy than Vision GNN with a 24.3% reduction in parameters and 35.3% reduction in GMACs. Our work shows that leveraging long-range links in graph construction for ViGs through our proposed LSGC can exceed the performance of current state-of-the-art ViGs.
APA
Munir, M., Zhang, A. & Marculescu, R. (2025). Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs. Proceedings of the Third Learning on Graphs Conference, in Proceedings of Machine Learning Research 269:37:1-37:13. Available from https://proceedings.mlr.press/v269/munir25a.html.
