LatentGNN: Learning Efficient Non-local Relations for Visual Recognition

Songyang Zhang, Xuming He, Shipeng Yan
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7374-7383, 2019.

Abstract

Capturing long-range dependencies in feature representations is crucial for many visual recognition tasks. Despite recent successes of deep convolutional networks, it remains challenging to model non-local context relations between visual features. A promising strategy is to model the feature context by a fully-connected graph neural network (GNN), which augments traditional convolutional features with an estimated non-local context representation. However, most GNN-based approaches require computing a dense graph affinity matrix and hence have difficulty in scaling up to tackle complex real-world visual problems. In this work, we propose an efficient yet flexible non-local relation representation based on a novel class of graph neural networks. Our key idea is to introduce a latent space to reduce the complexity of the graph, which allows us to use a low-rank representation for the graph affinity matrix and to achieve linear complexity in computation. Extensive experimental evaluations on three major visual recognition tasks show that our method outperforms prior works by a large margin while maintaining a low computation cost.
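The central trick described above, replacing the dense N x N affinity matrix with message passing through a small set of latent nodes, can be sketched in a few lines. The code below is a hypothetical minimal illustration (not the authors' implementation): `Psi` is an assumed N x d feature-to-latent affinity, so aggregating into and back out of the d latent nodes costs O(NdC) rather than the O(N^2 C) of a dense non-local block.

```python
import numpy as np

def latent_nonlocal(X, Psi):
    """Low-rank non-local aggregation via latent nodes (illustrative sketch).

    X:   (N, C) visual features at N locations.
    Psi: (N, d) feature-to-latent affinities, d << N (assumed given here;
         in practice such weights would be learned).
    """
    # Step 1: aggregate all location features into d latent nodes -> (d, C).
    Z = Psi.T @ X
    # Step 2: (a propagation step among latent nodes could go here;
    # omitted for brevity, i.e. identity propagation.)
    # Step 3: broadcast latent context back to every location -> (N, C).
    return Psi @ Z

rng = np.random.default_rng(0)
N, C, d = 1024, 64, 8
X = rng.standard_normal((N, C))
Psi = rng.standard_normal((N, d))
out = latent_nonlocal(X, Psi)
print(out.shape)  # (1024, 64)
```

By associativity, `Psi @ (Psi.T @ X)` equals `(Psi @ Psi.T) @ X`, i.e. the implicit N x N affinity matrix is the rank-d product `Psi @ Psi.T`, but it is never materialized, which is where the linear complexity comes from.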

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-zhang19f,
  title     = {{L}atent{GNN}: Learning Efficient Non-local Relations for Visual Recognition},
  author    = {Zhang, Songyang and He, Xuming and Yan, Shipeng},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {7374--7383},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/zhang19f/zhang19f.pdf},
  url       = {https://proceedings.mlr.press/v97/zhang19f.html}
}
Endnote
%0 Conference Paper
%T LatentGNN: Learning Efficient Non-local Relations for Visual Recognition
%A Songyang Zhang
%A Xuming He
%A Shipeng Yan
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-zhang19f
%I PMLR
%P 7374--7383
%U https://proceedings.mlr.press/v97/zhang19f.html
%V 97
APA
Zhang, S., He, X. & Yan, S. (2019). LatentGNN: Learning Efficient Non-local Relations for Visual Recognition. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7374-7383. Available from https://proceedings.mlr.press/v97/zhang19f.html.