CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations

Gengchen Mai, Ni Lao, Yutong He, Jiaming Song, Stefano Ermon
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:23498-23515, 2023.

Abstract

Geo-tagged images are publicly available in large quantities, whereas labels such as object classes are rather scarce and expensive to collect. Meanwhile, contrastive learning has achieved tremendous success in various natural image and language tasks with limited labeled data. However, existing methods fail to fully leverage geospatial information, which can be paramount to distinguishing objects that are visually similar. To directly leverage the abundant geospatial information associated with images in pre-training, fine-tuning, and inference stages, we present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images, which can be transferred to downstream supervised tasks such as image classification. Experiments show that CSP can improve model performance on both iNat2018 and fMoW datasets. Especially, on iNat2018, CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-mai23a, title = {{CSP}: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations}, author = {Mai, Gengchen and Lao, Ni and He, Yutong and Song, Jiaming and Ermon, Stefano}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {23498--23515}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/mai23a/mai23a.pdf}, url = {https://proceedings.mlr.press/v202/mai23a.html}, abstract = {Geo-tagged images are publicly available in large quantities, whereas labels such as object classes are rather scarce and expensive to collect. Meanwhile, contrastive learning has achieved tremendous success in various natural image and language tasks with limited labeled data. However, existing methods fail to fully leverage geospatial information, which can be paramount to distinguishing objects that are visually similar. To directly leverage the abundant geospatial information associated with images in pre-training, fine-tuning, and inference stages, we present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images, which can be transferred to downstream supervised tasks such as image classification. Experiments show that CSP can improve model performance on both iNat2018 and fMoW datasets. Especially, on iNat2018, CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.} }
Endnote
%0 Conference Paper %T CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations %A Gengchen Mai %A Ni Lao %A Yutong He %A Jiaming Song %A Stefano Ermon %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-mai23a %I PMLR %P 23498--23515 %U https://proceedings.mlr.press/v202/mai23a.html %V 202 %X Geo-tagged images are publicly available in large quantities, whereas labels such as object classes are rather scarce and expensive to collect. Meanwhile, contrastive learning has achieved tremendous success in various natural image and language tasks with limited labeled data. However, existing methods fail to fully leverage geospatial information, which can be paramount to distinguishing objects that are visually similar. To directly leverage the abundant geospatial information associated with images in pre-training, fine-tuning, and inference stages, we present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images, which can be transferred to downstream supervised tasks such as image classification. Experiments show that CSP can improve model performance on both iNat2018 and fMoW datasets. Especially, on iNat2018, CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
APA
Mai, G., Lao, N., He, Y., Song, J. & Ermon, S.. (2023). CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:23498-23515 Available from https://proceedings.mlr.press/v202/mai23a.html.

Related Material