SeqMatchNet: Contrastive Learning with Sequence Matching for Place Recognition & Relocalization
Proceedings of the 5th Conference on Robot Learning, PMLR 164:429-443, 2022.
Visual Place Recognition (VPR) for mobile robot global relocalization is a well-studied problem, where contrastive learning based representation training methods have led to state-of-the-art performance. However, these methods are mainly designed for single image based VPR, where sequential information, which is ubiquitous in robotics, is only used as a post-processing step for filtering single image match scores, but is never used to guide the representation learning process itself. In this work, for the first time, we bridge the gap between single image representation learning and sequence matching through "SeqMatchNet" which transforms the single image descriptors such that they become more responsive to the sequence matching metric. We propose a novel triplet loss formulation where the distance metric is based on "sequence matching", that is, the aggregation of temporal order-based Euclidean distances computed using single images. We use the same metric for mining negatives online during the training which helps the optimization process by selecting appropriate positives and harder negatives. To overcome the computational overhead of sequence matching for negative mining, we propose a 2D convolution based formulation of sequence matching for efficiently aggregating distances within a distance matrix computed using single images. We show that our proposed method achieves consistent gains in performance as demonstrated on four benchmark datasets. Source code available at https://github.com/oravus/SeqMatchNet.