Multi-branch Siamese Network for High Performance Online Visual Tracking
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:519-534, 2019.
Recently, Siamese networks have drawn great attention in the visual tracking community because of their balanced accuracy and speed. However, most existing Siamese frameworks describe the target appearance using a global pattern from the last layer, which makes them highly sensitive to similar distractors, non-rigid appearance change, and partial occlusion. To address these issues, we propose a Multi-branch Siamese network (MSiam) for high-performance object tracking. MSiam performs layer-wise feature aggregation and jointly considers global and local patterns for more accurate target tracking. In particular, we propose a feature aggregation module (FAM) that preserves the heterogeneity of the three types of features, further improving the discriminability of MSiam by using both high-level semantic and low-level spatial information. To enhance adaptability to non-rigid appearance change and partial occlusion, a multi-scale local pattern detection module (LPDM) is designed to identify discriminative regions of the target object. By considering various combinations of these local structures, our tracker can form various types of structural patterns. Extensive evaluations on five benchmarks demonstrate that the proposed tracking algorithm performs favorably against state-of-the-art methods while running faster than real time.
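The general idea of matching a target template against a search region at several feature levels can be illustrated with a minimal NumPy sketch. This is not the FAM or LPDM described above, whose internals the abstract does not specify. It only shows plain Siamese-style cross-correlation per layer followed by a weighted sum of the normalised response maps. The function names, the fusion rule, and the assumption that all layers share one spatial resolution are hypothetical choices made for illustration.

```python
import numpy as np


def cross_correlate(template, search):
    """Slide a (C, h, w) template over a (C, H, W) search feature map.

    Returns an (H - h + 1, W - w + 1) map of inner products.
    """
    _, h, w = template.shape
    _, H, W = search.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return out


def aggregate_responses(templates, searches, weights):
    """Weighted sum of per-layer response maps.

    Hypothetical fusion rule: each map is standardised first so that a
    layer with larger activations does not dominate the sum. All layers
    are assumed to have the same spatial size.
    """
    maps = [cross_correlate(t, s) for t, s in zip(templates, searches)]
    maps = [(m - m.mean()) / (m.std() + 1e-8) for m in maps]
    return sum(wt * m for wt, m in zip(weights, maps))


def locate_peak(response):
    """Return the (row, col) index of the strongest response."""
    return np.unravel_index(np.argmax(response), response.shape)


# Toy usage: random "features" at three levels with different channel counts.
rng = np.random.default_rng(0)
searches = [rng.standard_normal((c, 16, 16)) for c in (4, 8, 16)]
# The template is a crop of the search features with its corner at row 3, col 5.
templates = [s[:, 3:7, 5:9].copy() for s in searches]
response = aggregate_responses(templates, searches, [1 / 3, 1 / 3, 1 / 3])
peak = locate_peak(response)
```

In this toy setup the peak of the fused response is expected to fall at the crop origin, row 3 and column 5. A real tracker would replace the random arrays with convolutional features from a shared backbone and would typically learn the fusion rather than fix equal weights.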