End-to-End Learning of Multi-scale Convolutional Neural Network for Stereo Matching

Li Zhang, Quanhong Wang, Haihua Lu, Yong Zhao
Proceedings of The 10th Asian Conference on Machine Learning, PMLR 95:81-96, 2018.

Abstract

Deep neural networks have shown excellent performance on the stereo matching task. Recent CNN-based methods have shown that stereo matching can be formulated as a supervised learning task. However, less attention has been paid to the fusion of contextual semantic information and details. To tackle this problem, we propose a network for disparity estimation based on abundant contextual details and semantic information, called Multi-scale Features Network (MSFNet). First, we design a new structure that encodes rich semantic information and fine-grained details by fusing multi-scale features. We also combine the advantages of element-wise addition and concatenation, which helps merge semantic information with details. Second, a guidance mechanism is introduced so that the network automatically focuses more on unreliable regions. Third, we formulate the consistency check as an error map, obtained from the low-stage features with fine-grained details. Finally, we use the consistency check between the left feature and the synthetic left feature to refine the initial disparity. Experiments on the Scene Flow and KITTI 2015 benchmarks demonstrate that the proposed method achieves state-of-the-art performance.
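
The consistency check mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-channel maps stored as nested lists in place of learned feature maps, linear interpolation along each row, and border clamping. The function names `warp_right_to_left` and `error_map` are hypothetical. The idea shown is only the general one: sample the right view at column x - d to synthesize a left view, then take the absolute difference from the actual left view as an error map.

```python
from typing import List

Row = List[float]


def warp_right_to_left(right: List[Row], disparity: List[Row]) -> List[Row]:
    """Synthesize a left-view map by sampling the right map at x - d(y, x)."""
    height, width = len(right), len(right[0])
    synthetic = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # A left pixel at column x corresponds to column x - d on the right.
            src = x - disparity[y][x]
            # Assumption: clamp to the border so every pixel gets a value.
            src = min(max(src, 0.0), width - 1.0)
            x0 = int(src)  # src >= 0 after clamping, so int() floors
            x1 = min(x0 + 1, width - 1)
            w = src - x0
            # Linear interpolation between the two neighbouring columns.
            synthetic[y][x] = (1.0 - w) * right[y][x0] + w * right[y][x1]
    return synthetic


def error_map(left: List[Row], synthetic: List[Row]) -> List[Row]:
    """Per-pixel absolute difference between the left and synthetic left maps."""
    return [[abs(a - b) for a, b in zip(l_row, s_row)]
            for l_row, s_row in zip(left, synthetic)]


if __name__ == "__main__":
    right = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
    left = [[0.0, 0.0, 0.0, 1.0, 2.0, 3.0]]  # right shifted by 2 columns
    for d in (2.0, 1.0):
        disparity = [[d] * 6]
        print(d, error_map(left, warp_right_to_left(right, disparity)))
```

In this toy case a disparity of 2 reproduces the left row exactly, so the error map is zero everywhere, while a disparity of 1 leaves non-zero errors at the mismatched columns. Regions with large error are the ones a refinement stage would need to correct.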

Cite this Paper


BibTeX
@InProceedings{pmlr-v95-zhang18a,
  title = {End-to-End Learning of Multi-scale Convolutional Neural Network for Stereo Matching},
  author = {Zhang, Li and Wang, Quanhong and Lu, Haihua and Zhao, Yong},
  booktitle = {Proceedings of The 10th Asian Conference on Machine Learning},
  pages = {81--96},
  year = {2018},
  editor = {Zhu, Jun and Takeuchi, Ichiro},
  volume = {95},
  series = {Proceedings of Machine Learning Research},
  month = {14--16 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v95/zhang18a/zhang18a.pdf},
  url = {https://proceedings.mlr.press/v95/zhang18a.html},
  abstract = {Deep neural networks have shown excellent performance on the stereo matching task. Recent CNN-based methods have shown that stereo matching can be formulated as a supervised learning task. However, less attention has been paid to the fusion of contextual semantic information and details. To tackle this problem, we propose a network for disparity estimation based on abundant contextual details and semantic information, called Multi-scale Features Network (MSFNet). First, we design a new structure that encodes rich semantic information and fine-grained details by fusing multi-scale features. We also combine the advantages of element-wise addition and concatenation, which helps merge semantic information with details. Second, a guidance mechanism is introduced so that the network automatically focuses more on unreliable regions. Third, we formulate the consistency check as an error map, obtained from the low-stage features with fine-grained details. Finally, we use the consistency check between the left feature and the synthetic left feature to refine the initial disparity. Experiments on the Scene Flow and KITTI 2015 benchmarks demonstrate that the proposed method achieves state-of-the-art performance.}
}
Endnote
%0 Conference Paper
%T End-to-End Learning of Multi-scale Convolutional Neural Network for Stereo Matching
%A Li Zhang
%A Quanhong Wang
%A Haihua Lu
%A Yong Zhao
%B Proceedings of The 10th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jun Zhu
%E Ichiro Takeuchi
%F pmlr-v95-zhang18a
%I PMLR
%P 81--96
%U https://proceedings.mlr.press/v95/zhang18a.html
%V 95
%X Deep neural networks have shown excellent performance on the stereo matching task. Recent CNN-based methods have shown that stereo matching can be formulated as a supervised learning task. However, less attention has been paid to the fusion of contextual semantic information and details. To tackle this problem, we propose a network for disparity estimation based on abundant contextual details and semantic information, called Multi-scale Features Network (MSFNet). First, we design a new structure that encodes rich semantic information and fine-grained details by fusing multi-scale features. We also combine the advantages of element-wise addition and concatenation, which helps merge semantic information with details. Second, a guidance mechanism is introduced so that the network automatically focuses more on unreliable regions. Third, we formulate the consistency check as an error map, obtained from the low-stage features with fine-grained details. Finally, we use the consistency check between the left feature and the synthetic left feature to refine the initial disparity. Experiments on the Scene Flow and KITTI 2015 benchmarks demonstrate that the proposed method achieves state-of-the-art performance.
APA
Zhang, L., Wang, Q., Lu, H. & Zhao, Y. (2018). End-to-End Learning of Multi-scale Convolutional Neural Network for Stereo Matching. Proceedings of The 10th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 95:81-96. Available from https://proceedings.mlr.press/v95/zhang18a.html.
