A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation

Mohamed Elhoseiny; Tarek El-Gaaly; Amr Bakry; Ahmed Elgammal

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:888-897, 2016.

Abstract

In the object recognition task, there is a dichotomy between categorizing objects and estimating their pose: the former requires a view-invariant representation, while the latter requires a representation that captures pose information across different object categories. With the rise of deep architectures, the main focus has been on object category recognition, where deep learning methods have achieved wide success. In contrast, object pose estimation with these approaches has received relatively little attention. In this work, we study how Convolutional Neural Network (CNN) architectures can be adapted to the task of simultaneous object recognition and pose estimation. We investigate and analyze the layers of various CNN models and compare them extensively, with the goal of discovering how the distributed representations in CNN layers encode object pose information and how this conflicts with object category representations. We experiment extensively on two recent large and challenging multi-view datasets and achieve results better than the state of the art.
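As an illustration only, the tension described above can be made concrete with a generic multi-task objective: one shared feature vector feeds two heads, one classifying the object category and one classifying a discretized pose, and the two cross-entropy terms are mixed by a weight. The pure-Python sketch below assumes hypothetical linear softmax heads, left-aligned azimuth bins, and a hypothetical mixing weight `lam`; the names `joint_loss`, `azimuth_to_bin`, `linear` and `Head` are invented for this example. It is not the architecture or loss reported in the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical head type: (weight matrix given row by row, bias vector).
Head = Tuple[Sequence[Sequence[float]], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(head: Head, x: Sequence[float]) -> List[float]:
    """Compute W x + b for a hypothetical linear head."""
    weights, bias = head
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def azimuth_to_bin(azimuth_deg: float, num_bins: int) -> int:
    """Map an azimuth in degrees to one of num_bins equal-width bins.

    Assumption: left-aligned bins, so bin 0 covers [0, 360/num_bins) degrees.
    """
    width = 360.0 / num_bins
    return int((azimuth_deg % 360.0) // width) % num_bins


def joint_loss(features, cat_head, pose_head, cat_label, pose_bin, lam=0.5):
    """Weighted sum of category and pose cross-entropy on one shared feature vector.

    lam is a hypothetical mixing weight: lam=1 keeps only the category term,
    lam=0 keeps only the pose term.
    """
    cat_probs = softmax(linear(cat_head, features))
    pose_probs = softmax(linear(pose_head, features))
    cat_ce = -math.log(cat_probs[cat_label])
    pose_ce = -math.log(pose_probs[pose_bin])
    loss = lam * cat_ce + (1.0 - lam) * pose_ce
    return loss, cat_probs, pose_probs


if __name__ == "__main__":
    features = [1.0, 0.0]                              # toy shared feature vector
    cat_head = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # 2 toy categories
    pose_head = ([[0.0, 0.0]] * 4, [0.0] * 4)          # 4 azimuth bins, uniform output
    pose_bin = azimuth_to_bin(100.0, 4)                # 100 degrees falls in bin 1
    loss, cat_probs, pose_probs = joint_loss(features, cat_head, pose_head, 0, pose_bin)
    print(pose_bin, round(loss, 4))                    # prints: 1 0.8498
```

Moving `lam` toward 1 weights the view-invariant category objective more heavily, while moving it toward 0 favors the pose objective; both terms are computed from the same features, which is the kind of competition between representations the abstract refers to.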

Cite this Paper

BibTeX

@InProceedings{pmlr-v48-elhoseiny16,
  title     = {A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation},
  author    = {Elhoseiny, Mohamed and El-Gaaly, Tarek and Bakry, Amr and Elgammal, Ahmed},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {888--897},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/elhoseiny16.pdf},
  url       = {https://proceedings.mlr.press/v48/elhoseiny16.html},
  abstract  = {In the Object Recognition task, there exists a dichotomy between the categorization of objects and estimating object pose, where the former necessitates a view-invariant representation, while the latter requires a representation capable of capturing pose information over different categories of objects. With the rise of deep architectures, the prime focus has been on object category recognition. Deep learning methods have achieved wide success in this task. In contrast, object pose estimation using these approaches has received relatively less attention. In this work, we study how Convolutional Neural Networks (CNN) architectures can be adapted to the task of simultaneous object recognition and pose estimation. We investigate and analyze the layers of various CNN models and extensively compare between them with the goal of discovering how the layers of distributed representations within CNNs represent object pose information and how this contradicts with object category representations. We extensively experiment on two recent large and challenging multi-view datasets and we achieve better than the state-of-the-art.}
}

Endnote

%0 Conference Paper
%T A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation
%A Mohamed Elhoseiny
%A Tarek El-Gaaly
%A Amr Bakry
%A Ahmed Elgammal
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-elhoseiny16
%I PMLR
%P 888-897
%U https://proceedings.mlr.press/v48/elhoseiny16.html
%V 48
%X In the Object Recognition task, there exists a dichotomy between the categorization of objects and estimating object pose, where the former necessitates a view-invariant representation, while the latter requires a representation capable of capturing pose information over different categories of objects. With the rise of deep architectures, the prime focus has been on object category recognition. Deep learning methods have achieved wide success in this task. In contrast, object pose estimation using these approaches has received relatively less attention. In this work, we study how Convolutional Neural Networks (CNN) architectures can be adapted to the task of simultaneous object recognition and pose estimation. We investigate and analyze the layers of various CNN models and extensively compare between them with the goal of discovering how the layers of distributed representations within CNNs represent object pose information and how this contradicts with object category representations. We extensively experiment on two recent large and challenging multi-view datasets and we achieve better than the state-of-the-art.

RIS

TY  - CPAPER
TI  - A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation
AU  - Mohamed Elhoseiny
AU  - Tarek El-Gaaly
AU  - Amr Bakry
AU  - Ahmed Elgammal
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-elhoseiny16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 888
EP  - 897
L1  - http://proceedings.mlr.press/v48/elhoseiny16.pdf
UR  - https://proceedings.mlr.press/v48/elhoseiny16.html
AB  - In the Object Recognition task, there exists a dichotomy between the categorization of objects and estimating object pose, where the former necessitates a view-invariant representation, while the latter requires a representation capable of capturing pose information over different categories of objects. With the rise of deep architectures, the prime focus has been on object category recognition. Deep learning methods have achieved wide success in this task. In contrast, object pose estimation using these approaches has received relatively less attention. In this work, we study how Convolutional Neural Networks (CNN) architectures can be adapted to the task of simultaneous object recognition and pose estimation. We investigate and analyze the layers of various CNN models and extensively compare between them with the goal of discovering how the layers of distributed representations within CNNs represent object pose information and how this contradicts with object category representations. We extensively experiment on two recent large and challenging multi-view datasets and we achieve better than the state-of-the-art.
ER  -

APA

Elhoseiny, M., El-Gaaly, T., Bakry, A., & Elgammal, A. (2016). A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:888-897. Available from https://proceedings.mlr.press/v48/elhoseiny16.html.