Exploring self-supervised learning techniques for hand pose estimation

Aneesh Dahiya, Adrian Spurr, Otmar Hilliges
NeurIPS 2020 Workshop on Pre-registration in Machine Learning, PMLR 148:255-271, 2021.

Abstract

3D hand pose estimation from monocular RGB is a challenging problem due to significantly varying environmental conditions, such as lighting, and variation in subject appearance. One way to improve performance across the board is to introduce more data. However, acquiring 3D annotated data for hands is a laborious task, as it involves heavy multi-camera setups, leading to lab-like training data which does not generalize well. Alternatively, one could make use of unsupervised pre-training in order to significantly increase the amount of data one can train on. More recently, contrastive learning has shown promising results on tasks such as image classification. Yet, no study has been made on how it affects structured regression problems such as hand pose estimation. We hypothesize that the contrastive objective does not extend well to such downstream tasks due to its inherent invariance and instead propose a relation objective, promoting equivariance. Our goal is to perform extensive experiments to validate our hypothesis.

Cite this Paper


BibTeX
@InProceedings{pmlr-v148-dahiya21a,
  title = {Exploring self-supervised learning techniques for hand pose estimation},
  author = {Dahiya, Aneesh and Spurr, Adrian and Hilliges, Otmar},
  booktitle = {NeurIPS 2020 Workshop on Pre-registration in Machine Learning},
  pages = {255--271},
  year = {2021},
  editor = {Bertinetto, Luca and Henriques, João F. and Albanie, Samuel and Paganini, Michela and Varol, Gül},
  volume = {148},
  series = {Proceedings of Machine Learning Research},
  month = {11 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v148/dahiya21a/dahiya21a.pdf},
  url = {http://proceedings.mlr.press/v148/dahiya21a.html},
  abstract = {3D hand pose estimation from monocular RGB is a challenging problem due to significantly varying environmental conditions such as lighting or variation in subject appearances. One way to improve performance across-the-board is to introduce more data. However, acquiring 3D annotated data for hands is a laborious task, as it involves heavy multi-camera setups leading to lab-like training data which does not generalize well. Alternatively, one could make use of unsupervised pre-training in order to significantly increase the training data size one can train on. More recently, contrastive learning has shown promising results on tasks such as image classification. Yet, no study has been made on how it affects structured regression problems such as hand pose estimation. We hypothesize that the contrastive objective does not extend well to such downstream task due to its inherent invariance and instead propose a relation objective, promoting equivariance. Our goal is to perform extensive experiments to validate our hypothesis.}
}
Endnote
%0 Conference Paper
%T Exploring self-supervised learning techniques for hand pose estimation
%A Aneesh Dahiya
%A Adrian Spurr
%A Otmar Hilliges
%B NeurIPS 2020 Workshop on Pre-registration in Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Luca Bertinetto
%E João F. Henriques
%E Samuel Albanie
%E Michela Paganini
%E Gül Varol
%F pmlr-v148-dahiya21a
%I PMLR
%P 255--271
%U http://proceedings.mlr.press/v148/dahiya21a.html
%V 148
%X 3D hand pose estimation from monocular RGB is a challenging problem due to significantly varying environmental conditions such as lighting or variation in subject appearances. One way to improve performance across-the-board is to introduce more data. However, acquiring 3D annotated data for hands is a laborious task, as it involves heavy multi-camera setups leading to lab-like training data which does not generalize well. Alternatively, one could make use of unsupervised pre-training in order to significantly increase the training data size one can train on. More recently, contrastive learning has shown promising results on tasks such as image classification. Yet, no study has been made on how it affects structured regression problems such as hand pose estimation. We hypothesize that the contrastive objective does not extend well to such downstream task due to its inherent invariance and instead propose a relation objective, promoting equivariance. Our goal is to perform extensive experiments to validate our hypothesis.
APA
Dahiya, A., Spurr, A. & Hilliges, O. (2021). Exploring self-supervised learning techniques for hand pose estimation. NeurIPS 2020 Workshop on Pre-registration in Machine Learning, in Proceedings of Machine Learning Research 148:255-271. Available from http://proceedings.mlr.press/v148/dahiya21a.html.
