Bootstrap AutoEncoders With Contrastive Paradigm for Self-supervised Gaze Estimation

Yaoming Wang, Jin Li, Wenrui Dai, Bowen Shi, Xiaopeng Zhang, Chenglin Li, Hongkai Xiong
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:50794-50806, 2024.

Abstract

Existing self-supervised methods for gaze estimation using the dominant streams of contrastive and generative approaches are restricted to eye images and could fail in general full-face settings. In this paper, we reveal that contrastive methods are ineffective in data augmentation for self-supervised full-face gaze estimation, while generative methods are prone to trivial solutions due to the absence of explicit regularization on semantic representations. To address this challenge, we propose a novel approach called Bootstrap auto-encoders with Contrastive paradigm (BeCa), which combines the strengths of both generative and contrastive methods. Specifically, we revisit the Auto-Encoder used in generative approaches and incorporate the contrastive paradigm to introduce explicit regularization on gaze representation. Furthermore, we design the InfoMSE loss as an alternative to the vanilla MSE loss for Auto-Encoder to mitigate the inconsistency between reconstruction and representation learning. Experimental results demonstrate that the proposed approaches outperform state-of-the-art unsupervised gaze approaches on extensive datasets (including wild scenes) under both within-dataset and cross-dataset protocols.
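The abstract describes pairing an auto-encoder's reconstruction objective with a contrastive regularizer on the learned representation. The sketch below is only a rough illustration of that general idea. It adds a vanilla MSE reconstruction loss to a standard InfoNCE contrastive term, weighted by a hypothetical coefficient `lam`.

It is not BeCa, and it is not the paper's InfoMSE loss, whose exact form the abstract does not give. The function names, the temperature `tau`, and `lam` are illustrative assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def mse(x: Vector, x_hat: Vector) -> float:
    """Vanilla mean-squared reconstruction error between an input and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def cosine(u: Vector, v: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity; eps guards against division by zero for all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, eps)


def info_nce(z1: Sequence[Vector], z2: Sequence[Vector], tau: float = 0.1) -> float:
    """Standard InfoNCE over a batch: (z1[i], z2[i]) are two views of sample i
    (the positive pair); every z2[j] with j != i acts as a negative for z1[i]."""
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        m = max(logits)  # shift by the max so the log-sum-exp is numerically stable
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n


def combined_loss(x: Vector, x_hat: Vector, z1: Sequence[Vector],
                  z2: Sequence[Vector], lam: float = 1.0, tau: float = 0.1) -> float:
    """Hypothetical objective: reconstruction term plus a weighted contrastive
    regularizer on the latent codes (not the paper's formulation)."""
    return mse(x, x_hat) + lam * info_nce(z1, z2, tau)


if __name__ == "__main__":
    z = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(z, z))        # small: each positive pair is the most similar pair
    print(info_nce(z, swapped))  # large: positives are orthogonal, negatives identical
```

The contrastive term gives the latent codes an explicit training signal that reconstruction alone does not provide. According to the abstract, the paper additionally replaces the vanilla MSE term with InfoMSE. That loss is defined in the paper and is not reproduced here.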

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24ah,
  title = {Bootstrap {A}uto{E}ncoders With Contrastive Paradigm for Self-supervised Gaze Estimation},
  author = {Wang, Yaoming and Li, Jin and Dai, Wenrui and Shi, Bowen and Zhang, Xiaopeng and Li, Chenglin and Xiong, Hongkai},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {50794--50806},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24ah/wang24ah.pdf},
  url = {https://proceedings.mlr.press/v235/wang24ah.html},
  abstract = {Existing self-supervised methods for gaze estimation using the dominant streams of contrastive and generative approaches are restricted to eye images and could fail in general full-face settings. In this paper, we reveal that contrastive methods are ineffective in data augmentation for self-supervised full-face gaze estimation, while generative methods are prone to trivial solutions due to the absence of explicit regularization on semantic representations. To address this challenge, we propose a novel approach called Bootstrap auto-encoders with Contrastive paradigm (BeCa), which combines the strengths of both generative and contrastive methods. Specifically, we revisit the Auto-Encoder used in generative approaches and incorporate the contrastive paradigm to introduce explicit regularization on gaze representation. Furthermore, we design the InfoMSE loss as an alternative to the vanilla MSE loss for Auto-Encoder to mitigate the inconsistency between reconstruction and representation learning. Experimental results demonstrate that the proposed approaches outperform state-of-the-art unsupervised gaze approaches on extensive datasets (including wild scenes) under both within-dataset and cross-dataset protocols.}
}
Endnote
%0 Conference Paper
%T Bootstrap AutoEncoders With Contrastive Paradigm for Self-supervised Gaze Estimation
%A Yaoming Wang
%A Jin Li
%A Wenrui Dai
%A Bowen Shi
%A Xiaopeng Zhang
%A Chenglin Li
%A Hongkai Xiong
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wang24ah
%I PMLR
%P 50794--50806
%U https://proceedings.mlr.press/v235/wang24ah.html
%V 235
%X Existing self-supervised methods for gaze estimation using the dominant streams of contrastive and generative approaches are restricted to eye images and could fail in general full-face settings. In this paper, we reveal that contrastive methods are ineffective in data augmentation for self-supervised full-face gaze estimation, while generative methods are prone to trivial solutions due to the absence of explicit regularization on semantic representations. To address this challenge, we propose a novel approach called Bootstrap auto-encoders with Contrastive paradigm (BeCa), which combines the strengths of both generative and contrastive methods. Specifically, we revisit the Auto-Encoder used in generative approaches and incorporate the contrastive paradigm to introduce explicit regularization on gaze representation. Furthermore, we design the InfoMSE loss as an alternative to the vanilla MSE loss for Auto-Encoder to mitigate the inconsistency between reconstruction and representation learning. Experimental results demonstrate that the proposed approaches outperform state-of-the-art unsupervised gaze approaches on extensive datasets (including wild scenes) under both within-dataset and cross-dataset protocols.
APA
Wang, Y., Li, J., Dai, W., Shi, B., Zhang, X., Li, C. & Xiong, H. (2024). Bootstrap AutoEncoders With Contrastive Paradigm for Self-supervised Gaze Estimation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:50794-50806. Available from https://proceedings.mlr.press/v235/wang24ah.html.
