Interaction-aware Dynamic 3D Gaze Estimation in Videos

Chenyi Kuang, Jeffrey O. Kephart, Qiang Ji
Proceedings of The 2nd Gaze Meets ML workshop, PMLR 226:107-124, 2024.

Abstract

Human gaze during in-the-wild and outdoor human activities is a continuous, dynamic process driven by anatomical eye movements such as fixations, saccades, and smooth pursuit. However, learning gaze dynamics from videos remains a challenging task because annotating human gaze in videos is labor-intensive. In this paper, we propose a novel method for dynamic 3D gaze estimation in videos that utilizes human interaction labels. Our model contains a temporal gaze estimator built upon an autoregressive Transformer structure. In addition, our model learns the spatial relationships of gaze among multiple subjects by constructing a Human Interaction Graph from predicted gaze and updating the gaze features with a structure-aware Transformer. Our model predicts future gaze conditioned on historical gaze and the gaze interactions in an autoregressive manner. We propose a multi-state training algorithm that alternately updates the interaction module and the dynamic gaze estimation module when training on a mixture of labeled and unlabeled sequences. We show significant improvements in both within-domain gaze estimation accuracy and cross-domain generalization on the physically unconstrained gaze estimation benchmark.
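The abstract outlines two cooperating modules: an autoregressive temporal gaze estimator and an interaction module that refines per-subject gaze features via a graph over subjects. Below is a minimal PyTorch sketch of that idea; every module name, dimension, and the "looking-at" adjacency heuristic are illustrative assumptions of ours, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class InteractionRefiner(nn.Module):
    """Refines each subject's gaze feature by attending to the other
    subjects, masked by a gaze-interaction graph (hypothetical design)."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats, adj):
        # feats: (num_subjects, dim); adj: (num_subjects, num_subjects) bool,
        # True where subject i is deemed to interact with subject j.
        x = feats.unsqueeze(0)                       # add a batch dimension
        out, _ = self.attn(x, x, x, attn_mask=~adj)  # mask non-interacting pairs
        return feats + out.squeeze(0)                # residual update


class DynamicGazeEstimator(nn.Module):
    """Predicts the next 3D gaze direction from a history of per-frame
    features, one subject at a time, in an autoregressive loop."""

    def __init__(self, dim=128, heads=4, layers=2):
        super().__init__()
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(dim, 3)  # 3D gaze direction

    def forward(self, history):
        # history: (num_subjects, T, dim) frame features per subject.
        h = self.temporal(history)
        gaze = self.head(h[:, -1])                   # use the latest state
        return F.normalize(gaze, dim=-1)             # unit gaze vectors


def interaction_graph(gaze, positions, cos_thresh=0.9):
    # Hypothetical heuristic: subject i "interacts with" subject j when i's
    # predicted gaze points roughly toward j's 3D head position.
    rel = F.normalize(positions.unsqueeze(0) - positions.unsqueeze(1), dim=-1)
    cos = (gaze.unsqueeze(1) * rel).sum(-1)          # cos[i, j] = gaze_i . dir(i->j)
    adj = cos > cos_thresh
    adj.fill_diagonal_(True)                         # self-edges keep attention well-defined
    return adj

A full model would roll this forward frame by frame, feeding each refined gaze state back into the history and, per the abstract's multi-state scheme, alternately updating the interaction module and the gaze estimator when training on a mix of labeled and unlabeled sequences.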

Cite this Paper


BibTeX
@InProceedings{pmlr-v226-kuang24a,
  title     = {Interaction-aware Dynamic 3D Gaze Estimation in Videos},
  author    = {Kuang, Chenyi and Kephart, Jeffrey O. and Ji, Qiang},
  booktitle = {Proceedings of The 2nd Gaze Meets ML workshop},
  pages     = {107--124},
  year      = {2024},
  editor    = {Madu Blessing, Amarachi and Wu, Joy and Zanca, Dario and Krupinski, Elizabeth and Kashyap, Satyananda and Karargyris, Alexandros},
  volume    = {226},
  series    = {Proceedings of Machine Learning Research},
  month     = {16 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v226/kuang24a/kuang24a.pdf},
  url       = {https://proceedings.mlr.press/v226/kuang24a.html}
}
Endnote
%0 Conference Paper
%T Interaction-aware Dynamic 3D Gaze Estimation in Videos
%A Chenyi Kuang
%A Jeffrey O. Kephart
%A Qiang Ji
%B Proceedings of The 2nd Gaze Meets ML workshop
%C Proceedings of Machine Learning Research
%D 2024
%E Amarachi Madu Blessing
%E Joy Wu
%E Dario Zanca
%E Elizabeth Krupinski
%E Satyananda Kashyap
%E Alexandros Karargyris
%F pmlr-v226-kuang24a
%I PMLR
%P 107--124
%U https://proceedings.mlr.press/v226/kuang24a.html
%V 226
APA
Kuang, C., Kephart, J. O., & Ji, Q. (2024). Interaction-aware Dynamic 3D Gaze Estimation in Videos. Proceedings of The 2nd Gaze Meets ML workshop, in Proceedings of Machine Learning Research 226:107-124. Available from https://proceedings.mlr.press/v226/kuang24a.html.