Temporal Understanding of Gaze Communication with GazeTransformer

Ryan Anthony de Belen; Gelareh Mohammadi; Arcot Sowmya

Proceedings of The 2nd Gaze Meets ML workshop, PMLR 226:40-60, 2024.

Abstract

Gaze plays a crucial role in daily social interactions as it allows humans to communicate intentions effectively. We address the problem of temporal understanding of gaze communication in social videos in two stages. First, we develop GazeTransformer, an end-to-end module that infers atomic-level behaviours in a given frame. Second, we develop a temporal module that predicts event-level behaviours in a video using the inferred atomic-level behaviours. Compared to existing methods, GazeTransformer does not require human head and object locations as input. Instead, it identifies these locations in a parallel and end-to-end manner. In addition, it can predict the attended targets of all predicted humans and infer more atomic-level behaviours that cannot be handled simultaneously by previous approaches. We achieve promising performance on both atomic- and event-level prediction on the (M)VACATION dataset. Code will be available at https://github.com/gazetransformer/gazetransformer.

Cite this Paper

BibTeX


@InProceedings{pmlr-v226-belen24a,
  title = 	 {Temporal Understanding of Gaze Communication with GazeTransformer},
  author =       {de Belen, Ryan Anthony and Mohammadi, Gelareh and Sowmya, Arcot},
  booktitle = 	 {Proceedings of The 2nd Gaze Meets ML workshop},
  pages = 	 {40--60},
  year = 	 {2024},
  editor = 	 {Madu Blessing, Amarachi and Wu, Joy and Zanca, Dario and Krupinski, Elizabeth and Kashyap, Satyananda and Karargyris, Alexandros},
  volume = 	 {226},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v226/belen24a/belen24a.pdf},
  url = 	 {https://proceedings.mlr.press/v226/belen24a.html},
  abstract = 	 {Gaze plays a crucial role in daily social interactions as it allows humans to communicate intentions effectively. We address the problem of temporal understanding of gaze communication in social videos in two stages. First, we develop GazeTransformer, an end-to-end module that infers atomic-level behaviours in a given frame. Second, we develop a temporal module that predicts event-level behaviours in a video using the inferred atomic-level behaviours. Compared to existing methods, GazeTransformer does not require human head and object locations as input. Instead, it identifies these locations in a parallel and end-to-end manner. In addition, it can predict the attended targets of all predicted humans and infer more atomic-level behaviours that cannot be handled simultaneously by previous approaches. We achieve promising performance on both atomic- and event-level prediction on the (M)VACATION dataset. Code will be available at https://github.com/gazetransformer/gazetransformer.}
}

Endnote

%0 Conference Paper
%T Temporal Understanding of Gaze Communication with GazeTransformer
%A Ryan Anthony de Belen
%A Gelareh Mohammadi
%A Arcot Sowmya
%B Proceedings of The 2nd Gaze Meets ML workshop
%C Proceedings of Machine Learning Research
%D 2024
%E Amarachi Madu Blessing
%E Joy Wu
%E Dario Zanca
%E Elizabeth Krupinski
%E Satyananda Kashyap
%E Alexandros Karargyris
%F pmlr-v226-belen24a
%I PMLR
%P 40--60
%U https://proceedings.mlr.press/v226/belen24a.html
%V 226
%X Gaze plays a crucial role in daily social interactions as it allows humans to communicate intentions effectively. We address the problem of temporal understanding of gaze communication in social videos in two stages. First, we develop GazeTransformer, an end-to-end module that infers atomic-level behaviours in a given frame. Second, we develop a temporal module that predicts event-level behaviours in a video using the inferred atomic-level behaviours. Compared to existing methods, GazeTransformer does not require human head and object locations as input. Instead, it identifies these locations in a parallel and end-to-end manner. In addition, it can predict the attended targets of all predicted humans and infer more atomic-level behaviours that cannot be handled simultaneously by previous approaches. We achieve promising performance on both atomic- and event-level prediction on the (M)VACATION dataset. Code will be available at https://github.com/gazetransformer/gazetransformer.

APA


de Belen, R. A., Mohammadi, G. & Sowmya, A. (2024). Temporal Understanding of Gaze Communication with GazeTransformer. Proceedings of The 2nd Gaze Meets ML workshop, in Proceedings of Machine Learning Research 226:40-60. Available from https://proceedings.mlr.press/v226/belen24a.html.
