Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models

Karan Uppal, Jaeah Kim, Shashank Singh
Proceedings of The 1st Gaze Meets ML workshop, PMLR 210:219-240, 2023.

Abstract

Eye-tracking has potential to provide rich behavioral data about human cognition in ecologically valid environments. However, analyzing this rich data is often challenging. Most automated analyses are specific to simplistic artificial visual stimuli with well-separated, static regions of interest, while most analyses in the context of complex visual stimuli, such as most natural scenes, rely on laborious and time-consuming manual annotation. This paper studies using computer vision tools for “attention decoding”, the task of assessing the locus of a participant’s overt visual attention over time. We provide a publicly available Multiple Object Eye-Tracking (MOET) dataset, consisting of gaze data from participants tracking specific objects, annotated with labels and bounding boxes, in crowded real-world videos, for training and evaluating attention decoding algorithms. We also propose two end-to-end deep learning models for attention decoding and compare these to state-of-the-art heuristic methods.
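
To illustrate the attention-decoding task described above, the sketch below shows the simplest kind of heuristic baseline: assign each gaze sample to the object whose bounding box contains the gaze point, falling back to the nearest box center when no box contains it. This is an illustration only; the data structures (Box, decode_attention) and the fallback rule are assumptions for exposition, not the paper's models or its specific heuristic baselines.

    # Hypothetical sketch of per-frame attention decoding from a gaze point
    # and annotated object bounding boxes (illustration, not the paper's method).
    from dataclasses import dataclass

    @dataclass
    class Box:
        label: str
        x1: float  # left
        y1: float  # top
        x2: float  # right
        y2: float  # bottom

        def contains(self, gx: float, gy: float) -> bool:
            return self.x1 <= gx <= self.x2 and self.y1 <= gy <= self.y2

        def center_dist2(self, gx: float, gy: float) -> float:
            cx, cy = (self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2
            return (gx - cx) ** 2 + (gy - cy) ** 2

    def decode_attention(gaze: tuple, boxes: list) -> str:
        """Assign the gaze sample to a containing box; else the nearest box center."""
        gx, gy = gaze
        hits = [b for b in boxes if b.contains(gx, gy)]
        candidates = hits or boxes
        return min(candidates, key=lambda b: b.center_dist2(gx, gy)).label

    # Example: one video frame with two annotated objects.
    frame_boxes = [Box("person", 100, 50, 180, 220), Box("dog", 300, 180, 380, 260)]
    print(decode_attention((150.0, 120.0), frame_boxes))  # -> "person"

Heuristics of this kind break down when objects overlap or gaze is noisy, which is what motivates the end-to-end models the paper proposes.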

Cite this Paper


BibTeX
@InProceedings{pmlr-v210-uppal23a,
  title     = {Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models},
  author    = {Uppal, Karan and Kim, Jaeah and Singh, Shashank},
  booktitle = {Proceedings of The 1st Gaze Meets ML workshop},
  pages     = {219--240},
  year      = {2023},
  editor    = {Lourentzou, Ismini and Wu, Joy and Kashyap, Satyananda and Karargyris, Alexandros and Celi, Leo Anthony and Kawas, Ban and Talathi, Sachin},
  volume    = {210},
  series    = {Proceedings of Machine Learning Research},
  month     = {03 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v210/uppal23a/uppal23a.pdf},
  url       = {https://proceedings.mlr.press/v210/uppal23a.html},
  abstract  = {Eye-tracking has potential to provide rich behavioral data about human cognition in ecologically valid environments. However, analyzing this rich data is often challenging. Most automated analyses are specific to simplistic artificial visual stimuli with well-separated, static regions of interest, while most analyses in the context of complex visual stimuli, such as most natural scenes, rely on laborious and time-consuming manual annotation. This paper studies using computer vision tools for “attention decoding”, the task of assessing the locus of a participant’s overt visual attention over time. We provide a publicly available Multiple Object Eye-Tracking (MOET) dataset, consisting of gaze data from participants tracking specific objects, annotated with labels and bounding boxes, in crowded real-world videos, for training and evaluating attention decoding algorithms. We also propose two end-to-end deep learning models for attention decoding and compare these to state-of-the-art heuristic methods.}
}
Endnote
%0 Conference Paper
%T Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models
%A Karan Uppal
%A Jaeah Kim
%A Shashank Singh
%B Proceedings of The 1st Gaze Meets ML workshop
%C Proceedings of Machine Learning Research
%D 2023
%E Ismini Lourentzou
%E Joy Wu
%E Satyananda Kashyap
%E Alexandros Karargyris
%E Leo Anthony Celi
%E Ban Kawas
%E Sachin Talathi
%F pmlr-v210-uppal23a
%I PMLR
%P 219--240
%U https://proceedings.mlr.press/v210/uppal23a.html
%V 210
%X Eye-tracking has potential to provide rich behavioral data about human cognition in ecologically valid environments. However, analyzing this rich data is often challenging. Most automated analyses are specific to simplistic artificial visual stimuli with well-separated, static regions of interest, while most analyses in the context of complex visual stimuli, such as most natural scenes, rely on laborious and time-consuming manual annotation. This paper studies using computer vision tools for “attention decoding”, the task of assessing the locus of a participant’s overt visual attention over time. We provide a publicly available Multiple Object Eye-Tracking (MOET) dataset, consisting of gaze data from participants tracking specific objects, annotated with labels and bounding boxes, in crowded real-world videos, for training and evaluating attention decoding algorithms. We also propose two end-to-end deep learning models for attention decoding and compare these to state-of-the-art heuristic methods.
APA
Uppal, K., Kim, J. & Singh, S. (2023). Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models. Proceedings of The 1st Gaze Meets ML workshop, in Proceedings of Machine Learning Research 210:219-240. Available from https://proceedings.mlr.press/v210/uppal23a.html.
