SAM meets Gaze: Passive Eye Tracking for Prompt-based Instance Segmentation

Daniel Beckmann, Jacqueline Kockwelp, Joerg Gromoll, Friedemann Kiefer, Benjamin Risse
Proceedings of The 2nd Gaze Meets ML workshop, PMLR 226:21-39, 2024.

Abstract

The annotation of large new datasets for machine learning is a very time-consuming and expensive process. This is particularly true for pixel-accurate labelling, e.g. of segmentation masks. Prompt-based methods have been developed to accelerate this label generation process by allowing the model to incorporate additional clues from other sources such as humans. The recently published Segment Anything foundation model (SAM) extends this approach by providing a flexible framework with a model that was trained on more than 1 billion segmentation masks, while also being able to exploit explicit user input. In this paper, we explore the use of a passive eye tracking system to collect gaze data during unconstrained image inspections, which we integrate as a novel prompt input for SAM. We evaluated our method on the original SAM model and fine-tuned the prompt encoder and mask decoder for different gaze-based inputs, namely fixation points, blurred gaze maps and multiple heatmap variants. Our results indicate that the acquisition of gaze data is faster than other prompt-based approaches while the segmentation performance stays comparable to the state-of-the-art performance of SAM. Code is available at https://zivgitlab.uni-muenster.de/cvmls/sam_meets_gaze.
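
To make the prompt-integration idea concrete, below is a minimal sketch (not the authors' released pipeline) of how gaze fixation points could be passed to SAM as point prompts via the segment-anything package. The checkpoint path, the example image and the fixation coordinates are placeholders, and the mapping from raw eye-tracker samples to image pixel coordinates is assumed to have been done beforehand.

    # Minimal sketch: gaze fixations as SAM point prompts (illustrative only).
    import numpy as np
    import cv2
    from segment_anything import sam_model_registry, SamPredictor

    # Load SAM and wrap it in the prompt-based predictor interface.
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder path
    predictor = SamPredictor(sam)

    # Image to annotate (RGB, HxWx3, uint8).
    image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    # Hypothetical fixation points from a passive eye tracker, in (x, y) pixels.
    fixations = np.array([[412, 230], [398, 241], [420, 236]], dtype=np.float32)
    labels = np.ones(len(fixations), dtype=np.int32)  # 1 = foreground prompt

    # Ask SAM for candidate masks conditioned on the gaze-derived point prompts
    # and keep the highest-scoring one.
    masks, scores, _ = predictor.predict(
        point_coords=fixations,
        point_labels=labels,
        multimask_output=True,
    )
    best_mask = masks[np.argmax(scores)]

The blurred gaze maps and heatmap variants studied in the paper additionally require fine-tuning SAM's prompt encoder and mask decoder to accept dense, image-like prompts, which this point-prompt sketch does not cover.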

Cite this Paper

BibTeX
@InProceedings{pmlr-v226-beckmann24a,
  title     = {SAM meets Gaze: Passive Eye Tracking for Prompt-based Instance Segmentation},
  author    = {Beckmann, Daniel and Kockwelp, Jacqueline and Gromoll, Joerg and Kiefer, Friedemann and Risse, Benjamin},
  booktitle = {Proceedings of The 2nd Gaze Meets ML workshop},
  pages     = {21--39},
  year      = {2024},
  editor    = {Madu Blessing, Amarachi and Wu, Joy and Zanca, Dario and Krupinski, Elizabeth and Kashyap, Satyananda and Karargyris, Alexandros},
  volume    = {226},
  series    = {Proceedings of Machine Learning Research},
  month     = {16 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v226/beckmann24a/beckmann24a.pdf},
  url       = {https://proceedings.mlr.press/v226/beckmann24a.html},
  abstract  = {The annotation of large new datasets for machine learning is a very time-consuming and expensive process. This is particularly true for pixel-accurate labelling of e.g. segmentation masks. Prompt-based methods have been developed to accelerate this label generation process by allowing the model to incorporate additional clues from other sources such as humans. The recently published Segment Anything foundation model (SAM) extends this approach by providing a flexible framework with a model that was trained on more than 1 billion segmentation masks, while also being able to exploit explicit user input. In this paper, we explore the usage of a passive eye tracking system to collect gaze data during unconstrained image inspections which we integrate as a novel prompt input for SAM. We evaluated our method on the original SAM model and finetuned the prompt encoder and mask decoder for different gaze-based inputs, namely fixation points, blurred gaze maps and multiple heatmap variants. Our results indicate that the acquisition of gaze data is faster than other prompt-based approaches while the segmentation performance stays comparable to the state-of-the-art performance of SAM. Code is available at https://zivgitlab.uni-muenster.de/cvmls/sam_meets_gaze.}
}
Endnote
%0 Conference Paper
%T SAM meets Gaze: Passive Eye Tracking for Prompt-based Instance Segmentation
%A Daniel Beckmann
%A Jacqueline Kockwelp
%A Joerg Gromoll
%A Friedemann Kiefer
%A Benjamin Risse
%B Proceedings of The 2nd Gaze Meets ML workshop
%C Proceedings of Machine Learning Research
%D 2024
%E Amarachi Madu Blessing
%E Joy Wu
%E Dario Zanca
%E Elizabeth Krupinski
%E Satyananda Kashyap
%E Alexandros Karargyris
%F pmlr-v226-beckmann24a
%I PMLR
%P 21--39
%U https://proceedings.mlr.press/v226/beckmann24a.html
%V 226
%X The annotation of large new datasets for machine learning is a very time-consuming and expensive process. This is particularly true for pixel-accurate labelling of e.g. segmentation masks. Prompt-based methods have been developed to accelerate this label generation process by allowing the model to incorporate additional clues from other sources such as humans. The recently published Segment Anything foundation model (SAM) extends this approach by providing a flexible framework with a model that was trained on more than 1 billion segmentation masks, while also being able to exploit explicit user input. In this paper, we explore the usage of a passive eye tracking system to collect gaze data during unconstrained image inspections which we integrate as a novel prompt input for SAM. We evaluated our method on the original SAM model and finetuned the prompt encoder and mask decoder for different gaze-based inputs, namely fixation points, blurred gaze maps and multiple heatmap variants. Our results indicate that the acquisition of gaze data is faster than other prompt-based approaches while the segmentation performance stays comparable to the state-of-the-art performance of SAM. Code is available at https://zivgitlab.uni-muenster.de/cvmls/sam_meets_gaze.
APA
Beckmann, D., Kockwelp, J., Gromoll, J., Kiefer, F. & Risse, B. (2024). SAM meets Gaze: Passive Eye Tracking for Prompt-based Instance Segmentation. Proceedings of The 2nd Gaze Meets ML workshop, in Proceedings of Machine Learning Research 226:21-39. Available from https://proceedings.mlr.press/v226/beckmann24a.html.