GazeSAM: Interactive Image Segmentation with Eye Gaze and Segment Anything Model

Bin Wang, Armstrong Aboah, Zheyuan Zhang, Hongyi Pan, Ulas Bagci
Proceedings of The 2nd Gaze Meets ML workshop, PMLR 226:254-265, 2024.

Abstract

Interactive image segmentation aims to assist users in efficiently generating high-quality data annotations through user-friendly interactions such as clicking, scribbling, and bounding boxes. However, mouse-based interaction methods can induce user fatigue during large-scale dataset annotation and are not entirely suitable for some domains, such as radiology. This study introduces eye gaze as a novel interactive prompt for image segmentation, different from previous model-based applications. Specifically, leveraging the real-time interactive prompting feature of the recently proposed Segment Anything Model (SAM), we present the GazeSAM system to enable users to collect target segmentation masks by simply looking at the region of interest. GazeSAM tracks users' eye gaze and utilizes it as the input prompt for SAM, generating target segmentation masks in real time. To the best of our knowledge, GazeSAM is the first work to combine eye gaze and SAM for interactive image segmentation. Experimental results demonstrate that GazeSAM can improve efficiency by nearly 50% in 2D natural image and 3D medical image segmentation tasks. The code is available at https://github.com/ukaukaaaa/GazeSAM.
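
As a rough illustration of the prompting loop the abstract describes, the sketch below feeds a single gaze fixation into SAM's point-prompt interface. It is a minimal sketch, not the authors' released code: the gaze-capture call (get_gaze_point) is a hypothetical stand-in for a real eye-tracker SDK, and the checkpoint and image paths are assumptions; only the segment_anything calls follow that library's published API.

    # Minimal sketch of gaze-prompted segmentation (not the authors' released code).
    # Assumes: pip install segment-anything, a SAM checkpoint on disk, and an
    # eye-tracker SDK that reports the current fixation in image pixel coordinates.
    import numpy as np
    import cv2
    from segment_anything import sam_model_registry, SamPredictor

    def get_gaze_point():
        """Hypothetical stand-in for an eye-tracker SDK call; returns the
        current fixation as (x, y) pixel coordinates in the displayed image."""
        raise NotImplementedError("wire this to your gaze-tracking hardware")

    # Load SAM once. set_image() caches the heavy image embedding, so each
    # subsequent gaze prompt only runs the lightweight mask decoder -- the
    # real-time property the GazeSAM workflow relies on.
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # path is an assumption
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("example_image.png"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    x, y = get_gaze_point()
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),  # one foreground point = the fixation
        point_labels=np.array([1]),       # 1 marks a positive (foreground) prompt
        multimask_output=False,           # single best mask for real-time display
    )
    segmentation = masks[0]  # boolean HxW mask to overlay in the viewer

In practice, the same embedding can be reused across many fixations on one image, so only the point prompt and mask decoding are in the interactive loop.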

Cite this Paper


BibTeX
@InProceedings{pmlr-v226-wang24a,
  title = {GazeSAM: Interactive Image Segmentation with Eye Gaze and Segment Anything Model},
  author = {Wang, Bin and Aboah, Armstrong and Zhang, Zheyuan and Pan, Hongyi and Bagci, Ulas},
  booktitle = {Proceedings of The 2nd Gaze Meets ML workshop},
  pages = {254--265},
  year = {2024},
  editor = {Madu Blessing, Amarachi and Wu, Joy and Zario, Danca and Krupinski, Elizabeth and Kashyap, Satyananda and Karargyris, Alexandros},
  volume = {226},
  series = {Proceedings of Machine Learning Research},
  month = {16 Dec},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v226/wang24a/wang24a.pdf},
  url = {https://proceedings.mlr.press/v226/wang24a.html},
  abstract = {Interactive image segmentation aims to assist users in efficiently generating high-quality data annotations through user-friendly interactions such as clicking, scribbling, and bounding boxes. However, mouse-based interaction methods can induce user fatigue during large-scale dataset annotation and are not entirely suitable for some domains, such as radiology. This study introduces eye gaze as a novel interactive prompt for image segmentation, different than previous model-based applications. Specifically, leveraging the real-time interactive prompting feature of the recently proposed Segment Anything Model (SAM), we present the GazeSAM system to enable users to collect target segmentation masks by simply looking at the region of interest. GazeSAM tracks users' eye gaze and utilizes it as the input prompt for SAM, generating target segmentation masks in real time. To our best knowledge, GazeSAM is the first work to combine eye gaze and SAM for interactive image segmentation. Experimental results demonstrate that GazeSAM can improve nearly 50% efficiency in 2D natural image and 3D medical image segmentation tasks. The code is available in https://github.com/ukaukaaaa/GazeSAM.}
}
Endnote
%0 Conference Paper
%T GazeSAM: Interactive Image Segmentation with Eye Gaze and Segment Anything Model
%A Bin Wang
%A Armstrong Aboah
%A Zheyuan Zhang
%A Hongyi Pan
%A Ulas Bagci
%B Proceedings of The 2nd Gaze Meets ML workshop
%C Proceedings of Machine Learning Research
%D 2024
%E Amarachi Madu Blessing
%E Joy Wu
%E Danca Zario
%E Elizabeth Krupinski
%E Satyananda Kashyap
%E Alexandros Karargyris
%F pmlr-v226-wang24a
%I PMLR
%P 254--265
%U https://proceedings.mlr.press/v226/wang24a.html
%V 226
%X Interactive image segmentation aims to assist users in efficiently generating high-quality data annotations through user-friendly interactions such as clicking, scribbling, and bounding boxes. However, mouse-based interaction methods can induce user fatigue during large-scale dataset annotation and are not entirely suitable for some domains, such as radiology. This study introduces eye gaze as a novel interactive prompt for image segmentation, different than previous model-based applications. Specifically, leveraging the real-time interactive prompting feature of the recently proposed Segment Anything Model (SAM), we present the GazeSAM system to enable users to collect target segmentation masks by simply looking at the region of interest. GazeSAM tracks users' eye gaze and utilizes it as the input prompt for SAM, generating target segmentation masks in real time. To our best knowledge, GazeSAM is the first work to combine eye gaze and SAM for interactive image segmentation. Experimental results demonstrate that GazeSAM can improve nearly 50% efficiency in 2D natural image and 3D medical image segmentation tasks. The code is available in https://github.com/ukaukaaaa/GazeSAM.
APA
Wang, B., Aboah, A., Zhang, Z., Pan, H. & Bagci, U. (2024). GazeSAM: Interactive Image Segmentation with Eye Gaze and Segment Anything Model. Proceedings of The 2nd Gaze Meets ML workshop, in Proceedings of Machine Learning Research 226:254-265. Available from https://proceedings.mlr.press/v226/wang24a.html.