Memory-Based Sequential Attention

Jason Stock, Charles Anderson
Proceedings of The 2nd Gaze Meets ML workshop, PMLR 226:236-253, 2024.

Abstract

Computational models of sequential attention often use recurrent neural networks, which may lead to information loss over accumulated glimpses and an inability to dynamically reweigh glimpses at each step. Addressing the former limitation should result in greater performance, while addressing the latter should enable greater interpretability. In this work, we propose a biologically-inspired model of sequential attention for image classification. Specifically, our algorithm contextualizes the history of observed locations from within an image to inform future gaze points, akin to scanpaths in the biological visual system. We achieve this by using a transformer-based memory module coupled with a reinforcement learning-based learning algorithm, improving both task performance and model interpretability. In addition to empirically evaluating our approach on classical vision tasks, we demonstrate the robustness of our algorithm to different initial locations in the image and provide interpretations of sampled locations from within the trajectory.
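The abstract describes a loop in which glimpses sampled from an image are embedded and contextualized by a transformer-style memory module, whose output informs the next fixation (trained with reinforcement learning in the paper). The following is a minimal numpy sketch of that loop under stated assumptions: it is not the authors' implementation, and the names (`extract_glimpse`, `W_embed`, patch size, embedding dimension) are illustrative; the next fixation is sampled randomly in place of a learned policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_glimpse(image, loc, size=8):
    """Crop a size x size patch centered at loc = (row, col), clipped to bounds."""
    r, c = loc
    h, w = image.shape
    r0 = int(np.clip(r - size // 2, 0, h - size))
    c0 = int(np.clip(c - size // 2, 0, w - size))
    return image[r0:r0 + size, c0:c0 + size]

def self_attention(X):
    """Single-head scaled dot-product self-attention over glimpse embeddings.

    X: (t, d) matrix, one row per glimpse observed so far. The attention
    weights show how strongly each past glimpse informs each step, which is
    the kind of per-step reweighting an RNN hidden state cannot expose."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X, weights

# Toy rollout: 4 glimpses over a random 32x32 "image".
image = rng.standard_normal((32, 32))
W_embed = rng.standard_normal((64, 16)) * 0.1   # flattened 8x8 patch -> d=16
history = []
loc = (16, 16)                                   # initial fixation
for t in range(4):
    patch = extract_glimpse(image, loc)
    history.append(patch.reshape(-1) @ W_embed)
    memory, attn = self_attention(np.stack(history))
    # A policy head (trained with REINFORCE-style updates in the paper)
    # would map the contextualized memory to the next fixation; here we
    # simply sample a random location as a stand-in.
    loc = tuple(rng.integers(0, 32, size=2))

print(memory.shape)                          # (4, 16): contextualized glimpses
print(attn.shape)                            # (4, 4): weights over the history
print(np.allclose(attn.sum(axis=1), 1.0))    # True: rows are distributions
```

Because the full glimpse history is re-attended at every step, earlier observations are never compressed into a single recurrent state, and the `attn` matrix can be inspected directly for interpretability, mirroring the two limitations of RNN-based models the abstract identifies.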

Cite this Paper


BibTeX
@InProceedings{pmlr-v226-stock24a,
  title     = {Memory-Based Sequential Attention},
  author    = {Stock, Jason and Anderson, Charles},
  booktitle = {Proceedings of The 2nd Gaze Meets ML workshop},
  pages     = {236--253},
  year      = {2024},
  editor    = {Madu Blessing, Amarachi and Wu, Joy and Zario, Danca and Krupinski, Elizabeth and Kashyap, Satyananda and Karargyris, Alexandros},
  volume    = {226},
  series    = {Proceedings of Machine Learning Research},
  month     = {16 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v226/stock24a/stock24a.pdf},
  url       = {https://proceedings.mlr.press/v226/stock24a.html},
  abstract  = {Computational models of sequential attention often use recurrent neural networks, which may lead to information loss over accumulated glimpses and an inability to dynamically reweigh glimpses at each step. Addressing the former limitation should result in greater performance, while addressing the latter should enable greater interpretability. In this work, we propose a biologically-inspired model of sequential attention for image classification. Specifically, our algorithm contextualizes the history of observed locations from within an image to inform future gaze points, akin to scanpaths in the biological visual system. We achieve this by using a transformer-based memory module coupled with a reinforcement learning-based learning algorithm, improving both task performance and model interpretability. In addition to empirically evaluating our approach on classical vision tasks, we demonstrate the robustness of our algorithm to different initial locations in the image and provide interpretations of sampled locations from within the trajectory.}
}
Endnote
%0 Conference Paper
%T Memory-Based Sequential Attention
%A Jason Stock
%A Charles Anderson
%B Proceedings of The 2nd Gaze Meets ML workshop
%C Proceedings of Machine Learning Research
%D 2024
%E Amarachi Madu Blessing
%E Joy Wu
%E Danca Zario
%E Elizabeth Krupinski
%E Satyananda Kashyap
%E Alexandros Karargyris
%F pmlr-v226-stock24a
%I PMLR
%P 236--253
%U https://proceedings.mlr.press/v226/stock24a.html
%V 226
%X Computational models of sequential attention often use recurrent neural networks, which may lead to information loss over accumulated glimpses and an inability to dynamically reweigh glimpses at each step. Addressing the former limitation should result in greater performance, while addressing the latter should enable greater interpretability. In this work, we propose a biologically-inspired model of sequential attention for image classification. Specifically, our algorithm contextualizes the history of observed locations from within an image to inform future gaze points, akin to scanpaths in the biological visual system. We achieve this by using a transformer-based memory module coupled with a reinforcement learning-based learning algorithm, improving both task performance and model interpretability. In addition to empirically evaluating our approach on classical vision tasks, we demonstrate the robustness of our algorithm to different initial locations in the image and provide interpretations of sampled locations from within the trajectory.
APA
Stock, J. & Anderson, C. (2024). Memory-Based Sequential Attention. Proceedings of The 2nd Gaze Meets ML workshop, in Proceedings of Machine Learning Research 226:236-253. Available from https://proceedings.mlr.press/v226/stock24a.html.