An Attention-based Predictive Agent for Handwritten Numeral/Alphabet Recognition via Generation

Bonny Banerjee, Murchana Baruah
Proceedings of The 2nd Gaze Meets ML workshop, PMLR 226:4-20, 2024.

Abstract

A number of attention-based models for either classification or generation of handwritten numerals/alphabets have been reported in the literature. However, very few end-to-end models perform generation and classification jointly. We propose a predictive agent model that actively samples its visual environment via a sequence of glimpses. The attention is driven by the agent’s sensory prediction (or generation) error. At each sampling instant, the model predicts the observation class and completes the partial sequence observed up to that instant. It learns where and what to sample by jointly minimizing the classification and generation errors. Three variants of this model are evaluated for handwriting generation and recognition on images of handwritten numerals and alphabets from benchmark datasets. We show that the proposed model is more efficient in handwritten numeral/alphabet recognition than human participants in a recently published study, as well as a highly cited attention-based reinforcement model. This is the first known attention-based agent to interact with and learn end-to-end from images for recognition via generation with a high degree of accuracy and efficiency.
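The abstract describes two coupled mechanisms: attention driven by prediction (generation) error, and learning by jointly minimizing classification and generation errors. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The function names `next_glimpse` and `joint_loss`, the square non-overlapping glimpse patches, ranking locations by summed absolute error, the cross-entropy plus weighted mean-squared-error loss, and the weight `lam` are all hypothetical choices. The sketch also assumes the full error map |image - prediction| is available, as it would be when training on a fully observed image; the paper's actual mechanism may differ.

```python
import numpy as np


def next_glimpse(image, prediction, observed, patch=4):
    """Return the top-left (row, col) of the unobserved patch with the
    largest summed absolute prediction error, or None if all are observed.

    Hypothetical greedy policy for illustration: it assumes the full error
    map |image - prediction| is available, which may not match the paper.
    """
    err = np.abs(image - prediction)
    h, w = image.shape
    best_score, best_loc = -1.0, None
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            if (r, c) in observed:
                continue  # skip locations the agent has already sampled
            score = float(err[r:r + patch, c:c + patch].sum())
            if score > best_score:
                best_score, best_loc = score, (r, c)
    return best_loc


def joint_loss(class_probs, label, prediction, image, lam=1.0):
    """Classification cross-entropy plus lam * generation MSE.

    The weighting `lam` and the choice of MSE are assumptions.
    """
    ce = -np.log(class_probs[label] + 1e-12)  # classification error
    mse = np.mean((prediction - image) ** 2)  # generation error
    return float(ce + lam * mse)


# Usage: a toy 8x8 "image" whose only ink is in the bottom-right patch.
image = np.zeros((8, 8))
image[4:, 4:] = 1.0
prediction = np.zeros((8, 8))  # the agent has not yet predicted any ink
first = next_glimpse(image, prediction, observed=set())
print(first)  # (4, 4)
```

Greedily choosing the highest-error patch is the simplest error-driven rule; since the abstract says the model *learns* where to sample, a trained policy would presumably replace this hand-written selection step.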

Cite this Paper


BibTeX
@InProceedings{pmlr-v226-banerjee24a,
  title = {An Attention-based Predictive Agent for Handwritten Numeral/Alphabet Recognition via Generation},
  author = {Banerjee, Bonny and Baruah, Murchana},
  booktitle = {Proceedings of The 2nd Gaze Meets ML workshop},
  pages = {4--20},
  year = {2024},
  editor = {Madu Blessing, Amarachi and Wu, Joy and Zanca, Dario and Krupinski, Elizabeth and Kashyap, Satyananda and Karargyris, Alexandros},
  volume = {226},
  series = {Proceedings of Machine Learning Research},
  month = {16 Dec},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v226/banerjee24a/banerjee24a.pdf},
  url = {https://proceedings.mlr.press/v226/banerjee24a.html},
  abstract = {A number of attention-based models for either classification or generation of handwritten numerals/alphabets have been reported in the literature. However, generation and classification are done jointly in very few end-to-end models. We propose a predictive agent model that actively samples its visual environment via a sequence of glimpses. The attention is driven by the agent’s sensory prediction (or generation) error. At each sampling instant, the model predicts the observation class and completes the partial sequence observed till that instant. It learns where and what to sample by jointly minimizing the classification and generation errors. Three variants of this model are evaluated for handwriting generation and recognition on images of handwritten numerals and alphabets from benchmark datasets. We show that the proposed model is more efficient in handwritten numeral/alphabet recognition than human participants in a recently published study as well as a highly-cited attention-based reinforcement model. This is the first known attention-based agent to interact with and learn end-to-end from images for recognition via generation, with high degree of accuracy and efficiency.}
}
Endnote
%0 Conference Paper
%T An Attention-based Predictive Agent for Handwritten Numeral/Alphabet Recognition via Generation
%A Bonny Banerjee
%A Murchana Baruah
%B Proceedings of The 2nd Gaze Meets ML workshop
%C Proceedings of Machine Learning Research
%D 2024
%E Amarachi Madu Blessing
%E Joy Wu
%E Dario Zanca
%E Elizabeth Krupinski
%E Satyananda Kashyap
%E Alexandros Karargyris
%F pmlr-v226-banerjee24a
%I PMLR
%P 4--20
%U https://proceedings.mlr.press/v226/banerjee24a.html
%V 226
%X A number of attention-based models for either classification or generation of handwritten numerals/alphabets have been reported in the literature. However, generation and classification are done jointly in very few end-to-end models. We propose a predictive agent model that actively samples its visual environment via a sequence of glimpses. The attention is driven by the agent’s sensory prediction (or generation) error. At each sampling instant, the model predicts the observation class and completes the partial sequence observed till that instant. It learns where and what to sample by jointly minimizing the classification and generation errors. Three variants of this model are evaluated for handwriting generation and recognition on images of handwritten numerals and alphabets from benchmark datasets. We show that the proposed model is more efficient in handwritten numeral/alphabet recognition than human participants in a recently published study as well as a highly-cited attention-based reinforcement model. This is the first known attention-based agent to interact with and learn end-to-end from images for recognition via generation, with high degree of accuracy and efficiency.
APA
Banerjee, B., & Baruah, M. (2024). An Attention-based Predictive Agent for Handwritten Numeral/Alphabet Recognition via Generation. Proceedings of The 2nd Gaze Meets ML workshop, in Proceedings of Machine Learning Research 226:4-20. Available from https://proceedings.mlr.press/v226/banerjee24a.html.
