Image-to-Markup Generation with Coarse-to-Fine Attention

Yuntian Deng; Anssi Kanervisto; Jeffrey Ling; Alexander M. Rush

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:980-989, 2017.

Abstract

We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. To reduce the inference complexity associated with the attention-based approaches, we introduce a new coarse-to-fine attention layer that selects a support region before applying attention.

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-deng17a,
  title     = {Image-to-Markup Generation with Coarse-to-Fine Attention},
  author    = {Yuntian Deng and Anssi Kanervisto and Jeffrey Ling and Alexander M. Rush},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {980--989},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/deng17a/deng17a.pdf},
  url       = {https://proceedings.mlr.press/v70/deng17a.html},
  abstract  = {We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. To reduce the inference complexity associated with the attention-based approaches, we introduce a new coarse-to-fine attention layer that selects a support region before applying attention.}
}

Endnote

%0 Conference Paper
%T Image-to-Markup Generation with Coarse-to-Fine Attention
%A Yuntian Deng
%A Anssi Kanervisto
%A Jeffrey Ling
%A Alexander M. Rush
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-deng17a
%I PMLR
%P 980--989
%U https://proceedings.mlr.press/v70/deng17a.html
%V 70
%X We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. To reduce the inference complexity associated with the attention-based approaches, we introduce a new coarse-to-fine attention layer that selects a support region before applying attention.

APA


Deng, Y., Kanervisto, A., Ling, J. & Rush, A.M. (2017). Image-to-Markup Generation with Coarse-to-Fine Attention. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:980-989. Available from https://proceedings.mlr.press/v70/deng17a.html.
