A Tree-Structured Decoder for Image-to-Markup Generation

Jianshu Zhang, Jun Du, Yongxin Yang, Yi-Zhe Song, Si Wei, Lirong Dai
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:11076-11085, 2020.

Abstract

Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup generation. However, for markup with an inherently tree-structured representation, string representations can hardly cope with the structural complexity. In this work, we first show via a set of toy problems that string decoders struggle to decode tree structures, especially as structural complexity increases. We then propose a tree-structured decoder that specifically aims at generating tree-structured markup. Our decoder works sequentially: at each step a child node and its parent node are generated simultaneously to form a sub-tree, and this sub-tree is then used to construct the final tree structure in a recurrent manner. Key to the success of our tree decoder is twofold: (i) it strictly respects the parent-child relationships of trees, and (ii) it explicitly outputs trees as opposed to linear strings. Evaluated on both mathematical formula recognition and chemical formula recognition, the proposed tree decoder greatly outperforms strong string-decoder baselines.
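
As a rough illustration of the decoding scheme described in the abstract, the Python sketch below assembles a markup tree from a sequence of per-step (parent, child) decisions, one sub-tree per step. This is a minimal sketch of the tree-assembly logic only, under assumed names (Node, decode_tree) and a toy step sequence; it is not the authors' implementation, which predicts each step with a neural decoder attending over encoded image features.

# Minimal sketch of sequential sub-tree assembly (assumption: names and the
# toy example are illustrative, not taken from the paper or its code).
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)


def decode_tree(steps: List[Tuple[int, str]]) -> Node:
    """Build a tree from (parent_index, child_symbol) pairs.

    Each decoding step emits one sub-tree: a child symbol together with a
    pointer to its parent among previously generated nodes, so parent-child
    relationships are respected by construction.
    """
    root = Node("<root>")
    nodes = [root]
    for parent_idx, child_symbol in steps:
        child = Node(child_symbol)
        nodes[parent_idx].children.append(child)  # attach the new sub-tree
        nodes.append(child)                       # make it a future parent
    return root


if __name__ == "__main__":
    # Toy example: decode "x^2 + 1" as a tree with the operator at the root.
    steps = [
        (0, "+"),  # "+" attached to <root>
        (1, "^"),  # "^" attached to "+"
        (2, "x"),  # base of the exponent
        (2, "2"),  # exponent
        (1, "1"),  # second operand of "+"
    ]
    print(decode_tree(steps))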

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-zhang20g,
  title     = {A Tree-Structured Decoder for Image-to-Markup Generation},
  author    = {Zhang, Jianshu and Du, Jun and Yang, Yongxin and Song, Yi-Zhe and Wei, Si and Dai, Lirong},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {11076--11085},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/zhang20g/zhang20g.pdf},
  url       = {https://proceedings.mlr.press/v119/zhang20g.html},
  abstract  = {Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup generation. However, for markup with an inherently tree-structured representation, string representations can hardly cope with the structural complexity. In this work, we first show via a set of toy problems that string decoders struggle to decode tree structures, especially as structural complexity increases. We then propose a tree-structured decoder that specifically aims at generating tree-structured markup. Our decoder works sequentially: at each step a child node and its parent node are generated simultaneously to form a sub-tree, and this sub-tree is then used to construct the final tree structure in a recurrent manner. Key to the success of our tree decoder is twofold: (i) it strictly respects the parent-child relationships of trees, and (ii) it explicitly outputs trees as opposed to linear strings. Evaluated on both mathematical formula recognition and chemical formula recognition, the proposed tree decoder greatly outperforms strong string-decoder baselines.}
}
Endnote
%0 Conference Paper
%T A Tree-Structured Decoder for Image-to-Markup Generation
%A Jianshu Zhang
%A Jun Du
%A Yongxin Yang
%A Yi-Zhe Song
%A Si Wei
%A Lirong Dai
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-zhang20g
%I PMLR
%P 11076--11085
%U https://proceedings.mlr.press/v119/zhang20g.html
%V 119
%X Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup generation. However, for markup with an inherently tree-structured representation, string representations can hardly cope with the structural complexity. In this work, we first show via a set of toy problems that string decoders struggle to decode tree structures, especially as structural complexity increases. We then propose a tree-structured decoder that specifically aims at generating tree-structured markup. Our decoder works sequentially: at each step a child node and its parent node are generated simultaneously to form a sub-tree, and this sub-tree is then used to construct the final tree structure in a recurrent manner. Key to the success of our tree decoder is twofold: (i) it strictly respects the parent-child relationships of trees, and (ii) it explicitly outputs trees as opposed to linear strings. Evaluated on both mathematical formula recognition and chemical formula recognition, the proposed tree decoder greatly outperforms strong string-decoder baselines.
APA
Zhang, J., Du, J., Yang, Y., Song, Y., Wei, S. & Dai, L. (2020). A Tree-Structured Decoder for Image-to-Markup Generation. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:11076-11085. Available from https://proceedings.mlr.press/v119/zhang20g.html.
