Dynamic Memory Networks for Visual and Textual Question Answering

Caiming Xiong; Stephen Merity; Richard Socher

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2397-2406, 2016.

Abstract

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-xiong16,
  title = 	 {Dynamic Memory Networks for Visual and Textual Question Answering},
  author = 	 {Xiong, Caiming and Merity, Stephen and Socher, Richard},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {2397--2406},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/xiong16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/xiong16.html},
  abstract = 	 {Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.}
}

Endnote

%0 Conference Paper
%T Dynamic Memory Networks for Visual and Textual Question Answering
%A Caiming Xiong
%A Stephen Merity
%A Richard Socher
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-xiong16
%I PMLR
%P 2397--2406
%U https://proceedings.mlr.press/v48/xiong16.html
%V 48
%X Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.

RIS


TY  - CPAPER
TI  - Dynamic Memory Networks for Visual and Textual Question Answering
AU  - Caiming Xiong
AU  - Stephen Merity
AU  - Richard Socher
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-xiong16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 2397
EP  - 2406
L1  - http://proceedings.mlr.press/v48/xiong16.pdf
UR  - https://proceedings.mlr.press/v48/xiong16.html
AB  - Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
ER  -

APA


Xiong, C., Merity, S. & Socher, R. (2016). Dynamic Memory Networks for Visual and Textual Question Answering. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2397-2406. Available from https://proceedings.mlr.press/v48/xiong16.html.
