Multimodal Attention Branch Network for Perspective-Free Sentence Generation

Aly Magassouba, Komei Sugiura, Hisashi Kawai
Proceedings of the Conference on Robot Learning, PMLR 100:76-85, 2020.

Abstract

In this paper, we address the automatic sentence generation of fetching instructions for domestic service robots. Typical fetching commands, such as “bring me the yellow toy from the upper part of the white shelf,” include referring expressions, i.e., “from the upper part of the white shelf.” To solve this task, we propose a multimodal attention branch network (Multi-ABN), which generates natural sentences in an end-to-end manner. Multi-ABN uses multiple images of the same fixed scene to generate sentences that are not tied to a particular viewpoint. This approach combines a linguistic attention branch mechanism with several visual attention branch mechanisms. We evaluated our approach, which outperforms a state-of-the-art method on standard metrics. Our method also allows us to visualize the alignment between linguistic and visual features.

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-magassouba20a,
  title     = {Multimodal Attention Branch Network for Perspective-Free Sentence Generation},
  author    = {Magassouba, Aly and Sugiura, Komei and Kawai, Hisashi},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages     = {76--85},
  year      = {2020},
  editor    = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/magassouba20a/magassouba20a.pdf},
  url       = {https://proceedings.mlr.press/v100/magassouba20a.html},
  abstract  = {In this paper, we address the automatic sentence generation of fetching instructions for domestic service robots. Typical fetching commands, such as “bring me the yellow toy from the upper part of the white shelf,” include referring expressions, i.e., “from the upper part of the white shelf.” To solve this task, we propose a multimodal attention branch network (Multi-ABN), which generates natural sentences in an end-to-end manner. Multi-ABN uses multiple images of the same fixed scene to generate sentences that are not tied to a particular viewpoint. This approach combines a linguistic attention branch mechanism with several visual attention branch mechanisms. We evaluated our approach, which outperforms a state-of-the-art method on standard metrics. Our method also allows us to visualize the alignment between linguistic and visual features.}
}
Endnote
%0 Conference Paper
%T Multimodal Attention Branch Network for Perspective-Free Sentence Generation
%A Aly Magassouba
%A Komei Sugiura
%A Hisashi Kawai
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-magassouba20a
%I PMLR
%P 76--85
%U https://proceedings.mlr.press/v100/magassouba20a.html
%V 100
%X In this paper, we address the automatic sentence generation of fetching instructions for domestic service robots. Typical fetching commands, such as “bring me the yellow toy from the upper part of the white shelf,” include referring expressions, i.e., “from the upper part of the white shelf.” To solve this task, we propose a multimodal attention branch network (Multi-ABN), which generates natural sentences in an end-to-end manner. Multi-ABN uses multiple images of the same fixed scene to generate sentences that are not tied to a particular viewpoint. This approach combines a linguistic attention branch mechanism with several visual attention branch mechanisms. We evaluated our approach, which outperforms a state-of-the-art method on standard metrics. Our method also allows us to visualize the alignment between linguistic and visual features.
APA
Magassouba, A., Sugiura, K. &amp; Kawai, H. (2020). Multimodal Attention Branch Network for Perspective-Free Sentence Generation. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:76-85. Available from https://proceedings.mlr.press/v100/magassouba20a.html.