Phrase-based Image Captioning

Remi Lebret; Pedro Pinheiro; Ronan Collobert

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2085-2094, 2015.

Abstract

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely linear model to embed an image representation (generated from a previously trained Convolutional Neural Network) into a multimodal space that is common to the images and the phrases that are used to describe them. The system is then able to infer phrases from a given image sample. Based on the sentence description statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results in two popular datasets for the task: Flickr30k and the recently proposed Microsoft COCO.
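
The phrase-inference step described in the abstract (a linear map from CNN image features into a space shared with phrase vectors) can be pictured with a toy sketch. This is not the authors' implementation: the dimensions, the phrase list, the random weights, and the use of a dot product as the matching score are all assumptions made for illustration, and the sentence-generating language model is not covered.

```python
# Illustrative sketch only, not the authors' code. Every size, name, and
# value below is a made-up assumption; real weights would be learned.
import random

random.seed(0)

IMAGE_DIM = 8   # assumed size of the CNN image feature vector
EMBED_DIM = 4   # assumed size of the shared image/phrase space

# Hypothetical phrase vocabulary with toy vectors in the shared space.
phrase_vectors = {
    phrase: [random.gauss(0, 1) for _ in range(EMBED_DIM)]
    for phrase in ["a man", "a red bike", "on the street", "is riding"]
}

# Linear map of shape EMBED_DIM x IMAGE_DIM (random here, trained in practice).
W = [[random.gauss(0, 1) for _ in range(IMAGE_DIM)] for _ in range(EMBED_DIM)]


def embed_image(features):
    """Project image features into the shared space with a purely linear map."""
    return [sum(w * x for w, x in zip(row, features)) for row in W]


def dot(u, v):
    """Dot product, used here as an assumed image/phrase matching score."""
    return sum(a * b for a, b in zip(u, v))


def infer_phrases(features, top_k=2):
    """Return the top_k phrases ranked by score against the embedded image."""
    z = embed_image(features)
    ranked = sorted(phrase_vectors, key=lambda p: dot(z, phrase_vectors[p]), reverse=True)
    return ranked[:top_k]


# Stand-in for features produced by a previously trained CNN.
image_features = [random.gauss(0, 1) for _ in range(IMAGE_DIM)]
top_phrases = infer_phrases(image_features)
print(top_phrases)
```

With trained weights, the highest-scoring phrases would then be passed to a language model to assemble a sentence; here the ranking is meaningless because the weights are random.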

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-lebret15,
  title = 	 {Phrase-based Image Captioning},
  author = 	 {Lebret, Remi and Pinheiro, Pedro and Collobert, Ronan},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {2085--2094},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/lebret15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/lebret15.html},
  abstract = 	 {Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely linear model to embed an image representation (generated from a previously trained Convolutional Neural Network) into a multimodal space that is common to the images and the phrases that are used to describe them. The system is then able to infer phrases from a given image sample. Based on the sentence description statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results in two popular datasets for the task: Flickr30k and the recently proposed Microsoft COCO.}
}

Endnote

%0 Conference Paper
%T Phrase-based Image Captioning
%A Remi Lebret
%A Pedro Pinheiro
%A Ronan Collobert
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-lebret15
%I PMLR
%P 2085--2094
%U https://proceedings.mlr.press/v37/lebret15.html
%V 37
%X Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely linear model to embed an image representation (generated from a previously trained Convolutional Neural Network) into a multimodal space that is common to the images and the phrases that are used to describe them. The system is then able to infer phrases from a given image sample. Based on the sentence description statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results in two popular datasets for the task: Flickr30k and the recently proposed Microsoft COCO.

RIS


TY  - CPAPER
TI  - Phrase-based Image Captioning
AU  - Remi Lebret
AU  - Pedro Pinheiro
AU  - Ronan Collobert
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-lebret15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 2085
EP  - 2094
L1  - http://proceedings.mlr.press/v37/lebret15.pdf
UR  - https://proceedings.mlr.press/v37/lebret15.html
AB  - Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely linear model to embed an image representation (generated from a previously trained Convolutional Neural Network) into a multimodal space that is common to the images and the phrases that are used to describe them. The system is then able to infer phrases from a given image sample. Based on the sentence description statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results in two popular datasets for the task: Flickr30k and the recently proposed Microsoft COCO.
ER  -

APA


Lebret, R., Pinheiro, P. & Collobert, R. (2015). Phrase-based Image Captioning. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:2085-2094. Available from https://proceedings.mlr.press/v37/lebret15.html.
