Realistic Image Generation using Region-phrase Attention

Wanming Huang, Richard Yi Da Xu, Ian Oppermann
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:284-299, 2019.

Abstract

The Generative Adversarial Network (GAN) has achieved remarkable progress in generating synthetic images from text, especially since the introduction of the attention mechanism. The current state-of-the-art algorithm applies attention between individual regular-grid regions of an image and the words of a sentence. Such approaches are sufficient to generate images that contain a single object in the foreground. However, natural-language descriptions often involve complex foreground objects, and the background may constitute a variable portion of the generated image. In this case, attention weights based on regular-grid regions may not concentrate on the intended foreground region(s), which in turn results in unnatural-looking images. Additionally, individual words such as “a”, “blue” and “shirt” do not provide a full visual context unless they are considered together. For these reasons, we propose a novel method that introduces an additional set of natural attentions between object-grid regions and word phrases. The object-grid regions are defined by a set of auxiliary bounding boxes, which serve as superior indicators of where alignment and attention with the word phrases should be drawn. We perform experiments on the Microsoft Common Objects in Context (MSCOCO) dataset and show that our proposed approach generates more realistic images than the current state-of-the-art algorithms.
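
To make the region-phrase attention concrete, the following is a minimal PyTorch-style sketch of attention between bounding-box region features and phrase embeddings. The function name, feature dimensions and the dot-product similarity are illustrative assumptions and do not reproduce the authors' exact formulation.

# Minimal sketch (an assumption, not the paper's exact method) of attention
# between object-grid region features and word-phrase embeddings.
import torch
import torch.nn.functional as F

def region_phrase_attention(region_feats, phrase_embs):
    # region_feats: (num_regions, dim) -- features pooled from auxiliary bounding boxes
    # phrase_embs:  (num_phrases, dim) -- encoded word phrases (e.g. "a blue shirt")
    # Similarity between every phrase and every object-grid region.
    scores = phrase_embs @ region_feats.t()      # (num_phrases, num_regions)
    # Normalise over regions so each phrase attends to the regions it describes.
    attn = F.softmax(scores, dim=-1)             # (num_phrases, num_regions)
    # Phrase-conditioned region context that could guide generation.
    context = attn @ region_feats                # (num_phrases, dim)
    return context, attn

# Usage with random placeholders:
regions = torch.randn(5, 256)   # e.g. 5 bounding-box regions
phrases = torch.randn(3, 256)   # e.g. 3 noun phrases parsed from the caption
context, attn = region_phrase_attention(regions, phrases)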

Cite this Paper


BibTeX
@InProceedings{pmlr-v101-huang19a,
  title     = {Realistic Image Generation using Region-phrase Attention},
  author    = {Huang, Wanming and Xu, Richard Yi Da and Oppermann, Ian},
  booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages     = {284--299},
  year      = {2019},
  editor    = {Lee, Wee Sun and Suzuki, Taiji},
  volume    = {101},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v101/huang19a/huang19a.pdf},
  url       = {https://proceedings.mlr.press/v101/huang19a.html}
}
Endnote
%0 Conference Paper
%T Realistic Image Generation using Region-phrase Attention
%A Wanming Huang
%A Richard Yi Da Xu
%A Ian Oppermann
%B Proceedings of The Eleventh Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Wee Sun Lee
%E Taiji Suzuki
%F pmlr-v101-huang19a
%I PMLR
%P 284--299
%U https://proceedings.mlr.press/v101/huang19a.html
%V 101
APA
Huang, W., Xu, R.Y.D. & Oppermann, I. (2019). Realistic Image Generation using Region-phrase Attention. Proceedings of The Eleventh Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 101:284-299. Available from https://proceedings.mlr.press/v101/huang19a.html.
