Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese

Yuanzhi Ke; Masafumi Hagiwara

Proceedings of the Ninth Asian Conference on Machine Learning, PMLR 77:561-573, 2017.

Abstract

The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models huge to process such languages. We explored a model for sentiment classification that takes the embeddings of the radicals of the Chinese characters, i.e., hanzi of Chinese and kanji of Japanese. Our model is composed of a CNN word feature encoder and a bi-directional RNN document feature encoder. The results achieved are on par with the character embedding-based models, and close to the state-of-the-art word embedding-based models, with 90% smaller vocabulary, and at least 13% and 80% fewer parameters than the character embedding-based models and word embedding-based models respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.
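As a rough illustration of why a radical-level vocabulary can be smaller than a character-level one, the sketch below decomposes a handful of characters into components and compares the two vocabularies. The `DECOMPOSITION` table and the helper `to_radicals` are hypothetical toy constructs chosen for this example; they are not the paper's data or code, and the resulting ratio says nothing about the 90% figure reported above.

```python
# Toy component table (an illustrative assumption, not the paper's resource).
DECOMPOSITION = {
    "好": ["女", "子"],
    "明": ["日", "月"],
    "朋": ["月", "月"],
    "昌": ["日", "日"],
    "晶": ["日", "日", "日"],
    "林": ["木", "木"],
    "森": ["木", "木", "木"],
    "休": ["亻", "木"],
    "杏": ["木", "口"],
    "呆": ["口", "木"],
    "品": ["口", "口", "口"],
}


def to_radicals(text):
    """Replace each character by its listed components.

    Characters missing from the table are passed through unchanged.
    """
    radicals = []
    for ch in text:
        radicals.extend(DECOMPOSITION.get(ch, [ch]))
    return radicals


text = "".join(DECOMPOSITION)            # the 11 characters in the table
char_vocab = set(text)                   # one embedding row per character
radical_vocab = set(to_radicals(text))   # one embedding row per component

print(len(char_vocab), len(radical_vocab))
```

In this toy table, 杏 and 呆 share the same components in a different order, so a model working at this level has to look at the component sequence rather than the component set, which is one reason an encoder over the sequence is a natural fit.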

Cite this Paper

BibTeX


@InProceedings{pmlr-v77-ke17a,
  title = 	 {Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese},
  author = 	 {Ke, Yuanzhi and Hagiwara, Masafumi},
  booktitle = 	 {Proceedings of the Ninth Asian Conference on Machine Learning},
  pages = 	 {561--573},
  year = 	 {2017},
  editor = 	 {Zhang, Min-Ling and Noh, Yung-Kyun},
  volume = 	 {77},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Yonsei University, Seoul, Republic of Korea},
  month = 	 {15--17 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v77/ke17a/ke17a.pdf},
  url = 	 {https://proceedings.mlr.press/v77/ke17a.html},
  abstract = 	 {The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models huge to process such languages. We explored a model for sentiment classification that takes the embeddings of the radicals of the Chinese characters, i.e., hanzi of Chinese and kanji of Japanese. Our model is composed of a CNN word feature encoder and a bi-directional RNN document feature encoder. The results achieved are on par with the character embedding-based models, and close to the state-of-the-art word embedding-based models, with 90% smaller vocabulary, and at least 13% and 80% fewer parameters than the character embedding-based models and word embedding-based models respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.}
}

Endnote

%0 Conference Paper
%T Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese
%A Yuanzhi Ke
%A Masafumi Hagiwara
%B Proceedings of the Ninth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Min-Ling Zhang
%E Yung-Kyun Noh
%F pmlr-v77-ke17a
%I PMLR
%P 561--573
%U https://proceedings.mlr.press/v77/ke17a.html
%V 77
%X The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models huge to process such languages. We explored a model for sentiment classification that takes the embeddings of the radicals of the Chinese characters, i.e., hanzi of Chinese and kanji of Japanese. Our model is composed of a CNN word feature encoder and a bi-directional RNN document feature encoder. The results achieved are on par with the character embedding-based models, and close to the state-of-the-art word embedding-based models, with 90% smaller vocabulary, and at least 13% and 80% fewer parameters than the character embedding-based models and word embedding-based models respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.

APA


Ke, Y. & Hagiwara, M. (2017). Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese. Proceedings of the Ninth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 77:561-573. Available from https://proceedings.mlr.press/v77/ke17a.html.
