Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese

Yuanzhi Ke, Masafumi Hagiwara
Proceedings of the Ninth Asian Conference on Machine Learning, PMLR 77:561-573, 2017.

Abstract

The character vocabulary of non-alphabetic languages such as Chinese and Japanese can be very large, which in turn inflates the size of neural network models that process these languages. We explored a sentiment classification model that takes as input the embeddings of the radicals of Chinese characters, i.e., hanzi in Chinese and kanji in Japanese. Our model is composed of a CNN word feature encoder and a bi-directional RNN document feature encoder. The results are on par with those of character embedding-based models and close to those of state-of-the-art word embedding-based models, with a 90% smaller vocabulary and at least 13% and 80% fewer parameters than the character embedding-based and word embedding-based models, respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.
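The abstract's central idea is to represent each character by its radicals, so that the embedding table is indexed by a small radical inventory rather than by the full character set. The toy example below illustrates that idea. It is a minimal sketch in Python, not the authors' code: the decomposition table `RADICALS` is a hypothetical, hand-written stand-in for a real character-to-radical dictionary, and `build_radical_vocab` and `encode_word` are illustrative names. It only produces the radical index sequences that an embedding layer would consume; the CNN word encoder and bi-directional RNN document encoder are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical toy decomposition table (character -> radicals), hand-written
# for illustration only; a real system would use a full decomposition dictionary.
RADICALS: Dict[str, Tuple[str, ...]] = {
    "好": ("女", "子"),
    "字": ("宀", "子"),
    "安": ("宀", "女"),
    "孖": ("子", "子"),
    "奻": ("女", "女"),
}


def build_radical_vocab(table: Dict[str, Tuple[str, ...]]) -> Dict[str, int]:
    """Assign an integer id to every distinct radical; id 0 is reserved for padding."""
    radicals = sorted({r for parts in table.values() for r in parts})
    return {r: i + 1 for i, r in enumerate(radicals)}


def encode_word(
    word: str,
    table: Dict[str, Tuple[str, ...]],
    vocab: Dict[str, int],
    max_radicals: int = 3,
) -> List[List[int]]:
    """Map each character of `word` to a fixed-length list of radical ids.

    Each list is truncated or right-padded with 0 to `max_radicals`, so the
    result could be stacked into a tensor and passed to an embedding layer.
    Characters missing from `table` raise KeyError in this sketch.
    """
    encoded = []
    for ch in word:
        ids = [vocab[r] for r in table[ch]][:max_radicals]
        encoded.append(ids + [0] * (max_radicals - len(ids)))
    return encoded


if __name__ == "__main__":
    vocab = build_radical_vocab(RADICALS)
    print(len(RADICALS), "characters ->", len(vocab), "radicals")
    print(encode_word("安好", RADICALS, vocab))  # [[3, 1, 0], [1, 2, 0]]
```

In this toy table, five characters share only three radicals, so the embedding table needs three rows (plus padding) instead of five. The padding length of 3 is an arbitrary choice for the sketch. How the real radical inventory and decomposition are obtained is not specified on this page.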

Cite this Paper


BibTeX
@InProceedings{pmlr-v77-ke17a,
  title = {Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese},
  author = {Ke, Yuanzhi and Hagiwara, Masafumi},
  booktitle = {Proceedings of the Ninth Asian Conference on Machine Learning},
  pages = {561--573},
  year = {2017},
  editor = {Zhang, Min-Ling and Noh, Yung-Kyun},
  volume = {77},
  series = {Proceedings of Machine Learning Research},
  address = {Yonsei University, Seoul, Republic of Korea},
  month = {15--17 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v77/ke17a/ke17a.pdf},
  url = {https://proceedings.mlr.press/v77/ke17a.html},
  abstract = {The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models huge to process such languages. We explored a model for sentiment classification that takes the embeddings of the radicals of the Chinese characters, i.e., hanzi of Chinese and kanji of Japanese. Our model is composed of a CNN word feature encoder and a bi-directional RNN document feature encoder. The results achieved are on par with the character embedding-based models, and close to the state-of-the-art word embedding-based models, with 90% smaller vocabulary, and at least 13% and 80% fewer parameters than the character embedding-based models and word embedding-based models respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.}
}
Endnote
%0 Conference Paper
%T Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese
%A Yuanzhi Ke
%A Masafumi Hagiwara
%B Proceedings of the Ninth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Min-Ling Zhang
%E Yung-Kyun Noh
%F pmlr-v77-ke17a
%I PMLR
%P 561--573
%U https://proceedings.mlr.press/v77/ke17a.html
%V 77
%X The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models huge to process such languages. We explored a model for sentiment classification that takes the embeddings of the radicals of the Chinese characters, i.e., hanzi of Chinese and kanji of Japanese. Our model is composed of a CNN word feature encoder and a bi-directional RNN document feature encoder. The results achieved are on par with the character embedding-based models, and close to the state-of-the-art word embedding-based models, with 90% smaller vocabulary, and at least 13% and 80% fewer parameters than the character embedding-based models and word embedding-based models respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.
APA
Ke, Y., & Hagiwara, M. (2017). Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese. Proceedings of the Ninth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 77:561-573. Available from https://proceedings.mlr.press/v77/ke17a.html.
