Text Length Adaptation in Sentiment Classification

Reinald Kim Amplayo, Seonjae Lim, Seung-won Hwang
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:646-661, 2019.

Abstract

Can a text classifier generalize well for datasets where the text length is different? For example, when short reviews are sentiment-labeled, can these transfer to predict the sentiment of long reviews (i.e., short to long transfer), or vice versa? While unsupervised transfer learning has been well-studied for cross domain/lingual transfer tasks, Cross Length Transfer (CLT) has not yet been explored. One reason is the assumption that length difference is trivially transferable in classification. We show that it is not, because short/long texts differ in context richness and word intensity. We devise new benchmark datasets from diverse domains and languages, and show that existing models from similar tasks cannot deal with the unique challenge of transferring across text lengths. We introduce a strong baseline model called BaggedCNN that treats long texts as bags containing short texts. We propose a state-of-the-art CLT model called Length Transfer Networks (LeTraNets) that introduces a two-way encoding scheme for short and long texts using multiple training mechanisms. We test our models and find that existing models perform worse than the BaggedCNN baseline, while LeTraNets outperforms all models.

Cite this Paper


BibTeX
@InProceedings{pmlr-v101-amplayo19a,
  title     = {Text Length Adaptation in Sentiment Classification},
  author    = {Amplayo, Reinald Kim and Lim, Seonjae and Hwang, Seung-won},
  booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages     = {646--661},
  year      = {2019},
  editor    = {Lee, Wee Sun and Suzuki, Taiji},
  volume    = {101},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v101/amplayo19a/amplayo19a.pdf},
  url       = {https://proceedings.mlr.press/v101/amplayo19a.html},
  abstract  = {Can a text classifier generalize well for datasets where the text length is different? For example, when short reviews are sentiment-labeled, can these transfer to predict the sentiment of long reviews (i.e., short to long transfer), or vice versa? While unsupervised transfer learning has been well-studied for cross domain/lingual transfer tasks, \textbf{Cross Length Transfer} (CLT) has not yet been explored. One reason is the assumption that length difference is trivially transferable in classification. We show that it is not, because short/long texts differ in context richness and word intensity. We devise new benchmark datasets from diverse domains and languages, and show that existing models from similar tasks cannot deal with the unique challenge of transferring across text lengths. We introduce a strong baseline model called \textsc{BaggedCNN} that treats long texts as bags containing short texts. We propose a state-of-the-art CLT model called \textbf{Le}ngth \textbf{Tra}nsfer \textbf{Net}work\textbf{s} (\textsc{LeTraNets}) that introduces a two-way encoding scheme for short and long texts using multiple training mechanisms. We test our models and find that existing models perform worse than the \textsc{BaggedCNN} baseline, while \textsc{LeTraNets} outperforms all models.}
}
Endnote
%0 Conference Paper
%T Text Length Adaptation in Sentiment Classification
%A Reinald Kim Amplayo
%A Seonjae Lim
%A Seung-won Hwang
%B Proceedings of The Eleventh Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Wee Sun Lee
%E Taiji Suzuki
%F pmlr-v101-amplayo19a
%I PMLR
%P 646--661
%U https://proceedings.mlr.press/v101/amplayo19a.html
%V 101
%X Can a text classifier generalize well for datasets where the text length is different? For example, when short reviews are sentiment-labeled, can these transfer to predict the sentiment of long reviews (i.e., short to long transfer), or vice versa? While unsupervised transfer learning has been well-studied for cross domain/lingual transfer tasks, Cross Length Transfer (CLT) has not yet been explored. One reason is the assumption that length difference is trivially transferable in classification. We show that it is not, because short/long texts differ in context richness and word intensity. We devise new benchmark datasets from diverse domains and languages, and show that existing models from similar tasks cannot deal with the unique challenge of transferring across text lengths. We introduce a strong baseline model called BaggedCNN that treats long texts as bags containing short texts. We propose a state-of-the-art CLT model called Length Transfer Networks (LeTraNets) that introduces a two-way encoding scheme for short and long texts using multiple training mechanisms. We test our models and find that existing models perform worse than the BaggedCNN baseline, while LeTraNets outperforms all models.
APA
Amplayo, R.K., Lim, S. & Hwang, S. (2019). Text Length Adaptation in Sentiment Classification. Proceedings of The Eleventh Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 101:646-661. Available from https://proceedings.mlr.press/v101/amplayo19a.html.