Concorde: Morphological Agreement in Conversational Models

Daniil Polykovskiy; Dmitry Soloviev; Sergey Nikolenko

Proceedings of The 10th Asian Conference on Machine Learning, PMLR 95:407-421, 2018.

Abstract

Neural conversational models are widely used in applications such as personal assistants and chatbots. These models seem to perform better when operating at the word level. However, for fusional languages such as French, Russian, or Polish, the vocabulary size can become infeasible, since most words have multiple word forms. To reduce vocabulary size, we propose a new pipeline for building conversational models: first generate words in a standard (lemmatized) form, and then transform them into a grammatically correct sentence. In this work, we focus on the morphological agreement part of the pipeline, i.e., reconstructing proper word forms from lemmatized sentences. For this task, we propose a neural network architecture that outperforms character-level models while being twice as fast in training and 20% faster in inference. According to human assessors, the proposed pipeline yields better performance than character-level conversational models.
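To make the vocabulary argument concrete, the sketch below counts distinct word forms versus distinct lemmas in a tiny Russian corpus. It is a toy illustration, not the authors' code: the lemma table `LEMMA_OF` and the helpers `lemmatize` and `vocab_sizes` are hypothetical, and a real pipeline would rely on a morphological analyzer rather than a hand-written dictionary.

```python
# Toy illustration (not the paper's code): lemmatization collapses many
# inflected word forms into one vocabulary entry per lemma.

# Hypothetical hand-written lemma table for a few Russian word forms;
# a real pipeline would use a morphological analyzer instead.
LEMMA_OF = {
    # "кошка" (cat) in several cases and numbers
    "кошка": "кошка", "кошки": "кошка", "кошке": "кошка",
    "кошку": "кошка", "кошкой": "кошка", "кошек": "кошка",
    # "видеть" (to see) in several persons, tenses, and genders
    "вижу": "видеть", "видишь": "видеть", "видит": "видеть",
    "видел": "видеть", "видела": "видеть",
}


def lemmatize(sentence: str) -> list[str]:
    """Map each token to its lemma; unknown tokens are kept unchanged."""
    return [LEMMA_OF.get(tok, tok) for tok in sentence.lower().split()]


def vocab_sizes(corpus: list[str]) -> tuple[int, int]:
    """Return (number of distinct word forms, number of distinct lemmas)."""
    forms = {tok for s in corpus for tok in s.lower().split()}
    lemmas = {lem for s in corpus for lem in lemmatize(s)}
    return len(forms), len(lemmas)


corpus = [
    "я вижу кошку",      # I see the cat
    "ты видишь кошек",   # you see (the) cats
    "она видела кошку",  # she saw the cat
]

print(lemmatize(corpus[0]))  # ['я', 'видеть', 'кошка']
print(vocab_sizes(corpus))   # (8, 5)
```

The reverse step, morphological agreement, cannot be a simple lookup: the lemma "кошка" must become "кошку" in the first and third sentences but "кошек" in the second, depending on context. Per the abstract, the paper addresses this step with a dedicated neural network architecture.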

Cite this Paper

BibTeX

@InProceedings{pmlr-v95-polykovskiy18a,
  title     = {Concorde: Morphological Agreement in Conversational Models},
  author    = {Polykovskiy, Daniil and Soloviev, Dmitry and Nikolenko, Sergey},
  booktitle = {Proceedings of The 10th Asian Conference on Machine Learning},
  pages     = {407--421},
  year      = {2018},
  editor    = {Zhu, Jun and Takeuchi, Ichiro},
  volume    = {95},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--16 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v95/polykovskiy18a/polykovskiy18a.pdf},
  url       = {https://proceedings.mlr.press/v95/polykovskiy18a.html},
  abstract  = {Neural conversational models are widely used in applications such as personal assistants and chatbots. These models seem to perform better when operating at the word level. However, for fusional languages such as French, Russian, or Polish, the vocabulary size can become infeasible, since most words have multiple word forms. To reduce vocabulary size, we propose a new pipeline for building conversational models: first generate words in a standard (lemmatized) form, and then transform them into a grammatically correct sentence. In this work, we focus on the \emph{morphological agreement} part of the pipeline, i.e., reconstructing proper word forms from lemmatized sentences. For this task, we propose a neural network architecture that outperforms character-level models while being twice as fast in training and 20\% faster in inference. According to human assessors, the proposed pipeline yields better performance than character-level conversational models.}
}

Endnote

%0 Conference Paper
%T Concorde: Morphological Agreement in Conversational Models
%A Daniil Polykovskiy
%A Dmitry Soloviev
%A Sergey Nikolenko
%B Proceedings of The 10th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jun Zhu
%E Ichiro Takeuchi
%F pmlr-v95-polykovskiy18a
%I PMLR
%P 407-421
%U https://proceedings.mlr.press/v95/polykovskiy18a.html
%V 95
%X Neural conversational models are widely used in applications such as personal assistants and chatbots. These models seem to perform better when operating at the word level. However, for fusional languages such as French, Russian, or Polish, the vocabulary size can become infeasible, since most words have multiple word forms. To reduce vocabulary size, we propose a new pipeline for building conversational models: first generate words in a standard (lemmatized) form, and then transform them into a grammatically correct sentence. In this work, we focus on the morphological agreement part of the pipeline, i.e., reconstructing proper word forms from lemmatized sentences. For this task, we propose a neural network architecture that outperforms character-level models while being twice as fast in training and 20% faster in inference. According to human assessors, the proposed pipeline yields better performance than character-level conversational models.

APA

Polykovskiy, D., Soloviev, D., & Nikolenko, S. (2018). Concorde: Morphological Agreement in Conversational Models. Proceedings of The 10th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 95:407-421. Available from https://proceedings.mlr.press/v95/polykovskiy18a.html.
