Concorde: Morphological Agreement in Conversational Models


Daniil Polykovskiy, Dmitry Soloviev, Sergey Nikolenko
Proceedings of The 10th Asian Conference on Machine Learning, PMLR 95:407-421, 2018.

Abstract

Neural conversational models are widely used in applications such as personal assistants and chatbots. These models tend to perform better when operating at the word level. However, for fusional languages such as French, Russian, or Polish, the vocabulary can become infeasibly large, since most words have multiple word forms. To reduce vocabulary size, we propose a new pipeline for building conversational models: first generate words in a standard (lemmatized) form and then transform them into a grammatically correct sentence. In this work, we focus on the morphological agreement part of the pipeline, i.e., reconstructing proper word forms from lemmatized sentences. For this task, we propose a neural network architecture that outperforms character-level models while being twice as fast in training and 20% faster in inference. The proposed pipeline yields better performance than character-level conversational models in evaluation by human assessors.
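
The two-stage idea above (generate lemmas, then restore word forms) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the mini-lexicon `FORMS`, the helper names `lemmatize` and `agree`, and the toy corpus are hypothetical, English is used only for readability, and a hand-written lookup table stands in for the neural agreement model that the paper actually proposes.

```python
# Toy illustration only: a hand-written form table stands in for the
# neural morphological agreement model described in the abstract.

# Hypothetical mini-lexicon: (lemma, subject) -> inflected form.
FORMS = {
    ("be", "i"): "am",
    ("be", "she"): "is",
    ("be", "they"): "are",
    ("like", "i"): "like",
    ("like", "she"): "likes",
    ("like", "they"): "like",
}

# Reverse map: inflected form -> lemma.
LEMMAS = {form: lemma for (lemma, _), form in FORMS.items()}


def lemmatize(tokens):
    """Map each token to its lemma; unknown tokens are kept unchanged."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


def agree(lemmas):
    """Restore word forms, treating the preceding token as the subject."""
    out = []
    for i, tok in enumerate(lemmas):
        subject = lemmas[i - 1] if i > 0 else None
        out.append(FORMS.get((tok, subject), tok))
    return out


# Toy corpus used to compare vocabulary sizes before and after lemmatization.
corpus = [
    "i am here",
    "she is here",
    "they are here",
    "she likes tea",
    "they like tea",
]
sentences = [line.split() for line in corpus]

form_vocab = {tok for sent in sentences for tok in sent}
lemma_vocab = {tok for sent in sentences for tok in lemmatize(sent)}

# Every sentence should survive the lemmatize -> agree round trip.
round_trip_ok = all(agree(lemmatize(sent)) == sent for sent in sentences)
```

In this toy setting the lemma vocabulary is smaller than the word-form vocabulary, and the agreement step recovers the original sentences from their lemmatized versions. In a fusional language the gap between the two vocabularies would be far larger, which is the motivation the abstract gives for generating lemmas first.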

Related Material