Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation


Tsuyoshi Okita, Yvette Graham, Andy Way
Proceedings of the First Workshop on Applications of Pattern Analysis, PMLR 11:119-126, 2010.


Word alignment estimates either a lexical translation probability \emph{p}(\emph{e}|\emph{f}), or a correspondence \emph{g}(\emph{e},\emph{f}) where the function \emph{g} outputs either 0 or 1, between a source word \emph{f} and a target word \emph{e} in given bilingual sentences. In practice, this formulation does not account for 'noise' (or outliers), which may cause problems depending on the corpus. \emph{N}-to-\emph{m} mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear both as noise and as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.
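
To make the two formulations concrete, the following is a minimal sketch of estimating p(e|f) by EM in the style of IBM Model 1, then thresholding it into a 0/1 correspondence g(e,f). It is an illustration only and is not taken from the paper: the function names `train_ibm1` and `hard_alignment`, the omission of a NULL source word, and the toy corpus in the usage note are all assumptions made here.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t[(e, f)] ~ p(e|f) by EM, IBM Model 1 style (no NULL word).

    pairs: list of (source_words, target_words) tuples.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    tgt_vocab = {e for _, es in pairs for e in es}
    # Uniform initialisation over the target vocabulary.
    t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) links
        total = defaultdict(float)  # expected count of links per source word f
        # E-step: distribute each target word over the source words
        # of its own sentence, in proportion to the current t.
        for fs, es in pairs:
            for e in es:
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalise so that sum_e p(e|f) = 1 for every f.
        for (e, f) in t:
            t[(e, f)] = count[(e, f)] / total[f] if total[f] > 0 else 0.0
    return t


def hard_alignment(t, fs, es):
    """Return the set of (e, f) pairs with g(e, f) = 1.

    Each target word is linked to its single most probable source word.
    """
    return {(e, max(fs, key=lambda f: t[(e, f)])) for e in es}
```

For example, with the hypothetical corpus `[("das haus".split(), "the house".split()), ("das buch".split(), "the book".split()), ("ein buch".split(), "a book".split())]`, calling `train_ibm1` and then `hard_alignment` on the first sentence pair links "the" to "das" and "house" to "haus". Note that this sketch links every target word to exactly one source word, so an N-to-m object such as a multi-word expression cannot be represented faithfully; that is the kind of mismatch the abstract describes as noise.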
