Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation

Tsuyoshi Okita, Yvette Graham, Andy Way
Proceedings of the First Workshop on Applications of Pattern Analysis, PMLR 11:119-126, 2010.

Abstract

Word alignment is to estimate a lexical translation probability \emph{p}(\emph{e}|\emph{f}), or to estimate the correspondence \emph{g}(\emph{e},\emph{f}) where a function \emph{g} outputs either 0 or 1, between a source word \emph{f} and a target word \emph{e} for given bilingual sentences. In practice, this formulation does not consider the existence of ‘noise’ (or outlier) which may cause problems depending on the corpus. \emph{N}-to-\emph{m} mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear as both noise and also as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.
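To make the quantity p(e|f) in the abstract concrete, here is a minimal sketch of one common way to estimate it: expectation-maximization in the style of IBM Model 1. This is an assumption for illustration only; the abstract does not say which aligner the paper uses, and the names `train_ibm1`, `g`, `corpus`, and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t[(e, f)] ~ p(e|f) from (source_tokens, target_tokens) pairs.

    Illustrative IBM Model 1 style EM without a NULL word; not the
    paper's own method.
    """
    target_vocab = {e for _, es in pairs for e in es}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f aligned to anything
        for fs, es in pairs:
            for e in es:
                # E-step: distribute one unit of e over the source words.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize so that sum over e of p(e|f) is 1 for each f.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def g(t, e, f, threshold=0.5):
    """Binary correspondence g(e, f): 1 if p(e|f) reaches the threshold, else 0."""
    return 1 if t[(e, f)] >= threshold else 0


# Toy bilingual corpus (hypothetical): source words f, target words e.
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(corpus)
print(g(t, "the", "das"), g(t, "house", "das"))
```

Every sentence pair in this sketch is treated as clean one-to-one evidence, which is the assumption the abstract questions: a paraphrase or multi-word expression in the corpus would feed the same expected counts as a literal translation.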

Cite this Paper


BibTeX
@InProceedings{pmlr-v11-okita10a,
  title = {Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation},
  author = {Okita, Tsuyoshi and Graham, Yvette and Way, Andy},
  booktitle = {Proceedings of the First Workshop on Applications of Pattern Analysis},
  pages = {119--126},
  year = {2010},
  editor = {Diethe, Tom and Cristianini, Nello and Shawe-Taylor, John},
  volume = {11},
  series = {Proceedings of Machine Learning Research},
  address = {Cumberland Lodge, Windsor, UK},
  month = {01--03 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v11/okita10a/okita10a.pdf},
  url = {https://proceedings.mlr.press/v11/okita10a.html},
  abstract = {Word alignment is to estimate a lexical translation probability \emph{p}(\emph{e}|\emph{f}), or to estimate the correspondence \emph{g}(\emph{e},\emph{f}) where a function \emph{g} outputs either 0 or 1, between a source word \emph{f} and a target word \emph{e} for given bilingual sentences. In practice, this formulation does not consider the existence of `noise' (or outlier) which may cause problems depending on the corpus. \emph{N}-to-\emph{m} mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear as both noise and also as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.}
}
Endnote
%0 Conference Paper
%T Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation
%A Tsuyoshi Okita
%A Yvette Graham
%A Andy Way
%B Proceedings of the First Workshop on Applications of Pattern Analysis
%C Proceedings of Machine Learning Research
%D 2010
%E Tom Diethe
%E Nello Cristianini
%E John Shawe-Taylor
%F pmlr-v11-okita10a
%I PMLR
%P 119--126
%U https://proceedings.mlr.press/v11/okita10a.html
%V 11
%X Word alignment is to estimate a lexical translation probability p(e|f), or to estimate the correspondence g(e,f) where a function g outputs either 0 or 1, between a source word f and a target word e for given bilingual sentences. In practice, this formulation does not consider the existence of ‘noise’ (or outlier) which may cause problems depending on the corpus. N-to-m mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear as both noise and also as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.
RIS
TY  - CPAPER
TI  - Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation
AU  - Tsuyoshi Okita
AU  - Yvette Graham
AU  - Andy Way
BT  - Proceedings of the First Workshop on Applications of Pattern Analysis
DA  - 2010/09/30
ED  - Tom Diethe
ED  - Nello Cristianini
ED  - John Shawe-Taylor
ID  - pmlr-v11-okita10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 11
SP  - 119
EP  - 126
L1  - http://proceedings.mlr.press/v11/okita10a/okita10a.pdf
UR  - https://proceedings.mlr.press/v11/okita10a.html
AB  - Word alignment is to estimate a lexical translation probability p(e|f), or to estimate the correspondence g(e,f) where a function g outputs either 0 or 1, between a source word f and a target word e for given bilingual sentences. In practice, this formulation does not consider the existence of ‘noise’ (or outlier) which may cause problems depending on the corpus. N-to-m mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear as both noise and also as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.
ER  -
APA
Okita, T., Graham, Y. & Way, A. (2010). Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation. Proceedings of the First Workshop on Applications of Pattern Analysis, in Proceedings of Machine Learning Research 11:119-126. Available from https://proceedings.mlr.press/v11/okita10a.html.
