Towards Competitive N-gram Smoothing

Moein Falahatgar, Mesrob Ohannessian, Alon Orlitsky, Venkatadheeraj Pichapati
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4206-4215, 2020.

Abstract

N-gram models remain a fundamental component of language modeling. In data-scarce regimes, they are a strong alternative to neural models. Even when not used as-is, recent work shows they can regularize neural models. Despite this success, the effectiveness of one of the best N-gram smoothing methods, the one suggested by Kneser and Ney (1995), is not fully understood. In the hopes of explaining this performance, we study it through the lens of competitive distribution estimation: the ability to perform as well as an oracle aware of further structure in the data. We first establish basic competitive properties of Kneser-Ney smoothing. We then investigate the nature of its backoff mechanism and show that it emerges from first principles, rather than being an assumption of the model. We do this by generalizing the Good-Turing estimator to the contextual setting. This exploration leads us to a powerful generalization of Kneser-Ney, which we conjecture to have even stronger competitive properties. Empirically, it significantly improves performance on language modeling, even matching feed-forward neural models. To show that the mechanisms at play are not restricted to language modeling, we demonstrate similar gains on the task of predicting attack types in the Global Terrorism Database.
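For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of standard interpolated Kneser-Ney smoothing for bigrams, in its common textbook form. It is not the generalization proposed in the paper. The function name `kneser_ney_bigram`, the fixed discount of 0.75, and the toy corpus are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def kneser_ney_bigram(tokens, discount=0.75):
    """Build p(word | context) with interpolated Kneser-Ney (bigram case).

    Illustrative sketch only: a single fixed discount and no special
    handling of sentence boundaries or out-of-vocabulary words.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_count = Counter()      # c(u): number of bigram tokens starting with u
    followers = defaultdict(set)   # distinct words observed after u
    preceders = defaultdict(set)   # distinct words observed before w
    for (u, w), c in bigrams.items():
        context_count[u] += c
        followers[u].add(w)
        preceders[w].add(u)
    n_bigram_types = len(bigrams)

    def prob(w, u):
        # Continuation probability: share of bigram types that end in w.
        p_cont = len(preceders.get(w, ())) / n_bigram_types
        if context_count[u] == 0:
            # Unseen context: fall back entirely to the lower-order estimate.
            return p_cont
        # Subtract the discount from every observed bigram count.
        discounted = max(bigrams[(u, w)] - discount, 0.0) / context_count[u]
        # The mass removed by discounting is redistributed via backoff.
        backoff_weight = discount * len(followers[u]) / context_count[u]
        return discounted + backoff_weight * p_cont

    return prob


# Toy usage on a hypothetical corpus.
tokens = "the cat sat on the mat the cat ate".split()
prob = kneser_ney_bigram(tokens)
vocab = set(tokens)
total = sum(prob(w, "the") for w in vocab)
print(prob("cat", "the"), prob("sat", "the"), total)
```

The backoff distribution here counts how many distinct contexts a word follows rather than how often the word occurs, which is the design choice that separates Kneser-Ney from plain absolute discounting.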

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-falahatgar20a,
  title = {Towards Competitive N-gram Smoothing},
  author = {Falahatgar, Moein and Ohannessian, Mesrob and Orlitsky, Alon and Pichapati, Venkatadheeraj},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {4206--4215},
  year = {2020},
  editor = {Silvia Chiappa and Roberto Calandra},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/falahatgar20a/falahatgar20a.pdf},
  url = {http://proceedings.mlr.press/v108/falahatgar20a.html},
  abstract = {N-gram models remain a fundamental component of language modeling. In data-scarce regimes, they are a strong alternative to neural models. Even when not used as-is, recent work shows they can regularize neural models. Despite this success, the effectiveness of one of the best N-gram smoothing methods, the one suggested by Kneser and Ney (1995), is not fully understood. In the hopes of explaining this performance, we study it through the lens of competitive distribution estimation: the ability to perform as well as an oracle aware of further structure in the data. We first establish basic competitive properties of Kneser-Ney smoothing. We then investigate the nature of its backoff mechanism and show that it emerges from first principles, rather than being an assumption of the model. We do this by generalizing the Good-Turing estimator to the contextual setting. This exploration leads us to a powerful generalization of Kneser-Ney, which we conjecture to have even stronger competitive properties. Empirically, it significantly improves performance on language modeling, even matching feed-forward neural models. To show that the mechanisms at play are not restricted to language modeling, we demonstrate similar gains on the task of predicting attack types in the Global Terrorism Database.}
}
Endnote
%0 Conference Paper
%T Towards Competitive N-gram Smoothing
%A Moein Falahatgar
%A Mesrob Ohannessian
%A Alon Orlitsky
%A Venkatadheeraj Pichapati
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-falahatgar20a
%I PMLR
%P 4206--4215
%U http://proceedings.mlr.press/v108/falahatgar20a.html
%V 108
%X N-gram models remain a fundamental component of language modeling. In data-scarce regimes, they are a strong alternative to neural models. Even when not used as-is, recent work shows they can regularize neural models. Despite this success, the effectiveness of one of the best N-gram smoothing methods, the one suggested by Kneser and Ney (1995), is not fully understood. In the hopes of explaining this performance, we study it through the lens of competitive distribution estimation: the ability to perform as well as an oracle aware of further structure in the data. We first establish basic competitive properties of Kneser-Ney smoothing. We then investigate the nature of its backoff mechanism and show that it emerges from first principles, rather than being an assumption of the model. We do this by generalizing the Good-Turing estimator to the contextual setting. This exploration leads us to a powerful generalization of Kneser-Ney, which we conjecture to have even stronger competitive properties. Empirically, it significantly improves performance on language modeling, even matching feed-forward neural models. To show that the mechanisms at play are not restricted to language modeling, we demonstrate similar gains on the task of predicting attack types in the Global Terrorism Database.
APA
Falahatgar, M., Ohannessian, M., Orlitsky, A., & Pichapati, V. (2020). Towards Competitive N-gram Smoothing. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4206-4215. Available from http://proceedings.mlr.press/v108/falahatgar20a.html

Related Material