On the Naive Bayes Model for Text Categorization

Susana Eyheramendy, David D. Lewis, David Madigan
Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, PMLR R4:93-100, 2003.

Abstract

This paper empirically compares the performance of four probabilistic models for text classification: Poisson, Bernoulli, Multinomial and Negative Binomial. We examine the "naive Bayes" assumption in the four models and show that the multinomial model is a modified naive Bayes Poisson model that assumes independence of document length and document class. Although this last assumption might not be correct in many situations, we find that, in general, relaxing it does not change the performance of the classifier. Finally, we propose and evaluate an ad-hoc method for incorporating document length.
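
The link between the Poisson and multinomial models mentioned in the abstract can be illustrated with a standard identity: if the term counts of a document are independent Poisson variables, their joint likelihood factors into a Poisson term for the document length and a multinomial term for the counts given that length. The sketch below checks this numerically. It is an illustration of the textbook identity, not necessarily the derivation used in the paper; the function names, counts, and rates are hypothetical.

```python
import math

def poisson_logpmf(k, lam):
    # log P(K = k) for K ~ Poisson(lam)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def multinomial_logpmf(counts, probs):
    # log P(counts | n = sum(counts), probs) for a multinomial distribution
    n = sum(counts)
    out = math.lgamma(n + 1)
    for x, p in zip(counts, probs):
        out += x * math.log(p) - math.lgamma(x + 1)
    return out

def poisson_nb_loglik(counts, rates):
    # Naive Bayes Poisson: term counts are independent given the class
    return sum(poisson_logpmf(x, lam) for x, lam in zip(counts, rates))

def factored_loglik(counts, rates):
    # Same likelihood split into a length term and a multinomial term
    total_rate = sum(rates)
    probs = [lam / total_rate for lam in rates]
    length_term = poisson_logpmf(sum(counts), total_rate)
    return length_term, multinomial_logpmf(counts, probs)

# Hypothetical term counts for one document and per-term rates for one class
counts = [3, 0, 2, 1]
rates = [1.5, 0.2, 2.0, 0.7]

joint = poisson_nb_loglik(counts, rates)
length_term, multinomial_term = factored_loglik(counts, rates)
print(abs(joint - (length_term + multinomial_term)) < 1e-9)
```

Under this reading, dropping `length_term` when comparing classes amounts to treating document length as independent of document class, which leaves only the multinomial term.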

Cite this Paper


BibTeX
@InProceedings{pmlr-vR4-eyheramendy03a,
  title = {On the Naive Bayes Model for Text Categorization},
  author = {Eyheramendy, Susana and Lewis, David D. and Madigan, David},
  booktitle = {Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics},
  pages = {93--100},
  year = {2003},
  editor = {Bishop, Christopher M. and Frey, Brendan J.},
  volume = {R4},
  series = {Proceedings of Machine Learning Research},
  month = {03--06 Jan},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/r4/eyheramendy03a/eyheramendy03a.pdf},
  url = {https://proceedings.mlr.press/r4/eyheramendy03a.html},
  abstract = {This paper empirically compares the performance of four probabilistic models for text classification - Poisson, Bernoulli, Multinomial and Negative Binomial. We examine the "naive Bayes" assumption in the four models and show that the multinomial model is a modified naive Bayes Poisson model that assumes independence of document length and document class. Despite the fact that this last assumption might not be correct in many situations, we find that, in general, relaxing it does not change the performance of the classifier. Finally we propose and evaluate an ad-hoc method for incorporating document length.},
  note = {Reissued by PMLR on 01 April 2021.}
}
Endnote
%0 Conference Paper
%T On the Naive Bayes Model for Text Categorization
%A Susana Eyheramendy
%A David D. Lewis
%A David Madigan
%B Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2003
%E Christopher M. Bishop
%E Brendan J. Frey
%F pmlr-vR4-eyheramendy03a
%I PMLR
%P 93--100
%U https://proceedings.mlr.press/r4/eyheramendy03a.html
%V R4
%X This paper empirically compares the performance of four probabilistic models for text classification - Poisson, Bernoulli, Multinomial and Negative Binomial. We examine the "naive Bayes" assumption in the four models and show that the multinomial model is a modified naive Bayes Poisson model that assumes independence of document length and document class. Despite the fact that this last assumption might not be correct in many situations, we find that, in general, relaxing it does not change the performance of the classifier. Finally we propose and evaluate an ad-hoc method for incorporating document length.
%Z Reissued by PMLR on 01 April 2021.
APA
Eyheramendy, S., Lewis, D. D., & Madigan, D. (2003). On the Naive Bayes Model for Text Categorization. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R4:93-100. Available from https://proceedings.mlr.press/r4/eyheramendy03a.html. Reissued by PMLR on 01 April 2021.
