Bayesian Torrent Classification by File Name and Size Only

Eugene Dementiev, Norman Fenton
Proceedings of the Eighth International Conference on Probabilistic Graphical Models, PMLR 52:136-146, 2016.

Abstract

Torrent traffic, much of which is assumed to be illegal downloads of copyrighted content, accounts for up to 35% of internet downloads. Yet, the process of classification and identification of these downloads is unclear, and original data for such studies is often unavailable. Many torrent items lack supporting description or meta-data, in which case only file name and size are available. We describe a novel Bayesian network based classifier system that predicts medium category, pornographic content and risk of fakes and malware based on torrent name and size, optionally supplemented with external databases of titles and actors. We show that our method outperforms a commercial benchmark system and has the potential to rival human classifiers.

Cite this Paper


BibTeX
@InProceedings{pmlr-v52-dementiev16,
  title = {{B}ayesian Torrent Classification by File Name and Size Only},
  author = {Dementiev, Eugene and Fenton, Norman},
  booktitle = {Proceedings of the Eighth International Conference on Probabilistic Graphical Models},
  pages = {136--146},
  year = {2016},
  editor = {Antonucci, Alessandro and Corani, Giorgio and Campos, Cassio Polpo},
  volume = {52},
  series = {Proceedings of Machine Learning Research},
  address = {Lugano, Switzerland},
  month = {06--09 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v52/dementiev16.pdf},
  url = {https://proceedings.mlr.press/v52/dementiev16.html},
  abstract = {Torrent traffic, much of which is assumed to be illegal downloads of copyrighted content, accounts for up to 35\% of internet downloads. Yet, the process of classification and identification of these downloads is unclear, and original data for such studies is often unavailable. Many torrent items lack supporting description or meta-data, in which case only file name and size are available. We describe a novel Bayesian network based classifier system that predicts medium category, pornographic content and risk of fakes and malware based on torrent name and size, optionally supplemented with external databases of titles and actors. We show that our method outperforms a commercial benchmark system and has the potential to rival human classifiers.}
}
Endnote
%0 Conference Paper
%T Bayesian Torrent Classification by File Name and Size Only
%A Eugene Dementiev
%A Norman Fenton
%B Proceedings of the Eighth International Conference on Probabilistic Graphical Models
%C Proceedings of Machine Learning Research
%D 2016
%E Alessandro Antonucci
%E Giorgio Corani
%E Cassio Polpo Campos
%F pmlr-v52-dementiev16
%I PMLR
%P 136--146
%U https://proceedings.mlr.press/v52/dementiev16.html
%V 52
%X Torrent traffic, much of which is assumed to be illegal downloads of copyrighted content, accounts for up to 35% of internet downloads. Yet, the process of classification and identification of these downloads is unclear, and original data for such studies is often unavailable. Many torrent items lack supporting description or meta-data, in which case only file name and size are available. We describe a novel Bayesian network based classifier system that predicts medium category, pornographic content and risk of fakes and malware based on torrent name and size, optionally supplemented with external databases of titles and actors. We show that our method outperforms a commercial benchmark system and has the potential to rival human classifiers.
RIS
TY  - CPAPER
TI  - Bayesian Torrent Classification by File Name and Size Only
AU  - Eugene Dementiev
AU  - Norman Fenton
BT  - Proceedings of the Eighth International Conference on Probabilistic Graphical Models
DA  - 2016/08/15
ED  - Alessandro Antonucci
ED  - Giorgio Corani
ED  - Cassio Polpo Campos
ID  - pmlr-v52-dementiev16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 52
SP  - 136
EP  - 146
L1  - http://proceedings.mlr.press/v52/dementiev16.pdf
UR  - https://proceedings.mlr.press/v52/dementiev16.html
AB  - Torrent traffic, much of which is assumed to be illegal downloads of copyrighted content, accounts for up to 35% of internet downloads. Yet, the process of classification and identification of these downloads is unclear, and original data for such studies is often unavailable. Many torrent items lack supporting description or meta-data, in which case only file name and size are available. We describe a novel Bayesian network based classifier system that predicts medium category, pornographic content and risk of fakes and malware based on torrent name and size, optionally supplemented with external databases of titles and actors. We show that our method outperforms a commercial benchmark system and has the potential to rival human classifiers.
ER  -
APA
Dementiev, E. & Fenton, N. (2016). Bayesian Torrent Classification by File Name and Size Only. Proceedings of the Eighth International Conference on Probabilistic Graphical Models, in Proceedings of Machine Learning Research 52:136-146. Available from https://proceedings.mlr.press/v52/dementiev16.html.
