Interactive Anomaly Detection in Mixed Tabular Data using Bayesian Networks

Evan Dufraisse, Philippe Leray, Raphaël Nedellec, Tarek Benkhelif
Proceedings of the 10th International Conference on Probabilistic Graphical Models, PMLR 138:185-196, 2020.

Abstract

Improvements in processing power over the last decades have quickly led to an increasing use of data analyses involving massive data sets. To retrieve insightful information from any data-driven approach, good data quality is pivotal. Manual correction of massive data sets requires tremendous effort, is prone to errors, and is very costly. While knowledge of a specific field often allows the development of efficient models for anomaly detection and data correction, this knowledge is sometimes unavailable, and a more generic approach is then needed. This paper presents a novel approach to anomaly detection and correction in mixed tabular data using Bayesian Networks. We present an algorithm for detecting anomalies and offering correction hints based on Jensen scores computed within the Markov Blankets of the considered variables. We also discuss incremental correction of the detection model using user feedback, as well as additional aspects related to discretization in mixed data and its effects on detection efficiency. Finally, we discuss how functional dependencies can be managed to detect errors while improving faithfulness and computation speed.

Cite this Paper


BibTeX
@InProceedings{pmlr-v138-dufraisse20a,
  title = {Interactive Anomaly Detection in Mixed Tabular Data using Bayesian Networks},
  author = {Dufraisse, Evan and Leray, Philippe and Nedellec, Rapha\"el and Benkhelif, Tarek},
  booktitle = {Proceedings of the 10th International Conference on Probabilistic Graphical Models},
  pages = {185--196},
  year = {2020},
  editor = {Jaeger, Manfred and Nielsen, Thomas Dyhre},
  volume = {138},
  series = {Proceedings of Machine Learning Research},
  month = {23--25 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v138/dufraisse20a/dufraisse20a.pdf},
  url = {http://proceedings.mlr.press/v138/dufraisse20a.html},
  abstract = {The last decades improvements in processing abilities have quickly led to an increasing use of data analyses implying massive data-sets. To retrieve insightful information from any data driven approach, a pivotal aspect to ensure is good data quality. Manual correction of massive data-sets requires tremendous efforts, is prone to errors, and results being really costly. If knowledge in a specific field can often allow the development of efficient models for anomaly detection and data correction, this knowledge can sometimes be unavailable and a more generic approach should be found. This paper presents a novel approach to anomaly detection and correction in mixed tabular data using Bayesian Networks. We present an algorithm for detecting anomalies and offering correction hints based on Jensen scores computed within the Markov Blankets of considered variables. We also discuss the incremental corrections of detection model using user’s feedback, as well as additional aspects related to discretization in mixed data and its effects on detection efficiency. Finally we also discuss how functional dependencies can be managed to detect errors while improving faithfulness and computation speed.}
}
Endnote
%0 Conference Paper
%T Interactive Anomaly Detection in Mixed Tabular Data using Bayesian Networks
%A Evan Dufraisse
%A Philippe Leray
%A Raphaël Nedellec
%A Tarek Benkhelif
%B Proceedings of the 10th International Conference on Probabilistic Graphical Models
%C Proceedings of Machine Learning Research
%D 2020
%E Manfred Jaeger
%E Thomas Dyhre Nielsen
%F pmlr-v138-dufraisse20a
%I PMLR
%P 185--196
%U http://proceedings.mlr.press/v138/dufraisse20a.html
%V 138
%X The last decades improvements in processing abilities have quickly led to an increasing use of data analyses implying massive data-sets. To retrieve insightful information from any data driven approach, a pivotal aspect to ensure is good data quality. Manual correction of massive data-sets requires tremendous efforts, is prone to errors, and results being really costly. If knowledge in a specific field can often allow the development of efficient models for anomaly detection and data correction, this knowledge can sometimes be unavailable and a more generic approach should be found. This paper presents a novel approach to anomaly detection and correction in mixed tabular data using Bayesian Networks. We present an algorithm for detecting anomalies and offering correction hints based on Jensen scores computed within the Markov Blankets of considered variables. We also discuss the incremental corrections of detection model using user’s feedback, as well as additional aspects related to discretization in mixed data and its effects on detection efficiency. Finally we also discuss how functional dependencies can be managed to detect errors while improving faithfulness and computation speed.
APA
Dufraisse, E., Leray, P., Nedellec, R. & Benkhelif, T. (2020). Interactive Anomaly Detection in Mixed Tabular Data using Bayesian Networks. Proceedings of the 10th International Conference on Probabilistic Graphical Models, in Proceedings of Machine Learning Research 138:185-196. Available from http://proceedings.mlr.press/v138/dufraisse20a.html.
