Robust Variational Autoencoders for Outlier Detection and Repair of Mixed-Type Data

Simao Eduardo, Alfredo Nazabal, Christopher K. I. Williams, Charles Sutton
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4056-4066, 2020.

Abstract

We focus on the problem of unsupervised cell outlier detection and repair in mixed-type tabular data. Traditional methods are concerned only with detecting which rows in the dataset are outliers. However, identifying which cells are corrupted in a specific row is an important problem in practice, and the very first step towards repairing them. We introduce the Robust Variational Autoencoder (RVAE), a deep generative model that learns the joint distribution of the clean data while identifying the outlier cells, allowing their imputation (repair). RVAE explicitly learns the probability of each cell being an outlier, balancing different likelihood models in the row outlier score, making the method suitable for outlier detection in mixed-type datasets. We show experimentally that RVAE not only performs better than several state-of-the-art methods in cell outlier detection and repair for tabular data, but is also robust to the initial hyper-parameter selection.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-eduardo20a,
  title = {Robust Variational Autoencoders for Outlier Detection and Repair of Mixed-Type Data},
  author = {Eduardo, Simao and Nazabal, Alfredo and Williams, Christopher K. I. and Sutton, Charles},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {4056--4066},
  year = {2020},
  editor = {Chiappa, Silvia and Calandra, Roberto},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/eduardo20a/eduardo20a.pdf},
  url = {https://proceedings.mlr.press/v108/eduardo20a.html},
  abstract = {We focus on the problem of unsupervised cell outlier detection and repair in mixed-type tabular data. Traditional methods are concerned only with detecting which rows in the dataset are outliers. However, identifying which cells are corrupted in a specific row is an important problem in practice, and the very first step towards repairing them. We introduce the Robust Variational Autoencoder (RVAE), a deep generative model that learns the joint distribution of the clean data while identifying the outlier cells, allowing their imputation (repair). RVAE explicitly learns the probability of each cell being an outlier, balancing different likelihood models in the row outlier score, making the method suitable for outlier detection in mixed-type datasets. We show experimentally that not only RVAE performs better than several state-of-the-art methods in cell outlier detection and repair for tabular data, but also that is robust against the initial hyper-parameter selection.}
}
Endnote
%0 Conference Paper
%T Robust Variational Autoencoders for Outlier Detection and Repair of Mixed-Type Data
%A Simao Eduardo
%A Alfredo Nazabal
%A Christopher K. I. Williams
%A Charles Sutton
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-eduardo20a
%I PMLR
%P 4056--4066
%U https://proceedings.mlr.press/v108/eduardo20a.html
%V 108
%X We focus on the problem of unsupervised cell outlier detection and repair in mixed-type tabular data. Traditional methods are concerned only with detecting which rows in the dataset are outliers. However, identifying which cells are corrupted in a specific row is an important problem in practice, and the very first step towards repairing them. We introduce the Robust Variational Autoencoder (RVAE), a deep generative model that learns the joint distribution of the clean data while identifying the outlier cells, allowing their imputation (repair). RVAE explicitly learns the probability of each cell being an outlier, balancing different likelihood models in the row outlier score, making the method suitable for outlier detection in mixed-type datasets. We show experimentally that not only RVAE performs better than several state-of-the-art methods in cell outlier detection and repair for tabular data, but also that is robust against the initial hyper-parameter selection.
APA
Eduardo, S., Nazabal, A., Williams, C.K.I. & Sutton, C. (2020). Robust Variational Autoencoders for Outlier Detection and Repair of Mixed-Type Data. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4056-4066. Available from https://proceedings.mlr.press/v108/eduardo20a.html.

Related Material