[edit]
Applying a Gaussian-Bernoulli Mixture Model Network to Binary and Continuous Missing Data in Medicine
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:429-436, 1997.
Abstract
We wish to train a feedforward projective-sigmoidal neural network (MLP) on breast cancer outcomes data missing both binary and continuous input variable values. A Gaussian-Bernoulli mixture model is trained on the data (using EM). It then performs stochastic imputation (filling in) of the missing values, as a preprocessor to the MLP. In order to compare predictive accuracy when the training data are complete vs. incomplete/imputed, we use only complete cases from a natural data set, but artificially remove 80% of their input data values. Very little difference is observed in the comparison, suggesting that the mixture model is quite effective here, despite the fact that more than 99% of the casesfmstances had had some missing value(s). The mixture model can be used both for output/outcome prediction by a trained MLP and for the training process itself.