Applying a Gaussian-Bernoulli Mixture Model Network to Binary and Continuous Missing Data in Medicine

David B. Rosen, Harry B. Burke
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:429-436, 1997.

Abstract

We wish to train a feedforward projective-sigmoidal neural network (MLP) on breast cancer outcomes data missing both binary and continuous input variable values. A Gaussian-Bernoulli mixture model is trained on the data (using EM). It then performs stochastic imputation (filling in) of the missing values, as a preprocessor to the MLP. In order to compare predictive accuracy when the training data are complete vs. incomplete/imputed, we use only complete cases from a natural data set, but artificially remove 80% of their input data values. Very little difference is observed in the comparison, suggesting that the mixture model is quite effective here, despite the fact that more than 99% of the cases/instances had some missing value(s). The mixture model can be used both for output/outcome prediction by a trained MLP and for the training process itself.
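The stochastic imputation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the mixture has already been fitted (the EM step is not shown), that features are independent within each component (diagonal Gaussians), and that every Bernoulli probability lies strictly between 0 and 1. The function name `impute_row` and the parameter layout are hypothetical.

```python
import math
import random


def impute_row(row, kinds, weights, params, rng):
    """Fill the None entries of `row` by sampling from a fitted mixture.

    kinds[j] is "gauss" or "bern". params[k][j] is (mean, var) for a
    Gaussian feature, or a probability p in (0, 1) for a Bernoulli one.
    All of these names are illustrative assumptions.
    """
    # Log posterior (up to a constant) of each component, using only
    # the observed features of this row.
    log_post = []
    for w, comp in zip(weights, params):
        lp = math.log(w)
        for x, kind, theta in zip(row, kinds, comp):
            if x is None:
                continue  # missing features drop out of the likelihood
            if kind == "gauss":
                mean, var = theta
                lp += -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
            else:
                lp += math.log(theta if x == 1 else 1.0 - theta)
        log_post.append(lp)

    # Normalise in a numerically stable way (subtract the maximum first).
    m = max(log_post)
    resp = [math.exp(lp - m) for lp in log_post]
    total = sum(resp)
    resp = [r / total for r in resp]

    # Stochastic imputation: draw a component from the posterior, then
    # draw each missing value from that component's distribution.
    k = rng.choices(range(len(weights)), weights=resp)[0]
    filled = []
    for x, kind, theta in zip(row, kinds, params[k]):
        if x is not None:
            filled.append(x)  # observed values are kept unchanged
        elif kind == "gauss":
            mean, var = theta
            filled.append(rng.gauss(mean, math.sqrt(var)))
        else:
            filled.append(1 if rng.random() < theta else 0)
    return filled
```

Because values are sampled rather than set to a conditional mean, repeated calls on the same incomplete row give different completions, which is what distinguishes stochastic imputation from deterministic filling in.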

Cite this Paper


BibTeX
@InProceedings{pmlr-vR1-rosen97a,
  title = {Applying a {G}aussian-{B}ernoulli Mixture Model Network to Binary and Continuous Missing Data in Medicine},
  author = {Rosen, David B. and Burke, Harry B.},
  booktitle = {Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics},
  pages = {429--436},
  year = {1997},
  editor = {Madigan, David and Smyth, Padhraic},
  volume = {R1},
  series = {Proceedings of Machine Learning Research},
  month = {04--07 Jan},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/r1/rosen97a/rosen97a.pdf},
  url = {https://proceedings.mlr.press/r1/rosen97a.html},
  abstract = {We wish to train a feedforward projective-sigmoidal neural network (MLP) on breast cancer outcomes data missing both binary and continuous input variable values. A Gaussian-Bernoulli mixture model is trained on the data (using EM). It then performs stochastic imputation (filling in) of the missing values, as a preprocessor to the MLP. In order to compare predictive accuracy when the training data are complete vs. incomplete/imputed, we use only complete cases from a natural data set, but artificially remove 80% of their input data values. Very little difference is observed in the comparison, suggesting that the mixture model is quite effective here, despite the fact that more than 99% of the cases/instances had some missing value(s). The mixture model can be used both for output/outcome prediction by a trained MLP and for the training process itself.},
  note = {Reissued by PMLR on 30 March 2021.}
}
Endnote
%0 Conference Paper
%T Applying a Gaussian-Bernoulli Mixture Model Network to Binary and Continuous Missing Data in Medicine
%A David B. Rosen
%A Harry B. Burke
%B Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 1997
%E David Madigan
%E Padhraic Smyth
%F pmlr-vR1-rosen97a
%I PMLR
%P 429--436
%U https://proceedings.mlr.press/r1/rosen97a.html
%V R1
%X We wish to train a feedforward projective-sigmoidal neural network (MLP) on breast cancer outcomes data missing both binary and continuous input variable values. A Gaussian-Bernoulli mixture model is trained on the data (using EM). It then performs stochastic imputation (filling in) of the missing values, as a preprocessor to the MLP. In order to compare predictive accuracy when the training data are complete vs. incomplete/imputed, we use only complete cases from a natural data set, but artificially remove 80% of their input data values. Very little difference is observed in the comparison, suggesting that the mixture model is quite effective here, despite the fact that more than 99% of the cases/instances had some missing value(s). The mixture model can be used both for output/outcome prediction by a trained MLP and for the training process itself.
%Z Reissued by PMLR on 30 March 2021.
APA
Rosen, D.B. & Burke, H.B. (1997). Applying a Gaussian-Bernoulli Mixture Model Network to Binary and Continuous Missing Data in Medicine. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R1:429-436. Available from https://proceedings.mlr.press/r1/rosen97a.html. Reissued by PMLR on 30 March 2021.

Related Material