A Non-Parametric EM-Style Algorithm for Imputing Missing Values

Rich Caruana

Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, PMLR R3:35-40, 2001.

Abstract

We present an iterative non-parametric algorithm for imputing missing values. The algorithm is similar to EM except that it uses non-parametric models such as k-nearest neighbor or kernel regression instead of the parametric models used with EM. An interesting feature of the algorithm is that the E and M steps collapse into a single step because the data being filled in is the model: updating the filled-in values updates the model at the same time. The main advantages of this approach compared to parametric EM methods are that 1) it is more efficient for moderate-size data sets, and 2) it is less susceptible to the errors that parametric methods make when the parametric models do not fit the data well. This robustness to model failure makes the non-parametric method more accurate when models of the data are not known a priori and cannot be determined reliably. We evaluate the method using a real medical data set that has many missing values.
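
The collapsed E/M step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name knn_impute, the mean-based initial fill, the Euclidean distance over all columns, the unweighted neighbor average, and the parameters k, n_iters, and tol are all assumptions made for illustration. The sketch also assumes every column has at least one observed value.

```python
import numpy as np


def knn_impute(X, k=3, n_iters=10, tol=1e-6):
    """Iteratively fill NaN entries of X with k-nearest-neighbor averages.

    Hypothetical sketch: the filled-in data set itself serves as the model,
    so rewriting the filled values also updates the model.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Assumed initialization: start from the column means of observed values.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    rows_with_missing = np.where(missing.any(axis=1))[0]

    for _ in range(n_iters):
        # Pairwise Euclidean distances computed on the current filled data.
        diffs = filled[:, None, :] - filled[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=2))
        np.fill_diagonal(dists, np.inf)  # a row is never its own neighbor

        # Indices of the k nearest rows for every row.
        neighbors = np.argsort(dists, axis=1)[:, :k]

        new_filled = filled.copy()
        for i in rows_with_missing:
            neighbor_mean = filled[neighbors[i]].mean(axis=0)
            # Only originally missing entries are rewritten;
            # observed values are never changed.
            new_filled[i, missing[i]] = neighbor_mean[missing[i]]

        change = np.abs(new_filled - filled).max()
        filled = new_filled
        if change < tol:  # stop once the filled-in values have settled
            break

    return filled
```

Calling knn_impute on a two-dimensional array that marks missing entries with NaN returns a completed copy of the same shape. Because the neighbors are recomputed from the filled data on every pass, there is no separate parameter-fitting step, which is the sense in which the E and M steps merge in this sketch.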

Cite this Paper

BibTeX


@InProceedings{pmlr-vR3-caruana01a,
  title = 	 {A Non-Parametric EM-Style Algorithm for Imputing Missing Values},
  author =       {Caruana, Rich},
  booktitle = 	 {Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics},
  pages = 	 {35--40},
  year = 	 {2001},
  editor = 	 {Richardson, Thomas S. and Jaakkola, Tommi S.},
  volume = 	 {R3},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {04--07 Jan},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/r3/caruana01a/caruana01a.pdf},
  url = 	 {https://proceedings.mlr.press/r3/caruana01a.html},
  abstract = 	 {We present an iterative non-parametric algorithm for imputing missing values. The algorithm is similar to EM except that it uses non-parametric models such as k-nearest neighbor or kernel regression instead of the parametric models used with EM. An interesting feature of the algorithm is that the E and M steps collapse into a single step because the data being filled in is the model: updating the filled-in values updates the model at the same time. The main advantages of this approach compared to parametric EM methods are that 1) it is more efficient for moderate-size data sets, and 2) it is less susceptible to the errors that parametric methods make when the parametric models do not fit the data well. This robustness to model failure makes the non-parametric method more accurate when models of the data are not known a priori and cannot be determined reliably. We evaluate the method using a real medical data set that has many missing values.},
  note =         {Reissued by PMLR on 31 March 2021.}
}

Endnote

%0 Conference Paper
%T A Non-Parametric EM-Style Algorithm for Imputing Missing Values
%A Rich Caruana
%B Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2001
%E Thomas S. Richardson
%E Tommi S. Jaakkola
%F pmlr-vR3-caruana01a
%I PMLR
%P 35--40
%U https://proceedings.mlr.press/r3/caruana01a.html
%V R3
%X We present an iterative non-parametric algorithm for imputing missing values. The algorithm is similar to EM except that it uses non-parametric models such as k-nearest neighbor or kernel regression instead of the parametric models used with EM. An interesting feature of the algorithm is that the E and M steps collapse into a single step because the data being filled in is the model: updating the filled-in values updates the model at the same time. The main advantages of this approach compared to parametric EM methods are that 1) it is more efficient for moderate-size data sets, and 2) it is less susceptible to the errors that parametric methods make when the parametric models do not fit the data well. This robustness to model failure makes the non-parametric method more accurate when models of the data are not known a priori and cannot be determined reliably. We evaluate the method using a real medical data set that has many missing values.
%Z Reissued by PMLR on 31 March 2021.

APA


Caruana, R. (2001). A Non-Parametric EM-Style Algorithm for Imputing Missing Values. Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R3:35-40. Available from https://proceedings.mlr.press/r3/caruana01a.html. Reissued by PMLR on 31 March 2021.
