A Non-Parametric EM-Style Algorithm for Imputing Missing Values
Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, PMLR R3:35-40, 2001.
Abstract
We present an iterative non-parametric algorithm for imputing missing values. The algorithm is similar to EM except that it uses non-parametric models such as k-nearest neighbor or kernel regression instead of the parametric models used with EM. An interesting feature of the algorithm is that the E and M steps collapse into a single step because the data being filled in is the model: updating the filled-in values updates the model at the same time. The main advantages of this approach over parametric EM methods are that (1) it is more efficient for moderate-size data sets, and (2) it is less susceptible to the errors that parametric methods make when the parametric models do not fit the data well. This robustness to model failure makes the non-parametric method more accurate when models of the data are not known a priori and cannot be determined reliably. We evaluate the method using a real medical data set that has many missing values.
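The collapsed E/M step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_impute`, the parameters `k`, `n_iter`, and `tol`, the column-mean initialization, the Euclidean distance over all attributes, and the unweighted neighbor average are all assumptions made for illustration. The abstract specifies only that a non-parametric model such as k-nearest neighbor is refit implicitly as the filled-in values change.

```python
import numpy as np

def knn_impute(X, k=2, n_iter=10, tol=1e-6):
    """Iteratively fill NaN entries using the k nearest rows (hypothetical sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Assumed initialization: replace each missing entry with its column mean.
    # Every column is assumed to contain at least one observed value.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    rows_with_gaps = np.where(missing.any(axis=1))[0]
    for _ in range(n_iter):
        previous = filled.copy()
        for i in rows_with_gaps:
            # The filled-in data set itself acts as the model: distances are
            # computed on the current estimates, including earlier updates
            # made during this same pass.
            dist = np.linalg.norm(filled - filled[i], axis=1)
            dist[i] = np.inf  # a row is never its own neighbor
            neighbors = np.argsort(dist)[:k]
            # Overwrite only the originally missing entries of row i with the
            # neighbor average; observed values are never changed.
            filled[i, missing[i]] = filled[neighbors][:, missing[i]].mean(axis=0)
        # Stop once a full pass no longer changes the filled-in values.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled

# Toy data with two well-separated groups and one missing value in each.
X_demo = np.array([
    [1.0, 2.0],
    [1.1, np.nan],
    [0.9, 2.1],
    [5.0, 10.0],
    [5.1, 9.9],
    [4.9, np.nan],
])
print(knn_impute(X_demo, k=2))
```

Because each update writes directly into `filled`, there is no separate parameter-estimation step: changing the imputed values changes the neighbor structure used on the next pass, which is the sense in which the two EM steps merge. In the toy data the first pass is distorted by the column-mean initialization, and later passes pull each estimate toward its own group.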