Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, PMLR 2:283-290, 2007.
We consider the problem of estimating the joint density of a d-dimensional random vector X = (X_1, X_2, ..., X_d) when d is large. We assume that the density is a product of a parametric component and a nonparametric component that depends on an unknown subset of the variables. Using a modification of a recently developed nonparametric regression framework called rodeo (regularization of derivative expectation operator), we propose a method to greedily select bandwidths in a kernel density estimate. It is shown empirically that the density rodeo works well even for very high-dimensional problems. When the unknown density function satisfies a suitably defined sparsity condition, and the parametric baseline density is smooth, the approach is shown to achieve near-optimal minimax rates of convergence, and thus avoids the curse of dimensionality.
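To make the idea of greedy bandwidth selection concrete, the following is a minimal sketch, not the procedure as specified in the paper. It assumes a Gaussian product kernel, drops the parametric component entirely (in effect a flat baseline), and uses a threshold of the form s_j * sqrt(2 log n), which is an illustrative choice rather than a constant taken from the text. The function name `density_rodeo_sketch` and the parameters `h0`, `beta`, and `h_min` are hypothetical. Each bandwidth starts large and is shrunk only while the estimated derivative of the density estimate with respect to that bandwidth is large relative to its standard error.

```python
import numpy as np


def density_rodeo_sketch(X, x, h0=1.0, beta=0.9, h_min=1e-3):
    """Greedy per-dimension bandwidth selection for a KDE at a point x.

    Illustrative sketch only: Gaussian product kernel, no parametric
    baseline, and an assumed threshold s_j * sqrt(2 * log(n)).
    Returns (bandwidths, density estimate at x).
    """
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    n, d = X.shape
    h = np.full(d, float(h0))      # start every bandwidth at the large value h0
    active = list(range(d))        # dimensions whose bandwidth may still shrink
    diff = x - X                   # (n, d) coordinate-wise differences

    def kernel_weights(h):
        # Per-sample value of the Gaussian product kernel, shape (n,).
        k = np.exp(-0.5 * (diff / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
        return k.prod(axis=1)

    while active:
        w = kernel_weights(h)      # weights at the bandwidths from the pass start
        for j in list(active):
            # Per-sample derivative of the kernel product with respect to h_j.
            z_i = w * (diff[:, j] ** 2 - h[j] ** 2) / h[j] ** 3
            z = z_i.mean()                          # derivative of the estimate
            s = z_i.std(ddof=1) / np.sqrt(n)        # its estimated standard error
            lam = s * np.sqrt(2.0 * np.log(n))      # assumed threshold
            if abs(z) > lam and h[j] * beta >= h_min:
                h[j] *= beta                        # derivative is significant: shrink
            else:
                active.remove(j)                    # otherwise freeze this bandwidth

    return h, kernel_weights(h).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.uniform(size=(500, 5))
    bandwidths, f_hat = density_rodeo_sketch(data, np.full(5, 0.5), h0=0.5)
    print(bandwidths, f_hat)
```

Under these assumptions, a dimension along which the density is locally flat tends to produce a small derivative and is frozen early at a large bandwidth, while a dimension with local structure keeps shrinking. This is the mechanism by which such a procedure can adapt to sparsity.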