Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Han Liu; John Lafferty; Larry Wasserman

Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, PMLR 2:283-290, 2007.

Abstract

We consider the problem of estimating the joint density of a $d$-dimensional random vector $X = (X_1, X_2, \ldots, X_d)$ when $d$ is large. We assume that the density is a product of a parametric component and a nonparametric component which depends on an unknown subset of the variables. Using a modification of a recently developed nonparametric regression framework called rodeo (regularization of derivative expectation operator), we propose a method to greedily select bandwidths in a kernel density estimate. It is shown empirically that the density rodeo works well even for very high-dimensional problems. When the unknown density function satisfies a suitably defined sparsity condition, and the parametric baseline density is smooth, the approach is shown to achieve near-optimal minimax rates of convergence, and thus avoids the curse of dimensionality.
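The abstract gives no algorithmic detail, so the following is only a minimal sketch of what greedy bandwidth selection in a kernel density estimate might look like, not the authors' implementation. It assumes a Gaussian product kernel, ignores the parametric baseline component, and uses illustrative constants; the names `product_kde` and `greedy_bandwidths`, the defaults `h0`, `beta`, and `h_min`, and the threshold factor are all hypothetical. Each bandwidth starts large and is shrunk only while the derivative of the estimate with respect to that bandwidth is large relative to its estimated standard error.

```python
import math
import random


def gaussian(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def product_kde(x, data, h):
    """Product-kernel density estimate at x, one bandwidth per dimension."""
    total = 0.0
    for row in data:
        w = 1.0
        for xj, rj, hj in zip(x, row, h):
            w *= gaussian((xj - rj) / hj) / hj
        total += w
    return total / len(data)


def greedy_bandwidths(x, data, h0=1.0, beta=0.9, h_min=0.05):
    """Shrink h[j] while d(estimate)/d(h[j]) at x looks significant.

    h0, beta, h_min and the threshold factor are illustrative assumptions.
    """
    n, d = len(data), len(x)
    h = [h0] * d
    active = set(range(d))
    # Assumed threshold factor: sqrt(2 log(n * c_n)) with c_n = log n.
    factor = math.sqrt(2.0 * math.log(n * math.log(n)))
    while active:
        for j in sorted(active):
            # Per-sample terms of the derivative w.r.t. h[j] (Gaussian kernel).
            terms = []
            for row in data:
                w = 1.0
                for xk, rk, hk in zip(x, row, h):
                    w *= gaussian((xk - rk) / hk) / hk
                u = x[j] - row[j]
                terms.append(w * (u * u - h[j] ** 2) / h[j] ** 3)
            z = sum(terms) / n
            var = sum((t - z) ** 2 for t in terms) / (n - 1)
            s = math.sqrt(var / n)  # standard error of the averaged derivative
            if abs(z) > factor * s and h[j] * beta >= h_min:
                h[j] *= beta        # derivative is significant: keep shrinking
            else:
                active.discard(j)   # freeze this dimension's bandwidth


    return h


if __name__ == "__main__":
    random.seed(0)
    # Toy data: first coordinate is sharply peaked, second is uniform on [0, 1].
    data = [[random.gauss(0.5, 0.05), random.random()] for _ in range(200)]
    h = greedy_bandwidths([0.5, 0.5], data)
    print(h, product_kde([0.5, 0.5], data, h))
```

On toy data like this, one would hope to see the bandwidth for the sharply peaked coordinate shrink further than the bandwidth for the uniform coordinate, which is the intuition behind adapting to an unknown subset of relevant variables; the sketch makes no claim about matching the rates discussed in the paper.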

Cite this Paper

BibTeX

@InProceedings{pmlr-v2-liu07a,
  title = 	 {Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo},
  author = 	 {Liu, Han and Lafferty, John and Wasserman, Larry},
  booktitle = 	 {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics},
  pages = 	 {283--290},
  year = 	 {2007},
  editor = 	 {Meila, Marina and Shen, Xiaotong},
  volume = 	 {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Juan, Puerto Rico},
  month = 	 {21--24 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v2/liu07a/liu07a.pdf},
  url = 	 {https://proceedings.mlr.press/v2/liu07a.html},
  abstract = 	 {We consider the problem of estimating the joint density of a $d$-dimensional random vector $X = (X_1, X_2, \ldots, X_d)$ when $d$ is large. We assume that the density is a product of a parametric component and a nonparametric component which depends on an unknown subset of the variables. Using a modification of a recently developed nonparametric regression framework called rodeo (regularization of derivative expectation operator), we propose a method to greedily select bandwidths in a kernel density estimate. It is shown empirically that the density rodeo works well even for very high-dimensional problems. When the unknown density function satisfies a suitably defined sparsity condition, and the parametric baseline density is smooth, the approach is shown to achieve near-optimal minimax rates of convergence, and thus avoids the curse of dimensionality.}
}

Endnote

%0 Conference Paper
%T Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
%A Han Liu
%A John Lafferty
%A Larry Wasserman
%B Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2007
%E Marina Meila
%E Xiaotong Shen
%F pmlr-v2-liu07a
%I PMLR
%P 283--290
%U https://proceedings.mlr.press/v2/liu07a.html
%V 2
%X We consider the problem of estimating the joint density of a $d$-dimensional random vector $X = (X_1 , X_2, ..., X_d )$ when d is large. We assume that the density is a product of a parametric component and a nonparametric component which depends on an unknown subset of the variables. Using a modification of a recently developed nonparametric regression framework called rodeo (regularization of derivative expectation operator), we propose a method to greedily select bandwidths in a kernel density estimate. It is shown empirically that the density rodeo works well even for very high dimensional problems. When the unknown density function satisfies a suitably defined sparsity condition, and the parametric baseline density is smooth, the approach is shown to achieve near optimal minimax rates of convergence, and thus avoids the curse of dimensionality.

RIS

TY  - CPAPER
TI  - Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
AU  - Han Liu
AU  - John Lafferty
AU  - Larry Wasserman
BT  - Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
DA  - 2007/03/11
ED  - Marina Meila
ED  - Xiaotong Shen
ID  - pmlr-v2-liu07a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 2
SP  - 283
EP  - 290
L1  - http://proceedings.mlr.press/v2/liu07a/liu07a.pdf
UR  - https://proceedings.mlr.press/v2/liu07a.html
AB  - We consider the problem of estimating the joint density of a $d$-dimensional random vector $X = (X_1 , X_2, ..., X_d )$ when d is large. We assume that the density is a product of a parametric component and a nonparametric component which depends on an unknown subset of the variables. Using a modification of a recently developed nonparametric regression framework called rodeo (regularization of derivative expectation operator), we propose a method to greedily select bandwidths in a kernel density estimate. It is shown empirically that the density rodeo works well even for very high dimensional problems. When the unknown density function satisfies a suitably defined sparsity condition, and the parametric baseline density is smooth, the approach is shown to achieve near optimal minimax rates of convergence, and thus avoids the curse of dimensionality.
ER  -

APA

Liu, H., Lafferty, J. & Wasserman, L. (2007). Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 2:283-290. Available from https://proceedings.mlr.press/v2/liu07a.html.