Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data

Ann B. Lee; Boaz Nadler

Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, PMLR 2:259-266, 2007.

Abstract

In many modern data mining applications, such as analysis of gene expression or word-document data sets, the data is high-dimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In such settings and specifically when the number of variables is much larger than the sample size, standard global methods may not perform well for common learning tasks such as classification, regression and clustering. In this paper, we present treelets – a new tool for multi-resolution analysis that extends wavelets on smooth signals to general unstructured data sets. By construction, treelets provide an orthogonal basis that reflects the internal structure of the data. In addition, treelets can be useful for feature selection and dimensionality reduction prior to learning. We give a theoretical analysis of our algorithm for a linear mixture model, and present a variety of situations where treelets outperform classical principal component analysis, as well as variable selection schemes such as supervised (sparse) PCA.

Cite this Paper

BibTeX

@InProceedings{pmlr-v2-lee07a,
  title = 	 {Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data},
  author = 	 {Lee, Ann B. and Nadler, Boaz},
  booktitle = 	 {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics},
  pages = 	 {259--266},
  year = 	 {2007},
  editor = 	 {Meila, Marina and Shen, Xiaotong},
  volume = 	 {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Juan, Puerto Rico},
  month = 	 {21--24 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v2/lee07a/lee07a.pdf},
  url = 	 {https://proceedings.mlr.press/v2/lee07a.html},
  abstract = 	 {In many modern data mining applications, such as analysis of gene expression or word-document data sets, the data is high-dimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In such settings and specifically when the number of variables is much larger than the sample size, standard global methods may not perform well for common learning tasks such as classification, regression and clustering. In this paper, we present treelets – a new tool for multi-resolution analysis that extends wavelets on smooth signals to general unstructured data sets. By construction, treelets provide an orthogonal basis that reflects the internal structure of the data. In addition, treelets can be useful for feature selection and dimensionality reduction prior to learning. We give a theoretical analysis of our algorithm for a linear mixture model, and present a variety of situations where treelets outperform classical principal component analysis, as well as variable selection schemes such as supervised (sparse) PCA.}
}

Endnote

%0 Conference Paper
%T Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data
%A Ann B. Lee
%A Boaz Nadler
%B Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2007
%E Marina Meila
%E Xiaotong Shen
%F pmlr-v2-lee07a
%I PMLR
%P 259--266
%U https://proceedings.mlr.press/v2/lee07a.html
%V 2
%X In many modern data mining applications, such as analysis of gene expression or word-document data sets, the data is high-dimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In such settings and specifically when the number of variables is much larger than the sample size, standard global methods may not perform well for common learning tasks such as classification, regression and clustering. In this paper, we present treelets – a new tool for multi-resolution analysis that extends wavelets on smooth signals to general unstructured data sets. By construction, treelets provide an orthogonal basis that reflects the internal structure of the data. In addition, treelets can be useful for feature selection and dimensionality reduction prior to learning. We give a theoretical analysis of our algorithm for a linear mixture model, and present a variety of situations where treelets outperform classical principal component analysis, as well as variable selection schemes such as supervised (sparse) PCA.

RIS

TY  - CPAPER
TI  - Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data
AU  - Ann B. Lee
AU  - Boaz Nadler
BT  - Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
DA  - 2007/03/11
ED  - Marina Meila
ED  - Xiaotong Shen
ID  - pmlr-v2-lee07a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 2
SP  - 259
EP  - 266
L1  - http://proceedings.mlr.press/v2/lee07a/lee07a.pdf
UR  - https://proceedings.mlr.press/v2/lee07a.html
AB  - In many modern data mining applications, such as analysis of gene expression or word-document data sets, the data is high-dimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In such settings and specifically when the number of variables is much larger than the sample size, standard global methods may not perform well for common learning tasks such as classification, regression and clustering. In this paper, we present treelets – a new tool for multi-resolution analysis that extends wavelets on smooth signals to general unstructured data sets. By construction, treelets provide an orthogonal basis that reflects the internal structure of the data. In addition, treelets can be useful for feature selection and dimensionality reduction prior to learning. We give a theoretical analysis of our algorithm for a linear mixture model, and present a variety of situations where treelets outperform classical principal component analysis, as well as variable selection schemes such as supervised (sparse) PCA.
ER  -

APA

Lee, A. B. & Nadler, B. (2007). Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 2:259-266. Available from https://proceedings.mlr.press/v2/lee07a.html.
