Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data

Ann B. Lee, Boaz Nadler
Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, PMLR 2:259-266, 2007.

Abstract

In many modern data mining applications, such as analysis of gene expression or word-document data sets, the data is high-dimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In such settings and specifically when the number of variables is much larger than the sample size, standard global methods may not perform well for common learning tasks such as classification, regression and clustering. In this paper, we present treelets – a new tool for multi-resolution analysis that extends wavelets on smooth signals to general unstructured data sets. By construction, treelets provide an orthogonal basis that reflects the internal structure of the data. In addition, treelets can be useful for feature selection and dimensionality reduction prior to learning. We give a theoretical analysis of our algorithm for a linear mixture model, and present a variety of situations where treelets outperform classical principal component analysis, as well as variable selection schemes such as supervised (sparse) PCA.

Cite this Paper


BibTeX
@InProceedings{pmlr-v2-lee07a,
  title = {Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data},
  author = {Lee, Ann B. and Nadler, Boaz},
  booktitle = {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics},
  pages = {259--266},
  year = {2007},
  editor = {Meila, Marina and Shen, Xiaotong},
  volume = {2},
  series = {Proceedings of Machine Learning Research},
  address = {San Juan, Puerto Rico},
  month = {21--24 Mar},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v2/lee07a/lee07a.pdf},
  url = {https://proceedings.mlr.press/v2/lee07a.html},
  abstract = {In many modern data mining applications, such as analysis of gene expression or word-document data sets, the data is high-dimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In such settings and specifically when the number of variables is much larger than the sample size, standard global methods may not perform well for common learning tasks such as classification, regression and clustering. In this paper, we present treelets – a new tool for multi-resolution analysis that extends wavelets on smooth signals to general unstructured data sets. By construction, treelets provide an orthogonal basis that reflects the internal structure of the data. In addition, treelets can be useful for feature selection and dimensionality reduction prior to learning. We give a theoretical analysis of our algorithm for a linear mixture model, and present a variety of situations where treelets outperform classical principal component analysis, as well as variable selection schemes such as supervised (sparse) PCA.}
}
Endnote
%0 Conference Paper
%T Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data
%A Ann B. Lee
%A Boaz Nadler
%B Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2007
%E Marina Meila
%E Xiaotong Shen
%F pmlr-v2-lee07a
%I PMLR
%P 259--266
%U https://proceedings.mlr.press/v2/lee07a.html
%V 2
%X In many modern data mining applications, such as analysis of gene expression or word-document data sets, the data is high-dimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In such settings and specifically when the number of variables is much larger than the sample size, standard global methods may not perform well for common learning tasks such as classification, regression and clustering. In this paper, we present treelets – a new tool for multi-resolution analysis that extends wavelets on smooth signals to general unstructured data sets. By construction, treelets provide an orthogonal basis that reflects the internal structure of the data. In addition, treelets can be useful for feature selection and dimensionality reduction prior to learning. We give a theoretical analysis of our algorithm for a linear mixture model, and present a variety of situations where treelets outperform classical principal component analysis, as well as variable selection schemes such as supervised (sparse) PCA.
RIS
TY - CPAPER
TI - Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data
AU - Ann B. Lee
AU - Boaz Nadler
BT - Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
DA - 2007/03/11
ED - Marina Meila
ED - Xiaotong Shen
ID - pmlr-v2-lee07a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 2
SP - 259
EP - 266
L1 - http://proceedings.mlr.press/v2/lee07a/lee07a.pdf
UR - https://proceedings.mlr.press/v2/lee07a.html
AB - In many modern data mining applications, such as analysis of gene expression or word-document data sets, the data is high-dimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In such settings and specifically when the number of variables is much larger than the sample size, standard global methods may not perform well for common learning tasks such as classification, regression and clustering. In this paper, we present treelets – a new tool for multi-resolution analysis that extends wavelets on smooth signals to general unstructured data sets. By construction, treelets provide an orthogonal basis that reflects the internal structure of the data. In addition, treelets can be useful for feature selection and dimensionality reduction prior to learning. We give a theoretical analysis of our algorithm for a linear mixture model, and present a variety of situations where treelets outperform classical principal component analysis, as well as variable selection schemes such as supervised (sparse) PCA.
ER -
APA
Lee, A.B. & Nadler, B. (2007). Treelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 2:259-266. Available from https://proceedings.mlr.press/v2/lee07a.html.
