The Sample Complexity of Dictionary Learning

Daniel Vainsencher; Shie Mannor; Alfred M. Bruckstein

The Sample Complexity of Dictionary Learning

Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein

Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:773-788, 2011.

Abstract

A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing by such a dictionary a previously unseen signal from the same source will be of similar magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected $L_2$ error in representation when the dictionary is used. For the case of $l_1$ regularized coefficient selection we provide a generalization bound of the order of $O\left(\sqrt{np\ln(m λ)/m}\right)$, where $n$ is the dimension, $p$ is the number of elements in the dictionary, λis a bound on the $l_1$ norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most $k$ dictionary elements, we provide a bound ofthe order $O(\sqrt{np\ln(m k)/m})$ under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as $1/m$, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements.

Cite this Paper

BibTeX


@InProceedings{pmlr-v19-vainsencher11a,
  title = 	 {The Sample Complexity of Dictionary Learning},
  author = 	 {Vainsencher, Daniel and Mannor, Shie and Bruckstein, Alfred M.},
  booktitle = 	 {Proceedings of the 24th Annual Conference on Learning Theory},
  pages = 	 {773--788},
  year = 	 {2011},
  editor = 	 {Kakade, Sham M. and von Luxburg, Ulrike},
  volume = 	 {19},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Budapest, Hungary},
  month = 	 {09--11 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v19/vainsencher11a/vainsencher11a.pdf},
  url = 	 {https://proceedings.mlr.press/v19/vainsencher11a.html},
  abstract = 	 {A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary.  Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing by such a dictionary a previously unseen signal from the same source will be of similar magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected $L_2$ error in representation when the dictionary is used. For the case of $l_1$ regularized coefficient selection we provide a generalization bound of the order of $O\left(\sqrt{np\ln(m λ)/m}\right)$, where $n$ is the dimension, $p$ is the number of elements in the dictionary, λis a bound on the $l_1$ norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most $k$ dictionary elements, we provide a bound ofthe order $O(\sqrt{np\ln(m  k)/m})$ under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as $1/m$, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements.}
}

Endnote

%0 Conference Paper
%T The Sample Complexity of Dictionary Learning
%A Daniel Vainsencher
%A Shie Mannor
%A Alfred M. Bruckstein
%B Proceedings of the 24th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2011
%E Sham M. Kakade
%E Ulrike von Luxburg	
%F pmlr-v19-vainsencher11a
%I PMLR
%P 773--788
%U https://proceedings.mlr.press/v19/vainsencher11a.html
%V 19
%X A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary.  Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing by such a dictionary a previously unseen signal from the same source will be of similar magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected $L_2$ error in representation when the dictionary is used. For the case of $l_1$ regularized coefficient selection we provide a generalization bound of the order of $O\left(\sqrt{np\ln(m λ)/m}\right)$, where $n$ is the dimension, $p$ is the number of elements in the dictionary, λis a bound on the $l_1$ norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most $k$ dictionary elements, we provide a bound ofthe order $O(\sqrt{np\ln(m  k)/m})$ under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as $1/m$, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements.

RIS


TY  - CPAPER
TI  - The Sample Complexity of Dictionary Learning
AU  - Daniel Vainsencher
AU  - Shie Mannor
AU  - Alfred M. Bruckstein
BT  - Proceedings of the 24th Annual Conference on Learning Theory
DA  - 2011/12/21
ED  - Sham M. Kakade
ED  - Ulrike von Luxburg	
ID  - pmlr-v19-vainsencher11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 19
SP  - 773
EP  - 788
L1  - http://proceedings.mlr.press/v19/vainsencher11a/vainsencher11a.pdf
UR  - https://proceedings.mlr.press/v19/vainsencher11a.html
AB  - A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary.  Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing by such a dictionary a previously unseen signal from the same source will be of similar magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected $L_2$ error in representation when the dictionary is used. For the case of $l_1$ regularized coefficient selection we provide a generalization bound of the order of $O\left(\sqrt{np\ln(m λ)/m}\right)$, where $n$ is the dimension, $p$ is the number of elements in the dictionary, λis a bound on the $l_1$ norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most $k$ dictionary elements, we provide a bound ofthe order $O(\sqrt{np\ln(m  k)/m})$ under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as $1/m$, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements.
ER  -

APA


Vainsencher, D., Mannor, S. & Bruckstein, A.M.. (2011). The Sample Complexity of Dictionary Learning. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:773-788 Available from https://proceedings.mlr.press/v19/vainsencher11a.html.

Related Material

Download PDF