The Sample Complexity of Dictionary Learning

Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein
Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:773-788, 2011.

Abstract

A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing a previously unseen signal from the same source with such a dictionary will be of similar magnitude to the errors on the given examples? We assume signals are generated from a fixed distribution and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected $L_2$ error in representation when the dictionary is used. For the case of $l_1$-regularized coefficient selection we provide a generalization bound of the order of $O\left(\sqrt{np\ln(m\lambda)/m}\right)$, where $n$ is the dimension, $p$ is the number of elements in the dictionary, $\lambda$ is a bound on the $l_1$ norm of the coefficient vector and $m$ is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most $k$ dictionary elements, we provide a bound of the order $O(\sqrt{np\ln(mk)/m})$ under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as $1/m$, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements.
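
The quantity the bounds control is the expected $L_2$ representation error of new signals when coefficients are restricted to an $l_1$ ball of radius $\lambda$. The following minimal sketch (not the authors' algorithm) estimates that error empirically for a fixed dictionary and evaluates the rate $\sqrt{np\ln(m\lambda)/m}$ from the first bound up to constants. The random unit-norm signals, the random dictionary and the projected-gradient sparse coder are purely illustrative assumptions; since the paper's bounds hold uniformly over dictionaries, they apply in particular to a dictionary learned from data.

import numpy as np

def project_l1_ball(v, radius):
    # Euclidean projection of v onto the l1 ball of the given radius
    # (standard sort-and-threshold procedure).
    if np.sum(np.abs(v)) <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def sparse_code_l1(D, x, lam, n_iters=300):
    # Projected gradient descent for min_a 0.5*||x - D a||_2^2 s.t. ||a||_1 <= lam.
    # Any l1-constrained solver would do; this one keeps the sketch self-contained.
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iters):
        a = project_l1_ball(a - D.T @ (D @ a - x) / L, lam)
    return a

def mean_representation_error(D, X, lam):
    # Empirical average of the L2 representation error over the signals in X.
    return np.mean([np.linalg.norm(x - D @ sparse_code_l1(D, x, lam)) for x in X])

rng = np.random.default_rng(0)
n, p, lam, m = 20, 50, 2.0, 200            # dimension, dictionary size, l1 bound, sample size

def unit_signals(m):
    # Hypothetical source: random signals normalized to the unit sphere.
    X = rng.standard_normal((m, n))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Hypothetical dictionary with unit-norm atoms; in practice it would be learned from data.
D = rng.standard_normal((n, p))
D /= np.linalg.norm(D, axis=0, keepdims=True)

err_sample = mean_representation_error(D, unit_signals(m), lam)  # error on one sample of m signals
err_fresh = mean_representation_error(D, unit_signals(m), lam)   # error on an independent sample
rate = np.sqrt(n * p * np.log(m * lam) / m)                      # bound's rate, up to constants
print(f"sample error {err_sample:.3f}  fresh error {err_fresh:.3f}  bound rate ~ {rate:.2f}")

The gap between the two empirical errors is what the generalization bound limits, uniformly over all dictionaries with unit-norm atoms.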

Cite this Paper


BibTeX
@InProceedings{pmlr-v19-vainsencher11a, title = {The Sample Complexity of Dictionary Learning}, author = {Vainsencher, Daniel and Mannor, Shie and Bruckstein, Alfred M.}, booktitle = {Proceedings of the 24th Annual Conference on Learning Theory}, pages = {773--788}, year = {2011}, editor = {Kakade, Sham M. and von Luxburg, Ulrike}, volume = {19}, series = {Proceedings of Machine Learning Research}, address = {Budapest, Hungary}, month = {09--11 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v19/vainsencher11a/vainsencher11a.pdf}, url = {https://proceedings.mlr.press/v19/vainsencher11a.html}, abstract = {A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing a previously unseen signal from the same source with such a dictionary will be of similar magnitude to the errors on the given examples? We assume signals are generated from a fixed distribution and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected $L_2$ error in representation when the dictionary is used. For the case of $l_1$-regularized coefficient selection we provide a generalization bound of the order of $O\left(\sqrt{np\ln(m\lambda)/m}\right)$, where $n$ is the dimension, $p$ is the number of elements in the dictionary, $\lambda$ is a bound on the $l_1$ norm of the coefficient vector and $m$ is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most $k$ dictionary elements, we provide a bound of the order $O(\sqrt{np\ln(mk)/m})$ under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as $1/m$, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements.} }
APA
Vainsencher, D., Mannor, S. & Bruckstein, A.M. (2011). The Sample Complexity of Dictionary Learning. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:773-788. Available from https://proceedings.mlr.press/v19/vainsencher11a.html.
