Why Regularized Auto-Encoders learn Sparse Representation?
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:136-144, 2016.
Sparse distributed representation is the key to learning useful features in deep learning algorithms, because not only it is an efficient mode of data representation, but also – more importantly – it captures the generation process of most real world data. While a number of regularized auto-encoders (AE) enforce sparsity explicitly in their learned representation and others don’t, there has been little formal analysis on what encourages sparsity in these models in general. Our objective is to formally study this general problem for regularized auto-encoders. We provide sufficient conditions on both regularization and activation functions that encourage sparsity. We show that multiple popular models (de-noising and contractive auto encoders, e.g.) and activations (rectified linear and sigmoid, e.g.) satisfy these conditions; thus, our conditions help explain sparsity in their learned representation. Thus our theoretical and empirical analysis together shed light on the properties of regularization/activation that are conductive to sparsity and unify a number of existing auto-encoder models and activation functions under the same analytical framework.