The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training

Dumitru Erhan; Pierre-Antoine Manzagol; Yoshua Bengio; Samy Bengio; Pascal Vincent

Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:153-160, 2009.

Abstract

Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. Answering these questions is important if learning in deep architectures is to be further improved. We attempt to shed some light on these questions through extensive simulations. The experiments confirm and clarify the advantage of unsupervised pre-training. They demonstrate the robustness of the training procedure with respect to the random initialization, the positive effect of pre-training in terms of optimization and its role as a kind of regularizer. We show the influence of architecture depth, model capacity, and number of training examples.

Cite this Paper

BibTeX


@InProceedings{pmlr-v5-erhan09a,
  title = 	 {The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training},
  author = 	 {Erhan, Dumitru and Manzagol, Pierre-Antoine and Bengio, Yoshua and Bengio, Samy and Vincent, Pascal},
  booktitle = 	 {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {153--160},
  year = 	 {2009},
  editor = 	 {van Dyk, David and Welling, Max},
  volume = 	 {5},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v5/erhan09a/erhan09a.pdf},
  url = 	 {https://proceedings.mlr.press/v5/erhan09a.html},
  abstract = 	 {Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. Answering these questions is important if learning in deep architectures is to be further improved. We attempt to shed some light on these questions through extensive simulations. The experiments confirm and clarify the advantage of unsupervised pre-training. They demonstrate the robustness of the training procedure with respect to the random initialization, the positive effect of pre-training in terms of optimization and its role as a kind of regularizer. We show the influence of architecture depth, model capacity, and number of training examples.}
}

Endnote

%0 Conference Paper
%T The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training
%A Dumitru Erhan
%A Pierre-Antoine Manzagol
%A Yoshua Bengio
%A Samy Bengio
%A Pascal Vincent
%B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling
%F pmlr-v5-erhan09a
%I PMLR
%P 153--160
%U https://proceedings.mlr.press/v5/erhan09a.html
%V 5
%X Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. Answering these questions is important if learning in deep architectures is to be further improved. We attempt to shed some light on these questions through extensive simulations. The experiments confirm and clarify the advantage of unsupervised pre-training. They demonstrate the robustness of the training procedure with respect to the random initialization, the positive effect of pre-training in terms of optimization and its role as a kind of regularizer. We show the influence of architecture depth, model capacity, and number of training examples.

RIS


TY  - CPAPER
TI  - The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training
AU  - Dumitru Erhan
AU  - Pierre-Antoine Manzagol
AU  - Yoshua Bengio
AU  - Samy Bengio
AU  - Pascal Vincent
BT  - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
DA  - 2009/04/15
ED  - David van Dyk
ED  - Max Welling
ID  - pmlr-v5-erhan09a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 5
SP  - 153
EP  - 160
L1  - http://proceedings.mlr.press/v5/erhan09a/erhan09a.pdf
UR  - https://proceedings.mlr.press/v5/erhan09a.html
AB  - Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. Answering these questions is important if learning in deep architectures is to be further improved. We attempt to shed some light on these questions through extensive simulations. The experiments confirm and clarify the advantage of unsupervised pre-training. They demonstrate the robustness of the training procedure with respect to the random initialization, the positive effect of pre-training in terms of optimization and its role as a kind of regularizer. We show the influence of architecture depth, model capacity, and number of training examples.
ER  -

APA


Erhan, D., Manzagol, P., Bengio, Y., Bengio, S. & Vincent, P. (2009). The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 5:153-160. Available from https://proceedings.mlr.press/v5/erhan09a.html.
