Stage-wise Training: An Improved Feature Learning Strategy for Deep Models

Elnaz Barshan; Paul Fieguth

Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, PMLR 44:49-59, 2015.

Abstract

Deep neural networks currently stand at the state of the art for many machine learning applications, yet there still remain limitations in the training of such networks because of their very high parameter dimensionality. In this paper we show that network training performance can be improved using a stage-wise learning strategy, in which the learning process is broken down into a number of related sub-tasks that are completed stage-by-stage. The idea is to inject the information to the network gradually so that in the early stages of training the “coarse-scale” properties of the data are captured while the “finer-scale” characteristics are learned in later stages. Moreover, the solution found in each stage serves as a prior to the next stage, which produces a regularization effect and enhances the generalization of the learned representations. We show that decoupling the classifier layer from the feature extraction layers of the network is necessary, as it alleviates the diffusion of gradient and over-fitting problems. Experimental results in the context of image classification support these claims.

Cite this Paper

BibTeX


@InProceedings{pmlr-v44-Barshan2015,
  title = 	 {Stage-wise Training: An Improved Feature Learning Strategy for Deep Models},
  author = 	 {Barshan, Elnaz and Fieguth, Paul},
  booktitle = 	 {Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015},
  pages = 	 {49--59},
  year = 	 {2015},
  editor = 	 {Storcheus, Dmitry and Rostamizadeh, Afshin and Kumar, Sanjiv},
  volume = 	 {44},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Montreal, Canada},
  month = 	 {11 Dec},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v44/Barshan2015.pdf},
  url = 	 {https://proceedings.mlr.press/v44/Barshan2015.html},
  abstract = 	 {Deep neural networks currently stand at the state of the art for many machine learning applications, yet there still remain limitations in the training of such networks because of their very high parameter dimensionality. In this paper we show that network training performance can be improved using a stage-wise learning strategy, in which the learning process is broken down into a number of related sub-tasks that are completed stage-by-stage. The idea is to inject the information to the network \textit{gradually} so that in the early stages of training the “coarse-scale” properties of the data are captured while the “finer-scale” characteristics are learned in later stages. Moreover, the solution found in each stage serves as a prior to the next stage, which produces a regularization effect and enhances the generalization of the learned representations. We show that decoupling the classifier layer from the feature extraction layers of the network is necessary, as it alleviates the diffusion of gradient and over-fitting problems. Experimental results in the context of image classification support these claims.}
}

Endnote

%0 Conference Paper
%T Stage-wise Training: An Improved Feature Learning Strategy for Deep Models
%A Elnaz Barshan
%A Paul Fieguth
%B Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015
%C Proceedings of Machine Learning Research
%D 2015
%E Dmitry Storcheus
%E Afshin Rostamizadeh
%E Sanjiv Kumar
%F pmlr-v44-Barshan2015
%I PMLR
%P 49--59
%U https://proceedings.mlr.press/v44/Barshan2015.html
%V 44
%X Deep neural networks currently stand at the state of the art for many machine learning applications, yet there still remain limitations in the training of such networks because of their very high parameter dimensionality. In this paper we show that network training performance can be improved using a stage-wise learning strategy, in which the learning process is broken down into a number of related sub-tasks that are completed stage-by-stage. The idea is to inject the information to the network gradually so that in the early stages of training the “coarse-scale” properties of the data are captured while the “finer-scale” characteristics are learned in later stages. Moreover, the solution found in each stage serves as a prior to the next stage, which produces a regularization effect and enhances the generalization of the learned representations. We show that decoupling the classifier layer from the feature extraction layers of the network is necessary, as it alleviates the diffusion of gradient and over-fitting problems. Experimental results in the context of image classification support these claims.

RIS


TY  - CPAPER
TI  - Stage-wise Training: An Improved Feature Learning Strategy for Deep Models
AU  - Elnaz Barshan
AU  - Paul Fieguth
BT  - Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015
DA  - 2015/12/08
ED  - Dmitry Storcheus
ED  - Afshin Rostamizadeh
ED  - Sanjiv Kumar
ID  - pmlr-v44-Barshan2015
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 44
SP  - 49
EP  - 59
L1  - http://proceedings.mlr.press/v44/Barshan2015.pdf
UR  - https://proceedings.mlr.press/v44/Barshan2015.html
AB  - Deep neural networks currently stand at the state of the art for many machine learning applications, yet there still remain limitations in the training of such networks because of their very high parameter dimensionality. In this paper we show that network training performance can be improved using a stage-wise learning strategy, in which the learning process is broken down into a number of related sub-tasks that are completed stage-by-stage. The idea is to inject the information to the network gradually so that in the early stages of training the “coarse-scale” properties of the data are captured while the “finer-scale” characteristics are learned in later stages. Moreover, the solution found in each stage serves as a prior to the next stage, which produces a regularization effect and enhances the generalization of the learned representations. We show that decoupling the classifier layer from the feature extraction layers of the network is necessary, as it alleviates the diffusion of gradient and over-fitting problems. Experimental results in the context of image classification support these claims.
ER  -

APA


Barshan, E. & Fieguth, P. (2015). Stage-wise Training: An Improved Feature Learning Strategy for Deep Models. Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, in Proceedings of Machine Learning Research 44:49-59. Available from https://proceedings.mlr.press/v44/Barshan2015.html.
