Stage-wise Training: An Improved Feature Learning Strategy for Deep Models

Elnaz Barshan, Paul Fieguth
Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, PMLR 44:49-59, 2015.

Abstract

Deep neural networks currently stand at the state of the art for many machine learning applications, yet there still remain limitations in the training of such networks because of their very high parameter dimensionality. In this paper we show that network training performance can be improved using a stage-wise learning strategy, in which the learning process is broken down into a number of related sub-tasks that are completed stage-by-stage. The idea is to inject information into the network gradually so that in the early stages of training the “coarse-scale” properties of the data are captured while the “finer-scale” characteristics are learned in later stages. Moreover, the solution found in each stage serves as a prior to the next stage, which produces a regularization effect and enhances the generalization of the learned representations. We show that decoupling the classifier layer from the feature extraction layers of the network is necessary, as it alleviates the diffusion of gradient and over-fitting problems. Experimental results in the context of image classification support these claims.
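A minimal sketch of the general idea described above, assuming a PyTorch setting: a shared convolutional feature extractor is trained on progressively finer (less blurred) versions of the images, warm-started from one stage to the next so that the previous solution acts as a prior, while the linear classifier head is re-initialized (decoupled) at every stage. This is not the authors' implementation; dataset_fn, the Gaussian-blur coarse-to-fine schedule, the architecture, and the hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn
import torchvision.transforms as T

def make_feature_extractor():
    # Small convolutional trunk shared across all training stages.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

def train_stagewise(dataset_fn, num_classes=10, blur_kernels=(9, 5, 1), epochs_per_stage=5):
    # dataset_fn(transform) -> a DataLoader of (image, label) batches; assumed helper.
    features = make_feature_extractor()            # warm-started across stages (previous solution acts as the prior)
    for kernel in blur_kernels:                    # coarse (heavy blur) -> fine (no blur)
        classifier = nn.Linear(64, num_classes)    # decoupled head: re-initialized every stage
        model = nn.Sequential(features, classifier)
        opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        blur = T.GaussianBlur(kernel) if kernel > 1 else nn.Identity()
        loader = dataset_fn(transform=T.Compose([T.ToTensor(), blur]))
        for _ in range(epochs_per_stage):
            for x, y in loader:
                opt.zero_grad()
                loss = nn.functional.cross_entropy(model(x), y)
                loss.backward()
                opt.step()
    return features

In this sketch the blur schedule plays the role of the gradual, coarse-to-fine information injection; other stage definitions would fit the same loop without changing its structure.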

Cite this Paper


BibTeX
@InProceedings{pmlr-v44-Barshan2015,
  title = {Stage-wise Training: An Improved Feature Learning Strategy for Deep Models},
  author = {Barshan, Elnaz and Fieguth, Paul},
  booktitle = {Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015},
  pages = {49--59},
  year = {2015},
  editor = {Storcheus, Dmitry and Rostamizadeh, Afshin and Kumar, Sanjiv},
  volume = {44},
  series = {Proceedings of Machine Learning Research},
  address = {Montreal, Canada},
  month = {11 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v44/Barshan2015.pdf},
  url = {https://proceedings.mlr.press/v44/Barshan2015.html},
  abstract = {Deep neural networks currently stand at the state of the art for many machine learning applications, yet there still remain limitations in the training of such networks because of their very high parameter dimensionality. In this paper we show that network training performance can be improved using a stage-wise learning strategy, in which the learning process is broken down into a number of related sub-tasks that are completed stage-by-stage. The idea is to inject information into the network \textit{gradually} so that in the early stages of training the “coarse-scale” properties of the data are captured while the “finer-scale” characteristics are learned in later stages. Moreover, the solution found in each stage serves as a prior to the next stage, which produces a regularization effect and enhances the generalization of the learned representations. We show that decoupling the classifier layer from the feature extraction layers of the network is necessary, as it alleviates the diffusion of gradient and over-fitting problems. Experimental results in the context of image classification support these claims.}
}
Endnote
%0 Conference Paper
%T Stage-wise Training: An Improved Feature Learning Strategy for Deep Models
%A Elnaz Barshan
%A Paul Fieguth
%B Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015
%C Proceedings of Machine Learning Research
%D 2015
%E Dmitry Storcheus
%E Afshin Rostamizadeh
%E Sanjiv Kumar
%F pmlr-v44-Barshan2015
%I PMLR
%P 49--59
%U https://proceedings.mlr.press/v44/Barshan2015.html
%V 44
%X Deep neural networks currently stand at the state of the art for many machine learning applications, yet there still remain limitations in the training of such networks because of their very high parameter dimensionality. In this paper we show that network training performance can be improved using a stage-wise learning strategy, in which the learning process is broken down into a number of related sub-tasks that are completed stage-by-stage. The idea is to inject information into the network gradually so that in the early stages of training the “coarse-scale” properties of the data are captured while the “finer-scale” characteristics are learned in later stages. Moreover, the solution found in each stage serves as a prior to the next stage, which produces a regularization effect and enhances the generalization of the learned representations. We show that decoupling the classifier layer from the feature extraction layers of the network is necessary, as it alleviates the diffusion of gradient and over-fitting problems. Experimental results in the context of image classification support these claims.
RIS
TY - CPAPER
TI - Stage-wise Training: An Improved Feature Learning Strategy for Deep Models
AU - Elnaz Barshan
AU - Paul Fieguth
BT - Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015
DA - 2015/12/08
ED - Dmitry Storcheus
ED - Afshin Rostamizadeh
ED - Sanjiv Kumar
ID - pmlr-v44-Barshan2015
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 44
SP - 49
EP - 59
L1 - http://proceedings.mlr.press/v44/Barshan2015.pdf
UR - https://proceedings.mlr.press/v44/Barshan2015.html
AB - Deep neural networks currently stand at the state of the art for many machine learning applications, yet there still remain limitations in the training of such networks because of their very high parameter dimensionality. In this paper we show that network training performance can be improved using a stage-wise learning strategy, in which the learning process is broken down into a number of related sub-tasks that are completed stage-by-stage. The idea is to inject information into the network gradually so that in the early stages of training the “coarse-scale” properties of the data are captured while the “finer-scale” characteristics are learned in later stages. Moreover, the solution found in each stage serves as a prior to the next stage, which produces a regularization effect and enhances the generalization of the learned representations. We show that decoupling the classifier layer from the feature extraction layers of the network is necessary, as it alleviates the diffusion of gradient and over-fitting problems. Experimental results in the context of image classification support these claims.
ER -
APA
Barshan, E. & Fieguth, P. (2015). Stage-wise Training: An Improved Feature Learning Strategy for Deep Models. Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, in Proceedings of Machine Learning Research 44:49-59. Available from https://proceedings.mlr.press/v44/Barshan2015.html.
