Deeply-Supervised Nets

Chen-Yu Lee; Saining Xie; Patrick Gallagher; Zhengyou Zhang; Zhuowen Tu

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:562-570, 2015.

Abstract

We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. We focus our attention on three aspects of traditional convolutional-neural-network-type (CNN-type) architectures: (1) transparency in the effect intermediate layers have on overall classification; (2) discriminativeness and robustness of learned features, especially in early layers; (3) training effectiveness in the face of “vanishing” gradients. To combat these issues, we introduce “companion” objective functions at each hidden layer, in addition to the overall objective function at the output layer (an integrated strategy distinct from layer-wise pre-training). We also analyze our algorithm using techniques extended from stochastic gradient methods. The advantages provided by our method are evident in our experimental results, showing state-of-the-art performance on MNIST, CIFAR-10, CIFAR-100, and SVHN.
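As a rough illustration of the "companion" objective idea described above, the following minimal sketch adds a weighted loss from an auxiliary classifier at each hidden layer to the output-layer loss. It is not the authors' implementation: the use of softmax cross-entropy, the per-layer weights `alphas`, the function names, and the example logits are all assumptions made for illustration, and the paper's actual loss formulation may differ.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def deeply_supervised_loss(output_logits, companion_logits, label, alphas):
    """Output-layer loss plus one weighted companion loss per hidden layer.

    companion_logits: list of logit vectors, one from a hypothetical
        auxiliary classifier attached to each hidden layer.
    alphas: hypothetical balancing weights, one per companion classifier.
    """
    if len(companion_logits) != len(alphas):
        raise ValueError("need one weight per companion classifier")
    total = cross_entropy(output_logits, label)
    for alpha, logits in zip(alphas, companion_logits):
        total += alpha * cross_entropy(logits, label)
    return total


# Made-up logits for a 3-class problem with two supervised hidden layers.
output_logits = [2.0, 0.5, -1.0]
companion_logits = [[0.3, 0.2, 0.1], [1.0, 0.4, -0.2]]
loss = deeply_supervised_loss(output_logits, companion_logits, label=0, alphas=[0.3, 0.3])
print(loss)
```

In this sketch, setting every entry of `alphas` to zero recovers the ordinary output-layer objective, so the companion terms act purely as additional supervision on the hidden layers during training.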

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-lee15a,
  title = 	 {{Deeply-Supervised Nets}},
  author = 	 {Lee, Chen-Yu and Xie, Saining and Gallagher, Patrick and Zhang, Zhengyou and Tu, Zhuowen},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {562--570},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/lee15a.pdf},
  url = 	 {https://proceedings.mlr.press/v38/lee15a.html},
  abstract = 	 {We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. We focus our attention on three aspects of traditional convolutional-neural-network-type (CNN-type) architectures:  (1) transparency in the effect intermediate layers have on overall classification;  (2) discriminativeness and robustness of learned features, especially in early layers;  (3) training effectiveness in the face of “vanishing” gradients.  To combat these issues, we introduce “companion” objective functions at each hidden layer, in addition to the overall objective function at the output layer (an integrated strategy distinct from layer-wise pre-training). We also analyze our algorithm using techniques extended from stochastic gradient methods. The advantages provided by our method are evident in our experimental results, showing state-of-the-art performance on MNIST, CIFAR-10, CIFAR-100, and SVHN.}
}

Endnote

%0 Conference Paper
%T Deeply-Supervised Nets
%A Chen-Yu Lee
%A Saining Xie
%A Patrick Gallagher
%A Zhengyou Zhang
%A Zhuowen Tu
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-lee15a
%I PMLR
%P 562--570
%U https://proceedings.mlr.press/v38/lee15a.html
%V 38
%X We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. We focus our attention on three aspects of traditional convolutional-neural-network-type (CNN-type) architectures:  (1) transparency in the effect intermediate layers have on overall classification;  (2) discriminativeness and robustness of learned features, especially in early layers;  (3) training effectiveness in the face of “vanishing” gradients.  To combat these issues, we introduce “companion” objective functions at each hidden layer, in addition to the overall objective function at the output layer (an integrated strategy distinct from layer-wise pre-training). We also analyze our algorithm using techniques extended from stochastic gradient methods. The advantages provided by our method are evident in our experimental results, showing state-of-the-art performance on MNIST, CIFAR-10, CIFAR-100, and SVHN.

RIS


TY  - CPAPER
TI  - Deeply-Supervised Nets
AU  - Chen-Yu Lee
AU  - Saining Xie
AU  - Patrick Gallagher
AU  - Zhengyou Zhang
AU  - Zhuowen Tu
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan
ID  - pmlr-v38-lee15a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 562
EP  - 570
L1  - http://proceedings.mlr.press/v38/lee15a.pdf
UR  - https://proceedings.mlr.press/v38/lee15a.html
AB  - We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. We focus our attention on three aspects of traditional convolutional-neural-network-type (CNN-type) architectures:  (1) transparency in the effect intermediate layers have on overall classification;  (2) discriminativeness and robustness of learned features, especially in early layers;  (3) training effectiveness in the face of “vanishing” gradients.  To combat these issues, we introduce “companion” objective functions at each hidden layer, in addition to the overall objective function at the output layer (an integrated strategy distinct from layer-wise pre-training). We also analyze our algorithm using techniques extended from stochastic gradient methods. The advantages provided by our method are evident in our experimental results, showing state-of-the-art performance on MNIST, CIFAR-10, CIFAR-100, and SVHN.
ER  -

APA


Lee, C., Xie, S., Gallagher, P., Zhang, Z. & Tu, Z. (2015). Deeply-Supervised Nets. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:562-570. Available from https://proceedings.mlr.press/v38/lee15a.html.
