Deconstructing the Ladder Network Architecture

Mohammad Pezeshki; Linxi Fan; Philemon Brakel; Aaron Courville; Yoshua Bengio

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2368-2376, 2016.

Abstract

The Ladder Network is a recent approach to semi-supervised learning that turned out to be very successful. While showing impressive performance, the Ladder Network has many intertwined components, whose contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replaced or removed individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise, and the choice of what we refer to as the ‘combinator function’. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples, respectively.
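To make the components named in the abstract concrete, the following is a minimal sketch of noise injection, a lateral connection, and a combinator function at a single layer. It is an illustration only, not the paper's implementation: the function names, the scalar weights `w_z`, `w_u`, `w_zu`, and the toy values are all assumptions, and the combinator here is a simple hypothetical element-wise form rather than the one the paper introduces.

```python
import random


def add_gaussian_noise(values, std, rng):
    """Return a noisy copy of a layer's activations (noise injection)."""
    return [v + rng.gauss(0.0, std) for v in values]


def combinator(z_noisy, u, w_z=1.0, w_u=1.0, w_zu=0.0, bias=0.0):
    """Hypothetical element-wise combinator (illustrative, not from the paper).

    Merges the lateral input z_noisy (noisy encoder activations) with the
    top-down input u (signal from the decoder layer above) into an
    estimate of the clean activations.
    """
    return [bias + w_z * z + w_u * t + w_zu * z * t for z, t in zip(z_noisy, u)]


def reconstruction_cost(z_clean, z_hat):
    """Mean squared error between clean activations and their estimate."""
    return sum((c - h) ** 2 for c, h in zip(z_clean, z_hat)) / len(z_clean)


rng = random.Random(0)
z_clean = [0.5, -1.0, 2.0]            # made-up clean activations
z_noisy = add_gaussian_noise(z_clean, std=0.3, rng=rng)
u = [0.4, -0.8, 1.9]                  # made-up top-down signal

# Variant with the lateral connection: both inputs contribute.
with_lateral = combinator(z_noisy, u, w_z=0.5, w_u=0.5)
# Ablated variant: lateral path removed, only the top-down signal is used.
without_lateral = combinator(z_noisy, u, w_z=0.0, w_u=1.0)

print(reconstruction_cost(z_clean, with_lateral))
print(reconstruction_cost(z_clean, without_lateral))
```

Setting `w_z=0.0` mimics the kind of ablation the abstract describes, where a component is removed and the effect on the reconstruction criterion is compared; the numbers printed here carry no experimental meaning.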

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-pezeshki16,
  title = 	 {Deconstructing the Ladder Network Architecture},
  author = 	 {Pezeshki, Mohammad and Fan, Linxi and Brakel, Philemon and Courville, Aaron and Bengio, Yoshua},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {2368--2376},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/pezeshki16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/pezeshki16.html},
  abstract = 	 {The Ladder Network is a recent new approach to semi-supervised learning that turned out to be very successful. While showing impressive performance, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replaced or removed individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise, and the choice of what we refer to as the ‘combinator function’. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples, respectively.}
}

Endnote

%0 Conference Paper
%T Deconstructing the Ladder Network Architecture
%A Mohammad Pezeshki
%A Linxi Fan
%A Philemon Brakel
%A Aaron Courville
%A Yoshua Bengio
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-pezeshki16
%I PMLR
%P 2368--2376
%U https://proceedings.mlr.press/v48/pezeshki16.html
%V 48
%X The Ladder Network is a recent new approach to semi-supervised learning that turned out to be very successful. While showing impressive performance, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replaced or removed individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise, and the choice of what we refer to as the ‘combinator function’. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples, respectively.

RIS


TY  - CPAPER
TI  - Deconstructing the Ladder Network Architecture
AU  - Mohammad Pezeshki
AU  - Linxi Fan
AU  - Philemon Brakel
AU  - Aaron Courville
AU  - Yoshua Bengio
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-pezeshki16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 2368
EP  - 2376
L1  - http://proceedings.mlr.press/v48/pezeshki16.pdf
UR  - https://proceedings.mlr.press/v48/pezeshki16.html
AB  - The Ladder Network is a recent new approach to semi-supervised learning that turned out to be very successful. While showing impressive performance, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replaced or removed individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise, and the choice of what we refer to as the ‘combinator function’. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples, respectively.
ER  -

APA


Pezeshki, M., Fan, L., Brakel, P., Courville, A. & Bengio, Y. (2016). Deconstructing the Ladder Network Architecture. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2368-2376. Available from https://proceedings.mlr.press/v48/pezeshki16.html.
