Deconstructing the Ladder Network Architecture

Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2368-2376, 2016.

Abstract

The Ladder Network is a recent approach to semi-supervised learning that has proven very successful. While showing impressive performance, the Ladder Network has many intertwined components whose individual contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replaced or removed individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise, and the choice of what we refer to as the ‘combinator function’. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples, respectively.
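For context, the ‘combinator function’ mentioned above merges, at each decoder layer, the lateral signal from the noisy encoder with the top-down signal from the layer above. The following is a minimal NumPy sketch of the vanilla combinator from Rasmus et al. (2015), which the paper's ablations treat as the baseline; the parameter names are illustrative rather than taken from the authors' code, and the AMLP combinator introduced in the paper is not shown.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def vanilla_combinator(z_tilde, u, p):
        # Element-wise combinator g(z_tilde, u) for one decoder layer.
        # z_tilde: lateral (noisy encoder) activation; u: top-down decoder signal.
        # p holds per-unit parameter vectors with the same shape as z_tilde:
        # b0, w0z, w0u, w0zu, ws, b1, w1z, w1u, w1zu (names are illustrative).
        linear = p["b0"] + p["w0z"] * z_tilde + p["w0u"] * u + p["w0zu"] * z_tilde * u
        gate = sigmoid(p["b1"] + p["w1z"] * z_tilde + p["w1u"] * u + p["w1zu"] * z_tilde * u)
        return linear + p["ws"] * gate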

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-pezeshki16,
  title     = {Deconstructing the Ladder Network Architecture},
  author    = {Pezeshki, Mohammad and Fan, Linxi and Brakel, Philemon and Courville, Aaron and Bengio, Yoshua},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {2368--2376},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/pezeshki16.pdf},
  url       = {https://proceedings.mlr.press/v48/pezeshki16.html}
}
APA
Pezeshki, M., Fan, L., Brakel, P., Courville, A. & Bengio, Y. (2016). Deconstructing the Ladder Network Architecture. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2368-2376. Available from https://proceedings.mlr.press/v48/pezeshki16.html.