Deep Learning Made Easier by Linear Transformations in Perceptrons

Tapani Raiko, Harri Valpola, Yann LeCun

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:924-932, 2012.

Abstract

We transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero activation and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. This transformation aims at separating the problems of learning the linear and nonlinear parts of the whole input-output mapping, which has many benefits. We study the theoretical properties of the transformation by noting that they make the Fisher information matrix closer to a diagonal matrix, and thus standard gradient closer to the natural gradient. We experimentally confirm the usefulness of the transformations by noting that they make basic stochastic gradient learning competitive with state-of-the-art learning algorithms in speed, and that they seem also to help find solutions that generalize better. The experiments include both classification of small images and learning a low-dimensional representation for images by using a deep unsupervised auto-encoder network. The transformations were beneficial in all cases, with and without regularization and with networks from two to five hidden layers.
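The transformation described in the abstract can be illustrated with a small sketch. It assumes a tanh nonlinearity and assumes that the "on average" statistics are estimated over a sample of pre-activations; neither detail is stated in the abstract. The function names (`fit_transform_params`, `transformed_tanh`, `transformed_slope`, `forward`) are hypothetical and chosen only for this example. Each hidden unit gets a linear term `alpha * z` and a bias `beta` chosen so that its output and its slope both average to zero, and a separate linear shortcut from input to output carries the linear part of the mapping.

```python
import math
import random

def fit_transform_params(pre_acts):
    """Estimate (alpha, beta) for one tanh unit from sampled pre-activations."""
    n = len(pre_acts)
    # Average slope of tanh over the sample: d/dz tanh(z) = 1 - tanh(z)^2.
    mean_slope = sum(1.0 - math.tanh(z) ** 2 for z in pre_acts) / n
    alpha = -mean_slope  # cancels the average slope
    # Average output once the linear correction is included.
    mean_out = sum(math.tanh(z) + alpha * z for z in pre_acts) / n
    beta = -mean_out     # cancels the average output
    return alpha, beta

def transformed_tanh(z, alpha, beta):
    """Hidden-unit output with the linear correction applied."""
    return math.tanh(z) + alpha * z + beta

def transformed_slope(z, alpha):
    """Derivative of transformed_tanh with respect to z."""
    return 1.0 - math.tanh(z) ** 2 + alpha

def forward(x, w_hidden, w_out, w_shortcut, params):
    """One hidden layer plus a linear shortcut, scalar output (assumed layout)."""
    hidden = []
    for weights, (alpha, beta) in zip(w_hidden, params):
        z = sum(w * xi for w, xi in zip(weights, x))
        hidden.append(transformed_tanh(z, alpha, beta))
    nonlinear_part = sum(a * h for a, h in zip(w_out, hidden))
    linear_part = sum(c * xi for c, xi in zip(w_shortcut, x))  # shortcut connection
    return nonlinear_part + linear_part

if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0.5, 2.0) for _ in range(1000)]
    alpha, beta = fit_transform_params(sample)
    mean_out = sum(transformed_tanh(z, alpha, beta) for z in sample) / len(sample)
    mean_slope = sum(transformed_slope(z, alpha) for z in sample) / len(sample)
    print(f"mean output {mean_out:.2e}, mean slope {mean_slope:.2e}")
```

Over the sample used to fit `alpha` and `beta`, both averages vanish up to floating-point error, so the hidden unit contributes no net linear component and the shortcut weights are left to model it.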

Cite this Paper

BibTeX


@InProceedings{pmlr-v22-raiko12,
  title = 	 {Deep Learning Made Easier by Linear Transformations in Perceptrons},
  author = 	 {Raiko, Tapani and Valpola, Harri and LeCun, Yann},
  booktitle = 	 {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {924--932},
  year = 	 {2012},
  editor = 	 {Lawrence, Neil D. and Girolami, Mark},
  volume = 	 {22},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {La Palma, Canary Islands},
  month = 	 {21--23 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v22/raiko12/raiko12.pdf},
  url = 	 {https://proceedings.mlr.press/v22/raiko12.html},
  abstract = 	 {We transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero activation and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. This transformation aims at separating the problems of learning the linear and nonlinear parts of the whole input-output mapping, which has many benefits. We study the theoretical properties of the transformation by noting that they make the Fisher information matrix closer to a diagonal matrix, and thus standard gradient closer to the natural gradient. We experimentally confirm the usefulness of the transformations by noting that they make basic stochastic gradient learning competitive with state-of-the-art learning algorithms in speed, and that they seem also to help find solutions that generalize better. The experiments include both classification of small images and learning a low-dimensional representation for images by using a deep unsupervised auto-encoder network. The transformations were beneficial in all cases, with and without regularization and with networks from two to five hidden layers.}
}

Endnote

%0 Conference Paper
%T Deep Learning Made Easier by Linear Transformations in Perceptrons
%A Tapani Raiko
%A Harri Valpola
%A Yann LeCun
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-raiko12
%I PMLR
%P 924--932
%U https://proceedings.mlr.press/v22/raiko12.html
%V 22
%X We transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero activation and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. This transformation aims at separating the problems of learning the linear and nonlinear parts of the whole input-output mapping, which has many benefits. We study the theoretical properties of the transformation by noting that they make the Fisher information matrix closer to a diagonal matrix, and thus standard gradient closer to the natural gradient. We experimentally confirm the usefulness of the transformations by noting that they make basic stochastic gradient learning competitive with state-of-the-art learning algorithms in speed, and that they seem also to help find solutions that generalize better. The experiments include both classification of small images and learning a low-dimensional representation for images by using a deep unsupervised auto-encoder network. The transformations were beneficial in all cases, with and without regularization and with networks from two to five hidden layers.

RIS


TY  - CPAPER
TI  - Deep Learning Made Easier by Linear Transformations in Perceptrons
AU  - Tapani Raiko
AU  - Harri Valpola
AU  - Yann LeCun
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA  - 2012/04/21
ED  - Neil D. Lawrence
ED  - Mark Girolami
ID  - pmlr-v22-raiko12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 22
SP  - 924
EP  - 932
L1  - http://proceedings.mlr.press/v22/raiko12/raiko12.pdf
UR  - https://proceedings.mlr.press/v22/raiko12.html
AB  - We transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero activation and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. This transformation aims at separating the problems of learning the linear and nonlinear parts of the whole input-output mapping, which has many benefits. We study the theoretical properties of the transformation by noting that they make the Fisher information matrix closer to a diagonal matrix, and thus standard gradient closer to the natural gradient. We experimentally confirm the usefulness of the transformations by noting that they make basic stochastic gradient learning competitive with state-of-the-art learning algorithms in speed, and that they seem also to help find solutions that generalize better. The experiments include both classification of small images and learning a low-dimensional representation for images by using a deep unsupervised auto-encoder network. The transformations were beneficial in all cases, with and without regularization and with networks from two to five hidden layers.
ER  -

APA


Raiko, T., Valpola, H. & LeCun, Y. (2012). Deep Learning Made Easier by Linear Transformations in Perceptrons. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:924-932. Available from https://proceedings.mlr.press/v22/raiko12.html.
