Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization

Jiong Zhang; Qi Lei; Inderjit Dhillon

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5806-5814, 2018.

Abstract

Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long-range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parameterization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors for representing the orthogonal matrices that arise in the SVD. By explicitly controlling the singular values, our proposed Spectral-RNN method allows us to easily solve the exploding gradient problem, and we observe that it empirically solves the vanishing gradient issue to a large extent. We note that the SVD parameterization can be used for any rectangular weight matrix, hence it can be easily extended to any deep neural network, such as a multi-layer perceptron. Theoretically, we demonstrate that our parameterization does not lose any expressive power, and show how it potentially makes the optimization process easier. Our extensive experimental results also demonstrate that the proposed framework converges faster and has good generalization, especially in capturing long-range dependencies, as shown on the synthetic addition and copy tasks, as well as on the MNIST and Penn Treebank data sets.
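
The idea described in the abstract, building a weight matrix from Householder reflectors and explicitly bounded singular values, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `margin` parameter, and the tanh squashing used to keep singular values near 1 are assumptions made for illustration, and the sketch forms dense matrices rather than using the efficient updates the paper refers to.

```python
import numpy as np

def householder_product(vectors):
    """Return the orthogonal matrix H_1 H_2 ... H_k for the rows v_i of `vectors`,
    where H_i = I - 2 v_i v_i^T / (v_i^T v_i)."""
    n = vectors.shape[1]
    q = np.eye(n)
    for v in vectors:
        # Right-multiply q by the reflector without forming it explicitly:
        # q H = q - 2 (q v) v^T / ||v||^2
        q = q - 2.0 * np.outer(q @ v, v) / (v @ v)
    return q

def svd_parameterized_matrix(u_vectors, v_vectors, sigma_raw, margin=0.01):
    """Build W = U diag(sigma) V^T with sigma kept inside (1 - margin, 1 + margin).

    The tanh squashing is a hypothetical choice for this sketch, not
    necessarily the constraint used in the paper.
    """
    u = householder_product(u_vectors)
    v = householder_product(v_vectors)
    sigma = 1.0 + margin * np.tanh(sigma_raw)
    return u @ np.diag(sigma) @ v.T, sigma

rng = np.random.default_rng(0)
n = 6
w, sigma = svd_parameterized_matrix(
    rng.standard_normal((n, n)),  # free parameters defining U
    rng.standard_normal((n, n)),  # free parameters defining V
    rng.standard_normal(n),       # unconstrained singular-value parameters
)
# The singular values of W are exactly the entries of sigma (up to ordering).
singular_values = np.linalg.svd(w, compute_uv=False)
```

Because every singular value of `w` stays close to 1 by construction, repeated multiplication by `w` can neither blow up nor shrink a vector's norm by more than a factor controlled by `margin` per step, which is the intuition behind the gradient stabilization claimed above.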

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-zhang18g,
  title = 	 {Stabilizing Gradients for Deep Neural Networks via Efficient {SVD} Parameterization},
  author =       {Zhang, Jiong and Lei, Qi and Dhillon, Inderjit},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {5806--5814},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/zhang18g/zhang18g.pdf},
  url = 	 {https://proceedings.mlr.press/v80/zhang18g.html},
  abstract = 	 {Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parametrization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors for representing the orthogonal matrices that arise in the SVD. By explicitly controlling the singular values, our proposed Spectral-RNN method allows us to easily solve the exploding gradient problem and we observe that it empirically solves the vanishing gradient issue to a large extent. We note that the SVD parameterization can be used for any rectangular weight matrix, hence it can be easily extended to any deep neural network, such as a multi-layer perceptron. Theoretically, we demonstrate that our parameterization does not lose any expressive power, and show how it potentially makes the optimization process easier. Our extensive experimental results also demonstrate that the proposed framework converges faster, and has good generalization, especially in capturing long range dependencies, as shown on the synthetic addition and copy tasks, as well as on MNIST and Penn Tree Bank data sets.}
}

Endnote

%0 Conference Paper
%T Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization
%A Jiong Zhang
%A Qi Lei
%A Inderjit Dhillon
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-zhang18g
%I PMLR
%P 5806--5814
%U https://proceedings.mlr.press/v80/zhang18g.html
%V 80
%X Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parametrization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors for representing the orthogonal matrices that arise in the SVD. By explicitly controlling the singular values, our proposed Spectral-RNN method allows us to easily solve the exploding gradient problem and we observe that it empirically solves the vanishing gradient issue to a large extent. We note that the SVD parameterization can be used for any rectangular weight matrix, hence it can be easily extended to any deep neural network, such as a multi-layer perceptron. Theoretically, we demonstrate that our parameterization does not lose any expressive power, and show how it potentially makes the optimization process easier. Our extensive experimental results also demonstrate that the proposed framework converges faster, and has good generalization, especially in capturing long range dependencies, as shown on the synthetic addition and copy tasks, as well as on MNIST and Penn Tree Bank data sets.

APA

Zhang, J., Lei, Q. & Dhillon, I. (2018). Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5806-5814. Available from https://proceedings.mlr.press/v80/zhang18g.html.
