Towards Binary-Valued Gates for Robust LSTM Training

Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Liwei Wang, Tieyan Liu
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2995-3004, 2018.

Abstract

Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal. In this paper, we propose a new way of training LSTMs that pushes the output values of the gates towards 0 or 1. By doing so, we can better control the information flow: the gates are mostly open or closed, instead of in a middle state, which makes the results more interpretable. Empirical studies show that (1) although this seems to restrict the model capacity, there is no performance drop: we achieve better or comparable performance due to better generalization; and (2) the outputs of the gates are not sensitive to their inputs, so we can easily compress the LSTM unit in multiple ways, e.g., by low-rank approximation and low-precision approximation. The compressed models are even better than the baseline models without compression.
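
The abstract only sketches the training idea at a high level. For intuition, the minimal PyTorch sketch below shows one common way to push a gate's output towards 0 or 1: add Logistic (Gumbel-difference) noise to the gate pre-activation and sharpen the sigmoid with a temperature, in the spirit of the Gumbel-Softmax/Concrete relaxation. The function name, the temperature default, and the choice of which gates to replace are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch (assumption, not the paper's official code): a
# Gumbel-sigmoid style gate whose output concentrates near 0 or 1.
import torch

def gumbel_sigmoid_gate(pre_activation: torch.Tensor,
                        tau: float = 0.5,
                        training: bool = True) -> torch.Tensor:
    """Return a gate value in (0, 1) that clusters near 0 or 1 as tau -> 0."""
    if training:
        # Standard Logistic noise (difference of two Gumbel samples):
        # log(u) - log(1 - u) with u ~ Uniform(0, 1).
        u = torch.rand_like(pre_activation).clamp(1e-6, 1.0 - 1e-6)
        noise = torch.log(u) - torch.log1p(-u)
        pre_activation = pre_activation + noise
    # A small temperature sharpens the sigmoid, pushing outputs towards {0, 1}.
    return torch.sigmoid(pre_activation / tau)

# Usage: replace the plain sigmoid on selected LSTM gates during training.
logits = torch.randn(4, 8)           # gate pre-activations (batch x hidden)
gate = gumbel_sigmoid_gate(logits)   # values clustered near 0 or 1

At evaluation time the noise term is dropped, so the sharpened sigmoid alone decides whether each gate is (almost) open or closed.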

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-li18c,
  title     = {Towards Binary-Valued Gates for Robust {LSTM} Training},
  author    = {Li, Zhuohan and He, Di and Tian, Fei and Chen, Wei and Qin, Tao and Wang, Liwei and Liu, Tieyan},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {2995--3004},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/li18c/li18c.pdf},
  url       = {https://proceedings.mlr.press/v80/li18c.html},
  abstract  = {Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal. In this paper, we propose a new way for LSTM training, which pushes the output values of the gates towards 0 or 1. By doing so, we can better control the information flow: the gates are mostly open or closed, instead of in a middle state, which makes the results more interpretable. Empirical studies show that (1) Although it seems that we restrict the model capacity, there is no performance drop: we achieve better or comparable performances due to its better generalization ability; (2) The outputs of gates are not sensitive to their inputs: we can easily compress the LSTM unit in multiple ways, e.g., low-rank approximation and low-precision approximation. The compressed models are even better than the baseline models without compression.}
}
Endnote
%0 Conference Paper
%T Towards Binary-Valued Gates for Robust LSTM Training
%A Zhuohan Li
%A Di He
%A Fei Tian
%A Wei Chen
%A Tao Qin
%A Liwei Wang
%A Tieyan Liu
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-li18c
%I PMLR
%P 2995--3004
%U https://proceedings.mlr.press/v80/li18c.html
%V 80
%X Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal. In this paper, we propose a new way for LSTM training, which pushes the output values of the gates towards 0 or 1. By doing so, we can better control the information flow: the gates are mostly open or closed, instead of in a middle state, which makes the results more interpretable. Empirical studies show that (1) Although it seems that we restrict the model capacity, there is no performance drop: we achieve better or comparable performances due to its better generalization ability; (2) The outputs of gates are not sensitive to their inputs: we can easily compress the LSTM unit in multiple ways, e.g., low-rank approximation and low-precision approximation. The compressed models are even better than the baseline models without compression.
APA
Li, Z., He, D., Tian, F., Chen, W., Qin, T., Wang, L. & Liu, T. (2018). Towards Binary-Valued Gates for Robust LSTM Training. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2995-3004. Available from https://proceedings.mlr.press/v80/li18c.html.