Parallelizing Legendre Memory Unit Training

Narsimha Reddy Chilkuri, Chris Eliasmith
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1898-1907, 2021.

Abstract

Recently, a new recurrent neural network (RNN) named the Legendre Memory Unit (LMU) was proposed and shown to achieve state-of-the-art performance on several benchmark datasets. Here we leverage the linear time-invariant (LTI) memory component of the LMU to construct a simplified variant that can be parallelized during training (and yet executed as an RNN during inference), resulting in up to 200 times faster training. We note that our efficient parallelizing scheme is general and is applicable to any deep network whose recurrent components are linear dynamical systems. We demonstrate the improved accuracy of our new architecture compared to the original LMU and a variety of published LSTM and transformer networks across seven benchmarks. For instance, our LMU sets a new state-of-the-art result on psMNIST, and uses half the parameters while outperforming DistilBERT and LSTM models on IMDB sentiment analysis.
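The core trick the abstract refers to is that a linear time-invariant recurrence has a closed-form solution: the state at time t is a convolution of the input with the system's impulse response, so the whole state sequence can be computed at once rather than step by step. Below is a minimal NumPy sketch of that principle only; it is not the authors' implementation, and the matrices A and B are random placeholders rather than the LMU's fixed matrices derived from the Padé approximation of a delay.

# Sketch: an LTI recurrence m_t = A m_{t-1} + B u_t evaluated two ways.
# The impulse response H[i] = A^i B lets every m_t be written as
# m_t = sum_{j<=t} H[t-j] u_j, which removes the sequential dependency
# across time steps (in practice this is computed as one large
# convolution or matrix multiply; details in the paper may differ).
import numpy as np

d, n, T = 1, 8, 50                       # input dim, memory dim, sequence length
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) * 0.1    # placeholder state matrix (not the LMU's)
B = rng.standard_normal((n, d))          # placeholder input matrix (not the LMU's)
u = rng.standard_normal((T, d))          # input sequence

# Sequential (RNN-style) evaluation.
m_seq = np.zeros((T, n))
m = np.zeros(n)
for t in range(T):
    m = A @ m + B @ u[t]
    m_seq[t] = m

# Non-recurrent evaluation via the impulse response.
H = np.zeros((T, n, d))
H[0] = B
for i in range(1, T):
    H[i] = A @ H[i - 1]                  # H[i] = A^i B

# m_par[t] = sum_{j<=t} H[t-j] @ u[j]
m_par = np.stack([
    np.tensordot(H[: t + 1][::-1], u[: t + 1], axes=([0, 2], [0, 1]))
    for t in range(T)
])

assert np.allclose(m_seq, m_par, atol=1e-8)

Because the closed form and the recurrence agree, the same system can still be executed step by step as an RNN at inference time, which is the execution mode the abstract describes.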

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-chilkuri21a,
  title     = {Parallelizing Legendre Memory Unit Training},
  author    = {Chilkuri, Narsimha Reddy and Eliasmith, Chris},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {1898--1907},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/chilkuri21a/chilkuri21a.pdf},
  url       = {https://proceedings.mlr.press/v139/chilkuri21a.html},
  abstract  = {Recently, a new recurrent neural network (RNN) named the Legendre Memory Unit (LMU) was proposed and shown to achieve state-of-the-art performance on several benchmark datasets. Here we leverage the linear time-invariant (LTI) memory component of the LMU to construct a simplified variant that can be parallelized during training (and yet executed as an RNN during inference), resulting in up to 200 times faster training. We note that our efficient parallelizing scheme is general and is applicable to any deep network whose recurrent components are linear dynamical systems. We demonstrate the improved accuracy of our new architecture compared to the original LMU and a variety of published LSTM and transformer networks across seven benchmarks. For instance, our LMU sets a new state-of-the-art result on psMNIST, and uses half the parameters while outperforming DistilBERT and LSTM models on IMDB sentiment analysis.}
}
Endnote
%0 Conference Paper
%T Parallelizing Legendre Memory Unit Training
%A Narsimha Reddy Chilkuri
%A Chris Eliasmith
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-chilkuri21a
%I PMLR
%P 1898--1907
%U https://proceedings.mlr.press/v139/chilkuri21a.html
%V 139
%X Recently, a new recurrent neural network (RNN) named the Legendre Memory Unit (LMU) was proposed and shown to achieve state-of-the-art performance on several benchmark datasets. Here we leverage the linear time-invariant (LTI) memory component of the LMU to construct a simplified variant that can be parallelized during training (and yet executed as an RNN during inference), resulting in up to 200 times faster training. We note that our efficient parallelizing scheme is general and is applicable to any deep network whose recurrent components are linear dynamical systems. We demonstrate the improved accuracy of our new architecture compared to the original LMU and a variety of published LSTM and transformer networks across seven benchmarks. For instance, our LMU sets a new state-of-the-art result on psMNIST, and uses half the parameters while outperforming DistilBERT and LSTM models on IMDB sentiment analysis.
APA
Chilkuri, N.R. & Eliasmith, C. (2021). Parallelizing Legendre Memory Unit Training. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:1898-1907. Available from https://proceedings.mlr.press/v139/chilkuri21a.html.
