Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:19123-19152, 2024.

Abstract

We introduce an Outlier-Efficient Modern Hopfield Model (termed OutEffHop) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating outlier-efficient associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (Softmax_1): it is an approximation of the memory retrieval process of OutEffHop. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like Clipped_Softmax and Gated_Attention. Notably, OutEffHop achieves an average reduction of 22+% in average kurtosis and 26+% in the maximum infinity norm of model outputs across four models. Code is available at GitHub; future updates are on arXiv.
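For readers who want a concrete picture of the Softmax_1 mechanism the abstract refers to, the sketch below shows one common way to implement it in PyTorch: a softmax whose denominator carries an extra "+1" term, so an attention row can assign (near-)zero total weight when no stored pattern matches the query. The function names, the toy attention call, and the exact form of the outlier metrics are illustrative assumptions for exposition; they are not taken from the paper's released code.

    import torch


    def softmax_1(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
        """softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).

        Implemented by appending a zero logit and running the ordinary
        (numerically stable) softmax, then dropping the extra slot, so the
        remaining probabilities may sum to less than one.
        """
        dim = dim % x.dim()
        pad_shape = list(x.shape)
        pad_shape[dim] = 1
        zero_logit = x.new_zeros(pad_shape)
        probs = torch.softmax(torch.cat([x, zero_logit], dim=dim), dim=dim)
        return probs.narrow(dim, 0, x.size(dim))


    def excess_kurtosis(t: torch.Tensor) -> torch.Tensor:
        # One plausible form of the "kurtosis" outlier metric: excess kurtosis
        # of the flattened activations (large values indicate heavy tails).
        t = t.flatten().float()
        centered = t - t.mean()
        var = centered.pow(2).mean()
        return centered.pow(4).mean() / var.pow(2) - 3.0


    if __name__ == "__main__":
        torch.manual_seed(0)
        q = torch.randn(2, 4, 8)                    # (batch, queries, head_dim)
        k = torch.randn(2, 6, 8)                    # (batch, keys, head_dim)
        v = torch.randn(2, 6, 8)

        scores = q @ k.transpose(-2, -1) / 8 ** 0.5
        attn = softmax_1(scores, dim=-1)            # each row sums to <= 1
        out = attn @ v

        print("row sums:", attn.sum(dim=-1).flatten()[:4])
        print("excess kurtosis:", excess_kurtosis(out).item())
        print("infinity norm:", out.abs().max().item())

The "+1" in the denominator is what distinguishes Softmax_1 from the standard softmax; per the abstract, the paper interprets it as an approximation of the memory-retrieval process of OutEffHop rather than as an ad-hoc patch.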

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-hu24a,
  title     = {Outlier-Efficient Hopfield Layers for Large Transformer-Based Models},
  author    = {Hu, Jerry Yao-Chieh and Chang, Pei-Hsuan and Luo, Haozheng and Chen, Hong-Yu and Li, Weijian and Wang, Wei-Po and Liu, Han},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {19123--19152},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hu24a/hu24a.pdf},
  url       = {https://proceedings.mlr.press/v235/hu24a.html},
  abstract  = {We introduce an Outlier-Efficient Modern Hopfield Model (termed OutEffHop) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating outlier-efficient associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (Softmax_1): it is an approximation of the memory retrieval process of OutEffHop. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like Clipped_Softmax and Gated_Attention. Notably, OutEffHop achieves an average reduction of 22+% in average kurtosis and 26+% in the maximum infinity norm of model outputs across four models. Code is available at GitHub; future updates are on arXiv.}
}
Endnote
%0 Conference Paper
%T Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
%A Jerry Yao-Chieh Hu
%A Pei-Hsuan Chang
%A Haozheng Luo
%A Hong-Yu Chen
%A Weijian Li
%A Wei-Po Wang
%A Han Liu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-hu24a
%I PMLR
%P 19123--19152
%U https://proceedings.mlr.press/v235/hu24a.html
%V 235
%X We introduce an Outlier-Efficient Modern Hopfield Model (termed OutEffHop) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating outlier-efficient associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (Softmax_1): it is an approximation of the memory retrieval process of OutEffHop. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like Clipped_Softmax and Gated_Attention. Notably, OutEffHop achieves an average reduction of 22+% in average kurtosis and 26+% in the maximum infinity norm of model outputs across four models. Code is available at GitHub; future updates are on arXiv.
APA
Hu, J.Y., Chang, P., Luo, H., Chen, H., Li, W., Wang, W. & Liu, H. (2024). Outlier-Efficient Hopfield Layers for Large Transformer-Based Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:19123-19152. Available from https://proceedings.mlr.press/v235/hu24a.html.
