xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference

Maximilian Beck, Korbinian Pöppel, Phillip Lippe, Richard Kurle, Patrick M Blies, Günter Klambauer, Sebastian Böck, Sepp Hochreiter
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:3335-3357, 2025.

Abstract

Recent breakthroughs in solving reasoning, math and coding problems with Large Language Models (LLMs) have been enabled by investing substantial computation budgets at inference time. Therefore, inference speed is one of the most critical properties of LLM architectures, and there is a growing need for LLMs that are efficient and fast at inference. Recently, LLMs built on the xLSTM architecture have emerged as a powerful alternative to Transformers, offering linear compute scaling with sequence length and constant memory usage, both highly desirable properties for efficient inference. However, such xLSTM-based LLMs have yet to be scaled to larger models and assessed and compared with respect to inference speed and efficiency. In this work, we introduce xLSTM 7B, a 7-billion-parameter LLM that combines xLSTM’s architectural benefits with targeted optimizations for fast and efficient inference. Our experiments demonstrate that xLSTM 7B achieves performance on downstream tasks comparable to other similar-sized LLMs, while providing significantly faster inference speeds and greater efficiency compared to Llama- and Mamba-based LLMs. These results establish xLSTM 7B as the fastest and most efficient 7B LLM, offering a solution for tasks that require large amounts of test-time computation. Our work highlights xLSTM’s potential as a foundational architecture for methods building on heavy use of LLM inference. Our model weights, model code and training code are open-source. Model: https://huggingface.co/NX-AI/xLSTM-7b Code: https://github.com/NX-AI/xlstm and https://github.com/NX-AI/xlstm-jax.
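As an illustration of the constant-memory property claimed above, the following minimal sketch shows a simplified mLSTM-style recurrent decoding step in Python. It is not the released implementation: the function name mlstm_step, the fixed gate values, and the dimensions are illustrative assumptions, and exponential gating, stabilization, and the full xLSTM block structure are omitted (see https://github.com/NX-AI/xlstm for the actual code). The point is that the state carried across decoding steps has a fixed size, so per-token cost stays flat and total compute grows only linearly with sequence length.

import numpy as np

def mlstm_step(C, n, q, k, v, i_gate, f_gate):
    # One decoding step of a simplified mLSTM-style cell:
    # C is a (d_k, d_v) matrix memory, n a (d_k,) normalizer.
    # Both have fixed size, no matter how many tokens were processed.
    C = f_gate * C + i_gate * np.outer(k, v)   # write the key/value pair into the matrix memory
    n = f_gate * n + i_gate * k                # accumulate key mass for normalization
    denom = max(abs(float(n @ q)), 1.0)        # lower-bounded normalizer for the read-out
    h = (C.T @ q) / denom                      # retrieve a value for the current query
    return C, n, h

d_k, d_v, seq_len = 8, 16, 1024
rng = np.random.default_rng(0)
C, n = np.zeros((d_k, d_v)), np.zeros(d_k)
for _ in range(seq_len):                       # total work grows linearly with sequence length
    q = rng.normal(size=d_k)
    k = rng.normal(size=d_k)
    v = rng.normal(size=d_v)
    C, n, h = mlstm_step(C, n, q, k, v, i_gate=1.0, f_gate=0.9)

# State carried between steps: d_k * d_v + d_k numbers, constant in seq_len,
# whereas a Transformer's KV cache stores on the order of seq_len * d entries.

For running the actual 7B model, the Hugging Face checkpoint at https://huggingface.co/NX-AI/xLSTM-7b and the repositories listed above provide the reference implementation and usage instructions.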

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-beck25b,
  title     = {x{LSTM} 7{B}: A Recurrent {LLM} for Fast and Efficient Inference},
  author    = {Beck, Maximilian and P\"{o}ppel, Korbinian and Lippe, Phillip and Kurle, Richard and Blies, Patrick M and Klambauer, G\"{u}nter and B\"{o}ck, Sebastian and Hochreiter, Sepp},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {3335--3357},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/beck25b/beck25b.pdf},
  url       = {https://proceedings.mlr.press/v267/beck25b.html},
  abstract  = {Recent breakthroughs in solving reasoning, math and coding problems with Large Language Models (LLMs) have been enabled by investing substantial computation budgets at inference time. Therefore, inference speed is one of the most critical properties of LLM architectures, and there is a growing need for LLMs that are efficient and fast at inference. Recently, LLMs built on the xLSTM architecture have emerged as a powerful alternative to Transformers, offering linear compute scaling with sequence length and constant memory usage, both highly desirable properties for efficient inference. However, such xLSTM-based LLMs have yet to be scaled to larger models and assessed and compared with respect to inference speed and efficiency. In this work, we introduce xLSTM 7B, a 7-billion-parameter LLM that combines xLSTM’s architectural benefits with targeted optimizations for fast and efficient inference. Our experiments demonstrate that xLSTM 7B achieves performance on downstream tasks comparable to other similar-sized LLMs, while providing significantly faster inference speeds and greater efficiency compared to Llama- and Mamba-based LLMs. These results establish xLSTM 7B as the fastest and most efficient 7B LLM, offering a solution for tasks that require large amounts of test-time computation. Our work highlights xLSTM’s potential as a foundational architecture for methods building on heavy use of LLM inference. Our model weights, model code and training code are open-source. Model: https://huggingface.co/NX-AI/xLSTM-7b Code: https://github.com/NX-AI/xlstm and https://github.com/NX-AI/xlstm-jax.}
}
Endnote
%0 Conference Paper
%T xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
%A Maximilian Beck
%A Korbinian Pöppel
%A Phillip Lippe
%A Richard Kurle
%A Patrick M Blies
%A Günter Klambauer
%A Sebastian Böck
%A Sepp Hochreiter
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-beck25b
%I PMLR
%P 3335--3357
%U https://proceedings.mlr.press/v267/beck25b.html
%V 267
%X Recent breakthroughs in solving reasoning, math and coding problems with Large Language Models (LLMs) have been enabled by investing substantial computation budgets at inference time. Therefore, inference speed is one of the most critical properties of LLM architectures, and there is a growing need for LLMs that are efficient and fast at inference. Recently, LLMs built on the xLSTM architecture have emerged as a powerful alternative to Transformers, offering linear compute scaling with sequence length and constant memory usage, both highly desirable properties for efficient inference. However, such xLSTM-based LLMs have yet to be scaled to larger models and assessed and compared with respect to inference speed and efficiency. In this work, we introduce xLSTM 7B, a 7-billion-parameter LLM that combines xLSTM’s architectural benefits with targeted optimizations for fast and efficient inference. Our experiments demonstrate that xLSTM 7B achieves performance on downstream tasks comparable to other similar-sized LLMs, while providing significantly faster inference speeds and greater efficiency compared to Llama- and Mamba-based LLMs. These results establish xLSTM 7B as the fastest and most efficient 7B LLM, offering a solution for tasks that require large amounts of test-time computation. Our work highlights xLSTM’s potential as a foundational architecture for methods building on heavy use of LLM inference. Our model weights, model code and training code are open-source. Model: https://huggingface.co/NX-AI/xLSTM-7b Code: https://github.com/NX-AI/xlstm and https://github.com/NX-AI/xlstm-jax.
APA
Beck, M., Pöppel, K., Lippe, P., Kurle, R., Blies, P.M., Klambauer, G., Böck, S. & Hochreiter, S. (2025). xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:3335-3357. Available from https://proceedings.mlr.press/v267/beck25b.html.