Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2647-2655, 2025.

Abstract

Previous work has demonstrated that attention mechanisms are Turing complete. More recently, it has been shown that a looped 9-layer Transformer can function as a universal programmable computer. In contrast, the multi-layer perceptron with $\mathsf{ReLU}$ activation ($\mathsf{ReLU}$-$\mathsf{MLP}$), one of the most fundamental components of neural networks, is known to be expressive; specifically, a two-layer neural network is a universal approximator given an exponentially large number of hidden neurons. However, it remains unclear whether a $\mathsf{ReLU}$-$\mathsf{MLP}$ can be made into a universal programmable computer using a practical number of weights. In this work, we provide an affirmative answer: a looped 23-layer $\mathsf{ReLU}$-$\mathsf{MLP}$ is capable of performing the basic operations necessary to function as a programmable computer, and it does so more efficiently and effectively than a looped Transformer. This indicates that simple modules have stronger expressive power than previously expected and have not been fully explored. Our work provides insights into the mechanisms of neural networks and demonstrates that complex tasks, such as functioning as a programmable computer, do not necessarily require advanced architectures like Transformers.
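
To make the looped-$\mathsf{ReLU}$-$\mathsf{MLP}$ setting concrete, the sketch below illustrates what "looping" means here: a single fixed ReLU-MLP block is applied repeatedly to a state vector, so each pass plays the role of one machine step. This is only a structural illustration under assumed placeholders; the class name, layer widths, loop count, and random weights are not the paper's construction, which chooses specific weights for its 23-layer block so that each pass executes an actual computational step.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class LoopedReLUMLP:
    """A fixed ReLU-MLP block applied repeatedly to the same state vector.

    Structural illustration only: random weights and dimensions are
    placeholders, not the paper's 23-layer construction.
    """

    def __init__(self, state_dim: int, hidden_dim: int, depth: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # depth layers: state_dim -> hidden_dim -> ... -> hidden_dim -> state_dim
        dims = [state_dim] + [hidden_dim] * (depth - 1) + [state_dim]
        self.weights = [rng.standard_normal((dims[i + 1], dims[i])) * 0.1
                        for i in range(depth)]
        self.biases = [np.zeros(dims[i + 1]) for i in range(depth)]

    def block(self, state: np.ndarray) -> np.ndarray:
        """One pass through the fixed MLP block (ReLU on hidden layers)."""
        h = state
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            h = W @ h + b
            if i < len(self.weights) - 1:
                h = relu(h)
        return h

    def run(self, state: np.ndarray, num_loops: int) -> np.ndarray:
        """Loop the same block; each iteration acts as one 'machine step'."""
        for _ in range(num_loops):
            state = self.block(state)
        return state


# Example: a hypothetical 23-layer block looped 10 times over a state vector
# that, in the paper's construction, would encode program, memory, and registers.
mlp = LoopedReLUMLP(state_dim=64, hidden_dim=128, depth=23)
final_state = mlp.run(np.zeros(64), num_loops=10)
print(final_state.shape)  # (64,)
```

The key design point is that the weights are shared across iterations: depth of computation comes from looping a fixed, practically sized block rather than from stacking new layers.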

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-liang25a,
  title     = {Looped ReLU MLPs May Be All You Need as Practical Programmable Computers},
  author    = {Liang, Yingyu and Sha, Zhizhou and Shi, Zhenmei and Song, Zhao and Zhou, Yufa},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {2647--2655},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/liang25a/liang25a.pdf},
  url       = {https://proceedings.mlr.press/v258/liang25a.html},
  abstract  = {Previous work has demonstrated that attention mechanisms are Turing complete. More recently, it has been shown that a looped 9-layer Transformer can function as a universal programmable computer. In contrast, the multi-layer perceptron with $\mathsf{ReLU}$ activation ($\mathsf{ReLU}$-$\mathsf{MLP}$), one of the most fundamental components of neural networks, is known to be expressive; specifically, a two-layer neural network is a universal approximator given an exponentially large number of hidden neurons. However, it remains unclear whether a $\mathsf{ReLU}$-$\mathsf{MLP}$ can be made into a universal programmable computer using a practical number of weights. In this work, we provide an affirmative answer: a looped 23-layer $\mathsf{ReLU}$-$\mathsf{MLP}$ is capable of performing the basic operations necessary to function as a programmable computer, and it does so more efficiently and effectively than a looped Transformer. This indicates that simple modules have stronger expressive power than previously expected and have not been fully explored. Our work provides insights into the mechanisms of neural networks and demonstrates that complex tasks, such as functioning as a programmable computer, do not necessarily require advanced architectures like Transformers.}
}
Endnote
%0 Conference Paper
%T Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
%A Yingyu Liang
%A Zhizhou Sha
%A Zhenmei Shi
%A Zhao Song
%A Yufa Zhou
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-liang25a
%I PMLR
%P 2647--2655
%U https://proceedings.mlr.press/v258/liang25a.html
%V 258
%X Previous work has demonstrated that attention mechanisms are Turing complete. More recently, it has been shown that a looped 9-layer Transformer can function as a universal programmable computer. In contrast, the multi-layer perceptron with $\mathsf{ReLU}$ activation ($\mathsf{ReLU}$-$\mathsf{MLP}$), one of the most fundamental components of neural networks, is known to be expressive; specifically, a two-layer neural network is a universal approximator given an exponentially large number of hidden neurons. However, it remains unclear whether a $\mathsf{ReLU}$-$\mathsf{MLP}$ can be made into a universal programmable computer using a practical number of weights. In this work, we provide an affirmative answer: a looped 23-layer $\mathsf{ReLU}$-$\mathsf{MLP}$ is capable of performing the basic operations necessary to function as a programmable computer, and it does so more efficiently and effectively than a looped Transformer. This indicates that simple modules have stronger expressive power than previously expected and have not been fully explored. Our work provides insights into the mechanisms of neural networks and demonstrates that complex tasks, such as functioning as a programmable computer, do not necessarily require advanced architectures like Transformers.
APA
Liang, Y., Sha, Z., Shi, Z., Song, Z. & Zhou, Y. (2025). Looped ReLU MLPs May Be All You Need as Practical Programmable Computers. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2647-2655. Available from https://proceedings.mlr.press/v258/liang25a.html.