ReLU Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate

Chenghao Liu, Minghua Chen
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:30755-30788, 2024.

Abstract

The prevalent employment of narrow neural networks, characterized by their minimal parameter count per layer, has led to a surge in research exploring their potential as universal function approximators. A notable result in this field states that networks with just a width of $d+1$ can approximate any continuous function for input dimension $d$ arbitrarily well. However, the optimal approximation rate for these narrowest networks, i.e., the optimal relation between the count of tunable parameters and the approximation error, remained unclear. In this paper, we address this gap by proving that ReLU networks with width $d+1$ can achieve the optimal approximation rate for continuous functions over the domain $[0,1]^d$ under $L^p$ norm for $p\in[1,\infty)$. We further show that for the uniform norm, a width of $d+11$ is sufficient. We also extend the results to narrow feed-forward networks with various activations, confirming their capability to approximate at the optimal rate. This work adds to the understanding of universal approximation of narrow networks.
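For context, the sketch below (not part of the paper) illustrates what a feed-forward ReLU network of width $d+1$ looks like for input dimension $d$: every hidden layer has exactly $d+1$ units. The weights are random placeholders and the helper names (relu, narrow_relu_network) are purely illustrative; the paper's explicit weight construction achieving the optimal approximation rate is not reproduced here.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def narrow_relu_network(x, weights, biases):
    # Evaluate a feed-forward ReLU network whose hidden layers all have
    # width d + 1, where d = len(x). Illustrative only: these are random
    # weights, not the paper's approximation-rate-optimal construction.
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)               # hidden layer of width d + 1
    return weights[-1] @ h + biases[-1]   # affine output layer, scalar output

# A randomly initialized width-(d+1) network on [0,1]^d with d = 3, depth 5.
d, depth = 3, 5
rng = np.random.default_rng(0)
dims = [d] + [d + 1] * depth + [1]
weights = [rng.standard_normal((m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [rng.standard_normal(m) for m in dims[1:]]
x = rng.uniform(0.0, 1.0, size=d)
print(narrow_relu_network(x, weights, biases))

The point of the sketch is only the shape of the architecture: the total count of tunable parameters grows linearly with depth at fixed width $d+1$, which is the quantity the paper relates to the approximation error.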

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-liu24g,
  title     = {{R}e{LU} Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate},
  author    = {Liu, Chenghao and Chen, Minghua},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {30755--30788},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/liu24g/liu24g.pdf},
  url       = {https://proceedings.mlr.press/v235/liu24g.html},
  abstract  = {The prevalent employment of narrow neural networks, characterized by their minimal parameter count per layer, has led to a surge in research exploring their potential as universal function approximators. A notable result in this field states that networks with just a width of $d+1$ can approximate any continuous function for input dimension $d$ arbitrarily well. However, the optimal approximation rate for these narrowest networks, i.e., the optimal relation between the count of tunable parameters and the approximation error, remained unclear. In this paper, we address this gap by proving that ReLU networks with width $d+1$ can achieve the optimal approximation rate for continuous functions over the domain $[0,1]^d$ under $L^p$ norm for $p\in[1,\infty)$. We further show that for the uniform norm, a width of $d+11$ is sufficient. We also extend the results to narrow feed-forward networks with various activations, confirming their capability to approximate at the optimal rate. This work adds to the understanding of universal approximation of narrow networks.}
}
Endnote
%0 Conference Paper
%T ReLU Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate
%A Chenghao Liu
%A Minghua Chen
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-liu24g
%I PMLR
%P 30755--30788
%U https://proceedings.mlr.press/v235/liu24g.html
%V 235
%X The prevalent employment of narrow neural networks, characterized by their minimal parameter count per layer, has led to a surge in research exploring their potential as universal function approximators. A notable result in this field states that networks with just a width of $d+1$ can approximate any continuous function for input dimension $d$ arbitrarily well. However, the optimal approximation rate for these narrowest networks, i.e., the optimal relation between the count of tunable parameters and the approximation error, remained unclear. In this paper, we address this gap by proving that ReLU networks with width $d+1$ can achieve the optimal approximation rate for continuous functions over the domain $[0,1]^d$ under $L^p$ norm for $p\in[1,\infty)$. We further show that for the uniform norm, a width of $d+11$ is sufficient. We also extend the results to narrow feed-forward networks with various activations, confirming their capability to approximate at the optimal rate. This work adds to the understanding of universal approximation of narrow networks.
APA
Liu, C. & Chen, M. (2024). ReLU Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:30755-30788. Available from https://proceedings.mlr.press/v235/liu24g.html.