Characterizing ResNet’s Universal Approximation Capability

Chenghao Liu, Enming Liang, Minghua Chen
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:31477-31515, 2024.

Abstract

Since its debut in 2016, ResNet has become arguably the most favored architecture in deep neural network (DNN) design. It effectively addresses the gradient vanishing/exploding issue in DNN training, allowing engineers to fully unleash DNN’s potential in tackling challenging problems in various domains. Despite its practical success, an essential theoretical question remains largely open: how well/best can ResNet approximate functions? In this paper, we answer this question for several important function classes, including polynomials and smooth functions. In particular, we show that a ResNet with constant width can approximate Lipschitz continuous functions with Lipschitz constant $\mu$ using $\mathcal{O}(c(d)(\varepsilon/\mu)^{-d/2})$ tunable weights, where $c(d)$ is a constant depending on the input dimension $d$ and $\varepsilon>0$ is the target approximation error. Further, we extend this result to Lebesgue-integrable functions, with the upper bound characterized by the modulus of continuity. These results indicate a factor of $d$ reduction in the number of tunable weights compared with the classical results for ReLU networks. Our results are also order-optimal in $\varepsilon$, thus achieving the optimal approximation rate, as they match a generalized lower bound derived in this paper. This work adds to the theoretical justifications for ResNet’s stellar practical performance.
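To make the two quantities in the bound concrete, the sketch below is an illustration only, not the paper's construction: it builds a generic constant-width ResNet in PyTorch and evaluates the order of the weight count $c(d)(\varepsilon/\mu)^{-d/2}$ for sample values. The width, depth, block structure, and the choice $c(d)=1$ are assumptions made purely for illustration.

```python
# Illustrative sketch only (not the construction used in the paper): a
# constant-width ResNet and a rough evaluation of the O(c(d)*(eps/mu)^(-d/2))
# weight-count bound. c(d) is left unspecified in the abstract, so it is
# taken as 1 here for illustration.
import torch
import torch.nn as nn


class ConstantWidthResNet(nn.Module):
    """Stack of residual blocks whose hidden width stays fixed at `width`."""

    def __init__(self, d: int, width: int, num_blocks: int):
        super().__init__()
        self.lift = nn.Linear(d, width)      # embed the d-dimensional input into the constant-width state
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU(), nn.Linear(width, width))
            for _ in range(num_blocks)
        )
        self.readout = nn.Linear(width, 1)   # scalar output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.lift(x)
        for block in self.blocks:
            h = h + block(h)                 # identity skip connection
        return self.readout(h)


def weight_bound(d: int, eps: float, mu: float, c_d: float = 1.0) -> float:
    """Order of tunable weights in the bound: c(d) * (eps/mu)^(-d/2)."""
    return c_d * (eps / mu) ** (-d / 2)


if __name__ == "__main__":
    net = ConstantWidthResNet(d=4, width=8, num_blocks=6)
    n_params = sum(p.numel() for p in net.parameters())
    print(f"tunable weights in this toy net: {n_params}")
    print(f"bound for d=4, eps=0.1, mu=1: ~{weight_bound(4, 0.1, 1.0):.0f}")
```

Counting `n_params` for a fixed width and growing depth mirrors the setting of the theorem: the number of tunable weights grows only with the number of residual blocks, while the width stays constant in $d$.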

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-liu24am,
  title     = {Characterizing {R}es{N}et’s Universal Approximation Capability},
  author    = {Liu, Chenghao and Liang, Enming and Chen, Minghua},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {31477--31515},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/liu24am/liu24am.pdf},
  url       = {https://proceedings.mlr.press/v235/liu24am.html}
}
Endnote
%0 Conference Paper
%T Characterizing ResNet’s Universal Approximation Capability
%A Chenghao Liu
%A Enming Liang
%A Minghua Chen
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-liu24am
%I PMLR
%P 31477--31515
%U https://proceedings.mlr.press/v235/liu24am.html
%V 235
APA
Liu, C., Liang, E., & Chen, M. (2024). Characterizing ResNet’s Universal Approximation Capability. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:31477-31515. Available from https://proceedings.mlr.press/v235/liu24am.html.
