Minimum Width for Universal Approximation using Squashable Activation Functions

Jonghyun Shin, Namjun Kim, Geonho Hwang, Sejun Park
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:55096-55121, 2025.

Abstract

The exact minimum width that allows for universal approximation of unbounded-depth networks is known only for ReLU and its variants. In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable functions that can approximate the identity function and binary step function by alternately composing with affine transformations. We show that for networks using a squashable activation function to universally approximate $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$, the minimum width is $\max\{d_x,d_y,2\}$ unless $d_x=d_y=1$; the same bound holds for $d_x=d_y=1$ if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for those general classes of activation functions.
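
As a minimal illustration of the squashing idea (our own sketch; the logistic sigmoid is assumed here as an example activation, not one singled out by the paper), take $\sigma(t)=1/(1+e^{-t})$. Composing $\sigma$ with affine maps on the inside and the outside recovers both target functions in the limit $\delta\to 0^+$:

$$
\frac{\sigma(\delta x)-\sigma(0)}{\delta\,\sigma'(0)} \;\to\; x \quad \text{uniformly on compact sets}, \qquad \sigma\!\left(\tfrac{x}{\delta}\right) \;\to\; \mathbf{1}_{\{x>0\}} \quad \text{for every } x\neq 0.
$$

Since this $\sigma$ is monotone and fits the informal description of squashability above, the stated result would give a minimum width of $\max\{d_x,d_y,2\}$, including in the case $d_x=d_y=1$.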

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-shin25b,
  title     = {Minimum Width for Universal Approximation using Squashable Activation Functions},
  author    = {Shin, Jonghyun and Kim, Namjun and Hwang, Geonho and Park, Sejun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {55096--55121},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/shin25b/shin25b.pdf},
  url       = {https://proceedings.mlr.press/v267/shin25b.html},
  abstract  = {The exact minimum width that allows for universal approximation of unbounded-depth networks is known only for ReLU and its variants. In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable functions that can approximate the identity function and binary step function by alternately composing with affine transformations. We show that for networks using a squashable activation function to universally approximate $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$, the minimum width is $\max\{d_x,d_y,2\}$ unless $d_x=d_y=1$; the same bound holds for $d_x=d_y=1$ if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for those general classes of activation functions.}
}
Endnote
%0 Conference Paper
%T Minimum Width for Universal Approximation using Squashable Activation Functions
%A Jonghyun Shin
%A Namjun Kim
%A Geonho Hwang
%A Sejun Park
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-shin25b
%I PMLR
%P 55096--55121
%U https://proceedings.mlr.press/v267/shin25b.html
%V 267
%X The exact minimum width that allows for universal approximation of unbounded-depth networks is known only for ReLU and its variants. In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable functions that can approximate the identity function and binary step function by alternately composing with affine transformations. We show that for networks using a squashable activation function to universally approximate $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$, the minimum width is $\max\{d_x,d_y,2\}$ unless $d_x=d_y=1$; the same bound holds for $d_x=d_y=1$ if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for those general classes of activation functions.
APA
Shin, J., Kim, N., Hwang, G., & Park, S. (2025). Minimum Width for Universal Approximation using Squashable Activation Functions. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:55096-55121. Available from https://proceedings.mlr.press/v267/shin25b.html.
