Depth and Feature Learning are Provably Beneficial for Neural Network Discriminators
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:421-447, 2022.
Abstract
We construct pairs of distributions $\mu_d, \nu_d$ on $\mathbb{R}^d$ such that the quantity $|\mathbb{E}_{x \sim \mu_d} [F(x)] - \mathbb{E}_{x \sim \nu_d} [F(x)]|$ decreases as $\Omega(1/d^2)$ for some three-layer ReLU network $F$ with polynomial width and weights, while declining exponentially in $d$ if $F$ is any two-layer network with polynomial weights. This shows that deep GAN discriminators are able to distinguish distributions that shallow discriminators cannot. Analogously, we build pairs of distributions $\mu_d, \nu_d$ on $\mathbb{R}^d$ such that $|\mathbb{E}_{x \sim \mu_d} [F(x)] - \mathbb{E}_{x \sim \nu_d} [F(x)]|$ decreases as $\Omega(1/(d\log d))$ for two-layer ReLU networks with polynomial weights, while declining exponentially for bounded-norm functions in the associated RKHS. This confirms that feature learning is beneficial for discriminators. Our bounds are based on Fourier transforms.
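To make the central quantity concrete, the following is a minimal sketch of a Monte Carlo estimate of $|\mathbb{E}_{x \sim \mu} [F(x)] - \mathbb{E}_{x \sim \nu} [F(x)]|$ for a two-layer ReLU network $F$. It does not reproduce the constructions of the paper: the two distributions (a standard Gaussian and a shifted copy), the random weights, and the names `two_layer_relu` and `discrepancy` are all assumptions chosen purely for illustration.

```python
import numpy as np

def two_layer_relu(x, W, b, a):
    """F(x) = sum_j a_j * relu(<w_j, x> + b_j), for a batch x of shape (n, d)."""
    return np.maximum(x @ W.T + b, 0.0) @ a

def discrepancy(F, xs_mu, xs_nu):
    """Monte Carlo estimate of |E_mu[F] - E_nu[F]| from two sample sets."""
    return abs(F(xs_mu).mean() - F(xs_nu).mean())

rng = np.random.default_rng(0)
d, width, n = 8, 32, 20000

# Hypothetical discriminator with random weights (not a trained network).
W = rng.normal(size=(width, d)) / np.sqrt(d)
b = rng.normal(size=width)
a = rng.normal(size=width) / width
F = lambda x: two_layer_relu(x, W, b, a)

# Illustrative distributions: standard Gaussian vs. a mean-shifted Gaussian.
xs_mu = rng.normal(size=(n, d))
xs_nu = rng.normal(size=(n, d)) + 0.5

gap = discrepancy(F, xs_mu, xs_nu)   # discrepancy between the two distributions
same = discrepancy(F, xs_mu, xs_mu)  # identical samples give exactly zero
print(f"estimated discrepancy: {gap:.4f}, self-discrepancy: {same:.4f}")
```

In the setting of the abstract, one would study how the largest such discrepancy over a class of networks (three-layer, two-layer, or bounded-norm RKHS functions) scales with the dimension $d$; the sketch above only evaluates a single fixed $F$.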