Dropout Training, Data-dependent Regularization, and Generalization Bounds

Wenlong Mou, Yuchen Zhou, Jun Gao, Liwei Wang
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3645-3653, 2018.

Abstract

We study the problem of generalization guarantees for dropout training. A general framework is first proposed for learning procedures with random perturbations of the model parameters. The generalization error is bounded by the sum of two offset Rademacher complexities: the main term is the Rademacher complexity of the hypothesis class with a negative offset induced by the perturbation variance, which characterizes the data-dependent regularization effect of the random perturbation; the auxiliary term is the offset Rademacher complexity of the variance class, which controls the degree to which this regularization effect can be weakened. For neural networks, we derive upper and lower bounds on the variance induced by truthful dropout, a variant of dropout that we propose to ensure unbiased output and to fit our framework, and the variance bounds exhibit a connection to adaptive regularization methods. Applying our framework to ReLU networks with one hidden layer, we derive a generalization upper bound with no assumptions on the parameter norms or the data distribution, achieving both an $O(1/n)$ fast rate and adaptivity to the geometry of the data points.
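To make the unbiasedness requirement concrete, the sketch below shows one way a dropout perturbation can leave the hidden layer of a one-hidden-layer ReLU network unchanged in expectation: retained units are rescaled by the inverse keep probability, as in standard inverted dropout. This is an illustrative assumption on our part, not the paper's exact construction of truthful dropout, and the names unbiased_dropout, forward, and keep_prob are hypothetical.

import numpy as np

def unbiased_dropout(activations, keep_prob, rng):
    # Drop each coordinate independently and rescale survivors by 1/keep_prob,
    # so that E[mask * activations / keep_prob] = activations.
    mask = rng.binomial(1, keep_prob, size=activations.shape)
    return activations * mask / keep_prob

def forward(x, W1, w2, keep_prob, rng):
    # One-hidden-layer ReLU network with dropout applied to the hidden units.
    h = np.maximum(W1 @ x, 0.0)                    # hidden ReLU activations
    h_tilde = unbiased_dropout(h, keep_prob, rng)  # E[h_tilde | x] = h
    return w2 @ h_tilde                            # output unbiased given x

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
W1, w2 = rng.standard_normal((8, 5)), rng.standard_normal(8)
print(forward(x, W1, w2, keep_prob=0.5, rng=rng))

Under this reading, the variance of the perturbed output around the clean output depends on the data through the hidden activations, which is the kind of data-dependent quantity that plays the role of the offset in the bound described above.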

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-mou18a,
  title     = {Dropout Training, Data-dependent Regularization, and Generalization Bounds},
  author    = {Mou, Wenlong and Zhou, Yuchen and Gao, Jun and Wang, Liwei},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {3645--3653},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/mou18a/mou18a.pdf},
  url       = {https://proceedings.mlr.press/v80/mou18a.html},
  abstract  = {We study the problem of generalization guarantees for dropout training. A general framework is first proposed for learning procedures with random perturbations of the model parameters. The generalization error is bounded by the sum of two offset Rademacher complexities: the main term is the Rademacher complexity of the hypothesis class with a negative offset induced by the perturbation variance, which characterizes the data-dependent regularization effect of the random perturbation; the auxiliary term is the offset Rademacher complexity of the variance class, which controls the degree to which this regularization effect can be weakened. For neural networks, we derive upper and lower bounds on the variance induced by truthful dropout, a variant of dropout that we propose to ensure unbiased output and to fit our framework, and the variance bounds exhibit a connection to adaptive regularization methods. Applying our framework to ReLU networks with one hidden layer, we derive a generalization upper bound with no assumptions on the parameter norms or the data distribution, achieving both an $O(1/n)$ fast rate and adaptivity to the geometry of the data points.}
}
Endnote
%0 Conference Paper
%T Dropout Training, Data-dependent Regularization, and Generalization Bounds
%A Wenlong Mou
%A Yuchen Zhou
%A Jun Gao
%A Liwei Wang
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-mou18a
%I PMLR
%P 3645--3653
%U https://proceedings.mlr.press/v80/mou18a.html
%V 80
%X We study the problem of generalization guarantees for dropout training. A general framework is first proposed for learning procedures with random perturbations of the model parameters. The generalization error is bounded by the sum of two offset Rademacher complexities: the main term is the Rademacher complexity of the hypothesis class with a negative offset induced by the perturbation variance, which characterizes the data-dependent regularization effect of the random perturbation; the auxiliary term is the offset Rademacher complexity of the variance class, which controls the degree to which this regularization effect can be weakened. For neural networks, we derive upper and lower bounds on the variance induced by truthful dropout, a variant of dropout that we propose to ensure unbiased output and to fit our framework, and the variance bounds exhibit a connection to adaptive regularization methods. Applying our framework to ReLU networks with one hidden layer, we derive a generalization upper bound with no assumptions on the parameter norms or the data distribution, achieving both an $O(1/n)$ fast rate and adaptivity to the geometry of the data points.
APA
Mou, W., Zhou, Y., Gao, J. & Wang, L. (2018). Dropout Training, Data-dependent Regularization, and Generalization Bounds. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3645-3653. Available from https://proceedings.mlr.press/v80/mou18a.html.
