Dropout Training, Data-dependent Regularization, and Generalization Bounds

Wenlong Mou; Yuchen Zhou; Jun Gao; Liwei Wang

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3645-3653, 2018.

Abstract

We study the problem of generalization guarantees for dropout training. A general framework is first proposed for learning procedures with random perturbation on model parameters. The generalization error is bounded by the sum of two offset Rademacher complexities: the main term is the Rademacher complexity of the hypothesis class with a negative offset induced by the perturbation variance, which characterizes data-dependent regularization by the random perturbation; the auxiliary term is the offset Rademacher complexity of the variance class, controlling the degree to which this regularization effect can be weakened. For neural networks, we estimate upper and lower bounds for the variance induced by truthful dropout, a variant of dropout that we propose to ensure unbiased output and fit into our framework, and the variance bounds exhibit a connection to adaptive regularization methods. By applying our framework to ReLU networks with one hidden layer, a generalization upper bound is derived with no assumptions on the parameter norms or data distribution, with an $O(1/n)$ fast rate and adaptivity to the geometry of the data points achieved at the same time.
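The notion of a perturbation with unbiased output and data-dependent variance can be illustrated with a minimal sketch. This is not the paper's truthful dropout, whose definition is not given on this page; it assumes ordinary inverted dropout applied to the hidden units of a one-hidden-layer ReLU network with a linear output. All function names, weights, and inputs are hypothetical. Under that assumption the expected output equals the dropout-free output, and the variance is $\frac{1-p}{p}\sum_j v_j^2 h_j(x)^2$, where $p$ is the keep probability, so it depends on the input through the activations $h_j(x)$.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def hidden_activations(W, x):
    """ReLU activations of a single hidden layer; W is a list of weight rows."""
    return [relu(sum(w_k * x_k for w_k, x_k in zip(row, x))) for row in W]


def clean_output(v, h):
    """Network output with no dropout applied."""
    return sum(v_j * h_j for v_j, h_j in zip(v, h))


def dropout_output(v, h, keep_prob, rng):
    """One stochastic forward pass (assumed inverted dropout): keep each hidden
    unit with probability keep_prob and rescale survivors by 1 / keep_prob."""
    return sum(v_j * h_j / keep_prob
               for v_j, h_j in zip(v, h)
               if rng.random() < keep_prob)


def dropout_variance(v, h, keep_prob):
    """Closed-form variance of dropout_output for independent Bernoulli masks."""
    scale = (1.0 - keep_prob) / keep_prob
    return scale * sum((v_j * h_j) ** 2 for v_j, h_j in zip(v, h))


def monte_carlo_moments(v, h, keep_prob, n_samples=100_000, seed=0):
    """Empirical mean and variance of the dropout output over random masks."""
    rng = random.Random(seed)
    samples = [dropout_output(v, h, keep_prob, rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((s - mean) ** 2 for s in samples) / n_samples
    return mean, var


if __name__ == "__main__":
    # Hypothetical weights and input, chosen only for illustration.
    W = [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]]
    v = [0.7, -1.2, 0.4]
    x = [1.0, 0.25]
    h = hidden_activations(W, x)
    print("clean output:       ", clean_output(v, h))
    print("closed-form variance:", dropout_variance(v, h, 0.5))
    print("Monte Carlo (mean, variance):", monte_carlo_moments(v, h, 0.5))
```

Comparing the Monte Carlo mean with `clean_output` checks unbiasedness, and comparing the Monte Carlo variance with `dropout_variance` checks the closed form. Changing `x` changes the variance, which is the sense in which such a penalty is data-dependent.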

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-mou18a,
  title = 	 {Dropout Training, Data-dependent Regularization, and Generalization Bounds},
  author =       {Mou, Wenlong and Zhou, Yuchen and Gao, Jun and Wang, Liwei},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {3645--3653},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/mou18a/mou18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/mou18a.html},
  abstract = 	 {We study the problem of generalization guarantees for dropout training. A general framework is first proposed for learning procedures with random perturbation on model parameters. The generalization error is bounded by the sum of two offset Rademacher complexities: the main term is the Rademacher complexity of the hypothesis class with a negative offset induced by the perturbation variance, which characterizes data-dependent regularization by the random perturbation; the auxiliary term is the offset Rademacher complexity of the variance class, controlling the degree to which this regularization effect can be weakened. For neural networks, we estimate upper and lower bounds for the variance induced by truthful dropout, a variant of dropout that we propose to ensure unbiased output and fit into our framework, and the variance bounds exhibit a connection to adaptive regularization methods. By applying our framework to ReLU networks with one hidden layer, a generalization upper bound is derived with no assumptions on the parameter norms or data distribution, with an $O(1/n)$ fast rate and adaptivity to the geometry of the data points achieved at the same time.}
}

Endnote

%0 Conference Paper
%T Dropout Training, Data-dependent Regularization, and Generalization Bounds
%A Wenlong Mou
%A Yuchen Zhou
%A Jun Gao
%A Liwei Wang
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-mou18a
%I PMLR
%P 3645--3653
%U https://proceedings.mlr.press/v80/mou18a.html
%V 80
%X We study the problem of generalization guarantees for dropout training. A general framework is first proposed for learning procedures with random perturbation on model parameters. The generalization error is bounded by the sum of two offset Rademacher complexities: the main term is the Rademacher complexity of the hypothesis class with a negative offset induced by the perturbation variance, which characterizes data-dependent regularization by the random perturbation; the auxiliary term is the offset Rademacher complexity of the variance class, controlling the degree to which this regularization effect can be weakened. For neural networks, we estimate upper and lower bounds for the variance induced by truthful dropout, a variant of dropout that we propose to ensure unbiased output and fit into our framework, and the variance bounds exhibit a connection to adaptive regularization methods. By applying our framework to ReLU networks with one hidden layer, a generalization upper bound is derived with no assumptions on the parameter norms or data distribution, with an $O(1/n)$ fast rate and adaptivity to the geometry of the data points achieved at the same time.

APA

Mou, W., Zhou, Y., Gao, J. & Wang, L. (2018). Dropout Training, Data-dependent Regularization, and Generalization Bounds. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3645-3653. Available from https://proceedings.mlr.press/v80/mou18a.html.
