A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models

Ruitong Huang; Csaba Szepesvari

Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:402-410, 2014.

Abstract

In this paper we provide generalization bounds for semiparametric regression with the so-called partially linear models where the regression function is written as the sum of a linear parametric and a nonlinear, nonparametric function, the latter taken from some set $\mathcal{H}$ with finite entropy-integral. The problem is technically challenging because the parametric part is unconstrained and the model is underdetermined, while the response is allowed to be unbounded with subgaussian tails. Under natural regularity conditions, we bound the generalization error as a function of the metric entropy of $\mathcal{H}$ and the dimension of the linear model. Our main tool is a ratio-type concentration inequality for increments of empirical processes, based on which we are able to give an exponential tail bound on the size of the parametric component. We also provide a comparison to alternatives of this technique and discuss why and when the unconstrained parametric part in the model may cause a problem in terms of the expected risk. We also explain by means of a specific example why this problem cannot be detected using the results of classical asymptotic analysis often seen in the statistics literature.

Cite this Paper

BibTeX


@InProceedings{pmlr-v33-huang14,
  title = 	 {{A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models}},
  author = 	 {Huang, Ruitong and Szepesvari, Csaba},
  booktitle = 	 {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {402--410},
  year = 	 {2014},
  editor = 	 {Kaski, Samuel and Corander, Jukka},
  volume = 	 {33},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Reykjavik, Iceland},
  month = 	 {22--25 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v33/huang14.pdf},
  url = 	 {https://proceedings.mlr.press/v33/huang14.html},
  abstract = 	 {In this paper we provide generalization bounds for semiparametric regression with the so-called partially linear models where the regression function is written as the sum of a linear parametric and a nonlinear, nonparametric function, the latter taken from some set $\mathcal{H}$ with finite entropy-integral. The problem is technically challenging because the parametric part is unconstrained and the model is underdetermined, while the response is allowed to be unbounded with subgaussian tails. Under natural regularity conditions, we bound the generalization error as a function of the metric entropy of $\mathcal{H}$ and the dimension of the linear model. Our main tool is a ratio-type concentration inequality for increments of empirical processes, based on which we are able to give an exponential tail bound on the size of the parametric component. We also provide a comparison to alternatives of this technique and discuss why and when the unconstrained parametric part in the model may cause a problem in terms of the expected risk. We also explain by means of a specific example why this problem cannot be detected using the results of classical asymptotic analysis often seen in the statistics literature.}
}

Endnote

%0 Conference Paper
%T A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models
%A Ruitong Huang
%A Csaba Szepesvari
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-huang14
%I PMLR
%P 402--410
%U https://proceedings.mlr.press/v33/huang14.html
%V 33
%X In this paper we provide generalization bounds for semiparametric regression with the so-called partially linear models where the regression function is written as the sum of a linear parametric and a nonlinear, nonparametric function, the latter taken from some set $\mathcal{H}$ with finite entropy-integral. The problem is technically challenging because the parametric part is unconstrained and the model is underdetermined, while the response is allowed to be unbounded with subgaussian tails. Under natural regularity conditions, we bound the generalization error as a function of the metric entropy of $\mathcal{H}$ and the dimension of the linear model. Our main tool is a ratio-type concentration inequality for increments of empirical processes, based on which we are able to give an exponential tail bound on the size of the parametric component. We also provide a comparison to alternatives of this technique and discuss why and when the unconstrained parametric part in the model may cause a problem in terms of the expected risk. We also explain by means of a specific example why this problem cannot be detected using the results of classical asymptotic analysis often seen in the statistics literature.

RIS


TY  - CPAPER
TI  - A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models
AU  - Ruitong Huang
AU  - Csaba Szepesvari
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander
ID  - pmlr-v33-huang14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 402
EP  - 410
L1  - http://proceedings.mlr.press/v33/huang14.pdf
UR  - https://proceedings.mlr.press/v33/huang14.html
AB  - In this paper we provide generalization bounds for semiparametric regression with the so-called partially linear models where the regression function is written as the sum of a linear parametric and a nonlinear, nonparametric function, the latter taken from some set $\mathcal{H}$ with finite entropy-integral. The problem is technically challenging because the parametric part is unconstrained and the model is underdetermined, while the response is allowed to be unbounded with subgaussian tails. Under natural regularity conditions, we bound the generalization error as a function of the metric entropy of $\mathcal{H}$ and the dimension of the linear model. Our main tool is a ratio-type concentration inequality for increments of empirical processes, based on which we are able to give an exponential tail bound on the size of the parametric component. We also provide a comparison to alternatives of this technique and discuss why and when the unconstrained parametric part in the model may cause a problem in terms of the expected risk. We also explain by means of a specific example why this problem cannot be detected using the results of classical asymptotic analysis often seen in the statistics literature.
ER  -

APA


Huang, R. & Szepesvari, C. (2014). A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:402-410. Available from https://proceedings.mlr.press/v33/huang14.html.
