Asymptotic behavior of \ell_p-based Laplacian regularization in semi-supervised learning

Ahmed El Alaoui; Xiang Cheng; Aaditya Ramdas; Martin J. Wainwright; Michael I. Jordan

Asymptotic behavior of \ell_p-based Laplacian regularization in semi-supervised learning

Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J. Wainwright, Michael I. Jordan

29th Annual Conference on Learning Theory, PMLR 49:879-906, 2016.

Abstract

Given a weighted graph with N vertices, consider a real-valued regression problem in a semi-supervised setting, where one observes n labeled vertices, and the task is to label the remaining ones. We present a theoretical study of \ell_p-based Laplacian regularization under a d-dimensional geometric random graph model. We provide a variational characterization of the performance of this regularized learner as N grows to infinity while n stays constant; the associated optimality conditions lead to a partial differential equation that must be satisfied by the associated function estimate \widehatf. From this formulation we derive several predictions on the limiting behavior the function \fhat, including (a) a phase transition in its smoothness at the threshold p = d + 1; and (b) a tradeoff between smoothness and sensitivity to the underlying unlabeled data distribution P. Thus, over the range p ≤d, the function estimate \widehatf is degenerate and “spiky,” whereas for p≥d+1, the function estimate \fhat is smooth. We show that the effect of the underlying density vanishes monotonically with p, such that in the limit p = ∞, corresponding to the so-called Absolutely Minimal Lipschitz Extension, the estimate \widehatf is independent of the distribution P. Under the assumption of semi-supervised smoothness, ignoring P can lead to poor statistical performance; in particular, we construct a specific example for d=1 to demonstrate that p=2 has lower risk than p=∞due to the former penalty adapting to P and the latter ignoring it. We also provide simulations that verify the accuracy of our predictions for finite sample sizes. Together, these properties show that p = d+1 is an optimal choice, yielding a function estimate \fhat that is both smooth and non-degenerate, while remaining maximally sensitive to P.

Cite this Paper

BibTeX


@InProceedings{pmlr-v49-elalaoui16,
  title = 	 {Asymptotic behavior of $\ell_p$-based {L}aplacian regularization in semi-supervised learning},
  author = 	 {El Alaoui, Ahmed and Cheng, Xiang and Ramdas, Aaditya and Wainwright, Martin J. and Jordan, Michael I.},
  booktitle = 	 {29th Annual Conference on Learning Theory},
  pages = 	 {879--906},
  year = 	 {2016},
  editor = 	 {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume = 	 {49},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Columbia University, New York, New York, USA},
  month = 	 {23--26 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v49/elalaoui16.pdf},
  url = 	 {https://proceedings.mlr.press/v49/elalaoui16.html},
  abstract = 	 {Given a weighted graph with N vertices, consider a real-valued regression problem in a semi-supervised setting, where one observes n labeled vertices, and the task is to label the remaining ones. We present a theoretical study of \ell_p-based Laplacian regularization under a d-dimensional geometric random graph model. We provide a variational characterization of the performance of this regularized learner as N grows to infinity while n stays constant; the associated optimality conditions lead to a partial differential equation that must be satisfied by the associated function estimate \widehatf.  From this formulation we derive several predictions on the limiting behavior the function \fhat, including (a) a phase transition in its smoothness at the threshold p = d + 1; and (b) a tradeoff between smoothness and sensitivity to the underlying unlabeled data distribution P.  Thus, over the range p ≤d, the function estimate \widehatf is degenerate and “spiky,” whereas for p≥d+1, the function estimate \fhat is smooth.  We show that the effect of the underlying density vanishes monotonically with p, such that in the limit p = ∞, corresponding to the so-called Absolutely Minimal Lipschitz Extension, the estimate \widehatf is independent of the distribution P. Under the assumption of semi-supervised smoothness, ignoring P can lead to poor statistical performance; in particular, we construct a specific example for d=1 to demonstrate that p=2 has lower risk than p=∞due to the former penalty adapting to P and the latter ignoring it.  We also provide simulations that verify the accuracy of our predictions for finite sample sizes.  Together, these properties show that p = d+1 is an optimal choice, yielding a function estimate \fhat that is both smooth and non-degenerate, while remaining maximally sensitive to P. }
}

Endnote

%0 Conference Paper
%T Asymptotic behavior of \ell_p-based Laplacian regularization in semi-supervised learning
%A Ahmed El Alaoui
%A Xiang Cheng
%A Aaditya Ramdas
%A Martin J. Wainwright
%A Michael I. Jordan
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir	
%F pmlr-v49-elalaoui16
%I PMLR
%P 879--906
%U https://proceedings.mlr.press/v49/elalaoui16.html
%V 49
%X Given a weighted graph with N vertices, consider a real-valued regression problem in a semi-supervised setting, where one observes n labeled vertices, and the task is to label the remaining ones. We present a theoretical study of \ell_p-based Laplacian regularization under a d-dimensional geometric random graph model. We provide a variational characterization of the performance of this regularized learner as N grows to infinity while n stays constant; the associated optimality conditions lead to a partial differential equation that must be satisfied by the associated function estimate \widehatf.  From this formulation we derive several predictions on the limiting behavior the function \fhat, including (a) a phase transition in its smoothness at the threshold p = d + 1; and (b) a tradeoff between smoothness and sensitivity to the underlying unlabeled data distribution P.  Thus, over the range p ≤d, the function estimate \widehatf is degenerate and “spiky,” whereas for p≥d+1, the function estimate \fhat is smooth.  We show that the effect of the underlying density vanishes monotonically with p, such that in the limit p = ∞, corresponding to the so-called Absolutely Minimal Lipschitz Extension, the estimate \widehatf is independent of the distribution P. Under the assumption of semi-supervised smoothness, ignoring P can lead to poor statistical performance; in particular, we construct a specific example for d=1 to demonstrate that p=2 has lower risk than p=∞due to the former penalty adapting to P and the latter ignoring it.  We also provide simulations that verify the accuracy of our predictions for finite sample sizes.  Together, these properties show that p = d+1 is an optimal choice, yielding a function estimate \fhat that is both smooth and non-degenerate, while remaining maximally sensitive to P.

RIS


TY  - CPAPER
TI  - Asymptotic behavior of \ell_p-based Laplacian regularization in semi-supervised learning
AU  - Ahmed El Alaoui
AU  - Xiang Cheng
AU  - Aaditya Ramdas
AU  - Martin J. Wainwright
AU  - Michael I. Jordan
BT  - 29th Annual Conference on Learning Theory
DA  - 2016/06/06
ED  - Vitaly Feldman
ED  - Alexander Rakhlin
ED  - Ohad Shamir	
ID  - pmlr-v49-elalaoui16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 49
SP  - 879
EP  - 906
L1  - http://proceedings.mlr.press/v49/elalaoui16.pdf
UR  - https://proceedings.mlr.press/v49/elalaoui16.html
AB  - Given a weighted graph with N vertices, consider a real-valued regression problem in a semi-supervised setting, where one observes n labeled vertices, and the task is to label the remaining ones. We present a theoretical study of \ell_p-based Laplacian regularization under a d-dimensional geometric random graph model. We provide a variational characterization of the performance of this regularized learner as N grows to infinity while n stays constant; the associated optimality conditions lead to a partial differential equation that must be satisfied by the associated function estimate \widehatf.  From this formulation we derive several predictions on the limiting behavior the function \fhat, including (a) a phase transition in its smoothness at the threshold p = d + 1; and (b) a tradeoff between smoothness and sensitivity to the underlying unlabeled data distribution P.  Thus, over the range p ≤d, the function estimate \widehatf is degenerate and “spiky,” whereas for p≥d+1, the function estimate \fhat is smooth.  We show that the effect of the underlying density vanishes monotonically with p, such that in the limit p = ∞, corresponding to the so-called Absolutely Minimal Lipschitz Extension, the estimate \widehatf is independent of the distribution P. Under the assumption of semi-supervised smoothness, ignoring P can lead to poor statistical performance; in particular, we construct a specific example for d=1 to demonstrate that p=2 has lower risk than p=∞due to the former penalty adapting to P and the latter ignoring it.  We also provide simulations that verify the accuracy of our predictions for finite sample sizes.  Together, these properties show that p = d+1 is an optimal choice, yielding a function estimate \fhat that is both smooth and non-degenerate, while remaining maximally sensitive to P. 
ER  -

APA


El Alaoui, A., Cheng, X., Ramdas, A., Wainwright, M.J. & Jordan, M.I.. (2016). Asymptotic behavior of \ell_p-based Laplacian regularization in semi-supervised learning. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:879-906 Available from https://proceedings.mlr.press/v49/elalaoui16.html.

Related Material

Download PDF