Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1320-1328, 2017.
Abstract
Density-ratio estimation (i.e. estimating $f = f_Q/f_P$ for two unknown distributions Q and P) has proved useful in many Machine Learning tasks, e.g., risk-calibration in transfer-learning and two-sample tests, and also in common techniques such as importance sampling and bias correction. While there are many important analyses of this estimation problem, the present paper derives convergence rates in other practical settings that are less understood, namely, extensions of traditional Lipschitz smoothness conditions, and common high-dimensional settings with structured data (e.g. manifold data, sparse data). Various interesting facts, which hold in earlier settings, are shown to extend to these settings. Namely, (1) optimal rates depend only on the smoothness of the ratio f, and not on the densities $f_Q$, $f_P$, supporting the belief that plugging in estimates for $f_Q$, $f_P$ is suboptimal; (2) optimal rates depend only on the intrinsic dimension of data, i.e. this problem – unlike density estimation – escapes the curse of dimension. We further show that near-optimal rates are attainable by estimators tuned from data alone, i.e. with no prior distributional information. This last fact is of special interest in unsupervised settings such as this one, where only oracle rates seem to be known, i.e., rates which assume critical distributional information usually unavailable in practice.
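To make the setting concrete, the following is a minimal sketch of the plug-in approach the abstract argues is suboptimal: estimate $f_Q$ and $f_P$ separately with kernel density estimates and take their ratio. All names, sample sizes, and the bandwidth choice here are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def gaussian_kde(samples, x, bandwidth):
    """Naive 1-D Gaussian kernel density estimate evaluated at points x."""
    diffs = (x[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (len(samples) * bandwidth)

rng = np.random.default_rng(0)
# Illustrative choice: P = N(0, 1), Q = N(0.5, 1), so the true ratio
# f_Q/f_P = exp(0.5 x - 0.125) is smooth even though both densities vary.
P_samples = rng.normal(0.0, 1.0, size=2000)
Q_samples = rng.normal(0.5, 1.0, size=2000)

x = np.linspace(-1.0, 1.0, 5)
bandwidth = 0.3  # a hand-picked value; data-driven tuning is the paper's subject
ratio = gaussian_kde(Q_samples, x, bandwidth) / gaussian_kde(P_samples, x, bandwidth)
```

The plug-in estimator's error is driven by the smoothness of both $f_Q$ and $f_P$, whereas the abstract's point (1) is that the attainable rate depends only on the smoothness of the ratio f itself, which is why estimating the ratio directly can do better.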