Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning

Samory Kpotufe
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1320-1328, 2017.

Abstract

Density-ratio estimation (i.e., estimating $f = f_Q/f_P$ for two unknown distributions $Q$ and $P$) has proved useful in many Machine Learning tasks, e.g., risk calibration in transfer learning and two-sample tests, and underlies common techniques such as importance sampling and bias correction. While there are many important analyses of this estimation problem, the present paper derives convergence rates in practical settings that are less well understood, namely, extensions of traditional Lipschitz smoothness conditions, and common high-dimensional settings with structured data (e.g., manifold data, sparse data). Several interesting facts known to hold in earlier settings are shown to extend to these settings: (1) optimal rates depend only on the smoothness of the ratio $f$, and not on the densities $f_Q$, $f_P$, supporting the belief that plugging in estimates of $f_Q$, $f_P$ is suboptimal; (2) optimal rates depend only on the intrinsic dimension of the data, i.e., this problem, unlike density estimation, escapes the curse of dimension. We further show that near-optimal rates are attainable by estimators tuned from data alone, i.e., with no prior distributional information. This last fact is of special interest in unsupervised settings such as this one, where only oracle rates seem to be known, i.e., rates which assume critical distributional information that is usually unavailable in practice.
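To make the object of study concrete, below is a minimal sketch of a k-nearest-neighbor density-ratio estimate. This is not the estimator analyzed in the paper; it only illustrates the quantity $f = f_Q/f_P$ and why one can estimate it without separately estimating the two densities: fixing the ball radius at the query point by the k-th nearest neighbor in the P-sample makes the unknown ball volume cancel from the ratio. The function name knn_density_ratio and the fixed choice of k are illustrative assumptions, not from the paper.

    import numpy as np

    def knn_density_ratio(x, P_sample, Q_sample, k=10):
        # Radius r: distance from the query x to its k-th nearest
        # neighbor among the P-sample.
        r = np.sort(np.linalg.norm(P_sample - x, axis=1))[k - 1]
        # Count Q-sample points falling in the same ball B(x, r).
        count_Q = np.sum(np.linalg.norm(Q_sample - x, axis=1) <= r)
        n_P, n_Q = len(P_sample), len(Q_sample)
        # f_P(x) ~ k / (n_P * vol(B)); f_Q(x) ~ count_Q / (n_Q * vol(B)).
        # The ball volume cancels in the ratio f_Q / f_P:
        return (count_Q / n_Q) / (k / n_P)

    # Illustration: P = N(0, 1), Q = N(1, 1), so the true ratio is
    # f(x) = exp(x - 1/2), which equals 1 at x = 0.5.
    rng = np.random.default_rng(0)
    P = rng.normal(0.0, 1.0, size=(2000, 1))
    Q = rng.normal(1.0, 1.0, size=(2000, 1))
    print(knn_density_ratio(np.array([0.5]), P, Q, k=50))  # roughly 1

The accuracy of such an estimate hinges on the choice of k; part of the paper's point is that near-optimal choices can be made from the two samples alone, with no prior distributional information.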

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-kpotufe17a,
  title     = {{Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning}},
  author    = {Kpotufe, Samory},
  booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics},
  pages     = {1320--1328},
  year      = {2017},
  editor    = {Singh, Aarti and Zhu, Jerry},
  volume    = {54},
  series    = {Proceedings of Machine Learning Research},
  month     = {20--22 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v54/kpotufe17a/kpotufe17a.pdf},
  url       = {https://proceedings.mlr.press/v54/kpotufe17a.html}
}
Endnote
%0 Conference Paper
%T Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning
%A Samory Kpotufe
%B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2017
%E Aarti Singh
%E Jerry Zhu
%F pmlr-v54-kpotufe17a
%I PMLR
%P 1320--1328
%U https://proceedings.mlr.press/v54/kpotufe17a.html
%V 54
APA
Kpotufe, S. (2017). Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 54:1320-1328. Available from https://proceedings.mlr.press/v54/kpotufe17a.html.