Mixture Proportion Estimation via Kernel Embeddings of Distributions

Harish Ramaswamy; Clayton Scott; Ambuj Tewari

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2052-2060, 2016.

Abstract

Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm on several standard classification datasets, and demonstrate that it performs comparably to or better than other algorithms on most datasets.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-ramaswamy16,
  title = 	 {Mixture Proportion Estimation via Kernel Embeddings of Distributions},
  author = 	 {Ramaswamy, Harish and Scott, Clayton and Tewari, Ambuj},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {2052--2060},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/ramaswamy16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/ramaswamy16.html},
  abstract = 	 {Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm on several standard classification datasets, and demonstrate that it performs comparably to or better than other algorithms on most datasets.}
}

Endnote

%0 Conference Paper
%T Mixture Proportion Estimation via Kernel Embeddings of Distributions
%A Harish Ramaswamy
%A Clayton Scott
%A Ambuj Tewari
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-ramaswamy16
%I PMLR
%P 2052--2060
%U https://proceedings.mlr.press/v48/ramaswamy16.html
%V 48
%X Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm on several standard classification datasets, and demonstrate that it performs comparably to or better than other algorithms on most datasets.

RIS


TY  - CPAPER
TI  - Mixture Proportion Estimation via Kernel Embeddings of Distributions
AU  - Harish Ramaswamy
AU  - Clayton Scott
AU  - Ambuj Tewari
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-ramaswamy16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 2052
EP  - 2060
L1  - http://proceedings.mlr.press/v48/ramaswamy16.pdf
UR  - https://proceedings.mlr.press/v48/ramaswamy16.html
AB  - Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm on several standard classification datasets, and demonstrate that it performs comparably to or better than other algorithms on most datasets.
ER  -

APA


Ramaswamy, H., Scott, C. & Tewari, A. (2016). Mixture Proportion Estimation via Kernel Embeddings of Distributions. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2052-2060. Available from https://proceedings.mlr.press/v48/ramaswamy16.html.
