Maximum Mean Discrepancy for Class Ratio Estimation: Convergence Bounds and Kernel Selection

Arun Iyer; Saketha Nath; Sunita Sarawagi

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):530-538, 2014.

Abstract

In recent times, many real-world applications have emerged that require estimates of class ratios in an unlabeled instance collection as opposed to labels of individual instances in the collection. In this paper we investigate the use of maximum mean discrepancy (MMD) in a reproducing kernel Hilbert space (RKHS) for estimating such ratios. First, we theoretically analyze the MMD-based estimates. Our analysis establishes that, under some mild conditions, the estimate is statistically consistent. More importantly, it provides an upper bound on the error in the estimate in terms of intuitive geometric quantities like class separation and data spread. Next, we use the insights obtained from the theoretical analysis to propose a novel convex formulation that automatically learns the kernel to be employed in the MMD-based estimation. We design an efficient cutting plane algorithm for solving this formulation. Finally, we empirically compare our estimator with several existing methods, and show significantly improved performance under varying datasets, class ratios, and training sizes.
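As a rough illustration of the idea the abstract describes, here is a minimal sketch of MMD-style class ratio estimation for the two-class case. It is not the paper's formulation: the Gaussian kernel with a fixed bandwidth `gamma`, the closed-form two-class solution, and all function names (`rbf`, `mean_kernel`, `estimate_positive_ratio`) are assumptions made for this example, and the kernel-learning step is left out entirely. The estimate is the mixing weight theta in [0, 1] for which theta * mean(positive) + (1 - theta) * mean(negative) lies closest, in the RKHS, to the mean embedding of the unlabeled sample.

```python
import math


def rbf(x, y, gamma):
    """Gaussian kernel exp(-gamma * ||x - y||^2) on two equal-length sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_kernel(xs, ys, gamma):
    """Inner product of the empirical kernel mean embeddings of xs and ys."""
    total = sum(rbf(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def estimate_positive_ratio(x_neg, x_pos, x_unlabeled, gamma=0.5):
    """Estimate the fraction of positives in x_unlabeled (two-class sketch).

    Minimises || theta*mu_pos + (1-theta)*mu_neg - mu_unl ||^2 over theta,
    which has the closed form <mu_unl - mu_neg, d> / <d, d> with
    d = mu_pos - mu_neg, then clips the result to [0, 1].
    """
    k_nn = mean_kernel(x_neg, x_neg, gamma)
    k_np = mean_kernel(x_neg, x_pos, gamma)
    k_pp = mean_kernel(x_pos, x_pos, gamma)
    k_up = mean_kernel(x_unlabeled, x_pos, gamma)
    k_un = mean_kernel(x_unlabeled, x_neg, gamma)

    numerator = k_up - k_un - k_np + k_nn      # <mu_unl - mu_neg, d>
    denominator = k_pp - 2.0 * k_np + k_nn     # <d, d>, squared class separation
    if denominator <= 1e-12:
        raise ValueError("class mean embeddings coincide; ratio is not identifiable")
    return min(1.0, max(0.0, numerator / denominator))
```

The denominator is the squared RKHS distance between the two class means, so the sketch fails explicitly when the classes are not separated under the chosen kernel. This is in the spirit of the abstract's remark that the error depends on class separation, but the paper's actual bound is not reproduced here.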

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-iyer14,
  title = 	 {Maximum Mean Discrepancy for Class Ratio Estimation: Convergence Bounds and Kernel Selection},
  author = 	 {Iyer, Arun and Nath, Saketha and Sarawagi, Sunita},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {530--538},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/iyer14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/iyer14.html},
  abstract = 	 {In recent times, many real world applications have emerged that require estimates of class ratios in an unlabeled instance collection as opposed to labels of individual instances in the collection.  In this paper we investigate the use of maximum mean discrepancy (MMD) in a reproducing kernel Hilbert space (RKHS) for estimating such ratios. First, we theoretically analyze the MMD-based estimates. Our analysis establishes that, under some mild conditions, the estimate is statistically consistent. More importantly, it provides an upper bound on the error in the estimate in terms of intuitive geometric quantities like class separation and data spread. Next, we use the insights obtained from the theoretical analysis, to propose a novel convex formulation that automatically learns the kernel to be employed in the MMD-based estimation. We design an efficient cutting plane algorithm for solving this formulation.  Finally, we empirically compare our estimator with several existing methods, and show significantly improved performance under varying datasets, class ratios, and training sizes.}
}

Endnote

%0 Conference Paper
%T Maximum Mean Discrepancy for Class Ratio Estimation: Convergence Bounds and Kernel Selection
%A Arun Iyer
%A Saketha Nath
%A Sunita Sarawagi
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-iyer14
%I PMLR
%P 530--538
%U https://proceedings.mlr.press/v32/iyer14.html
%V 32
%N 1
%X In recent times, many real world applications have emerged that require estimates of class ratios in an unlabeled instance collection as opposed to labels of individual instances in the collection.  In this paper we investigate the use of maximum mean discrepancy (MMD) in a reproducing kernel Hilbert space (RKHS) for estimating such ratios. First, we theoretically analyze the MMD-based estimates. Our analysis establishes that, under some mild conditions, the estimate is statistically consistent. More importantly, it provides an upper bound on the error in the estimate in terms of intuitive geometric quantities like class separation and data spread. Next, we use the insights obtained from the theoretical analysis, to propose a novel convex formulation that automatically learns the kernel to be employed in the MMD-based estimation. We design an efficient cutting plane algorithm for solving this formulation.  Finally, we empirically compare our estimator with several existing methods, and show significantly improved performance under varying datasets, class ratios, and training sizes.

RIS


TY  - CPAPER
TI  - Maximum Mean Discrepancy for Class Ratio Estimation: Convergence Bounds and Kernel Selection
AU  - Arun Iyer
AU  - Saketha Nath
AU  - Sunita Sarawagi
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-iyer14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 530
EP  - 538
L1  - http://proceedings.mlr.press/v32/iyer14.pdf
UR  - https://proceedings.mlr.press/v32/iyer14.html
AB  - In recent times, many real world applications have emerged that require estimates of class ratios in an unlabeled instance collection as opposed to labels of individual instances in the collection.  In this paper we investigate the use of maximum mean discrepancy (MMD) in a reproducing kernel Hilbert space (RKHS) for estimating such ratios. First, we theoretically analyze the MMD-based estimates. Our analysis establishes that, under some mild conditions, the estimate is statistically consistent. More importantly, it provides an upper bound on the error in the estimate in terms of intuitive geometric quantities like class separation and data spread. Next, we use the insights obtained from the theoretical analysis, to propose a novel convex formulation that automatically learns the kernel to be employed in the MMD-based estimation. We design an efficient cutting plane algorithm for solving this formulation.  Finally, we empirically compare our estimator with several existing methods, and show significantly improved performance under varying datasets, class ratios, and training sizes.
ER  -

APA


Iyer, A., Nath, S. & Sarawagi, S. (2014). Maximum Mean Discrepancy for Class Ratio Estimation: Convergence Bounds and Kernel Selection. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):530-538. Available from https://proceedings.mlr.press/v32/iyer14.html.
