On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection

Pratik Jawanpuria; Manik Varma; Saketha Nath

On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection

Pratik Jawanpuria, Manik Varma, Saketha Nath

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):118-126, 2014.

Abstract

Our objective is to develop formulations and algorithms for efficiently computing the feature selection path – i.e. the variation in classification accuracy as the fraction of selected features is varied from null to unity. Multiple Kernel Learning subject to l_p\geq1 regularization (l_p-MKL) has been demonstrated to be one of the most effective techniques for non-linear feature selection. However, state-of-the-art l_p-MKL algorithms are too computationally expensive to be invoked thousands of times to determine the entire path. We propose a novel conjecture which states that, for certain l_p-MKL formulations, the number of features selected in the optimal solution monotonically decreases as p is decreased from an initial value to unity. We prove the conjecture, for a generic family of kernel target alignment based formulations, and show that the feature weights themselves decay (grow) monotonically once they are below (above) a certain threshold at optimality. This allows us to develop a path following algorithm that systematically generates optimal feature sets of decreasing size. The proposed algorithm sets certain feature weights directly to zero for potentially large intervals of p thereby reducing optimization costs while simultaneously providing approximation guarantees. We empirically demonstrate that our formulation can lead to classification accuracies which are as much as 10% higher on benchmark data sets not only as compared to other l_p-MKL formulations and uniform kernel baselines but also leading feature selection methods. We further demonstrate that our algorithm reduces training time significantly over other path following algorithms and state-of-the-art l_p-MKL optimizers such as SMO-MKL. In particular, we generate the entire feature selection path for data sets with a hundred thousand features in approximately half an hour on standard hardware.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-jawanpuria14,
  title = 	 {On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection},
  author = 	 {Jawanpuria, Pratik and Varma, Manik and Nath, Saketha},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {118--126},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/jawanpuria14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/jawanpuria14.html},
  abstract = 	 {Our objective is to develop formulations and algorithms for efficiently computing the feature selection path – i.e. the variation in classification accuracy as the fraction of selected features is varied from null to unity. Multiple Kernel Learning subject to l_p\geq1 regularization (l_p-MKL) has been demonstrated to be one of the most effective techniques for non-linear feature selection. However, state-of-the-art l_p-MKL algorithms are too computationally expensive to be invoked thousands of times to determine the entire path.    We propose a novel conjecture which states that, for certain l_p-MKL formulations, the number of features selected in the optimal solution monotonically decreases as p is decreased from an initial value to unity. We prove the conjecture, for a generic family of kernel target alignment based formulations, and show that the feature weights themselves decay (grow) monotonically once they are below (above) a certain threshold at optimality. This allows us to develop a path following algorithm that systematically generates optimal feature sets of decreasing size. The proposed algorithm sets certain feature weights directly to zero for potentially large intervals of p thereby reducing optimization costs while simultaneously providing approximation guarantees.    We empirically demonstrate that our formulation can lead to classification accuracies which are as much as 10% higher on benchmark data sets not only as compared to other l_p-MKL formulations and uniform kernel baselines but also leading feature selection methods. We further demonstrate that our algorithm reduces training time significantly over other path following algorithms and state-of-the-art l_p-MKL optimizers such as SMO-MKL. In particular, we generate the entire feature selection path for data sets with a hundred thousand features in approximately half an hour on standard hardware.}
}

Endnote

%0 Conference Paper
%T On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection
%A Pratik Jawanpuria
%A Manik Varma
%A Saketha Nath
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-jawanpuria14
%I PMLR
%P 118--126
%U https://proceedings.mlr.press/v32/jawanpuria14.html
%V 32
%N 2
%X Our objective is to develop formulations and algorithms for efficiently computing the feature selection path – i.e. the variation in classification accuracy as the fraction of selected features is varied from null to unity. Multiple Kernel Learning subject to l_p\geq1 regularization (l_p-MKL) has been demonstrated to be one of the most effective techniques for non-linear feature selection. However, state-of-the-art l_p-MKL algorithms are too computationally expensive to be invoked thousands of times to determine the entire path.    We propose a novel conjecture which states that, for certain l_p-MKL formulations, the number of features selected in the optimal solution monotonically decreases as p is decreased from an initial value to unity. We prove the conjecture, for a generic family of kernel target alignment based formulations, and show that the feature weights themselves decay (grow) monotonically once they are below (above) a certain threshold at optimality. This allows us to develop a path following algorithm that systematically generates optimal feature sets of decreasing size. The proposed algorithm sets certain feature weights directly to zero for potentially large intervals of p thereby reducing optimization costs while simultaneously providing approximation guarantees.    We empirically demonstrate that our formulation can lead to classification accuracies which are as much as 10% higher on benchmark data sets not only as compared to other l_p-MKL formulations and uniform kernel baselines but also leading feature selection methods. We further demonstrate that our algorithm reduces training time significantly over other path following algorithms and state-of-the-art l_p-MKL optimizers such as SMO-MKL. In particular, we generate the entire feature selection path for data sets with a hundred thousand features in approximately half an hour on standard hardware.

RIS


TY  - CPAPER
TI  - On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection
AU  - Pratik Jawanpuria
AU  - Manik Varma
AU  - Saketha Nath
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-jawanpuria14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 118
EP  - 126
L1  - http://proceedings.mlr.press/v32/jawanpuria14.pdf
UR  - https://proceedings.mlr.press/v32/jawanpuria14.html
AB  - Our objective is to develop formulations and algorithms for efficiently computing the feature selection path – i.e. the variation in classification accuracy as the fraction of selected features is varied from null to unity. Multiple Kernel Learning subject to l_p\geq1 regularization (l_p-MKL) has been demonstrated to be one of the most effective techniques for non-linear feature selection. However, state-of-the-art l_p-MKL algorithms are too computationally expensive to be invoked thousands of times to determine the entire path.    We propose a novel conjecture which states that, for certain l_p-MKL formulations, the number of features selected in the optimal solution monotonically decreases as p is decreased from an initial value to unity. We prove the conjecture, for a generic family of kernel target alignment based formulations, and show that the feature weights themselves decay (grow) monotonically once they are below (above) a certain threshold at optimality. This allows us to develop a path following algorithm that systematically generates optimal feature sets of decreasing size. The proposed algorithm sets certain feature weights directly to zero for potentially large intervals of p thereby reducing optimization costs while simultaneously providing approximation guarantees.    We empirically demonstrate that our formulation can lead to classification accuracies which are as much as 10% higher on benchmark data sets not only as compared to other l_p-MKL formulations and uniform kernel baselines but also leading feature selection methods. We further demonstrate that our algorithm reduces training time significantly over other path following algorithms and state-of-the-art l_p-MKL optimizers such as SMO-MKL. In particular, we generate the entire feature selection path for data sets with a hundred thousand features in approximately half an hour on standard hardware.
ER  -

APA


Jawanpuria, P., Varma, M. & Nath, S.. (2014). On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):118-126 Available from https://proceedings.mlr.press/v32/jawanpuria14.html.

On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection

Abstract

Cite this Paper

Related Material