The Feature Selection Path in Kernel Methods
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:445-452, 2010.
The problem of automatic feature selection/weighting in kernel methods is examined. We work on a formulation that optimizes both the weights of features and the parameters of the kernel model simultaneously, using L_1 regularization for feature selection. Under quite general choices of kernels, we prove that there exists a unique regularization path for this problem, that runs from 0 to a stationary point of the non-regularized problem. We propose an ODE-based homotopy method to follow this trajectory. By following the path, our algorithm is able to automatically discard irrelevant features and to automatically go back and forth to avoid local optima. Experiments on synthetic and real datasets show that the method achieves low prediction error and is efficient in separating relevant from irrelevant features.