Differentiable Feature Selection by Discrete Relaxation
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1564-1572, 2020.
In this paper, we introduce Differentiable Feature Selection, a gradient-based search algorithm for feature selection. Our approach extends a recent result on the estimation of learnability in the sublinear data regime by showing that the calculation can be performed iteratively (i.e. in mini-batches) and in linear time and space with respect to both the number of features D and the sample size N. This, along with a discrete-to-continuous relaxation of the search domain, allows for an efficient, gradient-based search algorithm among feature subsets for very large datasets. Our algorithm utilizes higher-order correlations between features and targets for both the N>D and N