Post Selection Inference with Kernels
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:152-160, 2018.
Finding a set of statistically significant features from complex data (e.g., data with nonlinear relationships and/or multi-dimensional outputs) is important for scientific discovery and has a number of practical applications, including biomarker discovery. In this paper, we propose a kernel-based post-selection inference (PSI) algorithm that can find a set of statistically significant features from nonlinearly related data. Specifically, our PSI algorithm is based on independence measures, and we call it the Hilbert-Schmidt Independence Criterion (HSIC)-based PSI algorithm (hsicInf). The novelty of hsicInf is that it can handle nonlinearity and/or multivariate/multi-class outputs through kernels. Through synthetic experiments, we show that hsicInf can find a set of statistically significant features for both regression and classification problems. We also apply hsicInf to real-world datasets and show that it can successfully identify important features.
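For readers unfamiliar with HSIC, the following is a minimal sketch of the standard biased empirical HSIC estimator, trace(KHLH)/(n-1)^2, computed with Gaussian kernels. It only illustrates the independence measure that hsicInf builds on; it is not the hsicInf selection or inference procedure itself. The function names (`gaussian_gram`, `hsic_biased`), the fixed bandwidths, and the toy data are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for samples in the rows of x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples of one feature
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic_biased(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    The bandwidths are fixed here for simplicity (an assumption);
    in practice they are often set by a heuristic such as the median distance.
    """
    n = len(x)
    K = gaussian_gram(x, sigma_x)
    L = gaussian_gram(y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Toy data (assumed): y depends nonlinearly on x_dep and not at all on x_ind.
rng = np.random.default_rng(0)
n = 200
x_dep = rng.normal(size=n)
x_ind = rng.normal(size=n)
y = x_dep ** 2 + 0.1 * rng.normal(size=n)

hsic_dep = hsic_biased(x_dep, y)
hsic_ind = hsic_biased(x_ind, y)
print(f"HSIC(x_dep, y) = {hsic_dep:.4f}")
print(f"HSIC(x_ind, y) = {hsic_ind:.4f}")
```

In this toy setting the Pearson correlation between `x_dep` and `y` is close to zero because the relationship is quadratic, whereas the kernel-based measure still separates the dependent feature from the independent one. This is the kind of nonlinear dependence that motivates using kernels for feature selection.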