Causal & Non-Causal Feature Selection for Ridge Regression
Proceedings of the Workshop on the Causation and Prediction Challenge at WCCI 2008, PMLR 3:107-128, 2008.
In this paper we investigate the use of causal and non-causal feature selection methods for linear classifiers in situations where the causal relationships between the input and response variables may differ between the training and operational data. The causal feature selection methods investigated include inference of the Markov blanket and inference of direct causes and direct effects. The non-causal feature selection method is based on logistic regression with Bayesian regularisation using a Laplace prior. A simple ridge regression model is used as the base classifier, where the ridge parameter is efficiently tuned so as to minimise the leave-one-out error, via eigen-decomposition of the data covariance matrix. For tasks with more features than patterns, linear kernel ridge regression is used for computational efficiency. Results are presented for all of the WCCI-2008 Causation and Prediction Challenge datasets, demonstrating that, somewhat surprisingly, causal feature selection procedures do not provide significant benefits in terms of predictive accuracy over non-causal feature selection or classification using the entire feature set.
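The efficient leave-one-out tuning mentioned above rests on a standard closed-form identity for ridge regression: the leave-one-out residual for pattern i equals the ordinary residual divided by (1 - h_ii), where h_ii is the i-th leverage from the ridge hat matrix. A minimal sketch of this idea is shown below; the function name `loo_press` is hypothetical, the code assumes centred data with no intercept term, and it uses a thin SVD of the design matrix (which yields the eigen-decomposition of the covariance matrix X'X) so that the leave-one-out error can be re-evaluated cheaply for each candidate ridge parameter.

```python
import numpy as np

def loo_press(X, y, lambdas):
    """Leave-one-out PRESS statistic for ridge regression at each
    candidate regularisation parameter, via a single thin SVD.

    Assumes X and y are centred (no intercept is fitted)."""
    # Thin SVD: X = U @ diag(s) @ Vt, so X'X = V diag(s**2) V'
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    press = []
    for lam in lambdas:
        d = s**2 / (s**2 + lam)               # spectral shrinkage factors
        y_hat = U @ (d * Uty)                 # ridge fitted values
        h = np.einsum('ij,j,ij->i', U, d, U)  # leverages h_ii of the hat matrix
        e_loo = (y - y_hat) / (1.0 - h)       # exact LOO residuals
        press.append(np.sum(e_loo ** 2))
    return np.array(press)
```

Because the SVD is computed once, sweeping over a grid of ridge parameters costs only O(np) per candidate value rather than a full refit, which is what makes exhaustive leave-one-out model selection practical here.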