Using Causal Knowledge to Learn More Useful Decision Rules From Data
Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, PMLR R0:151-160, 1995.
One of the most popular and enduring paradigms at the intersection of machine learning and computational statistics is the use of recursive-partitioning or "tree-structured" methods to "learn" classification trees from data sets (Buntine, 1993; Quinlan, 1986). This approach applies to independent variables of all scale types (binary, categorical, ordered categorical, and continuous) and to noisy as well as noiseless training sets. It produces classification trees that can readily be reexpressed as sets of expert-system rules, with each conjunction of literals corresponding to a set of values for the variables along one branch through the tree. Each such rule produces a probability vector over the possible classes (or dependent-variable values) that the object being classified may have, thus automatically presenting confidence and uncertainty information about its conclusions. Classification trees can be validated by methods such as cross-validation (Breiman et al., 1984), and they can easily be modified to handle missing data by constructing rules that exploit only the information contained in the observed variables.
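The recursive-partitioning procedure described above can be sketched in a few dozen lines. The sketch below is illustrative only, not the specific algorithms of the works cited: it greedily splits on the categorical attribute that most reduces entropy, stores a class-probability vector at each leaf (the "confidence and uncertainty information" mentioned above), and all function names are the author's own choices.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the attribute index whose partition most reduces entropy,
    or None if no attribute improves on the unsplit entropy (a leaf)."""
    best_attr, best_score = None, entropy(labels)
    for attr in range(len(rows[0])):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        # weighted entropy of the children induced by this attribute
        score = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        if score < best_score - 1e-12:
            best_attr, best_score = attr, score
    return best_attr

def grow(rows, labels):
    """Recursively partition the training set; leaves hold
    class-probability vectors rather than hard class assignments."""
    attr = best_split(rows, labels)
    if attr is None:
        n = len(labels)
        return {cls: c / n for cls, c in Counter(labels).items()}
    children = {}
    for value in {r[attr] for r in rows}:
        sub = [(r, y) for r, y in zip(rows, labels) if r[attr] == value]
        children[value] = grow([r for r, _ in sub], [y for _, y in sub])
    return ("split", attr, children)

def classify(tree, row):
    """Follow one branch of the tree; the path taken corresponds to one
    conjunctive rule, and the result is a probability vector over classes."""
    while isinstance(tree, tuple):
        _, attr, children = tree
        tree = children[row[attr]]
    return tree

# Tiny hypothetical training set: two categorical attributes, binary class.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["no", "no", "yes", "yes"]
tree = grow(rows, labels)
print(classify(tree, ("sunny", "hot")))  # class-probability vector for one case
```

Each root-to-leaf path in the resulting tree reads directly as a rule of the form "if attribute 0 = sunny then P(no) = 1.0", which is the reexpression as expert-system rules noted in the text.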