Interpretable Random Forests via Rule Extraction

Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:937-945, 2021.

Abstract

We introduce SIRUS (Stable and Interpretable RUle Set) for regression, a stable rule learning algorithm whose output takes the form of a short and simple list of rules. State-of-the-art learning algorithms are often referred to as “black boxes” because of the high number of operations involved in their prediction process. Despite their powerful predictive performance, this lack of interpretability may be highly restrictive for applications with critical decisions at stake. On the other hand, algorithms with a simple structure (typically decision trees, rule algorithms, or sparse linear models) are well known for their instability. This undesirable feature makes the conclusions of the data analysis unreliable and turns out to be a strong operational limitation. This motivates the design of SIRUS, based on random forests, which combines a simple structure, remarkably stable behavior when the data are perturbed, and accuracy comparable to that of its competitors. We demonstrate the efficiency of the method both empirically (through experiments) and theoretically (with a proof of its asymptotic stability). An R/C++ software implementation, sirus, is available from CRAN.
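As a quick illustration of how the method is used in practice, the sketch below fits SIRUS on a standard regression dataset with the sirus R package mentioned above. The function names (sirus.fit, sirus.print, sirus.predict) and the num.rule argument reflect the CRAN package as we understand it and may differ between versions; treat this as an assumed usage sketch, not a definitive reference for the package API.

    # Assumed usage of the CRAN 'sirus' package (function and argument
    # names may vary across versions).
    # install.packages("sirus")
    library(sirus)

    # Regression example: predict fuel consumption (mpg) from the other
    # numeric variables in the built-in mtcars dataset.
    y <- mtcars$mpg
    x <- mtcars[, setdiff(names(mtcars), "mpg")]

    # Fit SIRUS; num.rule caps the length of the extracted rule list.
    model <- sirus.fit(x, y, num.rule = 10)

    # Display the short, human-readable list of rules.
    sirus.print(model)

    # Predict with the rule set (here on the training data, for illustration).
    pred <- sirus.predict(model, x)
    head(pred)

The printed output is the short rule list the abstract refers to: each rule is a simple if-then statement built from at most a couple of splits extracted from the forest, and the prediction aggregates the rule outputs.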

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-benard21a,
  title     = {Interpretable Random Forests via Rule Extraction},
  author    = {B{\'e}nard, Cl{\'e}ment and Biau, G{\'e}rard and da Veiga, S{\'e}bastien and Scornet, Erwan},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {937--945},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/benard21a/benard21a.pdf},
  url       = {https://proceedings.mlr.press/v130/benard21a.html},
  abstract  = {We introduce SIRUS (Stable and Interpretable RUle Set) for regression, a stable rule learning algorithm, which takes the form of a short and simple list of rules. State-of-the-art learning algorithms are often referred to as “black boxes” because of the high number of operations involved in their prediction process. Despite their powerful predictivity, this lack of interpretability may be highly restrictive for applications with critical decisions at stake. On the other hand, algorithms with a simple structure—typically decision trees, rule algorithms, or sparse linear models—are well known for their instability. This undesirable feature makes the conclusions of the data analysis unreliable and turns out to be a strong operational limitation. This motivates the design of SIRUS, based on random forests, which combines a simple structure, a remarkable stable behavior when data is perturbed, and an accuracy comparable to its competitors. We demonstrate the efficiency of the method both empirically (through experiments) and theoretically (with the proof of its asymptotic stability). A R/C++ software implementation sirus is available from CRAN.}
}
Endnote
%0 Conference Paper
%T Interpretable Random Forests via Rule Extraction
%A Clément Bénard
%A Gérard Biau
%A Sébastien da Veiga
%A Erwan Scornet
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-benard21a
%I PMLR
%P 937--945
%U https://proceedings.mlr.press/v130/benard21a.html
%V 130
%X We introduce SIRUS (Stable and Interpretable RUle Set) for regression, a stable rule learning algorithm, which takes the form of a short and simple list of rules. State-of-the-art learning algorithms are often referred to as “black boxes” because of the high number of operations involved in their prediction process. Despite their powerful predictivity, this lack of interpretability may be highly restrictive for applications with critical decisions at stake. On the other hand, algorithms with a simple structure—typically decision trees, rule algorithms, or sparse linear models—are well known for their instability. This undesirable feature makes the conclusions of the data analysis unreliable and turns out to be a strong operational limitation. This motivates the design of SIRUS, based on random forests, which combines a simple structure, a remarkable stable behavior when data is perturbed, and an accuracy comparable to its competitors. We demonstrate the efficiency of the method both empirically (through experiments) and theoretically (with the proof of its asymptotic stability). A R/C++ software implementation sirus is available from CRAN.
APA
Bénard, C., Biau, G., da Veiga, S. & Scornet, E. (2021). Interpretable Random Forests via Rule Extraction. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:937-945. Available from https://proceedings.mlr.press/v130/benard21a.html.

Related Material

Download PDF: http://proceedings.mlr.press/v130/benard21a/benard21a.pdf