TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning

Randal S. Olson; Jason H. Moore

Proceedings of the Workshop on Automatic Machine Learning, PMLR 64:66-74, 2016.

Abstract

As data science becomes more mainstream, there will be an ever-growing demand for data science tools that are more accessible, flexible, and scalable. In response to this demand, automated machine learning (autoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this paper we present TPOT, an open source genetic programming-based autoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task. We benchmark TPOT on a series of 150 supervised classification tasks and find that it significantly outperforms a basic machine learning analysis in 22 of them, while experiencing minimal degradation in accuracy on 5 of the benchmarks—all without any domain knowledge nor human input. As such, GP-based autoML systems show considerable promise in the autoML domain.

Cite this Paper

BibTeX


@InProceedings{pmlr-v64-olson_tpot_2016,
  title = 	 {TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning},
  author = 	 {Olson, Randal S. and Moore, Jason H.},
  booktitle = 	 {Proceedings of the Workshop on Automatic Machine Learning},
  pages = 	 {66--74},
  year = 	 {2016},
  editor = 	 {Hutter, Frank and Kotthoff, Lars and Vanschoren, Joaquin},
  volume = 	 {64},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v64/olson_tpot_2016.pdf},
  url = 	 {https://proceedings.mlr.press/v64/olson_tpot_2016.html},
  abstract = 	 {As data science becomes more mainstream, there will be an ever-growing demand for data science tools that are more accessible, flexible, and scalable. In response to this demand, automated machine learning (autoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this paper we present TPOT, an open source genetic programming-based autoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task. We benchmark TPOT on a series of 150 supervised classification tasks and find that it significantly outperforms a basic machine learning analysis in 22 of them, while experiencing minimal degradation in accuracy on 5 of the benchmarks—all without any domain knowledge nor human input. As such, GP-based autoML systems show considerable promise in the autoML domain.}
}

Endnote

%0 Conference Paper
%T TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning
%A Randal S. Olson
%A Jason H. Moore
%B Proceedings of the Workshop on Automatic Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Frank Hutter
%E Lars Kotthoff
%E Joaquin Vanschoren
%F pmlr-v64-olson_tpot_2016
%I PMLR
%P 66--74
%U https://proceedings.mlr.press/v64/olson_tpot_2016.html
%V 64
%X As data science becomes more mainstream, there will be an ever-growing demand for data science tools that are more accessible, flexible, and scalable. In response to this demand, automated machine learning (autoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this paper we present TPOT, an open source genetic programming-based autoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task. We benchmark TPOT on a series of 150 supervised classification tasks and find that it significantly outperforms a basic machine learning analysis in 22 of them, while experiencing minimal degradation in accuracy on 5 of the benchmarks—all without any domain knowledge nor human input. As such, GP-based autoML systems show considerable promise in the autoML domain.

RIS


TY  - CPAPER
TI  - TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning
AU  - Randal S. Olson
AU  - Jason H. Moore
BT  - Proceedings of the Workshop on Automatic Machine Learning
DA  - 2016/12/04
ED  - Frank Hutter
ED  - Lars Kotthoff
ED  - Joaquin Vanschoren
ID  - pmlr-v64-olson_tpot_2016
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 64
SP  - 66
EP  - 74
L1  - http://proceedings.mlr.press/v64/olson_tpot_2016.pdf
UR  - https://proceedings.mlr.press/v64/olson_tpot_2016.html
AB  - As data science becomes more mainstream, there will be an ever-growing demand for data science tools that are more accessible, flexible, and scalable. In response to this demand, automated machine learning (autoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this paper we present TPOT, an open source genetic programming-based autoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task. We benchmark TPOT on a series of 150 supervised classification tasks and find that it significantly outperforms a basic machine learning analysis in 22 of them, while experiencing minimal degradation in accuracy on 5 of the benchmarks—all without any domain knowledge nor human input. As such, GP-based autoML systems show considerable promise in the autoML domain.
ER  -

APA


Olson, R.S. & Moore, J.H. (2016). TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning. Proceedings of the Workshop on Automatic Machine Learning, in Proceedings of Machine Learning Research 64:66-74. Available from https://proceedings.mlr.press/v64/olson_tpot_2016.html.
