A Boo(n) for Evaluating Architecture Performance

Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:334-343, 2018.

Abstract

We point out important problems with the common practice of using the best single model performance for comparing deep learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one gets a different result due to random factors in the training process, which include random parameter initialization and random data shuffling. Reporting the best single model performance does not appropriately address this stochasticity. We propose a normalized expected best-out-of-$n$ performance ($\text{Boo}_n$) as a way to correct these problems.
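To make the statistic concrete, here is a minimal sketch (not the authors' reference code) of how the expected best-out-of-n performance can be estimated from m >= n independent training runs, using the standard order-statistics estimator for the expected maximum of n draws. The function name, variable names, and sample accuracies below are invented for illustration, and the normalization step described in the paper is omitted.

    # Estimate E[best single result out of n runs] from m >= n observed results.
    from math import comb

    def expected_best_of_n(results, n):
        """Unbiased order-statistics estimate of the expected max of n runs."""
        m = len(results)
        if m < n:
            raise ValueError("need at least n results")
        ordered = sorted(results)  # ascending: ordered[i] is the (i+1)-th smallest
        # The (i+1)-th smallest result is the maximum of an n-element subset
        # exactly when the other n-1 elements come from the i smaller results,
        # so it receives weight C(i, n-1) / C(m, n). (comb returns 0 for i < n-1.)
        return sum(comb(i, n - 1) * x for i, x in enumerate(ordered)) / comb(m, n)

    # Example: hypothetical validation accuracies from 10 runs of one architecture.
    runs = [0.712, 0.704, 0.721, 0.698, 0.715, 0.709, 0.717, 0.701, 0.724, 0.711]
    print(expected_best_of_n(runs, n=5))  # expected best-of-5 performance

Averaging over all n-element subsets in this way uses every run, so the estimate is far more stable than reporting the single best result, which is the flaw the paper targets.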

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-bajgar18a,
  title     = {A Boo(n) for Evaluating Architecture Performance},
  author    = {Bajgar, Ondrej and Kadlec, Rudolf and Kleindienst, Jan},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {334--343},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/bajgar18a/bajgar18a.pdf},
  url       = {https://proceedings.mlr.press/v80/bajgar18a.html},
  abstract  = {We point out important problems with the common practice of using the best single model performance for comparing deep learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one gets a different result due to random factors in the training process, which include random parameter initialization and random data shuffling. Reporting the best single model performance does not appropriately address this stochasticity. We propose a normalized expected best-out-of-$n$ performance ($\text{Boo}_n$) as a way to correct these problems.}
}
Endnote
%0 Conference Paper
%T A Boo(n) for Evaluating Architecture Performance
%A Ondrej Bajgar
%A Rudolf Kadlec
%A Jan Kleindienst
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-bajgar18a
%I PMLR
%P 334--343
%U https://proceedings.mlr.press/v80/bajgar18a.html
%V 80
%X We point out important problems with the common practice of using the best single model performance for comparing deep learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one gets a different result due to random factors in the training process, which include random parameter initialization and random data shuffling. Reporting the best single model performance does not appropriately address this stochasticity. We propose a normalized expected best-out-of-$n$ performance ($\text{Boo}_n$) as a way to correct these problems.
APA
Bajgar, O., Kadlec, R., & Kleindienst, J. (2018). A Boo(n) for Evaluating Architecture Performance. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:334-343. Available from https://proceedings.mlr.press/v80/bajgar18a.html.