Consistency versus Realizable H-Consistency for Multiclass Classification

Phil Long, Rocco Servedio
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):801-809, 2013.

Abstract

A consistent loss function for multiclass classification is one such that for any source of labeled examples, any tuple of scoring functions that minimizes the expected loss will have classification accuracy close to that of the Bayes optimal classifier. While consistency has been proposed as a desirable property for multiclass loss functions, we give experimental and theoretical results exhibiting a sequence of linearly separable data sources with the following property: a multiclass classification algorithm which optimizes a loss function due to Crammer and Singer (which is known not to be consistent) produces classifiers whose expected error goes to 0, while the expected error of an algorithm which optimizes a generalization of the loss function used by LogitBoost (a loss function which is known to be consistent) is bounded below by a positive constant. We identify a property of a loss function, realizable consistency with respect to a restricted class of scoring functions, that accounts for this difference. As our main technical results we show that the Crammer–Singer loss function is realizable consistent for the class of linear scoring functions, while the generalization of LogitBoost is not. Our result for LogitBoost is a special case of a more general theorem that applies to several other loss functions that have been proposed for multiclass classification.
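In symbols, as a sketch only (our notation, paraphrasing the abstract rather than quoting the paper's definitions): with $k$ classes and a tuple of scoring functions $f = (f_1, \dots, f_k)$, the Crammer–Singer loss on a labeled example $(x, y)$ is the multiclass hinge loss, and one common form of the LogitBoost generalization is the multiclass logistic loss:

$$\ell_{\mathrm{CS}}(f, x, y) = \max\Bigl(0,\; 1 + \max_{y' \neq y} f_{y'}(x) - f_y(x)\Bigr), \qquad \ell_{\mathrm{log}}(f, x, y) = \log\Bigl(1 + \sum_{y' \neq y} e^{f_{y'}(x) - f_y(x)}\Bigr).$$

Realizable consistency with respect to a class $H$ then demands less than full consistency: only for sources that some $f \in H$ classifies perfectly must every minimizer of the expected loss over $H$ drive the classification error to 0.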

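For readers who prefer running code, the following is a minimal sketch in Python/NumPy of the two losses written above. The linear scorer, weights, and input point are purely illustrative (hypothetical, not from the paper's experiments):

import numpy as np

def crammer_singer_loss(scores, y):
    """Multiclass hinge loss of Crammer and Singer:
    max(0, 1 + max_{y' != y} f_{y'}(x) - f_y(x))."""
    best_rival = np.max(np.delete(scores, y))  # strongest incorrect class
    return max(0.0, 1.0 + best_rival - scores[y])

def multiclass_logistic_loss(scores, y):
    """One common form of the multiclass logistic ("LogitBoost-style") loss:
    log(1 + sum_{y' != y} exp(f_{y'}(x) - f_y(x)))."""
    shifted = scores - np.max(scores)  # stabilize the exponentials
    return float(np.log(np.sum(np.exp(shifted))) - shifted[y])

# Hypothetical linear scoring functions f_j(x) = w_j . x on a toy point.
W = np.array([[2.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
x = np.array([1.0, -0.5])
y = 0  # true label
scores = W @ x  # [2.0, -0.5, -0.5]: class 0 wins by a wide margin
print(crammer_singer_loss(scores, y))       # 0.0 (margin exceeds 1)
print(multiclass_logistic_loss(scores, y))  # ~0.152, positive even when correct

The contrast in the two printed values loosely illustrates the mechanism at issue: the hinge loss is exactly 0 once a point is classified with margin at least 1, while the logistic loss stays positive on correctly classified points, so minimizing it over a restricted class can trade classification mistakes for smaller loss elsewhere.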
Cite this Paper

BibTeX
@InProceedings{pmlr-v28-long13,
  title     = {Consistency versus Realizable H-Consistency for Multiclass Classification},
  author    = {Long, Phil and Servedio, Rocco},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {801--809},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {3},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/long13.pdf},
  url       = {https://proceedings.mlr.press/v28/long13.html},
  abstract  = {A consistent loss function for multiclass classification is one such that for any source of labeled examples, any tuple of scoring functions that minimizes the expected loss will have classification accuracy close to that of the Bayes optimal classifier. While consistency has been proposed as a desirable property for multiclass loss functions, we give experimental and theoretical results exhibiting a sequence of linearly separable data sources with the following property: a multiclass classification algorithm which optimizes a loss function due to Crammer and Singer (which is known not to be consistent) produces classifiers whose expected error goes to 0, while the expected error of an algorithm which optimizes a generalization of the loss function used by LogitBoost (a loss function which is known to be consistent) is bounded below by a positive constant. We identify a property of a loss function, realizable consistency with respect to a restricted class of scoring functions, that accounts for this difference. As our main technical results we show that the Crammer–Singer loss function is realizable consistent for the class of linear scoring functions, while the generalization of LogitBoost is not. Our result for LogitBoost is a special case of a more general theorem that applies to several other loss functions that have been proposed for multiclass classification.}
}
Endnote
%0 Conference Paper
%T Consistency versus Realizable H-Consistency for Multiclass Classification
%A Phil Long
%A Rocco Servedio
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-long13
%I PMLR
%P 801--809
%U https://proceedings.mlr.press/v28/long13.html
%V 28
%N 3
%X A consistent loss function for multiclass classification is one such that for any source of labeled examples, any tuple of scoring functions that minimizes the expected loss will have classification accuracy close to that of the Bayes optimal classifier. While consistency has been proposed as a desirable property for multiclass loss functions, we give experimental and theoretical results exhibiting a sequence of linearly separable data sources with the following property: a multiclass classification algorithm which optimizes a loss function due to Crammer and Singer (which is known not to be consistent) produces classifiers whose expected error goes to 0, while the expected error of an algorithm which optimizes a generalization of the loss function used by LogitBoost (a loss function which is known to be consistent) is bounded below by a positive constant. We identify a property of a loss function, realizable consistency with respect to a restricted class of scoring functions, that accounts for this difference. As our main technical results we show that the Crammer–Singer loss function is realizable consistent for the class of linear scoring functions, while the generalization of LogitBoost is not. Our result for LogitBoost is a special case of a more general theorem that applies to several other loss functions that have been proposed for multiclass classification.
RIS
TY  - CPAPER
TI  - Consistency versus Realizable H-Consistency for Multiclass Classification
AU  - Phil Long
AU  - Rocco Servedio
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-long13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 801
EP  - 809
L1  - http://proceedings.mlr.press/v28/long13.pdf
UR  - https://proceedings.mlr.press/v28/long13.html
AB  - A consistent loss function for multiclass classification is one such that for any source of labeled examples, any tuple of scoring functions that minimizes the expected loss will have classification accuracy close to that of the Bayes optimal classifier. While consistency has been proposed as a desirable property for multiclass loss functions, we give experimental and theoretical results exhibiting a sequence of linearly separable data sources with the following property: a multiclass classification algorithm which optimizes a loss function due to Crammer and Singer (which is known not to be consistent) produces classifiers whose expected error goes to 0, while the expected error of an algorithm which optimizes a generalization of the loss function used by LogitBoost (a loss function which is known to be consistent) is bounded below by a positive constant. We identify a property of a loss function, realizable consistency with respect to a restricted class of scoring functions, that accounts for this difference. As our main technical results we show that the Crammer–Singer loss function is realizable consistent for the class of linear scoring functions, while the generalization of LogitBoost is not. Our result for LogitBoost is a special case of a more general theorem that applies to several other loss functions that have been proposed for multiclass classification.
ER  -
APA
Long, P. & Servedio, R. (2013). Consistency versus Realizable H-Consistency for Multiclass Classification. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):801-809. Available from https://proceedings.mlr.press/v28/long13.html.
