Surrogate regret bounds for generalized classification performance metrics
Asian Conference on Machine Learning, PMLR 45:301-316, 2016.
Abstract
We consider optimization of generalized performance metrics for binary classification by means of surrogate losses. We focus on a class of metrics which are linear-fractional functions of the false positive and false negative rates (examples include the $F_\beta$-measure, the Jaccard similarity coefficient, the AM measure, and many others). Our analysis concerns the following two-step procedure. First, a real-valued function $f$ is learned by minimizing a surrogate loss for binary classification on the training sample. It is assumed that the surrogate loss is a strongly proper composite loss function (examples include logistic loss, squared-error loss, exponential loss, etc.). Then, given $f$, a threshold $\hat{\theta}$ is tuned on a separate validation sample by direct optimization of the target performance metric. We show that the regret of the resulting classifier (obtained by thresholding $f$ at $\hat{\theta}$), measured with respect to the target metric, is upper-bounded by the regret of $f$ measured with respect to the surrogate loss. Our finding is further analyzed in a computational study on both synthetic and real data sets.
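
To make the metric class concrete, a linear-fractional performance metric can be written in the general form

$$\Psi(\mathrm{FP}, \mathrm{FN}) = \frac{a_0 + a_1\,\mathrm{FP} + a_2\,\mathrm{FN}}{b_0 + b_1\,\mathrm{FP} + b_2\,\mathrm{FN}},$$

where $\mathrm{FP}$ and $\mathrm{FN}$ denote the false positive and false negative rates. For instance, with positive class prior $\pi$, the $F_1$-measure takes this form as $F_1 = \frac{2(\pi - \mathrm{FN})}{2\pi - \mathrm{FN} + \mathrm{FP}}$. (The coefficient notation $a_i$, $b_i$ is illustrative, not taken from the paper.)

The following Python sketch illustrates the two-step procedure described in the abstract, assuming scikit-learn's LogisticRegression as the surrogate-loss minimizer (logistic loss is a strongly proper composite loss) and the $F_\beta$-measure as the target metric. The function name two_step_classifier and the exhaustive threshold sweep are illustrative choices, not the authors' implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score

def two_step_classifier(X_train, y_train, X_val, y_val, beta=1.0):
    # Step 1: learn a real-valued scorer f by minimizing a strongly
    # proper composite surrogate loss on the training sample
    # (here: the logistic loss, via logistic regression).
    model = LogisticRegression().fit(X_train, y_train)
    scores = model.decision_function(X_val)  # f evaluated on validation data

    # Step 2: tune the threshold theta-hat on the separate validation
    # sample by directly optimizing the target metric (here: F_beta).
    best_theta, best_value = 0.0, -np.inf
    for theta in np.unique(scores):
        y_pred = (scores >= theta).astype(int)
        value = fbeta_score(y_val, y_pred, beta=beta)
        if value > best_value:
            best_theta, best_value = theta, value

    # Resulting classifier: threshold f at the tuned theta-hat.
    return lambda X: (model.decision_function(X) >= best_theta).astype(int)

The paper's bound then says that the target-metric regret of the classifier returned above is controlled by the surrogate-loss regret of the learned scorer $f$.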