Surrogate regret bounds for generalized classification performance metrics

Wojciech Kotlowski, Krzysztof Dembczyński
Asian Conference on Machine Learning, PMLR 45:301-316, 2016.

Abstract

We consider optimization of generalized performance metrics for binary classification by means of surrogate losses. We focus on a class of metrics that are linear-fractional functions of the false positive and false negative rates (examples include the $F_\beta$-measure, the Jaccard similarity coefficient, the AM measure, and many others). Our analysis concerns the following two-step procedure. First, a real-valued function $f$ is learned by minimizing a surrogate loss for binary classification on the training sample. The surrogate loss is assumed to be a strongly proper composite loss function (examples include the logistic, squared-error, and exponential losses). Then, given $f$, a threshold $\hat{\theta}$ is tuned on a separate validation sample by direct optimization of the target performance metric. We show that the regret of the resulting classifier (obtained by thresholding $f$ at $\hat{\theta}$), measured with respect to the target metric, is upper-bounded by the regret of $f$ measured with respect to the surrogate loss. Our finding is further analyzed in a computational study on both synthetic and real data sets.
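To make the linear-fractional structure concrete: writing $\mathrm{FP}$ and $\mathrm{FN}$ for the false positive and false negative rates (taken here as fractions of all examples) and $P$ for the proportion of positives, the $F_\beta$-measure is a ratio of two affine functions of $(\mathrm{FP}, \mathrm{FN})$:

$$F_\beta = \frac{(1+\beta^2)(P - \mathrm{FN})}{(1+\beta^2)P - \mathrm{FN} + \mathrm{FP}}.$$

Below is a minimal sketch of the two-step procedure, assuming scikit-learn's LogisticRegression as the surrogate learner (logistic loss is a strongly proper composite loss) and $F_1$ as the target metric; the synthetic data and the exhaustive grid of candidate thresholds are illustrative choices, not prescribed by the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data; in practice X, y come from the task at hand.
rng = np.random.RandomState(0)
X = rng.randn(2000, 10)
y = (X[:, 0] + 0.5 * rng.randn(2000) > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 1: learn a real-valued function f by minimizing a strongly proper
# composite surrogate loss on the training sample (here: logistic loss).
clf = LogisticRegression().fit(X_train, y_train)
f_val = clf.decision_function(X_val)  # real-valued scores f(x)

# Step 2: tune the threshold on the separate validation sample by directly
# optimizing the target metric (F_1 here; every observed score is a candidate).
thresholds = np.unique(f_val)
metric_vals = [fbeta_score(y_val, (f_val >= t).astype(int), beta=1.0)
               for t in thresholds]
theta_hat = thresholds[int(np.argmax(metric_vals))]

# The resulting classifier predicts positive whenever f(x) >= theta_hat.
y_pred = (f_val >= theta_hat).astype(int)

The paper's bound then guarantees that the target-metric regret of this thresholded classifier is controlled by the surrogate regret of $f$, i.e., by how well the first step minimized the logistic loss.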

Cite this Paper


BibTeX
@InProceedings{pmlr-v45-Kotlowski15,
  title     = {Surrogate regret bounds for generalized classification performance metrics},
  author    = {Kotlowski, Wojciech and Dembczyński, Krzysztof},
  booktitle = {Asian Conference on Machine Learning},
  pages     = {301--316},
  year      = {2016},
  editor    = {Holmes, Geoffrey and Liu, Tie-Yan},
  volume    = {45},
  series    = {Proceedings of Machine Learning Research},
  address   = {Hong Kong},
  month     = {20--22 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v45/Kotlowski15.pdf},
  url       = {https://proceedings.mlr.press/v45/Kotlowski15.html}
}
APA
Kotlowski, W. & Dembczyński, K. (2016). Surrogate regret bounds for generalized classification performance metrics. Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 45:301-316. Available from https://proceedings.mlr.press/v45/Kotlowski15.html.