Optimal Classification with Multivariate Losses

Nagarajan Natarajan, Oluwasanmi Koyejo, Pradeep Ravikumar, Inderjit Dhillon
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1530-1538, 2016.

Abstract

Multivariate loss functions are extensively employed in several prediction tasks arising in Information Retrieval. Often, the goal in the tasks is to minimize expected loss when retrieving relevant items from a presented set of items, where the expectation is with respect to the joint distribution over item sets. Our key result is that for most multivariate losses, the expected loss is provably optimized by sorting the items by the conditional probability of label being positive and then selecting top k items. Such a result was previously known only for the F-measure. Leveraging on the optimality characterization, we give an algorithm for estimating optimal predictions in practice with runtime quadratic in size of item sets for many losses. We provide empirical results on benchmark datasets, comparing the proposed algorithm to state-of-the-art methods for optimizing multivariate losses.

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-natarajan16, title = {Optimal Classification with Multivariate Losses}, author = {Natarajan, Nagarajan and Koyejo, Oluwasanmi and Ravikumar, Pradeep and Dhillon, Inderjit}, booktitle = {Proceedings of The 33rd International Conference on Machine Learning}, pages = {1530--1538}, year = {2016}, editor = {Balcan, Maria Florina and Weinberger, Kilian Q.}, volume = {48}, series = {Proceedings of Machine Learning Research}, address = {New York, New York, USA}, month = {20--22 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v48/natarajan16.pdf}, url = {https://proceedings.mlr.press/v48/natarajan16.html}, abstract = {Multivariate loss functions are extensively employed in several prediction tasks arising in Information Retrieval. Often, the goal in the tasks is to minimize expected loss when retrieving relevant items from a presented set of items, where the expectation is with respect to the joint distribution over item sets. Our key result is that for most multivariate losses, the expected loss is provably optimized by sorting the items by the conditional probability of label being positive and then selecting top k items. Such a result was previously known only for the F-measure. Leveraging on the optimality characterization, we give an algorithm for estimating optimal predictions in practice with runtime quadratic in size of item sets for many losses. We provide empirical results on benchmark datasets, comparing the proposed algorithm to state-of-the-art methods for optimizing multivariate losses.} }
Endnote
%0 Conference Paper %T Optimal Classification with Multivariate Losses %A Nagarajan Natarajan %A Oluwasanmi Koyejo %A Pradeep Ravikumar %A Inderjit Dhillon %B Proceedings of The 33rd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2016 %E Maria Florina Balcan %E Kilian Q. Weinberger %F pmlr-v48-natarajan16 %I PMLR %P 1530--1538 %U https://proceedings.mlr.press/v48/natarajan16.html %V 48 %X Multivariate loss functions are extensively employed in several prediction tasks arising in Information Retrieval. Often, the goal in the tasks is to minimize expected loss when retrieving relevant items from a presented set of items, where the expectation is with respect to the joint distribution over item sets. Our key result is that for most multivariate losses, the expected loss is provably optimized by sorting the items by the conditional probability of label being positive and then selecting top k items. Such a result was previously known only for the F-measure. Leveraging on the optimality characterization, we give an algorithm for estimating optimal predictions in practice with runtime quadratic in size of item sets for many losses. We provide empirical results on benchmark datasets, comparing the proposed algorithm to state-of-the-art methods for optimizing multivariate losses.
RIS
TY - CPAPER TI - Optimal Classification with Multivariate Losses AU - Nagarajan Natarajan AU - Oluwasanmi Koyejo AU - Pradeep Ravikumar AU - Inderjit Dhillon BT - Proceedings of The 33rd International Conference on Machine Learning DA - 2016/06/11 ED - Maria Florina Balcan ED - Kilian Q. Weinberger ID - pmlr-v48-natarajan16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 48 SP - 1530 EP - 1538 L1 - http://proceedings.mlr.press/v48/natarajan16.pdf UR - https://proceedings.mlr.press/v48/natarajan16.html AB - Multivariate loss functions are extensively employed in several prediction tasks arising in Information Retrieval. Often, the goal in the tasks is to minimize expected loss when retrieving relevant items from a presented set of items, where the expectation is with respect to the joint distribution over item sets. Our key result is that for most multivariate losses, the expected loss is provably optimized by sorting the items by the conditional probability of label being positive and then selecting top k items. Such a result was previously known only for the F-measure. Leveraging on the optimality characterization, we give an algorithm for estimating optimal predictions in practice with runtime quadratic in size of item sets for many losses. We provide empirical results on benchmark datasets, comparing the proposed algorithm to state-of-the-art methods for optimizing multivariate losses. ER -
APA
Natarajan, N., Koyejo, O., Ravikumar, P. & Dhillon, I.. (2016). Optimal Classification with Multivariate Losses. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1530-1538 Available from https://proceedings.mlr.press/v48/natarajan16.html.

Related Material