Surrogate Regret Bounds for the Area Under the ROC Curve via Strongly Proper Losses
Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:338-353, 2013.
The area under the ROC curve (AUC) is a widely used performance measure in machine learning, and has been studied extensively in recent years, particularly in the context of bipartite ranking. A dominant theoretical and algorithmic framework for AUC optimization/bipartite ranking has been to reduce the problem to pairwise classification; in particular, it is well known that the AUC regret can be formulated as a pairwise classification regret, which in turn can be upper bounded using standard regret bounds for binary classification. Recently, Kotlowski et al. (2011) showed AUC regret bounds in terms of the regret associated with ‘balanced’ versions of the standard (non-pairwise) logistic and exponential losses. In this paper, we obtain such (non-pairwise) surrogate regret bounds for the AUC in terms of a broad class of proper (composite) losses that we term strongly proper. Our proof technique is considerably simpler than that of Kotlowski et al. (2011), and relies on properties of proper (composite) losses as elucidated recently by Reid and Williamson (2009, 2010, 2011) and others. Our result yields explicit surrogate bounds (with no hidden balancing terms) in terms of a variety of strongly proper losses, including for example logistic, exponential, squared and squared hinge losses. An important consequence is that standard algorithms minimizing a (non-pairwise) strongly proper loss, such as logistic regression and boosting algorithms (assuming a universal function class and appropriate regularization), are in fact AUC-consistent; moreover, our results allow us to quantify the AUC regret in terms of the corresponding surrogate regret. We also obtain tighter surrogate regret bounds under certain low-noise conditions via a recent result of Clémençon and Robbiano (2011).
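To make the pairwise view of the AUC mentioned above concrete, the following minimal sketch computes the empirical AUC of a scoring function as the fraction of positive-negative pairs that are ranked correctly, counting ties as one half. The function name `empirical_auc`, the 0/1 label encoding, and the toy data are assumptions made for illustration and do not come from the paper.

```python
def empirical_auc(scores, labels):
    """Empirical AUC as the fraction of correctly ordered (positive, negative) pairs.

    Assumes labels are 1 for positives and anything else for negatives
    (an illustrative convention, not the paper's notation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    total = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                total += 1.0   # pair ranked correctly
            elif sp == sn:
                total += 0.5   # ties count as half
    return total / (len(pos) * len(neg))


# Toy example: one of the four positive-negative pairs is misranked.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = empirical_auc(scores, labels)
pairwise_error = 1.0 - auc  # empirical pairwise misranking rate
print(auc, pairwise_error)
```

In this sketch, one minus the empirical AUC is the empirical pairwise misranking rate, which is the quantity the pairwise-classification reduction works with. The scores themselves could come from any non-pairwise method, for example a logistic regression model.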