Open Access Proceedings Article

On the consistency of AUC pairwise optimization

Wei Gao, +1 more
- pp 939-945
TLDR
In this article, the generalized calibration for AUC optimization is introduced and shown to be a necessary condition for AUC consistency; a sufficient condition is then provided and used to study the consistency of various surrogate losses.
Abstract
AUC (Area Under ROC Curve) has been an important criterion widely used in diverse learning tasks. To optimize AUC, many learning approaches have been developed, most working with pairwise surrogate losses. Thus, it is important to study the AUC consistency based on minimizing pairwise surrogate losses. In this paper, we introduce the generalized calibration for AUC optimization, and prove that it is a necessary condition for AUC consistency. We then provide a sufficient condition for AUC consistency, and show its usefulness in studying the consistency of various surrogate losses, as well as in the invention of new consistent losses. We further derive regret bounds for exponential and logistic losses, and present regret bounds for more general surrogate losses in the realizable setting. Finally, we prove regret bounds that disclose the equivalence between the pairwise exponential loss of AUC and univariate exponential loss of accuracy.
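To make the pairwise-surrogate setting concrete, here is a minimal sketch in Python (the synthetic scores are assumptions for illustration, not data from the paper) comparing the empirical AUC, i.e. the fraction of correctly ordered positive-negative pairs, with the pairwise exponential and logistic surrogate risks studied in the paper.

```python
import numpy as np

def empirical_auc(scores_pos, scores_neg):
    """Fraction of positive-negative pairs ranked correctly (ties count 1/2)."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return ((diff > 0) + 0.5 * (diff == 0)).mean()

def pairwise_surrogate_risk(scores_pos, scores_neg, loss):
    """Average surrogate loss over all positive-negative score differences."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return loss(diff).mean()

exp_loss = lambda t: np.exp(-t)            # pairwise exponential loss
log_loss = lambda t: np.log1p(np.exp(-t))  # pairwise logistic loss

# Hypothetical scores, for illustration only.
rng = np.random.default_rng(0)
s_pos = rng.normal(1.0, 1.0, size=200)   # scores of positive instances
s_neg = rng.normal(0.0, 1.0, size=300)   # scores of negative instances

print("AUC:", empirical_auc(s_pos, s_neg))
print("exponential surrogate risk:", pairwise_surrogate_risk(s_pos, s_neg, exp_loss))
print("logistic surrogate risk:", pairwise_surrogate_risk(s_pos, s_neg, log_loss))
```

Minimizing either surrogate risk pushes positive scores above negative scores; the consistency question studied in the paper is whether doing so also maximizes AUC in the large-sample limit.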


Citations
Proceedings Article

Stochastic online AUC maximization

TL;DR: It is shown that AUC optimization can be equivalently formulated as a convex-concave saddle point problem, and a stochastic online algorithm (SOLAM) is proposed whose per-iteration time and space complexity is that of a single datum.
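A heavily simplified sketch of the saddle-point idea follows: primal variables (a weight vector plus two auxiliary scalars) take a gradient descent step and a single dual scalar takes a gradient ascent step, using only the current datum and a running estimate of the positive-class proportion. The objective form, variable names, and step size here are assumptions for illustration, not the algorithm's exact published updates.

```python
import numpy as np

def saddle_point_auc_step(w, a, b, alpha, x, y, p_hat, eta):
    """One stochastic primal-dual step on a squared-loss saddle-point
    reformulation of AUC maximization (illustrative form, not the exact
    published update; p_hat is a running estimate of P(y = +1))."""
    s = w @ x
    if y == 1:
        grad_w = 2 * (1 - p_hat) * ((s - a) * x - (1 + alpha) * x)
        grad_a = -2 * (1 - p_hat) * (s - a)
        grad_b = 0.0
        grad_alpha = -2 * (1 - p_hat) * s - 2 * p_hat * (1 - p_hat) * alpha
    else:
        grad_w = 2 * p_hat * ((s - b) * x + (1 + alpha) * x)
        grad_a = 0.0
        grad_b = -2 * p_hat * (s - b)
        grad_alpha = 2 * p_hat * s - 2 * p_hat * (1 - p_hat) * alpha
    # descend in the primal variables, ascend in the dual variable
    return (w - eta * grad_w, a - eta * grad_a,
            b - eta * grad_b, alpha + eta * grad_alpha)
```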
Journal ArticleDOI

One-pass AUC optimization

TL;DR: In this article, the authors focus on one-pass AUC optimization, which requires going through the training data only once without having to store the entire training dataset, and develop a regression-based algorithm that only needs to maintain the first- and second-order statistics of the training data in memory.
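A rough illustration of the regression-based idea follows: only per-class running means and uncentred second moments are stored, and the gradient of the pairwise squared loss against the opposite class is formed from those statistics when a new instance arrives. The class and its update rule are an illustrative sketch under that assumption, not the published algorithm.

```python
import numpy as np

class OnePassAUCSketch:
    """Keeps first- and second-order statistics per class so that a
    squared-pairwise-loss gradient can be formed from the current instance
    alone (illustrative sketch, not the published algorithm)."""

    def __init__(self, dim, eta=0.01):
        self.w = np.zeros(dim)
        self.eta = eta
        self.n = {1: 0, -1: 0}
        self.mean = {1: np.zeros(dim), -1: np.zeros(dim)}
        self.moment = {1: np.zeros((dim, dim)), -1: np.zeros((dim, dim))}  # E[x x^T]

    def update(self, x, y):
        # update the running statistics of the class of x
        self.n[y] += 1
        self.mean[y] += (x - self.mean[y]) / self.n[y]
        self.moment[y] += (np.outer(x, x) - self.moment[y]) / self.n[y]

        # pair the new instance against the stored statistics of the other class
        other = -y
        if self.n[other] == 0:
            return
        mu, M = self.mean[other], self.moment[other]
        d2 = np.outer(x, x) - np.outer(x, mu) - np.outer(mu, x) + M  # E[(x - x')(x - x')^T]
        grad = -2 * y * (x - mu) + 2 * d2 @ self.w   # gradient of E[(1 - y * w^T(x - x'))^2]
        self.w -= self.eta * grad
```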
Posted Content

On Symmetric Losses for Learning from Corrupted Labels

TL;DR: It is emphasized that using a symmetric loss is advantageous for balanced error rate (BER) minimization and for area under the receiver operating characteristic curve (AUC) maximization from corrupted labels, and general theoretical properties of symmetric losses are proved.
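For concreteness, a margin loss l is symmetric in this sense when l(z) + l(-z) is a constant; the short check below (an illustrative snippet, not code from the paper) verifies this numerically for the sigmoid loss and shows that the logistic loss fails it.

```python
import numpy as np

sigmoid_loss = lambda z: 1.0 / (1.0 + np.exp(z))   # symmetric: l(z) + l(-z) = 1
logistic_loss = lambda z: np.log1p(np.exp(-z))     # not symmetric

z = np.linspace(-5, 5, 11)
print(sigmoid_loss(z) + sigmoid_loss(-z))    # constant 1 for every z
print(logistic_loss(z) + logistic_loss(-z))  # varies with z
```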
Proceedings Article

Stochastic Proximal Algorithms for AUC Maximization

TL;DR: This paper develops a novel stochastic proximal algorithm for AUC maximization, referred to as SPAM, which achieves a convergence rate of O(log t / t) for strongly convex objectives while both the space and per-iteration costs remain those of a single datum.
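The generic pattern behind such methods is a stochastic gradient step on the smooth data-fitting term followed by the proximal operator of the regularizer; the snippet below shows that pattern with an L1 regularizer as an assumed example, not SPAM's actual objective or update.

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of lam * ||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def stochastic_proximal_step(w, grad, eta, lam):
    """Gradient step on the smooth part, then the proximal operator of the
    (possibly non-smooth) regularizer."""
    return prox_l1(w - eta * grad, eta * lam)
```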
Proceedings Article

One-Pass AUC Optimization

TL;DR: In this paper, the authors focus on one-pass AUC optimization, which requires going through the training data only once without storing the entire training dataset; conventional online learning algorithms cannot be applied directly here because AUC is measured by a sum of losses defined over pairs of instances from different classes.
References

Statistical learning theory

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Journal ArticleDOI

The meaning and use of the area under a receiver operating characteristic (ROC) curve.

James A. Hanley, +1 more
- 01 Apr 1982
TL;DR: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented, and it is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.
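This probabilistic interpretation is easy to verify numerically: the snippet below (with synthetic scores assumed for illustration) computes the empirical AUC both by direct pairwise comparison and via the Wilcoxon rank-sum statistic, and the two values agree.

```python
import numpy as np
from scipy.stats import rankdata

def auc_pairwise(pos, neg):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0) + 0.5 * (diff == 0)).mean()

def auc_ranksum(pos, neg):
    """Same quantity via the Wilcoxon rank-sum statistic."""
    ranks = rankdata(np.concatenate([pos, neg]))
    r_pos = ranks[: len(pos)].sum()
    return (r_pos - len(pos) * (len(pos) + 1) / 2) / (len(pos) * len(neg))

rng = np.random.default_rng(1)
pos = rng.normal(1.0, 1.0, 50)   # hypothetical scores of diseased subjects
neg = rng.normal(0.0, 1.0, 80)   # hypothetical scores of non-diseased subjects
print(auc_pairwise(pos, neg), auc_ranksum(pos, neg))  # the two values agree
```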
Journal ArticleDOI

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Journal ArticleDOI

Additive Logistic Regression : A Statistical View of Boosting

TL;DR: This work shows that this seemingly mysterious phenomenon of boosting can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood; it also develops more direct approximations and shows that they give results nearly identical to boosting.
Proceedings Article

The foundations of cost-sensitive learning

TL;DR: It is argued that changing the balance of negative and positive training examples has little effect on the classifiers produced by standard Bayesian and decision tree learning methods, and the recommended way of applying one of these methods is to learn a classifier from the training set and then to compute optimal decisions explicitly using the probability estimates given by the classifier.
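The explicit decision rule alluded to here is simple to state: given an estimated posterior probability and a cost for each type of error (and zero cost for correct decisions), predicting the class with the lower expected cost amounts to thresholding the posterior at cost_fp / (cost_fp + cost_fn). The function below is an illustrative sketch of that rule; the names and example costs are assumptions.

```python
def cost_sensitive_decision(p_pos, cost_fp, cost_fn):
    """Predict the class with the lower expected cost, given an estimated
    probability p_pos = P(y = 1 | x).  With zero cost for correct decisions
    this reduces to thresholding at cost_fp / (cost_fp + cost_fn)."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return 1 if p_pos >= threshold else 0

# A false negative costing 5x a false positive lowers the threshold to 1/6,
# so even a modest posterior of 0.2 leads to a positive prediction.
print(cost_sensitive_decision(0.2, cost_fp=1.0, cost_fn=5.0))  # -> 1
```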