Journal ArticleDOI

The area above the ordinal dominance graph and the area below the receiver operating characteristic graph

01 Nov 1975-Journal of Mathematical Psychology (Academic Press)-Vol. 12, Iss: 4, pp 387-415
TL;DR: In this article, receiver operating characteristic graphs are shown to be a variant form of ordinal dominance graphs; several methods of constructing confidence intervals for the area measure are presented, and the strengths and weaknesses of each are discussed.
About: This article was published in the Journal of Mathematical Psychology on 1975-11-01 and has received 1,409 citations to date. It focuses on the topics: Estimator & U-statistic.
Citations
Journal ArticleDOI
TL;DR: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented, and it is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.
Abstract: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect difference...
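The central identity in this abstract — area under the empirical ROC curve = probability of correctly ranking a (diseased, non-diseased) pair = the Wilcoxon/Mann-Whitney quantity — can be illustrated with a short sketch. The function name and the data below are hypothetical, and ties are handled with the usual 1/2 convention.

```python
import numpy as np

def auc_as_ranking_probability(diseased_scores, normal_scores):
    """Estimate P(diseased score > normal score), counting ties as 1/2.
    This equals the trapezoidal area under the empirical ROC curve."""
    d = np.asarray(diseased_scores, dtype=float)
    n = np.asarray(normal_scores, dtype=float)
    wins = (d[:, None] > n[None, :]).sum()   # pairs ranked correctly
    ties = (d[:, None] == n[None, :]).sum()  # tied pairs count half
    return (wins + 0.5 * ties) / (d.size * n.size)

# Hypothetical ratings on a 1-5 suspicion scale
print(auc_as_ranking_probability([3, 4, 5, 5, 2, 4], [1, 2, 2, 3, 1, 3]))
```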

19,398 citations


Cites background from "The area above the ordinal dominanc..."

  • ...More important, however, was the more recent recognition by Bamber (8) that this "probability of correctly ranking a (normal, abnormal) pair" is intimately connected with the quantity calculated in the Wilcoxon or Mann-Whitney statistical test....

  • ...5, W is no longer nonparametric; its standard error, SE(W), depends on two distribution-specific quantities, Q1 and Q2, which have the following interpretation: Q1 = Prob(two randomly chosen abnormal images will both be ranked with greater suspicion than a randomly chosen normal image); Q2 = Prob(one randomly chosen abnormal image will be ranked with greater suspicion than two randomly chosen normal images). If we assume for the moment (as Green and Swets do in their proof regarding θ and the area under the ROC curve) that the ratings are on a scale that is sufficiently continuous that it does not produce "ties," then SE(W), or equivalently SE(area underneath empirical ROC curve), can be shown (8) to be... (see the reconstructed formula below)....

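For reference, the expression the snippet truncates is the well-known standard-error formula from Hanley and McNeil (1982). This is a reconstruction from that literature, not text quoted from this page, with θ the area, n_A the number of abnormal images, and n_N the number of normal images:

SE(W) = \sqrt{\dfrac{\theta(1-\theta) + (n_A - 1)(Q_1 - \theta^2) + (n_N - 1)(Q_2 - \theta^2)}{n_A\, n_N}}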

Journal ArticleDOI
TL;DR: A nonparametric approach to the analysis of areas under correlated ROC curves is presented, by using the theory on generalized U-statistics to generate an estimated covariance matrix.
Abstract: Methods of evaluating and comparing the performance of diagnostic tests are of increasing importance as new tests are developed and marketed. When a test is based on an observed variable that lies on a continuous or graded scale, an assessment of the overall value of the test can be made through the use of a receiver operating characteristic (ROC) curve. The curve is constructed by varying the cutpoint used to determine which values of the observed variable will be considered abnormal and then plotting the resulting sensitivities against the corresponding false positive rates. When two or more empirical curves are constructed based on tests performed on the same individuals, statistical analysis on differences between curves must take into account the correlated nature of the data. This paper presents a nonparametric approach to the analysis of areas under correlated ROC curves, by using the theory on generalized U-statistics to generate an estimated covariance matrix.
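A minimal sketch of the approach described above, assuming two diagnostic tests scored on the same subjects: per-subject placements (the structural components of the AUC U-statistic) are computed for each test, their empirical covariances give an estimated covariance matrix for the AUCs, and a z statistic compares the two areas. Variable names are hypothetical and ties are handled with the usual 1/2 convention.

```python
import numpy as np

def placements(diseased, normal):
    """Structural components of the AUC U-statistic for one test."""
    d = np.asarray(diseased, dtype=float)[:, None]
    n = np.asarray(normal, dtype=float)[None, :]
    psi = (d > n).astype(float) + 0.5 * (d == n)   # pairwise kernel
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)

def compare_correlated_aucs(diseased_scores, normal_scores):
    """diseased_scores, normal_scores: per-test score arrays for the same subjects.
    Returns the AUCs, their estimated covariance matrix, and z for AUC_1 - AUC_2."""
    aucs, V10, V01 = [], [], []
    for d, n in zip(diseased_scores, normal_scores):
        a, v10, v01 = placements(d, n)
        aucs.append(a); V10.append(v10); V01.append(v01)
    aucs = np.array(aucs)
    m, n = len(V10[0]), len(V01[0])                 # diseased / normal sample sizes
    S10 = np.cov(np.array(V10))                     # covariance of diseased placements across tests
    S01 = np.cov(np.array(V01))                     # covariance of normal placements across tests
    cov = S10 / m + S01 / n                         # estimated covariance matrix of the AUCs
    z = (aucs[0] - aucs[1]) / np.sqrt(cov[0, 0] + cov[1, 1] - 2 * cov[0, 1])
    return aucs, cov, z
```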

16,496 citations

Journal ArticleDOI
TL;DR: Receiver-operating characteristic (ROC) plots provide a pure index of accuracy by demonstrating the limits of a test's ability to discriminate between alternative states of health over the complete spectrum of operating conditions.
Abstract: The clinical performance of a laboratory test can be described in terms of diagnostic accuracy, or the ability to correctly classify subjects into clinically relevant subgroups. Diagnostic accuracy refers to the quality of the information provided by the classification device and should be distinguished from the usefulness, or actual practical value, of the information. Receiver-operating characteristic (ROC) plots provide a pure index of accuracy by demonstrating the limits of a test's ability to discriminate between alternative states of health over the complete spectrum of operating conditions. Furthermore, ROC plots occupy a central or unifying position in the process of assessing and using diagnostic tools. Once the plot is generated, a user can readily go on to many other activities such as performing quantitative ROC analysis and comparisons of tests, using likelihood ratio to revise the probability of disease in individual subjects, selecting decision thresholds, using logistic-regression analysis, using discriminant-function analysis, or incorporating the tool into a clinical strategy by using decision analysis.
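To make "the complete spectrum of operating conditions" concrete, the empirical ROC plot can be traced by sweeping the decision threshold over every observed value. The sketch below is a generic construction under that convention, not code from the cited paper.

```python
import numpy as np

def empirical_roc(diseased, normal):
    """Return (false positive rate, sensitivity) pairs obtained by sweeping the
    cutpoint over all observed values (score >= cutpoint is called 'positive')."""
    d, n = np.asarray(diseased, float), np.asarray(normal, float)
    cutpoints = np.r_[np.inf, np.unique(np.r_[d, n])[::-1]]  # from 'nothing positive' downward
    fpr = np.array([(n >= c).mean() for c in cutpoints])
    tpr = np.array([(d >= c).mean() for c in cutpoints])
    return fpr, tpr
```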

6,339 citations


Cites background or methods from "The area above the ordinal dominanc..."

  • ...Analytical formulas to calculate the area are in reports by Bamber (31) and Hanley and McNeil (32)....


  • ...(31, 32), introduced by the chemist Frank Wilcoxon....


Journal ArticleDOI
TL;DR: Two new measures, one based on integrated sensitivity and specificity and the other on reclassification tables, are introduced that offer incremental information over the AUC; the authors propose that they be considered in addition to the AUC when assessing the performance of newer biomarkers.
Abstract: Identification of key factors associated with the risk of developing cardiovascular disease and quantification of this risk using multivariable prediction algorithms are among the major advances made in preventive cardiology and cardiovascular epidemiology in the 20th century. The ongoing discovery of new risk markers by scientists presents opportunities and challenges for statisticians and clinicians to evaluate these biomarkers and to develop new risk formulations that incorporate them. One of the key questions is how best to assess and quantify the improvement in risk prediction offered by these new models. Demonstration of a statistically significant association of a new biomarker with cardiovascular risk is not enough. Some researchers have advanced that the improvement in the area under the receiver-operating-characteristic curve (AUC) should be the main criterion, whereas others argue that better measures of performance of prediction models are needed. In this paper, we address this question by introducing two new measures, one based on integrated sensitivity and specificity and the other on reclassification tables. These new measures offer incremental information over the AUC. We discuss the properties of these new measures and contrast them with the AUC. We also develop simple asymptotic tests of significance. We illustrate the use of these measures with an example from the Framingham Heart Study. We propose that scientists consider these types of measures in addition to the AUC when assessing the performance of newer biomarkers.
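A compact sketch of the two measures as they are usually defined following this paper: the category-based net reclassification improvement (NRI) and the integrated discrimination improvement (IDI). The risk-category cutpoints and variable names here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def nri_idi(p_old, p_new, events, cuts=(0.06, 0.20)):
    """NRI and IDI for old vs. new risk predictions.
    p_old, p_new: predicted probabilities; events: 1 if the outcome occurred;
    cuts: illustrative risk-category boundaries for the reclassification table."""
    p_old, p_new, events = map(np.asarray, (p_old, p_new, events))
    cat_old, cat_new = np.digitize(p_old, cuts), np.digitize(p_new, cuts)
    up, down = cat_new > cat_old, cat_new < cat_old
    ev, ne = events == 1, events == 0
    nri = (up[ev].mean() - down[ev].mean()) - (up[ne].mean() - down[ne].mean())
    idi = (p_new[ev].mean() - p_old[ev].mean()) - (p_new[ne].mean() - p_old[ne].mean())
    return nri, idi
```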

5,651 citations

Book ChapterDOI
24 Aug 2005
TL;DR: An easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities are discussed, applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes.
Abstract: Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. Measurement of predictive accuracy can be difficult for survival time data in the presence of censoring. We discuss an easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities. Both types of predictive accuracy should be unbiasedly validated using bootstrapping or cross-validation, before using predictions in a new data series. We discuss some of the hazards of poorly fitted and overfitted regression models and present one modelling strategy that avoids many of the problems discussed. The methods described are applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes. Methods are illustrated with a survival analysis in prostate cancer using Cox regression.
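The "easily interpretable index of predictive discrimination" discussed here is a concordance (c) index. The sketch below shows one common way to compute it for right-censored data (quadratic in the number of subjects); it is an illustration, not code from the chapter.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Fraction of usable pairs in which the subject who fails earlier has the
    higher predicted risk (ties in risk count 1/2).
    time: follow-up time; event: 1 if the failure was observed; risk: predicted risk."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant = tied = usable = 0
    for i in range(len(time)):
        for j in range(len(time)):
            # usable pair: subject i is observed to fail before subject j's follow-up ends
            if event[i] == 1 and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / usable
```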

4,905 citations

References
Book
01 Jan 1966
TL;DR: This book covers statistical decision theory and sensory processes as they apply to signal detection theory and psychophysics, and describes how these processes affect decision-making.
Abstract: Book on statistical decision theory and sensory processes in signal detection theory and psychophysics

11,820 citations

Journal ArticleDOI
TL;DR: In this paper, the authors show that the limit distribution of the statistic U is normal if m, n go to infinity in any arbitrary manner, and they tabulate its exact null distribution for samples up to n = m = 8.
Abstract: Let $x$ and $y$ be two random variables with continuous cumulative distribution functions $f$ and $g$. A statistic $U$ depending on the relative ranks of the $x$'s and $y$'s is proposed for testing the hypothesis $f = g$. Wilcoxon proposed an equivalent test in the Biometrics Bulletin, December, 1945, but gave only a few points of the distribution of his statistic. Under the hypothesis $f = g$ the probability of obtaining a given $U$ in a sample of $n$ $x$'s and $m$ $y$'s is the solution of a certain recurrence relation involving $n$ and $m$. Using this recurrence relation, tables have been computed giving the probability of $U$ for samples up to $n = m = 8$. At this point the distribution is almost normal. From the recurrence relation explicit expressions for the mean, variance, and fourth moment are obtained. The $2r$th moment is shown to have a certain form which enabled us to prove that the limit distribution is normal if $m, n$ go to infinity in any arbitrary manner. The test is shown to be consistent with respect to the class of alternatives $f(x) > g(x)$ for every $x$.
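The recurrence mentioned in this abstract can be made concrete by counting, for each value u, the interleavings of the m x's and n y's that yield exactly u pairs with x > y; dividing by the total number of interleavings gives the null probability. The sketch below is an illustrative reconstruction, not the authors' original tabulation.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count(u, m, n):
    """Number of interleavings of m x's and n y's with exactly u pairs (x, y) where x > y.
    Condition on the largest observation: an x contributes n such pairs, a y contributes none."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    return count(u - n, m - 1, n) + count(u, m, n - 1)

def null_prob_U(u, m, n):
    """P(U = u) under the hypothesis f = g (all interleavings equally likely)."""
    return count(u, m, n) / comb(m + n, m)

# Null distribution of U for the m = n = 8 case tabulated in the paper
dist = [null_prob_U(u, 8, 8) for u in range(8 * 8 + 1)]
```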

11,055 citations

Journal ArticleDOI
TL;DR: In this paper, it was shown that there exist strictly unbiased and consistent tests for the univariate and multivariate two- and k-sample problem, for the hypothesis of independence, and for the hypothesis of symmetry with respect to a given point.
Abstract: It is shown that there exist strictly unbiased and consistent tests for the univariate and multivariate two- and k-sample problem, for the hypothesis of independence, and for the hypothesis of symmetry with respect to a given point. Certain new tests for the univariate two-sample problem are discussed. The large sample power of these tests and of the Mann-Whitney test is obtained by means of a theorem of Hoeffding. There is a discussion of the problem of tied observations.

382 citations

01 Jan 1956
TL;DR: In this paper, the authors present a survey of the known properties of the U statistic which are of importance for its use in testing hypotheses and discuss another use of this statistic which has attracted less attention.
Abstract: For m = n an equivalent statistic had been proposed and studied earlier by Wilcoxon [2]. The main aim of these studies was to develop a test of the hypothesis that X and Y have the same probability distribution: F = G. More about this test will be reported in section 2. Independently, Haldane and Smith [3] investigated the following problem. In some hereditary conditions, the probability that a member of a sibship has the condition depends partly on his birth rank. Having records of sibships in the order of birth, stating for each individual whether it has or does not have the condition, how does one test for independence of the condition from birth rank? To answer this question, Haldane and Smith constructed a test statistic which is equivalent to the U statistic (1.2). Without an attempt at completeness, we shall give in section 2 a brief survey of the known properties of the U statistic which are of importance for its use in testing hypotheses. The main purpose of the present paper, however, is to discuss another use of this statistic which, while not new, seems to have attracted less attention. Let

234 citations