Journal ArticleDOI

A Coefficient of Agreement for Nominal Scales

Jacob Cohen1
01 Apr 1960-Educational and Psychological Measurement (SAGE Publications)-Vol. 20, Iss: 1, pp 37-46
TL;DR: In this article, the author presents a procedure for having two or more judges independently categorize a sample of units and for determining the degree and significance of their agreement, as a way of assessing the extent to which such nominal-scale judgments are reproducible, i.e., reliable.
Abstract: CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of measurement obtainable is nominal scaling (Stevens, 1951, pp. 25-26), i.e. placement in a set of k unordered categories. Because the categorizing of the units is a consequence of some complex judgment process performed by a "two-legged meter" (Stevens, 1958), it becomes important to determine the extent to which these judgments are reproducible, i.e., reliable. The procedure which suggests itself is that of having two (or more) judges independently categorize a sample of units and determine the degree, significance, and ...
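The coefficient the paper goes on to define, kappa, compares the observed proportion of agreement between the two judges with the proportion expected if each judge simply categorized at random according to their own marginal rates: κ = (p_o − p_e) / (1 − p_e). A minimal sketch of that calculation in Python, using invented ratings purely for illustration:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters assigning nominal categories.

    ratings_a, ratings_b: equal-length sequences of category labels
    for the same units.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must rate the same units")
    n = len(ratings_a)

    # Observed proportion of agreement: fraction of units on which the
    # two raters chose the same category.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance-expected agreement: product of the raters' marginal
    # proportions, summed over categories.
    marg_a = Counter(ratings_a)
    marg_b = Counter(ratings_b)
    p_e = sum(marg_a[c] * marg_b[c] for c in marg_a) / (n * n)

    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two judges sorting 10 units into 3 categories.
judge_1 = ["x", "x", "y", "y", "z", "z", "x", "y", "z", "x"]
judge_2 = ["x", "x", "y", "z", "z", "z", "x", "y", "y", "x"]
print(round(cohens_kappa(judge_1, judge_2), 3))
```

With these hypothetical data, observed agreement is 0.80 while chance agreement is 0.34, giving κ ≈ 0.70.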


Citations
Journal ArticleDOI
TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are framed in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
Abstract: This paper presents a general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies. The procedure essentially involves the construction of functions of the observed proportions which are directed at the extent to which the observers agree among themselves and the construction of test statistics for hypotheses involving these functions. Tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interobserver agreement are developed as generalized kappa-type statistics. These procedures are illustrated with a clinical diagnosis example from the epidemiological literature.
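Generalized kappa-type statistics of the kind referred to here take the form (observed weighted agreement minus chance-expected weighted agreement) divided by (one minus chance-expected weighted agreement), evaluated on the table of observed proportions with some chosen weight matrix. The sketch below is a generic illustration of that form, not the authors' own formulation or worked example; the contingency table and weights are invented:

```python
import numpy as np

def weighted_kappa(counts, weights):
    """Kappa-type agreement statistic from a k x k contingency table.

    counts  : counts[i, j] = number of units rater A put in category i
              and rater B put in category j.
    weights : weights[i, j] = credit given to an (i, j) pair of ratings,
              with 1.0 on the diagonal (full agreement).
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()                 # observed proportions
    row_marg = p.sum(axis=1)                  # rater A's marginal proportions
    col_marg = p.sum(axis=0)                  # rater B's marginal proportions

    p_o = (weights * p).sum()                              # observed weighted agreement
    p_e = (weights * np.outer(row_marg, col_marg)).sum()   # chance-expected agreement
    return (p_o - p_e) / (1 - p_e)

# Invented 3 x 3 cross-classification of the two raters' assignments.
table = np.array([[20,  5,  0],
                  [ 3, 15,  2],
                  [ 1,  4, 10]])
identity = np.eye(3)
print(round(weighted_kappa(table, identity), 3))

# First-order marginal homogeneity (no interobserver bias) means the two
# raters' marginal proportions are equal; a quick descriptive check:
p = table / table.sum()
print(np.round(p.sum(axis=1) - p.sum(axis=0), 3))
```

With identity weights the statistic reduces to ordinary unweighted kappa; off-diagonal weights between 0 and 1 give partial credit for near-misses, which is how weighted kappa arises as a special case.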

64,109 citations


Cites background or methods from "A Coefficient of agreement for nomi..."

  • ...In particular, w1j represents the set of weights which generate the kappa measure of perfect agreement proposed in Cohen [1960]. The sequence of hierarchical kappa-type statistics within each of the two patient populations associated with the weights given in Table 2 can be expressed in the formulation (A....

    [...]

  • ...example, the weights w2j in Table 8 are directly analogous to those discussed in Cohen [1968], Fleiss, Cohen and Everitt [1969] and Cicchetti [1972], which were used to generate weighted kappa and C statistics....

    [...]

  • ..., Goodman and Kruskal [1954], Cohen [1960, 1968], Fleiss [1971], Light [1971], and Cicchetti [1972]....

    [...]

  • ...Furthermore, as shown in Fleiss and Cohen [1973] and Fleiss [1975], K is directly analogous to the intraclass correlation coefficient obtained from ANOVA models for quantitative measurements and can be used as a measure of the reliability of multiple determinations on the same subjects....

    [...]

Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.

  • Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
  • Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface; algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization

20,196 citations

Journal ArticleDOI
TL;DR: Abnormal lipids, smoking, hypertension, diabetes, abdominal obesity, psychosocial factors, consumption of fruits, vegetables, and alcohol, and regular physical activity account for most of the risk of myocardial infarction worldwide in both sexes and at all ages in all regions.

10,387 citations

Journal ArticleDOI
TL;DR: While kappa is one of the most commonly used statistics for testing interrater reliability, it has limitations; levels of both kappa and percent agreement that should be demanded in healthcare studies are suggested.
Abstract: The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While there have been a variety of methods to measure interrater reliability, traditionally it was measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued use of percent agreement due to its inability to account for chance agreement. He introduced Cohen's kappa, developed to account for the possibility that raters actually guess on at least some variables due to uncertainty. Like most correlation statistics, the kappa can range from -1 to +1. While the kappa is one of the most commonly used statistics to test interrater reliability, it has limitations. Judgments about what level of kappa should be acceptable for health research are questioned. Cohen's suggested interpretation may be too lenient for health related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and levels for both kappa and percent agreement that should be demanded in healthcare studies are suggested.
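The chance-correction issue described above is easiest to see with skewed categories: two raters can agree on most units simply because one category dominates, so percent agreement looks strong while kappa stays modest. A small illustration with invented counts:

```python
# Invented ratings: with a rare "yes" category, percent agreement looks
# excellent even though the raters add little beyond chance, which is
# exactly the problem kappa was introduced to expose.
cases = ([("yes", "yes")] * 2 + [("yes", "no")] * 4 +
         [("no", "yes")] * 4 + [("no", "no")] * 90)

n = len(cases)
percent_agreement = sum(a == b for a, b in cases) / n          # 0.92

p_yes_a = sum(a == "yes" for a, _ in cases) / n                # rater A's "yes" rate
p_yes_b = sum(b == "yes" for _, b in cases) / n                # rater B's "yes" rate
p_e = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)        # chance agreement
kappa = (percent_agreement - p_e) / (1 - p_e)

print(round(percent_agreement, 2), round(p_e, 3), round(kappa, 2))
```

Here percent agreement is 92% yet kappa is only about 0.29, below even the 0.41 level the abstract describes as possibly too lenient.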

9,097 citations

Journal ArticleDOI
TL;DR: Results suggest the K-SADS-PL generates reliable and valid child psychiatric diagnoses.
Abstract: Objective To describe the psychometric properties of the Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version (K-SADS-PL) interview, which surveys additional disorders not assessed in prior K-SADS, contains improved probes and anchor points, includes diagnosis-specific impairment ratings, generates DSM-III-R and DSM-IV diagnoses, and divides symptoms surveyed into a screening interview and five diagnostic supplements. Method Subjects were 55 psychiatric outpatients and 11 normal controls (aged 7 through 17 years). Both parents and children were used as informants. Concurrent validity of the screen criteria and the K-SADS-PL diagnoses was assessed against standard self-report scales. Interrater (n = 15) and test-retest (n = 20) reliability data were also collected (mean retest interval: 18 days; range: 2 to 38 days). Results Rating scale data support the concurrent validity of screens and K-SADS-PL diagnoses. Interrater agreement in scoring screens and diagnoses was high (range: 93% to 100%). Test-retest reliability κ coefficients were in the excellent range for present and/or lifetime diagnoses of major depression, any bipolar, generalized anxiety, conduct, and oppositional defiant disorder (.77 to 1.00) and in the good range for present diagnoses of posttraumatic stress disorder and attention-deficit hyperactivity disorder (.63 to .67). Conclusion Results suggest the K-SADS-PL generates reliable and valid child psychiatric diagnoses. J. Am. Acad. Child Adolesc. Psychiatry, 1997, 36(7): 980–988.

8,742 citations


Cites methods from "A Coefficient of agreement for nomi..."

  • ...Percent agreement was used to generate interrater reliability estimates, as there were an insufficient number of cases (n < 5) to justify calculation of a κ statistic (Cohen, 1960) in most diagnostic categories....

    [...]

References
Book
01 Jan 1942

3,601 citations

Book
01 Jan 1979
TL;DR: In this article, a number of alternative measures are considered, almost all based upon a probabilistic model for activity to which the cross-classification may typically lead, and only the case in which the population is completely known is considered, so no question of sampling or measurement error appears.
Abstract: When populations are cross-classified with respect to two or more classifications or polytomies, questions often arise about the degree of association existing between the several polytomies. Most of the traditional measures or indices of association are based upon the standard chi-square statistic or on an assumption of underlying joint normality. In this paper a number of alternative measures are considered, almost all based upon a probabilistic model for activity to which the cross-classification may typically lead. Only the case in which the population is completely known is considered, so no question of sampling or measurement error appears. We hope, however, to publish before long some approximate distributions for sample estimators of the measures we propose, and approximate tests of hypotheses. Our major theme is that the measures of association used by an empirical investigator should not be blindly chosen because of tradition and convention only, although these factors may properly be g...
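One way to see the contrast the abstract draws is to compute, on the same cross-classification, one index derived from the chi-square statistic and one with a direct probabilistic interpretation. The sketch below uses Cramér's V and the Goodman-Kruskal lambda as stand-ins for the two families; the table is invented and the choice of indices is illustrative rather than the paper's own worked example:

```python
import numpy as np

def cramers_v(table):
    """Chi-square-based association index, scaled to lie in [0, 1]."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

def goodman_kruskal_lambda(table):
    """Probabilistic association index: proportional reduction in the
    error of guessing the column category once the row category is known."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    errors_without_rows = n - table.sum(axis=0).max()   # guess the modal column
    errors_with_rows = n - table.max(axis=1).sum()      # guess the modal column per row
    return (errors_without_rows - errors_with_rows) / errors_without_rows

# Invented 3 x 3 cross-classification of two polytomies.
table = [[30, 10, 5],
         [ 8, 25, 7],
         [ 4,  6, 20]]
print(round(cramers_v(table), 3), round(goodman_kruskal_lambda(table), 3))
```

The first index inherits its meaning from the chi-square test of independence, while the second answers a concrete predictive question, which is the kind of operational interpretation the paper argues measures of association should have.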

2,672 citations