Journal ArticleDOI

Large sample standard errors of kappa and weighted kappa.

TLDR
The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales; weighted kappa additionally incorporates the relative seriousness of the different possible disagreements.
Abstract
The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all disagreements may be considered equally serious, and weighted kappa is appropriate when the relative seriousness of the different possible disagreements can be specified. The papers describing these two statistics also present expressions for their standard errors. These expressions are incorrect, having been derived from the contradictory assumptions of fixed marginal totals and binomial variation of cell frequencies. Everitt (1968) derived the exact variances of weighted and unweighted kappa when the parameters are zero by assuming a generalized hypergeometric distribution. He found these expressions to be far too complicated for routine use, and offered, as alternatives, expressions derived by assuming binomial distributions. These alternative expressions are incorrect, essentially for the same reason as above. Assume that N subjects are distributed into k² cells by each of them being assigned to one of k categories by one rater and, independently, to one of the same k categories by a second rater.
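The point estimates the abstract refers to can be sketched as follows. This is a minimal illustration of kappa and weighted kappa computed from a k × k contingency table of the two raters' assignments; the function names are illustrative, the weight matrix defaults to linear disagreement weights as one common choice, and the paper's corrected standard-error expressions are not reproduced here.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's (1960) kappa from a k x k contingency table of counts."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()                                # cell proportions
    po = np.trace(p)                            # observed agreement
    pe = p.sum(axis=1) @ p.sum(axis=0)          # chance agreement from the marginals
    return (po - pe) / (1 - pe)

def weighted_kappa(table, weights=None):
    """Cohen's (1968) weighted kappa; defaults to linear disagreement weights."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()
    k = p.shape[0]
    if weights is None:
        i, j = np.indices((k, k))
        weights = np.abs(i - j) / (k - 1)       # 0 on the diagonal, 1 at maximal disagreement
    e = np.outer(p.sum(axis=1), p.sum(axis=0))  # expected cell proportions under independence
    return 1 - (weights * p).sum() / (weights * e).sum()
```

For a 2 × 2 table the two coefficients coincide, since there is only one kind of disagreement; with k > 2 categories and graded weights they generally differ.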


Citations
Journal ArticleDOI

The measurement of observer agreement for categorical data

TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are given in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
Journal ArticleDOI

The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability.

TL;DR: The use of kappa implicitly assumes that all disagreements are equally serious; weighted kappa, by contrast, allows the relative seriousness of each possible disagreement to be specified.
Journal ArticleDOI

Mixing patterns in networks.

TL;DR: This work proposes a number of measures of assortative mixing appropriate to the various mixing types, and applies them to a variety of real-world networks, showing that assortative mixing is a pervasive phenomenon found in many networks.
Journal ArticleDOI

An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers.

TL;DR: A subset of observers who demonstrate a high level of interobserver agreement can be identified by using pairwise agreement statistics between each observer and the internal majority standard opinion on each subject.
References
Journal ArticleDOI

A Coefficient of agreement for nominal Scales

TL;DR: In this article, the authors present a procedure for having two or more judges independently categorize a sample of units and for determining the degree and significance of their agreement, though they do not discuss the extent to which these judgments are reproducible, i.e., reliable.
Book

Linear statistical inference and its applications

TL;DR: Algebra of Vectors and Matrices, Probability Theory, Tools and Techniques, and Continuous Probability Models.
Journal ArticleDOI

Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit.

TL;DR: The Kw provides for the incorporation of ratio-scaled degrees of disagreement (or agreement) to each of the cells of the k × k table of joint assignments.
Journal ArticleDOI

Linear Statistical Inference and its Applications

J. Aitchison, +1 more
- 01 Dec 1966 - 
Journal ArticleDOI

Moments of the statistics kappa and weighted kappa

TL;DR: In this article, the authors derive the mean and variance of the two statistics, kappa and weighted kappa, which are used in measuring agreement between two raters who independently allocate a sample of subjects to a prearranged set of categories.