Journal ArticleDOI
Kappa coefficients in medical research
TL;DR: Kappa coefficients are measures of correlation between categorical variables, often used as reliability or validity coefficients; the development and definitions of the K (categories) by M (ratings) kappas (K x M) are recapitulated, and the use of the recommended kappas is illustrated with applications in medical research.
Abstract: Kappa coefficients are measures of correlation between categorical variables often used as reliability or validity coefficients. We recapitulate the development and definitions of the K (categories) by M (ratings) kappas (K x M), discuss what they are well or ill designed to do, and summarize where kappas now stand with regard to their application in medical research. The 2 x M (M >= 2) intraclass kappa seems the ideal measure of binary reliability; a 2 x 2 weighted kappa is an excellent choice, though not a unique one, as a validity measure. For both the intraclass and weighted kappas, we address continuing problems. There are serious problems with using the K x M intraclass kappa (K > 2) or the various K x M weighted kappas for K > 2 or M > 2 in any context, either because they convey incomplete and possibly misleading information, or because other approaches are preferable to their use. We illustrate the use of the recommended kappas with applications in medical research.
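As a minimal illustration of the coefficients the abstract discusses, the sketch below computes an unweighted Cohen's kappa and a weighted kappa from a contingency table. The table values are invented for illustration; for a 2 x 2 table with absolute-difference disagreement weights, the weighted and unweighted coefficients coincide.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa from a K x K contingency table (list of lists).

    Rows index rater 1's categories, columns rater 2's categories.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from the marginals.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n                # observed agreement
    row = [sum(table[i]) / n for i in range(k)]                 # rater 1 marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater 2 marginals
    p_e = sum(row[i] * col[i] for i in range(k))                # chance agreement
    return (p_o - p_e) / (1 - p_e)

def weighted_kappa(table, weight):
    """Weighted kappa: 1 - (sum of weighted observed disagreement) /
    (sum of weighted chance disagreement), where weight(i, j) is the
    penalty for classifying into cell (i, j) (0 on the diagonal)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    obs = sum(weight(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - obs / exp

# Hypothetical 2 x 2 table of two raters' binary judgments (values invented):
table = [[40, 10], [20, 30]]
print(round(cohen_kappa(table), 3))                             # -> 0.4
print(round(weighted_kappa(table, lambda i, j: abs(i - j)), 3)) # -> 0.4
```

The two results agree here because, in the 2 x 2 case with linear weights, weighted kappa reduces to the unweighted coefficient; the distinction matters only for K > 2.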
Citations
Journal ArticleDOI
The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements
Julius Sim, Chris Wright
TL;DR: The issue of statistical testing of kappa is considered, including the use of confidence intervals, and appropriate sample sizes for reliability studies using kappa are tabulated.
Journal ArticleDOI
The NimStim set of facial expressions: Judgments from untrained research participants
Nim Tottenham, James W. Tanaka, Andrew C. Leon, Thomas McCarry, Marcella Nurse, Todd A. Hare, David J. Marcus, Alissa Westerlund, B. J. Casey, Charles A. Nelson
TL;DR: The results lend empirical support for the validity and reliability of this set of facial expressions as determined by accurate identification of expressions and high intra-participant agreement across two testing sessions, respectively.
Journal ArticleDOI
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed.
Jan Kottner, Laurent Audigé, Stig Brorson, Allan Donner, Byron J. Gajewski, Asbjørn Hróbjartsson, Chris Roberts, Mohamed Shoukri, David L. Streiner
TL;DR: In this paper, the authors developed guidelines for reporting interrater and intrarater reliability and agreement studies, and proposed 15 issues that should be addressed when reporting such studies.
Book
Bayesian Cognitive Modeling: A Practical Course
TL;DR: In this book, the basics of Bayesian analysis are discussed and a practical, WinBUGS-based approach to getting started is presented, illustrated with case studies such as the SIMPLE model of memory.
Journal ArticleDOI
Clinical classification schemes for predicting hemorrhage: Results from the National Registry of Atrial Fibrillation (NRAF)
Brian F. Gage,Yan Yan,Paul E. Milligan,Amy D. Waterman,Robert Culverhouse,Michael W. Rich,Martha J. Radford +6 more
TL;DR: In this article, a new bleeding risk scheme, HEMORR2HAGES, was proposed to quantify the risk of hemorrhage in elderly patients with atrial fibrillation.
References
Journal ArticleDOI
The measurement of observer agreement for categorical data
J. R. Landis, Gary G. Koch
TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented and tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interob server agreement are developed as generalized kappa-type statistics.
Journal ArticleDOI
A Coefficient of agreement for nominal Scales
TL;DR: In this article, the author presents a procedure in which two or more judges independently categorize a sample of units and the degree and significance of their agreement are determined; such procedures raise the question of the extent to which the judgments are reproducible, i.e., reliable.
Journal ArticleDOI
Intraclass correlations: uses in assessing rater reliability.
TL;DR: In this article, the authors present guidelines for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges, and review the confidence intervals for each form.
Book
Statistical methods for rates and proportions
TL;DR: In this book, the basic theory of maximum likelihood estimation (MLE) is applied to inference for rates and proportions, including tests for a single proportion and for the difference between two proportions.
Journal ArticleDOI
Bootstrap Methods: Another Look at the Jackknife
TL;DR: In this article, the authors discuss the problem of estimating the sampling distribution of a pre-specified random variable R(X, F) on the basis of the observed data x.