Journal ArticleDOI
Kappa coefficients in medical research
TL;DR: Kappa coefficients are measures of correlation between categorical variables, often used as reliability or validity coefficients; the development and definitions of the K (categories) by M (ratings) kappas (K x M) are recapitulated, and the use of the recommended kappas is illustrated with applications in medical research.
Abstract: Kappa coefficients are measures of correlation between categorical variables often used as reliability or validity coefficients. We recapitulate the development and definitions of the K (categories) by M (ratings) kappas (K x M), discuss what they are well or ill designed to do, and summarize where kappas now stand with regard to their application in medical research. The 2 x M (M >= 2) intraclass kappa seems the ideal measure of binary reliability; a 2 x 2 weighted kappa is an excellent choice, though not a unique one, as a validity measure. For both the intraclass and weighted kappas, we address continuing problems. There are serious problems with using the K x M intraclass kappa (K > 2) or the various K x M weighted kappas for K > 2 or M > 2 in any context, either because they convey incomplete and possibly misleading information, or because other approaches are preferable to their use. We illustrate the use of the recommended kappas with applications in medical research.
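As a minimal illustration of the coefficients the abstract discusses, the sketch below computes an unweighted Cohen's kappa and a weighted kappa from a contingency table. The table values are invented for illustration; for a 2 x 2 table with absolute-difference disagreement weights, the weighted and unweighted coefficients coincide.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa from a K x K contingency table (list of lists).

    Rows index rater 1's categories, columns rater 2's categories.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from the marginals.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n                # observed agreement
    row = [sum(table[i]) / n for i in range(k)]                 # rater 1 marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater 2 marginals
    p_e = sum(row[i] * col[i] for i in range(k))                # chance agreement
    return (p_o - p_e) / (1 - p_e)

def weighted_kappa(table, weight):
    """Weighted kappa: 1 - (sum of weighted observed disagreement) /
    (sum of weighted chance disagreement), where weight(i, j) is the
    penalty for classifying into cell (i, j) (0 on the diagonal)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    obs = sum(weight(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - obs / exp

# Hypothetical 2 x 2 table of two raters' binary judgments (values invented):
table = [[40, 10], [20, 30]]
print(round(cohen_kappa(table), 3))                             # -> 0.4
print(round(weighted_kappa(table, lambda i, j: abs(i - j)), 3)) # -> 0.4
```

The two results agree here because, in the 2 x 2 case with linear weights, weighted kappa reduces to the unweighted coefficient; the distinction matters only for K > 2.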
Citations
Journal ArticleDOI
The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements
Julius Sim, Chris Wright
TL;DR: The issue of statistical testing of kappa is considered, including the use of confidence intervals, and appropriate sample sizes for reliability studies using kappa are tabulated.
Journal ArticleDOI
The NimStim set of facial expressions: Judgments from untrained research participants
Nim Tottenham, James W. Tanaka, Andrew C. Leon, Thomas McCarry, Marcella Nurse, Todd A. Hare, David J. Marcus, Alissa Westerlund, B. J. Casey, Charles A. Nelson
TL;DR: The results lend empirical support for the validity and reliability of this set of facial expressions as determined by accurate identification of expressions and high intra-participant agreement across two testing sessions, respectively.
Journal ArticleDOI
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed.
Jan Kottner, Laurent Audigé, Stig Brorson, Allan Donner, Byron J. Gajewski, Asbjørn Hróbjartsson, Chris Roberts, Mohamed Shoukri, David L. Streiner
TL;DR: In this paper, the authors developed guidelines for reporting interrater and intrarater reliability and agreement studies, and proposed 15 issues that should be addressed when reporting such studies.
Book
Bayesian Cognitive Modeling: A Practical Course
TL;DR: In this book, the basics of Bayesian analysis are discussed and a practical, WinBUGS-based approach to getting started is presented, illustrated with case studies such as the SIMPLE model of memory.
Journal ArticleDOI
Clinical classification schemes for predicting hemorrhage: Results from the National Registry of Atrial Fibrillation (NRAF)
Brian F. Gage,Yan Yan,Paul E. Milligan,Amy D. Waterman,Robert Culverhouse,Michael W. Rich,Martha J. Radford +6 more
TL;DR: In this article, a new bleeding risk scheme, HEMORR2HAGES, was proposed to quantify the risk of hemorrhage in elderly patients with atrial fibrillation.
References
Journal ArticleDOI
The measurement of observer agreement for categorical data
J. R. Landis, Gary G. Koch
TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented and tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interob server agreement are developed as generalized kappa-type statistics.
Journal ArticleDOI
A Coefficient of agreement for nominal Scales
TL;DR: In this article, the author presents a procedure in which two or more judges independently categorize a sample of units and the degree and significance of their agreement are determined; such procedures raise the question of the extent to which the judgments are reproducible, i.e., reliable.
Journal ArticleDOI
Intraclass correlations: uses in assessing rater reliability.
TL;DR: In this article, the authors present guidelines for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges, and review the confidence intervals for each form.
Book
Statistical methods for rates and proportions
TL;DR: In this book, the basic theory of maximum likelihood estimation (MLE) is applied to inference for rates and proportions, including tests for a single proportion and for the difference between two proportions.
Journal ArticleDOI
Bootstrap Methods: Another Look at the Jackknife
TL;DR: In this article, the authors discuss the problem of estimating the sampling distribution of a pre-specified random variable R(X, F) on the basis of the observed data x.