Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability
Citations
3,381 citations
Cites background from "Content Analysis in Mass Communicat..."
...This complexity, combined with the lack of consensus among communication researchers on which measures are appropriate, led Lombard, Snyder-Duch, and Bracken (2002, 2004) to call for a reliability standard that can span the variable nature of available data....
[...]
2,101 citations
Cites background or methods from "Content Analysis in Mass Communicat..."
...…(2002, p. 163) who merely quotes a concern expressed elsewhere about the appropriateness of using different coders for coding different but overlapping sets of units, Lombard et al. (2002) make it a point of recommending against this attractive possibility (p. 602) – without justification, however....
[...]
...In a recent paper published in a special issue of Human Communication Research devoted to methodological topics (Vol. 28, No. 4), Lombard, Snyder-Duch, and Bracken (2002) presented their findings of how reliability was treated in 200 content analyses indexed in Communication Abstracts between 1994…...
[...]
...In a recent article published in this journal, Lombard, Snyder-Duch, and Bracken (2002) surveyed 200 content analyses for their reporting of reliability tests; compared the virtues and drawbacks of five popular reliability measures; and proposed guidelines and standards for their use....
[...]
...This highly undesirable property benefits coders who disagree on these margins over those who agree, and it clearly contradicts what its proponents (Cohen, 1960; Fleiss, 1975) argued and what Lombard et al. (2002) have found to be the dominant opinion in the literature....
[...]
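The property criticized in the excerpt above concerns how kappa estimates chance agreement from each coder's own marginal distribution. The following is a minimal two-coder sketch (illustrative data and function names, not taken from the article) showing where the marginals enter the calculation:

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders over the same units (nominal data).

    Chance agreement is built from each coder's *own* marginal distribution,
    which is the property criticized in the excerpt above: for the same
    observed agreement, coders whose marginals diverge get a smaller chance
    term and therefore a larger kappa.
    """
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    pa, pb = Counter(codes_a), Counter(codes_b)
    expected = sum((pa[c] / n) * (pb[c] / n) for c in set(pa) | set(pb))
    return (observed - expected) / (1 - expected)

# Hypothetical coders labelling 10 units into categories "x"/"y".
coder1 = ["x", "x", "x", "x", "x", "x", "y", "y", "y", "y"]
coder2 = ["x", "x", "x", "x", "y", "y", "y", "y", "y", "y"]
print(round(cohens_kappa(coder1, coder2), 3))  # 0.615
```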
...As already mentioned, Lombard et al. (2002) applied the following criterion for accepting content analysis findings as sufficiently reliable: ≥ .70, otherwise %-agreement ≥ .90 (p. 596)....
[...]
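The criterion quoted above can be read as a two-tier decision rule. A hedged sketch follows; the helper is hypothetical and only the two cutoffs come from the excerpt:

```python
def acceptably_reliable(alpha=None, percent_agreement=None):
    """Two-tier acceptance rule paraphrasing the excerpt above: accept if a
    chance-corrected coefficient reaches .70; otherwise fall back on raw
    percent agreement and require at least .90. Hypothetical helper."""
    if alpha is not None:
        return alpha >= 0.70
    if percent_agreement is not None:
        return percent_agreement >= 0.90
    raise ValueError("supply alpha or percent_agreement")

print(acceptably_reliable(alpha=0.72))              # True
print(acceptably_reliable(percent_agreement=0.85))  # False: below the .90 bar
```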
1,668 citations
1,058 citations
Cites background from "Content Analysis in Mass Communicat..."
...Next, human coders are unleashed on the data, and the numerical estimates for each document are compared across coders (Lombard et al., 2006; Artstein and Poesio, 2008)....
[...]
934 citations
Cites background or methods from "Content Analysis in Mass Communicat..."
...It can accommodate any number of coders, but it has a major weakness: it fails to account for agreement by chance (Lombard et al., 2002; Neuendorf, 2002)....
[...]
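Percent agreement extends to any number of coders, for instance by averaging over coder pairs, but nothing in the calculation corrects for chance. A minimal sketch with hypothetical data:

```python
from itertools import combinations

def pairwise_percent_agreement(codes_by_coder):
    """Average pairwise percent agreement for any number of coders.

    `codes_by_coder` holds one equally long list of codes per coder. Note
    that no term corrects for agreement expected by chance, which is the
    weakness noted in the excerpt above.
    """
    per_pair = [
        sum(a == b for a, b in zip(c1, c2)) / len(c1)
        for c1, c2 in combinations(codes_by_coder, 2)
    ]
    return sum(per_pair) / len(per_pair)

codes = [
    ["a", "a", "b", "c", "b"],  # coder 1
    ["a", "b", "b", "c", "b"],  # coder 2
    ["a", "a", "b", "c", "c"],  # coder 3
]
print(round(pairwise_percent_agreement(codes), 3))  # 0.733
```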
...Percent agreement is considered an overly liberal index by some researchers, and the indices which do account for chance agreement, such as Krippendorff's alpha, are considered overly conservative and often too restrictive (Lombard et al., 2002; Rourke et al., 2001)....
[...]
...When it is calculated across a set of variables, it is not considered a good measure because it can veil variables with unacceptably low levels of reliability (Lombard et al., 2002)....
[...]
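One way to see why a single figure computed across variables misleads is to report reliability per variable alongside the pooled average. A small sketch with hypothetical two-coder data:

```python
def percent_agreement(codes_a, codes_b):
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)

# Hypothetical two-coder codes for three variables on the same 10 units.
data = {
    "source_type": (["g"] * 10, ["g"] * 9 + ["n"]),     # 0.9
    "tone":        (["+", "-"] * 5, ["+", "-"] * 5),     # 1.0
    "frame":       (["a"] * 10, ["a"] * 4 + ["b"] * 6),  # 0.4
}

per_variable = {v: percent_agreement(a, b) for v, (a, b) in data.items()}
pooled = sum(per_variable.values()) / len(per_variable)

print(per_variable)      # {'source_type': 0.9, 'tone': 1.0, 'frame': 0.4}
print(round(pooled, 2))  # 0.77: looks respectable but veils 'frame'
```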
...Following Lombard and colleagues (Lombard et al., 2002), the “biggest drawback to its use has been its complexity and the resulting difficulty of by hand calculations, especially for interval and ratio level variables”....
[...]
...Krippendorff's alpha takes into account the magnitude of the misses, adjusting for whether the variable is measured as nominal, ordinal, interval, or ratio (Krippendorff, 1980; Lombard et al., 2002; Neuendorf, 2002)....
[...]
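The sensitivity to the magnitude of misses comes from the difference (delta) function that Krippendorff's alpha uses, which changes with the level of measurement. Below is a simplified two-coder, no-missing-data sketch of the alpha = 1 − Do/De form with hypothetical ratings; it is not the full algorithm, which also handles many coders and missing values:

```python
def nominal_delta(a, b):
    """All mismatches count equally."""
    return 0.0 if a == b else 1.0

def interval_delta(a, b):
    """Mismatches count by squared distance, so near-misses weigh less."""
    return (a - b) ** 2

def krippendorff_alpha(coder1, coder2, delta):
    """Simplified Krippendorff's alpha for two coders with no missing data:
    alpha = 1 - D_observed / D_expected, parameterized by a delta function."""
    n = len(coder1)
    # Observed disagreement: average mismatch within each unit's pair of values.
    d_obs = sum(delta(a, b) for a, b in zip(coder1, coder2)) / n
    # Expected disagreement: average mismatch over all pairs of values pooled
    # across coders and units (drawn without replacement).
    pooled = list(coder1) + list(coder2)
    m = len(pooled)
    d_exp = sum(delta(pooled[i], pooled[j])
                for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    return 1.0 - d_obs / d_exp

ratings_1 = [1, 2, 3, 3, 2, 1, 4, 1, 2, 5]
ratings_2 = [1, 2, 3, 3, 2, 2, 4, 1, 2, 5]
print(round(krippendorff_alpha(ratings_1, ratings_2, nominal_delta), 3))
print(round(krippendorff_alpha(ratings_1, ratings_2, interval_delta), 3))
```

Swapping nominal_delta for interval_delta (or an ordinal or ratio variant) is all that changes between levels of measurement.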
References
34,965 citations
"Content Analysis in Mass Communicat..." refers background or methods in this paper
...Percent agreement, Scott’s pi, Cohen’s kappa, and Krippendorff’s alpha were all used to assess intercoder reliability for each variable coded. A beta version of the software package PRAM (Program for Reliability Assessment with Multiple-coders, Skymeg Software, 2002) was used to calculate the first three of these. A beta version of a separate program, Krippendorff’s Alpha 3.12, was used to calculate the fourth. Holsti’s (1969) method was not calculated because, in the case of two coders who evaluate the same reliability sample, the results are identical to those for percent agreement....
[...]
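Holsti's coefficient is CR = 2M / (N1 + N2), where M is the number of decisions the two coders agree on and N1 and N2 are the numbers of decisions each coder made. When both coders evaluate the same reliability sample, N1 = N2 = N and the formula collapses to M / N, i.e. plain percent agreement, which is why it was not computed separately in the study quoted above. A short sketch with hypothetical counts:

```python
def holsti_cr(agreements, n_coder1, n_coder2):
    """Holsti's coefficient of reliability: CR = 2M / (N1 + N2)."""
    return 2 * agreements / (n_coder1 + n_coder2)

# Same reliability sample for both coders: N1 == N2 == 50, M == 45,
# so CR = 90 / 100 = 0.9, identical to percent agreement (45 / 50).
print(holsti_cr(agreements=45, n_coder1=50, n_coder2=50))  # 0.9
```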
...is the simple percentage of agreement”; they call Cohen’s kappa “the most widely used measure of interjudge reliability across the behavioral science literature” (p. 137). Hughes and Garrett (1990) coded 68 articles in Journal of Marketing Research, Journal of Marketing, and Journal of Consumer Research during 1984– 1987 that contained reports of intercoder reliability and found 65% used percent agreement. Kolbe and Burnett (1991) coded 128 articles from consumer behavior research in 28 journals, three proceedings and one anthology between 1978 and 1989. Most of the authors were in marketing departments (only 12.2% were from communication, advertising, and journalism schools or departments). Percent agreement was reported in 32% of the studies, followed by Krippendorff’s alpha (7%), and Holsti’s method (4%); often the calculation method wasn’t specified, and in 31% of the articles no reliability was reported. Also, 36% of the studies reported only an overall reliability, which can hide variables with unacceptably low agreement. Consistent with these findings, Kang et al. (1993) reviewed the 22 articles published in the Journal of Advertising between 1981 and 1990 that employed content analysis and found that 78% “used percentage agreement or some other inappropriate measure” (p. 18). Pasadeos, Huhman, Standley, and Wilson (1995) coded 163 content analyses of news-media messages in four journals (Journalism & Mass Communication Quarterly, Newspaper Research Journal, Journal of Broadcasting and Electronic Media, and Journal of Communication) for the 6-year period of 1988–1993. They wrote that “we were not able to ascertain who specifically had done the coding in approximately 55% of the studies; a similar number had not reported on whether coding was done independently or by consensus; and more than 80% made no mention of coder training” (p. 8). In their study 51% of the articles did not address reliability at all, 31% used percent agreement, 10% used Scott’s pi, and 6% used Holsti’s method. Only 19% gave reliability figures for all variables while 20% gave only an overall figure. In a study of content analyses published in Journalism & Mass Communication Quarterly between 1971 and 1995, Riffe and Freitag (1997) found that out of 486 articles, only 56% reported intercoder reliability and of those most only reported an overall figure, while only 10% “explicitly specified random sampling in reliability tests” (p....
[...]
25,749 citations
Additional excerpts
...Again, there are no established standards, but Neuendorf (2002) reviews “rules of thumb” set out by several methodologists (including Banerjee, Capozzoli, McSweeney, & Sinha, 1999; Ellis, 1994; Frey, Botan, & Kreps, 2000; Krippendorff, 1980; Popping, 1988; and Riffe, Lacy, & Fico, 1998) and concludes that “coefficients of .90 or greater would be acceptable to all,…...
[...]
7,877 citations
"Content Analysis in Mass Communicat..." refers background or methods in this paper
...), but this technique has also been questioned (Neuendorf, 2002). With the coding data in hand, the researcher calculates and reports one or more indices of reliability. Popping (1988) identified 39 different “agreement indices” for coding nominal categories, which excludes several techniques for ratio and interval level data, but only a handful of techniques are widely used....
[...]
...The result is often calculated not for a single variable but across a set of variables, a very poor practice which can hide variables with unacceptably low levels of reliability (Kolbe & Burnett, 1991; Neuendorf, 2002)....
[...]
...This index also does not account for differences in how the individual coders distribute their values across the coding categories, a potential source of systematic bias; that is, it assumes the coders have distributed their values across the categories identically and if this is not the case, the formula fails to account for the reduced agreement (Craig, 1981; Hughes & Garrett, 1990; Neuendorf, 2002)....
[...]
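The assumption described above characterizes Scott's pi, whose chance term is computed from a single distribution pooled across both coders. A minimal sketch with hypothetical data:

```python
from collections import Counter

def scotts_pi(codes_a, codes_b):
    """Scott's pi for two coders (nominal data).

    Chance agreement uses one distribution *pooled* over both coders, so the
    index behaves as if the coders used the categories with identical
    frequencies; a systematic difference between their actual marginals is
    not reflected in the chance term, per the excerpt above.
    """
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    pooled = Counter(codes_a) + Counter(codes_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (observed - expected) / (1 - expected)

coder1 = ["x", "x", "x", "x", "x", "x", "y", "y", "y", "y"]
coder2 = ["x", "x", "x", "x", "y", "y", "y", "y", "y", "y"]
print(round(scotts_pi(coder1, coder2), 3))  # 0.6
```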
...In some cases the coders evaluate different but overlapping units (e.g., coder 1 codes units 1–20, coder 2 codes units 11–30, etc.), but this technique has also been questioned (Neuendorf, 2002)....
[...]
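A design with different but overlapping units amounts to a units-by-coders matrix with missing cells; only coefficients that tolerate missing data (Krippendorff's alpha among them) can use the rows judged by a single coder. A hypothetical layout:

```python
# Hypothetical reliability data layout for the overlapping design in the
# excerpt above: coder 1 codes units 1-20, coder 2 codes units 11-30.
# None marks "not coded by this coder"; the code values are placeholders.
matrix = {
    u: {
        "coder1": f"code_{u}" if u <= 20 else None,
        "coder2": f"code_{u}" if u >= 11 else None,
    }
    for u in range(1, 31)
}

pairable = [u for u, row in matrix.items() if None not in row.values()]
print(pairable)  # only units 11-20 carry a judgement from both coders
```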
7,604 citations
"Content Analysis in Mass Communicat..." refers methods in this paper
...Cohen (1968) proposed a weighted kappa to account for different types of disagreements; however, as with the other indices discussed so far, this measure is generally used only for nominal level variables....
[...]
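Weighted kappa lets the analyst supply a disagreement weight for every pair of categories so near-misses count less than distant ones. A simplified two-coder sketch with hypothetical ordered categories and linear weights:

```python
from collections import Counter

def weighted_kappa(codes_a, codes_b, categories, weight):
    """Cohen's weighted kappa for two coders (simplified sketch).

    `weight(c, d)` is a disagreement weight in [0, 1] for the pair of
    categories c and d: 0 for full agreement, 1 for the worst disagreement.
    """
    n = len(codes_a)
    pa, pb = Counter(codes_a), Counter(codes_b)
    # Observed weighted disagreement.
    d_obs = sum(weight(a, b) for a, b in zip(codes_a, codes_b)) / n
    # Weighted disagreement expected from the two coders' marginals.
    d_exp = sum((pa[c] / n) * (pb[d] / n) * weight(c, d)
                for c in categories for d in categories)
    return 1.0 - d_obs / d_exp

cats = [1, 2, 3, 4]  # ordered categories

def linear_weight(c, d):
    return abs(c - d) / (len(cats) - 1)

coder1 = [1, 2, 2, 3, 4, 4, 1, 3, 2, 4]
coder2 = [1, 2, 3, 3, 4, 3, 1, 3, 2, 4]
print(round(weighted_kappa(coder1, coder2, cats, linear_weight), 3))
```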
7,318 citations
"Content Analysis in Mass Communicat..." refers background in this paper
...The index has been adapted for multiple coders and cases in which different coders evaluate different units (Fleiss, 1971)....
[...]
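Fleiss' (1971) adaptation keeps the chance-corrected logic but requires only that each unit receive the same number of ratings, not that the same individuals rate every unit. A minimal sketch with hypothetical ratings:

```python
from collections import Counter

def fleiss_kappa(ratings_per_unit):
    """Fleiss' kappa: `ratings_per_unit` is a list of lists, each holding the
    category labels assigned to one unit (same number of ratings per unit;
    the raters need not be the same individuals across units)."""
    n_units = len(ratings_per_unit)
    m = len(ratings_per_unit[0])  # ratings per unit
    totals = Counter(c for unit in ratings_per_unit for c in unit)
    p_cat = {c: count / (n_units * m) for c, count in totals.items()}

    def unit_agreement(unit):
        # Proportion of agreeing rater pairs within one unit.
        counts = Counter(unit)
        return sum(k * (k - 1) for k in counts.values()) / (m * (m - 1))

    p_obs = sum(unit_agreement(u) for u in ratings_per_unit) / n_units
    p_exp = sum(p * p for p in p_cat.values())
    return (p_obs - p_exp) / (1 - p_exp)

ratings = [
    ["a", "a", "b"],
    ["a", "a", "a"],
    ["b", "b", "a"],
    ["b", "b", "b"],
    ["a", "b", "b"],
]
print(round(fleiss_kappa(ratings), 3))  # approx. 0.196
```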