Journal ArticleDOI

Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability

01 Oct 2002 - Human Communication Research (Blackwell Publishing Ltd) - Vol. 28, Iss. 4, pp. 587-604
TL;DR: A content analysis of 200 studies utilizing content analysis published in the communication literature between 1994 and 1998 is used to characterize practices in the field and demonstrate that mass communication researchers often fail to assess (or at least report) intercoder reliability and often rely on percent agreement, an overly liberal index.
Abstract: As a method specifically intended for the study of messages, content analysis is fundamental to mass communication research. Intercoder reliability, more specifically termed intercoder agreement, is a measure of the extent to which independent judges make the same coding decisions in evaluating the characteristics of messages, and is at the heart of this method. Yet there are few standard and accessible guidelines available regarding the appropriate procedures to use to assess and report intercoder reliability, or software tools to calculate it. As a result, it seems likely that there is little consistency in how this critical element of content analysis is assessed and reported in published mass communication studies. Following a review of relevant concepts, indices, and tools, a content analysis of 200 studies utilizing content analysis published in the communication literature between 1994 and 1998 is used to characterize practices in the field. The results demonstrate that mass communication researchers often fail to assess (or at least report) intercoder reliability and often rely on percent agreement, an overly liberal index. Based on the review and these results, concrete guidelines are offered regarding procedures for assessment and reporting of this important aspect of content analysis.

Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability

MATTHEW LOMBARD
JENNIFER SNYDER-DUCH
CHERYL CAMPANELLA BRACKEN

As a method specifically intended for the study of messages, content analysis is fundamental to mass communication research. Intercoder reliability, more specifically termed intercoder agreement, is a measure of the extent to which independent judges make the same coding decisions in evaluating the characteristics of messages, and is at the heart of this method. Yet there are few standard and accessible guidelines available regarding the appropriate procedures to use to assess and report intercoder reliability, or software tools to calculate it. As a result, it seems likely that there is little consistency in how this critical element of content analysis is assessed and reported in published mass communication studies. Following a review of relevant concepts, indices, and tools, a content analysis of 200 studies utilizing content analysis published in the communication literature between 1994 and 1998 is used to characterize practices in the field. The results demonstrate that mass communication researchers often fail to assess (or at least report) intercoder reliability and often rely on percent agreement, an overly liberal index. Based on the review and these results, concrete guidelines are offered regarding procedures for assessment and reporting of this important aspect of content analysis.
The study of communication is interdisciplinary, sharing topics, literatures, expertise, and research methods with many academic fields and disciplines. But one method, content analysis, is specifically appropriate and necessary for (arguably) the central work of communication scholars, in particular those who study mass communication: the analysis of messages. Given that content analysis is fundamental to communication research (and thus theory), it would be logical to expect researchers in communication to be among the most, if not the most, proficient and rigorous in their use of this method.

Intercoder reliability (more specifically "intercoder agreement"; Tinsley & Weiss, 1975, 2000) is "near the heart of content analysis; if the coding is not reliable, the analysis cannot be trusted" (Singletary, 1993, p. 294). However, there are few standards or guidelines available concerning how to properly calculate and report intercoder reliability. Further, although a handful of tools are available to implement the sometimes complex formulae required, information about them is often difficult to find and they are often difficult to use. It therefore seems likely that many studies fail to adequately establish and report this critical component of the content analysis method.
This article reviews the importance of intercoder agreement for content analysis in mass communication research. It first describes several indices for calculating this type of reliability (varying in appropriateness, complexity, and apparent prevalence of use), and then presents a content analysis of content analyses reported in communication journals to establish how mass communication researchers have assessed and reported reliability, demonstrating the importance of the choices they make concerning it. The article concludes with a presentation of guidelines and recommendations for the calculation and reporting of intercoder reliability.
CONTENT ANALYSIS AND THE IMPORTANCE OF INTERCODER RELIABILITY
Berelson's (1952) often cited definition of content analysis as "a research technique for the objective, systematic, and quantitative description of the manifest content of communication" (p. 18) makes clear the technique's unique appropriateness for researchers in our field. This is reinforced by Kolbe and Burnett's (1991) definition, which states that content analysis is "an observational research method that is used to systematically evaluate the symbolic content of all forms of recorded communication. These communications can also be analyzed at many levels (image, word, roles, etc.), thereby creating a realm of research opportunities" (p. 243).
While content analysis can be applied to any message, the method is often used in research on mass mediated communication. Riffe and Freitag (1997) note several studies that demonstrate the widespread and increasing use of content analysis in communication. The method has been well represented in graduate research methods courses, theses, dissertations, and journals. In their own study they report a statistically significant trend over 25 years (1971-1995) in the percentage of full research reports in Journalism & Mass Communication Quarterly that feature this method, and they note that improved access to media content through databases and archives, along with new tools for computerized content analysis, suggests the trend is likely to continue.

Intercoder reliability is the widely used term for the extent to which independent coders evaluate a characteristic of a message or artifact and reach the same conclusion. Although this term is appropriate and will be used here, Tinsley and Weiss (1975, 2000) note that the more specific term for the type of consistency required in content analysis is intercoder (or interrater) agreement. They write that while reliability could be based on correlational (or analysis of variance) indices that assess the degree to which "ratings of different judges are the same when expressed as deviations from their means," intercoder agreement is needed in content analysis because it measures only "the extent to which the different judges tend to assign exactly the same rating to each object" (Tinsley & Weiss, 2000, p. 98).¹
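
To make the distinction concrete, the following minimal sketch (not drawn from the article; the data are invented for illustration) shows two coders whose ratings are perfectly correlated yet never identical: a correlational index would report perfect reliability even though the coders never assign exactly the same rating.

```python
# Illustration of agreement vs. correlation: coder B's ratings track coder A's
# perfectly but are always one scale point higher, so the coders never agree
# exactly even though their ratings are perfectly correlated.
from statistics import correlation  # Pearson's r; available in Python 3.10+

coder_a = [1, 2, 3, 4, 5]
coder_b = [2, 3, 4, 5, 6]  # same pattern as coder_a, shifted up by one point

# Exact agreement: the proportion of units given the very same rating.
exact_agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

print(correlation(coder_a, coder_b))  # 1.0 -> "reliable" by a correlational index
print(exact_agreement)                # 0.0 -> no intercoder agreement at all
```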
It is widely acknowledged that intercoder reliability is a critical component of content analysis and (although it does not ensure validity) when it is not established, the data and interpretations of the data can never be considered valid. As Neuendorf (2002) notes, "given that a goal of content analysis is to identify and record relatively objective (or at least intersubjective) characteristics of messages, reliability is paramount. Without the establishment of reliability, content analysis measures are useless" (p. 141). Kolbe and Burnett (1991) write that "interjudge reliability is often perceived as the standard measure of research quality. High levels of disagreement among judges suggest weaknesses in research methods, including the possibility of poor operational definitions, categories, and judge training" (p. 248).
A distinction is often made between the coding of the manifest content, information "on the surface," and the latent content beneath these surface elements. Potter and Levine-Donnerstein (1999) note that for latent content the coders must provide subjective interpretations based on their own mental schema and that this "only increases the importance of making the case that the judgments of coders are intersubjective, that is, those judgments, while subjectively derived, are shared across coders, and the meaning therefore is also likely to reach out to readers of the research" (p. 266).
There are important practical reasons to establish intercoder reliability as well. Neuendorf (2002) argues that, in addition to being a necessary (although not sufficient) step in validating a coding scheme, establishing a high level of reliability also has the practical benefit of allowing the researcher to divide the coding work among many different coders. Rust and Cooil (1994) note that intercoder reliability is important to marketing researchers in part because "high reliability makes it less likely that bad managerial decisions will result from using the data" (p. 11). Potter and Levine-Donnerstein (1999) make a similar argument regarding applied work in public information campaigns.

MEASURING INTERCODER RELIABILITY
Intercoder reliability is assessed by having two or more coders categorize units (programs, scenes, articles, stories, words, etc.), and then using these categorizations to calculate a numerical index of the extent of agreement between or among the coders. There are many variations in how this process can and should be conducted, but at a minimum the researcher has to create a representative set of units for testing reliability, and the coding decisions must be made independently under the same conditions. A separate pilot test is often used to assess reliability during coder training, with a final test to establish reliability levels for the coding of the full sample (or census) of units. Researchers themselves may serve as coders, a practice questioned by some (e.g., Kolbe & Burnett, 1991) because it weakens the argument that other independent judges can reliably apply the coding scheme. In some cases the coders evaluate different but overlapping units (e.g., coder 1 codes units 1-20, coder 2 codes units 11-30, etc.), but this technique has also been questioned (Neuendorf, 2002).

With the coding data in hand, the researcher calculates and reports one or more indices of reliability. Popping (1988) identified 39 different "agreement indices" for coding nominal categories, which excludes several techniques for ratio and interval level data, but only a handful of techniques are widely used.²
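
As a rough sketch of this setup (the code below is not the authors'; the unit, coder, and category names are hypothetical), the decisions from a reliability test can be arranged in a unit-by-coder table from which any of these agreement indices is then computed.

```python
# Hypothetical reliability-test data: each coder independently assigns every
# unit in the reliability sample to one category of a nominal variable.
reliability_sample = {
    # unit id -> {coder: category assigned}
    "story_01": {"coder_A": "positive", "coder_B": "positive"},
    "story_02": {"coder_A": "negative", "coder_B": "neutral"},
    "story_03": {"coder_A": "neutral",  "coder_B": "neutral"},
    "story_04": {"coder_A": "positive", "coder_B": "positive"},
}

def decision_pairs(sample, coder_1, coder_2):
    """Return the two coders' decisions for every unit that both of them coded."""
    return [(codes[coder_1], codes[coder_2])
            for codes in sample.values()
            if coder_1 in codes and coder_2 in codes]

# Any agreement index (percent agreement, Scott's pi, Cohen's kappa, etc.)
# is then a function of these paired decisions.
pairs = decision_pairs(reliability_sample, "coder_A", "coder_B")
print(pairs)
```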
Percent Agreement
Percent agreement, also called simple agreement, percentage of agreement, raw percent agreement, or crude agreement, is the percentage of all coding decisions made by pairs of coders on which the coders agree. As with most indices, percent agreement takes values of .00 (no agreement) to 1.00 (perfect agreement). The obvious advantages of this index are that it is simple, intuitive, and easy to calculate. It also can accommodate any number of coders. However, this method also has major weaknesses, the most important of which involves its failure to account for agreement that would occur simply by chance.
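
A minimal sketch of the calculation, assuming two coders who judged the same reliability sample (the data and names below are illustrative only):

```python
def percent_agreement(coder_1, coder_2):
    """Proportion of coding decisions on which a pair of coders agree (.00 to 1.00)."""
    if len(coder_1) != len(coder_2):
        raise ValueError("Both coders must code the same set of units.")
    agreements = sum(a == b for a, b in zip(coder_1, coder_2))
    return agreements / len(coder_1)

# Two coders judge eight units on a yes/no variable and agree on six of them.
coder_a = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
coder_b = ["yes", "no", "no",  "yes", "no", "yes", "yes", "no"]
print(percent_agreement(coder_a, coder_b))  # 0.75
```

For two coders who evaluate the same reliability sample, Holsti's (1969) method produces the same value as percent agreement.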
Consider this example: Two coders are given 100 units (news stories, words, etc.) to code as having or not having a given property. Without any instructions or training, without even knowing the property they are to identify, they will agree half of the time, and these random agreements will produce a percent agreement value of .50. This problem is most severe when there are fewer categories in a coding scheme, but it remains in any case, making it difficult to judge and compare true reliability across variables (Perreault & Leigh, 1989).
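
The following sketch (illustrative only, not from the article) simulates the scenario just described and applies the general form of a chance-corrected index, (observed agreement - expected agreement) / (1 - expected agreement), which underlies indices such as Scott's pi and Cohen's kappa:

```python
import random

random.seed(0)
categories = ["present", "absent"]
n_units = 100

# Two untrained coders "code" 100 units completely at random.
coder_a = [random.choice(categories) for _ in range(n_units)]
coder_b = [random.choice(categories) for _ in range(n_units)]

observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n_units
# Expected agreement under uniform guessing is 1/k; Scott's pi and Cohen's
# kappa instead estimate expected agreement from the coders' observed
# category distributions.
expected = 1 / len(categories)
chance_corrected = (observed - expected) / (1 - expected)

print(round(observed, 2))          # typically near .50 despite purely random coding
print(round(chance_corrected, 2))  # typically near .00 once chance agreement is removed
```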
Suen and Lee (1985) reanalyzed data from a sample of published studies correcting for chance agreement and concluded that "between one-fourth and three-fourths of the reported observations

Citations
Journal ArticleDOI
TL;DR: This work proposes Krippendorff's alpha as the standard reliability measure, general in that it can be used regardless of the number of observers, levels of measurement, sample sizes, and presence or absence of missing data.
Abstract: In content analysis and similar methods, data are typically generated by trained human observers who record or transcribe textual, pictorial, or audible matter in terms suitable for analysis. Conclusions from such data can be trusted only after demonstrating their reliability. Unfortunately, the content analysis literature is full of proposals for so-called reliability coefficients, leaving investigators easily confused, not knowing which to choose. After describing the criteria for a good measure of reliability, we propose Krippendorff's alpha as the standard reliability measure. It is general in that it can be used regardless of the number of observers, levels of measurement, sample sizes, and presence or absence of missing data. To facilitate the adoption of this recommendation, we describe a freely available macro written for SPSS and SAS to calculate Krippendorff's alpha and illustrate its use with a simple example.

3,381 citations


Cites background from "Content Analysis in Mass Communicat..."

  • ...This complexity combined with the lack of consensus among communication researchers on which measures are appropriate led Lombard, Snyder-Duch, and Bracken (2002, 2004) to call for a reliability standard that can span the variable nature of available data....


Journal ArticleDOI
TL;DR: In a recent article as mentioned in this paper, Lombard, Snyder-Duch, and Bracken surveyed 200 content analyses for their reporting of reliability tests, compared the virtues and drawbacks of five popular reliability measures, and proposed guidelines and standards for their use.
Abstract: In a recent article in this journal, Lombard, Snyder-Duch, and Bracken (2002) surveyed 200 content analyses for their reporting of reliability tests, compared the virtues and drawbacks of five popular reliability measures, and proposed guidelines and standards for their use. Their discussion revealed that numerous misconceptions circulate in the content analysis literature regarding how these measures behave and can aid or deceive content analysts in their effort to ensure the reliability of their data. This article proposes three conditions for statistical measures to serve as indices of the reliability of data and examines the mathematical structure and the behavior of the five coefficients discussed by the authors, as well as two others. It compares common beliefs about these coefficients with what they actually do and concludes with alternative recommendations for testing reliability in content analysis and similar data-making efforts.

2,101 citations


Cites background or methods from "Content Analysis in Mass Communicat..."

  • ...…(2002, p. 163) who merely quotes a concern expressed elsewhere about the appropriateness of using different coders for coding different but overlapping sets of units, Lombard et al. (2002) make it a point of recommending against this attractive possibility (p. 602) – without justification, however....


  • ...In a recent paper published in a special issue of Human Communication Research devoted to methodological topics (Vol. 28, No. 4), Lombard, Snyder-Duch, and Bracken (2002) presented their findings of how reliability was treated in 200 content analyses indexed in Communication Abstracts between 1994…...


  • ...In a recent article published in this journal, Lombard, Snyder-Duch, and Bracken (2002) surveyed 200 content analyses for their reporting of reliability tests; compared the virtues and drawbacks of five popular reliability measures; and proposed guidelines and standards for their use....


  • ...This highly undesirable property benefits coders who disagree on these margins over those who agree and it clearly contradicts what its proponents (Cohen, 1960; Fleiss, 1975) argued and what Lombard et al. (2002) have found to be the dominant opinion in the literature....


  • ...As already mentioned, Lombard et al. (2002) applied the following criterion for accepting content analysis findings as sufficiently reliable: .70, otherwise %-agreement .90 (p. 596)....


Journal ArticleDOI
TL;DR: This article presents a procedure for developing coding schemes for in-depth semistructured interview transcripts that involves standardizing the units of text on which coders work and improving the coding scheme’s discriminant capability to an acceptable point.
Abstract: Many social science studies are based on coded in-depth semistructured interview transcripts. But researchers rarely report or discuss coding reliability in this work. Nor is there much literature on the subject for this type of data. This article presents a procedure for developing coding schemes for such data. It involves standardizing the units of text on which coders work and then improving the coding scheme’s discriminant capability (i.e., reducing coding errors) to an acceptable point as indicated by measures of either intercoder reliability or intercoder agreement. This approach is especially useful for situations where a single knowledgeable coder will code all the transcripts once the coding scheme has been established. This approach can also be used with other types of qualitative data and in other circumstances.

1,668 citations

Journal ArticleDOI
TL;DR: The structural topic model makes analyzing open-ended responses easier, more revealing, and capable of being used to estimate treatment effects, and is illustrated with analysis of text from surveys and experiments.
Abstract: Collection and especially analysis of open-ended survey responses are relatively rare in the discipline and when conducted are almost exclusively done through human coding. We present an alternative, semiautomated approach, the structural topic model (STM) (Roberts, Stewart, and Airoldi 2013; Roberts et al. 2013), that draws on recent developments in machine learning based analysis of textual data. A crucial contribution of the method is that it incorporates information about the document, such as the author's gender, political affiliation, and treatment assignment (if an experimental study). This article focuses on how the STM is helpful for survey researchers and experimentalists. The STM makes analyzing open-ended responses easier, more revealing, and capable of being used to estimate treatment effects. We illustrate these innovations with analysis of text from surveys and experiments.

1,058 citations


Cites background from "Content Analysis in Mass Communicat..."

  • ...Next human coders are unleashed on the data and numerical estimates for each document compared across coders (Lombard et al., 2006; Artstein and Poesio, 2008)....


Journal ArticleDOI
TL;DR: The authors put forward the need to improve the theoretical and empirical base of the existing instruments in order to promote the overall quality of CSCL-research.
Abstract: Research in the field of Computer Supported Collaborative Learning (CSCL) is based on a wide variety of methodologies. In this paper, we focus upon content analysis, which is a technique often used to analyze transcripts of asynchronous, computer mediated discussion groups in formal educational settings. Although this research technique is often used, standards are not yet established. The applied instruments reflect a wide variety of approaches and differ in their level of detail and the type of analysis categories used. Further differences are related to a diversity in their theoretical base, the amount of information about validity and reliability, and the choice for the unit of analysis. This article presents an overview of different content analysis instruments, building on a sample of models commonly used in the CSCL-literature. The discussion of 15 instruments results in a number of critical conclusions. There are questions about the coherence between the theoretical base and the operational translation of the theory in the instruments. Instruments are hardly compared or contrasted with one another. As a consequence the empirical base of the validity of the instruments is limited. The analysis is rather critical when it comes to the issue of reliability. The authors put forward the need to improve the theoretical and empirical base of the existing instruments in order to promote the overall quality of CSCL-research.

934 citations


Cites background or methods from "Content Analysis in Mass Communicat..."

  • ...It can accommodate any number of coders, but it has a major weakness: it fails to account for agreement by chance (Lombard et al., 2002; Neuendorf, 2002)....


  • ...Percent agreement is considered an overly liberal index by some researchers, and the indices which do account for chance agreement, such as Krippendorff s alpha, are considered overly conservative and often too restrictive (Lombard et al., 2002; Rourke et al., 2001)....


  • ...When it is calculated across a set of variables, it is not considered as a good measure because it can veil variables with unacceptably low levels of reliability (Lombard et al., 2002)....


  • ...Following Lombard and colleagues (Lombard et al., 2002), the ‘‘biggest drawback to its use has been its complexity and the resulting difficulty of by hand calculations, especially for interval and ratio level variables’’....


  • ...Krippendorff s alpha takes into account the magnitude of the misses, adjusting for whether the variable is measured as nominal, ordinal, interval, or ratio (Krippendorff, 1980; Lombard et al., 2002; Neuendorf, 2002)....


References
Journal ArticleDOI
Jacob Cohen1
TL;DR: In this article, the authors present a procedure for having two or more judges independently categorize a sample of units in order to determine the degree and significance of their agreement, that is, the extent to which these judgments are reproducible (reliable).
Abstract: CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of measurement obtainable is nominal scaling (Stevens, 1951, pp. 2526), i.e. placement in a set of k unordered categories. Because the categorizing of the units is a consequence of some complex judgment process performed by a &dquo;two-legged meter&dquo; (Stevens, 1958), it becomes important to determine the extent to which these judgments are reproducible, i.e., reliable. The procedure which suggests itself is that of having two (or more) judges independently categorize a sample of units and determine the degree, significance, and

34,965 citations


"Content Analysis in Mass Communicat..." refers background or methods in this paper

  • ...is the simple percentage of agreement”; they call Cohen’s kappa “the most widely used measure of interjudge reliability across the behavioral science literature” (p. 137). Hughes and Garrett (1990) coded 68 articles in Journal of Marketing Research, Journal of Marketing, and Journal of Consumer Research during 1984– 1987 that contained reports of intercoder reliability and found 65% used percent agreement. Kolbe and Burnett (1991) coded 128 articles from consumer behavior research in 28 journals, three proceedings and one anthology between 1978 and 1989....


  • ...is the simple percentage of agreement”; they call Cohen’s kappa “the most widely used measure of interjudge reliability across the behavioral science literature” (p. 137). Hughes and Garrett (1990) coded 68 articles in Journal of Marketing Research, Journal of Marketing, and Journal of Consumer Research during 1984– 1987 that contained reports of intercoder reliability and found 65% used percent agreement....


  • ...is the simple percentage of agreement”; they call Cohen’s kappa “the most widely used measure of interjudge reliability across the behavioral science literature” (p. 137). Hughes and Garrett (1990) coded 68 articles in Journal of Marketing Research, Journal of Marketing, and Journal of Consumer Research during 1984– 1987 that contained reports of intercoder reliability and found 65% used percent agreement. Kolbe and Burnett (1991) coded 128 articles from consumer behavior research in 28 journals, three proceedings and one anthology between 1978 and 1989. Most of the authors were in marketing departments (only 12.2% were from communication, advertising, and journalism schools or departments). Percent agreement was reported in 32% of the studies, followed by Krippendorff’s alpha (7%), and Holsti’s method (4%); often the calculation method wasn’t specified, and in 31% of the articles no reliability was reported. Also, 36% of the studies reported only an overall reliability, which can hide variables with unacceptably low agreement. Consistent with these findings, Kang et al. (1993) reviewed the 22 articles published in the Journal of Advertising between 1981 and 1990 that employed content analysis and found that 78% “used percentage agreement or some other inappropriate measure” (p. 18). Pasadeos, Huhman, Standley, and Wilson (1995) coded 163 content analyses of news-media messages in four journals (Journalism & Mass Communication Quarterly, Newspaper Research Journal, Journal of Broadcasting and Electronic Media, and Journal of Communication) for the 6-year period of 1988–1993....


  • ...Percent agreement, Scott’s pi, Cohen’s kappa, and Krippendorff’s alpha were all used to assess intercoder reliability for each variable coded. A beta version of the software package PRAM (Program for Reliability Assessment with Multiple-coders, Skymeg Software, 2002) was used to calculate the first three of these. A beta version of a separate program, Krippendorff’s Alpha 3.12, was used to calculate the fourth. Holsti’s (1969) method was not calculated because, in the case of two coders who evaluate the same reliability sample, the results are identical to those for percent agreement....


  • ...is the simple percentage of agreement”; they call Cohen’s kappa “the most widely used measure of interjudge reliability across the behavioral science literature” (p. 137). Hughes and Garrett (1990) coded 68 articles in Journal of Marketing Research, Journal of Marketing, and Journal of Consumer Research during 1984– 1987 that contained reports of intercoder reliability and found 65% used percent agreement. Kolbe and Burnett (1991) coded 128 articles from consumer behavior research in 28 journals, three proceedings and one anthology between 1978 and 1989. Most of the authors were in marketing departments (only 12.2% were from communication, advertising, and journalism schools or departments). Percent agreement was reported in 32% of the studies, followed by Krippendorff’s alpha (7%), and Holsti’s method (4%); often the calculation method wasn’t specified, and in 31% of the articles no reliability was reported. Also, 36% of the studies reported only an overall reliability, which can hide variables with unacceptably low agreement. Consistent with these findings, Kang et al. (1993) reviewed the 22 articles published in the Journal of Advertising between 1981 and 1990 that employed content analysis and found that 78% “used percentage agreement or some other inappropriate measure” (p. 18). Pasadeos, Huhman, Standley, and Wilson (1995) coded 163 content analyses of news-media messages in four journals (Journalism & Mass Communication Quarterly, Newspaper Research Journal, Journal of Broadcasting and Electronic Media, and Journal of Communication) for the 6-year period of 1988–1993. They wrote that “we were not able to ascertain who specifically had done the coding in approximately 55% of the studies; a similar number had not reported on whether coding was done independently or by consensus; and more than 80% made no mention of coder training” (p. 8). In their study 51% of the articles did not address reliability at all, 31% used percent agreement, 10% used Scott’s pi, and 6% used Holsti’s method. Only 19% gave reliability figures for all variables while 20% gave only an overall figure. In a study of content analyses published in Journalism & Mass Communication Quarterly between 1971 and 1995, Riffe and Freitag (1997) found that out of 486 articles, only 56% reported intercoder reliability and of those most only reported an overall figure, while only 10% “explicitly specified random sampling in reliability tests” (p....


Book
01 Jan 1980
TL;DR: History Conceptual Foundations Uses and Kinds of Inference The Logic of Content Analysis Designs Unitizing Sampling Recording Data Languages Constructs for Inference Analytical Techniques The Use of Computers Reliability Validity A Practical Guide
Abstract: History Conceptual Foundations Uses and Kinds of Inference The Logic of Content Analysis Designs Unitizing Sampling Recording Data Languages Constructs for Inference Analytical Techniques The Use of Computers Reliability Validity A Practical Guide

25,749 citations


Additional excerpts

  • ...…“rules of thumb” set out by several methodologists (including Banerjee, Capozzoli, McSweeney, & Sinha, 1999; Ellis, 1994; Frey, Botan, & Kreps, 2000; Krippendorff, 1980; Popping, 1988; and Riffe, Lacy, & Fico, 1998) and concludes that “coefficients of .90 or greater would be acceptable to all,…...


  • ...Again, there are no established standards, but Neuendorf (2002) reviews “rules of thumb” set out by several methodologists (including Banerjee, Capozzoli, McSweeney, & Sinha, 1999; Ellis, 1994; Frey, Botan, & Kreps, 2000; Krippendorff, 1980; Popping, 1988; and Riffe, Lacy, & Fico, 1998) and concludes that “coefficients of ....


Book
13 Dec 2001
TL;DR: The Content Analysis Guidebook provides an accessible core text for upper-level undergraduates and graduate students across the social sciences that unravels the complicated aspects of content analysis.
Abstract: List of Boxes List of Tables and Figures Foreword Acknowledgments 1. Defining Content Analysis Is Content Analysis "Easy"? Is It Something That Anyone Can Do? A Six-Part Definition of Content Analysis 2. Milestones in the History of Content Analysis The Growing Popularity of Content Analysis Milestones of Content Analysis Research 3. Beyond Description: An Integrative Model of Content Analysis The Language of the Scientific Method How Content Analysis Is Done: Flowchart for the Typical Process of Content-Analysis Research Approaches to Content Analysis The Integrative Model of Content Analysis Evaluation With the Integrative Model of Content Analysis 4. Message Units and Sampling Units Defining the Population Archives Medium Management Sampling Sample Size 5. Variables and Predictions Identifying Critical Variables Hypotheses, Predictions, and Research Questions 6. Measurement Techniques Defining Measurement Validity, Reliability, Accuracy, and Precision Types of Validity Assessment Operationalization Computer Coding Selection of a Computer Text Content Analysis Program Human Coding Index Construction in Content Analysis 7. Reliability Intercoder Reliability Standards and Practices Issues in the Assessment of Reliability Pilot and Final Reliabilities Intercoder Reliability Coefficients: Issues and Comparisons Calculating Intercoder Reliability Coefficients Treatment of Variables That Do Not Achieve an Acceptable Level of Reliability The Use of Multiple Coders Advanced and Specialty Issues in Reliatbility Coefficient Selection 8. Results and Reporting Data Handling and Transformations Hypothesis Tesing Selecting the Appropriate Statistical Tests Frequencies Co-Occurences and In-Context Occurrences Time Lines Bivariate Relationships Multivariate Relationships 9. Contexts Psychometric Applications of Content Analysis Open-Ended Written and Pictorial Responses Linguistics and Semantic Networks Stylometrics and Computer Literary Analysis Interaction Analysis Other Interpersonal Behaviors Violence in the Media Gender Roles Minority Portrayals Advertising News Political Communication Web Analyses Other Applied Contexts Commercial and Other Client-Based Applications of Content Analysis Future Directions Resource 1: Message Archives - P.D. Skalski General Collections Film, Television and Radio Archives Literary and General Corpora Other Archives Resource 2: Using NEXIS for Text Acquisition for Content Analysis Resource 3: Computer Content Analysis Software - P.D. Skalski Part I. Quantitative Computer Text Analysis Programs Part II. VBPro How-To Guide and Executional Flowchart Resource 4: An Introduction to PRAM--A Program for Reliability Assessment With Multiple Coders Resource 5: The Content Analysis Guidebook Online Content Analysis Resources Bibliographies Message Archives and Corpora Reliability Human Coding Sample Materials Computer Content Analysis References Author Index Subject Index About the Authors

7,877 citations


"Content Analysis in Mass Communicat..." refers background or methods in this paper

  • ...), but this technique has also been questioned (Neuendorf, 2002). With the coding data in hand, the researcher calculates and reports one or more indices of reliability. Popping (1988) identified 39 different “agreement indices” for coding nominal categories, which excludes several techniques for ratio and interval level data, but only a handful of techniques are widely used....


  • ...The result is often calculated not for a single variable but across a set of variables, a very poor practice which can hide variables with unacceptably low levels of reliability (Kolbe & Burnett, 1991; Neuendorf, 2002)....


  • ...This index also does not account for differences in how the individual coders distribute their values across the coding categories, a potential source of systematic bias; that is, it assumes the coders have distributed their values across the categories identically and if this is not the case, the formula fails to account for the reduced agreement (Craig, 1981; Hughes & Garrett, 1990; Neuendorf, 2002)....


  • ...…the coding categories, a potential source of systematic bias; that is, it assumes the coders have distributed their values across the categories identically and if this is not the case, the formula fails to account for the reduced agreement (Craig, 1981; Hughes & Garrett, 1990; Neuendorf, 2002)....


  • ...In some cases the coders evaluate different but overlapping units (e.g., coder 1 codes units 1–20, coder 2 codes units 11–30, etc.), but this technique has also been questioned (Neuendorf, 2002)....


Journal ArticleDOI
Jacob Cohen1
TL;DR: The Kw provides for the incorporation of ratio-scaled degrees of disagreement (or agreement) to each of the cells of the k * k table of joi.
Abstract: A previously described coefficient of agreement for nominal scales, kappa, treats all disagreements equally. A generalization to weighted kappa (Kw) is presented. The Kw provides for the incorporation of ratio-scaled degrees of disagreement (or agreement) to each of the cells of the k * k table of joi

7,604 citations


"Content Analysis in Mass Communicat..." refers methods in this paper

  • ...Cohen (1968) proposed a weighted kappa to account for different types of disagreements, however, as with the other indices discussed so far, this measure is generally used only for nominal level variables....


Journal ArticleDOI

7,318 citations


"Content Analysis in Mass Communicat..." refers background in this paper

  • ...The index has been adapted for multiple coders and cases in which different coders evaluate different units (Fleiss, 1971)....

    [...]