
A Brief Tutorial on the Development of Measures for Use in Survey Questionnaires

01 Jan 1998 - Organizational Research Methods (SAGE Publications) - Vol. 1, Iss. 1, pp. 104-121
Abstract: The adequate measurement of abstract constructs is perhaps the greatest challenge to understanding the behavior of people in organizations. Problems with the reliability and validity of measures used on survey questionnaires continue to lead to difficulties in interpreting the results of field research. This article provides a conceptual framework and a straightforward guide for the development of scales in accordance with established psychometric principles for use in field studies.

Summary

Introduction

  • The article will describe the development of measures consisting of multiple scales.
  • Construct validity forms the link between theory and psychometric measurement (Kerlinger, 1986), and construct validation is essential for the development of quality measures (Schmitt & Klimoski, 1991).
  • The following discussion will cover two approaches to generating preliminary items.
  • The first is deductive, sometimes called logical partitioning or classification from above; the second is inductive, known also as grouping or classification from below (Hunt, 1991).

Deductive

  • Deductive scale development derives its name from the fact that the theoretical foundation provides enough information to generate the initial set of items.
  • This approach requires an understanding of the phenomenon to be investigated and a thorough review of the literature to develop the theoretical definition of the construct under examination.
  • Expert power, for example, might be defined as “the ability to administer to another information, knowledge, or expertise.”
  • Through the development of adequate construct definitions, items should capture the domain of interest.
  • The disadvantages of the deductive approach are that it is very time-consuming and requires that researchers possess a working knowledge of the phenomena under investigation.

Inductive

  • The inductive approach may be appropriate when the conceptual basis for a construct does not yield easily identifiable dimensions for which items can then be generated.
  • An example might be, “Describe how your manager communicates with you.”
  • Responses are then classified into categories, usually by multiple judges using an agreement index, and from these categorized responses items are derived for subsequent factor analysis (an agreement check is sketched after this list).
  • Without a definition of the construct under examination, it can be difficult to develop items that will be conceptually consistent.
  • This technique also makes the appropriate labeling of factors more difficult (Ford, MacCallum, & Tait, 1986). (For an example of this approach, see Butler, 1991; Kipnis, Schmidt, & Wilkinson, 1980.)
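Where the sorting step uses multiple judges, an agreement index indicates whether the derived categories are dependable before items are written from them. Below is a minimal sketch using Cohen's kappa, one common chance-corrected index; the judges, theme labels, and assignments are hypothetical, not from the article.

```python
from collections import Counter

def cohens_kappa(judge_a, judge_b):
    """Chance-corrected agreement between two judges' category labels."""
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Expected chance agreement from each judge's marginal proportions.
    count_a, count_b = Counter(judge_a), Counter(judge_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a.keys() & count_b.keys()) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical: two judges sort 8 open-ended responses into themes.
judge_1 = ["feedback", "clarity", "feedback", "tone", "clarity", "tone", "feedback", "clarity"]
judge_2 = ["feedback", "clarity", "tone", "tone", "clarity", "tone", "feedback", "feedback"]
print(f"kappa = {cohens_kappa(judge_1, judge_2):.2f}")  # ~0.63 for these data
```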

Item Development

  • There are a number of guidelines that one should follow in writing items.
  • Items that all respondents would answer similarly should not be used, as they will generate little variance.
  • In the content validity pretest developed by Schriesheim and colleagues (1993), respondents are asked to rate on a Likert-type scale the extent to which each item corresponds to each construct definition.
  • A second recent advance in establishing content validity is the substantive validity analysis technique developed by Anderson and Gerbing (1991); its two indices are illustrated in the sketch after this list.
  • This technique was employed recently by MacKenzie, Podsakoff, and Fetter (1991; see Hinkin, 1985, for a detailed description of this process).
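Anderson and Gerbing's (1991) indices reduce to two proportions computed from an item-sort task: the proportion of substantive agreement, p_sa = n_c / N, and the substantive-validity coefficient, c_sv = (n_c - n_o) / N, where N judges each assign the item to a construct, n_c is the number assigning it to its intended construct, and n_o is the highest count for any other construct. The sketch below applies these formulas; the construct labels and judge counts are invented for illustration.

```python
def substantive_validity(assignments, intended):
    """Anderson & Gerbing (1991) item-sort indices for a single item.

    assignments: construct label each of N judges assigned to the item
    intended:    the construct the item was written to tap
    Returns (p_sa, c_sv), where n_o is the largest count for any
    construct other than the intended one.
    """
    N = len(assignments)
    n_c = assignments.count(intended)
    n_o = max(
        (assignments.count(c) for c in set(assignments) if c != intended),
        default=0,
    )
    return n_c / N, (n_c - n_o) / N

# Hypothetical sort of one "expert power" item by 20 judges.
sort = ["expert"] * 14 + ["referent"] * 4 + ["reward"] * 2
p_sa, c_sv = substantive_validity(sort, "expert")
print(p_sa, c_sv)  # 0.7 0.5 -- a large positive c_sv supports retaining the item
```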

A very common question in scale construction is, “How many items?”

  • There are no hard-and-fast rules guiding this decision, but keeping a measure short is an effective means of minimizing response biases caused by boredom or fatigue (Schmitt & Stults, 1985; Schriesheim & Eisenbach, 1990).
  • Additional items also demand more time in both the development and administration of a measure (Carmines & Zeller, 1979).
  • Adequate internal consistency reliabilities can be obtained with as few as three items (Cook et al., 1981), and adding items indefinitely makes progressively less impact on scale reliability (Carmines & Zeller, 1979).
  • Cortina (1993) found that scales with many items can show high internal consistency even when inter-item correlations are modest, an argument in favor of shorter scales with high internal consistency (illustrated in the sketch after this list).
  • It is also important to assure that the domain has been adequately sampled, as inadequate sampling is a primary source of measurement error (Churchill, 1979).
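The trade-off in these bullets can be made concrete with the standardized form of coefficient alpha, alpha = k*r / (1 + (k - 1)*r), where k is the number of items and r the average inter-item correlation. The correlation values below are illustrative choices, not figures from the article.

```python
def standardized_alpha(k, r_bar):
    """Standardized coefficient alpha for k items whose average
    inter-item correlation is r_bar (Spearman-Brown form)."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# Diminishing returns from added items when r_bar = .40 ...
for k in (3, 5, 10, 20):
    print(k, round(standardized_alpha(k, 0.40), 2))   # .67, .77, .87, .93

# ... and Cortina's point: a 20-item scale clears .70 even when the
# average inter-item correlation is only .15.
print(round(standardized_alpha(20, 0.15), 2))          # .78
```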

Item Scaling

  • With respect to scaling the items, it is important that the scale used generate sufficient variance among respondents for subsequent statistical analyses (Stone, 1978).
  • Although other scaling techniques are available, such as Guttman and Thurstone scales, Likert-type scales are the most frequently used in survey questionnaire research (Cook et al., 1981) and are the most useful in behavioral research (Kerlinger, 1986).
  • They also are most suitable for use in factor analysis.
  • Coefficient alpha reliability with Likert scales has been shown to increase with the use of up to five scale points, after which it levels off (Lissitz & Green, 1975).
  • If the scale is to be assessing frequency in the use of a behavior, it is very important that the researcher accurately benchmark the response range to maximize the obtained variance on a measure (Harrison & McLaughlin, 1991).

Summary

  • The demonstration of construct validity is the ultimate objective of scale development (Cronbach & Meehl, 1955).
  • It may be argued that, due to potential difficulties caused by common source/common method variance, it is inappropriate to use the same sample both for scale development and for assessing the psychometric properties of a new measure (e.g., Campbell, 1976).

Sample Size

  • In the scale development process, it will be necessary to use several independent samples.
  • In the content validity pretest step of the process, both Schriesheim et al. (1993) and Anderson and Gerbing (1991) have suggested that small samples may be appropriate for their analyses, the former using a sample of 65 and the latter using two samples of 20.
  • As sample size increases, the likelihood of attaining statistical significance increases, and it is important to note the difference between statistical and practical significance (Cohen, 1969).
  • Both exploratory and confirmatory factor analysis, discussed below, have been shown to be particularly susceptible to sample size effects.
  • Based on the recommended item-to-response ratio of 1:10, if 30 items were retained to develop three measures, at least 300 respondents would be needed for data collection.

  • It is at this stage that the researcher collects data both to evaluate the new measure’s factor structure and to permit subsequent examination of convergent, discriminant, and criterion-related validity with other measures.
  • Selection of an appropriate type of sample is very important to assure enough variance in responses and avoid the effects of an idiosyncratic context.
  • The sample used for the subsequent data collection should be of adequate size, be representative of the population of interest, and be clearly described.

Exploratory Factor Analysis

  • Once the data have been collected, it is recommended that factor analysis be used to further refine the new scales.
  • The researcher should have a strong theoretical justification for determining the number of factors to be retained, and the examination of item loadings on latent factors provides a confirmation of expectations.
  • If the items have been carefully developed, the number of factors that emerge on both Kaiser and scree criteria should equal the number of scales being developed.
  • There are no hard-and-fast rules for this, but the .40 criterion level appears most commonly used in judging factor loadings as meaningful (Ford et al., 1986).
  • The percentage of the total item variance that is explained is also important; the larger the percentage, the better (both checks appear in the sketch following this list).
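The screening steps above can be sketched with plain numpy using principal components extraction of the item correlation matrix. This is an illustration, not the article's procedure: the 10-item survey data are random placeholders, and real use would substitute actual responses (and typically apply a rotation before interpreting loadings).

```python
import numpy as np

def pca_screen(data):
    """Eigenvalues and loadings from principal components of the
    item correlation matrix, for Kaiser and scree screening."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                 # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)             # item x component
    return eigvals, loadings, np.cumsum(eigvals) / len(eigvals)

rng = np.random.default_rng(0)
responses = rng.normal(size=(300, 10))                # placeholder data
eigvals, loadings, cum_var = pca_screen(responses)

n_factors = int((eigvals > 1).sum())                  # Kaiser criterion
meaningful = np.abs(loadings[:, :n_factors]) >= 0.40  # common .40 cutoff
print(n_factors, round(cum_var[n_factors - 1], 2))    # factors kept, variance explained
```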

Internal Consistency Assessment

  • Reliability is the accuracy or precision of a measuring instrument and is a necessary condition for validity (Kerlinger, 1986).
  • A large coefficient alpha (.70 for exploratory measures; Nunnally, 1978) provides an indication of strong item covariance and suggests that the sampling domain has been captured adequately (Churchill, 1979); a direct computation is sketched after this list.
  • If the above steps are all carefully followed, it is highly likely that the new scales will be internally consistent and possess content validity.
  • A second purpose of confirmatory factor analysis is to examine the fit of individual items within the specified model using the modification indices and t values.
  • It has been suggested that a chi-square two or three times as large as the degrees of freedom is acceptable (Carmines & Mclver, 1981), but the fit is considered better the closer the chi-square value is to the degrees of freedom for a model (Thacker, Fields, & Tetrick, 1989).
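Coefficient alpha can be computed directly from raw item responses as alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score). A minimal sketch follows; the simulated five-item data (common factor plus noise) are an assumption for illustration only.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents, k_items) array."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of scale total
    return k / (k - 1) * (1 - item_var / total_var)

# Simulated five-item scale: each item = common factor + noise.
rng = np.random.default_rng(1)
factor = rng.normal(size=(200, 1))
items = factor + rng.normal(size=(200, 5))
print(round(cronbach_alpha(items), 2))  # comfortably above .70 here
```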

Multitrait-Multimethod Matrix (MTMM)

  • Convergent and discriminant validity are most commonly examined by using the MTMM developed by Campbell and Fiske (1959; Schmitt & Klimoski, 1991).
  • Although concerns have been raised about MTMM analyses (e.g., Bagozzi et al., 1991), they are still useful in determining convergent and discriminant validity (Hollenbeck, Klein, O’Leary, & Wright, 1989; Marsh & Hocevar, 1988).
  • The data from the additional measures obtained during the original questionnaire administration are used at this stage.
  • A matrix is obtained by correlating the newly developed scales with the other measures and by examining the magnitudes of correlations that are similar and dissimilar (a small MTMM sketch follows this list).
  • Convergent validity is achieved when the correlations between measures of similar constructs using different methods, such as self-reported performance and performance evaluation data (monotrait-heteromethod), are “significantly different from zero and sufficiently large” (Campbell & Fiske, 1959, p. 82).
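The sketch below builds a small MTMM matrix and labels each correlation by trait and method, following the Campbell and Fiske (1959) logic that the monotrait-heteromethod values (the "validity diagonal") should be sizable and exceed the heterotrait values. The two traits, two methods, and all scores are hypothetical simulated data, not from the article.

```python
import itertools
import numpy as np

# Hypothetical design: two traits ("sat", "com") by two methods.
labels = [("sat", "self"), ("com", "self"), ("sat", "supervisor"), ("com", "supervisor")]

rng = np.random.default_rng(2)
traits = rng.normal(size=(150, 2))
scores = np.column_stack([
    traits[:, 0] + rng.normal(scale=0.8, size=150),  # sat via self-report
    traits[:, 1] + rng.normal(scale=0.8, size=150),  # com via self-report
    traits[:, 0] + rng.normal(scale=0.8, size=150),  # sat via supervisor
    traits[:, 1] + rng.normal(scale=0.8, size=150),  # com via supervisor
])
mtmm = np.corrcoef(scores, rowvar=False)

# Monotrait-heteromethod correlations should be large and exceed
# the heterotrait correlations computed the same way.
for (i, a), (j, b) in itertools.combinations(enumerate(labels), 2):
    kind = ("monotrait" if a[0] == b[0] else "heterotrait") + \
           ("-monomethod" if a[1] == b[1] else "-heteromethod")
    print(a, b, round(mtmm[i, j], 2), kind)
```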

Alternative Methods

  • There have been several recent advances in techniques to assess convergent and discriminant validity.
  • Factor analytical techniques also have been used to examine discriminant validity.
  • Recent developments have been made in the use of confirmatory factor analysis for what Bagozzi et al. (1991) term “second-generation methods for approaching construct validity” (p. 429).
  • The use of the MTMM, however, has long been a well-accepted technique for establishing convergent and discriminant validity and should serve as a good starting point for establishing construct validity (Schmitt & Klimoski, 1991; Schoenfeldt, 1984).
  • The researcher should also examine relationships between the new measures and variables to which they can be hypothesized to relate, to develop a nomological network and establish criterion-related validity (Cronbach & Meehl, 1955).



A Brief Tutorial on the Development of Measures for Use in
Survey Questionnaires
Timothy R. Hinkin, Cornell University
The adequate measurement of abstract constructs is perhaps the greatest
challenge to understanding the behavior of people in organizations. Problems with the
reliability and validity of measures used on survey questionnaires continue to lead to
difficulties in interpreting the results of field research. Price and Mueller suggest that
measurement problems may be due to the lack of a well-established framework to guide
researchers through the various stages of scale development. This article provides a
conceptual framework and a straightforward guide for the development of scales in
accordance with established psychometric principles for use in field studies.
Introduction
In an extensive review of the organizational behavior literature, Hinkin (1995) found
that inappropriate domain sampling, poor factor structure, low internal consistency reliability
and poor reporting of newly developed measures continue to threaten our understanding of
organizational phenomena. The creation of flawed measures may be due in part to the lack of a
well-established framework to guide researchers through the various stages of scale
development (Price & Mueller, 1986). If researchers determine that no measure exists with
which to assess a particular phenomenon and decide to develop a new measure, some
direction in scale development should prove useful. The importance of sound measurement is
stated succinctly by Schoenfeldt (1984): “The construction of the measuring devices is perhaps
the most important segment of any study. Many well-conceived research studies have never
seen the light of day because of flawed measures” (p. 78).
Perhaps the greatest difficulty in conducting research in organizations is assuring the
accuracy of measurement of the constructs under examination (Barrett, 1972). A construct is a
representation of something that does not exist as an observable dimension of behavior, and
the more abstract the construct, the more difficult it is to measure (Nunnally, 1978). Because
researchers studying organizational behavior rely most heavily on the use of questionnaires as
the primary means of data collection (Stone, 1978), it is crucial that the measures on these
survey instruments adequately represent the constructs under examination.
The purpose of this article is to provide a conceptual framework and a straightforward
guide for the development of scales in accordance with established psychometric principles for

use in survey research. The article is directed toward those readers who may have limited
knowledge or methodological expertise in the scale development process but who are
somewhat familiar with many of the various statistical concepts and methods to be described
herein. As such, no attempt will be made to describe any of the recommended techniques in
great depth. Rather, the focus will be on the order in which the various analyses should be
undertaken, potential problems that may arise, recommendations for reporting results, and
ways in which the process may be made more effective. For the sake of brevity, the discussion
will be oriented around one alternative in a step, but mention will be made of alternative ways
of executing a particular step in the process. The article will describe the development of
measures consisting of multiple scales. The process would be the same, although less complex,
for developing a single, multi-item scale. Supplementary readings will often be recommended
to provide the reader with the opportunity to examine discussed techniques in greater detail. A
model for the scale development process is presented in Figure 1.
The Scale Development Process
Many criteria have been proposed for assessing the psychometric soundness of
measurement instruments. The American Psychological Association (APA, 1995) states that an
appropriate operational definition of the construct a measure purports to represent should
include a demonstration of content validity, criterion-related validity, and internal consistency.
Together, these provide evidence of construct validity: the extent to which the scale measures
what it is purported to measure. There are three major aspects of construct validation: (a)
specifying the domain of the construct, (b) empirically determining the extent to which items
measure that domain, and (c) examining the extent to which the measure produces results that
are predictable from theoretical hypotheses (Nunnally, 1978). Construct validity forms the link
between theory and psychometric measurement (Kerlinger, 1986), and construct validation is
essential for the development of quality measures (Schmitt & Klimoski, 1991). Each stage of the
process described below will contribute to increasing the confidence in the construct validity of
the new measure.
Step 1: Item Generation
The first stage of scale development is the creation of items to assess the construct
under examination. The key to successful item generation is the development of a well-
articulated theoretical foundation that would indicate the content domain for the new
measure. At this point, the goal of the researcher is to develop items that will result in
measures that sample the theoretical domain of interest to demonstrate content validity.
Domain sampling theory states that it is not possible to measure the complete domain of
interest, but that it is important that the sample of items drawn from potential items
adequately represents the construct under examination (Ghiselli, Campbell, & Zedeck, 1981).
Once a thorough understanding of the theoretical foundation for the potential measure
has been developed, there are several ways in which preliminary items may be created. The

following discussion will cover two of these approaches. The first is deductive, sometimes called
logical partitioning or classification from above. The second method is inductive, known also as
grouping, or classification from below (Hunt, 1991). Both of these techniques have been used
by organizational researchers, and the scale developers must decide which is most appropriate
in their particular situation. Each method will be briefly discussed below.
Deductive
Deductive scale development derives its name from the fact that the theoretical
foundation provides enough information to generate the initial set of items. This approach

requires an understanding of the phenomenon to be investigated and a thorough review of the
literature to develop the theoretical definition of the construct under examination. The
definition is then used as a guide for the development of items (Schwab, 1980). For example,
expert power might be defined as “the ability to administer to another information, knowledge,
or expertise.” Items then may be generated from this definition, being sure that they are
worded consistently in terms of describing a single behavior or an affective response.
Advantages and disadvantages. An advantage of the deductive approach to scale
development is that if properly conducted, it will help to assure content validity in the final
scales. Through the development of adequate construct definitions, items should capture the
domain of interest. The disadvantages of the deductive approach are that it is very time-
consuming and requires that researchers possess a working knowledge of the phenomena
under investigation. In exploratory research, it may not be appropriate to attempt to impose
measures onto an unfamiliar situation. In most situations in which theory does exist, the
deductive approach would be most appropriate. (For an example of this approach, see Ironson,
Smith, Brannick, Gibson, & Paul, 1989; Viega, 1991.)
Inductive
The inductive approach may be appropriate when the conceptual basis for a construct
may not result in easily identifiable dimensions for which items can then be generated.
Researchers usually develop scales inductively by asking a sample of respondents to provide
descriptions of their feelings about their organizations or to describe some aspect of behavior.
An example might be, “Describe how your manager communicates with you.” Responses are
then classified into a number of categories by content analysis based on key words or themes
(see Williamson, Karp, Dalphin, & Gray, 1982) or a sorting process such as the Q-Sorting
technique with an agreement index of some type, usually using multiple judges (see Anderson
& Gerbing, 1991; Kerlinger, 1986). From these categorized responses, items are derived for
subsequent factor analysis.
Advantages and disadvantages. This approach may be very useful when conducting
exploratory research and when it is difficult to generate items that represent an abstract
construct. The challenge arises, however, when attempting to develop items by interpreting the
descriptions provided by respondents. Without a definition of the construct under examination,
it can be difficult to develop items that will be conceptually consistent. This method requires
expertise in content analysis and relies heavily on post hoc factor analytical techniques to
ultimately determine scale construction, basing factor structure and, therefore, scales on item
covariance rather than similar content. Although items may load on the same factor, there is no
guarantee that they measure the same theoretical construct or come from the same sampling
domain (Cortina, 1993). The researcher is compelled to rely on some theoretical framework,
with little assurance that obtained results will not contain items that assess extraneous content
domains (Schriesheim & Hinkin, 1990). This technique also makes the appropriate labeling of

factors more difficult (Ford, MacCallum, & Tait, 1986). (For an example of this approach, see
Butler, 1991; Kipnis, Schmidt, & Wilkinson, 1980.)
Item Development
There are a number of guidelines that one should follow in writing items. Statements
should be simple and as short as possible, and the language used should be familiar to target
respondents. It is also important to keep all items consistent in terms of perspective, being sure
not to mix items that assess behaviors with items that assess affective responses (Harrison &
McLaughlin, 1993). Items should address only a single issue; “double-barreled” items such as
“My manager is intelligent and enthusiastic” should not be used. Such items may represent
two constructs and result in confusion on the part of the respondents. Leading questions
should be avoided, as they may bias responses. Items that all respondents would answer
similarly should not be used, as they will generate little variance. The issue of negatively
worded, reverse-scored items has stimulated much discussion and has strong proponents both
for and against their use. Some researchers argue that the use of reverse-scored items may
reduce response set bias (e.g., Price & Mueller, 1986). Others, however, have found that the
use of a few of these items randomly interspersed within a measure may have a detrimental
effect on psychometric properties of a measure (Harrison & McLaughlin, 1991). If the
researcher does choose to use reverse-scored items, they must be very carefully worded to
assure appropriate interpretation by respondents, and careful attention should be paid to
factor loadings and communalities at the factor analytical stage of scale development
(Schriesheim, Eisenbach, & Hill, 1989). (For a more detailed discussion of writing items, see
Edwards, 1957; Warwick & Lininger, 1975).
Content Validity Assessment
After items have been generated, they should be subjected to an assessment of content
validity. This process will serve as a pretest, permitting the deletion of items that are deemed to
be conceptually inconsistent. There seems to be no generally accepted quantitative index of
content validity of psychological measures, and judgment must be exercised in validating a
measure (Stone, 1978). Methods do exist, however, to examine the consistency of judgments
with respect to content validity.
Perhaps the most contemporary approach is that developed by Schriesheim and
colleagues (Schriesheim, Powers, Scandura, Gardiner, & Lankau, 1993). The first step is to
administer a set of items that have been developed to measure various constructs, along with
definitions of these various constructs, to respondents. All items are included on every page,
with a different definition at the top of each page. Respondents are then asked to rate on a
Likert-type scale the extent to which each item corresponds to each definition. Schriesheim et
al. (1993) also included a “does not match any definition” option but eliminated this category
from analysis as responses to this were very infrequent. A Q-correlation matrix (item by item)
of the data was then calculated, and that matrix was subjected to principal components examination.

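One plausible way to carry out the item-by-item Q-correlation and principal components steps just described is sketched below. The data layout (judges by items by definitions) and all counts are assumptions for illustration; the article does not specify an exact data format.

```python
import numpy as np

# Hypothetical pretest: 20 judges rate 6 items against each of 2
# construct definitions on a 5-point scale -> (judges, items, definitions).
rng = np.random.default_rng(3)
ratings = rng.integers(1, 6, size=(20, 6, 2)).astype(float)

# Item-by-item Q-correlation matrix: each item's profile is its vector
# of ratings across all judge-definition pairs.
profiles = ratings.transpose(1, 0, 2).reshape(6, -1)  # (items, 40)
q_matrix = np.corrcoef(profiles)

# Principal components of the Q matrix: items loading on the same
# component are taken to correspond to the same definition.
eigvals, eigvecs = np.linalg.eigh(q_matrix)
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
print(np.round(loadings[:, :2], 2))  # inspect the first two components
```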
References

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245-276.

Churchill, G. A., Jr. (1979). A paradigm for developing better measures of marketing constructs. Journal of Marketing Research, 16(1), 64-73.

Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.

Frequently Asked Questions (9)
Q1. What contributions have the authors mentioned in the paper "A brief tutorial on the development of measures for use in survey questionnaires" ?

This article provides a conceptual framework and a straightforward guide for the development of scales in accordance with established psychometric principles for use in field studies.

Harvey, Billings, and Nilan (1985) suggest that at least four items per scale are needed to test the homogeneity of items within each latent construct.

An advantage of the deductive approach to scale development is that if properly conducted, it will help to assure content validity in the final scales.

It should be anticipated that approximately one half of the created items will be retained for use in the final scales, so at least twice as many items as will be needed in the final scales should be generated to be administered in a survey questionnaire. 

A Q-correlation matrix (item by item) of the data was then calculated, and that matrix was subjected to principal components examination.

It is also important to assure that the domain has been adequately sampled, as inadequate sampling is a primary source of measurement error (Churchill, 1979). 

With respect to scaling the items, it is important that the scale used generate sufficient variance among respondents for subsequent statistical analyses (Stone, 1978).

The disadvantages of the deductive approach are that it is very time-consuming and requires that researchers possess a working knowledge of the phenomena under investigation.

This method requires expertise in content analysis and relies heavily on post hoc factor analytical techniques to ultimately determine scale construction, basing factor structure and, therefore, scales on item covariance rather than similar content.