A Brief Tutorial on the Development of Measures for Use in Survey Questionnaires
Summary
Introduction
- The article will describe the development of measures consisting of multiple scales.
- Construct validity forms the link between theory and psychometric measurement (Kerlinger, 1986), and construct validation is essential for the development of quality measures (Schmitt & Klimoski, 1991).
- The following discussion will cover two approaches to item generation: deductive and inductive.
- The first is deductive, sometimes called logical partitioning or classification from above.
Deductive
- Deductive scale development derives its name from the fact that the theoretical foundation provides enough information to generate the initial set of items.
- This approach requires a review of the literature to develop the theoretical definition of the construct under examination.
- Expert power, for example, might be defined as “the ability to administer to another information, knowledge, or expertise.”
- Through the development of adequate construct definitions, items should capture the domain of interest.
- The disadvantages of the deductive approach are that it is very time-consuming and requires that researchers possess a working knowledge of the phenomena under investigation.
Inductive
- The inductive approach may be appropriate when the conceptual basis for a construct does not yield easily identifiable dimensions for which items can then be generated.
- An example might be, “Describe how your manager communicates with you.”
- From these categorized responses, items are derived for subsequent factor analysis.
- Without a definition of the construct under examination, it can be difficult to develop items that will be conceptually consistent.
- This technique also makes the appropriate labeling of factors difficult (e.g., Butler, 1991; Kipnis, Schmidt, & Wilkinson, 1980).
Item Development
- There are a number of guidelines that one should follow in writing items.
- Items that all respondents would answer similarly should not be used, as they will generate little variance.
- Respondents are then asked to rate on a Likert-type scale the extent to which each item corresponds to each definition.
- A second recent advance in establishing content validity is the technique of substantive validity analysis developed by Anderson and Gerbing (1991).
- This technique was employed recently by MacKenzie, Podsakoff, and Fetter (1991; see Hinkin, 1985, for a detailed description of this process).
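To make Anderson and Gerbing's (1991) indices concrete, the sketch below computes the proportion of substantive agreement and the substantive-validity coefficient for a single item from judges' construct assignments. The function name and data are hypothetical illustrations; consult the original article for the full procedure.

```python
from collections import Counter

def substantive_validity(assignments, intended):
    """Anderson & Gerbing (1991) indices for a single item.

    assignments: construct label each judge assigned the item to.
    intended:    construct the item was written to measure.
    Returns (p_sa, c_sv): the proportion of substantive agreement
    (n_c / N) and the substantive-validity coefficient ((n_c - n_o) / N),
    where n_c is the number of judges choosing the intended construct and
    n_o is the highest number choosing any other single construct.
    """
    n = len(assignments)
    counts = Counter(assignments)
    n_c = counts.get(intended, 0)
    n_o = max((v for k, v in counts.items() if k != intended), default=0)
    return n_c / n, (n_c - n_o) / n

# 20 hypothetical judges sorting one item among constructs A, B, and C
votes = ["A"] * 16 + ["B"] * 3 + ["C"]
p_sa, c_sv = substantive_validity(votes, "A")
```

An item is typically retained only when both indices are high; a large gap between the two signals that judges split their remaining votes across a single competing construct.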
- A very common question in scale construction is, “How many items?” There are no hard-and-fast rules guiding this decision, but keeping a measure short is an effective means of minimizing response biases caused by boredom or fatigue (Schmitt & Stults, 1985; Schriesheim & Eisenbach, 1990).
- Additional items also demand more time in both the development and administration of a measure (Carmines & Zeller, 1979).
- Adequate internal consistency reliabilities can be obtained with as few as three items (Cook et al., 1981), and adding items indefinitely makes progressively less impact on scale reliability (Carmines & Zeller, 1979).
- Cortina (1993) found that scales with many items can show acceptable internal consistency reliability even when interitem correlations are low, an argument in favor of shorter scales with high internal consistency.
- It is also important to assure that the domain has been adequately sampled, as inadequate sampling is a primary source of measurement error (Churchill, 1979).
Item Scaling
- With respect to scaling the items, it is important that the scale used generate sufficient variance among respondents for subsequent statistical analyses (Stone, 1978).
- Although other scaling techniques are available, such as Guttman and Thurstone scales, Likert-type scales are the most frequently used in survey questionnaire research (Cook et al., 1981) and the most useful in behavioral research (Kerlinger, 1986).
- They also are most suitable for use in factor analysis.
- Coefficient alpha reliability with Likert scales has been shown to increase up to the use of five points, but then it levels off (Lissitz & Green, 1975).
- If the scale is to assess the frequency of a behavior, it is very important that the researcher accurately benchmark the response range to maximize the obtained variance on a measure (Harrison & McLaughlin, 1991).
Summary
- The demonstration of construct validity of a measure is the ultimate objective of scale development (Cronbach & Meehl, 1955).
- It may be argued that, due to potential difficulties caused by common source/common method variance, it is inappropriate to use the same sample both for scale development and for assessing the psychometric properties of a new measure (e.g., Campbell, 1976).
Sample Size
- In the scale development process, it will be necessary to use several independent samples.
- In the content validity pretest step of the process, both Schriesheim et al. (1993) and Anderson and Gerbing (1991) have suggested that small samples may be appropriate for their analyses, the former using a sample of 65 and the latter using two samples of 20.
- As sample size increases, the likelihood of attaining statistical significance increases, and it is important to note the difference between statistical and practical significance (Cohen, 1969).
- Both exploratory and confirmatory factor analysis, discussed below, have been shown to be particularly susceptible to sample size effects.
- Based on an item-to-response ratio of 1:10, if 30 items were retained to develop three measures, at least 300 respondents would be needed for data collection.
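The arithmetic behind such item-to-response heuristics is simple; the helper below (a hypothetical name, assuming the 1:10 ratio used in the example above) makes it explicit.

```python
def required_sample(n_items: int, responses_per_item: int = 10) -> int:
    """Minimum respondents implied by an item-to-response ratio heuristic
    (here assuming 1:10; recommendations in the literature vary)."""
    return n_items * responses_per_item

# 30 retained items at a 1:10 ratio
n_needed = required_sample(30)
```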
- It is at this stage that the researcher collects data both to evaluate the new measure’s factor structure and for subsequent examination of convergent, discriminant, and criterion-related validity with other measures.
- Selection of an appropriate type of sample is very important to assure enough variance in responses and avoid the effects of an idiosyncratic context.
- The sample used for the subsequent data collection should be of adequate size, representative of the population of interest, and clearly described.
Exploratory Factor Analysis
- Once the data have been collected, it is recommended that factor analysis be used to further refine the new scales.
- The researcher should have a strong theoretical justification for determining the number of factors to be retained, and the examination of item loadings on latent factors provides a confirmation of expectations.
- If the items have been carefully developed, the number of factors that emerge on both Kaiser and scree criteria should equal the number of scales being developed.
- There are no hard-and-fast rules for this, but the .40 criterion level appears most commonly used in judging factor loadings as meaningful (Ford et al., 1986).
- The percentage of the total item variance that is explained is also important; the larger the percentage the better.
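The Kaiser criterion and the .40 loading cutoff can be sketched with a principal components analysis of the item correlation matrix. This is a minimal illustration on simulated data, not a substitute for a full exploratory factor analysis; the function name and the simulated two-factor structure are assumptions for the example.

```python
import numpy as np

def principal_components(data: np.ndarray, cutoff: float = 0.40):
    """Eigen-decompose the item correlation matrix, retain components by
    the Kaiser criterion (eigenvalue > 1), and flag loadings at or above
    |cutoff| as meaningful."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_retain = int(np.sum(eigvals > 1.0))      # Kaiser criterion
    loadings = eigvecs[:, :n_retain] * np.sqrt(eigvals[:n_retain])
    return eigvals, loadings, np.abs(loadings) >= cutoff

# Simulated responses: six items driven by two uncorrelated latent factors
rng = np.random.default_rng(42)
f1, f2 = rng.normal(size=(2, 300))
items = np.vstack([f1, f1, f1, f2, f2, f2]) + 0.3 * rng.normal(size=(6, 300))
eigvals, loadings, meaningful = principal_components(items.T)
```

With carefully developed items, the number of components passing the Kaiser criterion should match the number of scales being developed, as the summary notes.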
Internal Consistency Assessment
- Reliability is the accuracy or precision of a measuring instrument and is a necessary condition for validity (Kerlinger, 1986).
- A large coefficient alpha (.70 for exploratory measures; Nunnally, 1978) provides an indication of strong item covariance and suggests that the sampling domain has been captured adequately (Churchill, 1979).
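Coefficient alpha itself is straightforward to compute from an item-score matrix; this is a minimal sketch of the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of the total score).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the scale total
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

Perfectly covarying items yield an alpha of 1.0, while items that share no variance drive alpha toward zero, which is why a large alpha suggests the sampling domain has been captured by internally consistent items.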
- If the above steps are all carefully followed, it is highly likely that the new scales will be internally consistent and possess content validity.
Confirmatory Factor Analysis
- Confirmatory factor analysis serves two purposes: assessing the overall fit of the specified measurement model, and examining the fit of individual items within that model using the modification indices and t values.
- It has been suggested that a chi-square two or three times as large as the degrees of freedom is acceptable (Carmines & McIver, 1981), but the fit is considered better the closer the chi-square value is to the degrees of freedom for a model (Thacker, Fields, & Tetrick, 1989).
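The chi-square heuristic above can be expressed as a small helper. The 2.0 and 3.0 thresholds follow the Carmines and McIver (1981) rule of thumb cited in the summary; the function name and verdict labels are illustrative assumptions.

```python
def chi_square_df_ratio(chi_square: float, df: int) -> str:
    """Rough model-fit heuristic: a chi-square within two to three times
    its degrees of freedom is conventionally treated as acceptable."""
    ratio = chi_square / df
    if ratio <= 2.0:
        return "good fit"
    if ratio <= 3.0:
        return "acceptable fit"
    return "poor fit"
```

Note that chi-square grows with sample size, so this ratio is best read alongside other fit indices rather than in isolation.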
Multitrait-Multimethod Matrix (MTMM)
- Convergent and discriminant validity are most commonly examined by using the MTMM developed by Campbell and Fiske (1959; Schmitt & Klimoski, 1991).
- Although concerns have been raised about MTMM procedures (e.g., Bagozzi et al., 1991), they are still useful in determining convergent and discriminant validity (Hollenbeck, Klein, O’Leary, & Wright, 1989; Marsh & Hocevar, 1988).
- The data from the additional measures obtained during the original questionnaire administration are used at this stage.
- A matrix is obtained by correlating the newly developed scales with the other measures and by examining the magnitudes of correlations that are similar and dissimilar.
- Convergent validity is achieved when the correlations between measures of similar constructs using different methods, such as self-reported performance and performance evaluation data (monotrait-heteromethod), are “significantly different from zero and sufficiently large” (Campbell & Fiske, 1959, p. 82).
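A minimal sketch of the convergent validity check: correlate scores on the same trait obtained by two different methods, an entry on the MTMM “validity diagonal.” The scores below are hypothetical.

```python
import numpy as np

def monotrait_heteromethod(method_a: np.ndarray, method_b: np.ndarray) -> float:
    """Correlation between measures of the same trait obtained by two
    different methods. Convergent validity requires these correlations
    to be significantly non-zero and sufficiently large."""
    return float(np.corrcoef(method_a, method_b)[0, 1])

# Hypothetical scores: self-rated performance and supervisor evaluations
self_report = np.array([3.0, 4.0, 2.0, 5.0, 4.0, 3.0])
evaluations = np.array([3.5, 4.5, 2.5, 5.5, 4.5, 3.5])
r = monotrait_heteromethod(self_report, evaluations)
```

In a full MTMM analysis these monotrait-heteromethod values are also compared against heterotrait correlations to establish discriminant validity.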
Alternative Methods
- There have been several recent advances in techniques to assess convergent and discriminant validity.
- Factor analytical techniques also have been used to examine discriminant validity.
- Recent developments have been made in the use of confirmatory factor analysis for what Bagozzi et al. (1991) term “second-generation methods for approaching construct validity” (p. 429).
- The use of the MTMM, however, has long been a well-accepted technique for establishing convergent and discriminant validity and should serve as a good starting point for establishing construct validity (Schmitt & Klimoski, 1991; Schoenfeldt, 1984).
- The researcher should also examine relationships between the new measures and variables with which they could be hypothesized to relate to develop a nomological network and establish criterion-related validity (Cronbach & Meehl, 1955).
Frequently Asked Questions
Q2. How many items are needed to test the homogeneity of items within each latent construct?
Harvey, Billings, and Nilan (1985) suggest that at least four items per scale are needed to test the homogeneity of items within each latent construct.
Q3. What is the advantage of the deductive approach to scale development?
An advantage of the deductive approach to scale development is that, if properly conducted, it will help to assure content validity in the final scales.
Q4. How many items will be retained for use in the final scales?
It should be anticipated that approximately one half of the created items will be retained for use in the final scales, so at least twice as many items as will be needed in the final scales should be generated to be administered in a survey questionnaire.
Q5. What was the method used to determine the content validity of the items?
A Q-correlation matrix (item by item) of the data was then calculated, and that matrix was subjected to principal components analysis.
Q6. What is the importance of ensuring that the domain has been adequately sampled?
It is also important to assure that the domain has been adequately sampled, as inadequate sampling is a primary source of measurement error (Churchill, 1979).
Q7. Why is item scaling important?
With respect to scaling the items, it is important that the scale used generate sufficient variance among respondents for subsequent statistical analyses (Stone, 1978).
Q8. What are the disadvantages of the deductive approach?
The disadvantages of the deductive approach are that it is very timeconsuming and requires that researchers possess a working knowledge of the phenomena under investigation.
Q9. What is the main difference between the two methods?
The inductive method requires expertise in content analysis and relies heavily on post hoc factor analytic techniques to determine scale construction, basing factor structure, and therefore scales, on item covariance rather than similar content; the deductive method instead derives items in advance from theoretical definitions of the construct.