
Showing papers on "Item response theory" published in 1978


Journal ArticleDOI
TL;DR: In this paper, a latent trait measurement model in which ordered response categories are both parameterized and scored with successive integers is investigated and applied to a summated rating or Likert questionnaire.
Abstract: A latent trait measurement model in which ordered response categories are both parameterized and scored with successive integers is investigated and applied to a summated rating or Likert questionnaire. In addition to each category, each item of the questionnaire and each subject are parameterized in the model, and maximum likelihood estimates for these parameters are derived. Among the features of the model which make it attractive for applications to Likert questionnaires is that the total score is a sufficient statistic for a subject's attitude measure. Thus, the model provides a formalization of a familiar and practical procedure for measuring attitudes.
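
The sufficiency property highlighted above is easiest to see from the algebraic form of such a model. As a hedged sketch, one common way to write a rating-scale-type latent trait model (the symbols θ_n, δ_i and τ_k for the person, item and category-threshold parameters are illustrative, not necessarily the paper's own notation):

    P(X_{ni}=x) = \frac{\exp\bigl[x(\theta_n-\delta_i)-\sum_{k=1}^{x}\tau_k\bigr]}{\sum_{j=0}^{m}\exp\bigl[j(\theta_n-\delta_i)-\sum_{k=1}^{j}\tau_k\bigr]}, \qquad x=0,1,\dots,m,

with the empty sum for x = 0 taken as zero. Because θ_n enters the numerator only through the product xθ_n, the likelihood over all items depends on a subject's responses only through the total score Σ_i x_{ni}, which is the sufficiency result the abstract emphasizes.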

403 citations


Journal ArticleDOI
TL;DR: When the logistic function is substituted for the normal, Thurstone's Case V specialization of the law of comparative judgment for paired comparison responses gives an identical equation for the especial case as discussed by the authors.
Abstract: When the logistic function is substituted for the normal, Thurstone's Case V specialization of the law of comparative judgment for paired comparison responses gives an identical equation for the es...
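
For readers who want the algebra behind the TL;DR, a brief sketch in standard notation (not necessarily the paper's): Case V of the law of comparative judgment gives the choice probability P(i \succ j) = \Phi(\mu_i - \mu_j), where \Phi is the standard normal distribution function and \mu_i, \mu_j are scale values. Substituting the logistic function for \Phi yields

    P(i \succ j) = \frac{1}{1+e^{-(\mu_i-\mu_j)}} = \frac{e^{\mu_i}}{e^{\mu_i}+e^{\mu_j}},

the familiar choice-ratio (Bradley-Terry-Luce) form; numerically the two versions are close when the logistic argument is rescaled by roughly 1.7.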

134 citations


Journal ArticleDOI
TL;DR: In this article, the authors point out the shortcomings of standard testing and measurement technology, such as the values of standard item parameters (item difficulty and item discrimination) are not invariant across groups of examinees that differ in ability.
Abstract: There are many shortcomings of standard testing and measurement technology. For one, the values of standard item parameters (item difficulty and item discrimination) are not invariant across groups of examinees that differ in ability. This means that standard item statistics are only useful in test construction for examinee populations very similar to the sample of examinees in which the item statistics were obtained. Another shortcoming is that comparisons of examinees on an ability measured by a set of test items comprising a test are limited to situations where examinees are administered the same (or parallel) test items. Finally, standard testing technology provides no basis for determining what a particular examinee might do when confronted with a

95 citations


01 Dec 1978
TL;DR: Item Response Theory (IRT), also called item characteristic curve theory or latent trait theory, is introduced in this book for the testing practitioner with minimum training in statistics and psychometrics.
Abstract: This book is an introduction to Item Response Theory (IRT), also called Item Characteristic Curve Theory or latent trait theory. It is written for the testing practitioner with minimum training in statistics and psychometrics. It presents in simple language and with examples the basic mathematical concepts needed to understand the theory. Then, building upon those concepts, it develops the basic concepts of Item Response Theory: item parameters, item response function, test characteristic curve, item information functions, test information curve, relative efficiency curve, and score information curve. The maximum likelihood and Bayesian modal estimates of ability are described with illustrative examples. After a discussion of assumptions and available computer programs, some practical applications are presented, i.e., equating scales, tailored testing, item cultural bias, and setting pass-fail cut-offs. (Author)
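
The quantities the book builds on (item response function, item and test information) can be illustrated numerically. A minimal sketch using the three-parameter logistic form; the parameter values and the scaling constant 1.7 are conventional textbook choices, not taken from the book itself:

    import math

    def irf_3pl(theta, a, b, c):
        """Three-parameter logistic item response function."""
        return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

    def item_information(theta, a, b, c):
        """Fisher information contributed by a 3PL item at ability theta."""
        p = irf_3pl(theta, a, b, c)
        return (1.7 * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

    # Hypothetical item: discrimination a, difficulty b, pseudo-guessing c.
    a, b, c = 1.2, 0.0, 0.2
    for theta in (-2, -1, 0, 1, 2):
        print(theta, round(irf_3pl(theta, a, b, c), 3),
              round(item_information(theta, a, b, c), 3))

Summing the item information values over the items of a test gives the test information curve mentioned in the abstract.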

37 citations



Journal ArticleDOI
TL;DR: In this paper, a correction is proposed that takes account of the fact that a master who is not able to produce the right answer to an item may guess and the meaning of this correction and its consequences for estimating the model parameters are discussed.
Abstract: Macready and Dayton (1977) introduced two probabilistic models for mastery assessment based on an idealistic all-or-none conception of mastery. Although these models are in statistical respects complete, the question is whether they are a plausible rendering of what happens when an examinee responds to an item. First, a correction is proposed that takes account of the fact that a master who is not able to produce the right answer to an item may guess. The meaning of this correction and its consequences for estimating the model parameters are discussed. Second, Macready and Dayton’s latent class models are confronted with the three-parameter logistic model extended with the conception of mastery as a region on a latent variable. It appears that from a latent trait theoretic point of view, the Macready and Dayton models assume item characteristic curves that have the unrealistic form of a step function with a single step. The implications of the all-or-none conception of mastery for the learning process wil...
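
The contrast the authors draw can be written compactly. As an illustrative sketch (the symbols are generic, not the paper's): an all-or-none latent class model with guessing rate α_i for non-masters and error rate β_i for masters implies, when mastery is re-expressed as a region θ ≥ θ_0 on a latent trait, the single-step item characteristic curve

    P_i(\theta)=\begin{cases}\alpha_i, & \theta<\theta_0\\ 1-\beta_i, & \theta\ge\theta_0\end{cases}
    \qquad\text{versus}\qquad
    P_i(\theta)=c_i+\frac{1-c_i}{1+\exp[-a_i(\theta-b_i)]}

for the three-parameter logistic model, whose smooth curve is the form the authors regard as more realistic.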

17 citations



Journal ArticleDOI
TL;DR: In this paper, the authors suggest latent partition analysis as a means of empirically demonstrating the conceptual homogeneity of an item population.
Abstract: The purpose of this study is to suggest latent partition analysis (Wiley, 1967) as a means of empirically demonstrating the conceptual homogeneity of an item population. Throughout the psychometric literature, there is general agreement that homogeneity would be a desirable characteristic of an item population. However, the question of exactly what homogeneity should mean or how it should be measured has never been resolved.

13 citations


Journal ArticleDOI
TL;DR: In this paper, the same items were presented in clusters (32 triads, 2 dyads) with instructions to pick the one true (or the one false) statement, and the relation between the difficulty of the component true-false items and of the multiple choice clusters was examined.
Abstract: Graduate students in education responded twice at the same sitting to 100 true-false questions on educational measurement. The items were presented as Part 1 and Part 2 of a midterm test. In Part 1, the items were presented separately with instructions to the students to mark each statement true or false. In Part 2, the same items were presented in clusters (32 triads, 2 dyads) with instructions to pick the one true (or the one false) statement. Scores on Part 1 were much more reliable than scores on Part 2. These results support the suggestion from test specialists that test constructors should avoid use of multiple true-false items. The relation between the difficulty of the component true-false items and of the multiple choice clusters was examined.

13 citations


Journal ArticleDOI
TL;DR: In this paper, the authors survey the steps involved in item analysis for personality scales, questionnaires, and inventories, noting that rational scale construction has often amounted to little more than defining a construct, writing items which appear on the surface to be tapping the construct, and putting the test into practice.
Abstract: Although hundreds of published and unpublished personality scales, questionnaires, and inventories have been developed since World War II, relatively little formal exposition is available concerning the steps involved in item analysis. It is felt that this situation has been compounded by past adherence to a strictly rational or empirical construction schema, since each has implied that only certain statistical item-analytic techniques are appropriate. At its extreme, rational scale construction has involved only a few agreed upon steps: (a) select and define a construct of interest, (b) write a series of items which appear on the surface to be tapping the construct, and (c) put the test into practice, perhaps attempting to differentiate criterion groups on the basis of obtained score. In many instances, only the most rudimentary item-analytic procedures, if any, have been used.

11 citations


Journal ArticleDOI
TL;DR: In this article, a method for utilizing response-pattern information is suggested that weights each alternative on the test, including no response, to yield maximum coefficient alpha (generalized Kuder-Richardson formula 20).
Abstract: When multiple-choice tests are scored in the usual manner, giving each correct answer one point, information concerning response patterns is lost. A method for utilizing this information is suggested that weights each alternative, including no response, on the test to yield maximum coefficient alpha (generalized Kuder-Richardson formula 20). An example is presented, and the suggested method of scoring is compared with two conventional methods of scoring for this example.
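
The criterion being maximized, coefficient alpha computed from whatever numeric weights are assigned to the alternatives, is easy to state in code. A minimal sketch (this is the statistic the method optimizes, not the paper's weight-finding algorithm; the data matrix is invented):

    def coefficient_alpha(scores):
        """Coefficient alpha for a persons-by-items matrix of item scores."""
        k = len(scores[0])
        def var(xs):
            m = sum(xs) / len(xs)
            return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        item_vars = [var([row[i] for row in scores]) for i in range(k)]
        total_var = var([sum(row) for row in scores])
        return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

    # With 0/1 keyed scores this reduces to KR-20; weighting the alternatives
    # simply changes the entries of the matrix before alpha is computed.
    data = [[1, 0, 1, 1],
            [0, 0, 1, 0],
            [1, 1, 1, 1],
            [0, 1, 0, 1],
            [1, 1, 0, 0]]
    print(round(coefficient_alpha(data), 3))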

Journal ArticleDOI
TL;DR: In this article, an alternative algorithm for item analysis is described, in which item discrimination indices have been defined for item distractors in addition to their traditional definition for the scored alternative, and the Campbell and Fiske concept of convergent and discriminant validity was reconceptualized from the test to the item level and proposed as an aid in interpretation of results of item analysis.
Abstract: Described is an alternative algorithm for item analysis in which item discrimination indices have been defined for item distractors in addition to their traditional definition for the scored alternative. Also, the Campbell and Fiske concept of convergent and discriminant validity was reconceptualized from the test to the item level and proposed as an aid in interpretation of results of item analysis.
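
One concrete way to obtain a discrimination index for every alternative, distractors included, is the point-biserial correlation between choosing that alternative and the total test score; whether this is exactly the authors' index is not stated in the abstract, and the data below are invented:

    def point_biserial(choices, totals, alternative):
        """Correlation between choosing `alternative` (coded 0/1) and total score."""
        x = [1.0 if c == alternative else 0.0 for c in choices]
        n = len(x)
        mx, my = sum(x) / n, sum(totals) / n
        cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, totals)) / n
        sx = (sum((xi - mx) ** 2 for xi in x) / n) ** 0.5
        sy = (sum((yi - my) ** 2 for yi in totals) / n) ** 0.5
        return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

    # choices: alternative picked on one item; totals: each examinee's test score.
    choices = ['A', 'B', 'A', 'C', 'A', 'D', 'B', 'A']
    totals = [38, 22, 35, 19, 40, 17, 25, 33]
    for alt in 'ABCD':
        print(alt, round(point_biserial(choices, totals, alt), 2))

On a well-behaved item the keyed alternative shows a positive index and each distractor a negative one, which is the item-level analogue of the convergent and discriminant evidence the authors describe.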

Journal ArticleDOI
TL;DR: In this paper, a modified version of Horst's model for examinee behavior was used to compare the effect of guessing on item reliability for the answer-until-correct (AUC) and zero-one (ZO) scoring procedures.
Abstract: The answer-until-correct (AUC) procedure requires that examinees respond to a multiple-choice item until they answer it correctly. The examinee's score on the item is then based on the number of responses required for the item. It was expected that the additional responses obtained under the AUC procedure would improve reliability by providing additional information on those examinees who fail to choose the correct alternative on their first attempt. However, when compared to the zero-one (ZO) scoring procedure, the AUC procedure has failed to yield consistent improvements in reliability. Using a modified version of Horst's model for examinee behavior, this paper compares the effect of guessing on item reliability for the AUC procedure and the ZO procedure. The analysis shows that the relative efficiency of the two procedures depends strongly on the nature of the item alternatives and implies that the appropriate criteria for item selection are different for each procedure. Conflicting results rep...
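
A small sketch of the two scoring rules being compared; the abstract states only that the AUC item score is based on the number of responses required, so the particular linear rule below is an assumption:

    def zo_score(first_choice, key):
        """Zero-one scoring: credit only a correct first response."""
        return 1 if first_choice == key else 0

    def auc_score(n_attempts, n_alternatives):
        """Illustrative answer-until-correct rule: fewer attempts, more credit."""
        return (n_alternatives - n_attempts) / (n_alternatives - 1)

    # A four-alternative item answered correctly on the second attempt:
    print(zo_score(first_choice='B', key='A'))        # 0
    print(auc_score(n_attempts=2, n_alternatives=4))  # about 0.67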




Journal ArticleDOI
TL;DR: Three algorithms for selecting a subset of originally available items to maximize coefficient alpha, including one advanced by Serlin and Kaiser (1976), were compared on the size of the resulting alpha and computation time required with nine sets of data.
Abstract: Three algorithms for selecting a subset of originally available items to maximize coefficient alpha, including one advanced by Serlin and Kaiser (1976), were compared on the size of the resulting alpha and computation time required with nine sets of data. Results indicated that a combination of the two alternate algorithms proposed would perform better than the Serlin-Kaiser method. The characteristics of a computer program to perform these item analyses are described.
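
The kind of search these algorithms perform can be sketched generically. The greedy backward elimination below illustrates the idea (drop whichever item most increases alpha); it is not a reimplementation of the Serlin-Kaiser method or of the two proposed alternatives, and the data are invented:

    from statistics import pvariance

    def alpha(matrix, items):
        """Coefficient alpha for the item subset `items` of a persons-by-items matrix."""
        k = len(items)
        total_var = pvariance([sum(row[i] for i in items) for row in matrix])
        if total_var == 0:
            return 0.0
        item_vars = [pvariance([row[i] for row in matrix]) for i in items]
        return k / (k - 1) * (1 - sum(item_vars) / total_var)

    def greedy_max_alpha(matrix, min_items=2):
        """Repeatedly drop the item whose removal most increases alpha."""
        items = list(range(len(matrix[0])))
        best = alpha(matrix, items)
        while len(items) > min_items:
            cand_alpha, drop = max(
                (alpha(matrix, [j for j in items if j != i]), i) for i in items)
            if cand_alpha <= best:
                break
            best, items = cand_alpha, [j for j in items if j != drop]
        return items, best

    data = [[1, 0, 1, 1, 0],
            [0, 0, 1, 0, 1],
            [1, 1, 1, 1, 0],
            [0, 1, 0, 1, 1],
            [1, 1, 1, 0, 0]]
    print(greedy_max_alpha(data))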

Journal ArticleDOI
TL;DR: It is shown that answer changes were more likely to be made on items occurring early in a group of items and toward the end of a test.
Abstract: In an attempt to identify some of the causes of answer-changing behavior, the effects of four test- and item-specific variables were evaluated. Three samples of New Zealand school children of different ages were administered tests of study skills. The number of answer changes per item was compared with the position of each item in a group of items, the position of each item in the test, the discrimination index, and the difficulty index of each item. It is shown that answer changes were more likely to be made on items occurring early in a group of items and toward the end of a test. There was also a tendency for difficult items and items with poor discrimination to be changed more frequently. Some implications of answer changing for the design of tests are discussed.

10 Dec 1978
TL;DR: In this paper, the Pearson System Method and the Two-Parameter Beta Method are used for both the Degree 3 and Degree 4 cases, and the results are compared with previous ones; the two item parameters of the normal ogive model are also estimated for each item and compared with those obtained by the other two procedures, with mean square errors adopted in evaluating both the estimated item characteristic functions and the probability density functions of ability.
Abstract: Following Simple Sum Procedure and Weighted Sum Procedure, another method, Proportioned Sum Procedure, is introduced in the context of the Conditional P.D.F. Approach. The new method is somewhat different from the previous two, however, in the sense that the set of conditional density functions is not exclusively recategorized into the item score groups, but they are proportioned into each item score category. The same hypothetical data, i.e., the maximum likelihood estimates of the five hundred hypothetical subjects and their responses to the ten binary items, each of which follows the normal ogive model, are used to try the method. The criterion item characteristic function for each binary item is obtained and compared with those obtained by the other two procedures. The Pearson System Method and the Two-Parameter Beta Method are used for both Degree 3 and 4 Cases, and the results are compared with the previous ones. The mean square errors are adopted in evaluating both the resultant estimated item characteristic functions and probability density functions of ability. The two item parameters in the normal ogive model are also estimated for each item, and the results are compared with the previous ones. (Author)
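
For reference, the normal ogive model that the report's two item parameters refer to is conventionally written (standard form; the notation is not necessarily the report's) as

    P_i(\theta) = \Phi\bigl(a_i(\theta-b_i)\bigr) = \int_{-\infty}^{a_i(\theta-b_i)} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt,

where a_i is the item discrimination and b_i the item difficulty, these being the two parameters estimated for each of the ten binary items.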

Journal ArticleDOI
TL;DR: This procedure provides for a more efficient display of test data and could be used to supplement computerized item-analysis programs.
Abstract: A procedure is described in this paper for the display of item analysis data using two types of control charts. The item control chart consists of a plot of item difficulty versus each item with action and warning limits drawn to denote boundaries for the 95% and 99% confidence intervals. A reference line is drawn at a mean item difficulty of 50%. The second control chart consists of a plot of item discrimination versus item difficulty for all test items. This procedure provides for a more efficient display of test data and could be used to supplement computerized item-analysis programs.
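
A hedged numerical sketch of the item control chart's limits, using the normal approximation to the binomial around the 50% reference difficulty; the abstract does not give the paper's exact formulas, so the limits and data below are only illustrative:

    import math

    def control_limits(n_examinees, p_ref=0.5):
        """Approximate 95% (warning) and 99% (action) limits for item difficulty."""
        se = math.sqrt(p_ref * (1 - p_ref) / n_examinees)
        warning = (p_ref - 1.96 * se, p_ref + 1.96 * se)
        action = (p_ref - 2.58 * se, p_ref + 2.58 * se)
        return warning, action

    n = 100                                  # examinees
    correct = [43, 61, 75, 28, 52]           # hypothetical number correct per item
    warning, action = control_limits(n)
    for i, c in enumerate(correct, start=1):
        p = c / n
        if not (action[0] <= p <= action[1]):
            flag = "outside action limits"
        elif not (warning[0] <= p <= warning[1]):
            flag = "outside warning limits"
        else:
            flag = "in control"
        print(f"item {i}: difficulty {p:.0%} -> {flag}")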

Journal ArticleDOI
TL;DR: A response set (or response style) has been defined as "a habit or a momentary set causing the subject to earn a different score from the one he would earn if the same items were presented in a different form".
Abstract: A response-set (or response-style) has been defined as "a habit or a momentary set causing the subject to earn a different score from the one he would earn if the same items were presented in a different form" (Cronbach, 1970, p. 148). Several different types of response sets (e.g., acquiescence, willingness-to-guess, evasiveness, etc.) have been postulated and supported by empirical research. In personality instruments, the operation of response sets is often desirable to permit the identification of individuals possessing certain traits. In ability and achievement tests, however, response sets are to be avoided since they "dilute a test with factors not intended to form part of the test content, and so reduce its logical validity" (Cronbach, 1950, p. 3). In administering objective classroom examinations, this author has observed that some students miss test items, not because they lack the information or skills necessary to answer correctly, but rather because they read the question with insufficient care. They think they know what the question says and respond accordingly, when the question poses a different problem. For example, a true-false test item might be presented as follows: