
Showing papers in "Applied Psychological Measurement in 1986"


Journal ArticleDOI
TL;DR: In this paper, a general model for analyzing multitrait-multimethod (MTMM) matrices is presented, based on confirmatory factor analysis (Joreskog, 1974).
Abstract: Procedures for analyzing multitrait-multimethod (MTMM) matrices are reviewed. Confirmatory factor analysis (Joreskog, 1974) is presented as a general model allowing evaluation of the discriminant a...

315 citations


Journal ArticleDOI
TL;DR: In this article, the authors compared four methods of determining the dimensionality of a set of test items: linear factor analysis, nonlinear factor analysis (NFA), residual analysis, and a method developed by Bejar (1980).
Abstract: This study compared four methods of determining the dimensionality of a set of test items: linear factor analysis, nonlinear factor analysis, residual analysis, and a method developed by Bejar (1980). Five artificial test datasets (for 40 items and 1,500 examinees) were generated to be consistent with the three-parameter logistic model and the assumption of either a one- or a two-dimensional latent space. Two variables were manipulated: (1) the correlation between the traits (r = .10 or r = .60) and (2) the percent of test items measuring each trait (50% measuring each trait, or 75% measuring the first trait and 25% measuring the second trait). While linear factor analysis in all instances overestimated the number of underlying dimensions in the data, nonlinear factor analysis with linear and quadratic terms led to correct determination of the item dimensionality in the three datasets where it was used. Both the residual analysis method and Bejar's method proved disappointing. These results suggest th...

136 citations
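
A short simulation can make the reported pattern concrete. The sketch below is not the authors' procedure; it simply generates unidimensional three-parameter logistic data with invented parameters and inspects the eigenvalues of the inter-item Pearson correlation matrix that a linear factor analysis would decompose.

```python
# Hedged sketch (not the study's code): simulate unidimensional 3PL responses
# and look at the eigenvalues a linear factor analysis would work from.
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 1500, 40
theta = rng.normal(size=n_persons)                 # single latent trait
a = rng.uniform(0.8, 2.0, n_items)                 # discriminations (assumed)
b = rng.normal(size=n_items)                       # difficulties (assumed)
c = np.full(n_items, 0.2)                          # pseudo-guessing (assumed)

# Three-parameter logistic probabilities and simulated binary responses
p = c + (1 - c) / (1 + np.exp(-a * (theta[:, None] - b)))
x = (rng.random((n_persons, n_items)) < p).astype(int)

eigvals = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
print("largest eigenvalues:", np.round(eigvals[:5], 2))
# More than one eigenvalue typically exceeds 1 even though a single trait
# generated the data, illustrating the overestimation attributed to linear analysis.
```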


Journal ArticleDOI
TL;DR: This paper pointed out that the approach used by any textbook author will rarely have a great deal in common with a given instructor's biases, and that some of the major criticisms of the book have to do with the relative emphasis placed on the various topics, the author's tendency to present only one side of certain multi-sided issues, and a certain amount of technical inaccuracy.
Abstract: Each instructor in a given field brings some uniqueness to his or her delivery of the particular subject matter, and the approach used by any textbook author will rarely have a great deal in common with a given instructor’s biases. In my case, I find that my approach to this subject has almost nothing in common with that of Harris. Some of my criticism of his text should be tempered by this fact. My major criticisms of the book have to do with the relative emphases placed on the various topics, the author’s tendency to present only one side of certain multi-sided issues, and a certain amount of technical inaccuracy. One indicator of the relative emphasis of topics is the number of pages devoted to actual multivariate

104 citations


Journal ArticleDOI
TL;DR: In this article, the performance of two ratio scaling methods, the eigenvalue method proposed by Saaty (1977, 1980) and the geometric mean procedure advocated by Williams and Cra...
Abstract: This article evaluates and compares the performance of two ratio scaling methods, the eigenvalue method proposed by Saaty (1977, 1980) and the geometric mean procedure advocated by Williams and Cra...

97 citations
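
For readers unfamiliar with the two procedures, the hedged sketch below applies both rules to a small, made-up reciprocal pairwise-comparison matrix; it is not the article's data or code.

```python
# Hedged sketch: weights from a pairwise-comparison matrix A (A[i, j] is the
# judged ratio of object i's weight to object j's) via the two rules compared.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

# Saaty's eigenvalue method: principal right eigenvector, normalized to sum 1
vals, vecs = np.linalg.eig(A)
w_eig = np.real(vecs[:, np.argmax(np.real(vals))])
w_eig = w_eig / w_eig.sum()

# Row geometric-mean method
w_gm = np.prod(A, axis=1) ** (1 / A.shape[0])
w_gm = w_gm / w_gm.sum()

print("eigenvector weights:   ", np.round(w_eig, 3))
print("geometric-mean weights:", np.round(w_gm, 3))
```

For a perfectly consistent matrix the two rules agree; the article's comparison concerns how they behave when judgments are inconsistent.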


Journal ArticleDOI
TL;DR: An item bank typically contains items from several tests that have been calibrated by administering them to different groups of examinees, and the parameters of the items must be linked onto a common scale using an anchoring design and a transformation method.
Abstract: An item bank typically contains items from several tests that have been calibrated by administering them to different groups of examinees. The parameters of the items must be linked onto a common scale. A linking technique consists of an anchoring design and a transformation method. Four basic anchoring designs are the unanchored, anchor-items, anchor-group, and double-anchor designs. The transformation design consists of the system of equations that is used to translate the anchor information and put the item parameters on a common scale. Several transformation methods are discussed briefly. A simulation study is presented that compared the equivalent-groups method with the anchor-items method, using varying numbers of common items, applied both to the situation in which the groups were equivalent and one in which they were not. The results confirm previous findings that the equivalent-groups method is adequate when the groups are in fact equivalent. When the groups are not equivalent, accurate linkin...

92 citations
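
One widely used transformation method of the kind surveyed here is mean/sigma linking under an anchor-items design. The sketch below uses invented anchor difficulties and only illustrates the idea; it does not reproduce the paper's simulation.

```python
# Hedged sketch of mean/sigma linking: common-item difficulties estimated on
# two scales determine the slope A and intercept B that place the new form's
# parameters on the base scale. All numbers are illustrative.
import numpy as np

b_anchor_base = np.array([-1.2, -0.4, 0.3, 1.1])   # anchor items, base scale
b_anchor_new  = np.array([-0.9, -0.1, 0.6, 1.5])   # same items, new calibration

A = b_anchor_base.std(ddof=1) / b_anchor_new.std(ddof=1)
B = b_anchor_base.mean() - A * b_anchor_new.mean()

b_new_form = np.array([-2.0, -0.5, 0.8, 2.1])      # unique items on the new form
b_linked = A * b_new_form + B                       # now expressed on the base scale
print("slope:", round(A, 3), "intercept:", round(B, 3))
print("linked difficulties:", np.round(b_linked, 3))
```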


Journal ArticleDOI
TL;DR: In this article, the problem of determining test bias in prediction using regression models is reexamined, and a step-down hierarchical multiple regression procedure is recommended for testing hypotheses about bias.
Abstract: The problem of determining test bias in prediction using regression models is reexamined. Past approaches have made use of separate regression analyses in each subgroup, moderated multiple regression analysis using subgroup coding, and hierarchical multiple regression strategies. Although it is agreed that hierarchical multiple regression analysis is preferable to either of the former methods, the approach presented here differs with respect to the hypothesis testing procedure to be employed in such an analysis. This paper describes the difficulties in testing hypotheses about the existence of bias in prediction using step-up methods of analysis. Some shortcomings of previously recommended approaches for testing these hypotheses are discussed. Finally, a step-down hierarchical multiple regression procedure is recommended. Analysis of real data illustrates the potential usefulness of the step-down procedure.

86 citations
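
The logic of a step-down test can be illustrated with simulated data: fit the full moderated regression, test the group-by-predictor interaction (slope bias) first, and test the group intercept term only afterward. The sketch below is a generic illustration under assumed normal-theory F tests, not the authors' analysis of their real data.

```python
# Hedged sketch of step-down hypothesis testing for bias in prediction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, n)                         # 0/1 subgroup code
x = rng.normal(size=n)                                # predictor (e.g., test score)
y = 0.5 * x + 0.3 * group + rng.normal(size=n)        # criterion (simulated)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
X_full  = np.column_stack([ones, x, group, x * group])  # separate slopes and intercepts
X_noint = np.column_stack([ones, x, group])             # common slope, separate intercepts
X_base  = np.column_stack([ones, x])                     # single regression line

def f_test(rss_reduced, rss_full, df_diff, df_resid):
    F = ((rss_reduced - rss_full) / df_diff) / (rss_full / df_resid)
    return F, 1 - stats.f.cdf(F, df_diff, df_resid)

# Step 1: test the interaction (differential slopes) in the full model
print("slope bias:     F=%.2f p=%.3f" % f_test(rss(X_noint, y), rss(X_full, y), 1, n - 4))
# Step 2: only if slopes are judged equal, test the group intercept difference
print("intercept bias: F=%.2f p=%.3f" % f_test(rss(X_base, y), rss(X_noint, y), 1, n - 3))
```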


Journal ArticleDOI
TL;DR: In this paper, a set of 636 items was constructed using prespecified cognitive operations; the operations explained item difficulty parameters quite well, and further cross-validation research may contribute to an item-writing approach that brings psychological theory and psychometric models closer together.
Abstract: In cognition research, item writing rules are considered a necessary prerequisite of item banking. A set of 636 items was constructed using prespecified cognitive operations. An evaluation of test data from some 7,400 examinees revealed 446 homogeneous items. Some items had to be discarded because of printing flaws, and others because of operation complexion or other well-describable reasons. However, cognitive operations explained item difficulty parameters quite well; further cross-validation research may contribute to an item writing approach which attempts to bring psychological theory and psychometric models closer together. This will eventually free item construction from item writer idiosyncrasies.

80 citations


Journal ArticleDOI
TL;DR: Empirical Bayes computational procedures are presented and illustrated with data from the Profile of American Youth survey; gains roughly equivalent to two to six additional item responses can be expected in typical educational and psychological applications.
Abstract: A pervasive problem in item response theory (IRT) is the difficulty of simultaneously estimating large numbers of parameters from limited data. Even large samples of examinees may not eliminate the problem when each examinee responds to only a few items, as in educational assessment and adaptive testing. The precision of item parameter estimates can be increased by taking advantage of dependencies between the latent proficiency variable and auxiliary examinee variables such as age, courses taken, and years of schooling. Gains roughly equivalent to two to six additional item responses can be expected in typical educational and psychological applications. Empirical Bayes computational procedures are presented, and illustrated with data from the Profile of American Youth survey.

76 citations
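
A toy normal-normal calculation shows where a gain "worth a few extra items" comes from: a prior predicted from auxiliary variables is combined with a noisy likelihood-based estimate in proportion to their precisions. The numbers and the conjugate setup below are illustrative assumptions, not the paper's procedure.

```python
# Hedged illustration: precision-weighted pooling of a short-test estimate
# with a regression-based prior from auxiliary examinee variables.
import numpy as np

theta_mle, se_mle = 0.30, 0.45        # proficiency estimate from a short test (assumed)
prior_mean, prior_sd = 0.55, 0.60     # prediction from auxiliary variables (assumed)

w_data = 1 / se_mle**2                # precisions (inverse variances)
w_prior = 1 / prior_sd**2

post_mean = (w_data * theta_mle + w_prior * prior_mean) / (w_data + w_prior)
post_sd = np.sqrt(1 / (w_data + w_prior))

print("posterior mean %.3f, posterior sd %.3f (was %.2f)" % (post_mean, post_sd, se_mle))
# The reduction in posterior sd is the kind of gain the abstract quantifies
# as equivalent to a few additional item responses.
```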


Journal ArticleDOI
TL;DR: The development of an unfolding methodology designed to analyze "pick any" or "pick any/n" binary choice data (e.g., decisions to buy or not to buy various products) and the results of an application of the spatial choice model to a synthetic data set in a Monte Carlo analysis are presented.
Abstract: This paper describes the development of an unfolding methodology designed to analyze "pick any" or "pick any/n" binary choice data (e.g., decisions to buy or not to buy various products). Maximum likelihood estimation procedures are used to obtain a joint space representation of both persons and objects. A review of the relevant literature concerning the spatial treatment of such binary choice data is presented. The nonlinear logistic model type is described, as well as the alternating maximum likelihood algorithm used to estimate the parameter values. The results of an application of the spatial choice model to a synthetic data set in a Monte Carlo analysis are presented. An application concerning consumer (intended) choices for nine competitive brands of sports cars is discussed. Future research may provide a means of generalizing the model to accommodate three-way choice data.

70 citations
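
The flavor of this model class can be conveyed with a minimal sketch in which the probability of a "pick" falls off with the squared distance between a person's ideal point and an object's location through a logistic link. The coordinates and threshold below are invented, and the paper's estimation machinery is not reproduced.

```python
# Hedged sketch of an unfolding-type choice rule for "pick any" data.
import numpy as np

person = np.array([0.2, -0.5])                    # ideal point in 2 dimensions (assumed)
objects = np.array([[0.0, -0.4],                  # object coordinates (assumed)
                    [1.5,  1.0],
                    [-0.3, -0.6]])
c = 1.0                                           # "pick" threshold parameter (assumed)

d2 = np.sum((objects - person) ** 2, axis=1)      # squared person-object distances
p_pick = 1 / (1 + np.exp(d2 - c))                 # closer objects -> higher pick probability
print(np.round(p_pick, 3))
```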


Journal ArticleDOI
TL;DR: The nonparametric approach to constructing and evaluating tests based on binary items proposed by Mokken has been criticized by Roskam, van den Wollenberg, and Jansen.
Abstract: The nonparametric approach to constructing and evaluating tests based on binary items proposed by Mokken has been criticized by Roskam, van den Wollenberg, and Jansen. It is contended that their arguments misrepresent the objectives of this approach, that their criticisms of the role of the H coefficient in the procedures are irrelevant or erroneous, and that they fail to distinguish the inherent requirements (and limitations) of general nonparametric models and procedures from those of parametric ones. It is concluded that Mokken's procedures provide a useful tool for researchers in the social sciences who wish to construct and evaluate tests for measuring theoretically meaningful latent traits while avoiding the strong parametric assumptions of traditional item response theory.

65 citations


Journal ArticleDOI
TL;DR: In this article, the effects of computer presentation on speeded clerical tests were examined; two ratio scores were examined as variants of the conventional score, the number of correct responses in a fixed interval of time.
Abstract: This study examined the effects of computer presentation on speeded clerical tests. Two ratio scores—average number of correct responses per minute and its inverse, average number of seconds per correct response—were examined as variants of the conventional score, number of correct responses in a fixed interval of time. Ratio scores were more reliable than number-correct scores and were less sensitive to testing time. Tests administered on the computer were found to be at least as reliable as conventionally administered tests, but examinees were much faster in the computer mode. Correlations between paper-and-pencil and computer modes were high, except when task differences were introduced by computer implementation.

Journal ArticleDOI
TL;DR: Some test design problems can be seen as combinatorial optimization problems, and several suggestions are presented, with various possible applications.
Abstract: Some test design problems can be seen as combinatorial optimization problems. Several suggestions are presented, with various possible applications. Results obtained thus far are promising; the methods suggested can also be used with highly structured test specifications.
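
As a concrete, if simplified, instance of the idea: choosing a fixed number of items from a bank to maximize information at a target ability is a 0-1 selection problem. The sketch below solves it with a greedy heuristic on simulated 2PL parameters; the paper's own formulations, and any constraints coming from structured test specifications, are not reproduced.

```python
# Hedged sketch of test design as combinatorial optimization (greedy heuristic).
import numpy as np

rng = np.random.default_rng(3)
n_bank, test_length, theta0 = 200, 20, 0.0      # bank size, test length, target ability
a = rng.uniform(0.5, 2.0, n_bank)               # simulated discriminations
b = rng.normal(size=n_bank)                     # simulated difficulties

p = 1 / (1 + np.exp(-a * (theta0 - b)))         # 2PL probabilities at theta0
info = a**2 * p * (1 - p)                       # item information at theta0

selected = np.argsort(info)[::-1][:test_length] # greedy: the most informative items
print("test information at theta0:", round(info[selected].sum(), 2))
```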

Journal ArticleDOI
TL;DR: In this paper, two versions of the standardized l0 appropriateness index are compared to optimal indices, and the detection rates for polychotomous and dichotomous scorings of the item responses are compared.
Abstract: Optimal appropriateness indices, recently introduced by Levine and Drasgow (1984), provide the highest rates of detection of aberrant response patterns that can be obtained from item responses. In this article they are used to study three important problems in appropriateness measurement. First, the maximum detection rates of two particular forms of aberrance are determined for a long unidimensional test. These detection rates are shown to be moderately high. Second, two versions of the standardized l0 appropriateness index are compared to optimal indices. At low false alarm rates, one standardized l0 index has detection rates that are about 65% as large as optimal for spuriously high (cheating) test scores. However, for the spuriously low scores expected from persons with ill-advised testing strategies or reading problems, both standardized l0 indices are far from optimal. Finally, detection rates for polychotomous and dichotomous scorings of the item responses are compared. It is shown that dichot...
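
For orientation, the sketch below computes a standardized log-likelihood person-fit statistic of the general kind discussed above (often written l_z); the paper's specific standardized l0 variants and the optimal indices are not reproduced, and the item parameters and response pattern are invented.

```python
# Hedged sketch of a standardized log-likelihood appropriateness (person-fit) index.
import numpy as np

theta = 0.0
a = np.array([1.0, 1.2, 0.8, 1.5, 1.1])         # assumed discriminations
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])       # assumed difficulties
x = np.array([0, 1, 1, 0, 1])                    # a somewhat surprising response pattern

p = 1 / (1 + np.exp(-a * (theta - b)))
loglik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
e_loglik = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
v_loglik = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)

lz = (loglik - e_loglik) / np.sqrt(v_loglik)
print("standardized person-fit index:", round(lz, 2))   # large negative = aberrant
```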

Journal ArticleDOI
TL;DR: In this article, the effects of ability of the examinee group used to establish the equating relationship on linear, equipercentile, and three-parameter logistic IRT estimated true score equating methods were investigated.
Abstract: Many educational tests make use of multiple test forms, which are then horizontally equated to establish interchangeability among forms. To have confidence in this interchangeability, the equating relationships should be robust to the particular group of examinees on which the equating is conducted. This study investigated the effects of ability of the examinee group used to establish the equating relationship on linear, equipercentile, and three-parameter logistic IRT estimated true score equating methods. The results show all of the methods to be reasonably independent of examinee group, and suggest that population independence is not a good reason for selecting one method over another.
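
The simplest of the methods named above, linear equating, can be written in a few lines: form-X scores are mapped to the form-Y scale by matching means and standard deviations. The scores below are simulated for illustration only and do not reflect the study's design.

```python
# Hedged sketch of single-group linear equating.
import numpy as np

rng = np.random.default_rng(7)
scores_x = rng.normal(25, 6, 2000)          # observed scores on form X (simulated)
scores_y = rng.normal(27, 5, 2000)          # observed scores on form Y (simulated)

A = scores_y.std(ddof=1) / scores_x.std(ddof=1)
B = scores_y.mean() - A * scores_x.mean()

def equate_linear(x):
    """Form-X raw score expressed on the form-Y scale."""
    return A * x + B

print("a form-X score of 30 maps to", round(equate_linear(30), 2), "on form Y")
```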

Journal ArticleDOI
TL;DR: In this paper, it is argued that the Mokken scale is an unfruitful compromise between the requirements of a Guttman scale and classical test theory, and that the Rasch model is the only item response model fulfilling this requirement.
Abstract: The Mokken scale is critically discussed. It is argued that Loevinger's H, adapted by Mokken and advocated as a coefficient of scalability, is sensitive to properties of the item set which are extraneous to Mokken's requirement of holomorphy of item response curves. Therefore, when defined in terms of H, the Mokken scale is ambiguous. It is furthermore argued that item-selection free statistical inferences concerning the latent person order appear to be insufficiently based on double monotony alone, and that the Rasch model is the only item response model fulfilling this requirement. Finally, it is contended that the Mokken scale is an unfruitful compromise between the requirements of a Guttman scale and the requirements of classical test theory.
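
Loevinger's H, the coefficient at issue here, compares observed Guttman errors with the number expected under marginal independence, summed over item pairs. The sketch below computes a scale H on simulated binary data; it is a minimal illustration, not either side's analysis.

```python
# Hedged sketch of Loevinger's/Mokken's scalability coefficient H.
import numpy as np

rng = np.random.default_rng(5)
theta = rng.normal(size=1000)
b = np.array([-1.0, -0.3, 0.2, 0.9])                  # assumed item difficulties
x = (rng.random((1000, 4)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)

n, k = x.shape
p = x.mean(axis=0)                                     # proportion correct per item
obs_err = exp_err = 0.0
for i in range(k):
    for j in range(k):
        if i != j and p[i] >= p[j]:                    # item i is the easier of the pair
            # Guttman error: harder item j correct while easier item i is wrong
            obs_err += np.sum((x[:, j] == 1) & (x[:, i] == 0))
            exp_err += n * p[j] * (1 - p[i])           # expected under independence
H = 1 - obs_err / exp_err
print("scale H:", round(H, 3))
```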

Journal ArticleDOI
TL;DR: In this article, the covariance and regression slope models are proposed for assessing validity generalization; the new models are less restrictive in that they require only one hypothetical distribution.
Abstract: Two new models, the covariance and regression slope models, are proposed for assessing validity generalization. The new models are less restrictive in that they require only one hypothetical distribution...

Journal ArticleDOI
TL;DR: The authors examined the effect of anxiety and dissimulation motivation of job applicants on their performance on an ability test and found a negative effect of dissimulation motivation on the performance of low anxiety scorers, with a greater effect on the appropriateness score than on the total score.
Abstract: This study examined the effect of anxiety and dissimulation motivation of job applicants on their performance on an ability test. Two aspects of performance were considered: the total score and the appropriateness score. Four IRT-based appropriateness indices for detecting aberrant response patterns were employed in this study. The results indicate a negative effect of dissimulation motivation on the performance of low anxiety scorers, with respect to both the total score and the appropriateness score, with a greater effect on the latter. This effect was evidenced by an erratic or aberrant response pattern on the ability test; that is, missing relatively easy items while answering more difficult ones correctly. The results are discussed in light of the diverse interpretations concerning the meaning of Lie scales.

Journal ArticleDOI
TL;DR: In this article, the small-scale applicability of Rasch estimates was investigated under simulated conditions of guessing and heterogeneity in item discrimination; under guessing, robustness could only be demonstrated for the correlational criterion.
Abstract: The small scale applicability of Rasch estimates was investigated under simulated conditions of guessing and heterogeneity in item discrimination. The accuracy of the Rasch estimates was evaluated by means of the correlation between the item/person parameters and their estimates, the standard deviations of the estimates, and the difference as well as the root mean squared difference between parameters and estimates. Within the range of the present investigation (from 10 to 50 items and from 25 to 500 persons) these criteria yielded favorable results under conditions of heterogeneous item discrimination. Under conditions of guessing, robustness could only be demonstrated for the correlational criterion. Guessing affects the difference measures between the parameter values and estimates quite strongly in a systematic way. It is argued that, notwithstanding these estimation errors, the Rasch model is to be preferred over nonstandard estimation procedures, whose validity is unclear, or the use o...

Journal ArticleDOI
TL;DR: In this paper, a new stochastic multidimensional scaling (MDS) method was developed for paired comparisons data and rendered a spatial representation of subjects and stimuli, where subjects are represented as vectors and stimuli as points in a T-dimensional space, where the scalar products or pro jections of the stimulus points onto the subject vectors, provide respective information as to the utility (or whatever latent construct is under investigation) of the stimuli to the subjects.
Abstract: This article presents the development of a new stochastic multidimensional scaling (MDS) method, which operates on paired comparisons data and renders a spatial representation of subjects and stimuli. Subjects are represented as vectors and stimuli as points in a T-dimensional space, where the scalar products, or projections of the stimulus points onto the subject vectors, provide respective information as to the utility (or whatever latent construct is under investigation) of the stimuli to the subjects. The psychometric literature concerning related MDS methods that also operate on paired comparisons data is reviewed, and a technical description of the new method is provided. A small Monte Carlo analysis performed on synthetic data with the new method is also presented. To illustrate the versatility of the model, an application measuring consumer satisfaction and investigating the impact of hypothesized determinants, using one of the optional reparameterized models, is described. Future areas of ...

Journal ArticleDOI
Larry H. Ludlow
TL;DR: A graphical comparison of empirical versus simulated residual variation is presented as one way to assess the goodness of fit of an item response theory model.
Abstract: A graphical comparison of empirical versus simulated residual variation is presented as one way to assess the goodness of fit of an item response theory model. The two forms of residual variation were generated through the separate calibration of empirical data and data "tailored" to fit the model, given the empirical parameter estimates. A variety of techniques illustrate the utility of using tailored residuals as a specific baseline against which empirical residuals may be understood.

Journal ArticleDOI
TL;DR: In this article, the authors explored how four test equating methods (linear, equipercentile, and item response theory methods based on the Rasch and three-parameter models) responded to tests of different psychometric properties.
Abstract: This Monte Carlo study explored how four commonly used test equating methods (linear, equipercentile, and item response theory methods based on the Rasch and three-parameter models) responded to tests of different psychometric properties. The four methods were applied to generated data sets where mean item difficulty and discrimination as well as level of chance scoring were manipulated. In all cases, examinee ability was matched to the level of difficulty of the tests. The results showed the Rasch model not to be very robust to violations of the equal discrimination and non-chance scoring assumptions. There were also problems with the three-parameter model, but these were due primarily to estimation and linking problems. The recommended procedure for tests similar to those studied is the equipercentile method.

Journal ArticleDOI
TL;DR: In this paper, the use of various types of free-answer items (e.g., the brief answer, interlinear, and "fill in the blanks in the following paragraph" forms) is discussed.
Abstract: An important but usually neglected aspect of the training of teachers is instruction in the art of writing good classroom tests. Such training should emphasize various forms of objective items (e.g., multiple-choice, master list, matching, greater-less-same, best-worst answer, and matrix format). The proper formulation and accurate grading of essay items should be included, as should the use of various types of free-answer items (e.g., the brief answer, interlinear, and "fill in the blanks in the following paragraph" forms). For courses involving laboratory work, such as science, machine shop, and home economics, performance and identification tests based on the laboratory work should be used. A second point is that organizations developing aptitude tests for nonacademic areas, such as police work, fire fighting, and licensing tests, should emphasize the use by the client of a valid, reliable, and unbiased criterion. Organizations developing academic aptitude tests should also (1) be alert to the ...

Journal ArticleDOI
TL;DR: A new type of theory and practice in testing is replacing the standard test by the test item bank, and classical test theory by item response theory as discussed by the authors, and it is shown how these also reinforce and complete each other.
Abstract: Since the era of Binet and Spearman, classical test theory and the ideal of the standard test have gone hand in hand, in part because both are based on the same paradigm of experimental control by manipulation and randomization. Their longevity is a consequence of this mutually beneficial symbiosis. A new type of theory and practice in testing is replacing the standard test by the test item bank, and classical test theory by item response theory. In this paper it is shown how these also reinforce and complete each other.

Journal ArticleDOI
TL;DR: In this article, the authors employed a laboratory methodology to investigate two research questions related to scale recalibration (beta change) in temporal survey re search, and applied this methodology to evaluate the use of the retrospecive design in assessing organizational change.
Abstract: Efforts to operationalize the alpha/beta/gamma change typology have suffered from a notable limitation. Virtually all have been conducted in field settings, thereby limiting the degree of experimental control over outcome criteria. Recognizing this limitation, the present study employed a laboratory methodology to investigate two research questions related to scale recalibration (beta change) in temporal survey research. Application of this methodology permitted random respondent assignment, exact replication of stimuli, and systematic time interval variation for the pretest-posttest design. Furthermore, the use of these procedures permitted testing the use of the retrospective design in assessing organizational change. Implications of the findings for the measurement of change are discussed.

Journal ArticleDOI
TL;DR: In this paper, a method for constructing a bank of items scored in two or more ordered response categories is described, which enables multistep problems, rating scale items, question "clus...
Abstract: A method for constructing a bank of items scored in two or more ordered response categories is described and illustrated. This method enables multistep problems, rating scale items, question "clus...

Journal ArticleDOI
TL;DR: In this paper, a procedure for the sequential optimization of the calibration of an item bank is given, based on an empirical Bayesian approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items.
Abstract: A procedure for the sequential optimization of the calibration of an item bank is given. The procedure is based on an empirical Bayesian approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items in which ties are allowed to occur. First, it is shown how a paired-comparisons design deals with the usual incompleteness of calibration data and how the item parameters can be estimated using this design. Next, the procedure for a sequential optimization of the item parameter estimators is given, both for individuals responding to pairs of items and for item and examinee groups of any size. The paper concludes with a discussion of the choice of the first priors in the procedure and the problems involved in its generalization to other item response models.
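
The reformulation rests on a standard property of the Rasch model: conditional on exactly one of two items being answered correctly, the probability that it is the first item depends only on the two difficulties, exactly as in a Bradley-Terry paired comparison. The sketch below verifies this numerically with arbitrary parameters; it does not reproduce the paper's empirical Bayes machinery.

```python
# Hedged illustration: the Rasch model as a paired comparison between item
# difficulties, free of the person parameter.
import numpy as np

def p_correct(theta, b):
    return 1 / (1 + np.exp(-(theta - b)))

b_i, b_j = -0.5, 0.8                         # arbitrary item difficulties
for theta in (-1.0, 0.0, 2.0):
    p_i_only = p_correct(theta, b_i) * (1 - p_correct(theta, b_j))
    p_j_only = p_correct(theta, b_j) * (1 - p_correct(theta, b_i))
    cond = p_i_only / (p_i_only + p_j_only)  # P(i correct | exactly one correct)
    print("theta=%+.1f  P(i beats j | one correct)=%.4f" % (theta, cond))

# All three lines print the same value, exp(b_j - b_i) / (1 + exp(b_j - b_i)):
print("theory:", round(np.exp(b_j - b_i) / (1 + np.exp(b_j - b_i)), 4))
```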

Journal ArticleDOI
TL;DR: In this paper, a model and estimator for examinee-level measure of error variance are developed, which takes into account test form difficulty adjustments often used in standardized tests, and is linked to indices designed for identi fying unusual item response patterns.
Abstract: A model and estimator for examinee-level measurement error variance are developed. Although the binomial distribution is basic to the modeling, the proposed error model provides some insights into problems associated with simple binomial error, and yields estimates of error that are quite distinct from binomial error. By taking into consideration test form difficulty adjustments often used in standardized tests, the model is linked also to indices designed for identifying unusual item response patterns. In addition, average error variance under the model is approximately that which would be obtained through a KR-20 estimate of reliability, thus providing a unique justification for this popular index. Empirical results using odd-even and alternate-forms measures of error variance tend to favor the proposed model over the binomial.
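
To fix ideas, the sketch below computes the simple examinee-level binomial error variance and the KR-20 coefficient referred to above on simulated responses; the paper's adjusted error model is not reproduced.

```python
# Hedged sketch: simple binomial error variance per examinee and KR-20 reliability.
import numpy as np

rng = np.random.default_rng(11)
theta = rng.normal(size=500)
b = rng.normal(size=30)                                     # assumed difficulties
x = (rng.random((500, 30)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)

n_items = x.shape[1]
total = x.sum(axis=1)

# Simple binomial error variance for each examinee: n * p_hat * (1 - p_hat)
p_hat = total / n_items
binomial_err_var = n_items * p_hat * (1 - p_hat)

# KR-20: reliability from item variances and total-score variance
item_var = x.mean(axis=0) * (1 - x.mean(axis=0))
kr20 = (n_items / (n_items - 1)) * (1 - item_var.sum() / total.var(ddof=1))

print("mean binomial error variance:", round(binomial_err_var.mean(), 2))
print("KR-20:", round(kr20, 3))
```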

Journal ArticleDOI
TL;DR: In many applications of item response theory, it is of little consequence whether the Rasch model or a more accurate, but more complicated item response model is used as mentioned in this paper, and it might be advantageous to employ the Rasch model with small sample sizes.
Abstract: In many applications of item response theory, it is of little consequence whether the Rasch model or a more accurate, but more complicated item response model is used. With small sample sizes, it might be advantageous to employ the Rasch model. A clear counterexample is the case of optimal item selection under guessing.
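
The counterexample can be seen in the item information function: with a nonzero pseudo-guessing parameter, information is reduced and its peak shifts, so the item that is optimal at a given ability under the Rasch (no-guessing) model need not be optimal under the three-parameter model. The sketch below illustrates this with arbitrary parameters.

```python
# Hedged illustration: optimal item difficulty at a fixed ability differs
# between the no-guessing (c = 0) and guessing (c = 0.25) cases.
import numpy as np

def item_info(theta, a, b, c):
    """Fisher information of a 3PL item (c = 0 gives the no-guessing case)."""
    p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
    return a**2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

theta = 0.0
difficulties = np.linspace(-1.5, 1.5, 121)
info_no_guess = item_info(theta, 1.0, difficulties, 0.0)
info_guess = item_info(theta, 1.0, difficulties, 0.25)

print("best difficulty, no guessing:", round(difficulties[np.argmax(info_no_guess)], 2))
print("best difficulty, c = 0.25:   ", round(difficulties[np.argmax(info_guess)], 2))
```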

Journal ArticleDOI
TL;DR: In this paper, the authors comment on the contributions to a special issue on item banking; an historical framework for viewing the papers is provided by brief reviews of the literature in the areas of item response theory, item banking, and computerized testing, and the eight papers are viewed as contributing valuable technical knowledge for implementing testing programs with the aid of item banks.
Abstract: This paper comments on the contributions to this special issue on item banking. An historical framework for viewing the papers is provided by brief reviews of the literature in the areas of item response theory, item banking, and computerized testing. In general, the eight papers are viewed as contributing valuable technical knowledge for implementing testing programs with the aid of item banks.

Journal ArticleDOI
TL;DR: The development of an integrated system for the storage of items and the construction and analysis of tests is described; the system is being developed both as a general facility for the Dutch Institute of Educational Measurement and as a support system for the use and maintenance of item banks in schools.
Abstract: The development of an integrated system for the storage of items and the construction and analysis of tests is described. The system is being developed both as a general facility for the Dutch Institute of Educational Measurement and as a support system for the use and maintenance of item banks in schools. The methodology of developing the system is described, with attention to the system architecture and to the results of the first stage of the system development.