
Showing papers in "Applied Psychological Measurement in 1986"


Journal ArticleDOI
TL;DR: In this paper, a general model for analyzing multitrait-multimethod (MTMM) matrices is presented, based on confirmatory factor analysis (Joreskog, 1974).
Abstract: Procedures for analyzing multitrait-multimethod (MTMM) matrices are reviewed. Confirmatory factor analysis (Joreskog, 1974) is presented as a general model allowing evaluation of the discriminant a...

315 citations


Journal ArticleDOI
TL;DR: In this article, the authors compared four methods of determining the dimensionality of a set of test items: linear factor analysis, nonlinear factor analysis (NFA), residual analysis, and a method developed by Bejar (1980).
Abstract: This study compared four methods of determining the dimensionality of a set of test items: linear factor analysis, nonlinear factor analysis, residual analysis, and a method developed by Bejar (1980). Five artificial test datasets (for 40 items and 1,500 examinees) were generated to be consistent with the three-parameter logistic model and the assumption of either a one- or a two-dimensional latent space. Two variables were manipulated: (1) the correlation between the traits (r = .10 or r = .60) and (2) the percent of test items measuring each trait (50% measuring each trait, or 75% measuring the first trait and 25% measuring the second trait). While linear factor analysis in all instances overestimated the number of underlying dimensions in the data, nonlinear factor analysis with linear and quadratic terms led to correct determination of the item dimensionality in the three datasets where it was used. Both the residual analysis method and Bejar's method proved disappointing. These results suggest th...

136 citations
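
A short simulation can make the reported pattern concrete. The sketch below is not the authors' procedure; it simply generates unidimensional three-parameter logistic data with invented parameters and inspects the eigenvalues of the inter-item Pearson correlation matrix that a linear factor analysis would decompose.

```python
# Hedged sketch (not the study's code): simulate unidimensional 3PL responses
# and look at the eigenvalues a linear factor analysis would work from.
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 1500, 40
theta = rng.normal(size=n_persons)                 # single latent trait
a = rng.uniform(0.8, 2.0, n_items)                 # discriminations (assumed)
b = rng.normal(size=n_items)                       # difficulties (assumed)
c = np.full(n_items, 0.2)                          # pseudo-guessing (assumed)

# Three-parameter logistic probabilities and simulated binary responses
p = c + (1 - c) / (1 + np.exp(-a * (theta[:, None] - b)))
x = (rng.random((n_persons, n_items)) < p).astype(int)

eigvals = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
print("largest eigenvalues:", np.round(eigvals[:5], 2))
# More than one eigenvalue typically exceeds 1 even though a single trait
# generated the data, illustrating the overestimation attributed to linear analysis.
```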


Journal ArticleDOI
TL;DR: This paper pointed out that the approach used by any textbook author will rarely have a great deal in common with a given instructor's biases, and that some of the major criticisms of the book have to do with the relative emphasis placed on the various topics, the author's tendency to present only one side of certain multi-sided issues, and a certain amount of technical inaccuracy.
Abstract: Each instructor in a given field brings some uniqueness to his or her delivery of the particular subject matter, and the approach used by any textbook author will rarely have a great deal in common with a given instructor’s biases. In my case, I find that my approach to this subject has almost nothing in common with that of Harris. Some of my criticism of his text should be tempered by this fact. My major criticisms of the book have to do with the relative emphases placed on the various topics, the author’s tendency to present only one side of certain multi-sided issues, and a certain amount of technical inaccuracy. One indicator of the relative emphasis of topics is the number of pages devoted to actual multivariate

104 citations


Journal ArticleDOI
TL;DR: In this article, the performance of two ratio scaling methods, the eigenvalue method proposed by Saaty (1977, 1980) and the geometric mean procedure advocated by Williams and Cra...
Abstract: This article evaluates and compares the performance of two ratio scaling methods, the eigenvalue method proposed by Saaty (1977, 1980) and the geometric mean procedure advocated by Williams and Cra...

97 citations
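
For readers unfamiliar with the two procedures, the hedged sketch below applies both rules to a small, made-up reciprocal pairwise-comparison matrix; it is not the article's data or code.

```python
# Hedged sketch: weights from a pairwise-comparison matrix A (A[i, j] is the
# judged ratio of object i's weight to object j's) via the two rules compared.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

# Saaty's eigenvalue method: principal right eigenvector, normalized to sum 1
vals, vecs = np.linalg.eig(A)
w_eig = np.real(vecs[:, np.argmax(np.real(vals))])
w_eig = w_eig / w_eig.sum()

# Row geometric-mean method
w_gm = np.prod(A, axis=1) ** (1 / A.shape[0])
w_gm = w_gm / w_gm.sum()

print("eigenvector weights:   ", np.round(w_eig, 3))
print("geometric-mean weights:", np.round(w_gm, 3))
```

For a perfectly consistent matrix the two rules agree; the article's comparison concerns how they behave when judgments are inconsistent.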


Journal ArticleDOI
TL;DR: An item bank typically contains items from several tests that have been calibrated by administering them to different groups of examinees, and the parameters of the items must be linked onto a common scale using an anchoring design and a transformation method.
Abstract: An item bank typically contains items from several tests that have been calibrated by administering them to different groups of examinees. The parameters of the items must be linked onto a common scale. A linking technique consists of an anchoring design and a transformation method. Four basic anchoring designs are the unanchored, anchor-items, anchor-group, and double-anchor designs. The transformation design consists of the system of equations that is used to translate the anchor information and put the item parameters on a common scale. Several transformation methods are discussed briefly. A simulation study is presented that compared the equivalent-groups method with the anchor-items method, using varying numbers of common items, applied both to the situation in which the groups were equivalent and one in which they were not. The results confirm previous findings that the equivalent-groups method is adequate when the groups are in fact equivalent. When the groups are not equivalent, accurate linkin...

92 citations
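
One widely used transformation method of the kind surveyed here is mean/sigma linking under an anchor-items design. The sketch below uses invented anchor difficulties and only illustrates the idea; it does not reproduce the paper's simulation.

```python
# Hedged sketch of mean/sigma linking: common-item difficulties estimated on
# two scales determine the slope A and intercept B that place the new form's
# parameters on the base scale. All numbers are illustrative.
import numpy as np

b_anchor_base = np.array([-1.2, -0.4, 0.3, 1.1])   # anchor items, base scale
b_anchor_new  = np.array([-0.9, -0.1, 0.6, 1.5])   # same items, new calibration

A = b_anchor_base.std(ddof=1) / b_anchor_new.std(ddof=1)
B = b_anchor_base.mean() - A * b_anchor_new.mean()

b_new_form = np.array([-2.0, -0.5, 0.8, 2.1])      # unique items on the new form
b_linked = A * b_new_form + B                       # now expressed on the base scale
print("slope:", round(A, 3), "intercept:", round(B, 3))
print("linked difficulties:", np.round(b_linked, 3))
```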


Journal ArticleDOI
TL;DR: In this article, the problem of determining test bias in prediction using regression models is reexamined, and a step-down hierarchical multiple regression procedure is recommended for testing hypotheses about bias.
Abstract: The problem of determining test bias in prediction using regression models is reexamined. Past approaches have made use of separate regression analyses in each subgroup, moderated multiple regression analysis using subgroup coding, and hierarchical multiple regression strategies. Although it is agreed that hierarchical multiple regression analysis is preferable to either of the former methods, the approach presented here differs with respect to the hypothesis testing procedure to be employed in such an analysis. This paper describes the difficulties in testing hypotheses about the existence of bias in prediction using step-up methods of analysis. Some shortcomings of previously recommended approaches for testing these hypotheses are discussed. Finally, a step-down hierarchical multiple regression procedure is recommended. Analysis of real data illustrates the potential usefulness of the step-down procedure.

86 citations
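
The logic of a step-down test can be illustrated with simulated data: fit the full moderated regression, test the group-by-predictor interaction (slope bias) first, and test the group intercept term only afterward. The sketch below is a generic illustration under assumed normal-theory F tests, not the authors' analysis of their real data.

```python
# Hedged sketch of step-down hypothesis testing for bias in prediction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, n)                         # 0/1 subgroup code
x = rng.normal(size=n)                                # predictor (e.g., test score)
y = 0.5 * x + 0.3 * group + rng.normal(size=n)        # criterion (simulated)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
X_full  = np.column_stack([ones, x, group, x * group])  # separate slopes and intercepts
X_noint = np.column_stack([ones, x, group])             # common slope, separate intercepts
X_base  = np.column_stack([ones, x])                     # single regression line

def f_test(rss_reduced, rss_full, df_diff, df_resid):
    F = ((rss_reduced - rss_full) / df_diff) / (rss_full / df_resid)
    return F, 1 - stats.f.cdf(F, df_diff, df_resid)

# Step 1: test the interaction (differential slopes) in the full model
print("slope bias:     F=%.2f p=%.3f" % f_test(rss(X_noint, y), rss(X_full, y), 1, n - 4))
# Step 2: only if slopes are judged equal, test the group intercept difference
print("intercept bias: F=%.2f p=%.3f" % f_test(rss(X_base, y), rss(X_noint, y), 1, n - 3))
```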


Journal ArticleDOI
TL;DR: In this paper, a set of 636 items was constructed using prespecified cognitive operations; the operations explained item difficulty parameters quite well, and further cross-validation research may contribute to an item-writing approach that brings psychological theory and psychometric models closer together.
Abstract: In cognition research, item writing rules are considered a necessary prerequisite of item banking. A set of 636 items was constructed using prespecified cognitive operations. An evaluation of test data from some 7,400 examinees revealed 446 homogeneous items. Some items had to be discarded because of printing flaws, and others because of operation complexion or other well-describable reasons. However, cognitive operations explained item difficulty parameters quite well; further cross-validation research may contribute to an item writing approach which attempts to bring psychological theory and psychometric models closer together. This will eventually free item construction from item writer idiosyncrasies.

80 citations


Journal ArticleDOI
TL;DR: Empirical Bayes computational procedures are presented and illustrated with data from the Profile of American Youth survey; gains roughly equivalent to two to six additional item responses can be expected in typical educational and psychological applications.
Abstract: A pervasive problem in item response theory (IRT) is the difficulty of simultaneously estimating large numbers of parameters from limited data. Even large samples of examinees may not eliminate the problem when each examinee responds to only a few items, as in educational assessment and adaptive testing. The precision of item parameter estimates can be increased by taking advantage of dependencies between the latent proficiency variable and auxiliary examinee variables such as age, courses taken, and years of schooling. Gains roughly equivalent to two to six additional item responses can be expected in typical educational and psychological applications. Empirical Bayes computational procedures are presented, and illustrated with data from the Profile of American Youth survey.

76 citations
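
A toy normal-normal calculation shows where a gain "worth a few extra items" comes from: a prior predicted from auxiliary variables is combined with a noisy likelihood-based estimate in proportion to their precisions. The numbers and the conjugate setup below are illustrative assumptions, not the paper's procedure.

```python
# Hedged illustration: precision-weighted pooling of a short-test estimate
# with a regression-based prior from auxiliary examinee variables.
import numpy as np

theta_mle, se_mle = 0.30, 0.45        # proficiency estimate from a short test (assumed)
prior_mean, prior_sd = 0.55, 0.60     # prediction from auxiliary variables (assumed)

w_data = 1 / se_mle**2                # precisions (inverse variances)
w_prior = 1 / prior_sd**2

post_mean = (w_data * theta_mle + w_prior * prior_mean) / (w_data + w_prior)
post_sd = np.sqrt(1 / (w_data + w_prior))

print("posterior mean %.3f, posterior sd %.3f (was %.2f)" % (post_mean, post_sd, se_mle))
# The reduction in posterior sd is the kind of gain the abstract quantifies
# as equivalent to a few additional item responses.
```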


Journal ArticleDOI
TL;DR: The development of an unfolding methodology designed to analyze "pick any" or "pick any/n" binary choice data (e.g., decisions to buy or not to buy various products) and the results of an application of the spatial choice model to a synthetic data set in a Monte Carlo analysis are presented.
Abstract: This paper describes the development of an unfolding methodology designed to analyze "pick any" or "pick any/n" binary choice data (e.g., decisions to buy or not to buy various products). Maximum likelihood estimation procedures are used to obtain a joint space representation of both persons and objects. A review of the relevant literature concerning the spatial treatment of such binary choice data is presented. The nonlinear logistic model type is described, as well as the alternating maximum likelihood algorithm used to estimate the parameter values. The results of an application of the spatial choice model to a synthetic data set in a Monte Carlo analysis are presented. An application concerning consumer (intended) choices for nine competitive brands of sports cars is discussed. Future research may provide a means of generalizing the model to accommodate three-way choice data.

70 citations
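
The flavor of this model class can be conveyed with a minimal sketch in which the probability of a "pick" falls off with the squared distance between a person's ideal point and an object's location through a logistic link. The coordinates and threshold below are invented, and the paper's estimation machinery is not reproduced.

```python
# Hedged sketch of an unfolding-type choice rule for "pick any" data.
import numpy as np

person = np.array([0.2, -0.5])                    # ideal point in 2 dimensions (assumed)
objects = np.array([[0.0, -0.4],                  # object coordinates (assumed)
                    [1.5,  1.0],
                    [-0.3, -0.6]])
c = 1.0                                           # "pick" threshold parameter (assumed)

d2 = np.sum((objects - person) ** 2, axis=1)      # squared person-object distances
p_pick = 1 / (1 + np.exp(d2 - c))                 # closer objects -> higher pick probability
print(np.round(p_pick, 3))
```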


Journal ArticleDOI
TL;DR: The nonparametric approach to constructing and evaluating tests based on binary items proposed by Mokken has been criticized by Roskam, van den Wollenberg, and Jansen.
Abstract: The nonparametric approach to constructing and evaluating tests based on binary items proposed by Mokken has been criticized by Roskam, van den Wollenberg, and Jansen. It is contended that their arguments misrepresent the objectives of this approach, that their criticisms of the role of the H coefficient in the procedures are irrelevant or erroneous, and that they fail to distinguish the inherent requirements (and limitations) of general nonparametric models and procedures from those of parametric ones. It is concluded that Mokken's procedures provide a useful tool for researchers in the social sciences who wish to construct and evaluate tests for measuring theoretically meaningful latent traits while avoiding the strong parametric assumptions of traditional item response theory.

65 citations


Journal ArticleDOI
TL;DR: In this article, the effects of computer presentation on speeded clerical tests were examined; two ratio scores were examined as variants of the conventional score, the number of correct responses in a fixed interval of time.
Abstract: This study examined the effects of computer presentation on speeded clerical tests. Two ratio scores—average number of correct responses per minute and its inverse, average number of seconds per correct response—were examined as variants of the conventional score, number of correct responses in a fixed interval of time. Ratio scores were more reliable than number-correct scores and were less sensitive to testing time. Tests administered on the computer were found to be at least as reliable as conventionally administered tests, but examinees were much faster in the computer mode. Correlations between paper-and-pencil and computer modes were high, except when task differences were introduced by computer implementation.

Journal ArticleDOI
TL;DR: Some test design problems can be seen as combinatorial optimization problems, and several suggestions are presented, with various possible applications.
Abstract: Some test design problems can be seen as combinatorial optimization problems. Several suggestions are presented, with various possible applications. Results obtained thus far are promising; the methods suggested can also be used with highly structured test specifications.
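
As a concrete, if simplified, instance of the idea: choosing a fixed number of items from a bank to maximize information at a target ability is a 0-1 selection problem. The sketch below solves it with a greedy heuristic on simulated 2PL parameters; the paper's own formulations, and any constraints coming from structured test specifications, are not reproduced.

```python
# Hedged sketch of test design as combinatorial optimization (greedy heuristic).
import numpy as np

rng = np.random.default_rng(3)
n_bank, test_length, theta0 = 200, 20, 0.0      # bank size, test length, target ability
a = rng.uniform(0.5, 2.0, n_bank)               # simulated discriminations
b = rng.normal(size=n_bank)                     # simulated difficulties

p = 1 / (1 + np.exp(-a * (theta0 - b)))         # 2PL probabilities at theta0
info = a**2 * p * (1 - p)                       # item information at theta0

selected = np.argsort(info)[::-1][:test_length] # greedy: the most informative items
print("test information at theta0:", round(info[selected].sum(), 2))
```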

Journal ArticleDOI
TL;DR: In this paper, two versions of the standardized l0 appropriateness index are compared to optimal indices, and the detection rates for polychotomous and dichotomous scorings of the item responses are compared.
Abstract: Optimal appropriateness indices, recently introduced by Levine and Drasgow (1984), provide the highest rates of detection of aberrant response patterns that can be obtained from item responses. In this article they are used to study three important problems in appropriateness measurement. First, the maximum detection rates of two particular forms of aberrance are determined for a long unidimensional test. These detection rates are shown to be moderately high. Second, two versions of the standardized l0 appropriateness index are compared to optimal indices. At low false alarm rates, one standardized l0 index has detection rates that are about 65% as large as optimal for spuriously high (cheating) test scores. However, for the spuriously low scores expected from persons with ill-advised testing strategies or reading problems, both standardized l0 indices are far from optimal. Finally, detection rates for polychotomous and dichotomous scorings of the item responses are compared. It is shown that dichot...
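
For orientation, the sketch below computes a standardized log-likelihood person-fit statistic of the general kind discussed above (often written l_z); the paper's specific standardized l0 variants and the optimal indices are not reproduced, and the item parameters and response pattern are invented.

```python
# Hedged sketch of a standardized log-likelihood appropriateness (person-fit) index.
import numpy as np

theta = 0.0
a = np.array([1.0, 1.2, 0.8, 1.5, 1.1])         # assumed discriminations
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])       # assumed difficulties
x = np.array([0, 1, 1, 0, 1])                    # a somewhat surprising response pattern

p = 1 / (1 + np.exp(-a * (theta - b)))
loglik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
e_loglik = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
v_loglik = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)

lz = (loglik - e_loglik) / np.sqrt(v_loglik)
print("standardized person-fit index:", round(lz, 2))   # large negative = aberrant
```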

Journal ArticleDOI
TL;DR: In this article, the effects of ability of the examinee group used to establish the equating relationship on linear, equipercentile, and three-parameter logistic IRT estimated true score equating methods were investigated.
Abstract: Many educational tests make use of multiple test forms, which are then horizontally equated to establish interchangeability among forms. To have confidence in this interchangeability, the equating relationships should be robust to the particular group of examinees on which the equating is conducted. This study investigated the effects of ability of the examinee group used to establish the equating relationship on linear, equipercentile, and three-parameter logistic IRT estimated true score equating methods. The results show all of the methods to be reasonably independent of examinee group, and suggest that population independence is not a good reason for selecting one method over another.
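
The simplest of the methods named above, linear equating, can be written in a few lines: form-X scores are mapped to the form-Y scale by matching means and standard deviations. The scores below are simulated for illustration only and do not reflect the study's design.

```python
# Hedged sketch of single-group linear equating.
import numpy as np

rng = np.random.default_rng(7)
scores_x = rng.normal(25, 6, 2000)          # observed scores on form X (simulated)
scores_y = rng.normal(27, 5, 2000)          # observed scores on form Y (simulated)

A = scores_y.std(ddof=1) / scores_x.std(ddof=1)
B = scores_y.mean() - A * scores_x.mean()

def equate_linear(x):
    """Form-X raw score expressed on the form-Y scale."""
    return A * x + B

print("a form-X score of 30 maps to", round(equate_linear(30), 2), "on form Y")
```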

Journal ArticleDOI
TL;DR: In this paper, it is argued that the Mokken scale is an unfruitful compromise between the requirements of a Guttman scale and classical test theory, and that the Rasch model is the only item response model fulfilling this requirement.
Abstract: The Mokken scale is critically discussed. It is argued that Loevinger's H, adapted by Mokken and advocated as a coefficient of scalability, is sensitive to properties of the item set which are extraneous to Mokken's requirement of holomorphy of item response curves. Therefore, when defined in terms of H, the Mokken scale is ambiguous. It is furthermore argued that item-selection free statistical inferences concerning the latent person order appear to be insufficiently based on double monotony alone, and that the Rasch model is the only item response model fulfilling this requirement. Finally, it is contended that the Mokken scale is an unfruitful compromise between the requirements of a Guttman scale and the requirements of classical test theory.
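
Loevinger's H, the coefficient at issue here, compares observed Guttman errors with the number expected under marginal independence, summed over item pairs. The sketch below computes a scale H on simulated binary data; it is a minimal illustration, not either side's analysis.

```python
# Hedged sketch of Loevinger's/Mokken's scalability coefficient H.
import numpy as np

rng = np.random.default_rng(5)
theta = rng.normal(size=1000)
b = np.array([-1.0, -0.3, 0.2, 0.9])                  # assumed item difficulties
x = (rng.random((1000, 4)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)

n, k = x.shape
p = x.mean(axis=0)                                     # proportion correct per item
obs_err = exp_err = 0.0
for i in range(k):
    for j in range(k):
        if i != j and p[i] >= p[j]:                    # item i is the easier of the pair
            # Guttman error: harder item j correct while easier item i is wrong
            obs_err += np.sum((x[:, j] == 1) & (x[:, i] == 0))
            exp_err += n * p[j] * (1 - p[i])           # expected under independence
H = 1 - obs_err / exp_err
print("scale H:", round(H, 3))
```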

Journal ArticleDOI
TL;DR: In this article, the covariance and regression slope models are proposed for assessing validity generalization; the new models are less restrictive in that they require only one hypothetical distribution.
Abstract: Two new models, the covariance and regression slope models, are proposed for assessing validity generalization. The new models are less restrictive in that they require only one hypothetical distribution...

Journal ArticleDOI
TL;DR: The authors examined the effect of anxiety and dissimulation motivation of job applicants on their performance on an ability test and found a negative effect of dissimulation motivation on the performance of low anxiety scorers, with a greater effect on the appropriateness score than on the total score.
Abstract: This study examined the effect of anxiety and dissimulation motivation of job applicants on their performance on an ability test. Two aspects of performance were considered: the total score and the appropriateness score. Four IRT-based appropriateness indices for detecting aberrant response patterns were employed in this study. The results indicate a negative effect of dissimulation motivation on the performance of low anxiety scorers, with respect to both the total score and the appropriateness score, with a greater effect on the latter. This effect was evidenced by an erratic or aberrant response pattern on the ability test; that is, missing relatively easy items while answering more difficult ones correctly. The results are discussed in light of the diverse interpretations concerning the meaning of Lie scales.

Journal ArticleDOI
TL;DR: In this article, the small-scale applicability of Rasch estimates was investigated under simulated conditions of guessing and heterogeneity in item discrimination; under guessing, robustness could only be demonstrated for the correlational criterion.
Abstract: The small scale applicability of Rasch estimates was investigated under simulated conditions of guessing and heterogeneity in item discrimination. The accuracy of the Rasch estimates was evaluated by means of the correlation between the item/person parameters and their estimates, the standard deviations of the estimates, and the difference as well as the root mean squared difference between parameters and estimates. Within the range of the present investigation (from 10 to 50 items and from 25 to 500 persons) these criteria yielded favorable results under conditions of heterogeneous item discrimination. Under conditions of guessing, robustness could only be demonstrated for the correlational criterion. Guessing affects the difference measures between the parameter values and estimates quite strongly in a systematic way. It is argued that, notwithstanding these estimation errors, the Rasch model is to be preferred over nonstandard estimation procedures, whose validity is unclear, or the use o...

Journal ArticleDOI
TL;DR: In this paper, a new stochastic multidimensional scaling (MDS) method was developed for paired comparisons data and rendered a spatial representation of subjects and stimuli, where subjects are represented as vectors and stimuli as points in a T-dimensional space, where the scalar products or pro jections of the stimulus points onto the subject vectors, provide respective information as to the utility (or whatever latent construct is under investigation) of the stimuli to the subjects.
Abstract: This article presents the development of a new stochastic multidimensional scaling (MDS) method, which operates on paired comparisons data and renders a spatial representation of subjects and stimuli. Subjects are represented as vectors and stimuli as points in a T-dimensional space, where the scalar products, or projections of the stimulus points onto the subject vectors, provide respective information as to the utility (or whatever latent construct is under investigation) of the stimuli to the subjects. The psychometric literature concerning related MDS methods that also operate on paired comparisons data is reviewed, and a technical description of the new method is provided. A small Monte Carlo analysis performed on synthetic data with the new method is also presented. To illustrate the versatility of the model, an application measuring consumer satisfaction and investigating the impact of hypothesized determinants, using one of the optional reparameterized models, is described. Future areas of ...

Journal ArticleDOI
Larry H. Ludlow
TL;DR: A graphical comparison of empirical versus simulated residual variation is presented as one way to assess the goodness of fit of an item response theory model.
Abstract: A graphical comparison of empirical versus simulated residual variation is presented as one way to assess the goodness of fit of an item response theory model. The two forms of residual variation were generated through the separate calibration of empirical data and data "tailored" to fit the model, given the empirical parameter estimates. A variety of techniques illustrate the utility of using tailored residuals as a specific baseline against which empirical residuals may be understood.

Journal ArticleDOI
TL;DR: In this article, the authors explored how four test equating methods (linear, equipercentile, and item response theory methods based on the Rasch and three-parameter models) responded to tests of different psychometric properties.
Abstract: This Monte Carlo study explored how four commonly used test equating methods (linear, equipercentile, and item response theory methods based on the Rasch and three-parameter models) responded to tests of different psychometric properties. The four methods were applied to generated data sets where mean item difficulty and discrimination as well as level of chance scoring were manipulated. In all cases, examinee ability was matched to the level of difficulty of the tests. The results showed the Rasch model not to be very robust to violations of the equal discrimination and non-chance scoring assumptions. There were also problems with the three-parameter model, but these were due primarily to estimation and linking problems. The recommended procedure for tests similar to those studied is the equipercentile method.

Journal ArticleDOI
TL;DR: In this paper, the use of various types of free-answer items (e.g., the brief answer, interlinear, and "fill in the blanks in the following paragraph" forms) is discussed.
Abstract: An important but usually neglected aspect of the training of teachers is instruction in the art of writing good classroom tests. Such training should emphasize various forms of objective items (e.g., multiple-choice, master list, matching, greater-less-same, best-worst answer, and matrix format). The proper formulation and accurate grading of essay items should be included, as should the use of various types of free-answer items (e.g., the brief answer, interlinear, and "fill in the blanks in the following paragraph" forms). For courses involving laboratory work, such as science, machine shop, and home economics, performance and identification tests based on the laboratory work should be used. A second point is that organizations developing aptitude tests for nonacademic areas, such as police work, fire fighting, and licensing tests, should emphasize the use by the client of a valid, reliable, and unbiased criterion. Organizations developing academic aptitude tests should also (1) be alert to the ...

Journal ArticleDOI
TL;DR: A new type of theory and practice in testing is replacing the standard test by the test item bank, and classical test theory by item response theory as discussed by the authors, and it is shown how these also reinforce and complete each other.
Abstract: Since the era of Binet and Spearman, classical test theory and the ideal of the standard test have gone hand in hand, in part because both are based on the same paradigm of experimental control by manipulation and randomization. Their longevity is a consequence of this mutually beneficial symbiosis. A new type of theory and practice in testing is replacing the standard test by the test item bank, and classical test theory by item response theory. In this paper it is shown how these also reinforce and complete each other.

Journal ArticleDOI
TL;DR: In this article, the authors employed a laboratory methodology to investigate two research questions related to scale recalibration (beta change) in temporal survey re search, and applied this methodology to evaluate the use of the retrospecive design in assessing organizational change.
Abstract: Efforts to operationalize the alpha/beta/gamma change typology have suffered from a notable limitation. Virtually all have been conducted in field settings, thereby limiting the degree of experimental control over outcome criteria. Recognizing this limitation, the present study employed a laboratory methodology to investigate two research questions related to scale recalibration (beta change) in temporal survey research. Application of this methodology permitted random respondent assignment, exact replication of stimuli, and systematic time interval variation for the pretest-posttest design. Furthermore, the use of these procedures permitted testing the use of the retrospective design in assessing organizational change. Implications of the findings for the measurement of change are discussed.

Journal ArticleDOI
TL;DR: In this paper, a method for constructing a bank of items scored in two or more ordered response categories is described, which enables multistep problems, rating scale items, question "clus...
Abstract: A method for constructing a bank of items scored in two or more ordered response categories is described and illustrated. This method enables multistep problems, rating scale items, question "clus...

Journal ArticleDOI
TL;DR: In this paper, a procedure for the sequential optimization of the calibration of an item bank is given, based on an empirical Bayesian approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items.
Abstract: A procedure for the sequential optimization of the calibration of an item bank is given. The procedure is based on an empirical Bayesian approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items in which ties are allowed to occur. First, it is shown how a paired-comparisons design deals with the usual incompleteness of calibration data and how the item parameters can be estimated using this design. Next, the procedure for a sequential optimization of the item parameter estimators is given, both for individuals responding to pairs of items and for item and examinee groups of any size. The paper concludes with a discussion of the choice of the first priors in the procedure and the problems involved in its generalization to other item response models.
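
The reformulation rests on a standard property of the Rasch model: conditional on exactly one of two items being answered correctly, the probability that it is the first item depends only on the two difficulties, exactly as in a Bradley-Terry paired comparison. The sketch below verifies this numerically with arbitrary parameters; it does not reproduce the paper's empirical Bayes machinery.

```python
# Hedged illustration: the Rasch model as a paired comparison between item
# difficulties, free of the person parameter.
import numpy as np

def p_correct(theta, b):
    return 1 / (1 + np.exp(-(theta - b)))

b_i, b_j = -0.5, 0.8                         # arbitrary item difficulties
for theta in (-1.0, 0.0, 2.0):
    p_i_only = p_correct(theta, b_i) * (1 - p_correct(theta, b_j))
    p_j_only = p_correct(theta, b_j) * (1 - p_correct(theta, b_i))
    cond = p_i_only / (p_i_only + p_j_only)  # P(i correct | exactly one correct)
    print("theta=%+.1f  P(i beats j | one correct)=%.4f" % (theta, cond))

# All three lines print the same value, exp(b_j - b_i) / (1 + exp(b_j - b_i)):
print("theory:", round(np.exp(b_j - b_i) / (1 + np.exp(b_j - b_i)), 4))
```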

Journal ArticleDOI
TL;DR: In this paper, a model and estimator for examinee-level measure of error variance are developed, which takes into account test form difficulty adjustments often used in standardized tests, and is linked to indices designed for identi fying unusual item response patterns.
Abstract: A model and estimator for examinee-level measurement error variance are developed. Although the binomial distribution is basic to the modeling, the proposed error model provides some insights into problems associated with simple binomial error, and yields estimates of error that are quite distinct from binomial error. By taking into consideration test form difficulty adjustments often used in standardized tests, the model is linked also to indices designed for identifying unusual item response patterns. In addition, average error variance under the model is approximately that which would be obtained through a KR-20 estimate of reliability, thus providing a unique justification for this popular index. Empirical results using odd-even and alternate-forms measures of error variance tend to favor the proposed model over the binomial.
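
To fix ideas, the sketch below computes the simple examinee-level binomial error variance and the KR-20 coefficient referred to above on simulated responses; the paper's adjusted error model is not reproduced.

```python
# Hedged sketch: simple binomial error variance per examinee and KR-20 reliability.
import numpy as np

rng = np.random.default_rng(11)
theta = rng.normal(size=500)
b = rng.normal(size=30)                                     # assumed difficulties
x = (rng.random((500, 30)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)

n_items = x.shape[1]
total = x.sum(axis=1)

# Simple binomial error variance for each examinee: n * p_hat * (1 - p_hat)
p_hat = total / n_items
binomial_err_var = n_items * p_hat * (1 - p_hat)

# KR-20: reliability from item variances and total-score variance
item_var = x.mean(axis=0) * (1 - x.mean(axis=0))
kr20 = (n_items / (n_items - 1)) * (1 - item_var.sum() / total.var(ddof=1))

print("mean binomial error variance:", round(binomial_err_var.mean(), 2))
print("KR-20:", round(kr20, 3))
```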

Journal ArticleDOI
TL;DR: In many applications of item response theory, it is of little consequence whether the Rasch model or a more accurate, but more complicated item response model is used as mentioned in this paper, and it might be advantageous to employ the Rasch model with small sample sizes.
Abstract: In many applications of item response theory, it is of little consequence whether the Rasch model or a more accurate, but more complicated item response model is used. With small sample sizes, it might be advantageous to employ the Rasch model. A clear counterexample is the case of optimal item selection under guessing.
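
The counterexample can be seen in the item information function: with a nonzero pseudo-guessing parameter, information is reduced and its peak shifts, so the item that is optimal at a given ability under the Rasch (no-guessing) model need not be optimal under the three-parameter model. The sketch below illustrates this with arbitrary parameters.

```python
# Hedged illustration: optimal item difficulty at a fixed ability differs
# between the no-guessing (c = 0) and guessing (c = 0.25) cases.
import numpy as np

def item_info(theta, a, b, c):
    """Fisher information of a 3PL item (c = 0 gives the no-guessing case)."""
    p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
    return a**2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

theta = 0.0
difficulties = np.linspace(-1.5, 1.5, 121)
info_no_guess = item_info(theta, 1.0, difficulties, 0.0)
info_guess = item_info(theta, 1.0, difficulties, 0.25)

print("best difficulty, no guessing:", round(difficulties[np.argmax(info_no_guess)], 2))
print("best difficulty, c = 0.25:   ", round(difficulties[np.argmax(info_guess)], 2))
```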

Journal ArticleDOI
TL;DR: In this paper, the authors comment on the contributions to a special issue on item banking; an historical framework for viewing the papers is provided by brief reviews of the literature in the areas of item response theory, item banking, and computerized testing, and the eight papers are viewed as contributing valuable technical knowledge for implementing testing programs with the aid of item banks.
Abstract: This paper comments on the contributions to this special issue on item banking. An historical framework for viewing the papers is provided by brief reviews of the literature in the areas of item response theory, item banking, and computerized testing. In general, the eight papers are viewed as contributing valuable technical knowledge for implementing testing programs with the aid of item banks.

Journal ArticleDOI
TL;DR: The development of an integrated system for the storage of items and the construction and analysis of tests is described; the system is being developed both as a general facility for the Dutch Institute of Educational Measurement and as a support system for the use and maintenance of item banks in schools.
Abstract: The development of an integrated system for the storage of items and the construction and analysis of tests is described. The system is being developed both as a general facility for the Dutch Institute of Educational Measurement and as a support system for the use and maintenance of item banks in schools. The methodology of developing the system is described, with attention to the system architecture and to the results of the first stage of the system development.