
Showing papers in "Applied Psychological Measurement in 1999"


Journal ArticleDOI
TL;DR: In this article, a multistage adaptive testing approach that factors a into the item selection process is proposed, where the items in the item bank are stratified into a number of levels based on their a values.
Abstract: Computerized adaptive tests (CAT) commonly use item selection methods that select the item which provides maximum information at an examinee's estimated trait level. However, these methods can yield extremely skewed item exposure distributions. For tests based on the three-parameter logistic model, it was found that administering items with low discrimination parameter (a) values early in the test and administering those with high a values later was advantageous; the skewness of item exposure distributions was reduced while efficiency was maintained in trait level estimation. Thus, a new multistage adaptive testing approach is proposed that factors a into the item selection process. In this approach, the items in the item bank are stratified into a number of levels based on their a values. The early stages of a test use items with lower a values and later stages use items with higher a values. At each stage, items are selected according to an optimization criterion from the corresponding level. Simulation studies were ...
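A minimal sketch of the stratified selection step, in Python. The within-stratum criterion shown here (matching item difficulty b to the current trait estimate) is an assumption; the abstract only says an optimization criterion is applied within each level.

```python
import numpy as np

def a_stratified_select(bank, theta_hat, stage, n_stages, administered):
    """Pick the next item under an a-stratified design (sketch).

    bank: dict with numpy arrays 'a' (discrimination) and 'b' (difficulty).
    Items are ranked by a and split into n_stages strata; stage k draws
    only from stratum k, so low-a items are used early, high-a items late.
    """
    order = np.argsort(bank['a'])                 # rank items by a
    strata = np.array_split(order, n_stages)      # equal-sized a strata
    candidates = [i for i in strata[stage] if i not in administered]
    # assumed criterion: unused item whose difficulty is closest to theta_hat
    return min(candidates, key=lambda i: abs(bank['b'][i] - theta_hat))
```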

277 citations


Journal ArticleDOI
TL;DR: In this paper, an alternative index, r*wg(J), is recommended, which is an inverse linear function of the ratio of the average obtained variance to the variance of uniformly distributed random error.
Abstract: The commonly used form of rwg(J) can display irregular behavior, so four variants of this index were examined. An alternative index, r*wg(J), is recommended. This index is an inverse linear function of the ratio of the average obtained variance to the variance of uniformly distributed random error. r*wg(J) is superficially similar to Cronbach's α, but careful examination confirms that r*wg(J) is an index of agreement, not reliability. Based on an examination of the small-sample behavior of rwg and r*wg(J), sample sizes of 10 or more raters are recommended.
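Following the abstract's description, r*wg(J) is one minus the ratio of the average obtained variance to the uniform-error variance. A minimal sketch, assuming the usual discrete-uniform null variance (A² − 1)/12 for an A-option scale:

```python
import numpy as np

def r_wg_star(ratings, n_options):
    """r*wg(J) for one group: 1 - (mean item variance / uniform variance).

    ratings: (n_raters, J) array of ratings on an n_options-point scale.
    """
    s2_mean = ratings.var(axis=0, ddof=1).mean()   # average obtained variance
    sigma2_eu = (n_options ** 2 - 1) / 12          # uniform random-error variance
    return 1 - s2_mean / sigma2_eu
```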

256 citations


Journal ArticleDOI
TL;DR: The elements of CAT discussed here include item selection procedures, estimation of the latent trait, item exposure, measurement precision, and item bank development.
Abstract: Use of computerized adaptive testing (CAT) has increased substantially since it was first formulated in the 1970s. This paper provides an overview of CAT and introduces the contributions to this Special Issue. The elements of CAT discussed here include item selection procedures, estimation of the latent trait, item exposure, measurement precision, and item bank development. Some topics for future research are also presented.

145 citations


Journal ArticleDOI
TL;DR: An item-selection algorithm is proposed for neutralizing the differential effects of time limits on computerized adaptive test scores based on a statistical model for distributions of examinees’ response times on items in a bank that is updated each time an item is administered.
Abstract: An item-selection algorithm is proposed for neutralizing the differential effects of time limits on computerized adaptive test scores. The method is based on a statistical model for distributions of examinees' response times on items in a bank that is updated each time an item is administered. Predictions from the model are used as constraints in a 0-1 linear programming model for constrained adaptive testing that maximizes the accuracy of the trait estimator. The method is demonstrated empirically using an item bank from the Armed Services Vocational Aptitude Battery.
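A plausible core of such a 0-1 linear programming model, written out under assumed notation: x_i selects item i, I_i is Fisher information at the current trait estimate, Ê[T_i] is the predicted response time from the updated model, and t_rem and n_rem are the remaining time and items.

```latex
\max_{x}\; \sum_{i=1}^{N} I_i(\hat\theta)\, x_i
\quad \text{s.t.} \quad
\sum_{i=1}^{N} \widehat{E}[T_i]\, x_i \le t_{\mathrm{rem}}, \qquad
\sum_{i=1}^{N} x_i = n_{\mathrm{rem}}, \qquad x_i \in \{0,1\}.
```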

117 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the polytomous-DFIT framework and found that it was effective in identifying DTF and DIF for the simulated conditions, but the DTF index did not perform as consistently as the DIF index.
Abstract: Raju, van der Linden, & Fleer (1995) proposed an item response theory based, parametric differential item functioning (DIF) and differential test functioning (DTF) procedure known as differential functioning of items and tests (DFIT). According to Raju et al., the DFIT framework can be used with unidimensional and multidimensional data that are scored dichotomously and/or polytomously. This study examined the polytomous-DFIT framework. Factors manipulated in the simulation were: (1) length of test (20 and 40 items), (2) focal group distribution, (3) number of DIF items, (4) direction of DIF, and (5) type of DIF. The findings provided promising results and indicated directions for future research. The polytomous DFIT framework was effective in identifying DTF and DIF for the simulated conditions. The DTF index did not perform as consistently as the DIF index. The findings are similar to those of unidimensional and multidimensional DFIT studies.
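The DFIT indices lend themselves to a compact Monte-Carlo-style computation. A sketch under the standard DFIT definitions (NCDIF for item i is the focal-group expectation of the squared difference in expected item scores; DTF is the analogous test-level quantity); the array layout is an assumption:

```python
import numpy as np

def ncdif_dtf(es_focal, es_ref):
    """NCDIF per item and DTF for the test (sketch).

    es_focal, es_ref: (n_examinees, n_items) expected item scores computed
    at focal-group examinees' trait values from focal- and reference-group
    item parameters, respectively.
    """
    d = es_focal - es_ref                  # item-level differences d_i
    ncdif = (d ** 2).mean(axis=0)          # E[d_i^2], one value per item
    dtf = (d.sum(axis=1) ** 2).mean()      # E[(sum_i d_i)^2] at test level
    return ncdif, dtf
```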

115 citations


Journal ArticleDOI
TL;DR: In this study, a method based on Kullback-Leibler information (KLI) was evaluated and showed that testing algorithms using KLI-based item selection performed better than or as well as those using Fisher information (FI) based item selection.
Abstract: Wald’s (1947) sequential probability ratio test can be implemented as an adaptive test for classifying examinees into categories. However, current implementations use an item selection method that ...
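For a dichotomous item, the KL information between two trait values (say, points on either side of the SPRT cutscore) has a closed form. A sketch assuming a 3PL response function and assumed evaluation points theta1 and theta2; the candidate item maximizing this quantity would be selected:

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def kl_info(a, b, c, theta1, theta2):
    """KL information of one item for discriminating theta2 from theta1."""
    p1, p2 = p3pl(theta1, a, b, c), p3pl(theta2, a, b, c)
    return p2 * np.log(p2 / p1) + (1 - p2) * np.log((1 - p2) / (1 - p1))
```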

112 citations


Journal ArticleDOI
TL;DR: In this article, flexible methods that relax restrictive conditional independence assumptions of latent class analysis (LCA) are described, and the relationship between the multivariate probit mixture model proposed here and Rost's mixed Rasch (1990, 1991) model is discussed.
Abstract: Flexible methods that relax restrictive conditional independence assumptions of latent class analysis (LCA) are described. Dichotomous and ordered category manifest variables are viewed as discretized latent continuous variables. The latent continuous variables are assumed to have a mixture-of-multivariate-normals distribution. Within a latent class, conditional dependence is modeled as the mutual association of all or some latent continuous variables with a continuous latent trait (or in special cases, multiple latent traits). The relaxation of conditional independence assumptions allows LCA to better model natural taxa. Comparisons of specific restricted and unrestricted models permit statistical tests of specific aspects of latent taxonic structure. Latent class, latent trait, and latent distribution analysis can be viewed as special cases of the mixed latent trait model. The relationship between the multivariate probit mixture model proposed here and Rost's mixed Rasch (1990, 1991) model is discussed. Two...

96 citations


Journal ArticleDOI
TL;DR: In this article, a three-part simulation study was conducted to investigate the theoretical distribution of the lz and lz across trait across trait for CAT and P&P tests.
Abstract: Several person-fit statistics have been proposed to detect item score patterns that do not fit an item response theory model. To classify response patterns as misfitting, the distribution of a person-fit statistic is needed. The theoretical null distributions of several fit statistics have been derived for paper-and-pencil (P&P) tests. However, it is unknown whether these distributions also hold for computerized adaptive tests (CAT). A three-part simulation study was conducted. In the first study, the theoretical distribution of the lz statistic across trait. 0levels for CAT and P&P tests was investigated. The distribution of the l*z statistic proposed by Snijders (in press) was also investigated. Results indicated that the distribution of both lz and l*z differed from the theoretical distribution in CAT. The second study examined the distributions of lzand l*z using simulation. These simulated distributions, when based on O [UNKNOWN], were found to be problematic in CAT. In the third study, the detection rates of l*z and lz were compared. The rates for both statistics were found to be similar in most cases
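For reference, the lz statistic standardizes the log-likelihood of a response pattern. A minimal sketch of the usual formula (Drasgow, Levine, & Williams, 1985); in CAT the probabilities would be evaluated at the trait estimate, which is exactly the complication the study examines:

```python
import numpy as np

def lz(u, p):
    """Standardized log-likelihood person-fit statistic lz.

    u: 0/1 response vector; p: model probabilities of a correct response
    evaluated at the (estimated) trait level.
    """
    q = 1 - p
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(q))   # observed log-likelihood
    e = np.sum(p * np.log(p) + q * np.log(q))          # expectation of l0
    v = np.sum(p * q * np.log(p / q) ** 2)             # variance of l0
    return (l0 - e) / np.sqrt(v)
```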

73 citations


Journal ArticleDOI
TL;DR: In this paper, the sample size ratio (SSR), the latent trait distribution (LD), and the amount of item information were used to estimate the item parameters in the nominal response model.
Abstract: Establishing guidelines for reasonable item parameter estimation is fundamental to use of the nominal response model. Factors studied were the sample size ratio (SSR), latent trait distribution (LD), and amount of item information. Results showed that the LD accounted for 42.5% of the variability in the accuracy of estimating the slope parameter; the SSR and the maximum item information factors accounted for 29.5% and 3.5% of the accuracy, respectively. In general, as the LD departed from a normal distribution, a larger number of examinees was required to accurately estimate the slope and intercept parameters. Results indicated that an SSR of 10:1 can produce reasonably accurate item parameter estimates when the LD is normal.
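For context, Bock's nominal response model gives category probabilities as a multinomial logit of the latent trait, with one slope and one intercept per category. A small sketch:

```python
import numpy as np

def nrm_probs(theta, a, c):
    """Nominal response model category probabilities:
    P_k = exp(a_k*theta + c_k) / sum_j exp(a_j*theta + c_j)."""
    z = a * theta + c
    z -= z.max()            # subtract max for numerical stability
    ez = np.exp(z)
    return ez / ez.sum()
```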

71 citations


Journal ArticleDOI
TL;DR: Person-fit indices (lz and multitest lzm) derived from item response theory and used to identify misfitting examinees were computed based on responses to cognitive ability and personality tests as discussed by the authors.
Abstract: Person-fit indices (lz and multitest lzm) derived from item response theory and used to identify misfitting examinees were computed based on responses to cognitive ability and personality tests. lz indices from different ability domains within the cognitive tests were uncorrelated with each other; lz indices from different tests within the personality domain were moderately intercorrelated. Cross-domain correlations were near 0. Test-taking motivation and conscientiousness were correlated moderately with multitest lzm for personality tests and to a lesser extent for cognitive tests. Test reactions were uncorrelated with any of the lz measures. Males had higher mean lz values than females. This difference could be partly attributed to differences in conscientiousness. African-Americans had higher mean lz values than Whites. This effect could not be accounted for by test-taking motivation or conscientiousness. High values of lz affected the criterion-related validity of the set of cognitive tests such that the validity...

65 citations


Journal ArticleDOI
Tenko Raykov1
TL;DR: In this paper, a latent variable modeling approach is discussed that focuses on ability change scores and allows estimation of both individual latent change scores and the relationship of ability change scores to other variables.
Abstract: This paper complements recent discussions about the reliability of observed change scores (Collins, 1996a; Humphreys, 1996; Williams & Zimmerman, 1996a, 1996b). It is argued that modeling change on the latent dimension of interest is a better approach to measuring change than focusing on observed change scores and their properties. It is proposed that research be directed toward correlates and predictors of ability change (Rogosa & Willett, 1985b) and away from recorded change scores and their reliability. A latent variable modeling approach is discussed that focuses on ability change scores. It permits estimation of both individual latent change scores and the relationship of ability change scores to other variables.

Journal ArticleDOI
TL;DR: In this paper, an empirical Monte Carlo study was performed using predictor and criterion data from 84,808 U.S. Air Force enlistees, and 500 estimates for each of 9 validity and 11 cross-validity estimation procedures were generated for each sample size condition.
Abstract: An empirical Monte Carlo study was performed using predictor and criterion data from 84,808 U.S. Air Force enlistees. 501 samples were drawn for each of seven sample size conditions: 25, 40, 60, 80, 100, 150, and 200. Using an eight-predictor model, 500 estimates for each of 9 validity and 11 cross-validity estimation procedures were generated for each sample size condition. These estimates were then compared to the actual squared population validity and cross-validity in terms of mean bias and mean squared bias. For the regression models determined using ordinary least squares, the Ezekiel procedure produced the most accurate estimates of squared population validity (followed by the Smith and the Wherry procedures), and Burket's formula resulted in the best estimates of squared population cross-validity. Other analyses compared the coefficients determined by traditional empirical cross-validation and equal weights; equal weights resulted in no loss of predictive accuracy and less shrinkage. Numerous issu...
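For reference, the Ezekiel procedure is commonly given as the familiar adjusted-R² shrinkage formula; a one-line sketch (the cross-validity formulas such as Burket's take different forms):

```python
def ezekiel_r2(r2, n, p):
    """Ezekiel shrinkage estimate of squared population validity:
    1 - (1 - R^2) * (n - 1) / (n - p - 1), with n cases and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```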

Journal ArticleDOI
TL;DR: In this article, a procedure for empirical initialization of the trait (θ) estimator is proposed, based on the statistical relation between θ and background variables known prior to test administration, modeled using a two-parameter version of a logistic item response theory model with manifest predictors discussed in Zwinderman (1991).
Abstract: A procedure for empirical initialization of the trait (θ) estimator in adaptive testing is proposed that is based on the statistical relation between θ and background variables known prior to test administration. The relation is modeled using a two-parameter version of a logistic item response theory model with manifest predictors discussed in Zwinderman (1991). Equations are provided that are necessary for estimating the parameters from an incomplete sample of response data and data on background variables. The procedure is illustrated for an adaptive version of a test from the Dutch General Aptitude Test Battery, with response time on a prior test as a background variable.
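A rough sketch of the idea: estimate the regression of θ on the background variable in a calibration sample, then use the predicted value as the CAT's starting θ. Ordinary least squares stands in here for the logistic IRT model with manifest predictors that the paper actually uses (an assumption):

```python
import numpy as np

def initial_theta(x_new, x_cal, theta_cal):
    """Empirical starting value for theta from one background variable.

    x_cal, theta_cal: background values and trait estimates for a
    calibration sample; x_new: background value for the new examinee.
    """
    X = np.column_stack([np.ones(len(x_cal)), x_cal])
    beta, *_ = np.linalg.lstsq(X, theta_cal, rcond=None)
    return beta[0] + beta[1] * x_new
```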

Journal ArticleDOI
TL;DR: In this article, the use of a beta prior in trait estimation was extended to the maximum a posteriori (MAP) method of Bayesian estimation, called essentially unbiased MAP (EU-MAP).
Abstract: The use of a beta prior in trait estimation was extended to the maximum a posteriori (MAP) method of Bayesian estimation. This new method, called essentially unbiased MAP (EU-MAP), was compared with MAP (using a standard normal prior), essentially unbiased expected a posteriori, weighted likelihood, and maximum likelihood estimation methods. Comparisons were made based on the effects that the shape of prior distributions, different item bank characteristics, and practical constraints had on bias, standard error, and root-mean-square error (RMSE). Overall, EU-MAP performed best. This new method significantly reduced bias in fixed-length tests (though with a slight increase in RMSE) and performed reasonably well when a fixed posterior variance termination rule was used. Practical constraints had little effect on the bias of this method.
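A minimal sketch of MAP trait estimation with a switchable prior, assuming a 2PL likelihood for brevity; the beta-prior branch only illustrates the kind of prior studied here, with placeholder shape parameters and support:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import beta as beta_dist, norm

def map_estimate(u, a, b, prior="normal", lo=-4.0, hi=4.0, s1=1.2, s2=1.2):
    """MAP estimate of theta: maximize log-likelihood + log-prior.

    u: 0/1 responses; a, b: 2PL item parameters. prior="beta" rescales
    theta to [lo, hi] and applies a beta(s1, s2) density (placeholders).
    """
    def neg_log_post(theta):
        p = 1 / (1 + np.exp(-a * (theta - b)))
        ll = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
        lp = (norm.logpdf(theta) if prior == "normal"
              else beta_dist.logpdf((theta - lo) / (hi - lo), s1, s2))
        return -(ll + lp)
    return minimize_scalar(neg_log_post, bounds=(lo, hi), method="bounded").x
```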

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate procedures that could identify these individuals by examining the application of person-fit indices in the adaptive test environment, using information from these indices, a new method was developed.
Abstract: The purpose of appropriateness/person-fit indices is to identify response patterns for which a given item response theory model is inappropriate for an examinee even though that model is appropriate for a group. This study was concerned with those cases in which examinees had prior knowledge of items from an item bank used to generate a computerized adaptive test (CAT) and used the memorized information to inflate their test scores. The objective was to evaluate procedures that could identify these individuals by examining the application of person-fit indices in the CAT environment. The lz and ECI4z indices were selected for comparison. Using information from these indices, a new method was developed. All three indices showed little power to detect the use of memorization. Some possibilities for altering a test when the model becomes inappropriate for an examinee are also discussed.

Journal ArticleDOI
TL;DR: In this article, a distinction is made between two concepts of measurement precision: reliability and information dependence, and it is shown that reliability is population dependent and information is examinee dependent.
Abstract: A distinction is necessary between two concepts of measurement precision. Reliability is population dependent and information is examinee dependent. Both concepts also apply to the simple gain scor...

Journal ArticleDOI
TL;DR: In this article, a new procedure for defining achievement levels on continuous scales was developed using aspects of Guttman scaling and item response theory, which assigns examinees to levels of achievement when the levels are represented by separate pools of multiple-choice items.
Abstract: A new procedure for defining achievement levels on continuous scales was developed using aspects of Guttman scaling and item response theory. This procedure assigns examinees to levels of achievement when the levels are represented by separate pools of multiple-choice items. Items were assigned to levels on the basis of their content and hierarchically defined level descriptions. The resulting level response functions were well-spaced and noncrossing. This result allowed well-spaced levels of achievement to be defined by a common percent-correct standard of mastery on the level pools. Guttman patterns of mastery could be inferred from level scores. The new scoring procedure was found to have higher reliability, higher classification consistency, and lower classification error, when compared to two Guttman scoring procedures.
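The level-assignment rule can be sketched in a few lines: an examinee holds the highest level whose item pool is mastered at the common percent-correct standard, with a Guttman pattern assumed once mastery first fails. The 70% standard below is a placeholder, not the paper's value:

```python
def assign_level(pct_correct_by_level, standard=0.70):
    """Assign the highest mastered achievement level (sketch).

    pct_correct_by_level: proportions correct on each level's item pool,
    ordered from lowest to highest level; returns 0 if none is mastered.
    """
    level = 0
    for k, pct in enumerate(pct_correct_by_level, start=1):
        if pct >= standard:
            level = k
        else:
            break   # Guttman pattern: stop at the first non-mastered pool
    return level
```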

Journal ArticleDOI
TL;DR: This article examined how three item response models performed when they were applied to data collected from a conventionally developed Likert-type personality scale, each model examined is based on a...
Abstract: This study examined how three item response models performed when they were applied to data collected from a conventionally developed Likert-type personality scale. Each model examined is based on a...

Journal ArticleDOI
TL;DR: In this paper, the authors describe procedures and computer programs for solving these problems using the methods described by Olkin and Finn, and extend these methods for any number of predictors or for partialing out any variable.
Abstract: Olkin & Finn (1995) developed expressions for confidence intervals for functions of simple, partial, and multiple correlations. This paper describes procedures and computer programs for solving these problems using the methods described by Olkin and Finn. The programs extend the methods for any number of predictors or for partialing out any number of variables.
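As a simple illustration of the kind of interval involved, here is the Fisher-z confidence interval for a single correlation; Olkin & Finn's methods generalize to functions of partial and multiple correlations, which this sketch does not cover:

```python
import numpy as np
from scipy.stats import norm

def corr_ci(r, n, conf=0.95):
    """Confidence interval for a simple correlation via Fisher's z."""
    z = np.arctanh(r)                      # Fisher z transform
    se = 1 / np.sqrt(n - 3)                # asymptotic standard error
    zcrit = norm.ppf(0.5 + conf / 2)
    return np.tanh(z - zcrit * se), np.tanh(z + zcrit * se)
```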

Journal ArticleDOI
TL;DR: In this article, three reliability estimates are derived for the Bayes modal estimate (BME) and the maximum likelihood estimate (MLE) of θ in computerized adaptive tests (CAT).
Abstract: Three reliability estimates are derived for the Bayes modal estimate (BME) and the maximum likelihood estimate (MLE) of θ in computerized adaptive tests (CAT). Each reliability estimate is a functio...

Journal ArticleDOI
TL;DR: The logistic versions of Samejima's (1969) graded response model and Muraki's (1992) generalized partial-credit model are parameterized differently by MULTILOG (Thissen, 1991) and PARSCALE (Muraki, 1992) as mentioned in this paper.
Abstract: The logistic versions of Samejima’s (1969) graded response model and Muraki’s (1992) generalized partial-credit model are parameterized differently by MULTILOG (Thissen, 1991) and PARSCALE (Muraki ...

Journal ArticleDOI
TL;DR: In this paper, a model-oriented approach to studying processes and strategies underlying the incorrect/correct responses to cognitive test tasks is presented, which is contrasted with a dataoriented approach in which verbal explanations for incorrect or correct responses are collected during the test phase and incorporated in the scoring.
Abstract: Componential item response theory (CIRT) is presented as a model-oriented approach to studying processes and strategies underlying the incorrect/correct responses to cognitive test tasks. CIRT is contrasted with a data-oriented approach in which verbal explanations for incorrect/correct responses are collected during the test phase and incorporated in the scoring. Alternatively, the psychologically meaningful data are modeled by unidimensional item response theory models. Verbal explanations for each examinee and task were collected from transitive reasoning tasks in addition to the incorrect/correct responses. Two datasets were compiled, one reflecting the common incorrect/correct scoring and one showing whether a deductive strategy had been used to produce a correct response. The Mokken model of monotone homogeneity, the partial-credit model, and the generalized one-parameter logistic model were used to analyze both polytomous datasets. Results showed that combining knowledge of solution strategies with...

Journal ArticleDOI
TL;DR: In this paper, an approximate statistical test is developed for the hypothesis of equality between the Spearman-Brown extrapolations of two independent values of Cronbach's alpha reliability coefficient (α), assuming that the units added to or deleted from each instrument are classically parallel to the units included in the original version of each instrument.
Abstract: An approximate statistical test is developed for the hypothesis of equality between the Spearman-Brown extrapolations of two independent values of Cronbach's alpha reliability coefficient (α). This test assumes that the units added to or deleted from each instrument are classically parallel to the units included in the original version of each instrument. The projections for Tests 1 and 2 are based on lengthening or shortening factors of K1 and K2, which may or may not be equal. Special cases of this test include applications in which the projected values are intraclass coefficients or only one of the instruments is presumed to be altered in length. Monte Carlo simulations demonstrated that the procedure effectively controls Type I error even when the original αs are based on as few as two test parts or two raters.
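The Spearman-Brown extrapolation at the heart of the test is the standard projection of a reliability coefficient under lengthening or shortening; a one-line sketch:

```python
def spearman_brown(alpha, k):
    """Projected reliability when a test is lengthened (k > 1) or
    shortened (k < 1) by factor k: k*alpha / (1 + (k - 1)*alpha)."""
    return k * alpha / (1 + (k - 1) * alpha)

# e.g., doubling a test with alpha = .80:
# spearman_brown(0.80, 2) -> approximately 0.889
```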

Journal ArticleDOI
TL;DR: In this article, a simulation study was conducted to determine how well two models for local item dependency (LID), called interaction models, could be distinguished, and the results indicated that if the interaction parameter is not too extreme, the COIM will be rejected in favor of the true model, while finding the true weight required a large sample size.
Abstract: A simulation study was conducted to determine how well two models for local item dependency (LID), called interaction models, could be distinguished. The models examined were the constant-order interaction model (COIM) and the dimension dependent interaction model (DDIM). Data were simulated according to the latter model. Three factors were manipulated: sample size, the weight of the difference between the latent trait value of the examinee and the interaction parameter, and the value of the interaction parameter. Results indicated that (1) if the interaction parameter is not too extreme, the COIM will be rejected in favor of the true model (the Rasch model fit poorly for all levels of the interaction parameter); (2) a larger weight of the difference between the latent trait value and the interaction parameter facilitated the rejection of the COIM, although finding the true weight required a large sample size; and (3) the value for the interaction parameter with an optimal discrimination between the COIM a...

Journal ArticleDOI
TL;DR: This paper derives discrimination parameter values, as functions of the guessing parameter and distances between person parameters and item difficulty, that yield maximum information for the three-parameter logistic item response theory model.
Abstract: Items with the highest discrimination parameter values in a logistic item response theory model do not necessarily give maximum information. This paper derives discrimination parameter values, as functions of the guessing parameter and distances between person parameters and item difficulty, that yield maximum information for the three-parameter logistic item response theory model. An upper bound for information as a function of these parameters is also derived. An algorithm is suggested for the maximum information item selection criterion for adaptive testing and is compared with a full bank search algorithm.
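For reference, the 3PL item information function, written in the logistic metric without the D = 1.7 scaling; it makes visible why a larger a does not always mean more information once c > 0 and θ is far from b:

```python
import numpy as np

def info_3pl(theta, a, b, c):
    """3PL Fisher information: a^2 * (Q/P) * ((P - c)/(1 - c))^2."""
    p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
    return a ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2
```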

Journal ArticleDOI
TL;DR: NNORMULT implements the multivariate extension of the Fleishman power method for generating simulated multivariate nonnormal data; its companion program, PWRCOEFF, derives the required power transformation constants from the desired population skew, kurtosis, and start values for the constants.
Abstract: Many of the data-analytic methods employed in the social sciences assume that the data are normally distributed. It has been recognized that this assumption is often unrealistic, so there has been much research into how particular data-analytic methods behave when the data are not normally distributed. Much of this research has relied heavily on Monte Carlo simulations to characterize the behavior of a particular test statistic. A cornerstone of this type of investigation is the generation of data that conform to prescribed population characteristics. Fleishman (1978) developed the power transformation method for generating simulated univariate nonnormal data. Vale & Maurelli (1983) extended this method for generating simulated multivariate nonnormal data. PWRCOEFF derives power transformation constants for any possible combination of skew and kurtosis. It requires the user to enter the desired population skew, kurtosis, and start values for the constants. The program outputs a text file containing the specified skew and kurtosis, the constants for the power transformation, the start values, and the number of iterations to convergence. NNORMULT generates multivariate nonnormal data using the multivariate extension of the Fleishman power method as developed by Vale & Maurelli (1983). The program requires the user to enter the sample size of the dataset, the population covariance matrix (or correlation matrix) for the data, and the Fleishman power constants [which can be found in Fleishman's (1978) table or can be derived using PWRCOEFF]. NNORMULT outputs a text file containing a matrix of transformed raw data. The sample covariance structure S of this matrix represents a random sample drawn from the desired population with covariance structure Σ.
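A compact sketch of the two steps. The Fleishman transform is Y = −c + bZ + cZ² + dZ³; the multivariate step below draws correlated normals and transforms each margin, omitting (as noted in the comment) the intermediate-correlation adjustment that a full Vale-Maurelli implementation performs:

```python
import numpy as np

def fleishman(z, b, c, d):
    """Fleishman power transform of standard normal z (with a = -c)."""
    return -c + b * z + c * z ** 2 + d * z ** 3

def vale_maurelli(n, corr, consts, seed=None):
    """Generate multivariate nonnormal data (simplified sketch).

    consts: one (b, c, d) tuple per variable. NOTE: a full implementation
    first solves for an intermediate correlation matrix so the transformed
    data reproduce the target correlations; that step is omitted here.
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n)
    return np.column_stack([fleishman(z[:, j], *c3)
                            for j, c3 in enumerate(consts)])
```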

Journal ArticleDOI
TL;DR: The item search algorithm in these tests can be based on a golden section search, a Z-score, or an EAP-based search; these methods result, respectively, in the golden search grading test (GGT), the Z-score grading test (ZGT), and the EAP grading test (EGT).
Abstract: IRT-based adaptive grading tests are designed to assign examinees to one of several grading categories. The item search algorithm in these tests can be based on a golden section search, a Z-score, or an EAP-based search; these methods result, respectively, in the golden search grading test (GGT), the Z-score grading test (ZGT), and the EAP grading test (EGT). Grade assignments are evaluated after each item is administered and after the current trait estimate (θ̂) has been determined. A test is terminated based on one of three conditions: (1) θ̂ is between two cutoff scores; (2) θ̂ is above the highest or below the lowest cutoff score; or (3) a prespecified maximum number of items has been administered. Monte Carlo studies using actual ACT Mathematics test item parameters showed that all three strategies effectively assigned examinees into multiple achievement grade levels. EGT had more correct classifications in the middle range of grade levels and more classifica...
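A sketch of the three termination rules, using a confidence half-width around the trait estimate as an assumed operationalization of being "between" or "beyond" the cutoff scores:

```python
def should_stop(theta_hat, half_width, cutoffs, items_given, max_items):
    """True if the grading test should terminate (sketch of rules 1-3)."""
    if items_given >= max_items:                    # rule 3: length cap
        return True
    lo, hi = theta_hat - half_width, theta_hat + half_width
    if hi < min(cutoffs) or lo > max(cutoffs):      # rule 2: beyond extremes
        return True
    bands = [float("-inf"), *sorted(cutoffs), float("inf")]
    # rule 1: interval falls confidently inside a single grade band
    return any(a < lo and hi < b for a, b in zip(bands, bands[1:]))
```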

Journal ArticleDOI
TL;DR: This paper presents latent-class models that fall within the purview of the general model presented by Clogg & Goodman (1984, 1985) and Walter & Irwig (1988); variations on the general latent-class model allow the investigator to determine whether the criterion measure and/or the diagnostic or screening procedure for multiple groups can be considered error-free.
Abstract: Classification analysis is used widely to detect classification errors determined by evaluating a screening or diagnostic instrument against a criterion measure. The usefulness of classification analysis is limited because it assumes an error-free criterion and provides no statistical test of the validity of that assumption. The classification-analysis model is a special case of a general latent-class model. This paper presents latent-class models that fall within the purview of the general model presented by Clogg & Goodman (1984, 1985) and Walter & Irwig (1988). Variations on the general latent-class model allow the investigator to determine whether the criterion measure and/or the diagnostic or screening procedure for multiple groups can be considered error-free. Analogous to the problem of differential item functioning, the general model makes it possible to test assumptions regarding classification errors that could occur across groups. The proportion of individuals who may be misclassified by a scree...