
Showing papers in "Psychometrika in 1994"


Journal ArticleDOI
TL;DR: In this article, a general theory for parametric inference in contingency tables is outlined, and the asymptotic covariance matrix of the estimated polychoric correlations is derived.
Abstract: A general theory for parametric inference in contingency tables is outlined. Estimation of polychoric correlations is seen as a special case of this theory. The asymptotic covariance matrix of the estimated polychoric correlations is derived for the case when the thresholds are estimated from the univariate marginals and the polychoric correlations are estimated from the bivariate marginals for given thresholds. Computational aspects are also discussed.
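The two-step procedure the abstract describes is easy to sketch: thresholds come from normal quantiles of the cumulative marginal proportions, and the polychoric correlation then maximizes the bivariate-normal likelihood of the table with the thresholds held fixed. A minimal illustrative version (not the paper's implementation; ±8 stands in for infinite thresholds, and the 3 × 3 table is invented):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

def polychoric_two_step(table):
    """Two-step polychoric correlation for an r x c contingency table:
    thresholds from univariate marginals, then rho by ML given thresholds."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Step 1: thresholds are normal quantiles of cumulative marginal proportions.
    a = norm.ppf(np.cumsum(table.sum(axis=1))[:-1] / n)   # row thresholds
    b = norm.ppf(np.cumsum(table.sum(axis=0))[:-1] / n)   # column thresholds
    a = np.concatenate(([-8.0], a, [8.0]))                # +-8 stands in for +-inf
    b = np.concatenate(([-8.0], b, [8.0]))

    def negloglik(rho):
        mvn = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])
        Phi2 = np.array([[mvn.cdf([ai, bj]) for bj in b] for ai in a])
        # Cell probabilities as rectangle probabilities of the bivariate normal.
        P = Phi2[1:, 1:] - Phi2[:-1, 1:] - Phi2[1:, :-1] + Phi2[:-1, :-1]
        return -(table * np.log(np.clip(P, 1e-12, None))).sum()

    # Step 2: one-dimensional ML for the polychoric correlation.
    res = minimize_scalar(negloglik, bounds=(-0.99, 0.99), method="bounded")
    return res.x

# Example: a 3 x 3 table of two ordinal ratings (invented data).
tab = [[20, 10, 2], [8, 25, 10], [2, 12, 21]]
print(round(polychoric_two_step(tab), 3))
```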

355 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider the case where branch probabilities are products of nonnegative integer powers in the parameters θ_s and their complements 1 - θ_s, and show that the EM algorithm necessarily converges to a local maximum.
Abstract: Multinomial processing tree models assume that an observed behavior category can arise from one or more processing sequences represented as branches in a tree. These models form a subclass of parametric, multinomial models, and they provide a substantively motivated alternative to loglinear models. We consider the usual case where branch probabilities are products of nonnegative integer powers in the parameters, 0 ≤ θ_s ≤ 1, and their complements, 1 - θ_s. A version of the EM algorithm is constructed that has very strong properties. First, the E-step and the M-step are both analytic and computationally easy; therefore, a fast PC program can be constructed for obtaining MLEs for large numbers of parameters. Second, a closed form expression for the observed Fisher information matrix is obtained for the entire class. Third, it is proved that the algorithm necessarily converges to a local maximum, and this is a stronger result than for the exponential family as a whole. Fourth, we show how the algorithm can handle quite general hypothesis tests concerning restrictions on the model parameters. Fifth, we extend the algorithm to handle the Read and Cressie power divergence family of goodness-of-fit statistics. The paper includes an example to illustrate some of these results.
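To make the branch structure concrete, here is a minimal EM sketch for one member of this model class, the one-high-threshold recognition model with parameters (d, g): the E-step splits each category count over its branches, and the M-step is the closed-form ratio of expected exponent counts. The model, counts, and function names are illustrative, not from the paper:

```python
import numpy as np

# Each branch: (category index, exponents a_s on theta_s, exponents b_s on 1-theta_s).
# One-high-threshold model, theta = (d, g):
#   old items : hit = d + (1-d)g, miss = (1-d)(1-g)
#   new items : false alarm = g, correct rejection = 1-g
branches = [
    (0, [1, 0], [0, 0]),   # hit via detection:   d
    (0, [0, 1], [1, 0]),   # hit via guessing:    (1-d) g
    (1, [0, 0], [1, 1]),   # miss:                (1-d)(1-g)
    (2, [0, 1], [0, 0]),   # false alarm:         g
    (3, [0, 0], [0, 1]),   # correct rejection:   1-g
]

def em_mpt(counts, branches, n_par, iters=200):
    """EM for MPT models whose branch probabilities are products
    theta^a * (1-theta)^b; both EM steps are in closed form."""
    theta = np.full(n_par, 0.5)
    A = np.array([a for _, a, _ in branches], float)
    B = np.array([b for _, _, b in branches], float)
    cat = np.array([c for c, _, _ in branches])
    counts = np.asarray(counts, float)
    for _ in range(iters):
        p_branch = np.prod(theta**A * (1 - theta)**B, axis=1)
        p_cat = np.bincount(cat, weights=p_branch)      # category probabilities
        m = counts[cat] * p_branch / p_cat[cat]         # E-step: expected branch counts
        theta = (m @ A) / (m @ (A + B))                 # M-step: closed form
    return theta

# Invented counts for (hit, miss, false alarm, correct rejection).
print(em_mpt([70, 30, 20, 80], branches, n_par=2))
```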

275 citations


Journal ArticleDOI
TL;DR: In this article, issues of evidence and inference in educational assessment are discussed from the perspective of general principles for inference in the presence of uncertainty; the concepts and techniques of test theory can be viewed as applications of these principles.
Abstract: Educational assessment concerns inference about students' knowledge, skills, and accomplishments. Because data are never so comprehensive and unequivocal as to ensure certitude, test theory evolved in part to address questions of weight, coverage, and import of data. The resulting concepts and techniques can be viewed as applications of more general principles for inference in the presence of uncertainty. Issues of evidence and inference in educational assessment are discussed from this perspective.

254 citations


Journal ArticleDOI
TL;DR: Compared with a traditional clustering method, the K-means procedure had fewer points misclassified, while the classification accuracy of neural networks worsened as the number of clusters in the data increased from two to five.
Abstract: Several neural networks have been proposed in the general literature for pattern recognition and clustering, but little empirical comparison with traditional methods has been done. The results reported here compare neural networks using Kohonen learning with a traditional clustering method (K-means) in an experimental design using simulated data with known cluster solutions. Two types of neural networks were examined, both of which used unsupervised learning to perform the clustering. One used Kohonen learning with a conscience and the other used Kohonen learning without a conscience mechanism. The performance of these nets was examined with respect to changes in the number of attributes, the number of clusters, and the amount of error in the data. Generally, the K-means procedure had fewer points misclassified while the classification accuracy of neural networks worsened as the number of clusters in the data increased from two to five.
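A compact sketch of the two procedures being compared, on simulated data with known clusters: Kohonen competitive learning with a DeSieno-style conscience (frequent winners are handicapped so all units win a fair share) versus plain K-means. The learning rates, conscience gain, and data are illustrative choices, not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated data: three Gaussian clusters in two attributes.
X = np.vstack([rng.normal(m, 0.4, size=(100, 2)) for m in ([0, 0], [3, 0], [0, 3])])

def kohonen_conscience(X, k=3, epochs=20, lr=0.1, bias_gain=10.0):
    """Unsupervised competitive (Kohonen) learning with a conscience:
    units that win often are biased against in the winner selection."""
    w = X[rng.choice(len(X), k, replace=False)].copy()   # initial weight vectors
    win_freq = np.full(k, 1.0 / k)
    for _ in range(epochs):
        for x in rng.permutation(X):
            d2 = ((w - x) ** 2).sum(axis=1)
            bias = bias_gain * (1.0 / k - win_freq)      # conscience term
            j = np.argmin(d2 - bias)                     # biased winner
            win_freq += 0.01 * ((np.arange(k) == j) - win_freq)
            w[j] += lr * (x - w[j])                      # move winner toward x
    return w

def kmeans(X, k=3, iters=50):
    c = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        lab = ((X[:, None, :] - c[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(lab == j):                         # guard empty clusters
                c[j] = X[lab == j].mean(axis=0)
    return c

print(np.sort(kohonen_conscience(X), axis=0).round(2))
print(np.sort(kmeans(X), axis=0).round(2))
```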

127 citations


Journal ArticleDOI
TL;DR: This paper proposed a loglinear IRT model that relates polytomously scored item responses to a multidimensional latent space, where the analyst may specify a response function for each response, indicating which latent abilities are necessary to arrive at that response.
Abstract: A loglinear IRT model is proposed that relates polytomously scored item responses to a multidimensional latent space. The analyst may specify a response function for each response, indicating which latent abilities are necessary to arrive at that response. Each item may have a different number of response categories, so that free response items are more easily analyzed. Conditional maximum likelihood estimates are derived and the models may be tested generally or against alternative loglinear IRT models.

120 citations


Journal ArticleDOI
TL;DR: In this paper, a class of oblique rotation procedures is proposed to rotate a pattern matrix such that it optimally resembles a matrix which has an exact simple pattern, which can recover relatively complex simple structures where other well-known simple structure rotation techniques fail.
Abstract: Factor analysis and principal component analysis are usually followed by simple structure rotations of the loadings. These rotations optimize a certain criterion (e.g., varimax, oblimin), designed to measure the degree of simple structure of the pattern matrix. Simple structure can be considered optimal if a (usually large) number of pattern elements is exactly zero. In the present paper, a class of oblique rotation procedures is proposed to rotate a pattern matrix such that it optimally resembles a matrix which has an exact simple pattern. It is demonstrated that this method can recover relatively complex simple structures where other well-known simple structure rotation techniques fail.
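A schematic alternating sketch of rotation toward an exact simple pattern, under assumptions of ours rather than the paper's exact criterion and normalization: repeatedly build a target that zeroes the smallest pattern elements, then take the oblique least-squares (Procrustes-type) transformation toward that target:

```python
import numpy as np

def rotate_to_simple_target(A, n_zeros, iters=100):
    """Schematic sketch: alternate (a) a target B equal to the current
    pattern with its n_zeros smallest absolute elements set to zero and
    (b) an oblique least-squares transformation toward B. The paper's
    actual criterion and normalization may differ."""
    W = np.eye(A.shape[1])
    for _ in range(iters):
        L = A @ W                                    # current rotated pattern
        cut = np.sort(np.abs(L), axis=None)[n_zeros - 1]
        B = np.where(np.abs(L) <= cut, 0.0, L)       # exact simple-pattern target
        W = np.linalg.solve(A.T @ A, A.T @ B)        # LS transformation toward B
        W /= np.sqrt((W ** 2).sum(axis=0))           # normalize columns
    return A @ W, W

# Toy loading matrix: a perturbed two-factor simple structure, then scrambled.
rng = np.random.default_rng(1)
A0 = np.kron(np.eye(2), np.ones((4, 1))) * 0.8 + rng.normal(0, 0.05, (8, 2))
Q = np.linalg.qr(rng.normal(size=(2, 2)))[0]         # random rotation to undo
L, W = rotate_to_simple_target(A0 @ Q, n_zeros=8)
print(L.round(2))                                    # simple structure recovered
```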

98 citations


Journal ArticleDOI
James Arbuckle

95 citations


Journal ArticleDOI
TL;DR: In this article, a conditional maximum likelihood algorithm for estimation of the basic parameters of the partial credit model is presented, based on recurrences for the combinatorial functions involved, and using a quasi-Newton approach, the so-called Broyden-Fletcher-Goldfarb-Shanno (BFGS) method.
Abstract: The partial credit model is considered under the assumption of a certain linear decomposition of the item × category parameters δ_ih into "basic parameters" α_j. This model is referred to as the "linear partial credit model". A conditional maximum likelihood algorithm for estimation of the α_j is presented, based on (a) recurrences for the combinatorial functions involved, and (b) a "quasi-Newton" approach, the so-called Broyden-Fletcher-Goldfarb-Shanno (BFGS) method; (a) guarantees numerically stable results, and (b) avoids the direct computation of the Hessian matrix, yet produces a sequence of positive definite matrices B_k, k = 1, 2, ..., converging to the asymptotic variance-covariance matrix of the \(\hat{\alpha}_j\). The practicality of these numerical methods is demonstrated both by means of simulations and of an empirical application to the measurement of treatment effects in patients with psychosomatic disorders.
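The two ingredients named in the abstract, a recurrence for the combinatorial (γ) functions and a BFGS quasi-Newton search, can be shown compactly for the dichotomous Rasch special case; in the partial credit model the per-item factor (1 + ε_i x) becomes a polynomial over that item's categories. Illustrative code, not the paper's program:

```python
import numpy as np
from scipy.optimize import minimize

def gammas(eps):
    """Elementary symmetric functions gamma_0..gamma_n of eps_1..eps_n,
    via the standard stable recurrence (a chain of polynomial convolutions)."""
    g = np.array([1.0])
    for e in eps:
        g = np.convolve(g, [1.0, e])
    return g

def neg_cml(beta, X):
    """Negative conditional Rasch log-likelihood; beta are item easiness logits.
    Conditioning on the raw scores removes the person parameters."""
    g = gammas(np.exp(beta))
    r = X.sum(axis=1)                        # raw scores
    return -(X @ beta - np.log(g[r])).sum()

# Simulate a small Rasch data set (invented difficulties).
rng = np.random.default_rng(2)
theta = rng.normal(0, 1, 500)
delta = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
X = (rng.random((500, 5)) < 1 / (1 + np.exp(-(theta[:, None] - delta)))).astype(int)

res = minimize(neg_cml, np.zeros(5), args=(X,), method="BFGS")
print((-res.x + res.x.mean()).round(2))      # centered difficulty estimates
```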

92 citations


Journal ArticleDOI
TL;DR: In this paper, the authors established the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit model and the graded response model).
Abstract: The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability θ). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit model and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed.
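For the partial credit model the definition in the abstract amounts to a one-liner: the IRF is the expected score, the category-score-weighted sum of the ICRFs. A small sketch (the step difficulties are invented):

```python
import numpy as np

def pcm_icrf(theta, deltas):
    """ICRFs of a partial credit item: P(score = h | theta), h = 0..m,
    with step difficulties deltas[0..m-1]."""
    steps = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas))))
    e = np.exp(steps - steps.max())          # stabilized softmax
    return e / e.sum()

def irf(theta, deltas):
    """Item response function = expected score = sum_h h * ICRF_h(theta)."""
    p = pcm_icrf(theta, deltas)
    return np.arange(len(p)) @ p

for t in (-2.0, 0.0, 2.0):
    print(t, round(irf(t, [-0.5, 0.8]), 3))
```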

77 citations


Journal ArticleDOI
TL;DR: This paper proposes a generalization of the usual random effects model based on trimmed means; it also examines via simulation how the method of Jeyaratnam and Othman (1985) for handling unequal variances, derived under normality, performs when distributions are nonnormal.
Abstract: The random effects ANOVA model plays an important role in many psychological studies, but the usual model suffers from at least two serious problems. The first is that even under normality, violating the assumption of equal variances can have serious consequences in terms of Type I errors or significance levels, and it can affect power as well. The second and perhaps more serious concern is that even slight departures from normality can result in a substantial loss of power when testing hypotheses. Jeyaratnam and Othman (1985) proposed a method for handling unequal variances, under the assumption of normality, but no results were given on how their procedure performs when distributions are nonnormal. A secondary goal in this paper is to address this issue via simulations. As will be seen, problems arise with both Type I errors and power. Another secondary goal is to provide new simulation results on the Rust-Fligner modification of the Kruskal-Wallis test. The primary goal is to propose a generalization of the usual random effects model based on trimmed means. The resulting test of no differences among J randomly sampled groups has certain advantages in terms of Type I errors, and it can yield substantial gains in power when distributions have heavy tails and outliers. This last feature is very important in applied work because recent investigations indicate that heavy-tailed distributions are common. Included is a suggestion for a heteroscedastic Winsorized analog of the usual intraclass correlation coefficient.
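The building blocks of the proposed test, the 20% trimmed mean and its companion Winsorized variance, are simple to compute; a sketch on a heavy-tailed sample (the trimming proportion and data are illustrative):

```python
import numpy as np
from scipy.stats import trim_mean

def winsorized_var(x, prop=0.2):
    """Winsorized sample variance: pull each tail in to the trim points,
    then take the usual variance (ddof=1)."""
    x = np.sort(np.asarray(x, float))
    g = int(prop * len(x))
    x[:g], x[len(x) - g:] = x[g], x[len(x) - g - 1]
    return x.var(ddof=1)

rng = np.random.default_rng(3)
x = rng.standard_t(df=2, size=50)       # heavy tails, occasional outliers
print(round(np.mean(x), 3),             # ordinary mean
      round(trim_mean(x, 0.2), 3),      # 20% trimmed mean
      round(winsorized_var(x), 3))      # Winsorized variance
```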

67 citations


Journal ArticleDOI
TL;DR: In this article, a simple test for independence in psychometrics is proposed, which is not recommended that the new test replace the usual test of H0:ρ = 0, but the new measure has important advantages over the usual measure in terms of both Type I errors and power.
Abstract: A well-known result is that the usual correlation coefficient, ρ, is highly nonrobust: very slight changes in only one of the marginal distributions can alter ρ by a substantial amount. There are a variety of methods for correcting this problem. This paper identifies one particular method which is useful in psychometrics and provides a simple test for independence. It is not recommended that the new test replace the usual test of H_0: ρ = 0, but the new test has important advantages over the usual test in terms of both Type I errors and power.
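The paper's particular robust correlation is not reproduced here; as an illustration of the general recipe, the sketch below Winsorizes each margin, correlates the results, and applies the usual t-type statistic to test independence. One gross outlier barely moves the robust estimate while wrecking the ordinary one:

```python
import numpy as np
from scipy.stats import t as t_dist

def winsorize(x, prop=0.2):
    lo, hi = np.quantile(x, [prop, 1 - prop])
    return np.clip(x, lo, hi)

def robust_cor_test(x, y, prop=0.2):
    """Winsorized correlation with a t-type test of independence.
    Illustrative stand-in for the paper's robust method."""
    n = len(x)
    r = np.corrcoef(winsorize(x, prop), winsorize(y, prop))[0, 1]
    T = r * np.sqrt((n - 2) / (1 - r ** 2))
    p = 2 * t_dist.sf(abs(T), n - 2)
    return r, p

rng = np.random.default_rng(4)
x = rng.normal(size=60)
y = 0.5 * x + rng.normal(size=60)
y[0] = 40.0                               # one gross outlier
r, p = robust_cor_test(x, y)
print(round(np.corrcoef(x, y)[0, 1], 3), round(r, 3), round(p, 4))
```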

Journal ArticleDOI
TL;DR: In this paper, the authors presented some results on identification in multitrait-multimethod (MTMM) confirmatory factor analysis (CFA) models, and discussed the implications of these results for CFA models in general.
Abstract: This paper presents some results on identification in multitrait-multimethod (MTMM) confirmatory factor analysis (CFA) models. Some MTMM models are not identified when the (factorial-patterned) loadings matrix is of deficient column rank. For at least one other MTMM model, identification does exist despite such deficiency. It is also shown that for some MTMM CFA models, Howe's (1955) conditions sufficient for rotational uniqueness can fail, yet the model may well be identified and rotationally unique. Implications of these results for CFA models in general are discussed.

Journal ArticleDOI
TL;DR: In this article, a new proof was presented for the following result by Ghurye and Wallace: given that the independent random variables are Bernoulli with success probability strictly between 0 and 1 and nondecreasing in the sum, the sum has monotone likelihood ratio.
Abstract: By use of an inequality of Marcus and Lopes for elementary symmetric functions, a new proof is presented for the following result by Ghurye and Wallace: Given that the independent random variables X_j are Bernoulli with success probability p_j(θ) strictly between 0 and 1 and nondecreasing in θ, the sum Σ X_j has monotone likelihood ratio.
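The distribution of Σ X_j is exactly the coefficient sequence of Π_j (1 − p_j + p_j x), which is where the elementary symmetric functions enter. A short numerical check of the monotone likelihood ratio property, with two invented parameter settings where each p_j is nondecreasing:

```python
import numpy as np

def sum_pmf(p):
    """PMF of S = sum of independent Bernoulli(p_j): the coefficients of
    prod_j (1 - p_j + p_j x), i.e. elementary symmetric functions of the
    odds p_j/(1-p_j) up to a common factor."""
    pmf = np.array([1.0])
    for pj in p:
        pmf = np.convolve(pmf, [1 - pj, pj])
    return pmf

# Two parameter values theta1 < theta2 with p_j(theta) nondecreasing in theta.
p1, p2 = [0.2, 0.3, 0.5], [0.3, 0.5, 0.7]
f1, f2 = sum_pmf(p1), sum_pmf(p2)
print((f2 / f1).round(3))   # likelihood ratio is increasing in the raw score s
```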

Journal ArticleDOI
TL;DR: A split-sample replication stopping rule for hierarchical cluster analysis is compared with the internal criterion previously found superior by Milligan and Cooper (1985) in their comparison of 30 different procedures.
Abstract: A split-sample replication stopping rule for hierarchical cluster analysis is compared with the internal criterion previously found superior by Milligan and Cooper (1985) in their comparison of 30 different procedures. The number and extent of overlap of the latent population distributions was systematically varied in the present evaluation of stopping-rule validity. Equal and unequal population base rates were also considered. Both stopping rules correctly identified the actual number of populations when there was essentially no overlap and clusters occupied visually distinct regions of the measurement space. The replication criterion, which is evaluated by clustering of cluster means from preliminary analyses that are accomplished on random partitions of an original data set, was superior as the degree of overlap in population distributions increased. Neither method performed adequately when overlap obliterated visually discernible density nodes.
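A schematic version of split-sample replication for choosing the number of clusters, under simplifying assumptions of ours (the paper's rule clusters cluster means from preliminary analyses on random partitions; this sketch instead assigns the holdout half to the nearest training-half cluster mean and scores agreement with a direct clustering of the holdout):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def rand_index(a, b):
    """Plain Rand index between two labelings of the same points."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    return (same_a == same_b)[np.triu_indices(len(a), 1)].mean()

def replication_k(X, k_range, rng):
    """Split-half replication: cluster half A, assign half B to the nearest
    half-A cluster mean, cluster half B directly, score the agreement."""
    idx = rng.permutation(len(X))
    A, B = X[idx[: len(X) // 2]], X[idx[len(X) // 2:]]
    scores = {}
    for k in k_range:
        lab_A = fcluster(linkage(A, "ward"), k, criterion="maxclust")
        means = np.array([A[lab_A == j].mean(axis=0) for j in range(1, k + 1)])
        assigned = ((B[:, None, :] - means[None]) ** 2).sum(-1).argmin(1)
        lab_B = fcluster(linkage(B, "ward"), k, criterion="maxclust")
        scores[k] = rand_index(assigned, lab_B)
    return scores

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(m, 0.5, (60, 2)) for m in ([0, 0], [4, 0], [0, 4])])
print(replication_k(X, range(2, 6), rng))   # agreement should peak near k = 3
```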

Journal ArticleDOI
TL;DR: Locally D-optimal n-point designs are derived using the branch-and-bound algorithm of Welch and appear to be considerably more efficient than random seeding of items.
Abstract: Replenishing item pools for on-line ability testing requires innovative and efficient data collection designs. By generating local D-optimal designs for selecting individual examinees, and consistently estimating item parameters in the presence of error in the design points, sequential procedures are efficient for on-line item calibration. The estimating error in the on-line ability values is accounted for with an item parameter estimate studied by Stefanski and Carroll. Locally D-optimal n-point designs are derived using the branch-and-bound algorithm of Welch. In simulations, the overall sequential designs appear to be considerably more efficient than random seeding of items.

Journal ArticleDOI
TL;DR: In this paper, hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item and ability parameters, and two joint and two marginal Bayesian estimation procedures were analyzed via simulated data sets.
Abstract: Hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item and ability parameters. Simulated data sets were analyzed via two joint and two marginal Bayesian estimation procedures. The marginal Bayesian estimation procedures yielded consistently smaller root mean square differences than the joint Bayesian estimation procedures for item and ability estimates. As the sample size and test length increased, the four Bayes procedures yielded essentially the same result.

Journal ArticleDOI
TL;DR: In this paper, conditions are stated for the existence of a set of independent Rasch binary items such that their raw score and the partial credit raw score have identical probability density functions.
Abstract: Given a Masters partial credit item with n known step difficulties, conditions are stated for the existence of a set of (locally) independent Rasch binary items such that their raw score and the partial credit raw score have identical probability density functions. The conditions are those for the existence of n positive values with predetermined elementary symmetric functions and include the requirement that the n step difficulties form an increasing sequence.
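The existence condition can be checked numerically: the partial credit score weights ψ_h = exp(−Σ_{k≤h} δ_k) are the coefficients of a polynomial, and a set of Rasch binary items with the same raw-score distribution exists exactly when all of its roots are real and negative, the roots −1/ε_j giving the item parameters. A sketch, with invented step difficulties showing that an increasing sequence alone is not sufficient:

```python
import numpy as np

def rasch_equivalents(step_difficulties, theta=0.0):
    """If the score polynomial sum_h psi_h x^h has n negative real roots,
    return the easiness logits log(eps_j) of equivalent Rasch binary items;
    otherwise return None. (A numerical sketch of the stated condition.)"""
    d = np.asarray(step_difficulties, float)
    psi = np.exp(np.concatenate(([0.0], np.cumsum(theta - d))))  # score weights
    roots = np.roots(psi[::-1])              # highest-degree coefficient first
    if np.all(np.isreal(roots)) and np.all(roots.real < 0):
        eps = -1.0 / roots.real              # roots are -1/eps_j
        return np.log(eps)                   # easiness logits
    return None

print(rasch_equivalents([-2.0, 0.0, 2.0]).round(3))  # equivalent items exist
print(rasch_equivalents([-1.0, 0.0, 1.0]))           # None: increasing steps alone are not enough
```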

Journal ArticleDOI
TL;DR: In this article, a modification of one of the available statistical tests of the equality of nonindependent alpha reliability coefficients is derived which avoids this limitation, and the modified test can be safely employed in comparisons between interrater reliabilities.
Abstract: The available statistical tests of the equality of nonindependent alpha reliability coefficients require that the product of the number of test parts times the number of subjects be quite large—1000 or more. A modification of one of these tests is derived which avoids this limitation. Monte Carlo studies indicate that the modified test effectively controls the Type I error rate with as few as 2 or 3 test parts and 50 subjects. This means the modified test can be safely employed in comparisons between interrater reliabilities.

Journal ArticleDOI
TL;DR: In this paper, an asymptotic expression for the reliability of a linearly equated test is developed using normal theory; the reliability is expressed as the product of two terms: the reliability of the test before equating and an adjustment term.
Abstract: An asymptotic expression for the reliability of a linearly equated test is developed using normal theory. The reliability is expressed as the product of two terms, the reliability of the test before equating, and an adjustment term. This adjustment term is a function of the sample sizes used to estimate the linear equating transformation. The results of a simulation study indicate close agreement between the theoretical and simulated reliability values for samples greater than 200. Findings demonstrate that samples as small as 300 can be used in linear equating without an appreciable decrease in reliability.

Journal ArticleDOI
TL;DR: In this paper, it was shown that the pattern resulting from independent cluster rotation is columnwise proportional to the associated weights matrix, which implies that the interpretation of the components does not depend on whether it is based on the pattern or on the component weights matrix.
Abstract: Procedures for oblique rotation of factors or principal components typically focus on rotating the pattern matrix such that it becomes optimally simple. An important oblique rotation method that does so is Harris and Kaiser's (1964) independent cluster (HKIC) rotation. In principal components analysis, a case can be made for interpreting the components on the basis of the component weights rather than on the basis of the pattern, so it seems desirable to rotate the components such that the weights rather than the pattern become optimally simple. In the present paper, it is shown that HKIC rotates the components such that both the pattern and the weights matrix become optimally simple. In addition, it is shown that the pattern resulting from HKIC rotation is columnwise proportional to the associated weights matrix, which implies that the interpretation of the components does not depend on whether it is based on the pattern or on the component weights matrix. It is also shown that the latter result only holds for HKIC rotation and slight modifications of it.

Journal ArticleDOI
TL;DR: In this paper, the authors provide a unified, theoretical basis on which measures of data reliability may be derived or evaluated, for both quantitative and qualitative data This approach evaluates reliability as the "proportional reduction in loss" (PRL) that is attained in a sample by an optimal estimator.
Abstract: We provide a unified, theoretical basis on which measures of data reliability may be derived or evaluated, for both quantitative and qualitative data This approach evaluates reliability as the “proportional reduction in loss” (PRL) that is attained in a sample by an optimal estimator The resulting measure is between 0 and 1, linearly related to expected loss, and provides a direct way of contrasting the measured reliability in the sample with the least reliable and most reliable data-generating cases The PRL measure is a generalization of many of the commonly-used reliability measures We show how the quantitative measures from generalizability theory can be derived as PRL measures (including Cronbach's alpha and measures proposed by Winer) For categorical data, we develop a new measure for the general case in which each of N judges assigns a subject to one of K categories and show that it is equivalent to a measure proposed by Perreault and Leigh for the case where N is 2
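As one concrete instance of a coefficient the PRL framework recovers, Cronbach's alpha for a subjects-by-items score matrix (the simulated "parallel-ish" items are illustrative):

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an n_subjects x k_items score matrix; one of
    the reliability coefficients recoverable as a PRL measure."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(6)
true = rng.normal(size=(200, 1))
X = true + rng.normal(0, 1.0, size=(200, 5))   # 5 noisy measures of one trait
print(round(cronbach_alpha(X), 3))             # roughly 0.83 by Spearman-Brown
```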

Journal ArticleDOI
TL;DR: In this article, an ANCOVA for pre- and post-test variables X and Y which are ordinal measures of η and Θ, respectively, is presented.
Abstract: With random assignment to treatments and standard assumptions, either a one-way ANOVA of post-test scores or a two-way, repeated measures ANOVA of pre- and post-test scores provides a legitimate test of the equal treatment effect null hypothesis for latent variable Θ. In an ANCOVA for pre- and post-test variables X and Y which are ordinal measures of η and Θ, respectively, random assignment and standard assumptions ensure the legitimacy of inferences about the equality of treatment effects on latent variable Θ. Sample estimates of adjusted Y treatment means are ordinal estimators of adjusted post-test means on latent variable Θ.

Journal ArticleDOI
TL;DR: A model for preferential and triadic choice is derived in terms of weighted sums of central F distribution functions; this probabilistic generalization of Coombs' (1964) unfolding model extends previous work by Mullen and Ennis (1991) and provides more insight into the same problem that they discussed.
Abstract: A model for preferential and triadic choice is derived in terms of weighted sums of central F distribution functions. This model is a probabilistic generalization of Coombs' (1964) unfolding model and special cases, such as the model of Zinnes and Griggs (1974), can be derived easily from it. This new form extends previous work by Mullen and Ennis (1991) and provides more insight into the same problem that they discussed.

Journal ArticleDOI
TL;DR: In this paper, a useful method for identifying a variate as inconsistent is proposed in factor analysis based on the likelihood principle, which is illustrated by some examples of inconsistent variates.
Abstract: When some of the observed variates do not conform to the model under consideration, they will have a serious effect on the results of statistical analysis. In factor analysis the model with inconsistent variates may result in improper solutions. In this article a useful method for identifying a variate as inconsistent is proposed in factor analysis. The procedure is based on the likelihood principle. Several statistical properties such as the effect of misspecified hypotheses, the problem of multiple comparisons, and robustness to violation of distributional assumptions are investigated. The procedure is illustrated by some examples.

Journal ArticleDOI
TL;DR: Using the constant information model and constant amounts of test information for a finite interval of ability, simulated data suggest that it is desirable to consider some modification of the test information function when it is used as the measure of accuracy in ability estimation.
Abstract: The test information function serves important roles in latent trait models and in their applications. Among others, it has been used as the measure of accuracy in ability estimation. A question arises, however, if the test information function is accurate enough for all meaningful levels of ability relative to the test, especially when the number of test items is relatively small (e.g., less than 50). In the present paper, using the constant information model and constant amounts of test information for a finite interval of ability, simulated data were produced for eight different levels of ability and for twenty different numbers of test items ranging between 10 and 200. Analyses of these data suggest that it is desirable to consider some modification of the test information function when it is used as the measure of accuracy in ability estimation.
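For reference, the quantity under study: for 2PL items the test information is I(θ) = Σ_i a_i² P_i(θ) Q_i(θ), and 1/√I(θ) is the usual asymptotic standard error of the ability estimate. A sketch with an invented 30-item test:

```python
import numpy as np

def test_information(theta, a, b):
    """2PL test information I(theta) = sum_i a_i^2 P_i Q_i; its inverse
    square root is the conventional SE of the ability estimate."""
    P = 1 / (1 + np.exp(-a * (theta - b)))
    return (a ** 2 * P * (1 - P)).sum()

a = np.full(30, 1.2)                 # invented discriminations
b = np.linspace(-2, 2, 30)           # invented difficulties
for t in (-2.0, 0.0, 2.0):
    I = test_information(t, a, b)
    print(t, round(I, 2), round(1 / np.sqrt(I), 3))
```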

Journal ArticleDOI
TL;DR: In a simulation study it is shown that under the presence of outliers the robust functions outperform the ordinary least squares function, both when the underlying structure is linear in the variables and when it is nonlinear.
Abstract: A method for robust canonical discriminant analysis via two robust objective loss functions is discussed. These functions are useful to reduce the influence of outliers in the data. Majorization is used at several stages of the minimization procedure to obtain a monotonically convergent algorithm. An advantage of the proposed method is that it allows for optimal scaling of the variables. In a simulation study it is shown that under the presence of outliers the robust functions outperform the ordinary least squares function, both when the underlying structure is linear in the variables and when it is nonlinear. Furthermore, the method is illustrated with empirical data.

Journal ArticleDOI
TL;DR: A simplified tensor basis is given, by showing that a symmetric matrix can also be decomposed in terms of n(n+1)/2 fixed binary matrices of rank one; it is also shown that when the INDSCAL saliences are constrained to be nonnegative, the number of dimensions needed can be as large as p, the number of matrices analyzed.
Abstract: Zellini (1979, Theorem 3.1) has shown how to decompose an arbitrary symmetric matrix of order n × n as a linear combination of n(n+1)/2 fixed rank one matrices, thus constructing an explicit tensor basis for the set of symmetric n × n matrices. Zellini's decomposition is based on properties of persymmetric matrices. In the present paper, a simplified tensor basis is given, by showing that a symmetric matrix can also be decomposed in terms of n(n+1)/2 fixed binary matrices of rank one. The decomposition implies that an n × n × p array consisting of p symmetric n × n slabs has maximal rank n(n+1)/2. Likewise, an unconstrained INDSCAL (symmetric CANDECOMP/PARAFAC) decomposition of such an array will yield a perfect fit in n(n+1)/2 dimensions. When the fitting only pertains to the off-diagonal elements of the symmetric matrices, as is the case in a version of PARAFAC where communalities are involved, the maximal number of dimensions can be further reduced to n(n−1)/2. However, when the saliences in INDSCAL are constrained to be nonnegative, the tensor basis result does not apply. In fact, it is shown that in this case the number of dimensions needed can be as large as p, the number of matrices analyzed.
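The binary-basis idea admits a short numerical verification: use e_i e_i′ for the diagonal and (e_i + e_j)(e_i + e_j)′ for each pair i < j, with coefficients read directly off S. This is one such binary decomposition, not necessarily the paper's exact basis:

```python
import numpy as np

def binary_rank_one_decomposition(S):
    """Decompose symmetric S as a sum of n(n+1)/2 fixed binary rank-one
    matrices: e_i e_i' for each i, and (e_i + e_j)(e_i + e_j)' for i < j."""
    n = S.shape[0]
    terms = []
    for i in range(n):
        c = S[i, i] - (S[i].sum() - S[i, i])   # diagonal coefficient
        E = np.zeros((n, n))
        E[i, i] = 1.0
        terms.append((c, E))
    for i in range(n):
        for j in range(i + 1, n):
            v = np.zeros(n)
            v[i] = v[j] = 1.0                  # binary rank-one pair matrix
            terms.append((S[i, j], np.outer(v, v)))
    return terms

S = np.array([[2.0, 0.5, -1.0], [0.5, 1.0, 0.3], [-1.0, 0.3, 4.0]])
recon = sum(c * E for c, E in binary_rank_one_decomposition(S))
print(np.allclose(recon, S))   # True: 6 = n(n+1)/2 binary rank-one terms
```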

Journal ArticleDOI
TL;DR: Evidence is provided that the TREEFAM outperforms traditional models that ignore the effects of unfamiliarity in terms of superior tree recovery and overall goodness-of-fit.
Abstract: This paper presents a new procedure called TREEFAM for estimating ultrametric tree structures from proximity data confounded by differential stimulus familiarity. The objective of the proposed TREEFAM procedure is to quantitatively “filter out” the effects of stimulus unfamiliarity in the estimation of an ultrametric tree. A conditional, alternating maximum likelihood procedure is formulated to simultaneously estimate an ultrametric tree, under the unobserved condition of complete stimulus familiarity, and subject-specific parameters capturing the adjustments due to differential unfamiliarity. We demonstrate the performance of the TREEFAM procedure under a variety of alternative conditions via a modest Monte Carlo experimental study. An empirical application provides evidence that the TREEFAM outperforms traditional models that ignore the effects of unfamiliarity in terms of superior tree recovery and overall goodness-of-fit.

Journal ArticleDOI
TL;DR: In this article, empirical approximate Bayes estimators (EABEs) were proposed for estimating domain scores under binomial and hypergeometric distributions, respectively, and the convergence rate of the overall expected loss of Bayes risk in either EABE or EBE depends on test length, sample size, and ratio of test length to size of domain items.
Abstract: We introduce two simple empirical approximate Bayes estimators (EABEs)—\(\widetilde{d}_N (x)\) and \(\widetilde\delta _N (x)\)—for estimating domain scores under binomial and hypergeometric distributions, respectively. Both EABEs (derived from corresponding marginal distributions of observed test score x without relying on knowledge of prior domain score distributions) have been proven to hold Δ-asymptotic optimality in Robbins' sense of convergence in mean. We found that, where \(\widetilde{d}^* _N\) and \(\widetilde\delta ^* _N\) are the monotonized versions of \(\widetilde{d}_N\) and \(\widetilde\delta _N\) under Van Houwelingen's monotonization method, respectively, the convergence rate of the overall expected loss of Bayes risk in either \(\widetilde{d}^* _N\) or \(\widetilde\delta ^* _N\) depends on test length, sample size, and ratio of test length to size of domain items. In terms of conditional Bayes risk, \(\widetilde{d}^* _N\) and \(\widetilde\delta ^* _N\) outperform their maximum likelihood counterparts over the middle range of domain scales. In terms of mean-squared error, we also found that: (a) given a unimodal prior distribution of domain scores, \(\widetilde\delta ^* _N\) performs better than both \(\widetilde{d}^* _N\) and a linear EBE of the beta-binomial model when domain item size is small or when test items reflect a high degree of heterogeneity; (b) \(\widetilde{d}^* _N\) performs as well as \(\widetilde\delta ^* _N\) when prior distribution is bimodal and test items are homogeneous; and (c) the linear EBE is extremely robust when a large pool of homogeneous items plus a unimodal prior distribution exists.

Journal ArticleDOI
TL;DR: In this note, it is shown that for all three types of correlation considered, the absolute value of the first (weighted average) variant is greater than or equal to the absolute value of the second (rowwise) variant.
Abstract: De Vries (1993) discusses Pearson's product-moment correlation, Spearman's rank correlation, and Kendall's rank-correlation coefficient for assessing the association between the rows of two proximity matrices. For each of these he introduces a weighted average variant and a rowwise variant. In this note it is shown that for all three types, the absolute value of the first variant is greater than or equal to the absolute value of the second.