Showing papers in "Psychometrika in 1971"


Journal ArticleDOI
TL;DR: In this article, the authors study similarities and differences in factor structures between different groups in a setting where a battery of tests has been administered to samples of examinees from several populations.
Abstract: This paper is concerned with the study of similarities and differences in factor structures between different groups. A common situation occurs when a battery of tests has been administered to samples of examinees from several populations.

1,592 citations


Journal ArticleDOI
TL;DR: In this article, various models for sets of congeneric tests are considered, including models appropriate for the analysis of multitrait-multimethod data, and the special cases when two or more tests within a set are tau-equivalent or parallel are also considered.
Abstract: Various models for sets of congeneric tests are considered, including models appropriate for the analysis of multitrait-multimethod data. All models are illustrated with real data. The special cases when two or more tests within a set are tau-equivalent or parallel are also considered. All data analyses are done within the framework of a general model by Joreskog [1970].

1,511 citations


Journal ArticleDOI
TL;DR: In this paper, a rigorous and greatly simplified proof of Guttman's theorem for the least upper bound dimensionality of arbitrary real symmetric matrices was given, where the points embedded in a real Euclidean space subtend distances which are strictly monotone with the off-diagonal elements of the matrix.
Abstract: This paper gives a rigorous and greatly simplified proof of Guttman's theorem for the least upper-bound dimensionality of arbitrary real symmetric matrices S, where the points embedded in a real Euclidean space subtend distances which are strictly monotone with the off-diagonal elements of S. A comparable and more easily proven theorem for the vector model is also introduced. At most n-2 dimensions are required to reproduce the order information for both the distance and vector models and this is true for any choice of real indices, whether they define a metric space or not. If ties exist in the matrices to be analyzed, then greatest lower bounds are specifiable when degenerate solutions are to be avoided. These theorems have relevance to current developments in nonmetric techniques for the monotone analysis of data matrices.

125 citations


Journal ArticleDOI

116 citations


Journal ArticleDOI
TL;DR: Several themes which are common to both econometrics and psychometrics are surveyed in this paper, illustrated by reference to permanent income hypotheses, simultaneous equation models, adaptive expectations and partial adjustment schemes.
Abstract: Several themes which are common to both econometrics and psychometrics are surveyed. The themes are illustrated by reference to permanent income hypotheses, simultaneous equation models, adaptive expectations and partial adjustment schemes, and by reference to test score theory, factor analysis, and time-series models.

98 citations


Journal ArticleDOI
TL;DR: When subsamples of Black and White PSAT candidates were matched on the mathematical section for the study of verbal items and on the verbal section for the study of mathematical items, there was an observable decrease in the size of the item x race interaction, suggesting that one factor contributing to that interaction was simply the difference in performance levels on the test shown by the two races.
Abstract: Several samples of Black and White students were drawn from the 1970 PSAT administration in Georgia and studied for item x race interaction on both the verbal and mathematical sections of the test. When subsamples of candidates were drawn from their respective racial groups, matched on mathematical for the study of verbal items and matched on verbal for the study of mathematical items, there was an observable decrease in the size of the item x race interaction, suggesting that one factor contributing to that interaction was simply the difference in performance levels on the test shown by the two races. Further analyses demonstrated a moderate item x group interaction for Blacks native to different cities and a moderate item x group interaction for Blacks native to areas of different population density.

93 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider external characteristics of four methods for determining factor score estimates; that is, relations of these estimates to measures on attributes not entered into the factor analysis, and find that different ones of the methods are appropriate for different uses.
Abstract: Considerations of factor score estimates have concentrated on internal characteristics. This report considers external characteristics of four methods for determining factor score estimates; that is, relations of these estimates to measures on attributes not entered into the factor analysis. These external characteristics are important for many uses of factor score estimates. Findings are that different ones of the methods are appropriate for different uses.

88 citations


Journal ArticleDOI
TL;DR: For instance, the authors found that at grade 5 there were no differences in mathematics achievement between the sexes, but thereafter the boys pulled ahead, while parallel differences emerged in the percentage perceiving mathematics as interesting and as likely to be helpful in earning a living.
Abstract: With the objective of investigating sex-typed interests as possible causes of difference in mathematics achievement between the sexes, this study made use of longitudinal data from the Growth Study, begun at Educational Testing Service (ETS) in 1961. Growth in mathematics achievement as measured by STEP Math and SCAT-Q was compared with changing interest patterns as reflected in certain biographical questionnaire responses. At grade 5 there were no differences in achievement but thereafter the boys pulled ahead, while parallel differences emerged in the percentage perceiving mathematics as interesting and as likely to be helpful in earning a living. Many previous studies have examined sex differences in ability and achievement in mathematics. Tyler (18:240ff), Anastasi (3:497), and Maccoby (15:26-28) surveyed the field of sex differences in aptitude and achievement and report that girls usually do better in verbal and linguistic studies, while boys generally do better in numerical and spatial aptitudes and in tests of arithmetical reasoning. Maccoby points out that during grade school years, some studies show boys beginning to forge ahead on tests of "arithmetical reasoning," although a number of studies reveal no sex differences on this dimension at this time. Fairly consistently, however, boys excel at arithmetical reasoning in high school, and the differences are substantially in favor of men among college students and adults (15:26).

77 citations


Journal ArticleDOI
TL;DR: A simplified proof of a lemma by Ledermann [1938], which lies at the core of the factor indeterminacy issue, is presented in this article; it leads to a representation of an orthogonal matrix T, relating equivalent factor solutions, which is used to evaluate bounds on the average correlation between equivalent sets of uncorrelated factors.
Abstract: A simplified proof of a lemma by Ledermann [1938], which lies at the core of the factor indeterminacy issue, is presented. It leads to a representation of an orthogonal matrix T, relating equivalent factor solutions, which is different from Ledermann's [1938] and Guttman's [1955]. T is used to evaluate bounds on the average correlation between equivalent sets of uncorrelated factors. It is found that the minimum average correlation is independent of the data.

77 citations


Journal ArticleDOI
TL;DR: In this article, three approximation techniques suggested for the Tukey HSD procedure when group sizes are unequal were empirically analyzed for their effect on Type I error; average group size and differences in group size changed the actual error probabilities, and Kramer's technique consistently agreed closely with the nominal probabilities.
Abstract: A limitation of the Tukey HSD procedure for multiple comparison has been the requirement of equal number of observations for each group. Three approximation techniques have been suggested when the group sizes are unequal. Each of these techniques was empirically analyzed to determine its effect on Type I error. Two of the considered variables, average group size and differences in group size, caused differing actual probabilities of Type I error. One of the three techniques (Kramer's) consistently provided actual probabilities in close agreement with corresponding nominal probabilities.
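
The abstract names Kramer's technique without giving its formula. As a rough illustration, the sketch below computes the pairwise studentized-range statistic in the form usually attributed to Kramer, where the equal-n term 1/n is replaced by the average of 1/n_i and 1/n_j. The function name and the data are hypothetical, and the critical value from the studentized range distribution is deliberately left out.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer_statistics(groups):
    """Return {(i, j): q} for every pair of groups (lists of numbers).

    Assumed form (Kramer's modification as commonly stated):
        q_ij = |mean_i - mean_j| / sqrt(MS_within / 2 * (1/n_i + 1/n_j))
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(sizes)

    # Pooled within-group mean square, with N - k degrees of freedom.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / (n_total - k)

    stats = {}
    for i, j in combinations(range(k), 2):
        # With equal sizes this reduces to the usual sqrt(MS_within / n).
        se = sqrt(ms_within * 0.5 * (1 / sizes[i] + 1 / sizes[j]))
        stats[(i, j)] = abs(means[i] - means[j]) / se
    return stats


if __name__ == "__main__":
    # Hypothetical data with unequal group sizes.
    print(tukey_kramer_statistics([[1, 2, 3], [2, 3, 4, 5], [6, 7]]))
```

Each q would then be compared with a studentized range critical value for k groups and N - k degrees of freedom; that lookup is outside this sketch.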

59 citations


Journal ArticleDOI
TL;DR: The oblimax, promax, maxplane, and Harris-Kaiser techniques are compared and the Harris-Kaiser procedure (independent cluster version for factorially simple data, P'P proportional to φ, with equamax rotations, for complex) is recommended.
Abstract: The oblimax, promax, maxplane, and Harris-Kaiser techniques are compared. For five data sets, of varying reliability and factorial complexity, each having a graphic oblique solution (used as criterion), solutions obtained using the four methods are evaluated on (1) hyperplane-counts, (2) agreement of obtained with graphic within-method primary factor correlations and angular separations, (3) angular separations between obtained and corresponding graphic primary axes. The methods are discussed and ranked (descending order): Harris-Kaiser, promax, oblimax, maxplane. The Harris-Kaiser procedure (independent cluster version for factorially simple data, P'P proportional to φ, with equamax rotations, for complex) is recommended.


Journal ArticleDOI
TL;DR: A multidimensional scaling analysis is presented for replicated layouts of pairwise choice responses that involves the reduction of a three-way least squares problem to two subproblems, one trivial and the other solvable by classical least squares matrix factorization.
Abstract: A multidimensional scaling analysis is presented for replicated layouts of pairwise choice responses. In most applications the replicates will represent individuals who respond to all pairs in some set of objects. The replicates and the objects are scaled in a joint space by means of an inner product model which assigns weights to each of the dimensions of the space. Least squares estimates of the replicates' and objects' coordinates, and of unscalability parameters, are obtained through a manipulation of the error sum of squares for fitting the model. The solution involves the reduction of a three-way least squares problem to two subproblems, one trivial and the other solvable by classical least squares matrix factorization. The analytic technique is illustrated with political preference data and is contrasted with multidimensional unfolding in the domain of preferential choice.

Journal ArticleDOI
TL;DR: In this article, a general one-way analysis of variance components with unequal replication numbers is used to provide unbiased estimates of the true and error score variance of classical test theory, and the foundations for a Bayesian approach are detailed.
Abstract: A general one-way analysis of variance components with unequal replication numbers is used to provide unbiased estimates of the true and error score variance of classical test theory. The inadequacy of the ANOVA theory is noted and the foundations for a Bayesian approach are detailed. The choice of prior distribution is discussed and a justification for the Tiao-Tan prior is found in the particular context of the “n-split” technique. The posterior distributions of reliability, error score variance, observed score variance and true score variance are presented with some extensions of the original work of Tiao and Tan. Special attention is given to simple approximations that are available in important cases and also to the problems that arise when the ANOVA estimate of true score variance is negative. Bayesian methods derived by Box and Tiao and by Lindley are studied numerically in relation to the problem of estimating true score. Each is found to be useful and the advantages and disadvantages of each are discussed and related to the classical test-theoretic methods. Finally, some general relationships between Bayesian inference and classical test theory are discussed.
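
To make the ANOVA step concrete, here is a minimal sketch of the textbook one-way random-effects estimators with unequal replication, assuming each person contributes a list of replicate scores. The function name and data are hypothetical, and the paper's own estimators and its Bayesian treatment may differ in detail. The sketch also shows how the true score variance estimate can come out negative, which is the problem the abstract mentions.

```python
def anova_variance_components(scores_by_person):
    """Return (true_score_variance, error_variance) from a one-way layout.

    Assumed estimators (standard unbalanced one-way random-effects ANOVA):
        error variance      = MS_within
        true score variance = (MS_between - MS_within) / n0
        n0 = (N - sum(n_i^2) / N) / (k - 1)
    """
    k = len(scores_by_person)
    sizes = [len(s) for s in scores_by_person]
    n_total = sum(sizes)
    means = [sum(s) / len(s) for s in scores_by_person]
    grand_mean = sum(sum(s) for s in scores_by_person) / n_total

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum(
        (x - m) ** 2 for s, m in zip(scores_by_person, means) for x in s
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective replication number for unequal n_i.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    true_var = (ms_between - ms_within) / n0  # may be negative
    return true_var, ms_within


if __name__ == "__main__":
    # Hypothetical replicate scores for three persons.
    true_var, error_var = anova_variance_components(
        [[10, 12], [14, 16, 18], [20, 22]]
    )
    print(true_var, error_var, true_var / (true_var + error_var))
```

When the person means are nearly identical, MS_between falls below MS_within and the estimate is negative even though a variance cannot be.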

Journal ArticleDOI
TL;DR: State is usually defined along an arousal continuum; Duffy's (1962) definition treats arousal as a generalized drive state providing, for example, the intensity dimension of the emotions, the alertness factor in intelligence, and the general level of reactivity to stimulation.
Abstract: State is one of those psychological constructs which is widely used, carries meaning for commerce, and yet, when carefully considered, is rather difficult to define. It is clearly an important characteristic of human behavior and is probably one of the more important variables distinguishing the living from the inanimate, such as machines. Yet, its definition is most difficult and soon gives way to simple taxonomy. State is usually considered, first of all, as a continuum of behavior, reflecting some underlying condition. This condition is usually defined along either an arousal continuum or a consciousness continuum. In contemporary psychology the notion of consciousness, as the entire issue of phenomenology, has been neglected, so most investigations deal with state in terms of arousal. Duffy's (1962) definition of arousal demonstrates the breadth of this concept. It is conceived as a generalized drive state providing, for example, the intensity dimension of the emotions, the alertness factor in intelligence, and the general level of reactivity to stimulation: a rather inclusive dimension. The consciousness continuum is less well defined, but has within it the notion of awareness, either internal or external (see Hilgard, 1969). Given that state is usually defined as an arousal continuum, it would be easy to define state explicitly as some continuum in a specific behavioral area of choosing that continuum as a function of the model of behavior we wish to employ. Thus, if one were talking about brain function, one would discuss state (and state changes) in terms of EEG or REM behavior during various levels of sleep. Construction of autonomic nervous system models would describe state in terms of heart rate level, while activity models would measure movement, smiling, and sucking changes. Attention could be considered a state [Footnote 1: This research is supported by the National Science Foundation, Grant #GB-8590, and an Early Childhood Research Council Grant. Recognition is to be given to Pamela Sarett and Yvonne Watson for data collection, and to Cornelia Wilson for data analysis. Author affiliation: Educational Testing Service, Princeton, New Jersey 08540.]

Journal ArticleDOI
TL;DR: The most commonly stated purposes of grades are selection for more advanced educational programs and employment, motivating students, and providing students with information about their performance; consideration of these purposes, however, seldom enters decisions about the form of grading.
Abstract: While grades and grading procedures have become issues of widespread interest and controversy in recent years, the interest has been directed primarily toward questions of external form and the prediction of later grades from earlier grades. More important questions concerning the purposes and effectiveness of grades have continued to be neglected. Nontraditional forms of grading currently being tried are usually Pass/Fail systems, but Pass/No Record and descriptive grading are also used. The most commonly stated purposes of grades are for selection to more advanced educational programs and employment, for motivating students, and for providing students with information about their performance. Consideration of purposes, however, seldom enters decisions about form. Grades are charged with distorting both learning and teaching and undesirably affecting students' attitudes and emotional states, yet evidence either for or against these charges is sparse. Education is commonly viewed as a vehicle for social and economic mobility, but a contrary view of the educational system as a device for the restraint of mobility and the maintenance of the existing social class structure has also been presented. Neither view can point to convincing evidence in its support, but grades have a major role in both. Although grades are often charged with being unreliable, those charges usually refer to grades assigned to individual pieces of student work, such as test papers or themes. Course grades and grade-point averages have shown high internal consistency and good temporal stability up to about a year. Treating academic performance as a single dimension represented by grades is therefore justifiable. Nevertheless, situations probably exist in which the widely presumed but infrequently demonstrated multidimensional nature of academic performance should be acknowledged and put to use.

Book ChapterDOI
TL;DR: In this paper, an experimenter threw individually 219 different dice of four different brands and recorded even and odd outcomes for one block of 20,000 trials for each die, for a total of 4,380,000 throws in all.
Abstract: An experimenter threw individually 219 different dice of four different brands and recorded even and odd outcomes for one block of 20,000 trials for each die—4,380,000 throws in all. The resulting data on runs offer a basis for comparing the observed properties of such a physical randomizing process with theory and with simulations based on pseudo-random numbers and RAND Corporation random numbers. Although generally the results are close to those forecast by theory, some notable exceptions raise questions about the surprise value that should be associated with occurrences two standard deviations from the mean. These data suggest that the usual significance level may well actually be running from 7 to 15 percent instead of the theoretical 5 percent.
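
The comparison with pseudo-random simulation can be sketched as follows. This is only an illustration: it counts runs of even and odd outcomes in one simulated block of 20,000 throws and compares the count with the expectation for independent fair trials, 1 + 2(n - 1)p(1 - p). The function names and the seed are hypothetical, and the chapter's actual run statistics are not reproduced here.

```python
import random


def count_runs(outcomes):
    """Number of maximal runs of identical consecutive outcomes."""
    if not outcomes:
        return 0
    return 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)


def expected_runs(n, p=0.5):
    """Expected number of runs in n independent trials with P(even) = p."""
    return 1 + 2 * (n - 1) * p * (1 - p)


def simulate_block(n=20000, seed=0):
    """Simulate n throws of a fair die and record the parity of each face."""
    rng = random.Random(seed)  # pseudo-random stand-in for a physical die
    return [rng.randint(1, 6) % 2 for _ in range(n)]


if __name__ == "__main__":
    block = simulate_block()
    print("observed runs:", count_runs(block))
    print("expected runs:", expected_runs(len(block)))
```

For p = 1/2 the run count has variance (n - 1)/4, so one block of 20,000 throws has a standard deviation of about 70.7 runs, which gives a scale for judging departures of two standard deviations.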

Journal ArticleDOI
TL;DR: This paper traces the development of the procedure from Hevner's beginning method up to the various methods in use today and describes both the testing procedures and scoring methods used.
Abstract: Confidence testing has been used in varying forms over the past 40 years as a method for increasing the amount of information available from objective test items. This paper traces the development of the procedure from Hevner's beginning method up to the various methods in use today and describes both the testing procedures and scoring methods used. The term confidence testing is applied to both probabilistic testing and confidence weighting procedures. Various procedures are presented and their relationship with personality factors discussed.

Journal ArticleDOI
TL;DR: In this article, specially constructed "speeded" and "unspeeded" forms of a Reading Comprehension test were administered to regular center and fee-free center LSAT candidates in an effort to determine: (1) if the test was more speeded for fee-free candidates, and (2) if reducing the amount of speededness was more beneficial to fee-free candidates.
Abstract: Specially constructed “speeded” and “unspeeded” forms of a Reading Comprehension test were administered to regular center and fee-free center LSAT candidates in an effort to determine: (1) if the test was more speeded for fee-free candidates, and (2) if reducing the amount of speededness was more beneficial to fee-free candidates. Results of the analyses show: (1) the test is somewhat more speeded for fee-free candidates than for regular candidates, (2) reducing the amount of speededness produces higher scores for both regular (22 scaled score points) and fee-free (33 scaled score points) center candidates, and (3) reducing speededness is not more beneficial (in terms of increasing the number of items answered correctly) to fee-free than to regular center candidates. Lower KR-20 reliability was observed under speeded conditions in the fee-free sample and is discussed.

Journal ArticleDOI
TL;DR: A variety of path models involving unmeasured variables are formulated in terms of Jöreskog's (1970a) general model for the analysis of covariance structures as discussed by the authors.
Abstract: A variety of path models involving unmeasured variables are formulated in terms of Jöreskog's (1970a) general model for the analysis of covariance structures.

Journal ArticleDOI
TL;DR: Differential prediction for black and white students was empirically investigated at 13 institutions by comparison of regression planes as discussed by the authors, focusing on the possibility that prediction procedures that are appropriate for white (majority) students would under-predict the performance of black (minority) students.
Abstract: Differential prediction for black and white students was empirically investigated at 13 institutions by comparison of regression planes. Particular attention was given to the possibility that prediction procedures that are appropriate for white (majority) students would under-predict the performance of black (minority) students. The data tend to support, among others, the following generalizations: (1) a single regression plane cannot be used to predict freshman GPA for both blacks and whites in many of the institutions studied; (2) nevertheless, if prediction of GPA from SAT scores is based upon prediction equations suitable for majority students, then black students, as a group, are predicted to do about as well as (or better than) they actually do.

Journal ArticleDOI
TL;DR: In this paper, the authors present a contribution to the sampling theory of a set of homogeneous tests which differ only in length, test length being regarded as an essential test parameter.
Abstract: This paper presents a contribution to the sampling theory of a set of homogeneous tests which differ only in length, test length being regarded as an essential test parameter. Observed variance-covariance matrices of such measurements are taken to follow a Wishart distribution. The familiar true score-and-error concept of classical test theory is employed. Upon formulation of the basic model it is shown that in a combination of such tests forming a "total" test, the signal-to-noise ratio of the components is additive and that the inverse of the population variance-covariance matrix of the component measures has all of its off-diagonal elements equal, regardless of distributional assumptions. This fact facilitates the subsequent derivation of a statistical sampling theory, there being at most m + 1 free parameters when m is the number of component tests. In developing the theory, the cases of known and unknown test lengths are treated separately. For both cases maximum-likelihood estimators of the relevant parameters are derived. It is argued that the resulting formulas will remain reasonable even if the distributional assumptions are too narrow. Under these assumptions, however, maximum-likelihood ratio tests of the validity of the model and of hypotheses concerning reliability and standard error of measurement of the total test are given. It is shown in each case that the maximum-likelihood equations possess precisely one acceptable solution under rather natural conditions. Application of the methods can be effected without the use of a computer. Two numerical examples are appended by way of illustration.
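
The additivity of the signal-to-noise ratio can be checked with a short derivation. The notation is an assumption made for illustration, not taken from the paper: component test j has length k_j, its true score is k_j times a common unit-length true score with variance σ_τ², and its error variance is k_j σ_ε² with errors uncorrelated across components.

```latex
\begin{align*}
\mathrm{SNR}_j
  &= \frac{\operatorname{Var}(T_j)}{\operatorname{Var}(E_j)}
   = \frac{k_j^{2}\,\sigma_\tau^{2}}{k_j\,\sigma_\varepsilon^{2}}
   = k_j\,\frac{\sigma_\tau^{2}}{\sigma_\varepsilon^{2}}, \\[4pt]
\mathrm{SNR}_{\text{total}}
  &= \frac{\bigl(\sum_{j=1}^{m} k_j\bigr)^{2}\sigma_\tau^{2}}
          {\bigl(\sum_{j=1}^{m} k_j\bigr)\sigma_\varepsilon^{2}}
   = \Bigl(\sum_{j=1}^{m} k_j\Bigr)\frac{\sigma_\tau^{2}}{\sigma_\varepsilon^{2}}
   = \sum_{j=1}^{m} \mathrm{SNR}_j .
\end{align*}
```

Under these assumptions the ratio grows linearly with test length, so the ratio for the total test is the sum of the component ratios.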

Journal ArticleDOI
Joseph B. Kruskal
TL;DR: In this paper, it is proved that the gradient ∇Q exists, is continuous everywhere, and is given by a simple formula, where Q is the residual sum of squares obtained from least-squares monotone regression, treated as a function of the y_i.
Abstract: Least-squares monotone regression has received considerable discussion and use. Consider the residual sum of squares Q obtained from the least-squares monotone regression of y_i on x_i. Treating Q as a function of the y_i, we prove that the gradient ∇Q exists and is continuous everywhere, and is given by a simple formula. (We also discuss the gradient of d = Q^{1/2}.) These facts, which can be questioned (Louis Guttman, private communication), are important for the iterative numerical solution of models, such as some kinds of multidimensional scaling, in which monotone regression occurs as a subsidiary element, so that the y_i and hence indirectly Q are functions of other variables.
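
As an illustration of the quantities involved, the sketch below fits a least-squares monotone (nondecreasing) regression with the pool-adjacent-violators algorithm and evaluates Q. The abstract does not state the "simple formula"; the sketch assumes the gradient is 2(y_i - ŷ_i), the standard result for a squared distance to a convex cone. It also assumes the y values are already ordered by their x values. All function names are hypothetical.

```python
def monotone_regression(y):
    """Least-squares nondecreasing fit to y via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


def residual_ss(y):
    """Q: residual sum of squares of the monotone fit."""
    return sum((a - b) ** 2 for a, b in zip(y, monotone_regression(y)))


def gradient_q(y):
    """Assumed gradient of Q with respect to y: 2 * (y_i - fitted_i)."""
    return [2 * (a - b) for a, b in zip(y, monotone_regression(y))]


if __name__ == "__main__":
    y = [1, 3, 2, 4]  # hypothetical values, already sorted by x
    print(monotone_regression(y), residual_ss(y), gradient_q(y))
```

A central finite difference on Q agrees with the assumed formula for this example, which is the kind of check an iterative scaling routine could run before relying on the analytic gradient.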

Journal ArticleDOI
TL;DR: In this article, a flexilevel test is found to be inferior to a peaked conventional test for measuring examinees in the middle of the ability range, but superior for examinees at the extremes.
Abstract: A flexilevel test is found to be inferior to a peaked conventional test for measuring examinees in the middle of the ability range, superior for examinees at the extremes. Throughout the entire range of ability, a flexilevel test is much superior to any conventional test that attempts to provide accurate measurement at both extremes.

Journal ArticleDOI
TL;DR: In this article, a numerical procedure for obtaining an interval estimate of a parameter in an empirical Bayes estimation problem is presented, where each observed value X has a binomial probability distribution depending on the binomial parameter Z, where Z has an unknown distribution.
Abstract: A numerical procedure is outlined for obtaining an interval estimate of a parameter in an empirical Bayes estimation problem. In the particular problem considered, each observed value X has a binomial probability distribution depending on the binomial parameter Z, where Z has an unknown distribution. For each x, the parameter estimated is E(Z | X = x), the posterior mean. Illustrative numerical results are presented.

Journal ArticleDOI
TL;DR: In this article, an exploratory model of teaching behavior as a role contract is proposed, which consists of three subject strategies, the didactic, generalist, and researcher strategies, and the dimensions of student response, ambiguity, and warmth.
Abstract: An exploratory model of teaching behavior as a role contract is proposed. The model consists of three subject strategies–the didactic, generalist, and researcher strategies, and the dimensions of student response, ambiguity, and warmth. Indices designed to measure the dimensions in the model were developed and related to various criteria in a large sample of two-year colleges and students. These criteria included faculty ratings, students' sense of progress, satisfaction, and college achievements. The indices were related to these criteria in plausible ways, the generalist, researcher, and warmth indices typically having positive relations, and ambiguity having negative relations.

Journal ArticleDOI
TL;DR: In this paper, scores of foreign graduate students on the GRE Aptitude Tests and the Test of English as a Foreign Language (TOEFL) were combined through multiple and moderated regression to predict grade-point average (GPA).
Abstract: Scores of foreign graduate students on the Graduate Record Examinations (GRE) Aptitude Tests and the Test of English as a Foreign Language (TOEFL) were combined through multiple and moderated regression to predict grade-point average (GPA). It was hypothesized that TOEFL would moderate the relationship between the GRE scores and GPA. According to this hypothesis, students scoring high on TOEFL would be more predictable by GRE than those scoring low. The hypothesis was only partially supported by the results. The results suggest that foreign students with low English verbal aptitude can succeed in American graduate schools. The limitations of GPA as a criterion of graduate school success for foreign students are discussed.

Journal ArticleDOI
TL;DR: This paper describes ACOVSM, a general computer program for Joreskog's method of analysis of covariance structures, including generalized MANOVA; the method is capable of handling most standard statistical models as well as many nonstandard and complicated ones.
Abstract: Joreskog, Karl G., and others. ACOVSM: A General Computer Program for Analysis of Covariance Structures Including Generalized MANOVA. Educational Testing Service, Princeton, N.J., RB-71-1, Jan 71, 69p. Joreskog's general method for analysis of covariance structures was developed for estimating a model involving structures of a very general form on means, variances, and covariances of multivariate observations. This method achieves a great deal of generality and flexibility, in that it is capable of handling most standard statistical models as well as many nonstandard and complicated ones. This paper describes a computer program for the method. When the variance-covariance matrix of the observed variables is unconstrained, the method may be used to estimate location parameters and to test linear hypotheses about them. For example, the program may be used to handle such standard problems as multivariate regression, ANOVA, and MANOVA. A unique feature of the method is that it can also be used when the variance-covariance matrix is constrained to be of a certain form. Various other models involving correlated errors or errors of measurement can also be handled. An illustration of how input data is set up, and what the printout looks like for two small sets of data, are included.

Journal ArticleDOI
TL;DR: In this article, a general four-state chain has the same parameter space as an all-or-none model if and only if its representation with an observable absorbing state is lumpable into a Markov chain with three states.
Abstract: Methods developed by Bernbach [1966] and Millward [1969] permit increased generality in analyses of identifiability. Matrix equations are presented that solve part of the identifiability problem for a class of Markov models. Results of several earlier analyses are shown to involve special cases of the equations developed here. And it is shown that a general four-state chain has the same parameter space as an all-or-none model if and only if its representation with an observable absorbing state is lumpable into a Markov chain with three states.
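
The lumpability condition can be illustrated with a small check. This sketch assumes the usual (Kemeny and Snell) notion of strong lumpability: a chain is lumpable with respect to a partition when, for every pair of blocks, each state in the source block has the same total probability of moving into the target block. The four-state matrix is hypothetical and is not one of the models analyzed in the paper.

```python
def is_lumpable(P, partition, tol=1e-12):
    """Check strong lumpability of transition matrix P for a partition.

    P is a list of rows; partition is a list of lists of state indices.
    """
    for block in partition:
        for target in partition:
            # Probability of entering `target` from each state in `block`.
            sums = [sum(P[i][j] for j in target) for i in block]
            if max(sums) - min(sums) > tol:
                return False
    return True


def lump(P, partition):
    """Transition matrix of the lumped chain (valid only if lumpable)."""
    return [
        [sum(P[block[0]][j] for j in target) for target in partition]
        for block in partition
    ]


if __name__ == "__main__":
    # Hypothetical four-state chain; state 3 is absorbing.
    P = [
        [0.5, 0.2, 0.3, 0.0],
        [0.0, 0.4, 0.2, 0.4],
        [0.0, 0.1, 0.5, 0.4],
        [0.0, 0.0, 0.0, 1.0],
    ]
    partition = [[0], [1, 2], [3]]  # merge states 1 and 2
    print(is_lumpable(P, partition))
    print(lump(P, partition))
```

Merging states 1 and 2 here yields a three-state chain because both merged states leave their block with the same probabilities; changing one of those rows breaks the condition.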

Journal ArticleDOI
TL;DR: Fourth through sixth grade children were given two kinds of creativity measures: divergent measures, in which the child named all the ideas he could that met a simple requirement, and convergent measures, adaptations of Mednick's Remote Associates Test, in which he attempted to find one word associatively related to each of three others.
Abstract: Fourth through sixth grade children were given two kinds of creativity measures–divergent measures in which the child named all the ideas he could that met a simple requirement, and convergent measures, adaptations of Mednick's Remote Associates Test, in which he attempted to find one word which was associatively related to each of three others. Divergent and convergent measures shared little variance, and the latter were strongly correlated with IQ and achievement. Moreover, convergent items requiring production of the correct association were strongly related to items requiring only recognition. It was argued that in children Remote Associates performance depends on evaluative abilities rather than the size of the associative repertoire.