
Showing papers in "Psychometrika in 1951"


Journal ArticleDOI
TL;DR: In this paper, a general formula (α), of which a special case is the Kuder-Richardson coefficient of equivalence, is shown to be the mean of all split-half coefficients resulting from different splittings of a test; it is therefore an estimate of the correlation between two random samples of items from a universe of items like those in the test.
Abstract: A general formula (α) of which a special case is the Kuder-Richardson coefficient of equivalence is shown to be the mean of all split-half coefficients resulting from different splittings of a test. α is therefore an estimate of the correlation between two random samples of items from a universe of items like those in the test. α is found to be an appropriate index of equivalence and, except for very short tests, of the first-factor concentration in the test. Tests divisible into distinct subtests should be so divided before using the formula. The index $$\bar r_{ij} $$ , derived from α, is shown to be an index of inter-item homogeneity. Comparison is made to the Guttman and Loevinger approaches. Parallel split coefficients are shown to be unnecessary for tests of common types. In designing tests, maximum interpretability of scores is obtained by increasing the first-factor concentration in any separately-scored subtest and avoiding substantial group-factor clusters within a subtest. Scalability is not a requisite.

37,235 citations
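The coefficient α described above reduces to a one-line computation: k/(k − 1) times (1 − sum of item variances / variance of total scores). A minimal illustrative sketch in Python (not from the paper; the function name and data are invented):

```python
def cronbach_alpha(scores):
    """Coefficient alpha for a subjects-by-items score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])  # number of items

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Two perfectly parallel items give alpha = 1:
print(cronbach_alpha([[1, 1], [2, 2], [3, 3], [4, 4]]))  # 1.0
```

With items that covary imperfectly, α falls below 1, consistent with its interpretation as a mean split-half coefficient.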


Journal ArticleDOI
TL;DR: In this article, a procedure for estimating the reliability of sets of ratings, test scores, or other measures is described and illustrated; based upon analysis of variance, it may be applied both in the special case where a complete set of ratings from each of k sources is available for each of n subjects, and in the general case where k1, k2, ..., kn ratings are available for the n subjects.
Abstract: A procedure for estimating the reliability of sets of ratings, test scores, or other measures is described and illustrated. This procedure, based upon analysis of variance, may be applied both in the special case where a complete set of ratings from each of k sources is available for each of n subjects, and in the general case where k1, k2, ..., kn ratings are available for each of the n subjects. It may be used to obtain either a unique estimate or a confidence interval for the reliability of either the component ratings or their averages. The relations of this procedure to others intended to serve the same purpose are considered algebraically and illustrated numerically.

1,033 citations
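For the complete-data special case, the ANOVA-based estimate can be sketched as a one-way layout with reliability (MSB − MSW)/(MSB + (k − 1)·MSW). This is a common form of the idea; the article also treats the unequal-k case and confidence intervals, which this sketch omits, and its exact variant may differ:

```python
def icc_single(ratings):
    """Reliability of a single rating from a complete n-subjects by
    k-raters table, via one-way analysis of variance:
    (MSB - MSW) / (MSB + (k - 1) * MSW)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]  # per-subject means
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, means) for x in row)
    msb = ss_between / (n - 1)
    msw = ss_within / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Perfect agreement between two raters gives reliability 1:
print(icc_single([[1, 1], [2, 2], [3, 3]]))  # 1.0
```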


Journal ArticleDOI
TL;DR: It is proposed that the degree to which tests are speeded be investigated explicitly, and an index τ is advanced to define this concept; it is demonstrated that, for moderately speeded tests, the coefficient of equivalence can be determined approximately from single-trial data.
Abstract: Non-spurious methods are needed for estimating the coefficient of equivalence for speeded tests from single-trial data. Spuriousness in a split-half estimate depends on three conditions; the split-half method may be used if any of these is demonstrated to be absent. A lower-bounds formula, rc, is developed. An empirical trial of this coefficient and other bounds proposed by Gulliksen demonstrates that, for moderately speeded tests, the coefficient of equivalence can be determined approximately from single-trial data. It is proposed that the degree to which tests are speeded be investigated explicitly, and an index τ is advanced to define this concept.

651 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that the assumption of zero correlations can be relaxed to an assumption of equal correlations between pairs with no change in method, and the usual approach to the method of paired comparisons Case V is shown to lead to a least squares estimate of the stimulus positions on the sensation scale.
Abstract: Thurstone’s Case V of the method of paired comparisons assumes equal standard deviations of sensations corresponding to stimuli and zero correlations between pairs of stimulus sensations. It is shown that the assumption of zero correlations can be relaxed to an assumption of equal correlations between pairs with no change in method. Further, the usual approach to the method of paired comparisons Case V is shown to lead to a least squares estimate of the stimulus positions on the sensation scale.

335 citations


Book ChapterDOI
TL;DR: A test of goodness of fit is developed for Thurstone’s method of paired comparisons, Case V, where n is the number of observations per pair, and θ″ and θ′ are the angles obtained by applying the inverse sine transformation to the fitted and the observed proportions respectively.
Abstract: A test of goodness of fit is developed for Thurstone’s method of paired comparisons, Case V. The test involves the computation of $$\chi^2 = n\sum(\theta'' - \theta')^2/821,$$ where n is the number of observations per pair, and θ″ and θ′ are the angles (in degrees) obtained by applying the inverse sine transformation to the fitted and the observed proportions respectively. The number of degrees of freedom is (k − 1)(k − 2)/2.

322 citations
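The statistic is straightforward to compute once the transform is fixed. A sketch assuming θ = arcsin √p measured in degrees (the form under which the sampling variance of θ is approximately 821/n, matching the constant in the formula):

```python
import math

def chi2_case_v(p_fitted, p_observed, n):
    """Goodness-of-fit statistic chi2 = n * sum((theta'' - theta')^2) / 821,
    with theta = arcsin(sqrt(p)) in degrees (assumed form of the
    inverse sine transformation)."""
    theta = lambda p: math.degrees(math.asin(math.sqrt(p)))
    return n * sum((theta(f) - theta(o)) ** 2
                   for f, o in zip(p_fitted, p_observed)) / 821.0

def case_v_dof(k):
    """Degrees of freedom for k stimuli: (k - 1)(k - 2) / 2."""
    return (k - 1) * (k - 2) // 2
```

For example, with five stimuli the statistic is referred to a chi-square distribution on (5 − 1)(5 − 2)/2 = 6 degrees of freedom.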


Journal ArticleDOI
TL;DR: In this paper, the problem of using a set of measurements on an individual to decide from which of several populations he has been drawn is considered, and the principles for choosing the rule of classification are based on costs of misclassification.
Abstract: The problem considered is the use of a set of measurements on an individual to decide from which of several populations he has been drawn. It is assumed that in each population there is a probability distribution of the measurements. Principles for choosing the rule of classification are based on costs of misclassification. Optimum procedures are derived in general terms. If the measurements are normally distributed, the procedures use one discriminant function in the case of two populations and several discriminant functions in the cases of more populations. The numerical example given involves three normal populations.

217 citations
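The decision rule can be illustrated in one dimension: assign a measurement to the population that minimizes the expected cost of misclassification. This sketch assumes univariate normal densities, equal prior probabilities, and an invented cost matrix; it is an illustration of the decision-theoretic idea, not the paper's discriminant-function derivation:

```python
import math

def normal_pdf(x, mean, sd):
    """Univariate normal density."""
    return math.exp(-(x - mean) ** 2 / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def classify(x, populations, costs):
    """Assign x to the population j minimizing the expected cost of
    misclassification: sum over i != j of costs[i][j] * density_i(x).
    populations: list of (mean, sd); equal priors assumed."""
    k = len(populations)
    expected = [sum(costs[i][j] * normal_pdf(x, *populations[i])
                    for i in range(k) if i != j)
                for j in range(k)]
    return min(range(k), key=expected.__getitem__)

# Two unit-variance normals centered at 0 and 5, symmetric unit costs:
pops, costs = [(0.0, 1.0), (5.0, 1.0)], [[0, 1], [1, 0]]
print(classify(0.2, pops, costs), classify(4.8, pops, costs))  # 0 1
```

With symmetric costs and equal variances the boundary falls midway between the means, which is the familiar linear discriminant cut in one dimension.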


Journal ArticleDOI
TL;DR: For the point distribution model of Lazarsfeld's latent structure analysis, the general matrix equation is stated which relates the manifest data in the form of joint occurrence matrices to the latent parameters.
Abstract: For the point distribution model of Lazarsfeld's latent structure analysis, the general matrix equation is stated which relates the manifest data in the form of joint occurrence matrices to the latent parameters. The relationship of the item responses and these joint occurrence matrices is also indicated in matrix form. A general solution for the latent parameters is then presented, which is based on the notion of factoring two joint occurrence matrices. The solution is valid under certain conditions which will usually be fulfilled. The solution assumes that estimates are available for the elements in the joint occurrence matrices with recurring subscripts, analogous to item communality or reliability. Some alternative methods of obtaining these estimates are discussed. Finally a fictitious 3-class, 8-item example is presented in detail.

93 citations


Journal ArticleDOI
TL;DR: The correlations among the thirteen personality scores yielded by the Guilford schedule for factors STDCR, and the Guilford-Martin schedules for factors GAMIN, and O, Ag, and Co, as reported by Lovell, were factored by the centroid method.
Abstract: The correlations among the thirteen personality scores yielded by the Guilford schedule for factors STDCR, and the Guilford-Martin schedules for factors GAMIN, and O, Ag, and Co, as reported by Lovell, were factored by the centroid method. The purpose was to see how many factors were represented by the thirteen scores; therefore the test reliabilities were used in the diagonal cells. It was found that the scores represent not more than nine linearly independent factors. The orthogonal factor matrix was rotated to oblique simple structure. Seven of the oblique factors were given tentative interpretation. Two factors were regarded as residual factors because of the small variance which they represent. The seven factors have been named Active, Vigorous, Impulsive, Dominant, Stable, Sociable, and Reflective.

67 citations


Journal ArticleDOI
TL;DR: A battery of 46 tests was given to 237 college men and a factor analysis using the Thurstone technique revealed eight clearly interpretable first-order factors, one dubious factor, and a residual factor as discussed by the authors.
Abstract: A battery of 46 tests was given to 237 college men. A factor analysis using the Thurstone technique revealed eight clearly interpretable first-order factors, one dubious factor, and a residual factor. The factors were interpreted as induction, deduction, flexibility of closure, speed of closure, space, verbal comprehension, word fluency, and number. Four second-order factors were abstracted from the matrix of first-order correlations. The presence of induction, deduction, and flexibility of closure on the first second-order factor, interpreted as an analytic factor, confirmed previous indications of relationships between the reasoning and closure factors. A second bipolar factor is interpreted as a speed of association factor. The third factor is interpreted as facility in handling meaningful verbal materials—perhaps an ability to do abstract thinking. The fourth factor is possibly a second-order closure factor—perhaps an ability to do concrete thinking.

63 citations


Book ChapterDOI
TL;DR: If customary methods of solution are used on the method of paired comparisons for Thurstone’s Case V (assuming equal standard deviations of sensations for each stimulus), when in fact one or more of the standard deviations is aberrant, all stimuli will be properly spaced except the one with the aberrant standard deviation.
Abstract: If customary methods of solution are used on the method of paired comparisons for Thurstone’s Case V (assuming equal standard deviations of sensations for each stimulus), when in fact one or more of the standard deviations is aberrant, all stimuli will be properly spaced except the one with the aberrant standard deviation. A formula is given to show the amount of error due to the aberrant stimulus.

61 citations


Journal ArticleDOI
TL;DR: In this article, it is suggested that the negative binomial distribution may have wide applications in the psychological field, and it is demonstrated that the estimation of its parameters is often inefficient when fitting by the method of moments.
Abstract: As an analytical tool the negative binomial distribution may have wide applications in the psychological field. The estimation of its parameters is demonstrated to be often inefficient when fitting by the method of moments. This causes possibly true hypotheses to be rejected. Formulas for the efficiency of the moment method and solution of the likelihood equations are derived. Efficiency graphs and detailed tables for the $$\lambda (r,\hat p)$$ function reduce the maximum-likelihood method to a minimum of computational labour. Practical applications of the ease and power of the M.L. procedure are given.
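The moment estimates that the abstract criticizes as inefficient come from equating the sample mean and variance with the distribution's moments. A sketch, assuming the parameterization mean = r(1 − p)/p and variance = r(1 − p)/p² (the maximum-likelihood refinement the paper advocates is omitted):

```python
def neg_binomial_moments(data):
    """Method-of-moments fit of the negative binomial, assuming
    mean = r(1 - p)/p and variance = r(1 - p)/p^2, which gives
    p = mean/variance and r = mean^2/(variance - mean)."""
    n = len(data)
    m = sum(data) / n
    s2 = sum((x - m) ** 2 for x in data) / n
    if s2 <= m:
        raise ValueError("sample not overdispersed relative to Poisson")
    return m * m / (s2 - m), m / s2  # (r, p)
```

The guard reflects the model's requirement that the variance exceed the mean; when it does not, the moment equations have no admissible solution.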


Journal ArticleDOI
TL;DR: In this article, a comparison of the Wherry-Gaylord iterative factor analysis procedure and the Thurstone multiple-group analysis of sub-tests is made, showing that the two methods result in the same factors.
Abstract: A comparison of the Wherry-Gaylord iterative factor analysis procedure and the Thurstone multiple-group analysis of sub-tests shows that the two methods result in the same factors. The Wherry-Gaylord method has the advantage of giving factor loadings for items. The number of iterations needed can be reduced by doing a factor analysis of sub-tests, re-grouping sub-tests according to factors, and using each group as a starting point for iterations.

Journal ArticleDOI
TL;DR: In this article, an extension of the square root method has been made to the problem of selecting a minimum set of variables in a multiple regression problem; the computations required are more compact, and an F ratio criterion is used which leads to the selection of fewer variables.
Abstract: An extension of Dwyer's “square root” method has been made to the problem of selecting a minimum set of variables in a multiple regression problem. The square root method of selection differs from the Wherry-Doolittle method primarily in that (1) the computations required are more compact, (2) an F ratio criterion is used which leads to the selection of fewer variables. The method provides solutions for the problems of test selection, item analysis, analysis of variance with disproportionate frequencies, and other problems requiring the rejection of superfluous variables. In a subsequent article a worked example will be given, and the square root and Wherry-Doolittle methods compared.
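Dwyer's "square root" method rests on the triangular factorization A = LLᵀ of the symmetric matrix of normal equations, known today as the Cholesky decomposition. A sketch of that factorization alone; the variable-selection and F-ratio machinery of the article is not shown:

```python
def square_root_factor(a):
    """Triangular factorization A = L * L^T of a symmetric
    positive-definite matrix -- the decomposition underlying the
    "square root" method (selection machinery not included)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][i] = (a[i][i] - s) ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

# [[4, 2], [2, 3]] factors into [[2, 0], [1, sqrt(2)]]:
print(square_root_factor([[4, 2], [2, 3]]))
```

Back-substitution through L then solves the normal equations, which is what makes the tabular hand computation compact.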

Journal ArticleDOI
TL;DR: In this paper, the nature of psychological measurements in relation to mathematical structures and representations is examined, including their relationship to the real number system and the Weber-Fechner relation.
Abstract: The nature of psychological measurements in relation to mathematical structures and representations is examined. Some very general notions concerning algebras and systems are introduced and applied to physical and number systems, and to measurement theory. It is shown that the classical intensive and extensive dimensions of measurements with their respective ordinal and additive scales are not adequate to describe physical events without the introduction of the notions of dimensional units and of dimensional homogeneity. It is also shown that in the absence of these notions, the resulting systems of magnitudes have only a very restricted kind of isomorphism with the real number system, and hence have little or no mathematical representations. An alternative in the form of an extended theory of measurements is developed. A third dimension of measurement, the supra-extensive dimension, is introduced; and a new scale, the multiplicative scale, is associated with it. It is shown that supra-extensive magnitudes do constitute systems isomorphic with the system of real numbers and that they alone can be given mathematical representations. Physical quantities are supra-extensive magnitudes. In contrast, to date, psychological quantities are either intensive or extensive, but never of the third kind. This, it is felt, is the reason why mathematical representations have been few and without success in psychology as contrasted to the physical sciences. In particular, the Weber-Fechner relation is examined and shown to be invalid in two respects. It is concluded that the construction of multiplicative scales in psychology, or the equivalent use of dimensional analysis, alone will enable the development of fruitful mathematical theories in this area of investigation.

Journal ArticleDOI
TL;DR: The square root method is compared with the Wherry-Doolittle method in this article, and a worked example is given which illustrates the compactness of the procedure and demonstrates that the square root algorithm can be used for selection.
Abstract: The square root method of selection has been explained in a previous article. In the present article a worked example is given which illustrates the compactness of the procedure. The square root method is compared with the Wherry-Doolittle method.

Journal ArticleDOI
TL;DR: Results indicated that subjects who varied in age and mental status could be differentiated according to the parameters defining the curves of addition rate as a function of length.
Abstract: Rate of addition was studied as a function of difficulty as measured by problem length. The hypothesis was tested that the rate of addition would decline as a function of the logarithm of the number of addition operations per problem. The test material required the rapid addition of single columns of digits ranging from two to twenty-five digits in length. Rate of uncorrected addition declined as a power function of problem length and the rate of correct addition declined as an exponential function of length. Results indicated that subjects who varied in age and mental status could be differentiated according to the parameters defining the curves of addition rate as a function of length.

Journal ArticleDOI
TL;DR: The standard error of measurement is found to be related in simple fashion to the amount of information in a test in the sense of R. A. Fisher.
Abstract: (1) A new descriptive parameter for tests, the standard length, is defined and related to reliability, correlation, and validity by means of simplified versions of known formulas. (2) The standard error of measurement is found to be related in simple fashion to the amount of information in a test in the sense of R. A. Fisher. The amount of information is computable as the test length divided by the standard length of the test. (3) The invariant properties of the standard length of a test under changes in length are discussed and proved. Similar results for the correlation coefficient corrected for attenuation and the index of validity are indicated.

Journal ArticleDOI
TL;DR: In this article, formulas for estimating a point-biserial r or a tetrachoric r from an obtained phi coefficient are developed; the estimate of the tetrachoric r, called rφ, is shown to be equivalent to that obtained from first-order use of the tetrachoric r series.
Abstract: Formulas are developed for estimating a point-biserial r or a tetrachoric r from an obtained phi coefficient. The estimate of a tetrachoric r, which is called rφ, is shown to be equivalent to that obtained from first-order use of the tetrachoric r series. A tabulation is made of corrections needed to make rφ equivalent numerically to the tetrachoric r. In spite of its greater generality than estimates of tetrachoric r by previous methods, there are limitations, which are pointed out.

Journal ArticleDOI
TL;DR: The conventional scoring formula to "correct for guessing" is derived and compared with a regression method for scoring which has been recently proposed by Hamilton; it is shown that the usual formula, S = R − W/(n − 1), yields a close approximation (correct within one point) to the maximum-likelihood estimate of an individual's "true score" on the test, if we assume that the individual "knows" or "does not know" the answer to each item, that guessing at unknown items is random, and that success at guessing is governed by the binomial law.
Abstract: The conventional scoring formula to “correct for guessing” is derived and is compared with a regression method for scoring which has been recently proposed by Hamilton. It is shown that the usual formula, S = R − W/(n − 1), yields a close approximation (correct within one point) to the maximum-likelihood estimate of an individual's “true score” on the test, if we assume that the individual “knows” or “does not know” the answer to each item, that guessing at unknown items is random, and that success at guessing is governed by the binomial law. It is also shown that the usual scoring formula yields an unbiased estimate of the individual's “true score,” when the true score is defined as the mean score over an indefinitely large number of independent attempts at the test or at equivalent (parallel) tests.
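The unbiasedness claim can be checked by hand: under the stated assumptions, an examinee who knows T of N items on an n-choice test has E[R] = T + (N − T)/n and E[W] = (N − T)(n − 1)/n, so E[R − W/(n − 1)] = T. The formula itself is a one-liner (a sketch; names invented):

```python
def corrected_score(right, wrong, n_choices):
    """Conventional correction for guessing: S = R - W / (n - 1),
    where n is the number of response options per item.
    Omitted items contribute to neither R nor W."""
    return right - wrong / (n_choices - 1)

# 40 right and 10 wrong on five-choice items:
print(corrected_score(40, 10, 5))  # 37.5
```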

Journal ArticleDOI
TL;DR: In this article, a second-order unrotated general factor has been identified by using Thurstone's method and it seems possible to identify this factor with "g" in the first order and the second order gives indications that allow for a better interpretation of fundamental psychological activities.
Abstract: The proof of the existence of “g” is more than a methodological problem and concerns the very core of psychological theory. The principles of noegenesis should be identified experimentally before a final opinion can be rendered about “g.” Many general factors isolated in different studies are not necessarily “g.” In the present study a second-order unrotated general factor has been identified by using Thurstone's method. It seems possible to identify this factor with “g.” In the first order, factors that seem to represent the first and second principles of noegenesis have been found. The existence of synthetic and analytic activities and their interplay in intellectual performances is indicated. The relation of likeness is of great interest in explaining cognitive abilities and is isolated both as a first and second order factor. For the final identification of factors the search should be conducted beyond the elementary listing of tests. The dynamic aspects underlying factors are more meaningful than their simple description. The second order gives indications that allow for a better interpretation of fundamental psychological activities.



Journal ArticleDOI
TL;DR: In this paper, the ratio of item validity to item-total correlation is used to select items which will tend to yield the maximum correlation with a criterion, and then the items to be retained are identified by comparing the ratio for each item with the validity of the original test.
Abstract: The ratio of item validity to item-total correlation can be used to select items which will tend to yield the maximum correlation with a criterion. Items to be retained are identified by comparing the ratio for each item with the validity of the original test. Further improvement of the validity in the experimental sample can be obtained by adding items to or removing items from the selected nucleus, according to recomputed ratios involving the correlations of the items with the nucleus and evaluated by means of a revised cut-off point. With slight variations, the method may be used for interest and personality tests as well as for aptitude material. The principal advantage over previous methods is that for any cycle of the analysis an exact cut-off point is provided.

Journal ArticleDOI
TL;DR: Given a fixed amount of total testing time, the problem is to determine how long each test in the battery should be so that the correlation of the battery with the criterion will be a maximum; the writer has previously presented solutions for two sets of conditions, and this article presents the solution for a third set.
Abstract: Having given a fixed amount of total testing time it is important to know how long each test in the battery should be so that the correlation of the battery with the criterion will be a maximum. The precise solution for the test lengths will depend on a particular set of conditions which may be specified. The writer has previously presented solutions for two sets of conditions. This article presents the solution for a third set of conditions. These are: (1) The total number of items or testing time is fixed. (2) The score is the total number of items correctly answered. (3) The test lengths are determined in such a way that the correlation of total score with the criterion is a maximum. The solutions for the two previous sets of conditions, together with the current set, are summarized. A set of experimental data is submitted to each solution and the three sets of results are compared.

Journal ArticleDOI
TL;DR: In this paper, a method is presented, together with its proof, for determining without waiting for criterion data what the validity of an experimental test must be in order to improve the battery validity.
Abstract: Typical selection or classification testing programs should provide for improvement of the predictive efficiency of the test battery. Such provision calls for the administration of experimental tests along with the operational battery administration and follow-up analysis to determine the value of the experimental material. It is possible to determine without waiting for criterion data what the validity of the experimental test must be in order to improve the battery validity. The method together with the proof is presented.

Journal ArticleDOI
TL;DR: A diagram for computing biserial or point biserial correlation coefficients is described; it is entered with the mean criterion score of the group passing the item and the proportion of correct answers to the item, and is maximally useful where large numbers of coefficients are to be calculated in test item analysis.
Abstract: A description is given of a diagram (available separately) for computing biserial or point biserial correlation coefficients. The diagram is maximally useful where large numbers of coefficients are to be calculated in test item analysis. The diagram is entered with the mean criterion score of the group passing the item and the proportion of correct answers to the item.