Showing papers in "Applied Psychological Measurement in 1977"


Journal ArticleDOI
TL;DR: The CES-D scale is a short self-report scale designed to measure depressive symptomatology in the general population; it was tested in household interview surveys and in psychiatric settings.
Abstract: The CES-D scale is a short self-report scale designed to measure depressive symptomatology in the general population. The items of the scale are symptoms associated with depression which have been used in previously validated longer scales. The new scale was tested in household interview surveys and in psychiatric settings. It was found to have very high internal consistency and adequate test-retest repeatability. Validity was established by patterns of correlations with other self-report measures, by correlations with clinical ratings of depression, and by relationships with other variables which support its construct validity. Reliability, validity, and factor structure were similar across a wide variety of demographic characteristics in the general population samples tested. The scale should be a useful tool for epidemiologic studies of depression.

48,339 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed and applied a self-report inventory for measuring individual differences in learning processes and found that two of its scales, Synthesis-Analysis and Elaborative Processing, were positively related to performance under incidental learning instructions in both a lecture-learning and a traditional verbal-learning study.
Abstract: Five studies are presented, all related to the development and application of a self-report inventory for measuring individual differences in learning processes. Factor analysis of items derived by translating laboratory learning processes into the context of academic study yielded four scales: Synthesis-Analysis, Study Methods, Fact Retention, and Elaborative Processing. There were no sex differences, and the scales demonstrated acceptable reliabilities. The Synthesis-Analysis and Elaborative Processing scales both assess aspects of information processing (including depth of processing), but Synthesis-Analysis assesses organizational processes, while Elaborative Processing deals with active, elaborative approaches to encoding. These two scales were positively related to performance under incidental learning instructions in both a lecture-learning and traditional verbal-learning study. Study Methods assessed adherence to systematic, traditional study techniques. This scale was positively related to pe...

315 citations


Journal ArticleDOI
TL;DR: The authors showed that the appropriate statistic to apply is weighted kappa with its revised standard error, and that the minimal number of cases required for the valid application of weighted kappa varies between 20 and 100, depending upon the size of the ordinal scale.
Abstract: It frequently occurs in psychological research that an investigator is interested in assessing the extent of interrater agreement when the data are measured on an ordinal scale. This Monte Carlo study demonstrates that the appropriate statistic to apply is weighted kappa with its revised standard error. The study also demonstrates that the minimal number of cases required for the valid application of weighted kappa varies between 20 and 100, depending upon the size of the ordinal scale. This contrasts with a previously cited large sample estimate of 200. Given the difficulty of obtaining sample sizes this large, the latter finding should be of some comfort to investigators who use weighted kappa to measure interrater consensus.

204 citations
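
As an illustration of the statistic named in the abstract above, the following is a minimal sketch of weighted kappa for two raters on an ordinal scale. It is not the authors' code: the function name `weighted_kappa`, the 0-based integer coding of the categories, and the choice between linear and quadratic disagreement weights are assumptions made for the example, and the revised standard error mentioned in the abstract is not computed here.

```python
def weighted_kappa(rater_a, rater_b, n_categories, power=1):
    """Weighted kappa for two raters scoring the same cases on an ordinal scale.

    Ratings are integers 0..n_categories-1 (an assumed coding). Disagreement
    weights are (|i - j| / (k - 1)) ** power: power=1 is linear, power=2 quadratic.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    n = len(rater_a)
    k = n_categories

    # Observed joint proportions of (rating by A, rating by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions give the chance-expected joint proportions.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return (abs(i - j) / (k - 1)) ** power

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagree_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - disagree_obs / disagree_exp
```

With identical ratings the observed weighted disagreement is zero, so the function returns 1; when the observed disagreement equals the chance-expected disagreement it returns 0.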


Journal ArticleDOI
TL;DR: The concept of content validity takes on special importance where invoked to justify use of a test. The term refers to psychological measurement using samples of behavior, sampling both stim...
Abstract: The concept of content validity takes on special importance where invoked to justify use of a test. The term 1) refers to psychological measurement, 2) using samples of behavior, sampling both stim...

178 citations


Journal ArticleDOI
TL;DR: In this article, a correction factor is identified for the bias in Wright's (1969) "unconditional" procedure for Rasch item calibration, a simple approximation is developed which produces comparable estimates in a few seconds, and an editing algorithm for preparing item response data for calibration is appended.
Abstract: Wright's (1969) widely used "unconditional" procedure for Rasch sample-free item calibration is biased. A correction factor which makes the bias negligible is identified and demonstrated. Since this procedure, in spite of its superiority over "conditional" procedures, is nevertheless slow at calibrating 60 or more items, a simple approximation which produces comparable estimates in a few seconds is developed. Since no procedure works on data containing persons or items with infinite parameter estimates, an editing algorithm for preparing item response data for calibration is appended.

105 citations


Journal ArticleDOI
TL;DR: In this article, two sets of mathematical reasoning and two sets of verbal comprehension items were cast into each of three formats (constructed response, standard multiple-choice, and Coombs multiple-choice) in order to assess whether tests with identical content but different formats measure the same attribute.
Abstract: Two sets of mathematical reasoning and two sets of verbal comprehension items were cast into each of three formats (constructed response, standard multiple-choice, and Coombs multiple-choice) in order to assess whether tests with identical content but different formats measure the same attribute, except for possible differences in error variance and scaling factors. The resulting 12 tests were administered to 199 eighth-grade students. The hypothesis of equivalent measures was rejected for only two comparisons: the constructed-response measure of verbal comprehension was different from both the standard and the Coombs multiple-choice measures of this ability. Maximum likelihood factor analysis confirmed the hypothesis that a five-factor structure will give a satisfactory account of the common variance among the 12 tests. As expected, the two major factors were mathematical reasoning and verbal comprehension. Contrary to expectation, only one of the other three factors bore a (weak) resemblance to a fo...

95 citations


Journal ArticleDOI
TL;DR: It is emphasized that the standard error of estimation should be considered as the major index of dependability, as opposed to the reliability of a test.
Abstract: Several important and useful implications in latent trait theory, with direct implications for individualized adaptive or tailored testing, are pointed out. A way of using the information function in tailored testing in connection with the standard error of estimation of the ability level using maximum likelihood estimation is suggested. It is emphasized that the standard error of estimation should be considered as the major index of dependability, as opposed to the reliability of a test. The concept of weak parallel forms is expanded to test-

85 citations
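
The relation between the information function and the standard error of a maximum likelihood ability estimate, which the abstract above points to, can be sketched numerically. The example below is only an assumption-laden illustration: it uses a two-parameter logistic item model, hypothetical item parameters, and the usual large-sample approximation SE(θ) ≈ 1 / sqrt(I(θ)); the function names are invented for the sketch and none of it is taken from the paper itself.

```python
import math


def prob_correct(theta, a, b):
    """Two-parameter logistic item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, items):
    """Test information at ability theta: sum of a^2 * P * (1 - P) over items.

    `items` is a list of (discrimination a, difficulty b) pairs.
    """
    total = 0.0
    for a, b in items:
        p = prob_correct(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total


def standard_error(theta, items):
    """Large-sample standard error of the ML ability estimate at theta."""
    info = information(theta, items)
    if info <= 0.0:
        raise ValueError("no information at this ability level")
    return 1.0 / math.sqrt(info)


# Hypothetical four-item test with all items centred at theta = 0.
items = [(1.0, 0.0)] * 4
se_on_target = standard_error(0.0, items)   # items match the ability level
se_off_target = standard_error(3.0, items)  # items are too easy for this examinee
```

Because the standard error is computed at a particular ability level, it differs from examinee to examinee, which is the property that makes it useful for tailored testing: items can be chosen to raise the information where the examinee's ability appears to lie.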


Journal ArticleDOI
TL;DR: In this article, a linear loss function is used for computing a cutting score that minimizes the risk for the decision rule, which is demonstrated with a criterion-referenced achievement test of elementary statistics administered to 167 students.
Abstract: The situation is considered in which a total score on a test is used for classifying examinees into two categories: "accepted" (with scores above a cutting score on the test) and "not accepted" (with scores below the cutting score). A value on the latent variable is fixed in advance; examinees above this value are "suitable" and those below are "not suitable." Using a linear loss function, a procedure is described for computing a cutting score that minimizes the risk for the decision rule. The procedure is demonstrated with a criterion-referenced achievement test of elementary statistics administered to 167 students.

69 citations


Journal ArticleDOI
TL;DR: In this article, three studies are described in which choice reaction time (RT) was related to such psychometric ability measures as verbal comprehension, numerical reasoning, hidden figures, and progressive matrices tests.
Abstract: Three studies are described in which choice reaction time (RT) was related to such psychometric ability measures as verbal comprehension, numerical reasoning, hidden figures, and progressive matrices tests. Although fairly consistent negative correlations were found between these tests and choice RT when high school samples were used, differences from study to study highlight the need to develop more reliable measures for cognitive laboratory procedures and to study these in populations that are more broadly representative of human cognitive power.

54 citations


Journal ArticleDOI
TL;DR: This article investigated the effects of administering a personality inventory by computer and found that significant differences exist between paper-pencil and computer administrations of the MMPI on the cannot say (?) scale and scale 6 (Paranoia).
Abstract: This study investigated the effects of administering a personality inventory by computer. Both the results of the initial study and a replication suggest that significant differences exist between paper-pencil and computer administrations of the MMPI on the cannot say (?) scale and scale 6 (Paranoia). However, there appears to be no set of items that would account for these scale differences. Differences on the ? scale were explained in terms of the different methods used to omit items in each condition. Differences on scale 6 were small, and the clinical significance of that difference needs to be investigated further. Implications for future research on computer-administered personality instruments are discussed.

44 citations


Journal ArticleDOI
TL;DR: The present research simulated the responses of 75 subjects responding to 30 items under the Birnbaum model and then attempted a fit to the data using the Rasch model; the poorest overall fit appeared within the uniform distribution of item discriminations.
Abstract: Among the varieties of logistic models, those attributed to Birnbaum (involving the parameters of item discrimination, item difficulty, and person ability) and Rasch (involving only item difficulty and person ability) have received attention. The present research simulated the responses of 75 subjects responding to 30 items under the Birnbaum model and then attempted a fit to the data using the Rasch model. When item discriminations varied from a variance of .05 to .25 within distributions of different form (uniform, normal, and positively skewed), the poorest overall fit appeared within the uniform distribution. For each distribution there was only a slight increase in the lack of fit as the variances increased.
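
The difference between the two models in the abstract above can be made concrete with a small simulation sketch. This is not the study's procedure: the parameter ranges, the seeds, the absence of a scaling constant in the logistic function, and the helper names are all assumptions chosen for illustration, and the Rasch fitting step itself is not shown.

```python
import math
import random


def p_birnbaum(theta, a, b):
    """Birnbaum model: discrimination a, difficulty b, person ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_rasch(theta, b):
    """Rasch model: the special case in which every item has a = 1."""
    return p_birnbaum(theta, 1.0, b)


def simulate_responses(abilities, items, seed=0):
    """Draw a persons-by-items matrix of 0/1 responses under the Birnbaum model."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_birnbaum(theta, a, b) else 0 for a, b in items]
        for theta in abilities
    ]


# Hypothetical design mirroring the sizes in the abstract: 75 persons, 30 items.
rng = random.Random(1)
abilities = [rng.gauss(0.0, 1.0) for _ in range(75)]
# Uniformly distributed discriminations (assumed range); difficulties are normal.
items = [(rng.uniform(0.5, 1.5), rng.gauss(0.0, 1.0)) for _ in range(30)]
data = simulate_responses(abilities, items)
```

Fitting the Rasch model to `data` would then force a single common discrimination on items that were generated with unequal ones, which is the source of the misfit the study examines.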

Journal ArticleDOI
TL;DR: In this paper, an application of Samejima's latent trait model for continuous responses is reported, and a brief review of latent trait theory is presented, including an elaboration of the theory for test responses.
Abstract: This paper reports an application of Samejima's latent trait model for continuous responses. A brief review of latent trait theory is presented, including an elaboration of the theory for test resp

Journal ArticleDOI
TL;DR: In this article, the authors computed correlations between undergraduate and graduate grade-point averages as well as between these and standard graduate and professional school tests, and found that early grades are more highly predictable from aptitude tests than later grades.
Abstract: Using a model followed in earlier research, correlations were computed between undergraduate and graduate grade-point averages as well as between these and standard graduate and professional school tests. Approximately 1200 law school students constituted a professional school sample and another 1200 students in mathematics, physics, and chemistry constituted a graduate school sample. Earlier findings were replicated. In addition, it is shown that both graduate and professional school grades form simplex matrices and that early grades are more highly predictable from aptitude tests than later grades. There is evidence for a single simplex matrix extending through the four undergraduate and three post-graduate years only in the law school sample. There are two separate simplex matrices for the two levels in the graduate school sample. Correlations between test scores and undergraduate grades are biased to very low values in the professional school sample by a compensatory selection system, but both ap...

Journal ArticleDOI
TL;DR: Results indicated that for the high-ability group, mean test scores under KR conditions were significantly higher than those under no-KR conditions on both the conventional and adaptive tests; within the low-ability group, mean test scores were also higher under KR conditions, but the difference was statistically significant only within the conventional testing strategy.
Abstract: This study investigated the effects of immediate knowledge of results and adaptive testing on performance on a computer-administered test of verbal ability. Examinees were administered either a 50-item conventional test or an adaptive test of verbal ability; half the subjects in each group received immediate knowledge of results (KR) concerning the correctness/incorrectness of each item response, while the other half did not. Subjects within high- and low-ability subgroups were assigned randomly to one of the four resulting experimental conditions. The dependent variable was maximum likelihood ability estimates derived from item response patterns. Results indicated that for the high-ability group, mean test scores under KR conditions were significantly higher than were those under no-KR conditions on both the conventional and adaptive tests. Within the low-ability group, mean test scores were higher under KR conditions than under no-KR conditions, but the difference was statistically significant onl

Journal ArticleDOI
TL;DR: In this paper, an attempt was made to replicate the selection of items for the Bem Sex Role Inventory (BSRI) and the 20 masculine, 20 feminine, and 20 neutral items of the BSRI were rated for social desirability.
Abstract: An attempt was made to replicate the selection of items for the Bem Sex Role Inventory (BSRI). The 20 masculine, 20 feminine, and 20 neutral items of the BSRI were rated for social desirability "in

Journal ArticleDOI
TL;DR: The reorientation of experimental psychology from studying performance to studying cognitive processes has created a new potential for understanding ability tests in terms of the nature of the cog...
Abstract: The reorientation of experimental psychology from studying performance to studying cognitive processes has created a new potential for understanding ability tests in terms of the nature of the cog...

Journal ArticleDOI
TL;DR: In this paper, a matrix based on 352 items and 214 subjects drawn from 31 colleges was decomposed according to the Eckart-Young theorem, the factor loading matrix was formed, and then was rotated to an orthogonal target matrix of the 22 PRF-E scales.
Abstract: Multiscale personality inventories have rarely, if ever, demonstrated factor structure broadly consistent at the item level with scale keys. An item factor analysis of the 352 items of PRF-E was undertaken to evaluate the extent to which PRF items define separate and distinct factors corresponding to keyed scales. A matrix based on 352 items and 214 subjects drawn from 31 colleges was decomposed according to the Eckart-Young theorem, the factor loading matrix was formed, and then was rotated to an orthogonal target matrix of the 22 PRF-E scales. Inspection of the rotated matrix showed that only 2 of the 352 items failed to load in the keyed direction. The mean loading of items on their scale factor was .38; the mean loading of non-scale items was .09. This strong tendency for scale items to load more highly than non-scale items was also reflected in the majority of scale items being among the 16 highest loadings on each scale factor. For three scales, Dominance, Harmavoidance, and Order, the scale...

Journal ArticleDOI
TL;DR: In this paper, staff members of the Psychology Department at the University of Oregon rated each other's height on five rating scales representative of those found in social psychology, and when the ratings were averaged, a very good estimate of true physical height was obtained.
Abstract: Staff members of the Psychology Department at the University of Oregon rated each other's height on five rating scales representative of those found in social psychology. When the ratings were averaged, a very good estimate of true physical height was obtained. Further, factor scores based on all five scales proved to be even better estimates of true height; the correlation between such scores and height in inches was .98.

Journal ArticleDOI
TL;DR: The question underlying debate in both areas is whether the refinement and continuous scaling of information derived from primitive conceptualizations or measurement strategies is likely to prove stimulating or stifling to further scientific growth.
Abstract: comes stale or unproductive. The expressive range of methodological language also shapes the generation of theory and, in much the same manner that practical media and formal structural constraints influence art and literature, the limitations of theoretical and methodological constructs may prove stimulating or stifling to a science at a particular stage of maturity. In any science, a period of primarily methodological rather than substantive development may sometimes be necessary to unblock the logjam created by theories and measurements which cannot effectively interact through existing tools. Discrete Multivariate Analysis arrives at a time when the various psychosocial disciplines are all suffering, to varying degrees, from attempts to swallow whole those chunks of statistical methodology for continuous data that have been most successful in the natural science disciplines. The movement to quantification of psychological and social research has been motivated, in large measure, by a desire to legitimatize behavioral science through application of the "hard science" criteria of objectivity and reproducibility to statements of and data analyses relating to behavioral paradigms. Passionate advocacy of multiple regression and other multivariate analytic tools has been matched by claims that such tools have forced their proponents, through compromises necessary in measurement, data preparation and formal hypothesis construction, to distort and ultimately trivialize their science in order to accommodate the prerequisites of statistical analysis. The intensity of this debate between "traditionalists" and "methodologists" has shown no sign of abating in the last 10 years: indeed, parallel discussions in the area of medical clinical research display the same basic concern as that which troubles academic social scientists.
The question underlying debate in both areas is whether the refinement and continuous scaling of information derived from primitive conceptualizations or measurement strategies is likely to prove stimulating or stifling to further scientific growth. Much of the urgency and stridency of discourse on this issue is certainly due to the perceived disarray of statistical methodology appropriate to categorical information, consisting of observations which fall into nominal, ordinal or scaled classes. The analysis of such discrete data has long been limited in scope and convenience by the basic dependence of available methods on the dimensionality

Journal ArticleDOI
TL;DR: In this article, four Monte Carlo simulation studies of Owen's Bayesian sequential procedure for adaptive mental testing were conducted, where the authors explored a number of additional properties, both in a normally distributed population and in a distribution-free context.
Abstract: Four Monte Carlo simulation studies of Owen's Bayesian sequential procedure for adaptive mental testing were conducted. In contrast to previous simulation studies of this procedure which have concentrated on evaluating it in terms of the correlation of its test scores with simulated ability in a normal population, these four studies explored a number of additional properties, both in a normally distributed population and in a distribution-free context. Study 1 replicated previous studies with finite item pools, but examined such properties as the bias of estimate, mean absolute error, and correlation of test length with ability. Studies 2 and 3 examined the same variables in a number of hypothetical infinite item pools, investigating the effects of item discriminating power, guessing, and variable vs. fixed test length. Study 4 investigated some properties of the Bayesian test scores as latent trait estimators. The properties of interest included the conditional bias of the ability estimates, the info...

Journal ArticleDOI
TL;DR: In this paper, the latent roots as analyzed in three ways show a clear but small increase in the number of common factors during this time period, particularly for the white groups, and rotated factor loadings also support the differentiation hypothesis.
Abstract: Factor analyses have been computed in samples of white male and female and black male and female students for the same 16 cognitive variables at grade levels 5, 7, 9, and 11. Samples for each of the four independent groups remained constant at the four grade levels. The latent roots as analyzed in three ways show a clear but small increase in the number of common factors during this time period, particularly for the white groups. Rotated factor loadings also support the differentiation hypothesis. For the white males, who showed the clearest evidence for differentiation of abilities, rotated loadings provide descriptions of the emerging factors. Although the evidence for differentiation is less clear in white females, the emerging factors appear to become identical by the 11th grade. Data for black males and females, which are based on smaller Ns, are more ambiguous.

Journal ArticleDOI
TL;DR: Interest inventories are frequently validated against group membership criteria. Two approaches are considered: the first and most common assumes that interest inventories are used to predict which occupation counselees will enter or prefer, and the second assumes that they are used to suggest occupations for counselees to consider on the basis of compatibility of interests.
Abstract: Interest inventories are frequently validated against group membership criteria. Two approaches are considered, only one of which is commonly used. The choice between the two approaches depends on the application being validated. The first and most common approach assumes that interest inventories are to be used in predicting which occupation counselees will enter or prefer. The second assumes that interest inventories are to be used in suggesting occupations for counselees to consider on the basis of compatibility of interests. Validation of these two uses of interest inventories requires different treatment of criterion group base rates. As illustrated by data drawn from a published study, the two approaches to validation can produce substantial differences in criterion group hit rates. Such differences may be found in any study validating group membership predictions if criterion group sizes vary greatly.

Journal ArticleDOI
TL;DR: A review of the psychological, sociological and educational literature indicated that the various conceptualizations of "alienation" could be fitted into five tentative categories appearing to have...
Abstract: A review of the psychological, sociological and educational literature indicated that the various conceptualizations of "alienation" could be fitted into five tentative categories appearing to have...

Journal ArticleDOI
TL;DR: In this article, eleven indicators of intelligence and 10 measures of short-term learning were studied in a sample of 265 fourteen-year-olds using the inter-battery methods developed by Tucker.
Abstract: Eleven indicants of intelligence and 10 measures of short-term learning were studied in a sample of 265 fourteen-year-olds using the inter-battery methods developed by Tucker. The results indicated

Journal ArticleDOI
TL;DR: The authors compared two approaches to scoring a Psychological Climate Questionnaire, an empirical keying of items using item analysis and a rational approach which focused on identifying the salient features of the questionnaire items.
Abstract: The present study compared two approaches to scoring a Psychological Climate Questionnaire— an empirical keying of items using item analysis and a rational approach which focused on identifying the...

Journal ArticleDOI
TL;DR: The paper notes that "textbook" calculations of statistical power and sample size follow from formulas that assume the variables under consideration are measured without error, and contrasts this assumption with "real world" conditions.
Abstract: "Textbook" calculations of statistical power and/or sample size follow from formulas that assume that the variables under consideration are measured without error. However, in the "real world" of ...

Journal ArticleDOI
TL;DR: Owen's (1969) Bayesian tailored testing method is introduced along with a brief review of its derivation, and the characteristics of a good item bank are outlined and explored in terms of their influence on the Bayesian tailoring process.
Abstract: Owen's (1969) Bayesian tailored testing method is introduced along with a brief review of its derivation. The characteristics of a good item bank are outlined and explored in terms of their influence on the Bayesian tailoring process. The results clearly demonstrate importance of a good item bank; one having a sufficient number of items with high discrimination, low guessing probability, and a uniform distribution of difficulty.

Journal ArticleDOI
TL;DR: In this article, a method for studying relationships among groups in terms of categorical data patterns is described, which yields a dimensional representation of configural relationships among multiple groups and a quantitative scaling of categorical data patterns for use in subsequent assignment of new individuals to the groups.
Abstract: A method for studying relationships among groups in terms of categorical data patterns is described. The procedure yields a dimensional representation of configural relationships among multiple groups and a quantitative scaling of categorical data patterns for use in subsequent assignment of new individuals to the groups. Two examples are used to illustrate potential of the method. In the first, profile data that were previously analyzed by metric multiple discriminant function analysis are reanalyzed by the nonmetric categorical data pattern technique with highly similar results. The second example examines relationships among psychiatric syndrome groups in terms of similarities in patterns of categorical background variables. Results appear consistent with other available information concerning the epidemiology of psychiatric disorders.

Journal ArticleDOI
TL;DR: The authors investigated the use of the Rasch simple logistic model in obtaining test-free ability estimates and found that raw-score ability estimates are influenced by the difficulty of the items used in measurement.
Abstract: This research investigated the use of the Rasch simple logistic model in obtaining test-free ability estimates. Two tests each of word, picture, symbol, and number analogies were administered to college and high school students. Differences between scores on each pair of tests were analyzed to determine whether the ability estimates were independent of the tests employed. The results indicate that raw-score ability estimates are influenced by the difficulty of the items used in measurement but that Rasch ability estimates are relatively independent of the difficulty of these items. The need is discussed for additional research in which an individualized item-presentation procedure is used with the Rasch model.
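
To illustrate why a Rasch ability estimate can be less dependent on item difficulty than a raw score, the sketch below solves the maximum likelihood equation for ability given known item difficulties. It is a hedged illustration, not the study's estimation procedure: the function name `rasch_ability`, the Newton-Raphson solver with a clamped step, and the two hypothetical three-item tests are assumptions made for the example.

```python
import math


def rasch_ability(raw_score, difficulties, tol=1e-10, max_iter=100):
    """ML ability estimate under the Rasch model for a given raw score.

    Solves sum_i P_i(theta) = raw_score, where
    P_i(theta) = 1 / (1 + exp(-(theta - b_i))) and b_i are known difficulties.
    """
    n_items = len(difficulties)
    if not 0 < raw_score < n_items:
        # Zero and perfect scores have no finite estimate.
        raise ValueError("raw score must lie strictly between 0 and the number of items")
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        expected = sum(probs)
        info = sum(p * (1.0 - p) for p in probs)
        # Newton-Raphson step, clamped to keep early iterations stable.
        step = max(-1.0, min(1.0, (raw_score - expected) / info))
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("ability estimate did not converge")


# Hypothetical easy and hard tests; the hard items are 2 logits more difficult.
easy_test = [-2.0, -1.0, 0.0]
hard_test = [0.0, 1.0, 2.0]
theta_easy = rasch_ability(2, easy_test)
theta_hard = rasch_ability(2, hard_test)
```

The same raw score of 2 out of 3 maps to an ability estimate that is 2 logits higher on the harder test, because the estimate is expressed relative to the item difficulties. Put the other way round, a person of fixed ability would be expected to earn different raw scores on the two tests while receiving similar Rasch estimates.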

Journal ArticleDOI
TL;DR: The GRIP tests were found to be useful for measuring short-term memory and sequential reasoning abilities.
Abstract: A battery of Graphic Information Processing Tests (GRIP) was developed to utilize the display characteristics of computer terminals in measuring abilities important for processing visually presented information. The GRIP battery was especially intended to assess five "real world" personal attributes which have been difficult to measure with paper-and-pencil tests. The experimental tests were administered to 385 Navy enlisted men and evaluated in conjunction with paper-and-pencil tests of the same attributes as well as with operational cognitive tests and biographical variables. The GRIP tests were found to be useful for measuring short-term memory and sequential reasoning abilities.