
Showing papers in "Educational and Psychological Measurement" in 1960


Journal Article
Jacob Cohen
TL;DR: In this article, the author proposes having two or more judges independently categorize a sample of units, so that the extent to which such nominal-scale judgments are reproducible, i.e., reliable, can be assessed.
Abstract: CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of measurement obtainable is nominal scaling (Stevens, 1951, pp. 25-26), i.e., placement in a set of k unordered categories. Because the categorizing of the units is a consequence of some complex judgment process performed by a “two-legged meter” (Stevens, 1958), it becomes important to determine the extent to which these judgments are reproducible, i.e., reliable. The procedure which suggests itself is that of having two (or more) judges independently categorize a sample of units and determine the degree, significance, and

34,965 citations
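As an illustration of the two-judge agreement problem described above, the sketch below computes chance-corrected agreement between two judges' nominal categorizations, the coefficient now widely known as Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e). It is a minimal sketch rather than the article's own derivation: the function name `cohen_kappa` and the list-of-labels input are assumptions for illustration, and chance agreement p_e is estimated from each judge's marginal proportions.

```python
from collections import Counter


def cohen_kappa(judge_a, judge_b):
    """Chance-corrected agreement between two judges' nominal categorizations.

    Hypothetical helper for illustration; each argument is a sequence of
    category labels, one per unit, in the same unit order.
    """
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(judge_a)
    # p_o: proportion of units both judges placed in the same category.
    p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # p_e: agreement expected by chance from each judge's marginal proportions.
    counts_a, counts_b = Counter(judge_a), Counter(judge_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Four units, two judges, categories "x" and "y".
print(cohen_kappa(["x", "x", "y", "y"], ["x", "y", "y", "y"]))  # 0.5
```

Kappa is 1 for perfect agreement and 0 when observed agreement is exactly what the two judges' marginal distributions alone would produce.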


Journal Article
TL;DR: The speaker offers his opinions on the application of computers to factor analysis, explicitly disclaiming any survey of available computer programs for factor analytic computations or any scientific analysis of the problems involved.
Abstract: more stodgy and less exciting application of computers to psychological problems. Let me warn you about how I am going to talk today. I have not conducted a survey of available computer programs for factor analytic computations, nor have I done an analysis of the problems of the application of computers to factor analysis in any way that could be considered scientific. I am saying that I shall ask you to listen to my opinions about the applications of computers to factor

9,914 citations


Journal Article
TL;DR: Although the emphasis on statistical methods in psychology is generally a healthy sign, the author argues that there are some serious misemphases in our use of statistical methods which are retarding the growth of psychology.
Abstract: MOST psychologists probably will agree that the emphasis on statistical methods in psychology is a healthy sign. Although we sometimes substitute statistical elegance for good ideas and overembellish small studies with elaborate analyses, we are probably on a firmer basis than we were in the prestatistical days. However, it will be argued that there are some serious misemphases in our use of statistical methods, which are retarding the growth of psychology.

161 citations


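Journal Article
TL;DR: Because elementary linkage analysis yields only a first-level classification and agreement analysis is prohibitively laborious on large matrices, the author develops a method for analyzing large matrices into hierarchical types that can be performed by pencil and paper even on relatively large matrices.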
Abstract: ELEMENTARY linkage analysis (McQuitty, 1957b) is a rapid method for the isolation of types, but it results in a first-level classification only. Agreement analysis (McQuitty, 1956), on the other hand, classifies into successive levels such as species, genera, families, etc., but it has the disadvantage of being prohibitively laborious if performed by pencil and paper on other than a small matrix; large matrices require an electronic computer and even then the analyses are relatively laborious. Typological studies, which have classified subjects objectively into types on the basis of major patterns of responses, indicate that there are many psychological types and that large sets of data are essential to isolate them (McQuitty, 1957a). A method of analysis which would enable investigators to analyze large matrices into hierarchical types, as illustrated in Chart 1, is needed. This paper develops and illustrates such a method. It can be performed by pencil and paper on even relatively large matrices.

102 citations
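The abstract contrasts the new hierarchical method with elementary linkage analysis (McQuitty, 1957b), which yields a first-level classification only. The sketch below illustrates that first-level step under stated assumptions: each variable is joined to the type of its highest correlate, so types are the connected groups formed by these highest-correlate links. It is not the hierarchical procedure this article develops; the function name `elementary_linkage`, the square similarity-matrix input, and the assumption of no tied maxima are illustrative choices.

```python
def elementary_linkage(sim):
    """First-level types: each variable joins the type of its highest correlate.

    Hypothetical helper; `sim` is a square, symmetric similarity (e.g.
    correlation) matrix given as a list of lists. Ties are not handled.
    """
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two variables")
    # Highest off-diagonal entry in each row: the variable's closest associate.
    best = [max((j for j in range(n) if j != i), key=lambda j: sim[i][j]) for i in range(n)]
    # Treat each "highest correlate" link as an undirected edge.
    neighbours = [set() for _ in range(n)]
    for i, j in enumerate(best):
        neighbours[i].add(j)
        neighbours[j].add(i)
    # Each connected group of linked variables is one type.
    seen, types = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            group.append(v)
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        types.append(sorted(group))
    return types


sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(elementary_linkage(sim))  # [[0, 1], [2, 3]]
```

Because every variable points to exactly one highest correlate, each type produced this way contains one reciprocal pair (two variables that are each other's highest correlate), which serves as its nucleus.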


Journal Article
TL;DR: The author notes that thinking about prediction has been dominated by the multiple correlation model, whose aim is to minimize $D = z_p - z_c$, the difference between standard predictor and criterion scores (with the sign of $D$ ignored), and observes that whenever validity is below unity, $D$ varies across individuals, so that prediction is good for some individuals and poor for others.
Abstract: THE thinking about problems of prediction has been dominated by the multiple correlation model. As a consequence, in test development and validation, attention is directed entirely toward seeking tests which correlate highly with the criterion and low with each other, and there are faint hopes of discovering a suppressor variable. The objective is to develop a test yielding standard scores that directly reflect standard criterion scores. That is, the attempt is made to minimize the difference between standard predictor scores, $z_p$, and standard criterion scores, $z_c$. This difference, $z_p - z_c$, can be termed $D$. Since the concern is with accuracy of prediction, over-prediction and under-prediction of the same degree are considered equal errors. Consequently, the sign of $D$ can be ignored. When the validity coefficient is 1.00 and the prediction is perfect, it is apparent that the $D$ scores of all individuals are zero. However, with any given validity that is less than unity, the $D$ scores vary for different individuals. For some individuals $D$ is zero or very small, for some it has an intermediate magnitude, and for others it is large. The lower the correlation between predictor and criterion scores, the larger is the range of $D$ scores. For a given test and a given criterion, then, individuals can be differentiated in terms of the degree to which their scores on the predictor reflect their criterion scores. For those individuals with high $D$ scores, prediction of criterion scores from test scores is poor; for those with low $D$ scores, prediction is good. Now suppose scores on a second test are related to $D$ scores. Then

97 citations
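The $D$ score described above can be sketched directly. The example below is a minimal illustration, assuming standard scores are computed with population standard deviations; the names `standardize` and `d_scores` and the toy data are hypothetical. Under that assumption the mean of the squared differences $(z_p - z_c)^2$ equals $2(1 - r)$, where $r$ is the predictor-criterion correlation, which is why $D$ scores spread out as validity falls.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to standard scores using the population SD."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def d_scores(predictor, criterion):
    """Unsigned differences |z_p - z_c| for each individual (hypothetical helper)."""
    z_p, z_c = standardize(predictor), standardize(criterion)
    # The sign is dropped: over- and under-prediction of equal size count alike.
    return [abs(p - c) for p, c in zip(z_p, z_c)]


# Perfect validity: every D is zero, up to floating-point rounding.
print(d_scores([1, 2, 3, 4], [10, 20, 30, 40]))
# Validity r = 0.8: D varies across individuals; mean D^2 is approximately 2(1 - r) = 0.4.
ds = d_scores([1, 2, 3, 4], [1, 3, 2, 4])
print(sum(d * d for d in ds) / len(ds))
```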



Journal Article
TL;DR: In this paper, it is argued that simple structure and orthogonality are, except for rare accident, incompatible, so that any rotation program restricted to orthogonality must be rejected as a basis for general scientific research.
Abstract: OBTAINING a unique rotational resolution of factor analyses, either in exploring an entirely new domain or in seeking to confirm hypotheses, depends at present on attaining simple structure, since the proportional profiles method (Cattell & Cattell, 1955) still lacks completion. Both theoretical arguments and extensive empirical results show that simple structure and orthogonality are, except for rare accident, incompatible. Available computer programs using analytical methods (whether aiming, incidentally, at oblique or orthogonal resolutions of rotation) use either the principle of maximizing fourth powers, as in Wrigley’s quartimax (Neuhaus & Wrigley, 1954) and Saunders’, Pinzka’s, and Dickman’s oblimax (Pinzka & Saunders, 1954), or the distinct principle involved in Kaiser’s varimax (1959) and in Carroll’s oblimin, biquartimin, and his other programs (1953; 1957; 1958). It is generally conceded, and it is certainly the writers’ experience, that among the orthogonal programs Carroll’s designs and Kaiser’s varimax are best. However, any program which is restricted to orthogonality must be rejected as a basis for general scientific research. Unfortunately, also, Kaiser’s oblique varimax and some of Carroll’s programs have the inherent vice that they attempt to avoid

64 citations
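To make the two rotation principles named above more concrete, the sketch below evaluates two criteria on a factor loading matrix: the raw quartimax criterion (the sum of fourth powers of all loadings) and the raw varimax criterion (the sum, over factors, of the variance of the squared loadings in each column). It is an illustration only, assuming unnormalized (raw) loadings and performing no rotation search; the function names and the toy loadings are hypothetical. Both criteria are larger for a simple-structure pattern than for the same two factors rotated by 45 degrees.

```python
import math


def quartimax(loadings):
    """Raw quartimax criterion: sum of fourth powers of all loadings."""
    return sum(a ** 4 for row in loadings for a in row)


def varimax(loadings):
    """Raw varimax criterion: summed per-factor variance of squared loadings."""
    p = len(loadings)  # number of variables
    total = 0.0
    for col in zip(*loadings):
        sq = [a * a for a in col]
        m = sum(sq) / p
        total += sum((s - m) ** 2 for s in sq) / p
    return total


def rotate(loadings, theta):
    """Orthogonally rotate a two-factor loading matrix by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]


simple = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
turned = rotate(simple, math.pi / 4)
print(quartimax(simple), quartimax(turned))  # about 4.0 vs about 2.0
print(varimax(simple), varimax(turned))      # 0.5 vs about 0.0
```

An orthogonal rotation leaves each variable's communality unchanged, so these criteria only redistribute variance among factors; programs such as quartimax and varimax search for the rotation angle that maximizes them.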


Journal Article
TL;DR: Although every person is presumed to be unique, the paper assumes that scientifically significant patterns can be extracted from configurations of characteristics possessed by many people; because two or more persons can be identical or similar with respect to such patterns, the patterns can be used to classify people into categories.
Abstract: Even though every person is presumed to be unique in terms of all the characteristics which he possesses, it is assumed that scientifically significant patterns can be extracted from configurations possessed by many people. These patterns of characteristics are important in the sense that two or more persons are identical or similar with respect to them; the patterns can be used to classify people into categories.

53 citations


Journal Article
TL;DR: Building on Ghiselli (1956, 1960), the article argues that, although determinations based on a third variable are by no means perfect, one can develop a “predictability” test that classifies individuals on a better than chance basis in terms of the extent to which their predictor test scores are related to their standing on a given criterion.
Abstract: on a given criterion (Ghiselli, 1956, 1960). Such determinations, made on the basis of the score the individual obtains on a third variable, are by no means perfect. Nevertheless, one can develop what might be called a “predictability” test which does classify individuals on better than a chance basis in terms of the extent to which the scores they earn on a predictor test are related to their

52 citations


Journal Article
TL;DR: The paper revisits the factor analytic studies of children’s personality by Cattell and colleagues, whose major conclusion has been that personality structure does not change radically from early childhood to maturity and that the factors previously isolated in adults are found in children in similar number.
Abstract: RECENTLY Cattell and various colleagues (Cattell & Coan, 1957; Cattell & Gruen, 1953; Peterson & Cattell, 1959) have reported a series of factor analytic investigations of personality in children. Subjects in early, middle, and late childhood have been examined through use of ratings, questionnaires, and objective tests, factor analyses have been conducted, and the results compared with those previously obtained from study of adults. Although some age-related changes have been reported, the major conclusion of these studies has been that personality structure does not change radically from early childhood to maturity, that the factors previously isolated in adults are found in similar number and with similar

50 citations


Journal Article
TL;DR: The paper reviews ways in which profile data have been used, starting from the view, implicit in profiles for tests such as the Rorschach and MMPI, that the arrangement or pattern of a profile can be more meaningful than any one, or all, of the individual profile measures considered separately.
Abstract: A number of recent investigations have been concerned with the problem of what can be done with data which are in the form of profiles. Many investigators have felt that the arrangement or pattern of a profile can often be more meaningful than any one, or all, of the individual profile measures considered separately. Such reasoning is implicit, for example, in the construction of profiles for objective and projective personality tests like the Rorschach and MMPI, and for various measures of intelligence, values, interests, and aptitudes. Assuming that a profile may yield more useful information than each of the individual measures it contains, we can proceed to a brief review of some of the ways in which profiles have been used. Among the more elementary comparisons which have been made with profile data are the following:

Journal Article
TL;DR: This review notes errors and inconsistencies throughout the book and describes its three appendices, which include a reprint of the Psychological Corporation Test Bulletin on Methods of Expressing Test Scores and an annotated list of 59 guidance films.
Abstract: errors and inconsistencies throughout the book. Among the more serious are the alternate use of the expressions “data are” and “data is,” and the careless reference to Karl Rogers of the University of Chicago early in the text, to be followed later correctly as Carl Rogers of the University of Wisconsin. The appendices contain three ready references which the reader should find very useful. Appendix A is a reprint of the Psychological Corporation Test Bulletin on Methods of Expressing Test Scores. Appendix B is an annotated list of 59 guidance films. Appendix C is

Journal Article
TL;DR: The article is concerned with one of the more common forms of cheating on multiple-choice tests: a student copying answers from one or more of his neighbors in the testing situation, which is usually done very cleverly.
Abstract: SINCE early in the history of objective or “new-type” tests, users of them have been concerned with their peculiar susceptibility to cheating (Bird, 1927). Perhaps the most widely used, and certainly the most widely recommended, type of these tests is the multiple-choice variety in its several forms, and one of the more common forms of cheating occurs when a student copies answers from one or more of his neighbors in the testing situation. This type of cheating on multiple-choice tests is usually so cleverly done that although

Journal Article
TL;DR: This review describes fourteen factors of physical proficiency identified in previous research, along with other factors that might yet be discovered, raises questions about the structure of skill in this area, and suggests future studies to answer them.
Abstract: This review describes fourteen factors of physical proficiency identified from previous research. Other possible factors which might be discovered are also described. A number of questions are raised regarding the structure of skill in this area, and suggestions are made for future studies to answer these questions. Several things are clear. There is no such thing as general physical proficiency. The problem is a multidimensional one. It is also clear that previous studies comparing American youth with youth of other countries have assessed only a small number of the factors already identified. Eventually the development of a battery of basic reference tests which will provide comprehensive coverage of abilities in this area is anticipated. Such measures would also allow an assessment of the relative contributions of the component abilities to a variety of different, more complex, athletic performances. An outline and description of tests which might be included in such studies is presented. Some are well-known tests, but others are new ideas. This outline also provides an interim report of what abilities such tests probably measure.

Journal Article
TL;DR: The paper defines bias in attitude measurement as any response to a statement that results from something other than agreement or disagreement with the statement itself, such as deliberate faking, response set, or simple inaccurate estimation of one’s own opinions.
Abstract: THE traditional method of attitude measurement has been to ask subjects to indicate their agreement or disagreement with statements which reflect their attitude toward some psychological object. These statements usually are selected from a larger pool of statements through the use of a scaling or rating technique. Although the subject’s response to the statements is generally assumed to be a true representation of his attitude, there is no assurance that his response actually does represent his attitude accurately. In this traditional type of attitude measurement the responses may be subject to many types of bias since no major attempt is made to control bias. Bias in attitude measurement is defined as any response to a statement that is a result of something other than agreement or disagreement with the statement itself; thus, it may take the form of deliberate faking, response set, or simple inaccurate estimation of one’s own opinions. The issue of whether the subject’s response to statements represents his true attitude is particularly crucial for the research area

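Journal Article
TL;DR: Following Jackson and Messick’s (1958) call to distinguish the interpretation of behavior in terms of content from its interpretation in terms of style, the authors examine the role of two stylistic consistencies, including acquiescence, in Gough’s (1957) California Psychological Inventory (CPI).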
Abstract: an understanding that has continued to grow ever since Cronbach’s (1946) review of evidence related to response sets in psychological testing. Jackson and Messick (1958) have recently called for a clear distinction between the interpretation of behavior in terms of content and of style. As applied to personality questionnaires, this distinction calls attention to the fact that the response-evoking properties of the particular item form may contribute consistently to the variance of a test above and beyond the variance attributable to content. The separation of these two general sources of variance is essential both to further the understanding and the improvement of the logical validity of content variables and to develop potentially useful measures of stylistic consistencies. In this study we shall examine the role of two stylistic consistencies in Gough’s (1957) California Psychological Inventory (CPI) : (a) acquiescence and (b) the tendency to respond consistently in

Journal Article
TL;DR: In a replication with freshman engineering students at Princeton, the earlier finding that noncompulsive students are more predictable than compulsive students, as judged by correlations between interest blank scores and freshman average grade, held only for the occupational keys most logically related to engineering (Mathematician, Physicist, Engineer, and Chemist), and only when the groups were defined on the basis of reading speed relative to vocabulary.
Abstract: The concept “differential predictability” refers to the idea that people may vary in the extent to which their behavior is predictable by some predictor measure. A previous study showed that freshman engineering grades of students classified as “noncompulsive” were more predictable by Strong Vocational Interest Blank scores than were the grades of “compulsive” students. Two rough indicators of compulsiveness were used: (1) tendency to resemble accountants on the Accountant scale of the Vocational Interest Blank and (2) being above the regression line of reading speed score on vocabulary score for the Cooperative English Test C2: Reading Comprehension. The study was replicated, using freshman students in the School of Engineering at Princeton. The finding that noncompulsive students are more predictable than compulsive students, as judged by correlations between interest blank scores and freshman average grade, seems to hold only for the occupational keys most logically related to engineering (Mathematician, Physicist, Engineer, and Chemist) when the groups are defined on the basis of reading speed relative to vocabulary.

Journal Article
TL;DR: The study concerns unconscious motivation for teaching. The prior work most relevant to it is Terrien’s (1955) investigation, which suggests that behavior within the teaching profession is channelized into systems leading to identifiable occupational types, and the study by Stern, Stein, and Bloom (1956), who contend that knowledge of unconscious personal motives contributes to the effectiveness of predictions of success in teaching.
Abstract: or as trainees (Callis, 1953; Cook, Leeds & Callis, 1949; Dodge, 1943; Leeds, 1950; Soderquist, 1935), there have been few studies of unconscious motivation for teaching and its significance for a teaching career. With the exception of Waller (1932), the analysis of motives for entering the teaching profession has been more typically confined to a tabulation of the backgrounds and expressed interests of teachers and teacher trainees (Best, 1948; Gould, 1924; Seagoe, 1942; Tudhope, 1944). In recent years, however, many areas of psychology, sociology, and anthropology have become oriented toward the investigation of men’s unconscious motives and purposes in pursuing specific courses of action; potential applications in industry (Henry, 1949), science (Kubie, 1953; Roe, 1953; Stein, Mackenzie, Rodgers & Meer, 1955), and the ministry (Stern, 1954) have begun to appear in the literature. The most relevant of these to the present study are the recent investigation of Terrien (1955), which suggests that behavior within the teaching profession is channelized into systems leading to identifiable occupational types, and the study of Stern, Stein, and Bloom (1956), who contend that knowledge of unconscious personal motives contributes to the effectiveness of predictions of success in teaching.

Journal Article
TL;DR: Duncker’s (1945) study of problem solving and Bloom and Broder’s (1950) monograph on the problem-solving processes of college students call attention to the complexities involved in the objective study of problem solving; the article notes that introspective and think-aloud methods have been criticized for lack of objectivity, while psychological tests do not provide ways of exploring the processes themselves.
Abstract: IN his classical study on Problem Solving, Karl Duncker (1945) states, “A problem arises when a living creature has a goal but does not know how this goal is to be reached.” He also tries to answer the questions, “How does the solution arise from the problem situation?” and “In what ways is the solution of a problem attained?” Several years later B. S. Bloom and L. J. Broder (1950), in their monograph “Problem Solving Processes of College Students,” stress the importance of the analysis of mental processes rather than of mental products. These two excellent contributions to psychological thinking call attention to the complexities involved in the objective study of problem solving and suggest ways of attacking the problem. Numerous attempts have been made to evaluate thought and concept formation processes, and quite often they have been criticized for their lack of objectivity. For instance, in asking the subject to introspect or to think aloud, the experimenter is seldom sure that other observers would reach the same conclusion, to say nothing of the semantic difficulties involved and the lack of adequate controls. Some studies are mostly interesting anecdotal descriptions colored by the biases of the experimenter. With the development of psychological tests, a considerable improvement was made in terms of control of certain variables. But psychological tests do not provide ways for exploring the processes

Journal Article
TL;DR: The article discusses objective tests of personality, in which the subject’s behavior is assessed without his being aware of how that behavior affects scoring and interpretation, drawing on Campbell’s (1957) typology, where an objective measure, in contradistinction to voluntary self-description, is one in which the subject believes he should emphasize accuracy of response because correct answers exist that can be externally evaluated.
Abstract: subject’s behavior is assessed without his being aware of the manner in which that behavior affects the scoring and interpretation. In Campbell’s typology of tests (1957) an objective measure, in contradistinction to voluntary self-description, is described as a test in which the subject believes that he should emphasize accuracy of response because correct answers exist that can be externally evaluated. “Indirect measurement” for Campbell (1957) involves the interpretation of test responses in ways other than those anticipated by the respondent; some objective tests of personality employ a disguise of

Journal Article
TL;DR: The article concerns tests that purport to differentiate quantitative and verbal abilities, noting considerable clinical and counseling evidence that individuals who make different scores on the two parts of such tests differ distinctly in interests, preferences, motivation, interpersonal relationships, and general life-style (Altus, 1952; Goldstein, 1935; Munroe, 1946; Rich, 1928).
Abstract: TESTS which purport to differentiate and measure quantitative and verbal abilities of individuals are used extensively in educational and vocational counseling as aids in predicting, among other things, probable success in college. Although very little has been done on the biological basis for performance on such tests, there is considerable evidence from clinical and counseling work that distinct differences exist in interests, preferences, motivation, interpersonal relationships, and general life-style among individuals who make different scores on the two parts of such tests (Altus, 1952; Goldstein, 1935; Munroe, 1946; Rich, 1928). Sometimes where the

Journal Article
TL;DR: Following Eysenck’s (1958) reports, based on analysis of variance and comparisons of group means, that women score somewhat more neurotic and men somewhat more extraverted, the study investigates how questionnaire responses relate to the sex of the subject, using the short scale of the Maudsley Personality Inventory together with three further short scales of rigidity, emotionality, and nervousness.
Abstract: quota samples are under investigation. Similarly, and less excusably, little is known about the relationship between such responses and the sex of the subject. The work of Eysenck (1958) has suggested that according to their questionnaire responses women are somewhat more neurotic than men, that men are somewhat more extraverted than women, and that working class subjects are slightly more neurotic than middle class subjects. These results were reported in connection with standardization studies of the short and long scales, respectively, of the Maudsley Personality Inventory, a measure designed to provide a numerical system of a subject’s neuroticism and extraversion (Eysenck, 1956; Eysenck, 1958-59; Eysenck, in press). The relationships with sex and class were established on the basis of analysis of variance and comparisons of mean scores of different groups. The present study, while also making use of the short scale of the Maudsley Personality Inventory, is in addition using three further short scales relating to the personality traits of rigidity, emotionality, and nervousness; the items were selected and adapted from the scales published by Nigniewitzky (1955), Guilford (1939), and Evans and McConnell

Journal Article
TL;DR: During the development of an improved form (Form 6) of the Navy Arithmetic Test (ARI) for enlisted personnel, two modifications were considered: (a) changes in the time limits for the two subtests and (b) the use of “right answer not given” as a response in the reasoning subtest.
Abstract: THIS research was conducted during the development of an improved form (Form 6) of the Navy Arithmetic Test (ARI) for enlisted personnel. Two modifications were being considered: (a) changes in the time limits for the two subtests and (b) the use of “right answer not given” as a response in the reasoning subtest. The development of the ARI test and other research conducted during its development are reported elsewhere (Rimland, in press; Rimland, 1958). The computation subtest of the ARI, which consists of 20 multiple-choice items, has a 12-minute time limit. The reasoning subtest consists of 30 items to be completed in 35 minutes. Past forms of

Journal Article
TL;DR: The study examines the validity of a battery of Graduate Record Examinations tests (the Aptitude Test and the Area Tests) for predicting the performance of doctoral candidates on a comprehensive series of examinations which partially determine admittance to the doctoral training program.
Abstract: Problem. The problem investigated in this study was to determine the validity of a battery of tests to predict the performance of doctoral candidates on a comprehensive series of examinations which partially determine admittance to the doctoral training program. Description of Predictor Variables. The predictor variables consisted of the Aptitude Test of the Graduate Record Examinations, which yields separate verbal and quantitative scores, and the Area Tests of the GRE, which furnish scores in the areas of Natural Sci-

Journal Article
TL;DR: The author suggests that the use of time limits on tests stems from three considerations: practical administrative convenience, comparability of scores, and the nature of the task itself and the behavior to be predicted.
Abstract: Since there are some well-known tests for which no time limit is specified, it is pertinent to ask why time limits are imposed when they are. It seems to me that the use of time limits stems from three considerations: (a) practical administrative convenience; (b) comparability of scores; and (c) the nature of the task itself and the behavior to be predicted. Use of time limits is, for example, asserted to permit testing arrangements to be more efficient, more easily scheduled, and the like; to make scores more comparable, in that testing everyone under a given time limit supposedly means the subjects have been given the test under more nearly equivalent conditions; or to be inherent to the nature of the task, that is, to be necessary in view of what one wishes to measure.

Journal Article
TL;DR: As part of a larger four-year study of methods for predicting the success of graduate students at Purdue University, the authors studied 119 of the 212 recipients of X-R Fellowships between September, 1956, and September, 1958; students from departments with fewer than three tested X-R Fellows over this period were excluded.
Abstract: Fellows was part of a larger, four-year study to investigate methods of predicting success of graduate students at Purdue University. Subjects. The sample consisted of recipients of X-R Fellowships between September, 1956, and September, 1958. Of the 212 X-R Fellows over this time period, 119 were included in this study. The students from departments which had fewer than three tested X-R Fellows over this time period were excluded from this study. The departments included in the study and the number of students in each department appear in Table 1. All students were candidates for the Ph.D.


Journal Article
TL;DR: Noting that reactions to teaching machines range from complete rejection to ecstatic enthusiasm, the paper describes the activity currently developing around them in psychology, engineering, and education, and summarizes preliminary research findings, especially as they relate to special education and the mentally retarded, so as to stimulate further inquiry.
Abstract: REACTIONS to the idea of a machine that teaches vary from complete rejection to ecstatic enthusiasm. One purpose of this paper will be to present some information about the ground swell of activity that is currently developing in psychology, engineering, and education. A second purpose will be to summarize some preliminary findings generated by the beginning research studies so as to stimulate further inquiry especially as it might relate to problems in special education. This summary will indicate some specific problems that teaching machines might be employed to investigate in the education of the mentally retarded.

Journal Article
James F. Adams
TL;DR: The author cautions that item analysis techniques which estimate, or equal, the product moment coefficient need to be interpreted cautiously when criterion scores are not normally distributed, and illustrates the point with four hypothetical test-score distributions: approximately normal (Test W), platykurtic (Test X), leptokurtic (Test Y), and skewed (Test Z).
Abstract: ALTHOUGH Nefzger and Drasgow (1957) have pointed out that Pearson’s product moment correlation does not assume normally distributed variables, item analysis techniques which are intended to give estimates of the product moment coefficient, or which are the product moment coefficient for a particular situation, need to be interpreted cautiously if the criterion scores are not normally distributed. Four hypothetical distributions of test scores are presented in Table 1. Test W is characterized by approximate normality; Test X by platykurtosis; Test Y by leptokurtosis; and Test Z by skew-

Journal Article
TL;DR: Using assumptions similar to those of item random sampling theory and a single classification analysis of variance model, the author derives a generalized form of K-R(21) in which item scores are not restricted to two constants; it serves as a slightly biased estimate of the reliability of randomly parallel tests. A second model, for the case where item effect variance is excluded from the error variance, is discussed briefly.
Abstract: THERE is a growing trend toward reformulation of psychological measurement by means of analysis of variance (Ebel, 1951; Haggard, 1958; Sutcliffe, 1958). Since the work of Jackson (1939) and Hoyt (1941) the measurement model preferred for mental test data has usually been a double classification model, either of the mixed or of the random effects variety. Lord (1955) has derived the reliability formula known as K-R(21), originally proposed by Kuder and Richardson (1937), using some assumptions required by item random sampling theory. Using similar assumptions and a single classification analysis of variance model, a generalized form of K-R(21) may be derived for which item scores are not restricted to two constants. The result serves as a slightly biased estimate of the reliability of randomly parallel tests. A second model, which will be discussed only briefly, is possible for the case where the item effect variance is excluded from the error variance. The general reliability measure which will be derived is an estimate which corresponds to a formula derived by Horst (1949, formula 13, n constant) using different assumptions. The theory of measurement implicit in the present derivation will require brief discussion.