Showing papers in "Educational and Psychological Measurement in 1972"

PDF

Open Access

Journal Article•DOI•

Effects of Some Variations in Rating Scale Characteristics on the Means and Reliabilities of Ratings

[...]

01 Jul 1972-Educational and Psychological Measurement

TL;DR: In this paper, the effects of scale variations on the means and reliabilities of the resulting ratings are discussed. But the authors do not consider the effect of scale levels on the ratings themselves.

...read moreread less

Abstract: a great deal of attention over the years, with an excellent summary of pertinent work being provided by Guilford (1954, Chapter 11). Certainly one of the more widely used types of rating scales is the simple graphic, ordinal type, and there seems to be an unlimited variety of ways in which the characteristics of such scales may be varied. Two obvious and popular ways in which scale characteristics may be varied are the number of scale levels and the manner with which scale levels are defined. Basic questions that are thus raised concern the effects of such variations on the means and the reliabilities of the resulting ratings. Several earlier studies seem to be especially germane to these questions. Madden and Bourdon (1964) report a study of occupational evaluation where each of 15 occupations

...read moreread less

168 citations

Journal Article•DOI•

Suppressor Variables, Prediction, and the interpretation of Psychological Relationships:

[...]

Anthony J. Conger¹, Douglas N. Jackson¹•Institutions (1)

University of Western Ontario¹

01 Oct 1972-Educational and Psychological Measurement

TL;DR: One of the traditional problems confronting the applied measurement specialist in psychology is the large-scale prediction of particular criteria as mentioned in this paper, where it is often difficult to find more than a small number of predictors contributing to incremental validity, the idea of a suppressor variable (Horst, 1941) has continued to capture periodically the imagination of those confronted with prediction problems.

...read moreread less

Abstract: ONE of the traditional problems confronting the applied measurement specialist in psychology is the large-scale prediction of particular criteria. Because it is often difficult to find more than a small number of predictors contributing to incremental validity, the idea of a suppressor variable (Horst, 1941)-one contributing to incremental validity while itself uncorrelated with the criterion-has continued to capture periodically the imagination of those confronted with prediction problems. The fact that bona fide suppressors have only rarely been reported (Lord and Novick, 1968) has not diminished the search. In a similar manner, ever since the technique of partial correlation

...read moreread less

82 citations

Journal Article•DOI•

The Lack of Retest Reliability for Individual Differences in the Kinesthetic Aftereffect

[...]

Arlene H. Morgan¹, Ernest R. Hilgard¹•Institutions (1)

Stanford University¹

21 Dec 1972-Educational and Psychological Measurement

TL;DR: Petrie et al. as discussed by the authors proposed a theory of augmentation-reduction, which states that some people augment incoming stimuli and others reduce them, according to the KAE response.

...read moreread less

Abstract: RECENT interest in the kinesthetic aftereffect (KAE) has taken two directions. First is a more precise quantification of the IUE following the procedures of Kohler and Dinnerstein (1947) , who demonstrated the ICAE by measuring changes in judgments of width (Bakan and Thompson, 1962, 1967; Hilgard, Morgan and Prytulak, 1968). Second is a classification of individuals on a perceptual style dimension based on individual differences in the I lAE response. Petrie (1967) proposed a theory of augmentationreduction, which states that some people “augment” incoming stimuli and others “reduce” them. These characteristic styles can be identified, according to Petrie, by the KAE response. Augmenters exaggerate the IlAE when the standard block feels wider (after rubbing a smaller block); reducers exaggerate the KAE when the standard block feels narrower (after rubbing a larger block). Petrie relates this tendency to a number of personality variables, the most important of which (in her theory) is pain tolerance. She proposed that people who tolerate pain well are those who reduce incoming stimuli, and that they can be selected by their extreme response in the IUE when the stimulus block is larger than the test block. Poser (1960) and Ryan and Foster (1967), supporting her theory, reported that reducers, as selected by their I lAE response to the “reduction’, contrast, tolerated pain

...read moreread less

60 citations

Journal Article•DOI•

A Generalized Upper-Lower Item Discrimination Index:

[...]

Robert L. Brennan¹•Institutions (1)

Stony Brook University¹

01 Jul 1972-Educational and Psychological Measurement

TL;DR: In this paper, the authors focus on the upper-lower types of discrimination indices, i.e., the number of students in an upper group who get an item correct and the number in a lower group who got the item correct.

...read moreread less

Abstract: ness typically calculated by test evaluators. Although a discrimination index, in the classical sense, is described as a measure of item-criterion correlation, such an index is frequently interpreted as a measure of comparison between the number of students in an upper group who get an item correct and the number of students in a lower group who get the item correct. This paper is primarily concerned about the upper-lower types of indices which have been discussed by writers such as Bridg-

...read moreread less

54 citations

Journal Article•DOI•

Robust Tests for Homogeneity of Variance.

[...]

Paul A. Games¹, Henry B. Winkler², David A. Probert²•Institutions (2)

Pennsylvania State University¹, Ohio University²

01 Dec 1972-Educational and Psychological Measurement

TL;DR: The analysis of variance test of equal means is robust to violations of this assumption when equal n's > 10 are used, if the n's are unequal, this assumption may be critical.

...read moreread less

Abstract: E sometimes has an a priori interest in testing the homogeneity of variance of k independent groups. For example, B. F. Skinner (1958) predicted that achievement scores of students finishing programmed lessons would have a smaller variance than students taught by other methods. If Gagne’s (1965) hierarchially arranged behaviors are involved, small variance in low level skills would simplify teaching for higher skills. A second reason for interest in Ho: uI2 = ua2 = -.* uK2 is as an assumption needed to guarantee the accuracy of various tests on means. Although the analysis of variance test of equal means is robust to violations of this assumption when equal n’s > 10 are used, if the n’s are unequal, this assumption may be critical (Box, 1954). Even with equal n’s, the Tukey Wholly Significant Difference (Tukey, 1953; Miller, 1966), the Neman-Keuls (Keuls, 1952; Nemmsn, 1939), the Duncan Multiple Range (Duncan, 1955), the Least Significant Difference (Fisher, 1949)’ and the Schd6 (1953) multiple comparison tests all require homogeneous variances. An a priori specified test on a contrast

...read moreread less

53 citations

Journal Article•DOI•

A Comparison of Five Variable Weighting Procedures

[...]

John G. Claudy¹•Institutions (1)

University of Tennessee¹

01 Jul 1972-Educational and Psychological Measurement

TL;DR: An estimate of the multiple correlation coefficient in the population from which the sample was drawn, and the sample beta weights are taken as estimates of the population beta weights.

...read moreread less

Abstract: an estimate of the multiple correlation coefficient in the population from which the sample was drawn, and the sample beta weights are taken as estimates of the population beta weights. It is these population parameters which are usually of interest, and not the sample statistics in and of themselves. In spite of the fact that this technique is so widely used, the situation seems little improved from what it was 20 years ago when Cureton (1950) wrote: &dquo;It is doubtful that any other statistical techniques have been so generally and widely misused and misinterpreted in educational research as have those of multiple correlation (p. 690).&dquo; Nor is there any reason to expect improvement in this situation, and indeed, the ready availability of standard computer regression programs may be making it worse. All too often the nature of the data used, or the size of the sample employed, is not satisfactory for multiple regression purposes.

...read moreread less

52 citations

Journal Article•DOI•

The Structure and Content of Social Attitude Referents: A Preliminary Study:

[...]

Fred N. Kerlinger¹•Institutions (1)

New York University¹

01 Oct 1972-Educational and Psychological Measurement

TL;DR: Kerlinger as mentioned in this paper explored the measurement of social attitudes by using attitude referents (objects) as stimuli; assess the psychometric properties of an attitude scale constructed with referent, and test aspects of a structural theory of attitudes.

...read moreread less

Abstract: THIS study had four purposes: to explore the measurement of social attitudes by using attitude referents (objects) as stimuli; to assess the psychometric properties of an attitude scale constructed with referents; to study the firstand second-order factor structures of attitude referents; and to test aspects of a structural theory of attitudes (Kerlinger, 1967a). The usual approach to the measurement of attitudes is to use statements or propositions that presumably reflect the attitudes. With the exception of work using the semantic differential (e.g., Osgood, Suci, and Tannenbaum, 1957, pp. 104-116, 171-176, 192-195; Triandis and Davis, 1965), which ordinarily concentrates on the meaning of attitude concepts in semantic space and studies only a few concepts at a time, and

...read moreread less

52 citations

Journal Article•DOI•

Personality Types among Negro College Students

[...]

Nissim Levy¹, Clennie Murphy¹, Rae Carlson²•Institutions (2)

Howard University¹, National Institutes of Health²

01 Oct 1972-Educational and Psychological Measurement

TL;DR: The authors found that Motivational, cognitive, and affective variables appear to be patterned differently in studies of performance, of sex-identity, and of self-conceptualization in different ethnic groups.

...read moreread less

Abstract: literature reveals a sharp contemporary interest in studying Negrowhite personality differences (Pettigrew, 1964; Deutsch, Katz, and Jensen, 1968; Dreger and Miller, 1968). As such inquiry develops, there is mounting evidence that generalizations derived from studies of white samples do not hold up with Negro subjects (e.g., Carlson and Levy, 1970; Gurin, Gurin, Lao and Beattie, 1969; Hedegard and Brown, 1969; Lott and Lott, 1963). Motivational, cognitive, and affective variables appear to be patterned differently in studies of performance, of sex-identity, and of self-conceptualization in different ethnic groups. Moreover, the very complexity of the emerging patterns suggests that dimensional approaches in personality assessment are not fully adequate tools for capturing ethnic differences in personality, and that more complex, typological approaches may be more useful. Among the available typologies, that of Jung (1923) appears especially promising as a conceptual framework capable of representing the organization of cognitive, affective, and temperamental qualities within the individual. Although Jungian theory has not been influential in American academic psychology (beyond a simple and somewhat misleading adaptation of &dquo;extraversion&dquo; and

...read moreread less

50 citations

Journal Article•DOI•

Evaluating the Teaching of Intelligence1

[...]

Paul I. Jacobs, Mary Vandeventer

01 Jul 1972-Educational and Psychological Measurement

TL;DR: For instance, this paper argued that if environment is important, then it should be possible to create environments that foster intellectual growth, and that more care must be taken with the definition of intelligence.

...read moreread less

Abstract: FOR many years psychologists have been intrigued by the question of the relative importance of heredity and environment in determining differences in intelligence. Their major research method has been the correlation of intelligence test scores of people with varying degrees of genetic similarity. Thus, for example, they have shown that heredity is important because monozygotic twins, who are genetically identical, are more alike in their intelligence test scores than are dizygotic twins, whose degree of genetic similarity is only that of ordinary siblings. The users of this research method have generally been content to define intelligence as &dquo;the ability measured by intelligence tests.&dquo; Recently some psychologists have taken a different approach to the heredity-environment question. They argue that if environment is important, then it should be possible to create environments that foster intellectual growth. With this approach more care must be taken with the definition of intelligence. In

...read moreread less

49 citations

Journal Article•DOI•

Three-Choice Versus Four-Choice Items: Implications for Reliability and Validity Of Objective Achievement Tests

[...]

Frank Costin¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

21 Dec 1972-Educational and Psychological Measurement

TL;DR: Costin this paper found that mean discrimination indices and estimates of homogeneity were a little higher for three-choice items than for fourchoice items, and suggested that teachers in the natural and social-behavioral sciences who employ four-choice tests would find it profitable to shift to threechoice items; they could then increase the efficiency with which they covered course content without reducing the homogeneity and discriminating power of their tests and at the same time make the task of test construction less arduous and timeconsuming.

...read moreread less

Abstract: IN measuring psychology students’ knowledge of empirical generalizations, Costin (1970) found that mean discrimination indices and estimates of homogeneity mere slightly higher for t.hreechoice items than for four-choice items. (“Discrimination” was estimated with the D-index [Findley, 19561 and homogeneity with the Kuder-Richardson Formula 20.) I n view of these results and their relationship to previous investigations concerning the number of alternatives in objective achievement tests, Costin (1970) suggested that teachers in the natural and social-behavioral sciences who employ four-choice items mould find it profitable to shift to three-choice items; they could then increase the efficiency with which they covered course content without reducing the homogeneity and discriminating power of their tests and at the same time make the task of test construction less arduous and timeconsuming. However, the findings of the study were based on a relatively restricted sample: 200 students in introductory psychology classes at Chanute Air Force Base. As a check on the results, and as a basis for wider generalization, the study was extended to a large introductory course a t the Urbana-Champaign campus of the University of Illinois. This course included a broad spectrum of the student body, since it is required in many curricula, and also is’ widely sought to fulfill general education requirements.

...read moreread less

46 citations

Journal Article•DOI•

An Analysis of Two Response Styles: True Responding and Item Endorsement 1

[...]

Martin E. Morf¹, Douglas N. Jackson¹•Institutions (1)

University of Western Ontario¹

01 Jul 1972-Educational and Psychological Measurement

TL;DR: The authors found that a true or false response to an item like "I usually help old ladies across the street" is determined by the substance or content of the item, as well as by aspects of its form, such as ambiguity and saliency, positive as opposed to negative wording, and desirability scale value.

...read moreread less

Abstract: A true or false response to an item like &dquo;I usually help old ladies across the street,&dquo; is clearly the result of a variety of determinants. In this case, the respondent may in fact usually help old ladies across the street, and be high on a nurturance dimension. However, he may also tend (a) to manifest a general tendency to respond true to test items, (b) to endorse test items as descriptive of himself, and (c) to respond consistently in a desirable or an undesirable direction. In other words, his response may be determined by the substance or content of the item, as well as by aspects of its form, such as ambiguity and salience, positive as opposed to negative wording, and desirability scale value. The effects of such aspects of item form, interacting with subject characteristics and manifesting themselves as response styles (Jackson and Messick, 1958, 1962), were the primary focus of the present investigation.

...read moreread less

Journal Article•DOI•

Weighted Chi Square: an Extension of the Kappa Method:

[...]

Jacob Cohen¹•Institutions (1)

New York University¹

01 Apr 1972-Educational and Psychological Measurement

TL;DR: In this article, a general method for the study of mway tables of proportions or frequencies is presented, in which the investigator's a priori hypotheses about the cells are expressed numerically and used as weights.

...read moreread less

Abstract: IN the usual analysis of contingency tables by means of x2, it is frequently the case that the investigator has a priori expectations that a specified few of the cells will have larger than chance proportions (or frequencies), while he has no particular hypotheses with regard to the remaining cells. Or, if his theory is stronger, he may be able to articulate his hypotheses about the outcome for each cell with greater refinement, distinguishing chance and larger and smaller than chance expectations in varying degrees. In either case, neither the measure of association he computes (if any) nor the x2 test he performs takes into account in any way his a priori hypotheses about the contingency table. They merely index the degree and significance of the collective departure from chance, in all directions and degrees indiscriminately, of the values he observes in the cells when he has collected and organized his data. This article presents a very general method for the study of mway tables of proportions or frequencies (where m is one or more) in which the investigator’s a priori hypotheses about the cells are expressed numerically and used as weights. These weights are then used in ~ an index of hypothesized association, and also in a test of its significance, weighted x2 (xw2), which thus utilizes as relevant

...read moreread less

Journal Article•DOI•

A Factor Analysis of the EPPS and PRF Personality Inventories

[...]

Allen L. Edwards¹, Robert D. Abbott¹, Alan J. Klockars¹•Institutions (1)

University of Washington¹

01 Apr 1972-Educational and Psychological Measurement

TL;DR: Two multi-scale personality inventories have followed Murray's need structure theory of personality in the formulation of their scale definitions and item domains as mentioned in this paper, including the Edwards Personal Preference Schedule (EPPS) which measures the strength of 15 needs: Achievement (ach), Deference (def ), Order (ord), Exhibition (exh), Autonomy (aut), Affiliation (ajy), Intraception (int), Succorance (suc), Dominance (dom), Abasement (aba), Nurturance (nur), Change (chg), Endurance

...read moreread less

Abstract: Two multi-scale personality inventories have followed Murray’s need structure theory of personality in the formulation of their scale definitions and item domains. Edwards (1957a), on the basis of the list of manifest needs presented by Murray and others (1938), developed the Edwards Personal Preference Schedule (EPPS) which measures the strength of 15 needs: Achievement (ach), Deference (def ), Order (ord), Exhibition (exh), Autonomy (aut), Affiliation (ajy), Intraception (int), Succorance (suc), Dominance (dom), Abasement (aba), Nurturance (nur), Change (chg), Endurance (end), Heterosexuality (het), and Aggression (agg). The EPPS uses a forced-choice item format in which two state-

...read moreread less

Journal Article•DOI•

English Proficiency, Verbal Aptitude, and Foreign Student Success in American Graduate Schools

[...]

Amiel T. Sharon

01 Jul 1972-Educational and Psychological Measurement

TL;DR: In this paper, the admission of foreign students to graduate study in the United States is a complex problem, as foreign students often lack proficiency in the English language and have different language and cultural backgrounds.

...read moreread less

Abstract: THE admission of foreign students to graduate study in the United States is a complex problem. Unlike their American counterparts, foreign students often lack proficiency in the English language and have different language and cultural backgrounds. Furthermore, undergraduate record, which generally has been found to be the best predictor of graduate school success, is difficult to evaluate for the foreign student. The lack of comparability in the grading systems of universities in different countries makes it impossible to employ the prediction approach used with American students. The appraisal of the foreign candidate’s aptitude for graduate study by standardized admissions tests also has pitfalls. Poor performance may be due to factors not directly related to aptitude for graduate study. For example, the nonnative examinee may lack adequate English proficiency to understand the test questions or he may not be familiar with the philosophy or method of American objective tests. Competence in the English language is one factor which has

...read moreread less

Journal Article•DOI•

Expected Grade in a Course, Grade Point Average, and Student Ratings of the Course and the Instructor.

[...]

R. Barker Bausell¹, Jon Magoon¹•Institutions (1)

University of Delaware¹

21 Dec 1972-Educational and Psychological Measurement

TL;DR: This paper found no relationship between grade received and student rating of instructors, and no relationship even when controlling for scholastic performance. But they did not find any relationship between student performance in a course and instructor's academic ability.

...read moreread less

Abstract: STUDENT-INITIATED course evaluations have become institutional fixtures on a large number of college and university campuses in recent years. As long as the results from these evaluations are used solely by students for their avoved selective and expressive purposes, then, like the results of a public election, they are as valid as they are representative of the opinions of the student population. Due to the increased use of the results from course evaluations by other constituents of the academic community, however, the question of validity is not so easily dismissed. Many faculty regard student evaluations of their courses as an indication of their teaching success, and may actually allow the results to shape their subsequent pedagogical behavior. There is reason to believe that administrators are increasingly using the results of course evaluations as an operational measure of teaching effectiveness (Pierrel, 1968) , usually one of several criteria for faculty promotion. In these instances, a determination of those factors which influence ratings, but are at odds with (or extraneous to) the instructor’s purposes, is crucial. Researchers have long suspected that a student’s performance in a course, as well as his general academic ability, may bias his rating of that course and its instructor. There is, however, a degree of ambiguity in the findings to date. Remmers (1930), one of the first to investigate the issue, found a point-biserial correlation of only 0.07 between grade received and student rating of instructors. He later found no relationship even when controlling for scholastic

...read moreread less

Journal Article•DOI•

A Validity Study of Scales to Measure Need Achievement, Need Affiliation, Impulsiveness, and Intellectuality.

[...]

Robert H. Frs¹, Alan B. Knox¹•Institutions (1)

Columbia University¹

01 Apr 1972-Educational and Psychological Measurement

TL;DR: The authors conducted a multivariate study of the factors that are associated with the non-college bound young adult's decision to enroll in adult education programs, and found that a substantial proportion of high school graduates enroll in these programs.

...read moreread less

Abstract: THE Center for Adult Education at Teachers College, Columbia University, under the direction of Alan B. Knox, has been conducting a multivariate study of the factors that are associated with the noncollege bound young adult’s decision to enroll in adult education programs. This group, ranging in age from approximately 16 to 25, comprises a substantial proportion of high school graduates. In an era of ever increasing technological and societal complexity, it will be necessary for these individuals to acquire post high school adult education. One of the purposes of the Young Adult Study was to

...read moreread less

Journal Article•DOI•

Correcting Correlations for Restrictions in Range Due to Selection on an Unmeasured Variable.

[...]

N. Dale Bryant¹, Sunanda Gokhale•Institutions (1)

Columbia University¹

01 Jul 1972-Educational and Psychological Measurement

TL;DR: The size of a correlation coefficient is dependent in part upon the variability of the measured values in the correlation sample, and any time that a sample is restricted in range on either or both of the measures, the correlations between those two measures will tend to be lowered as compared to the same correlation based upon a representative sample of the population.

...read moreread less

Abstract: THE size of a correlation coefficient is dependent in part upon the variability of the measured values in the correlation sample. Any time that a sample is restricted in range on either or both of the measures, the correlations between those two measures will tend to be lowered as compared to the same correlation based upon a representative sample of the population. If prediction within the restricted sample is the purpose of the correlation, then the obtained value is the meaningful and correct

...read moreread less

Journal Article•DOI•

Convergent and Discriminant Validation of the French and Guilford-Zimmerman Spatial Orientation and Spatial Visualization Factors:

[...]

Gary D. Borich¹, Patricia M. Bauman²•Institutions (2)

University of Texas at Austin¹, Indiana University²

02 Sep 1972-Educational and Psychological Measurement

Journal Article•DOI•

Reliability of College Grades and Grade Point Averages: Some Implications for Prediction of Academic Performance

[...]

Alfred F. Etaugh¹, Claire F. Etaugh¹, Donald E. Hurd¹•Institutions (1)

Bradley University¹

21 Dec 1972-Educational and Psychological Measurement

TL;DR: In this article, the authors found that combining cognitive predictor variables typically yield a multiple correlation with GPA ranging between.50 and.GO, with the addition of noncognitive variables generally producing only a minimal gain in predictability.

...read moreread less

Abstract: THE search for predictors of academic performance in college, usually defined in terms of grade point average (GPA), has met with limited success. Combinations of cognitive predictor variables typically yield a multiple correlation with GPA ranging between .50 and .GO, with the addition of noncognitive variables generally producing only a minimal gain in predictability (e.g., Chansky, 1965; Fishman, 1962). The low validity coefficients obtained are attributed by some writers (e.g., Chansky, 1964) to the presumed low reliability of the GPA criterion. Conclusions regarding low reliability of the GPA usually arc based on observations of diversity in grading practices, rather than on direct computation. The authors located only one attempt to determine the reliability of the GPA (Clark, 1950). Unfortunately, the study is of limited value, since only selected students and courses vere examined, and a possibly biased reliability estimate was used (Ebel, 1951). The equivalence between the analysis of variance (ANOVA) model and the standard reliability formulas, demonstrated by Hoyt (1941), permits the assessment of the reliability of grades and their averages in situations in which not all studcnts are rated by the same set of graders (Ebel, 1951; Winer, 1962), a situation for which the standard reliability formulas are not well-suited. The

...read moreread less

Journal Article•DOI•

A Multitrait-Multimethod Model for Studying Growth:

[...]

Charles E. Werts, Karl G. Jöreskog, Robert L. Linn

01 Oct 1972-Educational and Psychological Measurement

TL;DR: Werts and Linn as mentioned in this paper applied the logical structure of the Campbell and Fiske multitrait-multimethod approach to the problem of studying growth and its determinants.

...read moreread less

Abstract: EDRS Price MF-$0.65 HC-$3.29 Analysis of Covariance, Analysis of Variance, Correlation, *Factor Analysis, *Factor Structure, Hypothesis Testing, *Mathematical Models, *Mathematics, *Personal Growth, Statistics The logical structure of the Campbell and Fiske multitrait-multimethod approach is applied to the problem of studying growth and its determinants. The resulting model is a special case of Joreskog's general model for the analysis of covariance structures. The relationships of traditional psychometric formulations to this model are detailed. (Author) U S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE OFFICE DF EDUCATION THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT POINTS DF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY A MULTITRAIT-MULTIMETHOD MODEL FOR STUDYING GROWTH Charles E. Werts, Karl G. JOreskog and Robert L. Linn This Bulletin is a draft for interoffice circulation. Corrections and suggestions for revision are solicited. The Bulletin should not be cited as a reference without the specific permission of the authors. It is automatically superseded upon formal publication of the material. Educational Testing Service Princeton, New Jersey March 1971 RB -7i -1-

...read moreread less

Journal Article•DOI•

Test Reliability and the Kuder-Richardson Formulas: Derivation From Probability Theory

[...]

Donald W. Zimmerman¹•Institutions (1)

Carleton University¹

21 Dec 1972-Educational and Psychological Measurement

TL;DR: The Kuder-Richardson formulas have been used widely to estimate the reliability coefficient from a single administration of a test (Ihde r and Richardson, 1937).

...read moreread less

Abstract: THE Kuder-Richardson formulas, the K R 20 and the K R 21, have been used widely to estimate the reliability coefficient from a single administration of a test ( Ihde r and Richardson, 1937). The appeal of the idea of finding reliability from item statistics which are available after one testing, togcther with the computational simplicity of the formulas, probably accounts for their popularity. Although a great deal of attention has been devoted over a period of years to the estimation of reliability from item statistics (Jackson and Ferguson, 1941; Guttman, 1945; Gulliksen, 1950; Cronbach, 1951; Lyerly, 1958; Novick and Lewis, 1967, and others) , there are still gaps in the mathematical derivation of the IcuderRichardson results. The main purpose of this paper is to fill some of these gaps, using language consistent with modern probability theory (see, for example, Feller, 1968; Thomasian, 1969). This approach, i t is hoped, will also lead to a better understanding of conditions under which the formulas are applicable to test data. A test score is regarded as a sum of scores on items which are either “correct” or “incorrect.” Subtest scores which are continuous random variables can also be considered, and the structure of the model is essentially the same in that case. I n the present paper the former case is stressed, since we are concerned with another sort of generality, in which assumptions about independence of item scores (“experimental independence”) , correlations between certain scores, and random sampling procedures, are not made. Initially, there are no restrictions of this kind on the probability distributions of item

...read moreread less

Journal Article•DOI•

The Effect of Scoring Instructions and Degree of Speededness on the Validity and Reliability of Multiple-Choice Tests1:

[...]

Ross E. Traub¹, Ronald K. Hambleton²•Institutions (2)

Ontario Institute for Studies in Education¹, University of Massachusetts Amherst²

01 Oct 1972-Educational and Psychological Measurement

TL;DR: The authors compared the effects of two scoring instructions, one promising a small reward for omitted questions, the other threatening a small penalty for wrong answers, on the performance of a multiple-choice vocabulary test.

...read moreread less

Abstract: In a recent study, Traub, Hambleton, and Singh (1969) compared the effects of two scoring instructions-one promising a small reward for omitted questions, the other threatening a small penalty for wrong answers-on the performance of a multiple-choice vocabulary test. It was found that the reward instruction produced fewer incorrect answers, more omitted questions, and, with one qualification, higher reliability than the penalty instruction. The qualification was that the difference in reliability coefficients for the two instructions was significant at the .05 level when the performance

...read moreread less

Journal Article•DOI•

Answer Changing On Objective Tests: Some Implications for Test Validity

[...]

Stanley S. Jacobs¹•Institutions (1)

University of Pittsburgh¹

21 Dec 1972-Educational and Psychological Measurement

TL;DR: This paper found that better students gain more than poorer students when answers are changed, and that there is a relationship between total test scores and types of changes made by better students and poorer students.

...read moreread less

Abstract: THE reluctance of students to change their responses to objective test items may appear illogical to anyone who views the testtaking process in a simplistic manner. Since the objective in virtually all achievement testing situations is a maximum test score, it would certainly seem advisable to change one’s response(s) whenever the change(s) will contribute to that score. However, the decision as to when to change B response hinges on n highIy subjective “degree of belief” in the correctness of an item option. This belief, which probably is the result of a highly personal weighting of many factors, would probably shorn a great deal of variability across an apparently homogeneous group of subjects. Surprisingly, the research evidence on the question of answerchanging is scanty and often methodologically deficient. For instance, although the belief among students (and instructors) that “first impressions are best” seems widespread, apparently the only published deliberate survey of student opinions appeared over 40 years ago (Mathews, 1929). Although his data indicated that students felt answers should not be changed, an examination of their answer sheets revealed that answer-changing should be encouraged, since the typical result was an improvement in test score. A number of studies on the effects of answer-changing (Lehman, 1928; Jarrett, 1948; Reile and Briggs, 1952; Bath, 1967; Reiling and Taylor, 1972) have concluded that there is a relationship between total test scores and types of changes made. That is, better students gain more than poorer students when answers are changed. One must be aware of a possible tautology, however, since the strat-

...read moreread less

Journal Article•DOI•

Rating Scales as Measures of Clinical Judgment III: Judgments of the Self On Personality Inventory Scales and Direct Ratings:

[...]

James Bentley Taylor¹, Marianne Ptacek¹, Martha Carithers¹, Carol Griffin¹, Lolafaye Coyne¹ - Show less +1 more•Institutions (1)

Menninger Foundation¹

01 Oct 1972-Educational and Psychological Measurement

TL;DR: In this paper, a thermometer-like line is divided into 100 points, with discrete way points being anchored by specific examples, and the degree of agreement in rankings is evaluated statistically by Kendall's W (1962).

...read moreread less

Abstract: THE first paper in this series provided a background and rational for a new scale construction method: example-anchored scaling (Taylor, 1968a). Our initial application was with &dquo;clinical judgment,&dquo; in which one person rated another on a variable of clinical interest. The method was used to construct a particular kind of rating scale: a thermometer-like line divided into 100 points, with discrete way points being anchored by specific examples. Most of our reported applications have employed casehistory vignettes as anchors, but photographs have also been used, and other kinds of examples (drawings, test responses, attitude statements, etc.) are possible. Briefly stated, construction of an example-anchored scale begins by defining, as concisely and clearly as possible, the domain of behavior to be measured. A set of 30 or so examples (items) are accumulated so as to sample the full range of the domain. Five or six judges rank order the examples in terms of the defined content. The degree of order-i.e., the agreement in rankings-is evaluated statistically by Kendall’s W (1962). It has been shown (Taylor, 1968b) that this brief judgmental method

...read moreread less

Journal Article•DOI•

The Reliability of Differences Between Linear Regression Weights in Applied Differential Psychology

[...]

Frank L. Schmidt¹•Institutions (1)

Michigan State University¹

21 Dec 1972-Educational and Psychological Measurement

TL;DR: For example, this article showed that for certain combinations of N (sample size) and p (number of predictors) in applied differential psychology, simple unit weighting (summing of z scores of predictor) produces, on the average, a larger correlation in the long run, i.e., in the population, than 8.

...read moreread less

Abstract: IN many areas of psychology, education, sociology, economics and other behavioral and social sciences, a relatively common research design is one that calls for the prediction of the standing of a person or thing on one variable, often designated the criterion, from his or its standing on a number of other variables, often called the predictors. When the relationships in question are linear, least squared error multiple regression weights are most commonly used in weighting the predictors into a composite. Theso weights minimize the sum of the squared deviations of the observed from the predicted criterion scores (Anderson 1958). In practice, the sample regression weights (8) are often computed on relatively small samples, and as a result, are only rough approximations to the population regression weights @), which are, by definition, the most effective set of predictor weights possible. If applied to the entire population, 8 would produco some correlation, p@), and B would produce p@), the maximum correlation. Although p@) Tvill vary depending on tho chance differences between different fi, p ( 0 ) is a parameter and thus has only one value. A previous study (Schmidt, 1971) showed that, for certain combinations of N (sample size) and p (number of predictors) in applied differential psychology, simple unit weighting of predictors (summing of z scores of predictors) produces, on the average, a larger correlation in the long run, i.e., in the population, than 8. With small N and large p , these differences in predictive eficiency favoring simple unit predictor weights (I) over f i were large enough, in some cases, to be of practical significance in applied situations (e.g., .12-.13 correlation units).

...read moreread less

Journal Article•DOI•

Reducing Objections to Personality Inventories with Special Instructions

[...]

Alan M. Fink¹, James N. Butcher¹•Institutions (1)

University of Minnesota¹

01 Oct 1972-Educational and Psychological Measurement

Journal Article•DOI•

Overall Measures of Self-Actualization Derived from the Personal Orientation Inventory: a Replication and Refinement Study

[...]

Vernon J. Damm¹•Institutions (1)

University of Portland¹

01 Jul 1972-Educational and Psychological Measurement

TL;DR: In this article, the intercorrelations among selected subscales of the POI with the view of determining the feasibility of substituting one or two scales for the overall scale (composite of 12 scales) were investigated.

...read moreread less

Abstract: That study was replicated on samples from older populations, and comparisons between males and females were made. Thus, the purpose of this investigation was to report for new samples the intercorrelations among selected subscales of the POI with the view of determining the feasibility of substituting one or two scales for the overall scale (composite of 12 scales) of the POI. Method. Three additional populations sampled were: 205 male

...read moreread less

Journal Article•DOI•

Sequential Testing for Dichotomous Decisions

[...]

Robert L. Linn, Donald A. Rock, T. Anne Cleary¹•Institutions (1)

University of Wisconsin-Madison¹

01 Apr 1972-Educational and Psychological Measurement

TL;DR: In this article, two sequential testing procedures for dichotomous decisions were investigated using existing item response data of 4840 college students on three achievement tests, and the assignment rules for the sequential test procedures and for short conventional tests were developed using half the sample of students and cross-validated using the other half.

...read moreread less

Abstract: Two sequential testing procedures for dichotomous decisions were investigated using existing item response data of 4840 college students on three achievement tests. The assignment rules for the sequential test procedures and for short conventional tests were developed using half the sample of students and cross-validated using the other half. In general, the first type of sequential test required an average of approximately half as many items as the short conventional test for a given degree of accuracy. This is in close agreement with previous theoretical results. The second type of sequential test used knowledge of assignment on one dimension in making the assignment on the second method. The efficacy of this second method depends heavily on the relationship between the two dimensions.

...read moreread less

Journal Article•DOI•

The Use of Hierarchical Factor Analysis in the Determination of Corporate Image Dimensions

[...]

Darrell Roach¹, Robert J. Wherry²•Institutions (2)

Nationwide Mutual Insurance Company¹, Ohio State University²

01 Apr 1972-Educational and Psychological Measurement

Journal Article•DOI•

A Comparison of Analyses Using the First and Second Generation Little Jiffy's

[...]

George P. Hollenbeck

01 Apr 1972-Educational and Psychological Measurement

TL;DR: The second generation Little Jiffy (LJ-II) as discussed by the authors is a variant of the first generation LJ-I that uses Harris factors and avoids artifactual difficulty factors.

...read moreread less

Abstract: PRINCIPAL components analysis and varimax rotation of all components with eigenvalues greater than one (the Little Jiffy) has been perhaps the most widely used factor analytic procedure (Cronbach, 1970). It has served as a &dquo;workhorse,&dquo; especially in the beginning exploration of an unknown domain. Recently, Kaiser (1970) has proposed a second generation Little Jiffy (LJ-II) which he suggested has several advantages over LJ-I. LJ-II provides a measure of sampling adequacy of the variables under study; it uses &dquo;model-free&dquo; Harris factors; it, at least theoretically, avoids artifactual difficulty factors; it uses the &dquo;orthoblique&dquo; method for transforming factors, thus allowing correlated factors. The practitioner schooled in LJ-1 is reluctant, however, to give up that tool without some notion of how his obtained &dquo;order from

...read moreread less