scispace - formally typeset

Showing papers in "Educational and Psychological Measurement" in 1981


Journal Article
TL;DR: In this paper, the authors consider some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics, restricting their discussion to the descriptive characteristics of these statistics.
Abstract: This paper considers some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics. Discussion is restricted to the descriptive characteristics of these statist...

1,145 citations
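For reference, here is a minimal sketch of Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement estimated from each rater's marginal proportions. It is shown next to one kappa-like variant that instead fixes chance agreement at 1/k for k categories. The function names and the choice of this particular variant are illustrative assumptions; the sketch does not reproduce the specific statistics or recommendations of the paper.

```python
from collections import Counter


def observed_agreement(r1, r2):
    """Proportion of subjects that both raters placed in the same category."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Cohen's kappa: chance agreement estimated from each rater's marginals."""
    n = len(r1)
    p_o = observed_agreement(r1, r2)
    m1, m2 = Counter(r1), Counter(r2)
    p_e = sum(m1[c] * m2[c] for c in set(m1) | set(m2)) / n**2
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1


def kappa_fixed_chance(r1, r2, k):
    """Kappa-like index that assumes chance agreement of 1/k for k categories."""
    p_o = observed_agreement(r1, r2)
    return (p_o - 1 / k) / (1 - 1 / k)


rater_a = ["yes", "yes", "yes", "yes", "no", "no"]
rater_b = ["yes", "yes", "yes", "no", "yes", "no"]
print(round(cohen_kappa(rater_a, rater_b), 3))            # 0.25
print(round(kappa_fixed_chance(rater_a, rater_b, 2), 3))  # 0.333
```

With the same observed agreement (4 of 6), the two indices differ because Cohen's kappa credits the raters' shared tendency to say "yes" as chance agreement (p_e = 5/9 rather than 1/2).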


Journal Article
TL;DR: The Maslach Burnout Inventory (MBI) was designed to assess the frequency and intensity of perceived burnout among persons in the helping professions in general; this study examined the reliability of the MBI.
Abstract: The Maslach Burnout Inventory (MBI) was designed to assess the frequency and intensity of perceived burnout among persons in the helping professions in general. This study examined the reliability ...

388 citations


Journal Article
TL;DR: The Test of Logical Thinking (TOLT) was developed to measure five modes of formal reasoning, including controlling variables, proportional reasoning, combinatorial reasoning, and probabilistic reasoning.
Abstract: The paper describes the development of the Test of Logical Thinking (TOLT) to measure five modes of formal reasoning: controlling variables, proportional reasoning, combinatorial reasoning, probabi...

345 citations


Journal Article
TL;DR: To test whether subjects respond accurately to both positively and negatively worded items, this experiment used items from the LBDQ-XII Initiating Structure and Consideration subscales to create a written description of a fictitious manager and then asked participants to complete a questionnaire containing the twenty items.
Abstract: The prevailing conventional wisdom is that it is advisable to mix positively and negatively worded items in psychological measures to counteract acquiescence response bias. However, there has been virtually no unambiguous empirical evidence to support this recommendation. Thus, an experiment was conducted to evaluate the ability of subjects to respond accurately to both positive and reversed (negative) items on a questionnaire. Items from the LBDQ-XII Initiating Structure and Consideration subscales were used to create a written description of a fictitious manager. One hundred fifty subjects, all upper-division business undergraduates, were given the written managerial description and then asked to complete a questionnaire containing the twenty Initiating Structure and Consideration items. The managerial descriptions were in two forms (to portray high and low Initiating Structure), and the questionnaires contained items in three forms (all positively worded, all negatively worded, and mixed). The data wer...

324 citations


Journal Article
TL;DR: This study examined the effects of item presentation mode on the degree of leniency bias in responses to standard field research questionnaires, comparing two presentation modes.
Abstract: This study was concerned with the effects of item presentation mode on the degree of leniency bias inherent in responses to standard field research questionnaires. Two types of modes were examined ...

74 citations


Journal Article
TL;DR: This article presents a revised version of Kaiser's Measure of Sampling Adequacy (MSA) for factor-analytic data matrices.
Abstract: A revised version of Kaiser's Measure of Sampling Adequacy for factor-analytic data matrices is presented.

73 citations
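For orientation, the sketch below computes the commonly cited Kaiser-Meyer-Olkin form of the MSA: the sum of squared off-diagonal correlations divided by that same sum plus the sum of squared off-diagonal partial (anti-image) correlations, both overall and per variable. This familiar form is assumed here for illustration; it is not necessarily the revised version presented in this article, and the function names are hypothetical.

```python
import numpy as np


def _squared_offdiagonals(R):
    """Squared off-diagonal correlations and partial correlations of R."""
    R = np.asarray(R, dtype=float)
    S = np.linalg.inv(R)                  # R is assumed positive definite
    d = np.sqrt(np.diag(S))
    Q = -S / np.outer(d, d)               # partial correlation of each pair, given all others
    off = ~np.eye(R.shape[0], dtype=bool)
    return np.where(off, R**2, 0.0), np.where(off, Q**2, 0.0)


def kmo_msa(R):
    """Overall measure of sampling adequacy (values near 1 favour factoring)."""
    R2, Q2 = _squared_offdiagonals(R)
    return R2.sum() / (R2.sum() + Q2.sum())


def msa_per_variable(R):
    """MSA for each variable, using only the off-diagonal entries in its column."""
    R2, Q2 = _squared_offdiagonals(R)
    return R2.sum(axis=0) / (R2.sum(axis=0) + Q2.sum(axis=0))


R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)
print(kmo_msa(R), msa_per_variable(R))
```

With only two variables the partial correlation equals the zero-order correlation, so this form always returns 0.5; the index becomes informative only when there are several variables whose shared variance the partials can remove.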


Journal Article
TL;DR: Six criteria are suggested for examining the adequacy of items in personality and interest inventories that attribute a score profile, rather than a single score, to each subject: two relate to the response distribution on each item, two to the internal consistency of the items, and two to the discriminative value of the items.
Abstract: In many personality and interest inventories a score profile, rather than a single score, is attributed to each subject. Six applicable criteria are suggested for use in examining the adequacy of items in such inventories. Two of these criteria relate to the response distribution on each item, two to the internal consistency of the items and two to the discriminative value of the items.

65 citations


Journal Article
TL;DR: The form of the Johnson-Neyman region of significance is shown to be determined by the statistic for testing the null hypothesis that the population within-group regressions are parallel.
Abstract: The form of the Johnson-Neyman region of significance is shown to be determined by the statistic for testing the null hypothesis that the population within-group regressions are parallel. Results are obtained for both simultaneous and nonsimultaneous regions of significance.

60 citations
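The claim can be made concrete for two groups with ordinary least-squares lines. Let d(X) be the difference between the fitted lines at X and s²·v(X) its estimated variance. The nonsimultaneous region of significance is then the set of X where d(X)² - F·s²·v(X) > 0, which is a quadratic A·X² + B·X + C > 0. The leading coefficient A is positive exactly when the squared t statistic for parallel slopes exceeds F. This is one way to see why the parallelism test governs the shape of the region. The sketch below is our own two-group derivation, not the paper's general result. The function names are hypothetical, and the caller supplies the critical value: F(1, n1 + n2 - 4) for a nonsimultaneous region (a simultaneous region commonly uses 2·F(2, n1 + n2 - 4) instead).

```python
import numpy as np


def _group_fit(x, y):
    """Least-squares fit y = a + b*x within one group."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    ssx = np.sum((x - xbar) ** 2)
    b = np.sum((x - xbar) * (y - ybar)) / ssx
    a = ybar - b * xbar
    ssres = np.sum((y - a - b * x) ** 2)
    return len(x), xbar, ssx, a, b, ssres


def johnson_neyman(x1, y1, x2, y2, f_crit):
    """Coefficients of A*X**2 + B*X + C > 0, the region where the groups differ."""
    n1, m1, ss1, a1, b1, r1 = _group_fit(x1, y1)
    n2, m2, ss2, a2, b2, r2 = _group_fit(x2, y2)
    s2 = (r1 + r2) / (n1 + n2 - 4)        # pooled residual variance
    da, db, k = a1 - a2, b1 - b2, f_crit * s2
    A = db**2 - k * (1 / ss1 + 1 / ss2)
    B = 2 * da * db + 2 * k * (m1 / ss1 + m2 / ss2)
    C = da**2 - k * (1 / n1 + 1 / n2 + m1**2 / ss1 + m2**2 / ss2)
    # Squared t statistic for H0: the two within-group slopes are equal.
    t2_parallel = db**2 / (s2 * (1 / ss1 + 1 / ss2))
    disc = B**2 - 4 * A * C
    roots = ()
    if A != 0 and disc >= 0:
        roots = tuple(sorted(((-B - np.sqrt(disc)) / (2 * A),
                              (-B + np.sqrt(disc)) / (2 * A))))
    return {"A": A, "B": B, "C": C, "t2_parallel": t2_parallel, "roots": roots}


x = [1, 2, 3, 4, 5, 6]
res = johnson_neyman(x, [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
                     x, [3.0, 3.4, 4.1, 4.4, 5.1, 5.4],
                     f_crit=5.32)  # roughly F(1, 8) at alpha = .05
print(res["t2_parallel"] > 5.32, res["roots"])
```

When A > 0 (the slopes differ significantly), the region lies outside the two roots, or covers every X if there are no real roots. When A < 0, it lies between the roots, or is empty.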


Journal Article
TL;DR: The use of change scores is legitimate in certain circumstances, such as randomized pretest-posttest designs, and in at least two situations is preferable to the analysis of covariance, even though covariance analysis is generally more powerful.
Abstract: Psychologists have been warned repeatedly in recent years of the hazards of change scores. Unfortunately, these warnings seem to have created the belief among many researchers that the use of change scores is universally misleading and therefore should be avoided at all costs. However, the use of change scores is perfectly legitimate in certain circumstances and at times may even be preferable to other methods of analysis. The ANOVA of change scores is acceptable in randomized pretest-posttest designs, where it is equivalent to a repeated measures approach to the data. In addition, the unreliability of difference scores poses no problem here. Despite this, the analysis of covariance is generally preferred, because it is more powerful. However, in at least two situations the analysis of change scores is preferable to the analysis of covariance.

45 citations
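One way to see the relation between the two analyses: the group effect estimated from change scores is the covariance-adjusted effect with the pretest slope fixed at 1, whereas ANCOVA estimates that slope from the pooled within-group regression. The sketch below assumes two groups and uses hypothetical function names. It computes only the two point estimates and does not reproduce the significance tests or the two situations identified in the paper.

```python
import numpy as np


def gain_score_effect(pre_a, post_a, pre_b, post_b):
    """Group difference in mean change (post - pre)."""
    return (np.mean(post_a) - np.mean(pre_a)) - (np.mean(post_b) - np.mean(pre_b))


def pooled_within_slope(pre_a, post_a, pre_b, post_b):
    """Pooled within-group regression slope of posttest on pretest."""
    sxy = sxx = 0.0
    for pre, post in ((pre_a, post_a), (pre_b, post_b)):
        x, y = np.asarray(pre, dtype=float), np.asarray(post, dtype=float)
        sxy += np.sum((x - x.mean()) * (y - y.mean()))
        sxx += np.sum((x - x.mean()) ** 2)
    return sxy / sxx


def ancova_effect(pre_a, post_a, pre_b, post_b):
    """Posttest difference adjusted by the pooled slope times the pretest difference."""
    b = pooled_within_slope(pre_a, post_a, pre_b, post_b)
    return (np.mean(post_a) - np.mean(post_b)) - b * (np.mean(pre_a) - np.mean(pre_b))


pre_a, post_a = [1, 2, 3], [2, 4, 5]
pre_b, post_b = [1, 2, 3], [1, 2, 4]
print(gain_score_effect(pre_a, post_a, pre_b, post_b),
      ancova_effect(pre_a, post_a, pre_b, post_b))
```

In this made-up example the pretest means are equal, so both estimates are 4/3 even though the pooled slope is 1.5. Once pretest means differ, the two estimates diverge unless the pooled slope happens to be 1.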


Journal Article
TL;DR: The first eigenvalue of a correlation matrix indicates the maximum amount of variance of the variables that a single underlying factor can account for in a linear model; when all correlations are positive, this first eigenvalue is approximately a linear function of the average correlation among the variables.
Abstract: The first eigenvalue of a correlation matrix indicates the maximum amount of the variance of the variables which can be accounted for with a linear model by a single underlying factor. When all correlations are positive, this first eigenvalue is approximately a linear function of the average correlation among the variables. While that is not true when not all the correlations are positive, in the general case the first eigenvalue is approximately equal to a lower bound derived in the paper. That lower bound is based on the maximum average correlation over reversals of variables and over subsets of the variables. Regression tests show these linear approximations are very accurate. The first eigenvalue measures the primary cluster in the matrix, its number of variables and average correlation.

43 citations
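Two facts behind this abstract can be checked numerically. First, for an equicorrelation matrix of p variables with common correlation r, the first eigenvalue is exactly 1 + (p - 1)r, a linear function of the average correlation. Second, for any vector s with entries +1 or -1 on a subset of m variables and 0 elsewhere, the Rayleigh quotient s'Rs/m equals 1 + (m - 1) times the mean signed correlation within that subset, so it can never exceed the first eigenvalue. The sketch below brute-forces that bound over subsets and reversals. It illustrates the idea only; it is not claimed to be the exact lower bound or the regression approximation derived in the paper, and the function names are hypothetical.

```python
import itertools

import numpy as np


def first_eigenvalue(R):
    """Largest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(np.asarray(R, dtype=float))[-1])


def average_correlation_bound(R):
    """Max over subsets and sign reversals of 1 + (m - 1) * mean signed correlation.

    Each candidate equals s'Rs / m for a +1/-1/0 vector s with m nonzero
    entries. Brute force over 3**p vectors, so suitable for small p only.
    """
    R = np.asarray(R, dtype=float)
    best = 1.0  # any single variable gives s'Rs / 1 = 1
    for signs in itertools.product((0, 1, -1), repeat=R.shape[0]):
        s = np.array(signs, dtype=float)
        m = np.count_nonzero(s)
        if m:
            best = max(best, float(s @ R @ s) / m)
    return best


R = np.full((4, 4), 0.3)
np.fill_diagonal(R, 1.0)
print(first_eigenvalue(R), average_correlation_bound(R))  # both about 1.9
```

Reversing a variable flips the sign of its correlations. This is why a matrix with mixed signs can still have a large first eigenvalue once the offending variables are reflected.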


Journal Article
TL;DR: This paper describes a measure of young children's generalized tendency to expect positive or negative outcomes (the Optimism-Pessimism Test Instrument, OPTI) and provides evidence of its reliability for first- and second-grade children.
Abstract: A measure of young children's generalized tendency to expect positive or negative outcomes (Optimism-Pessimism Test Instrument: OPTI) is described. Descriptive data and evidence for the measure's reliability for first- and second-grade children are provided. Validity is assessed by the measure's relationship to several other measures of personality constructs. Moderate but significant correlations were found between OPTI and attitude toward school, self-concept, delay of gratification, and locus of control. These significant correlations suggest that the meaning of such personality dimensions may be clarified by further examination of their relationship to a generalized expectancy for positive or negative outcomes. The OPTI measure could be a useful research tool in such investigations.

Journal Article
TL;DR: This article presents a factor analysis of the 1962 revision of the Advanced Progressive Matrices (APM), conducted so that substantive interpretations of the factor structure were freed of the effects of differences in item difficulty.
Abstract: The study presents a factor analysis of the 1962 revision of the Advanced Progressive Matrices (APM). The analysis was conducted such that substantive factor structure interpretations were freed of the effects of differences in item difficulty. The APM test was given to 237 examinees, 16-18 years old. The data were subjected to a Guttman scale analysis to determine whether the APM could be interpreted as a one factor instrument. Then the phi/phi max inter-item correlation matrix was factored. A principal components analysis, followed by a series of varimax rotations of the principal components, was performed. The Guttman coefficients of scalability were too small to support a one factor theory of the APM. The 2-factor solution provided the most interpretable factor structure. Factor I was composed of items in which the solution was obtained by adding or subtracting patterns. Factor II was composed of items in which the solution was based on the ability to perceive the progression of a pattern. Results are discussed in terms of representative cognitive tests and tasks believed to embody the logical operations responsible for successful performance on items loading on each factor. The possibility of forming subtests of items to enhance the predictive validity of the matrices also is discussed.
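The phi/phi-max index rescales each inter-item phi coefficient by the largest positive phi that the two items' proportions correct allow. This is one standard way of reducing the influence of item difficulty on inter-item correlations. For proportions correct p_lo ≤ p_hi, that maximum is sqrt(p_lo(1 - p_hi) / (p_hi(1 - p_lo))). The minimal sketch below assumes 0/1 item scoring, and its function name is hypothetical.

```python
import numpy as np


def phi_over_phi_max(a, b):
    """phi / phi_max for two 0/1 item-score vectors (1 = correct)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pa, pb = a.mean(), b.mean()
    if not (0 < pa < 1 and 0 < pb < 1):
        raise ValueError("each item needs both correct and incorrect responses")
    p11 = np.mean(a * b)                      # proportion correct on both items
    phi = (p11 - pa * pb) / np.sqrt(pa * (1 - pa) * pb * (1 - pb))
    lo, hi = sorted((pa, pb))                 # lo = proportion correct on the harder item
    phi_max = np.sqrt(lo * (1 - hi) / (hi * (1 - lo)))
    return phi / phi_max


# Everyone who passes the harder item also passes the easier one.
print(phi_over_phi_max([1, 1, 0, 0, 0], [1, 1, 1, 0, 0]))  # ≈ 1.0
```

Here the raw phi is only about 0.67 because the items differ in difficulty, yet the response pattern is perfectly consistent, which the ratio recovers as 1.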

Journal Article
Abstract: Using male/female high school seniors and college students (freshmen through seniors), a 40-item, forced-choice, easily scored, group-administered, objective instrument (DISI-O) was developed, corr...

Journal Article
TL;DR: Factor analysis of student responses to the Test of Science Related Attitudes (TOSRA) suggested that its seven subscales were not operationally distinct, whereas teachers' categorizations of the items suggested clear conceptual distinctions among them.
Abstract: The 70 items of the Test of Science Related Attitudes (TOSRA) were given to 1041 students who responded directly to them as attitude test items and to 39 teachers who were asked to assign the items to categories, without directions about either the number of categories to be used or the criteria to be employed in establishing them. Factor analysis of the item correlation matrix obtained from the student responses suggested that the seven subscales of the test were not distinct. Factor analysis of a joint proportions matrix for the items, obtained from the teachers' categorizations of the items, suggested that there are clear conceptual distinctions among the seven subscales even if not operational distinctions.

Journal Article
TL;DR: The Fennema-Sherman Mathematics Attitudes Scales were administered to 1541 junior high school students, and factor analyses of the responses to the 108 items comprising the scales were used to investigate the scales' construct validity.
Abstract: The Fennema-Sherman Mathematics Attitudes Scales were administered to 1541 junior high school students. To investigate the construct validity of the scales, factor analyses were performed on the responses to the 108 items comprising the scales. Additional factor analyses of the scale scores were performed, and a comparison was made with results reported by Fennema and Sherman (1976). Analysis of the items led to the interpretation of eight factors. The results generally provided empirical evidence to support the theoretical structure of the Fennema-Sherman Mathematics Attitudes Scales.

Journal Article
TL;DR: This study assessed the construct validity of the Myers-Briggs Type Indicator (MBTI; Myers, 1976), on the rationale that friends or relatives can make judgments about an individual that will be associated with his or her predominant personality type.
Abstract: This study assessed the construct validity of the Myers-Briggs Type Indicator, MBTI (Myers, 1976). The rationale was that friends or relatives can make judgments about an individual which will be associated with his/her predominant personality type. Forty-eight subjects rated themselves on two seven-point Likert scales designed to assess behavioral styles. These inventories (designated "Behavioral Style Inventories") were designed for this study based on the operational definitions of type outlined by Myers in the manual to the MBTI. One inventory assessed perceptions held by subjects of themselves (Form S); the other was a measure of perceptions of their ideal selves (Form I). Forty-five of the subjects were also independently rated by their spouses who used a similar form of the inventory (Form M). All subjects then took the MBTI. Scores were converted to type categories and compared by using the coefficient of agreement for nominal data, Kappa. Self-typing on Form S of the Behavioral Styles Inventory ...

Journal Article
TL;DR: The validity of the Watson-Glaser Critical Thinking Appraisal (WGCTA) was examined in an academic setting emphasizing critical thinking, with an accelerated group of students pursuing a combined baccalaureate and professional program in medicine.
Abstract: The validity of the Watson-Glaser Critical Thinking Appraisal (WGCTA) measure (Watson and Glaser, 1964) was examined in an academic setting emphasizing the use of critical thinking with an accelerated group of students who were pursuing a combined baccalaureate and professional program of medicine. Using a criterion of grades from a specially designed physics course intended to stress critical thinking, validity coefficients of .54 (p < .00002) and of .45 (p < .0007), respectively, were observed for each of two predictors: (a) total scores on the College Board Scholastic Aptitude Test (SAT) and (b) total scores on the WGCTA measure. The WGCTA did not add much to the total prediction and should probably not be used as a substitute for regular entrance examinations in a college setting. However, the fact that it significantly predicted standing in a specialized course at the college level lends some credence to its criterion-related validity.

Journal Article
Abstract: The predictive validity of each of several mathematics and language variables relative to criteria of grade point average (GPA) and word problem-solving skills for a group of 60 undergraduate Hispa...

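Journal Article
TL;DR: The Locke-Wallace Marital Adjustment Test (LWMAT) continues to be used by researchers to classify married couples as high or low in adjustment; data from a representative sample suggest that many of its items are unnecessary and that it taps a social-expectancy factor in addition to adjustment.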
Abstract: The Locke-Wallace Marital Adjustment Test (LWMAT) continues to be used by researchers seeking to classify married couples as high or low in level of adjustment. Although it has been over twenty years since the emergence of this instrument, little subsequent evaluation of it has appeared in the literature. An examination of data from a representative sample led to three conclusions: (1) many of the original items are not necessary; (2) there is one underlying factor of “adjustment” in the test; and (3) the instrument also appears to be tapping a second factor related to social expectancy. Suggestions are made for alternative measures in research.

Journal Article
Abstract: A random sample of 2626 Canadian professional accountants was mailed a questionnaire containing the Self-Directed Search instrument (SDS) developed by Holland. The 1206 completed questionnaires wer...

Journal Article
TL;DR: Investigations of the Inventory of Learning Processes (ILP) with Australian and Filipino college freshmen found satisfactory internal consistency for two of the four ILP scales but cast doubt on the inventory's factor structure; the ILP scales nevertheless proved to be useful predictors of academic achievement for the Australian students.
Abstract: Investigations of the Inventory of Learning Processes with 255 Australian and 173 Filipino college freshmen revealed satisfactory internal consistency coefficients for two of the four ILP scales but cast doubt on the factor structure of the inventory. Yet, the ILP scales proved to be useful predictors of academic achievement for the Australian students.

Journal Article
TL;DR: Cross-lagged panel correlations for 2,429 students in the first eight grades weakly suggested that attitudes toward mathematics are causally predominant over mathematics achievement for the variance the two share; the correlations, generally positive and low, were strongest within the fourth through seventh grades.
Abstract: Results of the differences between cross-lagged panel correlations and of the consistency of them for 2,429 students within the first eight grades weakly suggested that attitudes toward mathematics are causally predominant over mathematics achievement for the variance the two share. The correlations, which were generally positive and low, were strongest within the fourth through the seventh grades.
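In cross-lagged panel logic, with attitude (A) and achievement (B) measured at times 1 and 2, a difference r(A1, B2) - r(B1, A2) > 0 is read as weak evidence that A is causally predominant over B. The minimal sketch below computes the two cross-lagged correlations and their difference. The variable names are hypothetical, the data are simulated rather than taken from the study, and no significance test of the difference is attempted.

```python
import numpy as np


def cross_lagged_difference(att_t1, ach_t1, att_t2, ach_t2):
    """Return r(att1, ach2), r(ach1, att2) and their difference."""
    def r(u, v):
        return float(np.corrcoef(u, v)[0, 1])

    r_att_to_ach = r(att_t1, ach_t2)
    r_ach_to_att = r(ach_t1, att_t2)
    return r_att_to_ach, r_ach_to_att, r_att_to_ach - r_ach_to_att


# Simulated panel in which time-1 attitude feeds time-2 achievement, but not vice versa.
rng = np.random.default_rng(1)
n = 500
att1, ach1 = rng.standard_normal(n), rng.standard_normal(n)
att2 = att1 + 0.5 * rng.standard_normal(n)
ach2 = 0.6 * att1 + rng.standard_normal(n)
r12, r21, diff = cross_lagged_difference(att1, ach1, att2, ach2)
print(round(r12, 2), round(r21, 2), round(diff, 2))
```

The comparison is only suggestive: differing stabilities or reliabilities of the two variables can also produce unequal cross-lagged correlations, which is one reason such results are usually described as weak evidence.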

Journal Article
TL;DR: This investigation examined the construct validity of the Piers-Harris Children's Self Concept Scale (P-H); the results indicated that the P-H demonstrates both convergent and discriminant validity as a measure of a relatively stable and internally consistent construct.
Abstract: An investigation was conducted to determine the construct validity of the Piers-Harris Children's Self Concept Scale (P-H). Evidence for the validity of the instrument was analyzed according to the model for construct validation proposed by Sabers and Whitney. The main parameters of the model include the convergent validity, discriminant validity, internal consistency, and stability of the measure. Results indicated that the P-H demonstrates both convergent and discriminant validity in an assessment of a relatively stable and internally consistent construct.

Journal Article
TL;DR: For 71 full-time community college students, this study examined the relationship between GPA earned during one academic quarter and several predictors: an ability measure (the academic composite of the ACT Assessment, or SAT scores converted to that composite), an expectancy measure (anticipated GPA), and measures of valence and instrumentality constructs.
Abstract: For a sample of 71 full-time students attending a large community college in the San Francisco Bay area, the objective of this investigation was to examine the degree of relationship between GPA earned during one academic quarter and: (a) an ability measure defined as scores on the composite of four academic tests in the ACT Assessment (American College Testing Program, 1959–1980) or scores on the College Board Scholastic Aptitude Test (Educational Testing Service, 1948–1980) that had been converted to scores on the academic composite of the ACT Assessment; (b) standing on an expectancy measure defined as a student's anticipated GPA; (c) scores on each of six measures representing a valence construct that indicates the relative desirability of long-range student goals; (d) scores on each of six measures portraying an instrumentality construct reflecting the facilitative effect of college grades (a short-term goal) as perceived by students in realizing long-term goals (such as vocational success or endurin...

Journal Article
TL;DR: A survey devised by Marshall (1978) to measure students' perceptions of their medical school was completed by undergraduates at two schools with radically different approaches to medical education; the survey proved internally consistent and exhibited strong validity for a construct of school learning environment.
Abstract: A survey devised by Marshall (1978) to measure students' perceptions of their medical school was completed by undergraduates at two schools with radically different approaches to medical education. The survey proved to be reliable in terms of its internal consistency, and exhibited strong validity for a construct of school learning environment. Differences found between the two schools were not specific to medical education, which raised the prospect of the survey's extension to similar evaluations in any other educational institution.

Journal Article
TL;DR: This study reanalyzed data from two earlier investigations to determine whether the impaired discriminant validity of grouped questionnaire items might be due to spurious correlations attributable to leniency response bias.
Abstract: Two previous investigations revealed that grouping (rather than randomizing) questionnaire items measuring similar constructs (in subsections) resulted in impaired discriminant validity (Schriesheim and DeNisi, 1980) and that grouping also strengthened the impact of leniency response bias (Schriesheim, 1981). This study reanalyzed the data of the two earlier investigations to determine whether the impairment of the discriminant validity of the grouped questionnaire items might be due to spurious correlations attributable to leniency. The responses of thirty discount store employees to a questionnaire containing grouped items measuring leniency in leader behavior descriptions (Schriesheim, 1980) and four similar constructs from the Leader Behavior Description Questionnaire (Stogdill, 1963) and Four-Factor Theory Questionnaire (Taylor and Bowers, 1972) were examined for convergent and discriminant validity by using a traditional zero-order multitrait-multimethod correlation matrix analysis (Campbell and Fi...

Journal Article
TL;DR: This article describes a computer program for determining whether interrater reliability estimates based on categorical data are similar across pairs of observers who have all independently evaluated the same sample of subjects.
Abstract: A problem that is of interest to research investigators is the extent to which interrater reliability estimates, based upon categorical data, are similar between pairs of observers all of whom have independently evaluated the same sample of subjects. The problem can be resolved statistically by the computer program described herein.

Journal Article
TL;DR: This study compared the effectiveness of the California Achievement Tests (CAT), the ACT Assessment (Academic Tests) of the American College Testing Program (ACT), the College Board Scholastic Aptitude Test (SAT), and high school grade point average (GPA) in predicting college freshman GPA.
Abstract: Studied was the relative effectiveness of the California Achievement Tests (CAT), the ACT Assessment (Academic Tests) of the American College Testing Program (ACT), the College Board Scholastic Aptitude Test (SAT), and high school grade point average (GPA) in predicting college freshman GPA. The incremental and differential incremental effectiveness of the CAT, ACT, and SAT in addition to high school GPA were also studied. Although high school GPA was the best single predictor, the CAT was as effective a predictor as was the ACT or the SAT. Use of either the ACT, SAT, or CAT resulted in an 18.47% increase in predictive efficiency over that obtained by using high school GPA alone. As the increase in predictive efficiency was very nearly the same (within rounding error) for the three tests (ACT, SAT, and CAT), they failed to demonstrate differential incremental validity.

Journal Article
TL;DR: Two procedures for scoring the Recreation Experience Preference scales were investigated using data from respondents engaged in outdoor recreational activities in Pennsylvania.
Abstract: Two procedures for scoring the Recreation Experience Preference scales were investigated using data obtained from respondents engaged in outdoor recreational activities in Pennsylvania (N = 46

Journal Article
TL;DR: One of three parallel certifying examinations was administered to three classes of second-year medical students, and the results indicated that the standard-setting procedure is a significant factor in determining the evaluation outcome.
Abstract: In criterion-referenced evaluation, several procedures for setting absolute standards of performance are available. However, it is not known if a given standard-setting procedure will yield consist...