Showing papers in "Educational and Psychological Measurement in 1985"


Journal ArticleDOI
TL;DR: In this paper, three numerical coefficients (V, R, and H) for analyzing the validity and reliability of ratings are described; each ranges in value from 0 to 1 and is computed as the ratio of an o...
Abstract: Three numerical coefficients (V, R, and H) for analyzing the validity and reliability of ratings are described. Each coefficient, which ranges in value from 0 to 1, is computed as the ratio of an o...

623 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe the development and validation of a new instrument entitled Attitudes Toward Statistics (ATS) to be used in the measurement of attitude change in introductory statistics students.
Abstract: This study describes the development and validation of a new instrument entitled Attitudes Toward Statistics (ATS) to be used in the measurement of attitude change in introductory statistics students. Two ATS subscales are identified: Attitude Toward Course and Attitude Toward the Field, respectively. These subscales were demonstrated to have both high internal consistency and test-retest reliability. It is further shown that each ATS subscale provides distinctly different information about the attitudes of introductory statistics students.

266 citations


Journal ArticleDOI
TL;DR: In this paper, a study was conducted to evaluate teacher attitudes toward computer use which may affect the success of computer-related programs in school curricula, and the authors found that the Computer Attitude Scale and its four subscales (Computer Anxiety, Computer Confidence, Computer Liking, and Computer Usefulness) were reliable in measuring teachers' attitudes toward computers.
Abstract: As computer-related programs are introduced into school curricula, it is helpful to evaluate teacher attitudes toward computer use which may affect the success of such programs. Involving 114 teachers enrolled in microcomputer staff development courses, this study was concerned with the reliability, the factorial validity, and the differential validity of the Computer Attitude Scale and its four subscales (Computer Anxiety, Computer Confidence, Computer Liking, and Computer Usefulness). This instrument was found to be reliable in measuring teachers' attitudes toward computers and effective in differentiating among teachers with different amounts of computer experience.

245 citations


Journal ArticleDOI
TL;DR: In this paper, the authors use an actual study to give a heuristic demonstration that structure coefficients may help in interpreting multiple regression results when predictor variables are correlated, either unavoidably or due to conscious design choices.
Abstract: Multiple regression analysis is frequently and increasingly being employed in both experimental and non-experimental research. However, when data include predictor variables that are correlated, either unavoidably or due to conscious design choices, some regression results can become difficult to interpret. The paper presents an actual study to provide a heuristic demonstration that structure coefficients may be helpful in these cases.

228 citations


Journal ArticleDOI
TL;DR: Divergent thinking tests are probably the most commonly employed measures of creative potential and have demonstrated adequate psychometric properties with many populations.
Abstract: Divergent thinking tests are probably the most commonly employed measures of creative potential and have demonstrated adequate psychometric properties with many populations. Recently, however, a pa...

175 citations


Journal ArticleDOI
TL;DR: In this paper, Cohen's kappa coefficient is extended to measure agreement over time for continuous nominal scales, which avoids problems encountered by the arbitrary division of real time durations into presence/absence frequencies in discrete intervals.
Abstract: Cohen's kappa for measuring agreement between two observers using a discrete nominal scale is extended to measuring agreement over time for continuous nominal scales. The continuous kappa coefficient avoids problems encountered by the arbitrary division of real time durations into presence/absence frequencies in discrete intervals. The extension is simple but issues of independence and number of observations pose problems for significance testing.

61 citations
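
The discrete coefficient that the abstract above takes as its starting point can be sketched as follows. This is a minimal illustration of standard Cohen's kappa for two observers on a nominal scale. The continuous-time extension is not spelled out in the abstract and is not reproduced here; the function name `cohen_kappa` is illustrative only.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers coding the same units on a nominal scale."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two equally long, non-empty rating sequences")
    n = len(ratings_a)
    # Observed agreement: share of units that both observers coded identically.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the observers' marginal proportions,
    # summed over categories.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)
```

For example, `cohen_kappa(["x", "x", "y", "y"], ["x", "y", "y", "y"])` has observed agreement 0.75 and chance agreement 0.50.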


Journal ArticleDOI
TL;DR: For a sample of 462 elementary and junior high school classroom teachers, evidence was sought regarding the degree of relationship between each of six personal and life history variables (sex, age, etc.) as mentioned in this paper.
Abstract: For a sample of 462 elementary and junior high school classroom teachers, evidence was sought regarding the degree of relationship between each of six personal and life history variables—sex, age, ...

59 citations


Journal ArticleDOI
TL;DR: This article explored the impact of implicit theories in a managerial context, using both an objective leader behavior manipulation and a leader performance cue manipulation, and found that the popular measure initiating structure as measured by the LBDQ, was indeed responsive to the performance cue manipulations in a manner consistent with previous implicit leadership theory research.
Abstract: Previous research has shown that questionnaire measures of leader behavior can be susceptible to response bias stemming from individual “implicit leadership theories.” The research reported here extended this work by exploring the impact of implicit theories in a managerial context, using both an objective leader behavior manipulation and a leader performance cue manipulation. The findings confirmed that the popular measure initiating structure as measured by the LBDQ, was indeed responsive to the performance cue manipulation in a manner consistent with previous implicit leadership theory research. However, results from more “behaviorally oriented” measures were not significantly responsive to the performance cue manipulation, but were shown to be very good representations of actual leader behaviors. The discussion focused on how researchers might reduce the bias stemming from implicit leadership theories.

57 citations


Journal ArticleDOI
TL;DR: This paper explored the item structure of the Myers-Briggs Type Indicator (MBTI); a factor analysis of the responses of a large sample (n = 1291) yielded six salient factors, four resembling the four scales of the MBTI.
Abstract: The current study explores the item structure of the Myers-Briggs Type Indicator (MBTI). A factor analysis of responses of a large sample (n = 1291) yielded six salient factors, four resembling the four scales of the MBTI. Kuder-Richardson 20 coefficients are reported for the scales and factors. Pearson rs are also reported for the factor-scale relationships. All analyses yield only limited support for the item validity of the MBTI. Relevant issues involving the construction of the test are discussed and suggestions are made for future research.

55 citations
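
The Kuder-Richardson 20 coefficient mentioned in the abstract above can be computed from dichotomous item scores as in the sketch below. It uses the textbook formula, not the study's own procedure. The choice of the population variance (dividing by n) is an assumption, since conventions differ, and the name `kr20` is illustrative.

```python
def kr20(responses):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two respondents and two items")
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    # Assumption: population variance of total scores (divide by n).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    # Sum over items of p * q, where p is the proportion scoring 1 on the item.
    sum_pq = 0.0
    for j in range(k):
        p = sum(person[j] for person in responses) / n
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)
```

KR-20 is the special case of coefficient alpha for items scored 0 or 1, so the same structure applies to the alpha coefficients reported elsewhere in this listing once item variances replace the p * q terms.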


Journal ArticleDOI
TL;DR: In this article, the concurrent validity of each of five subscales of the measure of academic self-concept entitled Dimensions of Self-Concept (DOSC), namely Level of Aspiration, Anxiety, Academic Interest and Satisfaction, Leadership and Initiative, and Identification versus Alienation, was determined relative to each of three criterion scales of the Maslach Burnout Inventory (MBI) scored for both frequency and intensity of response.
Abstract: For a sample of 109 graduate students beginning their first semester of practice teaching at the elementary-school level, the concurrent validity of each of five subscales of the measure of academic self-concept entitled Dimensions of Self-Concept (DOSC), namely Level of Aspiration, Anxiety, Academic Interest and Satisfaction, Leadership and Initiative, and Identification versus Alienation, was determined relative to each of three criterion scales of the Maslach Burnout Inventory (MBI) scored for both frequency and intensity of response: (a) Emotional Exhaustion, (b) Depersonalization, and (c) Personal Accomplishment. It was concluded that for the most part the subscales of the DOSC demonstrated promising concurrent validity relative to the criterion measures afforded by the MBI. Substantial evidence was present that those student teachers who exhibited scores on the DOSC indicative of a positive or facilitative academic self-concept tended to register scores on the MBI associated with minimal tendency toward burnout be...

53 citations


Journal ArticleDOI
TL;DR: The Kirton Adaption-Innovation Inventory (KAI) was evaluated for its factorial composition, internal consistency, and relationship with self-esteem compared with previous studies.
Abstract: The Kirton Adaption-Innovation Inventory (KAI) was evaluated for its factorial composition, internal consistency, and relationship with self-esteem compared with previous studies. Data was from a r...

Journal ArticleDOI
TL;DR: In this paper, the reliability and construct validity of an alternate form of Kolb's Learning Style Inventory using a semantic differential format were evaluated; the alternate form was found to be reliable and construct valid, and possibly more reliable than a previously presented Likert-type normative form.
Abstract: Learning style assessment provides a framework within which individual differences for specific ways of learning can be described. Kolb's Learning Style Inventory has been used to assess learners' preferences for specific phases of an experiential learning cycle. This study was designed to determine the reliability and construct validity of an alternate form of the Learning Style Inventory using a semantic differential format. Results of this study suggest that the alternate form was reliable and construct valid. In addition, they indicate that this alternate form might be more reliable than a previously presented Likert-type normative form.

Journal ArticleDOI
TL;DR: In this paper, a model of examinee behavior based on knowledge and random guessing is used to generate hypotheses about how true-false scores work, and the confirmation of six hypotheses forms a network of support for the contention that true-false scores contain an error component due to guessing.
Abstract: A model of examinee behavior based on knowledge and random guessing is used to generate hypotheses about how true-false scores work. Although others have expressed reservations about the simplicity and utility of such a model, it leads to informative ideas. The confirmation of six hypotheses forms a network of support for the contention that true-false scores contain an error component (due to guessing) that makes these scores less reliable than those based on 5-choice items. Examinee response style, a propensity to favor the selection of true or false responses when the answer is unknown, can invalidate a total true-false score. When the answer key for items unknown to an examinee is unequally split between those keyed true and false, this interaction produces scores inaccurate by as much as a sample standard deviation.
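
A knowledge-plus-random-guessing model of the kind the abstract above describes can be illustrated with a few lines of arithmetic. This is a sketch under simple assumptions (each unknown item is guessed independently), not the paper's own model specification; all function and parameter names are hypothetical.

```python
def expected_score(n_items, n_known, n_choices):
    """Expected number right: known items plus chance successes on the rest."""
    return n_known + (n_items - n_known) / n_choices


def guessing_variance(n_items, n_known, n_choices):
    """Score variance contributed by independent random guesses on unknown items."""
    p = 1.0 / n_choices
    return (n_items - n_known) * p * (1.0 - p)


def expected_score_with_style(n_known, unknown_keyed_true, unknown_keyed_false,
                              p_choose_true):
    """Expected true-false score when unknown items are answered 'true' with
    probability p_choose_true (a response style rather than an even guess)."""
    return (n_known
            + unknown_keyed_true * p_choose_true
            + unknown_keyed_false * (1.0 - p_choose_true))
```

With 50 items of which 30 are known, guessing adds more variance for true-false items (`guessing_variance(50, 30, 2)`) than for 5-choice items (`guessing_variance(50, 30, 5)`). An examinee who always answers "true" gains or loses points depending on how the key for the unknown items is split, which is the interaction the abstract refers to.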

Journal ArticleDOI
TL;DR: In this article, the authors compare the standard maximum likelihood estimation with two forms of robust estimation, BIWEIGHT and AMJACK, and person analysis within the Rasch model, and find that although the two robust estimation procedures recover the generating parameters under certain conditions, in the presence of many forms of measurement disturbances they mask important information.
Abstract: Measurement disturbances, e.g., guessing, startup, plodding, etc., often result in an examinee's ability being either over- or underestimated by the maximum likelihood estimation employed in latent trait psychometric models. Several authors have suggested methods to lessen the impact of unexpected responses on the ability estimation process. This study uses simulated data to compare the standard maximum likelihood estimation with two forms of robust estimation, BIWEIGHT and AMJACK, and person analysis within the Rasch model. The results indicate that, although the two robust estimation procedures recover the generating parameters under certain conditions, in the presence of many forms of measurement disturbances they mask important information. Rasch person analysis has the advantage of not only providing a method of modifying the ability estimation procedure but also of providing a means of identifying the nature of the disturbance.
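
The standard maximum likelihood ability estimate that the abstract above uses as its baseline can be sketched for the Rasch model as follows. Item difficulties are assumed known, and the standardized residuals are one common way of inspecting individual responses. The BIWEIGHT and AMJACK procedures are not reproduced, and the function names are illustrative.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_ability(responses, difficulties, max_iter=50, tol=1e-8):
    """Newton-Raphson maximum likelihood estimate of ability for 0/1 responses."""
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        raise ValueError("no finite estimate for all-correct or all-incorrect patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw_score - sum(probs)            # first derivative of log-likelihood
        information = sum(p * (1.0 - p) for p in probs)  # minus the second derivative
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


def standardized_residuals(theta, responses, difficulties):
    """(observed - expected) / sqrt(variance) for each response."""
    residuals = []
    for x, b in zip(responses, difficulties):
        p = rasch_probability(theta, b)
        residuals.append((x - p) / math.sqrt(p * (1.0 - p)))
    return residuals
```

A large positive residual on a hard item, for instance, is the kind of unexpected response that a lucky guess would produce.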

Journal ArticleDOI
TL;DR: In this paper, the authors determine the test-retest reliability and concurrent validity of the short form (Form B) of Coopersmith Self-Esteem Inventory for children.
Abstract: The purpose of this study was to determine the test-retest reliability and concurrent validity of the short form (Form B) of Coopersmith Self-Esteem Inventory. The subjects were 140 children from s...

Journal ArticleDOI
TL;DR: In this paper, the authors validate cognitive and affective scales for assessing computer attitudes; the two seven-item scales were identified using maximum likelihood factor analysis, and the alpha reliability coefficients were near 0.90 for both scales.
Abstract: The purpose of this study was to validate cognitive and affective scales for assessing computer attitudes. The two seven-item scales were identified using maximum likelihood factor analysis. The alpha reliability coefficients were near 0.90 for both scales.

Journal ArticleDOI
TL;DR: Algorithms for the exact chi-square test and the Fisher exact probability test are presented and the use of an arbitrary initial value in the recursion provides a significant reduction in computation time over previously published algorithms.
Abstract: Algorithms for the exact chi-square test and the Fisher exact probability test are presented. One- and two-tailed probability values for each test are computed recursively. The use of an arbitrary ...
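
The idea of starting a recursion from an arbitrary value, mentioned in the abstract above, can be illustrated for the Fisher exact test on a 2 x 2 table. The sketch below is not the published algorithm, whose details are truncated in this listing. It assigns weight 1 to the smallest admissible cell count, builds the remaining weights from the hypergeometric ratio, and normalises at the end so that no factorials are needed. The function name is illustrative.

```python
def fisher_exact_2x2(a, b, c, d):
    """Return (lower-tail, upper-tail, two-tailed) p-values for [[a, b], [c, d]]."""
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)   # smallest value the first cell can take
    highest = min(row1, col1)      # largest value the first cell can take
    weights = {lowest: 1.0}        # arbitrary starting value, normalised below
    x = lowest
    while x < highest:
        bx = row1 - x              # remaining cells implied by the fixed margins
        cx = col1 - x
        dx = row2 - cx
        # Ratio of successive hypergeometric probabilities P(x + 1) / P(x).
        weights[x + 1] = weights[x] * (bx * cx) / ((x + 1) * (dx + 1))
        x += 1
    total = sum(weights.values())
    observed = weights[a]
    lower = sum(w for k, w in weights.items() if k <= a) / total
    upper = sum(w for k, w in weights.items() if k >= a) / total
    # Two-tailed: all tables no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    two_tailed = sum(w for w in weights.values()
                     if w <= observed * (1.0 + 1e-9)) / total
    return lower, upper, two_tailed
```

For very large tables the unnormalised weights can overflow; working with logarithms would be a natural refinement.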

Journal ArticleDOI
TL;DR: This paper examined the relationship between graduate grade point average and subtests of the Graduate Record Examinations (GRE) Aptitude Test for subsamples of graduate students in different academic disciplines.
Abstract: This study examined the relationship between graduate grade point average and subtests of the Graduate Record Examinations (GRE) Aptitude Test for subsamples of graduate students in different acade...

Journal ArticleDOI
TL;DR: The authors compared nine indices of response pattern appropriateness based on IRT with respect to their relation to the total test score and their effectiveness in detecting unusual response patterns, and compared three groups of response patterns.
Abstract: The study compared nine indices of response pattern appropriateness based on IRT with respect to their relation to the total test score and their effectiveness in detecting unusual response patterns. Three groups of response patterns were analyzed: a group of cooperative examinees, a group of uncooperative examinees, and a group of randomly generated response patterns. The comparison among the indices was based on their capability to differentiate among the three groups and on the percentages of correct and incorrect classification of examinees to their actual groups based on their score on the index.

Journal ArticleDOI
TL;DR: The authors' previous article (Hinkle and Oliver, 1983) discussed the importance of the effect size and the Type II error as factors in determining the sample size.
Abstract: In a previous article, the authors discuss the importance of the effect size and the Type II error as factors in determining the sample size (Hinkle and Oliver, 1983). Tables were developed and pre...
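
How effect size and Type II error jointly drive sample size can be shown with a rough calculation. The sketch below uses the familiar normal approximation for comparing two independent group means with a two-sided test. This design is an assumption made for illustration and is not necessarily the one behind the authors' tables, which are truncated in this listing. The function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-group comparison of means.

    effect_size_d is the standardized mean difference; power = 1 - P(Type II error).
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value for the Type I error rate
    z_beta = z.inv_cdf(power)               # quantile for the desired power
    return math.ceil(2.0 * ((z_alpha + z_beta) / effect_size_d) ** 2)
```

Halving the effect size roughly quadruples the required sample, and demanding a smaller Type II error (higher power) raises it further. Exact t-based tables give slightly larger values than this approximation.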

Journal ArticleDOI
TL;DR: The factor structure of the Stress Arousal Checklist (SACL) was examined using the responses of a New Zealand sample of 203 first and second year University students.
Abstract: The factor structure of the Stress Arousal Checklist (SACL) was examined using the responses of a New Zealand sample of 203 first and second year University students. The results produced an almost...

Journal ArticleDOI
TL;DR: In this paper, tables are given for the rapid estimation of h, the effect size index for the difference between independent proportions, and of q, the effect size index for the difference between independent correlation coefficients; the tables are accurate to three decimal places and can be used conveniently in conjunction with tables and charts of power and sample size.
Abstract: Tables are given for the rapid estimation of h, the effect size index for the difference between independent proportions, and of q, the effect size index for the difference between independent correlation coefficients. The tables are accurate to three decimal places and may be used conveniently in conjunction with tables and charts of power and sample size. Formulas for power and sample size estimation are also presented.
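
The two indices tabled in the abstract above can also be computed directly. The sketch assumes Cohen's standard definitions: h as the difference between arcsine-transformed proportions and q as the difference between Fisher z-transformed correlations. The function names are illustrative.

```python
import math


def effect_size_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))


def effect_size_q(r1, r2):
    """Cohen's q: difference between Fisher z-transformed correlations."""
    for r in (r1, r2):
        if not -1.0 < r < 1.0:
            raise ValueError("correlations must lie strictly between -1 and 1")
    return math.atanh(r1) - math.atanh(r2)
```

For example, `effect_size_h(0.65, 0.45)` is close to 0.405, and `effect_size_q(0.5, 0.0)` equals the Fisher z of 0.5.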

Journal ArticleDOI
TL;DR: In this article, the authors compared behaviorally anchored rating scales (BARS) with a carefully constructed summated rating scale and found that BARS had less halo error, more leniency error, and lower interrater reliability than the alternative format.
Abstract: Behaviorally Anchored Rating Scales (BARS) developed according to Bernardin, LaShells, Smith, and Alvarez's (1976) optimal procedure were compared with a carefully constructed summated rating scale. Using both scales, 727 undergraduates rated 32 instructors. Psychometric comparisons indicated that BARS had less halo error, more leniency error, and lower interrater reliability than the alternative format. The two formats did not differ in ratee discrimination and susceptibility to rating bias due to rater characteristics. Finally, the formats showed convergent and discriminant validity.

Journal ArticleDOI
TL;DR: In this article, the authors investigated various coefficients of person reliability, including measures of both within- and between-occasion consistency; the results implied that person reliability may be a multidimensional concept and that certain item consistency measures are confounded with psychopathology.
Abstract: Person reliability concerns the response consistency of a single respondent to a number of psychological tests or test items. This study investigated various coefficients of person reliability, including both measures of within- and between-occasion consistency. Using a one-month test-retest interval, 123 undergraduates completed a 12-scale, 240-item, true-false inventory measuring various dimensions of psychopathology. Six consistency measures were calculated for each respondent: two within-occasion person reliability indices for each testing, a between-occasion person reliability index, and an item consistency measure involving a count of corresponding items answered identically on both occasions. Results implied that person reliability may be a multidimensional concept and that certain item consistency measures are confounded with psychopathology. Other consistency indices, however, tend to be independent of psychopathology. Clinical implications are discussed.

Journal ArticleDOI
TL;DR: In this article, an index is defined for each column of a factor matrix to measure the goodness of fit for scale defining items, where the latter are determined in advance by the investigator.
Abstract: An index is defined for each column of a factor matrix to measure the goodness of fit for scale defining items, where the latter are determined in advance by the investigator. Let MS(S), MS(NS), and MS(T) be the mean-square loadings for scale items, nonscale items, and total items, respectively, for a given factor. A measure denoted IFFS (index of fit for factor scales) is defined as the signal-to-signal plus noise ratio: IFFS = 1 - MS(NS)/MS(T). The highest possible value of IFFS is 1.00, with .50 indicating that scale items are no better than nonscale items in defining the construct. An overall measure is also defined. The index is complemented by Kaiser's index of factor simplicity (IFS) as computed for each scale item. Evaluative criteria and examples are presented.
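
A small numerical sketch may help in reading the IFFS definition in the abstract above. The abstract does not say exactly how MS(T) is formed. The code below assumes MS(T) = MS(S) + MS(NS), because that reading matches the "signal-to-signal plus noise" description and yields .50 when scale and nonscale items load equally. This is an interpretation, not a confirmed detail of the paper, and the function names are hypothetical.

```python
def mean_square(loadings):
    """Mean of squared factor loadings for one group of items."""
    if not loadings:
        raise ValueError("need at least one loading")
    return sum(x * x for x in loadings) / len(loadings)


def iffs(scale_loadings, nonscale_loadings):
    """Index of fit for one factor, given loadings of scale and nonscale items.

    Assumption: MS(T) is taken as MS(S) + MS(NS), so that
    1 - MS(NS) / MS(T) equals MS(S) / (MS(S) + MS(NS)).
    """
    ms_scale = mean_square(scale_loadings)
    ms_nonscale = mean_square(nonscale_loadings)
    ms_total = ms_scale + ms_nonscale  # assumed definition, see lead-in
    if ms_total == 0:
        raise ValueError("all loadings are zero")
    return 1.0 - ms_nonscale / ms_total
```

Under this reading, a factor on which the scale items load near .7 and the nonscale items near .1 gives a value close to 1, while equal loadings in both groups give .50.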

Journal ArticleDOI
TL;DR: FACTOREP is a computer program which provides information to solve the problem of determining the most appropriate number of factors to be rotated in order to produce a replicable and stable factor structure.
Abstract: FACTOREP is a computer program which provides information to solve the problem of determining the most appropriate number of factors to be rotated in order to produce a replicable and stable factor structure. This objective is accomplished by using the s index which enables a researcher to make comparisons of similarity across different factor structures. Matrices of s index values are produced to indicate the degree of similarity in analyses across different numbers of factors, subject groups, and cut off levels for factor loadings. Written in Pascal, the program affords readily interpretable information leading to the identification of replicable factor structures.

Journal ArticleDOI
Larry H. Ludlow
TL;DR: In this paper, an analytic strategy for the graphical representation and analysis of Rasch model residuals is presented, where baseline graphical configurations of the residual variation can be constructed, and measurement irregularities may be exposed.
Abstract: When the parameters of item response data are estimated by a latent trait model, some variation will remain unaccounted for under the model. If baseline graphical configurations of the residual variation can be constructed, measurement irregularities may be exposed. This paper presents an analytic strategy for the graphical representation and analysis of Rasch model residuals.

Journal ArticleDOI
TL;DR: In this article, some suggestions for measuring marginal symmetry in agreement matrices for categorical data are discussed, together with measures of item-by-item agreement conditional on marginal asymmetry.
Abstract: Some suggestions for measuring marginal symmetry in agreement matrices for categorical data are discussed, together with measures of item-by-item agreement conditional on marginal asymmetry. Connections with intraclass correlations for dichotomous data are noted.

Journal ArticleDOI
TL;DR: The authors investigated the relationship between test-completion speed and performance, on individual tests and across all tests in a course of study, and found that test-completion speed on an untimed test is unrelated to test performance.
Abstract: This investigation was designed (a) to resolve the contradictions raised in earlier research on the relationship between test-completion speed and performance and (b) to extend the range of inquiry from performance on one test to performance on all tests in a course of study. Test scores were obtained from 278 college students on three course-based objective examinations. Data analyses yielded no linear or curvilinear relationships between completion speed and performance on individual tests. Similarly, no relationship was found between average completion speed on three tests and total score. The results suggest that test-completion speed on an untimed test is unrelated to test performance. The implications of these findings and suggestions for future research are discussed.

Journal ArticleDOI
TL;DR: In this article, the authors advocate the use of a stress paradigm in the assessment of children with behavior disorders and use the stress response scale to assess the behavioral patterns that a child is likely to adopt in response to stress.
Abstract: This paper advocates the use of a stress paradigm in the assessment of children with behavior disorders. The paradigm suggests that one element of an assessment should be the behavioral pattern that the child is likely to adopt in response to stress. The Stress Response Scale, designed to assess such behavioral patterns, is presented and discussed. In order to extend the scale's clinical utility, it was necessary to obtain data on the behavioral patterns that might typically be expected to be found with children in general. Data are presented which describe the most frequently found patterns among a population of school-aged children.