Showing papers in "Educational and Psychological Measurement in 2004"


Journal ArticleDOI
Abstract: In 1997, noting that the 50th anniversary of the publication of “Coefficient Alpha and the Internal Structure of Tests” was fast approaching, Lee Cronbach planned what have become the notes publish...

1,139 citations


Journal ArticleDOI
Abstract: The authors provide a cautionary note on reporting accurate eta-squared values from multifactor analysis of variance (ANOVA) designs. They reinforce the distinction between classical and partial eta-squared as measures of strength of association. They provide examples from articles published in premier psychology journals in which the authors erroneously reported partial eta-squared values as representing classical eta-squared values. Finally, they discuss broader impacts of inaccurately reported eta-squared values for theory development, meta-analytic reviews, and intervention programs.

566 citations
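
The distinction between classical and partial eta-squared in the abstract above can be illustrated with a minimal sketch. The sums of squares below are hypothetical, and the total is taken as the sum of all effect and error terms, which assumes a balanced design; none of these numbers come from the article.

```python
def eta_squared(ss_effect, ss_total):
    """Classical eta-squared: effect sum of squares over the total sum of squares."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared: effect sum of squares over effect plus error."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares for a two-factor ANOVA (illustrative only).
ss = {"A": 20.0, "B": 30.0, "AxB": 10.0, "error": 40.0}
ss_total = sum(ss.values())  # assumes a balanced design, so the terms add up

classical_a = eta_squared(ss["A"], ss_total)
partial_a = partial_eta_squared(ss["A"], ss["error"])

print(f"classical eta-squared for A: {classical_a:.3f}")
print(f"partial eta-squared for A:   {partial_a:.3f}")
```

In a multifactor design the partial value is never smaller than the classical one, because its denominator omits the other effects. Reporting one under the other's name therefore misstates effect size.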


Journal ArticleDOI
TL;DR: The Feedback Environment Scale is presented as a new tool to assist in the diagnosis and training of managers in the area of feedback and coaching; results show evidence for the internal consistency, test-retest reliability, and discriminant validity of the scale's facet scores.
Abstract: Managers are increasingly being held accountable for providing resources that support employee development, particularly in the form of feedback and coaching. To support managers as trainers and coaches, organizations must provide managers with the tools they need to succeed in this area. This article presents a new tool to assist in the diagnosis and training of managers in the area of feedback and coaching: the Feedback Environment Scale. This article also discusses the theoretically based definition of this new construct and the development and validation evidence for the scale that measures this construct. Confirmatory factor analyses supported the a priori measurement model, and assessment of relationships proposed in a preliminary nomological network provide initial support for the construct validity of the scale. Results also show evidence for the internal consistency, test-retest reliability, and discriminant validity of the facet scores of the Feedback Environment Scale.

426 citations


Journal ArticleDOI
TL;DR: In this article, a total of 1,247 college students participated in a study on the effect of scale format on the reliability of Likert-type rating scales and the results indicated that scales with few response categories tended to result in lower reliability, especially lower test-retest reliability.
Abstract: A total of 1,247 college students participated in this study on the effect of scale format on the reliability of Likert-type rating scales. The number of response categories ranged from 3 to 9. Anchor labels on the scales were provided for each response option or for the end points only. The results indicated that the scales with few response categories tended to result in lower reliability, especially lower test-retest reliability. The scales with all the response options clearly labeled were likely to yield higher test-retest reliability than those with only the end points labeled. Scale design that leads to consistent participant responses as indicated by test-retest reliability should be preferred.

361 citations


Journal ArticleDOI
TL;DR: In this article, a psychometric study of the short form of the Social and Emotional Loneliness Scale for Adults (SELSA-S) is presented, where data were collected via self-report measures and mail surveys from se...
Abstract: This article presents a psychometric study of the short form of the Social and Emotional Loneliness Scale for Adults (SELSA-S). Data were collected via self-report measures and mail surveys from se...

272 citations


Journal ArticleDOI
TL;DR: The authors study the behavior of the chi-square difference test when the base model is misspecified and show that, in that case, the z test for the statistical significance of a parameter estimate can also be misleading.
Abstract: In mean and covariance structure analysis, the chi-square difference test is often applied to evaluate the number of factors, cross-group constraints, and other nested model comparisons. Let model Ma be the base model within which model Mb is nested. In practice, this test is commonly used to justify Mb even when Ma is misspecified. The authors study the behavior of the chi-square difference test in such a circumstance. Monte Carlo results indicate that a nonsignificant chi-square difference cannot be used to justify the constraints in Mb. They also show that when the base model is misspecified, the z test for the statistical significance of a parameter estimate can also be misleading. For specific models, the analysis further shows that the intercept and slope parameters in growth curve models can be estimated consistently even when the covariance structure is misspecified, but only in linear growth models. Similarly, with misspecified covariance structures, the mean parameters in multiple group models can be estimated consistently under null conditions.

215 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed an abridged version of the JIG scale for use by practitioners and researchers of organizational behavior, and reported the results of three validation studies documenting the process of scale reduction and the psychometric suitability of the reduced-length scale.
Abstract: The Job Descriptive Index family of job attitude measures includes the Job in General (JIG) scale, a measure of global satisfaction with one’s job. The scale was originally developed and validated by Ironson, Smith, Brannick, Gibson, and Paul. Following structured scale reduction procedures developed by Stanton, Sinar, Balzer, and Smith, the current authors developed an abridged version of the JIG for use by practitioners and researchers of organizational behavior. They report the results of three validation studies documenting the process of scale reduction and the psychometric suitability of the reduced-length scale.

213 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the psychometric properties of the Achievement Goal Questionnaire (AGQ) when modified for a general academic context and found that the four-factor structure of goal orientation, previously observed in a course-specific context, replicated.
Abstract: The psychometric properties of the Achievement Goal Questionnaire (AGQ), when modified for a general academic context, were examined. Previous research has found evidence of a four-factor structure of achievement goal orientation when this measure was used in a course-specific context. This study is an important addition to goal orientation research for the following two reasons: (a) It provides additional support for four distinct factors of goal orientation, and (b) it answers the call for examining achievement goal orientation measures at different levels of specificity. The authors found that the four-factor structure of goal orientation replicated when used in a general academic context.

191 citations


Journal ArticleDOI
TL;DR: The Teacher Beliefs Survey (TBS) as mentioned in this paper is an instrument for assessing the beliefs of teachers related to constructivist and traditional approaches to teaching and learning, containing 21 items in three hypothetical constructs.
Abstract: The development and validation of the Teacher Beliefs Survey (TBS) is described. The TBS, an instrument for assessing the beliefs of teachers related to constructivist and traditional approaches to teaching and learning, contains 21 items in three hypothetical constructs. Elementary teachers, preservice (n = 61) and in-service (n = 137), participated in the development of the TBS. Analysis of this pilot data suggested a four-factor structure: Traditional Management, Traditional Teaching, Constructivist Teaching, and Constructivist Parent. A validation study included preservice teachers (n = 896). The results did not confirm the four-factor structure; further analysis suggested a three-factor structure, eliminating the Constructivist Parent factor. Future plans include development of a Constructivist Management factor and a larger pool of items for existing factors.

171 citations


Journal ArticleDOI
TL;DR: The Goal Orientation and Learning Strategies Survey (GOALS-S) as mentioned in this paper was designed to measure students' motivational goal orientations and their cognitive and metacognitive strategies, and results of first-order confirmatory factor analyses (CFAs) supported the factorial validity of the scales measuring students' goals and strategies.
Abstract: This article outlines the construction and validation of the Goal Orientation and Learning Strategies Survey (GOALS-S). This 84-item survey was designed to measure students’ motivational goal orientations and their cognitive and metacognitive strategies. Results of first-order confirmatory factor analyses (CFAs) supported the factorial validity of the GOALS-S scales measuring students’ goals and strategies (with goodness-of-fit indices in post-hoc models ranging from .908 to .981). In addition, higher order CFAs (HCFAs) support hierarchical structure of the GOALS-S scales (with goodness-of-fit indices ranging from .904 to .980). Finally, tests of invariance supported the factorial stability of the GOALS-S scales across gender groups (with goodness-of-fit indices ranging from .901 to .981).

168 citations


Journal ArticleDOI
TL;DR: In this article, preliminary psychometric data for two fathering measures, the existing Nurturant Fathering Scale and the newly developed Father Involvement Scale, were provided.
Abstract: This study provides preliminary psychometric data for two fathering measures, the existing Nurturant Fathering Scale and the newly developed Father Involvement Scale. Both measures are completed fr...

Journal ArticleDOI
TL;DR: In this paper, the authors compared several procedures for detecting differential item functioning (DIF): logistic regression analysis, the Mantel-Haenszel (MH) procedure, and the modified MH procedure by Mazor, Clauser, and Hambleton.
Abstract: This article compares several procedures in their efficacy for detecting differential item functioning (DIF): logistic regression analysis, the Mantel-Haenszel (MH) procedure, and the modified Mantel-Haenszel procedure by Mazor, Clauser, and Hambleton. It also compares the effect size measures that these procedures provide. In this study, different conditions of item parameters (difficulty and discrimination) and DIF magnitude were manipulated. Furthermore, both uniform and nonuniform DIF conditions were simulated. Results suggest that logistic regression analysis generally detected more items with DIF than the standard MH procedure and the modified MH procedure for symmetrical nonuniform DIF. The DIF effect size measures based on logistic regression, however, appeared to be insensitive to the specified DIF conditions.
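
As a rough illustration of the standard Mantel-Haenszel statistic named in the abstract above, the sketch below pools 2x2 tables across matched ability strata into a common odds ratio. The function name, the table layout, and the counts are assumptions made for illustration. It does not reproduce the modified procedure or the simulation conditions studied in the article.

```python
def mantel_haenszel_odds_ratio(strata):
    """Pooled (common) odds ratio across 2x2 tables.

    Each stratum is a tuple (a, b, c, d):
      a = reference group, item correct    b = reference group, item incorrect
      c = focal group, item correct        d = focal group, item incorrect
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    # Raises ZeroDivisionError if no stratum has both b > 0 and c > 0.
    return numerator / denominator


# Hypothetical counts at two matched score levels (illustrative only).
no_dif = [(10, 10, 10, 10), (20, 5, 20, 5)]
uniform_dif = [(15, 5, 10, 10), (18, 2, 15, 5)]

print(mantel_haenszel_odds_ratio(no_dif))
print(mantel_haenszel_odds_ratio(uniform_dif))
```

A pooled odds ratio near 1 suggests no uniform DIF after matching on ability, while values above 1 favor the reference group. Because the statistic pools one odds ratio across strata, it is less suited to nonuniform DIF, where the direction of the group difference changes with ability.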

Journal ArticleDOI
TL;DR: In this article, the authors present a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance that is particularly helpful in emphasizing the sensitivity of kappa to differences in raters' marginal distributions.
Abstract: This article presents a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance that is particularly helpful in emphasizing that weighted kappa is an absolute agreement measure in the sense that it is sensitive to differences in rater’s marginal distributions. Specifically, rater mean differences will decrease the value of weighted kappa relative to the value of the intraclass correlation that ignores mean differences. In addition, if rater variances also differ, then the value of weighted kappa will be decreased relative to the value of the product-moment correlation. Equality constraints on the rater means and variances are given to illustrate the relationships between weighted kappa, the intraclass correlation, and the product-moment correlation. In addition, the expression for weighted kappa shows that weighted kappa belongs to the Zegers-ten Berge family of chance-corrected association coefficients. More specifically, weighted kappa is equivalent to the chance-co...
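
The relationship described in the abstract above can be checked numerically. The sketch assumes quadratic disagreement weights, under which weighted kappa is known to reduce to an expression in the rater means, variances, and covariance (all computed with denominator n). The abstract does not state the weights, so this is an assumption and may differ from the article's exact presentation. The ratings are hypothetical.

```python
def weighted_kappa_direct(x, y):
    """Quadratically weighted kappa: 1 - observed / expected squared disagreement."""
    n = len(x)
    observed = sum((a - b) ** 2 for a, b in zip(x, y)) / n
    # Expected disagreement if the two raters' marginals were independent.
    expected = sum((a - b) ** 2 for a in x for b in y) / (n * n)
    return 1.0 - observed / expected


def weighted_kappa_moments(x, y):
    """Same quantity written with means, variances, and the covariance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical ratings of four targets by two raters (illustrative only).
rater_1 = [1, 2, 3, 4]
rater_2 = [2, 1, 4, 3]              # same mean and variance as rater_1
rater_2_shifted = [3, 2, 5, 4]      # same pattern, one point more lenient

print(weighted_kappa_moments(rater_1, rater_2))
print(weighted_kappa_moments(rater_1, rater_2_shifted))
```

With equal means and variances the value coincides with the product-moment correlation. Shifting one rater's scores leaves the correlation unchanged but lowers kappa, which is the sense in which it measures absolute agreement.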

Journal ArticleDOI
TL;DR: In this paper, a mixed-model approach, available through SAS PROC MIXED, was compared to a Welch-James type statistic, which is known to provide generally robust tests of treatment effects in a repeated measures between-by within-subjects design under assumption violations given certain sample size requirements.
Abstract: One approach to the analysis of repeated measures data allows researchers to model the covariance structure of their data rather than presume a certain structure, as is the case with conventional univariate and multivariate test statistics. This mixed-model approach, available through SAS PROC MIXED, was compared to a Welch-James type statistic. The Welch-James approach is known to provide generally robust tests of treatment effects in a repeated measures between-by within-subjects design under assumption violations given certain sample size requirements. The mixed-model F tests were based on Kenward-Roger’s adjusted degrees of freedom solution, an approach specifically proposed for small sample settings. The authors investigated Type I error control for repeated measures main and interaction effects in unbalanced designs when normality and covariance homogeneity assumptions did not hold. The mixed-model Kenward-Roger’s adjusted F tests showed superior Type I error control in small sample size conditions ...

Journal ArticleDOI
TL;DR: In this paper, a factor analysis of 24 maximum-performance and self-report EI measures administered to an undergraduate sample (N = 176) yielded five factors: emotional congruence, emotional independence, social perceptiveness, alexithymia, and social confidence.
Abstract: Dimensions of Emotional Intelligence (EI) were derived, and their place with respect to the cognitive ability and personality domains was examined. A factor analysis of 24 maximum-performance and self-report EI measures administered to an undergraduate sample (N = 176) yielded five factors: Emotional Congruence, Emotional Independence, Social Perceptiveness, Alexithymia, and Social Confidence. Emotional Congruence had low correlations with four cognitive ability factors and Big Five personality factors, indicating that it may represent either a new psychological construct or a method factor. Social Perceptiveness correlated significantly with cognitive abilities, indicating its place in this domain. The remaining three factors had moderate correlations with various personality dimensions and low correlations with cognitive abilities, indicating that they fall outside the latter domain. On the basis of the present results, only maximum-performance and not self-report measures of EI can be seen as tapping the...

Journal ArticleDOI
TL;DR: The Short Internalized Homonegativity Scale (SIHS) was developed as a short measure of internalized homophobia that includes items assessing sexual comfort with gay men, a domain that has been notably absent from other measures of internalized homophobia.
Abstract: The purpose of the study was to develop a short measure of internalized homophobia (IH), one that reflected contemporary attitudes toward homosexuality and included items designed to assess the domain of sexual comfort with gay men, a domain that has been notably absent from other measures of IH. The Short Internalized Homonegativity Scale (SIHS) was informed by the Reactions to Homosexuality Scale (RHS) and the contention that currently available measures of IH were outdated in their assessment of the construct and/or failed to assess its covert manifestations. A geographically diverse sample of gay men completed an online questionnaire (N = 1,307), and the 677 respondents from the United States formed the sample for the study. Confirmatory factor analyses supported a single higher order construct of IH comprising the lower order factors of Public Identification as Gay, Sexual Comfort With Gay Men, and Social Comfort With Gay Men.

Journal ArticleDOI
TL;DR: The authors compared scores on two divergent thinking tests, the Verbal and the Figural Torrance Tests of Creative Thinking (TTCT), with scores on two creativity interest inventories, Davis's How Do You Think? and Raudsepp's How Creative Are You?, and found that the creativity interest inventories showed weak correlations with the Verbal TTCT and no correlations with the Figural TTCT.
Abstract: This study compared scores on two divergent thinking tests, the Verbal and the Figural Torrance Tests of Creative Thinking (TTCT), with scores on two creativity interest inventories, Davis’s How Do You Think? and Raudsepp’s How Creative Are You? The creativity interest inventories showed weak correlations with the Verbal TTCT and no correlations with the Figural TTCT. The two interest inventories showed a strong intercorrelation, but the Verbal and Figural TTCTs were only moderately intercorrelated. These interrelationships were not substantially affected by controlling for standardized academic test scores. A principal components analysis of the subscale scores on the four tests resulted in three factors: Interests and Attitudes, Verbal Divergent Thinking, and Figural Divergent Thinking. These results provide evidence that creative interest inventories are not measuring the same construct as divergent thinking tests and that both are distinct from academic aptitude/achievement. They also support the conte...

Journal ArticleDOI
TL;DR: The reliability of CAGE scores was investigated in this article, where the median internal consistency reliability across 22 samples was .74, with a range of .52 to .90, and sample age was the only identified sample characteristic that demonstrated a statistically significant relationship with CAGE score reliability.
Abstract: The CAGE is a commonly used alcohol screening instrument. Although considerable work has been done on the validity of CAGE scores, relatively little information is available on their reliability. Reliability induction and generalization studies were performed for the CAGE. Of the 259 studies available for analysis, only 19 (7.3%) contained reliability information for the sample scores. Thirteen (5.0%) and 227 (87.6%) articles made what are designated as reliability induction by report and reliability induction by omission. The median internal consistency reliability across 22 samples was .74, with a range of .52 to .90. Sample age was the only identified sample characteristic that demonstrated a statistically significant relationship with CAGE score reliability.

Journal ArticleDOI
TL;DR: The Institutional Integration Scale, based on Tinto's model of college student withdrawal, is claimed to measure five facets of college student academic and social integration.
Abstract: The Institutional Integration Scale is claimed to measure five facets of college student academic and social integration. The scale was based on Tinto’s model of college student withdrawal. Psychometric properties of the scale were examined based on a sample of 1st-year college students. These results led to item revisions and additions. The scale was administered to a second sample of 1st-year college students. The final 34-item instrument showed improved psychometric properties. The revised scale scores had satisfactory internal consistency reliability and intercorrelations among the subscales and with the total scale. Confirmatory factor analysis revealed that the original theoretical model may be problematic. Revisions to the model resulted in improved fit.

Journal ArticleDOI
TL;DR: In this paper, a multivariate chance-corrected interobserver agreement measure is proposed to account for the number of judges and the expected disagreement for the case with different judges, based on Janson and Olsson's multivariate generalization of Cohen's kappa.
Abstract: This article addresses the problem of accounting overall multivariate chance-corrected interobserver agreement when targets have been rated by different sets of judges (not necessarily equal in number). The proposed approach builds on Janson and Olsson’s multivariate generalization of Cohen’s kappa but incorporates weighting for number of judges and applies an expression for expected disagreement suitable for the case with different judges. The authors suggest that the attractiveness of this approach to multivariate agreement measurement lies in the interpretability of the terms of expected and observed disagreement as average distances between observations, and that addressing agreement without regard to the covariance structure among variables has advantages in simplicity and interpretability. Correspondences to earlier approaches are noted, and application of the proposed measure is exemplified using hypothetical data sets.

Journal ArticleDOI
TL;DR: In this article, the authors used Tellegen and Briggs's formulae to convert the sum of scaled scores for four selected Wechsler Adult Intelligence Scale-III (WAIS-III) short-form combinations into full-scale IQ estimates.
Abstract: Tables permitting the conversion of short-form composite scores to full-scale IQ estimates have been published for previous editions of the Wechsler Adult Intelligence Scale (WAIS). Equivalent tables are now needed for selected subtests of the WAIS-III. This article used Tellegen and Briggs’s formulae to convert the sum of scaled scores for four selected WAIS-III short-form combinations into full-scale IQ estimates. Conversion tables providing full-scale IQ estimates across all age groups from the sum of scaled scores for these four short forms are presented. Reliability and validity estimates for these short forms are also provided for all age groups.

Journal ArticleDOI
TL;DR: In this paper, a multiple-group second-order confirmatory factor analysis was performed to verify if the factor structure of Spreitzer's PE questionnaire was invariant between groups of 191 male and 200 female nurses.
Abstract: Psychological empowerment (PE) is presumed to be a second-order latent construct composed of four dimensions: meaning, competence, self-determination, and impact. Based on the results of two validation studies, it has been hypothesized that loadings of the four dimensions on PE could vary across gender groups. A multiple-group second-order confirmatory factor analysis was performed to verify if the factor structure of Spreitzer’s PE questionnaire was invariant between groups of 191 male and 200 female nurses. Results indicated that the structure of the PE questionnaire could be assumed invariant across genders. Directions for future research are discussed.

Journal ArticleDOI
TL;DR: The California Measure of Mental Motivation (CM3) was developed to assess secondary students' disposition toward critical thinking; exploratory factor analysis indicated four dimensions: learning orientation, creative problem solving, mental focus, and cognitive integrity.
Abstract: There is agreement that fostering K-12 students’ critical thinking is a worthwhile endeavor. However, many educators would agree that there are students in their classrooms who are able to think well but often choose not to utilize those skills. Little is known about the critical thinking dispositions of elementary and secondary students. This article reports on the development of a new instrument, the California Measure of Mental Motivation (CM3). Results from four independent and diverse studies demonstrate the suitability of the CM3 as a tool to assess secondary students’ disposition toward critical thinking. Exploratory factor analysis, with oblique rotation, indicated four theoretically meaningful dimensions: Learning Orientation, Creative Problem Solving, Mental Focus, and Cognitive Integrity. The four factors demonstrated a satisfactory level of stability across study samples. Scales derived from these four factors correlated with known measures of student motivation and academic achievement.

Journal ArticleDOI
TL;DR: In this paper, sample-size restrictions limit the contingency table approaches based on asymptotic distributions, such as the Mantel-Haenszel (MH) procedure, for detecting differential item functioning (DIF) in m...
Abstract: Sample-size restrictions limit the contingency table approaches based on asymptotic distributions, such as the Mantel-Haenszel (MH) procedure, for detecting differential item functioning (DIF) in m...

Journal ArticleDOI
TL;DR: This paper investigated the current research practice concerning reporting measurement validity evidence based on a sample of 696 research reports listed in the American Psychological Association's Directory of Unpublished Experimental Mental Measures and found that only 55% of the reports included any type of validity evidence.
Abstract: This study investigates the current research practice concerning reporting measurement validity evidence based on a sample of 696 research reports listed in the American Psychological Association’s Directory of Unpublished Experimental Mental Measures. Only 55% of the reports included any type of validity evidence. This was a substantially lower percentage than the percentage for reports of reliability found in an earlier study. Of those entries that included validity evidence, the vast majority reported correlations with other variables. Little use was made of the numerous other types of validation approaches described in measurement textbooks and in the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education’s Standards for Educational and Psychological Testing. Inconsistent reports of validity characterized nearly all journals covered in the study.

Journal ArticleDOI
TL;DR: In this article, a method for incorporating maximum likelihood (ML) estimation into reliability analyses with item-level missing data is outlined, and an ML estimate of the covariance matrix is first obtained using the EM algorithm, and coefficient alpha is subsequently computed using standard formulae.
Abstract: A method for incorporating maximum likelihood (ML) estimation into reliability analyses with item-level missing data is outlined. An ML estimate of the covariance matrix is first obtained using the expectation maximization (EM) algorithm, and coefficient alpha is subsequently computed using standard formulae. A simulation study demonstrated that the EM approach (a) yields less bias in reliability estimates, (b) dramatically reduces cross-sample fluctuation of estimates, and (c) yields more accurate confidence intervals. Implications for reliability reporting practices are discussed, and the EM procedure is demonstrated using a heuristic data set.
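
The second step described in the abstract above, computing coefficient alpha from an item covariance matrix, can be sketched as follows. The EM estimation of the matrix is not shown. The matrix here is hypothetical and simply assumed to have been estimated already, and the function name is illustrative.

```python
def coefficient_alpha(cov):
    """Coefficient alpha from a k x k item covariance matrix (list of lists).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score),
    where the total-score variance is the sum of all entries of the matrix.
    """
    k = len(cov)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    sum_item_variances = sum(cov[i][i] for i in range(k))
    total_variance = sum(sum(row) for row in cov)
    return (k / (k - 1)) * (1.0 - sum_item_variances / total_variance)


# Hypothetical covariance matrix for three items (illustrative only); in the
# approach described above it would come from an EM estimate, not from here.
cov_matrix = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

alpha = coefficient_alpha(cov_matrix)
print(f"coefficient alpha: {alpha:.2f}")
```

Because alpha depends on the data only through the covariance matrix, any estimator of that matrix can feed the same formula. This is why substituting an ML estimate for a listwise-deleted one changes the reliability estimate without changing the formula.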

Journal ArticleDOI
TL;DR: In this article, the authors investigated whether the six-factor structure of the Frost Multidimensional Perfectionism Scale could be replicated in a community-based sample; the scree test and parallel analysis instead supported four clearly interpretable subscales: Negative Projections, Organization, Parental Influences, and Achievement Expectations.
Abstract: The purpose of the study was to investigate whether the six-factor structure of the Frost Multidimensional Perfectionism Scale could be replicated in a community-based sample. A sample of 255 adult participants (55.7% female, 44.3% male) ranging in age from 18 to 78 (mean = 37.0) completed the questionnaire. Based on the scree test and parallel analysis, four factors were selected for rotation. Varimax and oblimin rotation yielded four clearly interpretable subscales: Negative Projections, Organization, Parental Influences, and Achievement Expectations. Similarities between the original factor analysis and alternative solutions are discussed, as are the reasons for the suggested renaming of the factors.

Journal ArticleDOI
TL;DR: A meta-analytic study examined previously published studies of caregiving to identify factors that predict variance in reliability estimates (i.e., reliability generalization) of the Center for Epidemiologic Studies-Depression (CES-D) Scale as mentioned in this paper.
Abstract: The Center for Epidemiologic Studies-Depression (CES-D) Scale is among the most commonly used measures of depressive symptomatology. Despite this, a paucity of research has been undertaken to examine the psychometric properties of responses to this scale. This meta-analytic study examined previously published studies of caregiving to identify factors that predict variance in reliability estimates (i.e., reliability generalization). The results suggest that the type of care recipient, the relationship to the care recipient, and CES-D Scale length each statistically affect reliability estimates. Only the number of items, however, appears to have a substantive effect. It is thus recommended that the original 20-item scale be used. Overall, it appears that responses to the CES-D Scale by care providers are largely reliable across these populations. The findings of an informal survey of authors suggest an incomplete awareness and appreciation for issues regarding reliability induction.

Journal ArticleDOI
TL;DR: The authors used generalizability theory (GT) and many-facet Rasch measurement (MFRM) to evaluate psychometric properties of responses obtained from an assessment designed to measure complex problem-solving skills.
Abstract: This study describes the use of generalizability theory (GT) and many-facet Rasch measurement (MFRM) to evaluate psychometric properties of responses obtained from an assessment designed to measure complex problem-solving skills. The assessment revolved around the school activity of kickball. The task required of each student was to decide on a team T-shirt based on visual characteristics of the T-shirts and written descriptions of other T-shirt characteristics. Forty-four fourth-grade students, comprising the control group of a longitudinal research project to foster complex problem-solving skills of disadvantaged urban youth, participated in this study. Results indicate both measurement techniques agree on the relative magnitudes of variation among the facets but differ on how to handle the sources of variation.

Journal ArticleDOI
TL;DR: In this paper, the authors developed scales to assess instrumental help seeking, executive help seeking, perceived benefits of help seeking, and avoidance of help seeking and examined their psychometric properties by conducting factor and reliability analyses.
Abstract: The purpose of this study was to develop scales to assess instrumental help seeking, executive help seeking, perceived benefits of help seeking, and avoidance of help seeking and to examine their psychometric properties by conducting factor and reliability analyses. As this is the first attempt to examine the latent structures underlying the measured items, the authors conducted exploratory factor analyses. In addition, they also examined the relationship between the help-seeking scales and motivation and achievement constructs frequently used in the study of academic motivation. Results supported the continued use and development of the new scales, which can be adapted to assess help-seeking behavior across varied academic domains.