
Showing papers in "Educational and Psychological Measurement in 2010"


Journal ArticleDOI
TL;DR: This article investigates the choice of sample size for pilot studies from a perspective particularly related to instrument development.
Abstract: Pilot studies are often recommended by scholars and consultants to address a variety of issues, including preliminary scale or instrument development. Specific concerns such as item difficulty, item discrimination, internal consistency, response rates, and parameter estimation in general are all relevant. Unfortunately, there is little discussion in the extant literature of how to determine appropriate sample sizes for these types of pilot studies. This article investigates the choice of sample size for pilot studies from a perspective particularly related to instrument development. Specific recommendations are made for researchers regarding how many participants they should use in a pilot study for initial scale development.

625 citations


Journal ArticleDOI
TL;DR: The Motivation at Work Scale (MAWS) as discussed by the authors was developed in accordance with the multidimensional conceptualization of motivation postulated in self-determination theory, and the authors examined the structure of the MAWS in a group of 1,644 workers in two different languages, English and French.
Abstract: The Motivation at Work Scale (MAWS) was developed in accordance with the multidimensional conceptualization of motivation postulated in self-determination theory. The authors examined the structure of the MAWS in a group of 1,644 workers in two different languages, English and French. Results obtained from these samples suggested that the structure of motivation at work across languages is consistently organized into four different types: intrinsic motivation, identified regulation, introjected regulation, and external regulation. The MAWS subscales were predictably associated with organizational behavior constructs. The importance of this new multidimensional scale to the development of new work motivation research is discussed.

533 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the dimensionality of the VARK learning styles inventory and found that correlated method models had the best fit to VARK scores, while potential problems related to item wording and the scale's scoring algorithm were identified, and cautions with respect to using VARK with research were raised.
Abstract: The authors examined the dimensionality of the VARK learning styles inventory. The VARK measures four perceptual preferences: visual (V), aural (A), read/write (R), and kinesthetic (K). VARK questions can be viewed as testlets because respondents can select multiple items within a question. The correlations between items within testlets are a type of method effect. Four multitrait-multimethod confirmatory factor analysis models were compared to evaluate the dimensionality of the VARK. The correlated trait-correlated method model had the best fit to the VARK scores. The estimated reliability coefficients were adequate. The study found preliminary support for the validity of the VARK scores. Potential problems related to item wording and the scale’s scoring algorithm were identified, and cautions with respect to using the VARK with research were raised.

386 citations


Journal ArticleDOI
TL;DR: In this paper, a self-report questionnaire was administered to undergraduates in introductory psychology. Confirmatory factor analyses (CFA) supported a three-factor model that differentiated between interest generated by (a) the presentation of course material that grabbed students' attention (triggered-SI), (b) the extent to which the material itself was enjoyable and engaging (maintained-SI-feeling), and (c) whether the material was viewed as important and valuable.
Abstract: Three studies were conducted to develop and validate scores on a new measure appropriate for assessing adolescents’ situational interest (SI) across various academic settings. In Study 1 (n = 858), a self-report questionnaire was administered to undergraduates in introductory psychology. Confirmatory factor analyses (CFA) supported a three-factor model that differentiated between interest generated by (a) the presentation of course material that grabbed students’ attention (triggered-SI), (b) the extent to which the material itself was enjoyable and engaging (maintained-SI-feeling), and (c) whether the material was viewed as important and valuable (maintained-SI-value). CFA analyses in Study 2 (n = 284) and Study 3 (n = 246) also supported the three-factor situational interest model for middle and high school students in mathematics. Moreover, situational interest was shown to be distinct from individual interest and was a statistically significant predictor of change in individual interest across the school year.

297 citations


Journal ArticleDOI
TL;DR: In this paper, the bias and mean squared error of the two estimators were assessed via Monte Carlo simulation of meta-analyses with the standardized mean difference as the effect-size index.
Abstract: Most of the statistical procedures in meta-analysis are based on the estimation of average effect sizes from a set of primary studies. The optimal weight for averaging a set of independent effect sizes is the inverse variance of each effect size, but in practice these weights have to be estimated, being affected by sampling error. When assuming a random-effects model, there are two alternative procedures for averaging independent effect sizes: Hunter and Schmidt’s estimator, which consists of weighting by sample size as an approximation to the optimal weights; and Hedges and Vevea’s estimator, which consists of weighting by an estimation of the inverse variance of each effect size. In this article, the bias and mean squared error of the two estimators were assessed via Monte Carlo simulation of meta-analyses with the standardized mean difference as the effect-size index. Hedges and Vevea’s estimator, although slightly biased, achieved the best performance in terms of the mean squared error. As the differe...
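The two estimators contrasted above differ only in their weights. A minimal sketch in Python, with made-up effect sizes and group sizes; the variance formula is the standard large-sample approximation for the standardized mean difference, and the inverse-variance estimator is shown in its simpler fixed-effect form:

```python
import numpy as np

# Hypothetical per-study data: standardized mean differences (d) and group sizes.
d  = np.array([0.30, 0.55, 0.10, 0.42, 0.25])
n1 = np.array([20, 35, 50, 15, 40])
n2 = np.array([20, 30, 50, 20, 40])
N  = n1 + n2

# Large-sample sampling variance of d (common approximation).
v = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

# Hunter-Schmidt-style estimator: weight each effect by total sample size.
d_hs = np.sum(N * d) / np.sum(N)

# Hedges-Vevea-style estimator: weight by estimated inverse variance.
# (The random-effects version adds an estimate of the between-studies
# variance, tau^2, to each v before inverting.)
w = 1.0 / v
d_hv = np.sum(w * d) / np.sum(w)

print(f"sample-size weighted mean d:      {d_hs:.4f}")
print(f"inverse-variance weighted mean d: {d_hv:.4f}")
```

Because the weights differ, the two pooled estimates generally disagree slightly, which is exactly the gap the simulation study quantifies in terms of bias and mean squared error.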

160 citations


Journal ArticleDOI
TL;DR: In this article, the performance of parallel analysis using principal component analysis (PA-PCA) and parallel analysis with principal axis factoring to identify the number of underlying factors was compared.
Abstract: Population and sample simulation approaches were used to compare the performance of parallel analysis using principal component analysis (PA-PCA) and parallel analysis using principal axis factoring (PA-PAF) to identify the number of underlying factors. Additionally, the accuracies of the mean eigenvalue and the 95th percentile eigenvalue criteria were examined. The 95th percentile criterion was preferable for assessing the first eigenvalue using either extraction method. In assessing subsequent eigenvalues, PA-PCA tended to perform as well as or better than PA-PAF for models with one factor or multiple minimally correlated factors; the relative performance of the mean eigenvalue and the 95th percentile eigenvalue criteria depended on the number of variables per factor. PA-PAF using the mean eigenvalue criterion generally performed best if factors were more than minimally correlated or if one or more strong general factors as well as group factors were present.
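Parallel analysis compares observed eigenvalues against eigenvalues obtained from random data of the same dimensions. A PA-PCA sketch under assumed settings; simulated one-factor data stands in for a real dataset, and all sizes, loadings, and replication counts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, n_reps = 500, 12, 200  # illustrative sample size, variables, replications

# Stand-in "observed" data: one common factor plus noise.
f = rng.normal(size=(n, 1))
X = 0.6 * f + 0.8 * rng.normal(size=(n, k))
obs_eigs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

# Reference eigenvalues from purely random data (PA-PCA).
rand_eigs = np.empty((n_reps, k))
for r in range(n_reps):
    R = rng.normal(size=(n, k))
    rand_eigs[r] = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]

mean_crit = rand_eigs.mean(axis=0)               # mean eigenvalue criterion
p95_crit = np.percentile(rand_eigs, 95, axis=0)  # 95th percentile criterion

def n_retained(obs, crit):
    # Count leading eigenvalues that exceed the random-data criterion.
    below = obs <= crit
    return int(np.argmax(below)) if below.any() else len(obs)

print("retained (mean criterion):", n_retained(obs_eigs, mean_crit))
print("retained (95th percentile):", n_retained(obs_eigs, p95_crit))
```

PA-PAF differs only in the matrix analyzed (a reduced correlation matrix with communality estimates on the diagonal rather than the full correlation matrix); the retention logic is the same.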

159 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare how students in Grades 3 to 6 respond to a mathematics attitudes instrument with a 4-point Likert-type scale compared with one with an additional neutral point.
Abstract: The purpose of this study was to compare how students in Grades 3 to 6 respond to a mathematics attitudes instrument with a 4-point Likert-type scale compared with one with an additional neutral point (a 5-point Likert-type scale). The 606 participating students from six elementary and middle schools randomly received either the 4-point or 5-point format of the Math and Me Survey. Regardless of whether a neutral midpoint was offered or not, the structure of the instrument was virtually the same, with equal intercepts, means, variances and covariances, pattern coefficients, and nearly all residuals. The 5-point scale is preferred with this population because with this format the reliability estimate for the Mathematical Self-Perceptions subscale was higher (p = .049), and the pattern coefficients were stronger. Additionally, this format provided less model misfit than the 4-point format. Based on these findings, the authors recommend administration of the Math and Me Survey in the 5-point format. These fin...

127 citations


Journal ArticleDOI
TL;DR: In this paper, the concept of invariant item ordering (IIO) for polytomously scored items is discussed and methods for investigating an IIO in real test data are proposed.
Abstract: This article discusses the concept of an invariant item ordering (IIO) for polytomously scored items and proposes methods for investigating an IIO in real test data. Method manifest IIO is proposed...

117 citations


Journal ArticleDOI
TL;DR: In this paper, the authors tested five confirmatory factor analytic (CFA) models of the Positive Affect Negative Affect Schedule (PANAS) to provide validity evidence based on its internal structure.
Abstract: This study tested five confirmatory factor analytic (CFA) models of the Positive Affect Negative Affect Schedule (PANAS) to provide validity evidence based on its internal structure. A sample of 223 club sport athletes indicated their emotions during the past week. Results revealed that an orthogonal two-factor CFA model, specifying error correlations according to Zevon and Tellegen’s mood content categories, provided the best fit to our data. In addition, parameter estimates for this model suggest that PANAS scores are reliable and explain large proportions of item variance. Taken together with previous research, the findings further suggest that the PANAS may be a higher-order measure of affect and includes several consistently problematic items. The authors recommend that affect researchers attempt to improve the PANAS by (a) revising consistently problematic items, (b) adding new items to better capture mood content categories, and (c) providing additional internal structure validity evidence through ...

113 citations


Journal ArticleDOI
TL;DR: The authors empirically synthesize previous studies to investigate whether or not the Graduate Record Examination (GRE) predicts the performance of students in master's programs as well as performance of doctoral students.
Abstract: Extensive research has examined the effectiveness of admissions tests for use in higher education. What has gone unexamined is the extent to which tests are similarly effective for predicting performance at both the master’s and doctoral levels. This study empirically synthesizes previous studies to investigate whether or not the Graduate Record Examination (GRE) predicts the performance of students in master’s programs as well as the performance of doctoral students. Across nearly 100 studies and 10,000 students, this study found that GRE scores predict first year grade point average (GPA), graduate GPA, and faculty ratings well for both master’s and doctoral students, with differences that ranged from small to zero.

103 citations


Journal ArticleDOI
TL;DR: In this paper, the authors explore the impact of random responding on the size and direction of correlations observed between multi-item inventory scores and show that even low base rates of random responses can significantly affect observed correlations, especially when the inventories in question assess low or high base rate phenomena.
Abstract: Random responding to psychological inventories is a long-standing concern among clinical practitioners and researchers interested in interpreting idiographic data, but it is typically viewed as having only a minor impact on the statistical inferences drawn from nomothetic data. This article explores the impact of random responding on the size and direction of correlations observed between multi-item inventory scores. Random responses to individual items result in nonrandomly distributed inventory-level scores. Therefore, even low base rates of random responding can significantly affect the statistical inferences made from inventory-level data. Study 1 uses simulations to show that even low base rates of random responding can significantly affect observed correlations, especially when the inventories in question assess low or high base rate phenomena. Study 2 uses archival data to illustrate the moderating effect of random responding on observed correlations in two samples.
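The attenuation facet of this argument is easy to simulate. In the sketch below (all model values are made up), a fraction of respondents answer one inventory at random and the inventory-level correlation drops; note this does not reproduce the article's further point about low or high base rate phenomena, where nonrandomly distributed inventory-level scores can distort correlations in less intuitive ways:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 1000, 10  # illustrative respondents and items per inventory

# Two 5-point inventories whose latent traits correlate 0.5 (made-up model).
t = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n)
A = np.clip(np.round(t[:, [0]] + rng.normal(size=(n, k)) + 3), 1, 5)
B = np.clip(np.round(t[:, [1]] + rng.normal(size=(n, k)) + 3), 1, 5)

def score_corr(X, Y):
    # Correlation between summed inventory-level scores.
    return float(np.corrcoef(X.sum(axis=1), Y.sum(axis=1))[0, 1])

r_clean = score_corr(A, B)

# Replace 20% of respondents on inventory B with uniform random responses.
B_rand = B.copy()
idx = rng.choice(n, size=n // 5, replace=False)
B_rand[idx] = rng.integers(1, 6, size=(len(idx), k))

r_contam = score_corr(A, B_rand)
print(f"score correlation, clean data:       {r_clean:.3f}")
print(f"score correlation, 20% random on B:  {r_contam:.3f}")
```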

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the effect of outlier contamination for binary and ordinal response scales and found that the coefficient alpha estimates were severely inflated with the presence of outliers, and like the earlier findings, the effects of the outliers were reduced with increasing theoretical reliability.
Abstract: In a recent Monte Carlo simulation study, Liu and Zumbo showed that outliers can severely inflate the estimates of Cronbach’s coefficient alpha for continuous item response data—visual analogue response format. Little, however, is known about the effect of outliers for ordinal item response data—also commonly referred to as Likert, Likert-type, ordered categorical, or ordinal/rating scale item responses. Building on the work of Liu and Zumbo, the authors investigated the effects of outlier contamination for binary and ordinal response scales. Their results showed that coefficient alpha estimates were severely inflated with the presence of outliers, and like the earlier findings, the effects of outliers were reduced with increasing theoretical reliability. The efficiency of coefficient alpha estimates (i.e., sample-to-sample variation) was inflated as well and affected by the number of scale points. It is worth noting that when there were no outliers, the alpha estimates were downward biased because of the ordinal scaling. However, the alpha estimates were, in general, inflated in the presence of outliers leading to positive bias.
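Coefficient alpha and the inflation mechanism can be illustrated directly. A sketch with simulated 5-point responses; all sample sizes and parameters are made up, and the outliers here are respondents who answer the maximum category on every item, so their consistency across items inflates inter-item covariance:

```python
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha for an n-by-k matrix of item responses."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
n, k = 300, 10

# Clean 5-point Likert-style data driven by a single latent trait.
theta = rng.normal(size=(n, 1))
clean = np.clip(np.round(theta + rng.normal(size=(n, k)) + 3), 1, 5)
alpha_clean = cronbach_alpha(clean)

# Contaminate 5% of respondents with extreme, perfectly consistent responding.
contaminated = clean.copy()
contaminated[: n // 20] = 5
alpha_out = cronbach_alpha(contaminated)

print(f"alpha without outliers: {alpha_clean:.3f}")
print(f"alpha with outliers:    {alpha_out:.3f}")
```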

Journal ArticleDOI
TL;DR: The Mathematics Value Inventory (MVI) as mentioned in this paper ) is a self-report inventory that measures individual differences in the perceived value of mathematical literacy for general education students, which is grounded in the Eccles et al. model of achievement-related choices and surveys students beliefs in four areas: interest, general utility, need for high achievement, and personal cost.
Abstract: The goal of this study was to develop a self-report inventory that measures individual differences in the perceived value of mathematical literacy for general education students. The Mathematics Value Inventory (MVI) is grounded in the Eccles et al. model of achievement-related choices and surveys students’ beliefs in four areas: interest, general utility, need for high achievement, and personal cost. This study describes the development and initial score validation of the MVI. As hypothesized, it was found that (a) MVI scores for students who were not majoring in math did not differ by gender, (b) students who had higher MVI scores had completed more college course work in math than did students with lower scores, and (c) MVI scores were not related to scores on a measure of social desirability.

Journal ArticleDOI
TL;DR: In this paper, the authors argue that a fixed cut-point is not applicable because the distribution of eigenvalues or their ratios depends on sample size and test length, just like other statistics, and propose three chi-square statistics for multivariate independence to test the correlation matrix obtained from the standardized residuals.
Abstract: Dimensionality is an important assumption in item response theory (IRT). Principal component analysis on standardized residuals has been used to check dimensionality, especially under the family of Rasch models. It has been suggested that an eigenvalue greater than 1.5 for the first eigenvalue signifies a violation of unidimensionality when there are 500 persons and 30 items. The cut-point of 1.5 is often used beyond this specific condition of sample size and test length. This study argues that a fixed cut-point is not applicable because the distribution of eigenvalues or their ratios depends on sample size and test length, just like other statistics. The authors conducted a series of simulations to verify this argument. They then proposed three chi-square statistics for multivariate independence to test the correlation matrix obtained from the standardized residuals. Through simulations, it was found that Steiger’s statistic behaved fairly like a chi-square distribution, when its degrees of freedom were ...
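The argument that a fixed 1.5 cut-point cannot generalize is easy to demonstrate: even for purely random data, the largest eigenvalue of a sample correlation matrix shifts with sample size and the number of variables. In the sketch below, raw random data stands in for Rasch standardized residuals, and the sizes and replication count are illustrative:

```python
import numpy as np

def first_eig_dist(n, k, n_reps=200, seed=0):
    """Simulated distribution of the largest eigenvalue of a correlation
    matrix computed from purely random (null) data."""
    rng = np.random.default_rng(seed)
    eigs = np.empty(n_reps)
    for r in range(n_reps):
        X = rng.normal(size=(n, k))
        eigs[r] = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[-1]
    return eigs

# The null distribution of the first eigenvalue moves with n and k,
# so no single cut-point can be correct across conditions.
for n, k in [(100, 30), (500, 30), (500, 60)]:
    e = first_eig_dist(n, k)
    print(f"n={n:4d}, k={k:2d}: mean first eigenvalue = {e.mean():.2f}")
```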

Journal ArticleDOI
TL;DR: This article examined the factor structure of an instrument developed to assess elementary students' individual perceptions of their classroom environments and found that the four-factor model appeared to be a more tenable solution because of its equally adequate fit indexes, parsimony, exploratory factor analytic support, and the high correlations between some factors.
Abstract: The purpose of this study was to examine the factor structure of an instrument developed to assess elementary students’ individual perceptions of their classroom environments. The Student Personal Perception of Classroom Climate (SPPCC) originally consisted of six subscales adapted from previously published scales. Exploratory factor analysis identified the underlying dimensions of the SPPCC. The authors subsequently tested the four-factor model against the six-factor model using confirmatory factor analyses with an independent sample of students. The four-factor model appeared to be a more tenable solution because of its equally adequate fit indexes, parsimony, exploratory factor analytic support, and the high correlations between some factors. Future research and potential limitations of the study are discussed.

Journal ArticleDOI
TL;DR: In this article, the authors suggest that competitive work environments may influence individual's attitudes, behaviors, stress, and performance, and that adequate measures of competitive environments are needed to evaluate them.
Abstract: Recent research suggests that competitive work environments may influence individual’s attitudes, behaviors, stress, and performance. Unfortunately, adequate measures of competitive environments ar...

Journal ArticleDOI
TL;DR: In this article, the authors followed up on previous work examining the incidence of reporting evidence based on test consequences in Mental Measurements Yearbook and identified additional possible outlets for such evidence.
Abstract: This study followed up on previous work that examined the incidence of reporting evidence based on test consequences in Mental Measurements Yearbook. In the present study, additional possible outle...

Journal ArticleDOI
TL;DR: In this article, the authors empirically examined two operationalizations of the core self-evaluation construct: (a) the Judge, Erez, Bono, and Thoresen 12-item scale and (b) a composite measure of self-esteem, selfefficacy, locus of control, and neuroticism.
Abstract: The authors empirically examined two operationalizations of the core self-evaluation construct: (a) the Judge, Erez, Bono, and Thoresen 12-item scale and (b) a composite measure of self-esteem, self-efficacy, locus of control, and neuroticism. The study found that the composite scale relates more strongly than the shorter scale to performance, perceived job complexity, positive affectivity, personal trust, and belief in a just world. However, the short scale performed well and may be more practical in organizational research. The authors conclude that the 12-item measure is better used in research when participant time is limited and that a composite index is better when time is not a constraining factor in the data-collection process.

Journal ArticleDOI
TL;DR: This paper examined the psychometric effects of providing immediate feedback on the correctness of answers to open-ended questions, and allowing participants to revise their answers following feedback, and found that participants answering verbal and math questions are able to correct many of their initial incorrect answers, resulting in higher revised scores.
Abstract: Two experiments examine the psychometric effects of providing immediate feedback on the correctness of answers to open-ended questions, and allowing participants to revise their answers following feedback. Participants answering verbal and math questions are able to correct many of their initial incorrect answers, resulting in higher revised scores. In addition, the reliability of these scores is significantly higher than the reliability of scores based on no feedback. Finally, anxiety is significantly lower following a test section with feedback and revision.

Journal ArticleDOI
TL;DR: In this article, self- and peer-reports of EI and Big Five personality traits were used to confirm an a priori four-factor model for the Wong and Law Emotional Intelligence Scale (WLEIS) and a five-factor model for Goldberg's International Personality Item Pool (IPIP).
Abstract: A major stumbling block for emotional intelligence (EI) research has been the lack of adequate evidence for discriminant validity. In a sample of 280 dyads, self- and peer-reports of EI and Big Five personality traits were used to confirm an a priori four-factor model for the Wong and Law Emotional Intelligence Scale (WLEIS) and a five-factor model for Goldberg’s International Personality Item Pool (IPIP). After demonstrating measurement equivalence between self-report and peer-report for both scales, the authors show discriminant validity between the four EI subfacets and Big Five personality traits. This is accomplished through a series of structural equation models fit to the multitrait-multimethod matrix. Despite their conclusion of discriminant validity, the authors note strong latent correlations between Others’ Emotion Appraisal and trait Agreeableness (φ = .87), between Use of Emotion and trait Conscientiousness (φ = .73), between Regulation of Emotion and trait Neuroticism (φ = −.66), and between

Journal ArticleDOI
TL;DR: In this article, two challenges for using differential item functioning (DIF) measures when there are large group differences in true proficiency are illustrated, which may lead to inflated Type I error rates, for very different reasons.
Abstract: In this brief explication, two challenges for using differential item functioning (DIF) measures when there are large group differences in true proficiency are illustrated. Each of these difficulties may lead to inflated Type I error rates, for very different reasons. One problem is that groups matched on observed score are not necessarily well matched on true proficiency, which may result in the false detection of DIF due to inaccurate matching. The other problem is that a model that does not allow for a nonzero asymptote can produce what seems to be DIF. These issues have been discussed separately in the literature earlier. This article brings them together in a nontechnical form.

Journal ArticleDOI
TL;DR: There exist a variety of measurement instruments for assessing emotional intelligence (EI), and one approach is the use of other reports wherein knowledgeable informants indicate how well the scale ite... as mentioned in this paper.
Abstract: There exist a variety of measurement instruments for assessing emotional intelligence (EI). One approach is the use of other reports wherein knowledgeable informants indicate how well the scale ite...

Journal ArticleDOI
TL;DR: In this article, the authors examined the psychometric properties of scores from the University Attachment Scale, a measure that operationalizes group and member attachment as two separate dimensions of attachment to a university.
Abstract: This study examined the psychometric properties of scores from the University Attachment Scale, a measure that operationalizes group and member attachment as two separate dimensions of attachment to a university. A two-factor model was championed over a one-factor model providing evidence of a distinction between university attachment and member attachment. Relationships with external criteria provided further support for this distinction and construct validity evidence. As predicted, “involved” students had practically and statistically significantly higher group attachment than “noninvolved” students. Furthermore, transfer students had practically and statistically significantly lower member attachment than nontransfer students. Additionally, there was a statistically significant positive relationship between students’ perceived cohesion to the university and both group and member attachment. Overall, the authors believe that this is a promising new measure of university attachment.

Journal ArticleDOI
TL;DR: In this article, the validity of scores on the Homework purpose scale using 681 rural and 306 urban high school students was tested using confirmatory factor analysis on the rural sample.
Abstract: The purpose of this study is to test the validity of scores on the Homework Purpose Scale using 681 rural and 306 urban high school students. First, confirmatory factor analysis was conducted on the rural sample. The results reveal that the Homework Purpose Scale comprises three separate yet related factors, including Learning-Oriented Reasons, Adult-Oriented Reasons, and Peer-Oriented Reasons. This factor structure is tested with the data from the urban sample. Given an adequate level of configural, factor loading, common error covariance, and intercept invariance, the difference between the group means is further tested. The results reveal that urban high school students, as compared with their rural counterparts, are more likely to do homework for adult-oriented reasons.

Journal ArticleDOI
TL;DR: This paper provided a historical account and metasynthesis of which statistical techniques are most frequently used in the fields of education and psychology, and discussed trends for the education and psychological literature both individually and collectively.
Abstract: The purpose of the present study is to provide a historical account and metasynthesis of which statistical techniques are most frequently used in the fields of education and psychology. Six articles reviewing the American Educational Research Journal from 1969 to 1997 and five articles reviewing the psychological literature from 1948 to 2001 resulted in a total number of 17,698 techniques recorded from the 12,012 articles reviewed. No prior study of analytic practices has considered this broad scope of time and articles. Trends are discussed for the education and psychology literature both individually and collectively.

Journal ArticleDOI
TL;DR: In this paper, the authors test and cross-validate the Revised Two-Factor Study Process Questionnaire (R-SPQ-2F) factor structure, based on separate cohort data (Cohort 1: n = 1,490; Cohort 2: n= 1,533), among students attending a university in the United States.
Abstract: The Revised Two-Factor Study Process Questionnaire (R-SPQ-2F) is a measure of university students’ approach to learning. Original evaluation of the scale’s psychometric properties was based on a sample of Hong Kong university students’ scores. The purpose of this study was to test and cross-validate the R-SPQ-2F factor structure, based on separate cohort data (Cohort 1: n = 1,490; Cohort 2: n = 1,533), among students attending a university in the United States. Factor analytic results did not support the scale’s original factor structure, instead suggesting an alternative four-factor model of the scale data. In the cross-validation study, multisample confirmatory factor analysis results indicated that the scale’s measurement model parameters (e.g., factor loadings) were invariant across independent samples. Despite support for the scale’s respecified factor structure for Western university students, continued research is recommended to improve the scale’s psychometric properties. Implications for test sco...

Journal ArticleDOI
TL;DR: A revised version of the Coaching Competency Scale (CCS) was developed for athletes of high school teams (APCCS II-HST) as discussed by the authors and data were collected from athletes (N = 748) of seven relevant sports.
Abstract: The purpose of this validity study was to improve measurement of athletes’ evaluations of their head coach’s coaching competency, an important multidimensional construct in models of coaching effectiveness. A revised version of the Coaching Competency Scale (CCS) was developed for athletes of high school teams (APCCS II-HST). Data were collected from athletes (N = 748) of seven relevant sports. Athlete observations were clustered within teams (G = 74). Multigroup confirmatory factor analyses of the asymptotic within-teams covariance matrix provided evidence for factorial invariance, except for one residual variance, by athlete gender (male: n = 427; female: n = 321). An exploratory multilevel confirmatory factor analysis provided evidence for close fit of an oblique five-factor within-teams structure and a one-factor between-teams structure.

Journal ArticleDOI
TL;DR: In this paper, Monte Carlo methods were used to simulate samples under known and controlled population conditions, and the results showed that the methods that proved to be the most accurate were those proposed by Bonett and Fisher.
Abstract: The purpose of this research is to examine eight of the different methods for computing confidence intervals around alpha that have been proposed to determine which of these, if any, is the most accurate and precise. Monte Carlo methods were used to simulate samples under known and controlled population conditions. In general, the differences in the accuracy and precision of the eight methods examined were negligible in many conditions. For the breadth of conditions examined in this simulation study, the methods that proved to be the most accurate were those proposed by Bonett and Fisher. Larger sample sizes and larger coefficient alphas also resulted in better interval coverage, whereas smaller numbers of items resulted in poorer interval coverage.
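Of the intervals compared, a Bonett-style one is simple enough to sketch by hand: transform to ln(1 − α), build a normal-theory interval there with approximate standard error sqrt(2k / ((k − 1)(n − 2))), and back-transform. Treat the following as an illustrative sketch of one published variant, not a reproduction of the study's eight methods, and the example numbers as made up:

```python
import math

def bonett_alpha_ci(alpha_hat, n, k):
    """Approximate 95% CI for coefficient alpha (Bonett-style sketch).

    Builds a normal-theory interval on the ln(1 - alpha) scale and
    back-transforms; n = respondents, k = items.
    """
    z = 1.959964  # 97.5th normal percentile, hardcoded to avoid SciPy
    se = math.sqrt(2 * k / ((k - 1) * (n - 2)))
    lo = 1 - math.exp(math.log(1 - alpha_hat) + z * se)
    hi = 1 - math.exp(math.log(1 - alpha_hat) - z * se)
    return lo, hi

lo, hi = bonett_alpha_ci(alpha_hat=0.85, n=200, k=10)
print(f"95% CI for alpha = 0.85 (n=200, k=10): ({lo:.3f}, {hi:.3f})")
```

The log-complement transformation makes the interval asymmetric around the point estimate, which matches the bounded, left-skewed sampling behavior of alpha near 1.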

Journal ArticleDOI
TL;DR: Compared with the original global discrimination index method, the MMGDI method improves the recovery rate of each attribute and of the entire cognitive profile, especially the latter, which improves both the validity and reliability of the test scores from a CD-CAT program.
Abstract: This article proposes a new item selection method, namely, the modified maximum global discrimination index (MMGDI) method, for cognitive diagnostic computerized adaptive testing (CD-CAT). The new method captures two aspects of the appeal of an item: (a) the amount of contribution it can make toward adequate coverage of every attribute and (b) the amount of contribution it can make toward recovering the latent cognitive profile. A simulation study shows that the new method ensures adequate coverage of every attribute, which improves the validity of the test scores, and defensibility of the proposed uses of the test. Furthermore, compared with the original global discrimination index method, the MMGDI method improves the recovery rate of each attribute and of the entire cognitive profile, especially the latter. Therefore, the new method improves both the validity and reliability of the test scores from a CD-CAT program.

Journal ArticleDOI
TL;DR: Quantitative literacy is a habit of mind that is characterized by the interrelationship among a person's everyday understanding of mathematics, his or her beliefs about mathematics, and their or her... as discussed by the authors.
Abstract: Quantitative literacy is a habit of mind that is characterized by the interrelationship among a person’s everyday understanding of mathematics, his or her beliefs about mathematics, and his or her ...