Showing papers in "Educational and Psychological Measurement in 2006"


Journal ArticleDOI
TL;DR: In this article, the authors developed a short questionnaire to measure work engagement, a positive work-related state of fulfillment characterized by vigor, dedication, and absorption.
Abstract: This article reports on the development of a short questionnaire to measure work engagement—a positive work-related state of fulfillment that is characterized by vigor, dedication, and absorption. ...

5,203 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine the use of factor analysis in current published research across four psychological journals.
Abstract: Given the proliferation of factor analysis applications in the literature, the present article examines the use of factor analysis in current published research across four psychological journals. ...

1,983 citations


Journal ArticleDOI
TL;DR: A hierarchy of measurement models that can be used to estimate reliability and a procedure by which structural equation modeling can be used to test the fit of these models to a set of data are illustrated.
Abstract: Coefficient alpha, the most commonly used estimate of internal consistency, is often considered a lower bound estimate of reliability, though the extent of its underestimation is not typically known. Many researchers are unaware that coefficient alpha is based on the essentially tau-equivalent measurement model. It is the violation of the assumptions required by this measurement model that is often responsible for coefficient alpha's underestimation of reliability. This article presents a hierarchy of measurement models that can be used to estimate reliability and illustrates a procedure by which structural equation modeling can be used to test the fit of these models to a set of data. Test and data characteristics that can influence the extent to which the assumption of tau-equivalence is violated are discussed. Both heuristic and applied examples are used to augment the discussion.

560 citations
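
As a rough illustration of the statistic discussed in the abstract above, coefficient alpha can be computed from item and total-score variances. This is a minimal sketch of the standard formula only; it does not reproduce the article's hierarchy of measurement models or its structural equation modeling procedure, and the function name and data are hypothetical.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: four respondents, two items.
responses = [[1, 1], [2, 3], [3, 2], [4, 4]]
alpha = coefficient_alpha(responses)
print(round(alpha, 4))
```

When items are not essentially tau-equivalent, this value would be expected to understate reliability, which is the situation the abstract describes.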


Journal ArticleDOI
TL;DR: In this article, the reliability of responses to the items, as well as the item parameters of three GSE measures using item response theory, were examined, and the results indicate that the New General Self-Efficacy Scale has a slight advantage over the other measures examined in this study in terms of the item discrimination, item information, and relative efficiency of the test information function.
Abstract: General self-efficacy (GSE), individuals' belief in their ability to perform well in a variety of situations, has been the subject of increasing research attention. However, the psychometric properties (e.g., reliability, validity) associated with the scores on GSE measures have been criticized, which has hindered efforts to further establish the construct of GSE. This study examines the reliability of responses to the items, as well as the item parameters of three GSE measures using item response theory. Contrary to the criticisms, the responses to the items on all three measures of GSE demonstrate acceptable psychometric properties, especially at lower levels of GSE. The results indicate that the New General Self-Efficacy Scale has a slight advantage over the other measures examined in this study in terms of the item discrimination, item information, and relative efficiency of the test information function. Implications for GSE research are discussed.

336 citations
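
The quantities compared in the abstract above (item discrimination, item information, relative efficiency) can be sketched as follows. The sketch assumes a dichotomous two-parameter logistic (2PL) model purely for illustration; the abstract does not state which IRT model the study used, and all item parameters below are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of endorsing an item (a = discrimination, b = location)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of one 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta, items):
    """Test information: the sum of item informations at theta."""
    return sum(item_information(theta, a, b) for a, b in items)


def relative_efficiency(theta, items_x, items_y):
    """Ratio of two scales' test information at the same theta."""
    return scale_information(theta, items_x) / scale_information(theta, items_y)


# Hypothetical (a, b) pairs for two short scales.
scale_x = [(1.5, -1.0), (2.0, -0.5), (1.8, 0.0)]
scale_y = [(1.0, -1.0), (1.2, -0.5), (0.9, 0.0)]

for theta in (-2.0, -0.5, 1.0):
    print(theta, round(relative_efficiency(theta, scale_x, scale_y), 2))
```

A ratio above 1 at a given trait level means the first scale measures more precisely there, which is the sense in which one measure can hold an advantage at lower levels of the trait.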


Journal ArticleDOI
TL;DR: In this article, the authors proposed a teaching satisfaction measure based on the Life Satisfaction Scale (LSS) and validated its scores on a sample of 202 primary and secondary school teachers, finding favorable psychometric properties.
Abstract: The present study proposes a teaching satisfaction measure and examines the validity of its scores. The measure is based on the Life Satisfaction Scale (LSS). Scores on the five-item Teaching Satisfaction Scale (TSS) were validated on a sample of 202 primary and secondary school teachers and favorable psychometric properties were found. As hypothesized, teaching satisfaction as measured by the TSS correlated positively with self-esteem but negatively with psychological distress and teaching stress. The TSS scores had good incremental validity for psychological distress and teaching stress beyond earlier Job Satisfaction Scales. The TSS offers a simple, direct, reliable, and valid assessment of teaching satisfaction. Future development of the TSS is discussed.

195 citations


Journal ArticleDOI
TL;DR: In this article, two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years.
Abstract: Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures and results were compared. The latent class cluster analysis uncovered three classes representing differing levels of children's behavioral adjustment (well adjusted, average adjustment, functionally impaired), whereas the cluster analysis uncovered seven groups of child behavior. Results show a high degree of overlap, and each procedure offers unique information toward classifying child behavior.

180 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe the development and initial validation of obtained scores from the Academic Expectations Stress Inventory (AESI), which measures expectations as a source of academic stress in middle and high school Asian students.
Abstract: This article describes the development and initial validation of obtained scores from the Academic Expectations Stress Inventory (AESI), which measures expectations as a source of academic stress in middle and high school Asian students. In the first study, exploratory factor analysis results from 721 adolescents suggested a nine-item scale with two factors—Expectations of Parents/Teachers (five items) and Expectations of Self (four items). The data also revealed initial evidence of the reliability of AESI’s scores. Initial estimates of convergent validity for AESI’s scores were also reported. In the second study, data from 387 adolescents were subjected to a confirmatory factor analysis that provided support for the factor structure derived from the first study. In the third study, data from 144 adolescents yielded evidence of AESI scores’ test-retest reliability. Additional evidence of AESI’s internal consistency estimates as well as convergent and discriminant validity for AESI’s scores were also provided.

165 citations


Journal ArticleDOI
TL;DR: In this paper, the authors gathered internal structural validity and external criterion validity evidence for the Experiences in Close Relationships-Revised Questionnaire (ECR-R) scores, and confirmatory factor analysis of the data provided general support for the hypothesized two-factor model.
Abstract: The current study gathered internal structural validity and external criterion validity evidence for the Experiences in Close Relationships-Revised Questionnaire (ECR-R) scores. Specifically, confirmatory factor analysis of the data provided general support for the hypothesized two-factor model, and hypothesized relationships with external criteria were substantiated. However, minor model misfit and low communalities (R²) suggested that some items may represent extraneous constructs. Further avenues of study regarding the functioning of the instrument are provided.

144 citations


Journal ArticleDOI
TL;DR: In this paper, confirmatory factor analyses of Torrance Tests of Creative Thinking (TTCT) scores supported a two-factor structure, and multiple group analyses indicated that model parameters were more invariant across gender groups than across grade levels.
Abstract: There is disagreement among researchers as to whether creativity is a unidimensional or multidimensional trait. Much of the debate centers around the most widely used measure of creativity, the Torrance Tests of Creative Thinking (TTCT). This study used data from 1,000 kindergartners (ages 5-7), 1,000 third graders (ages 7-11) and 1,000 sixth graders (ages 10-13). Confirmatory factor analyses were conducted for both the two-factor model and one-factor model to determine which fit the data better. Measurement invariance across genders and grade levels was assessed using multiple group analyses in which sets of parameters were freed sequentially in a series of hierarchically nested models. The findings indicate that the structure of TTCT scores is consistent with a two-factor theory. Also, the results of the multiple group analyses indicate that model parameters for gender groups are more invariant than for grade levels in determining the fit of the model.

132 citations


Journal ArticleDOI
TL;DR: The Multisource Assessment of Social Competence Scale was developed, based on the School Social Behavior Scale, and examined to test the factor pattern and the consistency of the ratings of self, peers, teachers, and parents.
Abstract: The Multisource Assessment of Social Competence Scale was developed, based on the School Social Behavior Scale and examined to test the factor pattern and the consistency of the ratings of self, peers, teachers, and parents. The findings of the confirmatory factor analysis supported a four-factor solution consistent with two main dimensions (prosocial and antisocial), each divided into two subdimensions (cooperating skills, empathy, impulsivity, and disruptiveness). The resultant model was cross-validated with a new sample. The fit indexes implied that the factor patterns were invariant for the two samples. The correlations between the four social agents were statistically significant, albeit quite low, indicating that the different sources tend to provide divergent pictures of a child's social competence. Statistically significant differences in social competence were found between educational settings and between genders.

132 citations


Journal ArticleDOI
TL;DR: In this paper, a confirmatory factor analysis of responses by 211 preadolescents with mild intellectual disabilities to the individually administered Self Description Questionnaire I-Individual Administration (SDQI-IA) counters widely cited claims that these children cannot differentiate multiple self-concept factors.
Abstract: Confirmatory factor analysis of responses by 211 preadolescents (M age = 10.25 years, SD = 1.48) with mild intellectual disabilities (MIDs) to the individually administered Self Description Questionnaire I–Individual Administration (SDQI-IA) counters widely cited claims that these children cannot differentiate multiple self-concept factors. Results provide clear support for the a priori eight-factor solution, modest correlations between the factors (Mdn r = .38), substantial reliabilities (Mdn = .90), and invariance of the factor solution over gender, age, and educational placement (regular vs. special, segregated classes). Also introduced is a new hybrid compromise between multigroup and multiple-indicator-multiple-cause (MIMIC) approaches to latent mean differences. Consistent with a priori predictions, preadolescents with MIDs have lower self-concepts in segregated classes than in regular classes for three academic self-concept scales (reading, math, and general-school) and, to a lesser extent, peer rela...

Journal ArticleDOI
TL;DR: In this article, the authors explored the latent structure of scores on the Learning and Study Strategies Inventory (LASSI) and analyzed the relationship between this structure and students' academic performance.
Abstract: This study explores the latent structure of scores on the Learning and Study Strategies Inventory (LASSI) and analyzes the relationship between this structure and students' academic performance. Two independent samples of college freshmen (n = 527) and seniors (n = 429) completed the LASSI. Data analysis of the first sample revealed acceptable psychometric properties and suggested a three-factor model, which was supported by a confirmatory analysis of the second sample data. The three latent constructs, labeled Affective Strategies, Goal Strategies, and Comprehension Monitoring Strategies, were shown to be interrelated, and the first two were positively linked to academic performance. The usefulness and rationale of the latent structure of the LASSI and the potential use of its scales are discussed.

Journal ArticleDOI
TL;DR: The Munroe Multicultural Attitude Scale Questionnaire (MASQUE) is a new instrument, theoretically based in Banks's transformative approach, that measures multicultural attitudes; exploratory factor analysis supported the approach's “know,” “act,” and “care” domains.
Abstract: Institutions of higher education want to diversify their learning climates, and many offer courses in multiculturalism, yet these courses still do not meet the needs of attitudinal change. A new instrument was developed, the Munroe Multicultural Attitude Scale Questionnaire (MASQUE), that was theoretically based in Banks's transformative approach, which specifically measured multicultural attitudes. Psychometric properties of the instrument's scores are discussed. Exploratory factor analysis supported the “know,” “act,” and “care” domains of Banks's transformative approach, and the instrument was sensitive to detecting group differences on several demographic variables. The MASQUE's potential uses for affecting multicultural research and instruction are discussed.

Journal ArticleDOI
TL;DR: The 20-minute timed version of the Raven Advanced Progressive Matrices Test is compared to the untimed APM as a measure of intellectual ability in 1st-year psychology students and proves to be an adequate predictor of the untimed APM score.
Abstract: The Raven Advanced Progressive Matrices Test (APM) is a well-known measure of higher order general mental ability. The time to administer the test, 40 to 60 minutes, is sometimes regarded as a drawback. To meet efficiency needs, the APM can be administered as a 30- or 40-minute timed test, or one of two developed short versions could be used. In this study, the 20-minute timed version of the APM is compared to the untimed APM as a measure of intellectual ability in 1st-year psychology students. This 20-minute timed version proves to be an adequate predictor of the untimed APM score.

Journal ArticleDOI
TL;DR: In this paper, the magnitudes and effects of item parameter drift (IPD) on examinee response probabilities and true scores are algebraically derived, using the well-known fact that item and examinee parameters are identical only up to a set of linear transformations specific to the functional form of a given IRT model.
Abstract: One theoretical feature that makes item response theory (IRT) models those of choice for many psychometric data analysts is parameter invariance, the equality of item and examinee parameters from different examinee populations or measurement conditions. In this article, using the well-known fact that item and examinee parameters are identical only up to a set of linear transformations specific to the functional form of a given IRT model, violations of these transformations for unidimensional IRT models are investigated using analytical, numerical, and visual tools. Because item parameter drift (IPD) constitutes a lack of invariance (LOI) at the individual item level or item set level, the magnitudes and effects of IPD on examinee response probabilities and true scores are algebraically derived and connected to empirical results from a recent simulation study. Thus, this article facilitates a deeper understanding of the exact statistical formulation of parameter invariance as a fundamental property of late...
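
The "identical only up to a set of linear transformations" property in the abstract above can be checked numerically. The sketch below assumes a 2PL model and hypothetical parameter values; it shows that a consistent rescaling leaves the response probability unchanged, whereas an extra shift in one item's difficulty (a simple stand-in for drift) changes it. It is not the article's derivation.

```python
import math


def p_2pl(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rescale(theta, a, b, slope, shift):
    """Apply theta* = slope*theta + shift with the matching item transformations.

    Discrimination becomes a / slope and difficulty becomes slope*b + shift,
    so a*(theta* - b*) equals the original a*(theta - b).
    """
    return slope * theta + shift, a / slope, slope * b + shift


# Hypothetical examinee and item parameters, and a hypothetical rescaling.
theta, a, b = 0.5, 1.2, -0.3
slope, shift = 2.0, 1.0

theta_s, a_s, b_s = rescale(theta, a, b, slope, shift)
p_original = p_2pl(theta, a, b)
p_rescaled = p_2pl(theta_s, a_s, b_s)      # invariance holds: same probability

# Drift: the item becomes 0.4 units harder than the transformation predicts.
p_drifted = p_2pl(theta_s, a_s, b_s + 0.4)

print(round(p_original, 6), round(p_rescaled, 6), round(p_drifted, 6))
```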

Journal ArticleDOI
Abstract: Factor analysis was applied to the Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV) scores of 432 Pennsylvania students referred for evaluation for special education services to determine the factor structure of the WISC-IV with this population. A first-order, four-factor oblique solution that mirrored that found in the WISC-IV normative sample was supported. When transformed to an orthogonalized higher order model, the general factor accounted for the greatest amount of common (75.7%) and total (46.7%) variance. In contrast, the largest contribution by a first-order factor (Verbal Comprehension) was 6.5% of total variance. It was recommended that interpretation of the WISC-IV not discount the strong general factor.

Journal ArticleDOI
TL;DR: In this paper, the authors employ a Monte Carlo study to investigate the effects of coarse categorization of dependent variables on power to detect true effects using three classes of regression models: OLS regression, ordinal logistic regression, and ordinal probit regression.
Abstract: Variables that have been coarsely categorized into a small number of ordered categories are often modeled as outcome variables in psychological research. The authors employ a Monte Carlo study to investigate the effects of this coarse categorization of dependent variables on power to detect true effects using three classes of regression models: ordinary least squares (OLS) regression, ordinal logistic regression, and ordinal probit regression. Both the loss of power and the increase in required sample size to regain the lost power are estimated. The loss of power and required sample size increase were substantial under conditions in which the coarsely categorized variable is highly skewed, has few categories (e.g., 2, 3), or both. Ordinal logistic and ordinal probit regression protect marginally better against power loss than does OLS regression.
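
The kind of power loss described above can be reproduced in miniature. The following is a small sketch, not the authors' simulation design: it assumes a single normal predictor, a t test of the Pearson correlation (equivalent to the OLS slope test in simple regression), and a skewed two-category split of the outcome. The sample size, slope, cut point, and seed are all hypothetical.

```python
import math
import random

T_CRIT = 2.0106  # two-sided .05 critical t for df = 48 (n = 50)


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variance: treat as no detectable association
    return sxy / math.sqrt(sxx * syy)


def is_significant(x, y):
    """t test of r with df = n - 2; assumes n = 50 to match T_CRIT."""
    r = pearson_r(x, y)
    if abs(r) >= 1.0:
        return True
    t = r * math.sqrt((len(x) - 2) / (1 - r * r))
    return abs(t) > T_CRIT


def simulate_power(reps=500, n=50, slope=0.4, cut=1.0, seed=2006):
    """Estimate power with a continuous outcome and with a skewed 0/1 split of it."""
    rng = random.Random(seed)
    hits_cont = hits_coarse = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [slope * xi + rng.gauss(0, 1) for xi in x]
        y_coarse = [1 if yi > cut else 0 for yi in y]  # highly skewed, 2 categories
        hits_cont += is_significant(x, y)
        hits_coarse += is_significant(x, y_coarse)
    return hits_cont / reps, hits_coarse / reps


power_continuous, power_coarse = simulate_power()
print(power_continuous, power_coarse)
```

Raising `n` in the coarse condition until its power matches the continuous condition would give a rough sense of the required sample size increase (with `T_CRIT` adjusted for the new degrees of freedom).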

Journal ArticleDOI
TL;DR: In this article, a marginal maximum likelihood (MML) method is used to estimate the parameters of a multidimensional generalized partial credit model for repeated measures, and it is shown that model fit can be evaluated using Lagrange multiplier tests.
Abstract: The application of multidimensional item response theory (IRT) models to longitudinal educational surveys where students are repeatedly measured is discussed and exemplified. A marginal maximum likelihood (MML) method to estimate the parameters of a multidimensional generalized partial credit model for repeated measures is presented. It is shown that model fit can be evaluated using Lagrange multiplier tests. Two tests are presented: the first aims at evaluation of the fit of the item response functions and the second at the constancy of the item location parameters over time points. The outcome of the latter test is compared with an analysis using scatter plots and linear regression. An analysis of data from a school effectiveness study in Flanders (Belgium) is presented as an example of the application of these methods. In the example, it is evaluated whether the concepts "academic self-concept," "well-being at school," and "attentiveness in the classroom" were constant during the secondary school period.

Journal ArticleDOI
TL;DR: In this article, the authors evaluated the psychometric properties of CCAI scores, specifically focusing on the replicability of the proposed four-factor structure, and found poor fit of that structure.
Abstract: The Cross-Cultural Adaptability Inventory (CCAI) was developed as a tool to assess an individual's effectiveness in cross-cultural interaction and communication. Because limited validity evidence had been published regarding the CCAI, the purpose of the current study is to evaluate the psychometric properties of CCAI scores, specifically focusing on the replicability of the proposed four-factor structure. This study is based on responses from a sample of 709, primarily Caucasian, college sophomores at a mid-Atlantic university. Confirmatory factor analysis indicated poor fit of the four-factor structure and follow-up exploratory factor analyses failed to reveal an interpretable structure. Possible explanations of poor fit are discussed, and recommendations for further research are suggested.

Journal ArticleDOI
Hanna Eklöf
TL;DR: In this article, the authors used the expectancy-value model of achievement motivation as a basis for measuring student test-taking motivation and found that the test taking motivation construct is distinct from general attitudes toward a subject and that task value perceptions are distinct from task performance expectancies.
Abstract: Using the expectancy-value model of achievement motivation as a basis, this study's purpose is to develop, apply, and validate scores from a self-report instrument measuring student test-taking motivation. Sampled evidence of construct validity for the present sample indicates that a number of the items in the instrument could be used as an indicator of student test-taking motivation. Exploratory factor analyses suggest that the test-taking motivation construct is distinct from general attitudes toward a subject and that task value perceptions are distinct from task performance expectancies. The instrument needs further development to consolidate its psychometric properties and to elaborate on the test-taking motivation construct in relation to the expectancy-value theory of achievement motivation.

Journal ArticleDOI
TL;DR: The psychometric properties of the Frost, Marten, Lahart, and Rosenblate Multidimensional Perfectionism Scale (1990) were investigated to determine its usefulness as a measurement of perfectionism with Australian secondary school girls and to find empirical support for the existence of both healthy and unhealthy types of perfectionist students.
Abstract: The psychometric properties of the Frost, Marten, Lahart, and Rosenblate Multidimensional Perfectionism Scale (1990) are investigated to determine its usefulness as a measurement of perfectionism with Australian secondary school girls and to find empirical support for the existence of both healthy and unhealthy types of perfectionist students. Participants were 409 female mixed-ability students from Years 7 and 10 in two private secondary schools in Sydney, Australia. Factor analyses yielded four rather than the six factors previously theorized. Cluster analysis indicated a distinct typology of healthy perfectionists, unhealthy perfectionists, and nonperfectionists. Healthy perfectionists were characterized by higher levels on Organization, whereas unhealthy perfectionists scored higher on the Parental Expectations & Criticism and Concern Over Mistakes & Doubts dimensions of perfectionism. Both types of perfectionists scored high on Personal Standards.

Journal ArticleDOI
TL;DR: This paper evaluated the racial/ethnic and gender differential item functioning (DIF) of the Massachusetts Youth Screening Instrument-Second Version (MAYSI-2) using the Rasch Model.
Abstract: The juvenile justice system needs a tool that can identify and assess mental health problems among youths quickly with validity and reliability. The goal of this article is to evaluate the racial/ethnic and gender differential item functioning (DIF) of the Massachusetts Youth Screening Instrument-Second Version (MAYSI-2) using the Rasch Model. Data are presented from 3,906 assessments of male and female juvenile offenders between 13 and 17 years of age who are incarcerated in the California Youth Authority. DIF is identified in some items, raising concerns about the Suicide Ideation subscale as well as the Traumatic Experiences subscale that may require some further examination and revision.

Journal ArticleDOI
TL;DR: Two competing structural models for the revised Learning and Study Strategies Inventory (LASSI) were examined and confirmatory factor analysis of the subscale scores provided support for the ER-GO-CA model.
Abstract: Two competing structural models for the revised Learning and Study Strategies Inventory (LASSI) were examined. The test developers promote a model related to three uncorrelated components of strategic learning: skill, will, and self-regulation. Other investigators have shown empirical support for a three-factor correlated model characterized by effort-related activities, goal orientation, and cognitive activities (ER-GO-CA). Neither model has been verified on scores from the second edition of the LASSI. In the present sample of 297 college students, confirmatory factor analysis of the subscale scores provided support for the ER-GO-CA model.

Journal ArticleDOI
TL;DR: The authors investigated factors affecting the rates of extreme response using direct measures and Spanish-speaking respondents from rural and urban settings and found that the tendency to choose extreme portions of a rating scale seems to be rooted in the mismatch between scale characteristics and respondents' subjective categories, respondents' familiarity with rating scales, and communication norms.
Abstract: Translation and cultural adaptation of rating scales are two critical components in testing culturally and/or linguistically heterogeneous populations. Despite the proper use of these scales, challenges typically arise from respondents’ language, culture, ratiocination, and characteristics of measurement processes. This study investigated factors affecting the rates of extreme response using direct measures and Spanish-speaking respondents from rural and urban settings. Issues of respondents’ familiarity with rating scales, respondents’ subjective categories, and scale characteristics were investigated and their relation to extreme response documented. The tendency to choose extreme portions of a rating scale seems to be rooted in the mismatch between scale characteristics and respondents’ subjective categories, respondents’ familiarity with rating scales, and communication norms. This article discusses implications of findings for validity of translated and adapted rating scales.

Journal ArticleDOI
TL;DR: This article analyzed admission data and first-year grade point average (GPA) data from 11 graduate management schools to evaluate the predictive validity of Graduate Management Admission Test (GMAT) sc...
Abstract: Admissions data and first-year grade point average (GPA) data from 11 graduate management schools were analyzed to evaluate the predictive validity of Graduate Management Admission Test® (GMAT®) sc...

Journal ArticleDOI
TL;DR: This study reports a reliability generalization across 62 published studies using the Survey of Perceived Organizational Support (SPOS), a unidimensional measure of an employee's general belief that the organization is committed to him or her; the number of SPOS items and the mean age of the sample were statistically significantly related to reliability estimates.
Abstract: The Survey of Perceived Organizational Support (SPOS) is a unidimensional measure of the general belief held by an employee that the organization is committed to him or her, values his or her continued membership, and is generally concerned about the employee's well-being. In the interest of efficiency, researchers are often compelled to use a minimum number of SPOS items in their studies. This study reports on a reliability generalization across 62 published studies using the SPOS. Findings suggest that number of SPOS items and mean age of the sample are statistically significant in the relationship to reliability estimates. Additionally, mean age accounted for significant variance in internal consistency estimates over and above the number of items used.
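
One standard way to see why the number of items matters for reliability is the Spearman-Brown prophecy formula from classical test theory. The sketch below is offered only as background: it is not the reliability generalization analysis reported in the abstract, it assumes parallel items, and the item counts and reliability value are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    rho_new = k * rho / (1 + (k - 1) * rho), where k = new length / old length.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical: a 16-item form with reliability .90 shortened to 8 items.
shortened = spearman_brown(0.90, 8 / 16)
print(round(shortened, 3))
```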

Journal ArticleDOI
TL;DR: In this article, the authors report a confirmatory factor analysis of the 13-item scale in two samples, and based on the results, a 6-item unidimensional scale is recommended.
Abstract: In 1982, Reilly developed a 13-item scale to measure role overload. This scale has been widely used, but most studies did not assess the unidimensionality of the scale. Given the significance of unidimensionality in scale development, the current study reports a confirmatory factor analysis of the 13-item scale in two samples. Based on the results, a 6-item unidimensional scale is recommended. Scores from this scale correlate in predicted fashion with external criterion variables related to role overload.

Journal ArticleDOI
TL;DR: Relational complexity (RC) theory conceptualizes an individual's processing capacity and a task's complexity along a common ordinal metric, and the authors describe the development of the Latin Square ...
Abstract: Relational complexity (RC) theory conceptualizes an individual’s processing capacity and a task’s complexity along a common ordinal metric. The authors describe the development of the Latin Square ...

Journal ArticleDOI
TL;DR: In this article, the authors expand on Kelley's comparison of three methods for setting a confidence interval (CI) around Cohen's standardized mean difference statistic (the noncentral-t-based, percentile (PERC) bootstrap, and bias-corrected and accelerated (BCA) bootstrap methods) by including additional cases of nonnormality, and they define a robust effect size (ES) based on trimmed means and Winsorized variances.
Abstract: Kelley compared three methods for setting a confidence interval (CI) around Cohen's standardized mean difference statistic: the noncentral-t-based, percentile (PERC) bootstrap, and bias-corrected and accelerated (BCA) bootstrap methods under three conditions of nonnormality, eight cases of sample size, and six cases of population effect size (ES) magnitude. Kelley recommended the BCA bootstrap method. The authors expand on his investigation by including additional cases of nonnormality. Like Kelley, they find that under many conditions, the BCA bootstrap method works best; however, they also find that in some cases of nonnormality, the method does not control probability coverage. The authors also define a robust parameter for ES and a robust sample statistic, based on trimmed means and Winsorized variances, and cite evidence that coverage probability for this parameter is good over the range of nonnormal distributions investigated when the PERC bootstrap method is used to set CIs for the robust ES.
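
Of the three interval methods named above, the percentile bootstrap is the simplest to sketch. The example below applies it to the ordinary Cohen's d with a pooled standard deviation; it does not implement the BCA correction, the noncentral-t interval, or the robust ES based on trimmed means and Winsorized variances. Function names, data, replication count, and seed are hypothetical.

```python
import math
import random
from statistics import mean, variance


def cohen_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero")
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def percentile_bootstrap_ci(a, b, reps=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for Cohen's d, resampling within each group."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        try:
            stats.append(cohen_d(resampled_a, resampled_b))
        except ValueError:
            continue  # skip degenerate resamples with no variance
    stats.sort()
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * len(stats))]
    upper = stats[int((1.0 - tail) * len(stats)) - 1]
    return lower, upper


# Hypothetical scores for two groups.
group_a = [12, 15, 11, 14, 18, 13, 16, 15, 17, 12]
group_b = [10, 13, 9, 12, 14, 11, 12, 10, 15, 11]

d = cohen_d(group_a, group_b)
ci_lower, ci_upper = percentile_bootstrap_ci(group_a, group_b)
print(round(d, 3), round(ci_lower, 3), round(ci_upper, 3))
```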

Journal ArticleDOI
TL;DR: In this article, the classification accuracy of linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression (LR), and classification and regression trees (CART) under a variety of data conditions was compared.
Abstract: This study compares the classification accuracy of linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression (LR), and classification and regression trees (CART) under a variety of data conditions. Past research has generally found comparable performance of LDA and LR, with relatively less research on QDA and virtually none on CART. This study uses Monte Carlo simulations to assess the cross-validated predictive accuracy of these methods, while manipulating such factors as predictor distribution, sample size, covariance matrix inequality, group separation, and group size ratio. The results indicate that QDA performs as well as or better than the other alternatives in virtually all conditions. Suggestions for practitioners are provided.