
Showing papers in "Educational and Psychological Measurement in 2002"


Journal ArticleDOI
TL;DR: A reliability generalization study of Spielberger's State-Trait Anxiety Inventory (STAI) reviewed and compared a total of 816 research articles that used the STAI between 1990 and 2000.
Abstract: A reliability generalization study for Spielberger’s State-Trait Anxiety Inventory (STAI) was conducted. A total of 816 research articles utilizing the STAI between 1990 and 2000 were reviewed and ...

855 citations


Journal ArticleDOI
TL;DR: This article reports the development of a 12-item Likert-type measure of collective efficacy in schools, designed to assess the extent to which a faculty believes in its conjoint capability.
Abstract: The present study reports on the development of a 12-item Likert-type measure of collective efficacy in schools. Designed to assess the extent to which a faculty believes in its conjoint capability...

273 citations


Journal ArticleDOI
TL;DR: A reliability generalization (RG) study was conducted for the Marlowe-Crowne Social Desirability Scale (MCSDS), the most commonly used tool designed to assess social desirability bias.
Abstract: A reliability generalization (RG) study was conducted for the Marlowe-Crowne Social Desirability Scale (MCSDS). The MCSDS is the most commonly used tool designed to assess social desirability bias ...

251 citations


Journal ArticleDOI
TL;DR: The authors trace the histories of a variety of effect size indices pertaining to relationship, group differences, and group overlap, considering multivariable as well as univariate indices.
Abstract: Depending on how one interprets what an effect size index is, it may be claimed that its history started around 1940, or about 100 years prior to that. An attempt is made in this article to trace histories of a variety of effect size indices. Effect size bases discussed pertain to (a) relationship, (b) group differences, and (c) group overlap. Multivariable as well as univariate indices are considered in reviewing the histories.

215 citations
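The group-difference and relationship families of indices that the article traces can be illustrated with a short sketch. The following is an illustration using standard textbook formulas and made-up data, not computations drawn from the article itself: Cohen's d (a group-difference index) and its conversion to a correlation-type index under the equal-group-size approximation.

```python
import math

def cohens_d(group1, group2):
    """Standardized mean difference using a pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    # Unbiased sample variances
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    s_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

def d_to_r(d):
    """Convert d to a correlation-type effect size (equal-n approximation)."""
    return d / math.sqrt(d ** 2 + 4)

d = cohens_d([3, 4, 5], [1, 2, 3])  # means 4 and 2, pooled SD 1, so d = 2.0
print(d, d_to_r(d))  # r is about .71 here
```

The conversion shows how the relationship and group-difference "bases" discussed in the article describe the same underlying effect on different scales.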


Journal ArticleDOI
TL;DR: The authors examined the validity of scores on the Student Adaptation to College Questionnaire (SACQ) in a sample of European university students; concurrent validity was established through significant correlations in the expected direction with alternative measures of student adjustment (academic motivation, loneliness, depression, and general adjustment to university).
Abstract: This study represents the first attempt to examine the validity of scores on the Student Adaptation to College Questionnaire (SACQ) in a sample of European university students. Concurrent validity was established through significant correlations in the expected direction with alternative measures of student adjustment (academic motivation, loneliness, depression, and general adjustment to university). Further concurrent validity evidence for selected subscales was provided through moderate associations with students’ engagement in social activities and their self-reported use of psychological services provided on campus. Findings regarding predictive validity, as assessed through correlations with student attrition and academic results, went in the expected direction but were somewhat less convincing. The latter results are explained in terms of differences between European and North American systems of higher education. With some reservations regarding the Academic Adjustment subscale, then, the SACQ seems...

205 citations


Journal ArticleDOI
TL;DR: Reliability generalization (RG) was used to study five versions of the Working Alliance Inventory (WAI), covering scores from 12 different scales; 67 internal consistency estimates, six interrater reliability estimates, and four study characteristics were analyzed.
Abstract: Reliability generalization (RG) was used to study five versions of the Working Alliance Inventory (WAI), including scores from 12 different scales. Sixty-seven internal consistency estimates, six interrater reliability estimates, and four study characteristics were analyzed. In general, reliability estimates of WAI scale scores appear to be robust. Mean reliability estimates ranged, in this sample of studies, from .79 to .97, with a modal estimate of .92. Variability in reliability estimates was, based on simple bivariate correlations, associated with client and therapist sample size for WAI total scores (observer version). Implications for measuring alliance using the WAI and conducting future RG studies on psychotherapy process measures are discussed.

174 citations


Journal ArticleDOI
TL;DR: The Myers-Briggs Type Indicator (MBTI) was submitted to a descriptive reliability generalization (RG) analysis to characterize the variability of measurement error in MBTI scores across administrations.
Abstract: The Myers-Briggs Type Indicator (MBTI) was submitted to a descriptive reliability generalization (RG) analysis to characterize the variability of measurement error in MBTI scores across administrations. In general, the MBTI and its scales yielded scores with strong internal consistency and test-retest reliability estimates, although variation was observed.

152 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed an abridged version of the Job Descriptive Index (AJDI) containing a total of 25 items and tested it on a national sample and a sample of university workers.
Abstract: The Job Descriptive Index is a popular measure of job satisfaction with five subscales containing 72 items. A national sample (n = 1,534) and a sample of university workers (n = 636) supported development of an abridged version of the Job Descriptive Index (AJDI) containing a total of 25 items. A systematic scale-reduction technique was employed with the first sample to decide which items to retain in each scale. The abridged subscales were then tested in the second sample. Results indicated that the relationships among the five abridged subscales and between the five abridged subscales and other measures were substantially preserved.

152 citations


Journal ArticleDOI
TL;DR: Reliability generalization (RG) is a measurement meta-analytic method used to explore the variability in score reliability estimates and to characterize the possible sources of this variance.
Abstract: Reliability generalization (RG) is a measurement meta-analytic method used to explore the variability in score reliability estimates and to characterize the possible sources of this variance. This article briefly summarizes some RG considerations. Included is a description of how reliability confidence intervals might be portrayed graphically. The article includes tabulations across various RG studies, including how frequently authors (a) report score reliabilities for their own data, (b) conduct reliability induction, or (c) do not even mention reliability.

140 citations
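The reliability estimates aggregated in RG studies like those above are most often coefficient alpha values. As a minimal sketch, using the standard Cronbach's alpha formula and illustrative data (the function name and data are not from any of the articles), alpha can be computed directly from an item-by-person score matrix:

```python
def cronbach_alpha(items):
    """Coefficient alpha for a list of item-score columns of equal length.

    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance (ddof = 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

# Three perfectly parallel items: all inter-item covariance, so alpha = 1.0
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
```

An RG study then treats such alphas, collected across published studies, as the outcome variable in a meta-analysis.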


Journal ArticleDOI
TL;DR: The Work Addiction Risk Test (WART) was designed to measure “workaholism.” This study examines the underlying dimensions of the WART and investigates the accuracy of WART scores.
Abstract: The Work Addiction Risk Test (WART) was designed to measure “workaholism.” The present study examines the underlying dimensions of the WART and investigated the accuracy of the WART scores to discr...

138 citations


Journal ArticleDOI
TL;DR: The behavior of item and person statistics obtained from two measurement frameworks, item response theory (IRT) and classical test theory (CTT), was examined using Monte Carlo techniques with simulated test data.
Abstract: Despite the well-known theoretical advantages of item response theory (IRT) over classical test theory (CTT), research examining their empirical properties has failed to reveal consistent, demonstrable differences. Using Monte Carlo techniques with simulated test data, this study examined the behavior of item and person statistics obtained from these two measurement frameworks. The findings suggest IRT- and CTT-based item difficulty and person ability estimates were highly comparable, invariant, and accurate in the test conditions simulated. However, whereas item discrimination estimates based on IRT were accurate across most of the experimental conditions, CTT-based item discrimination estimates proved accurate under some conditions only. Implications of the results of this study for psychometric item analysis and item selection are discussed.
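The CTT-based item statistics compared in this simulation are simple to compute. The following is an illustrative sketch using standard definitions and made-up 0/1 data (not the study's simulation code): difficulty as the proportion correct, and discrimination as the corrected item-total (point-biserial) correlation with the item removed from the total.

```python
def ctt_item_stats(responses):
    """Classical item statistics for a matrix of 0/1 responses.

    responses[p][i] is examinee p's score on item i. Returns, per item,
    (difficulty, corrected discrimination): difficulty is the proportion
    correct; discrimination is the correlation between the item and the
    rest-of-test score (item excluded from the total).
    """
    n = len(responses)
    k = len(responses[0])

    def pearson(xs, ys):
        mx = sum(xs) / n
        my = sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    stats = []
    for i in range(k):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        stats.append((sum(item) / n, pearson(item, rest)))
    return stats

data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(ctt_item_stats(data))  # (difficulty, discrimination) per item
```

The IRT analogues (b and a parameters) require fitting a latent-trait model rather than a closed-form computation, which is part of why the two frameworks are compared empirically.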

Journal ArticleDOI
TL;DR: The authors examined the factorial validity, internal consistency, and predictive validity of scores from one measure of belonging to an after-school youth development program and found that belonging scores were positively related to actual program attendance over a 6-month period, self-reported attendance in the last week, and protective factors found in communities.
Abstract: Many youth development programs, including the Boys & Girls Clubs of America, feature belonging as a central piece in their theories of change. From a psychometric perspective, little is known about measures of belonging. This research examined the factorial validity, internal consistency, and predictive validity of scores from one measure of belonging to an after-school youth development program. Confirmatory factor analysis yielded a five-item measure from a calibration analysis that demonstrated “tight” cross validity in a cross-validation sample as well as factorial invariance between females and males. Internal consistency estimates for this five-item scale exceeded .90 in both samples. Belonging scores were positively related to actual program attendance over a 6-month period, self-reported attendance in the last week, and protective factors found in communities. Belonging scores were moderately and negatively related to community-based risk factors.

Journal ArticleDOI
TL;DR: This article examined the construct of competitiveness, which has been variously defined and operationalized by psychologists for more than 100 years, and its relation to other constructs, and suggested that researchers should carefully define competitiveness and choose measures that reflect their own definition to improve interpretation of the results.
Abstract: This study examined the construct of competitiveness, which has been variously defined and operationalized by psychologists for more than 100 years, and its relation to other constructs. Four hypotheses regarding multidimensionality and related constructs were proposed and tested by administering 10 different paper-and-pencil measures to 140 undergraduate students. Two factor analyses (principal axis with varimax rotation) provided evidence of two factors that were labeled Self-Aggrandizement and Interpersonal Success. The results suggest that researchers should carefully define competitiveness and choose measures that reflect their own definition to improve interpretation of the results.

Journal ArticleDOI
TL;DR: The authors investigated the underlying factor structure of the 30-item Emotional Intensity Scale (EIS) and constructed a 17-item reduced version (EIS-R).
Abstract: The purpose of this study was to investigate the underlying factor structure of the 30-item Emotional Intensity Scale (EIS) and to construct a reduced EIS (EIS-R). The psychometric characteristics of the scales were examined from three different samples: 204 employees of the University of Antwerp, a subset of 106 of the first sample who cooperated in a retest, and 510 men and women representative of the Belgian population. The original EIS does not seem to possess an adequate factor structure. Confirmatory factor analysis on a reduced scale of 17 items indicates that two factors underlie the EIS-R: a positive and a negative emotions factor. Furthermore, scores on the EIS-R were shown to have adequate reliability and validity.

Journal ArticleDOI
TL;DR: The fifth edition of the Publication Manual of the American Psychological Association (APA) draws on recommendations for improving statistical practices made by the APA Task Force on Statistical Inference (TFSI).
Abstract: The fifth edition of the Publication Manual of the American Psychological Association (APA) draws on recommendations for improving statistical practices made by the APA Task Force on Statistical Inference (TFSI). The manual now acknowledges the controversy over null hypothesis significance testing (NHST) and includes both a stronger recommendation to report effect sizes and a new recommendation to report confidence intervals. Drawing on interviews with some critics and other interested parties, the present review identifies a number of deficiencies in the new manual. These include lack of follow-through with appropriate explanations and examples of how to report statistics that are now recommended. At this stage, the discipline would be well served by a response to these criticisms and a debate over needed modifications.

Journal ArticleDOI
TL;DR: In this article, a reliability generalization study of the Geriatric Depression Scale (GDS) was conducted to further distill psychometric properties of the scores generated by this measure.
Abstract: Depression has proven to be a serious illness in older adults that often goes untreated because it is frequently misdiagnosed or is confused with other symptom patterns. One instrument that has been consistently cited in the literature as an effective indicator of depression in older adults is the Geriatric Depression Scale (GDS). The present study provided a reliability generalization (RG) study of the GDS in an effort to further distill psychometric properties of the scores generated by this measure. RG, a relatively new meta-analytic reliability procedure, was used to (a) identify the typical reliability of GDS scores across studies and (b) examine sources of measurement error across studies. Results from this investigation of 338 previously published research studies indicated that the average score reliability across studies was .8482 (SD = .0870) and that the number of items on the scale, scale SD, sample size, and participant population were the most important predictors of score reliability on thi...

Journal ArticleDOI
TL;DR: In this article, the authors compare and contrast the psychometric properties of four scales developed to measure hope and optimism, namely, the Revised Generalized Expectancy for Success Scale, the Life Orientation Test (LOT), the Hope Scale (HS), and the Hunter Opinions and Personal Expectations Scale.
Abstract: This study was designed to compare and contrast the psychometric properties of four scales developed to measure hope and optimism, namely, the Revised Generalized Expectancy for Success Scale, the Life Orientation Test (LOT), the Hope Scale (HS), and the Hunter Opinions and Personal Expectations Scale. The definitions on which the measures are based are compared along with their reported reliability and construct validity. Three hundred and forty-seven undergraduate students completed the scales along with measures of trait negative affect (TNA) and trait positive affect (TPA), task- and emotion-oriented coping, and perceived stress. All scales had adequate internal consistency, and there was strong evidence of convergent validity. Regarding factor structure replicability, the LOT was marginally superior. The possibility of the scales’ redundancy because of contamination by TNA or TPA depended on the criterion construct. It is argued that the LOT and HS are the scales of choice when assessing hope and/or optimism.

Journal ArticleDOI
TL;DR: This article extends prior critiques of the Learning Style Inventory (LSI) by conducting a reliability generalization study across studies and versions of the test, finding that internal consistency and test-retest reliabilities for LSI scores fluctuate considerably and contribute to deleterious cumulative measurement error.
Abstract: The Learning Style Inventory (LSI) is a commonly employed measure of learning styles based on Kolb’s Experiential Learning Model. Nevertheless, the psychometric soundness of LSI scores has historically been critiqued. The present article extends this critique by conducting a reliability generalization study across studies and versions of the test. Results indicated that internal consistency and test-retest reliabilities for LSI scores fluctuate considerably and contribute to deleterious cumulative measurement error. Reliability variation was predictable by test version and several study features.

Journal ArticleDOI
TL;DR: The authors discuss procedures for constructing individual and simultaneous confidence intervals on contrasts on parameters of a number of fixed-effects ANOVA models, including multivariate analysis of variance (MANOVA) models for the analysis of repeated measures data.
Abstract: Although confidence interval procedures for analysis of variance (ANOVA) have been available for some time, they are not well known and are often difficult to implement with statistical packages. This article discusses procedures for constructing individual and simultaneous confidence intervals on contrasts on parameters of a number of fixed-effects ANOVA models, including multivariate analysis of variance (MANOVA) models for the analysis of repeated measures data. Examples show how these procedures can be implemented with accessible software. Confidence interval inference on parameters of random-effects models is also discussed.
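The individual contrast intervals discussed above follow a standard form: estimate the contrast from group means, then add and subtract a critical value times its standard error. The following is an illustrative sketch for a one-way fixed-effects design with made-up data (function name and data are assumptions, not the article's procedures); the critical value is supplied by the caller, e.g. from a t table or `scipy.stats.t.ppf`:

```python
import math

def contrast_ci(groups, coefs, t_crit):
    """Confidence interval for the contrast sum(c_i * mu_i) in a one-way
    fixed-effects ANOVA. t_crit is t_{1 - alpha/2} with N - k degrees of
    freedom (from a t table, or scipy.stats.t.ppf if available)."""
    means = [sum(g) / len(g) for g in groups]
    # Pooled within-group mean square (MSE) and its degrees of freedom
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_error = sum(len(g) for g in groups) - len(groups)
    mse = ss_within / df_error
    est = sum(c * m for c, m in zip(coefs, means))
    se = math.sqrt(mse * sum(c * c / len(g) for c, g in zip(coefs, groups)))
    return est - t_crit * se, est + t_crit * se

# Does group 1 differ from the average of groups 2 and 3?
groups = [[8, 9, 10], [5, 6, 7], [4, 5, 6]]
lo, hi = contrast_ci(groups, [1, -0.5, -0.5], t_crit=2.447)  # t_{.975}, df = 6
print(round(lo, 3), round(hi, 3))
```

Simultaneous intervals replace the per-contrast t critical value with a larger one (e.g. Scheffé or Bonferroni), which is where the procedures the article surveys diverge.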

Journal ArticleDOI
TL;DR: The authors examined score reliability for the Life Satisfaction Index (LSI) and found no relationships between score reliability and sample characteristics (sample size, number of items, mean age, standard deviation of age, proportion female, and mean LSI score), and no significant differences in score reliability by language of administration or sample type.
Abstract: The purpose of the present study was to examine score reliability for a measure of life satisfaction (Life Satisfaction Index [LSI]). This reliability generalization comprised a search of 157 journal articles, which resulted in the inclusion of a total of 34 samples. Results revealed an average reliability of .79 (SD = .10, median = .79). Bivariate correlational analyses revealed no relationships between score reliability and various sample characteristics, including sample size, number of items, mean age, standard deviation of age, proportion female, mean LSI score, and standard deviation of LSI scores. No significant differences in score reliability were found by language of administration or sample type. These analyses provide evidence for adequate reliability of LSI scores across a variety of sample characteristics; however, they must be interpreted with caution, given the small sample size. In addition, this study documents the poor reporting of psychometric properties in the LSI literature.

Journal ArticleDOI
TL;DR: For every 4-year college in the United States listed in the 1998 College Handbook of the College Board, the percentages of students graduating within 6 years of entering and of students having high school grade point averages (GPAs) of at least 3.00 were recorded.
Abstract: For every 4-year college in the United States listed in the 1998 College Handbook of the College Board, the percentages of students graduating within 6 years of entering and of students having high school grade point averages (GPAs) of at least 3.00 were recorded. The authors also obtained the College Board Scholastic Assessment Test I (SAT I) Verbal and Math and the American College Test (ACT) scores at the 25th and 75th percentiles of the distributions of scores of the enrolled freshmen. The SAT I Verbal and Math and the ACT scores at the 25th and 75th percentiles proved to be good predictors of the percentage of students graduating from the same institution that admitted them as freshmen (rs ranging from .62 to .73), as did the percentage of freshmen having high school GPAs of 3.00 or higher (r = .49). The correlations of the group percentages and means with the criterion were considerably higher than the predictive-validity coefficients of the SAT I and ACT scores for individual graduation as reported...

Journal ArticleDOI
Frank Baugh
TL;DR: In this article, the authors emphasize that measurement issues must be explicitly considered even in studies that focus on substantive questions, and they discuss the dynamics associated with insufficient attention being paid to score reliabilities in substantive studies.
Abstract: The present article emphasizes that measurement issues must be explicitly considered even in studies that focus on substantive questions. First, dynamics associated with insufficient attention being paid to score reliabilities in substantive studies are discussed. Next, reasons to adjust effect size indices for score unreliability are presented. Finally, some procedures for adjusting effect sizes for score reliability are briefly reviewed.
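The best-known procedure in this family is the classical correction for attenuation: divide an observed correlation by the square root of the product of the two score reliabilities. The sketch below uses that standard formula with illustrative numbers; it is one example of the kind of adjustment the article reviews, not necessarily the specific procedures it presents.

```python
import math

def disattenuate_r(r_xy, rel_x, rel_y=1.0):
    """Classical correction for attenuation: estimate the correlation that
    would be observed with perfectly reliable scores. Leave rel_y at 1.0
    to correct for unreliability in X only."""
    return r_xy / math.sqrt(rel_x * rel_y)

def disattenuate_d(d, rel_y):
    """Adjust a standardized mean difference for unreliability in the
    outcome scores."""
    return d / math.sqrt(rel_y)

# Observed r = .30 with score reliabilities of .75 on each measure
print(disattenuate_r(0.30, 0.75, 0.75))  # about .40
print(disattenuate_d(0.50, 0.64))        # about .625
```

The example makes the article's point concrete: with modest reliabilities, the observed effect can understate the underlying effect by a substantial margin.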

Journal ArticleDOI
TL;DR: In this article, the authors explore the variability in reliability scores on a commonly used career scale, the Career Decision-Making Self-Efficacy Scale (CDMSE), and employ reliability generalization to identify typical score reliability, variability of score reliability and variables explaining this variability.
Abstract: The purpose of the present study was to explore the variability in reliability scores on a commonly used career scale, the Career Decision-Making Self-Efficacy Scale (CDMSE). Reliability generalization was employed to identify typical score reliability, variability of score reliability, and variables explaining this variability. Forty-nine pieces of work were examined, and the results revealed that 41% of them reported score reliability of their own data. Of the five subscales, Problem Solving showed the lowest score reliability. In addition, higher score reliability was associated with age, sample racial/ethnic demographics, and standard deviation of total mean score.

Journal ArticleDOI
TL;DR: The authors offer 15- and 8-item shortened versions of the Automatic Thoughts Questionnaire (ATQ), a measure of cognitions associated with depression; a single factor was found to underlie both reduced versions, with scores on this factor yielding strong estimates of internal consistency and nomological validity.
Abstract: Measures of depression are increasingly being used as outcomes or predictors by organizational and consumer psychologists. One such measure is the Automatic Thoughts Questionnaire (ATQ). However, questions about the 30-item ATQ’s factor structure and its length for use in survey research remain. The authors offer 15- and 8-item shortened versions of the ATQ. Two samples (n = 434 and n = 419) were used to derive the reduced versions. A single factor was found to underlie both reduced versions, with scores on this factor yielding strong estimates of internal consistency and nomological validity. Two more cross-validation samples (n = 163 and n = 91) also showed support for the 15- and 8-item versions. Overall, results suggest that these reduced-item versions of the ATQ are useful alternatives to measuring cognitions associated with depression.

Journal ArticleDOI
TL;DR: This study investigated the equivalence of scores from computerized and paper-and-pencil versions of a reading placement test and found that both forms of the computerized versions produced higher vocabulary scores than the paper-and-pencil format; one form also had higher comprehension and total scores on the computerized version.
Abstract: This study investigated the equivalence of scores from computerized and paper-and-pencil versions of a reading placement test. Concerns about score equivalence on the computerized versions were warranted because of the speeded nature of the paper-and-pencil version and differences in text delivery and response modes. The results indicated that both forms of the computerized versions produced higher vocabulary scores than the paper-and-pencil format and one form also had higher comprehension and total scores on the computerized version. These difficulty differences, especially for the vocabulary scores, appeared related to the differences in response speed associated with use of a mouse to record responses in contrast to a pencil and answer sheet. Scale scores for the computerized versions had similar predictive power for course placement as paper-and-pencil scores. However, because these results were based on students from only seven institutions, additional studies are needed to investigate the comparabi...

Journal ArticleDOI
TL;DR: This paper employed an automated grader to evaluate essays, both holistically and with the rating of traits (content, organization, style, mechanics, and creativity) for Web-based student essays.
Abstract: This study employed an automated grader to evaluate essays, both holistically and with the rating of traits (content, organization, style, mechanics, and creativity) for Web-based student essays ser...

Journal ArticleDOI
TL;DR: A reliability generalization was conducted on Zuckerman’s Sensation Seeking Scale, Form V (SSS-V); 244 empirical articles on the SSS-V, spanning a 20-year period, were reviewed.
Abstract: A reliability generalization (RG) was conducted on Zuckerman’s Sensation Seeking Scale, Form V (SSS-V). Two hundred and forty-four empirical articles on the SSS-V were reviewed spanning a 20-year p...

Journal ArticleDOI
TL;DR: The authors investigated the relationships of test mode (paper and pencil vs. computerized with editorial control vs. computerized without editorial control) and computer familiarity with test performance on the Graduate Record Exam (GRE).
Abstract: Ideally, test performance is unrelated to the mode in which the test is administered. This study investigated the relationships of test mode (paper and pencil vs. computerized with editorial control and computerized without editorial control) and computer familiarity (lower, moderate, and higher) with test performance on the Graduate Record Exam (GRE). The GRE was administered to 222 undergraduates stratified by gender and randomly assigned to the three test mode groups. With self-reported grade point average as a covariate in a MANCOVA, the authors found that examinees in the paper-and-pencil group outperformed the computerized-without-editorial-control group on all subtests. The computerized-with-editorial-control group outperformed the computerized-without-editorial-control group on the Analytical subtest only. The authors also found a significant main effect for computer familiarity on the Analytical and Quantitative subtests. A significant interaction between computer familiarity and test mode o...

Journal ArticleDOI
TL;DR: The Learning and Study Strategies Inventory (LASSI) is used in hundreds of universities and high schools each year; this study investigated the reliability, structure, and criterion-related validity of LASSI scores.
Abstract: The Learning and Study Strategies Inventory (LASSI) is used in hundreds of universities and high schools each year. This study investigated the reliability, structure, and criterion-related validity of LASSI scores. Data were provided by 502 university students. Results suggest that the LASSI may not measure the postulated 10 scales typically used to report results.

Journal ArticleDOI
TL;DR: The Patterns of Adaptive Learning Survey (PALS) was developed to assess a trichotomous achievement goal structure with the following subscales: Task Goal Orientation, Performance-Approach Goal Orientation, and Performance-Avoid Goal Orientation.
Abstract: The Patterns of Adaptive Learning Survey (PALS) was developed to assess a trichotomous achievement goal structure, which included the following subscales: Task Goal Orientation, Performance-Approach Goal Orientation, and Performance-Avoid Goal Orientation. The use of the PALS in making inferences about these goal orientations was originally validated with a middle school sample of students. In this study, the authors computed Cronbach’s alphas and employed confirmatory factor analytic procedures to provide statistical evidence of the reliability and validity of inferences based on scores from the PALS at the fourth-grade and college levels.