Showing papers in "Educational and Psychological Measurement in 1998"


Journal ArticleDOI
TL;DR: The authors compared multiple-item, Likert-type measures of psychological constructs to single-item, non-Likert-type measures of the same constructs. Using confirmatory factor analysis, the alternative forms were compared on criteria of methods variance and construct validity.
Abstract: Common methods variance often is a problem with psychological measures that require respondent self-reports of attitudes, beliefs, perceptions, and the like. The present study examined this problem by comparing multiple-item, Likert-type measures of psychological constructs to single-item, non-Likert-type measures of the same constructs. Using confirmatory factor analysis, the alternative forms were compared on criteria of methods variance and construct validity. Neither method appeared to be empirically better than the other. Unusual situations in which well-developed single-item measures might be appropriate are discussed.

655 citations


Journal ArticleDOI
TL;DR: In this paper, a brief inventory derived from Schwartz's 56-item instrument measuring the structure and content of human values is presented, which is suitable for use in survey research and other settings in which the longer instrument might be impractical.
Abstract: The authors present a brief inventory derived from Schwartz's 56-item instrument measuring the structure and content of human values. The inventory's four 3-item scales, measuring the major clusters called Self-Transcendence, Self-Enhancement, Openness to Change, and Conservation (or Traditional) values, all produce scores with acceptable reliability in two studies of pro-environmental attitudes and actions, and the brief inventory predicts those indicators nearly as well as much longer ones. The authors also present subscales of biospheric and altruistic values that can be used to assess whether Self-Transcendence values are differentiated in this way in special samples such as environmental activists. The brief inventory is suitable for use in survey research and other settings in which the longer instrument might be impractical.

457 citations


Journal ArticleDOI
TL;DR: In this paper, reliability generalization characterizes the typical reliability of scores for a given test across studies, the amount of variability in reliability coefficients for given measures, and the sources of variability of reliability coefficients across studies.
Abstract: Because tests are not reliable, it is important to explore score reliability in virtually all studies. The present article proposes and illustrates a new method, reliability generalization, that can be used in a meta-analysis application similar to validity generalization. Reliability generalization characterizes (a) the typical reliability of scores for a given test across studies, (b) the amount of variability in reliability coefficients for given measures, and (c) the sources of variability in reliability coefficients across studies. The use of reliability generalization is illustrated here by analyzing 87 reliability coefficients reported for the two scales of the Bem Sex Role Inventory (BSRI).

380 citations


Journal ArticleDOI
Xitao Fan
TL;DR: The author empirically examined the behaviors of the item and person statistics derived from these two measurement frameworks and found that the person and item statistics from IRT and CTT are quite comparable.
Abstract: Despite theoretical differences between item response theory (IRT) and classical test theory (CTT), there is a lack of empirical knowledge about how, and to what extent, the IRT- and CTT-based item and person statistics behave differently. This study empirically examined the behaviors of the item and person statistics derived from these two measurement frameworks. The study focused on two issues: (a) What are the empirical relationships between IRT- and CTT-based item and person statistics? and (b) To what extent are the item statistics from IRT and those from CTT invariant across different participant samples? A large-scale statewide assessment database was used in the study. The findings indicate that the person and item statistics derived from the two measurement frameworks are quite comparable. The degree of invariance of item statistics across samples, usually considered the theoretical superiority of IRT models, also appeared to be similar for the two measurement frameworks.

347 citations
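
To make the CTT side of the IRT versus CTT comparison above concrete, the following is a minimal Python sketch of two classical item statistics: item difficulty (the proportion correct) and an uncorrected point-biserial discrimination index. The function names and the tiny response matrix are hypothetical illustrations, not the study's data or code.

```python
import math

def item_difficulty(item_scores):
    """CTT item difficulty: proportion of examinees scoring 1 on the item."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item and the total score.

    This is the uncorrected version: the item itself is part of the total.
    """
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    sxy = sum((x - mean_item) * (y - mean_total)
              for x, y in zip(item_scores, total_scores))
    sxx = sum((x - mean_item) ** 2 for x in item_scores)
    syy = sum((y - mean_total) ** 2 for y in total_scores)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: 4 examinees (rows) by 3 dichotomous items (columns).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
totals = [sum(row) for row in responses]
first_item = [row[0] for row in responses]

difficulty = item_difficulty(first_item)
discrimination = point_biserial(first_item, totals)
```

Recomputing these statistics on different participant samples and correlating the results is one simple way to probe the sample invariance question the abstract raises.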


Journal ArticleDOI
TL;DR: A short form of Set II of Raven's Advanced Progressive Matrices, consisting of 12 items extracted from the original 36, was developed and found to possess acceptable psychometric properties. Although this short form differed considerably in content from the short form previously devised by Arthur and Day, the two short forms did not differ with respect to concurrent validity and predictive power.
Abstract: Five hundred and six first-year university students completed Raven's Advanced Progressive Matrices. Scores on Set II ranged from 6 to 35 (M = 22.17, SD = 5.60). The first 12 items of Set II were found to add little to the discriminative power of the test. Exploratory and confirmatory factor analyses failed to confirm Dillon et al.'s two-factor solution and suggested that a single factor best represented performance on Set II. A short form of Set II, consisting of 12 items extracted from the original 36, was developed and found to possess acceptable psychometric properties. Although this short form differed considerably in content from the short form previously devised by Arthur and Day, the two short forms did not differ with respect to concurrent validity and predictive power.

257 citations


Journal ArticleDOI
TL;DR: This paper described the development and initial score validation of the 20-item Teacher Multicultural Attitude Survey (TMAS), a unidimensional self-report inventory of teachers' multicultural awareness and sensitivity.
Abstract: This article describes the development and initial score validation of the 20-item Teacher Multicultural Attitude Survey (TMAS), a unidimensional self-report inventory of teachers' multicultural awareness and sensitivity. In two separate studies, a principal components analysis supported a global factor of multicultural awareness. Construct validity of TMAS scores was further established through convergent correlations with related instruments. Criterion validity was demonstrated using the group differences approach with sample cohort groups. Multiple measures of internal consistency and a test-retest stability assessment indicated satisfactory levels of score reliability. Finally, a social desirability assessment indicated no contamination of the TMAS.

175 citations


Journal ArticleDOI
TL;DR: In this article, the authors assessed the accuracy of parallel analysis, a technique in which the observed eigenvalues are compared to eigenvalues from simulated data in which no real factors are present.
Abstract: Selecting the correct number of factors to retain in a factor analysis is a crucial step in developing psychometric tools or developing theories. The present study assessed the accuracy of parallel analysis, a technique in which the observed eigenvalues are compared to eigenvalues from simulated data in which no real factors are present. Study 1 investigated the effect of the presence of one real factor on the size of subsequent noise eigenvalues. The size of real factors and the sample size were manipulated. Study 2 examined the effect that the pattern of structure coefficients and continuousness of the variables have on the size of real and noise eigenvalues. Study 3 compared the results of Studies 1 and 2 to actual psychometric data. These examples illustrate the importance of modeling the data more closely when parallel analysis is used to determine the number of real factors.

165 citations
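
The parallel analysis procedure described in the abstract above can be sketched in a few lines of standard-library Python. This is an illustrative version under stated assumptions, not the authors' implementation: reference eigenvalues are the means over simulated standard-normal data sets (some analysts use an upper percentile instead), eigenvalues come from a plain cyclic Jacobi routine, and all function names are hypothetical.

```python
import math
import random

def correlation_matrix(rows):
    """Pearson correlation matrix of a list of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix (cyclic Jacobi), largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def parallel_analysis(rows, n_sims=50, seed=0):
    """Count leading observed eigenvalues that exceed the noise reference."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    observed = symmetric_eigenvalues(correlation_matrix(rows))
    totals = [0.0] * p
    for _ in range(n_sims):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        sim = symmetric_eigenvalues(correlation_matrix(noise))
        for k, value in enumerate(sim):
            totals[k] += value
    reference = [t / n_sims for t in totals]
    retained = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        retained += 1
    return retained, observed, reference
```

Calling `parallel_analysis(rows)` on a list of equal-length numeric rows returns the number of factors to retain along with both eigenvalue lists. Because the noise data here contain no real factors at all, the sketch does not address the abstract's point that a real first factor shrinks the later eigenvalues.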


Journal ArticleDOI
TL;DR: In this article, a comparison of centered and raw score analyses in least squares regression is presented, yielding identical hypothesis tests associated with the moderation effect and regression equations that are functionally equivalent.
Abstract: Centering variables prior to the analysis of moderated multiple regression equations has been advocated for reasons both statistical (reduction of multicollinearity) and substantive (improved interpretation of the resulting regression equations). This article provides a comparison of centered and raw score analyses in least squares regression. The two methods are demonstrated to be equivalent, yielding identical hypothesis tests associated with the moderation effect and regression equations that are functionally equivalent.

142 citations
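
The equivalence claimed in the centering abstract above can be checked numerically. The sketch below is an assumption-laden illustration: the data are simulated, the function names are hypothetical, and ordinary least squares is solved through the normal equations in plain Python. It fits y = b0 + b1*x + b2*z + b3*x*z on raw and on mean-centered predictors. The interaction coefficient b3 should match, and the centered b1 should equal the raw b1 plus b3 times the mean of z.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_moderated(xs, zs, ys):
    """OLS coefficients [b0, b1, b2, b3] for y = b0 + b1*x + b2*z + b3*x*z."""
    design = [[1.0, x, z, x * z] for x, z in zip(xs, zs)]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(4)]
           for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(4)]
    return solve(xtx, xty)

# Simulated (hypothetical) data with a true interaction effect of 0.4.
rng = random.Random(42)
xs = [rng.gauss(3, 1) for _ in range(200)]
zs = [rng.gauss(2, 1) for _ in range(200)]
ys = [1 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.gauss(0, 1)
      for x, z in zip(xs, zs)]

mean_x = sum(xs) / len(xs)
mean_z = sum(zs) / len(zs)
raw = fit_moderated(xs, zs, ys)
centered = fit_moderated([x - mean_x for x in xs],
                         [z - mean_z for z in zs], ys)
```

The sketch compares coefficients only. The identical hypothesis test for the moderation effect follows because both designs span the same column space, but standard errors are not computed here.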


Journal ArticleDOI
TL;DR: The authors describe SAS and SPSS programs that provide complete simple slope statistics and plots all in one job run, available for two- and three-way interactions and for continuous and categorical moderators.
Abstract: Upon discovering an interaction in moderated multiple regression, users must conduct complicated and time-consuming simple slope analyses that are not performed by current statistical software programs. This article describes SAS and SPSS programs that provide complete simple slope statistics and plots all in one job run. Programs are available for two- and three-way interactions and for continuous and categorical moderators.

135 citations
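
For readers unfamiliar with simple slope analysis, here is a minimal Python sketch for the two-way case with a continuous moderator. It is not the SAS or SPSS programs the article describes. It assumes a fitted model y = b0 + b1*x + b2*z + b3*x*z and uses the standard result that the slope of x at moderator value z is b1 + b3*z, with variance var(b1) + 2*z*cov(b1, b3) + z^2*var(b3). The coefficient and covariance values are hypothetical.

```python
import math

def simple_slope(b1, b3, z):
    """Slope of y on x at a chosen moderator value z."""
    return b1 + b3 * z

def simple_slope_se(var_b1, var_b3, cov_b1_b3, z):
    """Standard error of the simple slope at moderator value z."""
    return math.sqrt(var_b1 + 2 * z * cov_b1_b3 + z ** 2 * var_b3)

# Hypothetical estimates from a fitted moderated regression.
b1, b3 = 0.5, 0.4
var_b1, var_b3, cov_b1_b3 = 0.04, 0.01, -0.005

# Probe the interaction at low, mean, and high moderator values
# (here -1, 0, +1, e.g. one standard deviation around a centered mean).
probes = {z: (simple_slope(b1, b3, z),
              simple_slope_se(var_b1, var_b3, cov_b1_b3, z))
          for z in (-1.0, 0.0, 1.0)}
```

Dividing each slope by its standard error gives a t statistic with the residual degrees of freedom of the regression.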


Journal ArticleDOI
TL;DR: In this paper, the authors present a concise summary of the factorial invariance problem and propose a simplified notation intended to facilitate discussion of the problem and suggest a structured approach for testing large models.
Abstract: Comparing different groups (e.g., cultures, age cohorts) using survey-type instruments raises the question of factorial invariance, that is, whether or not members of different groups ascribe the same meanings to survey items. This article attempts to advance multi-group research by (a) providing a concise summary of the factorial invariance problem, (b) proposing a simplified notation intended to facilitate discussion of the problem, and (c) suggesting a structured approach for testing large models. This procedure is illustrated using an extended example. Two computer programs designed to make the recommended procedures less laborious are offered.

131 citations


Journal ArticleDOI
TL;DR: This paper assessed the effects of three potential confounding factors on structural equation modeling (SEM) fit indices and parameter estimates: data non-normality, estimation method, and sample size.
Abstract: The present Monte Carlo study assessed the effects of three potential confounding factors on structural equation modeling (SEM) fit indices and parameter estimates: data nonnormality, estimation method, and sample size. The major findings were that (a) relatively mild data nonnormality has little effect on SEM fit indices and parameter estimates; (b) under misspecified models, estimation method (maximum likelihood [ML] vs. generalized least squares [GLS]) has considerable influence on SEM incremental fit indices; and (c) some fit indices are more susceptible to the influence of sample size. Previous findings in the literature that SEM fit indices were consistent under different estimation methods may need to be revisited, because the finding was primarily based on Monte Carlo simulations involving true SEM models. Because SEM researchers rarely are certain whether they have correctly specified their models, it is critical that simulation studies are conducted in the presence of model misspecification, as ...

Journal ArticleDOI
TL;DR: This paper used principal components analysis with a varimax rotation to determine whether the underlying factor structure of the Fennema-Sherman Mathematics Attitudes Scales (FSMAS) fit the dimensions suggested by the position of the 108 items on nine subscales.
Abstract: Data from 196 Irish school children were analyzed using principal components analysis with a varimax rotation to determine whether the underlying factor structure of the Fennema-Sherman Mathematics Attitudes Scales (FSMAS) fit the dimensions suggested by the position of the 108 items on nine subscales. Results indicated a factor structure virtually identical to a previous study that used a different sample, with the items being reduced to six separate components rather than nine as suggested by the scales' developers. Based on this factor structure, the authors attempted the development of a shortened form of the FSMAS. Internal consistency estimates of the reliability of scores on the whole scale and each subscale for both the original and the short form were favorable, with alpha coefficients ranging from .79 to .96.

Journal ArticleDOI
TL;DR: In this article, the authors compared the performance of the mean equality tests proposed by Alexander and Govern, Box, Brown and Forsythe, James, and Welch, as well as the analysis of variance F test, for their ability to limit the number of Type I errors and to detect true treatment group differences in one-way, completely randomized designs in which the underlying distributions were nonnormal, variances were nonhomogeneous, and group sizes were unequal.
Abstract: Tests of mean equality proposed by Alexander and Govern, Box, Brown and Forsythe, James, and Welch, as well as the analysis of variance F test, were compared for their ability to limit the number of Type I errors and to detect true treatment group differences in one-way, completely randomized designs in which the underlying distributions were nonnormal, variances were nonhomogeneous, and group sizes were unequal. These tests were compared when the usual method of least squares was applied to estimate group means and variances and when Yuen's trimmed means and Winsorized variances were adopted. Based on the variables examined in this investigation, which included number of treatment groups, degree of population skewness, nature of the pairing of variances and group sizes, and nonnull effects of varying sizes, we recommend that researchers use trimmed means and Winsorized variances with either the Alexander and Govern, James, or Welch tests to test for mean equality.

Journal ArticleDOI
TL;DR: This book review explores how the volume What If There Were No Significance Tests? treats five selected major themes, including effect sizes, the "nil" null hypothesis, and power.
Abstract: This book review first explores the treatment within the volume What If There Were No Significance Tests? of five selected major themes: (a) effect sizes, (b) the "nil" null hypothesis, (c) power a...

Journal ArticleDOI
TL;DR: In this paper, the authors address the measurement of trait mood by examining a set of new scales to measure four separate dimensions: positive energy, tiredness, negative arousal, and relaxation.
Abstract: The present study addresses the measurement of trait mood by examining a set of new scales to measure four separate dimensions: positive energy, tiredness, negative arousal, and relaxation. The data were divided into two halves. On the first half of the data, separate exploratory factor analyses were performed for each dimension using 15 items chosen from various sources to represent each dimension of mood. On the second half of the data, separate confirmatory factor analyses identified the items for which the data best fit the model. The factor analyses produced conceptually meaningful scales whose scores varied in internal consistency reliabilities ranging from .87 to .93. Relationships among the scales match the predictions of theories by Burke, Brief, George, Roberson, and Webster; Thayer; and Watson and Tellegen.

Journal ArticleDOI
TL;DR: This paper analyzed the structure of the subscores obtained through streamlined scoring of 334 adults' responses to Figural Forms A and B of the Torrance Tests of Creative Thinking (TTCT).
Abstract: This study analyzed the structure of the subscores obtained through streamlined scoring of 334 adults' responses to Figural Forms A and B of the Torrance Tests of Creative Thinking (TTCT). The results of principal components analyses indicated that one general creativity factor adequately represented the subscores of both Form A and Form B. An empirical comparison of the factor structures of Form A and Form B indicated that these forms have an equivalent structure. The results of commonality analyses confirmed that the five subscores of each form provide very little unique variance and suggested that a new subscore, resistance to premature closure, may be a better indicator than fluency is of the divergent thinking skills measured by the figural TTCT.

Journal ArticleDOI
TL;DR: In this article, the reliability and validity of the scores of a subjective measure of desired aspirations and a behavioral measure of enacted aspirations were assessed in a sample of 5,655 employees. The findings provide promising support for the validity and reliability of the scores, although further examination of the scales' psychometric properties is warranted.
Abstract: The present investigation assessed the reliability and validity of the scores of a subjective measure of desired aspirations and a behavioral measure of enacted aspirations. A sample of 5,655 employees was randomly split into two halves. Principal components analysis on Sample 1, followed by confirmatory factor analysis on Sample 2, confirmed the desired and enacted scales as distinct but related measures of managerial aspirations. The desired and enacted scales had satisfactory levels of internal consistency and temporal stability over a 1-year period. Relationships between the measures of desired and enacted managerial aspirations and both attitudinal and behavioral criteria, measured concurrently and 1 year later, provided preliminary support for convergent and discriminant validity for our sample. Desired aspirations demonstrated stronger validity than enacted aspirations. Although further examination of the psychometric properties of the scales is warranted, the present findings provide promising support for their validity and reliability for our sample.

Journal ArticleDOI
TL;DR: In meta-analysis, a weighted average effect size is usually obtained to summarize the global magnitude through a set of primary studies as discussed by the authors, and the optimal weight to obtain the unbiased and minimum varian...
Abstract: In meta-analysis, a weighted average effect size is usually obtained to summarize the global magnitude through a set of primary studies. The optimal weight to obtain the unbiased and minimum varian...
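
The abstract above is truncated, so the specific weighting scheme the authors study is not recoverable here. As a general illustration only, the following Python sketch computes a weighted average effect size using inverse-variance weights, the textbook choice that minimizes the variance of the combined estimate under a fixed-effect model. The function name and the input values are hypothetical.

```python
def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean effect size and its sampling variance.

    Assumes a fixed-effect model: each study estimates the same effect,
    and `variances` holds each study's sampling variance.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effects)) / total
    return mean, 1.0 / total

# Hypothetical effect sizes and sampling variances from two primary studies.
mean_effect, mean_variance = weighted_mean_effect([0.2, 0.4], [0.01, 0.04])
```

The more precise study (variance 0.01) receives four times the weight of the less precise one, which pulls the combined estimate toward 0.2.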

Journal ArticleDOI
TL;DR: In this article, the authors describe how much and in what ways the authors of articles fail to include adequate information about data collection, and recommend that journal editors and referees more thoughtfully consider the quality of measurement reporting when reviewing and editing submitted articles.
Abstract: The present study describes how much and in what ways the authors of articles fail to include adequate information about data collection. The instrumentation reported in 220 articles from 22 randomly selected journals was coded and tabulated using a scheme based on criteria from current research textbooks that are consistent with American Educational Research Association/American Psychological Association/National Council on Measurement in Education (AERA/APA/NCME) standards. Results suggest that the quality of measurement reporting continues to be a problem. Eight of the most common reporting failures are identified. It is recommended that journal editors and referees more thoughtfully consider the quality of measurement reporting when reviewing and editing submitted articles.

Journal ArticleDOI
TL;DR: The factor structure of the items of three commonly used measures of mathematics anxiety was examined using a sample of 323 undergraduates enrolled in a required college algebra course in this paper, and the results showed that six oblique...
Abstract: The factor structure of the items of three commonly used measures of mathematics anxiety was examined using a sample of 323 undergraduates enrolled in a required college algebra course. Six oblique...

Journal ArticleDOI
TL;DR: In this paper, an analytic and computer strategy is introduced and demonstrated for multistage Euclidean grouping (MEG), where the procedure sequentially produces first-stage clusters for independent data blocks; second-stage, higher order clusters based on a full similarity matrix for first-stage clusters; and third-stage clusters that allow case migration to relocate prior misassignments and to optimize within-cluster homogeneity.
Abstract: An analytic and computer strategy is introduced and demonstrated for multistage Euclidean grouping (MEG). The procedure sequentially produces first-stage clusters for independent data blocks; second-stage, higher order clusters based on a full similarity matrix for first-stage clusters; and third-stage clusters that allow case migration to relocate prior misassignments and to optimize within-cluster homogeneity. The process is facilitated by special SAS computer codes and, in addition to conventional SAS cluster output, produces special fusion statistics, plots of all fusion statistics, and indices of homogeneity within clusters and within profile variables. The program also reports replication rates for final clusters.

Journal ArticleDOI
TL;DR: In this article, the validity of a higher order factorial structure of a Bulgarian version of the Beck Depression Inventory (BDI) for non-clinical adolescents was tested using a cross validation of three independent samples (n1 = 227, n2 = 172, n3 = 292).
Abstract: The purpose of the present study was to test for the validity of a higher order factorial structure of a Bulgarian version of the Beck Depression Inventory (BDI) for non-clinical adolescents. In a cross validation of three independent samples (n1 = 227; n2 = 172; n3 = 292), and based on the analysis of covariance structures within the framework of a confirmatory factor analytic model, findings yielded exceptionally strong support for the hypothesized second-order three-factor structure. These findings add to a growing cross-cultural agglomerate of construct validity data related to the factorial structure of the BDI. Results are expected to be of substantial interest to both researchers and clinicians whose concerns focus on depression as it bears on this population.

Journal ArticleDOI
TL;DR: One of the most widely used measures of coping is the Ways of Coping Questionnaire (WCQ) as mentioned in this paper, and despite its widespread use, evidence regarding the construct validity of WCQ scores is limited and inconclu...
Abstract: One of the most widely used measures of coping is the Ways of Coping Questionnaire (WCQ). Despite its widespread use, evidence regarding the construct validity of WCQ scores is limited and inconclu...

Journal ArticleDOI
TL;DR: In this article, a simulation study was conducted to investigate the application of expected a posteriori (EAP) trait estimation in computerized adaptive tests (CAT) based on the partial credit model and compare it with maximum likelihood trait estimation (MLE).
Abstract: A simulation study was conducted to investigate the application of expected a posteriori (EAP) trait estimation in computerized adaptive tests (CAT) based on the partial credit model and compare it with maximum likelihood trait estimation (MLE). The performance of EAP was evaluated under different conditions: the number of quadrature points (10, 20, 40, and 80) and the type of prior distribution (normal and uniform). The relative performance of MLE and the EAP estimation methods was assessed under two distributional forms of the latent trait (normal and negatively skewed). Results showed that, regardless of the latent trait distribution, MLE and EAP with a normal prior or a uniform prior using either 20, 40, or 80 quadrature points provided relatively accurate estimation in CAT based on the partial credit model. Also, increasing the number of quadrature points from 20 to 80 did not increase the accuracy of EAP estimation.
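
As a rough illustration of the EAP estimator discussed in the abstract above, the Python sketch below evaluates the partial credit model likelihood on an evenly spaced grid of quadrature points and returns the posterior mean of the trait. Everything specific here is an assumption made for illustration: the grid range of -4 to 4, the unnormalized standard-normal prior weights, the function names, and the item step parameters. It does not implement adaptive item selection or the study's simulation design.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities for one partial credit item.

    Category x has logit sum_{j<=x} (theta - delta_j); category 0 has logit 0.
    """
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    peak = max(logits)  # subtract the maximum for numerical stability
    expd = [math.exp(v - peak) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]

def eap_estimate(responses, items, n_points=20, prior="normal",
                 lo=-4.0, hi=4.0):
    """Expected a posteriori trait estimate on an evenly spaced grid."""
    step = (hi - lo) / (n_points - 1)
    numerator = denominator = 0.0
    for k in range(n_points):
        theta = lo + k * step
        # Prior weight at this node: normal density (unnormalized) or flat.
        weight = math.exp(-0.5 * theta ** 2) if prior == "normal" else 1.0
        likelihood = 1.0
        for x, thresholds in zip(responses, items):
            likelihood *= pcm_probabilities(theta, thresholds)[x]
        numerator += theta * likelihood * weight
        denominator += likelihood * weight
    return numerator / denominator

# Hypothetical two-item test; each item has three categories (0, 1, 2).
items = [[-1.0, 1.0], [-0.5, 0.5]]
middle = eap_estimate([1, 1], items)                 # symmetric pattern
high = eap_estimate([2, 2], items, n_points=40)      # top categories
low = eap_estimate([0, 0], items, prior="uniform")   # bottom categories
```

Changing `n_points` and `prior` mirrors the two conditions the study varied. With a normal prior the estimate is pulled toward zero, which is most noticeable for short tests such as this two-item example.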

Journal ArticleDOI
TL;DR: The authors developed the argument that estimates of the moderator effect may vary depending on the approach (regression or subgroup) chosen by the meta-analyst, and illustrative examples of tables for different levels of moderator effects and moderator intercorrelations are provided.
Abstract: Meta-analytic searches for moderators or boundary conditions of relationships between variables have increased over the years. Multiple regression and subgroup analysis are the two most common strategies used to search for moderators in a meta-analysis. This short note develops the argument that estimates of the moderator effect may vary depending on the approach (regression or subgroup) chosen by the meta-analyst. The difference results from the ambiguity in assigning the variance shared by correlated moderators when that shared variance is also shared with the effect size. Formulae are presented to estimate the magnitude of the difference, and illustrative examples of tables for different levels of moderator effects and moderator intercorrelations are provided.

Journal ArticleDOI
TL;DR: In this paper, the relationship between class size and student evaluation of university teaching quality was analyzed and it was shown that the relationship is a nonlinear one, whose shape is basically determined by the range of class sizes and the number of different values selected.
Abstract: The relationship between class size and student evaluation of university teaching quality was analyzed. Data from 2,915 university classrooms were collected in classes ranging from 1 to 234 students. Results strongly support conclusions that (a) there is a weak relationship between class size and student ratings of teaching quality, when both statistical significance and effect size are taken into account; and (b) the relationship is a nonlinear one, whose shape is basically determined by the range of class size and the number of different values selected. Theoretical, methodological, and practical implications are discussed.

Journal ArticleDOI
TL;DR: This report describes the development of the Family Background Questionnaire (FBQ), a 179-item self-report instrument with 22 subscales for comprehensively assessing memories of family-of-origin characteristics.
Abstract: Family of origin characteristics are widely perceived as having a substantial influence on development but no self-report instrument has been available for comprehensively assessing memories of these characteristics. This report describes the development of the Family Background Questionnaire (FBQ), a 179-item instrument with 22 subscales designed to provide such an instrument. Reliability analyses found high internal consistency and temporal stability in FBQ scores, and validity analyses found substantial consistency between the theoretically derived subscales and the factor structure of the instrument. Moreover, the scores of siblings from the same families were correlated as expected.

Journal ArticleDOI
TL;DR: In this article, the authors examined the relationship between the validity theory of the past 50 years and actual validity practices, and found that the theoretical modifications to the concept of validity have been carried over into validity practice.
Abstract: The present study examined the relationship between the validity theory of the past 50 years and actual validity practices. This process involved the comparison of validity requirements and theory as they appeared in the published test standards (by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education) to the practices of measurement professionals. Each of the four sets of test standards was operationally defined and then compared to information and opinions about validity evidence expressed in the Mental Measurements Yearbook test reviews. Results from this process indicate that the theoretical modifications to the concept of validity have been carried over into validity practice. However, it was also found that operational modifications to the concept of validity have been influenced by validity practice. Therefore, there appears to be a symbiotic relationship between theory and practice on the influence of validity.

Journal ArticleDOI
TL;DR: In this paper, the authors examined the psychometric properties of the Learning and Study Strategies Inventory-High School (LASSI-HS) using a sample of high school students from Singapore.
Abstract: The purpose of this investigation was to examine the psychometric properties of the Learning and Study Strategies Inventory-High School (LASSI-HS) using a sample of high school students from Singapore. The authors first examined the reliability and validity of scores from the LASSI-HS, one of the most widely used measures of students' learning and study strategies. To further explore these properties, the two latent structure models of the LASSI-HS proposed by Olivarez and Tallent-Runnels were tested using a culturally diverse sample. It was found that the LASSI-HS yields reliable measures of students' learning and studying behaviors; however, the results raise questions about the underlying structure of this measure. That is, not only did the results fail to confirm the 10 latent constructs (i.e., scales) proposed within the LASSI-HS user's manual, but findings also failed to confirm the latent structure models proposed by Olivarez and Tallent-Runnels.

Journal ArticleDOI
TL;DR: In this article, first- and second-order confirmatory factor analyses were conducted to evaluate the factorial validity of scores on two positive affect (PA) measurement instruments and a multidimensional representation of PA.
Abstract: First- and second-order confirmatory factor analyses were conducted to evaluate the factorial validity of scores on two positive affect (PA) measurement instruments and a multidimensional representation of PA. Using a sample of college undergraduates (n = 232), trait PA was measured with items from the Differential Emotion Scale (DES) and the Positive Affect Negative Affect Schedule (PANAS). Both multifactorial first-order and second-order models fit the DES and PANAS reasonably well. Moreover, the first-order factors within each instrument were significantly correlated. With respect to the multidimensional representation of PA, both three- (Joviality, Self-Assurance, Interest/Attentiveness) and four-factor (the three-factor model plus Surprise) models were tenable. However, in the four-factor model, only a nominal amount of variance in a first-order Surprise factor was explained by a second-order PA factor.