Showing papers in "Educational and Psychological Measurement in 1997"
••
TL;DR: In this paper, the authors describe the development and validation process for an instrument to assess goal orientation (an individual disposition toward developing or validating one's ability in achievement settings), and the results of exploratory factor analysis, reliability analysis (internal consistency and test-retest), confirmatory factor analyses, and nomological network analysis all support the conclusion that the instrument operationalizes the theorized three-dimensional construct.
Abstract: This article describes the development and validation process for an instrument to assess goal orientation (an individual disposition toward developing or validating one's ability in achievement settings). In contrast to previous goal orientation instruments, three goal orientation dimensions are identified (learning, avoid, and prove), and the instrument is domain specific to work settings. The results of exploratory factor analysis, reliability analysis (internal consistency and test-retest), confirmatory factor analysis, and nomological network analysis all support the conclusion that the instrument operationalizes the theorized three-dimensional construct.
1,352 citations
••
TL;DR: In 1987, Hubert and Arabie proposed a randomization test of hypothesized order relations, and this has been operationalized in the Microsoft FORTRAN RANDALL program, which enables the evaluation of the fit of any pattern model to a data matrix of similarities or dissimilarities.
Abstract: In 1987, Hubert and Arabie proposed a randomization test of hypothesized order relations, and this has been operationalized in the Microsoft FORTRAN RANDALL program. This program enables the evaluation of the fit of any pattern model to a data matrix of similarities or dissimilarities. The exact probability of the model-data fit exceeding chance (as defined by a random relabeling of the rows and columns of the data matrix) is provided. This program is especially valuable in the evaluation of circumplex models of data as found in color perception, vocational interests, and interpersonal behavior.
194 citations
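A randomization test of this kind can be sketched in a few lines. The sketch below is not the RANDALL program: the function names, the toy dissimilarity matrix, and the four order predictions are all hypothetical, and it enumerates every relabeling, which is only feasible for a handful of objects.

```python
from itertools import permutations

def count_met(d, predictions, perm):
    """Count predictions d[a][b] < d[c][e] that hold after relabeling objects by perm."""
    return sum(
        d[perm[a]][perm[b]] < d[perm[c]][perm[e]]
        for (a, b), (c, e) in predictions
    )

def randomization_p(d, predictions):
    """Return (observed count, proportion of relabelings that fit at least as well)."""
    n = len(d)
    observed = count_met(d, predictions, tuple(range(n)))
    counts = [count_met(d, predictions, p) for p in permutations(range(n))]
    p_value = sum(c >= observed for c in counts) / len(counts)
    return observed, p_value

# Hypothetical toy data: four objects on a circle (adjacent = 1, opposite = 2).
d = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]
# Hypothetical circumplex predictions: adjacent pairs are closer than opposite pairs.
predictions = [
    ((0, 1), (0, 2)),
    ((1, 2), (1, 3)),
    ((2, 3), (0, 2)),
    ((0, 3), (1, 3)),
]
observed, p_value = randomization_p(d, predictions)
```

With only four objects many relabelings reproduce the circular order, so the p value cannot be small; real applications involve more objects and more predictions.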
••
TL;DR: In this article, a general linear model (GLM) framework is employed to suggest that structure coefficients ought to be interpreted in structural equation modeling confirmatory factor analysis (CFA) studies in which factors are correlated.
Abstract: A general linear model (GLM) framework is employed to suggest that structure coefficients ought to be interpreted in structural equation modeling confirmatory factor analysis (CFA) studies in which factors are correlated. The computation of structure coefficients is explained. Two heuristic data sets are used to make the discussion concrete. The benefits from using CFA structure coefficients are illustrated using two additional studies.
184 citations
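As a minimal sketch of the computation mentioned above, structure coefficients can be obtained by multiplying the pattern coefficient matrix by the factor correlation matrix. The loadings and the factor correlation below are made-up illustrative values, not taken from the article, and the function name is hypothetical.

```python
def structure_coefficients(pattern, phi):
    """Return the structure matrix S = P * Phi (pattern times factor correlations)."""
    n_factors = len(phi)
    return [
        [sum(row[k] * phi[k][j] for k in range(n_factors)) for j in range(n_factors)]
        for row in pattern
    ]

# Hypothetical pattern coefficients: three variables, two factors.
pattern = [
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
]
# Hypothetical factor correlation matrix (factors correlate .50).
phi = [
    [1.0, 0.5],
    [0.5, 1.0],
]
structure = structure_coefficients(pattern, phi)
```

In this toy example the first variable has a pattern coefficient of zero on the second factor but a nonzero structure coefficient, which is why the two sets of coefficients can tell different stories when factors are correlated.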
••
TL;DR: In this paper, the authors find that the variance component interpreted as pupil x task interaction actually arises from instability in pupil performance and that pupils are nested in classes and schools, and that whether to treat the population of a school's pupils as infinite or as limited to the student body assessed also requires careful consideration.
Abstract: Evidence of the uncertainty attached to school and individual scores is required to avoid overinterpretation of results. Generalizability analysis provides in the standard error a suitable indicator of uncertainty. Assessments depart from traditional measurements in ways that require extensions and reinterpretations of generalizability analysis. The authors find, for example, that in many analyses the variance component interpreted as Pupil x Task interaction actually arises in part from instability in pupil performance. It is necessary to recognize in the school-level analysis that pupils are nested in classes and schools. Whether to treat the population of a school's pupils as infinite or as limited to the student body assessed also requires careful consideration.
177 citations
••
TL;DR: In this article, the reliability and validity of scores on the Religious Orientation Scale (ROS) were reviewed with respect to social desirability, and the validity of these scores was evaluated.
Abstract: Reliability and validity of scores on the Religious Orientation Scale (ROS) are reviewed with respect to social desirability. ROS measures intrinsic religiousness (I; religion as an end unto itself...
138 citations
••
TL;DR: The FRIEDBEN Test Anxiety Scale (the FTA) as mentioned in this paper is a 23-item scale consisting of the following three subscales: (a) Social Derogation (worries of being socially belittled and deprecated by significant others following failure on a test), (b) Cognitive Obstruction (poor concentration, failure to recall, difficulties in effective problem solving, before or during a test), and (c) Tenseness (bodily and emotional discomfort).
Abstract: This article presents the development of a measure of test anxiety among adolescents, named the FRIEDBEN Test Anxiety Scale (the FTA). It is a 23-item scale consisting of the following three subscales: (a) Social Derogation (worries of being socially belittled and deprecated by significant others following failure on a test), (b) Cognitive Obstruction (poor concentration, failure to recall, difficulties in effective problem solving, before or during a test), and (c) Tenseness (bodily and emotional discomfort). Replicability analyses (cross-validation and validity generalization) were conducted. Data regarding the construct validity of scores on the measure are reported.
122 citations
••
TL;DR: This paper explored construct validity of scores from the Bem Sex-Role Inventory using confirmatory factor analytic methods on data from 791 subjects and found that the short form has paradoxically been shown to generally yield more reliable scores.
Abstract: In the early 1970s Constantinople wrote a seminal article that subsequently led to the elaboration of the construct of psychological androgyny. The Bem Sex-Role Inventory is a popular measure of the construct, but the measure remains controversial. We explored construct validity of scores from the measure using confirmatory factor analytic methods on data from 791 subjects. Measurement characteristics of both long and short forms were investigated, in that the short form has paradoxically been shown to generally yield more reliable scores.
120 citations
••
TL;DR: In this paper, the authors examined the psychometric properties of three coping inventories: the Coping Inventory for Stressful Situations, the COPE, and the Coping Strategies Inventory.
Abstract: This article examines the psychometric properties of three coping inventories: the Coping Inventory for Stressful Situations, the COPE, and the Coping Strategies Inventory. First, the stability of the factor structure for each inventory was examined using confirmatory factor analysis. Second, confirmatory and exploratory factor analyses were used to ascertain the common constructs underlying these coping scales. The results indicate preference for particular factor structures for each coping measure, although none of the factor structures examined provided a strong fit with data in this study. Three general factors were found across the three instruments: Problem Engagement, Avoidance, and Social/Emotional. These results suggest more complex conceptualizations of coping.
118 citations
••
TL;DR: In this article, confirmatory factor analysis was used to further examine the construct validity of scores on the Survey of Perceived Organizational Support (SPOS), a measure of perceived employer commitment; the SPOS was found to be unidimensional and distinguishable from perceived supervisory support and organizational dependability.
Abstract: Wayne Shore and Lois Tetrick demonstrated in 1991 that the Survey of Perceived Organizational Support (SPOS), a measure of perceived employer commitment, was unidimensional and distinguishable from measures of affective and continuance commitment. In the present study, confirmatory factor analysis was used to further examine the construct validity of the scores on the SPOS. Participants were 205 members of the staff and faculty of a large western state university. Consistent with Shore and Tetrick's findings, the SPOS was found to be unidimensional. In addition, the SPOS was found to be distinguishable from two similarly conceptualized correlates of affective commitment: perceived supervisory support and organizational dependability. Findings are discussed with respect to their implications for understanding the commitment process.
95 citations
••
TL;DR: Several methods of constructing confidence intervals (CIs) for Spearman's rho were tested in a Monte Carlo investigation as mentioned in this paper, and each method for computing a 95% CI around ρ_s was evaluated with regard to size in the null case and power and coverage in non-null cases.
Abstract: Several methods of constructing confidence intervals (CIs) for Spearman's rho were tested in a Monte Carlo investigation. A total of 2,000 samples of sizes 10, 50, and 200 were randomly drawn from bivariate normal populations with ρ_s equal to .00, .29, .43, .58, .73, and .89. Each method for computing a 95% CI around ρ_s was evaluated with regard to size in the null case and power and coverage in non-null cases. Fisher's z transformation of r_s worked well provided N was not small and ρ_s was not too large. The CIs constructed using the variance estimate for product-moment correlations had coverages that were consistently too liberal. Kraemer's method for establishing CIs produced coverages that were conservative. An empirical attempt to adjust the Fisher CI maintained Type I error rate near the nominal level in all cases with no loss of power. Arguments are made for the continued use of r_s in behavioral research.
91 citations
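For readers unfamiliar with the Fisher z approach, a minimal sketch follows. It assumes the textbook standard error 1/sqrt(N - 3) on the z scale; the abstract does not say which variance estimate or adjustment the study used for Spearman's rho, so this should not be read as the authors' procedure. The function name and the example values are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate CI for a correlation via Fisher's z transformation.

    Assumption: standard error 1 / sqrt(n - 3) on the z scale.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = atanh(r)                                    # transform to the z scale
    se = 1.0 / sqrt(n - 3)                          # assumed standard error
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. about 1.96 for 95%
    return tanh(z - crit * se), tanh(z + crit * se)  # back-transform the limits

# Hypothetical example: a sample rank correlation of .50 with N = 50.
lower, upper = fisher_ci(0.50, 50)
```

The interval is asymmetric around the sample value because the back-transformation compresses the upper limit as the correlation approaches 1.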
••
TL;DR: In this paper, the authors developed an instrument to measure employees' job satisfaction in Greece using Structural Equation Modeling Analysis (EQS) and Factor Analysis (FA).
Abstract: The aim of this study was to develop an instrument to measure employees' job satisfaction in Greece. Exploratory factor-analytic results indicated a six-factor solution with high internal consistency. The six factors obtained were Working Conditions, Supervisor, Pay, Job Itself, Organization as a Whole, and Promotion. Structural equation modeling analysis (EQS) showed that although the fit of the model is fairly good, there is need for slight improvement.
••
TL;DR: In this article, the authors investigated whether the broader dimensions of transformational and transactional leadership can be inferred from subordinate reports of leadership behaviors collected using instruments not specifically designed for this purpose.
Abstract: The present study investigated whether the broader dimensions of transformational and transactional leadership can be inferred from subordinate reports of leadership behaviors collected using instruments not specifically designed for this purpose. The leadership measurement instrument used was the Leadership Practices Inventory (LPI). Alternative second-order factor models were evaluated using LISREL 7, and results suggested that subordinate assessments made using the LPI also can be used to measure transformational and transactional leadership. This suggests that transformational and transactional leadership approaches may be thought of as underlying dimensions, with more particularistic leadership behaviors, such as those described by the five LPI dimensions, being related to them.
••
TL;DR: In this paper, a Monte Carlo study was designed to evaluate some of the conditions contributing to factor congruence when using Schonemann's orthogonal Procrustes transformation.
Abstract: Procrustes methods of factor rotation have been criticized for producing excessively high coefficients of congruence when attempting to fit one factor pattern matrix into the space of a targeted pattern. A Monte Carlo study was designed to evaluate some of the conditions contributing to this problem when using Schonemann's orthogonal Procrustes transformation. It was found that the expected size of the factor congruence coefficient varied with (a) the number of variables in the analysis, (b) the number of salient variables defining a factor, and (c) the size of the salient variables' factor pattern coefficients. The number of factors extracted also had some influence on congruence but only in interaction with the size of the salient pattern values. The results of this simulation study, which include a prediction equation, can be used by researchers to appraise levels of factor congruence they find with real data.
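The coefficient of congruence between two columns of factor pattern coefficients is commonly computed with Tucker's formula, sketched below. The abstract does not state which formula the study used, so that choice is an assumption, and the function name and loadings are made up for illustration.

```python
from math import sqrt

def congruence(x, y):
    """Tucker's congruence coefficient between two factor loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator

# Hypothetical loadings on a target factor and on a rotated factor.
target = [0.7, 0.6, 0.1, 0.0]
rotated = [0.6, 0.7, 0.2, 0.1]
phi = congruence(target, rotated)
```

Unlike a correlation, the coefficient is not centered on the column means, so two columns with uniformly positive loadings can show high congruence even when their profiles differ.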
••
TL;DR: In this article, the authors report the development of three attachment style scales and support the factor structure of the attachment style construct via exploratory factor analysis of attachment style scores from 1,181 recent graduates of one university and confirmatory factor analyses of scores from 545 recent graduates from another university.
Abstract: This study reports the development of three attachment style scales. The factor structure of the attachment style construct was supported via exploratory factor analysis of attachment style scores from 1,181 recent graduates of one university and confirmatory factor analysis of scores from 545 recent graduates of another university. Additional evidence for the validity of scores produced by the scales was that scores on the new measures correlated with scores from previously developed scales and were associated as expected with scores on a measure of the Big Five personality traits.
••
TL;DR: In this article, a new method of extension analysis was proposed to avoid the problem of factor indeterminacy by using less restrictive assumptions, and the new extension procedure gives the correlations without using estimated factor scores.
Abstract: Exploratory common factors have been correlated with variables external to the factor analysis by either extension analysis estimates or by correlating the external variables with estimated factor scores. In item analysis, a set of such correlations with possible scales is often computed for final item selection. The purposes of the present article are to document a problem common to both methods of estimating such correlations and to propose a new method of extension analysis that avoids this problem with less restrictive assumptions. Solutions for a common data set by all three methods of extension analysis are presented that show the expected difficulties and improvements in the results. Because the new extension procedure gives the correlations without using estimated factor scores, use of this extension procedure eliminates factor indeterminacy from the multiple ways to estimate common factor scores. This method of evaluating possible scales was shown to have advantages over item-remainder correlatio...
••
TL;DR: In this paper, the increasing popularity of structural equation models that correct for attenuation due to measurement error is discussed, and the methods by which structural models correct for the effects of measurement error are reviewed.
Abstract: The increasing popularity of structural equation models that correct for attenuation due to measurement error is noted. The methods by which structural models correct for the effects of measurement error are reviewed. Next, implications of such disattenuation for interpreting the results of structural equation models are considered. Recommendations are advanced for addressing the practice of disattenuation, and caution is urged in drawing inferences based on disattenuated parameter estimates.
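To make the idea of disattenuation concrete, the sketch below applies the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two score reliabilities. This is the classical test theory formula rather than the latent-variable estimation used in structural equation models, and the function name and numbers are hypothetical.

```python
from math import sqrt

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)

# Hypothetical values: observed r = .30, score reliabilities .80 and .70.
corrected = disattenuate(0.30, 0.80, 0.70)
```

The lower the reliabilities, the larger the upward adjustment, which illustrates why caution is urged when interpreting disattenuated estimates.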
••
TL;DR: Results of the CFA support Achenbach's eight-correlated-factor model and provide additional evidence of the construct validity of the scores obtained from the CBCL/4-18.
Abstract: Confirmatory factor analysis (CFA) was used to evaluate the factor structure of T. M. Achenbach's eight cross-informant syndrome scales of the Child Behavior Checklist (CBCL/4-18) using data genera...
••
TL;DR: In this article, the reliability of scores on four forms of the Test of English as a Foreign Language (TOEFL) was estimated using a hybrid IRT model, and it was found that there was very little difference between their overall reliability when the testlet items were assumed to be independent and when their dependence was modeled.
Abstract: The reliability of scores on four forms of the Test of English as a Foreign Language (TOEFL) was estimated using a hybrid IRT model. It was found that there was very little difference between their overall reliability when the testlet items were assumed to be independent and when their dependence was modeled. A larger difference in reliability was found when test sections were analyzed individually. Then we found as much as a 40% overestimate in reading comprehension testlets, with the longer testlets of the newest form of TOEFL showing the most local dependence. The listening comprehension testlets exhibited much less local dependence. We also found that the test was unidimensional enough for the use of univariate item response theory (IRT) to be efficacious, and that the reading comprehension testlets showed essentially no differential functioning by sex.
••
TL;DR: In this article, the authors evaluated the Time Structure Questionnaire (TSQ) of M. J. Bond and N. T. Feather and the Time Management Behavior Scale (TMBS) of T. H. Macan and colleagues by analyzing item content, subscale score reliabilities and factor structures.
Abstract: Several promising survey instruments recently have emerged to assess time structuring and time management practices and behaviors. The present study evaluated the Time Structure Questionnaire (TSQ) of M. J. Bond and N. T. Feather and the Time Management Behavior Scale (TMBS) of T. H. Macan and colleagues by analyzing item content, subscale score reliabilities, and factor structures. A sample of 701 American working adults completed the 20-item TSQ (N= 453 for the 46-item TMBS). Results confirmed that four of five TSQ subscales should appear in their original formats and that truncated versions of the four TMBS subscales and of the remaining TSQ subscale should be adopted. The study affirmed the importance of examining TSQ and TMBS subscales rather than simply aggregate scores and of achieving uniformity in subscale composition in future research.
••
TL;DR: The Maryland School Performance Assessment Program (MSPAP) as discussed by the authors is an innovative performance-based testing program covering reading, writing, language usage, mathematics, science, and social studies.
Abstract: The Maryland School Performance Assessment Program (MSPAP) is an innovative performance-based testing program covering reading, writing, language usage, mathematics, science, and social studies. MSPAP is administered annually on a census basis to students in Grades 3, 5, and 8, and the results are used for high-stakes, yearly evaluations of school performance and for tracking school improvement. The present article describes the program design and highlights its psychometric characteristics with respect to scaling, equating, standard setting, score accuracy, and validity.
••
TL;DR: This article presents a standard-setting method, the dominant profile judgment (DPJ) method, designed for use with profiles of polytomous scores on exercises in a performance-based assessment; it allows complex policy statements that could incorporate compensatory and/or conjunctive components.
Abstract: Traditional standard-setting methods are not well suited for applications with polytomously scored performance assessments. The present article presents a standard-setting method, the dominant profile judgment (DPJ) method, designed for use with profiles of polytomous scores on exercises in a performance-based assessment. The method is direct, in that it guides standard-setting panelists in the articulation of their standard-setting policies. Further, it allows complex policy statements that could incorporate compensatory and/or conjunctive components. A detailed description of the method is provided. Results of an application of this standard-setting method are presented. Recommendations for improvements in the method are discussed.
••
TL;DR: In this article, the authors empirically examined the potential importance of suppressor variables in the personality domain and found that they increased the cross-validated multiple r from .61 to .68 (an 11% increase).
Abstract: The practical value to organizations and to society of even small increments to validity from suppressor effects may be substantial. This study empirically examined the potential importance of suppressor variables in the personality domain. Of the five suppressor variables tentatively identified in the initial validation sample, four were found to hold up in an independent cross-validation sample. The suppressor variables increased the cross-validated multiple r from .61 to .68 (an 11% increase). The authors argue that, in addition to potential increments to prediction, suppressor variables in the personality domain may also contribute to substantive knowledge and theory development. The authors recommend that examination of potential suppressors be a more frequent component of research and analysis in the personality domain.
••
TL;DR: The use of three popular statistical packages (BMDP, SAS, and SPSS) to obtain computational results for a predictive discriminant analysis (PDA) and for a DDA is generally reviewed as discussed by the authors.
Abstract: The two aspects of discriminant analysis are briefly reviewed: predictive discriminant analysis (PDA) and descriptive discriminant analysis (DDA). Use of three popular statistical packages (BMDP, SAS, and SPSS) to obtain computational results for a PDA and for a DDA is generally reviewed. Results yielded by two BMDP procedures (7M and SM) are discussed, as are four SAS procedures (DISCRIM, STEPDISC, CANDISC, and GLM), and two SPSS procedures (DISCRIMINANT and MANOVA). It is pointed out which procedures are used to obtain PDA results and which are used for DDA results. Examples of printout information are given. Some of these examples pertain to misleading information, one to incorrect information, and some, of course, to very useful information. The need for nonpackage analyses to generate specific PDA and DDA information is also mentioned.
••
TL;DR: In this article, a general interpretation can be made for the a-parameter (slope) in an item response theory (IRT) analysis of personality inventories, and the authors tested this conjecture using item-total correlations and both two- and three-parameter IRT models with the Eysenck Personality Questionnaire and were unable to support Roskam's proposed interpretation.
Abstract: Roskam conjectured in 1985 that a general interpretation can be made for the a-parameter (slope) in an item response theory (IRT) analysis of personality inventories. That is, Roskam suggested that steeper slopes (and hence higher item-total correlations in classical test theory) will be found with more concretely worded items, whereas lower slopes will be found with more abstractly worded items. The authors tested this conjecture using item-total correlations and both two- and three-parameter IRT models with the Eysenck Personality Questionnaire and were unable to support Roskam's proposed interpretation.
••
TL;DR: In this article, the psychometric properties of the Body Esteem Scale (BES) were examined among 255 girls and 436 boys from Grades 5 through 12, and the BES has been found to yield reliable and va...
Abstract: The present study examined the psychometric properties of the Body Esteem Scale (BES) among 255 girls and 436 boys from Grades 5 through 12. To date, the BES has been found to yield reliable and va...
••
TL;DR: In this paper, the results from two Monte Carlo studies in item response theory are analyzed with inferential methods to illustrate the strengths of these procedures, and it is recommended that researchers employ both descriptive and inferential methods to analyze Monte Carlo results.
Abstract: Monte Carlo studies in item response theory have been used in a number of ways, for example, to evaluate new parameter estimation procedures, to compare item analysis programs, and to study the effects of multidimensional data on parameter estimation. These studies typically rely on simple descriptive methods to analyze Monte Carlo results, implying that complex effects are unlikely to be detected or their magnitudes estimated. These problems are exacerbated when Monte Carlo studies lack an experimental design to guide the data analyses. The results from two Monte Carlo studies in item response theory are analyzed with inferential methods to illustrate the strengths of these procedures. It is recommended that researchers in item response theory employ both descriptive and inferential methods to analyze Monte Carlo results.
••
TL;DR: This article used Coupland's book Generation X as the basis for developing a scale measuring attitudes associated with a generation of people currently in their mid-20s (i.e., Generation X).
Abstract: Douglas Coupland's book, Generation X, was used as the basis for developing a scale measuring attitudes associated with a generation of people currently in their mid-20s (i.e., Generation X). Although much anecdotal evidence has been presented about this group, published research has not addressed measures of their attitudes. The scale reported here measures both young and old people's attitudes toward older members of society, parents, jobs, and shopping. Instrument development involved three phases, and confirmatory factor analysis was employed to evaluate model fit.
••
TL;DR: This article examined four threats to the validity of an alternative objective item format, the multiple-mark format, and found that over 95% of students followed multiple-mark directions and fewer than 1% omitted multiple-mark items.
Abstract: This study examined four threats to the validity of an alternative objective item format, the multiple-mark format. This format uses a multiple-choice item format and directs students to mark correct alternatives and leave blank incorrect alternatives. The [ILLEGIBLE] format was used on the Kansas Reading and Mathematics Assessments at three [ILLEGIBLE] levels. The four validity threats studied were students understanding directions, students omitting responses, dependency among item alternatives, and student guessing. [ILLEGIBLE] results indicated that over 95% of students followed multiple-mark directions and fewer than 1% omitted multiple-mark items. Reliability results showed some dependency among alternatives. As a result, it was recommended that scoring occur at the item level for multiple-mark items. In regard to guessing, it appeared that when students guessed, they chose to leave an alternative blank rather than mark it. Overall, the reliability and validity coefficients showed that the multiple-...
••
TL;DR: In this paper, the authors present evidence for the construct validity of scores on the Ethical Issues Rating Scale, an instrument designed to measure respondents' assessment of the importance of various ethical issues.
Abstract: The present study presents efforts to establish evidence for the construct validity of scores on the Ethical Issues Rating Scale, an instrument designed to measure respondents' assessment of the importance of various ethical issues. Factor-analytic results using data from a sample of students enrolled in university business courses (n = 213) supported the existence of five constructs: personal integrity issues, corporate integrity issues, individual rights issues, environmental issues, and international issues. Suggestions regarding the usefulness of the instrument and procedures for its further refinement are offered.
••
TL;DR: In this paper, a simulation study was conducted to investigate the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in computerized adaptive testing (CAT) based on Andrich's rating scale model.
Abstract: A simulation study was conducted to investigate the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in computerized adaptive testing (CAT) based on Andrich's rating scale model. Comparisons were made among MLE and EAP with a normal prior distribution and EAP with a uniform prior distribution within two data sets: one generated using a normal trait distribution and the other using a negatively skewed trait distribution. Descriptive statistics, correlations, scattergrams, and accuracy indices were used to compare the different methods of trait estimation. EAP estimation with a normal prior or uniform prior yielded results similar to those obtained with MLE, even though the prior did not match the underlying trait distribution. An additional simulation study based on real data suggested that more work is needed to determine the optimal number of quadrature points for EAP in CAT based on the rating scale model. The choice between MLE and EAP f...