Showing papers in "Educational and Psychological Measurement in 1999"


Journal ArticleDOI
TL;DR: This paper examined whether individuals can fake their responses to a personality inventory if instructed to do so, and concluded that within-subjects designs produce more accurate estimates than between-subjects designs.
Abstract: The authors examined whether individuals can fake their responses to a personality inventory if instructed to do so. Between-subjects and within-subjects designs were meta-analyzed separately. Across 51 studies, fakability did not vary by personality dimension; all the Big Five factors were equally fakable. Faking produced the largest distortions in social desirability scales. Instructions to fake good produced lower effect sizes compared with instructions to fake bad. Comparing meta-analytic results from within-subjects and between-subjects designs, we conclude, based on statistical and methodological considerations, that within-subjects designs produce more accurate estimates. Between-subjects designs may distort estimates due to Subject × Treatment interactions and low statistical power.

483 citations


Journal ArticleDOI
TL;DR: In this paper, the factorial validity of scores on the Parent-Child Early Relational Assessment in a normative population of mothers and their 12-month-old infants was evaluated using confirmatory factor analysis.
Abstract: The present study attempts to determine the factorial validity of scores on the Parent-Child Early Relational Assessment in a normative population of mothers and their 12-month-old infants. Parent, child, and dyadic items scored from free play interactions were analyzed as separate components of the instrument. Scores on three parent, three infant, and two dyadic subscales were examined for reliability and convergent and discriminant validity using confirmatory factor analysis. To provide confidence in the results, a factorial invariance study using multigroup confirmatory factor analysis was conducted using a separate sample. All subscale scores demonstrated high levels of internal consistency, with coefficients ranging from .75 to .96. Evidence was also found for convergent and discriminant validity.

221 citations


Journal ArticleDOI
TL;DR: The Psychological Acculturation Scale (PAS) was developed to assess acculturation from a phenomenological perspective, with items pertaining to the individual’s sense of psychological attachment to and belonging within the Anglo-American and Latino/Hispanic cultures.
Abstract: Most instruments designed to measure acculturation have relied on specific cultural behaviors and preferences as primary indicators of acculturation. In contrast, feelings of belonging and emotional attachment to cultural communities have not been widely used. The Psychological Acculturation Scale (PAS) was developed to assess acculturation from a phenomenological perspective, with items pertaining to the individual's sense of psychological attachment to and belonging within the Anglo-American and Latino/Hispanic cultures. Responses from samples of bilingual individuals and Puerto Rican adolescents and adults are used to establish a high degree of measurement equivalence across the Spanish and English versions of the scale along with high levels of internal consistency and construct validity. The usefulness of the PAS and the importance of studying acculturation from a phenomenological perspective are discussed.

219 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used structural equations modeling (LISREL) to test the competing values framework (CVF) and refine a scale that identifies the extent to which managers and other organizational constituencies use the framework's criteria to evaluate organizational effectiveness.
Abstract: The competing values framework (CVF) formulated by Quinn and his colleagues was developed to specify the criteria of organizational effectiveness and was later used to study a wide range of organizational phenomena such as culture and change. We extend this work by using structural equations modeling (LISREL) to (a) test the CVF and (b) refine a scale that identifies the extent to which managers and other organizational constituencies use the framework’s criteria to evaluate organizational effectiveness. Based on a sample of 300 hospital managers and supervisors, with one exception, our results support the CVF. Moreover, scores on the scale developed to measure the CVF criteria yield excellent validity and reliability estimates.

219 citations


Journal ArticleDOI
TL;DR: This article introduces relative mean substitution, a new approach to substituting missing values in surveys with Likert-type scales, and compares its effectiveness with three other commonly used methods for dealing with missing values, using actual field data.
Abstract: This article introduces a new approach to the substitution of missing values in surveys with Likert-type scales: relative mean substitution. The effectiveness of this method is demonstrated in comparison with three other commonly used methods for dealing with missing values, making use of actual field data. The emphasis is on two aspects of global effectiveness: (a) the accuracy in estimating various parameters at the same time and (b) the accuracy in estimating, for Likert-type scales with different psychometric characteristics, these various parameters under different conditions, such as different numbers of respondents (1,674; 400; and 100) and different distributions of missing values (two random and three nonrandom situations). The results indicated that this new relative mean substitution approach globally produced the most accurate estimates, mainly because of the more accurate estimation of the variances and the sensitivity to items with deviating means, provided that the Likert-type scales are su...

198 citations


Journal ArticleDOI
TL;DR: The structural properties of two measures of organizational commitment, the Organizational Commitment Questionnaire and the Organizational Commitment Scale, were examined to establish similarities and differences in the measures; results indicated that the scales differed with respect to the components of commitment each measured and the strength of the relationships each had with the antecedents and consequences.
Abstract: The structural properties of two measures of organizational commitment, the Organizational Commitment Questionnaire and the Organizational Commitment Scale, were examined to establish similarities and differences in the measures. Next, the antecedents of age, gender, marital status, leader-member exchange, and justice and the consequences of job satisfaction, life satisfaction, nonwork satisfaction, intent to turnover, and job involvement were examined in relation to each scale. Results indicated that the scales differed with respect to the components of commitment each measured and the strength of the relationships each had with the antecedents and consequences. Suggestions for when the use of each scale might be appropriate are provided.

197 citations


Journal ArticleDOI
TL;DR: This article highlights the theoretical differences between the Likert and Thurstone approaches to attitude measurement and demonstrates how such differences can lead to discrepant attitude estimates for individuals with the most extreme opinions.
Abstract: This article highlights the theoretical differences between the Likert and Thurstone approaches to attitude measurement and demonstrates how such differences can lead to discrepant attitude estimates for individuals with the most extreme opinions. Both simulated data and real data on attitude toward abortion are used to demonstrate this discrepancy. The results suggest that attitude researchers should, at the very least, devote more attention to the empirical response characteristics of items on a Likert attitude questionnaire. At most, these results suggest that other methods, such as the Thurstone technique or one of its recently developed item response theory counterparts, should be used to derive attitude estimates from disagree-agree responses.

170 citations


Journal ArticleDOI
TL;DR: The American Psychological Association's editorial style urges authors to provide effect size estimates, and several journals, including Educational and Psychological Measurement, have adopted author g...
Abstract: The American Psychological Association’s editorial style urges authors to provide effect size estimates. Several journals, including Educational and Psychological Measurement, have adopted author g...

127 citations


Journal ArticleDOI
TL;DR: This article presents Teacher Work-Autonomy (TWA), a scale to measure teacher sense of work autonomy, developed and validated in two studies: the first with a sample of 156 Israeli elementary school teachers and the second with 650 elementary and secondary school teachers in Israel.
Abstract: A scale to measure teacher sense of work autonomy, Teacher Work-Autonomy (TWA), is presented in this article. Replicability analyses of scores on the TWA (cross validation and validity generalization) were conducted and results reported in two studies. The purpose of the first study was to conceptualize the notion of teacher work autonomy, and the purpose of the second was to provide empirical evidence for the validity of scores on a scale derived from the results of the first study. The first study was based on a sample of 156 Israeli elementary school teachers, and in the second, a total of 650 elementary and secondary school teachers in Israel served as participants. Facet theory analytic techniques and factor analysis were used. Results indicated that four areas of functioning were pertinent to teachers’ sense of autonomy at work: (a) class teaching, (b) school mode of operating, (c) staff development, and (d) curriculum development.

112 citations


Journal ArticleDOI
TL;DR: The use of student ratings of instructional quality is enhanced by an understanding of the nature of the underlying dimensions; this investigation applies confirmatory factor analysis procedures to such ratings.
Abstract: The use of student ratings of instructional quality is enhanced by an understanding of the nature of the underlying dimensions. In the current investigation, confirmatory factor analysis procedures...

105 citations


Journal ArticleDOI
TL;DR: This study developed a conceptually and methodologically sound measure of employee identification with the work group using a three-phase approach: a content analysis with subject matter experts, an exploratory factor analysis, and confirmatory analyses.
Abstract: The objective of this study was to develop a conceptually and methodologically sound measure of employee identification with the work group. A three-phase analysis approach was used. First, a content analysis was conducted with subject matter experts (SMEs) in the field of organizational behavior and psychology. Second, an exploratory factor analysis of the factor structure was conducted using a sample of employees from a credit union (N = 140). Finally, confirmatory analyses using LISREL 8 were conducted with a sample of employees derived from four insurance organizations (N = 309). The exploratory and confirmatory factor analyses supported the factor structure of the identification measure and the scale scores showed acceptable levels of internal consistency in both samples (α = .78; α = .79, respectively). We also demonstrated that the construct of work group identification is distinct from but related to both work group cohesiveness and work group communication.

Journal ArticleDOI
TL;DR: The Utrecht Early Mathematical Competence Scales were constructed to assess the developmental level of early mathematical competence in children ages 4 to 7 years; an initial pool of 120 items was administered to 823 boys and girls in the 4- to 7-year age groups.
Abstract: The purpose of the research presented was to construct the Utrecht Early Mathematical Competence Scales to assess the developmental level of early mathematical competence in children ages 4 to 7 years. Eight mathematically different domains were distinguished, and an initial pool of 120 items was developed. The items were administered to 823 boys and girls in the 4- to 7-year age groups. Results demonstrate that the generalized one-parameter logistic model could explain the responses to 80 items. Two scales (A and B) were derived, each consisting of 40 items. In this article, Form A is emphasized. To test for group differences, a 6 (half-year age groups) × 2 (gender) factorial analysis of variance was used. A one-way analysis of variance with grade as the independent variable was used to test for grade differences. Participants within each age and grade group differed considerably in developmental level of early mathematical competence.

Journal ArticleDOI
TL;DR: This article examined the factor structure of high school students' responses to the Verbal, Math, Academic, and General self-concept scales of the Self-Description Questionnaire-II (SDQ-II) administered in China at two time points.
Abstract: This study examined the factor structure of high school students’ responses to the Verbal, Math, Academic, and General self-concept scales of the Self-Description Questionnaire-II (SDQ-II) administered in China at two time points. Confirmatory factor analysis (CFA) showed that these students clearly distinguished the four scales. Math and Chinese self-concepts were positively correlated with Academic self-concept and with General self-concept. However, Math and Chinese self-concepts were negatively correlated with each other. The stability coefficients for each construct over time were high. The inclusion of Chinese and math achievement scores as well as teachers’ ratings of general academic performance in CFA models further confirmed the validity of the multidimensional constructs by demonstrating a high correlation with each matching self-concept construct. All patterns of relations were replicated in a second collection of data 6 months later. The results support the multidimensionality and domain specifi...

Journal ArticleDOI
TL;DR: Theoretical and test simulation work reveals that under the knowledge-or-random-guessing assumption, three-option item tests are at least as good as four-option item tests in terms of item discrimination and internal consistency.
Abstract: Theoretical and test simulation work reveals that under the knowledge-or-random-guessing assumption, three-option item tests are at least as good as four-option item tests in terms of item discrimination and internal consistency. Of concern, however, is the finding that multiple-choice items may be susceptible to testwiseness, thereby contradicting the random-guessing assumption. Both item-level and test-level characteristics were examined for items included in a high stakes school-leaving mathematics examination. As expected, the influence of testwiseness is lessened when three-option items are used instead of four-option items. Differences and nondifferences between the psychometric characteristics of the three-option and four-option test forms tend to agree with the findings of earlier studies: Tests consisting of three-option items are at least equivalent to tests composed of four options in terms of internal consistency score reliability, difficulty is inversely related to the number of options, and t...

Journal ArticleDOI
TL;DR: In this paper, confirmatory factor analysis was used to evaluate the factor structure of a Chinese version of Pintrich and De Groot's Motivated Strategies for Learning Questionnaire (MSLQ).
Abstract: Confirmatory factor analysis was used to evaluate the factor structure of a Chinese version of Pintrich and De Groot’s Motivated Strategies for Learning Questionnaire (MSLQ). Data were gathered fro...

Journal ArticleDOI
TL;DR: In this paper, the authors compared eight models for analyzing count data: OLS, OLS with a transformed dependent variable, Tobit, Poisson, overdispersed Poisson, negative binomial, ordinal logistic, and ordinal probit regressions.
Abstract: The present study compares eight models for analyzing count data: ordinary least squares (OLS), OLS with a transformed dependent variable, Tobit, Poisson, overdispersed Poisson, negative binomial, ordinal logistic, and ordinal probit regressions. Simulation reveals the extent that each model produces false positives. Results suggest that, despite methodological expectations, OLS regression does not produce more false positives than expected by chance. The Tobit and Poisson models yield too many false positives. The negative binomial models produce fewer than expected false positives.

Journal ArticleDOI
TL;DR: It is concluded that in factor analytic studies using promax, the value of k may be appropriately set at 2, 3, or 4, depending on the version of promax.
Abstract: A Monte Carlo study involving 10,080 factor analyses examined the optimal value of k for promax factor rotations. The value of k was varied from 2 to 10 using three versions of promax. Error and bias of the sample factor pattern were found to be lower when k ≤ 5 than when k > 5 but changed only slightly as k varied between 2 and 5. The best value of k was 2, 3, or 4, depending on the version of promax. It is concluded that in factor analytic studies using promax, the value of k may be appropriately set at 2, 3, or 4.

Journal ArticleDOI
TL;DR: This study reports the development and score validation of an instrument for measuring the anxieties students experience in college chemistry laboratories.
Abstract: The present study reports the development and score validation of an instrument for measuring anxieties students experience in college chemistry laboratories. Factor analytic evidence from an initi...

Journal ArticleDOI
TL;DR: In this article, the effects of retaining test items manifesting differential item functioning (DIF) on aspects of the measurement quality and validity of that test's scores were investigated using the Mantel-Haenszel procedure, which allows one to detect items that function differently in two groups of examinees at constant levels of the trait.
Abstract: This study investigated effects of retaining test items manifesting differential item functioning (DIF) on aspects of the measurement quality and validity of that test’s scores. DIF was evaluated using the Mantel-Haenszel procedure, which allows one to detect items that function differently in two groups of examinees at constant levels of the trait. Multiple composites of DIF- and non-DIF-containing items were created to examine the impact of DIF on the measurement, validity, and predictive relations involving those composites. Criteria used were the American College Testing composite, the Scholastic Aptitude Test (SAT) verbal (SATV), quantitative (SATQ), composite (SATC), and grade point average rank percentile. Results indicate measurement quality of tests is not seriously degraded when items manifesting DIF are retained, even when number of items in the compared composites has been controlled. Implications of results are discussed within the framework of multiple determinants of item responses.
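The Mantel-Haenszel screen named in this abstract can be illustrated with a short sketch. It is not the authors' analysis: the function names, the layout of the 2 × 2 counts per matched score level, and the example counts are all hypothetical. The sketch computes the common odds ratio across score strata and rescales it to the delta metric, MH D-DIF = -2.35 ln(alpha), where values near zero suggest little DIF.

```python
import math

def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.

    strata: iterable of (ref_right, ref_wrong, focal_right, focal_wrong)
    counts, one tuple per matched total-score level (hypothetical layout).
    """
    num = 0.0
    den = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # an empty score level contributes nothing
        num += ref_right * focal_wrong / n
        den += ref_wrong * focal_right / n
    return num / den

def mh_d_dif(alpha_mh):
    """Rescale the common odds ratio to the delta metric."""
    return -2.35 * math.log(alpha_mh)

# Illustrative counts only (not data from the study).
no_dif = [(10, 10, 5, 5), (30, 10, 15, 5)]
favors_reference = [(20, 10, 10, 10), (30, 10, 15, 10)]

for label, strata in (("no DIF", no_dif), ("favors reference", favors_reference)):
    alpha = mh_common_odds_ratio(strata)
    print(label, round(alpha, 3), round(mh_d_dif(alpha), 3))
```

An odds ratio above 1 (negative MH D-DIF) means that, at matched score levels, reference-group examinees had higher odds of answering the item correctly than focal-group examinees.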

Journal ArticleDOI
TL;DR: In this paper, the effects of autocorrelated errors on Type I error in ordinary least-squares models are clarified, and it is shown that under certain conditions, distortion in Type I error is far less than is predicted by asymptotic theory.
Abstract: Several issues regarding the effects of autocorrelated errors on Type I error in ordinary least-squares models are clarified. Although autocorrelated errors have a large effect on both omnibus F tests and tests on individual intervention effect coefficients in many applications, there are exceptions that have not been pointed out previously. It is demonstrated that under certain conditions, distortion in Type I error is far less than is predicted by asymptotic theory. It is shown that these exceptions occur because the effect of autocorrelated errors is dependent on (a) the type of parameters (e.g., level change and/or slope change) required in the model, (b) the number of variables in the design matrix, and (c) the sample size. Because existing time-series methods perform poorly with small samples, this may be a useful finding in some situations; however, a better general solution is to use a recently developed small-sample method.

Journal ArticleDOI
TL;DR: Missing data occur for a variety of reasons; this article concentrates on the particularly problematic case of items that an individual skipped or could not reach in time.
Abstract: Missing data occur for a variety of reasons. Particularly problematic are those items an individual skipped or could not reach in time. This article focuses on the practical effects of using differen...

Journal ArticleDOI
TL;DR: In this article, the optimal number of raters to employ for obtaining stable cutoff scores was evaluated via generalizability theory, and it was found that approximately 10 to 15 raters is an optimal target range, although fewer raters may sometimes be sufficient.
Abstract: Although the Angoff method is perhaps the most highly regarded and widely used judgmental method for setting criterion-referenced cutoff scores, little research has been conducted to determine the optimal number of raters to employ for obtaining stable cutoff scores. In the present study, Angoff ratings obtained from eight different occupational licensing examinations were evaluated via generalizability theory to estimate the optimal number of raters. Results indicated that approximately 10 to 15 raters is an optimal target range, although fewer raters may sometimes be sufficient. It is recommended that 10 to 15 raters be sampled as a general rule, followed by analyses to determine the optimal number of raters for examinations in a particular profession using a particular procedural variant of the Angoff method.

Journal ArticleDOI
TL;DR: In this paper, the authors examined the factor structure of the Organizational Identification Questionnaire (OIQ), the most widely used instrument today for the assessment of organizational identification, and found four first-order and two second-order components.
Abstract: This study examined the factor structure of the Organizational Identification Questionnaire (OIQ), the most widely used instrument today for the assessment of organizational identification. An analysis of a sample of social service employees in the southwestern United States (N = 369) yielded four first-order and two second-order components. This study contributes to the research literature pertaining to the structure of the OIQ.

Journal ArticleDOI
TL;DR: The School Function Assessment (SFA) was developed to provide information on students' abilities to meet functional demands of the elementary school program; it supports a comprehensive, detailed examination of the extent to which students with a variety of disabilities are performing important school-related functional tasks and activities such as moving around the school, using classroom materials, interacting with peers, and caring for personal needs.
Abstract: The School Function Assessment (SFA) was developed to provide information on students’ abilities to meet functional demands of the elementary school program. This judgment-based, criterion-referenced assessment supports a comprehensive, detailed examination of the extent to which students with a variety of disabilities are performing important school-related functional tasks and activities such as moving around the school, using classroom materials, interacting with peers, and caring for personal needs. The factor structure underlying the SFA Activity Performance scales was investigated using data from two heterogeneous national samples of students with disabilities (n = 266 and n = 341). Analysis with the principal axis technique and oblique rotation identified two factors, a Cognitive/Behavioral Function dimension and a Physical Function dimension. These factors were moderately correlated, which is congruent with definitions of the function construct that emphasize the integration of physical, cognitive...

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the impact of the number of raters and the type of decision (relative vs. absolute) on the reliability of writing scores and found that the reliability coefficients for writing scores decline when absolute decisions rather than relative decisions are made.
Abstract: Issues surrounding the psychometric properties of writing assessments have received ongoing attention. However, the reliability estimates of scores derived from various holistic and analytical scoring strategies reported in the literature have relied on classical test theory (CT), which accounts for only a single source of variance within a given analysis. Generalizability theory (GT) is a more powerful and flexible strategy that allows for the simultaneous estimation of multiple sources of error variance to estimate the reliability of test scores. Using GT, two studies were conducted to investigate the impact of the number of raters and the type of decision (relative vs. absolute) on the reliability of writing scores. The results of both studies indicated that the reliability coefficients for writing scores decline (a) as the number of raters is reduced and (b) when absolute decisions rather than relative decisions are made.
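The two patterns reported in this abstract follow from the standard generalizability-theory formulas, which a brief sketch can make concrete. It assumes a simple design with persons crossed with raters (p × r); the variance components below are invented for illustration and are not taken from either study, and the function name is hypothetical. The relative (generalizability) coefficient counts only the person-by-rater residual as error, while the absolute (dependability) coefficient also counts the rater main effect.

```python
def g_coefficients(var_person, var_rater, var_residual, n_raters):
    """Return (relative, absolute) coefficients for a p x r design.

    var_person:   variance due to persons (the object of measurement)
    var_rater:    variance due to rater severity or leniency
    var_residual: person-by-rater interaction confounded with error
    """
    relative_error = var_residual / n_raters
    absolute_error = (var_rater + var_residual) / n_raters
    relative = var_person / (var_person + relative_error)
    absolute = var_person / (var_person + absolute_error)
    return relative, absolute

# Hypothetical variance components, chosen only to show the pattern.
VAR_PERSON, VAR_RATER, VAR_RESIDUAL = 0.5, 0.1, 0.4

for n in (1, 2, 4, 8):
    rel, ab = g_coefficients(VAR_PERSON, VAR_RATER, VAR_RESIDUAL, n)
    print(f"raters={n}: relative={rel:.3f} absolute={ab:.3f}")
```

Because the absolute error term contains everything in the relative error term plus the rater variance, the absolute coefficient can never exceed the relative one, and both shrink as raters are removed.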

Journal ArticleDOI
TL;DR: The 18-item Need for Cognition (NFC) Scale (short version) was administered to two samples of 510 and 697 Australian males and females, and the findings indicate that the short NFC Scale is applicable for use with Australian samples.
Abstract: The 18-item Need for Cognition (NFC) Scale (short version) was administered to two samples of 510 and 697 Australian males and females. Consistent with the findings of other researchers, a principal components analysis and confirmatory factor analysis indicated one dominant factor. The scores for 17 of the items were also shown to have internal consistency. The findings indicate that the short NFC Scale is applicable for use with Australian samples.

Journal ArticleDOI
TL;DR: The authors evaluated the psychometric properties and factor structure of the new Form S as well as the role of gender and social desirability in assessing the critical thinking of management and nursing students.
Abstract: The Watson-Glaser Critical Thinking Appraisal (WGCTA) is a long-established and widely used measure. Recently, Watson and Glaser developed a short version, Form S, as a quickly administered measure of critical thinking. This study used samples of management (n = 142) and nursing (n = 123) undergraduates to evaluate the psychometric properties and factor structure of the new Form S as well as the role of gender and social desirability. The results provide only limited support for Form S because of poor to moderate internal-consistency reliability of scores on the five subtests and poor to moderate recovery, in the confirmatory factor analysis and principal components analysis, of the five subtests underlying the critical thinking construct. Scores on Form S were independent of social desirability scores and gender. Although use of Form S can be recommended for some purposes, further psychometric refinement is warranted.

Journal ArticleDOI
TL;DR: This article derives equations for the biases that coarse measurement scales induce in the expected value, variance, covariance, correlation coefficient, and reliability coefficient, and recommends graphic rating scales to avoid bias.
Abstract: Equations for calculating the biases induced by coarse measurement scales are derived. Equations for the expected value, variance, covariance, correlation coefficient, and reliability coefficient are provided. The equations can be used to study the effects of measurement scale coarseness. Examples are given that illustrate that biases can vary depending on the mean and variance of the quantities being measured, the number of scale points, the rule for assigning quantities to scale points, and the number of items in a scale. Equations for the bias limits are also derived. Under some conditions, the biases disappear as the number of scale points increases. To avoid bias, it is recommended that graphic rating scales be used.

Journal ArticleDOI
TL;DR: This simulation study investigated how the types and prevalence of response patterns that nonattending respondents might provide affect Cronbach's alpha, and found that as few as 5% of certain nonattending patterns had strong, inflating effects on alpha.
Abstract: This research investigated the effects of types and prevalence of response patterns that might be provided by nonattending respondents on Cronbach’s alpha. Three simulated data sets, one for each value of Cronbach’s alpha .700, .800, and .900, were constructed for 100 respondents on 50 one-to-seven Likert items. Participants were replaced randomly in each population by one of eight response patterns at 5%, 10%, 15%, and 20% replacement levels. Effects were greater as a function of increased prevalence in the respondent group; however, as few as 5% of certain types of nonattending patterns had strong, inflating effects on alpha. Anyone who has observed a group of teachers called to an after-school meeting to complete a survey is likely to have seen great variation in their enthusiasm for completing the task. Also, when a group of students is asked to complete a survey in a classroom setting, similar variation is often observed. Individuals with low enthusiasm for completing the survey, for whatever reason, are most likely to be nonattending respondents. When we consider the widespread use of such surveys in educational measurement, this situation needs to be examined relative to effects on commonly used survey statistics, identification of such behavior, and possible removal from the data set. The issue of error or bias associated with attitude assessment has been discussed for the past several decades. Cronbach (1970) discussed two behaviors that bias responses, faking and acquiescence. Faking behavior is characterized by a respondent consciously providing invalid information such as
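The inflating effect described in this abstract can be reproduced on a toy scale. The sketch below is only an illustration under stated assumptions: the six respondents, the four items, and the choice of a "straight-line" pattern (the same option for every item) as the nonattending response are all hypothetical and much smaller than the study's simulated data sets. Alpha is computed with the usual formula, k/(k-1) × (1 - sum of item variances / variance of total scores).

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_columns = list(zip(*rows))
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_score_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_item_variances / total_score_variance)

# Hypothetical attentive respondents on four 7-point items.
attentive = [
    [2, 3, 2, 3],
    [3, 2, 4, 3],
    [4, 5, 4, 3],
    [5, 4, 4, 5],
    [3, 4, 5, 4],
    [4, 3, 3, 4],
]

# Replace the last respondent with a straight-line pattern (assumed here
# to stand in for one kind of nonattending response).
with_nonattender = attentive[:-1] + [[1, 1, 1, 1]]

alpha_before = cronbach_alpha(attentive)
alpha_after = cronbach_alpha(with_nonattender)
print(round(alpha_before, 3), round(alpha_after, 3))
```

A respondent who marks the same extreme option throughout adds a large amount of total-score variance relative to item variance, which is why alpha rises even though that response carries no information about the construct.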

Journal ArticleDOI
TL;DR: The International AIDS Questionnaire-Chinese Version (IAQ-C) was developed and tested with 1,667 Chinese adolescents and 277 Chinese university students in Hong Kong; factor analyses indicated that four factors underlie the IAQ-C: transmission myths, attitudes, personal vulnerability, and facts.
Abstract: The purpose of this study was to develop and standardize an AIDS questionnaire for Chinese adolescents and to provide normative data on this instrument for use by clinicians, educators, and researchers. The International AIDS Questionnaire-Chinese Version (IAQ-C) was developed and tested with 1,667 Chinese adolescents and 277 Chinese university students in Hong Kong. Exploratory factor analysis indicated that four factors underlie the IAQ-C: transmission myths, attitudes, personal vulnerability, and facts. Confirmatory factor analysis on two independent samples then confirmed this four-factor structural model. The IAQ-C appears to yield valid and reliable scores measuring HIV and AIDS knowledge and attitudes among Chinese adolescents in Hong Kong.