scispace - formally typeset

Showing papers in "Educational and Psychological Measurement" in 2009


Journal Article
TL;DR: This paper presents a motivational conceptualization of engagement and disaffection that emphasizes children's constructive, focused, enthusiastic participation in the activities of classroom learning and distinguishes engagement from disaffection, as well as behavioral features from emotional features.
Abstract: This article presents a motivational conceptualization of engagement and disaffection: First, it emphasizes children’s constructive, focused, enthusiastic participation in the activities of classroom learning; second, it distinguishes engagement from disaffection, as well as behavioral features from emotional features. Psychometric properties of scores from teacher and student reports of behavioral engagement, emotional engagement, behavioral disaffection, and emotional disaffection were examined using data from 1,018 third through sixth graders. Structural analyses of the four indicators confirm that a multidimensional structure fits the data better than do bipolar or unidimensional models. Validity of scores is supported by findings that teacher reports are correlated with student reports, with in vivo observations in the classroom, and with markers of self-system and social contextual processes. As such, these measures capture important features of engagement and disaffection in the classroom, and any comprehensive assessment should include markers of each. Additional dimensions are identified, pointing the way to future research.

1,069 citations


Journal Article
TL;DR: In this paper, the authors examined motivation and engagement across elementary school, high school, and university/college, with particular focus on the Motivation and Engagement Scale (comprising adaptive, impeding/maladaptive, and maladaptive factors).
Abstract: From a developmental construct validity perspective, this study examines motivation and engagement across elementary school, high school, and university/college, with particular focus on the Motivation and Engagement Scale (comprising adaptive, impeding/maladaptive, and maladaptive factors). Findings demonstrated developmental construct validity across the three distinct educational stages in terms of good-fitting first- and higher order factors, invariance of factor structure across gender and age, and a pattern of correlations with cognate constructs (e.g., homework completion, academic buoyancy, class participation) consistent with predictions. Notwithstanding the predominantly parallel findings, there was also notable distinctiveness, primarily in terms of mean-level effects, such that elementary school students were generally more motivated and engaged than university/college students who in turn were more motivated and engaged than high school students. Implications for motivation and engagement meas...

250 citations


Journal Article
TL;DR: This article reports the results of three studies designed to extend the psychometric analyses of cultural intelligence (CQ) and to examine its utility in predicting cross-cultural adaptation in international students.
Abstract: The article reports the results of three studies designed to extend the psychometric analyses of cultural intelligence (CQ) and to examine its utility in the prediction of cross-cultural adaptation. The first study supported the proposed four-factor (Cognitive, Meta-cognitive, Motivational, and Behavioral) structure of CQ in a large sample of international students (N = 346). The second study (N = 118) revealed a strong correlation (r = .82) between CQ and emotional intelligence and failed to support the incremental validity of CQ scores in the prediction of psychological, sociocultural, and academic adaptation in international students. The final study (N = 102) established discriminant validity (r = .04) between scores of CQ and a test of general cognitive ability (Raven's Advanced Progressive Matrices) and convergent validity across scores of the CQ and Multicultural Personality Questionnaire subscales; however, CQ scores did not demonstrate additional incremental validity in the prediction of adaptive...

179 citations


Journal Article
TL;DR: This article presents a meta-analysis of survey response rates in research published in counseling and clinical psychology over a 20-year span and describes the survey administration procedures reported in those fields.
Abstract: This article reports results of a meta-analysis of survey response rates in published research in counseling and clinical psychology over a 20-year span and describes reported survey administration procedures in those fields. Results of 308 survey administrations showed a weighted average response rate of 49.6%. Among possible moderators, response rates differed only by population sampled, journal in which articles were published, sampling source and method, and use of follow-up. Researchers whose studies were included in this meta-analysis used follow-up but rarely used incentives, prenotification, or other response-facilitation methods to maximize response rates. Although the future of survey research in general may rely more heavily on Internet data collection, mail surveys dominate in this field.

156 citations


Journal Article
TL;DR: The authors developed and evaluated the Distributed Leadership Inventory (DLI) to investigate leadership team characteristics and the distribution of leadership functions among formally designed leadership positions in large secondary schools; the results indicate that school leadership involves multiple individuals, to a degree that differs by type of function.
Abstract: Systematic quantitative research on measuring distributed leadership is scarce. In this study, the Distributed Leadership Inventory (DLI) was developed and evaluated to investigate leadership team characteristics and distribution of leadership functions between formally designed leadership positions in large secondary schools. The DLI was presented to a sample of 2,198 respondents in 46 secondary schools. The input from a first subsample was used to perform exploratory factor analyses; the second subsample was used to verify the factor structure via confirmatory factor analysis. A one-factor structure for the leadership team characteristics (coherent leadership team) and a two-factor structure for the leadership functions (support and supervision) were confirmed. The results of the DLI underpin that leading schools involve multiple individuals, which differs by the type of function.

126 citations


Journal Article
TL;DR: This paper introduces a new way to model faking, based on the assumption that faking arises from an interaction between person and situation; the technique combines a control group design with structural equation modeling and allows trait and faking variance to be separated.
Abstract: The impact of socially desirable responding or faking on noncognitive assessments remains an issue of strong debate. One of the main reasons for the controversy is the lack of a statistical method to model such response sets. This article introduces a new way to model faking based on the assumption that faking occurs due to an interaction between person and situation. The technique combines a control group design with structural equation modeling and allows a separation of trait and faking variance. The model is introduced and tested in an example. The results confirm a causal influence of faking on means and covariance structure of a Big 5 questionnaire. Both effects can be reversed by the proposed model. Finally, a real-life criterion was implemented and predicted by both variance sources. In this example, it was the trait but not the faking variance that was predictive. Implications for research and practice are discussed.

112 citations


Journal Article
TL;DR: This article uses exploratory and confirmatory factor analytic methods to examine the structure of the short form of the Bem Sex Role Inventory (BSRI); as expected, women scored higher on the feminine factor.
Abstract: The short form of the Bem Sex Role Inventory (BSRI) contains half as many items as the long form and yet has often demonstrated better reliability and validity. This study uses exploratory and confirmatory factor analytic methods to examine the structure of the short form of the BSRI. A structure noted elsewhere also emerged here, consisting of two masculine factors and a single feminine factor. The three-factor model was found to be invariant across gender groups and also across two divergent samples, the first sample of college students and the second sample of accountants. As expected, women were found to score higher on the feminine factor. On a masculine factor that seemed to represent social control, men scored significantly higher than women did. However, no differences were found between men and women on a second masculine factor that seemed to represent a more internal, self-control dimension.

99 citations


Journal Article
TL;DR: The authors developed survey measures of absorptive capacity (the ability to transform new knowledge into usable knowledge) and experienced community of practice (the extent to which a person is engaged with the given practice community) to provide tools appropriate for field research.
Abstract: Research on knowledge transfer in organizations has been hampered by the lack of tools yielding valid scores for studying critical constructs in concert. The authors developed survey measures of absorptive capacity (the ability to transform new knowledge into usable knowledge) and experienced community of practice (the extent to which a person is engaged with the given practice community) to provide tools appropriate for field research. A holdout sample of 1,971 engineers in a Fortune 100 science/technology company yielded 583 responses. Confirmatory factor analysis was used to assess internal structure, and convergent and discriminant evidence of validity. Path analysis was used to assess criterion-related validity. Results demonstrate that the new measures are internally consistent, are related in meaningful ways to other organizational variables, and provide distinct explanatory power. An additional 231 responses from a second Fortune 100 science/technology company provide cross-validation.

90 citations


Journal Article
TL;DR: In this article, the authors investigated the application of the parallel analysis (PA) method for choosing the number of factors in component analysis for situations in which data are dichotomous or ordinal.
Abstract: The purpose of this study was to investigate the application of the parallel analysis (PA) method for choosing the number of factors in component analysis for situations in which data are dichotomous or ordinal. Although polychoric correlations are sometimes used as input for component analyses, the random data matrices generated for use in PA typically consist of Pearson correlations. In this study, the authors matched the type of random data matrix to the type of input matrix. Analyses were conducted on both polychoric and Pearson correlation matrices, and random matrices of the same type (polychoric or Pearson) were generated for the PA procedure. PA based on random Pearson correlations was found to perform at least as well as PA based on random polychoric correlations, for nearly all of the conditions studied.

66 citations
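The abstract above refers to parallel analysis (PA) without spelling out the procedure. As a minimal sketch only, the code below runs a basic Horn-style PA on Pearson correlations for continuous data: it compares each observed eigenvalue of the correlation matrix with the mean eigenvalue obtained from random normal data of the same size, and retains components until an observed eigenvalue no longer exceeds its random counterpart. The function names, the mean (rather than percentile) criterion, the number of simulations, and the hand-written Jacobi eigenvalue routine (used to stay within the Python standard library) are assumptions of this sketch; the polychoric variant compared in the study is not shown.

```python
import math
import random

# Minimal Horn-style parallel analysis sketch (assumptions: Pearson correlations,
# continuous data, mean-eigenvalue criterion). All names are hypothetical.


def correlation_matrix(data):
    """Pearson correlation matrix for a list of rows (n observations x p variables)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    norms = [math.sqrt(sum(r[j] ** 2 for r in centered)) for j in range(p)]
    return [[sum(r[i] * r[j] for r in centered) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def jacobi_eigenvalues(matrix, max_sweeps=100, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (i, j) entry becomes zero.
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta ** 2 + 1))
                c = 1 / math.sqrt(t ** 2 + 1)
                s = t * c
                for k in range(p):  # A <- A J (update columns i and j)
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # A <- J^T A (update rows i and j)
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def parallel_analysis(data, n_sims=50, seed=0):
    """Retain components whose observed eigenvalue exceeds the mean eigenvalue
    obtained from random normal data with the same n and p."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = jacobi_eigenvalues(correlation_matrix(data))
    random_means = [0.0] * p
    for _ in range(n_sims):
        sim = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for k, ev in enumerate(jacobi_eigenvalues(correlation_matrix(sim))):
            random_means[k] += ev / n_sims
    n_keep = 0
    for obs, rnd in zip(observed, random_means):
        if obs <= rnd:
            break
        n_keep += 1
    return n_keep, observed, random_means


# Usage: six indicators of one simulated common factor (loading 0.8).
gen = random.Random(42)
demo = []
for _ in range(200):
    f = gen.gauss(0, 1)
    demo.append([0.8 * f + 0.6 * gen.gauss(0, 1) for _ in range(6)])
n_keep, observed, random_means = parallel_analysis(demo)
print(n_keep, [round(v, 2) for v in observed])
```

With one strong common factor built into the simulated data, the first observed eigenvalue should stand well above its random counterpart while the remaining ones fall below, so a single component is retained. Switching to a 95th-percentile criterion would only change how the simulated eigenvalues are summarized.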


Journal Article
TL;DR: In this paper, the authors implemented a scale purification procedure onto the standard MIMIC method for differential item functioning (DIF) detection and assessed its performance through a series of simulations.
Abstract: This study implements a scale purification procedure onto the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling false-positive rates and yielding higher true-positive rates. Only when the DIF pattern is balanced between groups or when there is a small percentage of DIF items in the test does M-ST perform as appropriately as M-SP. Moreover, both methods yield a higher true-positive rate under the two-parameter logistic model than under the three-parameter model. M-SP is preferable to M-ST, because DIF patterns in real tests are unlikely to be perfectly balanced and the percentages of DIF items may not be small.

57 citations


Journal Article
TL;DR: This paper presents a broad framework for item selection in computerized classification testing (CCT) and demonstrates that the efficiency of item selection approaches depends on the termination criteria used.
Abstract: Several alternatives for item selection algorithms based on item response theory in computerized classification testing (CCT) have been suggested, with no conclusive evidence on the substantial superiority of a single method. It is argued that the lack of sizable effect is because some of the methods actually assess items very similarly through different calculations and will usually select the same item. Consideration of methods that assess information across a wider range is often unnecessary under realistic conditions, although it might be advantageous to utilize them only early in a test. In addition, the efficiency of item selection approaches depends on the termination criteria that are used, which is demonstrated through didactic example and Monte Carlo simulation. Item selection at the cut score, which seems conceptually appropriate for CCT, is not always the most efficient option. A broad framework for item selection in CCT is presented that incorporates these points.

Journal Article
TL;DR: The authors developed a scale assessing high school students' self-efficacy beliefs in chemistry-related tasks and assessed the psychometric properties of scores on this scale.
Abstract: The aim of this study was to develop a scale assessing high school students’ self-efficacy beliefs in chemistry-related tasks and to assess psychometric properties of scores on this scale. A pilot s...

Journal Article
TL;DR: The authors examined the measurement equivalence of a second-order factor model of emotional intelligence (EI) using scores for 921 job applicants obtained during a personnel selection process and found that scores on the Wong and Law Emotional Intelligence Scale (WLEIS) are comparable across gender and ethnic groups.
Abstract: The present study examined the measurement equivalence of a second-order factor model of emotional intelligence (EI). Using scores for 921 job applicants obtained during a personnel selection process, measurement equivalence of the Wong and Law Emotional Intelligence Scale (WLEIS) was tested across ethnic (Whites, Blacks, and Hispanics) and gender groups. Results (a) supported the four-dimension, second-order factor structure of EI and (b) indicated that scores on the WLEIS are comparable across gender and ethnic groups. Findings are discussed in the context of applied and research-based relevance.

Journal Article
TL;DR: Stress-related growth is defined as the perception or experience of deriving benefits from encountering stressful circumstances and has been identified as a protective factor against stress.
Abstract: Stress-related growth is defined as the perception or experience of deriving benefits from encountering stressful circumstances and, thus, has been identified as a protective factor against stress....

Journal Article
TL;DR: This article describes a conceptual framework for examining differential item functioning (DIF) and differential person functioning (DPF) as types of model-data misfit within the context of assessing students with disabilities.
Abstract: The major purpose of this study is to describe a conceptual framework for examining differential item functioning (DIF) and differential person functioning (DPF) as types of model-data misfit within the context of assessing students with disabilities. Specifically, DIF and DPF can be viewed through the lens of residual analyses. Residual analyses can be used to explore DIF (item fit) as well as extended to explore DPF (person fit). One of the advantages of this conceptual framework is that the size of the subgroups can be quite small with interpretable results produced even for individuals. To illustrate this conceptual framework, Rasch measurement theory is used as the item response theory model. Methodological and theoretical issues are discussed. Data from a high-stakes assessment in mathematics in Georgia (Grade 7, geometry items) are used to illustrate the conceptual framework for students with disabilities. The substantive research questions used to illustrate the conceptual framework address whet...
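The framework above treats DIF and DPF as residual-based misfit under the Rasch model. As a minimal sketch, not the authors' exact statistics, the code below computes the Rasch probability of a correct response, the standardized residual z = (x - P) / sqrt(P(1 - P)) for each response, and an unweighted mean square of those residuals. Averaging over one item's responses within a subgroup gives an item-level (DIF-style) summary, and averaging over one person's responses gives a person-level (DPF-style) summary, which is why even very small groups or single individuals still yield a value. Ability and difficulty estimates are assumed to be already available, and the function names are hypothetical.

```python
import math

# Sketch of Rasch residual summaries (assumption: abilities `theta` and item
# difficulties `b` have already been estimated; function names are hypothetical).


def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standardized_residual(x, theta, b):
    """z = (x - P) / sqrt(P * (1 - P)) for an observed response x in {0, 1}."""
    p = rasch_prob(theta, b)
    return (x - p) / math.sqrt(p * (1.0 - p))


def mean_square(responses, thetas, bs):
    """Unweighted (outfit-style) mean square of standardized residuals.

    Pass one person's responses across items for a person-fit (DPF-style)
    summary, or one item's responses across a subgroup for an item-fit
    (DIF-style) summary.
    """
    zs = [standardized_residual(x, t, b) for x, t, b in zip(responses, thetas, bs)]
    return sum(z * z for z in zs) / len(zs)


# Usage: one student (theta = 0.5) answering three items of difficulty -1, 0, 1.
person_ms = mean_square([1, 0, 1], [0.5, 0.5, 0.5], [-1.0, 0.0, 1.0])
print(round(person_ms, 3))
```

Values near 1 indicate responses about as variable as the model expects; values well above 1 flag unexpected responses for that item or person.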

Journal Article
TL;DR: Findings indicate no major differences between the two types of journals in terms of ES reporting practices, and different conclusions could be reached based on interpreting ES versus p values.
Abstract: Effect size (ES) reporting practices in a sample of 10 educational research journals are examined in this study. Five of these journals explicitly require reporting ES and the other 5 have no such policy. Data were obtained from 99 articles published in the years 2003 and 2004, in which 183 statistical analyses were conducted. Findings indicate no major differences between the two types of journals in terms of ES reporting practices. Different conclusions could be reached based on interpreting ES versus p values. The discrepancy between conclusions based on statistical versus practical significance is frequently not reported, not interpreted, and mostly not discussed or resolved.

Journal Article
TL;DR: The psychometric properties of scores from the Achievement Goal Questionnaire were examined in samples of Japanese (N = 326) and Canadian (N = 307) postsecondary students, and strong evidence was found for the four-factor structure of achievement goals in both populations.
Abstract: The psychometric properties of scores from the Achievement Goal Questionnaire were examined in samples of Japanese (N = 326) and Canadian (N = 307) postsecondary students. Previous research found evidence of a four-factor structure of achievement goals in U.S. samples. Using confirmatory factor-analytic techniques, the authors found strong evidence for the four-factor structure of achievement goals in both the Canadian and Japanese populations. Subsequent multigroup structural equation modeling indicated the metric invariance of this four-factor structure across the two populations.

Journal Article
TL;DR: In this article, the authors demonstrate how a multidimensional Rasch analysis can be employed to take into account the information about the correlation between latent traits such that the precision of each subtest measure can be improved and the correlation among latent traits can be accurately estimated.
Abstract: Educational and psychological tests are often composed of multiple short subtests, each measuring a distinct latent trait. Unfortunately, short subtests suffer from low measurement precision, which makes the bandwidth-fidelity dilemma inevitable. In this study, the authors demonstrate how a multidimensional Rasch analysis can be employed to take into account the information about the correlation between latent traits such that the precision of each subtest measure can be improved and the correlation between latent traits can be accurately estimated. A real data set of the 13-scale Thinking Styles Inventory was analyzed with the traditional unidimensional approach and the multidimensional approach. The results demonstrate that in contrast to the unidimensional approach, the multidimensional approach yields a much higher level of measurement precision and a more appropriate estimate for the correlation between thinking styles. In conclusion, even short subtests can yield highly precise measures such that th...

Journal Article
TL;DR: In this paper, the authors used Rasch modeling to examine the properties of the self-deception scale of the Balanced Inventory of Desirable Responding in terms of dimensionality, use of response category, sample appropriateness, and reliability.
Abstract: Self-deception has become a construct of great interest in individual differences research because it has been associated with levels of resilience and mental health. The Balanced Inventory of Desirable Responding (BIDR) is a self-report measure used for quantifying self-deception. In this study we used Rasch modeling to examine the properties of the self-deception scale of the BIDR in terms of dimensionality, use of response category, sample appropriateness, and reliability. A total of 315 university students (ages 18-21) were administered the self-deception scale of the BIDR. Seven-category and 2-category scoring methods were compared, as approved by the developers of the scale. Overall, the 7-category model was the best fit for the data and the sample. We concluded that the scale has the best reliability using a 7-category model with Item 13 deleted. Because of low person measure separation and reliability, the appropriateness of use of this instrument in undergraduate populations is questioned; the use of the measure in populations with larger ranges of self-deception is not recommended.

Journal Article
TL;DR: The linear logistic test model (LLTM) breaks down the item parameter of the Rasch model as a linear combination of hypothesized elementary parameters and can be used for psychometric research on various testing conditions.
Abstract: The linear logistic test model (LLTM) breaks down the item parameter of the Rasch model as a linear combination of some hypothesized elementary parameters. Although the original purpose of applying the LLTM was primarily to generate test items with specified item difficulty, there are still many other potential applications, which may be of use for psychometric research on various testing conditions. This article provides some examples of such applications. The examples include (a) position effect of item presentation (in particular, learning and fatigue effects); (b) content-specific learning effect; (c) effect of speeded item presentation; and (d) effect of item response format.
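The LLTM decomposition described above can be written as beta_i = sum_j q_ij * eta_j + c, where q_ij are fixed weights in a design (Q) matrix, eta_j are the elementary (basic) parameters, and c is a normalization constant. The sketch below, using made-up parameter values, encodes something like application (a), a position effect, by adding a Q column that marks items presented in a later block; under this coding a positive weight makes later items harder (consistent with fatigue) and a negative weight makes them easier (consistent with learning). Estimation of the eta parameters, typically by conditional maximum likelihood, is not shown, and the function names are hypothetical.

```python
import math

# Hypothetical LLTM illustration: Q-matrix columns are
# [content type A, content type B, presented in second block].


def lltm_difficulties(q_matrix, eta, c=0.0):
    """beta_i = sum_j q_ij * eta_j + c for every item (row of Q)."""
    return [sum(q * e for q, e in zip(row, eta)) + c for row in q_matrix]


def rasch_prob(theta, beta):
    """Rasch probability of a correct response given ability theta and difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


Q = [
    [1, 0, 0],  # content A, first block
    [0, 1, 0],  # content B, first block
    [1, 0, 1],  # content A, second block
    [0, 1, 1],  # content B, second block
]
eta = [-0.5, 0.4, 0.3]  # made-up values; 0.3 plays the role of a fatigue-like position effect

betas = lltm_difficulties(Q, eta)
for beta in betas:
    print(round(beta, 2), round(rasch_prob(0.0, beta), 3))
```

Because each item's difficulty is now a function of a few basic parameters, other effects listed in the abstract (content-specific learning, speeded presentation, response format) could be represented the same way by adding the corresponding Q columns.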

Journal Article
TL;DR: Theoretical and empirical evidence is provided that parallel analysis is not appropriate for uncovering the factorial structure of binary variables conforming to the unidimensional normal ogive model.
Abstract: Parallel analysis has been shown to be suitable for dimensionality assessment in factor analysis of continuous variables. There have also been attempts to demonstrate that it may be used to uncover the factorial structure of binary variables conforming to the unidimensional normal ogive model. This article provides both theoretical and empirical evidence that this is not appropriate. Results of a simulation study indicate that sample size, item discrimination, and type of correlation coefficient (Pearson vs. tetrachoric correlation) considerably influence the performance of parallel analysis. Reliability of parallel analysis with binary variables is found to be notably poor for Pearson correlations and also limited for tetrachoric correlations.

Journal Article
TL;DR: In this article, the impact of scaling gradients on a single-item Direct Behavior Rating (DBR) is examined using generalizability theory.
Abstract: Generalizability theory is used to examine the impact of scaling gradients on a single-item Direct Behavior Rating (DBR). A DBR refers to a type of rating scale used to efficiently record target be...

Journal Article
TL;DR: In this article, the authors examined the psychometric properties of the scores on a version for children of the Carver and White Behavioral Inhibition and Activation scales (the BIS-BAS scales).
Abstract: The primary aim of this study was to examine the psychometric properties of the scores on a version for children of the Carver and White Behavioral Inhibition and Activation scales (the BIS-BAS scales). This involved administering the BIS-BAS scales, the Positive and Negative Affect Schedule, the Junior Eysenck Personality Questionnaire Revised-Abbreviated, and the Achievement Motives Scale to a population of 661 Norwegian sixth graders. The findings reveal that the scores on the BIS-BAS scales for children have a theoretically meaningful factor structure as well as satisfactory convergent validity and reliability. However, the results indicate that the version for children of the BAS scale actually may consist of two subscales: one related to the experience of pleasurable affect and one to persistence in goal pursuit. To a large extent, the relationship with the other scales was as expected.

Journal Article
TL;DR: Results indicate that the weighting mechanism successfully addresses the constraints, that stratification helps to a great extent in balancing exposure rates, and that the ascending-a design improves measurement precision.
Abstract: a-stratification is a method that utilizes items with small discrimination (a) parameters early in an exam and those with higher a values when more is learned about the ability parameter. It can achieve much better item usage than the maximum information criterion (MIC). To make a-stratification more practical and more widely applicable, a method for weighting the item selection process in a-stratification as a means of satisfying multiple test constraints is proposed. This method is studied in simulation against an analogous method without stratification as well as a-stratification using descending- rather than ascending-a procedures. In addition, a variation of a-stratification that allows for unbalanced usage of a parameters is included in the study to examine the trade-off between efficiency and exposure control. Finally, MIC and randomized item selection are included as baseline measures. Results indicate that the weighting mechanism successfully addresses the constraints, that stratification helps to...
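For readers unfamiliar with a-stratification, the sketch below shows only the basic ascending-a idea the abstract builds on: the pool is sorted by discrimination a and split into strata, early stages draw from the low-a stratum, and within a stratum the unused item whose difficulty b is closest to the current ability estimate is chosen. The constraint-weighting mechanism proposed in the study is not reproduced; the dictionary item format (keys `id`, `a`, `b`), the near-equal split, and the function names are assumptions for illustration.

```python
# Hypothetical sketch of ascending-a stratified item selection.
# Items are dicts with assumed keys "id", "a" (discrimination), "b" (difficulty).


def build_strata(pool, n_strata):
    """Sort items by a (ascending) and split them into n_strata near-equal groups;
    stratum 0 holds the least discriminating items."""
    ordered = sorted(pool, key=lambda item: item["a"])
    size, extra = divmod(len(ordered), n_strata)
    strata, start = [], 0
    for k in range(n_strata):
        end = start + size + (1 if k < extra else 0)
        strata.append(ordered[start:end])
        start = end
    return strata


def select_item(stratum, theta_hat, administered):
    """Return the unused item in this stratum whose b is closest to theta_hat."""
    candidates = [item for item in stratum if item["id"] not in administered]
    return min(candidates, key=lambda item: abs(item["b"] - theta_hat))


pool = [
    {"id": 1, "a": 0.5, "b": -1.0},
    {"id": 2, "a": 0.7, "b": 0.2},
    {"id": 3, "a": 1.2, "b": 0.0},
    {"id": 4, "a": 1.8, "b": 1.1},
]
strata = build_strata(pool, n_strata=2)
first = select_item(strata[0], theta_hat=0.0, administered=set())  # early, low-a stage
later = select_item(strata[1], theta_hat=0.8, administered={first["id"]})  # later, high-a stage
print(first["id"], later["id"])
```

Reserving high-a items for later stages, once the ability estimate has stabilized, is what spreads item usage across the pool compared with always choosing the maximum-information item.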

Journal Article
TL;DR: Results show that the interaction effects between missingness mechanism, treatment, and rate are most influential for explaining variation in bias, root mean square errors, and rejection rates.
Abstract: This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of DIF detection (Mantel-Haenszel statistic, logistic regression analysis) under three mechanisms of missingness (data missing completely at random, data missing at random, and data missing not at random) to produce over- or underestimates of the DIF effect sizes and detection rates. Results show that the interaction effects between missingness mechanism, treatment, and rate are most influential for explaining variation in bias, root mean square errors, and rejection rates. An incorrect treatment of missing data can thus lead to severe increases of Type I and Type II error rates. However, the choice between the two DIF detection methods investigated in this study is...

Journal Article
TL;DR: In this paper, the authors compare two methods, the confirmatory common factor (CCF) method and the oblique multiple group (OMG) method, on the quality of their suggestions for adjusting incorrect assignments of items to subtests.
Abstract: A common question in test evaluation is whether an a priori assignment of items to subtests is supported by empirical data. If the analysis results indicate the assignment of items to subtests under study is not supported by data, the assignment is often adjusted. In this study the authors compare two methods on the quality of their suggestions to adjust incorrect assignments of items to subtests. The confirmatory common factor (CCF) method is often used in practice. However, previous research reported rather poor quality of the suggested adjustments. Therefore, the CCF method is compared with a less often used but promising method, the oblique multiple group (OMG) method. The authors compared both methods in a simulation study conducted under various conditions. For each method, several adjustment procedures were studied. The best adjustment procedure within the OMG method performed better than, or comparably to, the procedures within the CCF method.

Journal Article
TL;DR: In this paper, the effect of mode of administration of the Raven Standard Progressive Matrices test on distribution, accuracy, and meaning of raw scores was investigated with repeated measures multivariate analysis of variance, internal consistency reliability estimates and confirmatory factor analysis approaches.
Abstract: This study investigates the effect of mode of administration of the Raven Standard Progressive Matrices test on distribution, accuracy, and meaning of raw scores. A random sample of high school students take counterbalanced paper-and-pencil and computer-based administrations of the test and answer a questionnaire surveying preferences for computer-delivered test administrations. Administration mode effect is studied with repeated measures multivariate analysis of variance, internal consistency reliability estimates, and confirmatory factor analysis approaches. Results show a lack of test mode effect on distribution, accuracy, and meaning of raw scores. Participants indicate their preferences for the computer-delivered administration of the test. The article discusses findings in light of previous studies of the Raven Standard Progressive Matrices test.

Journal Article
TL;DR: In this paper, the authors evaluate alternative analysis strategies for the meta-analysis method of reliability generalization when the reliability estimates are not statistically independent, and the results suggest that the type of approach does not have a noticeable impact on the accuracy of the reliability results but that researchers should be cautious when the intraclass correlation is relatively large.
Abstract: This study was conducted to evaluate alternative analysis strategies for the meta-analysis method of reliability generalization when the reliability estimates are not statistically independent. Five approaches to dealing with the violation of independence were implemented: ignoring the violation and treating each observation as independent, calculating one mean or median from each study, selecting only one observation per study, and using a mixed-effects model. Monte Carlo methods were used to simulate samples under known and controlled population conditions. The results suggest that the type of approach does not have a noticeable impact on the accuracy of the reliability results but that researchers should be cautious when the intraclass correlation is relatively large. The simulations in this study also resulted in very poor confidence band coverage.

Journal Article
TL;DR: In an effort to standardize academic application procedures, the authors developed the Standardized Letters of Recommendation (SLR) to capture important cognitive and noncognitive qualities of graduate students.
Abstract: In an effort to standardize academic application procedures, the authors developed the Standardized Letters of Recommendation (SLR) to capture important cognitive and noncognitive qualities of grad...

Journal Article
TL;DR: In this article, the authors used three item selection methods to create multiple short forms for two EARLI numeracy measures and compared these item selection methods on projected internal consistency and concurrent validity estimates for the resulting forms.
Abstract: Currently, few measures are available to monitor young children’s progress in acquiring key early academic skills. In response to this need, the authors have begun developing measures (i.e., the Early Arithmetic, Reading and Learning Indicators, or EARLI) of preschoolers’ numeracy skills. To accurately and efficiently monitor acquisition of early skills, users require multiple short forms that are appropriate in difficulty level for young children at different points in time. In the current study, the authors used three item selection methods to create multiple short forms for two EARLI numeracy measures. The authors then compared these item selection methods on projected internal consistency and concurrent validity estimates for the resulting forms. The short forms selected by these methods did not differ significantly on either criterion and appeared to be sufficiently sensitive to measure initial levels and acquisition of numeracy skills over time by preschool children enrolled in Head Start.