Showing papers in "Educational and Psychological Measurement in 2005"


Journal ArticleDOI
TL;DR: In this paper, the authors investigated the relationship between sample size and the quality of factor solutions obtained from exploratory factor analysis and found that sample size tended to have less influence on solution quality when communalities were high than when they were low.
Abstract: The purpose of this study was to investigate the relationship between sample size and the quality of factor solutions obtained from exploratory factor analysis. This research expanded upon the range of conditions previously examined, employing a broad selection of criteria for the evaluation of the quality of sample factor solutions. Results showed that when communalities are high, sample size tended to have less influence on the quality of factor solutions than when communalities are low. Overdetermination of factors was also shown to improve the factor analysis solution. Finally, decisions about the quality of the factor solution depended upon which criteria were examined.

441 citations


Journal ArticleDOI
TL;DR: This article provides a tutorial for performing cross-sectional and longitudinal analyses with the SPSS MIXED procedure, borrowing heavily from Singer’s overview of SAS PROC MIXED and duplicating her analyses.
Abstract: Beginning with Version 11, SPSS implemented the MIXED procedure, which is capable of performing many common hierarchical linear model analyses. The purpose of this article was to provide a tutorial for performing cross-sectional and longitudinal analyses using this popular software platform. In doing so, the authors borrowed heavily from Singer’s overview of SAS PROC MIXED, duplicating her analyses using the SPSS MIXED procedure.

393 citations


Journal ArticleDOI
TL;DR: The authors present implementations of gradient projection algorithms, both orthogonal and oblique, along with a catalogue of rotation criteria and their gradients, and illustrate the rotation methods by applying them to a loading matrix from Wehmeyer and Palmer.
Abstract: Almost all modern rotation of factor loadings is based on optimizing a criterion, for example, the quartimax criterion for quartimax rotation. Recent advancements in numerical methods have led to general orthogonal and oblique algorithms for optimizing essentially any rotation criterion. All that is required for a specific application is a definition of the criterion and its gradient. The authors present the implementations of gradient projection algorithms, both orthogonal and oblique, as well as a catalogue of rotation criteria and corresponding gradients. Software for these is downloadable and free; a specific version is given for each of the computing environments used most by statisticians. Examples of rotation methods are presented by applying them to a loading matrix from Wehmeyer and Palmer.

388 citations


Journal ArticleDOI
TL;DR: The Marlowe-Crowne Social Desirability Scale (MCSDS), the most commonly used social desirability bias (SDB) assessment, conceptualizes SDB as an individual's need for approval.
Abstract: The Marlowe-Crowne Social Desirability Scale (MCSDS), the most commonly used social desirability bias (SDB) assessment, conceptualizes SDB as an individual’s need for approval. The Balanced Invento...

188 citations


Journal ArticleDOI
TL;DR: The authors used a rational-empirical approach to construct the Student Readiness Inventory, measuring psychosocial and academic-related skill factors found to predict two important college outcomes, academic performance and retention, in a recent meta-analysis.
Abstract: The authors used a rational-empirical approach to construct the Student Readiness Inventory, measuring psychosocial and academic-related skill factors found to predict two important college outcomes, academic performance and retention, in a recent meta-analysis. The initial item pool was administered to 5,970 first-year college students and high school seniors to empirically validate and cross-validate the underlying factor structure. Ten first-order and 3 second-order factors were derived, partially resembling the original conceptual model. Future study is needed to explore the criterion and predictive validities of the factors constituting this inventory.

178 citations


Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate the application of these methods by analyzing the reliability of an eight-factor, nested-factor model that represents the structure of 45 tasks in an intelligence test (N = 1,233).
Abstract: Two aspects of the reliability of multidimensional measures can be distinguished: the amount of scale score variance that is accounted for by all underlying factors (composite reliability) and the degree to which the scale score reflects one particular factor (construct reliability). Confidence intervals for composite and construct reliabilities can be estimated by bootstrap methods. The authors demonstrate the application of these methods by analyzing the reliability of an eight-factor, nested-factor model that represents the structure of 45 tasks in an intelligence test (N = 1,233). Composite reliabilities ranged between .78 and .93, whereas construct reliabilities ranged between .17 and .68 when the scale indicators were equally weighted to compute the scale scores and between .52 and .90 with weights based on pattern coefficients. The results indicate the importance of distinguishing diagnostic from research applications when judging whether the reliability values of multidimensional measures are subst...

161 citations


Journal ArticleDOI
TL;DR: In this article, the bias-corrected and accelerated bootstrap confidence interval using the unbiased estimate of δ is proposed and recommended for general use, especially in cases in which the assumption of normality may be violated.
Abstract: The standardized group mean difference, Cohen’s d, is among the most commonly used and intuitively appealing effect sizes for group comparisons. However, reporting this point estimate alone does not reflect the extent to which sampling error may have led to an obtained value. A confidence interval expresses the uncertainty that exists between d and the population value, δ, it represents. A set of Monte Carlo simulations was conducted to examine the integrity of a noncentral approach analogous to that given by Steiger and Fouladi, as well as two bootstrap approaches in situations in which the normality assumption is violated. Because d is positively biased, a procedure given by Hedges and Olkin is outlined, such that an unbiased estimate of δ can be obtained. The bias-corrected and accelerated bootstrap confidence interval using the unbiased estimate of δ is proposed and recommended for general use, especially in cases in which the assumption of normality may be violated.

153 citations
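
The point estimate and small-sample correction described in the abstract above can be illustrated with a minimal sketch. It assumes the common pooled-standard-deviation form of Cohen's d and the widely used approximation to the Hedges and Olkin correction factor, 1 - 3/(4·df - 1) with df = n1 + n2 - 2. The function names `cohens_d` and `unbiased_d` are hypothetical, and the bootstrap confidence interval recommended by the authors is not implemented here.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


def unbiased_d(d, n1, n2):
    """Apply the approximate small-sample correction factor to d.

    Assumption: the approximation 1 - 3 / (4 * df - 1) is used in place of
    the exact gamma-function expression.
    """
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))


if __name__ == "__main__":
    a = [2, 4, 6, 8]
    b = [1, 3, 5, 7]
    d = cohens_d(a, b)
    print(round(d, 4), round(unbiased_d(d, len(a), len(b)), 4))
```

Because the correction factor is below 1 for any finite sample, the corrected value is always smaller in magnitude than d, which is consistent with the abstract's statement that d is positively biased.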


Journal ArticleDOI
TL;DR: In this paper, the authors examined the construct and criterion-related validity of an ability-based EI measure (Mayer Salovey Caruso Emotional Intelligence Test [MSCEIT]) and a mixed-model EI measure (Emotional Quotient Inventory [EQ-i]) using a military sample.
Abstract: Despite the popularity of the concept of emotional intelligence (EI), there is much controversy around its definition, measurement, and validity. Therefore, the authors examined the construct and criterion-related validity of an ability-based EI measure (Mayer Salovey Caruso Emotional Intelligence Test [MSCEIT]) and a mixed-model EI measure (Emotional Quotient Inventory [EQ-i]) using a military sample. Confirmatory factor analyses indicated that the four-factor model for the MSCEIT, but not the five-factor model for the EQ-i, fit well. MSCEIT and EQ-i scores were modestly intercorrelated. Gender was related only to the MSCEIT’s Emotional Perception scale scores. EQ-i scores, but not MSCEIT scores, tended to be strongly related to scores on measures assessing personality, self-monitoring ability, job satisfaction, and life satisfaction. The EQ-i also accounted for incremental variance in job and life satisfaction, after controlling for personality. Overall, cognitive ability scores were unrelated to EQ-i sc...

129 citations


Journal ArticleDOI
TL;DR: In this paper, a children's version of the Physical Self-Perception Profile (C-PSPP) was administered to seventh-, eighth-, and ninth-grade high school students (N = 2,969).
Abstract: This study tests the generalizability of the factor pattern, structural parameters, and latent mean structure of a multidimensional, hierarchical model of physical self-concept in adolescents across gender and grade. A children's version of the Physical Self-Perception Profile (C-PSPP) was administered to seventh-, eighth-, and ninth-grade high school students (N = 2,969). Two a priori models were proposed: a confirmatory factor-analytic model that hypothesized a multidimensional model of physical self-concept; and a structural equation model that proposed a multidimensional, hierarchical structure with global self-concept as a superordinate construct and physical self-concept as a domain-level construct that explained the covariances among the subdomains of the C-PSPP. Both models satisfied multiple criteria for goodness-of-fit with the data in each individual gender and grade sample. Tests of the invariance of the factor pattern and structural parameters for both models across gender and grade were supported. Consistent with findings from other contexts, latent means analysis suggests that physical self-concept scores were higher in boys.

128 citations


Journal ArticleDOI
TL;DR: In this paper, multiple group confirmatory factor analyses (CFAs) were used to investigate the factorial stability of the Mentoring Functions Questionnaire (MFQ-9) across two groups: proteges who are satisfied with their mentor and those who are not.
Abstract: Ensuring construct comparability is a prerequisite for testing cross-group differences, yet this assumption is rarely tested in mentoring research. More studies testing for factorial invariance are needed for the construct validation of mentoring. Multiple group confirmatory factor analyses (CFAs) were used to investigate the factorial stability of the Mentoring Functions Questionnaire (MFQ-9) across two groups: proteges who are satisfied with their mentor and those who are not. CFA results supported a three-factor structure for the MFQ-9 composed of the dimensions of vocational support, psychosocial support, and role modeling. However, tests of invariance demonstrated nonequivalence for five item-pair measurements. Overall, the MFQ-9 demonstrated excellent psychometric properties with unsatisfied proteges; however, the instrument may need further development for use with satisfied proteges.

104 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated item parameter recovery, standard error estimates, and fit statistics yielded by the WINSTEPS program under the Rasch model and the rating scale model through Monte Carlo simulations.
Abstract: This study investigates item parameter recovery, standard error estimates, and fit statistics yielded by the WINSTEPS program under the Rasch model and the rating scale model through Monte Carlo simulations. The independent variables were item response model, test length, and sample size. WINSTEPS yielded practically unbiased estimates for the difficulty parameters under the Rasch model and the overall difficulty parameters under the rating scale model. However, the estimates for the intersection parameters under the rating scale model were substantially biased, especially for short tests. The standard errors of the overall difficulties and intersection parameters were slightly underestimated. The cube root-transformed weighted and unweighted item fit statistics did not follow the standard normal distribution in that their empirical sampling variances were much smaller than the expected value of unity. Correction procedures were proposed to make them follow approximately the standard normal distribution s...

Journal ArticleDOI
TL;DR: In this article, the authors examined the fakability of a situational judgment test (SJT) of college students' performance and found that faking had a negative effect on the criterion-related validity and the incremental validity of the SJT over cognitive ability and personality.
Abstract: There is increasing interest in using situational judgment tests (SJTs) to supplement traditional student admission procedures. An important unexplored issue is whether students can intentionally distort or fake their responses on SJTs. This study examined the fakability of an SJT of college students’ performance. Two hundred ninety-three psychology students completed a cognitive test, a personality measure, and an SJT. Only for the SJT, the students were assigned to either an honest or a fake condition. The scores of students in the fake condition were significantly higher than those of students in the honest condition (d = .89). Furthermore, faking had a negative effect on the criterion-related validity (there was a significant drop from r = .33 to r = .09) and the incremental validity of the SJT over cognitive ability and personality. These results are discussed in terms of the use of SJTs in high-stakes testing programs.

Journal ArticleDOI
TL;DR: In this article, a meta-analysis of coefficient alpha and test-retest reliability estimates showed that diagnostic classification of participants and the within-study BAI score variability were related to the magnitude of the reliability estimates.
Abstract: Anxiety is one of the most pervasive symptoms seen in clinical psychological disorders. The Beck Anxiety Inventory (BAI) is a popular measure to assess the construct of anxiety. This study was concerned with an examination of potential study factors that are associated with the variability of the reliability estimates of the BAI scores. A review of the literature involving the BAI showed that more than 57% of the publications either did not mention reliability estimates for BAI scores or presented secondary reliability estimates. This meta-analysis of coefficient alpha and test-retest reliability estimates showed that diagnostic classification of participants and the within-study BAI score variability were related to the magnitude of the reliability estimates.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate evidence of the validity of Survey of Attitudes Toward Statistics Scale (SATS) scores and their relationship with scores from two other measures of attitudes toward statistics, the Attitude Toward Statistics Scale (ATS) and the Statistics Attitude Survey.
Abstract: The purpose of the present study is to investigate evidence of the validity of Survey of Attitudes Toward Statistics Scale (SATS) scores and their relationship with scores from two other measures of attitudes toward statistics, the Attitude Toward Statistics Scale (ATS) and the Statistics Attitude Survey. The pre- and postcourse responses of 342 graduate and undergraduate students enrolled in inferential statistics courses at a large midwestern university were analyzed. Internal consistency reliability estimates were greater than .90 for total scores and greater than .70 for subscale scores for all instruments. Regression analyses confirmed the importance of SATS subscale scores over and above demographic variables in a theoretical model predicting statistics course achievement. Factor analyses suggested that both the ATS and the SATS have two domains, which is contrary to the four-factor solution proposed by the developers of the SATS.

Journal ArticleDOI
TL;DR: This paper investigated whether cynicism and depersonalization are two different dimensions of burnout or whether they may be collapsed into one construct of mental distance, and found that the two seemed to play a different role in the two samples, particularly as far as their relationship with professional efficacy is concerned.
Abstract: This article investigated whether cynicism and depersonalization are two different dimensions of burnout or whether they may be collapsed into one construct of mental distance. Using confirmatory factor analyses in two samples of teachers (n = 483) and blue-collar workers (n = 474), a superior fit was found for the four-factor model that contained cynicism, depersonalization, exhaustion, and professional efficacy as dimensions of burnout. In particular, cynicism and depersonalization emerged as unique burnout dimensions. Moreover, it appeared from multigroup analyses that this four-dimensional structure of burnout is partially invariant across both samples. Cynicism and depersonalization seemed to play a different role in both samples, particularly as far as their relationship with professional efficacy is concerned. It is recommended that future research on burnout should include the cynicism and depersonalization constructs.

Journal ArticleDOI
TL;DR: In this paper, the authors compared four methods for detecting DIF in ordinal items: the Mantel, generalized Mantel-Haenszel (GMH), logistic discriminant function analysis (LDFA), and unconstrained cumulative logits ordinal logistic regression (UCLOLR).
Abstract: Item bias is a major threat to measurement validity. Methods for detecting differential item functioning (DIF) are now commonly used to identify potentially biased items. DIF detection methods for dichotomous items are well developed, but those for ordinal items are less well developed. In this article, the authors compare four methods for detecting DIF in ordinal items: the Mantel, generalized Mantel-Haenszel (GMH), logistic discriminant function analysis (LDFA), and unconstrained cumulative logits ordinal logistic regression (UCLOLR). Factors varied include type of DIF, group ability differences, studied item discrimination, skewness in ability distributions, and sample size ratio. All procedures had good Type I error control as well as high power for detecting uniform DIF. However, the Mantel could not detect nonuniform DIF, and the LDFA also performed poorly in detecting nonuniform DIF, particularly when item discrimination was high. The UCLOLR and GMH performed extremely well under conditions simulat...

Journal ArticleDOI
TL;DR: The authors examined the factorial validity of scores on the newly developed Students’ Evaluation of Teaching Effectiveness Rating Scale (SETERS) through a series of confirmatory and multilevel structures.
Abstract: This study examined the factorial validity of scores on the newly developed Students’ Evaluation of Teaching Effectiveness Rating Scale (SETERS) through a series of confirmatory and multilevel structures. Conventional confirmatory factor analyses using the total covariance and pooled within-covariance matrices from two midwestern universities indicated that a reduced 25-item SETERS fit the data better than the original 34-item SETERS. Furthermore, multilevel factor analysis was conducted on the combined samples. This analysis suggested that one or three factors at the between and within levels were a plausible representation of SETERS scores. Pearson's correlations between individual scores on the SETERS and the Students’ Evaluation of Educational Quality questionnaire provided additional validity evidence for the two measures. The need for additional empirical research on the SETERS before widespread use is discussed.

Journal ArticleDOI
TL;DR: Despite increased awareness of practical issues in multinational data collection, few studies have addressed the issue of measurement equivalence across Western and Eastern cultures, especially usi...
Abstract: Despite increased awareness of practical issues in multinational data collection, few studies have addressed the issue of measurement equivalence across Western and Eastern cultures, especially usi...

Journal ArticleDOI
TL;DR: In this article, the authors compared four different missing data methods (deletion, mean substitution, mean of adjacent observations, and maximum likelihood estimation) with respect to the accuracy of estimation for four parameters (level, error variance, degree of autocorrelation, and slope).
Abstract: Missing data are a common practical problem for longitudinal designs. Time-series analysis is a longitudinal method that involves a large number of observations on a single unit. Four different missing-data methods (deletion, mean substitution, mean of adjacent observations, and maximum likelihood estimation) were evaluated. Computer-generated time-series data of length 100 were generated for 50 different conditions representing five levels of autocorrelation, two levels of slope, and five levels of proportion of missing data. Methods were compared with respect to the accuracy of estimation for four parameters (level, error variance, degree of autocorrelation, and slope). The choice of method had a major impact on the analysis. The maximum likelihood method very accurately estimated all four parameters under all conditions tested. The mean of the series was the least accurate approach. Statistical methods such as the maximum likelihood procedure represent a superior approach to missing data.
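
The three simpler missing-data methods named in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: missing values are encoded as NaN, "mean of adjacent observations" is taken to mean the average of the nearest observed value on each side, and the function names are hypothetical. Maximum likelihood estimation, which the study found most accurate, is not shown.

```python
import numpy as np


def listwise_delete(series):
    """Deletion: drop the missing time points entirely."""
    x = np.asarray(series, dtype=float)
    return x[~np.isnan(x)]


def mean_substitute(series):
    """Mean substitution: replace each missing value with the series mean."""
    x = np.asarray(series, dtype=float)
    return np.where(np.isnan(x), np.nanmean(x), x)


def adjacent_mean_fill(series):
    """Mean of adjacent observations (assumed: nearest observed neighbors)."""
    x = np.asarray(series, dtype=float)
    filled = x.copy()
    observed = np.flatnonzero(~np.isnan(x))
    for i in np.flatnonzero(np.isnan(x)):
        before = observed[observed < i]
        after = observed[observed > i]
        neighbors = []
        if before.size:
            neighbors.append(x[before[-1]])  # nearest observed value to the left
        if after.size:
            neighbors.append(x[after[0]])    # nearest observed value to the right
        filled[i] = np.mean(neighbors)
    return filled


if __name__ == "__main__":
    y = [1.0, float("nan"), 3.0, 4.0, float("nan"), float("nan"), 7.0]
    print(listwise_delete(y))
    print(mean_substitute(y))
    print(adjacent_mean_fill(y))
```

Note that deletion shortens the series and so changes the spacing between time points, while mean substitution ignores any trend. Both properties are plausible reasons for the differences in accuracy the abstract reports, although the sketch itself does not test that.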

Journal ArticleDOI
TL;DR: This simulation investigated the performance of parallel analysis for unidimensional binary data using single-factor models with 8 and 20 indicators, manipulating sample size (50, 100, 200, 500, and 1,000), factor loading (.45, .70, and .90), response ratio on two categories (50/50, 60/40, 70/30, 80/20, and 90/10), and type of correlation coefficient (phi and tetrachoric).
Abstract: The present simulation investigated the performance of parallel analysis for unidimensional binary data. Single-factor models with 8 and 20 indicators were examined, and sample size (50, 100, 200, 500, and 1,000), factor loading (.45, .70, and .90), response ratio on two categories (50/50, 60/40, 70/30, 80/20, and 90/10), and types of correlation coefficients (phi and tetrachoric correlations) were manipulated. The results indicated that parallel analysis performed well in identifying the number of factors. The performance improved as factor loading and sample size increased and as the percentages of responses on two categories became close. Using the 95th and 99th percentiles of the random data eigenvalues as the criteria for comparison in parallel analysis yielded a higher correct rate than using mean eigenvalues.
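
A rough sketch of the procedure described in the abstract above, for binary items with phi correlations (Pearson correlations computed on 0/1 data), might look like this. Several choices are assumptions of the sketch rather than details from the study: the comparison data are produced by shuffling each observed column independently (a permutation variant of parallel analysis), factors are counted sequentially until an observed eigenvalue fails to exceed its threshold, and the helper names `simulate_binary_one_factor` and `parallel_analysis` are hypothetical.

```python
import numpy as np


def simulate_binary_one_factor(n, p, loading, seed=0):
    """Generate 0/1 items from one latent factor, split 50/50 at zero."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(n)
    noise = rng.standard_normal((n, p))
    continuous = loading * latent[:, None] + np.sqrt(1 - loading**2) * noise
    return (continuous > 0).astype(float)


def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Return (number of factors, observed eigenvalues, thresholds)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        # Shuffling each column separately breaks inter-item correlation
        # while keeping every item's response ratio unchanged.
        shuffled = np.column_stack([rng.permutation(data[:, j]) for j in range(p)])
        random_eigs[s] = sorted_eigenvalues(shuffled)
    thresholds = np.percentile(random_eigs, percentile, axis=0)
    n_factors = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        n_factors += 1
    return n_factors, observed, thresholds


if __name__ == "__main__":
    items = simulate_binary_one_factor(n=500, p=8, loading=0.70, seed=1)
    k, observed, thresholds = parallel_analysis(items, n_sims=100)
    print("factors retained:", k)
```

Passing `percentile=99` applies the stricter 99th-percentile criterion, and replacing the percentile with the mean of the simulated eigenvalues would give the mean-eigenvalue criterion that the abstract compares against.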

Journal ArticleDOI
TL;DR: In this paper, confirmatory factor analyses of McInerney's Facilitating Conditions Questionnaire (FCQ) found seven distinct factors: perceived value of schooling (Value), affect toward schooling (Affect), peer positive academic climate (Peer Positive), parent positive academic climate (Parent Positive), teacher positive academic climate (Teacher), peer negative academic climate (Peer Negative), and parent negative academic climate (Parent Negative).
Abstract: Elementary students (n = 277, in Grades 5-6) and high school students (n = 615, in Grades 7-12) responded to 26 items of McInerney’s Facilitating Conditions Questionnaire (FCQ). Confirmatory factor analyses of the FCQ found seven distinct factors underlying these items. These were perceived value of schooling (Value), affect toward schooling (Affect), peer positive academic climate (Peer Positive), parent positive academic climate (Parent Positive), teacher positive academic climate (Teacher), peer negative academic climate (Peer Negative), and parent negative academic climate (Parent Negative). The Peer and Parent Negative constructs were correlated positively with each other but negatively with all the positive constructs. Also, academic achievement was positively correlated with the five positive factors but negatively correlated with the two negative factors. In addition, the seven factors were invariant across the elementary and high school subsamples. These results provide support for the convergent...

Journal ArticleDOI
TL;DR: In this paper, the authors examined the relationships between men's and women's CT domain identification, their perceptions of the CT field, and their interpersonal orientation to determine whether existing relationships among these variables might explain individuals' willingness to consider a number of CT- and non-CT-related fields.
Abstract: The aim of this project is to further examine the construct of domain identification (i.e., a person’s positive phenomenological experiences with, and perceived self-relevance of, a domain), specifically as it applies to computer technology (CT). The authors model a known measure of math identification to first develop a measure of CT identification. The authors then test whether the new CT identification measure could uniquely explain the relationship between individuals’ gender and CT career pursuit, above and beyond math identification. Finally, the authors examine the relationships between men’s and women’s CT domain identification, their perceptions of the CT field, and their interpersonal orientation to determine whether existing relationships among these variables might explain individuals’ willingness to consider a number of CT- and non-CT-related fields.

Journal ArticleDOI
TL;DR: In this article, data on self-esteem gathered in a sample of 1,107 students within 72 school classes in Switzerland were analyzed using two-level confirmatory factor analysis, and the results indicated that a one-factor model of self-esteem with an additional orthogonal method or response-style factor of negatively worded items adequately described within-class (individual) differences in self-esteem.
Abstract: Classical factor analysis assumes independent and identically distributed observations. Educational data, however, are often hierarchically structured, with, for example, students being nested within classes. In this study, data on self-esteem gathered in a sample of 1,107 students within 72 school classes in Switzerland were analyzed using two-level confirmatory factor analysis. Considering a sequence of two-level confirmatory factor models, the results indicate that a one-factor model of self-esteem with an additional orthogonal method or response-style factor of negatively worded items adequately described within-class (individual) differences in self-esteem. By contrast, at the between-class level, a general factor of self-esteem was sufficient to capture school class differences in self-esteem. Thus, apart from other influences, for students, the social context (school class) seems to matter in forming their self-esteem. At the same time, the findings imply that studies examining self-esteem using sa...

Journal ArticleDOI
TL;DR: In this article, the longitudinal factorial invariance of a theoretically consistent, higher-order model for Center for Epidemiologic Studies-Depression (CES-D) scores among adolescent girls and boys in middle school was tested.
Abstract: This study tested the longitudinal factorial invariance of a theoretically consistent, higher-order model for Center for Epidemiologic Studies-Depression (CES-D) scores among adolescent girls and boys in middle school. Data were collected from 2,416 adolescents who completed a survey containing the CES-D in the fall of 1998, spring of 1999, and spring of 2000. The invariance analyses were conducted using LISREL 8.50 with maximum likelihood estimation and the Satorra-Bentler scaled chi-square statistic and standard errors. The higher-order model demonstrated longitudinal, as well as gender, invariance of the overall factor structure and first- and second-order structure coefficients, first-order factor variances, second-order factor variances and covariances, and item uniquenesses. The results demonstrate that meaningful comparisons of composite CES-D scores can be made across time among girls and boys in middle school.

Journal ArticleDOI
TL;DR: The Critical Thinking Belief Appraisal (CTBA) is based on a four-factor "advantage effect" model: the theoretical premise that teachers' CT-related decision making is associated with their beliefs about the effectiveness of (a) high-CT activities for high-advantage learners, (b) high-CT activities for low-advantage learners, (c) low-CT activities for high-advantage learners, and (d) low-CT activities for low-advantage learners.
Abstract: This article reports five studies in which a scale for assessing teachers’ beliefs about classroom use of critical-thinking (CT) activities was developed and its scores evaluated for reliability and validity. The Critical Thinking Belief Appraisal (CTBA) is based on a four-factor “advantage effect” model: the theoretical premise that teachers’ CT-related decision making is associated with their beliefs about the effectiveness of (a) high-CT activities for high-advantage learners, (b) high-CT activities for low-advantage learners, (c) low-CT activities for high-advantage learners, and (d) low-CT activities for low-advantage learners. Results indicated that the scale produced scores with high reliability; a stable factor structure; and satisfactory discriminant, construct, and predictive validity. The studies supported the theoretical and practical utility of the construct and measure of teachers’ beliefs about classroom use of CT activities.

Journal ArticleDOI
TL;DR: In this paper, the authors compared eight confidence intervals of measures of effect size (ES) in a two-level repeated measures design and found that the ESs and intervals that used robust estimators and bootstrap-based critical values were better at controlling the probability coverage.
Abstract: Probability coverage for eight different confidence intervals (CIs) of measures of effect size (ES) in a two-level repeated measures design was investigated. The CIs and measures of ES differed with regard to whether they used least squares or robust estimates of central tendency and variability, whether the end critical points of the interval were obtained using a theoretical or an empirical sampling distribution, and whether the ESs used a pooled or nonpooled estimate of error variability. These intervals were compared when data were obtained from both normal and nonnormal distributions and when the population magnitude of effect, size of sample, and variance heterogeneity were varied. It was found that the ESs and intervals that used robust estimators and critical values obtained through a bootstrap method were better at controlling the probability coverage (i.e., within [.925, .975]).

Journal ArticleDOI
TL;DR: A Spanish version of the Marlowe-Crowne Social Desirability Scale (MCSDS) was developed by applying a method derived from the cross-cultural and psychometric literature.
Abstract: A Spanish version of the Marlowe-Crowne Social Desirability Scale (MCSDS) was developed by applying a method derived from the cross-cultural and psychometric literature. The method included five sequenced studies: (a) translation and back-translation, (b) comprehension assessment, (c) psychometric equivalence study of two mixed-language versions, (d) analysis of items flagged for differential item functioning, and (e) psychometric study of the Spanish version. The method used was effective in obtaining an equivalent Spanish version of the MCSDS. Seven of 33 items were identified that functioned differently in the English and Spanish versions. Causes of this differential functioning were hypothesized by a bilingual committee, and revisions were suggested. The translated version has psychometric characteristics similar to those reported for the original English scale. Some progress was made in elucidating the factor structure of the scores of the instrument.

Journal ArticleDOI
TL;DR: This article concerns combining meta-analytic techniques and structural equation modeling to test theoretical models from a pool of studies.
Abstract: Researchers are becoming interested in combining meta-analytic techniques and structural equation modeling to test theoretical models from a pool of studies. Most existing procedures are based on t...

Journal ArticleDOI
TL;DR: In this article, a reliability generalization study was conducted on the Patterns of Adaptive Learning Survey achievement goal orientation scales to assess the prediction of the different orientation scales, the adaptation of items to meet research needs, the number of respondents completing the instrument, and publication date cited for the manual used on reliability variation in articles that reported sample specific reliability coefficients.
Abstract: A reliability generalization study was completed on the Patterns of Adaptive Learning Survey achievement goal orientation scales to assess the prediction of (a) the different orientation scales, (b) the adaptation of items to meet research needs, (c) the number of respondents completing the instrument, and (d) the publication date cited for the manual used on reliability variation in articles that reported sample specific reliability coefficients. The results of analyses suggested that the evolution of the scales has improved the consistency of the scores derived, lending more credence to inferences that are based on the scores from the more recent versions of the instrument. Nevertheless, examples of poor to marginal coefficients were observed in some cases.

Journal ArticleDOI
TL;DR: In this article, a multidimensional item response model was proposed to detect specific forms of local item dependence, where items across different scales within an instrument share common stimuli and subjects respond to the common stimulus for each scale.
Abstract: A parallel design, in which items across different scales within an instrument share common stimuli and subjects respond to the common stimulus for each scale, is sometimes used in questionnaires or inventories. Because the items across scales share the same stimuli, the assumption of local item independence made by standard psychometric models may not hold. In this study, the authors describe a multidimensional item response model to detect specific forms of local item dependence. Three real data sets were analyzed to illustrate implications and applications of the proposed method.