
Showing papers in "Practical Assessment, Research & Evaluation" (2014)


Journal ArticleDOI
TL;DR: The aim of the present paper is to provide a tutorial in MG-CFA using the freely available R packages lavaan, semTools, and semPlot, which together enable a highly efficient analysis of measurement models for both normally distributed and ordinal data.
Abstract: Multiple-group confirmatory factor analysis (MG-CFA) is among the most productive extensions of structural equation modeling. Many researchers conducting cross-cultural or longitudinal studies are interested in testing for measurement and structural invariance. The aim of the present paper is to provide a tutorial in MG-CFA using the freely available R packages lavaan, semTools, and semPlot. The combination of these packages enables a highly efficient analysis of measurement models for both normally distributed and ordinal data. Data from two freely available datasets – the first with continuous indicators, the second with ordered indicators – are used to walk through the individual steps.
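
The invariance-testing sequence such a tutorial covers can be sketched in lavaan as follows. This is a minimal sketch, not the paper's worked example: the one-factor model, the indicators x1-x3, the grouping variable country, and the data frame dat are all hypothetical stand-ins.

```r
library(lavaan)

# Hypothetical one-factor measurement model
model <- 'f1 =~ x1 + x2 + x3'

# Successively constrained multiple-group models
fit.configural <- cfa(model, data = dat, group = "country")
fit.metric     <- cfa(model, data = dat, group = "country",
                      group.equal = "loadings")
fit.scalar     <- cfa(model, data = dat, group = "country",
                      group.equal = c("loadings", "intercepts"))

# Likelihood-ratio tests between the nested invariance models
lavTestLRT(fit.configural, fit.metric, fit.scalar)

# For ordinal indicators, declare them as ordered and use a
# robust categorical estimator:
# cfa(model, data = dat, group = "country",
#     ordered = c("x1", "x2", "x3"), estimator = "WLSMV")
```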

217 citations


Journal ArticleDOI
TL;DR: In this paper, the authors explain the theory behind and meaning of MANOVA and DDA and provide an example of a simple MANOVA with real mental health data from 4,384 adolescents to show how to interpret MANOVA results.
Abstract: Reviews of statistical procedures (e.g., Bangert & Baumberger, 2005; Kieffer, Reese, & Thompson, 2001; Warne, Lazo, Ramos, & Ritter, 2012) show that one of the most common multivariate statistical methods in psychological research is multivariate analysis of variance (MANOVA). However, MANOVA and its associated procedures are often not properly understood, as demonstrated by the fact that few of the MANOVAs published in the scientific literature were accompanied by the correct post hoc procedure, descriptive discriminant analysis (DDA). The purpose of this article is to explain the theory behind and meaning of MANOVA and DDA. I also provide an example of a simple MANOVA with real mental health data from 4,384 adolescents to show how to interpret MANOVA results.
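
A minimal sketch of the MANOVA-then-DDA workflow in R: the outcomes y1 and y2, the factor group, and the data frame dat are hypothetical, and MASS::lda() is used here as one common way to obtain descriptive discriminant functions (the article does not prescribe a particular package).

```r
# Omnibus multivariate test
fit <- manova(cbind(y1, y2) ~ group, data = dat)
summary(fit, test = "Wilks")

# DDA as the follow-up: describe how the outcomes jointly
# separate the groups (the coefficients, not classification
# accuracy, are of interest here)
library(MASS)
dda <- lda(group ~ y1 + y2, data = dat)
dda$scaling   # raw discriminant function coefficients
```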

190 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe the application of methods recommended to get the most out of Exploratory Factor Analysis (EFA) using FACTOR (http://psico.fcep.urv.es/utilitats/factor/, Lorenzo-Seva & Ferrando, 2006).
Abstract: Exploratory factor analysis (EFA) methods are used extensively in the field of assessment and evaluation. Due to EFA's widespread use, common methods and practices have come under close scrutiny. A substantial body of literature has been compiled highlighting problems with many of the methods and practices used in EFA, and, in response, many guidelines have been proposed with the aim of improving application. Unfortunately, implementing recommended EFA practices has been restricted by the range of options available in commercial statistical packages and, perhaps, by an absence of clear, practical 'how-to' demonstrations. Consequently, this article describes the application of methods recommended to get the most out of an EFA. The article focuses on dealing with the common situation of analysing ordinal data as derived from Likert-type scales. These methods are demonstrated using the free, stand-alone, easy-to-use and powerful EFA package FACTOR (http://psico.fcep.urv.es/utilitats/factor/, Lorenzo-Seva & Ferrando, 2006). The demonstration applies the recommended techniques to an accompanying dataset based on the Big 5 personality test. The outcomes obtained by the EFA using the recommended procedures through FACTOR are compared to the default techniques currently available in SPSS.
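
FACTOR itself is a point-and-click program, but the recommended choices for Likert-type items (polychoric correlations, parallel analysis for dimensionality, an oblique rotation) can be approximated in a scriptable way with R's psych package. A minimal sketch, assuming 25 hypothetical items named item1-item25 in a data frame dat:

```r
library(psych)

likert <- dat[, paste0("item", 1:25)]   # hypothetical Likert items

# Parallel analysis on polychoric correlations to choose the
# number of factors
fa.parallel(likert, cor = "poly", fa = "fa")

# EFA on polychoric correlations with an oblique (oblimin) rotation
efa <- fa(likert, nfactors = 5, cor = "poly", rotate = "oblimin")
print(efa$loadings, cutoff = 0.30)
```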

188 citations


Journal ArticleDOI
TL;DR: In this article, a simulation study was conducted to examine the power and Type I error rates of the confidence interval approach to equivalence testing under conditions of equal and non-equal sample sizes and variability when comparing two and three groups.
Abstract: The question of equivalence between two or more groups is frequently of interest to many applied researchers. Equivalence testing is a statistical method designed to provide evidence that groups are comparable by demonstrating that the mean differences found between groups are small enough that they are considered practically unimportant. Few recommendations exist regarding the appropriate use of these tests under varying data conditions. A simulation study was conducted to examine the power and Type I error rates of the confidence interval approach to equivalence testing under conditions of equal and non-equal sample sizes and variability when comparing two and three groups. It was found that equivalence testing performs best when sample sizes are equal. The overall power of the test is strongly influenced by the size of the sample, the amount of variability in the sample, and the size of the difference in the population. Guidelines are provided regarding the use of equivalence tests when analyzing non-optimal data.
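
The confidence interval approach the study examines is straightforward to state in code: declare two groups equivalent when a 100(1 − 2α)% CI for the mean difference falls entirely inside the equivalence bounds. A minimal sketch in base R, with a hypothetical equivalence bound:

```r
ci_equivalence <- function(x, y, bound, alpha = 0.05) {
  # For alpha = .05 this is the familiar 90% CI (TOST) rule
  ci <- t.test(x, y, conf.level = 1 - 2 * alpha)$conf.int
  # Equivalent if the whole CI lies inside (-bound, +bound)
  ci[1] > -bound && ci[2] < bound
}

set.seed(1)
g1 <- rnorm(50, mean = 0)
g2 <- rnorm(50, mean = 0.1)
ci_equivalence(g1, g2, bound = 0.5)   # TRUE if groups are equivalent
```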

170 citations


Journal ArticleDOI
TL;DR: In this article, the authors focus on how to conduct propensity score matching using an example from the field of education and provide information that will bring propensity-score matching within the reach of research and evaluation practitioners.
Abstract: Propensity score matching is a statistical technique in which a treatment case is matched with one or more control cases based on each case’s propensity score. This matching can help strengthen causal arguments in quasi-experimental and observational studies by reducing selection bias. In this article we concentrate on how to conduct propensity score matching using an example from the field of education. Our goal is to provide information that will bring propensity score matching within the reach of research and evaluation practitioners.
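
A minimal sketch of nearest-neighbor propensity score matching in R with the MatchIt package; the treatment indicator and covariates (treated, pretest, ses, gender) are hypothetical stand-ins, not the article's education example:

```r
library(MatchIt)

# Match each treated case to a control case with a similar
# propensity score (estimated from the covariates)
m.out <- matchit(treated ~ pretest + ses + gender,
                 data = dat, method = "nearest")
summary(m.out)   # check covariate balance before/after matching

# Estimate the treatment effect on the matched sample
matched <- match.data(m.out)
lm(posttest ~ treated, data = matched, weights = weights)
```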

166 citations


Journal ArticleDOI
TL;DR: Two experiments examine the influence of mood on academic course evaluation: by means of facial feedback, either a positive or a negative mood was induced while students were completing a course evaluation questionnaire during lectures.
Abstract: In two subsequent experiments, the influence of mood on academic course evaluation is examined. By means of facial feedback, either a positive or a negative mood was induced while students were completing a course evaluation questionnaire during lectures. Results from both studies reveal that a positive mood leads to better ratings of different dimensions of lecture quality. While in Study 1 (N=109) mood was not directly controlled, Study 2 (N=64) replicates the findings of the prior study and reveals direct influences of positive and negative mood on academic course evaluation.

71 citations


Journal ArticleDOI
TL;DR: This study explores the limitations of commonly used measures of socioeconomic status (SES) and examines the measurement of SES within a structural equation modeling (SEM) framework, highlighting both the relevant conceptual and measurement issues.
Abstract: This study uses a nationally representative student dataset to explore the limitations of commonly used measures of socioeconomic status (SES). Among the identified limitations are patterns of missing data that conflate the traditional conceptualization of SES with differences in family structure that have emerged in recent years and a lack of theoretically-based guidance for how the components of SES should be combined. Using kindergarten achievement data, the study illustrates how both the observed relation between SES and achievement and the observed interaction between SES and kindergarten program would be impacted by the use of different measures of SES. This study also explores the measurement of SES within a structural equation modeling (SEM) framework, highlighting both the relevant conceptual and measurement issues.
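
Treating SES as a latent variable of the kind the study discusses can be sketched in lavaan; the component and outcome names below are hypothetical illustrations, not the study's variables:

```r
library(lavaan)

# SES measured by its traditional components; achievement
# regressed on the latent SES factor
ses_model <- '
  SES =~ income + parent_educ + occup_prestige
  achievement ~ SES
'
# missing = "fiml" lets incomplete SES components contribute
fit <- sem(ses_model, data = dat, missing = "fiml")
summary(fit, standardized = TRUE)
```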

36 citations


Journal ArticleDOI
TL;DR: In this paper, the authors evaluate the potential impact that violations of the missing-at-random (MAR) assumption in multidimensional adaptive testing (MAT) could have on the performance of FIML estimation.
Abstract: The full-information maximum likelihood (FIML) method makes it possible to estimate and analyze structural equation models (SEM) even when data are partially missing, enabling incomplete data to contribute to model estimation. The cornerstone of FIML is the missing-at-random (MAR) assumption. In (unidimensional) computerized adaptive testing (CAT), unselected items (i.e., responses that are not observed) remain at random even though selected items (i.e., responses that are observed) have been associated with a test taker’s latent trait that is being measured. In multidimensional adaptive testing (MAT), however, the missingness in the response data partially depends on the unobserved data because items are selected based on various types of information including the covariance among latent traits. This eventually may lead to violations of MAR. This study aimed to evaluate the potential impact such a violation of MAR in MAT could have on FIML estimation performance. The results showed an increase in estimation errors in item parameter estimation when the MAT response data were used, and differences in the level of the impact depending on how items loaded on multiple latent traits.
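
In SEM software, FIML is typically requested through an estimation option rather than implemented by hand. A minimal sketch in R's lavaan with hypothetical items i1-i6 loading on two latent traits; as the abstract notes, FIML is well behaved under MAR but not, in general, when MAR is violated:

```r
library(lavaan)

model <- '
  theta1 =~ i1 + i2 + i3
  theta2 =~ i4 + i5 + i6
'
# missing = "fiml" invokes full-information maximum likelihood,
# so rows with unobserved (unselected) items still contribute
fit <- cfa(model, data = responses, missing = "fiml", std.lv = TRUE)
summary(fit)
```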

33 citations


Journal ArticleDOI
TL;DR: Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation (PARE); distribution for nonprofit, educational purposes is permitted.
Abstract: Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to distribute this article for nonprofit, educational purposes if it is copied in its entirety and the journal is credited. PARE has the right to authorize third party reproduction of this article in print, electronic and database forms.

20 citations


Journal ArticleDOI
TL;DR: In this article, a historical cohort comparison group is defined as a cohort group selected from pre-treatment archival data and matched to a subsequent cohort currently receiving a treatment, which can reduce noncomparability of treatment and control conditions through local, focal matching.
Abstract: There is increased emphasis on using experimental and quasi-experimental methods to evaluate educational programs; however, educational evaluators and school leaders are often faced with challenges when implementing such designs in educational settings. Use of a historical cohort control group design provides a viable option for conducting quasi-experiments in school-based outcome evaluation. A cohort is a successive group that goes through some experience together, such as a grade level or a training program. A historical cohort comparison group is a cohort group selected from pre-treatment archival data and matched to a subsequent cohort currently receiving a treatment. Although the design is prone to the same threats to study validity as any quasi-experiment, issues related to selection, history, and maturation can be particularly challenging. However, use of a historical cohort control group can reduce noncomparability of treatment and control conditions through local, focal matching. In addition, a historical cohort control group design can alleviate concerns about denying program access to students in order to form a control group, minimize resource requirements and disruption to school routines, and make use of archival data that schools and school districts collect and find meaningful.
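
A minimal sketch of the design's matching step in R, again using MatchIt for the local, focal matching; the archival data frame and every variable name (year, prior_score, grade_level, outcome) are hypothetical:

```r
library(MatchIt)

# Hypothetical archival records: cohorts before 2013 are
# pre-treatment (historical), later cohorts received the program
archival$treated <- as.integer(archival$year >= 2013)

# Match each treated student to a historical student with a
# similar prior-achievement profile
m  <- matchit(treated ~ prior_score + grade_level,
              data = archival, method = "nearest")
md <- match.data(m)

# Compare outcomes across the matched cohorts
summary(lm(outcome ~ treated, data = md))
```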

17 citations


Journal ArticleDOI
TL;DR: This study investigates the use of an alternative procedure to MLM: regression using Taylor series linearization (TSL) variance estimation, which can yield consistent and unbiased estimates and standard errors (given the appropriate conditions), and can be performed using a variety of commercially and freely-available statistical software.
Abstract: Clustered data (e.g., students within schools) are often analyzed in educational research where data are naturally nested. As a consequence, multilevel modeling (MLM) has commonly been used to study the contextual or group-level (e.g., school) effects on individual outcomes. The current study investigates the use of an alternative procedure to MLM: regression using Taylor series linearization (TSL) variance estimation. Despite the name, regressions using TSL are straightforward to conduct, can yield consistent and unbiased estimates and standard errors (given the appropriate conditions), and can be performed using a variety of commercially- and freely-available statistical software. I analyze a subsample of the High School and Beyond (HSB) dataset using MLM, regression using TSL, and ordinary least squares regression and compare results. In addition, 12,000 random samples of varying level-one and level-two sample sizes are drawn from the HSB dataset in order to compute biases in standard errors under the different conditions. Sample R and SAS syntax showing how to run regressions using TSL are provided.
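
The TSL approach maps directly onto R's survey package, whose variance estimator defaults to Taylor series linearization. A minimal sketch, with hypothetical stand-ins (school_id, mathach, ses, sector) for the HSB variables:

```r
library(survey)

# Declare the clustering structure; equal weights for illustration
dat$wt <- 1
des <- svydesign(ids = ~school_id, weights = ~wt, data = dat)

# Ordinary regression coefficients with TSL standard errors that
# account for the clustering of students within schools
fit <- svyglm(mathach ~ ses + sector, design = des)
summary(fit)
```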

Journal ArticleDOI
TL;DR: A detailed account of a rubric revision process to address seven common problems to which rubrics are prone, including lack of consistency and parallelism; the presence of “orphan” and “widow” words and phrases; redundancy in descriptors; inconsistency in the focus of qualifiers; limited routes to partial credit; unevenness in incremental levels of performance.
Abstract: This article provides a detailed account of a rubric revision process to address seven common problems to which rubrics are prone: lack of consistency and parallelism; the presence of “orphan” and “widow” words and phrases; redundancy in descriptors; inconsistency in the focus of qualifiers; limited routes to partial credit; unevenness in incremental levels of performance; and inconsistencies across suites or sets of related rubrics. The author uses examples from both the draft stage precursor and the first revised (pilot) version of the Engineering Design Process Portfolio Scoring Rubric (EDPPSR), to illustrate the application of broadly relevant guidelines that can inform the creation of a new—or revision of an existing—rubric to achieve technical quality while preserving content integrity.

Journal ArticleDOI
TL;DR: This simulation study compares the relative bias of two commonly used missing data techniques when data are missing on more than one variable, and suggests that multiple imputation works well even when variables in the imputation model are themselves incomplete.
Abstract: When exploring missing data techniques in a realistic scenario, the current literature is limited: most studies only consider consequences with data missing on a single variable. This simulation study compares the relative bias of two commonly used missing data techniques when data are missing on more than one variable. Factors varied include type of missingness (MCAR, MAR), degree of missingness (10%, 25%, and 50%), and where missingness occurs (one predictor, two predictors, or two predictors with overlap). Using a real dataset, cells are systematically deleted to create various scenarios of missingness so that parameter estimates from listwise deletion and multiple imputation may be compared to the “true” estimates from the full dataset. Results suggest that multiple imputation works well, even when variables in the imputation model are themselves incomplete.
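
A minimal sketch contrasting the two techniques in R with the mice package; the outcome y and predictors x1 and x2 are hypothetical, with missingness allowed on both predictors:

```r
library(mice)

# Listwise deletion: lm() silently drops incomplete rows
fit_lw <- lm(y ~ x1 + x2, data = dat)

# Multiple imputation: impute the incomplete variables, fit the
# model in each imputed dataset, pool with Rubin's rules
imp    <- mice(dat, m = 25, seed = 1, printFlag = FALSE)
fit_mi <- with(imp, lm(y ~ x1 + x2))
summary(pool(fit_mi))
```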

Journal ArticleDOI
TL;DR: This paper outlines how a discrete choice experiment can be used to learn more about how students are willing to trade off various features of assignments such as the nature and timing of feedback and the method used to submit assignments.
Abstract: This paper outlines how a discrete choice experiment (DCE) can be used to learn more about how students are willing to trade off various features of assignments such as the nature and timing of feedback and the method used to submit assignments. A DCE identifies plausible levels of the key attributes of a good or service and then presents the respondent with alternative bundles of these attributes and their levels and asks the respondent to choose between particular bundles. We report results from a DCE we conducted with undergraduate business students regarding their preferences for assignment systems. We find that the most important features of assignments are how relevant the assignments are for exam preparation and the nature of the feedback that students receive. We also find that students generally prefer online to paper assignments. We argue that the DCE approach has a lot of potential in education research.
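
DCE responses of this kind are commonly analyzed with a conditional logit model. A minimal sketch in R using survival::clogit(), assuming long-format data (one row per alternative) and hypothetical attribute names:

```r
library(survival)

# chosen: 1 if the alternative was picked in its choice task;
# strata() groups the alternatives shown together in one task
fit <- clogit(chosen ~ feedback_detail + feedback_speed +
                online_submit + strata(task_id),
              data = dce)
summary(fit)   # coefficients index the utility weight of each attribute
```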

Journal ArticleDOI
TL;DR: Because the statistical complexity of the regression discontinuity (RD) design has limited its application in education research, this article provides a less technical introduction to RD for education researchers and practitioners, using visual analysis to aid conceptual understanding and offering additional resources for further exploration.
Abstract: The ability of regression discontinuity (RD) designs to provide an unbiased treatment effect while overcoming the ethical concerns that plague randomized control trials (RCTs) makes RD a valuable and useful approach in education evaluation. RD is the only quasi-experimental approach explicitly recognized by the Institute of Education Sciences as meeting the prerequisites of a causal relationship. Unfortunately, the statistical complexity of the RD design has limited its application in education research. This article provides a less technical introduction to RD for education researchers and practitioners. Using visual analysis to aid conceptual understanding, the article walks readers through the essential steps of a sharp RD design using hypothetical, but realistic, district intervention data and provides additional resources for further exploration.
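
A minimal sketch of a sharp RD estimated by local linear regression within a bandwidth around the cutoff; the variable names, cutoff, and bandwidth are hypothetical choices, not the article's data:

```r
cutoff <- 50
d$centered <- d$score - cutoff               # center the running variable
d$treat    <- as.integer(d$score < cutoff)   # e.g., intervention below cutoff

bw   <- 10                                   # analyst-chosen bandwidth
near <- subset(d, abs(centered) <= bw)

# Separate slopes on each side of the cutoff; the coefficient on
# treat estimates the treatment effect at the cutoff
fit <- lm(outcome ~ treat * centered, data = near)
summary(fit)
```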

Journal ArticleDOI
TL;DR: This article uses meta-regression techniques to examine why relationships between background characteristics and outcomes may vary across locations in a single multi-site survey, and finds that the meta-regression approach is more accurate than combining data across all countries into a single simple model.
Abstract: This article demonstrates how meta-analytic techniques that have typically been used to synthesize findings across numerous studies can also be applied to examine the reasons why relationships between background characteristics and outcomes may vary across different locations in a single multi-site survey. This application is particularly relevant to the analysis of data from international surveys of student achievement. A brief introduction to the method of meta-regression is provided, and the technique is demonstrated in an analysis of the extent to which the relationship between school autonomy and achievement varies depending upon the level of accountability in a country. The results show that the meta-regression approach is more accurate than combining data across all countries into a single simple model.
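
In R, the metafor package implements meta-regression of this kind. A minimal sketch, assuming a hypothetical data frame country_estimates in which each country contributes an estimated autonomy-achievement slope (yi), its sampling variance (vi), and an accountability moderator:

```r
library(metafor)

# Mixed-effects meta-regression: does national accountability
# explain between-country variation in the autonomy slope?
res <- rma(yi = autonomy_slope, vi = slope_var,
           mods = ~ accountability, data = country_estimates)
summary(res)
```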

Journal ArticleDOI
TL;DR: In this paper, a comparison of three conditional growth percentile methods, i.e., student growth percentiles, percentile rank residuals, and a nonparametric matching method, is presented.
Abstract: This article provides a brief overview and comparison of three conditional growth percentile methods: student growth percentiles, percentile rank residuals, and a nonparametric matching method. These approaches seek to describe student growth in terms of the relative percentile ranking of a student in relation to students who had the same profile of prior achievement. It is shown that even though the methods come from a similar conceptual foundation, they make different assumptions and use different models to estimate growth percentiles. Reading and mathematics data from a large-scale assessment program are used to compare the growth percentile estimates in a practical setting. Results suggest that the methods often give similar results; however, the matching method tended to provide somewhat different estimates than the other approaches for students with extreme scores on the prior-year test. The implications of these results for large-scale state accountability programs are discussed.
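
One common route to conditional growth percentiles is quantile regression of the current-year score on prior achievement: a student's growth percentile is the highest estimated quantile that their observed score still reaches. A minimal sketch with R's quantreg package and hypothetical score variables (operational student growth percentile implementations use more elaborate B-spline quantile models):

```r
library(quantreg)

# Fit conditional quantiles of the current score given the prior score
taus <- seq(0.05, 0.95, by = 0.05)
fit  <- rq(score_2014 ~ score_2013, tau = taus, data = dat)

# Predicted current-year score at each quantile, per student
preds <- predict(fit, newdata = dat)   # n x length(taus) matrix

# Growth percentile: the largest tau whose predicted score is
# still at or below the student's observed score
sgp <- apply(preds <= dat$score_2014, 1,
             function(z) 100 * max(c(0, taus[z])))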

Journal ArticleDOI
TL;DR: Copyright is retained by the authors' employer, the National Board of Medical Examiners, which grants right of first publication to Practical Assessment, Research & Evaluation (PARE); distribution for nonprofit, educational purposes is permitted.
Abstract: Copyright is retained by the authors' employer, the National Board of Medical Examiners, which grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute this article for nonprofit, educational purposes if it is copied in its entirety and the journal is credited. PARE has the right to authorize third party reproduction of this article in print, electronic and database forms.