
Does it Matter How Data are Collected? A Comparison of Testing Conditions and the Implications for Validity

01 Jan 2009, Vol. 4, pp. 17-26

Abstract: The effects of gathering test scores under low-stakes conditions have been a prominent domain of research in the assessment and testing literature. One important area within this larger domain concerns what low-stakes testing implies for test evaluation and development. The current study examined one variable, the testing context, that could affect students’ responses during low-stakes testing and, in turn, the decisions made when the data are used for test refinement. Specifically, the factor structure of college self-efficacy scores was examined across three low-stakes testing contexts. Results indicated differential model-data fit across conditions (the very controlled context yielded the best model-data fit), implying that testing conditions should be seriously considered when gathering low-stakes data for instrument development.



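The abstract reports that model-data fit differed across the three testing contexts but does not say which fit indices were used. As an illustration only, the sketch below assumes RMSEA as the index and shows how the same model can be compared across contexts. The context names and the chi-square values are hypothetical and are not results from the study.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA for a fitted confirmatory factor model.

    Uses the common (N - 1) denominator; some programs use N instead.
    Lower values indicate better approximate fit.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    excess = max(chi_square - df, 0.0)  # misfit beyond what df alone predicts
    return math.sqrt(excess / (df * (n - 1)))


# Hypothetical fit statistics (chi-square, df, N) for one model fitted in two contexts.
contexts = {
    "context_A": (60.0, 50, 201),
    "context_B": (150.0, 50, 201),
}
for name, (chi2, df, n) in contexts.items():
    print(f"{name}: RMSEA = {rmsea(chi2, df, n):.3f}")
```

Because the model and its degrees of freedom are held constant, a lower RMSEA in one context would be read as better model-data fit there, which is the kind of comparison the abstract describes.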

Citations

Journal Article
Abstract: Researchers commonly compare means and other statistics across groups with little concern for whether the measure possesses strong factorial invariance (i.e., equal factor loadings and intercepts/thresholds). When this assumption is violated, inaccurate inferences associated with statistical and practical significance can occur. This manuscript emphasizes the importance of testing for measurement invariance (MI) and provides guidance when conducting these tests. Topics discussed are potential causes of noninvariant items, the difference between measurement bias and invariance, remedies for noninvariant measures, and considerations associated with model estimation. Using a sample of 491 teachers, a demonstration is also provided that evaluates whether a newly constructed behavior and instructional management scale is invariant across elementary and middle school teachers. Analyses revealed that the results differ slightly based on the estimation method utilized, although these differences did not greatly in...

371 citations


Cites background from "Does it Matter How Data are Collect..."

  • ...These correlated residuals might be caused by item-ordering effect (Barry & Finney, 2009), as these items were sequentially ordered but measuring different constructs....

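The invariance abstract above compares increasingly constrained models across groups. One common way to compare two nested models fitted with ordinary maximum likelihood is the chi-square difference test, sketched below using only the standard library. This is an assumption about one possible procedure, not the article's own method. Because the abstract notes that results depend on the estimator, keep in mind that robust estimators require a scaled difference test, which this sketch does not implement. The function names and numbers are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Regularized upper incomplete gamma Q(df/2, x/2), built with the identity
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    # starting from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Chi-square difference test for two nested models fit with ordinary ML."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    if delta_df <= 0:
        raise ValueError("the constrained model must have more degrees of freedom")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical values: configural model vs. a model with equal loadings across groups.
d_chi2, d_df, p = chi2_difference_test(225.0, 110, 210.4, 100)
print(f"delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.3f}")
```

A small p-value would suggest that the added equality constraints worsen fit, meaning the constrained level of invariance is not supported.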


Journal Article
Abstract: General education program assessment involves low-stakes testing, but students may not be motivated to perform optimally if they know the test results will not represent them personally. We propose a protocol for administering general education tests under low-stakes conditions and describe simple proctor strategies that engender effort and inhibit inattention.

33 citations


Book Chapter
01 Jan 2013
TL;DR: Measurement validation in the behavioral sciences is generally carried out in a psychometric modeling framework that assumes unobservable traits/constructs created from the observed variables are the variables of interest.
Abstract: Measurement validation in the behavioral sciences is generally carried out in a psychometric modeling framework that assumes unobservable traits/constructs (i.e., latent factors) created from the observed variables (often items measuring that construct) are the variables of interest.

32 citations


01 Jan 2016
Catherine E. Mathers, James Madison University
Abstract: Research investigating methods to influence examinee motivation during low-stakes assessment of student learning outcomes has involved manipulating test session instructions. The impact of instructions is often evaluated using a popular self-report measure of test-taking motivation. However, the impact of these manipulations on the psychometric properties of the test-taking motivation measure has yet to be investigated, resulting in questions regarding the comparability of motivation scores across instruction conditions and the scoring of the measure. To address these questions, the factor structure and reliability of test-taking motivation scores were examined across instruction conditions during a low-stakes assessment session designed to address higher education accountability mandates. Incoming first-year college students were randomly assigned to one of three instruction conditions where personal consequences associated with test results were incrementally increased. Confirmatory factor analyses indicated that a two-factor structure of test-taking motivation was supported across conditions. Moreover, reliability of motivation scores was adequate even in the condition with the greatest personal consequence, which was reassuring given that low reliability has been found in high-stakes contexts. Thus, the findings support the use of this self-report measure for the valuable research that informs motivation instruction interventions for low-stakes testing initiatives common in higher education assessment.

11 citations


Cites background from "Does it Matter How Data are Collect..."

  • ...…previous research examining the factor structure of noncognitive measures suggests dimensionality can differ across testing contexts (e.g., Barry & Finney, 2009; De Leeuw, Mellenbergh, & Hox, 1996), it is curious there have been no empirical studies assessing if the factor structure of…...

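The Mathers abstract above compares the reliability of motivation scores across instruction conditions but does not name the coefficient used here. The sketch below assumes coefficient (Cronbach's) alpha, a common choice, and computes it separately for each condition. The condition labels and item scores are hypothetical.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Coefficient (Cronbach's) alpha for rows of item scores, one row per examinee."""
    if len(responses) < 2:
        raise ValueError("need at least two examinees")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need a rectangular matrix with at least two items")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical item scores grouped by instruction condition.
by_condition = {
    "condition_1": [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 4, 5]],
    "condition_2": [[3, 2, 4], [4, 4, 3], [2, 3, 2], [5, 5, 4]],
}
for name, rows in by_condition.items():
    print(f"{name}: alpha = {coefficient_alpha(rows):.2f}")
```

Alpha assumes the items form a single factor. Since the abstract reports a two-factor structure, in practice it would be computed per subscale rather than over all items together.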


Book Chapter
01 Jan 2019
TL;DR: This chapter presents a strategy that allows the integration of results from UOA into the results from proctored computerized assessments and generalizes the idea of motivational filtering, known for the treatment of rapid guessing behavior in low-stakes assessment.
Abstract: Many large-scale competence assessments such as the National Educational Panel Study (NEPS) have introduced novel test designs to improve response rates and measurement precision. In particular, unstandardized online assessments (UOA) offer an economic approach to reach heterogeneous populations that otherwise would not participate in face-to-face assessments. Acknowledging the difference between delivery, mode, and test setting, this chapter extends the theoretical background for dealing with mode effects in NEPS competence assessments (Kroehne & Martens, Zeitschrift für Erziehungswissenschaft, 14:169–186, 2011) and discusses two specific facets of UOA: (a) the confounding of selection and setting effects and (b) the role of test-taking behavior as a mediator variable. We present a strategy that allows the integration of results from UOA into the results from proctored computerized assessments and generalizes the idea of motivational filtering, known for the treatment of rapid guessing behavior in low-stakes assessment. We particularly emphasize the relationship between paradata and the investigation of test-taking behavior, and illustrate how a reference sample formed by competence assessments under standardized and supervised conditions can be used to increase the comparability of UOA in mixed-mode designs. The closing discussion reflects on the trade-off between data quality and the benefits of UOA.

5 citations


Cites background from "Does it Matter How Data are Collect..."

  • ...As Barry and Finney (2009) showed by comparing UOA and different standardizations of classroom testing, standardized test conditions are superior even for test development....


  • ...Whereas this setting effect can be seen as part of the ecological validity in the context of psychological experiments (Reips 2000), it might threaten the validity of competence assessments (e.g., Barry & Finney 2009)....

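The NEPS chapter above builds on motivational filtering for rapid guessing. A basic version of that idea, sketched here as an assumption rather than as the chapter's generalized strategy, works in three steps. First, a response faster than an item-level time threshold is flagged as a rapid guess. Second, each examinee's share of non-rapid responses (often called response time effort) is computed. Third, examinees below a cutoff are dropped before analysis. The 3-second thresholds, the 0.90 cutoff, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Examinee:
    examinee_id: str
    response_times: List[float]  # seconds spent on each item, in test order


def response_time_effort(times, thresholds):
    """Share of items showing solution behavior (response time at or above the item threshold)."""
    if len(times) != len(thresholds) or not times:
        raise ValueError("need one threshold per item and at least one item")
    solution_behavior = [t >= thr for t, thr in zip(times, thresholds)]
    return sum(solution_behavior) / len(solution_behavior)


def motivational_filter(examinees, thresholds, min_rte=0.90):
    """Split examinees into kept/removed IDs using a response-time-effort cutoff."""
    kept, removed = [], []
    for ex in examinees:
        rte = response_time_effort(ex.response_times, thresholds)
        (kept if rte >= min_rte else removed).append(ex.examinee_id)
    return kept, removed


# Hypothetical data: 4 items, each with a 3-second rapid-guessing threshold.
thresholds = [3.0, 3.0, 3.0, 3.0]
examinees = [
    Examinee("A", [12.4, 9.8, 15.1, 7.3]),  # no rapid guesses
    Examinee("B", [1.1, 0.9, 10.2, 1.4]),   # three rapid guesses
]
print(motivational_filter(examinees, thresholds))  # → (['A'], ['B'])
```

Filtering whole examinees is the simplest design. Alternatives treat individual rapid responses as missing instead, which keeps more data but requires a model that tolerates missingness.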


References

Journal Article
Abstract: Self-reports of behaviors and attitudes are strongly influenced by features of the research instrument, including question wording, format, and context. Recent research has addressed the underlying cognitive and communicative processes, which are systematic and increasingly well understood. I review what has been learned, focusing on issues of question comprehension, behavioral frequency reports, and the emergence of context effects in attitude measurement. The accumulating knowledge about the processes underlying self-reports promises to improve questionnaire design and data quality.

2,440 citations


"Does it Matter How Data are Collect..." refers background in this paper

  • ...We believed we were seeing an item-order effect (e.g., Schwarz, 1999; Tourangeau & Rasinski, 1988) due to low motivation....


  • ...…in the fact that these items were presented in succession, and the strong relationships may have been caused by an item-ordering effect; especially when expressing attitudes, preceding questions can influence the responses given to subsequent ones (e.g., Schwarz, 1999; Tourangeau & Rasinski, 1988)....



Book
03 Apr 2000
Abstract: In this book, authors Tenko Raykov and George A. Marcoulides introduce students to the basics of structural equation modeling (SEM) through a conceptual, nonmathematical approach. For ease of understanding, the few mathematical formulas presented are used in a conceptual or illustrative manner rather than a computational one. Featuring examples from EQS, LISREL, and Mplus, A First Course in Structural Equation Modeling is an excellent beginner’s guide to learning how to set up input files to fit the most commonly used types of structural equation models with these programs. The basic ideas and methods for conducting SEM are independent of any particular software.

Highlights of the Second Edition include:

  • Review of latent change (growth) analysis models at an introductory level
  • Coverage of the popular Mplus program
  • Updated examples of LISREL and EQS
  • Downloadable resources that contain all of the text’s LISREL, EQS, and Mplus examples

A First Course in Structural Equation Modeling is intended as an introductory book for students and researchers in psychology, education, business, medicine, and other applied social, behavioral, and health sciences with limited or no previous exposure to SEM. A prerequisite of basic statistics through regression analysis is recommended. The book frequently draws parallels between SEM and regression, making this prior knowledge helpful.

1,489 citations


"Does it Matter How Data are Collect..." refers background in this paper

  • ...These values can be positive or negative, indicating under- or over-representation of relationships, and absolute values of three or greater have been suggested as values to indicate a poorly reproduced relationship (Raykov & Marcoulides, 2000)....

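The rule of thumb quoted above (standardized residuals with an absolute value of three or greater signal a poorly reproduced relationship) can be applied mechanically to a residual matrix. The sketch below assumes residuals are defined as observed minus model-implied values. Under that convention, a positive value means the model under-represents a relationship and a negative value means it over-represents one. The item labels and matrix are hypothetical.

```python
def flag_residuals(residuals, labels, cutoff=3.0):
    """List item pairs whose standardized residual satisfies |r| >= cutoff.

    Assumes residual = observed minus model-implied, so a positive value means
    the model under-reproduces the relationship and a negative value means it
    over-reproduces it.
    """
    flagged = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):  # each off-diagonal pair once
            r = residuals[i][j]
            if abs(r) >= cutoff:
                direction = "under-reproduced" if r > 0 else "over-reproduced"
                flagged.append((labels[i], labels[j], r, direction))
    return flagged


# Hypothetical standardized residual matrix for three items.
labels = ["item1", "item2", "item3"]
residuals = [
    [0.0, 3.4, 0.5],
    [3.4, 0.0, -3.1],
    [0.5, -3.1, 0.0],
]
for pair in flag_residuals(residuals, labels):
    print(pair)
```

Large positive residuals between adjacent items are the pattern that an item-ordering explanation would predict.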


Journal Article
TL;DR: Results demonstrate that over repeated samples, model modifications may be very inconsistent and cross-validation results may behave erratically, leading to skepticism about generalizability of models resulting from data-driven modifications of an initial model.
Abstract: In applications of covariance structure modeling in which an initial model does not fit sample data well, it has become common practice to modify that model to improve its fit. Because this process is data driven, it is inherently susceptible to capitalization on chance characteristics of the data, thus raising the question of whether model modifications generalize to other samples or to the population. This issue is discussed in detail and is explored empirically through sampling studies using 2 large sets of data. Results demonstrate that over repeated samples, model modifications may be very inconsistent and cross-validation results may behave erratically. These findings lead to skepticism about generalizability of models resulting from data-driven modifications of an initial model. The use of alternative a priori models is recommended as a preferred strategy.

1,390 citations


Journal Article
Abstract: We begin this article with the assumption that attitudes are best understood as structures in long-term memory, and we look at the implications of this view for the response process in attitude surveys. More specifically, we assert that an answer to an attitude question is the product of a four-stage process. Respondents first interpret the attitude question, determining what attitude the question is about. They then retrieve relevant beliefs and feelings. Next, they apply these beliefs and feelings in rendering the appropriate judgment. Finally, they use this judgment to select a response. All four of the component processes can be affected by prior items. The prior items can provide a framework for interpreting later questions and can also make some responses appear to be redundant with earlier answers. The prior items can prime some beliefs, making them more accessible to the retrieval process. The prior items can suggest a norm or standard of comparison for making the judgment. Finally, the prior items can create consistency pressures or pressures to appear moderate. Because of the multiple processes involved, context effects are difficult to predict and sometimes difficult to replicate. We attempt to sort out when context is likely to affect later responses and include a list of the variables that affect the size and direction of the effects of context.

920 citations


"Does it Matter How Data are Collect..." refers background in this paper

  • ...We believed we were seeing an item-order effect (e.g., Schwarz, 1999; Tourangeau & Rasinski, 1988) due to low motivation....


  • ...…in the fact that these items were presented in succession, and the strong relationships may have been caused by an item-ordering effect; especially when expressing attitudes, preceding questions can influence the responses given to subsequent ones (e.g., Schwarz, 1999; Tourangeau & Rasinski, 1988)....



Journal Article
Abstract: Student test-taking motivation in low-stakes assessment testing is examined in terms of both its relationship to test performance and the implications of low student effort for test validity. A theoretical model of test-taking motivation is presented, with a synthesis of previous research indicating that low student motivation is associated with a substantial decrease in test performance. A number of assessment practices and data analytic procedures for managing the problems posed by low student motivation are discussed.

379 citations


"Does it Matter How Data are Collect..." refers background in this paper

  • ...Thus, their scores may not serve as valid indicators of their true level of the construct of interest (Sundre, 1999; Sundre & Kitsantas, 2004; Wise & DeMars, 2005)....


  • ...However, if low motivation results in test scores that are not truly representative of the construct of interest, the scores are then ambiguous at best and misleading at worst (Wise & DeMars, 2005)....


  • ...Because there are very few, if any, consequences associated with performance and because students may perceive no personal gain from the experience, low-stakes testing often leads to low effort and motivation on the part of the test-taker (Wise & DeMars, 2005)....
