
Does it Matter How Data are Collected? A Comparison of Testing Conditions and the Implications for Validity

01 Jan 2009, Vol. 4, pp. 17-26
TL;DR: In this article, the authors examined the effect of testing context on the factor structure of college self-efficacy scores gathered under low-stakes conditions and found that the very controlled context yielded the best model-data fit.
Abstract: The effects of gathering test scores under low-stakes conditions have been a prominent domain of research in the assessment and testing literature. One important area within this larger domain concerns the implications of low-stakes administration for test evaluation and development. The current study examined one variable, the testing context, that could impact students’ responses during low-stakes testing and, subsequently, the decisions made when using the data for test refinement. Specifically, the factor structure of college self-efficacy scores was examined across three low-stakes testing contexts, and results indicated differential model-data fit across conditions (the very controlled context yielded the best model-data fit), implying that testing conditions should be seriously considered when gathering low-stakes data used for instrument development.
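The analysis summarized above amounts to fitting the same confirmatory factor model to self-efficacy item responses collected in each testing context and comparing model-data fit across contexts. The sketch below illustrates that workflow only in outline: it is not the authors' code, it assumes the third-party semopy package for the CFA, and the condition labels, item names, and synthetic data are hypothetical.

```python
# Minimal sketch of a per-condition CFA fit comparison (not the authors' code).
# Assumes the third-party `semopy` package; data, item names, and condition
# labels are synthetic placeholders.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(0)

def simulate_condition(n: int, noise_sd: float) -> pd.DataFrame:
    """One-factor item responses; larger noise stands in for a less controlled setting."""
    factor = rng.normal(size=n)
    return pd.DataFrame(
        {f"item{i}": 0.7 * factor + rng.normal(scale=noise_sd, size=n) for i in range(1, 7)}
    )

conditions = {
    "highly_controlled": simulate_condition(300, 0.6),
    "moderately_controlled": simulate_condition(300, 0.9),
    "uncontrolled": simulate_condition(300, 1.3),
}

# Same measurement model fit separately in every condition (lavaan-style syntax).
model_desc = "efficacy =~ item1 + item2 + item3 + item4 + item5 + item6"

for name, data in conditions.items():
    model = semopy.Model(model_desc)
    model.fit(data)
    fit = semopy.calc_stats(model)  # chi-square, CFI, RMSEA, and other indices
    print(name)
    print(fit.T)
```

Comparing the printed fit indices condition by condition mirrors the study's question of whether the same measurement model holds equally well in more and less controlled settings.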


Citations
Journal ArticleDOI
TL;DR: In this article, the authors emphasize the importance of testing for measurement invariance (MI), provide guidance for conducting these tests, and discuss potential causes of noninvariant items, the difference between measurement bias and invariance, remedies for noninvariant measures, and considerations associated with model estimation.
Abstract: Researchers commonly compare means and other statistics across groups with little concern for whether the measure possesses strong factorial invariance (i.e., equal factor loadings and intercepts/thresholds). When this assumption is violated, inaccurate inferences associated with statistical and practical significance can occur. This manuscript emphasizes the importance of testing for measurement invariance (MI) and provides guidance when conducting these tests. Topics discussed are potential causes of noninvariant items, the difference between measurement bias and invariance, remedies for noninvariant measures, and considerations associated with model estimation. Using a sample of 491 teachers, a demonstration is also provided that evaluates whether a newly constructed behavior and instructional management scale is invariant across elementary and middle school teachers. Analyses revealed that the results differ slightly based on the estimation method utilized, although these differences did not greatly in…
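The invariance sequence sketched in this abstract moves from a configural model to models with equal loadings (metric) and equal intercepts/thresholds (scalar), judging each added constraint by the change in model fit. A tiny illustration of that decision logic follows; the CFI values and the ΔCFI ≤ .01 rule of thumb are hypothetical conventions for this sketch, not figures taken from the article.

```python
# Hypothetical CFI values for a nested invariance sequence; the .01 change-in-CFI
# cutoff is a common rule of thumb, not a value reported in this article.
fits = {"configural": 0.962, "metric": 0.958, "scalar": 0.941}

steps = list(fits.items())
for (prev_name, prev_cfi), (name, cfi) in zip(steps, steps[1:]):
    delta = prev_cfi - cfi  # drop in fit caused by the added equality constraints
    verdict = "supported" if delta <= 0.01 else "not supported"
    print(f"{name} invariance (vs. {prev_name}): change in CFI = {delta:.3f} -> {verdict}")
```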

440 citations


Cites background from "Does it Matter How Data are Collect..."

  • ...These correlated residuals might be caused by item-ordering effect (Barry & Finney, 2009), as these items were sequentially ordered but measuring different constructs....


Book ChapterDOI
01 Jan 2013
TL;DR: Measurement validation in the behavioral sciences is generally carried out in a psychometric modeling framework that assumes unobservable traits/constructs created from the observed variables are the variables of interest.
Abstract: Measurement validation in the behavioral sciences is generally carried out in a psychometric modeling framework that assumes unobservable traits/constructs (i.e., latent factors) created from the observed variables (often items measuring that construct) are the variables of interest.

46 citations

Journal ArticleDOI
TL;DR: The authors propose a protocol for administering general education tests under low-stakes conditions and describe simple proctor strategies that engender effort and inhibit inattention, given that students may not be motivated to perform optimally when they know the results will not represent them personally.
Abstract: General education program assessment involves low-stakes testing, but students may not be motivated to perform optimally if they know the test results will not represent them personally. We propose a protocol for administering general education tests under low-stakes conditions and describe simple proctor strategies that engender effort and inhibit inattention.

37 citations

01 Jan 2016
TL;DR: Mathers et al., as mentioned in this paper, examined the impact of test session instruction manipulations on the psychometric properties of a self-report test-taking motivation measure, a question that had not previously been investigated, and found that the measure's two-factor structure and adequate score reliability held across instruction conditions.
Abstract: Research investigating methods to influence examinee motivation during low-stakes assessment of student learning outcomes has involved manipulating test session instructions. The impact of instructions is often evaluated using a popular self-report measure of test-taking motivation. However, the impact of these manipulations on the psychometric properties of the test-taking motivation measure has yet to be investigated, resulting in questions regarding the comparability of motivation scores across instruction conditions and the scoring of the measure. To address these questions, the factor structure and reliability of test-taking motivation scores were examined across instruction conditions during a low-stakes assessment session designed to address higher education accountability mandates. Incoming first-year college students were randomly assigned to one of three instruction conditions where personal consequences associated with test results were incrementally increased. Confirmatory factor analyses indicated a two-factor structure of test-taking motivation was supported across conditions. Moreover, reliability of motivation scores was adequate even in the condition with greatest personal consequence, which was reassuring given low reliability has been found in high-stakes contexts. Thus, the findings support the use of this self-report measure for the valuable research that informs motivation instruction interventions for low-stakes testing initiatives common in higher education assessment.
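One part of the analysis described here is estimating score reliability separately in each instruction condition; coefficient alpha can be computed directly from the item responses. A minimal sketch under stated assumptions follows: the data are synthetic and the condition labels are invented for illustration, not taken from the study.

```python
# Coefficient alpha per instruction condition (illustrative sketch, synthetic data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def simulate_items(n: int, n_items: int, noise_sd: float) -> pd.DataFrame:
    """Items driven by one latent trait plus noise; more noise lowers reliability."""
    trait = rng.normal(size=(n, 1))
    return pd.DataFrame(trait + rng.normal(scale=noise_sd, size=(n, n_items)))

# Hypothetical conditions with incrementally increasing personal consequences.
conditions = {
    "no_consequence": simulate_items(200, 10, 1.2),
    "results_to_faculty": simulate_items(200, 10, 1.0),
    "personal_feedback": simulate_items(200, 10, 0.8),
}

for name, items in conditions.items():
    print(f"{name}: alpha = {cronbach_alpha(items):.2f}")
```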

14 citations


Cites background from "Does it Matter How Data are Collect..."

  • ...…previous research examining the factor structure of noncognitive measures suggests dimensionality can differ across testing contexts (e.g., Barry & Finney, 2009; De Leeuw, Mellenbergh, & Hox, 1996), it is curious there have been no empirical studies assessing if the factor structure of…...


Book ChapterDOI
01 Jan 2019
TL;DR: This chapter presents a strategy that allows the integration of results from UOA into the results from proctored computerized assessments and generalizes the idea of motivational filtering, known for the treatment of rapid guessing behavior in low-stakes assessment.
Abstract: Many large-scale competence assessments such as the National Educational Panel Study (NEPS) have introduced novel test designs to improve response rates and measurement precision. In particular, unstandardized online assessments (UOA) offer an economic approach to reach heterogeneous populations that otherwise would not participate in face-to-face assessments. Acknowledging the difference between delivery, mode, and test setting, this chapter extends the theoretical background for dealing with mode effects in NEPS competence assessments (Kroehne and Martens, Zeitschrift für Erziehungswissenschaft 14:169–186, 2011) and discusses two specific facets of UOA: (a) the confounding of selection and setting effects and (b) the role of test-taking behavior as a mediator variable. We present a strategy that allows the integration of results from UOA into the results from proctored computerized assessments and generalizes the idea of motivational filtering, known for the treatment of rapid guessing behavior in low-stakes assessment. We particularly emphasize the relationship between paradata and the investigation of test-taking behavior, and illustrate how a reference sample formed by competence assessments under standardized and supervised conditions can be used to increase the comparability of UOA in mixed-mode designs. The closing discussion reflects on the trade-off between data quality and the benefits of UOA.

5 citations


Cites background from "Does it Matter How Data are Collect..."

  • ...As Barry and Finney (2009) showed by comparing UOA and different standardizations of classroom testing, standardized test conditions are superior even for test development....


  • ...Whereas this setting effect can be seen as part of the ecological validity in the context of psychological experiments (Reips 2000), it might threaten the validity of competence assessments (e.g., Barry & Finney 2009)....


References
Journal ArticleDOI
TL;DR: According to social cognitive theory, self-efficacy beliefs are key to understanding and predicting premature post-secondary institutional departure.
Abstract: Researchers and educators continue to try to understand and predict premature post-secondary institutional departure. According to social cognitive theory, self-efficacy beliefs are the gateway to ...

86 citations

Journal ArticleDOI
TL;DR: The authors examined the generalizability of a recently developed technique called motivation filtering, whereby scores for students of low motivation are systematically filtered from test data to determine aggregate test scores that more accurately reflect student performance and that can be used for reporting purposes.
Abstract: Accountability for educational quality is a priority at all levels of education. Low-stakes testing is one way to measure the quality of education that students receive and make inferences about what students know and can do. Aggregate test scores from low-stakes testing programs are suspect, however, to the degree that these scores are influenced by low test-taker effort. This study examined the generalizability of a recently developed technique called motivation filtering, whereby scores for students of low motivation are systematically filtered from test data to determine aggregate test scores that more accurately reflect student performance and that can be used for reporting purposes. Across assessment tests in five different content areas, motivation filtering was found to consistently increase mean test performance and convergent validity.
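Motivation filtering, as described here, drops examinees whose self-reported effort falls below a cutoff before aggregate scores are computed, so that reported means better reflect motivated performance. A minimal pandas sketch follows; the column names, scores, and the effort cutoff are invented for illustration and are not taken from the study.

```python
# Illustrative motivation filtering: drop low-effort examinees before aggregating.
import pandas as pd

df = pd.DataFrame({
    "student": ["a", "b", "c", "d", "e", "f"],
    "score":   [62, 55, 71, 40, 68, 35],
    "effort":  [4.2, 3.8, 4.6, 1.9, 3.4, 2.2],  # self-reported effort, hypothetical scale
})

EFFORT_CUTOFF = 3.0  # hypothetical threshold separating motivated from unmotivated examinees

filtered = df[df["effort"] >= EFFORT_CUTOFF]

print(f"Unfiltered mean score: {df['score'].mean():.1f}")
print(f"Filtered mean score:   {filtered['score'].mean():.1f}")
print(f"Examinees retained:    {len(filtered)} of {len(df)}")
```

With these made-up numbers the filtered mean rises from about 55 to 64, illustrating the direction of the effect the study reports across content areas.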

61 citations

Journal ArticleDOI
TL;DR: In this paper, an effort-monitoring CBT was proposed to suppress rapid-guessing behavior in a low-stakes test, where the computer monitors examinee effort based on item response time.
Abstract: The attractiveness of computer-based tests (CBTs) is due largely to their capability to expand the ways we conduct testing. A relatively unexplored application, however, is actively using the computer to reduce construct-irrelevant variance while a test is being administered. This investigation introduces the effort-monitoring CBT, in which the computer monitors examinee effort (based on item response time) in a low-stakes test and displays warning messages to those exhibiting rapid-guessing behavior. The results of an experimental study are presented, which showed that an effort-monitoring CBT increased examinee effort and yielded more valid test scores than a conventional CBT. Thus, unlike previous research that has focused on identifying rapid-guessing behavior after it has occurred, the effort-monitoring CBT proactively attempts to suppress rapid-guessing behavior. This innovative testing procedure extends the capabilities of measurement practitioners to manage the psychometric challenges posed by unmotivated examinees.
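The effort-monitoring idea hinges on classifying a response as a rapid guess when its response time falls below an item-specific threshold and summarizing each examinee's effort from those flags. The sketch below shows only that classification step, with made-up times, thresholds, and warning cutoff; it is not the warning-message system described in the article.

```python
# Illustrative rapid-guess flagging from item response times (seconds).
import pandas as pd

rt = pd.DataFrame(
    [[12.0, 30.5,  2.1, 18.0, 25.0],
     [ 1.4,  2.0,  1.8,  2.5,  1.1],
     [22.0, 41.0, 19.5, 27.0, 33.0],
     [ 9.0,  3.5,  2.2, 15.0,  4.0]],
    columns=[f"item{i}" for i in range(1, 6)],
)

# Hypothetical item-level thresholds below which a response counts as a rapid guess.
thresholds = pd.Series([5.0, 6.0, 4.0, 5.0, 5.0], index=rt.columns)

rapid = rt.lt(thresholds, axis=1)      # True where the response was faster than the threshold
effort_index = 1 - rapid.mean(axis=1)  # proportion of items answered with solution behavior

for examinee, effort in effort_index.items():
    flag = "  <- would trigger an effort warning" if effort < 0.8 else ""
    print(f"examinee {examinee}: effort index = {effort:.2f}{flag}")
```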

60 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a framework based on the unified view of validity to assist in generating an evidence-based argument regarding the quality of a given assessment practice, including content, structure, sampling, contextual influences, score production, and utility.

42 citations


"Does it Matter How Data are Collect..." refers background in this paper

  • ...We believe this study helps answer the call of Birenbaum (2007) to evaluate the validity of the full testing program....


  • ...…but rather should consider these inferences within the wider frame of how the assessment instruments map to the domain of study, the psychometric functioning and internal structure of the instruments (e.g., factor structure), and the contexts in which the data were collected (Birenbaum, 2007)....



Trending Questions (1)
If a test is valid but not reliable, can it still be used to collect data?

The paper does not directly address whether a test that is valid but not reliable can still be used to collect data; it focuses on the impact of testing conditions on data collection and instrument development.