
Does it Matter How Data are Collected? A Comparison of Testing Conditions and the Implications for Validity

01 Jan 2009, Vol. 4, pp. 17-26
TL;DR: In this article, the authors examined the effect of testing context on the factor structure of college self-efficacy scores gathered under low-stakes conditions and found that the very controlled context yielded the best model-data fit.
Abstract: The effects of gathering test scores under low-stakes conditions have been a prominent domain of research in the assessment and testing literature. One important area within this larger domain concerns the implications of low-stakes administration for test evaluation and development. The current study examined one variable, the testing context, that could impact students’ responses during low-stakes testing and, subsequently, the decisions made when using the data for test refinement. Specifically, the factor structure of college self-efficacy scores was examined across three low-stakes testing contexts, and results indicated differential model-data fit across conditions (the very controlled context yielded the best model-data fit), implying that testing conditions should be seriously considered when gathering low-stakes data used for instrument development.
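The analysis summarized above amounts to fitting the same confirmatory factor model to self-efficacy item responses collected in each testing context and comparing model-data fit across contexts. The sketch below illustrates that workflow only in outline: it is not the authors' code, it assumes the third-party semopy package for the CFA, and the condition labels, item names, and synthetic data are hypothetical.

```python
# Minimal sketch of a per-condition CFA fit comparison (not the authors' code).
# Assumes the third-party `semopy` package; data, item names, and condition
# labels are synthetic placeholders.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(0)

def simulate_condition(n: int, noise_sd: float) -> pd.DataFrame:
    """One-factor item responses; larger noise stands in for a less controlled setting."""
    factor = rng.normal(size=n)
    return pd.DataFrame(
        {f"item{i}": 0.7 * factor + rng.normal(scale=noise_sd, size=n) for i in range(1, 7)}
    )

conditions = {
    "highly_controlled": simulate_condition(300, 0.6),
    "moderately_controlled": simulate_condition(300, 0.9),
    "uncontrolled": simulate_condition(300, 1.3),
}

# Same measurement model fit separately in every condition (lavaan-style syntax).
model_desc = "efficacy =~ item1 + item2 + item3 + item4 + item5 + item6"

for name, data in conditions.items():
    model = semopy.Model(model_desc)
    model.fit(data)
    fit = semopy.calc_stats(model)  # chi-square, CFI, RMSEA, and other indices
    print(name)
    print(fit.T)
```

Comparing the printed fit indices condition by condition mirrors the study's question of whether the same measurement model holds equally well in more and less controlled settings.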


Citations
Journal ArticleDOI
TL;DR: In this article, the authors emphasize the importance of testing for measurement invariance (MI), provide guidance for conducting these tests, and discuss potential causes of noninvariant items, the difference between measurement bias and invariance, remedies for noninvariant measures, and considerations associated with model estimation.
Abstract: Researchers commonly compare means and other statistics across groups with little concern for whether the measure possesses strong factorial invariance (i.e., equal factor loadings and intercepts/thresholds). When this assumption is violated, inaccurate inferences associated with statistical and practical significance can occur. This manuscript emphasizes the importance of testing for measurement invariance (MI) and provides guidance when conducting these tests. Topics discussed are potential causes of noninvariant items, the difference between measurement bias and invariance, remedies for noninvariant measures, and considerations associated with model estimation. Using a sample of 491 teachers, a demonstration is also provided that evaluates whether a newly constructed behavior and instructional management scale is invariant across elementary and middle school teachers. Analyses revealed that the results differ slightly based on the estimation method utilized, although these differences did not greatly in…
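The invariance sequence sketched in this abstract moves from a configural model to models with equal loadings (metric) and equal intercepts/thresholds (scalar), judging each added constraint by the change in model fit. A tiny illustration of that decision logic follows; the CFI values and the ΔCFI ≤ .01 rule of thumb are hypothetical conventions for this sketch, not figures taken from the article.

```python
# Hypothetical CFI values for a nested invariance sequence; the .01 change-in-CFI
# cutoff is a common rule of thumb, not a value reported in this article.
fits = {"configural": 0.962, "metric": 0.958, "scalar": 0.941}

steps = list(fits.items())
for (prev_name, prev_cfi), (name, cfi) in zip(steps, steps[1:]):
    delta = prev_cfi - cfi  # drop in fit caused by the added equality constraints
    verdict = "supported" if delta <= 0.01 else "not supported"
    print(f"{name} invariance (vs. {prev_name}): change in CFI = {delta:.3f} -> {verdict}")
```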

440 citations


Cites background from "Does it Matter How Data are Collect..."

  • ...These correlated residuals might be caused by item-ordering effect (Barry & Finney, 2009), as these items were sequentially ordered but measuring different constructs....


Book ChapterDOI
01 Jan 2013
TL;DR: Measurement validation in the behavioral sciences is generally carried out in a psychometric modeling framework that assumes unobservable traits/constructs created from the observed variables are the variables of interest.
Abstract: Measurement validation in the behavioral sciences is generally carried out in a psychometric modeling framework that assumes unobservable traits/constructs (i.e., latent factors) created from the observed variables (often items measuring that construct) are the variables of interest.

46 citations

Journal ArticleDOI
TL;DR: The authors propose a protocol for administering general education tests under low-stakes conditions and describe simple proctor strategies that engender effort and inhibit inattention, given that students may not be motivated to perform optimally when they know the results will not represent them personally.
Abstract: General education program assessment involves low-stakes testing, but students may not be motivated to perform optimally if they know the test results will not represent them personally. We propose a protocol for administering general education tests under low-stakes conditions and describe simple proctor strategies that engender effort and inhibit inattention.

37 citations

01 Jan 2016
TL;DR: Mathers et al., as mentioned in this paper, examined the impact of test session instruction manipulations on the psychometric properties of a self-report test-taking motivation measure, a question that had not previously been investigated, and found that the measure's two-factor structure and adequate score reliability held across instruction conditions.
Abstract: Research investigating methods to influence examinee motivation during low-stakes assessment of student learning outcomes has involved manipulating test session instructions. The impact of instructions is often evaluated using a popular self-report measure of test-taking motivation. However, the impact of these manipulations on the psychometric properties of the test-taking motivation measure has yet to be investigated, resulting in questions regarding the comparability of motivation scores across instruction conditions and the scoring of the measure. To address these questions, the factor structure and reliability of test-taking motivation scores were examined across instruction conditions during a low-stakes assessment session designed to address higher education accountability mandates. Incoming first-year college students were randomly assigned to one of three instruction conditions where personal consequences associated with test results were incrementally increased. Confirmatory factor analyses indicated a two-factor structure of test-taking motivation was supported across conditions. Moreover, reliability of motivation scores was adequate even in the condition with greatest personal consequence, which was reassuring given low reliability has been found in high-stakes contexts. Thus, the findings support the use of this self-report measure for the valuable research that informs motivation instruction interventions for low-stakes testing initiatives common in higher education assessment.
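One part of the analysis described here is estimating score reliability separately in each instruction condition; coefficient alpha can be computed directly from the item responses. A minimal sketch under stated assumptions follows: the data are synthetic and the condition labels are invented for illustration, not taken from the study.

```python
# Coefficient alpha per instruction condition (illustrative sketch, synthetic data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def simulate_items(n: int, n_items: int, noise_sd: float) -> pd.DataFrame:
    """Items driven by one latent trait plus noise; more noise lowers reliability."""
    trait = rng.normal(size=(n, 1))
    return pd.DataFrame(trait + rng.normal(scale=noise_sd, size=(n, n_items)))

# Hypothetical conditions with incrementally increasing personal consequences.
conditions = {
    "no_consequence": simulate_items(200, 10, 1.2),
    "results_to_faculty": simulate_items(200, 10, 1.0),
    "personal_feedback": simulate_items(200, 10, 0.8),
}

for name, items in conditions.items():
    print(f"{name}: alpha = {cronbach_alpha(items):.2f}")
```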

14 citations


Cites background from "Does it Matter How Data are Collect..."

  • ...…previous research examining the factor structure of noncognitive measures suggests dimensionality can differ across testing contexts (e.g., Barry & Finney, 2009; De Leeuw, Mellenbergh, & Hox, 1996), it is curious there have been no empirical studies assessing if the factor structure of…...


Book ChapterDOI
01 Jan 2019
TL;DR: This chapter presents a strategy that allows the integration of results from UOA into the results from proctored computerized assessments and generalizes the idea of motivational filtering, known for the treatment of rapid guessing behavior in low-stakes assessment.
Abstract: Many large-scale competence assessments such as the National Educational Panel Study (NEPS) have introduced novel test designs to improve response rates and measurement precision. In particular, unstandardized online assessments (UOA) offer an economic approach to reach heterogeneous populations that otherwise would not participate in face-to-face assessments. Acknowledging the difference between delivery, mode, and test setting, this chapter extends the theoretical background for dealing with mode effects in NEPS competence assessments (Kroehne and Martens, Zeitschrift für Erziehungswissenschaft 14:169–186, 2011) and discusses two specific facets of UOA: (a) the confounding of selection and setting effects and (b) the role of test-taking behavior as a mediator variable. We present a strategy that allows the integration of results from UOA into the results from proctored computerized assessments and generalizes the idea of motivational filtering, known for the treatment of rapid guessing behavior in low-stakes assessment. We particularly emphasize the relationship between paradata and the investigation of test-taking behavior, and illustrate how a reference sample formed by competence assessments under standardized and supervised conditions can be used to increase the comparability of UOA in mixed-mode designs. The closing discussion reflects on the trade-off between data quality and the benefits of UOA.

5 citations


Cites background from "Does it Matter How Data are Collect..."

  • ...As Barry and Finney (2009) showed by comparing UOA and different standardizations of classroom testing, standardized test conditions are superior even for test development....


  • ...Whereas this setting effect can be seen as part of the ecological validity in the context of psychological experiments (Reips 2000), it might threaten the validity of competence assessments (e.g., Barry & Finney 2009)....


References
Journal ArticleDOI
TL;DR: According to social cognitive theory, self-efficacy beliefs are key to understanding and predicting premature post-secondary institutional departure.
Abstract: Researchers and educators continue to try to understand and predict premature post-secondary institutional departure. According to social cognitive theory, self-efficacy beliefs are the gateway to ...

86 citations

Journal ArticleDOI
TL;DR: The authors examined the generalizability of a recently developed technique called motivation filtering, whereby scores for students of low motivation are systematically filtered from test data to determine aggregate test scores that more accurately reflect student performance and that can be used for reporting purposes.
Abstract: Accountability for educational quality is a priority at all levels of education. Low-stakes testing is one way to measure the quality of education that students receive and make inferences about what students know and can do. Aggregate test scores from low-stakes testing programs are suspect, however, to the degree that these scores are influenced by low test-taker effort. This study examined the generalizability of a recently developed technique called motivation filtering, whereby scores for students of low motivation are systematically filtered from test data to determine aggregate test scores that more accurately reflect student performance and that can be used for reporting purposes. Across assessment tests in five different content areas, motivation filtering was found to consistently increase mean test performance and convergent validity.
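Motivation filtering, as described here, drops examinees whose self-reported effort falls below a cutoff before aggregate scores are computed, so that reported means better reflect motivated performance. A minimal pandas sketch follows; the column names, scores, and the effort cutoff are invented for illustration and are not taken from the study.

```python
# Illustrative motivation filtering: drop low-effort examinees before aggregating.
import pandas as pd

df = pd.DataFrame({
    "student": ["a", "b", "c", "d", "e", "f"],
    "score":   [62, 55, 71, 40, 68, 35],
    "effort":  [4.2, 3.8, 4.6, 1.9, 3.4, 2.2],  # self-reported effort, hypothetical scale
})

EFFORT_CUTOFF = 3.0  # hypothetical threshold separating motivated from unmotivated examinees

filtered = df[df["effort"] >= EFFORT_CUTOFF]

print(f"Unfiltered mean score: {df['score'].mean():.1f}")
print(f"Filtered mean score:   {filtered['score'].mean():.1f}")
print(f"Examinees retained:    {len(filtered)} of {len(df)}")
```

With these made-up numbers the filtered mean rises from about 55 to 64, illustrating the direction of the effect the study reports across content areas.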

61 citations

Journal ArticleDOI
TL;DR: In this paper, an effort-monitoring CBT was proposed to suppress rapid-guessing behavior in a low-stakes test, where the computer monitors examinee effort based on item response time.
Abstract: The attractiveness of computer-based tests (CBTs) is due largely to their capability to expand the ways we conduct testing. A relatively unexplored application, however, is actively using the computer to reduce construct-irrelevant variance while a test is being administered. This investigation introduces the effort-monitoring CBT, in which the computer monitors examinee effort (based on item response time) in a low-stakes test and displays warning messages to those exhibiting rapid-guessing behavior. The results of an experimental study are presented, which showed that an effort-monitoring CBT increased examinee effort and yielded more valid test scores than a conventional CBT. Thus, unlike previous research that has focused on identifying rapid-guessing behavior after it has occurred, the effort-monitoring CBT proactively attempts to suppress rapid-guessing behavior. This innovative testing procedure extends the capabilities of measurement practitioners to manage the psychometric challenges posed by unmotivated examinees.
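The effort-monitoring idea hinges on classifying a response as a rapid guess when its response time falls below an item-specific threshold and summarizing each examinee's effort from those flags. The sketch below shows only that classification step, with made-up times, thresholds, and warning cutoff; it is not the warning-message system described in the article.

```python
# Illustrative rapid-guess flagging from item response times (seconds).
import pandas as pd

rt = pd.DataFrame(
    [[12.0, 30.5,  2.1, 18.0, 25.0],
     [ 1.4,  2.0,  1.8,  2.5,  1.1],
     [22.0, 41.0, 19.5, 27.0, 33.0],
     [ 9.0,  3.5,  2.2, 15.0,  4.0]],
    columns=[f"item{i}" for i in range(1, 6)],
)

# Hypothetical item-level thresholds below which a response counts as a rapid guess.
thresholds = pd.Series([5.0, 6.0, 4.0, 5.0, 5.0], index=rt.columns)

rapid = rt.lt(thresholds, axis=1)      # True where the response was faster than the threshold
effort_index = 1 - rapid.mean(axis=1)  # proportion of items answered with solution behavior

for examinee, effort in effort_index.items():
    flag = "  <- would trigger an effort warning" if effort < 0.8 else ""
    print(f"examinee {examinee}: effort index = {effort:.2f}{flag}")
```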

60 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a framework based on the unified view of validity to assist in generating an evidence-based argument regarding the quality of a given assessment practice, including content, structure, sampling, contextual influences, score production, and utility.

42 citations


"Does it Matter How Data are Collect..." refers background in this paper

  • ...We believe this study helps answer the call of Birenbaum (2007) to evaluate the validity of the full testing program....


  • ...…but rather should consider these inferences within the wider frame of how the assessment instruments map to the domain of study, the psychometric functioning and internal structure of the instruments (e.g., factor structure), and the contexts in which the data were collected (Birenbaum, 2007)....



Trending Questions (1)
If a test is valid but not reliable, can it still be used to collect data?

The paper does not directly address whether a test that is valid but not reliable can still be used to collect data; it focuses on the impact of testing conditions on data collection and instrument development.