Book ChapterDOI

Disentangling Setting and Mode Effects for Online Competence Assessment

TL;DR: This chapter presents a strategy that allows the integration of results from UOA into the results from proctored computerized assessments and generalizes the idea of motivational filtering, known for the treatment of rapid guessing behavior in low-stakes assessment.
Abstract: Many large-scale competence assessments such as the National Educational Panel Study (NEPS) have introduced novel test designs to improve response rates and measurement precision. In particular, unstandardized online assessments (UOA) offer an economic approach to reach heterogeneous populations that otherwise would not participate in face-to-face assessments. Acknowledging the difference between delivery, mode, and test setting, this chapter extends the theoretical background for dealing with mode effects in NEPS competence assessments (Kroehne and Martens in Zeitschrift für Erziehungswissenschaft 14:169–186, 2011) and discusses two specific facets of UOA: (a) the confounding of selection and setting effects and (b) the role of test-taking behavior as mediator variable. We present a strategy that allows the integration of results from UOA into the results from proctored computerized assessments and generalizes the idea of motivational filtering, known for the treatment of rapid guessing behavior in low-stakes assessment. We particularly emphasize the relationship between paradata and the investigation of test-taking behavior, and illustrate how a reference sample formed by competence assessments under standardized and supervised conditions can be used to increase the comparability of UOA in mixed-mode designs. The closing discussion reflects on the trade-off between data quality and the benefits of UOA.
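The motivational filtering mentioned in the abstract screens out examinees whose response times suggest rapid guessing rather than solution behavior. A minimal sketch follows; the 3-second per-item threshold and the 10% tolerance are illustrative assumptions, not values from the chapter.

```python
# Sketch of motivational filtering: examinees whose share of rapid-guessing
# responses exceeds a cutoff are removed before scoring. The threshold of
# 3 s per item and the 10% tolerance are illustrative assumptions.

def rapid_guess_share(response_times, threshold_seconds=3.0):
    """Proportion of item responses faster than the rapid-guessing threshold."""
    rapid = [t for t in response_times if t < threshold_seconds]
    return len(rapid) / len(response_times)

def motivational_filter(examinees, threshold_seconds=3.0, max_rapid_share=0.10):
    """Keep only examinees at or below the allowed share of rapid guesses."""
    return [
        e for e in examinees
        if rapid_guess_share(e["response_times"], threshold_seconds) <= max_rapid_share
    ]

examinees = [
    {"id": "A", "response_times": [12.4, 8.1, 15.0, 9.7]},  # engaged responding
    {"id": "B", "response_times": [1.2, 0.9, 1.5, 2.0]},    # rapid guessing
]
kept = motivational_filter(examinees)
print([e["id"] for e in kept])  # ['A']
```

The chapter's generalization applies such a filter to both the standardized and the unstandardized assessment, so that comparable subsamples remain in each mode.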
Citations
Journal ArticleDOI
TL;DR: This paper integrates log data from educational assessments into a taxonomy of paradata, and uses log data of the context questionnaires of the Programme for International Student Assessment (PISA) to illustrate the approach.
Abstract: Log data from educational assessments attract more and more attention and large-scale assessment programs have started providing log data as scientific use files. Such data generated as a by-product of computer-assisted data collection has been known as paradata in survey research. In this paper, we integrate log data from educational assessments into a taxonomy of paradata. To provide a generic framework for the analysis of log data, finite state machines are suggested. Beyond its computational value, the specific benefit of using finite state machines is achieved by separating platform-specific log events from the definition of indicators by states. Specifically, states represent filtered log data given a theoretical process model, and therefore, encode the information of log files selectively. The approach is empirically illustrated using log data of the context questionnaires of the Programme for International Student Assessment (PISA). We extracted item-level response time components from questionnaire items that were administered as item batteries with multiple questions on one screen and related them to the item responses. Finally, the taxonomy and the finite state machine approach are discussed with respect to the definition of complete log data, the verification of log data and the reproducibility of log data analyses.
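The finite-state-machine idea described in this abstract — platform-specific log events drive transitions, while indicators are defined on states rather than on raw events — can be illustrated with a small sketch. The event and state names below are invented for illustration; they are not the taxonomy from the paper.

```python
# Minimal finite state machine over timestamped log events. Transitions are
# keyed by (current_state, event_name); events with no matching transition
# (e.g., platform-specific noise like "scroll") leave the state unchanged,
# which is how the machine filters raw log data into theory-driven states.

TRANSITIONS = {
    ("outside_item", "item_shown"): "reading",
    ("reading", "response_given"): "responding",
    ("responding", "response_given"): "responding",
    ("reading", "item_hidden"): "outside_item",
    ("responding", "item_hidden"): "outside_item",
}

def run_machine(events, start="outside_item"):
    """Replay (event_name, timestamp) pairs; return the visited (state, time) path."""
    state, path = start, []
    for name, t in events:
        state = TRANSITIONS.get((state, name), state)  # ignore irrelevant events
        path.append((state, t))
    return path

log = [("item_shown", 0.0), ("scroll", 1.2),
       ("response_given", 4.5), ("item_hidden", 6.0)]
path = run_machine(log)
print([s for s, _ in path])  # ['reading', 'reading', 'responding', 'outside_item']
```

Indicators such as time-on-item can then be computed from the durations spent in each state, independently of the logging platform's event vocabulary.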

59 citations

Journal ArticleDOI
TL;DR: The psychometric properties of a short instrument for the assessment of reasoning abilities that was administered as part of a longitudinal LSA to German students from special schools and basic secondary schools twice within 6 years demonstrate the feasibility of incorporating students with SEN-L into educational LSAs.
Abstract: Students with special educational needs in the area of learning (SEN-L) have learning disabilities that can lead to academic difficulties in regular schools. In Germany, these students are frequently enrolled in special schools providing specific training and support. Because of their cognitive difficulties, it is unclear whether standard achievement tests that are typically administered in educational large-scale assessments (LSA) are suitable for students with SEN-L. The present study evaluated the psychometric properties of a short instrument for the assessment of reasoning abilities that was administered as part of a longitudinal LSA to German students from special schools (N = 324) and basic secondary schools (N = 338) twice within 6 years. Item response modeling demonstrated an essentially unidimensional scale for both school types. Few items exhibited systematic differential item functioning (DIF) between students with and without SEN-L, allowing for valid cross-group comparisons. However, change analyses across the two time points needed to account for longitudinal DIF among students with SEN-L. Overall, the cognitive test allowed for a valid measurement of reasoning abilities in students with SEN-L and comparative analyses regarding students without SEN-L. These results demonstrate the feasibility of incorporating students with SEN-L into educational LSAs.

10 citations

Journal ArticleDOI
TL;DR: This paper examined the effects of computer-based versus paper-based assessment of critical thinking skills, adapted from English (in the U.S.) to Chinese, using data collected based on a random assignment.
Abstract: We examine the effects of computer-based versus paper-based assessment of critical thinking skills, adapted from English (in the U.S.) to Chinese. Using data collected based on a random assignment ...

9 citations

Journal ArticleDOI
TL;DR: Unsupervised web-based assessments seem to be a feasible option in cognitive large-scale studies in higher education, particularly among low to medium ability respondents.
Abstract: Educational large-scale studies typically adopt highly standardized settings to collect cognitive data on large samples of respondents. Increasing costs alongside dwindling response rates in these studies necessitate exploring alternative assessment strategies such as unsupervised web-based testing. Before respective assessment modes can be implemented on a broad scale, their impact on cognitive measurements needs to be quantified. Therefore, an experimental study on N = 17,473 university students from the German National Educational Panel Study has been conducted. Respondents were randomly assigned to a supervised paper-based, a supervised computerized, and an unsupervised web-based mode to work on a test of scientific literacy. Mode-specific effects on selection bias, measurement bias, and predictive bias were examined. The results showed a higher response rate in web-based testing as compared to the supervised modes, without introducing a pronounced mode-specific selection bias. Analyses of differential test functioning showed systematically larger test scores in paper-based testing, particularly among low to medium ability respondents. Prediction bias for web-based testing was observed for one out of four criteria on study-related success factors. Overall, the results indicate that unsupervised web-based testing is not strictly equivalent to other assessment modes. However, the respective bias introduced by web-based testing was generally small. Thus, unsupervised web-based assessments seem to be a feasible option in cognitive large-scale studies in higher education.

9 citations

Journal ArticleDOI
TL;DR: The results of the information extracted from the scientific articles show that there is a need for supervision of students during online assessments, which can occur through computer vision algorithms, since there have been significant advances in these areas.
Abstract: Control of online evaluations using computer vision is a qualitative study based on the analysis and bibliographic conceptualization of 59 scientific articles selected from 123 found in various bibliographic databases. The study began by addressing issues ranging from the importance and understanding of online assessment control in the academic context to the types of computer vision algorithms and their main applications. The guided bibliographic technique was applied through four research questions: What problems exist in online evaluations? What techniques have been used to detect plagiarism in online evaluations? What computer vision algorithms are used? What are the main detection and monitoring tasks that computer vision algorithms are capable of performing? These questions allowed us to investigate academic dishonesty, the ease of committing plagiarism, online assessment control techniques, plagiarism detection techniques, object tracking algorithms, region-based algorithms, grid-based algorithms, face detection, gesture detection, and object detection. Three phases were used to determine the most relevant articles. Phase one applied inclusion criteria such as scientific articles, reviews, peer-reviewed conference papers, studies of computer vision algorithms, and online evaluations. Phase two refined the search string to better address the four research questions; results were ordered by year of publication, and the topic, abstract, and keywords were reviewed. Phase three reviewed the introduction and conclusion sections to determine whether the information contributed to and related to the research questions.
The information extracted from these articles shows a need for supervision of students during online assessments, which can be provided through computer vision algorithms, given the significant advances in these areas.

3 citations

References
Journal ArticleDOI
TL;DR: This paper examined whether individuals can fake their responses to a personality inventory if instructed to do so, and concluded that within-subjects designs produce more accurate estimates than between-subject designs.
Abstract: The authors examined whether individuals can fake their responses to a personality inventory if instructed to do so. Between-subjects and within-subjects designs were meta-analyzed separately. Across 51 studies, fakability did not vary by personality dimension; all the Big Five factors were equally fakable. Faking produced the largest distortions in social desirability scales. Instructions to fake good produced lower effect sizes compared with instructions to fake bad. Comparing meta-analytic results from within-subjects and between-subjects designs, we conclude, based on statistical and methodological considerations, that within-subjects designs produce more accurate estimates. Between-subjects designs may distort estimates due to Subject × Treatment interactions and low statistical power.

483 citations


"Disentangling Setting and Mode Effe..." refers methods in this paper

  • ...Test-taking behavior can be studied experimentally by, for instance, using different instructional sets, as often done to determine the limits on fakability of personality scales (see, for a meta-analysis, Viswesvaran & Ones, 1999)....


Journal ArticleDOI
TL;DR: In this article, a theoretical model of test-taking motivation is presented, with a synthesis of previous research indicating that low student motivation is associated with a substantial decrease in test performance.
Abstract: Student test-taking motivation in low-stakes assessment testing is examined in terms of both its relationship to test performance and the implications of low student effort for test validity. A theoretical model of test-taking motivation is presented, with a synthesis of previous research indicating that low student motivation is associated with a substantial decrease in test performance. A number of assessment practices and data analytic procedures for managing the problems posed by low student motivation are discussed.

435 citations


"Disentangling Setting and Mode Effe..." refers methods in this paper

  • ...Then, motivation filtering should be applied to both the standardized and the unstandardized assessment, because it is known from previous research that rapid guessing threatens the validity of assessment results (e.g., Wise & DeMars 2005)....


Journal ArticleDOI
TL;DR: In this article, the authors compared telephone and web versions of a questionnaire that assessed attitudes toward science and knowledge of basic scientific facts and found that the Web questionnaire produced less item nonresponse than the telephone survey.
Abstract: We carried out an experiment that compared telephone and Web versions of a questionnaire that assessed attitudes toward science and knowledge of basic scientific facts. Members of a random digit dial (RDD) sample were initially contacted by telephone and answered a few screening questions, including one that asked whether they had Internet access. Those with Internet access were randomly assigned to complete either a Web version of the questionnaire or a computer-assisted telephone interview. There were four main findings. First, although we offered cases assigned to the Web survey a larger incentive, fewer of them completed the online questionnaire; almost all those who were assigned to the telephone condition completed the interview. The two samples of Web users nonetheless had similar demographic characteristics. Second, the Web survey produced less item nonresponse than the telephone survey. The Web questionnaire prompted respondents when they left an item blank, whereas the telephone interviewers accepted "no opinion" answers without probing them. Third, Web respondents gave less differentiated answers to batteries of attitude items than their telephone counterparts. The Web questionnaire presented these items in a grid that may have made their similarity more salient.

434 citations


"Disentangling Setting and Mode Effe..." refers methods in this paper

  • ...If this is not possible, the standardized test administration can be used as reference sample in the context of mixed-mode assessments (Fricker 2005; Vannieuwenhuyze et al. 2011)....


Journal ArticleDOI
TL;DR: In this article, the authors introduce a new measure, termed response time effort (RTE), which is based on the hypothesis that unmotivated examinees will answer too quickly (i.e., before they have time to read and fully consider the item).
Abstract: When low-stakes assessments are administered, the degree to which examinees give their best effort is often unclear, complicating the validity and interpretation of the resulting test scores. This study introduces a new method, based on item response time, for measuring examinee test-taking effort on computer-based test items. This measure, termed response time effort (RTE), is based on the hypothesis that when administered an item, unmotivated examinees will answer too quickly (i.e., before they have time to read and fully consider the item). Psychometric characteristics of RTE scores were empirically investigated and supportive evidence for score reliability and validity was found. Potential applications of RTE scores and their implications are discussed.
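The RTE idea from this abstract — classify each item response as solution behavior or a rapid guess based on its response time, then average across items — can be sketched as follows. The threshold rule used here (a fraction of each item's median response time) is an illustrative assumption, not Wise and Kong's exact specification.

```python
# Sketch of a response-time-effort (RTE) style index: for each item, score 1
# if the response time exceeds an item-level rapid-guessing threshold
# (solution behavior), 0 otherwise, and average across items. The
# 10%-of-median threshold rule is an illustrative assumption.
from statistics import median

def item_thresholds(rt_by_item, fraction=0.10):
    """One threshold per item: a fraction of that item's median response time."""
    return {item: fraction * median(times) for item, times in rt_by_item.items()}

def rte_score(examinee_rts, thresholds):
    """Proportion of items answered slower than the threshold (effortful)."""
    flags = [1 if examinee_rts[i] > thresholds[i] else 0 for i in examinee_rts]
    return sum(flags) / len(flags)

rt_by_item = {"i1": [10.0, 12.0, 1.0, 11.0], "i2": [20.0, 1.5, 22.0, 19.0]}
thr = item_thresholds(rt_by_item)               # {'i1': 1.05, 'i2': 1.95}
print(rte_score({"i1": 10.0, "i2": 1.5}, thr))  # 0.5
```

An examinee with a low RTE score answered most items faster than plausible reading time, which is the signal motivational filtering acts on.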

412 citations


"Disentangling Setting and Mode Effe..." refers background or methods or result in this paper

  • ...In particular, fast responses are used to identify rapid-guessing behavior (Schnipke & Scrams 1997) that is related to test-taking engagement (Wise & Kong 2005)....


  • ...Rapid guessing: For some selected indicators, such as solution behavior in relationship to test-taking engagement, robust theories (e.g., Wise & Kong 2005, Wise 2015, Guo et al. 2016) and sound evidence from previous research (e.g., Lee & Jia 2014, Finn 2015, Goldhammer et al. 2016, Liu et al.…...


  • ...For instance, available theoretical considerations, such as the assumption about the existence of lurkers in online assessments (Bosnjak & Tuten 2001) or the link between response time and test-taking effort (Wise & Kong 2005), can be used to derive indicators of specific test-taking behaviors....


Book ChapterDOI
01 Jan 2000
TL;DR: The Web experiment method offers easy access to large and diverse participant populations, high statistical power, direct assessment of motivational confounding, and cost savings, and techniques are described to address disadvantages such as multiple submissions, lack of experimental control, self-selection, and dropout.
Abstract: The World Wide Web (WWW) provides a new tool for experimental research. The Web experiment method differs in fundamental aspects from traditional laboratory and field experiments; therefore it can be used to validate previous findings. Web experiments offer (1) easy access to a demographically and culturally diverse participant population, including participants from unique and previously inaccessible target populations; (2) bringing the experiment to the participant instead of the opposite; (3) high statistical power by enabling access to large samples; (4) the direct assessment of motivational confounding; and (5) cost savings of lab space, person-hours, equipment, and administration. These and 13 other advantages of Web experiments are reviewed and contrasted with 7 disadvantages, such as (1) multiple submissions, (2) lack of experimental control, (3) self-selection, and (4) dropout. Several techniques and other detailed solutions are described that avoid potential problems or even turn them into useful features of Web experimentation.

401 citations


"Disentangling Setting and Mode Effe..." refers background in this paper

  • ...Whereas this setting effect can be seen as part of the ecological validity in the context of psychological experiments (Reips 2000), it might threaten the validity of competence assessments (e.g., Barry & Finney 2009)....


  • ...Dropout behavior in online assessments might reflect lower levels of commitment to the test (e.g., Reips 2000)....
