
Showing papers in "Language Testing in 2018"


Journal ArticleDOI
TL;DR: In this article, the authors argue for integrating the construct of interactional competence (IC) into the assessment of speaking, noting that a psycholinguistically based speaking construct has so far predominated.
Abstract: In the assessment of speaking, a psycholinguistically based speaking construct has predominated. In this paper, we argue for the integration of the construct of interactional competence (IC) in spe...

81 citations


Journal ArticleDOI
TL;DR: This study explored the constructs that underpin three different measures of vocabulary knowledge and investigated the degree to which these measures correlate with, and are able to predict, that knowledge.
Abstract: This study explores the constructs that underpin three different measures of vocabulary knowledge and investigates the degree to which these three measures correlate with, and are able to predict, ...

81 citations


Journal ArticleDOI
TL;DR: This article identified and explored the dominant methods for evaluating rating quality within the context of research on large-scale rater-mediated language assessments and highlighted the reliance upon aggregate-level information that is not specific to individual raters or sp...
Abstract: The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on large-scale rater-mediated language assessments. Results from the review of 259 methodological and applied studies reveal an emphasis on inter-rater reliability as evidence of rating quality that persists across methodological and applied studies, studies primarily focused on rating quality and studies not primarily focused on rating quality, and across multiple language constructs. Additional findings suggest discrepancies in rating designs used in empirical research and practical concerns in performance assessment systems. Taken together, the findings from this study highlight the reliance upon aggregate-level information that is not specific to individual raters or sp...

58 citations
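The review's emphasis on inter-rater reliability can be illustrated with the kind of aggregate agreement statistics it discusses. The sketch below computes exact agreement and Cohen's kappa for two hypothetical raters using scikit-learn; the data, score scale, and choice of statistic are illustrative assumptions, not taken from the study.

```python
# Illustrative only: two hypothetical raters scoring the same 10 performances on a 0-4 scale.
# Cohen's kappa (and its quadratic-weighted variant) are common aggregate indices of
# inter-rater reliability; the review argues such indices say little about individual raters.
from sklearn.metrics import cohen_kappa_score

rater_a = [3, 2, 4, 1, 3, 2, 0, 4, 3, 2]   # hypothetical scores from rater A
rater_b = [3, 3, 4, 1, 2, 2, 1, 4, 3, 2]   # hypothetical scores from rater B

exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohen_kappa_score(rater_a, rater_b)                           # unweighted kappa
qw_kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")   # penalises larger score gaps

print(f"exact agreement          = {exact_agreement:.2f}")
print(f"Cohen's kappa            = {kappa:.2f}")
print(f"quadratic-weighted kappa = {qw_kappa:.2f}")
```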


Journal ArticleDOI
TL;DR: As discussed in this paper, the ability to interact with others has gained recognition as part of the L2 speaking construct, both in the assessment literature and in high- and low-stakes speaking assessments.
Abstract: The ability to interact with others has gained recognition as part of the L2 speaking construct in the assessment literature and in high- and low-stakes speaking assessments. This paper first prese...

51 citations


Journal ArticleDOI
TL;DR: In this article, the authors present an analysis of how the concerns of language testers can be conceptualized in the terms used to construct a validity argument, showing that the relevance of research about the rating of test performances extends beyond one or two inferences about rater reliabilit...
Abstract: Argument-based validation requires test developers and researchers to specify what is entailed in test interpretation and use. Doing so has been shown to yield advantages (Chapelle, Enright, & Jamieson, 2010), but it also requires an analysis of how the concerns of language testers can be conceptualized in the terms used to construct a validity argument. This article presents one such analysis by examining how issues associated with the rating of test takers’ linguistic performance can be included in a validity argument. Through a manual search of published language testing research, we gathered examples of research studies investigating the quality of rating processes and products. We then analyzed them in terms of how the research could be framed within a validity argument. Drawing on Kane’s (2001, 2006, 2013) conceptualization of inferences, warrants, and assumptions, we show that the relevance of research about the rating of test performances extends beyond one or two inferences about rater reliabilit...

50 citations


Journal ArticleDOI
TL;DR: This article examined the predictive validity of TOEFL iBT with respect to academic achievement as measured by the first-year grade point average (GPA) of Chinese students at Purdue University, a large, public, Research I institution in Indiana, USA.
Abstract: This study examines the predictive validity of the TOEFL iBT with respect to academic achievement as measured by the first-year grade point average (GPA) of Chinese students at Purdue University, a large, public, Research I institution in Indiana, USA. Correlations between GPA and TOEFL iBT total and subsection scores were examined for 1,990 mainland Chinese students enrolled across three academic years (n = 740 in 2011, 554 in 2012, and 696 in 2013). Subsequently, cluster analyses on the three cohorts’ TOEFL subsection scores were conducted to determine whether different score profiles might help explain the correlational patterns found between TOEFL subscale scores and GPA across the three student cohorts. For the 2011 and 2012 cohorts, speaking and writing subscale scores were positively correlated with GPA; however, negative correlations were observed for listening and reading. In contrast, for the 2013 cohort, the writing, reading, and total subscale scores were positively correlated with GPA, and the negative co...

42 citations
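As a rough illustration of the two analysis steps the abstract describes, the sketch below correlates synthetic TOEFL iBT subsection scores with GPA and then clusters the score profiles with k-means; the data, variable names, and number of clusters are assumptions for illustration and do not reflect the study's actual procedures or results.

```python
# Sketch of the two analysis steps described in the abstract, on synthetic data:
# (1) correlate each TOEFL iBT subsection score with first-year GPA;
# (2) cluster test takers by their subsection score profiles.
import numpy as np
from scipy.stats import pearsonr
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n = 200  # hypothetical cohort size
scores = {                                   # hypothetical subsection scores (0-30 scale)
    "reading":   rng.integers(15, 31, n),
    "listening": rng.integers(15, 31, n),
    "speaking":  rng.integers(15, 31, n),
    "writing":   rng.integers(15, 31, n),
}
gpa = rng.uniform(2.0, 4.0, n)               # hypothetical first-year GPA

# Step 1: subsection-GPA correlations
for section, vals in scores.items():
    r, p = pearsonr(vals, gpa)
    print(f"{section:<9} r = {r:+.2f} (p = {p:.3f})")

# Step 2: k-means on the four-dimensional score profiles (k chosen for illustration)
profile_matrix = np.column_stack(list(scores.values()))
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profile_matrix)
print("cluster sizes:", np.bincount(clusters))
```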


Journal ArticleDOI
TL;DR: This paper provides an overview of the automated speech scoring system SpeechRaterSM and shows how charts and evaluation statistics can be used to monitor and evaluate automated and human rater scores of spoken constructed responses.
Abstract: As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish rigorous procedures for monitoring the performance of both human and automated scoring processes during operational administrations. This paper provides an overview of the automated speech scoring system SpeechRaterSM and how to use charts and evaluation statistics to monitor and evaluate automated scores and human rater scores of spoken constructed responses.

41 citations
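A hedged sketch of the sort of evaluation statistics such monitoring might rely on is shown below; the statistics chosen (Pearson correlation, quadratic-weighted kappa, standardized mean difference) and the flagging thresholds are illustrative assumptions, not documented SpeechRaterSM procedures.

```python
# Hypothetical monitoring check: compare automated scores against human scores
# for one administration using common agreement statistics.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human = np.array([3, 2, 4, 3, 1, 2, 4, 3, 2, 3])      # hypothetical human scores (0-4)
machine = np.array([3, 2, 3, 3, 2, 2, 4, 4, 2, 3])    # hypothetical automated scores (0-4)

r, _ = pearsonr(human, machine)
qwk = cohen_kappa_score(human, machine, weights="quadratic")
std_mean_diff = (machine.mean() - human.mean()) / human.std(ddof=1)

print(f"Pearson r                = {r:.2f}")
print(f"quadratic-weighted kappa = {qwk:.2f}")
print(f"standardized mean diff   = {std_mean_diff:+.2f}")

# A monitoring routine might flag the administration if agreement drops below
# pre-set thresholds -- the cut-offs here are purely illustrative.
if r < 0.7 or qwk < 0.7 or abs(std_mean_diff) > 0.15:
    print("flag: human-machine agreement below illustrative thresholds")
```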


Journal ArticleDOI
TL;DR: This paper reports on research conducted during a series of workshops on language assessment delivered to Haitian teachers in the spring of 2013.
Abstract: Research was conducted during the delivery of a series of workshops on language assessment with Haitian teachers in the spring of 2013. The final products of these workshops were several revised na...

40 citations


Journal ArticleDOI
TL;DR: The authors investigated test-takers' processing while completing banked gap-fill tasks designed to assess reading proficiency, in order to test theoretically based expectations about how test-takers' cognitive processes vary across levels of performance.
Abstract: This study investigates test-takers’ processing while completing banked gap-fill tasks, designed to test reading proficiency, in order to test theoretically based expectations about the variation in cognitive processes of test-takers across levels of performance. Twenty-eight test-takers’ eye traces on 24 banked gap-fill items (on six tasks) were analysed according to seven on-line eye-tracking measures representing overall, text and task processing. Variation in processing was related to test-takers’ level of performance on the tasks overall. In particular, as hypothesised, lower-scoring students exerted more cognitive effort on local reading and lower-level cognitive processing in contrast to test-takers who attained higher scores. The findings of different cognitive processes associated with variation in scores illuminate the construct measured by banked gap-fill items, and therefore have implications for test design and the validity of score interpretations.

37 citations
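The kind of analysis described, relating eye-tracking measures to task performance, might be sketched as follows; the measure names, data layout, and correlation choice are assumptions for illustration and do not reproduce the study's seven measures or its statistical treatment.

```python
# Hypothetical sketch: aggregate eye-tracking measures per test taker and
# correlate them with task scores, in the spirit of the analysis described above.
import pandas as pd
from scipy.stats import spearmanr

# One row per (test taker, item): toy data standing in for exported eye-tracker output.
records = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p2", "p3", "p3"],
    "item":        [1, 2, 1, 2, 1, 2],
    "fixation_ms": [5200, 4800, 7300, 6900, 6100, 5800],  # total fixation duration
    "visits":      [6, 5, 9, 8, 7, 7],                    # visit counts on the gap
    "item_score":  [1, 1, 0, 0, 1, 0],                    # 1 = correct
})

per_person = records.groupby("participant").agg(
    mean_fixation_ms=("fixation_ms", "mean"),
    mean_visits=("visits", "mean"),
    total_score=("item_score", "sum"),
)

# Rank correlation between processing effort and overall task score.
for measure in ["mean_fixation_ms", "mean_visits"]:
    rho, p = spearmanr(per_person[measure], per_person["total_score"])
    print(f"{measure}: Spearman rho = {rho:+.2f} (p = {p:.2f})")
```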


Journal ArticleDOI
TL;DR: The authors developed an L2 English comprehensibility scale targeting the degree of perceived listener effort required to understand L2 speech, intended as a practical tool to guide teachers toward the linguistic factors that matter most for being understood and to raise learners' awareness of their abilities.
Abstract: There is growing research on the linguistic features that most contribute to making second language (L2) speech easy or difficult to understand. Comprehensibility, which is usually captured through listener judgments, is increasingly viewed as integral to the L2 speaking construct. However, there are shortcomings in how this construct is operationalized in L2 speaking proficiency scales. Moreover, teachers and learners have little practical means of benefiting from research pinpointing the properties of learners’ oral performance that optimize or hinder their ability to be understood. There is thus the need for a tool to guide teachers on what to focus on in instruction in order to target more effectively the linguistic factors that matter most for being understood and to raise learners’ awareness about their abilities. To address this gap, this article reports on the development of an L2 English comprehensibility scale targeting the degree of perceived listener effort required for understanding L2 speech...

37 citations


Journal ArticleDOI
TL;DR: Noting that raters' knowledge and experience may influence their ratings, both in terms of leniency and in their focus on different aspects of speech, the authors investigated whether professional and non-professional raters with broad exposure to L2 speech demonstrate similar responsiveness to fluency and linguistic accuracy in an occupational context.
Abstract: It is general practice to use rater judgments in speaking proficiency testing. However, it has been shown that raters’ knowledge and experience may influence their ratings, both in terms of leniency and varied focus on different aspects of speech. The purpose of this study is to identify raters’ relative responsiveness to fluency and linguistic accuracy in an occupational context, and to investigate whether professional and non-professional raters with a broad exposure to L2 speech demonstrate similar responsiveness to these two aspects. To this end, an experimental approach was applied. Fluency and accuracy were separated and systematically manipulated. As it is known that foreign accentedness of speech influences raters’ judgments, this factor was accounted for. Seventeen responses to a Dutch L2 exam in a vocational context were converted into four different versions manipulated for morpho-syntactical accuracy and/or fluency, and read by a Dutch L2 actor, resulting in 68 stimuli. Fifty-five professional ...

Journal ArticleDOI
TL;DR: The papers in this special issue provide support for continued scrutiny of interactional competence (IC) as an important component of the speaking construct; the authors discuss the challenges associated with including IC in the speaking construct and the implications of the studies in this issue for the relationship between IC and proficiency.
Abstract: The papers in this special issue provide support for continued scrutiny of interactional competence (IC) as an important component of the speaking construct. The contributions underscore the complex nature of IC and remind us of the multiple factors that affect any construct definition. At the same time, each study offers insights into those factors through their explorations of IC. In this final paper, we first briefly review key findings from the papers that confirm what is already known about IC and that provide new information to our understanding of the construct of IC. After summarizing points of convergence and of divergence, we turn to a discussion of areas that require additional targeted attention and offer four generalizations as starting points for research. In the final section, we take a critical look at the challenges associated with including IC in the speaking construct and the implications of the studies in this special issue for the relationship between IC and proficiency.

Journal ArticleDOI
TL;DR: The authors examined the influence of reading and listening input on test-takers' oral responses in integrated speaking test tasks, in which such input serves as the basis for the responses test-takers formulate.
Abstract: Integrated speaking test tasks (integrated tasks) provide reading and/or listening input to serve as the basis for test-takers to formulate their oral responses. This study examined the influence o...

Journal ArticleDOI
TL;DR: This paper challenges the assumption underlying university entrance language tests that, even if language proficiency does not determine academic success, a certain proficiency level is still required.
Abstract: University entrance language tests are often administered under the assumption that even if language proficiency does not determine academic success, a certain proficiency level is still required. ...

Journal ArticleDOI
TL;DR: In this article, the authors identify what aviation experts consider to be the key features of effective communication by examining in detail their commentary on a 17-minute segment of recorded radiotelephon...
Abstract: This paper aims to identify what aviation experts consider to be the key features of effective communication by examining in detail their commentary on a 17-minute segment of recorded radiotelephon...

Journal ArticleDOI
TL;DR: The authors investigated the children's attentional foci on different test components (e.g., prompts, pictures, and a countdown timer) by means of their eye movements and found that NNS tended to fixate longer on and looked more frequently at the countdown timer than their NS peers, who were more lik...
Abstract: We investigated how young language learners process their responses on and perceive a computer-mediated, timed speaking test. Twenty 8-, 9-, and 10-year-old non-native English-speaking children (NNSs) and eight same-aged, native English-speaking children (NSs) completed seven computerized sample TOEFL® Primary™ speaking test tasks. We investigated the children’s attentional foci on different test components (e.g., prompts, pictures, and a countdown timer) by means of their eye movements. We associated the children’s eye-movement indices (visit counts and fixation durations) with spoken performance. The children provided qualitative data (interviews; picture-drawings) on their test experiences as well. Results indicated a clear contrast between NNSs and NSs in terms of speech production (large score differences) as expected. More interestingly, the groups’ eye-movement patterns differed. NNSs tended to fixate longer on and looked more frequently at the countdown timer than their NS peers, who were more lik...

Journal ArticleDOI
TL;DR: This paper examines the issue of standardization in L2 oral testing: whereas external examiners are frequently used globally, some countries opt for test-takers' own teachers as examiners.
Abstract: The present paper looks at the issue of standardization in L2 oral testing. Whereas external examiners are frequently used globally, some countries opt for test-takers’ own teachers as examiners in ...

Journal ArticleDOI
TL;DR: The authors used a test format that standardizes the interlocutor's linguistic and interactional contributions to the exchange in order to assess the interactional performance of pre-vocational learners, and report on the extent to which these scripted tasks can be used to assess L2 speakers' interactional performance in a reliable and valid manner.
Abstract: This article explores ways to assess interactional performance, and reports on the use of a test format that standardizes the interlocutor’s linguistic and interactional contributions to the exchange. It describes the construction and administration of six scripted speech tasks (instruction, advice, and sales tasks) with pre-vocational learners (n = 34), and reports on the extent to which these tasks can be used to assess L2 speakers’ interactional performance in a reliable and valid manner. The high levels of agreement found between three independent raters on both holistic and analytical measurements of interactional performance indicate that this construct can be measured reliably with these tasks. Means and standard deviations demonstrate that tasks differentiate between speakers’ interactional performance. Holistic ratings of linguistic accuracy and interactional ability correlate highly between tasks that focus on different language functions, and are situated in different interactional domains. Furthermore, positive correlations are found between both holistic and analytic ratings of oral performance and vocabulary size. Positive within-task correlations between analytical ratings of specific interactional strategies and holistic ratings of overall interactional ability show that analytic ratings of meaning negotiation and correcting misinterpretation provide additional information about speakers’ interactional ability that is not captured by holistic assessment alone. It is concluded that these tasks are a useful diagnostic tool for practitioners to support their learners’ interactional abilities at a sub-skill level.
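A minimal sketch of two of the correlational analyses mentioned above (pairwise agreement among three raters, and the relation between holistic and analytic ratings) is given below; the rater labels, scores, and scale are hypothetical and not drawn from the study's data.

```python
# Hypothetical sketch of two of the analyses mentioned above:
# pairwise correlations among three raters' holistic scores, and the
# correlation between holistic and analytic ratings of the same speakers.
from itertools import combinations
import numpy as np
from scipy.stats import pearsonr

holistic = {                                      # hypothetical holistic scores per rater
    "rater1": [4, 3, 5, 2, 4, 3, 5, 2],
    "rater2": [4, 3, 4, 2, 4, 3, 5, 3],
    "rater3": [5, 3, 5, 2, 3, 3, 5, 2],
}
analytic_interaction = [4, 3, 5, 2, 4, 2, 5, 2]   # hypothetical analytic IC ratings

for a, b in combinations(holistic, 2):
    r, _ = pearsonr(holistic[a], holistic[b])
    print(f"{a} vs {b}: r = {r:.2f}")

mean_holistic = np.mean(list(holistic.values()), axis=0)
r, _ = pearsonr(mean_holistic, analytic_interaction)
print(f"mean holistic vs analytic IC rating: r = {r:.2f}")
```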

Journal ArticleDOI
TL;DR: This paper examined the reliability of the reading, listening, speaking, and writing section scores for the TOEFL iBT® test and their interrelationship in order to collect empirical evidence.
Abstract: The present study examined the reliability of the reading, listening, speaking, and writing section scores for the TOEFL iBT® test and their interrelationship in order to collect empirical evidence...

Journal ArticleDOI
TL;DR: The authors investigated whether test takers' breadth and depth of vocabulary knowledge contribute to their efficient use of lexical bonds while restoring damaged texts.
Abstract: The present study intended to investigate whether test takers’ breadth and depth of vocabulary knowledge can contribute to their efficient use of lexical bonds while restoring damaged texts in redu...

Journal ArticleDOI
TL;DR: As discussed in this paper, interactional competence has been variously defined in terms of turn-taking ability, paralinguistic features of communication such as eye contact, gesture, and gesticulation, and listener responses.
Abstract: Interactional competence has been variously defined as turn-taking ability, paralinguistic features of communication such as eye contact, gesture, and gesticulation, and listener responses. In exis...

Journal ArticleDOI
TL;DR: K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to...
Abstract: K–12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to ...

Journal ArticleDOI
TL;DR: As discussed by the authors, the Test of German as a Foreign Language (TestDaF) is a standardized test of German language proficiency that has been used for admission to German universities since its launch in 2001.
Abstract: The German university setting has experienced a dramatic change over the past several decades with respect to students entering from abroad. In 2015, international students comprised 11.9% of all students enrolled in public universities, and recent global developments (most notably the massive migration of refugees into Germany) have resulted in rapidly evolving demands on institutions of higher education (g.a.s.t., 2016). Greater numbers of students from different countries of origin, with diverse educational backgrounds and distinct learning needs, are seeking admittance to the low-cost and highly regarded German university system. A key concern for this heterogeneous group of students is the extent to which they are prepared to participate in courses of study and university life, the primary language for which is German. It is within this milieu that the Test of German as a Foreign Language (TestDaF) plays a critical role as a standardized test of German language proficiency. Developed and administered by the Society for Academic Study Preparation and Test Development (g.a.s.t.), TestDaF was launched in 2001 and has experienced persistent annual growth, with more than 44,000 test takers in 2016 (a 16% increase over the previous year; g.a.s.t., 2017). Of note, and in keeping with the motto of “Study successfully in German”, TestDaF is one of a suite of products and services offered by g.a.s.t. and intended to facilitate access to German university studies, including the following: an Internet-based platform for individualized language learning (Deutsch-Uni Online or DUO); an online assessment for placing students into foreign language courses (onSET); and a university aptitude assessment, the Test for Academic Studies (TestAS). On its website, g.a.s.t. provides ample information regarding all of these products for different audiences, including test takers, test centers, universities, and teachers. This information is also made available in 20 languages, and potential test takers can even complete a brief automated C-test and receive feedback regarding their chances of passing TestDaF successfully. While the current review focuses on TestDaF per se, the presentation of this test as one part of an overall effort to support and facilitate international student access to German university studies reflects an important public service dimension underlying the assessment.

Journal ArticleDOI
TL;DR: The claims articulated and the evidence collected throughout development and pilot testing enable a wide-ranging, comparative evaluation of five- and ten-item TOEFL Primary Reading screener tests, systematically incorporating the concepts of measurement quality, impact, and practicality.
Abstract: In this study, we define the term screener test, elaborate key considerations in test design, and describe how to incorporate the concepts of practicality and argument-based validation to drive an ...

Journal ArticleDOI
TL;DR: As described in this review, the DELF (Diplôme d'études en langue française) and DALF (Diplôme approfondi de langue française) are official qualifications used to certify the French competence of non-French citizens or of French citizens from non-francophone countries who have not completed a French secondary or higher education diploma.
Abstract: Recent estimates indicate that French is spoken as a first or additional language by over 220 million people. French is an official language in 29 countries and in many organizations such as the United Nations and the Red Cross. French is also, after English, the most widely taught language in educational systems around the world, with an estimated 120 million students and 500,000 teachers. It is hardly surprising then that there is a strong international demand for official certification of French competence and that a range of tests are on offer to meet this goal. Among the recognized tests available for this purpose are the DELF (Diplôme d’études en langue française) and DALF (Diplôme approfondi de langue française). These are official qualifications awarded by the French Ministry of Education to certify the French competence of non-French citizens or of French citizens from non-francophone countries who have not completed a French secondary or higher education diploma. There are six independent diplomas: three for children or adolescents (DELF Prim, DELF Junior and DELF Scolaire) and three for adults (DELF tout public, a general proficiency qualification for those over 16 years of age, DELF Pro, a work-related test for those seeking initial employment opportunities or promotion, and DALF for higher level candidates). Each test is oriented to the CEFR scale with DELF Prim pitched at the pre-A1 to A2 levels for immigrants with limited literacy backgrounds, the other DELF tests spanning the A1 to B2 levels and the DALF assessing proficiency at the more advanced C1 and C2 levels. Each test battery covers the four skill components of Listening, Speaking, Reading and Writing.

Journal ArticleDOI
TL;DR: The papers in this special issue target the construct of interactional competence and examine data obtained from a variety of contexts, languages, and proficiency levels, including classrooms, professional settings, group oral exams, and telephone-delivered exams.
Abstract: Perhaps more than any other skill area in second language testing, the assessment of speaking has seen substantial changes. Speaking test formats have evolved from dictations and face-to-face scripted interviews (e.g., UCLES Certificate of Proficiency in English, United States Foreign Service Institute) of the early twentieth century to the telephone-delivered and computer-based ACTFL Oral Proficiency Interviews (OPIs), as well as the semi-direct speaking tests such as TOEFL iBT® and the fully automatic Pearson PTE ProfessionalTM that are available today (for historical summaries, see Brooks, 2017; Fulcher, 2003; McNamara, 1996). While technological innovations continue to influence speaking test formats, the adoption of the model of communicative competence (Canale & Swain, 1980) in language teaching and learning has resulted in more paired and group work formats in language assessment (Taylor & Wigglesworth, 2009). Consequently, increased attention has turned to investigating interaction and its role in the speaking construct (cf. Berry, 2007; Brooks, 2009; Davis, 2009; Ducasse & Brown, 2009; Galaczi, 2008; May, 2009, 2011; Nakatsuhara, 2006; Nakatsuhara, 2011). The papers in this special issue target the construct of interactional competence (IC) and examine data obtained from a variety of contexts, languages, and proficiency levels. The researchers explore IC in the classroom, in professional settings, in group oral exams, and in telephone-delivered exams. Speakers represent such structurally diverse languages as Arabic, Chinese, English, Japanese, Korean, and Russian. The studies in this special issue not only contribute to earlier work in IC but also advance the field by opening Pandora’s Box (McNamara, 1996) a little more. Issues addressed include: The relationship between the construct of interactional competence and that of speaking proficiency; the relationship between IC and proficiency level; listener responses as distinct