
Showing papers in "Language Testing in 2001"


Journal ArticleDOI
TL;DR: This paper uses a range of analysis techniques to present validity evidence and to explore the equivalence of two revised and expanded versions of the Vocabulary Levels Test (VLT), a widely used instrument in language assessment and vocabulary research.
Abstract: The Vocabulary Levels Test has been widely used in language assessment and vocabulary research despite never having been properly validated. This article reports on a study which uses a range of analysis techniques to present validity evidence, and to explore the equivalence of two revised and expanded versions of the Vocabulary Levels Test.

1,013 citations


Journal ArticleDOI
TL;DR: The authors argue that the institutional character of assessment often means that the needs of learners are not well served by much language assessment theory and practice, and calls for a reexamination of our research priorities.
Abstract: In this article I argue that a growing awareness of the fundamentally social character of language assessment challenges us to rethink our priorities and responsibilities in language testing research. This awareness has been brought about by the treatment of the social character of educational assessment in Samuel Messick’s influential work on validity, and by the intellectual changes triggered by postmodernism, where models of individual consciousness have been reinterpreted in the light of socially motivated critiques. The article concludes by arguing that the institutional character of assessment often means that the needs of learners are not well served by much language assessment theory and practice, and calls for a reexamination of our research priorities.

212 citations


Journal ArticleDOI
TL;DR: In this article, a sociocultural theory of mind is used to examine the interaction between L2 learning and L2 testing, and it is suggested that an examination of the content of these dialogues can provide test developers with targets for measurement.
Abstract: In this article one aspect of the many interfaces between second language (L2) learning and L2 testing is examined. The aspect that is examined is the oral interaction - the dialogue - that occurs within small groups. Discussed from within a sociocultural theory of mind, the point is made that, in a group, performance is jointly constructed and distributed across the participants. Dialogues construct cognitive and strategic processes which in turn construct student performance, information which may be invaluable in validating inferences drawn from test scores. Furthermore, student dialogues provide opportunities for language learning, i.e., opportunities for the joint construction of knowledge. It is suggested that an examination of the content of these dialogues can provide test developers with targets for measurement. Other implications for L2 testing are also discussed.

193 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe the strong power of tests and the fact that tests lead to far-reaching and high-stakes decisions and consequences about individuals and groups, and propose a number of assessment strategies which are based on democratic principles so that society can guard and protect itself from such undemocratic practices.
Abstract: The article describes the strong power of tests and the fact that tests lead to far-reaching and high-stakes decisions and consequences about individuals and groups. Further, there is evidence that tests are often introduced by those in authority as disciplinary tools, often in covert ways, for the purpose of manipulating educational systems and imposing the agendas of those in authority. Yet such uses of tests as instruments of power violate fundamental values and principles of democratic practices. The article proposes a number of assessment strategies based on democratic principles, so that society can guard and protect itself from such undemocratic practices. The principles include the need:
• for citizens in democratic societies to play a participatory and active role, and to transfer and share power from elites to and with local bodies;
• for those who develop powerful tools to be responsible for their consequences;
• to consider the voices of diverse and different groups in multicultural societies...

193 citations


Journal ArticleDOI
TL;DR: As the authors discuss, vocabulary tests are used for a wide range of instructional and research purposes, but we lack a comprehensive basis for evaluating the current instruments or developing new lexical measures.
Abstract: Vocabulary tests are used for a wide range of instructional and research purposes but we lack a comprehensive basis for evaluating the current instruments or developing new lexical measures for the...

169 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a study of classroom assessment in a school context where English is an additional/second language (EAL). Using data from teacher interviews, classroom observations, video and audio recordings of learners, and lesson transcripts, the investigation takes the concept of the assessment cycle as its starting point.
Abstract: In a recent issue of Language Testing, Rea-Dickins and Gardner (2000) reported on their work in classroom-based assessment, and provided examples of different ways in which information from learner assessments was used by teachers working with learners for whom English is an additional/second language (EAL). The research reported here is also concerned with classroom assessment in an EAL school context. As in the previous article, it is presented from the perspective that issues of classroom assessment and, in particular, formative assessment require further detailed analysis. Using data from teacher interviews, classroom observations, video and audio recordings of learners, and lesson transcripts, the starting point for this investigation is the concept of the assessment cycle. Taking a grounded approach, it traces different stages in the teacher assessment process and presents a working model for the analysis of teacher decision making in relation to assessment practices. At the same time, it identifies...

166 citations


Journal ArticleDOI
TL;DR: The authors examine language assessment from a critical perspective, defining 'critical' in a manner similar to Pennycook (1999; 2001), and argue that alternative assessment, as distinct from testing, offers a partial response to the challenges presented by a critical perspective on language assessment.
Abstract: This article examines language assessment from a critical perspective, defining critical in a manner similar to Pennycook (1999; 2001). I argue that alternative assessment, as distinct from testing, offers a partial response to the challenges presented by a critical perspective on language assessment. Shohamy’s (1997; 1999; 2001) critical language testing (CLT) is discussed as an adequate response to the critical challenge. Ultimately, I argue that important ethical questions, along with other issues of validity, will be articulated differently from a critical perspective than they are in the more traditional approach to language assessment.

156 citations


Journal ArticleDOI
TL;DR: As discussed in this paper, the implementation of outcomes-based assessment and reporting systems in educational programs has been accompanied by a range of political and technical problems.
Abstract: The implementation of outcomes-based assessment and reporting systems in educational programs has been accompanied by a range of political and technical problems, including tensions between the sum...

141 citations


Journal ArticleDOI
TL;DR: This article examined differences between native and non-native EFL teachers' ratings of the English writing of Chinese university students and found no significant differences between the two groups in their scores for the 10 essays.
Abstract: This study examined differences between native and nonnative EFL (English as a Foreign Language) teachers’ ratings of the English writing of Chinese university students. I explored whether two groups of teachers - expatriates who typically speak English as their first language and ethnic Chinese with proficiency in English - gave similar scores to the same writing task and used the same criteria in their judgements. Forty-six teachers - 23 Chinese and 23 English-background - rated 10 expository essays using a 10-point scale, then wrote and ranked three reasons for their ratings. I coded their reported reasons as positive or negative criteria under five major categories: general, content, organization, language and length. MANOVA showed no significant differences between the two groups in their scores for the 10 essays. Chi-square tests, however, showed that the English-background teachers attended more positively in their criteria to the content and language, whereas the Chinese teachers attended more negatively...

123 citations


Journal ArticleDOI
TL;DR: The authors take up some of the issues identified by Douglas (2000) as problematic for Language for Specific Purposes (LSP) testing, making reference to a number of performance-based instruments designed to assess the language proficiency of teachers or intending teachers.
Abstract: This article takes up some of the issues identified by Douglas (2000) as problematic for Language for Specific Purposes (LSP) testing, making reference to a number of performance-based instruments designed to assess the language proficiency of teachers or intending teachers. The instruments referred to include proficiency tests for teachers of Italian as a foreign language in Australia (Elder, 1994) and for trainee teachers using a foreign language (in this case English) as medium for teaching school subjects such as mathematics and science in Australian secondary schools (Elder, 1993b; Viete, 1998).The first problem addressed in the article has to do with specificity: how does one define the domain of teacher proficiency and is it distinguishable from other areas of professional competence or, indeed, from what is often referred to as ‘general’ language proficiency? The second problem has to do with the vexed issue of authenticity: what constitutes appropriate task design on a teacher-specific instrument...

102 citations


Journal ArticleDOI
TL;DR: This paper evaluated the characteristics of the Yes/No test as a measure for receptive vocabulary size in second language (L2) using a large corpus of data collected with French learners of Dutch.
Abstract: This article evaluates the characteristics of the Yes/No test as a measure for receptive vocabulary size in second language (L2). This evaluation was conducted both on theoretical grounds as well as on the basis of a large corpus of data collected with French learners of Dutch. The study focuses on the internal qualities of the format in comparison with other more classical test formats. The central issue of determining a meaningful test score is addressed by providing a theoretical framework distinguishing discrete from continuous models. Correction formulae based on the discrete approach are shown to differ when applied to the Yes/No test in comparison with Multiple Choice (MC) or True/False formats. Correction formulae based on the continuous approach take the response bias into account, but certain underlying assumptions need to be validated. It is shown that both correction schemes display several shortcomings and that most of the data relative to the reliability of the Yes/No test presented in the literature...

Journal ArticleDOI
TL;DR: The authors argued that LSP assessment criteria should be derived from an analysis of the target language use situation, using the concept of indigenous assessment criteria (Jacoby, 1998), defined as those used by subject specialists in assessing communicative performances of both novices and colleagues in academic, professional and vocational fields.
Abstract: Typically in assessment of Language for Specific Purposes (LSP), test content and methods are derived from an analysis of the target language use (TLU) situation. However, the criteria by which performances are judged are seldom derived from the same source. In this article, I argue that LSP assessment criteria should be derived from an analysis of the TLU situation, using the concept of indigenous assessment criteria (Jacoby, 1998). These criteria are defined as those used by subject specialists in assessing communicative performances of both novices and colleagues in academic, professional and vocational fields. Performance assessment practices are part of any professional culture, from formal, gatekeeping examination procedures, to informal, ongoing evaluation built into everyday interaction. I suggest a procedure for deriving assessment criteria from an analysis of the TLU situation and explore problems associated with doing so, recommending a ‘weak’ indigenous assessment hypothesis to assist in the d...

Journal ArticleDOI
TL;DR: The authors investigated differential item functioning in language proficiency tests in which test-takers with diverse backgrounds are involved, and found that DIF items pose a considerable risk of bias in the test.
Abstract: The investigation of differential item functioning (DIF) is crucial in language proficiency tests in which test-takers with diverse backgrounds are involved, because DIF items pose a considerable t...

Journal ArticleDOI
TL;DR: The authors discusses approaches to the standardized assessment of content knowledge for English language learners (ELLs), including testing in the student's first language, the use of test accommodations, and measuring growth in English as an alternative for accountability until student control of English is sufficient to assure validity of test scores.
Abstract: Within the context of accountability for US schools, standardized achievement tests are being used for increasingly 'high stakes' decisions for all students, including those for whom English is a second language, even when their English language skills are not adequate for the task. This article discusses approaches to the standardized assessment of content knowledge for English language learners (ELLs), including testing in the student's first language, the use of test accommodations, and measuring growth in English as an alternative for accountability until student control of English is sufficient to assure validity of test scores. Limitations of current research on the use of standardized content assessments with ELLs are presented and alternative approaches suggested.

Journal ArticleDOI
TL;DR: The Languages for Specific Purposes (LSP) project as mentioned in this paper has a long history, which includes earlier programmes such as, for example, German for chemists, phrase books for travellers and Latin for the religious.
Abstract: The Languages for Specific Purposes (LSP) project is not new. It may have dramatically extended its scope in the last 30 years through the expansion in numbers of students and trainees seeking admission to English medium courses, encouraging the diversification of ESP/EAP and EOP provision (Douglas, 2000), but the project has a longer history. This includes earlier programmes - such as, for example, German for chemists, phrase books for travellers and Latin for the religious - but also, of course, the pidginization of contact languages, representing an informal LSP. What formal LSP represents is a contract issued by group A for a designated share of group B’s language resource. This article discusses how far this practical activity, in particular the testing of LSP, is theoretically sound. Two types of theoretical justification have been appealed to, the linguistic principle of ‘-lect’ (thus dialect, sociolect, variety, register, genre), itself appealing to sociological views of role and status; and the e...

Journal ArticleDOI
TL;DR: This paper examined the theoretical and practical issues surrounding authenticity in course-based assessment, drawing on data from a university-level Japanese language course in Australia and found that the addition of an assessment dimension fundamentally changes the nature of a task and thus compromises authenticity.
Abstract: Authenticity is now firmly established as a central concern in test design and test validation (Bachman, 1990). However, there is disagreement about what authenticity is and about the degree of authenticity that can realistically be achieved. This article explores the theoretical and practical issues surrounding authenticity in course-based assessment, drawing on data from a university-level Japanese language course in Australia. It examines a teaching and assessment activity based around interviewing native speakers outside the classroom, which was designed to optimize authenticity. Using tapes that students made of the interview for assessment and retrospective interviews, the study examines various dimensions of authenticity and reveals a wide diversity in individual experiences. The article argues that the addition of an assessment dimension fundamentally changes the nature of a task, and thus compromises authenticity. It further suggests that authenticity must be viewed in terms of the implementation...

Journal ArticleDOI
TL;DR: The authors conducted interviews with 48 highly experienced instructors of ESL/EFL composition about their usual practices for writing assessment in courses in universities or immigrant settlement programs, and found that their conceptualizations of student assessment varied depending on whether the courses they taught were defined in reference to general or specific purposes for learning English.
Abstract: A fundamental difference emerged between specific and general purposes for language assessment in the process of my interviewing 48 highly experienced instructors of ESL/EFL composition about their usual practices for writing assessment in courses in universities or immigrant settlement programs. The instructors worked in situations where English is either the majority language (Australia, Canada, New Zealand) or an international language (Hong Kong, Japan, Thailand). Although the instructors tended to conceptualize ESL/EFL writing instruction in common ways overall, I was surprised to find how their conceptualizations of student assessment varied depending on whether the courses they taught were defined in reference to general or specific purposes for learning English. Conceptualizing ESL/EFL writing for specific purposes (e.g., in reference to particular academic disciplines or employment domains) provided clear rationales for selecting tasks for assessment and specifying standards for achievement; but ...

Journal ArticleDOI
TL;DR: The authors described a working model used to determine the target language use (TLU) in a Language for Specific Purpose (LSP) test project: The Listening Summary Transl...
Abstract: This article describes a working model used to determine the Target Language Use (TLU) (Bachman and Palmer, 1996) in a Language for Specific Purpose (LSP) test project: The Listening Summary Transl...

Journal ArticleDOI
TL;DR: This issue of Language Testing focuses on an area that continues to fascinate and trouble many of us: assessing language for specific purposes (LSP) as mentioned in this paper, and five articles in this issue present a range of challenging issues and findings that will inform our understanding of LSP assessment, but also further problematize it.
Abstract: This issue of Language Testing focuses on an area that continues to fascinate and trouble many of us: assessing language for specific purposes (LSP). The five articles in this issue present a range of challenging issues and findings that will inform our understanding of LSP assessment, but also further problematize it. This is no bad thing as English spreads wider still around the world and more and more users see English as a wholly instrumental skill in their lives. The issues apply, of course, equally to languages other than English. Davies questions the English for Specific Purposes (and by extension, the LSP) enterprise on both practical and theoretical grounds. Davies rejects the notion of considering Specific Purposes (SPs) as registers alone, agreeing with Widdowson and others that SPs are characterized by their communicative natures. As he points out, with communication in mind, we are in the territory of discourse and therefore of blurred boundaries. This is an issue which is also taken up by Elder and Cumming in this issue. Davies’ critique of the IELTS development, and of the validation study of the original ELTS, leads him to question the influence that SP factors (linguistic or discoursal) have in test-taker performances relative to other factors, such as mastery of the language core (grammar, core lexis, etc.). Davies concludes that, to date, SP tests can only be judged on the pragmatic criterion of test usefulness. In her article, Elder uses three language tests for teachers to examine the problem of indeterminacy in LSP tests, in the process of which she calls into question the notion that any test has an inherent claim to validity by virtue of its SP-ness. The tests illustrate some of the diversity of issues that may arise when we consider the language ability needed by both subject teachers and foreign language teachers.

Journal ArticleDOI
TL;DR: The origins of this special issue of Language Testing initially lay in an invitation to convene a colloquium on alternative assessment at the American Association for Applied Linguistics conference in Vancouver, in March 2000 as mentioned in this paper.
Abstract: The origins of this special issue of Language Testing initially lay in an invitation to convene a colloquium on alternative assessment at the American Association for Applied Linguistics conference in Vancouver, in March 2000. ‘Alternative assessment’ was a term I understood as referring to a movement, particularly in school contexts in the USA, away from the use of standardized multiple-choice tests in favour of more complex performance-based assessments. I decided to interpret the invitation more broadly as a brief to provide a forum for work which was providing alternatives or challenges to the current mainstream in language testing research both at the level of theory and at the level of practice. Over the years I have become more sympathetic to the viewpoint that too much language testing research is about high-stakes proficiency testing, ignoring classroom contexts, and focusing on the use of technically sophisticated quantitative methods to improve the quality of tests at the expense of methods more accessible to non-experts. My interest in matters outside the mainstream of language testing research (within which I have of course been, along with my colleagues in Melbourne, a wholehearted participant) has been further piqued by the ongoing intellectual critique of much work in applied linguistics from a socially critical or postmodern perspective; language testing research seems an obvious target for such critical reflection. The presence in my department of my colleagues Alastair Pennycook and Brian Lynch, and my ongoing exchanges with my friend Elana Shohamy, hastened the growth of this interest. The net result of this changed awareness on my part was a symposium featuring papers, here supplemented by two others, that were rather diverse in scope but which had either or both of the following characteristics:

Journal ArticleDOI
TL;DR: As discussed in this paper, the Cattell-Horn theory predicts that nonverbal abilities should correlate equally with primary and nonprimary skills throughout the course of development, whereas Gardner's theory predicts nonsignificant correlations.
Abstract: General sign theory per Oller et al. (2000a; 2000b) predicts that, to the extent that valid measurements are possible, nonverbal abilities should correlate positively with primary language abilities (Hypothesis 1). Further, nonverbal abilities of persons in the early stages of acquiring a nonprimary language should correlate significantly more positively with proficiencies in their primary language than in their nonprimary language (Hypothesis 2(a)); but as persons approach parity between their primary and any nonprimary language, correlations between nonverbal scores and proficiencies in the two languages should both be positive and not significantly different (Hypothesis 2(b)). The Cattell-Horn theory predicts that nonverbal abilities should correlate equally with primary and nonprimary skills throughout the course of development. Gardner’s theory predicts nonsignificant correlations. Hypotheses 1, 2(a) and 2(b) are examined in within-subjects, repeated measures designs. Study 1 examines 50 children acq...

Journal ArticleDOI
TL;DR: This entry reproduces the reference list of a review by Micheline Chalhoub-Deville (University of Iowa), citing, among others, Douglas (2000), Assessing languages for specific purposes, and Read (2000), Assessing vocabulary.
Abstract: APA (American Psychological Association) 1994: APA publication manual. Washington, DC: APA. Alderson, J.C. 1999: Reading constructs and reading assessment. In Chalhoub-Deville, M., editor, Issues in computer-adaptive testing of reading proficiency. New York: Cambridge University Press, 49–78. Bachman, L.F. and Palmer, A. 1996: Language testing in practice. New York: Oxford University Press. Chalhoub-Deville, M. in press: Technology in standardized language assessment. In Kaplan, R., editor, Handbook of applied linguistics. Oxford: Oxford University Press. Douglas, D. 2000: Assessing languages for specific purposes. Cambridge: Cambridge University Press. Grabe, W. 1999: Developments in reading research and their implications for computer-adaptive reading assessment. In Chalhoub-Deville, M., editor, Issues in computer-adaptive testing of reading proficiency. New York: Cambridge University Press, 11–48. Read, J. 2000: Assessing vocabulary. Cambridge: Cambridge University Press. Micheline Chalhoub-Deville, University of Iowa


Journal ArticleDOI
TL;DR: As discussed by the reviewer, Language testing, part of the Oxford series 'Introductions to language study', provides a review of the most important concepts, current approaches, methods and issues in language testing.
Abstract: Language testing is one of the seven current books in the Oxford series entitled ‘Introductions to language study’. As such, it follows the same pattern as all the books in the series. It includes four sections. The first section, Survey, consists of eight chapters, and provides a review of the most important concepts, current approaches, methods and issues in language testing. The book’s three other sections are: Readings, References and Glossary. As the series editor remarks, the purpose of this book is to give readers the ‘big picture’, and gradually introduce them to linguistics and the complex ideas in the specific area of language testing. Language testing serves this purpose well. The book leads the reader from a definition and history of language testing in Chapters 1 and 2, through the cycle, methods and techniques of language testing in Chapters 3 to 6, to some of the current issues in the field in Chapters 7 and 8. Because of its design, the book is succinct, easy to read and user-friendly. Yet it also stimulates readers’ thinking about issues in language testing. The examples used by the author are also fairly universal (driving tests) or classic (Homer). Readers from a wide variety of backgrounds would be able to relate to these examples and understand the concepts being elucidated through them. Chapter 1 provides an excellent introduction to the book. It defines language testing, discusses types of tests, different test purposes and the criteria for making inferences about an examinee’s linguistic ability. It provides a model that describes the relationship between a test and the criteria that a language test is attempting to measure. Chapter 2 talks about the history of language testing. In particular, it describes the different ‘schools of thought’ that have evolved: discrete point, integrative and pragmatic, and communicative language testing. 
Chapter 3 describes the cyclical process through which a language test is conceptualized, developed and operationalized. Chapter 4 discusses the concepts and methods in assessing examinees’ performance, specifically with the use of human raters. It also talks about ways to avoid unfair ratings through rater training and the use of accurately worded rating scales. Chapter 5 talks about validation of language tests. It defines validity and describes the different approaches to validity: face, content,