Journal ISSN: 1384-6655

International Journal of Corpus Linguistics 

John Benjamins Publishing Company
About: International Journal of Corpus Linguistics (also known as IJCL) is an academic journal published by John Benjamins Publishing Company. It publishes mainly in the areas of corpus linguistics and computer science. Its ISSN is 1384-6655. To date, the journal has published 589 papers, which have received 16,679 citations.


Papers
Journal Article
TL;DR: An extension of collocational analysis that takes into account grammatical structure and is specifically geared to investigating the interaction of lexemes and the grammatical constructions associated with them is introduced.
Abstract: This paper introduces an extension of collocational analysis that takes into account grammatical structure and is specifically geared to investigating the interaction of lexemes and the grammatical constructions associated with them. The method is framed in a construction-based approach to language, i.e. it assumes that grammar consists of signs (form-meaning pairs) and is thus not fundamentally different from the lexicon. The method is applied to linguistic expressions at various levels of abstraction (words, semi-fixed phrases, argument structures, tense, aspect and mood). The method has two main applications: first, to increase the adequacy of grammatical description by providing an objective way of identifying the meaning of a grammatical construction and determining the degree to which particular slots in it prefer or are restricted to a particular set of lexemes; second, to provide data for linguistic theory-building.

822 citations

Journal Article
TL;DR: A computational system for automatic analysis of syntactic complexity in second language writing takes a written language sample as input and produces fourteen indices of syntactic complexity. It is designed with advanced second language proficiency research in mind and was developed and evaluated using college-level second language writing data from the Written English Corpus of Chinese Learners.
Abstract: We describe a computational system for automatic analysis of syntactic complexity in second language writing using fourteen different measures that have been explored or proposed in studies of second language development. The system takes a written language sample as input and produces fourteen indices of syntactic complexity of the sample based on these measures. The system is designed with advanced second language proficiency research in mind, and is therefore developed and evaluated using college-level second language writing data from the Written English Corpus of Chinese Learners (Wen et al. 2005). Experimental results show that the system achieves very high reliability on unseen test data from the corpus. We illustrate how the system is used in an example application to investigate whether and to what extent each of these measures significantly differentiates between different proficiency levels.

648 citations

Journal Article
TL;DR: The authors introduced an extension of distinctive-collocate analysis that takes into account grammatical structure and is specifically geared to investigating pairs of semantically similar grammatical constructions and the lexemes that occur in them.
Abstract: This paper introduces an extension of distinctive-collocate analysis that takes into account grammatical structure and is specifically geared to investigating pairs of semantically similar grammatical constructions and the lexemes that occur in them. The method, referred to as 'distinctive-collexeme analysis', identifies lexemes that exhibit a strong preference for one member of the pair as opposed to the other, and thus makes it possible to identify subtle distributional differences between the members of such a pair. The method can be applied in the context of what is sometimes referred to as 'grammatical alternation' (e.g. the dative alternation), but it can also be applied to other choices provided by the grammar (such as the two future tense constructions in English). The method has two main applications. First, it can reveal subtle differences between seemingly synonymous constructions, many of which are difficult to identify on the basis of more traditional approaches. Second, it can be used to investigate the very notion of 'alternation'; we show that many alternations are much more restricted than has hitherto been assumed, and thus confirm the claims of recent, non-derivational views of grammar.

614 citations

Journal Article
Paul Rayson
TL;DR: The combination of the key words and key domains methods is shown to allow macroscopic analysis to inform the microscopic level (focussing on the use of a particular linguistic feature) and thereby to suggest those linguistic features which should be investigated further.
Abstract: This paper reports the extension of the key words method for the comparison of corpora. Using automatic tagging software that assigns part-of-speech and semantic field (domain) tags, a method is described which permits the extraction of key domains by applying the keyness calculation to tag frequency lists. The combination of the key words and key domains methods is shown to allow macroscopic analysis (the study of the characteristics of whole texts or varieties of language) to inform the microscopic level (focussing on the use of a particular linguistic feature) and thereby to suggest those linguistic features which should be investigated further. The resulting 'data-driven' approach presented here combines elements of both the 'corpus-based' and 'corpus-driven' paradigms in corpus linguistics. A web-based tool, Wmatrix, implementing the proposed method is applied in a case study: the comparison of UK 2001 general election manifestos of the Labour and Liberal Democratic parties.
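Illustration: the abstract above describes applying a keyness calculation to tag frequency lists but does not spell out the statistic. The following is a minimal sketch, assuming keyness is measured with the two-corpus log-likelihood statistic that is commonly used for key word comparison. The function names, the tag labels, and the counts are hypothetical and chosen only for illustration; this is not the Wmatrix implementation.

```python
import math
from collections import Counter


def log_likelihood(a, b, n1, n2):
    """Two-corpus log-likelihood for one item (assumed keyness statistic).

    a, b   -- observed frequency of the item in corpus 1 and corpus 2
    n1, n2 -- total number of tagged tokens in corpus 1 and corpus 2
    """
    # Expected frequencies if the item were spread evenly over both corpora.
    e1 = n1 * (a + b) / (n1 + n2)
    e2 = n2 * (a + b) / (n1 + n2)
    ll = 0.0
    # A zero observed count contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_items(freq1, freq2, top=5):
    """Rank items (words or tags) by keyness, highest first.

    Returns tuples (item, score, overused), where overused is True when the
    item is relatively more frequent in corpus 1 than in corpus 2.
    """
    n1 = sum(freq1.values())
    n2 = sum(freq2.values())
    scored = []
    for item in set(freq1) | set(freq2):
        a = freq1.get(item, 0)
        b = freq2.get(item, 0)
        scored.append((item, log_likelihood(a, b, n1, n2), a / n1 > b / n2))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    # Invented semantic-tag counts for two small corpora (not real data).
    corpus_a = Counter({"GOVERNMENT": 120, "MONEY": 60, "HEALTH": 20})
    corpus_b = Counter({"GOVERNMENT": 100, "MONEY": 30, "HEALTH": 70})
    for tag, score, overused in key_items(corpus_a, corpus_b):
        direction = "overused" if overused else "underused"
        print(f"{tag}\t{score:.2f}\t{direction} in corpus A")
```

Because the function only needs frequency lists, the same code applies unchanged to word counts, part-of-speech tag counts, or semantic domain tag counts, which is the point the abstract makes about extending key words to key domains.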

497 citations

Journal Article
TL;DR: The Corpus of Contemporary American English (COCA) is presented as the first large and diverse corpus of American English; it contains more than 385 million words from 1990 to 2008 (20 million words each year).
Abstract: The Corpus of Contemporary American English (COCA), which was released online in early 2008, is the first large and diverse corpus of American English. In this paper, we first discuss the design of the corpus, which contains more than 385 million words from 1990–2008 (20 million words each year), balanced between spoken, fiction, popular magazines, newspapers, and academic journals. We also discuss the unique relational database architecture, which allows for a wide range of queries that are not available (or are quite difficult) with other architectures and interfaces. To conclude, we consider insights from the corpus on a number of cases of genre-based variation and recent linguistic variation, including an extended analysis of phrasal verbs in contemporary American English.

412 citations

Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2023    14
2022    43
2021    27
2020    25
2019    23
2018    24