
Showing papers by Tony McEnery published in 2014


Proceedings Article
01 May 2014
TL;DR: SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublanguages, or representative samples of the general language.
Abstract: Sublanguages are varieties of language that form "subsets" of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublanguages or representative samples of the general language. The current version of SubCAT contains scripts and applications for assessing lexical closure, morphological closure, sentence type closure, over-represented words, and syntactic deviance. Its operation is illustrated with three case studies concerning scientific journal articles, patents, and clinical records. Materials from two language families are analyzed: English (Germanic) and Bulgarian (Slavic). The software is available at sublanguage.sourceforge.net under a liberal Open Source license.

13 citations
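One common way to think about the lexical closure measured in the SubCAT paper is to track how many distinct word types have appeared as more tokens are read: a sublanguage's vocabulary curve is expected to flatten sooner than that of general language. The Python sketch below illustrates only that general idea; it is not SubCAT's code. The function name `vocabulary_growth`, the `step` parameter, and the naive lowercase whitespace tokenization are all assumptions made for illustration.

```python
from typing import List, Tuple


def vocabulary_growth(text: str, step: int) -> List[Tuple[int, int]]:
    """Return (tokens_seen, distinct_types) pairs sampled every `step` tokens.

    Hypothetical helper, not part of SubCAT. Tokenization is an assumption:
    the text is lowercased and split on whitespace only.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    tokens = text.lower().split()
    seen = set()
    curve: List[Tuple[int, int]] = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        # Record a point at every multiple of `step`, and once at the final token.
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve


if __name__ == "__main__":
    sample = "the patient was stable the patient was discharged the patient was seen"
    for tokens_seen, types in vocabulary_growth(sample, step=4):
        print(tokens_seen, types)
```

For the repetitive clinical-style sample, each further block of four tokens adds only one new type, which is the kind of flattening a closure analysis looks for. Comparing curves from corpora sampled at the same sizes is generally more informative than comparing a single type/token ratio, since that ratio depends on corpus size.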



31 May 2014
TL;DR: The authors survey progress in English corpus linguistics (henceforth ECL) and contrast it with the development of Chinese corpus linguistics (CCL). They compare the relative state of development of the two, highlight areas where progress in one may present a profitable avenue of development for the other, and identify both challenges unique to each and areas where generalising from one to the other may lead to problems.
Abstract: In this paper we will survey progress in English corpus linguistics (henceforth ECL) and contrast that with the development of Chinese corpus linguistics (henceforth CCL). In doing so we will compare the relative state of development of the two, highlighting areas where the development of one may present a profitable avenue of development for the other. We will also identify challenges which are unique to either, while also considering areas where generalising from one to the other may lead to problems. Such a review could become unwieldy if every area of linguistics were to be reviewed. While some areas of linguistic theory will be considered in passing, this review will focus principally upon language description and applied linguistics, areas where, arguably, corpus linguistics has made its greatest impact to date. The areas considered by this paper are as follows: lexicography, descriptive grammars and learner corpora. In each case we will review the development of these areas in ECL, outlining key debates and changes brought about by the application of corpora in the area in question. In all cases we will take a broadly historical approach, tracing the impact of corpora upon the research area by identifying key points in the development of ECL as well as the key researchers who brought about that development. We will then contrast the ECL approach to each area with the development of that area to date in CCL, and consider the differences between the two.

1 citation