
Showing papers in "International Journal of Lexicography in 2008"


Journal Article
Patrick Hanks
TL;DR: The author surveys possibilities opened up by John Sinclair's evidence-based approach to dictionaries, which aimed to help students write and speak idiomatically using evidence of contemporary usage rather than literary citations, and argues that a synthesis between Sinclairian corpus linguistics and construction grammar is overdue.
Abstract: John Sinclair opened up possibilities for new kinds of dictionaries. He assigned a central role to collocations and phraseology, insisting on close attention to textual evidence coupled with a broad theoretical perspective and ruthless jettisoning of hypotheses that do not fit the facts. He aimed to create dictionaries that would help students to write and speak idiomatically. In the tradition of Dr Johnson and OED, these would be based on evidence rather than speculation, but evidence of contemporary usage, not literary citations. In this paper, I look at some possibilities inspired by this approach. I suggest that a synthesis between Sinclairian corpus linguistics and construction grammar is overdue. © 2008 Oxford University Press. All rights reserved.

32 citations


Journal Article
TL;DR: A corpus investigation of the lexis of the initial sentences of newspaper reports shows that naturalness in text is not just a product of the choice of collocations and grammatical pattern but depends with equal force on the choices of lexis primed for the appropriate textual position.
Abstract: The relationship between lexis and grammatical pattern is generally recognized to be more complex than was once thought; this complexity includes the association of word sense with favoured or dispreferred syntactic patterns. Modern learner dictionaries accordingly contain a great deal of information about the grammatical contexts in which words characteristically appear. Recent work on lexical priming theory suggests that there is an equally complex relationship between lexis and textual position. This paper reports a corpus investigation of the lexis of the initial sentences of newspaper reports. We show that naturalness in text is not just a product of the choice of collocations and grammatical pattern but depends with equal force on the choice of lexis primed for the appropriate textual position.

27 citations


Journal Article
TL;DR: This paper discusses three important aspects of John Sinclair’s legacy: the corpus, lexicography, and the notion of ‘corpus-driven’.
Abstract: This paper discusses three important aspects of John Sinclair’s legacy: the corpus, lexicography, and the notion of ‘corpus-driven’. The corpus represents his concern with the nature of linguistic evidence. Lexicography is for him the canonical mode of language description at the lexical level. And his belief that the corpus should ‘drive’ the description is reflected in his constant attempts to utilize the emergent computer technologies to automate the initial stages of analysis and defer the intuitive, interpretative contributions of linguists to increasingly later stages in the process. Sinclair’s model of corpus-driven lexicography has spread far beyond its initial implementation at Cobuild, to most EFL dictionaries, to native-speaker dictionaries (e.g. the New Oxford Dictionary of English, and many national language dictionaries in emerging or re-emerging speech communities) and bilingual dictionaries (e.g. Collins, Oxford-Hachette).

24 citations


Journal Article
TL;DR: This paper reviews John Sinclair's corpus-driven model of lexis as phraseology, in which meaning is created and interpreted through cotext, its partial lexicographic realization in the first Cobuild dictionary, and the further implications for lexicography that emerged as his approach evolved.
Abstract: John Sinclair wrote extensively on lexis, setting out a corpus-driven model of lexis as phraseology, with meaning being created and interpreted through cotext. This model was partially realized lexicographically in the first Cobuild dictionary. As Sinclair's phraseological approach to meaning evolved, further implications for a new kind of lexicography became apparent.

21 citations


Journal Article
TL;DR: It is proposed that exemplification in learners’ dictionaries should vary according to the word’s frequency of usage, the word’s collocational and syntactic complexities, and the user’s needs and look-up preference.
Abstract: Exemplification in learners’ dictionaries is affected by such variables as the word frequency, part-of-speech and markedness of vocabulary. This article statistically examines the practice of the ‘Big Five’ in allocation of examples to different types of words. The results indicate that high-frequency words are generally exemplified, that prepositions, pronouns, conjunctions and adjectives are usually illustrated with more examples than other parts-of-speech, and that words marked for a particular style or attitude are sometimes provided with an example to show their pragmatic aspects. After a critical evaluation of the practice of the ‘Big Five’, the article proposes that exemplification in learners’ dictionaries should vary according to the word's frequency of usage, the word's collocational and syntactic complexities, and the user's needs and look-up preference.

12 citations


Journal Article
TL;DR: A set of 12 specialised Estonian-Russian dictionaries for Russian schools, motivated by the socio-cultural context in Estonia that favours Russian-speaking people learning Estonian, is described.
Abstract: The paper describes a set of 12 specialised Estonian-Russian dictionaries for Russian schools, motivated by the socio-cultural context in Estonia that favours Russian-speaking people learning Estonian. The dictionaries are of L1-L1-L2 type and include terms, their main inflectional forms, explanations and the Russian translation of the term. To make the dictionaries as comprehensible as possible, the rules for clear writing were followed. Natural language processing tools were used to facilitate the work of the dictionary compilers by providing feedback on the vocabulary they use and by automatically generating the inflectional forms and asterisks for referencing terms in the explanations. The dictionaries were printed in paper format and made available online free of charge.

12 citations


Journal Article
TL;DR: In this article it is shown how Sinclair's revolutionary insights are being adopted and developed in the production of bilingual dictionaries for Bantu languages.
Abstract: John Sinclair's impact on lexicography in English as well as his pioneering work in corpus linguistics is well known. What is less widely known is his impact on dictionary making for languages other than English. In this article it is shown how Sinclair's revolutionary insights are being adopted and developed in the production of bilingual dictionaries for Bantu languages. This work has proceeded from a Ciluba-Dutch learner's dictionary ten years ago to an online Swahili-English work and a Northern Sotho-English school dictionary. The latter has features that transcend the monolingual level, as corpus-based analyses in different languages have to be mapped onto one another. New questions arise as a result, which focus on the need to show idiomatic bilingual example sentences. A frequency-based approach to lexical and grammatical gaps is adopted, with a seamlessly integrated ‘corpus-based dictionary mini-grammar’. Not all problems have been solved, but the compilers find time and again that analysis of real data provides insights unavailable in an ‘armchair-linguistics’ approach. It is exciting to join those riding the wave that was set in motion by Sinclair.

12 citations


Journal Article
Kenneth Church
TL;DR: The author recalls that the term ‘approximate lexicography’ was introduced by Grefenstette (1998) as a promising third way between warring factions in linguistics (Chomsky 1957) and engineering (Brown et al. 1992), and argues that his generation came to this position only after years of persuasive arguments by John Sinclair.
Abstract: The term ‘approximate lexicography’ was introduced by Grefenstette (1998) as a promising way forward—a third way—between warring factions in linguistics (Chomsky 1957) and engineering (Brown et al. 1992). Like most compromises, this one is not perfect, but not bad. Grefenstette attributed approximate lexicography to several people, including Adam Kilgarriff, Jeremy Clear and myself, but in fact, my generation came to this position only after years of persuasive arguments by John Sinclair.

11 citations


Journal Article
TL;DR: The inherent tension between corpus data and the linguistic theory that aims to model it is explored, with particular reference to the dynamic and variable nature of the lexicon; the modeling process is presented as a sequence of conflicting stages of discovery and illustrated with two case studies.
Abstract: In this paper, we explore the inherent tension between corpus data and linguistic theory that aims to model it, with particular reference to the dynamic and variable nature of the lexicon. We explore the process through which modeling of the data is accomplished, presenting itself as a sequence of conflicting stages of discovery. First-stage data analysis informs the model, whereas the seeming chaos of organic data inevitably violates our theoretical assumptions. But in the end, it is restrictions apparent in the data that call for postulating structure within a revised theoretical model. We show the complete cycle using two case studies and discuss the implications.

9 citations


Journal Article
TL;DR: This paper refines the methodology for discovering lexical units of meaning in online gaming, drawing on an approximately 1-million-word corpus comprising advertising descriptions, gaming reviews, and discussion forums among gaming participants.
Abstract: Further to the idea suggested in Ooi (2000) that Sinclair's most recent lexical model can be considered for examining linguistic phenomena on the Web such as electronic gaming, this paper proposes to refine the methodology for discovering lexical units of meaning for this popular online genre. The evidence comes from an approximately 1-million word corpus comprising advertising descriptions, gaming reviews, and discussion forums among gaming participants. This dataset is first subjected to an integrated corpus linguistic tool, WMatrix, which affords word frequency profiles, concordances, part-of-speech annotation and semantic content analyses. Illustrative lexemes from this genre are derived from an inspection of the concordances using Sinclair's model and compared with some popular online dictionaries for their respective range of coverage.

9 citations


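Journal Article
TL;DR: This paper examines whether the usage-labelling system Abel Boyer introduced in his Royal Dictionary (1699) was truly innovative, how it was structured, and whether both parts of the dictionary were treated in the same way.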
Abstract: When Abel Boyer published his Royal Dictionary. In Two Parts. First, French and English. Secondly, English and French in 1699, labelling practices in dictionaries were not yet widespread and, when labels were used, they were not standardized. A review of 17th-century bilingual French-English dictionaries reveals an embryonic practice. Abel Boyer, however, introduced a set of usage labels in his dictionary that appear to be new and relatively well-organized for the time. In his dictionary, Boyer used four types of labels: typographical symbols, abbreviations, a combination of an abbreviation and a typographical symbol, and explicit textual annotations. The typographical symbols and abbreviations used are the same for both the French-English part and the English-French part. Only the textual annotations vary depending on the language. Boyer's labelling system included subject-field labels, stylistic labels, temporal labels, sociolinguistic labels and connotative labels. Was Boyer's labelling system truly innovative? How did Boyer structure his system? What method did he employ in his dictionary to show how readers should use words? For what parts of speech did he provide usage labels? Did he treat both parts of his dictionary in the same way? These are the questions that this paper attempts to answer.


Journal Article
Robert Dixon
TL;DR: This paper provides a chronological account of how English dictionaries have dealt with the commonest loans from the Aboriginal languages of Australia: kangaroo, boomerang, koala, dingo, wombat and a few more.
Abstract: Over 400 words have been borrowed from the Aboriginal languages of Australia into Australian English, some into other varieties of English and thence into other languages. A chronological account is provided of how English dictionaries have dealt with the commonest loans - kangaroo, boomerang, koala, dingo, wombat and a few more. There is comparison with the way in which loans from American and African languages were treated. Although there were ca 250 distinct indigenous languages in Australia, words taken from them were marked just as 'Aboriginal' or 'native Australian' until the publication of the second edition of the unabridged Random House Dictionary in 1987, of The Australian National Dictionary in 1988 and of Australian Words in English, their Origin and Meaning in 1990. The paper concludes with a survey of the ways in which other dictionaries have dealt with the newly-made-available etymologies.



Journal Article
TL;DR: This paper shows how Sinclair's contribution through work on translation equivalence led on to the series of highly innovative bilingual ‘Bridge dictionaries’ and tries to gauge his wider influence in the field of bilingual lexicography.
Abstract: Combining direct contribution and wider influence, John Sinclair stimulated a massive sea change in lexicography with the introduction of large reference corpora as the principal source of data. His major direct contribution was through the use of corpora in monolingual learners' dictionaries, but his interest in language inevitably led him on to look at multilingual matters, even though he was never directly involved in traditional bilingual lexicography. This paper shows how Sinclair's contribution through work on translation equivalence led on to the series of highly innovative bilingual ‘Bridge dictionaries’. It also tries to gauge his wider influence in the field of bilingual lexicography.