Journal Article

The planning and production of a corpus of Yilumbu: Progress of the building stage

01 Apr 2020 - South African Journal of African Languages (Routledge), Vol. 40, Iss. 1, pp. 98-105
TL;DR: The present research will only give an account of the dictionary conceptualisation plan and of future work on the Yilumbu corpus.
Abstract: In this article, an account is given of the planning and production of a corpus of Yilumbu. The primary focus of the article is on the presentation of the project background. Then the article also ...
References
Book Chapter
13 Feb 2005
Abstract: This work presents an unsupervised solution to language identification. The method sorts multilingual text corpora, sentence by sentence, into the different languages they contain, and makes no assumptions about the number or size of the monolingual fractions. Evaluation on 7-lingual and bilingual corpora shows that the quality of classification is comparable to that of supervised approaches, and that the method works almost error-free from 100 sentences per language onwards.

126 citations


"The planning and production of a corpus of Yilumbu" cites this work as background.

  • …ill-formed sentences based on handwritten regular expressions (Eckart, Quasthoff and Goldhahn, 2012); e. language identification at sentence basis (Biemann and Teresniak, 2005); f. duplicate sentence removal; g. tokenisation and word co-occurrence calculation; and h. the corpora are stored as…

Journal Article
TL;DR: In this article, the authors present a comprehensive theoretical conspectus of electronic corpora for the African languages, followed by a practical exploration of these corpora in the context of African linguistics.
Abstract: Compiling and querying electronic corpora has become a sine qua non as an empirical basis for contemporary linguistic research. As a result, corpus applications now abound around the world in all fields of linguistics. In this article it is argued that, if African linguistics is to take its rightful place in the new millennium, the active compilation, querying and application of corpora should become an absolute priority. The article first presents a comprehensive theoretical conspectus of electronic corpora. This theoretical section is followed by a practical exploration for the African languages. To that end, two very different African-language corpus projects are described in detail. The survey of these two projects, combined with inter-African-language comparisons, is deemed sufficient proof that a discipline of corpus linguistics for the African languages can feasibly be established at present. (Southern African Linguistics and Applied Language Studies, 2000, 18(1-4): 89-106)

77 citations


"The planning and production of a corpus of Yilumbu" cites this work for its methods.

  • ...According to Prinsloo (2000), when it comes to corpus compilation, there are three steps to be considered, namely: 1) corpus design; 2) text collection; and 3) text encoding....

Book
17 Jan 2010
Abstract: This book defends two main ideas: there is a need and a market for better specialised dictionaries for learners, and we need a sound theoretical framework for coping with known and unknown challenges (for example the Internet) in the realm of pedagogical specialised lexicography. Both themes were Enrique Alcaraz's driving force during his life. Hence his memory deserves this book, written by leading scholars in the field (who have compiled more than 70 dictionaries and published hundreds of books and articles on the topics discussed here), although only two of them knew him personally.

41 citations

Dissertation
01 Mar 2010
TL;DR: It is suggested that the use of ellipsoidal pronouns in this chapter should be considered as a separate entity, rather than a single word, for the purposes of this chapter.
Abstract: Opsomming (Summary); Acknowledgements; Abbreviations; CHAPTER 1: GENERAL INTRODUCTION

16 citations


Additional excerpts

  • …The second type of texts refers to pedagogical and scientific contributions (Blanchon, 1984; Emejulu and Pambou-Loueya, 1990; Saphou-Bivigat, 2000; 2010; Mavoungou, 2002a; 2002b; 2002c; 2005; 2006; 2008; 2009; 2010a; 2010b; 2010c; 2011; 2012; Mboumba, 2009; Mavoungou and Plumel, 2010;…