Journal, ISSN: 0929-6174

Journal of Quantitative Linguistics 

Routledge
About: Journal of Quantitative Linguistics is an academic journal published by Routledge. It publishes mainly in the areas of Zipf's law and quantitative linguistics. Its ISSN is 0929-6174. Over its lifetime, the journal has published 559 publications, which have received 7627 citations.


Papers
Journal Article
TL;DR: The Simple Good–Turing estimator is defined; it is straightforward to use and performs well, both absolutely and relative to simpler frequency-estimation approaches and to other, more sophisticated techniques.
Abstract: Linguists and speech researchers who use statistical methods often need to estimate the frequency of some type of item in a population containing items of various types. A common approach is to divide the number of cases observed in a sample by the size of the sample; sometimes small positive quantities are added to divisor and dividend in order to avoid zero estimates for types missing from the sample. These approaches are obvious and simple, but they lack principled justification, and yield estimates that can be wildly inaccurate. I.J. Good and Alan Turing developed a family of theoretically well-founded techniques appropriate to this domain. Some versions of the Good–Turing approach are very demanding computationally, but we define a version, the Simple Good–Turing estimator, which is straightforward to use. Tested on a variety of natural-language-related data sets, the Simple Good–Turing estimator performs well, absolutely and relative both to the approaches just discussed and to other, more sophisticated techniques.

317 citations
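
The two simple approaches the abstract criticises (relative frequency, and adding small positive quantities to dividend and divisor) can be sketched in a few lines, alongside the basic Good–Turing estimate of the total probability of unseen types, N1/N, where N1 is the number of types seen exactly once. This is a minimal illustration only, not the Simple Good–Turing estimator defined in the paper; the function names, the `num_types` parameter, and the default `k` are assumptions made for the example.

```python
from collections import Counter


def relative_frequency(counts, item):
    """Observed count divided by sample size; zero for unseen types."""
    total = sum(counts.values())
    return counts.get(item, 0) / total


def additive_estimate(counts, item, num_types, k=1.0):
    """Add k to the count and k * num_types to the sample size.

    num_types is the assumed number of possible types (hypothetical
    parameter); k is the small positive quantity being added.
    """
    total = sum(counts.values())
    return (counts.get(item, 0) + k) / (total + k * num_types)


def good_turing_unseen_mass(counts):
    """Basic Good-Turing estimate of the total probability of all
    unseen types: (number of types seen exactly once) / (sample size)."""
    total = sum(counts.values())
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / total


sample = "a b a c a b d".split()
counts = Counter(sample)  # a: 3, b: 2, c: 1, d: 1

print(relative_frequency(counts, "e"))              # unseen type gets zero
print(additive_estimate(counts, "e", num_types=5))  # unseen type gets a small share
print(good_turing_unseen_mass(counts))              # mass reserved for unseen types
```

The additive estimate depends heavily on the assumed `num_types` and `k`, which is one way to see why the abstract describes such adjustments as lacking principled justification.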

Journal Article
TL;DR: An algorithm for rapidly computing type–token ratio through a moving window that is independent of text length is presented, and it is demonstrated that this measurement can detect changes within a text as well as differences between texts.
Abstract: Type–token ratio (TTR), or vocabulary size divided by text length (V/N), is a time-honoured but unsatisfactory measure of lexical diversity. The problem is that the TTR of a text sample is affected by its length. We present an algorithm for rapidly computing TTR through a moving window that is independent of text length, and we demonstrate that this measurement can detect changes within a text as well as differences between texts.

260 citations
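
A moving-window TTR of the kind the abstract describes can be sketched as follows: a fixed-size window slides across the token list one token at a time, the TTR is computed for each window, and the results are averaged. This is a sketch under stated assumptions, not necessarily the authors' algorithm; the function names and the incremental update with a `Counter` are choices made for this example.

```python
from collections import Counter


def moving_window_ttrs(tokens, window):
    """Return the type-token ratio of every window of `window` tokens."""
    if window <= 0 or window > len(tokens):
        raise ValueError("window must be between 1 and len(tokens)")

    counts = Counter(tokens[:window])
    ttrs = [len(counts) / window]

    # Slide the window: add the incoming token, drop the outgoing one.
    for i in range(window, len(tokens)):
        incoming = tokens[i]
        outgoing = tokens[i - window]
        counts[incoming] += 1
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]  # type no longer present in the window
        ttrs.append(len(counts) / window)
    return ttrs


def moving_average_ttr(tokens, window):
    """Mean of the per-window TTRs; every window has the same length,
    so the value does not shrink simply because the text is longer."""
    ttrs = moving_window_ttrs(tokens, window)
    return sum(ttrs) / len(ttrs)


tokens = "a b a b c c".split()
print(moving_window_ttrs(tokens, 3))
print(moving_average_ttr(tokens, 3))
```

Updating the counts incrementally keeps the cost per step constant, and the list of per-window values can be inspected directly to look for changes within a text.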

Journal Article
TL;DR: It is made evident that word frequency as a function of rank follows two different exponents, ≈(-)1 for the first regime and ≈(-)2 for the second.
Abstract: Zipf’s law states that the frequency of a word is a power function of its rank. The exponent of the power is usually accepted to be close to (-)1. Great deviations between the predicted and real nu...

235 citations
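
If frequency is a power function of rank, f(r) = C · r^(-α), then log f is linear in log r with slope -α, so the exponent of each regime can be estimated separately over a chosen rank range. The sketch below uses an ordinary least-squares fit on log-log values; this is a simple illustrative technique, not the method used in the paper, and the function name, the rank boundaries, and the synthetic data are all assumptions made for the example.

```python
import math


def loglog_slope(freqs, start_rank=1, end_rank=None):
    """Least-squares slope of log(frequency) against log(rank).

    freqs must be sorted in descending order, so freqs[0] has rank 1.
    Only ranks start_rank..end_rank (inclusive) are used in the fit.
    """
    if end_rank is None:
        end_rank = len(freqs)
    ranks = range(start_rank, end_rank + 1)
    xs = [math.log(r) for r in ranks]
    ys = [math.log(freqs[r - 1]) for r in ranks]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic two-regime data (made up for illustration): slope -1 up to
# rank 100, slope -2 afterwards, continuous at the boundary.
freqs = [1e6 / r for r in range(1, 101)]
freqs += [1e8 / r**2 for r in range(101, 1001)]

print(loglog_slope(freqs, 1, 100))     # close to -1
print(loglog_slope(freqs, 101, 1000))  # close to -2
```

On real word counts the boundary between regimes is not known in advance, so the rank ranges passed to the fit would themselves need to be chosen or estimated.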

Journal Article
TL;DR: It is demonstrated that optimal methods are based on continuity-corrected versions of the Wilson interval or Yates’ test, and that commonly-held beliefs about weaknesses of tests are misleading.
Abstract: Many statistical methods rely on an underlying mathematical model of probability based on a simple approximation, one that is simultaneously well-known and yet frequently misunderstood. The Normal approximation to the Binomial distribution underpins a range of statistical tests and methods, including the calculation of accurate confidence intervals, performing goodness of fit and contingency tests, line- and model-fitting, and computational methods based upon these. A common mistake is in assuming that, since the probable distribution of error about the “true value” in the population is approximately Normally distributed, the same can be said for the error about an observation. This paper is divided into two parts: fundamentals and evaluation. First, we examine the estimation of confidence intervals using three initial approaches: the “Wald” (Normal) interval, the Wilson score interval and the “exact” Clopper-Pearson Binomial interval. Whereas the first two can be calculated directly from formula...

231 citations
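
Two of the intervals named in the abstract, the "Wald" (Normal) interval and the Wilson score interval, can be computed directly from their standard formulas. The sketch below shows both without continuity correction; the continuity-corrected versions that the paper recommends and the Clopper-Pearson interval are not implemented here. Function names and the default confidence level are assumptions made for the example.

```python
import math
from statistics import NormalDist


def _critical_z(confidence):
    """Two-tailed critical value of the standard Normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def wald_interval(successes, n, confidence=0.95):
    """Wald (Normal) interval: p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    half = _critical_z(confidence) * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval, without continuity correction."""
    p = successes / n
    z = _critical_z(confidence)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half


# With 0 successes out of 10, the Wald interval collapses to a single
# point, while the Wilson interval still has positive width.
print(wald_interval(0, 10))
print(wilson_interval(0, 10))
```

The zero-count case makes the difference between the two intervals visible: the Wald interval is symmetric about the observed proportion, whereas the Wilson interval is not.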

Journal Article
TL;DR: A review of Monika Bednarek's book Evaluation in Media Discourse: Analysis of a Newspaper Corpus (Continuum, 2006).
Abstract: Monika Bednarek, Evaluation in Media Discourse: Analysis of a Newspaper Corpus. London/New York: Continuum, 2006. ISBN: 9780826491268 (hbk), pp. xvi + 253. Evaluation or the linguistic expression of...

155 citations

Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2023    6
2022    8
2021    26
2020    30
2019    21
2018    20