A rationale for an asymptotic lognormal form of word-frequency distributions

John B. Carroll
01 Dec 1969, Vol. 1969, Iss. 2, pp 94
Abstract
The lognormal distribution has been found to fit word-frequency distributions satisfactorily if account is taken of the relations between populations and samples. A rationale for an asymptotic lognormal distribution is derived by supposing that the probabilities at the nodes of decision trees are symmetrically distributed around .5 with a certain variance. By the central limit theorem, the logarithms of the continued products of probabilities randomly sampled from such a distribution would have an asymptotically normal distribution. Two mathematical models incorporating this notion are developed and tested; in one, the number of factors in the continued products is assumed to be fixed, while in the other, that number is dependent upon a Poisson distribution. Psycholinguistic processes corresponding to these models are postulated and illustrated with reference to two sets of data: (1) word associations to the stimulus LIGHT, and (2) the Lorge Magazine Count. Reasonable fits to observed data or to underlying lognormal distributions are obtained but there remain certain problems in estimating parameters.
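
The central-limit argument in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the paper's own procedure: the abstract does not say which symmetric distribution the node probabilities follow, so a Beta(a, a) distribution (symmetric around 0.5) is assumed here, and the function names, the parameter values, and the choice of "1 + Poisson" for the variable number of factors are all hypothetical.

```python
import math
import random
import statistics


def log_product_fixed(n_factors, a, rng):
    """Log of a product of n_factors node probabilities.

    Each probability is drawn from Beta(a, a), which is symmetric
    around 0.5 (an assumed stand-in for the paper's distribution).
    """
    return sum(math.log(rng.betavariate(a, a)) for _ in range(n_factors))


def poisson_draw(lam, rng):
    """Poisson variate via Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def log_product_poisson(lam, a, rng):
    """Same log product, but the number of factors is 1 + Poisson(lam).

    The "+ 1" (at least one decision node) is an illustrative assumption.
    """
    return log_product_fixed(1 + poisson_draw(lam, rng), a, rng)


def skewness(xs):
    """Sample skewness; close to 0 for a roughly normal sample."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def simulate(n_samples=4000, n_factors=30, lam=29.0, a=4.0, seed=0):
    """Return log products under the fixed-count and Poisson-count models."""
    rng = random.Random(seed)
    fixed = [log_product_fixed(n_factors, a, rng) for _ in range(n_samples)]
    pois = [log_product_poisson(lam, a, rng) for _ in range(n_samples)]
    return fixed, pois


if __name__ == "__main__":
    fixed, pois = simulate()
    for name, xs in (("fixed", fixed), ("poisson", pois)):
        print(name, statistics.fmean(xs), statistics.pstdev(xs), skewness(xs))
```

If the logarithms are approximately normal, the products themselves are approximately lognormal. In this sketch both models share the same expected number of factors, so the Poisson variant should mainly show a wider spread of log probabilities than the fixed-count variant.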
