Showing papers by "Ching Y. Suen" published in 1979


Journal ArticleDOI
TL;DR: The positional distributions of n-grams obtained in the present study are discussed, and statistical studies on word length and trends of n-gram frequencies versus vocabulary are presented.
Abstract: n-gram (n = 1 to 5) statistics and other properties of the English language were derived for applications in natural language understanding and text processing. They were computed from a well-known corpus composed of 1 million word samples. Similar properties were also derived from the most frequent 1000 words of three other corpuses. The positional distributions of n-grams obtained in the present study are discussed. Statistical studies on word length and trends of n-gram frequencies versus vocabulary are presented. In addition to a survey of n-gram statistics found in the literature, a collection of n-gram statistics obtained by other researchers is reviewed and compared.

237 citations


Book
01 Jan 1979

14 citations