A Generative Model of Phonotactics
TL;DR: A probabilistic model of phonotactics (the set of well-formed phoneme sequences in a language) is presented that, for all languages tested, robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model.
Abstract:
We present a probabilistic model of phonotactics, the set of well-formed phoneme sequences in a language. Unlike most computational models of phonotactics (Hayes and Wilson, 2008; Goldsmith and Riggle, 2012), we take a fully generative approach, modeling a process where forms are built up out of subparts by phonologically informed structure-building operations. We learn an inventory of subparts by applying stochastic memoization (Johnson et al., 2006; Goodman et al., 2008) to a generative process for phonemes structured as an and-or graph, based on concepts of feature hierarchy from generative phonology (Clements, 1985; Dresher, 2009). Subparts are combined in a way that allows tier-based feature interactions. We evaluate our models’ ability to capture phonotactic distributions in the lexicons of 14 languages drawn from the WOLEX corpus (Graff, 2012). Our full model robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model for all languages. We also present novel analyses that probe model behavior in more detail.
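Stochastic memoization, as mentioned in the abstract, can be illustrated with a minimal sketch of Dirichlet-process memoization in the style of Church's DPmem (Goodman et al., 2008): each call either reuses an earlier output, with probability proportional to how often it has been returned, or asks the base process for a fresh one, with probability proportional to a concentration parameter alpha. This is not the paper's model. The paper memoizes nodes of a feature-hierarchy and-or graph, and the exact process it uses (e.g. Dirichlet versus Pitman-Yor) is not stated here; the toy CV "subpart" generator and every name below (`dp_memoize`, `random_cv`, the inventory) are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Hashable


def dp_memoize(base: Callable[[], Hashable], alpha: float,
               rng: random.Random) -> Callable[[], Hashable]:
    """Wrap `base` in Dirichlet-process (Chinese restaurant) memoization.

    After n calls, the next call returns a fresh draw from `base` with
    probability alpha / (alpha + n); otherwise it reuses an earlier output v
    with probability n_v / (alpha + n), where n_v counts how often v was returned.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    counts: Counter = Counter()  # n_v for each output returned so far
    total = 0                    # n: number of calls so far

    def memoized() -> Hashable:
        nonlocal total
        if rng.random() < alpha / (alpha + total):
            value = base()  # "new table": defer to the base process
        else:
            values = list(counts)
            weights = [counts[v] for v in values]
            # "existing table": frequently reused outputs are reused even more
            value = rng.choices(values, weights=weights, k=1)[0]
        counts[value] += 1
        total += 1
        return value

    return memoized


# Hypothetical base process: a random CV subpart from a toy inventory.
CONSONANTS = ["p", "t", "k", "s", "m", "n"]
VOWELS = ["a", "i", "u"]


def random_cv(rng: random.Random) -> str:
    return rng.choice(CONSONANTS) + rng.choice(VOWELS)


rng = random.Random(0)
subpart = dp_memoize(lambda: random_cv(rng), alpha=1.0, rng=rng)
draws = [subpart() for _ in range(200)]
print(len(draws), len(set(draws)))
```

Because reuse is proportional to past frequency, a small alpha yields a small inventory of heavily reused subparts, which is the "rich get richer" behavior that lets a memoized generator learn a reusable lexicon of pieces.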
Citations
Journal Article
Phonotactic Complexity and Its Trade-offs
TL;DR: Methods are presented for calculating bits per phoneme, a measure of phonotactic complexity that permits straightforward cross-linguistic comparison and gives insight into how complex a language’s phonotactics is.
Journal Article
Miller's monkey updated: Communicative efficiency and the statistics of words in natural language
TL;DR: It is shown that lexicons resulting from the monkey model provide a better embodiment of communicative efficiency than the actual lexicon of English, and that the updated monkey model provides a good fit to the growth trajectory of English as recorded in the Oxford English Dictionary.
Journal Article
Why do human languages have homophones?
Sean Trott, Benjamin Bergen, et al.
TL;DR: This article found that homophony in real languages is not directly selected for, but rather emerges as a natural consequence of other features of a language, such as phonotactics and distribution of word lengths.
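The bits-per-phoneme measure from the first citing paper, like the abstract's comparison of probabilities on held-out forms, amounts to averaging a model's negative log2 probability over held-out phonemes. A minimal sketch follows, assuming an add-one-smoothed phoneme bigram model (a hypothetical stand-in, much simpler than the "sophisticated N-gram model" baseline) and assuming the end-of-word marker counts as one predicted symbol per word. The class, function, boundary symbols, and toy words are illustrative, not data from either paper.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence

BOS, EOS = "<w>", "</w>"  # assumed word-boundary symbols


class BigramPhonotactics:
    """Hypothetical add-one-smoothed phoneme bigram model."""

    def __init__(self, words: Iterable[Sequence[str]]):
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.vocab = {EOS}  # predictable symbols: phonemes plus end-of-word
        for word in words:
            padded = [BOS, *word, EOS]
            self.vocab.update(word)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log2_prob(self, prev: str, cur: str) -> float:
        # Laplace smoothing: each context distributes mass over the whole vocab
        v = len(self.vocab)
        return math.log2((self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v))

    def word_log2_prob(self, word: Sequence[str]) -> float:
        padded = [BOS, *word, EOS]
        return sum(self.log2_prob(p, c) for p, c in zip(padded, padded[1:]))


def bits_per_phoneme(model: BigramPhonotactics,
                     held_out: List[Sequence[str]]) -> float:
    """Average negative log2 probability per predicted symbol (phonemes + end marker)."""
    total_bits = -sum(model.word_log2_prob(w) for w in held_out)
    n_symbols = sum(len(w) + 1 for w in held_out)
    return total_bits / n_symbols


train = [list("pata"), list("tapa"), list("kapa"), list("pika")]
test = [list("papa"), list("taka")]
model = BigramPhonotactics(train)
print(round(bits_per_phoneme(model, test), 3))
```

A lower value means the model finds the held-out forms more predictable; comparing two models on the same held-out words is the kind of evaluation the abstract describes.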
References
Book
The Sound Pattern of English
Noam Chomsky, Morris Halle
TL;DR: Since this classic work in phonology was published in 1968, there has been no other book that gives as broad a view of the subject, combining generally applicable theoretical contributions with analysis of the details of a single language.
Journal Article
A Bayesian Analysis of Some Nonparametric Problems
TL;DR: In this article, a class of prior distributions, called Dirichlet process priors, is proposed for nonparametric problems; under these priors, many nonparametric statistical problems can be treated, yielding results comparable to those of the classical theory.
Report
A Constructive Definition of Dirichlet Priors
TL;DR: In this article, Dirichlet measures, a class of priors, are used for the distribution of a random variable X taking values in R^K, where K is the dimension of all probability measures on a large space.
Journal Article
The geometry of phonological features
TL;DR: The apparently vast number of speech sounds found in the world's languages turns out to consist of surface-level realisations of a limited number of combinations of a very small set of features (some twenty, in current analyses).
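The Dirichlet process priors of Ferguson and the constructive definition of Sethuraman, both listed above, underlie memoization schemes like the one described in the abstract. Sethuraman's construction is usually presented as stick-breaking: draw fractions from Beta(1, alpha), break each off the remaining stick to get a weight, and pair each weight with an atom drawn from the base measure. The sketch below truncates at a fixed number of sticks and assigns all leftover mass to the last one so the weights sum to 1; the truncation, the function name, and the toy base measure over three consonants are assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def stick_breaking(alpha: float, base: Callable[[], T], truncation: int,
                   rng: random.Random) -> List[Tuple[float, T]]:
    """Approximate a draw G ~ DP(alpha, H) by truncated stick-breaking.

    beta_k ~ Beta(1, alpha); pi_k = beta_k * prod_{j<k} (1 - beta_j);
    theta_k ~ H via `base`. The final stick takes all remaining mass.
    """
    atoms: List[Tuple[float, T]] = []
    remaining = 1.0  # length of stick not yet broken off
    for k in range(truncation):
        frac = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weight = remaining * frac
        atoms.append((weight, base()))
        remaining -= weight
    return atoms


rng = random.Random(0)
G = stick_breaking(2.0, lambda: rng.choice(["p", "t", "k"]), truncation=50, rng=rng)
# Sampling from G: atoms with large weights recur, so draws share values.
samples = rng.choices([a for _, a in G], weights=[w for w, _ in G], k=10)
print(samples)
```

Smaller alpha makes early sticks longer, concentrating mass on a few atoms; this is the same reuse behavior that the Chinese-restaurant view produces sequentially.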