Open Access journal article

A Generative Model of Phonotactics

TL;DR: The paper presents a probabilistic model of phonotactics (the set of well-formed phoneme sequences in a language) that robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model in all 14 languages tested.
Abstract
We present a probabilistic model of phonotactics, the set of well-formed phoneme sequences in a language. Unlike most computational models of phonotactics (Hayes and Wilson, 2008; Goldsmith and Riggle, 2012), we take a fully generative approach, modeling a process where forms are built up out of subparts by phonologically-informed structure building operations. We learn an inventory of subparts by applying stochastic memoization (Johnson et al., 2006; Goodman et al., 2008) to a generative process for phonemes structured as an and-or graph, based on concepts of feature hierarchy from generative phonology (Clements, 1985; Dresher, 2009). Subparts are combined in a way that allows tier-based feature interactions. We evaluate our models’ ability to capture phonotactic distributions in the lexicons of 14 languages drawn from the WOLEX corpus (Graff, 2012). Our full model robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model for all languages. We also present novel analyses that probe model behavior in more detail.
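
Stochastic memoization can be illustrated with a minimal sketch. Assuming a Dirichlet-process-style cache (a Chinese restaurant process), a memoized generator returns one of its earlier outputs with probability proportional to how often that output has been produced, and calls the underlying generator afresh with probability proportional to a concentration parameter alpha. The function name `dp_mem` and the toy consonant-plus-vowel "syllable" generator are hypothetical; this is not the paper's model, which memoizes nodes of an and-or graph over phonological features.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def dp_mem(base: Callable[[], T], alpha: float, rng: random.Random) -> Callable[[], T]:
    """Hypothetical Dirichlet-process memoizer (Chinese restaurant process).

    After n draws, a fresh value is generated by `base` with probability
    alpha / (n + alpha); otherwise an earlier draw is reused, chosen uniformly
    from the history, i.e. in proportion to how often each value was drawn.
    """
    history: List[T] = []

    def sample() -> T:
        n = len(history)
        if rng.random() < alpha / (n + alpha):
            value = base()               # new "table": call the base generator
        else:
            value = rng.choice(history)  # reuse a cached output
        history.append(value)
        return value

    return sample


# Toy base generator (hypothetical): a CV syllable from a tiny inventory.
rng = random.Random(0)
ONSETS = ["p", "t", "k", "s", "m"]
VOWELS = ["a", "i", "u"]


def new_syllable() -> str:
    return rng.choice(ONSETS) + rng.choice(VOWELS)


mem_syllable = dp_mem(new_syllable, alpha=1.0, rng=rng)
draws = [mem_syllable() for _ in range(20)]
print(draws)
```

In this sketch, subparts that have been generated often become ever more likely to be reused, which is the rich-get-richer behavior that lets a memoized process settle on a reusable inventory of subparts.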

Citations
Book chapter

Grundzüge der Phonologie

Journal article

Phonotactic Complexity and Its Trade-offs

TL;DR: Methods are presented for calculating a measure of phonotactic complexity, bits per phoneme, that permits a straightforward cross-linguistic comparison and gives insight into how complex a language’s phonotactics is.
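
Bits per phoneme is, in essence, the average negative log2 probability that a phonotactic model assigns to each phoneme of held-out words. A minimal sketch, assuming an add-one-smoothed phoneme bigram model (a much simpler model than those compared in either paper), a word-boundary symbol `#`, one character per phoneme, and counting the word-final boundary as a predicted symbol; the function names are hypothetical:

```python
import math
from collections import Counter, defaultdict

BOUNDARY = "#"  # hypothetical word-boundary symbol


def train_bigram(words):
    """Count phoneme bigrams, padding each word with boundary symbols."""
    counts = defaultdict(Counter)
    vocab = {BOUNDARY}
    for w in words:
        seq = [BOUNDARY] + list(w) + [BOUNDARY]
        vocab.update(w)
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts, vocab


def bits_per_phoneme(words, counts, vocab):
    """Average -log2 p(symbol | previous symbol) under add-one smoothing."""
    v = len(vocab)
    total_bits = 0.0
    n_symbols = 0
    for w in words:
        seq = [BOUNDARY] + list(w) + [BOUNDARY]
        for prev, cur in zip(seq, seq[1:]):
            row = counts[prev]
            p = (row[cur] + 1) / (sum(row.values()) + v)
            total_bits -= math.log2(p)
            n_symbols += 1
    return total_bits / n_symbols


counts, vocab = train_bigram(["ab"])
print(bits_per_phoneme(["ab"], counts, vocab))  # 1.0
print(bits_per_phoneme(["ba"], counts, vocab))  # 2.0
```

A word whose phoneme transitions the model has seen costs fewer bits per phoneme than one whose transitions are unattested, which is what makes the measure comparable across languages once the model class is fixed.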
Journal article

Miller's monkey updated: Communicative efficiency and the statistics of words in natural language

TL;DR: It is shown that lexicons resulting from the monkey model provide a better embodiment of communicative efficiency than the actual lexicon of English, and the updated monkey model provides a good fit for the growth trajectory of English as recorded in the Oxford English Dictionary.
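
The classic random-typing ("monkey") model that this paper updates can be sketched as follows: keystrokes are drawn independently, a space ends the current word, and word frequencies are tallied. The alphabet, the space probability, and the function name are hypothetical choices, and the updated model described in the cited paper is not reproduced here.

```python
import random
from collections import Counter


def monkey_text(n_keystrokes, alphabet, p_space, rng):
    """Random typing: each keystroke is a space with probability p_space,
    otherwise a letter drawn uniformly from `alphabet`."""
    chars = []
    for _ in range(n_keystrokes):
        if rng.random() < p_space:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))
    return "".join(chars)


rng = random.Random(42)
text = monkey_text(10_000, "abcde", p_space=0.2, rng=rng)
words = text.split()      # maximal runs of letters between spaces are "words"
freqs = Counter(words)
print(freqs.most_common(5))
```

Because every extra letter multiplies a word's probability by a constant factor below one, short words dominate the frequency list, which is the property such models are usually invoked to illustrate.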
Journal article

Why do human languages have homophones?

TL;DR: This article found that homophony in real languages is not directly selected for, but rather emerges as a natural consequence of other features of a language, such as phonotactics and distribution of word lengths.
References
Book

The Sound Pattern of English

Noam Chomsky, +1 more
TL;DR: Since this classic work in phonology was published in 1968, there has been no other book that gives as broad a view of the subject, combining generally applicable theoretical contributions with analysis of the details of a single language.
Journal article

A Bayesian Analysis of Some Nonparametric Problems

TL;DR: In this article, a class of prior distributions called Dirichlet process priors is proposed for nonparametric problems; under these priors, many nonparametric statistical problems can be treated in a Bayesian way, yielding results comparable to those of the classical theory.
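
The conjugacy that makes Dirichlet process priors tractable can be written down directly. In the common parameterization with a concentration parameter α > 0 and a base distribution G_0 (notation assumed here; the original paper folds both into a single finite measure), the posterior after n observations is again a Dirichlet process, with the posterior mean given in the last line:

```latex
G \sim \mathrm{DP}(\alpha, G_0), \qquad x_1, \dots, x_n \mid G \overset{\text{iid}}{\sim} G

G \mid x_1, \dots, x_n \sim \mathrm{DP}\!\left(\alpha + n,\; \frac{\alpha G_0 + \sum_{i=1}^{n} \delta_{x_i}}{\alpha + n}\right)

\mathbb{E}\left[G(A) \mid x_1, \dots, x_n\right] = \frac{\alpha G_0(A) + \#\{i : x_i \in A\}}{\alpha + n}
```

As α → 0 the posterior mean approaches the empirical distribution of the observations, which is one way to see why the Bayesian results line up with classical estimates.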
Report

A Constructive Definition of Dirichlet Priors

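TL;DR: In this article, a constructive definition is given for Dirichlet priors (Dirichlet measures), a class of priors previously used for the distribution of a random variable X taking values in R^k, k-dimensional Euclidean space; the construction also covers the case where X takes values in a general space, so that the unknown distribution ranges over an infinite-dimensional space of probability measures.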
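
The constructive definition is usually called the stick-breaking representation: a unit-length stick is repeatedly broken at Beta(1, α)-distributed fractions, and each broken-off piece becomes the weight of an atom drawn from the base distribution. A minimal sketch, assuming truncation to a fixed number of atoms (an approximation, since the true construction is infinite) and a hypothetical standard-normal base distribution; the function names are illustrative:

```python
import random
from typing import Callable, List, Tuple


def stick_breaking_weights(alpha: float, n_atoms: int,
                           rng: random.Random) -> Tuple[List[float], float]:
    """Truncated stick-breaking weights for a DP with concentration alpha.

    beta_k ~ Beta(1, alpha) and pi_k = beta_k * prod_{j<k} (1 - beta_j).
    Returns the first n_atoms weights and the stick mass not yet assigned.
    """
    weights: List[float] = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_atoms):
        beta = rng.betavariate(1.0, alpha)
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights, remaining


def sample_dp(alpha: float, base: Callable[[], float], n_atoms: int,
              rng: random.Random) -> Tuple[List[Tuple[float, float]], float]:
    """Approximate draw G = sum_k pi_k * delta(theta_k), with atoms theta_k ~ base."""
    weights, leftover = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = [base() for _ in range(n_atoms)]
    return list(zip(atoms, weights)), leftover


rng = random.Random(0)
G, leftover = sample_dp(2.0, lambda: rng.gauss(0.0, 1.0), n_atoms=25, rng=rng)
print(f"mass on first 25 atoms: {1.0 - leftover:.3f}")
```

Smaller values of alpha break off larger pieces early, concentrating the mass on a few atoms; this is the same reuse behavior that stochastic memoization exploits.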
Journal article

The geometry of phonological features

TL;DR: The apparently vast number of speech sounds found in the world’s languages turns out to reduce to surface-level realisations of a limited number of combinations drawn from a very small set of phonological features (some twenty or so, in current analyses).