Boersma, P., & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32(1), 45–86. https://doi.org/10.1162/002438901554586

Empirical Tests of the Gradual Learning Algorithm
Paul Boersma
Bruce Hayes
The Gradual Learning Algorithm (Boersma 1997) is a constraint-ranking algorithm for learning optimality-theoretic grammars. The purpose of this article is to assess the capabilities of the Gradual Learning Algorithm, particularly in comparison with the Constraint Demotion algorithm of Tesar and Smolensky (1993, 1996, 1998, 2000), which initiated the learnability research program for Optimality Theory. We argue that the Gradual Learning Algorithm has a number of special advantages: it can learn free variation, deal effectively with noisy learning data, and account for gradient well-formedness judgments. The case studies we examine involve Ilokano reduplication and metathesis, Finnish genitive plurals, and the distribution of English light and dark /l/.
Keywords: learnability, Optimality Theory, variation, Ilokano, Finnish
1 Introduction
Optimality Theory (Prince and Smolensky 1993) has made possible a new and fruitful approach
to the problem of phonological learning. If the language learner has access to an appropriate
inventory of constraints, then a complete grammar can be derived, provided an algorithm is
available that can rank the constraints on the basis of the input data. This possibility has led to
a line of research on ranking algorithms, originating with the work of Tesar and Smolensky (1993,
1996, 1998, 2000; Tesar 1995), who propose an algorithm called Constraint Demotion, reviewed
below. Other work on ranking algorithms includes Pulleyblank and Turkel 1995, 1996, 1998, to
appear, Broihier 1995, Hayes 1999, and Prince and Tesar 1999.
Our focus here is the Gradual Learning Algorithm, as developed by Boersma (1997, 1998,
to appear). This algorithm is in some respects a development of Tesar and Smolensky’s proposal:
it directly perturbs constraint rankings in response to language data, and, like most previously
proposed algorithms, it is error driven, in that it alters rankings only when the input data conflict
with its current ranking hypothesis. What is different about the Gradual Learning Algorithm is
the type of optimality-theoretic grammar it presupposes: rather than a set of discrete rankings, it
assumes a continuous scale of constraint strictness. Also, the grammar is regarded as stochastic: at every evaluation of the candidate set, a small noise component is temporarily added to the ranking value of each constraint, so that the grammar can produce variable outputs if some constraint rankings are close to each other.

* We would like to thank Arto Anttila, Chris Manning, and two anonymous LI reviewers for helpful input in the preparation of this article. Thanks also to Louis Pols, the University of Utrecht, and the UCLA Academic Senate for material assistance in making our joint work possible. The work of the first author was supported by a grant from the Netherlands Organization for Scientific Research.
The continuous ranking scale implies a different response to input data: rather than a wholesale reranking, the Gradual Learning Algorithm executes only small perturbations to the constraints' locations along the scale. We argue that this more conservative approach yields important advantages in three areas. First, the Gradual Learning Algorithm can fluently handle optionality; it readily forms grammars that can generate multiple outputs. Second, the algorithm is robust, in the sense that speech errors occurring in the input data do not lead it off course. Third, the algorithm is capable of developing formal analyses of linguistic phenomena in which speakers' judgments involve intermediate well-formedness.
A paradoxical aspect of the Gradual Learning Algorithm is that, even though it is statistical
and gradient in character, most of the constraint rankings it learns are (for all practical purposes)
categorical. These categorical rankings emerge as the limit of gradual learning. Categorical rank-
ings are of course crucial for learning data patterns where there is no optionality.
Learning algorithms can be assessed on both theoretical and empirical grounds. At the purely
theoretical level, we want to know if an algorithm can be guaranteed to learn all grammars that
possess the formal properties it presupposes. Research results on this question as it concerns the
Gradual Learning Algorithm are reported in Boersma 1997, 1998, to appear. On the empirical
side, we need to show that natural languages are indeed appropriately analyzed with grammars
of the formal type the algorithm can learn.
This article focuses on the second of these tasks. We confront the Gradual Learning Algorithm
with a variety of representative phonological phenomena, in order to assess its capabilities in
various ways. This approach reflects our belief that learning algorithms can be tested just like
other proposals in linguistic theory, by checking them out against language data.
A number of our data examples are taken from the work of the second author, who arrived
independently at the notion of a continuous ranking scale, and has with colleagues developed a
number of hand-crafted grammars that work on this basis (Hayes and MacEachern 1998; Hayes,
to appear).
We begin by reviewing how the Gradual Learning Algorithm works, then present several
empirical applications. A study of Ilokano phonology shows how the algorithm copes with data
involving systematic optionality. We also use a restricted subset of the Ilokano data to simulate
the response of the algorithm to speech errors. In both cases, we make comparisons with the
behavior of Constraint Demotion. Next, we turn to the study of output frequencies, posed as an
additional, stringent empirical test of the Gradual Learning Algorithm. We use the algorithm to
replicate the study of Anttila (1997a,b) on Finnish genitive plurals. Finally, we consider gradient
well-formedness, showing that the algorithm can replicate the results on English /l/ derived with
a hand-crafted grammar by Hayes (to appear).

2 How the Gradual Learning Algorithm Works
Two concepts crucial to the Gradual Learning Algorithm are the continuous ranking scale and stochastic candidate evaluation. We cover these first, then turn to the internal workings of the algorithm.
2.1 The Continuous Ranking Scale
The Gradual Learning Algorithm presupposes a linear scale of constraint strictness, in which
higher values correspond to higher-ranked constraints. The scale is arranged in arbitrary units
and in principle has no upper or lower bound. Other work that has suggested or adopted a
continuous scale includes that of Liberman (1993:21, cited in Reynolds 1994), Zubritskaya (1997:142–144), Hayes and MacEachern (1998), and Hayes (to appear).
Continuous scales include strict constraint ranking as a special case. For instance, the scale depicted graphically in (1) illustrates the straightforward nonvariable ranking C1 >> C2 >> C3.
(1) Categorical ranking of constraints (C) along a continuous scale
[Figure: C1, C2, and C3 marked as points on a scale running from strict (high-ranked) to lax (low-ranked).]
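As a purely illustrative aside (not from the article), a grammar of this kind can be represented as nothing more than a mapping from constraint names to real-valued ranking values on the arbitrary strictness scale. The particular numbers below are assumptions chosen so that C1 sits well above C2, while C2 and C3 lie close together, as in (1).

```python
# Illustrative ranking values on the arbitrary, unbounded strictness scale:
# larger numbers mean stricter (higher-ranked). The numbers are assumed for
# illustration only; their spacing mirrors (1), with C2 and C3 close together.
ranking_values = {"C1": 100.0, "C2": 90.0, "C3": 88.0}
```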
2.2 How Stochastic Evaluation Generates Variation
The continuous scale becomes more meaningful when differences in distance have observable consequences—for example, if the short distance between C2 and C3 in (1) tells us that the relative ranking of this constraint pair is less fixed than that of C1 and C2. We suggest that in the process of speaking (i.e., at evaluation time, when the candidates in a tableau have to be evaluated in order to determine a winner), the position of each constraint is temporarily perturbed by a random positive or negative value. In this way, the constraints act as if they are associated with ranges of values, instead of single points. We will call the value used at evaluation time a selection point. The value more permanently associated with the constraint (i.e., the center of the range) will be called the ranking value.
Here there are two main possibilities. If the ranges covered by the selection points do not overlap, the ranking scale again merely recapitulates ordinary categorical ranking.
(2) Categorical ranking with ranges
[Figure: nonoverlapping ranges for C1 and C2 on the strict–lax scale.]
But if the ranges overlap, ranking will be free (variable).

(3) Free ranking
[Figure: overlapping ranges for C2 and C3 on the strict–lax scale.]
The reason is that, at evaluation time, it is possible to choose the selection points from anywhere within the ranges of the two constraints. In (3), this would most often result in C2 outranking C3, but if the selection points are taken from the upper part of C3's range, and the lower part of C2's, then C3 would outrank C2. The two possibilities are shown in (4); •2 and •3 depict the selection points for C2 and C3.
(4) a. Common result: C2 >> C3
[Figure: overlapping ranges for C2 and C3 on the strict–lax scale, with selection point •2 stricter than •3.]
b. Rare result: C3 >> C2
[Figure: the same overlapping ranges, with selection point •3 stricter than •2.]
When one sorts all the constraints in the grammar by their selection points, one obtains a total ranking to be employed for a particular evaluation time. With this total ranking, the ordinary competition of candidates (supplied by the Gen function of Optimality Theory) takes place and determines the winning output candidate.¹
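A minimal sketch, in Python, of what one such evaluation might look like: Gaussian noise is added to each ranking value to yield a selection point, the constraints are sorted by selection point, and the ordinary OT competition is decided from hand-supplied violation counts standing in for Gen. The constraint names, ranking values, candidates, and noise magnitude are illustrative assumptions, not material from the article.

```python
import random

def stochastic_eval(ranking_values, candidates, noise_sd=2.0):
    """One evaluation time: perturb each ranking value by Gaussian noise to get
    a selection point, sort constraints from strict to lax, and return the
    candidate that wins the ordinary OT competition."""
    # Selection point = ranking value + temporary noise (one fresh draw per constraint).
    selection = {c: r + random.gauss(0.0, noise_sd) for c, r in ranking_values.items()}
    # Total ranking for this evaluation: higher selection point = stricter.
    order = sorted(selection, key=selection.get, reverse=True)
    # A candidate's violation profile, read off in strictness order;
    # lexicographic comparison of profiles implements constraint domination.
    def profile(cand):
        return tuple(candidates[cand].get(c, 0) for c in order)
    return min(candidates, key=profile)

# Illustrative grammar: C2 and C3 are closely ranked, so outputs can vary.
ranking_values = {"C1": 100.0, "C2": 90.0, "C3": 88.0}
candidates = {
    "output-a": {"C2": 1},   # violates C2 once
    "output-b": {"C3": 1},   # violates C3 once
}
print(stochastic_eval(ranking_values, candidates))  # usually "output-b" with these values
```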
The above description covers how the system in (4) behaves at one single evaluation time. Over a longer sequence of evaluations, the overlapping ranges often yield an important observable effect: for forms in which C2 >> C3 yields a different output than C3 >> C2, one observes free variation, that is, multiple outputs for a single underlying form.
To implement these ideas more precisely, we interpret the constraint ranges as probability distributions (Boersma 1997, 1998, Hayes and MacEachern 1998). For each constraint, we assume a function that specifies the probability that the selection point will occur at any given distance above or below the constraint's ranking value at evaluation time. By using probability distributions, one can not only enumerate the set of outputs generated by a grammar, but also make predictions about their relative frequencies, a matter that will turn out to be important below.
Many noisy events in the real world occur with probabilities that are appropriately described with a normal (= Gaussian) distribution. A normal distribution has a single peak in the center, which means that values around the center are most probable, and declines gently but swiftly
¹ The mechanism for determining the winning output in Optimality Theory, with Gen and a ranked constraint set, will not be reviewed here. For background, see Prince and Smolensky's original work (1993) or textbooks such as Archangeli and Langendoen 1997 and Kager 1999a.
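To make the frequency predictions concrete under the normal-noise assumption just described: if each constraint's selection point is its ranking value plus an independent normal deviate with standard deviation σ, then the chance that a normally lower-ranked constraint C3 outranks C2 at a given evaluation time follows from the distribution of the difference of two normals (a standard probability fact, not a formula taken from the article):

$$
\Pr(C_3 \gg C_2) \;=\; \Pr\bigl(r_3 + \varepsilon_3 > r_2 + \varepsilon_2\bigr)
\;=\; \Phi\!\left(\frac{r_3 - r_2}{\sigma\sqrt{2}}\right),
\qquad \varepsilon_2,\varepsilon_3 \sim \mathcal{N}(0,\sigma^{2}),
$$

where $\Phi$ is the standard normal cumulative distribution function. For instance, with ranking values two units apart ($r_2 - r_3 = 2$) and $\sigma = 2$, the reversal probability is $\Phi(-1/\sqrt{2}) \approx 0.24$, matching the roughly 3:1 split in the illustrative simulation above.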
