Boersma, P., & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32(1), 45–86. https://doi.org/10.1162/002438901554586

Empirical Tests of the Gradual Learning Algorithm
Paul Boersma
Bruce Hayes
The Gradual Learning Algorithm (Boersma 1997) is a constraint-ranking algorithm for learning optimality-theoretic grammars. The purpose of this article is to assess the capabilities of the Gradual Learning Algorithm, particularly in comparison with the Constraint Demotion algorithm of Tesar and Smolensky (1993, 1996, 1998, 2000), which initiated the learnability research program for Optimality Theory. We argue that the Gradual Learning Algorithm has a number of special advantages: it can learn free variation, deal effectively with noisy learning data, and account for gradient well-formedness judgments. The case studies we examine involve Ilokano reduplication and metathesis, Finnish genitive plurals, and the distribution of English light and dark /l/.
Keywords: learnability, Optimality Theory, variation, Ilokano, Finnish
1 Introduction
Optimality Theory (Prince and Smolensky 1993) has made possible a new and fruitful approach
to the problem of phonological learning. If the language learner has access to an appropriate
inventory of constraints, then a complete grammar can be derived, provided an algorithm is
available that can rank the constraints on the basis of the input data. This possibility has led to
a line of research on ranking algorithms, originating with the work of Tesar and Smolensky (1993,
1996, 1998, 2000; Tesar 1995), who propose an algorithm called Constraint Demotion, reviewed
below. Other work on ranking algorithms includes Pulleyblank and Turkel 1995, 1996, 1998, to
appear, Broihier 1995, Hayes 1999, and Prince and Tesar 1999.
Our focus here is the Gradual Learning Algorithm, as developed by Boersma (1997, 1998,
to appear). This algorithm is in some respects a development of Tesar and Smolensky’s proposal:
it directly perturbs constraint rankings in response to language data, and, like most previously
proposed algorithms, it is error driven, in that it alters rankings only when the input data conflict
with its current ranking hypothesis. What is different about the Gradual Learning Algorithm is
the type of optimality-theoretic grammar it presupposes: rather than a set of discrete rankings, it
assumes a continuous scale of constraint strictness. Also, the grammar is regarded as stochastic: at every evaluation of the candidate set, a small noise component is temporarily added to the ranking value of each constraint, so that the grammar can produce variable outputs if some constraint rankings are close to each other.

* We would like to thank Arto Anttila, Chris Manning, and two anonymous LI reviewers for helpful input in the preparation of this article. Thanks also to Louis Pols, the University of Utrecht, and the UCLA Academic Senate for material assistance in making our joint work possible. The work of the first author was supported by a grant from the Netherlands Organization for Scientific Research.
The continuous ranking scale implies a different response to input data: rather than a wholesale reranking, the Gradual Learning Algorithm executes only small perturbations to the constraints' locations along the scale. We argue that this more conservative approach yields important advantages in three areas. First, the Gradual Learning Algorithm can fluently handle optionality; it readily forms grammars that can generate multiple outputs. Second, the algorithm is robust, in the sense that speech errors occurring in the input data do not lead it off course. Third, the algorithm is capable of developing formal analyses of linguistic phenomena in which speakers' judgments involve intermediate well-formedness.
A paradoxical aspect of the Gradual Learning Algorithm is that, even though it is statistical
and gradient in character, most of the constraint rankings it learns are (for all practical purposes)
categorical. These categorical rankings emerge as the limit of gradual learning. Categorical rank-
ings are of course crucial for learning data patterns where there is no optionality.
Learning algorithms can be assessed on both theoretical and empirical grounds. At the purely
theoretical level, we want to know if an algorithm can be guaranteed to learn all grammars that
possess the formal properties it presupposes. Research results on this question as it concerns the
Gradual Learning Algorithm are reported in Boersma 1997, 1998, to appear. On the empirical
side, we need to show that natural languages are indeed appropriately analyzed with grammars
of the formal type the algorithm can learn.
This article focuses on the second of these tasks. We confront the Gradual Learning Algorithm
with a variety of representative phonological phenomena, in order to assess its capabilities in
various ways. This approach reflects our belief that learning algorithms can be tested just like
other proposals in linguistic theory, by checking them out against language data.
A number of our data examples are taken from the work of the second author, who arrived
independently at the notion of a continuous ranking scale, and has with colleagues developed a
number of hand-crafted grammars that work on this basis (Hayes and MacEachern 1998; Hayes,
to appear).
We begin by reviewing how the Gradual Learning Algorithm works, then present several
empirical applications. A study of Ilokano phonology shows how the algorithm copes with data
involving systematic optionality. We also use a restricted subset of the Ilokano data to simulate
the response of the algorithm to speech errors. In both cases, we make comparisons with the
behavior of Constraint Demotion. Next, we turn to the study of output frequencies, posed as an
additional, stringent empirical test of the Gradual Learning Algorithm. We use the algorithm to
replicate the study of Anttila (1997a,b) on Finnish genitive plurals. Finally, we consider gradient
well-formedness, showing that the algorithm can replicate the results on English /l/ derived with
a hand-crafted grammar by Hayes (to appear).

2 How the Gradual Learning Algorithm Works
Two concepts crucial to the Gradual Learning Algorithm are the continuous ranking scale and stochastic candidate evaluation. We cover these first, then turn to the internal workings of the algorithm.
2.1 The Continuous Ranking Scale
The Gradual Learning Algorithm presupposes a linear scale of constraint strictness, in which
higher values correspond to higher-ranked constraints. The scale is arranged in arbitrary units
and in principle has no upper or lower bound. Other work that has suggested or adopted a
continuous scale includes that of Liberman (1993:21, cited in Reynolds 1994), Zubritskaya (1997:142–144), Hayes and MacEachern (1998), and Hayes (to appear).
Continuous scales include strict constraint ranking as a special case. For instance, the scale depicted graphically in (1) illustrates the straightforward nonvariable ranking C1 >> C2 >> C3.
(1) Categorical ranking of constraints (C) along a continuous scale
[Figure: C1, C2, and C3 marked as points on a scale running from strict (high-ranked) to lax (low-ranked).]
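As a purely illustrative aside (not from the article), a grammar of this kind can be represented as nothing more than a mapping from constraint names to real-valued ranking values on the arbitrary strictness scale. The particular numbers below are assumptions chosen so that C1 sits well above C2, while C2 and C3 lie close together, as in (1).

```python
# Illustrative ranking values on the arbitrary, unbounded strictness scale:
# larger numbers mean stricter (higher-ranked). The numbers are assumed for
# illustration only; their spacing mirrors (1), with C2 and C3 close together.
ranking_values = {"C1": 100.0, "C2": 90.0, "C3": 88.0}
```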
2.2 How Stochastic Evaluation Generates Variation
The continuous scale becomes more meaningful when differences in distance have observable consequences—for example, if the short distance between C2 and C3 in (1) tells us that the relative ranking of this constraint pair is less fixed than that of C1 and C2. We suggest that in the process of speaking (i.e., at evaluation time, when the candidates in a tableau have to be evaluated in order to determine a winner), the position of each constraint is temporarily perturbed by a random positive or negative value. In this way, the constraints act as if they are associated with ranges of values, instead of single points. We will call the value used at evaluation time a selection point. The value more permanently associated with the constraint (i.e., the center of the range) will be called the ranking value.
Here there are two main possibilities. If the ranges covered by the selection points do not overlap, the ranking scale again merely recapitulates ordinary categorical ranking.
(2) Categorical ranking with ranges
[Figure: nonoverlapping ranges for C1 and C2 on the strict–lax scale.]
But if the ranges overlap, ranking will be free (variable).

(3) Free ranking
[Figure: overlapping ranges for C2 and C3 on the strict–lax scale.]
The reason is that, at evaluation time, it is possible to choose the selection points from anywhere within the ranges of the two constraints. In (3), this would most often result in C2 outranking C3, but if the selection points are taken from the upper part of C3's range, and the lower part of C2's, then C3 would outrank C2. The two possibilities are shown in (4); •2 and •3 depict the selection points for C2 and C3.
(4) a. Common result: C2 >> C3
[Figure: overlapping ranges for C2 and C3 on the strict–lax scale, with selection point •2 stricter than •3.]
b. Rare result: C3 >> C2
[Figure: the same overlapping ranges, with selection point •3 stricter than •2.]
When one sorts all the constraints in the grammar by their selection points, one obtains a total ranking to be employed for a particular evaluation time. With this total ranking, the ordinary competition of candidates (supplied by the Gen function of Optimality Theory) takes place and determines the winning output candidate.¹
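A minimal sketch, in Python, of what one such evaluation might look like: Gaussian noise is added to each ranking value to yield a selection point, the constraints are sorted by selection point, and the ordinary OT competition is decided from hand-supplied violation counts standing in for Gen. The constraint names, ranking values, candidates, and noise magnitude are illustrative assumptions, not material from the article.

```python
import random

def stochastic_eval(ranking_values, candidates, noise_sd=2.0):
    """One evaluation time: perturb each ranking value by Gaussian noise to get
    a selection point, sort constraints from strict to lax, and return the
    candidate that wins the ordinary OT competition."""
    # Selection point = ranking value + temporary noise (one fresh draw per constraint).
    selection = {c: r + random.gauss(0.0, noise_sd) for c, r in ranking_values.items()}
    # Total ranking for this evaluation: higher selection point = stricter.
    order = sorted(selection, key=selection.get, reverse=True)
    # A candidate's violation profile, read off in strictness order;
    # lexicographic comparison of profiles implements constraint domination.
    def profile(cand):
        return tuple(candidates[cand].get(c, 0) for c in order)
    return min(candidates, key=profile)

# Illustrative grammar: C2 and C3 are closely ranked, so outputs can vary.
ranking_values = {"C1": 100.0, "C2": 90.0, "C3": 88.0}
candidates = {
    "output-a": {"C2": 1},   # violates C2 once
    "output-b": {"C3": 1},   # violates C3 once
}
print(stochastic_eval(ranking_values, candidates))  # usually "output-b" with these values
```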
The above description covers how the system in (4) behaves at one single evaluation time. Over a longer sequence of evaluations, the overlapping ranges often yield an important observable effect: for forms in which C2 >> C3 yields a different output than C3 >> C2, one observes free variation, that is, multiple outputs for a single underlying form.
To implement these ideas more precisely, we interpret the constraint ranges as probability distributions (Boersma 1997, 1998, Hayes and MacEachern 1998). For each constraint, we assume a function that specifies the probability that the selection point will occur at any given distance above or below the constraint's ranking value at evaluation time. By using probability distributions, one can not only enumerate the set of outputs generated by a grammar, but also make predictions about their relative frequencies, a matter that will turn out to be important below.
Many noisy events in the real world occur with probabilities that are appropriately described with a normal (= Gaussian) distribution. A normal distribution has a single peak in the center, which means that values around the center are most probable, and declines gently but swiftly
¹ The mechanism for determining the winning output in Optimality Theory, with Gen and a ranked constraint set, will not be reviewed here. For background, see Prince and Smolensky's original work (1993) or textbooks such as Archangeli and Langendoen 1997 and Kager 1999a.
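To make the frequency predictions concrete under the normal-noise assumption just described: if each constraint's selection point is its ranking value plus an independent normal deviate with standard deviation σ, then the chance that a normally lower-ranked constraint C3 outranks C2 at a given evaluation time follows from the distribution of the difference of two normals (a standard probability fact, not a formula taken from the article):

$$
\Pr(C_3 \gg C_2) \;=\; \Pr\bigl(r_3 + \varepsilon_3 > r_2 + \varepsilon_2\bigr)
\;=\; \Phi\!\left(\frac{r_3 - r_2}{\sigma\sqrt{2}}\right),
\qquad \varepsilon_2,\varepsilon_3 \sim \mathcal{N}(0,\sigma^{2}),
$$

where $\Phi$ is the standard normal cumulative distribution function. For instance, with ranking values two units apart ($r_2 - r_3 = 2$) and $\sigma = 2$, the reversal probability is $\Phi(-1/\sqrt{2}) \approx 0.24$, matching the roughly 3:1 split in the illustrative simulation above.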
