
Chapter 1
BILEXICAL GRAMMARS AND THEIR
CUBIC-TIME PARSING ALGORITHMS
Jason Eisner
Dept. of Computer Science, University of Rochester
P.O. Box 270226
Rochester, NY 14627-0226 U.S.A.
jason@cs.rochester.edu
 !"#%$'&()+*-,
New Developments in
Natural Language Parsing
.
/102223
#45&6
.
&7
.8
9#(4"&(:
; <
&=>?6"@(1A&A B6CC??&D(#E"#,F(
.
"A&GAE&(:IH
Abstract This chapter introduces weighted bilexical grammars, a formalism in which in-
dividual lexical items, such as verbs and their arguments, can have idiosyncratic
selectional influences on each other. Such ‘bilexicalism’ has been a theme of
much current work in parsing. The new formalism can be used to describe bilex-
ical approaches to both dependency and phrase-structure grammars, and a slight
modification yields link grammars. Its scoring approach is compatible with a
wide variety of probability models.
The obvious parsing algorithm for bilexical grammars (used by most previous
authors) takes time O(n^5). A more efficient O(n^3) method is exhibited. The
new algorithm has been implemented and used in a large parsing experiment
(Eisner, 1996b). We also give a useful extension to the case where the parser
must undo a stochastic transduction that has altered the input.
1. INTRODUCTION
1.1 THE BILEXICAL IDEA
Lexicalized Grammars. Computational linguistics has a long tradition of
lexicalized grammars, in which each grammatical rule is specialized for some
individual word. The earliest lexicalized rules were word-specific subcate-
gorization frames. It is now common to find fully lexicalized versions of
many grammatical formalisms, such as context-free and tree-adjoining gram-
mars (Schabes et al., 1988). Other formalisms, such as dependency grammar
* This material is based on work supported by an NSF Graduate Research Fellowship and ARPA Grant
N6600194-C-6043 ‘Human Language Technology’ to the University of Pennsylvania.
(Mel’čuk, 1988) and head-driven phrase-structure grammar (Pollard and Sag,
1994), are explicitly lexical from the start.
Lexicalized grammars have two well-known advantages. When syntactic
acceptability is sensitive to the quirks of individual words, lexicalized rules are
necessary for linguistic description. Lexicalized rules are also computationally
cheap for parsing written text: a parser may ignore those rules that do not
mention any input words.
Probabilities and the New Bilexicalism. More recently, a third advantage
of lexicalized grammars has emerged. Even when syntactic acceptability is not
sensitive to the particular words chosen, syntactic distribution may be (Resnik,
1993). Certain words may be able but highly unlikely to modify certain other
words. Of course, only some such collocational facts are genuinely lexical (the
storm gathered/*convened); others are presumably a weak reflex of semantics
or world knowledge (solve puzzles/??goats). But both kinds can be captured
by a probabilistic lexicalized grammar, where they may be used to resolve
ambiguity in favor of the most probable analysis, and also to speed parsing
by avoiding (‘pruning’) unlikely search paths. Accuracy and efficiency can
therefore both benefit.
Work along these lines includes (Charniak, 1995; Collins, 1996; Eisner,
1996a; Charniak, 1997; Collins, 1997; Goodman, 1997), who reported state-
of-the-art parsing accuracy. Related models are proposed without evaluation in
(Lafferty et al., 1992; Alshawi, 1996).
This flurry of probabilistic lexicalized parsers has focused on what one might
call bilexical grammars, in which each grammatical rule is specialized for
not one but two individual words.
The central insight is that specific words
subcategorize to some degree for other specific words: tax is a good object for
the verb raise. These parsers accordingly estimate, for example, the probability
that word w is modified by (a phrase headed by) word v, for each pair of words
w, v in the vocabulary.
1.2 AVOIDING THE COST OF BILEXICALISM
Past Work. At first blush, bilexical grammars (whether probabilistic or not)
appear to carry a substantial computational penalty. We will see that parsers
derived directly from CKY or Earley’s algorithm take time O(n^3 min(n, |V|)^2)
for a sentence of length n and a vocabulary of |V| terminal symbols. In practice
n ≪ |V|, so this amounts to O(n^5). Such algorithms implicitly or explicitly
regard the grammar as a context-free grammar in which a noun phrase headed
by tiger bears the special nonterminal NP_tiger. These O(n^5) algorithms are used
by (Charniak, 1995; Alshawi, 1996; Charniak, 1997; Collins, 1996; Collins,
1997) and subsequent authors.
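To see where the O(n^5) comes from, here is the standard back-of-the-envelope count (our gloss, not quoted from the chapter): a CKY item for such a lexicalized grammar must record a span together with its head position, and each combination step fixes five sentence positions.

```latex
% Items [i, j, h]: a constituent spanning words i..j with head at position h.
% Combining [i, k, h_1] and [k, j, h_2] yields [i, j, h], with h \in \{h_1, h_2\}:
\underbrace{i,\; k,\; j}_{\text{span ends and split point: } O(n^3)}
\;\times\;
\underbrace{h_1 \in [i,k],\; h_2 \in (k,j]}_{\text{subspan heads: } O(n^2)}
\;\Longrightarrow\; O(n^5)\text{ combinations.}
```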
Speeding Things Up. The present chapter formalizes a particular notion of
bilexical grammars, and shows that a length-n sentence can be parsed in time
only O(n^3 g^3 t), where g and t are bounded by the grammar and are typically
small. (t is the maximum number of senses per input word, while g measures the
degree of interdependence that the grammar allows among the several lexical
modifiers of a word.) The new algorithm also reduces space requirements to
O(n^2), from the cubic space required by CKY-style approaches to bilexical
grammar. The parsing algorithm finds the highest-scoring analysis or analyses
generated by the grammar, under a probabilistic or other measure.
The new O(n^3)-time algorithm has been implemented, and was used in the
experimental work of (Eisner, 1996b; Eisner, 1996a), which compared various
bilexical probability models. The algorithm also applies to the Treebank Gram-
mars of (Charniak, 1995). Furthermore, it applies to the head-automaton gram-
mars (HAGs) of (Alshawi, 1996) and the phrase-structure models of (Collins,
1996; Collins, 1997), allowing O(n^3)-time rather than O(n^5)-time parsing,
granted the (linguistically sensible) restrictions that the number of distinct X-
bar levels is bounded and that left and right adjuncts are independent of each
other.
1.3 ORGANIZATION OF THE CHAPTER
This chapter is organized as follows:
First we will develop the ideas discussed above. §2 presents a simple formalization
of bilexical grammar, and then §3 explains why the naive recognition
algorithm is O(n^5) and how to reduce it to O(n^3).
Next, §4 offers some extensions to the basic formalism. §4.1 extends it to
weighted (probabilistic) grammars, and shows how to find the best parse of the
input. §4.2 explains how to handle and disambiguate polysemous words. §4.3
shows how to exclude or penalize string-local configurations. §4.4 handles the
more general case where the input is an arbitrary rational transduction of the
“underlying” string to be parsed.
§5 carefully connects the bilexical grammar formalism of this chapter to
other bilexical formalisms such as dependency, context-free, head-automaton,
and link grammars. In particular, we apply the fast parsing idea to these formalisms.
The conclusions in §6 summarize the result and place it in the context of
other work by the author, including a recent asymptotic improvement.
2. A SIMPLE BILEXICAL FORMALISM
The bilexical formalism developed in this chapter is modeled on dependency
grammar (Gaifman, 1965; Mel’čuk, 1988). It is equivalent to the class of split
bilexical grammars (including split bilexical CFGs and split HAGs) defined
in (Eisner and Satta, 1999). More powerful bilexical formalisms also exist, and
improved parsing algorithms for these are cited in §5.6 and §5.8.
Form of the Grammar. We begin with a simple version of the formalism,
to be modified later in the chapter. A [split] unweighted bilexical grammar
consists of the following elements:
A set V of words, called the (terminal) vocabulary, which contains a
distinguished symbol ROOT.
For each word w ∈ V, a pair of deterministic finite-state automata ℓ_w
and r_w. Each automaton accepts some regular subset of V*.
g is defined to be an upper bound on the number of states in any single
automaton. (t will be defined in §4.2 as an upper bound on lexical ambiguity.)
The dependents of word w are the headwords of its arguments and adjuncts.
Speaking intuitively, automaton ℓ_w specifies the possible sequences of
left dependents for w. So these allowable sequences, which are word strings in
V*, form a regular set. Similarly r_w specifies the possible sequences of right
dependents for w.
By convention, the first element in such a sequence is closest to w in the
surface string. Thus, the possible dependent sequences (from left to right) are
specified by the reversal of ℓ_w’s language and by r_w’s language, respectively.
For example, if the tree shown in Figure 1.1a is grammatical, then we know that
(for the appropriate head word w in that figure) ℓ_w accepts the, and r_w
accepts of raise.
To get fast parsing, it is reasonable to ask that the automata individually have
few states (i.e., that g be small). However, we wish to avoid any penalty for
having

many (distinct) automata—two per word in V;

many arcs leaving an automaton state—one per possible dependent in V.

That is, the vocabulary size |V| should not affect performance at all.
We will use Q(ℓ_w) and Q(r_w) to denote the state sets of ℓ_w and r_w
respectively; I(ℓ_w) and I(r_w) to denote their initial states; and predicate
final(q) to mean that q is a final state of its automaton. The transition functions
may be notated as a single pair of functions ℓ and r, where ℓ(q, w′) returns the
state reached by ℓ_w when it leaves state q on an arc labeled w′, and similarly
r(q, w′).
Notice that as an implementation matter, if the automata are defined in any
systematic way, it is not necessary to actually store them in order to represent
the grammar. One only needs to choose an appropriate representation for states
and define the ℓ, r, I, and final functions.
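For illustration, here is a minimal sketch of such a function-based representation (Python; the class and all names are ours, not the chapter’s). The parser only ever asks for initial states, transitions, and finality, so the automata themselves are never materialized:

```python
from typing import Callable, Hashable, Optional

State = Hashable  # states may be any hashable values, e.g. small integers

class BilexicalGrammar:
    """A split bilexical grammar given purely by functions (illustrative)."""

    def __init__(
        self,
        init_left: Callable[[str], State],    # initial state of l_w
        init_right: Callable[[str], State],   # initial state of r_w
        step_left: Callable[[str, State, str], Optional[State]],
        step_right: Callable[[str, State, str], Optional[State]],
        is_final: Callable[[State], bool],    # the `final' predicate
    ):
        # step_left(w, q, d) is the state that l_w reaches from state q on
        # an arc labeled with dependent word d, or None if no such arc
        # exists (symmetrically for step_right).
        self.init_left = init_left
        self.init_right = init_right
        self.step_left = step_left
        self.step_right = step_right
        self.is_final = is_final

# A trivial instance: every word may take any dependents in any order.
trivial = BilexicalGrammar(
    init_left=lambda w: 0,
    init_right=lambda w: 0,
    step_left=lambda w, q, d: 0,
    step_right=lambda w, q, d: 0,
    is_final=lambda q: True,
)
```

A parser written against this interface never enumerates the vocabulary: it only follows arcs labeled by words that actually occur in the input.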
Meaning of the Grammar. We now formally define the language generated
by such a grammar, and the structures that the grammar assigns to sentences of
this language.
Let a dependency tree be a rooted tree whose nodes (both internal and
external) are labeled with words from V, as illustrated in Figure 1.1a; the root
is labeled with the special symbol ROOT. The children (‘dependents’) of
a node are ordered with respect to each other and the node itself, so that the
node has both left children that precede it and right children that follow it.
A dependency tree is grammatical iff for every word token w that appears
in the tree, ℓ_w accepts the (possibly empty) sequence of w’s left children (from
right to left), and r_w accepts the sequence of w’s right children (from left to
right).
A string s ∈ V* is generated by the grammar, with analysis T, if T is a
grammatical dependency tree and listing the node labels of T in infix order
yields the string s followed by ROOT. s is called the yield of T.
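This definition translates directly into a recursive check. The sketch below is our illustration (reusing the hypothetical BilexicalGrammar interface from above): each node’s left children are fed to ℓ_w from the closest child outward, i.e. from right to left, and its right children to r_w from left to right:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    left: List["Node"] = field(default_factory=list)   # left children, surface order
    right: List["Node"] = field(default_factory=list)  # right children, surface order

def grammatical(g: BilexicalGrammar, node: Node) -> bool:
    """True iff every word token's children are accepted by its automata."""
    q = g.init_left(node.word)
    for child in reversed(node.left):        # closest left child first
        q = g.step_left(node.word, q, child.word)
        if q is None:
            return False
    if not g.is_final(q):
        return False
    q = g.init_right(node.word)
    for child in node.right:                 # closest right child first
        q = g.step_right(node.word, q, child.word)
        if q is None:
            return False
    if not g.is_final(q):
        return False
    return all(grammatical(g, c) for c in node.left + node.right)
```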
Bilexicalism. The term bilexical refers to the fact that (i) each word w may
specify a wholly different choice of automata ℓ_w and r_w, and furthermore (ii)
these automata ℓ_w and r_w may make distinctions among individual words that
are appropriate to serve as children (dependents) of w. Thus the grammar is
sensitive to specific pairs of lexical items.
For example, it is possible for one lexical verb to select for a completely
idiosyncratic set of nouns as subject, and another lexical verb to select for an
entirely different set of nouns. Since it never requires more than a two-state
automaton (though with many arcs!) to specify the set of possible subjects
for a verb, there is no penalty for such behavior in the parsing algorithm to be
described here.
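For concreteness, such a left automaton might be encoded as follows (a toy of our own devising, echoing the gathered/*convened example above). Enlarging the subject set adds arcs but never states, so g stays at 2:

```python
# Toy left automaton l_w for a verb that takes exactly one subject drawn
# from an idiosyncratic set. Only two states are ever needed; a larger
# subject set just means more arcs out of the first state.
SUBJECTS = {"convene": {"committee", "panel", "board"}}  # invented data

def init_left(w: str):
    return "need_subject" if w in SUBJECTS else "done"

def step_left(w: str, q, d: str):
    if q == "need_subject" and d in SUBJECTS.get(w, ()):
        return "done"      # subject consumed; no further left dependents
    return None            # no such arc: this dependent sequence is rejected

def is_final(q) -> bool:
    return q == "done"
```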
3. O(n^5) AND O(n^3) RECOGNITION
This section develops a basic O(n^3) recognition method for simple bilexical
grammars as defined above. We begin with a naive O(n^5) method drawn from
context-free ‘dotted-rule’ methods such as (Earley, 1970; Graham et al., 1980).
Second, we will see why this method is inefficient. Finally, a more efficient
O(n^3) algorithm is presented.
Both methods are essentially chart parsers, in that they use dynamic pro-
gramming to build up an analysis of the whole sentence from analyses of its
substrings. However, the slow method combines traditional constituents, whose
lexical heads may be in the middle, while the fast method combines what we
will call spans, whose heads are guaranteed to be at the edge.
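Although the details come later in the chapter, the span idea can be previewed with a stripped-down sketch (our code, not the chapter’s algorithm verbatim): it ignores automaton states, word senses, and the ROOT convention, keeping only the O(n^3) dynamic program over ‘complete’ and ‘incomplete’ spans whose heads sit at their edges. Here score(h, d) is an assumed arc-weight function.

```python
NEG_INF = float("-inf")

def best_tree_score(n: int, score) -> float:
    """Max total arc score of a projective dependency tree over words 0..n-1,
    rooted at word 0. score(h, d) weights the arc h -> d."""
    # C[i][j][d]: best complete span i..j headed at the right end (d=0)
    #             or the left end (d=1).
    # I[i][j][d]: best incomplete span i..j with an arc between its
    #             endpoints, headed right (d=0) or left (d=1).
    C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    I = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            # Join two back-to-back complete spans and add a new arc.
            best = max(C[i][k][1] + C[k + 1][j][0] for k in range(i, j))
            I[i][j][0] = best + score(j, i)   # arc j -> i
            I[i][j][1] = best + score(i, j)   # arc i -> j
            # Absorb a complete span into an incomplete one.
            C[i][j][0] = max(C[i][k][0] + I[k][j][0] for k in range(i, j))
            C[i][j][1] = max(I[i][k][1] + C[k][j][1] for k in range(i + 1, j + 1))
    return C[0][n - 1][1]  # best analysis headed at word 0

# Example: best_tree_score(4, lambda h, d: 1.0) == 3.0 (three arcs).
```

The full algorithm of this chapter threads automaton states and word senses through the same items, which is where the grammar factors g and t enter the bound.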
References

Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271.

Earley, J. (1970). An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102.

Pollard, C., and Sag, I. A. (1994). Head-Driven Phrase Structure Grammar. University of Chicago Press.