
Chapter 1
BILEXICAL GRAMMARS AND THEIR
CUBIC-TIME PARSING ALGORITHMS
Jason Eisner
Dept. of Computer Science, University of Rochester
P.O. Box 270226
Rochester, NY 14627-0226 U.S.A.
jason@cs.rochester.edu
 !"#%$'&()+*-,
New Developments in
Natural Language Parsing
.
/102223
#45&6
.
&7
.8
9#(4"&(:
; <
&=>?6"@(1A&A B6CC??&D(#E"#,F(
.
"A&GAE&(:IH
Abstract This chapter introduces weighted bilexical grammars, a formalism in which in-
dividual lexical items, such as verbs and their arguments, can have idiosyncratic
selectional influences on each other. Such ‘bilexicalism’ has been a theme of
much current work in parsing. The new formalism can be used to describe bilex-
ical approaches to both dependency and phrase-structure grammars, and a slight
modification yields link grammars. Its scoring approach is compatible with a
wide variety of probability models.
The obvious parsing algorithm for bilexical grammars (used by most previous
authors) takes time O(n^5). A more efficient O(n^3) method is exhibited. The
new algorithm has been implemented and used in a large parsing experiment
(Eisner, 1996b). We also give a useful extension to the case where the parser
must undo a stochastic transduction that has altered the input.
1. INTRODUCTION
1.1 THE BILEXICAL IDEA
Lexicalized Grammars. Computational linguistics has a long tradition of
lexicalized grammars, in which each grammatical rule is specialized for some
individual word. The earliest lexicalized rules were word-specific subcate-
gorization frames. It is now common to find fully lexicalized versions of
many grammatical formalisms, such as context-free and tree-adjoining gram-
mars (Schabes et al., 1988). Other formalisms, such as dependency grammar
* This material is based on work supported by an NSF Graduate Research Fellowship and ARPA Grant
N6600194-C-6043 ‘Human Language Technology’ to the University of Pennsylvania.
(Mel’čuk, 1988) and head-driven phrase-structure grammar (Pollard and Sag,
1994), are explicitly lexical from the start.
Lexicalized grammars have two well-known advantages. When syntactic
acceptability is sensitive to the quirks of individual words, lexicalized rules are
necessary for linguistic description. Lexicalized rules are also computationally
cheap for parsing written text: a parser may ignore those rules that do not
mention any input words.
Probabilities and the New Bilexicalism. More recently, a third advantage
of lexicalized grammars has emerged. Even when syntactic acceptability is not
sensitive to the particular words chosen, syntactic distribution may be (Resnik,
1993). Certain words may be able but highly unlikely to modify certain other
words. Of course, only some such collocational facts are genuinely lexical (the
storm gathered/*convened); others are presumably a weak reflex of semantics
or world knowledge (solve puzzles/??goats). But both kinds can be captured
by a probabilistic lexicalized grammar, where they may be used to resolve
ambiguity in favor of the most probable analysis, and also to speed parsing
by avoiding (‘pruning’) unlikely search paths. Accuracy and efficiency can
therefore both benefit.
Work along these lines includes (Charniak, 1995; Collins, 1996; Eisner,
1996a; Charniak, 1997; Collins, 1997; Goodman, 1997), who reported state-
of-the-art parsing accuracy. Related models are proposed without evaluation in
(Lafferty et al., 1992; Alshawi, 1996).
This flurry of probabilistic lexicalized parsers has focused on what one might
call bilexical grammars, in which each grammatical rule is specialized for
not one but two individual words.
The central insight is that specific words
subcategorize to some degree for other specific words: tax is a good object for
the verb raise. These parsers accordingly estimate, for example, the probability
that word w is modified by (a phrase headed by) word v, for each pair of words
w, v in the vocabulary.
1.2 AVOIDING THE COST OF BILEXICALISM
Past Work. At first blush, bilexical grammars (whether probabilistic or not)
appear to carry a substantial computational penalty. We will see that parsers
derived directly from CKY or Earley’s algorithm take time O(n^3 min(n, |V|)^2)
for a sentence of length n and a vocabulary of |V| terminal symbols. In practice
n ≪ |V|, so this amounts to O(n^5). Such algorithms implicitly or explicitly
regard the grammar as a context-free grammar in which a noun phrase headed
by tiger bears the special nonterminal NP_tiger. These O(n^5) algorithms are used
by (Charniak, 1995; Alshawi, 1996; Charniak, 1997; Collins, 1996; Collins,
1997) and subsequent authors.
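To see where the O(n^5) comes from, here is the standard back-of-the-envelope count (our gloss, not quoted from the chapter): a CKY item for such a lexicalized grammar must record a span together with its head position, and each combination step fixes five sentence positions.

```latex
% Items [i, j, h]: a constituent spanning words i..j with head at position h.
% Combining [i, k, h_1] and [k, j, h_2] yields [i, j, h], with h \in \{h_1, h_2\}:
\underbrace{i,\; k,\; j}_{\text{span ends and split point: } O(n^3)}
\;\times\;
\underbrace{h_1 \in [i,k],\; h_2 \in (k,j]}_{\text{subspan heads: } O(n^2)}
\;\Longrightarrow\; O(n^5)\text{ combinations.}
```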
Speeding Things Up. The present chapter formalizes a particular notion of
bilexical grammars, and shows that a length-n sentence can be parsed in time
only O(n^3 g^3 t), where g and t are bounded by the grammar and are typically
small. (t is the maximum number of senses per input word, while g measures the
degree of interdependence that the grammar allows among the several lexical
modifiers of a word.) The new algorithm also reduces space requirements to
O(n^2), from the cubic space required by CKY-style approaches to bilexical
grammar. The parsing algorithm finds the highest-scoring analysis or analyses
generated by the grammar, under a probabilistic or other measure.
The new O(n^3)-time algorithm has been implemented, and was used in the
experimental work of (Eisner, 1996b; Eisner, 1996a), which compared various
bilexical probability models. The algorithm also applies to the Treebank Gram-
mars of (Charniak, 1995). Furthermore, it applies to the head-automaton gram-
mars (HAGs) of (Alshawi, 1996) and the phrase-structure models of (Collins,
1996; Collins, 1997), allowing O(n^3)-time rather than O(n^5)-time parsing,
granted the (linguistically sensible) restrictions that the number of distinct X-
bar levels is bounded and that left and right adjuncts are independent of each
other.
1.3 ORGANIZATION OF THE CHAPTER
This chapter is organized as follows:
First we will develop the ideas discussed above. §2 presents a simple formalization
of bilexical grammar, and then §3 explains why the naive recognition
algorithm is O(n^5) and how to reduce it to O(n^3).
Next, §4 offers some extensions to the basic formalism. §4.1 extends it to
weighted (probabilistic) grammars, and shows how to find the best parse of the
input. §4.2 explains how to handle and disambiguate polysemous words. §4.3
shows how to exclude or penalize string-local configurations. §4.4 handles the
more general case where the input is an arbitrary rational transduction of the
“underlying” string to be parsed.
§5 carefully connects the bilexical grammar formalism of this chapter to
other bilexical formalisms such as dependency, context-free, head-automaton,
and link grammars. In particular, we apply the fast parsing idea to these formalisms.
The conclusions in §6 summarize the result and place it in the context of
other work by the author, including a recent asymptotic improvement.
2. A SIMPLE BILEXICAL FORMALISM
The bilexical formalism developed in this chapter is modeled on dependency
grammar (Gaifman, 1965; Mel’čuk, 1988). It is equivalent to the class of split
bilexical grammars (including split bilexical CFGs and split HAGs) defined
in (Eisner and Satta, 1999). More powerful bilexical formalisms also exist, and
improved parsing algorithms for these are cited in §5.6 and §5.8.
Form of the Grammar. We begin with a simple version of the formalism,
to be modified later in the chapter. A [split] unweighted bilexical grammar
consists of the following elements:
A set V of words, called the (terminal) vocabulary, which contains a
distinguished symbol ROOT.
For each word w ∈ V, a pair of deterministic finite-state automata ℓ_w
and r_w. Each automaton accepts some regular subset of V*.
g is defined to be an upper bound on the number of states in any single
automaton. (t will be defined in §4.2 as an upper bound on lexical ambiguity.)
The dependents of word w are the headwords of its arguments and adjuncts.
Speaking intuitively, automaton ℓ_w specifies the possible sequences of
left dependents for w. So these allowable sequences, which are word strings in
V*, form a regular set. Similarly r_w specifies the possible sequences of right
dependents for w.
By convention, the first element in such a sequence is closest to w in the
surface string. Thus, the possible dependent sequences (from left to right) are
specified by the reversal of ℓ_w’s language and by r_w’s language, respectively.
For example, if the tree shown in Figure 1.1a is grammatical, then we know that
(for the appropriate head word w in that figure) ℓ_w accepts the, and r_w
accepts of raise.
To get fast parsing, it is reasonable to ask that the automata individually have
few states (i.e., that g be small). However, we wish to avoid any penalty for
having

many (distinct) automata—two per word in V;

many arcs leaving an automaton state—one per possible dependent in V.

That is, the vocabulary size |V| should not affect performance at all.
We will use Q(ℓ_w) and Q(r_w) to denote the state sets of ℓ_w and r_w
respectively; I(ℓ_w) and I(r_w) to denote their initial states; and predicate
final(q) to mean that q is a final state of its automaton. The transition functions
may be notated as a single pair of functions ℓ and r, where ℓ(q, w′) returns the
state reached by ℓ_w when it leaves state q on an arc labeled w′, and similarly
r(q, w′).
Notice that as an implementation matter, if the automata are defined in any
systematic way, it is not necessary to actually store them in order to represent
the grammar. One only needs to choose an appropriate representation for states
and define the ℓ, r, I, and final functions.
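For illustration, here is a minimal sketch of such a function-based representation (Python; the class and all names are ours, not the chapter’s). The parser only ever asks for initial states, transitions, and finality, so the automata themselves are never materialized:

```python
from typing import Callable, Hashable, Optional

State = Hashable  # states may be any hashable values, e.g. small integers

class BilexicalGrammar:
    """A split bilexical grammar given purely by functions (illustrative)."""

    def __init__(
        self,
        init_left: Callable[[str], State],    # initial state of l_w
        init_right: Callable[[str], State],   # initial state of r_w
        step_left: Callable[[str, State, str], Optional[State]],
        step_right: Callable[[str, State, str], Optional[State]],
        is_final: Callable[[State], bool],    # the `final' predicate
    ):
        # step_left(w, q, d) is the state that l_w reaches from state q on
        # an arc labeled with dependent word d, or None if no such arc
        # exists (symmetrically for step_right).
        self.init_left = init_left
        self.init_right = init_right
        self.step_left = step_left
        self.step_right = step_right
        self.is_final = is_final

# A trivial instance: every word may take any dependents in any order.
trivial = BilexicalGrammar(
    init_left=lambda w: 0,
    init_right=lambda w: 0,
    step_left=lambda w, q, d: 0,
    step_right=lambda w, q, d: 0,
    is_final=lambda q: True,
)
```

A parser written against this interface never enumerates the vocabulary: it only follows arcs labeled by words that actually occur in the input.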
Meaning of the Grammar. We now formally define the language generated
by such a grammar, and the structures that the grammar assigns to sentences of
this language.
Let a dependency tree be a rooted tree whose nodes (both internal and
external) are labeled with words from V, as illustrated in Figure 1.1a; the root
is labeled with the special symbol ROOT. The children (‘dependents’) of
a node are ordered with respect to each other and the node itself, so that the
node has both left children that precede it and right children that follow it.
A dependency tree is grammatical iff for every word token w that appears
in the tree, ℓ_w accepts the (possibly empty) sequence of w’s left children (from
right to left), and r_w accepts the sequence of w’s right children (from left to
right).
A string s ∈ V* is generated by the grammar, with analysis T, if T is a
grammatical dependency tree and listing the node labels of T in infix order
yields the string s followed by ROOT. s is called the yield of T.
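This definition translates directly into a recursive check. The sketch below is our illustration (reusing the hypothetical BilexicalGrammar interface from above): each node’s left children are fed to ℓ_w from the closest child outward, i.e. from right to left, and its right children to r_w from left to right:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    left: List["Node"] = field(default_factory=list)   # left children, surface order
    right: List["Node"] = field(default_factory=list)  # right children, surface order

def grammatical(g: BilexicalGrammar, node: Node) -> bool:
    """True iff every word token's children are accepted by its automata."""
    q = g.init_left(node.word)
    for child in reversed(node.left):        # closest left child first
        q = g.step_left(node.word, q, child.word)
        if q is None:
            return False
    if not g.is_final(q):
        return False
    q = g.init_right(node.word)
    for child in node.right:                 # closest right child first
        q = g.step_right(node.word, q, child.word)
        if q is None:
            return False
    if not g.is_final(q):
        return False
    return all(grammatical(g, c) for c in node.left + node.right)
```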
Bilexicalism. The term bilexical refers to the fact that (i) each word w may
specify a wholly different choice of automata ℓ_w and r_w, and furthermore (ii)
these automata ℓ_w and r_w may make distinctions among individual words that
are appropriate to serve as children (dependents) of w. Thus the grammar is
sensitive to specific pairs of lexical items.
For example, it is possible for one lexical verb to select for a completely
idiosyncratic set of nouns as subject, and another lexical verb to select for an
entirely different set of nouns. Since it never requires more than a two-state
automaton (though with many arcs!) to specify the set of possible subjects
for a verb, there is no penalty for such behavior in the parsing algorithm to be
described here.
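For concreteness, such a left automaton might be encoded as follows (a toy of our own devising, echoing the gathered/*convened example above). Enlarging the subject set adds arcs but never states, so g stays at 2:

```python
# Toy left automaton l_w for a verb that takes exactly one subject drawn
# from an idiosyncratic set. Only two states are ever needed; a larger
# subject set just means more arcs out of the first state.
SUBJECTS = {"convene": {"committee", "panel", "board"}}  # invented data

def init_left(w: str):
    return "need_subject" if w in SUBJECTS else "done"

def step_left(w: str, q, d: str):
    if q == "need_subject" and d in SUBJECTS.get(w, ()):
        return "done"      # subject consumed; no further left dependents
    return None            # no such arc: this dependent sequence is rejected

def is_final(q) -> bool:
    return q == "done"
```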
3. O(n^5) AND O(n^3) RECOGNITION
This section develops a basic O(n^3) recognition method for simple bilexical
grammars as defined above. We begin with a naive O(n^5) method drawn from
context-free ‘dotted-rule’ methods such as (Earley, 1970; Graham et al., 1980).
Second, we will see why this method is inefficient. Finally, a more efficient
O(n^3) algorithm is presented.
Both methods are essentially chart parsers, in that they use dynamic pro-
gramming to build up an analysis of the whole sentence from analyses of its
substrings. However, the slow method combines traditional constituents, whose
lexical heads may be in the middle, while the fast method combines what we
will call spans, whose heads are guaranteed to be at the edge.
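Although the details come later in the chapter, the span idea can be previewed with a stripped-down sketch (our code, not the chapter’s algorithm verbatim): it ignores automaton states, word senses, and the ROOT convention, keeping only the O(n^3) dynamic program over ‘complete’ and ‘incomplete’ spans whose heads sit at their edges. Here score(h, d) is an assumed arc-weight function.

```python
NEG_INF = float("-inf")

def best_tree_score(n: int, score) -> float:
    """Max total arc score of a projective dependency tree over words 0..n-1,
    rooted at word 0. score(h, d) weights the arc h -> d."""
    # C[i][j][d]: best complete span i..j headed at the right end (d=0)
    #             or the left end (d=1).
    # I[i][j][d]: best incomplete span i..j with an arc between its
    #             endpoints, headed right (d=0) or left (d=1).
    C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    I = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            # Join two back-to-back complete spans and add a new arc.
            best = max(C[i][k][1] + C[k + 1][j][0] for k in range(i, j))
            I[i][j][0] = best + score(j, i)   # arc j -> i
            I[i][j][1] = best + score(i, j)   # arc i -> j
            # Absorb a complete span into an incomplete one.
            C[i][j][0] = max(C[i][k][0] + I[k][j][0] for k in range(i, j))
            C[i][j][1] = max(I[i][k][1] + C[k][j][1] for k in range(i + 1, j + 1))
    return C[0][n - 1][1]  # best analysis headed at word 0

# Example: best_tree_score(4, lambda h, d: 1.0) == 3.0 (three arcs).
```

The full algorithm of this chapter threads automaton states and word senses through the same items, which is where the grammar factors g and t enter the bound.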
References

Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271.

Earley, J. (1970). An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102.

Pollard, C., and Sag, I. A. (1994). Head-Driven Phrase Structure Grammar. University of Chicago Press.