
Two ways of formalizing grammars

Mark Johnson
01 Jun 1994, Linguistics and Philosophy, Vol. 17, Iss. 3, pp. 221-248
Abstract
A grammar is a formal device which both identifies a certain set of utterances as well-formed, and which also defines a transduction relation between these utterances and their linguistic representations. This paper focuses on two widely-used "formal" or "logical" representations of grammars in computational linguistics, Definite Clause Grammars and Feature Structure Grammars, and describes the way in which they express the recognition problem (the problem of determining if an utterance is in the language generated by a grammar) and the parsing problem (the problem of finding the analyses assigned by a grammar to an utterance). Although both approaches are 'constraint-based', one of them is based on a logical consequence relation, and the other is based on satisfiability. The main goal of this paper is to point out the different conceptual bases of these two ways of formalizing grammars, and to discuss some of their properties.



MARK JOHNSON

TWO WAYS OF FORMALIZING GRAMMARS*

1. INTRODUCTION

A grammar is a formal device which both identifies a certain set of utterances as well-formed, and which also defines a transduction relation between these utterances and their linguistic representations. This paper focuses on two widely-used "formal" or "logical" representations of grammars in computational linguistics, Definite Clause Grammars and Feature Structure Grammars, and describes the way in which they express the recognition problem (the problem of determining if an utterance is in the language generated by a grammar) and the parsing problem (the problem of finding the analyses assigned by a grammar to an utterance). Although both approaches are 'constraint-based', one of them is based on a logical consequence relation, and the other is based on satisfiability. The main goal of this paper is to point out the different conceptual bases of these two ways of formalizing grammars, and to discuss some of their properties.
1.1. Definite-Clause Grammars, A Validity-based Approach

The definite-clause grammar (DCG) framework originates in Colmerauer's work on Metamorphosis Grammars in the 1970s (Colmerauer 1978) and was developed and popularized by Pereira and Warren (1981), Pereira and Shieber (1987), and others. In this approach, a grammar (here taken to include the lexicon) is conceived of as a set of axioms. The well-formedness of an utterance and the fact that it has a certain linguistic structure are theorems that follow from these axioms, so both the recognition and parsing problems are problems of determining whether certain types of formulae are logical consequences of these axioms. Thus the well-formedness or grammaticality of a particular utterance is expressed by the fact that the corresponding formula is a consequence of the grammar axioms, and ungrammaticality by the fact that the corresponding formula is not a consequence of the grammar axioms (even though it may be consistent with them).

* I would like to thank Edward Stabler and an anonymous L&P reviewer for their helpful comments on an earlier draft of this paper. Of course, all responsibility for errors in this paper rests with me.
That is, if the grammar axioms are D and the formula wf(u, s) asserts that the utterance u is well-formed with linguistic representation s (where s might be interpreted as a parse tree, etc.), then the recognition problem is the problem of determining if the following holds.¹

    D ⊢ ∃s wf(u, s).

The parsing problem is the problem of finding all of the s such that the following holds for the given utterance u.

    D ⊢ wf(u, s).

In general D is a finite set of closed formulae, so these problems are equivalent to the following validity problems, where D' is a conjunction of the members of D.

    ⊨ D' → ∃s wf(u, s).
    ⊨ D' → wf(u, s).

To summarize, in the DCG approach the intended interpretation ℳ_I is one in which linguistic representations and strings are conceptualized as individuals. The grammar axioms D state the essential properties of the intended interpretation ℳ_I, so if u is interpreted in ℳ_I as a grammatical utterance with linguistic structure s, then wf(u, s) is true in every model of D.
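To make the validity-based reading concrete, here is a minimal sketch in Prolog, the language in which DCGs are usually realized. The predicate wf/2, the term encodings, and the toy lexicon are illustrative assumptions of this sketch, not Johnson's own axiomatization; the point is only that recognition becomes provability of a goal from the clauses.

    % Sketch: grammar axioms D as definite clauses. wf(U, S) is provable
    % iff utterance U (a list of words) is well-formed with structure S.
    wf(U, s(NP, VP)) :-
        append(U1, U2, U),       % split the utterance into two substrings
        np(U1, NP),
        vp(U2, VP).

    np([uther],   np(uther)).
    np([knights], np(knights)).
    vp([sleeps],  vp(sleeps)).

    % Recognition problem: D |- exists S . wf(u, S).
    % ?- wf([uther, sleeps], S).
    % S = s(np(uther), vp(sleeps)).   % a proof exists: u is well-formed
    % ?- wf([sleeps, uther], _).
    % false.                          % no proof: u is not generated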
1.2. Feature Structures, A Satisfiability-based Approach

The second framework is the feature-structure (FS) approach, where a grammar (which includes the lexicon) is conceived of as a set of constraints, and a well-formed linguistic representation is any structure that satisfies these constraints.² Specifically, the grammar and the utterance both impose constraints that the linguistic structure must meet. The well-formed or grammatical structures are those that satisfy the constraints imposed by the grammar. An utterance is well-formed iff one of these structures also meets the additional constraint that it "corresponds" to the utterance in an appropriate way (e.g., the structure's yield (terminal string) is the string of words of that utterance). Thus grammaticality or well-formedness of an utterance corresponds to the satisfiability of a set of constraints, and ungrammaticality or ill-formedness corresponds to the unsatisfiability of that set.

¹ In this paper the following notation is used. Object-language expressions are written in sanserif font, e.g. x, y, etc., while meta-language variables (ranging over object-language expressions) are written in italic font, e.g. x, y, etc.

² The version considered here is similar to HPSG (Pollard and Sag 1987) in that it is expressive enough that no external phrase structure component is required - the phrase structure rules are encoded as feature structure constraints - and is a simplification of systems proposed by Carpenter (1992).
In formalizing the recognition and parsing problems in this approach, linguistic representations can be regarded as interpretations, and the constraints as expressions or formulae (from some language of constraints) which these interpretations must satisfy in order to be considered well-formed linguistic representations. That is, a well-formed linguistic representation is a model of these formulae (rather than an individual in a model as in the DCG approach), and the set of all models of the grammatical constraints is the set of all well-formed linguistic representations. Thus unlike the DCG approach, in general there is no single intended model of a set of feature structure constraints.
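A small worked example may help (the attribute-value notation here is illustrative, not a construct from the paper): the constraint cat ≈ np ∧ agr num ≈ sg is satisfied by the minimal structure [cat: np, agr: [num: sg]], but also by [cat: np, agr: [num: sg, pers: 3]] and by every further extension of it, so the constraint determines an infinite class of models rather than a single intended structure.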
Most of the work in this field has focussed on the development of specialized languages for expressing systems of constraints to be used as annotations on phrase-structure rules (e.g., the Feature Description Language of Kasper and Rounds (1990) and the attribute-value languages of Johnson (1988)). It seems that the language of first-order logic (in fact, usually decidable sublanguages thereof) is capable of expressing these kinds of constraints (Johnson 1990a, b, 1991a, b, Smolka 1992). Manaster-Ramer and Rounds (1987) and Carpenter (1992) propose extended versions of these systems that are expressive enough to be linguistically useful alone (i.e., without other descriptive devices such as a phrase-structure 'backbone'). This paper explores the degree to which such an extended feature system can be expressed in a first-order language.³ This also aids comparison with the DCG approach, which is formulated in the same language.
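To give a flavour of such a first-order rendering (a common style of encoding in this literature, though the exact signature here is an assumption of this note rather than a quotation from the paper), an attribute can be treated as a binary relation on nodes, so the attribute-value constraint "the agr num value of x is sg" becomes something like ∃y (arc(x, agr, y) ∧ arc(y, num, sg)), and satisfiability of the feature constraint reduces to satisfiability of a first-order formula.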
The recognition problem is the problem of determining the simultaneous satisfiability of both grammar and utterance constraints. That is, if F is a formula expressing the grammatical constraints that every well-formed linguistic structure must satisfy (i.e. that is true in exactly the well-formed structures) and yield(u) is a formula that is true in an interpretation (i.e., a linguistic structure) iff that interpretation corresponds to utterance u³ (say, has u as its phonological form), then utterance u is well-formed iff there exists a model ℳ such that the following is true.

    ℳ ⊨ F ∧ yield(u).

³ An interesting alternative not discussed in this paper is to extend a standard first-order language by adding 'feature-structure expressions' to that language. It seems that the most insightful semantics for such an extended language is based on abduction; see Höhfeld and Smolka (1988) and Chen and Warren (1989) for details.
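A rough Prolog sketch of this satisfiability view, under the simplifying assumption that constraint solving over attribute-value terms can be approximated by unification (real feature-structure formalisms use richer constraint languages); f/1 and yield/2 below are hypothetical encodings of F and yield(u):

    % F: the grammar constraint - subject and verb share one agreement value.
    f(s(np(_, Agr), vp(_, Agr))).

    % yield(u): the structure's terminal string is the utterance u.
    yield(s(np(W1, _), vp(W2, _)), [W1, W2]).

    % Lexical constraints on the substructures.
    lex(np(uther,   sg)).
    lex(np(knights, pl)).
    lex(vp(sleeps,  sg)).

    % Recognition: is F /\ yield(u) satisfiable? Any solution S is a model.
    recognize(U, S) :-
        f(S),
        yield(S, U),
        S = s(NP, VP),
        lex(NP),
        lex(VP).

    % ?- recognize([uther, sleeps], S).    % succeeds: a model exists
    % ?- recognize([knights, sleeps], S).  % fails: pl and sg cannot unify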
The parsing problem is the problem of describing or characterizing the set of models of the conjoined constraints. Since this set may be infinite, it is not in general possible to exhaustively enumerate these models. There are two standard techniques for describing the models of the constraints, both exploiting the observation that infinite sets can have finite descriptions (e.g. the infinite set of integers greater than 7 has the finite description "{x | x > 7}"). These two techniques are discussed in detail in sections 2.8 and 2.10 of Johnson (1988). The first technique exploits the observation that in cases where the possible constraints are restricted, it may be possible to show that the set of models possesses a certain structure, so that an infinite set of models can be finitely described, i.e., specified or identified with finite means.
Usually, attention is restricted to a certain type of interpretation, e.g. acyclic deterministic finite automata (DFA) in Kasper and Rounds (1990), and attribute-value structures (AVS) in Johnson (1988). Kasper and Rounds (1990) showed that the set of DFA satisfying any constraint expressible in their Feature Description Logic is a finite union of principal filters (generated by the "minimal models" with respect to the "subsumption" ordering), and Johnson (1988) showed that the set of AVSs satisfying any constraint expressible in an attribute-value language is a finite union of finite differences of finitely-generated principal filters;⁴ in both cases there are effective procedures for constructing these generators, which constitute a finite description of a (possibly infinite) set of models.
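The subsumption ordering has a familiar Prolog analogue, which may help intuition: one term subsumes another iff the second is an instance of the first, so a most general term generates a principal filter of instances. A minimal sketch, reusing the illustrative terms above and SWI-Prolog's standard subsumes_term/2:

    % The general term plays the role of a "minimal model": every instance
    % of it is also a model, so it finitely describes an infinite filter.
    % ?- subsumes_term(s(np(_, Agr), vp(_, Agr)),
    %                  s(np(uther, sg), vp(sleeps, sg))).
    % true.
    % ?- subsumes_term(s(np(_, Agr), vp(_, Agr)),
    %                  s(np(knights, pl), vp(sleeps, sg))).
    % false.   % outside the filter: the shared agreement value is violated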
The second technique is a variation of the first one; it is based on the observation that every formula identifies a set of interpretations, namely those that satisfy it. Thus the formula F ∧ yield(u) is a description of the set of its models (although perhaps not a very useful one). For some constraint languages (including those of Kasper and Rounds (1990) and Johnson (1988)) there exist algorithms that reduce an arbitrary formula to an equivalent formula in a "normal form", from which one can "read off" the important properties of the models (see sections 2.8 and 2.10 of Johnson (1988) for further discussion). Independently of the existence of normal forms, however, if it can be shown that

    F ∧ yield(u) ⊨ A

for some formula A, then A is true of every linguistic representation that satisfies the grammar constraints and corresponds to the utterance u, i.e. A is a description of the well-formed structures of u. Thus information about an utterance can be extracted by computing the logical consequences of the (grammar and utterance) constraints.⁵ For example, if the utterance u is ungrammatical, then

    F ∧ yield(u) ⊨ false

because there are no models of these constraints.

⁴ Because attribute-value languages can express negated constraints, Johnson (1988) requires "negative" minimal models (i.e. "inequality arcs") as well as "positive" minimal models.
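Continuing the unification sketch above, such a consequence A can be read off a solved form: for uther sleeps the binding Agr = sg is entailed by F ∧ yield(u), so "the subject is singular" is a (partial) description of every well-formed structure of that utterance.

    % ?- recognize([uther, sleeps], s(np(_, Agr), _)).
    % Agr = sg.   % entailed: true in every model of the constraints
    % ?- recognize([knights, sleeps], _).
    % false.      % F /\ yield(u) |= false: the utterance is ungrammatical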
2. FORMALIZING CONTEXT-FREE GRAMMARS

Both approaches are capable of expressing grammars considerably more complicated than the context-free grammars described in this section, but it is instructive to consider these simpler systems. This paper follows standard linguistic practice in assuming that the right-hand side of each production in the grammars being formalized is either a (possibly empty) sequence of non-terminals or else a single terminal. This assumption simplifies the formalization somewhat without restricting the class of languages that can be expressed. First, formalizations of the recognition problem for the following simple context-free grammar (based on the simple grammar of Shieber 1986) are presented. Then the axioms are modified so that a representation of the parse tree is produced as well. Finally, the axioms are further modified to include agreement features, so that ungrammatical utterances such as *Knights sleeps are not generated.

    S  → NP VP
    VP → V NP
    NP → uther
    NP → knights
    VP → sleeps
    V  → like
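For reference, this grammar transcribes directly into standard DCG notation (executable Prolog). Note that the bare context-free grammar still derives the ungrammatical *Knights sleeps; it takes the agreement features added later to rule it out.

    s  --> np, vp.
    vp --> v, np.
    np --> [uther].
    np --> [knights].
    vp --> [sleeps].
    v  --> [like].

    % ?- phrase(s, [knights, like, uther]).   % true
    % ?- phrase(s, [knights, sleeps]).        % true too: no agreement yet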
⁵ Not all the consequences A are informative, of course, since the set of consequences includes e.g. all tautologies. Correspondingly, not all of the logical consequences of the DCG axioms are of interest either.

References

Carpenter, B.: 1992, The Logic of Typed Feature Structures, Cambridge University Press, Cambridge.
Clark, K. L.: 1978, 'Negation as Failure', in H. Gallaire and J. Minker (eds.), Logic and Data Bases, Plenum Press, New York.
Lloyd, J. W.: 1987, Foundations of Logic Programming, second edition, Springer-Verlag, Berlin.
Pollard, C. and I. A. Sag: 1994, Head-Driven Phrase Structure Grammar, University of Chicago Press, Chicago.