
Showing papers on "Tree-adjoining grammar published in 2001"


Journal ArticleDOI
05 Jan 2001-Science
TL;DR: This work presents a mathematical framework for the evolutionary dynamics of grammar learning and calculates the condition under which natural selection favors the emergence of rule-based, generative grammars that underlie complex language.
Abstract: Universal grammar specifies the mechanism of language acquisition. It determines the range of grammatical hypotheses that children entertain during language learning and the procedure they use for evaluating input sentences. How universal grammar arose is a major challenge for evolutionary biology. We present a mathematical framework for the evolutionary dynamics of grammar learning. The central result is a coherence threshold, which specifies the condition for a universal grammar to induce coherent communication within a population. We study selection of grammars within the same universal grammar and competition between different universal grammars. We calculate the condition under which natural selection favors the emergence of rule-based, generative grammars that underlie complex language.
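
For orientation, the kind of model involved can be sketched as replicator-style population dynamics; the notation below is the standard one in this line of work and is assumed here rather than quoted from the paper.

```latex
% Replicator-style dynamics for grammar competition (notation assumed):
%   x_i    : fraction of the population speaking grammar G_i
%   f_i    : communicative payoff of G_i against the current population
%   Q_{ji} : probability that a child learning from a G_j speaker acquires G_i
\[
  \dot{x}_i \;=\; \sum_{j=1}^{n} x_j\, f_j(\mathbf{x})\, Q_{ji} \;-\; \phi(\mathbf{x})\, x_i ,
  \qquad
  \phi(\mathbf{x}) \;=\; \sum_{j=1}^{n} f_j(\mathbf{x})\, x_j .
\]
% The coherence threshold is a lower bound on the learning accuracy Q_{ii}
% above which a population can stably converge on a single grammar.
```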

407 citations


Journal Article
TL;DR: It is shown how island grammars can be used to generate robust parsers that combine the accuracy of syntactical analysis with the speed, flexibility and tolerance usually only found in lexical analysis.
Abstract: Source model extraction (the automated extraction of information from system artifacts) is a common phase in reverse engineering tools. One of the major challenges of this phase is creating extractors that can deal with irregularities in the artifacts that are typical for the reverse engineering domain (for example, syntactic errors, incomplete source code, language dialects and embedded languages). This paper proposes a solution in the form of island grammars, a special kind of grammar that combines the detailed specification possibilities of grammars with the liberal behavior of lexical approaches. We show how island grammars can be used to generate robust parsers that combine the accuracy of syntactical analysis with the speed, flexibility and tolerance usually only found in lexical analysis. We conclude with a discussion of the development of Mangrove, a generator for source model extractors based on island grammars, and describe its application to a number of case studies.
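
To make the island/water distinction concrete, here is a minimal Python sketch (not Mangrove itself): one hypothetical island pattern, COBOL-style CALL statements, is recognized precisely, while everything else is treated as water and skipped.

```python
# Minimal sketch of the island-grammar idea: one "island" production
# recognizes COBOL-style CALL statements, and a "water" production
# silently consumes every other token. The pattern is an assumption.
import re

CALL_ISLAND = re.compile(r"CALL\s+'([^']+)'")  # assumed island pattern

def extract_calls(source: str) -> list[str]:
    """Return call targets found in `source`, ignoring all 'water'."""
    targets = []
    pos = 0
    while pos < len(source):
        match = CALL_ISLAND.search(source, pos)
        if match is None:
            break                       # the rest of the input is water
        targets.append(match.group(1))  # island: record a source model fact
        pos = match.end()               # resume scanning after the island
    return targets

# Even syntactically broken input still yields a usable source model:
print(extract_calls("MOVE X TO Y. CALL 'PAYROLL' ??? CALL 'AUDIT'."))
# ['PAYROLL', 'AUDIT']
```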

265 citations


Proceedings ArticleDOI
06 Jul 2001
TL;DR: A new categorial formalism based on intuitionistic linear logic, which derives from current type-logical grammars, is abstract in the sense that both syntax and semantics are handled by the same set of primitives.
Abstract: We introduce a new categorial formalism based on intuitionistic linear logic. This formalism, which derives from current type-logical grammars, is abstract in the sense that both syntax and semantics are handled by the same set of primitives. As a consequence, the formalism is reversible and provides different computational paradigms that may be freely composed together.

232 citations


01 Apr 2001
TL;DR: It is proved that conjunctive grammars can still be parsed in cubic time and that the notion of the derivation tree is retained, which gives reasonable hope for their practical applicability.
Abstract: This paper introduces a class of formal grammars made up by augmenting the formalism of context-free grammars with an explicit set-theoretic intersection operation. It is shown that conjunctive grammars can generate some important non-context-free language constructs, including those not in the intersection closure of context-free languages, and that they can provide very succinct descriptions of some context-free languages and finite intersections of context-free languages. On the other hand, it is proved that conjunctive grammars can still be parsed in cubic time and that the notion of the derivation tree is retained, which gives reasonable hope for their practical applicability.
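
A standard textbook example, not necessarily the paper's own, shows what the added intersection operation buys: a conjunctive grammar for the non-context-free language {a^n b^n c^n}.

```latex
% A string is derived from S iff it is derivable from AB *and* from DC:
\begin{align*}
  S &\to AB \,\&\, DC \\
  A &\to aA \mid \varepsilon \qquad B \to bBc \mid \varepsilon \\
  D &\to aDb \mid \varepsilon \qquad C \to cC \mid \varepsilon
\end{align*}
% L(AB) = a^* \{b^n c^n\} and L(DC) = \{a^n b^n\} c^*, whose
% intersection is exactly \{a^n b^n c^n \mid n \ge 0\}.
```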

188 citations


Book ChapterDOI
01 Jan 2001
TL;DR: The approximation algorithm is extended to the case of weighted context-free grammars, and experiments show that the size of the minimal deterministic automata accepting the resulting approximations is practical for applications such as speech recognition.
Abstract: We present an algorithm for approximating context-free languages with regular languages. The algorithm is based on a simple transformation that applies to any context-free grammar and guarantees that the result can be compiled into a finite automaton. The resulting grammar contains at most one new nonterminal for any nonterminal symbol of the input grammar. The result thus remains readable and if necessary modifiable. We extend the approximation algorithm to the case of weighted context-free grammars. We also report experiments with several grammars showing that the size of the minimal deterministic automata accepting the resulting approximations is of practical use for applications such as speech recognition.
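
The flavor of such a transformation can be seen on the smallest self-embedding grammar; the worked example below is an illustration under assumed details, not the paper's exact construction.

```latex
% The self-embedding grammar for \{a^n b^n\} gains one new nonterminal S'
% and becomes right-linear, hence compilable into a finite automaton:
\begin{align*}
  \text{input:}  \quad & S \to aSb \mid \varepsilon \\
  \text{output:} \quad & S \to aS \mid S', \qquad S' \to bS' \mid \varepsilon
\end{align*}
% The approximation accepts a^* b^* \supseteq \{a^n b^n\}: a superset,
% as expected of a regular approximation of a context-free language,
% and it adds only one new nonterminal (S') for the nonterminal S.
```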

167 citations


01 Jan 2001
TL;DR: Two systems that automatically generate grammars are built that solve two major problems in grammar development: namely, the redundancy caused by the reuse of structures in a grammar and the lack of explicit generalizations over the structures in a grammar.
Abstract: Grammars are valuable resources for natural language processing. We divide the process of grammar development into three tasks: selecting a formalism, defining the prototypes, and building a grammar for a particular human language. After a brief discussion of the first two tasks, we focus on the third task. Traditionally, grammars are built by hand, and there are many problems with this approach. To address these problems, we built two systems that automatically generate grammars. The first system (LexOrg) solves two major problems in grammar development: namely, the redundancy caused by the reuse of structures in a grammar and the lack of explicit generalizations over the structures in a grammar. LexOrg takes several types of specification as input and combines them to automatically generate a grammar. The second system (LexTract) extracts Lexicalized Tree Adjoining Grammars (LTAGs) and Context-free Grammars (CFGs) from Treebanks, and builds derivation trees that can be used to train statistical LTAG parsers directly. In addition to creating Treebank grammars and producing training materials for parsers, LexTract is also used to evaluate the coverage of existing hand-crafted grammars, to compare grammars for different languages, to detect annotation errors in Treebanks, and to test certain linguistic hypotheses. LexOrg and LexTract provide two different perspectives on grammars. In LexOrg, elementary trees in an LTAG grammar are the result of combining language specifications such as tree descriptions. In LexTract, elementary trees are building blocks of syntactic structures in a Treebank. LexOrg makes explicit the language specifications that form elementary trees, whereas LexTract makes explicit the elementary trees that form syntactic structures. The systems provide a rich set of tools for language description and comparison that greatly enhances our ability to build and maintain grammars and Treebanks effectively.

82 citations


Book ChapterDOI
27 Jun 2001
TL;DR: This paper completes the picture by showing that MGs in the sense of [11] and LCFRSs in fact give rise to the same class of derivable string languages.

Abstract: The type of a minimalist grammar (MG) as introduced by Stabler [11,12] represents an attempt at a rigorous algebraic formalization of the new perspectives adopted within the linguistic framework of transformational grammar due to the change from GB-theory to minimalism. Michaelis [6] has shown that MGs constitute a subclass of mildly context-sensitive grammars in the sense that for each MG there is a weakly equivalent linear context-free rewriting system (LCFRS). However, it has been left open in [6] whether the respective classes of string languages derivable by MGs and LCFRSs coincide. This paper completes the picture by showing that MGs in the sense of [11] and LCFRSs in fact give rise to the same class of derivable string languages.
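
For readers unfamiliar with LCFRSs, a tiny assumed example illustrates the extra power at stake: an LCFRS whose nonterminal A derives a pair of strings in parallel, yielding the non-context-free copy language.

```latex
% An LCFRS (notation assumed) for \{ww \mid w \in \{a,b\}^*\}:
\begin{align*}
  S(x_1 x_2)       &\leftarrow A(x_1, x_2) \\
  A(a x_1,\, a x_2) &\leftarrow A(x_1, x_2) \\
  A(b x_1,\, b x_2) &\leftarrow A(x_1, x_2) \\
  A(\varepsilon, \varepsilon) &\leftarrow {}
\end{align*}
% A builds the two halves of the string simultaneously; S concatenates
% them, which no context-free grammar can simulate.
```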

76 citations


Book ChapterDOI
23 May 2001
TL;DR: It is shown that the number of non-terminal symbols used in the appearance checking mode can be restricted to two; in the case of graph-controlled (and programmed) grammars with appearance checking, the total number of non-terminal symbols can be reduced to three (and four, respectively).

Abstract: We improve the results elaborated in [6] on the number of non-terminal symbols needed in matrix grammars, programmed grammars, and graph-controlled grammars with appearance checking for generating arbitrary recursively enumerable languages. Of special interest is the result that the number of non-terminal symbols used in the appearance checking mode can be restricted to two. In the case of graph-controlled (and programmed) grammars with appearance checking, the number of non-terminal symbols can also be reduced to three (and four, respectively); in the case of matrix grammars with appearance checking, we either need four non-terminal symbols, with three of them being used in the appearance checking mode, or else again we only need two non-terminal symbols being used in the appearance checking mode, but in that case we cannot bound the total number of non-terminal symbols.
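
As background, a small assumed example of a matrix grammar (without appearance checking) shows how matrices, i.e. sequences of context-free rules applied as a block, synchronize rewriting.

```latex
% A matrix grammar generating \{a^n b^n c^n \mid n \ge 1\}:
\begin{align*}
  m_0 &: [\, S \to ABC \,] \\
  m_1 &: [\, A \to aA,\; B \to bB,\; C \to cC \,] \\
  m_2 &: [\, A \to a,\; B \to b,\; C \to c \,]
\end{align*}
% Applying m_0 once, m_1 some n-1 times, and m_2 once keeps the three
% counters in lockstep. Appearance checking additionally lets a rule in a
% matrix be skipped when its left-hand side is absent; this is what pushes
% the formalism up to the recursively enumerable languages.
```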

75 citations


Book ChapterDOI
01 Jan 2001
TL;DR: A possible implication of the lexicalization of grammatical structures and the localization of dependencies (especially the predicate-argument relationships) that are central features of LTAG is considered.

Abstract: Let us consider a possible implication of the lexicalization of grammatical structures and the localization of dependencies (especially the predicate-argument relationships) that are central features of LTAG. Consider the elementary trees in the LTAG in Figure 1. The tree corresponding to John likes peanuts passionately is derived by starting with the elementary tree for likes and then substituting the trees for John and peanuts at the respective nodes of the tree α1 and adjoining the tree for passionately at the VP node of the tree α1. The derivation tree in Figure 1 shows this derivation. If both substitution and adjoining are described as attachment of one tree to another tree, then the entire derivation consists of a set of attachments.
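
Since Figure 1 is not reproduced here, the elementary trees described in the text can be sketched in labelled-bracket notation; the tree shapes below are a plausible reconstruction, not the original figure.

```latex
% Labelled-bracket reconstruction (tree shapes assumed; names follow the text):
%   alpha_1: (S (NP$\downarrow$) (VP (V likes) (NP$\downarrow$)))  anchored by "likes"
%   alpha_2: (NP John)        alpha_3: (NP peanuts)
%   beta_1:  (VP (VP$^{*}$) (ADV passionately))                    auxiliary tree
% Substituting alpha_2 and alpha_3 at the NP$\downarrow$ slots of alpha_1,
% then adjoining beta_1 at the VP node of alpha_1, yields the derived tree
%   (S (NP John) (VP (VP (V likes) (NP peanuts)) (ADV passionately)))
% Each of the three attachments is one edge of the derivation tree.
```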

70 citations




Journal ArticleDOI
Joachim Lambek
01 Apr 2001-Grammars
TL;DR: This article presents an algebraic model of grammar in the form of a pregroup, which competes with an earlier model, once proposed by the author and now being developed further by a small but dedicated group of researchers, that took the form of a residuated monoid.
Abstract: At first sight, it seems quite unlikely that mathematics can be applied to the study of natural language. However, upon closer examination, it appears that language itself is a kind of mathematics: producing and recognizing speech involves calculations, albeit at a subconscious level, and the rules of grammar which the speaker has mastered, even if she cannot formulate them, resemble the axioms and rules of inference of mathematical logic. In this article I will present an algebraic model of grammar in the form of a pregroup, which competes with an earlier model which was once proposed by me and is now being developed further by a small but dedicated group of researchers, and took the form of a residuated monoid. I am not fully convinced that either of these models really captures the cognitive processes involved, and I still suspect that rewrite systems, also known as production grammars, do a better job. Yet the algebraic models offer an alternative approach of interest to the more mathematically inclined students of language. Although I have told this story before (Lambek, 1999), the present version is addressed to linguists, hence some mathematical definitions have been deferred to an appendix and proofs have been left out altogether. While some red herrings have been eliminated (protogroups, inflectors), the small fragment of English grammar treated here is essentially the same as in the earlier version.
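
A standard pregroup derivation of the sort the article develops, with simplified type assignments assumed here (n for noun phrase, s for sentence):

```latex
\[
  \underbrace{n}_{\text{John}}\;
  \underbrace{(n^r\, s\, n^\ell)}_{\text{likes}}\;
  \underbrace{n}_{\text{Mary}}
  \;\le\; s,
  \qquad\text{using the contractions } n\, n^r \le 1 \text{ and } n^\ell\, n \le 1 .
\]
% The two contractions cancel the verb's subject and object requirements,
% leaving the sentence type s; the "calculation" is purely algebraic.
```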

Book ChapterDOI
27 Jun 2001
TL;DR: It is concluded that Minimalist Grammars are weakly equivalent to Multiple Context-Free Grammars.
Abstract: In this paper we will fix the position of Minimalist Grammars as defined in Stabler (1997) in the hierarchy of formal languages. Michaelis (1998) has shown that the set of languages generated by Minimalist Grammars is a subset of the set of languages generated by Multiple Context-Free Grammars (Seki et al., 1991). In this paper we will present a proof showing the reverse. We thus conclude that Minimalist Grammars are weakly equivalent to Multiple Context-Free Grammars.

Journal ArticleDOI
TL;DR: It is shown how the DSG formalism, which is designed to inherit many of the characteristics of LTAG, can be used to express a variety of linguistic analyses not available in LTAG.

Abstract: There is considerable interest among computational linguists in lexicalized grammatical frameworks; lexicalized tree adjoining grammar (LTAG) is one widely studied example. In this paper, we investigate how derivations in LTAG can be viewed not as manipulations of trees but as manipulations of tree descriptions. Changing the way the lexicalized formalism is viewed raises questions as to the desirability of certain aspects of the formalism. We present a new formalism, d-tree substitution grammar (DSG). Derivations in DSG involve the composition of d-trees, special kinds of tree descriptions. Trees are read off from derived d-trees. We show how the DSG formalism, which is designed to inherit many of the characteristics of LTAG, can be used to express a variety of linguistic analyses not available in LTAG.

Journal ArticleDOI
TL;DR: A new DNA parsing system, comprising a logic grammar formalism called Basic Gene Grammars and a bidirectional chart parser DNA-ChartParser, which allowed different sources of knowledge for recognizing E.coli promoters to be combined to achieve better accuracy.
Abstract: Motivation: The field of ‘DNA linguistics’ has emerged from pioneering work in computational linguistics and molecular biology. Most formal grammars in this field are expressed using Definite Clause Grammars, but these have computational limitations which must be overcome. The present study provides a new DNA parsing system, comprising a logic grammar formalism called Basic Gene Grammars and a bidirectional chart parser, DNA-ChartParser. Results: The use of Basic Gene Grammars is demonstrated in representing many formulations of the knowledge of Escherichia coli promoters, including knowledge acquired from human experts, consensus sequences, statistics (weight matrices), symbolic learning, and neural network learning. The DNA-ChartParser provides bidirectional parsing facilities for BGGs in handling overlapping categories, gap categories, approximate pattern matching, and constraints. Basic Gene Grammars and the DNA-ChartParser allowed different sources of knowledge for recognizing E. coli promoters to be combined to achieve better accuracy, as assessed by parsing these DNA sequences in real-world data sets. Availability: DNA-ChartParser runs under SICStus Prolog. It and a few examples of Basic Gene Grammars are available at the URL: http://www.dai.ed.ac.uk/~siu/DNA Contact: {siu,chrism,dr}@dai.ed.ac.uk
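
One of the knowledge sources mentioned, weight matrices, is easy to sketch; the Python below scores a hexamer against a position weight matrix for the E. coli -10 box, with illustrative placeholder values rather than real promoter statistics.

```python
# Hedged sketch of one knowledge source such grammars can encode: scoring a
# hexamer against a position weight matrix (PWM) for the -10 consensus TATAAT.
# The log-odds values below are illustrative placeholders, not real data.
PWM_MINUS10 = [
    {"A": -1.0, "C": -2.0, "G": -2.0, "T":  1.2},  # consensus T
    {"A":  1.1, "C": -1.5, "G": -2.0, "T": -0.5},  # consensus A
    {"A": -0.5, "C": -1.0, "G": -1.0, "T":  1.0},  # consensus T
    {"A":  0.9, "C": -0.5, "G": -0.8, "T": -0.2},  # consensus A
    {"A":  0.8, "C": -0.5, "G": -0.5, "T": -0.3},  # consensus A
    {"A": -1.5, "C": -2.0, "G": -2.0, "T":  1.3},  # consensus T
]

def score_hexamer(seq: str) -> float:
    """Sum of per-position log-odds scores; higher means closer to TATAAT."""
    assert len(seq) == len(PWM_MINUS10)
    return sum(col[base] for col, base in zip(PWM_MINUS10, seq.upper()))

print(score_hexamer("TATAAT"))  # the consensus scores highest (6.3 here)
print(score_hexamer("GCGCGC"))  # a poor match scores low (-7.5 here)
```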

Journal ArticleDOI
23 May 2001
TL;DR: It is shown that, in the case of context-free programmed grammars with appearance checking working under free derivations, three nonterminals are enough to generate every recursively enumerable language.
Abstract: We show that, in the case of context-free programmed grammars with appearance checking working under free derivations, three nonterminals are enough to generate every recursively enumerable language. This improves the previously published bound of eight for the nonterminal complexity of these grammars. This also yields an improved nonterminal complexity bound of four for context-free matrix grammars with appearance checking. Moreover, we establish nonterminal complexity bounds for context-free programmed and matrix grammars working under leftmost derivations.

Journal ArticleDOI
TL;DR: A new representation scheme for extended context-free grammars (the symbol-threaded expression forest), a new normal form for these grammars (dot normal form), and new regular expression algorithms are introduced.

01 Jan 2001
TL;DR: This paper shows that the analysis presented in earlier papers can be extended in a reasonable way to several cases that were unaccounted for in the original discussion, including several well-known examples.

Abstract: In recent papers (Kroch and Joshi 1985, Kroch 1987) we claimed that, if one adopts the Tree Adjoining Grammar (TAG) formalism of Joshi, Levy, and Takahashi (1975) as the formal language of syntax, the ungrammaticality of extractions from wh-islands can be made to follow in a straightforward way from the nonexistence of multiple wh-fronting in simple questions. The analysis we gave was oversimplified, however, because it wrongly predicted all wh-island extractions to be ungrammatical, and we know that certain of them are well-formed, not only in languages like Swedish or Italian, but also in English (Chomsky 1986, Grimshaw 1986). Nevertheless, the analysis we gave had the attraction of providing a simple structural explanation for the wh-island effect, and it generalized directly to such other manifestations of subjacency as the Complex Noun Phrase Constraint (CNPC). In this paper we show that the analysis presented in our earlier papers can be extended in a reasonable way to several cases that were unaccounted for in the original discussion. In particular, we discuss such well-known examples as the following:

Proceedings ArticleDOI
06 Jul 2001
TL;DR: A logical definition of Minimalist grammars, which are Stabler's formalization of Chomsky's minimalist program, leads to a neat relation to categorial grammar (yielding a treatment of Montague semantics), a parsing-as-deduction in a resource-sensitive logic, and a learning algorithm from structured data.

Abstract: We provide a logical definition of Minimalist grammars, which are Stabler's formalization of Chomsky's minimalist program. Our logical definition leads to a neat relation to categorial grammar (yielding a treatment of Montague semantics), a parsing-as-deduction in a resource-sensitive logic, and a learning algorithm from structured data (based on a typing algorithm and type unification). Here we emphasize the connection to Montague semantics, which can be viewed as a formal computation of the logical form.
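
The Montague-style "formal computation of logical form" can be illustrated with a textbook lexicon, assumed here rather than taken from the paper:

```latex
% Lexical lambda terms (assumed):
\[
  [\![\text{every}]\!] = \lambda P\,\lambda Q.\ \forall x\,(P\,x \to Q\,x),
  \quad
  [\![\text{man}]\!] = \lambda x.\ \mathrm{man}(x),
  \quad
  [\![\text{sleeps}]\!] = \lambda x.\ \mathrm{sleep}(x)
\]
% The logical form falls out of function application plus beta-reduction:
\[
  ([\![\text{every}]\!]\,[\![\text{man}]\!])\,[\![\text{sleeps}]\!]
  \;\twoheadrightarrow_\beta\; \forall x\,(\mathrm{man}(x) \to \mathrm{sleep}(x)) .
\]
```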

01 Jan 2001
TL;DR: The PET platform is introduced, which has been developed with two goals: to serve as a flexible basis for research in efficient processing techniques, allowing precise empirical study and comparison of different approaches, and to provide an efficient run-time processor that supports fruitful scientific and practical utilization of HPSG grammars.
Abstract: The efficiency problem in parsing with large-scale unification grammars, including implementations in the Head-Driven Phrase Structure Grammar (HPSG) framework, used to be a serious obstacle to their application in research and commercial settings. Over the past few years, however, significant progress in efficient processing has been achieved. Still, many of the proposed techniques were developed in isolation only, making comparison and the assessment of their combined potential difficult. Also, a number of techniques were never evaluated on large-scale grammars. This thesis sets out to improve this situation by reviewing, integrating, and evaluating a number of techniques for efficient unification-based parsing. A strong focus is set on efficient graph unification. I provide an overview of previous work in this area of research, including the foundational algorithm in the work of Wroblewski (1987), for which I identify a previously unnoticed flaw, and provide a solution. I introduce the PET platform, which has been developed with two goals: (i) to serve as a flexible basis for research in efficient processing techniques, allowing precise empirical study and comparison of different approaches, and (ii) to provide an efficient run-time processor that supports fruitful scientific and practical utilization of HPSG grammars. The design and implementation of PET is presented in detail, including a closer look at efficient semi-lattice computation in the preprocessor. A number of experiments with PET are discussed, using three existing large-scale HPSG grammars of English, Japanese, and German. I give precise empirical answers to some open research questions, most importantly the question of feature structure encoding (lists of feature-value pairs versus representations based on fixed arity), and show that this is a much less important factor than often assumed. I also address the question of predicting practical performance across grammars and processing platforms. Finally, I take a wider perspective and report on the overall improvement of processing performance for HPSG grammars (as exemplified by the LinGO grammar) that has been achieved over a period of four years by an international consortium of research groups.
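
The core operation whose efficiency the thesis studies, graph unification, can be sketched naively; the Python below unifies feature structures represented as nested dicts, omitting the typing and the reentrancy (structure sharing) that real HPSG processing, and PET in particular, must handle.

```python
# Naive feature-structure unification over nested dicts; a didactic sketch,
# not PET's typed-dag algorithm. Reentrancy and type hierarchies omitted.
def unify(fs1, fs2):
    """Return the unification of two feature structures, or None on clash."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)                      # start from a copy of fs1
        for feat, val in fs2.items():
            if feat in result:
                sub = unify(result[feat], val)  # recurse on shared features
                if sub is None:
                    return None                 # feature values clash
                result[feat] = sub
            else:
                result[feat] = val              # feature present only in fs2
        return result
    return fs1 if fs1 == fs2 else None          # atomic values must match

np_sg = {"CAT": "np", "AGR": {"NUM": "sg"}}
verb  = {"AGR": {"NUM": "sg", "PER": "3"}}
print(unify(np_sg, verb))   # {'CAT': 'np', 'AGR': {'NUM': 'sg', 'PER': '3'}}
print(unify(np_sg, {"AGR": {"NUM": "pl"}}))  # None: number clash
```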

Journal ArticleDOI
TL;DR: The Logical Description Grammar (LDG) as discussed by the authors is a model of grammar and the syntax-semantics interface based on descriptions in elementary logic that can simultaneously describe the syntactic structure and the semantics of a natural language expression.
Abstract: We present Logical Description Grammar (LDG), a model of grammar and the syntax-semantics interface based on descriptions in elementary logic. A description may simultaneously describe the syntactic structure and the semantics of a natural language expression, i.e., the describing logic talks about the trees and about the truth-conditions of the language described. Logical Description Grammars offer a natural way of dealing with underspecification in natural language syntax and semantics. If a logical description (up to isomorphism) has exactly one tree plus truth-conditions as a model, it completely specifies that grammatical object. More common is the situation, corresponding to underspecification, in which there is more than one model. A situation in which there are no models corresponds to an ungrammatical input.
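
The flavor of such underspecified descriptions can be shown with a classic scope example; the dominance-style notation below is assumed for illustration rather than LDG's exact syntax.

```latex
% Labelled fragments with holes (\Box) for "every student read a book":
\begin{align*}
  \ell_1 &: \forall x\,(\mathrm{student}(x) \to \Box_1) \\
  \ell_2 &: \exists y\,(\mathrm{book}(y) \land \Box_2) \\
  \ell_3 &: \mathrm{read}(x, y)
\end{align*}
% with dominance constraints requiring both \ell_1 and \ell_2 to dominate
% \ell_3. The description has exactly two models, i.e. the two scope
% readings; a description with no models would mark ungrammaticality.
```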


Proceedings ArticleDOI
06 Jul 2001
TL;DR: This paper studies a parsing technique whose purpose is to improve the practical efficiency of RCL parsers and uses the shared derivation forest output by a prior RCL parser for a suitable superset of L.
Abstract: The theoretical study of the range concatenation grammar [RCG] formalism has revealed many attractive properties which may be used in NLP. In particular, range concatenation languages [RCL] can be parsed in polynomial time and many classical grammatical formalisms can be translated into equivalent RCGs without increasing their worst-case parsing time complexity. For example, after translation into an equivalent RCG, any tree adjoining grammar can be parsed in O(n^6) time. In this paper, we study a parsing technique whose purpose is to improve the practical efficiency of RCL parsers. The non-deterministic parsing choices of the main parser for a language L are directed by a guide which uses the shared derivation forest output by a prior RCL parser for a suitable superset of L. The results of a practical evaluation of this method on a wide-coverage English grammar are given.
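
For readers new to RCGs, a small textbook clause set (not from the paper) shows the formalism's predicates ranging over several substrings of the input at once:

```latex
% A positive RCG for \{a^n b^n c^n \mid n \ge 0\}:
\begin{align*}
  S(XYZ) &\to A(X, Y, Z) \\
  A(aX,\, bY,\, cZ) &\to A(X, Y, Z) \\
  A(\varepsilon, \varepsilon, \varepsilon) &\to \varepsilon
\end{align*}
% S splits the input range into three ranges X, Y, Z; A peels one a, one b,
% and one c off them in lockstep until all three are empty.
```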


Proceedings Article
18 Feb 2001
TL;DR: A formal lexicalized dependency grammar based on Meaning-Text theory that builds bubble trees as syntactic representations, that is, trees whose nodes can be filled by bubbles, which can contain other nodes.
Abstract: The paper presents a formal lexicalized dependency grammar based on Meaning-Text theory. This grammar associates semantic graphs with sentences. We propose a fragment of a grammar for French, including the description of extractions. The main particularity of our grammar is that it builds bubble trees as syntactic representations, that is, trees whose nodes can be filled by bubbles, which can contain other nodes. Our grammar needs more complex operations of combination of elementary structures than other lexicalized grammars, such as TAG or CG, but avoids the multiplication of elementary structures and provides linguistically well-motivated treatments.

01 Jan 2001
TL;DR: It is shown that matrix languages are characterized by valence grammars with target sets over (arbitrary) finite monoids; this proves a conjecture due to M. Jantzen stating that unordered vector languages can be characterized by grammars controlled by permutations of regular languages.

Abstract: We discuss an extension of valence grammars, where the value of a valid derivation is allowed to be an element of a given target set. We discuss closure properties of language families generated by such grammars. Moreover, we investigate the generative power of valence grammars with target sets over the groups Z^k, over the monoids N^k, and over finite monoids. This way, we also prove a conjecture due to M. Jantzen stating that unordered vector languages can be characterized by grammars controlled by permutations of regular languages. Furthermore, we show that matrix languages are characterized by valence grammars with target sets over (arbitrary) finite monoids.
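
A minimal assumed example shows the mechanism: a right-linear valence grammar over the group (Z, +) whose target set is {0}.

```latex
% Each rule carries a value; a derivation is valid iff the values sum to 0:
\begin{align*}
  S &\to aS \;(+1) & S &\to T \;(0) \\
  T &\to bT \;(-1) & T &\to \varepsilon \;(0)
\end{align*}
% The underlying right-linear grammar generates a^* b^*, but the valence
% condition n - m = 0 filters this down to \{a^n b^n \mid n \ge 0\}:
% regular rules plus algebraic control exceed regular power.
```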

Journal ArticleDOI
TL;DR: It is shown that PCGSTT whose component grammars are terminal distinguishable right-linear, a notion introduced by Radhakrishnan and Nagaraja in [33,34], are identifiable in the limit if certain data communication information is supplied in addition.
Abstract: We introduce a new variant of PC grammar systems, called PC grammar systems with terminal transmission, PCGSTT for short. We show that right-linear centralized PCGSTT have nice formal language theoretic properties: they are closed under gsm mappings (in particular, under intersection with regular sets and under homomorphisms) and union; a slight variant is, in addition, closed under concatenation and star; their power lies between that of n-parallel grammars introduced by Wood and that of matrix languages of index n, and their relation to equal matrix grammars of degree n is discussed. We show that membership for these language classes is complete for NL. In a second part of the paper, we discuss questions concerning grammatical inference of these systems. More precisely, we show that PCGSTT whose component grammars are terminal distinguishable right-linear, a notion introduced by Radhakrishnan and Nagaraja in [33,34], are identifiable in the limit if certain data communication information is supplied in addition.

01 Jan 2001
Abstract: The question about the position of categorial grammars in the Chomsky hierarchy arose in the late 1950s and early 1960s. In 1960, Bar-Hillel, Gaifman, and Shamir [1] proved that a formal language can be generated by some basic categorial grammar if and only if the language is context-free. They conjectured (see also [7]) that the same holds for Lambek grammars, i.e., for categorial grammars based on a syntactic calculus introduced in 1958 by J. Lambek [10] (this calculus operates with three connectives: multiplication or concatenation of languages, left division, and right division). The proof of one half of this conjecture (namely, that every context-free language can be generated by some Lambek grammar) in fact coincides with the proof …
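
The basic categorial grammars of the Bar-Hillel, Gaifman, and Shamir theorem can be illustrated with the standard transitive-verb example, assumed here:

```latex
% Type assignments and a derivation by the two application rules:
\[
  \underbrace{np}_{\text{John}}\quad
  \underbrace{(np\backslash s)/np}_{\text{likes}}\quad
  \underbrace{np}_{\text{Mary}}
\]
\[
  (np\backslash s)/np,\; np \;\Rightarrow\; np\backslash s
  \qquad\text{then}\qquad
  np,\; np\backslash s \;\Rightarrow\; s .
\]
% Lambek grammars add to these two application rules the full logic of
% multiplication, left division, and right division; the conjecture was
% that this does not enlarge the class of languages beyond context-free.
```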


Proceedings ArticleDOI
06 Jul 2001
TL;DR: An algorithm for extracting invertible probabilistic translation grammars from bilingual aligned and linguistically bracketed text and a number of heuristics to reduce the theoretically exponential computation time are presented.
Abstract: This paper presents an algorithm for extracting invertible probabilistic translation grammars from bilingual aligned and linguistically bracketed text. The invertibility condition requires all translation ambiguities to be resolved in the final translation grammar. The paper examines the complexity of inducing translation grammars and proposes a number of heuristics to reduce the theoretically exponential computation time.

Journal ArticleDOI
Peter Fletcher
TL;DR: It is argued that the connectionist style of computation is, in some ways, better suited than sequential computation to the task of representing and manipulating recursive structures.
Abstract: This paper presents a new connectionist approach to grammatical inference. Using only positive examples, the algorithm learns regular graph grammars, representing two-dimensional iterative structures drawn on a discrete Cartesian grid. This work is intended as a case study in connectionist symbol processing and geometric concept formation. A grammar is represented by a self-configuring connectionist network that is analogous to a transition diagram, except that it can deal with graph grammars as easily as string grammars. Learning starts with a trivial grammar, expressing no grammatical knowledge, which is then refined, by a process of successive node splitting and merging, into a grammar adequate to describe the population of input patterns. In conclusion, I argue that the connectionist style of computation is, in some ways, better suited than sequential computation to the task of representing and manipulating recursive structures.