
Showing papers on "Tree-adjoining grammar" published in 2002


Proceedings Article
01 May 2002
TL;DR: An algorithm which translates the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations is presented and it is demonstrated how the combination of preprocessing and type-changing rules minimizes the lexical coverage problem.
Abstract: We present an algorithm which translates the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations. To do this we have needed to make several systematic changes to the Treebank which have the effect of cleaning up a number of errors and inconsistencies. This process has yielded a cleaner treebank that can potentially be used in any framework. We also show how unary type-changing rules for certain types of modifiers can be introduced in a CCG grammar to ensure a compact lexicon without augmenting the generative power of the system. We demonstrate how the combination of preprocessing and type-changing rules minimizes the lexical coverage problem.

110 citations


Journal ArticleDOI
TL;DR: Every system of language equations that is resolved with respect to its variables and contains the operations of concatenation, union and intersection is proved to have a least fixed point, and the equivalence of these systems to conjunctive grammars is established.
Abstract: This paper studies systems of language equations that are resolved with respect to variables and contain the operations of concatenation, union and intersection. Every system of this kind is proved to have a least fixed point, and the equivalence of these systems to conjunctive grammars is established. This allows us to obtain an algebraic characterization of the language family generated by conjunctive grammars.
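As a concrete illustration (my own sketch in Python, not code from the paper), the least fixed point of such a resolved system can be approximated by iterating the equations from the empty languages while keeping only strings up to a bounded length. The system below is the standard conjunctive-grammar example whose least solution for S is { aⁿbⁿcⁿ : n ≥ 0 }:

MAX_LEN = 9

def cat(x, y):
    # Concatenation, truncated to strings of length <= MAX_LEN.
    return {u + v for u in x for v in y if len(u) + len(v) <= MAX_LEN}

def step(env):
    # One application of the resolved system
    #   S = A·B ∩ D·C,  A = {a}·A ∪ {ε},  B = {b}·B·{c} ∪ {ε},
    #   D = {a}·D·{b} ∪ {ε},  C = {c}·C ∪ {ε}
    A, B, C, D = env["A"], env["B"], env["C"], env["D"]
    return {
        "A": cat({"a"}, A) | {""},
        "B": cat(cat({"b"}, B), {"c"}) | {""},
        "C": cat({"c"}, C) | {""},
        "D": cat(cat({"a"}, D), {"b"}) | {""},
        "S": cat(A, B) & cat(D, C),
    }

env = {v: set() for v in "ABCDS"}
while True:
    new = step(env)
    if new == env:
        break
    env = new

print(sorted(env["S"], key=len))   # ['', 'abc', 'aabbcc', 'aaabbbccc']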

55 citations


Journal ArticleDOI
TL;DR: It is shown that context-free valence grammars over finite monoids or commutative monoids have the same power as valence grammars over finite groups or commutative groups, respectively.

50 citations


Book ChapterDOI
TL;DR: It is shown that there exists a unique minimal balanced grammar equivalent to a given one and balanced languages are characterized through a property of their syntactic congruence.
Abstract: Balanced grammars are a generalization of parenthesis grammars in two directions. First, several kinds of parentheses are allowed. Next, the set of right-hand sides of productions may be an infinite regular language. XML-grammars are a special kind of balanced grammars. This paper studies balanced grammars and their languages. It is shown that there exists a unique minimal balanced grammar equivalent to a given one. Next, balanced languages are characterized through a property of their syntactic congruence. Finally, we show how this characterization is related to previous work of McNaughton and Knuth on parenthesis languages.
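A small illustration (my own sketch, with invented element names): in an XML-grammar viewed as a balanced grammar, each element name a has one production Xa → ⟨a⟩ Ra ⟨/a⟩ whose right-hand side Ra is a regular language over the nonterminals of the allowed children. Below, each Ra is written as a regular expression over child tag names and a document is checked against these content models:

import re
import xml.etree.ElementTree as ET

# Content models R_a, written as regular expressions over the names of allowed
# children (names chosen so that concatenating them is unambiguous).
CONTENT = {
    "library": r"(book)*",            # X_library -> <library> X_book* </library>
    "book":    r"(title)(author)+",   # X_book    -> <book> X_title X_author+ </book>
    "title":   r"",                   # leaf elements have the empty content model
    "author":  r"",
}

def valid(node):
    children = "".join(child.tag for child in node)
    model = CONTENT.get(node.tag)
    if model is None or not re.fullmatch(model, children):
        return False
    return all(valid(child) for child in node)

doc = ET.fromstring("<library><book><title/><author/><author/></book></library>")
print(valid(doc))   # True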

45 citations


01 May 2002
TL;DR: This paper illustrates how tree-adjoining grammars (Joshi and Schabes, 1997) may be embedded in abstract categorial grammars (ACGs), an embedding that exemplifies several features of the ACG framework.
Abstract: Abstract categorial grammars are not intended as yet another grammatical formalism that would compete with other established formalisms. They should rather be seen as the kernel of a grammatical framework — in the spirit of (Ranta, 2002) — in which other existing grammatical models may be encoded. This paper illustrates this fact by showing how tree-adjoining grammars (Joshi and Schabes, 1997) may be embedded in abstract categorial grammars. This embedding exemplifies several features of the ACG framework: • the fact that the basic objects manipulated by an ACG are λ-terms allows higher-order operations to be defined; tree adjunction is typically such a higher-order operation (Abrusci, Fouquere and Vauzeilles, 1999; Joshi and Kulick, 1997; Monnich, 1997); • the flexibility of the framework allows the embedding to be defined in two stages: a first ACG generates the tree language of a given TAG, its abstract language corresponding to the derivation trees of the TAG; a second ACG then extracts the corresponding string language, its abstract language corresponding to the object language of the first one.

The paper then defines its notion of an abstract categorial grammar, introducing linear implicative types, higher-order linear signatures, linear λ-terms built upon such a signature, and lexicons. Let A be a set of atomic types. The set T(A) of linear implicative types built upon A is defined inductively: (1) if a ∈ A, then a ∈ T(A); (2) if α, β ∈ T(A), then (α ⊸ β) ∈ T(A). A higher-order linear signature is a triple Σ = 〈A, C, τ〉, where A is a finite set of atomic types, C is a finite set of constants, and τ : C → T(A) is a function that assigns to each constant in C a linear implicative type in T(A). Let X be an infinite countable set of λ-variables. The set Λ(Σ) of linear λ-terms built upon a higher-order linear signature Σ = 〈A, C, τ〉 is defined inductively: (1) if c ∈ C, then c ∈ Λ(Σ); (2) if x ∈ X, then x ∈ Λ(Σ); (3) if x ∈ X, t ∈ Λ(Σ), and x occurs free in t exactly once, then (λx. t) ∈ Λ(Σ); (4) if t, u ∈ Λ(Σ) and the sets of free variables of t and u are disjoint, then (t u) ∈ Λ(Σ). Λ(Σ) is equipped with the usual notions of capture-avoiding substitution, α-conversion, and β-reduction (Barendregt, 1984). Given a higher-order linear signature Σ = 〈A, C, τ〉, each linear λ-term in Λ(Σ) may be assigned a linear implicative type in T(A). This type assignment obeys an inference system whose judgements are sequents of the form Γ ⊢Σ t : α, where Γ is a finite set of λ-variable typing declarations of the form 'x : β' (with x ∈ X and β ∈ T(A)) in which any λ-variable is declared at most once, t ∈ Λ(Σ), and α ∈ T(A). The axioms and inference rules of this type system are then given in the paper.
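The definitions above translate almost directly into code. The following Python sketch (my own illustration, not code from the paper) represents linear implicative types and linear λ-terms and checks the judgement Γ ⊢Σ t : α by consuming each variable exactly once; the toy signature with constants john : NP and sleeps : NP ⊸ S is invented:

from dataclasses import dataclass

# Linear implicative types: atomic types are plain strings, Arrow(a, b) is (a ⊸ b).
@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

# Linear λ-terms: constants, variables, abstractions (annotated with the type of
# the bound variable, for simplicity) and applications.
@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    var_type: object
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def infer(term, ctx, sig):
    """Return (type, leftover context); every variable must be used exactly once."""
    if isinstance(term, Const):
        return sig[term.name], ctx
    if isinstance(term, Var):
        if term.name not in ctx:
            raise TypeError(f"variable {term.name} unbound or used twice")
        rest = dict(ctx)
        return rest.pop(term.name), rest
    if isinstance(term, Lam):
        body_type, rest = infer(term.body, {**ctx, term.var: term.var_type}, sig)
        if term.var in rest:
            raise TypeError(f"variable {term.var} never used")
        return Arrow(term.var_type, body_type), rest
    fun_type, ctx1 = infer(term.fun, ctx, sig)   # threading the context forces the
    arg_type, ctx2 = infer(term.arg, ctx1, sig)  # function and argument to use
    if not (isinstance(fun_type, Arrow) and fun_type.dom == arg_type):
        raise TypeError("ill-typed application")  # disjoint sets of variables
    return fun_type.cod, ctx2

sig = {"john": "NP", "sleeps": Arrow("NP", "S")}             # invented toy signature
print(infer(App(Const("sleeps"), Const("john")), {}, sig))   # ('S', {})
print(infer(Lam("x", "NP", App(Const("sleeps"), Var("x"))), {}, sig))
# (Arrow(dom='NP', cod='S'), {})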

43 citations


Book ChapterDOI
23 Sep 2002
TL;DR: Inductive inference for synthesizing context-free grammars from positive and negative sample strings, implemented in the Synapse system, is described.
Abstract: This paper describes inductive inference for synthesizing context-free grammars from positive and negative sample strings, implemented in the Synapse system. For effective inference of grammars, Synapse employs the following mechanisms: (1) a rule-generating method called the "inductive CYK algorithm," which generates the minimum production rules required for parsing positive samples, and (2) incremental learning, which adds newly generated rules to previously obtained rules. Synapse can synthesize both ambiguous and unambiguous grammars. Experimental results show recent improvements of the Synapse system in synthesizing context-free grammars.
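For orientation, the sketch below (plain CYK in Python; Synapse's actual inductive CYK algorithm additionally generates candidate rules during parsing) shows the membership test against which a candidate rule set is checked: it must accept every positive sample and reject every negative one. The grammar for { aⁿbⁿ } and the samples are invented for illustration:

def cyk_accepts(rules, start, s):
    """rules: set of (lhs, rhs) pairs, rhs either a terminal or a pair of nonterminals."""
    n = len(s)
    if n == 0:
        return False
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(s):
        table[i][1] = {lhs for lhs, rhs in rules if rhs == ch}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for k in range(1, span):
                for lhs, rhs in rules:
                    if (isinstance(rhs, tuple) and rhs[0] in table[i][k]
                            and rhs[1] in table[i + k][span - k]):
                        table[i][span].add(lhs)
    return start in table[0][n]

# Candidate grammar for { a^n b^n } in Chomsky normal form:
#   S -> A T | A B,   T -> S B,   A -> a,   B -> b
rules = {("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B")),
         ("A", "a"), ("B", "b")}
positives = ["ab", "aabb", "aaabbb"]
negatives = ["a", "ba", "abb"]
print(all(cyk_accepts(rules, "S", w) for w in positives))       # True
print(not any(cyk_accepts(rules, "S", w) for w in negatives))   # True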

32 citations


Journal ArticleDOI
TL;DR: A pumping lemma for random context languages of finite index is proved that generalizes and refines the existing one, and it is shown that these grammars are strictly weaker than the non-erasing random context grammars.

30 citations


Proceedings ArticleDOI
24 Aug 2002
TL;DR: The resulting compiler creates a context-free backbone of the unification grammar, eliminates left-recursive productions and removes redundant grammar rules; speech recognition grammars with compositional semantics showed no significant computational overhead in recognition performance compared to grammars without.

Abstract: In this paper a method to compile unification grammars into speech recognition packages is presented; in particular, rules are specified to transfer the compositional semantics stated in unification grammars into speech recognition grammars. The resulting compiler creates a context-free backbone of the unification grammar, eliminates left-recursive productions and removes redundant grammar rules. The method was tested on a medium-sized unification grammar for English using Nuance speech recognition software on a corpus of 131 utterances from 12 different speakers. Results showed no significant computational overhead in speech recognition performance for speech recognition grammars with compositional semantics compared to grammars without.
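As an illustration of one of these steps (a textbook sketch under my own naming, not the compiler's actual code), immediate left recursion can be removed from the context-free backbone as follows; speech recognition packages typically cannot handle left-recursive productions:

def remove_immediate_left_recursion(grammar):
    """grammar: dict mapping each nonterminal to a list of right-hand sides (tuples)."""
    new = {}
    for a, alternatives in grammar.items():
        recursive = [alt[1:] for alt in alternatives if alt[:1] == (a,)]
        others    = [alt     for alt in alternatives if alt[:1] != (a,)]
        if not recursive:
            new[a] = alternatives
            continue
        tail = a + "'"                       # fresh helper nonterminal, e.g. NP'
        new[a] = [beta + (tail,) for beta in others]
        new[tail] = [alpha + (tail,) for alpha in recursive] + [()]
    return new

# A -> A α | β   becomes   A -> β A',   A' -> α A' | ε.  For example:
#   NP -> NP PP | Det N   becomes   NP -> Det N NP',   NP' -> PP NP' | ε
grammar = {"NP": [("NP", "PP"), ("Det", "N")]}
print(remove_immediate_left_recursion(grammar))
# {'NP': [('Det', 'N', "NP'")], "NP'": [('PP', "NP'"), ()]}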

29 citations



Book ChapterDOI
08 Apr 2002
TL;DR: A new proof is given of Chirica and Martin's result that the attribute values can be computed by a structural recursion over the tree, and a new definedness test is derived which encompasses the traditional closure and circularity tests.
Abstract: A definition of the semantics of attribute grammars is given, using the lambda calculus. We show how this semantics allows us to prove results about attribute grammars in a calculational style. In particular, we give a new proof of Chirica and Martin's result [6], that the attribute values can be computed by a structural recursion over the tree. We also derive a new definedness test, which encompasses the traditional closure and circularity tests. The test is derived by abstract interpretation.
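To make the cited result concrete, the following small Python sketch (my own example, not the paper's λ-calculus development) computes attribute values in a single structural recursion: each subtree is folded into a function from its inherited attributes to its synthesized attributes, here for a binary-numeral tree with inherited attribute scale and synthesized attributes value and length:

# Trees: ("bit", 0 or 1) for a single binary digit, ("seq", left, right) for
# the concatenation of two digit sequences.
def fold(tree):
    if tree[0] == "bit":
        bit = tree[1]
        return lambda scale: (bit * 2 ** scale, 1)       # (value, length)
    _, left, right = tree
    f_left, f_right = fold(left), fold(right)
    def semantics(scale):
        value_r, length_r = f_right(scale)               # right part keeps the scale
        value_l, length_l = f_left(scale + length_r)     # left part is shifted by it
        return value_l + value_r, length_l + length_r
    return semantics

# The numeral 1101: folding yields a function; applying it to scale 0 gives (13, 4).
numeral = ("seq", ("seq", ("seq", ("bit", 1), ("bit", 1)), ("bit", 0)), ("bit", 1))
print(fold(numeral)(0))   # (13, 4)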

25 citations


Book ChapterDOI
06 Oct 2002
TL;DR: Techniques for a component-based style of programming in the context of higher-order attribute grammars (HAG) are presented, together with two attribute grammar components that can be re-used across different language-based tool specifications.

Abstract: This paper presents techniques for a component-based style of programming in the context of higher-order attribute grammars (HAG). Attribute grammar components are "plugged in" to larger attribute grammar systems through higher-order attribute grammars. Higher-order attributes are used as (intermediate) "gluing" data structures. This paper also presents two attribute grammar components that can be re-used across different language-based tool specifications: a visualizer and animator of programs and a graphical user interface AG component. Both components are reused in the definition of a simple language processor. The techniques presented in this paper are implemented in Lrc: a purely functional, higher-order attribute grammar-based system that generates language-based tools.

Proceedings ArticleDOI
03 Jul 2002
TL;DR: Whale Calf is a parser generator that uses conjunctive grammars, a generalization of context-free grammars with an explicit intersection operation, as the formalism for specifying the language.

Abstract: Whale Calf is a parser generator that uses conjunctive grammars, a generalization of context-free grammars with an explicit intersection operation, as the formalism for specifying the language. All existing parsing algorithms for conjunctive grammars are implemented - namely, the tabular algorithm for grammars in the binary normal form, the tabular algorithm for grammars in the linear normal form, the tabular algorithm for arbitrary grammars, the conjunctive LL, the conjunctive LR and the algorithm based on simulation of the automata equivalent to linear conjunctive grammars. The generated C++ programs can determine the membership of strings in the language and, if needed, create parse trees of these strings.

Book ChapterDOI
Bradford Craig Starkie1
23 Sep 2002
TL;DR: The method presented in this paper has the ability to infer attribute grammars that can generate a wide range of useful data structures such as simple and structured types, lists, concatenated strings, and natural numbers.
Abstract: This paper presents a method for inferring reversible attribute grammars from tagged natural language sentences. Attribute grammars are a form of augmented context free grammar that assign "meaning" in the form of a data structure to a string in a context free language. The method presented in this paper has the ability to infer attribute grammars that can generate a wide range of useful data structures such as simple and structured types, lists, concatenated strings, and natural numbers. The method also presents two new forms of grammar generalisation; generalisation based upon identification of optional phrases and generalisation based upon lists. The method has been applied to and tested on the task of the rapid development of spoken dialog systems.

Book ChapterDOI
18 Sep 2002
TL;DR: This paper establishes their computational equivalence to linear conjunctive grammars, which are linear context-free grammars extended with an explicit intersection operation; this makes it possible to combine the known results on the generative power and closure properties of triangular trellis automata and linear conjunctive grammars.

Abstract: Triangular trellis automata, also studied under the name of one-way real-time cellular automata, have been known for several decades as a purely abstract model of parallel computers. This paper establishes their computational equivalence to linear conjunctive grammars, which are linear context-free grammars extended with an explicit intersection operation. This equivalence makes it possible to combine the known results on the generative power and closure properties of triangular trellis automata and linear conjunctive grammars, and to obtain new, previously unexpected results on this language family, for instance to determine its exact relationship with other comparable families of languages.

Book ChapterDOI
TL;DR: It is proved that history preserving bisimulation is decidable for finite-state graph grammars, by showing how the problem can be reduced to deciding the equivalence of finite causal automata.
Abstract: Over the years the concurrent behaviour of graph grammars has been widely investigated, and, in particular, several classical approaches to the semantics of Petri nets have been extended to graph grammars. Most of the existing semantics for graph grammars provide a (possibly concurrent) operational model of computation, while little interest has been devoted to the definition of abstract observational semantics. The aim of this paper is to introduce and study a behavioural equivalence over graph grammars, inspired by the classical history preserving bisimulation. Several choices are conceivable according to the kind of concurrent observation one is interested in. We concentrate on the basic case where the concurrent nature of a graph grammar computation is described by means of a prime event structure. As it happens for Petri nets, history preserving bisimulation can be studied in the general framework of causal automata -- a variation of ordinary automata introduced to deal with history dependent formalisms. In particular, we prove that history preserving bisimulation is decidable for finite-state graph grammars, by showing how the problem can be reduced to deciding the equivalence of finite causal automata.

Patent
John T. Maxwell1, Hadar Shemtov1
27 Sep 2002
TL;DR: A process is proposed for generating with unification-based grammars, such as Lexical Functional Grammars, which uses construction and analysis of generation guides to determine internal facts and eliminate incomplete edges prior to constructing a generation chart.

Abstract: A process for generating with unification-based grammars such as Lexical Functional Grammars which uses construction and analysis of generation guides to determine internal facts and eliminate incomplete edges prior to constructing a generation chart. The generation guide can then be used in the construction of the generation chart to efficiently generate with unification-based grammars such as Lexical Functional Grammars. The generation guide is an instance of a grammar that has been specialized to the input and only contains those parts of the grammar that are relevant to the input. When the generation guide is analyzed to determine internal facts, a smaller generation chart is produced.

Proceedings ArticleDOI
03 Jul 2002
TL;DR: A polynomial algorithm is shown to decide whether a context-free grammar is self-embedding or not, and its advantages with respect to more classical representations by finite automata are pointed out.

Abstract: We consider non-self-embedding (NSE) context-free grammars as a representation of regular sets. We point out its advantages with respect to more classical representations by finite automata, in particular when considering the efficient realization of the rational operations. We give a characterization in terms of composition of regular grammars and state relationships between NSE grammars and push-down automata. Finally, we show a polynomial algorithm to decide whether a context-free grammar is self-embedding or not.

Journal Article
TL;DR: It is demonstrated that for every phrase-structure grammar, there exists an equivalent simple semi-conditional grammar that has no more than twelve conditional productions.
Abstract: The present paper discusses the descriptional complexity of simple semi-conditional grammars with respect to the number of conditional productions. More specifically, it demonstrates that for every phrase-structure grammar, there exists an equivalent simple semi-conditional grammar that has no more than twelve conditional productions.

Proceedings ArticleDOI
24 Aug 2002
TL;DR: The coupling of Becker's metarules with a simple yet principled hierarchy of rule application has been successful in generating the large set of verb trees in the grammar from a very small initial set of trees.

Abstract: We discuss a grammar development process used to generate the trees of the wide-coverage Lexicalized Tree Adjoining Grammar (LTAG) for English of the XTAG Project. Resulting from the coupling of Becker's metarules with a simple yet principled hierarchy of rule application, the approach has been successful in generating the large set of verb trees in the grammar from a very small initial set of trees.

Proceedings ArticleDOI
06 Jul 2002
TL;DR: This work presents two tabular algorithms for parsing of non-recursive context-free grammars, and shows that they perform well in practical settings, despite the fact that this problem is PSPACE-complete.
Abstract: We consider the problem of parsing non-recursive context-free grammars, i.e., context-free grammars that generate finite languages. In natural language processing, this problem arises in several areas of application, including natural language generation, speech recognition and machine translation. We present two tabular algorithms for parsing of non-recursive context-free grammars, and show that they perform well in practical settings, despite the fact that this problem is PSPACE-complete.

Proceedings ArticleDOI
14 Jan 2002
TL;DR: A new type of grammar is introduced that extends tree grammars by permitting a notion of sharing in the productions and seems to be of independent interest; it is also demonstrated how type inference can be derived from type checking.

Abstract: Abramov and Gluck have recently introduced a technique called URA for inverting first order functional programs. Given some desired output value, URA computes a potentially infinite sequence of substitutions/restrictions corresponding to the relevant input values. In some cases this process does not terminate. In the present paper, we propose a new program analysis for inverting programs. The technique works by computing a finite grammar describing the set of all inputs that relate to a given output. During the production of the grammar, the original program is implicitly transformed using so-called driving steps. Whereas URA is sound and complete, but sometimes fails to terminate, our technique always terminates and is complete, but not sound. As an example, we demonstrate how to derive type inference from type checking. The idea of approximating functional programs by grammars is not new. For instance, the second author has developed a technique using tree grammars to approximate the termination behaviour of deforestation. However, for the present purposes it has been necessary to invent a new type of grammar that extends tree grammars by permitting a notion of sharing in the productions. These dag grammars seem to be of independent interest.

Journal ArticleDOI
TL;DR: A coding theorem is proved which shows that a structured grammar-based code has maximal redundancy/sample O(1/log n) provided that a weak regular structure condition is satisfied.

Abstract: A grammar-based code losslessly compresses each finite-alphabet data string x by compressing a context-free grammar Gx which represents x in the sense that the language of Gx is {x}. In an earlier paper, we showed that if the grammar Gx is a type of grammar called an irreducible grammar for every data string x, then the resulting grammar-based code has maximal redundancy/sample O(log log n / log n) for n data samples. To further reduce the maximal redundancy/sample, in the present paper we first decompose a context-free grammar into its structure and its data content, then encode the data content conditional on the structure, and finally replace the irreducible grammar condition with a mild condition on the structures of all grammars used to represent distinct data strings of a fixed length. The resulting grammar-based codes are called structured grammar-based codes. We prove a coding theorem which shows that a structured grammar-based code has maximal redundancy/sample O(1/log n) provided that a weak regular structure condition is satisfied.
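To make the setting concrete, here is a minimal sketch (my own illustration, not code or notation from the paper) of the first step of any grammar-based code: representing the data string x by a context-free grammar Gx whose language is exactly {x}, which the code then compresses:

# One straight-line grammar G_x with L(G_x) = {"abababab"}:
#   S -> A A,   A -> B B,   B -> a b
grammar = {"S": ["A", "A"], "A": ["B", "B"], "B": ["a", "b"]}

def expand(symbol):
    if symbol not in grammar:                 # terminal symbol
        return symbol
    return "".join(expand(s) for s in grammar[symbol])

print(expand("S"))   # abababab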

Book ChapterDOI
23 Sep 2002
TL;DR: It is shown that, in contrast to k-valued classical categorial grammars, different classes of Lambek grammars are not learnable from strings following Gold's model.

Abstract: In this paper we give some learnability results in the field of categorial grammars. We show that, in contrast to k-valued classical categorial grammars, different classes of Lambek grammars are not learnable from strings following Gold's model. The results are obtained by the construction of limit points in each considered class: non-associative Lambek grammars with empty sequences and Lambek grammars without empty sequences and without product. Such results express the difficulty of learning categorial grammars from unstructured strings and the need for structured examples.

Proceedings ArticleDOI
24 Aug 2002
TL;DR: Parsli is a finite-state parser which can be tailored to the lexicon, syntax, and semantics of a particular application using a hand-editable declarative lexicon that gives the application designer better and easier control over the natural language understanding component than using an off-the-shelf parser.
Abstract: Parsli is a finite-state (FS) parser which can be tailored to the lexicon, syntax, and semantics of a particular application using a hand-editable declarative lexicon. The lexicon is defined in terms of a lexicalized Tree Adjoining Grammar, which is subsequently mapped to a FS representation. This approach gives the application designer better and easier control over the natural language understanding component than using an off-the-shelf parser. We present results using Parsli on an application that creates 3D-images from typed input.

Proceedings ArticleDOI
11 Jul 2002
TL;DR: A phonological probabilistic context-free grammar is presented which describes the word and syllable structure of German words; rules for English phonemes are added to the grammar, and the enriched grammar is trained on an English corpus.
Abstract: We present a phonological probabilistic context-free grammar, which describes the word and syllable structure of German words. The grammar is trained on a large corpus by a simple supervised method, and evaluated on a syllabification task achieving 96.88% word accuracy on word tokens, and 90.33% on word types. We added rules for English phonemes to the grammar, and trained the enriched grammar on an English corpus. Both grammars are evaluated qualitatively showing that probabilistic context-free grammars can contribute linguistic knowledge to phonology. Our formal approach is multilingual, while the training data is language-dependent.
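As a toy illustration of how such a grammar assigns probabilities (an invented miniature, not the grammar from the paper, which covers full German and English word structure), the probability of a derivation is the product of the probabilities of the rules it uses:

rules = {
    ("Syl", ("Onset", "Rhyme")):    0.8,
    ("Syl", ("Rhyme",)):            0.2,
    ("Rhyme", ("Nucleus", "Coda")): 0.6,
    ("Rhyme", ("Nucleus",)):        0.4,
    ("Onset", ("t",)): 0.5,  ("Onset", ("k",)): 0.5,
    ("Nucleus", ("a",)): 1.0,
    ("Coda", ("n",)): 1.0,
}

derivation = [                        # a derivation of the syllable "tan"
    ("Syl", ("Onset", "Rhyme")),
    ("Onset", ("t",)),
    ("Rhyme", ("Nucleus", "Coda")),
    ("Nucleus", ("a",)),
    ("Coda", ("n",)),
]

probability = 1.0
for rule in derivation:
    probability *= rules[rule]
print(round(probability, 4))   # 0.8 * 0.5 * 0.6 = 0.24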

Journal ArticleDOI
01 Jan 2002-Grammars
TL;DR: This paper discusses alternative approaches for defining the denotation of a grammar, culminating in one which is shown to be both compositional and fully-abstract, and shows how grammar modules can be defined such that their semantics retains these desirable properties.
Abstract: Given two context-free grammars (CFGs), G1 and G2, the language generated by the union of the grammars is not the union of the languages generated by each grammar: L(G1 ∪ G2) ≠ L(G1) ∪ L(G2). In order to account for modularity of grammars, another way of defining the meaning of grammars is needed. This paper adapts results from the semantics of logic programming languages to CFGs. We discuss alternative approaches for defining the denotation of a grammar, culminating in one which we show to be both compositional and fully-abstract. We then show how grammar modules can be defined such that their semantics retains these desirable properties. This gives a clear, mathematically sound way for composing parts of grammars.
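A standard illustration of this non-compositionality (my own example, not taken from the paper): let G1 have the productions S → aB and B → b, and let G2 have the single production B → c, both with start symbol S. Then L(G1) = {ab} and L(G2) = ∅, so L(G1) ∪ L(G2) = {ab}, whereas L(G1 ∪ G2) = {ab, ac}. The language of the combined grammar therefore cannot be computed from the two component languages alone, which is exactly what motivates the alternative denotations developed in the paper.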

01 May 2002
TL;DR: This paper shows that, in terms of weak equivalence, the subclass of MGs which allow (overt) head movement but no phrasal movement in the sense of Stabler 1997 constitutes a proper subclass of linear indexed grammars (LIGs), and thus of tree adjoining grammars.
Abstract: The type of a minimalist grammar (MG) introduced in Stabler 1997 provides a simple algebraic formalization of the perspectives as they arise from Chomsky 1995b within the linguistic framework of transformational grammar. As known (cf. Michaelis 2001a, 2001b, Harkema 2001), this MG–type defines the same class of derivable string languages as, e.g., linear context–free (string) rewriting systems (Vijay–Shanker et al. 1987, Weir 1988). In this paper we show that, in terms of weak equivalence, the subclass of MGs which allow (overt) head movement but no phrasal movement in the sense of Stabler 1997, constitutes a proper subclass of linear indexed grammars (LIGs), and thus tree adjoining grammars. We also examine the “inner hierarchic complexity” of this embedding in some more detail by looking at the subclasses canonically resulting from the distinction between left and right adjunction of the moved head to the attracting one. Furthermore, we show that adding the possibility of phrasal movement by allowing at most one “indistinguishable” licensee to trigger such movement already increases the weak generative capacity of at least two of the considered subclasses, while this is not true for the particular subclass of MGs which do not employ any movement at all. The latter define the same class of derivable string languages as context free grammars. On the other hand however, MGs which do not employ head movement but whose licensee set consists of at most two elements, are shown to derive, i.a., languages not derivable by any LIG. In this sense our results contribute to shedding some light on the complexity as it arises from the interplay of two different structural transformation types whose common existence is often argued to be linguistically motivated. ∗This work has been funded by DFG–grant STA 519/1–1. A previous version was published in Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6), pp. 57–65, Universita di Venezia.

Proceedings ArticleDOI
24 Aug 2002
TL;DR: It is shown that, in contrast to classical categorial grammars, rigid and k-valued Lambek grammars are not learnable from strings; this result holds for variants of Lambek calculus.

Abstract: This paper is concerned with learning categorial grammars in Gold's model (Gold, 1967). Recently, learning algorithms in this model have been proposed for some particular classes of classical categorial grammars (Kanazawa, 1998). We show that in contrast to classical categorial grammars, rigid and k-valued Lambek grammars are not learnable from strings. This result holds for variants of Lambek calculus; our proof consists in the construction of limit points in each class. Such a result aims at clarifying the possible directions for future learning algorithms.

Proceedings ArticleDOI
03 Jul 2002
TL;DR: A novel view on top-down predictive parser construction for extended context-free grammars that is based on the rewriting of partial syntax trees is presented.
Abstract: Extended context-free grammars are context-free grammars in which the right-hand sides of productions are allowed to be any regular language rather than being restricted to only finite languages. We present a novel view on top-down predictive parser construction for extended context-free grammars that is based on the rewriting of partial syntax trees. This work is motivated by our development of ecfg, a Java toolkit for the manipulation of extended context-free grammars, and by our continuing investigation of XML.
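A small sketch (my own illustration, not the ecfg toolkit): in an extended context-free grammar a production such as List → "(" Item* ")" has a regular right-hand side, and a top-down predictive parser realizes the Kleene star as a loop rather than through an auxiliary recursive nonterminal:

def parse_list(tokens, i):
    assert tokens[i] == "(", "expected '('"
    i += 1
    items = []
    while tokens[i] != ")":               # the Kleene star Item* becomes a loop
        item, i = parse_item(tokens, i)
        items.append(item)
    return items, i + 1                   # consume ")"

def parse_item(tokens, i):
    if tokens[i] == "(":                  # Item -> List
        return parse_list(tokens, i)
    return int(tokens[i]), i + 1          # Item -> NUMBER

tokens = ["(", "1", "(", "2", "3", ")", "4", ")"]
print(parse_list(tokens, 0)[0])   # [1, [2, 3], 4]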

Journal Article
TL;DR: An overview is given of the main results concerning the descriptional complexity of partially parallel grammars that perform scattered rewriting and multirewriting, with respect to the number of nonterminals or productions.
Abstract: During a derivation step, partially parallel grammars rewrite some symbols of the sentential form while leaving the others unrewritten. The present paper discusses grammars that perform two types of this parallelism - scattered rewriting and multirewriting. It gives an overview of the main results concerning their descriptional complexity with respect to the number of nonterminals or productions. In the conclusion, some open problems are pointed out.