
Showing papers on "Context-sensitive grammar published in 2002"


Patent
Mehryar Mohri1, Mark-Jan Nederhof1
18 Jul 2002
TL;DR: In this article, a context-free grammar is represented by a weighted finite-state transducer, which can be used to efficiently compile that grammar into a weighted automaton that accepts the strings allowed by the grammar with the corresponding weights.
Abstract: A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string.

171 citations
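For readers who want to see the first step of this compilation concretely, here is a minimal, self-contained sketch of identifying the strongly connected components of a grammar's nonterminal dependency graph, which is what determines the groups of mutually recursive nonterminals that each get their own sub-automaton. The toy grammar, the rule encoding and the use of Tarjan's algorithm are our own illustrative assumptions, not details taken from the patent.

```python
from collections import defaultdict

# Toy CFG: nonterminals are the dictionary keys, everything else is a terminal.
productions = {
    "S":  [["NP", "VP"]],
    "NP": [["det", "N"], ["NP", "PP"]],
    "VP": [["v", "NP"], ["VP", "PP"]],
    "PP": [["p", "NP"]],
    "N":  [["n"]],
}

def dependency_graph(prods):
    """Edge A -> B whenever nonterminal B occurs on some right-hand side of A."""
    graph = defaultdict(set)
    for lhs, alternatives in prods.items():
        graph[lhs]                      # make sure every nonterminal is a node
        for rhs in alternatives:
            for symbol in rhs:
                if symbol in prods:
                    graph[lhs].add(symbol)
    return graph

def strongly_connected_components(graph):
    """Tarjan's algorithm; each returned set is one SCC of the grammar."""
    index, low, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of its component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return sccs

print(strongly_connected_components(dependency_graph(productions)))
# e.g. [{'N'}, {'NP', 'PP'}, {'VP'}, {'S'}] -- NP and PP are mutually
# recursive, so they form one component and would share one sub-automaton.
```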


Journal ArticleDOI
TL;DR: In this paper, the authors consider XML documents described by a document type definition (DTD) and show that every XML-language has a unique XML-grammar, and give two characterizations of languages generated by XML-grammars: one is set-theoretic, the other is by a kind of saturation property.
Abstract: We consider XML documents described by a document type definition (DTD). An XML-grammar is a formal grammar that captures the syntactic features of a DTD. We investigate properties of this family of grammars. We show that every XML-language basically has a unique XML-grammar. We give two characterizations of languages generated by XML-grammars, one is set-theoretic, the other is by a kind of saturation property. We investigate decidability problems and prove that some properties that are undecidable for general context-free languages become decidable for XML-languages. We also characterize those XML-grammars that generate regular XML-languages.

63 citations
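As a rough illustration of what it means for a grammar to capture the syntactic features of a DTD, the following sketch validates a document tree against one regular content model per element name, which is the shape that XML-grammar productions take (one nonterminal per element, bracketed between the element's start and end tags). The element names and content models are invented for illustration; this is not the paper's formal definition.

```python
import re

# One content model (a regular expression over child element names) per
# element name, in the spirit of a DTD.  Everything here is a made-up example.
content_model = {
    "book":    r"title(chapter)+",
    "chapter": r"title(para)*",
    "title":   r"",          # character data only, no child elements
    "para":    r"",
}

def valid(element):
    """element = (name, children), where children is a list of elements."""
    name, children = element
    model = content_model.get(name)
    if model is None:
        return False
    child_names = "".join(child[0] for child in children)
    if re.fullmatch(model, child_names) is None:
        return False
    return all(valid(child) for child in children)

document = ("book", [("title", []),
                     ("chapter", [("title", []), ("para", []), ("para", [])])])
print(valid(document))    # True: every element respects its content model
```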


Journal ArticleDOI
TL;DR: Every system of language equations that is resolved with respect to its variables and contains the operations of concatenation, union and intersection is proved to have a least fixed point, and the equivalence of these systems to conjunctive grammars is established.
Abstract: This paper studies systems of language equations that are resolved with respect to variables and contain the operations of concatenation, union and intersection. Every system of this kind is proved to have a least fixed point, and the equivalence of these systems to conjunctive grammars is established. This allows us to obtain an algebraic characterization of the language family generated by conjunctive grammars.

55 citations
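The least-fixed-point semantics can be made concrete with a small Kleene-iteration experiment. The sketch below is our own illustration (the variable names and the length bound are assumptions, not the paper's construction): it computes, truncated to a fixed string length, the least solution of the classic resolved system whose S-component is { a^n b^n c^n }.

```python
# Resolved system with concatenation, union and intersection, truncated to
# strings of length at most MAX_LEN:
#   S = (A.B) & (D.C)    A = a.A | eps    B = b.B.c | eps
#   C = c.C | eps        D = a.D.b | eps
# Its least solution gives S = { a^n b^n c^n : n >= 0 }.

MAX_LEN = 9

def cat(x, y):
    return {u + v for u in x for v in y if len(u) + len(v) <= MAX_LEN}

def step(sol):
    a, b, c, d = sol["A"], sol["B"], sol["C"], sol["D"]
    return {
        "A": cat({"a"}, a) | {""},
        "B": cat(cat({"b"}, b), {"c"}) | {""},
        "C": cat({"c"}, c) | {""},
        "D": cat(cat({"a"}, d), {"b"}) | {""},
        "S": cat(a, b) & cat(d, c),
    }

# Kleene iteration from the empty solution; it stabilises because only
# finitely many strings of length <= MAX_LEN exist.
solution = {v: set() for v in "ABCDS"}
while True:
    nxt = step(solution)
    if nxt == solution:
        break
    solution = nxt

print(sorted(solution["S"], key=len))   # ['', 'abc', 'aabbcc', 'aaabbbccc']
```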


Journal ArticleDOI
TL;DR: It is shown that context-free valence grammars over finite monoids or commutative monoids have the same power as valence grammars over finite groups or commutative groups, respectively.

50 citations


Book ChapterDOI
TL;DR: It is shown that there exists a unique minimal balanced grammar equivalent to a given one, and balanced languages are characterized through a property of their syntactic congruence.
Abstract: Balanced grammars are a generalization of parenthesis grammars in two directions. First, several kinds of parentheses are allowed. Next, the set of right-hand sides of productions may be an infinite regular language. XML-grammars are a special kind of balanced grammars. This paper studies balanced grammars and their languages. It is shown that there exists a unique minimal balanced grammar equivalent to a given one. Next, balanced languages are characterized through a property of their syntactic congruence. Finally, we show how this characterization is related to previous work of McNaughton and Knuth on parenthesis languages.

45 citations


01 May 2002
TL;DR: This paper illustrates how tree-adjoining grammars (Joshi and Schabes, 1997) may be embedded in abstract categorial grammars (ACGs), and shows how the embedding exemplifies several features of the ACG framework.
Abstract: Abstract categorial grammars are not intended as yet another grammatical formalism that would compete with other established formalisms. They should rather be seen as the kernel of a grammatical framework — in the spirit of (Ranta, 2002) — in which other existing grammatical models may be encoded. This paper illustrates this fact by showing how tree-adjoining grammars (Joshi and Schabes, 1997) may be embedded in abstract categorial grammars. This embedding exemplifies several features of the ACG framework: • The fact that the basic objects manipulated by an ACG are λ-terms allows higher-order operations to be defined. Typically, tree-adjunction is such a higher-order operation (Abrusci, Fouquere and Vauzeilles, 1999; Joshi and Kulick, 1997; Monnich, 1997). • The flexibility of the framework allows the embedding to be defined in two stages. A first ACG allows the tree language of a given TAG to be generated. The abstract language of this first ACG corresponds to the derivation trees of the TAG. Then, a second ACG allows the corresponding string language to be extracted. The abstract language of this second ACG corresponds to the object language of the first one. 2. Abstract Categorial Grammars. This section defines our notion of an abstract categorial grammar. We first introduce the notions of linear implicative types, higher-order linear signature, linear λ-terms built upon a higher-order linear signature, and lexicon. Let A be a set of atomic types. The set T(A) of linear implicative types built upon A is inductively defined as follows: 1. if a ∈ A, then a ∈ T(A); 2. if α, β ∈ T(A), then (α ⊸ β) ∈ T(A). A higher-order linear signature consists of a triple Σ = 〈A, C, τ〉, where: 1. A is a finite set of atomic types; 2. C is a finite set of constants; 3. τ : C → T(A) is a function that assigns to each constant in C a linear implicative type in T(A). Let X be a countably infinite set of λ-variables. The set Λ(Σ) of linear λ-terms built upon a higher-order linear signature Σ = 〈A, C, τ〉 is inductively defined as follows: 1. if c ∈ C, then c ∈ Λ(Σ); 2. if x ∈ X, then x ∈ Λ(Σ); 3. if x ∈ X, t ∈ Λ(Σ), and x occurs free in t exactly once, then (λx. t) ∈ Λ(Σ); 4. if t, u ∈ Λ(Σ), and the sets of free variables of t and u are disjoint, then (t u) ∈ Λ(Σ). Λ(Σ) is provided with the usual notions of capture-avoiding substitution, α-conversion, and β-reduction (Barendregt, 1984). Given a higher-order linear signature Σ = 〈A, C, τ〉, each linear λ-term in Λ(Σ) may be assigned a linear implicative type in T(A). This type assignment obeys an inference system whose judgements are sequents of the form Γ ⊢Σ t : α, where: 1. Γ is a finite set of λ-variable typing declarations of the form ‘x : β’ (with x ∈ X and β ∈ T(A)), such that any λ-variable is declared at most once; 2. t ∈ Λ(Σ); 3. α ∈ T(A). The paper then gives the axioms and inference rules of this system.

43 citations
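The inductive definition of linear λ-terms quoted above translates almost literally into code. The following sketch is our own (the constant names are placeholders, and it covers only the term-formation rules, not the typing judgement or the TAG encoding): it checks the two linearity conditions, namely that a bound variable occurs free exactly once in the body, and that an application's function and argument share no free variables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def free_linear(term):
    """Return the free variables of term, or raise ValueError if term
    violates one of the linearity conditions of Lambda(Sigma)."""
    if isinstance(term, Const):
        return frozenset()
    if isinstance(term, Var):
        return frozenset({term.name})
    if isinstance(term, Lam):
        fv = free_linear(term.body)
        if term.var not in fv:
            raise ValueError(f"{term.var} must occur free exactly once in the body")
        return fv - {term.var}
    if isinstance(term, App):
        f, a = free_linear(term.fun), free_linear(term.arg)
        if f & a:
            raise ValueError(f"shared free variables: {f & a}")
        return f | a
    raise TypeError(term)

ok  = Lam("x", App(Const("c"), Var("x")))   # lambda x. c x   -- linear
bad = Lam("x", App(Var("x"), Var("x")))     # lambda x. x x   -- not linear
print(free_linear(ok))                      # frozenset()
try:
    free_linear(bad)
except ValueError as e:
    print("rejected:", e)                   # rejected: shared free variables ...
```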


Book ChapterDOI
23 Sep 2002
TL;DR: Inductive inference for synthesizing context-free grammars from positive and negative sample strings, implemented in the Synapse system, is described.
Abstract: This paper describes inductive inference for synthesizing context-free grammars from positive and negative sample strings, implemented in the Synapse system. For effective inference of grammars, Synapse employs the following mechanisms: 1. a rule-generating method called "inductive CYK algorithm," which generates the minimum production rules required for parsing positive samples; 2. incremental learning, which adds newly generated rules to previously obtained rules. Synapse can synthesize both ambiguous grammars and unambiguous grammars. Experimental results show recent improvements of the Synapse system in synthesizing context-free grammars.

32 citations
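For context, "inductive CYK" builds on the standard CYK membership test for grammars in Chomsky normal form. The sketch below shows only that underlying test, on an invented toy grammar for { a^n b^n : n >= 1 }; the rule-synthesis and incremental-learning parts of Synapse are not reproduced here.

```python
# CNF grammar (ours): S -> A T | A B,  T -> S B,  A -> a,  B -> b
grammar = {
    "S": [("A", "T"), ("A", "B")],
    "T": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}

def cyk(word, grammar, start="S"):
    n = len(word)
    if n == 0:
        return any(rhs == () for rhs in grammar.get(start, []))
    # table[i][l] = set of nonterminals deriving word[i : i + l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        for lhs, rhss in grammar.items():
            if (ch,) in rhss:
                table[i][1].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for lhs, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2 and rhs[0] in table[i][split]
                                and rhs[1] in table[i + split][length - split]):
                            table[i][length].add(lhs)
    return start in table[0][n]

print(cyk("aaabbb", grammar))   # True
print(cyk("aabbb", grammar))    # False
```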


Journal ArticleDOI
TL;DR: A pumping lemma for random context languages of finite index is proved that generalizes and refines the existing one, and it is shown that these grammars are strictly weaker than the non-erasing random context grammars.

30 citations


Proceedings ArticleDOI
24 Aug 2002
TL;DR: The resulting compiler creates a context-free backbone of the unification grammar, eliminates left-recursive productions and removes redundant grammar rules; experiments show no significant computational overhead in speech recognition performance for grammars with compositional semantics compared to grammars without.
Abstract: In this paper a method to compile unification grammars into speech recognition packages is presented; in particular, rules are specified to transfer the compositional semantics stated in unification grammars into speech recognition grammars. The resulting compiler creates a context-free backbone of the unification grammar, eliminates left-recursive productions and removes redundant grammar rules. The method was tested on a medium-sized unification grammar for English using Nuance speech recognition software on a corpus of 131 utterances from 12 different speakers. Results showed no significant computational overhead in speech recognition performance for speech recognition grammars with compositional semantics compared to grammars without.

29 citations
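The left-recursion elimination step mentioned above can be illustrated with the textbook transformation for immediate left recursion. This is a generic sketch under that assumption; it is not claimed to be the exact transformation used by the paper's compiler, it does not handle indirect left recursion, and it does not carry semantic annotations along.

```python
def eliminate_immediate_left_recursion(grammar):
    """Textbook removal of immediate left recursion:
         A -> A a1 | ... | A am | b1 | ... | bn
       becomes
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | eps"""
    out = {}
    for nt, alts in grammar.items():
        recursive = [alt[1:] for alt in alts if alt and alt[0] == nt]
        other = [alt for alt in alts if not alt or alt[0] != nt]
        if not recursive:
            out[nt] = alts
            continue
        fresh = nt + "'"
        out[nt] = [beta + [fresh] for beta in other]
        out[fresh] = [alpha + [fresh] for alpha in recursive] + [[]]
    return out

# Illustrative fragment (not from the paper): a left-recursive NP rule.
g = {
    "NP": [["NP", "PP"], ["det", "n"]],
    "PP": [["p", "NP"]],
}
for lhs, alts in eliminate_immediate_left_recursion(g).items():
    print(lhs, "->", " | ".join(" ".join(a) or "eps" for a in alts))
# NP -> det n NP'
# NP' -> PP NP' | eps
# PP -> p NP
```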



Proceedings ArticleDOI
03 Jul 2002
TL;DR: Whale Calf is a parser generator that uses conjunctive grammars, a generalization of context-free grammars with an explicit intersection operation, as the formalism for specifying the language.
Abstract: Whale Calf is a parser generator that uses conjunctive grammars, a generalization of context-free grammars with an explicit intersection operation, as the formalism of specifying the language. All existing parsing algorithms for conjunctive grammars are implemented - namely, the tabular algorithm for grammars in the binary normal form, the tabular algorithm for grammars in the linear normal form, the tabular algorithm for arbitrary grammars, the conjunctive LL, the conjunctive LR and the algorithm based on simulation of the automata equivalent to linear conjunctive grammars. The generated C++ programs can determine the membership of strings in the language and, if needed, create parse trees of these strings.
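To give a flavour of the formalism, here is the classic conjunctive grammar for { a^n b^n c^n }, in which a rule may consist of several conjuncts that must all derive the same substring. The encoding and the naive exponential recognizer below are our own illustration and do not correspond to any of the Whale Calf algorithms listed above.

```python
from functools import lru_cache

# S -> A B & D C ;  A -> a A | eps ;  B -> b B c | eps ;
# C -> c C | eps  ;  D -> a D b | eps
# Each alternative is a list of conjuncts; each conjunct is a symbol sequence.
GRAMMAR = {
    "S": [[["A", "B"], ["D", "C"]]],
    "A": [[["a", "A"]], [[]]],
    "B": [[["b", "B", "c"]], [[]]],
    "C": [[["c", "C"]], [[]]],
    "D": [[["a", "D", "b"]], [[]]],
}

@lru_cache(maxsize=None)
def derives(symbol, s):
    """Does `symbol` derive the string s?  Terminals match themselves."""
    if symbol not in GRAMMAR:
        return s == symbol
    return any(all(derives_seq(tuple(conj), s) for conj in alternative)
               for alternative in GRAMMAR[symbol])

@lru_cache(maxsize=None)
def derives_seq(symbols, s):
    """Does the sequence of symbols derive s?  Try all split points."""
    if not symbols:
        return s == ""
    head, rest = symbols[0], symbols[1:]
    return any(derives(head, s[:i]) and derives_seq(rest, s[i:])
               for i in range(len(s) + 1))

print(derives("S", "aabbcc"))   # True
print(derives("S", "aabbc"))    # False
print(derives("S", "aabcbc"))   # False
```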

Journal ArticleDOI
01 Aug 2002-Grammars
TL;DR: A new LR-style parsing algorithm for conjunctive grammars is developed, based on the very same idea of a graph-structured pushdown, where the simultaneous existence of several paths in the graph is used to perform the grammars' intersection operation.
Abstract: The Generalized LR parsing algorithm for context-free grammars, introduced by Tomita in 1986, is a polynomial-time implementation of nondeterministic LR parsing that uses a graph-structured stack to represent the contents of the nondeterministic parser's pushdown for all possible branches of computation at a single computation step. It has been specifically developed as a solution for practical parsing tasks arising in computational linguistics, and indeed has proved itself to be very suitable for natural language processing. Conjunctive grammars extend context-free grammars by allowing the use of an explicit intersection operation within grammar rules. This paper develops a new LR-style parsing algorithm for these grammars, which is based on the very same idea of a graph-structured pushdown, where the simultaneous existence of several paths in the graph is used to perform the mentioned intersection operation. The underlying finite automata are treated in the most general way: instead of showing the algorithm's correctness for some particular way of constructing automata, the paper defines a wide class of automata usable with a given grammar, which includes not only the traditional LR(k) automata, but also, for instance, a trivial automaton with a single reachable state. A modification of the SLR(k) table construction method that makes use of specific properties of conjunctive grammars is provided as one possible way of constructing finite automata to use with the algorithm. It is shown that the algorithm is applicable to any conjunctive grammar and can be implemented to work in no more than cubic time. Additionally, the algorithm can be made to work in linear time for the Boolean closure of the family of deterministic context-free languages.

Book ChapterDOI
Bradford Craig Starkie1
23 Sep 2002
TL;DR: The method presented in this paper has the ability to infer attribute grammars that can generate a wide range of useful data structures such as simple and structured types, lists, concatenated strings, and natural numbers.
Abstract: This paper presents a method for inferring reversible attribute grammars from tagged natural language sentences. Attribute grammars are a form of augmented context-free grammar that assigns "meaning" in the form of a data structure to a string in a context-free language. The method presented in this paper has the ability to infer attribute grammars that can generate a wide range of useful data structures such as simple and structured types, lists, concatenated strings, and natural numbers. The method also presents two new forms of grammar generalisation: generalisation based upon identification of optional phrases and generalisation based upon lists. The method has been applied to and tested on the task of the rapid development of spoken dialog systems.

Book ChapterDOI
18 Sep 2002
TL;DR: This paper establishes their computational equivalence to linear conjunctive grammars, which are linear context-free grammars extended with an explicit intersection operation; this makes it possible to combine the known results on the generative power and closure properties of triangular trellis automata and linear conjunctive grammars.
Abstract: Triangular trellis automata, also studied under the name of one-way real-time cellular automata, have been known for several decades as a purely abstract model of parallel computers. This paper establishes their computational equivalence to linear conjunctive grammars, which are linear context-free grammars extended with an explicit intersection operation. This equivalence makes it possible to combine the known results on the generative power and closure properties of triangular trellis automata and linear conjunctive grammars and to obtain new, previously unexpected results on this language family - for instance, to determine its exact relationship with other comparable families of languages.
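A trellis automaton is easy to simulate directly from its definition: the bottom row holds initial states for the input symbols, each higher cell is delta(left neighbour, right neighbour), and the word is accepted when the single top cell carries an accepting state. The concrete automaton below is our own small example, not from the paper, recognising { a^n b^n : n >= 1 }, a typical linear conjunctive language; each state records (first symbol, last symbol, "is a^k b^k", "is a^k b^(k-1)") of the substring the cell is responsible for.

```python
def init(ch):
    return (ch, ch, False, ch == "a")        # a single 'a' is a^1 b^0

def delta(left, right):
    first, anbn1_left = left[0], left[3]
    last, anbn_right = right[1], right[2]
    anbn = anbn1_left and last == "b"        # a^k b^(k-1) followed by a final b
    anbn1 = first == "a" and anbn_right      # an 'a' in front of a^(k-1) b^(k-1)
    return (first, last, anbn, anbn1)

def accepts(word):
    if not word:
        return False
    row = [init(ch) for ch in word]
    while len(row) > 1:                      # one-way real-time: n-1 parallel steps
        row = [delta(row[i], row[i + 1]) for i in range(len(row) - 1)]
    return row[0][2]                         # accepting iff "is a^k b^k"

for w in ["ab", "aabb", "aaabbb", "aabbb", "abab", "ba", ""]:
    print(w or "<empty>", accepts(w))
# ab True, aabb True, aaabbb True, aabbb False, abab False, ba False, <empty> False
```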

Book ChapterDOI
TL;DR: It is proved that history preserving bisimulation is decidable for finite-state graph grammars, by showing how the problem can be reduced to deciding the equivalence of finite causal automata.
Abstract: Over the years the concurrent behaviour of graph grammars has been widely investigated, and, in particular, several classical approaches to the semantics of Petri nets have been extended to graph grammars. Most of the existing semantics for graph grammars provide a (possibly concurrent) operational model of computation, while little interest has been devoted to the definition of abstract observational semantics. The aim of this paper is to introduce and study a behavioural equivalence over graph grammars, inspired by the classical history preserving bisimulation. Several choices are conceivable according to the kind of concurrent observation one is interested in. We concentrate on the basic case where the concurrent nature of a graph grammar computation is described by means of a prime event structure. As it happens for Petri nets, history preserving bisimulation can be studied in the general framework of causal automata -- a variation of ordinary automata introduced to deal with history dependent formalisms. In particular, we prove that history preserving bisimulation is decidable for finite-state graph grammars, by showing how the problem can be reduced to deciding the equivalence of finite causal automata.

Patent
John T. Maxwell1, Hadar Shemtov1
27 Sep 2002
TL;DR: A process is proposed for generating with unification-based grammars such as Lexical Functional Grammars; it uses construction and analysis of generation guides to determine internal facts and eliminate incomplete edges prior to constructing a generation chart.
Abstract: A process for generating with unification-based grammars, such as Lexical Functional Grammars, which uses construction and analysis of generation guides to determine internal facts and eliminate incomplete edges prior to constructing a generation chart. The generation guide can then be used in the construction of the generation chart to efficiently generate with unification-based grammars such as Lexical Functional Grammars. The generation guide is an instance of a grammar that has been specialized to the input and only contains those parts of the grammar that are relevant to the input. When the generation guide is analyzed to determine internal facts, a smaller generation chart is produced.

Proceedings ArticleDOI
03 Jul 2002
TL;DR: A polynomial algorithm is shown to decide whether a context-free grammar is self-embedding or not, and the advantages of this representation with respect to more classical representations by finite automata are pointed out.
Abstract: We consider non-self-embedding (NSE) context-free grammars as a representation of regular sets. We point out its advantages with respect to more classical representations by finite automata, in particular when considering the efficient realization of the rational operations. We give a characterization in terms of composition of regular grammars and state relationships between NSE grammars and push-down automata. Finally we show a polynomial algorithm to decide whether a context-free grammar is self-embedding or not.
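Deciding self-embedding can be phrased as a reachability fixpoint: a grammar is self-embedding if some nonterminal A derives a sentential form with A strictly inside it, i.e. with material on both sides. The sketch below is our own direct formulation of that test (polynomial, but not claimed to be the paper's algorithm), and it assumes a grammar without useless symbols.

```python
def is_self_embedding(grammar):
    """Fixpoint over facts (A, B, left, right) meaning A =>+ alpha B beta
    where `left` (resp. `right`) records that alpha (resp. beta) is non-empty."""
    facts = set()
    for a, alts in grammar.items():
        for rhs in alts:
            for i, sym in enumerate(rhs):
                if sym in grammar:
                    facts.add((a, sym, i > 0, i < len(rhs) - 1))
    changed = True
    while changed:
        changed = False
        for (a, b, left, right) in list(facts):
            for rhs in grammar[b]:
                for i, sym in enumerate(rhs):
                    if sym in grammar:
                        fact = (a, sym, left or i > 0, right or i < len(rhs) - 1)
                        if fact not in facts:
                            facts.add(fact)
                            changed = True
    return any(a == b and left and right for (a, b, left, right) in facts)

regular_like = {"S": [["a", "S"], ["b"]]}        # right-linear: not self-embedding
embedding = {"S": [["a", "S", "b"], []]}         # S =>+ a S b : self-embedding
print(is_self_embedding(regular_like))           # False
print(is_self_embedding(embedding))              # True
```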

Journal Article
TL;DR: It is demonstrated that for every phrase-structure grammar, there exists an equivalent simple semi-conditional grammar that has no more than twelve conditional productions.
Abstract: The present paper discusses the descriptional complexity of simple semi-conditional grammars with respect to the number of conditional productions. More specifically, it demonstrates that for every phrase-structure grammar, there exists an equivalent simple semi-conditional grammar that has no more than twelve conditional productions.

Proceedings ArticleDOI
06 Jul 2002
TL;DR: This work presents two tabular algorithms for parsing non-recursive context-free grammars, and shows that they perform well in practical settings, despite the fact that this problem is PSPACE-complete.
Abstract: We consider the problem of parsing non-recursive context-free grammars, i.e., context-free grammars that generate finite languages. In natural language processing, this problem arises in several areas of application, including natural language generation, speech recognition and machine translation. We present two tabular algorithms for parsing of non-recursive context-free grammars, and show that they perform well in practical settings, despite the fact that this problem is PSPACE-complete.

Proceedings ArticleDOI
14 Jan 2002
TL;DR: A new type of grammar is introduced that extends tree grammars by permitting a notion of sharing in the productions; these dag grammars seem to be of independent interest. It is also demonstrated how type inference can be derived from type checking.
Abstract: Abramov and Gluck have recently introduced a technique called URA for inverting first-order functional programs. Given some desired output value, URA computes a potentially infinite sequence of substitutions/restrictions corresponding to the relevant input values. In some cases this process does not terminate. In the present paper, we propose a new program analysis for inverting programs. The technique works by computing a finite grammar describing the set of all inputs that relate to a given output. During the production of the grammar, the original program is implicitly transformed using so-called driving steps. Whereas URA is sound and complete, but sometimes fails to terminate, our technique always terminates and is complete, but not sound. As an example, we demonstrate how to derive type inference from type checking. The idea of approximating functional programs by grammars is not new. For instance, the second author has developed a technique using tree grammars to approximate the termination behaviour of deforestation. However, for the present purposes it has been necessary to invent a new type of grammar that extends tree grammars by permitting a notion of sharing in the productions. These dag grammars seem to be of independent interest.

Journal ArticleDOI
TL;DR: A coding theorem is proved which shows that a structured grammar-based code has maximal redundancy/sample O(1/log n) provided that a weak regular structure condition is satisfied.
Abstract: A grammar-based code losslessly compresses each finite-alphabet data string x by compressing a context-free grammar Gx which represents x in the sense that the language of Gx is {x}. In an earlier paper, we showed that if the grammar Gx is a type of grammar called irreducible grammar for every data string x, then the resulting grammar-based code has maximal redundancy/sample O(log log n / log n) for n data samples. To further reduce the maximal redundancy/sample, in the present paper, we first decompose a context-free grammar into its structure and its data content, then encode the data content conditional on the structure, and finally replace the irreducible grammar condition with a mild condition on the structures of all grammars used to represent distinct data strings of a fixed length. The resulting grammar-based codes are called structured grammar-based codes. We prove a coding theorem which shows that a structured grammar-based code has maximal redundancy/sample O(1/log n) provided that a weak regular structure condition is satisfied.
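The starting point, representing a data string x by a context-free grammar Gx whose language is exactly {x}, can be illustrated with a tiny digram-replacement heuristic in the style of Re-Pair. This is our own illustrative sketch of grammar-based compression in general; it is neither the irreducible-grammar construction of the earlier paper nor the structured construction of this one.

```python
from collections import Counter

def grammar_for(text):
    """Build a small CFG whose single derivable string is `text`, by repeatedly
    replacing the most frequent digram with a fresh nonterminal."""
    rules = {}
    seq = list(text)
    next_id = 0
    while True:
        digrams = Counter(zip(seq, seq[1:]))
        if not digrams:
            break
        pair, count = digrams.most_common(1)[0]
        if count < 2:
            break
        nt = f"R{next_id}"
        next_id += 1
        rules[nt] = list(pair)
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    rules["S"] = seq
    return rules

def expand(symbol, rules):
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])

g = grammar_for("abababab")
print(g)                               # {'R0': ['a', 'b'], 'R1': ['R0', 'R0'], 'S': ['R1', 'R1']}
print(expand("S", g) == "abababab")    # True: the grammar generates exactly the input
```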

Book ChapterDOI
23 Sep 2002
TL;DR: It is shown that in contrast to k-valued classical categorial grammars, different classes of Lambek grammars are not learnable from strings following Gold's model.
Abstract: In this paper we give some learnability results in the field of categorial grammars. We show that in contrast to k-valued classical categorial grammars, different classes of Lambek grammars are not learnable from strings following Gold's model. The results are obtained by the construction of limit points in each considered class: non-associative Lambek grammars with empty sequences and Lambek grammars without empty sequences and without product. Such results express the difficulty of learning categorial grammars from unstructured strings and the need for structured examples.

Journal ArticleDOI
TL;DR: This work considers simulating finite automata (both deterministic and nondeterministic) with context-free grammars in Chomsky normal form (CNF), and shows that any unary DFA with n states can be simulated by a CNF grammar with O(n^(1/3)) variables.

Journal ArticleDOI
TL;DR: For each context-free returning PC grammar system an equivalent system of this type can be constructed, where the total number of symbols used for describing a component can be bounded by a reasonably small constant.

Journal ArticleDOI
01 Jan 2002-Grammars
TL;DR: This paper discusses alternative approaches for defining the denotation of a grammar, culminating in one which is shown to be both compositional and fully-abstract, and shows how grammar modules can be defined such that their semantics retains these desirable properties.
Abstract: Given two context-free grammars (CFGs), G1 and G2, the language generated by the union of the grammars is not the union of the languages generated by each grammar: L(G1 ∪ G2) ≠ L(G1) ∪ L(G2). In order to account for modularity of grammars, another way of defining the meaning of grammars is needed. This paper adapts results from the semantics of logic programming languages to CFGs. We discuss alternative approaches for defining the denotation of a grammar, culminating in one which we show to be both compositional and fully-abstract. We then show how grammar modules can be defined such that their semantics retains these desirable properties. This gives a clear, mathematically sound way for composing parts of grammars.
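The inequality L(G1 ∪ G2) ≠ L(G1) ∪ L(G2) is easy to see on a toy pair of grammars: once the rule sets are merged, productions of one grammar can rewrite nonterminals introduced by the other. The following sketch (the toy grammars and the brute-force enumerator are ours) makes the gap explicit.

```python
from itertools import product

G1 = {"S": [["a", "A"]], "A": [["b"]]}          # L(G1) = { "ab" }
G2 = {"S": [["c", "A"]], "A": [["d"]]}          # L(G2) = { "cd" }

def union(g1, g2):
    """Merge the rule sets of two grammars sharing nonterminal names."""
    merged = {}
    for g in (g1, g2):
        for lhs, alts in g.items():
            merged.setdefault(lhs, []).extend(alts)
    return merged

def language(grammar, start="S"):
    """Enumerate the (finite) language of an acyclic grammar."""
    def expand(symbol):
        if symbol not in grammar:
            return {symbol}
        result = set()
        for rhs in grammar[symbol]:
            parts = [expand(s) for s in rhs]
            result |= {"".join(p) for p in product(*parts)}
        return result
    return expand(start)

print(sorted(language(G1) | language(G2)))      # ['ab', 'cd']
print(sorted(language(union(G1, G2))))          # ['ab', 'ad', 'cb', 'cd']
```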

Proceedings ArticleDOI
24 Aug 2002
TL;DR: It is shown that in contrast to classical categorial grammars, rigid and k-valued Lambek grammars are not learnable from strings; this result holds for variants of the Lambek calculus.
Abstract: This paper is concerned with learning categorial grammars in Gold's model (Gold, 1967). Recently, learning algorithms in this model have been proposed for some particular classes of classical categorial grammars (Kanazawa, 1998). We show that in contrast to classical categorial grammars, rigid and k-valued Lambek grammars are not learnable from strings. This result holds for variants of Lambek calculus; our proof consists in the construction of limit points in each class. Such a result aims at clarifying the possible directions for future learning algorithms.

Proceedings ArticleDOI
03 Jul 2002
TL;DR: A novel view of top-down predictive parser construction for extended context-free grammars, based on the rewriting of partial syntax trees, is presented.
Abstract: Extended context-free grammars are context-free grammars in which the right-hand sides of productions are allowed to be any regular language rather than being restricted to only finite languages. We present a novel view of top-down predictive parser construction for extended context-free grammars that is based on the rewriting of partial syntax trees. This work is motivated by our development of ecfg, a Java toolkit for the manipulation of extended context-free grammars, and by our continuing investigation of XML.
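To see why regular right-hand sides fit top-down parsing well, consider a toy extended-CFG rule with a Kleene star, List -> "(" Item* ")", and the hand-written predictive parser below: the star becomes a loop in the parsing procedure rather than an auxiliary recursive nonterminal. This is our own example and is unrelated to the ecfg toolkit's actual API.

```python
def parse_list(tokens, pos=0):
    """Parse one List at tokens[pos:]; return the parse tree and next position."""
    if tokens[pos] != "(":
        raise SyntaxError(f"expected '(' at position {pos}")
    pos += 1
    items = []
    while tokens[pos] != ")":          # the Kleene star of the rule, as a loop
        item, pos = parse_item(tokens, pos)
        items.append(item)
    return items, pos + 1              # skip ")"

def parse_item(tokens, pos):
    if tokens[pos] == "(":
        return parse_list(tokens, pos)
    return tokens[pos], pos + 1        # an atom

tree, end = parse_list("( a ( b c ) d )".split())
print(tree)                            # ['a', ['b', 'c'], 'd']
```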

Journal Article
TL;DR: A procedure for finding minimal unifiers with respect to some preordering relation between substitutions is introduced to solve the general problem of finding all minimal (in several senses) categorial grammars compatible with a given language sample.
Abstract: In this paper, continuing [1], we present a more general approach to restricted optimal unification, introduced and applied to learning algorithms for categorial grammars in [2] and further developed in [7, 8, 4, 5]. We introduce a procedure for finding minimal unifiers with respect to some preordering relation between substitutions and solve the general problem of finding all minimal (in several senses) categorial grammars compatible with a given language sample.

Book ChapterDOI
23 Sep 2002
TL;DR: A general condition is proposed, valid for certain subclasses of the linear grammars, under which these classes can be polynomially identified in the limit from given data.
Abstract: Linearity and determinism seem to be two essential conditions for polynomial learning of grammars to be possible. We propose a general condition, valid for certain subclasses of the linear grammars, under which these classes can be polynomially identified in the limit from given data. This enables us to give new proofs of the identification of well-known classes of grammars, and to propose a new (and larger) class of linear grammars for which polynomial identification is thus possible.

01 Jan 2002
TL;DR: In this paper, the authors considered the descriptional complexity of block-synchronization context-free grammars, and showed that for weak and strong derivations, one begin symbol and two situation symbols are sufficient to generate all respective language families.
Abstract: We consider the descriptional complexity of block-synchronization context-free grammars, BSCF grammars. In particular, we consider the number of necessary situation and begin symbols as complexity measures. For weak and strong derivations, one begin symbol and two situation symbols are sufficient to generate all respective language families. Surprisingly, one situation symbol with equality synchronization is also sufficient to generate all weak derivation BSCF languages. The family of synchronized context-free languages (SCF languages) generated by grammars with one situation symbol using equality synchronization gives a language family properly between that of E0L and ET0L languages. Some normal forms are also presented for all variations. In addition, we show that either prefix or equality synchronization can be used to describe all weak and strong derivation languages.