
Showing papers on "Tree-adjoining grammar published in 2006"


Journal ArticleDOI
27 Apr 2006-Nature
TL;DR: It is shown that European starlings (Sturnus vulgaris) accurately recognize acoustic patterns defined by a recursive, self-embedding, context-free grammar, and this finding opens a new range of complex syntactic processing mechanisms to physiological investigation.
Abstract: Noam Chomsky's work on ‘generative grammar’ led to the concept of a set of rules that can generate a natural language with a hierarchical grammar, and the idea that this represents a uniquely human ability. In a series of experiments with European starlings, in which several types of ‘warble’ and ‘rattle’ took the place of words in a human language, the birds learnt to classify phrase structure grammars in a way that met the same criteria. Their performance can be said to be almost human on this yardstick. So if there are language processing capabilities that are uniquely human, they may be more context-free or at a higher level in the Chomsky hierarchy. Or perhaps there is no single property or processing capacity that differentiates human language from non-human communication systems. Humans regularly produce new utterances that are understood by other members of the same language community1. Linguistic theories account for this ability through the use of syntactic rules (or generative grammars) that describe the acceptable structure of utterances2. The recursive, hierarchical embedding of language units (for example, words or phrases within shorter sentences) that is part of the ability to construct new utterances minimally requires a ‘context-free’ grammar2,3 that is more complex than the ‘finite-state’ grammars thought sufficient to specify the structure of all non-human communication signals. Recent hypotheses make the central claim that the capacity for syntactic recursion forms the computational core of a uniquely human language faculty4,5. Here we show that European starlings (Sturnus vulgaris) accurately recognize acoustic patterns defined by a recursive, self-embedding, context-free grammar. They are also able to classify new patterns defined by the grammar and reliably exclude agrammatical patterns. Thus, the capacity to classify sequences from recursive, centre-embedded grammars is not uniquely human. 
This finding opens a new range of complex syntactic processing mechanisms to physiological investigation.
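The contrast the study turns on — finite-state (AB)ⁿ sequences versus context-free, centre-embedded AⁿBⁿ sequences — can be sketched with two string recognizers. This is an illustrative simplification: the actual stimuli were starling 'rattle' and 'warble' motifs, here replaced by the letters A and B.

```python
import re

def is_finite_state(seq):
    """Match the (AB)^n pattern: alternating AB pairs (a regular language)."""
    return re.fullmatch(r"(AB)+", seq) is not None

def is_context_free(seq):
    """Match the A^n B^n pattern: n A's followed by n B's (context-free,
    centre-embedded; not expressible by any finite-state grammar)."""
    n = len(seq) // 2
    return len(seq) % 2 == 0 and seq == "A" * n + "B" * n

assert is_finite_state("ABABAB")
assert not is_finite_state("AABB")
assert is_context_free("AAABBB")
assert not is_context_free("ABAB")
```

The second pattern requires matching the count of A's against the count of B's, which is exactly what pushes it beyond finite-state power.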

510 citations


Proceedings ArticleDOI
08 Jun 2006
TL;DR: This work presents a new model of the translation process: quasi-synchronous grammar (QG), and evaluates the cross-entropy of QGs on unseen text and shows that a better fit to bilingual data is achieved by allowing greater syntactic divergence.
Abstract: Many syntactic models in machine translation are channels that transform one tree into another, or synchronous grammars that generate trees in parallel. We present a new model of the translation process: quasi-synchronous grammar (QG). Given a source-language parse tree T1, a QG defines a monolingual grammar that generates translations of T1. The trees T2 allowed by this monolingual grammar are inspired by pieces of substructure in T1 and aligned to T1 at those points. We describe experiments learning quasi-synchronous context-free grammars from bitext. As with other monolingual language models, we evaluate the cross-entropy of QGs on unseen text and show that a better fit to bilingual data is achieved by allowing greater syntactic divergence. When evaluated on a word alignment task, QG matches standard baselines.

112 citations


Book ChapterDOI
25 Sep 2006
TL;DR: An arc-consistency algorithm for context-free grammars is devised, logic combinations of grammar constraints are investigated for tractability, and the boundaries between regular, context-free, and context-sensitive grammar filtering are studied.
Abstract: By introducing the Regular Membership Constraint, Gilles Pesant pioneered the idea of basing constraints on formal languages. The paper presented here is highly motivated by this work, taking the obvious next step, namely to investigate constraints based on grammars higher up in the Chomsky hierarchy. We devise an arc-consistency algorithm for context-free grammars, investigate when logic combinations of grammar constraints are tractable, show how to exploit non-constant size grammars and reorderings of languages, and study where the boundaries run between regular, context-free, and context-sensitive grammar filtering.
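The membership test underlying context-free grammar filtering can be realized with CYK-style dynamic programming. A minimal recognition sketch for a grammar in Chomsky normal form follows; the paper's arc-consistency algorithm additionally prunes domain values, which this sketch omits.

```python
def cyk(word, rules, start="S"):
    """CYK recognition for a grammar in Chomsky normal form.
    rules maps a nonterminal to a list of bodies, each either a
    terminal character or a pair (B, C) of nonterminals."""
    n = len(word)
    # table[i][j] = nonterminals deriving the substring word[i : i+j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for head, bodies in rules.items():
            if any(isinstance(b, str) and b == ch for b in bodies):
                table[i][0].add(head)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                for head, bodies in rules.items():
                    for body in bodies:
                        if isinstance(body, tuple):
                            B, C = body
                            if (B in table[i][split - 1]
                                    and C in table[i + split][span - split - 1]):
                                table[i][span - 1].add(head)
    return start in table[0][n - 1]

# Balanced a^n b^n (n >= 1) in CNF: S -> A T | A B, T -> S B, A -> a, B -> b
rules = {"S": [("A", "T"), ("A", "B")], "T": [("S", "B")],
         "A": ["a"], "B": ["b"]}
assert cyk("aabb", rules)
assert not cyk("aab", rules)
```

The cubic-time table is also the natural data structure over which a propagator can record, per position, which values remain consistent with some parse.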

51 citations


Proceedings Article
01 Apr 2006
TL;DR: The tree relations definable by synchronous tree-substitution grammars (STSG) are shown to be exactly those definable by linear complete bimorphisms, providing for the first time a clear relationship between synchronous grammars and tree transducers.
Abstract: We place synchronous tree-adjoining grammars and tree transducers in the single overarching framework of bimorphisms, continuing the unification of synchronous grammars and tree transducers initiated by Shieber (2004). Along the way, we present a new definition of the tree-adjoining grammar derivation relation based on a novel direct inter-reduction of TAG and monadic macro tree transducers. Tree transformation systems such as tree transducers and synchronous grammars have seen renewed interest, based on a perceived relevance to new applications, such as importing syntactic structure into statistical machine translation models or founding a formalism for speech command and control. The exact relationship among a variety of formalisms has been unclear, with a large number of seemingly unrelated formalisms being independently proposed or characterized. An initial step toward unifying the formalisms was taken (Shieber, 2004) in making use of the formal-language-theoretic device of bimorphisms, previously used to characterize the tree relations definable by tree transducers. In particular, the tree relations definable by synchronous tree-substitution grammars (STSG) were shown to be just those definable by linear complete bimorphisms, thereby providing for the first time a clear relationship between synchronous grammars and tree transducers.

46 citations


Book ChapterDOI
08 Nov 2006
TL;DR: The validation of a context-free grammar obtained by the analysis against XML schemas is considered and two algorithms for deciding inclusion L(G1)⊆L(G2) are developed, which are efficient in practice although they have exponential complexity.
Abstract: String expression analysis conservatively approximates the possible string values generated by a program. We consider the validation of a context-free grammar obtained by the analysis against XML schemas and develop two algorithms for deciding inclusion L(G1)⊆L(G2), where G1 is a context-free grammar and G2 is either an XML-grammar or a regular hedge grammar. The algorithms for XML-grammars and regular hedge grammars have exponential and doubly exponential time complexity, respectively. We have incorporated the algorithms into the PHP string analyzer and validated several publicly available PHP programs against the XHTML DTD. The experiments show that both of the algorithms are efficient in practice although they have exponential complexity.
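For the simpler case where the schema language is an ordinary regular language given as a total DFA, inclusion L(G1)⊆L(G2) can be decided by intersecting G1 with the complement of the DFA (the Bar-Hillel product) and testing emptiness. The sketch below illustrates that idea only; it is not the paper's XML-grammar or hedge-grammar algorithm, and it assumes a CNF grammar without epsilon rules.

```python
def cfg_subset_dfa(rules, start, delta, q0, accept, states):
    """Decide L(G) <= L(A) for a CNF grammar G (no epsilon rules) and a
    *total* DFA A, via emptiness of L(G) intersected with A's complement."""
    gen = set()   # (p, X, q): X derives some word driving A from p to q
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            for body in bodies:
                if isinstance(body, str):              # terminal rule X -> a
                    new = {(p, head, delta[(p, body)]) for p in states}
                else:                                  # binary rule X -> B C
                    B, C = body
                    new = {(p, head, q)
                           for (p, Y, r) in gen if Y == B
                           for (r2, Z, q) in gen if Z == C and r2 == r}
                if not new <= gen:
                    gen |= new
                    changed = True
    # inclusion holds iff no derivable word ends in a rejecting state
    return not any((q0, start, q) in gen for q in states if q not in accept)

# G: a^n b^n (n >= 1) in CNF
rules = {"S": [("A", "T"), ("A", "B")], "T": [("S", "B")],
         "A": ["a"], "B": ["b"]}
# A1 accepts a*b* (state 2 is a dead state); A2 accepts (ab)*
d1 = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 1,
      (2, "a"): 2, (2, "b"): 2}
d2 = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 0,
      (2, "a"): 2, (2, "b"): 2}
assert cfg_subset_dfa(rules, "S", d1, 0, {0, 1}, {0, 1, 2})   # a^n b^n in a*b*
assert not cfg_subset_dfa(rules, "S", d2, 0, {0}, {0, 1, 2})  # not in (ab)*
```

The exponential blow-ups in the paper come from the tree-structured (XML) side, which this regular-language sketch does not exhibit.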

35 citations


Journal ArticleDOI
01 Feb 2006-Lingua
TL;DR: The degree to which the explanations offered by these different approaches generalize across A- and A′-movement, across different structural contexts, and across the phenomena of displacement and agreement is explored, and it is asked whether such generalization is empirically warranted in each case.

34 citations


Journal ArticleDOI
TL;DR: It is shown that nearly all of these methods for modeling RNA and protein structure are based on the same core principles and can be converted into equivalent approaches in the framework of tree-adjoining grammars and related formalisms.
Abstract: Since the first application of context-free grammars to RNA secondary structures in 1988, many researchers have used both ad hoc and formal methods from computational linguistics to model RNA and protein structure. We show how nearly all of these methods are based on the same core principles and can be converted into equivalent approaches in the framework of tree-adjoining grammars and related formalisms. We also propose some new approaches that extend these core principles in novel ways.
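The core principle — nested base pairs mirror the derivations of a context-free grammar — is visible in the classic Nussinov algorithm, whose dynamic-programming recurrence corresponds to a small CFG over RNA strings. A minimal sketch, maximizing pair count and ignoring minimum loop lengths and thermodynamic energies:

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def nussinov(seq):
    """Maximum number of nested base pairs (Nussinov DP). The recurrence
    mirrors the context-free grammar S -> aS | Sa | aSb | SS."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            cand = max(best[i + 1][j], best[i][j - 1])   # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                cand = max(cand, best[i + 1][j - 1] + 1) # pair (i, j)
            for k in range(i + 1, j):                    # bifurcation S -> S S
                cand = max(cand, best[i][k] + best[k + 1][j])
            best[i][j] = cand
    return best[0][n - 1]

assert nussinov("GCGC") == 2
assert nussinov("GAAAC") == 1
assert nussinov("AAAA") == 0
```

Pseudoknots, which cross rather than nest, fall outside this grammar — one motivation for the move to tree-adjoining grammars discussed in the paper.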

33 citations


Journal Article
TL;DR: Adaptive star grammars are proposed as an extension of node and hyperedge replacement grammars, and unrestricted adaptive star grammars are shown to be capable of generating every type-0 string language.
Abstract: We propose an extension of node and hyperedge replacement grammars, called adaptive star grammars, and study their basic properties. A rule in an adaptive star grammar is actually a rule schema which, via the so-called cloning operation, yields a potentially infinite number of concrete rules. Adaptive star grammars are motivated by application areas such as modeling and refactoring object-oriented programs. We prove that cloning can be applied lazily. Unrestricted adaptive star grammars are shown to be capable of generating every type-0 string language. However, we identify a reasonably large subclass for which the membership problem is decidable.

31 citations


Book ChapterDOI
17 Sep 2006
TL;DR: It is proved that cloning can be applied lazily, and a reasonably large subclass for which the membership problem is decidable is identified.
Abstract: We propose an extension of node and hyperedge replacement grammars, called adaptive star grammars, and study their basic properties. A rule in an adaptive star grammar is actually a rule schema which, via the so-called cloning operation, yields a potentially infinite number of concrete rules. Adaptive star grammars are motivated by application areas such as modeling and refactoring object-oriented programs. We prove that cloning can be applied lazily. Unrestricted adaptive star grammars are shown to be capable of generating every type-0 string language. However, we identify a reasonably large subclass for which the membership problem is decidable.

31 citations


Journal ArticleDOI
TL;DR: The generalized LR parsing algorithm for context-free grammars is extended to the case of Boolean grammars, which are a generalization of the context-free grammars with logical connectives added to the formalism of rules.
Abstract: The generalized LR parsing algorithm for context-free grammars is extended for the case of Boolean grammars, which are a generalization of the context-free grammars with logical connectives added to the formalism of rules. In addition to the standard LR operations, Shift and Reduce, the new algorithm uses a third operation called Invalidate, which reverses a previously made reduction. This operation makes the mathematical justification of the algorithm significantly different from its prototype. On the other hand, the changes in the implementation are not very substantial, and the algorithm still works in time O(n⁴).
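Boolean grammars add conjunction (and negation) to context-free rules. The textbook example is the non-context-free language aⁿbⁿcⁿ, definable by a conjunctive rule S → AB & DC, where AB derives aⁱbⁱc* and DC derives a*bʲcʲ. A sketch that checks the two context-free conjuncts directly — plain string tests stand in for full parsers, and the function names are illustrative:

```python
import re

def in_conjunct_ab(w):
    """a^i b^i c^j — conjunct A B (A -> aAb | eps, B -> cB | eps)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return m is not None and len(m.group(1)) == len(m.group(2))

def in_conjunct_bc(w):
    """a^i b^j c^j — conjunct D C (D -> aD | eps, C -> bCc | eps)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return m is not None and len(m.group(2)) == len(m.group(3))

def in_abc(w):
    """Conjunctive rule S -> AB & DC: the intersection of the two
    context-free conjuncts is the non-context-free a^n b^n c^n."""
    return in_conjunct_ab(w) and in_conjunct_bc(w)

assert in_abc("aabbcc")
assert not in_abc("aabbc")
assert not in_abc("abcabc")
```

A GLR parser for such grammars must be able to retract a reduction when a sibling conjunct later fails — the role of the Invalidate operation described in the abstract.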

27 citations


Proceedings ArticleDOI
13 Nov 2006
TL;DR: An induction method is given to infer node replacement graph grammars from various structural representations and the correctness of an inferred grammar is verified by parsing graphs not present in the training set.
Abstract: Computer programs that can be expressed in two or more dimensions are typically called visual programs. The underlying theories of visual programming languages involve graph grammars. As graph grammars are usually constructed manually, construction can be a time-consuming process that demands technical knowledge. Therefore, a technique for automatically constructing graph grammars - at least in part - is desirable. An induction method is given to infer node replacement graph grammars. The method operates on labeled graphs of broad applicability. It is evaluated by its performance on inferring graph grammars from various structural representations. The correctness of an inferred grammar is verified by parsing graphs not present in the training set.

Book ChapterDOI
20 Sep 2006
TL;DR: This work presents the first polynomial-time algorithm for inferring Simple External Contextual Grammars, a class of mildly context-sensitive grammars, from positive examples.
Abstract: Natural languages contain regular, context-free, and context-sensitive syntactic constructions, yet none of these classes of formal languages can be identified in the limit from positive examples. Mildly context-sensitive languages are able to represent some context-sensitive constructions, those most common in natural languages, such as multiple agreement, crossed agreement, and duplication. These languages are attractive for natural language applications due to their expressiveness, and the fact that they are not fully context-sensitive should lead to computational advantages as well. We realize one such computational advantage by presenting the first polynomial-time algorithm for inferring Simple External Contextual Grammars, a class of mildly context-sensitive grammars, from positive examples.

Proceedings ArticleDOI
17 Jul 2006
TL;DR: This paper proposes a generic mathematical formalism for the combination of various structures: strings, trees, dags, graphs and products of them that is both elementary and powerful enough to strongly simulate many grammar formalisms.
Abstract: This paper proposes a generic mathematical formalism for the combination of various structures: strings, trees, dags, graphs and products of them. The polarization of the objects of the elementary structures controls the saturation of the final structure. This formalism is both elementary and powerful enough to strongly simulate many grammar formalisms, such as rewriting systems, dependency grammars, TAG, HPSG and LFG.

Journal ArticleDOI
TL;DR: Two results extending classical language properties into 2D are proved: non-recursive tile rewriting grammars (TRG) coincide with tiling systems (TS), and non-self-embedding TRG, suitably defined as corner grammars, generate TS languages.

Proceedings Article
01 Jul 2006
TL;DR: Two methods for the analysis of distorted (fuzzy) string patterns are presented: an error-correcting approach, in which a minimum distance measure is used for error-correcting parsing, and a stochastic approach.
Abstract: Two methods of the analysis of distorted (fuzzy) string patterns are presented. The methods are based on the use of GDPLL(k) grammars generating a large subclass of context sensitive languages. The first one utilizes error-correcting approach: a minimum distance measure is used for error-correcting parsing. The second one utilizes stochastic approach: the decision about the production to be applied in a derivation step is given according to the probability measure.
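Minimum-distance error-correcting parsing can be sketched for the regular case: a Dijkstra search over (input position, automaton state) pairs yields the least number of edits turning the input into a string of the language. This is an illustration of the distance measure only; the paper applies it to GDPLL(k) grammars, which this sketch does not cover.

```python
import heapq

def min_distance_to_dfa(word, delta, start, accept, alphabet):
    """Least edit distance (insert/delete/substitute) from `word` to any
    string accepted by a total DFA, via Dijkstra over (position, state)."""
    dist = {(0, start): 0}
    heap = [(0, 0, start)]
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist.get((i, q), float("inf")):
            continue                     # stale queue entry
        if i == len(word) and q in accept:
            return d                     # first settled goal is optimal
        moves = []
        if i < len(word):
            moves.append((i + 1, q, 1))  # delete word[i]
            for a in alphabet:           # match / substitute word[i] with a
                moves.append((i + 1, delta[(q, a)], 0 if a == word[i] else 1))
        for a in alphabet:               # insert a language symbol
            moves.append((i, delta[(q, a)], 1))
        for i2, q2, c in moves:
            if d + c < dist.get((i2, q2), float("inf")):
                dist[(i2, q2)] = d + c
                heapq.heappush(heap, (d + c, i2, q2))
    return float("inf")

# DFA for (ab)*; state 2 is a dead state
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 0,
         (2, "a"): 2, (2, "b"): 2}
assert min_distance_to_dfa("abab", delta, 0, {0}, "ab") == 0
assert min_distance_to_dfa("aab", delta, 0, {0}, "ab") == 1
```

An error-correcting parser then accepts the input while reporting the cheapest repair, rather than rejecting outright on the first distortion.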

Book ChapterDOI
26 Jun 2006
TL;DR: A new semantics for Boolean grammars [A. Okhotin, Information and Computation 194 (2004)] is proposed which applies to all such grammars, independently of their syntax, based on the well-founded approach to negation.
Abstract: Boolean grammars [A. Okhotin, Information and Computation 194 (2004) 19-48] are a promising extension of context-free grammars that supports conjunction and negation. In this paper we give a novel semantics for boolean grammars which applies to all such grammars, independently of their syntax. The key idea of our proposal comes from the area of negation in logic programming, and in particular from the so-called well-founded semantics which is widely accepted in this area to be the “correct” approach to negation. We show that for every boolean grammar there exists a distinguished (three-valued) language which is a model of the grammar and at the same time the least fixed point of an operator associated with the grammar. Every boolean grammar can be transformed into an equivalent (under the new semantics) grammar in normal form. Based on this normal form, we propose an O(n³) algorithm for parsing that applies to any such normalized boolean grammar. In summary, the main contribution of this paper is to provide a semantics which applies to all boolean grammars while at the same time retaining the complexity of parsing associated with this type of grammars.


Journal ArticleDOI
15 Jan 2006
TL;DR: It is shown that for a given lattice-valued regular grammar (LRG) there exists a lattice-valued finite automaton (LA) accepting the same language, and vice versa, and the equivalence between deterministic lattice-valued regular grammars and deterministic lattice-valued finite automata is also established.
Abstract: In this study, we introduce the concept of lattice-valued regular grammars. Such grammars have become a necessary tool for the analysis of fuzzy finite automata. The relationship between lattice-valued finite automata (LA) and lattice-valued regular grammars (LRG) is discussed and we get the following results: for a given LRG, there exists an LA such that they accept the same languages, and vice versa. We also show the equivalence between deterministic lattice-valued regular grammars and deterministic lattice-valued finite automata.
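The crisp (two-valued) core of the grammar-to-automaton direction is the textbook conversion of a right-linear grammar into an NFA. A sketch assuming single lowercase characters as terminals and uppercase names as nonterminals, with the lattice of truth values collapsed to {0, 1}:

```python
def grammar_to_nfa(rules):
    """Right-linear grammar (bodies 'aY', 'a', or '') -> NFA whose
    states are the nonterminals plus a fresh final state 'F'."""
    delta, accept = {}, {"F"}
    for head, bodies in rules.items():
        for body in bodies:
            if body == "":
                accept.add(head)                       # X -> eps
            elif len(body) == 1:
                delta.setdefault((head, body), set()).add("F")          # X -> a
            else:
                delta.setdefault((head, body[0]), set()).add(body[1:])  # X -> aY
    return delta, accept

def nfa_accepts(delta, accept, word, start="S"):
    states = {start}
    for ch in word:
        states = set().union(*(delta.get((q, ch), set()) for q in states))
    return bool(states & accept)

# (ab)*:  S -> aB | eps,  B -> bS
delta, accept = grammar_to_nfa({"S": ["aB", ""], "B": ["bS"]})
assert nfa_accepts(delta, accept, "abab")
assert nfa_accepts(delta, accept, "")
assert not nfa_accepts(delta, accept, "aba")
```

In the lattice-valued setting of the paper, each production carries a truth degree and the subset construction above is replaced by computing the join of degrees over all derivations.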

Journal ArticleDOI
TL;DR: The algorithm computes a canonical representation of a simple language, converting its arbitrary simple grammar into prime normal form (PNF); a simple grammar is in PNF if all its nonterminals define primes.

Journal ArticleDOI
TL;DR: It is shown that (i) the elementary tree representing the logical form of a wh-word provides a generalized quantifier, and (ii) the semantic composition of the pied-piped material and the wh-word is achieved through adjoining in the semantics of the former onto the latter, using Synchronous Tree Adjoining Grammar.
Abstract: In relative clauses, the wh relative pronoun can be embedded in a larger phrase, as in a boy [whose brother] Mary hit. In such examples, we say that the larger phrase has pied-piped along with the wh-word. In this paper, using a similar syntactic analysis for wh pied-piping as in Han (2002) and further developed in Kallmeyer and Scheffler (2004), I propose a compositional semantics for relative clauses based on Synchronous Tree Adjoining Grammar. It will be shown that (i) the elementary tree representing the logical form of a wh-word provides a generalized quantifier, and (ii) the semantic composition of the pied-piped material and the wh-word is achieved through adjoining in the semantics of the former onto the latter.


Proceedings ArticleDOI
17 Jul 2006
TL;DR: This work reflects on the experience with the Russian resource grammar trying to answer the questions: how well Russian fits into the common interface and where the line between language-independent and language-specific should be drawn.
Abstract: A resource grammar is a standard library for the GF grammar formalism. It raises the abstraction level of writing domain-specific grammars by taking care of the general grammatical rules of a language. GF resource grammars have been built in parallel for eleven languages and share a common interface, which simplifies multilingual applications. We reflect on our experience with the Russian resource grammar trying to answer the questions: how well Russian fits into the common interface and where the line between language-independent and language-specific should be drawn.

01 Jan 2006
TL;DR: It is argued that embracing the unboundedness assumption of Treebank Grammars also brings the justification of smoothing techniques within the scope of Estimation Theory.
Abstract: State-of-the-art syntactic disambiguators for natural language employ “Treebank Grammars”: probabilistic grammars directly projected from annotated corpora (treebanks). Treebank Grammars mark a paradigm shift from the manually constructed, a priori fixed linguistic grammars. In this paper we show that for describing these systems in the framework of Statistical Estimation Theory one must assume an unbounded number of parameters. The unboundedness assumption of Treebank Grammars expresses persistent uncertainty over the formal grammar of natural language. We argue that embracing the unboundedness assumption also brings the justification of smoothing techniques within the scope of Estimation Theory.

Book ChapterDOI
26 Jun 2006
TL;DR: In this paper, Bag Context (BC) is introduced as a device for regulated rewriting in tree grammars, and it is shown that the class of bc tree languages is the closure of the random context tree languages under linear top-down tree transductions.
Abstract: We introduce bag context, a device for regulated rewriting in tree grammars. Rather than being part of the developing tree, bag context (bc) evolves on its own during a derivation. We show that the class of bc tree languages is the closure of the class of random context tree languages under linear top-down tree transductions. Further, an interchange theorem for subtrees of dense trees in bc tree languages is established. This result implies that the class of bc tree languages is incomparable with the class of branching synchronization tree languages.

Journal ArticleDOI
01 Dec 2006
TL;DR: This theoretical paper studies how to translate finite state automata into categorial grammars and back, and shows that the generalization operators employed in both domains can be compared and that their result can always be represented by generalized automata, called "recursive automata".
Abstract: In this theoretical paper, we compare the "classical" learning techniques used to infer regular grammars from positive examples with the ones used to infer categorial grammars. To this aim, we first study how to translate finite state automata into categorial grammars and back. We then show that the generalization operators employed in both domains can be compared, and that their result can always be represented by generalized automata, called "recursive automata". The relation between these generalized automata and categorial grammars is studied in detail. Finally, new learnable subclasses of categorial grammars are defined, for which learning from strings is hardly more expensive than learning from structures.

01 Jul 2006
TL;DR: It is shown that multi-component TAG does not necessarily retain the well-nestedness constraint, while this constraint is inherent to Coupled Context-Free Grammar (Hotz and Pitsch, 1996).
Abstract: The ability to represent cross-serial dependencies is one of the central features of Tree Adjoining Grammar (TAG). The class of dependency structures representable by lexicalized TAG derivations can be captured by two graph-theoretic properties: a bound on the gap degree of the structures, and a constraint called well-nestedness. In this paper, we compare formalisms from two strands of extensions to TAG in the context of the question, how they behave with respect to these constraints. In particular, we show that multi-component TAG does not necessarily retain the well-nestedness constraint, while this constraint is inherent to Coupled Context-Free Grammar (Hotz and Pitsch, 1996).
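The two graph-theoretic properties can be checked directly on a dependency tree given as a head (parent) array. A brute-force sketch with illustrative helper names (a projection is the set of positions a subtree covers; a gap is a maximal missing interval in that set):

```python
def projections(heads):
    """heads[i] = parent index of word i, or None for the root."""
    n = len(heads)
    proj = [{i} for i in range(n)]
    for i in range(n):
        j = heads[i]
        while j is not None:          # add i to every ancestor's projection
            proj[j].add(i)
            j = heads[j]
    return proj

def gap_degree(heads):
    """Maximum number of gaps in any node's projection."""
    worst = 0
    for block in projections(heads):
        s = sorted(block)
        worst = max(worst, sum(1 for a, b in zip(s, s[1:]) if b > a + 1))
    return worst

def well_nested(heads):
    """No two disjoint subtrees interleave (i1 < j1 < i2 < j2)."""
    projs = projections(heads)
    for t1 in projs:
        for t2 in projs:
            if t1 & t2:
                continue              # nested or identical subtrees are fine
            if any(i1 < j1 < i2 < j2
                   for i1 in t1 for j1 in t2 for i2 in t1 for j2 in t2):
                return False
    return True

assert gap_degree([None, 0, 0, 2]) == 0 and well_nested([None, 0, 0, 2])
# two sibling subtrees covering {1,3} and {2,4} interleave: ill-nested
assert gap_degree([None, 0, 0, 1, 2]) == 1
assert not well_nested([None, 0, 0, 1, 2])
```

Projective trees have gap degree 0; lexicalized TAG derivations stay within gap degree 1 and well-nestedness, which is exactly the boundary the paper probes for multi-component extensions.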

Journal Article
TL;DR: A parsing methodology is presented to recognize a set of symbols represented by an adjacency grammar, a grammar that describes a symbol in terms of the primitives that form it and the relations among these primitives.
Abstract: Syntactic approaches on structural symbol recognition are characterized by defining symbols using a grammar. Following the grammar productions a parser is constructed to recognize symbols: given an input, the parser detects whether it belongs to the language generated by the grammar, recognizing the symbol, or not. In this paper, we describe a parsing methodology to recognize a set of symbols represented by an adjacency grammar. An adjacency grammar is a grammar that describes a symbol in terms of the primitives that form it and the relations among these primitives. These relations are called constraints, which are validated using a defined cost function. The cost function approximates the distortion degree associated to the constraint. When a symbol has been recognized the cost associated to the symbol is like a similarity value. The evaluation of the method has been realized from a qualitative point of view, asking some users to draw some sketches. From a quantitative point of view a benchmarking database of sketched symbols has been used.

Proceedings Article
01 Jan 2006
TL;DR: It is advocated that two-dimensional context-free grammars can be successfully used in the analysis of images containing objects that exhibit structural relations, and it is demonstrated, in a pilot study concerning recognition of off-line hand-written mathematical formulae, that they have the potential to deal with real-life noisy images.
Abstract: This contribution advocates that two-dimensional context-free grammars can be successfully used in the analysis of images containing objects that exhibit structural relations. The idea of structural construction is further developed. The approach can be made computationally efficient, practical, and able to cope with noise. We have developed and tested the method in a pilot study aiming at recognition of off-line mathematical formulae. The other novelty is not treating symbol segmentation in the image and structural analysis as two separate processes. This allows the system to recover from errors made in initial symbol segmentation.
1 Motivation and Taxonomy of Approaches. The paper serves two main purposes. First, it intends to point the reader's attention to the theory of two-dimensional (2D) languages. It focuses on context-free grammars having the potential to cope with structural relations in images. Second, the paper demonstrates, in a pilot study concerning recognition of off-line hand-written mathematical formulae, that 2D context-free grammars have the potential to deal with real-life noisy images. The enthusiasm for grammar-based methods in pattern recognition from the 1970s [6] has gradually faded due to an inability to cope with errors and noise. Even mathematical linguistics, in which the formal grammar approach was pioneered [4], has tended towards statistical methods since the 1990s. M.I. Schlesinger from the Ukrainian Academy of Sciences in Kiev has been developing 2D grammar-based pattern recognition theory in the context of engineering-drawings analysis since the late 1970s. His theory was explicated in English for the first time in the 10th chapter of the monograph [17]. The first author of this paper independently studied the theoretical limits of 2D grammars [14] and proved them to be rather restrictive.
The main motivation of the authors of the reported work is to discover to what extent 2D grammars are applicable to practical image analysis. This paper provides insight into ongoing work on a pilot study aiming at off-line recognition of mathematical formulae. We have chosen this application domain because there is a clear structure in formulae and work by others exists which can be used for comparison. The approaches to mathematical formulae recognition can be categorized along two directions: on-line recognition (the timing of the pen strokes is available) versus off-line recognition (only an image is available), and printed versus hand-written formulae. We deal with off-line recognition of hand-written formulae in this contribution; of course, the approach can also be applied to printed formulae. (Appeared in the Proceedings of the Prague Stringology Conference '06.)

Journal ArticleDOI
TL;DR: It is obtained that Eulerian, Hamiltonian, planar and bipartite graphs, as well as regular graphs of degree at least three, are pr-universal in the sense that any language which can be generated by programmed grammars can also be generated by programmed grammars whose underlying graph belongs to the given special class of graphs.
Abstract: Programmed grammars, one of the most important and well investigated classes of grammars with context-free rules and a mechanism controlling the application of the rules, can be described by graphs. We investigate whether or not the restriction to special classes of graphs restricts the generative power of programmed grammars with erasing rules and without appearance checking, too. We obtain that Eulerian, Hamiltonian, planar and bipartite graphs and regular graphs of degree at least three are pr-universal in the sense that any language which can be generated by programmed grammars (with erasing rules and without appearance checking) can be obtained by programmed grammars where the underlying graph belongs to the given special class of graphs, whereas complete graphs, regular graphs of degree 2 and backbone graphs lead to proper subfamilies of the family of programmed languages.
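The control mechanism can be illustrated with a tiny interpreter for a programmed grammar (without appearance checking) generating the non-context-free language aⁿbⁿcⁿ. This is a hypothetical sketch of the formalism, not a construction from the paper; each rule carries a label, a context-free core, and a go-to set of successor labels.

```python
RULES = {
    1: ("A", "aA", [2]), 2: ("B", "bB", [3]), 3: ("C", "cC", [1, 4]),
    4: ("A", "a", [5]),  5: ("B", "b", [6]),  6: ("C", "c", []),
}

def derive(choices, sentential="ABC", label=1):
    """Run one derivation of the programmed grammar above. `choices`
    supplies the successor label whenever the go-to set offers a choice."""
    choices = iter(choices)
    while True:
        lhs, rhs, gotos = RULES[label]
        if lhs not in sentential:
            return None                       # derivation blocks
        sentential = sentential.replace(lhs, rhs, 1)  # leftmost rewrite
        if not gotos:
            return sentential                 # empty go-to set: stop
        label = gotos[0] if len(gotos) == 1 else next(choices)

assert derive([], label=4) == "abc"           # skip the pumping cycle
assert derive([4]) == "aabbcc"                # one pass through rules 1-3
assert derive([1, 4]) == "aaabbbccc"          # two passes, then terminate
```

The go-to sets form exactly the directed graph the paper studies: restricting that graph to Eulerian, Hamiltonian, planar, or bipartite form is what the pr-universality results are about.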

Proceedings ArticleDOI
15 Jul 2006
TL;DR: It is argued that in it-clefts such as It was Ohno who won, the cleft pronoun and the cleft clause form a discontinuous syntactic constituent and a semantic unit as a definite description, presenting arguments from Percus and Hedberg.
Abstract: In this paper, we argue that in it-clefts as in It was Ohno who won, the cleft pronoun (it) and the cleft clause (who won) form a discontinuous syntactic constituent, and a semantic unit as a definite description, presenting arguments from Percus (1997) and Hedberg (2000). We propose a syntax of it-clefts using Tree-Local Multi-Component Tree Adjoining Grammar and a compositional semantics on the proposed syntax using Synchronous Tree Adjoining Grammar.