
Showing papers on "L-attributed grammar published in 2004"


Journal ArticleDOI
16 Jan 2004-Science
TL;DR: It is demonstrated that monkeys can spontaneously master grammars whose regularities are limited to neighboring units, but monkeys tested with the same methods, syllables, and sequence lengths were unable to master a grammar at the higher, "phrase structure grammar" level.
Abstract: The capacity to generate a limitless range of meaningful expressions from a finite set of elements differentiates human language from other animal communication systems. Rule systems capable of generating an infinite set of outputs ("grammars") vary in generative power. The weakest possess only local organizational principles, with regularities limited to neighboring units. We used a familiarization/discrimination paradigm to demonstrate that monkeys can spontaneously master such grammars. However, human language entails more sophisticated grammars, incorporating hierarchical structure. Monkeys tested with the same methods, syllables, and sequence lengths were unable to master a grammar at this higher, "phrase structure grammar" level.
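The contrast at issue is commonly illustrated with a finite-state grammar of the form (AB)^n versus a phrase-structure grammar of the form A^n B^n; a toy sketch, assuming that standard framing rather than the paper's exact stimuli:

```python
# Toy generators for the two grammar levels commonly contrasted in this
# line of work (assumed framing, not the paper's exact stimuli).

def fsg(n):
    # (AB)^n: regularities limited to neighboring units
    return "AB" * n

def psg(n):
    # A^n B^n: requires tracking hierarchical (counting) structure
    return "A" * n + "B" * n

assert fsg(3) == "ABABAB"
assert psg(3) == "AAABBB"
```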

562 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components, and are here proven equivalent in effective recognition power.
Abstract: For decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ambiguity is crucial to their original purpose of modelling natural languages, but this very power makes it unnecessarily difficult both to express and to parse machine-oriented languages using CFGs. Parsing Expression Grammars (PEGs) provide an alternative, recognition-based formal foundation for describing machine-oriented syntax, which solves the ambiguity problem by not introducing ambiguity in the first place. Where CFGs express nondeterministic choice between alternatives, PEGs instead use prioritized choice. PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components. A linear-time parser can be built for any PEG, avoiding both the complexity and fickleness of LR parsers and the inefficiency of generalized CFG parsing. While PEGs provide a rich set of operators for constructing grammars, they are reducible to two minimal recognition schemas developed around 1970, TS/TDPL and gTS/GTDPL, which are here proven equivalent in effective recognition power.
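The prioritized-choice idea can be sketched in a few lines; a minimal recognizer with hypothetical helper names, not the paper's formalism:

```python
# Minimal PEG-style recognizer illustrating prioritized choice.
# A parser takes (text, pos) and returns the new position, or None on failure.

def lit(s):
    def parse(text, pos):
        if text.startswith(s, pos):
            return pos + len(s)
        return None
    return parse

def choice(*alts):
    # Prioritized choice: try alternatives in order, commit to first success.
    def parse(text, pos):
        for alt in alts:
            result = alt(text, pos)
            if result is not None:
                return result
        return None
    return parse

# A <- "ab" / "a" : in a CFG the order of alternatives is irrelevant;
# in a PEG it determines the result, eliminating ambiguity.
a_rule = choice(lit("ab"), lit("a"))
assert a_rule("ab", 0) == 2   # "ab" wins because it is tried first
assert a_rule("ac", 0) == 1   # falls back to "a"
```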

467 citations


Proceedings ArticleDOI
23 Aug 2004
TL;DR: An efficient bit-vector-based CKY-style parser for context-free parsing is presented, which computes a compact parse forest representation of the complete set of possible analyses for large treebank grammars and long input sentences.
Abstract: An efficient bit-vector-based CKY-style parser for context-free parsing is presented. The parser computes a compact parse forest representation of the complete set of possible analyses for large treebank grammars and long input sentences. The parser uses bit-vector operations to parallelise the basic parsing operations. The parser is particularly useful when all analyses are needed rather than just the most probable one.
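As an illustration of the bit-vector idea (a sketch only, not the paper's exact parallelisation scheme), chart cells can be stored as integer bit masks over nonterminal indices, so that rule application reduces to bit tests:

```python
# CKY recognizer with chart cells stored as integer bit masks over
# nonterminal indices -- a sketch of the bit-vector idea only.

def cky_recognize(words, lexicon, binary_rules, nonterminals, start):
    idx = {nt: i for i, nt in enumerate(nonterminals)}
    n = len(words)
    chart = [[0] * (n + 1) for _ in range(n + 1)]
    # Fill in lexical cells.
    for i, w in enumerate(words):
        for nt in lexicon.get(w, ()):
            chart[i][i + 1] |= 1 << idx[nt]
    # Combine adjacent spans bottom-up.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = 0
            for k in range(i + 1, j):
                left, right = chart[i][k], chart[k][j]
                if left and right:
                    for a, b, c in binary_rules:  # rule A -> B C
                        if left >> idx[b] & 1 and right >> idx[c] & 1:
                            cell |= 1 << idx[a]
            chart[i][j] = cell
    return bool(chart[0][n] >> idx[start] & 1)

# Toy grammar: S -> NP VP, with lexical entries for "she" and "runs".
lex = {"she": ["NP"], "runs": ["VP"]}
rules = [("S", "NP", "VP")]
assert cky_recognize(["she", "runs"], lex, rules, ["S", "NP", "VP"], "S")
```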

137 citations


Proceedings Article
01 Jan 2004
TL;DR: Experimental results for a number of different corpora suggest that the SAVG framework is applicable for realistically sized grammars and corpora.
Abstract: Stochastic Attribute Value Grammars (SAVG) provide an attractive framework for syntactic analysis, because they allow the combination of linguistic sophistication with a principled treatment of ambiguity. The paper introduces a wide-coverage SAVG for Dutch, known as Alpino, and we show how this SAVG can be efficiently applied, using a beam search algorithm to recover parses from a shared parse forest. Experimental results for a number of different corpora suggest that the SAVG framework is applicable for realistically sized grammars and corpora.

98 citations


Journal ArticleDOI
TL;DR: Their equivalence implies the equivalence of several other formal systems, including a certain restricted class of Turing machines and a certain type of language equations, thus giving further evidence for the importance of the language family they all generate.
Abstract: This paper establishes computational equivalence of two seemingly unrelated concepts: linear conjunctive grammars and trellis automata. Trellis automata, also studied under the name of one-way real-time cellular automata, have been known since the early 1980s as a purely abstract model of parallel computers, while linear conjunctive grammars, introduced a few years ago, are linear context-free grammars extended with an explicit intersection operation. Their equivalence implies the equivalence of several other formal systems, including a certain restricted class of Turing machines and a certain type of language equations, thus giving further evidence for the importance of the language family they all generate.

71 citations


Journal ArticleDOI
TL;DR: The possibility of defining an Abstract Categorial Hierarchy is suggested by showing how to encode context-free string grammars, linear context-free tree grammars, and linear context-free rewriting systems as Abstract Categorial Grammars.
Abstract: We show how to encode context-free string grammars, linear context-free tree grammars, and linear context-free rewriting systems as Abstract Categorial Grammars. These three encodings share the same constructs, the only difference being the interpretation of the composition of the production rules. It is interpreted as a first-order operation in the case of context-free string grammars, as a second-order operation in the case of linear context-free tree grammars, and as a third-order operation in the case of linear context-free rewriting systems. This suggests the possibility of defining an Abstract Categorial Hierarchy.

66 citations


Book ChapterDOI
01 Jan 2004
TL;DR: In this paper, the authors present range concatenation grammars, a syntactic formalism which possesses many attractive features, among which they emphasize here generative capacity and closure properties.
Abstract: We present Range Concatenation Grammars, a syntactic formalism which possesses many attractive features, among which we emphasize here generative capacity and closure properties. For example, Range Concatenation Grammars have stronger generative capacity than Linear Context-Free Rewriting Systems, although this power is not to the detriment of efficiency, since the generated languages can always be parsed in polynomial time. Range Concatenation Languages are closed under both intersection and complementation, and these closure properties suggest novel ways to describe some linguistic phenomena. We also present a parsing algorithm which is the basis for our current prototype implementation.

65 citations


Book ChapterDOI
01 Sep 2004
TL;DR: The basic notions used in Property Grammars are described and an account of long-distance dependencies is proposed, illustrating the expressive power of the formalism.
Abstract: This paper presents the basis of Property Grammars, a fully constraint-based theory. In this approach, all kinds of linguistic information are represented by means of constraints. The constraint system thus constitutes the core of the theory: it is the grammar, but it also constitutes, after evaluation for a given input, its description. Property Grammars is therefore a non-generative theory in the sense that no structure has to be built; constraints alone are used both to represent linguistic information and to describe inputs. This paper describes the basic notions used in PG and proposes an account of long-distance dependencies, illustrating the expressive power of the formalism.

60 citations


01 May 2004
TL;DR: This work presents the first synthesis of tree transducer formalisms with synchronous tree-substitution and -adjoining grammars, using the framework of bimorphisms as the generalizing formalism in which all can be embedded.
Abstract: Tree transducer formalisms were developed in the formal language theory community as generalizations of finite-state transducers from strings to trees. Independently, synchronous tree-substitution and -adjoining grammars arose in the computational linguistics community as a means to augment strictly syntactic formalisms to provide for parallel semantics. We present the first synthesis of these two independently developed approaches to specifying tree relations, unifying their respective literatures for the first time, by using the framework of bimorphisms as the generalizing formalism in which all can be embedded. The central result is that synchronous tree-substitution grammars are equivalent to bimorphisms where the component homomorphisms are linear and complete.

59 citations


Journal ArticleDOI
TL;DR: An algorithm for the inference of context-free graph grammars from examples is presented, which builds on an earlier system for frequent substructure discovery and is biased toward grammars that minimize description length.
Abstract: We present an algorithm for the inference of context-free graph grammars from examples. The algorithm builds on an earlier system for frequent substructure discovery, and is biased toward grammars that minimize description length. Grammar features include recursion, variables and relationships. We present an illustrative example, demonstrate the algorithm's ability to learn in the presence of noise, and show real-world examples.

59 citations


Journal ArticleDOI
TL;DR: Four different kinds of grammars that can define crossing dependencies in human language are compared, and some results relevant to the viability of mildly context-sensitive analyses, along with some open questions, are reviewed.


Book ChapterDOI
28 Sep 2004
TL;DR: This work shows the definition of a modelling environment for UML sequence diagrams, together with event-driven grammars for the construction of the abstract syntax representation and consistency checking, in combination with (non-monotonic) triple graph grammars.
Abstract: In this work we introduce event-driven grammars, a kind of graph grammars that are especially suited for visual modelling environments generated by meta-modelling. Rules in these grammars may be triggered by user actions (such as creating, editing or connecting elements) and may in turn trigger other user-interface events. Their combination with (non-monotonic) triple graph grammars allows constructing and checking the consistency of the abstract syntax graph while the user is building the concrete syntax model. As an example of these concepts, we show the definition of a modelling environment for UML sequence diagrams, together with event-driven grammars for the construction of the abstract syntax representation and consistency checking.

Book ChapterDOI
Robert C. Moore1
01 Jan 2004
TL;DR: An improved form of left-corner chart parsing for large context-free grammars is developed, introducing improvements that result in significant speed-ups compared to previously known variants of left-corner parsing.
Abstract: We develop an improved form of left-corner chart parsing for large context-free grammars, introducing improvements that result in significant speed-ups compared to previously known variants of left-corner parsing. We also compare our method to several other major parsing approaches, and find that our improved left-corner parsing method outperforms each of these across a range of grammars. In addition, we describe a new technique for minimizing the extra information needed to efficiently recover parses from the data structures built in the course of parsing.
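The left-corner relation underlying such parsers can be computed with a simple fixpoint iteration; an illustrative sketch, not the paper's optimized method:

```python
# Computing the reflexive-transitive left-corner relation of a CFG --
# the table at the heart of left-corner parsing. Illustrative sketch.

def left_corners(grammar):
    # grammar: dict mapping nonterminal -> list of right-hand-side tuples.
    # lc[A] will hold every symbol reachable as a leftmost descendant of A.
    lc = {a: {a} for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                if rhs:
                    first = rhs[0]
                    # Terminals are their own (trivial) left-corner set.
                    new = lc.get(first, {first}) - lc[a]
                    if new:
                        lc[a] |= new
                        changed = True
    return lc

g = {"S": [("NP", "VP")], "NP": [("Det", "N"), ("she",)], "VP": [("V", "NP")]}
lc = left_corners(g)
assert "Det" in lc["S"] and "she" in lc["S"]
```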

01 Jan 2004
TL;DR: It is argued that hedge grammars are effectively identical to balanced grammars and that balanced languages are identical to regular hedge languages, modulo encoding, based on the close relationship between Dyck strings and hedges.
Abstract: The XML community generally takes trees and hedges as the model for XML document instances and element content. In contrast, Berstel and Boasson have discussed XML documents in the framework of extended context-free grammar, modeling XML documents as Dyck strings and schemas as balanced grammars. How can these two models be brought closer together? We examine the close relationship between Dyck strings and hedges, observing that trees and hedges are higher-level abstractions than are Dyck primes and Dyck strings. We then argue that hedge grammars are effectively identical to balanced grammars and that balanced languages are identical to regular hedge languages, modulo encoding. From the close relationship between Dyck strings and hedges, we obtain a two-phase architecture for the parsing of balanced languages. We propose caterpillar automata with an additional pushdown stack as a new computational model for the second phase; that is, for the validation of XML documents.
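The hedge/Dyck-string correspondence can be illustrated by serializing a hedge into nested bracket pairs (a toy XML-like encoding, not Berstel and Boasson's exact construction):

```python
# Encoding a hedge (a sequence of unranked trees) as a Dyck-like string:
# each node with label a becomes a matched bracket pair <a> ... </a>.
# Toy encoding for illustration only.

def hedge_to_string(hedge):
    # hedge: list of (label, children) pairs, children again a hedge.
    return "".join("<%s>%s</%s>" % (lab, hedge_to_string(kids), lab)
                   for lab, kids in hedge)

h = [("a", [("b", []), ("c", [])]), ("d", [])]
assert hedge_to_string(h) == "<a><b></b><c></c></a><d></d>"
```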

Journal ArticleDOI
TL;DR: Results suggest that space syntax is useful in determining the universe of solutions generated by the grammar and in evaluating the evolving designs in terms of spatial properties and, therefore, in guiding the generation of designs.
Abstract: This paper is concerned with how two different computational approaches to design – shape grammars and space syntax – can be combined into a single common framework for formulating, generating, and evaluating designs. The main goal is to explore how the formal principles applied in the design process interact with the spatial properties of the designed objects. Results suggest that space syntax is useful (1) in determining the universe of solutions generated by the grammar and (2) in evaluating the evolving designs in terms of spatial properties and, therefore, in guiding the generation of designs.

Journal ArticleDOI
TL;DR: It is proved that, given as input two context-free grammars, deciding non-emptiness of intersection of the two generated languages is PSPACE-complete if at least one grammar is non-recursive.
Abstract: We prove that, given as input two context-free grammars, deciding non-emptiness of intersection of the two generated languages is PSPACE-complete if at least one grammar is non-recursive. The problem remains PSPACE-complete when both grammars are non-recursive and deterministic. Also investigated are generalizations of the problem to several context-free grammars, of which a certain number are non-recursive.

01 Jan 2004
TL;DR: This paper attempts to give an overview of what syntactic methods exist in the literature, and how they have been used as tools for pattern modeling and recognition.
Abstract: We review various methods and applications that have used grammars for solving inference problems in computer vision and pattern recognition. Grammars have been useful because they are intuitively simple to understand, and have very elegant representations. Their ability to model semantic interpretations of patterns, both spatial and temporal, have made them extremely popular in the research community. In this paper, we attempt to give an overview of what syntactic methods exist in the literature, and how they have been used as tools for pattern modeling and recognition. We also describe several practical applications, which have used them with great success.

Book ChapterDOI
28 Sep 2004
TL;DR: This paper presents an Earley parser for string generating hypergraph grammars, leading to a parser for natural languages that is able to handle discontinuities, and describes how to model discontinuous constituents in natural languages.
Abstract: A string generating hypergraph grammar is a hyperedge replacement grammar whose language consists of string graphs, i.e. hypergraphs modeling strings. With the help of these grammars, string languages like a^n b^n c^n can be modeled that cannot be generated by context-free grammars for strings. They are well suited to model discontinuous constituents in natural languages, i.e. constituents that are interrupted by other constituents. For parsing context-free Chomsky grammars, the Earley parser is well known. In this paper, an Earley parser for string generating hypergraph grammars is presented, leading to a parser for natural languages that is able to handle discontinuities.
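For reference, the standard Earley recognizer for string CFGs that the hypergraph version generalizes can be sketched as follows (illustrative implementation, not the paper's hypergraph parser):

```python
# Standard Earley recognizer for string CFGs -- the baseline the paper
# extends to hypergraph grammars. Illustrative sketch only.

def earley_recognize(grammar, start, words):
    # grammar: dict mapping nonterminal -> list of right-hand-side tuples.
    # An item is (lhs, rhs, dot, origin).
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        # Run predictor and completer to a fixpoint on chart[i].
        added = True
        while added:
            added = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for prod in grammar[rhs[dot]]:
                        new = (rhs[dot], prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new); added = True
                elif dot == len(rhs):
                    # Complete: advance items waiting on this nonterminal.
                    for plhs, prhs, pdot, porigin in list(chart[origin]):
                        if pdot < len(prhs) and prhs[pdot] == lhs:
                            new = (plhs, prhs, pdot + 1, porigin)
                            if new not in chart[i]:
                                chart[i].add(new); added = True
        # Scan the next terminal.
        if i < n:
            for lhs, rhs, dot, origin in chart[i]:
                if dot < len(rhs) and rhs[dot] == words[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

g = {"S": [("a", "S", "b"), ("a", "b")]}   # a^n b^n, n >= 1
assert earley_recognize(g, "S", list("aabb"))
assert not earley_recognize(g, "S", list("aab"))
```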

Proceedings ArticleDOI
23 Aug 2004
TL;DR: This work presents a constraint-based syntax-semantics interface for the construction of RMRS (Robust Minimal Recursion Semantics) representations from shallow grammars, and defines modular semantics construction principles in a typed feature structure formalism that allow flexible adaptation to alternative grammars and different languages.
Abstract: We present a constraint-based syntax-semantics interface for the construction of RMRS (Robust Minimal Recursion Semantics) representations from shallow grammars. The architecture is designed to allow modular interfaces to existing shallow grammars of various depth - ranging from chunk grammars to context-free stochastic grammars. We define modular semantics construction principles in a typed feature structure formalism that allow flexible adaptation to alternative grammars and different languages.

Proceedings Article
25 Jul 2004
TL;DR: The long-term goal is to develop an approach to learning both syntax and semantics that bootstraps itself, using limited knowledge about syntax to infer additional knowledge about semantics, and limited knowledge about semantics to infer additional knowledge about syntax.
Abstract: Context-free grammars cannot be identified in the limit from positive examples (Gold 1967), yet natural language grammars are more powerful than context-free grammars and humans learn them with remarkable ease from positive examples (Marcus 1993). Identifiability results for formal languages ignore a potentially powerful source of information available to learners of natural languages, namely, meanings. This paper explores the learnability of syntax (i.e. context-free grammars) given positive examples and knowledge of lexical semantics, and the learnability of lexical semantics given knowledge of syntax. The long-term goal is to develop an approach to learning both syntax and semantics that bootstraps itself, using limited knowledge about syntax to infer additional knowledge about semantics, and limited knowledge about semantics to infer additional knowledge about syntax.

Book ChapterDOI
07 Jun 2004
TL;DR: The first achievement in the field of grammatical inferencing of GDPLL(k) grammars is presented: an algorithm for the automatic construction of a GDPLL(k) grammar from a so-called polynomial specification of the language.
Abstract: The recent results of the research into construction of syntactic pattern recognition-based expert systems are presented. The model of syntactic pattern recognition has been defined with the use of GDPLL(k) grammars and parsers, and the model has been successfully applied as an efficient tool for inference support in several expert systems. Nevertheless, one of the main problems of practical application of GDPLL(k) grammars consists in difficulties in defining the grammar from the sample of a pattern language. In the paper we present the first achievement in the field of grammatical inferencing of GDPLL(k) grammars: an algorithm of automatic construction of a GDPLL(k) grammar from a so-called polynomial specification of the language.

01 Jan 2004
TL;DR: The paper introduces libkp, a comparison and evaluation system for the syntactic processing of natural languages, and discusses the advantages of the modular design as well as the efficiency of the processing on the standard evaluation grammars.
Abstract: The paper introduces libkp, a comparison and evaluation system for the syntactic processing of natural languages. The analysis of sentences is based on a context-free grammar for the given language with contextual extensions (constraints). The tool is language-independent, even though it is optimized for very large and highly ambiguous grammars for Czech (thousands of rules). We discuss the advantages of our modular design as well as the efficiency of the processing on the standard evaluation grammars. We also expect this system to be used for comparing and evaluating different CFG-parsing algorithms.

01 Jan 2004
TL;DR: This dissertation develops and demonstrates a framework for carrying out rigorous comparisons of grammar formalisms in terms of their usefulness for applications, and adopts Miller's view of SGC as pertaining not directly to structural descriptions but their interpretations in particular domains, to pave the way for theoretical research to pursue results that are more directed towards applications.
Abstract: Grammars are gaining importance in statistical natural language processing and computational biology as a means of encoding theories and structuring algorithms. But one serious obstacle to applications of grammars is that formal language theory traditionally classifies grammars according to their weak generative capacity (WGC)—what sets of strings they generate—and tends to ignore strong generative capacity (SGC)—what sets of structural descriptions they generate—even though the latter is more relevant to applications. This dissertation develops and demonstrates, for the first time, a framework for carrying out rigorous comparisons of grammar formalisms in terms of their usefulness for applications. We do so by adopting Miller's view of SGC as pertaining not directly to structural descriptions but their interpretations in particular domains; and, following Joshi et al., by appropriately constraining the grammars and interpretations we consider. We then consider three areas of application. The first area is statistical parsing. We find that, in this domain, attempts to increase the SGC of a formalism can often be compiled back into the simpler formalism, gaining nothing. But this suggests a new view of current parsing models as compiled versions of grammars from richer formalisms. We discuss the implications of this view and its implementation in a probabilistic tree-adjoining grammar model, with experimental results on English and Chinese. For our other two applications, by contrast, we can readily increase the SGC of a formalism without increasing its computational complexity. For natural language translation, we discuss the formal, linguistic, and computational properties of a formalism that is more powerful than those currently proposed for statistical machine translation systems. Finally, we explore the application of formal grammars to modeling secondary/tertiary structures of biological sequences. 
We show how additional SGC can be used to extend models to take more complex structures into account, paying special attention to the technique of intersection, which has drawn comparatively little attention in computational linguistics. These results should pave the way for theoretical research to pursue results that are more directed towards applications, and for practical research to explore the use of advanced grammar formalisms more easily.

Journal ArticleDOI
TL;DR: This paper shows how some subclasses of contextual grammars can be translated into equivalent range concatenation grammars and can thus be parsed in polynomial time; on some other subclasses, however, this translation schema only succeeds if the range concatenation grammar formalism is extended.

Journal ArticleDOI
TL;DR: A generalization of ET0L systems is introduced: grammars with branching synchronization and nested tables that have the same string- and tree-generating power as n-fold compositions of top-down tree transducers.

Proceedings Article
01 May 2004
TL;DR: A corpus-based method for identifying and learning patterns describing events in a specific domain by examining the manner in which a small number of keywords in the domain are distributed throughout the corpus.
Abstract: We present a corpus-based method for identifying and learning patterns describing events in a specific domain by examining the manner in which: (a) a small number of keywords in the domain are distributed throughout the corpus; and (b) a local grammar that is idiosyncratic of a class of events in the domain governs the usage of the keywords. We tested our method against a corpus of 3.63 million words. The results show promise. More importantly, the method can be applied to arbitrary domains.

Journal ArticleDOI
TL;DR: Using systems of equations, a number of subclasses of grammars, with self-embeddedness terms, such as $X \alpha X$ and $\gamma X \gamma$, that can still have regular languages as solutions are highlighted.
Abstract: In general, it is undecidable if an arbitrary context-free grammar has a regular solution. Past work has focused on special cases, such as one-letter grammars, non self-embedded grammars and the finite-language grammars, for which regular counterparts have been proven to exist. However, little is known about grammars with the self-embedded property. Using systems of equations, we highlight a number of subclasses of grammars, with self-embeddedness terms, such as $X \alpha X$ and $\gamma X \gamma$, that can still have regular languages as solutions. Constructive proofs that allow these subclasses of context-free grammars to be transformed to regular expressions are provided. We also point out a subclass of context-free grammars that is inherently non-regular. Our latest results can help demarcate more precisely the known boundaries between the regular and non-regular languages, within the context-free domain.
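The equational method can be seen in its simplest case: the equation X = aX + b (that is, X → a X | b) has the regular solution a*b by Arden's lemma. A brute-force check of that correspondence (illustration only, far simpler than the paper's self-embedded subclasses):

```python
# Arden's lemma in miniature: the grammar X -> "a" X | "b", read as the
# language equation X = aX + b, has the regular solution a*b.
# We verify the equivalence by exhaustive enumeration of short strings.
import itertools
import re

def in_grammar(s):
    # Direct membership test for X -> "a" X | "b".
    return s == "b" or (s.startswith("a") and in_grammar(s[1:]))

pattern = re.compile(r"a*b\Z")   # the claimed regular solution

for n in range(6):
    for chars in itertools.product("ab", repeat=n):
        s = "".join(chars)
        assert bool(pattern.match(s)) == in_grammar(s)
```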

Book ChapterDOI
01 Jan 2004
TL;DR: This chapter argues for a particular architecture of Optimality Theory (OT) syntax that is bidirectional: the usual production-oriented optimization is accompanied by a recoverability check that makes crucial use of semantic and pragmatic factors.
Abstract: This chapter argues for a particular architecture of Optimality Theory (OT) syntax. This architecture has three core features: (i) it is bidirectional, the usual production-oriented optimization (called ‘first optimization’ here) is accompanied by a second step that checks the recoverability of an underlying form; (ii) this underlying form already contains a full-fledged syntactic specification; (iii) the procedure checking for recoverability especially makes crucial use of semantic and pragmatic factors.

Book ChapterDOI
01 Jan 2004
TL;DR: A simple and intuitive approximation method for turning unification-based grammars into context-free grammars is presented, and a novel disambiguation method is introduced which is based on probabilistic context-free approximations.
Abstract: We present a simple and intuitive approximation method for turning unification-based grammars into context-free grammars. We apply our method to several grammars and report on the quality of the approximation. We also present several methods that speed up the approximation process and that might be interesting to other areas of unification-based processing. Finally, we introduce a novel disambiguation method for unification grammars which is based on probabilistic context-free approximations.