
Showing papers on "L-attributed grammar published in 2008"


Journal ArticleDOI
TL;DR: This work proposes a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence, and applies the model in a variety of learning simulations, showing that the learned grammars capture the distributional generalizations of these languages and accurately predict the findings of a phonotactic experiment.
Abstract: The study of phonotactics is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our grammars consist of constraints that are assigned numerical weights according to the principle of maximum entropy. The grammars assess possible words on the basis of the weighted sum of their constraint violations. The learning algorithm yields grammars that can capture both categorical and gradient phonotactic patterns. The algorithm is not provided with constraints in advance, but uses its own resources to form constraints and weight them. A baseline model, in which Universal Grammar is reduced to a feature set and an SPE-style constraint format, suffices to learn many phonotactic phenomena. In order for the model to learn nonlocal phenomena such as stress and vowel harmony, it must be augmented with autosegmental tiers and metrical grids. Our results thus offer novel, learning-theoretic support for such representations. We apply the model in a variety of learning simulations, showing that the learned grammars capture the distributional generalizations of these languages and accurately predict the findings of a phonotactic experiment.
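The scoring scheme the abstract describes — a weighted sum of constraint violations turned into a maximum-entropy score — can be sketched in a few lines of Python. The two constraints, their violation counters, and the weights below are invented for illustration; they are not taken from the paper.

```python
import math

# Hedged sketch of maxent phonotactic scoring: a candidate word gets a
# penalty equal to the weighted sum of its constraint violations, and an
# unnormalised probability exp(-penalty). Constraints/weights are invented.

VOWELS = set("aeiou")

constraints = {
    # name -> (violation-counting function, weight)
    "*#CC": (lambda w: int(len(w) > 1 and w[0] not in VOWELS and w[1] not in VOWELS), 3.0),
    "*VV":  (lambda w: sum(1 for a, b in zip(w, w[1:]) if a in VOWELS and b in VOWELS), 1.5),
}

def harmony(word):
    """Weighted sum of constraint violations (lower is better)."""
    return sum(weight * count(word) for count, weight in constraints.values())

def maxent_score(word):
    """Unnormalised maxent score of a candidate word."""
    return math.exp(-harmony(word))

# "blik" starts with a consonant cluster, so it violates *#CC and
# scores below the cluster-free "balik".
assert maxent_score("blik") < maxent_score("balik")
```

Normalising these scores over a candidate set yields the maxent probability distribution; the paper's learner additionally discovers the constraints and fits the weights from data.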

521 citations


Journal ArticleDOI
TL;DR: Silver is described, an extensible attribute grammar specification language, and it is shown how it can be extended with general-purpose features such as pattern matching and domain-specific features such as collection attributes and constructs for supporting data-flow analysis of imperative programs.

94 citations


Proceedings Article
Mark Johnson
01 Jun 2008
TL;DR: It is shown that incorporating both unsupervised syllabification and collocation-finding into the adaptor grammar significantly improves unsupervised word-segmentation accuracy over that achieved by adaptor grammars that model only one of these linguistic phenomena.
Abstract: Adaptor grammars (Johnson et al., 2007b) are a non-parametric Bayesian extension of Probabilistic Context-Free Grammars (PCFGs) which in effect learn the probabilities of entire subtrees. In practice, this means that an adaptor grammar learns the structures useful for generating the training data as well as their probabilities. We present several different adaptor grammars that learn to segment phonemic input into words by modeling different linguistic properties of the input. One of the advantages of a grammar-based framework is that it is easy to combine grammars, and we use this ability to compare models that capture different kinds of linguistic structure. We show that incorporating both unsupervised syllabification and collocation-finding into the adaptor grammar significantly improves unsupervised word-segmentation accuracy over that achieved by adaptor grammars that model only one of these linguistic phenomena.
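As background for the abstract above, plain PCFG generation — the base model that adaptor grammars extend by caching and reusing whole generated subtrees — can be sketched as follows. The toy word-segmentation grammar is invented for illustration and is much simpler than the grammars in the paper.

```python
import random

# A plain PCFG sampler: expand a nonterminal top-down, choosing each
# rule by its probability. Adaptor grammars add a cache of previously
# generated subtrees on top of exactly this process. Toy grammar only.

rules = {
    "Words":    [(["Word"], 0.5), (["Word", "Words"], 0.5)],
    "Word":     [(["Phonemes"], 1.0)],
    "Phonemes": [(["Phoneme"], 0.5), (["Phoneme", "Phonemes"], 0.5)],
    "Phoneme":  [([p], 0.2) for p in "abcde"],
}

def generate(symbol, rng):
    """Expand `symbol` top-down, sampling each rule by its probability."""
    if symbol not in rules:                 # terminal symbol
        return symbol
    expansions = [rhs for rhs, _ in rules[symbol]]
    weights = [p for _, p in rules[symbol]]
    rhs = rng.choices(expansions, weights=weights)[0]
    return "".join(generate(s, rng) for s in rhs)

rng = random.Random(0)
word_string = generate("Words", rng)        # a string over {a,...,e}
assert word_string and all(c in "abcde" for c in word_string)
```

Because an adaptor grammar can emit a cached subtree as a unit, frequently generated substrings (candidate "words") come to behave like single rules with their own learned probabilities.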

77 citations


01 Jan 2008
TL;DR: A measure for the degree of a treebank’s mild context-sensitivity is presented and compared to similar measures used in non-projective dependency parsing and to discontinuous phrase structure grammar (DPSG).
Abstract: Some treebanks, such as German TIGER/NeGra, represent discontinuous elements directly, i.e. trees contain crossing edges, but the context-free grammars that are extracted from them fail to make any use of this information. In this paper, we present a method for extracting mildly context-sensitive grammars, i.e. simple range concatenation grammars (RCGs), from such treebanks. A measure for the degree of a treebank’s mild context-sensitivity is presented and compared to similar measures used in non-projective dependency parsing. Our work is also compared to discontinuous phrase structure grammar (DPSG).

43 citations


Journal ArticleDOI
TL;DR: This work maps several notions of dynamicity into the same formal framework in order to distill the similarities and differences among them, and explains different styles of architectural dynamism in terms of graph grammars.

41 citations


Proceedings ArticleDOI
25 Oct 2008
TL;DR: This work presents a discriminative, latent variable approach to syntactic parsing in which rules exist at multiple scales of refinement, and produces the best published parsing accuracies with the smallest reported grammars.
Abstract: We present a discriminative, latent variable approach to syntactic parsing in which rules exist at multiple scales of refinement. The model is formally a latent variable CRF grammar over trees, learned by iteratively splitting grammar productions (not categories). Different regions of the grammar are refined to different degrees, yielding grammars which are three orders of magnitude smaller than the single-scale baseline and 20 times smaller than the split-and-merge grammars of Petrov et al. (2006). In addition, our discriminative approach integrally admits features beyond local tree configurations. We present a multiscale training method along with an efficient CKY-style dynamic program. On a variety of domains and languages, this method produces the best published parsing accuracies with the smallest reported grammars.

41 citations


Proceedings ArticleDOI
20 Jun 2008
TL;DR: In this work, a minimal initial grammar is hierarchically refined using an adaptive split-and-merge EM procedure, giving compact, accurate grammars that encode many linguistically interpretable patterns and give the best published parsing accuracies on three German treebanks.
Abstract: We describe experiments on learning latent variable grammars for various German treebanks, using a language-agnostic statistical approach. In our method, a minimal initial grammar is hierarchically refined using an adaptive split-and-merge EM procedure, giving compact, accurate grammars. The learning procedure directly maximizes the likelihood of the training treebank, without the use of any language-specific or linguistically constrained features. Nonetheless, the resulting grammars encode many linguistically interpretable patterns and give the best published parsing accuracies on three German treebanks.

34 citations


31 Dec 2008
TL;DR: In this paper, the authors propose attribute decorators that describe an abstract evaluation mechanism for attribute grammars, making it possible to provide such extensions as part of a library of decorators.
Abstract: Preprint of paper published in: Compiler Construction, Lecture Notes in Computer Science 5501, 2009; doi:10.1007/978-3-642-00722-4_11 Attribute grammars are a powerful specification formalism for tree-based computation, particularly for software language processing. Various extensions have been proposed to abstract over common patterns in attribute grammar specifications. These include various forms of copy rules to support non-local dependencies, collection attributes, and expressing dependencies that are evaluated to a fixed point. Rather than implementing extensions natively in an attribute evaluator, we propose attribute decorators that describe an abstract evaluation mechanism for attributes, making it possible to provide such extensions as part of a library of decorators. Inspired by strategic programming, decorators are specified using generic traversal operators. To demonstrate their effectiveness, we describe how to employ decorators in name, type, and flow analysis.
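The core idea of attribute decorators — factoring the evaluation mechanism out of the attribute equations themselves — can be illustrated with a minimal Python analogy. The `Node` class, the `memoised` decorator, and the `value` attribute are all invented here; the paper specifies its decorators with generic traversal operators in an attribute grammar formalism, not as Python functions.

```python
# Python analogy for attribute decorators: the evaluation strategy
# (here, demand-driven memoisation) wraps the attribute equation, so
# the same mechanism can be reused by any attribute. Illustrative only.

class Node:
    def __init__(self, label, *children):
        self.label = label
        self.children = children
        self.attrs = {}                    # cache of computed attributes

def memoised(attr_fn):
    """Decorator: compute an attribute on demand, then cache it."""
    def wrapper(node):
        if attr_fn.__name__ not in node.attrs:
            node.attrs[attr_fn.__name__] = attr_fn(node)
        return node.attrs[attr_fn.__name__]
    return wrapper

@memoised
def value(node):
    # Synthesised attribute: the value of an arithmetic expression tree.
    if node.label == "+":
        return sum(value(c) for c in node.children)
    if node.label == "*":
        result = 1
        for c in node.children:
            result *= value(c)
        return result
    return int(node.label)                 # leaf: a numeric literal

tree = Node("+", Node("2"), Node("*", Node("3"), Node("4")))
assert value(tree) == 14                   # 2 + 3 * 4
```

Swapping `memoised` for a different decorator (say, one that iterates to a fixed point) would change how `value` is evaluated without touching its equations — the library-of-decorators idea the abstract describes.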

33 citations


Book ChapterDOI
01 Jan 2008
TL;DR: Generalized Categorial Dependency Grammars (gCDG) studied in this paper are genuine categorial grammars expressing projective and discontinuous dependencies, stronger than CF-grammars and non-equivalent to mildly context-sensitive grammars.
Abstract: Generalized Categorial Dependency Grammars (gCDG) studied in this paper are genuine categorial grammars expressing projective and discontinuous dependencies, stronger than CF-grammars and non-equivalent to mild context-sensitive grammars. We show that gCDG are parsed in polynomial time and enjoy good mathematical properties.

33 citations


Proceedings ArticleDOI
12 May 2008
TL;DR: A one-to-one correspondence between triple graph grammars and suitable plain graph grammars is shown, so results and benefits of the triple case can be transferred to the plain case; main results show the relationship between both graph transformation approaches, the syntactical correctness of model transformations based on triple graph grammars, and a sound and complete condition for functional behaviour.
Abstract: Triple graph grammars have been applied and implemented as a formal basis for model transformations in a variety of application areas. They convince by special abilities in automatic derivation of forward, backward and several other transformations out of just one specified set of rules for the integrated model defined by a triple of graphs. While many case studies and all implementations which state that they are using triple graph grammars do not use triples of graphs, this paper presents the justification for many of them. It shows a one-to-one correspondence between triple graph grammars and suitable plain graph grammars, thus results and benefits of the triple case can be transferred to the plain case. Main results show the relationship between both graph transformation approaches, the syntactical correctness of model transformations based on triple graph grammars, and a sound and complete condition for functional behaviour. Theoretical results are elaborated on an intuitive case study for a model transformation from class diagrams to database models.

33 citations


Book ChapterDOI
15 Oct 2008
TL;DR: Relational growth grammars are a variant of graph grammars with a principal application in plant modelling, where they extend the well-established but limited formalism of L-systems.
Abstract: We present the formalism of relational growth grammars. They are a variant of graph grammars with a principal application in plant modelling, where they extend the well-established, but limited formalism of L-systems. The main property is the application of rules in parallel, motivated by the fact that life is fundamentally parallel. A further speciality is the dynamic creation of right-hand sides on rule application. Relational growth grammars have been successfully used not only for plant modelling, but also to model general 3D structures or systems of Artificial Life. We illustrate these applications with several examples, all implemented using our programming language XL, which extends Java and provides an implementation of relational growth grammars.

Journal ArticleDOI
TL;DR: The generative power of several subclasses of variable-linear macro grammars and that of multiple context-free grammars are compared in detail.
Abstract: Several grammars whose generative power lies between context-free grammar and context-sensitive grammar have been proposed. Among them are macro grammar and tree adjoining grammar. Multiple context-free grammar is also a natural extension of context-free grammars, and is known to be stronger in its generative power than tree adjoining grammar and yet to be recognizable in polynomial time. In this paper, the generative power of several subclasses of variable-linear macro grammars and that of multiple context-free grammars are compared in detail.

Proceedings Article
01 May 2008
TL;DR: A framework for describing tree-based grammars, and an actual fragment of a core multicomponent tree-adjoining grammar with tree tuples (TT-MCTAG) for German developed using this framework.
Abstract: Developing linguistic resources, in particular grammars, is known to be a complex task in itself, because of (amongst others) redundancy and consistency issues. Furthermore, some languages can reveal themselves hard to describe because of specific characteristics, e.g. the free word order in German. In this context, we present (i) a framework for describing tree-based grammars, and (ii) an actual fragment of a core multicomponent tree-adjoining grammar with tree tuples (TT-MCTAG) for German developed using this framework. This framework combines a metagrammar compiler and a parser based on range concatenation grammar (RCG) to check the consistency and the correctness of the grammar, respectively. The German grammar being developed within this framework already deals with a wide range of scrambling and extraction phenomena.

Proceedings ArticleDOI
18 Aug 2008
TL;DR: This paper presents a general platform, namely synchronous tree sequence substitution grammar (STSSG), for the grammar comparison study in Translational Equivalence Modeling (TEM) and Statistical Machine Translation (SMT).
Abstract: This paper presents a general platform, namely synchronous tree sequence substitution grammar (STSSG), for the grammar comparison study in Translational Equivalence Modeling (TEM) and Statistical Machine Translation (SMT). Under the STSSG platform, we compare the expressive abilities of various grammars through synchronous parsing and a real translation platform on a variety of Chinese-English bilingual corpora. Experimental results show that the STSSG is able to better explain the data in parallel corpora than other grammars. Our study further finds that the complexity of structure divergence is much higher than suggested in literature, which imposes a big challenge to syntactic transformation-based SMT.

Proceedings ArticleDOI
18 Aug 2008
TL;DR: This paper compared an HPSG parser with several CFG parsers in an experiment and found that meaningful differences among the parsers' performance can still be observed by such a shallow representation.
Abstract: This paper presents a methodology for the comparative performance analysis of parsers developed for different grammar frameworks. For such a comparison, we need a common representation format for the parsing results, since the representation of the parsing results depends on the grammar framework; hence they are not directly comparable to each other. We first convert the parsing result to a shallow CFG analysis by using an automatic tree converter based on synchronous grammars. The use of such a shallow representation as a common format has the advantage of reduced noise introduced by the conversion, in comparison with the noise produced by conversion to deeper representations. We compared an HPSG parser with several CFG parsers in our experiment and found that meaningful differences among the parsers' performance can still be observed by such a shallow representation.

Proceedings ArticleDOI
24 Aug 2008
TL;DR: An open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) is presented which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms.
Abstract: In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.

Book ChapterDOI
15 Sep 2008
TL;DR: This work is an application of incremental learning of definite clause grammars (DCGs) and syntax directed translation schema (SDTS); Synapse synthesized a set of SDTS rules for translating extended arithmetic expressions with function calls and assignment operators into object codes from positive and negative samples of the translation.
Abstract: This paper discusses machine learning of grammars and compilers of programming languages from samples of translation from source programs into object codes. This work is an application of incremental learning of definite clause grammars (DCGs) and syntax directed translation schema (SDTS), which is implemented in the Synapse system. The main experimental result is that Synapse synthesized a set of SDTS rules for translating extended arithmetic expressions with function calls and assignment operators into object codes from positive and negative samples of the translation. The object language is a simple intermediate language based on inverse Polish notation. These rules contain an unambiguous context free grammar for the extended arithmetic expressions, which specifies the precedence and associativity of the operators. This approach can be used for designing and implementing a new programming language by giving the syntax and semantics in the form of the samples of the translation.
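The kind of translation the system learns — infix arithmetic into inverse Polish (postfix) notation — can also be written by hand as a small syntax-directed translator, with the "emit operator after its operands" actions attached to a standard unambiguous expression grammar. The grammar and code below are a conventional hand-written sketch, not the rules Synapse induced.

```python
import re

# Hand-written syntax-directed translation from infix arithmetic to
# inverse Polish notation. Each parse function returns the postfix
# token list for the phrase it recognises (the "semantic action").

def to_postfix(src):
    tokens = re.findall(r"\d+|[a-z]+|[+\-*/()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                       # expr -> term (('+'|'-') term)*
        out = term()
        while peek() in ("+", "-"):
            op = eat()
            out += term() + [op]      # emit operator after its operands
        return out

    def term():                       # term -> factor (('*'|'/') factor)*
        out = factor()
        while peek() in ("*", "/"):
            op = eat()
            out += factor() + [op]
        return out

    def factor():                     # factor -> '(' expr ')' | atom
        if peek() == "(":
            eat()
            out = expr()
            eat()                     # consume ')'
            return out
        return [eat()]

    return " ".join(expr())

assert to_postfix("a + b * 2") == "a b 2 * +"
```

The grammar's layering of `expr` over `term` over `factor` is what encodes operator precedence and left associativity — the same properties the paper's learned grammar has to capture from translation samples.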

Book ChapterDOI
25 Aug 2008
TL;DR: A simple type of tiling, named regional, is focused on, and the corresponding regional tile grammars are defined, which include both Siromoney's (or Matz's) Kolam grammars and their generalization by Průša.
Abstract: Several classical models of picture grammars based on array rewriting rules can be unified and extended by a tiling-based approach. The right part of a rewriting rule is formalized by a finite set of permitted tiles. We focus on a simple type of tiling, named regional, and define the corresponding regional tile grammars. They include both Siromoney's (or Matz's) Kolam grammars, and their generalization by Průša. Regionally defined pictures can be recognized with polynomial time complexity by an algorithm extending the CKY one for strings. Regional tile grammars and languages are strictly included in the tile grammars and languages, and are incomparable with Giammarresi–Restivo tiling systems (or Wang's tilings).
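For reference, the string CKY algorithm that the paper's picture recognizer extends looks like this. The Chomsky-normal-form grammar below (S -> A B, A -> a, B -> b) is a toy invented for illustration.

```python
from itertools import product

# Classical CKY recognition for strings: fill a table bottom-up where
# table[i][span] holds every nonterminal deriving word[i:i+span].

grammar = {                 # right-hand side -> set of left-hand sides
    ("A", "B"): {"S"},
    ("a",): {"A"},
    ("b",): {"B"},
}

def cky_recognise(word, start="S"):
    n = len(word)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):                 # length-1 spans
        table[i][1] = set(grammar.get((ch,), set()))
    for span in range(2, n + 1):                  # longer spans
        for i in range(n - span + 1):
            for split in range(1, span):
                for a, b in product(table[i][split],
                                    table[i + split][span - split]):
                    table[i][span] |= grammar.get((a, b), set())
    return start in table[0][n]

assert cky_recognise("ab")
assert not cky_recognise("ba")
```

The picture case generalizes the span index to a rectangular subpicture, which is where the polynomial bound of the paper comes from.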

Journal Article
TL;DR: In this article, the authors investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing, present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for C, Java, Python, SASL, and C++.
Abstract: Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages and this is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short due to the limitations of tokenization based on regular expressions when handling multiple lexical grammars. In such cases scannerless parsing provides a viable solution. It uses the power of context-free grammars to be able to deal with a wide variety of issues in parsing lexical syntax. However, it comes at the price of less efficiency. The structure of tokens is obtained using a more powerful but more time and memory intensive parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, increasing the burden on the parsing algorithm even further. In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, but is 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java and Python the average speedup is 16%.

Book ChapterDOI
25 Aug 2008
TL;DR: A web-based syntax editor for Grammatical Framework (GF) grammars which allows both direct abstract syntax tree manipulation and text input in any of the languages supported by the grammar.
Abstract: We present an approach to multilingual web content based on multilingual grammars and syntax editing for a controlled language. Content can be edited in any supported language and it is automatically kept within a controlled language fragment. We have implemented a web-based syntax editor for Grammatical Framework (GF) grammars which allows both direct abstract syntax tree manipulation and text input in any of the languages supported by the grammar. With this syntax editor and the GF JavaScript API, GF grammars can be used to build multilingual web applications. As a demonstration, we have implemented an example application in which users can add, edit and review restaurants in English, Spanish and Swedish.

Journal Article
TL;DR: A class of two-dimensional array grammars is considered that extends the contextual operations on strings to arrays in a natural way and generates languages of pictures of rectangular arrays.
Abstract: In this paper, we consider a class of two-dimensional array grammars, called parallel contextual array grammars, that extend the contextual operations on strings to arrays in a natural way and generate languages of pictures of rectangular arrays. Several classes of these array grammars and the resulting families of picture languages are considered. Necessary conditions for picture languages to be contained in these classes are obtained and the relations between these families are also established.

01 Jun 2008
TL;DR: Two recent extensions of the non-associative Lambek calculus are shown to generate the same class of languages as tree adjoining grammars, using (tree generating) hyperedge replacement grammars as an intermediate step.
Abstract: Two recent extensions of the non-associative Lambek calculus, the Lambek-Grishin calculus and the multimodal Lambek calculus, are shown to generate the same class of languages as tree adjoining grammars, using (tree generating) hyperedge replacement grammars as an intermediate step. As a consequence, both extensions are mildly context-sensitive formalisms and benefit from polynomial parsing algorithms.

Proceedings Article
01 Jan 2008
TL;DR: The possibility is explored of enriching MGs with another controversial mechanism, namely countercyclic operations, which allow structure building at any node in the tree instead of just at the root.
Abstract: Minimalist grammars (MGs), as introduced in Stabler (1997), have proven a useful instrument in the formal analysis of syntactic theories developed within the minimalist branch of the principles–and– parameters framework (cf. Chomsky 1995, 2000). In fact, as shown in Michaelis (2001), MGs belong to the class of mildly context–sensitive grammars. Interestingly, without there being a rise in (at least weak) generative power, (extensions and variants of) MGs accommodate a wide variety of (arguably) “odd” items from the syntactician’s toolbox, such as head movement (Stabler 1997, 2001), affix hopping (Stabler 2001), (strict) remnant movement (Stabler 1997, 1999), adjunction (Frey and Gartner 2002), and (to some extent) scrambling (Frey and Gartner 2002). Here, we would like to explore the possibility of enriching MGs with another controversial mechanism, namely, countercyclic operations. These operations allow structure building at any node in the tree instead of just at the root. We will first discuss countercyclic ad-

01 Jun 2008
TL;DR: A formalism, feature-based regular tree grammars, and a translation from feature based tree adjoining grammar into this new formalism that preserves the derivation structures of the original grammar, and accounts for feature unification.
Abstract: The derivation trees of a tree adjoining grammar provide a first insight into the sentence semantics, and are thus prime targets for generation systems. We define a formalism, feature-based regular tree grammars, and a translation from feature-based tree adjoining grammars into this new formalism. The translation preserves the derivation structures of the original grammar, and accounts for feature unification.

01 Jan 2008
TL;DR: Some results on the power of tree controlled grammars are presented where the regular languages are restricted to some known subclasses of the family of regular languages.
Abstract: Tree controlled grammars are context-free grammars where the associated language only contains those terminal words which have a derivation where the word of any level of the corresponding derivation tree belongs to a given regular language. We present some results on the power of such grammars where we restrict the regular languages to some known subclasses of the family of regular languages.

Proceedings ArticleDOI
30 Jun 2008
TL;DR: In this paper the environment of PAG (Prototyping with Attribute Grammars) is described and its educational uses are reported.
Abstract: PAG (Prototyping with Attribute Grammars) is an environment that promotes active learning in courses on language processing (e.g. compiler construction and computational linguistics). In PAG, learners can specify the syntax and the semantics of their languages with attribute grammars. Then, the environment generates prototypes of processors for the languages specified which learners can test with different inputs. For each valid input the prototypes produce one or more decorated syntax trees, which learners can navigate using the semantic equations in the original grammar. In this paper we describe the environment and we report its educational uses.

Book ChapterDOI
07 Jun 2008
TL;DR: A polynomial algorithm for deciding whether a given word belongs to a language generated by a given unidirectional Lambek grammar is presented.
Abstract: Lambek grammars provide a useful tool for studying formal and natural languages. The generative power of unidirectional Lambek grammars equals that of context-free grammars. However, no feasible algorithm was known for deciding membership in the corresponding formal languages. In this paper we present a polynomial algorithm for deciding whether a given word belongs to a language generated by a given unidirectional Lambek grammar.

Proceedings ArticleDOI
16 Jun 2008
TL;DR: This paper investigates transforms of split dependency grammars into unlexicalised context-free grammars annotated with hidden symbols, achieving an accuracy of 88% on the Penn Treebank data set, which represents a 50% reduction in error over previously published results on unlexicalised dependency parsing.
Abstract: This paper investigates transforms of split dependency grammars into unlexicalised context-free grammars annotated with hidden symbols. Our best unlexicalised grammar achieves an accuracy of 88% on the Penn Treebank data set, which represents a 50% reduction in error over previously published results on unlexicalised dependency parsing.

Journal ArticleDOI
TL;DR: A three level psycholinguistic model is presented to account for L1 and L2 variation constrained by social factors only at Level I, by linguistic factors at Level II and change over time at Level III.
Abstract: Abstract Using sociolinguistic methods, variationists have successfully modeled many of the numerous factors that constrain L2 speakers’ use of variable linguistic forms. However, variationists have been less successful in developing a psycholinguistic model to account for variation in the grammar. In this paper we first describe early studies of L2 variation. Then, using examples from L1 English and L2 and bilingual Spanish, we present a three level psycholinguistic model to account for L1 and L2 variation constrained by social factors only at Level I, by linguistic factors at Level II and change over time at Level III.

Proceedings ArticleDOI
10 Jun 2008
TL;DR: This short paper proposes a mechanism based upon prototype grammars that automatically pushes changes from prototypes to derived grammars even in the presence of semantic actions.
Abstract: Reusing syntax specifications without embedded arbitrary semantic actions is straightforward because the semantic analysis phases of new applications can feed off trees or other intermediate structures constructed by the pre-existing parser. The presence of arbitrary embedded semantic actions, however, makes reuse difficult with existing mechanisms such as grammar inheritance and modules. This short paper proposes a mechanism based upon prototype grammars that automatically pushes changes from prototypes to derived grammars even in the presence of semantic actions. The prototype mechanism alone would be unsuitable for creating a new grammar from multiple pre-existing grammars. When combined with grammar composition, however, the prototype mechanism would improve grammar reuse because imported pre-existing grammars could be altered to suit each new application.