
Showing papers on "Tree-adjoining grammar published in 2000"


Journal Article
TL;DR: An object-oriented extension to canonical attribute grammars is described, permitting attributes to be references to arbitrary nodes in the syntax tree, and attributes to be accessed via the reference attributes.
Abstract: An object-oriented extension to canonical attribute grammars is described, permitting attributes to be references to arbitrary nodes in the syntax tree, and attributes to be accessed via the reference attributes. Important practical problems such as name and type analysis for object-oriented languages can be expressed in a concise and modular manner in these grammars, and an optimal evaluation algorithm is available. An extensive example is given, capturing all the key constructs in object-oriented languages including block structure, classes, inheritance, qualified use, and assignment compatibility in the presence of subtyping. The formalism and algorithm have been implemented in APPLAB, an interactive language development tool.

192 citations
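A minimal sketch of the reference-attribute idea from the abstract above, in a toy name-analysis setting. The `Decl`/`Use` classes and the dictionary scope are invented for illustration and are not APPLAB's implementation: a use site's `decl` attribute evaluates to another node in the tree, and further attributes such as `type` are accessed through that reference.

```python
# Hypothetical illustration of reference attributes: an attribute whose
# value is a reference to another syntax-tree node, with further
# attributes accessed through that reference.

class Decl:
    def __init__(self, name, type_):
        self.name, self.type = name, type_

class Use:
    def __init__(self, name, scope):
        self.name, self.scope = name, scope

    @property
    def decl(self):
        # Reference attribute: evaluates to a node elsewhere in the tree.
        return self.scope[self.name]

    @property
    def type(self):
        # Type analysis expressed via the reference attribute.
        return self.decl.type

scope = {'x': Decl('x', 'int')}
u = Use('x', scope)
print(u.type)  # int
```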


Proceedings ArticleDOI
03 Oct 2000
TL;DR: This work describes the induction of a probabilistic LTAG model from the Penn Treebank and finds that this induction method is an improvement over the EM-based method of Hwa (1998), and that the induced model yields results comparable to lexicalized PCFG.
Abstract: We discuss the advantages of lexicalized tree-adjoining grammar as an alternative to lexicalized PCFG for statistical parsing, describing the induction of a probabilistic LTAG model from the Penn Treebank and evaluating its parsing performance. We find that this induction method is an improvement over the EM-based method of Hwa (1998), and that the induced model yields results comparable to lexicalized PCFG.

180 citations


Book ChapterDOI
01 Jan 2000
TL;DR: This chapter introduces weighted bilexical grammars, a formalism in which individual lexical items, such as verbs and their arguments, can have idiosyncratic selectional influences on each other.
Abstract: This chapter introduces weighted bilexical grammars, a formalism in which individual lexical items, such as verbs and their arguments, can have idiosyncratic selectional influences on each other. Such ‘bilexicalism’ has been a theme of much current work in parsing. The new formalism can be used to describe bilexical approaches to both dependency and phrase-structure grammars, and a slight modification yields link grammars. Its scoring approach is compatible with a wide variety of probability models.

155 citations
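The "idiosyncratic selectional influences" between lexical items can be pictured with a toy weight table; all words and weights below are invented for illustration, not taken from the chapter. Each head word carries its own scores over possible dependents, and a dependency tree is scored by summing over its arcs.

```python
# Toy bilexical weights: the head 'eat' prefers 'pizza' over 'idea'
# as a dependent (all values are illustrative assumptions).
bilexical_weight = {
    ('eat', 'we'): 1.0,
    ('eat', 'pizza'): 2.0,
    ('eat', 'idea'): -3.0,
}

def tree_score(dependencies):
    # Score a dependency tree as the sum of its head-dependent weights;
    # a probability model can plug in log-probabilities here instead.
    return sum(bilexical_weight.get(arc, 0.0) for arc in dependencies)

print(tree_score([('eat', 'we'), ('eat', 'pizza')]))  # 3.0
```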


Book ChapterDOI
Pat Langley1, Sean Stromsten1
31 May 2000
TL;DR: A rational reconstruction of Wolff's SNPR - the GRIDS system - is presented which incorporates a bias toward grammars that minimize description length, and the algorithm alternates between merging existing nonterminal symbols and creating new symbols, using a beam search to move from complex to simpler grammars.
Abstract: We examine the role of simplicity in directing the induction of context-free grammars from sample sentences. We present a rational reconstruction of Wolff's SNPR - the GRIDS system - which incorporates a bias toward grammars that minimize description length. The algorithm alternates between merging existing nonterminal symbols and creating new symbols, using a beam search to move from complex to simpler grammars. Experiments suggest that this approach can induce accurate grammars and that it scales reasonably to more difficult domains.

101 citations


Book ChapterDOI
11 Sep 2000
TL;DR: It is proved that the problem of parsing a given string or its most probable parse with stochastic regular grammars is NP-hard and does not allow for a polynomial time approximation scheme.
Abstract: Determinism plays an important role in grammatical inference. In practice, however, ambiguous grammars (and nondeterministic grammars in particular) are used more often than deterministic grammars. Computing the probability of parsing a given string, or its most probable parse, with stochastic regular grammars can be performed in linear time. However, the problem of finding the most probable string has not yet received a satisfactory answer. In this paper we prove that the problem is NP-hard and does not allow for a polynomial-time approximation scheme. The result extends to stochastic regular syntax-directed translation schemes.

86 citations
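The linear-time computation the abstract contrasts with can be sketched via the forward algorithm over a probabilistic finite automaton, which is equivalent in generative power to a stochastic regular grammar. The toy automaton below, encoding the grammar S -> a S (0.5) | b (0.5), is an invented example.

```python
# Forward algorithm: P(string) under a probabilistic finite automaton,
# in time linear in the string length (for a fixed automaton).

def string_probability(arcs, final, start, string):
    """arcs: {(state, symbol): [(next_state, prob), ...]};
    final: {state: stopping probability}."""
    forward = {start: 1.0}
    for sym in string:
        nxt = {}
        for state, p in forward.items():
            for dest, q in arcs.get((state, sym), []):
                nxt[dest] = nxt.get(dest, 0.0) + p * q
        forward = nxt
    return sum(p * final.get(s, 0.0) for s, p in forward.items())

# Toy stochastic regular grammar S -> a S (0.5) | b (0.5) as an automaton.
arcs = {(0, 'a'): [(0, 0.5)], (0, 'b'): [(1, 0.5)]}
final = {1: 1.0}
print(string_probability(arcs, final, 0, 'aab'))  # 0.5**3 = 0.125
```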


Patent
Mark E. Epstein1
25 Oct 2000
TL;DR: This paper applied a context free grammar to the text input to determine substrings and corresponding parse trees, and examined each possible substring using an inventory of queries corresponding to the CFG.
Abstract: A method and system for use in a natural language understanding system for including grammars within a statistical parser. The method involves a series of steps. The invention receives a text input. The invention applies a first context free grammar to the text input to determine substrings and corresponding parse trees, wherein the substrings and corresponding parse trees further correspond to the first context free grammar. Additionally, the invention can examine each possible substring using an inventory of queries corresponding to the CFG.

74 citations


Proceedings Article
23 Feb 2000
TL;DR: The authors extract different LTAGs from the Penn Treebank and show that certain strategies yield an improved extracted LTAG in terms of compactness, broad coverage, and supertagging accuracy.
Abstract: The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using lexicalized tree-adjoining grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe this is largely due to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad-coverage LTAG. Our work attempts to alleviate this difficulty. We extract different LTAGs from the Penn Treebank and show that certain strategies yield an improved extracted LTAG in terms of compactness, broad coverage, and supertagging accuracy. Furthermore, we perform a preliminary investigation into smoothing these grammars by means of an external linguistic resource, namely the tree families of an XTAG grammar, a hand-built grammar of English.

68 citations


Proceedings ArticleDOI
Rens Bod1
31 Jul 2000
TL;DR: It is shown that the common wisdom is wrong for stochastic grammars that use elementary trees instead of context-free rules, such as the Stochastic Tree-Substitution Grammars used by Data-Oriented Parsing models, and that a non-probabilistic metric based on the shortest derivation outperforms a probabilistic metric on the ATIS and OVIS corpora.
Abstract: Common wisdom has it that the bias of stochastic grammars in favor of shorter derivations of a sentence is harmful and should be redressed. We show that the common wisdom is wrong for stochastic grammars that use elementary trees instead of context-free rules, such as Stochastic Tree-Substitution Grammars used by Data-Oriented Parsing models. For such grammars a non-probabilistic metric based on the shortest derivation outperforms a probabilistic metric on the ATIS and OVIS corpora, while it obtains competitive results on the Wall Street Journal (WSJ) corpus. This paper also contains the first published experiments with DOP on the WSJ.

48 citations


Journal ArticleDOI
TL;DR: It is shown that the class of string languages generated by spine grammars coincides with that of tree adjoining grammars.
Abstract: In this paper we introduce a restricted model of context-free tree grammars called spine grammars, and study their formal properties including considerably simple normal forms. Recent research on natural languages has suggested that formalisms for natural languages need to generate a slightly larger class of languages than context-free grammars, and for that reason tree adjoining grammars have been widely studied relating them to natural languages. It is shown that the class of string languages generated by spine grammars coincides with that of tree adjoining grammars. We also introduce acceptors called linear pushdown tree automata, and show that linear pushdown tree automata accept exactly the class of tree languages generated by spine grammars. Linear pushdown tree automata are obtained from pushdown tree automata with a restriction on duplicability for the pushdown stacks.

46 citations


Book ChapterDOI
11 Sep 2000
TL;DR: A technique to infer finite-state transducers is proposed in this work, based on the formal relations between finite-state transducers and regular grammars.
Abstract: A technique to infer finite-state transducers is proposed in this work. This technique is based on the formal relations between finite-state transducers and regular grammars. The technique consists of: 1) building a corpus of training strings from the corpus of training pairs; 2) inferring a regular grammar and 3) transforming the grammar into a finite-state transducer.

41 citations
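The three steps can be sketched as below. Two simplifications are mine, not the paper's: the regular-grammar inference step is replaced by a trivial prefix-tree acceptor, and a one-to-one alignment between input and output symbols is assumed.

```python
# Step 1: encode each (input, output) training pair as one string over
# composite symbols such as 'a/x' (1:1 alignment assumed).
def encode_pairs(pairs):
    return [['%s/%s' % (a, b) for a, b in zip(x, y)] for x, y in pairs]

# Step 2: stand-in for regular-grammar inference - the prefix-tree
# automaton accepting exactly the training strings.
def prefix_tree_acceptor(strings):
    arcs, finals, fresh = {}, set(), [0]
    for s in strings:
        state = 0
        for sym in s:
            if (state, sym) not in arcs:
                fresh[0] += 1
                arcs[(state, sym)] = fresh[0]
            state = arcs[(state, sym)]
        finals.add(state)
    return arcs, finals

# Step 3: split each composite label back into an input/output arc,
# yielding a finite-state transducer.
def to_transducer(arcs):
    return {(q, sym.split('/')[0]): (dest, sym.split('/')[1])
            for (q, sym), dest in arcs.items()}

arcs, finals = prefix_tree_acceptor(encode_pairs([('ab', 'xy')]))
fst = to_transducer(arcs)  # {(0, 'a'): (1, 'x'), (1, 'b'): (2, 'y')}
```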


Book ChapterDOI
TL;DR: An efficient algorithm is proposed to solve one of the problems associated with the use of weighted and stochastic Context-Free Grammars: the problem of computing the N best parse trees of a given string.
Abstract: Context-Free Grammars are the object of increasing interest in the pattern recognition research community in an attempt to overcome the limited modeling capabilities of the simpler regular grammars, and have application in a variety of fields such as language modeling, speech recognition, optical character recognition, computational biology, etc. This paper proposes an efficient algorithm to solve one of the problems associated with the use of weighted and stochastic Context-Free Grammars: the problem of computing the N best parse trees of a given string. After the best parse tree has been computed using the CYK algorithm, a large number of alternative parse trees are obtained, in order by weight (or probability), in a small fraction of the time required by the CYK algorithm to find the best parse tree. This is confirmed by experimental results using grammars from two different domains: a chromosome grammar, and a grammar modeling natural language sentences from the Wall Street Journal corpus.
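For context, the CYK (Viterbi) pass that the N-best step starts from might look like the sketch below for a stochastic CFG in Chomsky normal form; the rule encoding and the toy grammar are assumptions of this sketch, not the paper's.

```python
from collections import defaultdict

def viterbi_cyk(tokens, lexical, binary, start='S'):
    """Best parse probability. lexical: {(A, word): prob};
    binary: {(A, B, C): prob} for rules A -> B C."""
    n = len(tokens)
    best = defaultdict(float)  # (i, j, A) -> best inside probability
    for i, w in enumerate(tokens):
        for (A, word), p in lexical.items():
            if word == w and p > best[(i, i + 1, A)]:
                best[(i, i + 1, A)] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    cand = p * best[(i, k, B)] * best[(k, j, C)]
                    if cand > best[(i, j, A)]:
                        best[(i, j, A)] = cand
    return best[(0, n, start)]

# Toy grammar (invented): S -> N V; 'fish' is lexically ambiguous.
lexical = {('N', 'people'): 0.5, ('N', 'fish'): 0.5, ('V', 'fish'): 0.5}
binary = {('S', 'N', 'V'): 1.0}
print(viterbi_cyk(['people', 'fish'], lexical, binary))  # 0.25
```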

Book ChapterDOI
11 Sep 2000
TL;DR: This paper describes a method of synthesizing context-free grammars from positive and negative sample strings, implemented in a grammatical inference system called Synapse, based on incremental learning from positive samples and a rule generation method called the “inductive CYK algorithm,” which generates the minimal production rules required for parsing positive samples.
Abstract: This paper describes a method of synthesizing context-free grammars from positive and negative sample strings, which is implemented in a grammatical inference system called Synapse. The method is based on incremental learning from positive samples and a rule generation method called the “inductive CYK algorithm,” which generates the minimal production rules required for parsing positive samples. Synapse can generate unambiguous grammars as well as ambiguous grammars. Experiments showed that Synapse can synthesize several simple context-free grammars in a fairly short time.

Proceedings Article
29 Apr 2000
TL;DR: This paper describes a method for estimating conditional probability distributions over the parses of "unification-based" grammars which can utilize auxiliary distributions that are estimated by other means, and applies this estimator to a Stochastic Lexical-Functional Grammar.
Abstract: This paper describes a method for estimating conditional probability distributions over the parses of "unification-based" grammars which can utilize auxiliary distributions that are estimated by other means. We show how this can be used to incorporate information about lexical selectional preferences gathered from other sources into Stochastic "Unification-based" Grammars (SUBGs). While we apply this estimator to a Stochastic Lexical-Functional Grammar, the method is general, and should be applicable to stochastic versions of HPSGs, categorial grammars and transformational grammars.

Journal ArticleDOI
TL;DR: It is shown that random context grammars are strictly weaker than the non-erasing random context grammars, and a shrinking lemma is proved for their languages.

Book ChapterDOI
01 Sep 2000
TL;DR: A formal approach for the specification of mobile code systems is introduced, based on graph grammars, a formal description technique that is suitable for describing highly parallel systems and intuitive even for non-theoreticians.
Abstract: In this paper we introduce a formal approach for the specification of mobile code systems. This approach is based on graph grammars, a formal description technique that is suitable for describing highly parallel systems and is intuitive even for non-theoreticians. We define a special class of graph grammars using the concepts of object-based systems and include location information explicitly. Aspects of modularity and execution in an open environment are discussed.

Journal ArticleDOI
01 May 2000, Grammars
TL;DR: A generalization of context-free grammars is presented which nonetheless still has cubic parse time complexity; the languages it defines belong to an extension of mildly context-sensitive languages in which the constant growth property is relaxed, and can thus potentially be used in natural language processing.
Abstract: Context-free grammars and cubic parse time are so related in people's minds that they often think that parsing any extension of context-free grammars must need some extra time. Of course, this is not necessarily true and this paper presents a generalization of context-free grammars which nonetheless still has a cubic parse time complexity. This extension, which defines a subclass of context-sensitive languages, has both a theoretical and a practical interest. The class of languages defined by these grammars is closed under both intersection and complement (in fact this class contains both the intersection and the complement of context-free languages). Moreover, these languages belong to an extension of mildly context-sensitive languages in which the constant growth property is relaxed and which can thus potentially be used in natural language processing.

Journal ArticleDOI
TL;DR: It is proved that the three-nonterminal scattered context grammars characterize the family of recursively enumerable languages.


01 May 2000
TL;DR: This work has used a Lexicalized Tree Adjoining Grammar to capture the syntax associated with each verb class and has added semantic predicates to each tree, which allow for a compositional interpretation.
Abstract: We present a class-based approach to building a verb lexicon that makes explicit the close relation between syntax and semantics for Levin classes. We have used a Lexicalized Tree Adjoining Grammar to capture the syntax associated with each verb class and have added semantic predicates to each tree, which allow for a compositional interpretation.

Book ChapterDOI
31 Oct 2000
TL;DR: In this article, the authors address the issue of how to associate frequency information with lexicalized grammar formalisms, using Lexicalized Tree Adjoining Grammar as a representative framework, and evaluate their adequacy from both a theoretical and empirical perspective using data from existing large treebanks.
Abstract: We address the issue of how to associate frequency information with lexicalized grammar formalisms, using Lexicalized Tree Adjoining Grammar as a representative framework. We consider systematically a number of alternative probabilistic frameworks, evaluating their adequacy from both a theoretical and empirical perspective using data from existing large treebanks. We also propose three orthogonal approaches for backing off probability estimates to cope with the large number of parameters involved.

01 Jan 2000
TL;DR: The work reported here is a first step towards the development of an implemented TAG grammar for Korean, which is continuously updated with the addition of new analyses and modification of old ones.
Abstract: This document describes an on-going project of developing a grammar of Korean, the Korean XTAG grammar, written in the TAG formalism and implemented for use with the XTAG system enriched with a Korean morphological analyzer. The Korean XTAG grammar described in this report is based on the TAG formalism (Joshi et al., 1975), which has been extended to include lexicalization (Schabes et al., 1988) and unification-based feature structures (Vijay-Shanker and Joshi, 1991). The document first describes the modifications that we have made to the XTAG system (The XTAG Group, 1998) to handle rich inflectional morphology in Korean. Then various syntactic phenomena that can currently be handled are described, including adverb modification, relative clauses, complex noun phrases, auxiliary verb constructions, gerunds and adjunct clauses. The work reported here is a first step towards the development of an implemented TAG grammar for Korean, which is continuously updated with the addition of new analyses and modification of old ones.

Journal ArticleDOI
TL;DR: An efficient, O(n), parsing algorithm for languages generated by dynamically programmed grammars, so-called DPLL(k) grammars, is presented; it can be used for the analysis of complex trend functions describing the behaviour of industrial equipment.

Proceedings ArticleDOI
31 Jul 2000
TL;DR: This paper produces a transformed grammar which simulates left-corner recognition of a user-specified set of the original productions, and top-down recognition of the others, and combined with two factorizations produces non-left-recursive grammars that are not much larger than the original.
Abstract: The left-corner transform removes left-recursion from (probabilistic) context-free grammars and unification grammars, permitting simple top-down parsing techniques to be used. Unfortunately the grammars produced by the standard left-corner transform are usually much larger than the original. The selective left-corner transform described in this paper produces a transformed grammar which simulates left-corner recognition of a user-specified set of the original productions, and top-down recognition of the others. Combined with two factorizations, it produces non-left-recursive grammars that are not much larger than the original.
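A sketch of the plain (non-selective) left-corner transform that the paper refines, under an assumed rule encoding: rules are pairs (lhs, rhs-tuple), and a transformed nonterminal "A with left corner X already recognized" is encoded as the pair (A, X). On the left-recursive toy grammar S -> S a | b, the output contains no left-recursive cycle.

```python
def left_corner_transform(rules, nonterminals, terminals):
    out = set()
    # A -> a (A, a): shift a terminal left corner.
    for a in terminals:
        for A in nonterminals:
            out.add((A, (a, (A, a))))
    # (A, X) -> beta (A, Y): for each original rule Y -> X beta,
    # project the recognized left corner X up to Y.
    for (Y, rhs) in rules:
        X, beta = rhs[0], rhs[1:]
        for A in nonterminals:
            out.add(((A, X), beta + ((A, Y),)))
    # (A, A) -> epsilon: the left corner has reached A itself.
    for A in nonterminals:
        out.add(((A, A), ()))
    return out

# Left-recursive toy grammar: S -> S 'a' | 'b'
rules = {('S', ('S', 'a')), ('S', ('b',))}
lc = left_corner_transform(rules, {'S'}, {'a', 'b'})
```

Note the blowup the abstract mentions: the transform emits an A -> a (A, a) rule for every terminal and nonterminal pair, useful or not, which is what the selective transform and the two factorizations address.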

Journal ArticleDOI
TL;DR: Any language accepted by a Turing machine may be written as a translation of a regular set performed by a generalised stream X-machine with underlying distributed grammars based on context-free rules, under = k derivation strategy.
Abstract: Stream X-machines are a general and powerful computational model. By coupling the control structure of a stream X-machine with a set of formal grammars a new machine called a generalised stream X-machine with underlying distributed grammars, acting as a translator, is obtained. By introducing this new mechanism a hierarchy of computational models is provided. If the grammars are of a particular class, say regular or context-free, then finite sets are translated into finite sets, when ≤ k, = k derivation strategies are used, and regular or context-free sets, respectively, are obtained for ≥ k, * and terminal derivation strategies. In both cases, regular or context-free grammars, the regular sets are translated into non-context-free languages. Moreover, any language accepted by a Turing machine may be written as a translation of a regular set performed by a generalised stream X-machine with underlying distributed grammars based on context-free rules, under the = k derivation strategy. On the other hand, the languages generated by some classes of cooperating distributed grammar systems may be obtained as images of regular sets through some X-machines with underlying distributed grammars. Other relations of the families of languages computed by generalised stream X-machines with the families of languages generated by cooperating distributed grammar systems are established. At the end, an example dealing with the specification of a scanner system illustrates the use of the introduced mechanism as a formal specification model.

01 Jan 2000
TL;DR: Investigation of whether the notion of locality inherent in Tree Adjoining Grammar (TAG) will allow for an efficient approach to automatic extraction of predicate-argument structure of Chinese Treebank parse trees.
Abstract: Working on Information Extraction and management of the TIDES (“Translingual Information Detection, Extraction and Summarization”) program at Penn. Helped develop Penn’s first contribution to the Automatic Content Extraction evaluation (pipelined statistical system). Current research is investigation of whether the notion of locality inherent in Tree Adjoining Grammar (TAG) will allow for an efficient approach to automatic extraction of predicate-argument structure. Developing software for tagging predicate argument structure of Chinese Treebank parse trees.

Proceedings ArticleDOI
31 Jul 2000
TL;DR: The authors describe a series of experiments which investigate the question empirically, by incrementally constructing a grammar and discovering what problems emerge when successively larger versions are compiled into finite state graph representations and used as language models for a medium-vocabulary recognition task.
Abstract: Systems now exist which are able to compile unification grammars into language models that can be included in a speech recognizer, but it is so far unclear whether non-trivial linguistically principled grammars can be used for this purpose. We describe a series of experiments which investigate the question empirically, by incrementally constructing a grammar and discovering what problems emerge when successively larger versions are compiled into finite state graph representations and used as language models for a medium-vocabulary recognition task.


Proceedings ArticleDOI
13 Sep 2000
TL;DR: The method, applicable to any unification grammar with a phrase-structure backbone, is shown to be effective in specializing a broad-coverage LFG for French.
Abstract: Broad-coverage grammars tend to be highly ambiguous. When such grammars are used in a restricted domain, it may be desirable to specialize them, in effect trading some coverage for a reduction in ambiguity. Grammar specialization is here given a novel formulation as an optimization problem, in which the search is guided by a global measure combining coverage, ambiguity and grammar size. The method, applicable to any unification grammar with a phrase-structure backbone, is shown to be effective in specializing a broad-coverage LFG for French.

Journal ArticleDOI
01 Jan 2000, Grammars
TL;DR: This work investigates various concepts of leftmost derivation in grammars controlled by bicoloured digraphs, paying specific attention to their descriptive capacity, in order to unify the presentation of known results, especially regarding programmed and matrix grammars, and to obtain new results concerning grammars with regular control and periodically time-variant grammars.
Abstract: In this paper, we investigate various concepts of leftmost derivation in grammars controlled by bicoloured digraphs, paying specific attention to their descriptive capacity. This approach allows us to unify the presentation of known results regarding especially programmed and matrix grammars, and to obtain new results concerning grammars with regular control, and periodically time-variant grammars. Moreover, we consider leftmost derivations in grammars with (regular) context conditions.

Proceedings Article
29 Apr 2000
TL;DR: Evidence is provided that left-to-right parsing cannot be realised within acceptable time bounds if the so-called correct-prefix property is to be ensured.
Abstract: We compare the asymptotic time complexity of left-to-right and bidirectional parsing techniques for bilexical context-free grammars, a grammar formalism that is an abstraction of language models used in several state-of-the-art real-world parsers. We provide evidence that left-to-right parsing cannot be realised within acceptable time bounds if the so-called correct-prefix property is to be ensured. Our evidence is based on complexity results for the representation of regular languages.