
Showing papers on "Phrase structure grammar published in 2002"


Book ChapterDOI
TL;DR: It is shown that there exists a unique minimal balanced grammar equivalent to a given one, and balanced languages are characterized through a property of their syntactic congruence.
Abstract: Balanced grammars are a generalization of parenthesis grammars in two directions. First, several kinds of parentheses are allowed. Next, the set of right-hand sides of productions may be an infinite regular language. XML-grammars are a special kind of balanced grammar. This paper studies balanced grammars and their languages. It is shown that there exists a unique minimal balanced grammar equivalent to a given one. Next, balanced languages are characterized through a property of their syntactic congruence. Finally, we show how this characterization is related to previous work of McNaughton and Knuth on parenthesis languages.
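As a concrete illustration of the language-level property involved (our own sketch, not taken from the paper), the following Python checks balancedness over several kinds of parentheses; the bracket pairs are arbitrary example choices:

```python
# Illustrative sketch only: checks balancedness over several kinds of
# parentheses, the language-level property that balanced grammars enforce.
PAIRS = {')': '(', ']': '[', '>': '<'}

def is_balanced(word: str) -> bool:
    """Return True iff every closing bracket matches the nearest
    unmatched opening bracket of the same kind."""
    stack = []
    for symbol in word:
        if symbol in PAIRS.values():        # an opening bracket
            stack.append(symbol)
        elif symbol in PAIRS:               # a closing bracket
            if not stack or stack.pop() != PAIRS[symbol]:
                return False
        # other symbols are ignored, as in XML text content
    return not stack                        # no unmatched openers remain

assert is_balanced("([<>])[]")
assert not is_balanced("([)]")
```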

45 citations


Proceedings ArticleDOI
24 Aug 2002
TL;DR: The resulting compiler creates a context-free backbone of the unification grammar, eliminates left-recursive productions, and removes redundant grammar rules; experiments show no significant computational overhead in speech recognition performance for grammars with compositional semantics compared to grammars without.
Abstract: In this paper a method to compile unification grammars into speech recognition packages is presented; in particular, rules are specified to transfer the compositional semantics stated in unification grammars into speech recognition grammars. The resulting compiler creates a context-free backbone of the unification grammar, eliminates left-recursive productions and removes redundant grammar rules. The method was tested on a medium-sized unification grammar for English using Nuance speech recognition software on a corpus of 131 utterances from 12 different speakers. Results showed no significant computational overhead in speech recognition performance for grammars with compositional semantics compared to grammars without.
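One of the compiler's steps, eliminating left-recursive productions, follows the standard rewriting A → Aα | β into A → βA', A' → αA' | ε. The sketch below is a generic version of that textbook transformation, not the paper's implementation; the grammar representation and the A' naming are our own assumptions:

```python
# Minimal sketch of direct left-recursion elimination; productions are
# lists of symbols, and [] stands for the empty production (epsilon).
def eliminate_direct_left_recursion(nonterminal, productions):
    """Rewrite A -> A alpha | beta as A -> beta A' ; A' -> alpha A' | eps."""
    recursive, rest = [], []
    for rhs in productions:
        if rhs and rhs[0] == nonterminal:
            recursive.append(rhs[1:])       # the alpha parts
        else:
            rest.append(rhs)                # the beta parts
    if not recursive:
        return {nonterminal: productions}   # nothing to do
    fresh = nonterminal + "'"               # invented fresh-name convention
    return {
        nonterminal: [beta + [fresh] for beta in rest],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],
    }

# Example: Expr -> Expr '+' Term | Term
print(eliminate_direct_left_recursion(
    "Expr", [["Expr", "+", "Term"], ["Term"]]))
# {'Expr': [['Term', "Expr'"]], "Expr'": [['+', 'Term', "Expr'"], []]}
```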

29 citations



Book ChapterDOI
08 Apr 2002
TL;DR: A new proof is given of Chirica and Martin's result that the attribute values can be computed by a structural recursion over the tree, and a new definedness test is derived which encompasses the traditional closure and circularity tests.
Abstract: A definition of the semantics of attribute grammars is given, using the lambda calculus. We show how this semantics allows us to prove results about attribute grammars in a calculational style. In particular, we give a new proof of Chirica and Martin's result [6], that the attribute values can be computed by a structural recursion over the tree. We also derive a new definedness test, which encompasses the traditional closure and circularity tests. The test is derived by abstract interpretation.
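For readers unfamiliar with the terminology, the following toy sketch (ours, not the paper's lambda-calculus semantics) shows what "computing attribute values by structural recursion over the tree" means for a single synthesized attribute:

```python
# Illustrative sketch: a synthesized attribute computed by structural
# recursion, the evaluation style Chirica and Martin's result justifies.
from dataclasses import dataclass

@dataclass
class Leaf:
    value: int

@dataclass
class Add:
    left: "Leaf | Add"
    right: "Leaf | Add"

def eval_attr(node) -> int:
    """Each node's attribute is a function of its children's attributes."""
    if isinstance(node, Leaf):
        return node.value
    return eval_attr(node.left) + eval_attr(node.right)

assert eval_attr(Add(Leaf(1), Add(Leaf(2), Leaf(3)))) == 6
```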

25 citations


01 Jan 2002
TL;DR: In Ingria 1990, feature neutralization and the coordination of unlikes are seen to be different aspects of the same problem: how to determine the values of a coordinate mother's features from those of its conjuncts.
Abstract: All theories of syntax aspire to be able to analyze any construction found in any language. Thus Ingria’s (1990) argument that both feature neutrality and the coordination of unlikes pose fundamental problems for unification-based theories presents a formidable challenge to those theories; moreover, it has often been assumed to automatically apply to their successors – constraint-based theories of grammar, exemplified by Head-driven Phrase Structure Grammar [HPSG] (Pollard and Sag 1994).
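The core of Ingria's problem can be made concrete with a toy sketch (ours; the flat feature structures and values are invented for illustration): under plain unification, a coordinate mother cannot carry a single CASE value compatible with conjuncts that demand different cases, even though case-neutral forms can appear in both positions:

```python
# Toy sketch of the unification problem, with invented feature values.
def unify(fs1: dict, fs2: dict):
    """Naive flat feature-structure unification; None signals failure."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None                     # feature clash
        result[feature] = value
    return result

conjunct1 = {"CASE": "nom"}                 # a nominative-only form
conjunct2 = {"CASE": "acc"}                 # an accusative-only form
print(unify(conjunct1, conjunct2))          # None: unification fails,
# yet case-neutral forms can stand in both positions.
```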

23 citations


Book ChapterDOI
Bradford Craig Starkie
23 Sep 2002
TL;DR: The method presented in this paper has the ability to infer attribute grammars that can generate a wide range of useful data structures such as simple and structured types, lists, concatenated strings, and natural numbers.
Abstract: This paper presents a method for inferring reversible attribute grammars from tagged natural language sentences. Attribute grammars are a form of augmented context-free grammar that assign "meaning" in the form of a data structure to a string in a context-free language. The method presented in this paper has the ability to infer attribute grammars that can generate a wide range of useful data structures, such as simple and structured types, lists, concatenated strings, and natural numbers. The method also introduces two new forms of grammar generalisation: generalisation based upon identification of optional phrases, and generalisation based upon lists. The method has been applied to and tested on the task of the rapid development of spoken dialog systems.
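As a hypothetical illustration of the first form of generalisation (our own sketch; the rule format and function name are invented, not Starkie's), a training sentence that extends another by a suffix can be generalised into a rule with an optional phrase:

```python
# Hypothetical sketch of generalisation based on optional phrases.
def generalise_optional(shorter, longer):
    """If `longer` = `shorter` + suffix, return a rule marking the
    suffix as an optional phrase; otherwise return None."""
    if longer[:len(shorter)] == shorter and len(longer) > len(shorter):
        suffix = longer[len(shorter):]
        return shorter + [("optional", suffix)]
    return None

rule = generalise_optional(
    ["i", "want", "a", "coke"],
    ["i", "want", "a", "coke", "please"])
print(rule)  # ['i', 'want', 'a', 'coke', ('optional', ['please'])]
```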

17 citations


Patent
John T. Maxwell, Hadar Shemtov
27 Sep 2002
TL;DR: This patent proposes a process for generating with unification-based grammars, such as Lexical Functional Grammars, that uses construction and analysis of generation guides to determine internal facts and eliminate incomplete edges prior to constructing a generation chart.
Abstract: A process for generating with unification-based grammars, such as Lexical Functional Grammars, which uses construction and analysis of generation guides to determine internal facts and to eliminate incomplete edges prior to constructing a generation chart. The generation guide can then be used in the construction of the generation chart to generate efficiently with unification-based grammars such as Lexical Functional Grammars. The generation guide is an instance of the grammar that has been specialized to the input and contains only those parts of the grammar that are relevant to the input. When the generation guide is analyzed to determine internal facts, a smaller generation chart is produced.

16 citations


Journal Article
TL;DR: It is demonstrated that for every phrase-structure grammar, there exists an equivalent simple semi-conditional grammar that has no more than twelve conditional productions.
Abstract: The present paper discusses the descriptional complexity of simple semi-conditional grammars with respect to the number of conditional productions. More specifically, it demonstrates that for every phrase-structure grammar, there exists an equivalent simple semi-conditional grammar that has no more than twelve conditional productions.
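To make the grammar model concrete, here is our own sketch (not the paper's notation) of how a semi-conditional production applies: it carries a permitting string and a forbidding string, and may rewrite a nonterminal only in sentential forms that contain the former and avoid the latter:

```python
# Illustrative sketch of a (simple) semi-conditional derivation step.
def apply_production(sentential_form, lhs, rhs,
                     permitting=None, forbidding=None):
    """Return the rewritten form, or None if the conditions block it."""
    if permitting is not None and permitting not in sentential_form:
        return None                         # permitting string absent
    if forbidding is not None and forbidding in sentential_form:
        return None                         # forbidding string present
    if lhs not in sentential_form:
        return None                         # nothing to rewrite
    return sentential_form.replace(lhs, rhs, 1)   # leftmost occurrence

# A -> aA is allowed only while 'B' is present and 'c' is absent:
print(apply_production("AB", "A", "aA", permitting="B", forbidding="c"))  # 'aAB'
print(apply_production("Ac", "A", "aA", permitting="B", forbidding="c"))  # None
```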

16 citations


Book ChapterDOI
23 Sep 2002
TL;DR: It is shown that, in contrast to k-valued classical categorial grammars, different classes of Lambek grammars are not learnable from strings following Gold's model.
Abstract: In this paper we give some learnability results in the field of categorial grammars. We show that, in contrast to k-valued classical categorial grammars, different classes of Lambek grammars are not learnable from strings following Gold's model. The results are obtained by the construction of limit points in each considered class: non-associative Lambek grammars with empty sequences, and Lambek grammars without empty sequences and without product. Such results express the difficulty of learning categorial grammars from unstructured strings and the need for structured examples.

14 citations


Proceedings ArticleDOI
11 Jul 2002
TL;DR: A phonological probabilistic context-free grammar, which describes the word and syllable structure of German words, is presented, and rules for English phonemes are added to the grammar, and the enriched grammar is trained on an English corpus.
Abstract: We present a phonological probabilistic context-free grammar, which describes the word and syllable structure of German words. The grammar is trained on a large corpus by a simple supervised method, and evaluated on a syllabification task achieving 96.88% word accuracy on word tokens, and 90.33% on word types. We added rules for English phonemes to the grammar, and trained the enriched grammar on an English corpus. Both grammars are evaluated qualitatively showing that probabilistic context-free grammars can contribute linguistic knowledge to phonology. Our formal approach is multilingual, while the training data is language-dependent.
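The "simple supervised method" for training the PCFG is presumably maximum-likelihood estimation from observed rule counts; as a sketch under that assumption (with invented toy counts), rule probabilities are relative frequencies among rules sharing a left-hand side:

```python
# Sketch of supervised PCFG training as maximum-likelihood estimation;
# the syllable-structure rules and counts below are invented examples.
from collections import Counter, defaultdict

def estimate_rule_probs(rule_counts: Counter) -> dict:
    """P(A -> alpha) = count(A -> alpha) / sum_beta count(A -> beta)."""
    lhs_totals = defaultdict(int)
    for (lhs, rhs), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {(lhs, rhs): count / lhs_totals[lhs]
            for (lhs, rhs), count in rule_counts.items()}

counts = Counter({("Syl", ("Onset", "Rhyme")): 300,
                  ("Syl", ("Rhyme",)): 100})
print(estimate_rule_probs(counts))
# {('Syl', ('Onset', 'Rhyme')): 0.75, ('Syl', ('Rhyme',)): 0.25}
```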

12 citations


Journal ArticleDOI
01 Jan 2002 - Grammars
TL;DR: This paper discusses alternative approaches for defining the denotation of a grammar, culminating in one which is shown to be both compositional and fully-abstract, and shows how grammar modules can be defined such that their semantics retains these desirable properties.
Abstract: Given two context-free grammars (CFGs), G1 and G2, the language generated by the union of the grammars is not the union of the languages generated by each grammar: L(G1 ∪ G2) ≠ L(G1) ∪ L(G2). In order to account for modularity of grammars, another way of defining the meaning of grammars is needed. This paper adapts results from the semantics of logic programming languages to CFGs. We discuss alternative approaches for defining the denotation of a grammar, culminating in one which we show to be both compositional and fully-abstract. We then show how grammar modules can be defined such that their semantics retains these desirable properties. This gives a clear, mathematically sound way for composing parts of grammars.
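A concrete counterexample for the inequality, as a small sketch of our own: with G1: S → aS | a (language a+) and G2: S → bS | b (language b+), merging the production sets lets derivations mix rules from both grammars, so the union grammar derives "ab", which lies in neither language:

```python
# Counterexample sketch: L(G1 ∪ G2) strictly contains L(G1) ∪ L(G2).
G1 = {"S": [["a", "S"], ["a"]]}             # generates a+
G2 = {"S": [["b", "S"], ["b"]]}             # generates b+
union = {"S": G1["S"] + G2["S"]}            # naive grammar union

def derives(grammar, symbols, target, depth=8):
    """Brute-force, depth-bounded check that `symbols` derives `target`."""
    if depth == 0:
        return False
    if all(s not in grammar for s in symbols):
        return "".join(symbols) == target   # all terminals: compare
    i = next(i for i, s in enumerate(symbols) if s in grammar)
    return any(derives(grammar, symbols[:i] + rhs + symbols[i + 1:],
                       target, depth - 1)
               for rhs in grammar[symbols[i]])

assert derives(union, ["S"], "ab")          # in L(G1 ∪ G2)
assert not derives(G1, ["S"], "ab")         # not in L(G1)
assert not derives(G2, ["S"], "ab")         # not in L(G2)
```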

Proceedings ArticleDOI
24 Aug 2002
TL;DR: It is shown that, in contrast to classical categorial grammars, rigid and k-valued Lambek grammars are not learnable from strings; this result holds for variants of the Lambek calculus.
Abstract: This paper is concerned with learning categorial grammars in Gold's model (Gold, 1967). Recently, learning algorithms in this model have been proposed for some particular classes of classical categorial grammars (Kanazawa, 1998). We show that in contrast to classical categorial grammars, rigid and k-valued Lambek grammars are not learnable from strings. This result holds for variants of Lambek calculus; our proof consists in the construction of limit points in each class. Such a result aims at clarifying the possible directions for future learning algorithms.

Proceedings ArticleDOI
03 Jul 2002
TL;DR: A novel view on top-down predictive parser construction for extended context-free grammars that is based on the rewriting of partial syntax trees is presented.
Abstract: Extended context-free grammars are context-free grammars in which the right-hand sides of productions are allowed to be any regular language rather than being restricted to only finite languages. We present a novel view on top-down predictive parser construction for extended context-free grammars that is based on the rewriting of partial syntax trees. This work is motivated by our development of ecfg, a Java toolkit for the manipulation of extended context-free grammars, and by our continuing investigation of XML.
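The key practical point, that a regular right-hand side such as Item* can be parsed by a lookahead-driven loop rather than by auxiliary recursive rules, can be sketched as follows (our own toy grammar and code, not the ecfg toolkit):

```python
# Predictive-parsing sketch for an extended CFG production whose
# right-hand side is a regular expression over grammar symbols:
#   List -> '(' Item* ')'    with    Item -> 'x' | List
def parse_list(tokens, pos=0):
    assert tokens[pos] == "(", "expected '('"
    pos += 1
    # Item* is handled by a loop driven by one-token lookahead,
    # not by an auxiliary recursive nonterminal.
    while pos < len(tokens) and tokens[pos] in ("x", "("):
        if tokens[pos] == "x":
            pos += 1
        else:
            pos = parse_list(tokens, pos)   # nested List
    assert pos < len(tokens) and tokens[pos] == ")", "expected ')'"
    return pos + 1

tokens = list("(xx(x)x)")
assert parse_list(tokens) == len(tokens)    # the whole input is one List
```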

Journal Article
TL;DR: The authors investigate diverse case marking patterns in auxiliary verb constructions in Korean, and provide an account in terms of a general mechanism of structural case assignment within the framework of Head-Driven Phrase Structure Grammar (HPSG).
Abstract: This paper investigates diverse case marking patterns in auxiliary verb constructions (AVCs) in Korean, and provides an account in terms of a general mechanism of structural case assignment within the framework of Head-Driven Phrase Structure Grammar (HPSG). It is first shown that the complicated case marking patterns which arise from various combinations of auxiliary verbs pose problems both for transformational analyses based on head movement and for previous HPSG analyses in which the final auxiliary verb solely determines the case marking property of the whole complex predicate. This paper argues that auxiliary verbs differ in the way they inherit the case marking property of the preceding predicate, and that case alternation in the siphta construction can be explained by the dual inheritance property specified in the lexicon. Drawing upon a complex predicate analysis of AVCs, this paper proposes that complicated case patterns in AVCs can be accounted for by a classification of verbs/auxiliary verbs via distinct feature values and by a mechanism of structural case resolution.

Journal Article
TL;DR: A procedure of finding minimal unifiers with respect to some preordering relation between substitutions is introduced to solve the general problem of finding all minimal (in several senses) categorial grammars compatible with a given language sample.
Abstract: In this paper, continuing [1], we present a more general approach to restricted optimal unification, introduced and applied to learning algorithms for categorial grammars in [2] and further developed in [7, 8, 4, 5]. We introduce a procedure for finding minimal unifiers with respect to some preordering relation between substitutions, and solve the general problem of finding all minimal (in several senses) categorial grammars compatible with a given language sample.
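As background for readers (our own term encoding, and ordinary unrestricted unification rather than the paper's restricted optimal variant), a most general unifier over category terms can be computed as follows:

```python
# Background sketch: most general unification over category terms.
# Variables are strings starting with '?'; complex categories are tuples.
# Kept minimal: no occurs check.
def unify(t1, t2, subst=None):
    """Return a most general unifier as a dict, or None on failure."""
    subst = dict(subst or {})

    def walk(t):                            # follow variable bindings
        while isinstance(t, str) and t.startswith("?") and t in subst:
            t = subst[t]
        return t

    t1, t2 = walk(t1), walk(t2)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.startswith("?"):
        subst[t1] = t2
        return subst
    if isinstance(t2, str) and t2.startswith("?"):
        subst[t2] = t1
        return subst
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                             # atom clash

# Unify ?x/NP with S/NP (categories encoded as (result, '/', argument)):
print(unify(("?x", "/", "NP"), ("S", "/", "NP")))   # {'?x': 'S'}
```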

Book ChapterDOI
24 Sep 2002
TL;DR: This paper fully extends Winskel's approach to single-pushout grammars, providing them with a categorical concurrent semantics expressed as a coreflection between the category of graph grammars and the category of prime algebraic domains.
Abstract: The problem of extending to graph grammars the unfolding semantics originally developed by Winskel for (safe) Petri nets has been faced several times along the years, both for the single-pushout and double-pushout approaches, but only partial results were obtained. In this paper we fully extend Winskel’s approach to single-pushout grammars providing them with a categorical concurrent semantics expressed as a coreflection between the category of graph grammars and the category of prime algebraic domains.

Book ChapterDOI
23 Sep 2002
TL;DR: The aim is to find a (set of) grammar(s) that recognizes new correct sentences by means of some initial correct examples and a strategy to deduce, at each step, the corresponding grammar(s) consistent with the examples.
Abstract: Natural language learning remains an open problem, although many approaches have been proposed in current research. We too address this challenge, and we provide here a prototype of a tool. First, we should clarify that we focus on the syntactic level. We intend to find a (set of) grammar(s) that recognizes new correct sentences (in the sense of the correct order of the words) by means of some initial correct examples and a strategy to deduce, at each step, the corresponding grammar(s) consistent with the examples. In this model, the grammars are the support of the languages, so the process of learning is a process of grammatical inference. Usually, in NLP approaches, natural language is represented by lexicalized grammars, because the power of the language lies in the information provided by the words and their combination schemas. That is why we adopt here the formal model of a categorial grammar, which assigns every word a category and furnishes some general combination schemas of categories. In our model, however, the strings of words are not sufficient for the inference, so additional information is needed. In Kanazawa's work [3] the additional information is the internal structure of each sentence, given as a structural example. We instead provide more lexicalized information of a semantic nature: the semantic type of words. Its provenance, as well as the psycho-linguistic motivation, can be found in [1] and [2].
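The "general combination schemas" of a classical (AB) categorial grammar are forward application X/Y Y ⇒ X and backward application Y Y\X ⇒ X; the following toy sketch (our own lexicon and encoding) recognizes a sentence by repeatedly applying them:

```python
# Sketch of classical categorial grammar application rules.
# Categories: atoms, or (FORWARD, result, argument) for X/Y,
# or (BACKWARD, argument, result) for X\Y (result-first notation).
FORWARD, BACKWARD = "/", "\\"

LEXICON = {
    "John": "NP",
    "Mary": "NP",
    "loves": (FORWARD, (BACKWARD, "NP", "S"), "NP"),  # (S\NP)/NP
}

def reduce_once(cats):
    """Apply one application schema to the first reducible adjacent pair."""
    for i in range(len(cats) - 1):
        left, right = cats[i], cats[i + 1]
        if isinstance(left, tuple) and left[0] == FORWARD and left[2] == right:
            return cats[:i] + [left[1]] + cats[i + 2:]       # X/Y Y => X
        if isinstance(right, tuple) and right[0] == BACKWARD and right[1] == left:
            return cats[:i] + [right[2]] + cats[i + 2:]      # Y Y\X => X
    return None                             # no pair reducible

cats = [LEXICON[w] for w in "John loves Mary".split()]
while cats and len(cats) > 1:
    cats = reduce_once(cats)
print(cats)   # ['S']: the sentence is recognized
```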


01 Jan 2002
TL;DR: A novel approach to top-down predictive parser construction for extended context-free grammars that is based on rewriting of partial syntax trees is developed.
Abstract: Extended context-free grammars are context-free grammars in which the right-hand sides of productions are allowed to be any regular language rather than being restricted to only finite languages. We develop a novel approach to top-down predictive parser construction for extended context-free grammars that is based on the rewriting of partial syntax trees. This work is motivated by our development of ecfg, a Java toolkit for the manipulation of extended context-free grammars, and by our continuing investigation of XML.

Journal ArticleDOI
TL;DR: It is demonstrated that for every phrase-structure grammar, there exists an equivalent homogeneous grammar that has only three non-context-free productions of the form 00→e, 11→e, and 22→e.

Patent
26 Jul 2002
TL;DR: A method is presented for specifying equivalence of language grammars and automatically translating sentences in one language to sentences in another language in a computer environment; it can be used for computer languages such as high-level languages, assembly language, and machine language.
Abstract: A method for specifying equivalence of language grammars and automatically translating sentences in one language to sentences in another language in a computer environment. The method uses a unified grammar specification of grammars of different languages: a single unified representation of all the individual grammars in which equivalent production rules of each of the grammars are merged into a single unified production rule. This method can be used to represent the equivalence of computer languages such as high-level languages, assembly language, and machine language, and to translate sentences in any of these languages into another.

Book ChapterDOI
07 Oct 2002
TL;DR: This paper defines a special kind of graph grammars that can be used to describe distributed systems with mobility and object-based systems and shows how to model such Grammars and their semantics in terms of tiles making explicit the aspects of interactivity and compositionality.
Abstract: In this paper we define a special kind of graph grammars, called linear ordered graph grammars, that can be used to describe distributed systems with mobility and object-based systems. Then we show how to model such grammars and their semantics in terms of tiles making explicit the aspects of interactivity and compositionality, that are of great importance in distributed systems.

Journal ArticleDOI
TL;DR: Token Dependency Semantics (TDS), a surface‐oriented and token‐based framework for compositional truth‐conditional semantics, is introduced, motivated by Davidson's ‘paratactic’ analysis of semantic intensionality (‘On Saying That’, 1968).
Abstract: This article introduces Token Dependency Semantics (TDS), a surface‐oriented and token‐based framework for compositional truth‐conditional semantics. It is motivated by Davidson's ‘paratactic’ analysis of semantic intensionality (‘On Saying That’, 1968, Synthese 19: 130–146), which has been much discussed in philosophy. This is the first fully‐fledged formal implementation of Davidson's proposal. Operator‐argument structure and scope are captured by means of relations among tokens. Intensional constituent tokens represent ‘propositional’ contents directly. They serve as arguments to the words introducing intensional contexts, rather than being ‘ordinary’ constituents. The treatment of de re readings involves the use of functions (‘anchors’) assigning entities to argument positions of lexical tokens. Quantifiers are thereby allowed to bind argument places on content tokens. This gives us a simple underspecification‐based account of scope ambiguity. The TDS framework is applied to indirect speech reports, mental attitude sentences, control verbs, and modal and agent‐relative sentence adverbs in English. This semantics is compatible with a traditional view of syntax. Here, it is integrated into a Head‐driven Phrase Structure Grammar (HPSG). The result is a straightforward and ontologically parsimonious analysis of truth‐conditional meaning and semantic intensionality.

Proceedings ArticleDOI
24 Aug 2002
TL;DR: Various definitions of OLP are investigated, their inter-relations are discussed and it is shown that some of them are undecidable.
Abstract: Unification grammars are known to be Turing-equivalent; given a grammar G and a word w, it is undecidable whether w ∈ L(G). In order to ensure decidability, several constraints on grammars, commonly known as off-line parsability (OLP), were suggested. The recognition problem is decidable for grammars which satisfy OLP. An open question is whether it is decidable whether a given grammar satisfies OLP. In this paper we investigate various definitions of OLP, discuss their inter-relations and show that some of them are undecidable.

Book ChapterDOI
23 Sep 2002
TL;DR: The learning strategy is inspired by Inductive Logic Programming and proceeds by searching through hypothesis spaces generated by logic transformations of the input grammar; two procedures, one for generalisation and one for specialisation, create these hypothesis spaces.
Abstract: LIGHT, the parsing system for typed-unification grammars [3], was recently extended so as to allow the automated learning of attribute/feature path values. Motivated by the logic design of these grammars [2], the learning strategy we adopted is inspired by Inductive Logic Programming [5]; we proceed by searching through hypothesis spaces generated by logic transformations of the input grammar. Two procedures -- one for generalisation, the other for specialisation -- are in charge of the creation of these hypothesis spaces.

Proceedings ArticleDOI
06 Oct 2002
TL;DR: The interest of using this type of grammar (namely HPSG) for Arabic parsing is shown, with a focus on its role in obtaining robust parsing.
Abstract: This paper deals with the representation of syntactic information within the formal setting of unification grammars. It shows the interest of using this type of grammar (namely HPSG), notably for Arabic parsing, and focuses on its role in obtaining robust parsing.

01 Jan 2002
TL;DR: Linear conjunctive grammars, as discussed in this paper, are a generalization of linear context-free grammars in which the body of each conjunct contains no more than a single nonterminal symbol.
Abstract: Linear conjunctive grammars are conjunctive grammars in which the body of each conjunct contains no more than a single nonterminal symbol. They can at the same time be thought of as a special case of conjunctive grammars and as a generalization of linear context-free grammars. While the problem of whether the complement of any conjunctive language can be denoted by a conjunctive grammar is still open and conjectured to have a negative answer, it turns out that the subfamily of linear conjunctive languages is in fact closed under complement and therefore under all set-theoretic operations.
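The role of conjunction can be illustrated with the classic non-context-free language a^n b^n c^n, obtained as the intersection of two linear context-free conditions; the sketch below (our own encoding, not an implementation of the grammar formalism) checks membership by testing both conjuncts:

```python
# Illustrative sketch: in a conjunctive grammar, S -> C1 & C2 accepts the
# intersection of the conjuncts' languages.
import re

def in_c1(word: str) -> bool:
    """Conjunct 1: a* followed by b^n c^n (a linear context-free condition)."""
    match = re.fullmatch(r"(a*)(b*)(c*)", word)
    return match is not None and len(match.group(2)) == len(match.group(3))

def in_c2(word: str) -> bool:
    """Conjunct 2: a^n b^n followed by c* (also linear context-free)."""
    match = re.fullmatch(r"(a*)(b*)(c*)", word)
    return match is not None and len(match.group(1)) == len(match.group(2))

def in_language(word: str) -> bool:
    """S -> C1 & C2: the word must satisfy both conjuncts."""
    return in_c1(word) and in_c2(word)

assert in_language("aabbcc")
assert not in_language("aabbc")   # fails conjunct 1
```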

Journal ArticleDOI
TL;DR: A family of static evaluators for subclasses of the well-defined (i.e., noncircular) attribute grammars with look-ahead behaviors is proposed, and it is shown that, for any finite m, an NC(m) attribute grammar can be transformed to an equivalent NC(0) grammar.
Abstract: We propose a family of static evaluators for subclasses of the well-defined (i.e., noncircular) attribute grammars. These evaluators augment the evaluator for the absolutely noncircular attribute grammars with look-ahead behaviors. Because this family covers exactly the set of all well-defined attribute grammars, well-defined attribute grammars may be classified into a hierarchy, called the NC hierarchy, according to their evaluators in the family. The location of a noncircular attribute grammar in the NC hierarchy is an intrinsic property of the grammar. The NC hierarchy confirms a result of Riis and Skyum (1981), which says that all well-defined attribute grammars allow a (static) pure multivisit evaluator by actually constructing such an evaluator. We also show that, for any finite m, an NC(m) attribute grammar can be transformed to an equivalent NC(0) grammar.

Book ChapterDOI
23 Sep 2002
TL;DR: The class of 2-letter rigid grammars is studied and it is shown thatgrammars in this class can be learned very efficiently, within Gold's paradigm of identification in the limit, from positive examples.
Abstract: It is well-known that certain classes of classical categorial grammars are learnable, within Gold's paradigm of identification in the limit, from positive examples. In the search for classes which can be learned efficiently from strings, we study the class of 2-letter rigid grammars, which is the class of classical categorial grammars with an alphabet of two letters, each of which is assigned a single type. The (non-trivial) structure of this class of grammars is studied and it is shown that grammars in this class can be learned very efficiently. The algorithm given for solving this learning problem runs in time linear in the total length of the input strings. After seeing two or more strings in a language, the algorithm can determine precisely the (finite) set of grammars which can generate those strings.