
Showing papers on "Context-free grammar" published in 2000


Journal ArticleDOI
TL;DR: A probabilistic syntactic approach to detecting and recognizing temporally extended activities and interactions between multiple agents is presented, and the system is shown to correctly interpret the activities of multiple interacting objects.
Abstract: This paper describes a probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents. The fundamental idea is to divide the recognition problem into two levels. The lower level detections are performed using standard independent probabilistic event detectors to propose candidate detections of low-level features. The outputs of these detectors provide the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser provide longer range temporal constraints, disambiguate uncertain low-level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. We develop a real-time system and demonstrate the approach in several experiments on gesture recognition and in video surveillance. In the surveillance application, we show how the system correctly interprets activities of multiple interacting objects.

739 citations
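
A minimal sketch of the two-level idea described above: hypothetical low-level detectors propose candidate events with likelihoods, and a stand-in "grammar" over complete event sequences picks the most probable consistent reading. The event names, probabilities, and flat sequence grammar are all invented for illustration; the paper uses a full stochastic context-free grammar and an incremental parser.

```python
import math
from itertools import product

# Candidate (event, likelihood) detections per time step, from independent
# low-level detectors. Event names and numbers are made up for illustration.
detections = [
    [("ENTER", 0.7), ("NOISE", 0.3)],
    [("PICKUP", 0.4), ("PUTDOWN", 0.6)],
    [("EXIT", 0.8), ("NOISE", 0.2)],
]

# Stand-in for the stochastic grammar: event sequences it accepts,
# each with a derivation probability.
activities = {
    ("ENTER", "PICKUP", "EXIT"): 0.5,   # "take an object"
    ("ENTER", "PUTDOWN", "EXIT"): 0.5,  # "leave an object"
}

best_score, best_labels = 0.0, None
for labeling in product(*detections):        # one candidate per time step
    labels = tuple(event for event, _ in labeling)
    if labels not in activities:             # grammar rules out this reading
        continue
    score = activities[labels] * math.prod(p for _, p in labeling)
    if score > best_score:
        best_score, best_labels = score, labels

print(best_labels, best_score)   # ('ENTER', 'PUTDOWN', 'EXIT') 0.168
```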


Journal ArticleDOI
TL;DR: It is shown that, subject to some mild restrictions, a grammar-based code is a universal code with respect to the family of finite-state information sources over the finite alphabet.
Abstract: We investigate a type of lossless source code called a grammar-based code, which, in response to any input data string x over a fixed finite alphabet, selects a context-free grammar G_x representing x in the sense that x is the unique string belonging to the language generated by G_x. Lossless compression of x takes place indirectly via compression of the production rules of the grammar G_x. It is shown that, subject to some mild restrictions, a grammar-based code is a universal code with respect to the family of finite-state information sources over the finite alphabet. Redundancy bounds for grammar-based codes are established. Reduction rules for designing grammar-based codes are presented.

437 citations
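
To make the idea of representing a string x by a grammar G_x concrete, here is a small sketch in the Re-Pair style: repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal. This illustrates the general family of constructions the paper analyzes, not the paper's own reduction rules.

```python
from collections import Counter

def repair_grammar(x):
    """Build a CFG whose language is exactly {x}, Re-Pair style."""
    seq = list(x)
    rules = {}                       # nonterminal -> (sym, sym)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:                # no repeated pair left to share
            break
        nt = f"R{next_id}"
        next_id += 1
        rules[nt] = pair
        out, i = [], 0               # replace non-overlapping occurrences
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    rules["S"] = tuple(seq)          # start rule expands to the final sequence
    return rules

print(repair_grammar("abababab"))
# {'R0': ('a', 'b'), 'R1': ('R0', 'R0'), 'S': ('R1', 'R1')}
```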


Journal Article
TL;DR: An object-oriented extension to canonical attribute grammars is described, permitting attributes to be references to arbitrary nodes in the syntax tree, and attributes to be accessed via the reference attributes.
Abstract: An object-oriented extension to canonical attribute grammars is described, permitting attributes to be references to arbitrary nodes in the syntax tree, and attributes to be accessed via the reference attributes. Important practical problems such as name and type analysis for object-oriented languages can be expressed in a concise and modular manner in these grammars, and an optimal evaluation algorithm is available. An extensive example is given, capturing all the key constructs in object-oriented languages including block structure, classes, inheritance, qualified use, and assignment compatibility in the presence of subtyping. The formalism and algorithm have been implemented in APPLAB, an interactive language development tool.

192 citations


Patent
29 Sep 2000
TL;DR: A method and apparatus to use semantic inference with speech recognition systems includes recognizing at least one spoken word, processing the spoken word using a context-free grammar, deriving an output from the grammar, and translating the output into a predetermined command.
Abstract: A method and apparatus to use semantic inference with speech recognition systems includes recognizing at least one spoken word, processing the spoken word using a context-free grammar, deriving an output from the context-free grammar, and translating the output to a predetermined command.

137 citations
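
A toy sketch of the pipeline the patent describes: recognized words are matched against a small grammar and the match is translated into a predetermined command. The grammar, command names, and the regular-expression stand-in for the CFG are all hypothetical; a real system would use a proper CFG parser.

```python
import re

# Hypothetical rules mapping utterance patterns to predetermined commands.
RULES = [
    (re.compile(r"turn (on|off) the (\w+)"), "SET_POWER"),
    (re.compile(r"set (\w+) to (\d+)"),      "SET_LEVEL"),
]

def interpret(utterance):
    """Map a recognized word sequence to (command, arguments), or None."""
    for pattern, command in RULES:
        m = pattern.fullmatch(utterance)
        if m:
            return (command, m.groups())
    return None

print(interpret("turn on the lights"))   # ('SET_POWER', ('on', 'lights'))
print(interpret("set volume to 7"))      # ('SET_LEVEL', ('volume', '7'))
```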


Book ChapterDOI
Pat Langley, Sean Stromsten
31 May 2000
TL;DR: A rational reconstruction of Wolff's SNPR - the GRIDS system - is presented which incorporates a bias toward grammars that minimize description length; the algorithm alternates between merging existing nonterminal symbols and creating new symbols, using a beam search to move from complex to simpler grammars.
Abstract: We examine the role of simplicity in directing the induction of context-free grammars from sample sentences. We present a rational reconstruction of Wolff's SNPR - the GRIDS system - which incorporates a bias toward grammars that minimize description length. The algorithm alternates between merging existing nonterminal symbols and creating new symbols, using a beam search to move from complex to simpler grammars. Experiments suggest that this approach can induce accurate grammars and that it scales reasonably to more difficult domains.

101 citations
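
A rough illustration of the description-length bias: score a grammar by the number of bits needed to write down its productions, so that factoring shared structure into a new nonterminal can pay for itself. The uniform coding scheme below is a simplification for illustration, not the exact metric GRIDS uses (which also accounts for encoding the data under the grammar).

```python
import math

def grammar_bits(rules):
    """Bits to write the productions under a uniform code over all symbols."""
    # rules: list of (lhs, rhs_tuple) productions
    symbols = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    bits_per_symbol = math.log2(len(symbols))
    return sum((1 + len(rhs)) * bits_per_symbol for _, rhs in rules)

# Merging the three nouns into one nonterminal shortens the description.
verbose = [("S", ("the", "dog", "barks")),
           ("S", ("the", "cat", "barks")),
           ("S", ("the", "bird", "barks"))]
compact = [("S", ("the", "N", "barks")),
           ("N", ("dog",)), ("N", ("cat",)), ("N", ("bird",))]
print(grammar_bits(verbose), grammar_bits(compact))  # ~31.0 vs ~28.1
```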


Proceedings ArticleDOI
05 Jun 2000
TL;DR: The proposed unified model significantly reduces test-set perplexity from 378 to 90 in comparison with a domain-independent word trigram, and converges well as domain-specific data becomes available.
Abstract: While context-free grammars (CFGs) remain as one of the most important formalisms for interpreting natural language, word n-gram models are surprisingly powerful for domain-independent applications. We propose to unify these two formalisms for both speech recognition and spoken language understanding (SLU). With portability as the major problem, we incorporated domain-specific CFGs into a domain-independent n-gram model that can improve the generalizability of the CFG and the specificity of the n-gram. In our experiments, the unified model can significantly reduce the test set perplexity from 378 to 90 in comparison with a domain-independent word trigram. The unified model converges well when domain-specific data becomes available. The perplexity can be further reduced from 90 to 65 with a limited amount of domain-specific data. While we have demonstrated excellent portability, the full potential of our approach lies in its unified recognition and understanding that we are investigating.

90 citations
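
One way to picture the unification: spans that a domain CFG recognizes are rewritten to a single class token, so the domain-independent n-gram scores "<date>" rather than sparse literal dates. The phrase table standing in for the CFG and the class names below are invented; the paper's model integrates the two formalisms more tightly inside the recognizer.

```python
# Hypothetical "CFG" classes, each a set of phrases it can derive.
CLASSES = {
    "<date>": {("january", "fifth"), ("march", "first")},
    "<city>": {("new", "york"), ("san", "francisco")},
}

def rewrite(tokens):
    """Replace CFG-matched spans with class tokens for n-gram scoring."""
    out, i = [], 0
    while i < len(tokens):
        for cls, phrases in CLASSES.items():
            for phrase in phrases:
                if tuple(tokens[i:i + len(phrase)]) == phrase:
                    out.append(cls)          # n-gram sees the class token
                    i += len(phrase)
                    break
            else:
                continue                     # no phrase of this class matched
            break
        else:
            out.append(tokens[i])            # ordinary word, kept as-is
            i += 1
    return out

print(rewrite("fly to new york on january fifth".split()))
# ['fly', 'to', '<city>', 'on', '<date>']
```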


Book ChapterDOI
11 Sep 2000
TL;DR: It is proved that the problem of finding the most probable string generated by a stochastic regular grammar is NP-hard and does not admit a polynomial-time approximation scheme.
Abstract: Determinism plays an important role in grammatical inference. In practice, however, ambiguous grammars (and nondeterministic grammars in particular) are used more often than deterministic ones. Computing the probability of parsing a given string, or its most probable parse, with stochastic regular grammars can be performed in linear time. However, the problem of finding the most probable string has not yet received a satisfactory answer. In this paper we prove that the problem is NP-hard and does not admit a polynomial-time approximation scheme. The result extends to stochastic regular syntax-directed translation schemes.

86 citations
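
The linear-time computation contrasted with the NP-hard one can be sketched as the forward algorithm over a probabilistic finite automaton equivalent to a stochastic regular grammar; the states, rules, and probabilities below are made up for illustration.

```python
def string_probability(s, start, transitions, final):
    """Forward algorithm: P(s) under a probabilistic finite automaton.

    transitions: dict (state, symbol) -> list of (next_state, prob)
    final: dict state -> probability of stopping in that state
    Runs in time linear in len(s) (times grammar size).
    """
    forward = {start: 1.0}
    for symbol in s:
        nxt = {}
        for state, p in forward.items():
            for q, tp in transitions.get((state, symbol), []):
                nxt[q] = nxt.get(q, 0.0) + p * tp
        forward = nxt
    return sum(p * final.get(q, 0.0) for q, p in forward.items())

# Stochastic regular grammar A -> aA (0.6) | aB (0.4); B -> bB (0.5) | b (0.5),
# written as an automaton with an explicit final state F.
transitions = {
    ("A", "a"): [("A", 0.6), ("B", 0.4)],
    ("B", "b"): [("B", 0.5), ("F", 0.5)],
}
print(string_probability("aab", "A", transitions, {"F": 1.0}))  # 0.12
```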


Proceedings ArticleDOI
10 Sep 2000
TL;DR: The paper proposes an extension of layered graph grammars (LGGs), which were introduced for the definition of visual languages (VLs); by offering new constructs such as negative application conditions (NACs), it allows more concise VL definitions.
Abstract: The paper proposes an extension of layered graph grammars (LGGs), which have been introduced for the definition of visual languages (VLs). By offering new constructs such as negative application conditions (NACs), it allows one to produce more concise VL definitions. A new layering condition and critical pair analysis are the prerequisites for a new parsing algorithm which avoids the exponential behaviour of LGGs in many cases.

85 citations


Patent
Mark E. Epstein
25 Oct 2000
TL;DR: A context-free grammar is applied to a text input to determine substrings and corresponding parse trees, and each possible substring is examined using an inventory of queries corresponding to the CFG.
Abstract: A method and system for use in a natural language understanding system for including grammars within a statistical parser. The method involves a series of steps. The invention receives a text input. The invention applies a first context-free grammar to the text input to determine substrings and corresponding parse trees, wherein the substrings and corresponding parse trees further correspond to the first context-free grammar. Additionally, the invention can examine each possible substring using an inventory of queries corresponding to the CFG.

74 citations


Proceedings ArticleDOI
31 Jul 2000
TL;DR: A noun chunker for German based on a head-lexicalised probabilistic context-free grammar is presented; an evaluation on hand-annotated noun chunks yielded 92% recall and 93% precision.
Abstract: We present a noun chunker for German which is based on a head-lexicalised probabilistic context-free grammar. A manually developed grammar was semi-automatically extended with robustness rules in order to allow parsing of unrestricted text. The model parameters were learned from unlabelled training data by a probabilistic context-free parser. For extracting noun chunks, the parser generates all possible noun chunk analyses, scores them with a novel algorithm which maximizes the best chunk sequence criterion, and chooses the most probable chunk sequence. An evaluation of the chunker on 2,140 hand-annotated noun chunks yielded 92% recall and 93% precision.

66 citations


Journal ArticleDOI
TL;DR: An approach is presented for generating components of a software renovation factory from a context-free grammar definition that recognizes the code that has to be renovated.

Journal ArticleDOI
TL;DR: A widely applicable method of using fractal sets to organize infinite-state computations in a bounded state space is presented, and an example suggests that such a global perspective on the organization of the parameter space may be helpful for solving the hard problem of getting connectionist networks to learn complex grammars from examples.
Abstract: Connectionist network learning of context-free languages has so far been applied only to very simple cases and has often made use of an external stack. Learning complex context-free languages with a homogeneous neural mechanism looks like a much harder problem. The current paper takes a step toward solving this problem by analyzing context-free grammar computation (without addressing learning) in a class of analog computers called dynamical automata, which are naturally implemented in connectionist networks. The result is a widely applicable method of using fractal sets to organize infinite-state computations in a bounded state space. An appealing consequence is the development of parameter-space maps, which locate various complex computers in spatial relationships to one another. An example suggests that such a global perspective on the organization of the parameter space may be helpful for solving the hard problem of getting connectionist networks to learn complex grammars from examples.
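
The fractal encoding at the heart of dynamical automata can be illustrated with a stack over k symbols packed into a single number in [0, 1): pushes are contractions toward disjoint sub-intervals of a Cantor-like set, and pops invert them. This mirrors the construction in spirit only; the paper's maps and analysis are more general, and floating-point precision bounds the usable depth in practice.

```python
K = 3  # stack alphabet: symbols 0, 1, 2

def push(x, symbol):
    # Contraction x -> (x + symbol) / K: each symbol owns a sub-interval.
    return (x + symbol) / K

def pop(x):
    # Invert the contraction: recover the top symbol and the rest of the stack.
    symbol = int(x * K)
    return (x * K - symbol, symbol)

x = 0.0
for s in [2, 0, 1]:          # push 2, then 0, then 1
    x = push(x, s)

for _ in range(3):           # pops come back in reverse order: 1, 0, 2
    x, s = pop(x)
    print(s)
```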

Book ChapterDOI
11 Sep 2000
TL;DR: The GA-based learning algorithm for context-free grammars using tabular representations is employed, and an algorithm is presented that eliminates unnecessary nonterminals and production rules using the partially structured examples at the initial stage of the GA-based learning algorithm.
Abstract: In this paper, we consider the problem of inductively learning context-free grammars from partially structured examples. A structured example is represented by a string with some parentheses inserted to indicate the shape of the derivation tree of a grammar. We show that the partially structured examples contribute to improving the efficiency of the learning algorithm. We employ the GA-based learning algorithm for context-free grammars using tabular representations which Sakakibara and Kondo have proposed previously [7], and present an algorithm to eliminate unnecessary nonterminals and production rules using the partially structured examples at the initial stage of the GA-based learning algorithm. We also show that our learning algorithm from partially structured examples can identify a context-free grammar having the intended structure and is more flexible and applicable than the learning methods from completely structured examples [5].
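
The classic reduction such a rule-elimination step can build on removes nonterminals that either generate no terminal string or are unreachable from the start symbol. The sketch below is the textbook construction, not necessarily the paper's exact procedure.

```python
def reduce_grammar(rules, start, nonterminals):
    """Drop non-generating and unreachable symbols from a CFG.

    rules: list of (lhs, rhs_tuple); symbols not in `nonterminals` are terminals.
    """
    generating = set()
    changed = True
    while changed:                   # fixed point: A is generating if some rule
        changed = False              # A -> X1..Xn has every Xi generating
        for lhs, rhs in rules:       # (terminals generate themselves)
            if lhs not in generating and all(
                s not in nonterminals or s in generating for s in rhs
            ):
                generating.add(lhs)
                changed = True
    kept = [(l, r) for l, r in rules
            if l in generating
            and all(s not in nonterminals or s in generating for s in r)]
    reachable, stack = {start}, [start]
    while stack:                     # fixed point: nonterminals reachable from S
        a = stack.pop()
        for lhs, rhs in kept:
            if lhs == a:
                for s in rhs:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        stack.append(s)
    return [(l, r) for l, r in kept if l in reachable]

rules = [("S", ("a", "A")), ("A", ("b",)),
         ("B", ("B", "b")),          # B never terminates: not generating
         ("C", ("c",))]              # C generates but is unreachable
print(reduce_grammar(rules, "S", {"S", "A", "B", "C"}))
# [('S', ('a', 'A')), ('A', ('b',))]
```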

Proceedings Article
Robert C. Moore
23 Feb 2000
TL;DR: An improved form of left-corner chart parsing for large context-free grammars is developed, introducing improvements that result in significant speed-ups compared to previously known variants of left-corner parsing.
Abstract: We develop an improved form of left-corner chart parsing for large context-free grammars, introducing improvements that result in significant speed-ups compared to previously known variants of left-corner parsing. We also compare our method to several other major parsing approaches, and find that our improved left-corner parsing method outperforms each of these across a range of grammars. Finally, we also describe a new technique for minimizing the extra information needed to efficiently recover parses from the data structures built in the course of parsing.
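
The table at the heart of left-corner parsing is the transitive closure of the "first symbol of some production" relation: the parser only predicts category A when the next word or completed constituent is a left corner of A. A minimal sketch over a toy grammar (the grammar is invented, and this shows only the table, not the chart parser built on it):

```python
def left_corners(rules):
    """lc[A] = symbols that can begin a derivation from A (transitive closure)."""
    lc = {}
    for lhs, rhs in rules:
        if rhs:
            lc.setdefault(lhs, set()).add(rhs[0])
    changed = True
    while changed:                    # close under transitivity
        changed = False
        for a in list(lc):
            for x in list(lc[a]):
                for y in lc.get(x, ()):
                    if y not in lc[a]:
                        lc[a].add(y)
                        changed = True
    return lc

rules = [("S", ("NP", "VP")), ("NP", ("Det", "N")),
         ("NP", ("Pro",)), ("Det", ("the",))]
print(left_corners(rules))
# {'S': {'NP', 'Det', 'Pro', 'the'}, 'NP': {'Det', 'Pro', 'the'}, 'Det': {'the'}}
```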

Book ChapterDOI
18 Sep 2000
TL;DR: A way of enforcing dimensional constraints through formal grammars in the GP framework through dynamic grammar pruning is presented, validated on the problem of identification of a materials response to a mechanical test.
Abstract: Application of Genetic Programming to the discovery of empirical laws is often impaired by the huge size of the domains involved. In physical applications, dimensional analysis is a powerful way to trim out the size of these spaces This paper presents a way of enforcing dimensional constraints through formal grammars in the GP framework. As one major limitation for grammar-guided GP comes from the initialization procedure (how to find admissible and sufficiently diverse trees with a limited depth), an initialization procedure based on dynamic grammar pruning is proposed. The approach is validated on the problem of identification of a materials response to a mechanical test.

Proceedings Article
Robert C. Moore
29 Apr 2000
TL;DR: This work presents a new method for removing left recursion from CFGs that is both theoretically superior to the standard algorithm, and produces very compact non-left-recursive CFGs in practice.
Abstract: A long-standing issue regarding algorithms that manipulate context-free grammars (CFGs) in a "top-down" left-to-right fashion is that left recursion can lead to nontermination. An algorithm is known that transforms any CFG into an equivalent non-left-recursive CFG, but the resulting grammars are often too large for practical use. We present a new method for removing left recursion from CFGs that is both theoretically superior to the standard algorithm, and produces very compact non-left-recursive CFGs in practice.
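
For reference, the standard transform that the paper improves on handles immediate left recursion by rewriting A -> A alpha | beta as A -> beta A' with A' -> alpha A' | eps. The sketch below implements only this textbook baseline; the paper's algorithm also treats indirect left recursion and is designed to keep the output grammar compact.

```python
def remove_immediate_left_recursion(nonterminal, rhss):
    """Textbook transform: A -> A alpha | beta  =>  A -> beta A', A' -> alpha A' | eps."""
    recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == nonterminal]
    other = [rhs for rhs in rhss if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: rhss}
    fresh = nonterminal + "'"
    return {
        nonterminal: [beta + (fresh,) for beta in other],
        fresh: [alpha + (fresh,) for alpha in recursive] + [()],  # () is epsilon
    }

# E -> E + T | T   becomes   E -> T E' ,  E' -> + T E' | eps
print(remove_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
# {'E': [('T', "E'")], "E'": [('+', 'T', "E'"), ()]}
```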

Book ChapterDOI
11 Sep 2000
TL;DR: A technique to infer finite-state transducers is proposed in this work, based on the formal relations between finite- state transducers and regular grammars.
Abstract: A technique to infer finite-state transducers is proposed in this work. This technique is based on the formal relations between finite-state transducers and regular grammars. The technique consists of: 1) building a corpus of training strings from the corpus of training pairs; 2) inferring a regular grammar and 3) transforming the grammar into a finite-state transducer.

Book ChapterDOI
TL;DR: An efficient algorithm is proposed to solve one of the problems associated with the use of weighted and stochastic Context-Free Grammars: the problem of computing the N best parse trees of a given string.
Abstract: Context-Free Grammars are the object of increasing interest in the pattern recognition research community in an attempt to overcome the limited modeling capabilities of the simpler regular grammars, and have application in a variety of fields such as language modeling, speech recognition, optical character recognition, computational biology, etc. This paper proposes an efficient algorithm to solve one of the problems associated with the use of weighted and stochastic Context-Free Grammars: the problem of computing the N best parse trees of a given string. After the best parse tree has been computed using the CYK algorithm, a large number of alternative parse trees are obtained, in order by weight (or probability), in a small fraction of the time required by the CYK algorithm to find the best parse tree. This is confirmed by experimental results using grammars from two different domains: a chromosome grammar, and a grammar modeling natural language sentences from the Wall Street Journal corpus.
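
The starting point of the N-best computation is the familiar CYK/Viterbi recursion that finds the single best parse of a string under a PCFG in Chomsky normal form; the N-best extension then keeps ranked lists of back-pointers per chart cell rather than just the best one. A compact sketch with a toy grammar (rules and probabilities invented):

```python
import math

def viterbi_cyk(words, lexical, binary, start="S"):
    """Most probable parse of `words` under a CNF PCFG (log-space Viterbi CYK)."""
    n = len(words)
    # best[i][j][A] = (log prob, backpointer) for A spanning words[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for lhs, p in lexical.get(w, []):
            best[i][i + 1][lhs] = (math.log(p), w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                for (lhs, b, c), p in binary.items():
                    if b in best[i][k] and c in best[k][j]:
                        lp = math.log(p) + best[i][k][b][0] + best[k][j][c][0]
                        if lhs not in best[i][j] or lp > best[i][j][lhs][0]:
                            best[i][j][lhs] = (lp, (k, b, c))
    return best[0][n].get(start)

lexical = {"she": [("NP", 1.0)], "eats": [("V", 1.0)], "fish": [("NP", 1.0)]}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
print(viterbi_cyk(["she", "eats", "fish"], lexical, binary))
# (0.0, (1, 'NP', 'VP'))  -- log prob 0.0 since all rules have probability 1
```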

Book ChapterDOI
25 Nov 2000
TL;DR: An efficient and scalable implementation of grammar induction based on the EMILE approach is described; it learns a subclass of the shallow context-free languages, and interesting practical results on small and large text collections are reported.
Abstract: In this paper we describe an efficient and scalable implementation for grammar induction based on the EMILE approach [2,3,4,5,6]. The current EMILE 4.1 implementation [11] is one of the first efficient grammar induction algorithms that work on free text. Although EMILE 4.1 is far from perfect, it enables researchers to do empirical grammar induction research on various types of corpora. The EMILE approach is based on notions from categorial grammar (cf. [10]), which is known to generate the class of context-free languages. EMILE learns from positive examples only (cf. [1,7,9]). We describe the algorithms underlying the approach and some interesting practical results on small and large text collections. As shown in the articles mentioned above, in the limit EMILE learns the correct grammatical structure of a language from sentences of that language. The conducted experiments show that, put into practice, EMILE 4.1 is efficient and scalable. The current implementation learns a subclass of the shallow context-free languages. This subclass seems sufficiently rich to be of practical interest. EMILE in particular seems to be a valuable tool in the context of syntactic and semantic analysis of large text corpora.
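
The categorial intuition behind EMILE can be shown in a few lines: expressions that occur in the same (left, right) contexts are candidates for the same grammatical type. The corpus below is invented, and the real system replaces this brute-force bookkeeping with scalable matrix-based clustering and rule extraction:

```python
from collections import defaultdict

sentences = [
    "the cat sleeps", "the dog sleeps",
    "the cat eats", "the dog eats",
]

contexts = defaultdict(set)   # expression -> set of (left, right) contexts
for sentence in sentences:
    words = sentence.split()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            expr = tuple(words[i:j])
            contexts[expr].add((tuple(words[:i]), tuple(words[j:])))

# Group expressions sharing an identical context set: candidate types.
types = defaultdict(list)
for expr, ctxs in contexts.items():
    types[frozenset(ctxs)].append(expr)

for exprs in types.values():
    if len(exprs) > 1:
        print(exprs)   # e.g. [('cat',), ('dog',)] and [('sleeps',), ('eats',)]
```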

Proceedings ArticleDOI
10 Sep 2000
TL;DR: The authors introduce the eXtended Positional Grammars (XPG) that enhance the descriptive power of positional grammars and present a more powerful LR based methodology for parsing visual languages described by XPGs.
Abstract: Positional grammars are a formalism for the definition and implementation of visual languages. They have already been used in the past as part of the VLCC system (Visual Language Compiler-Compiler) for the definition and the implementation of visual environments for editing and compiling flowcharts, chemical structures, combinatorial networks, electric circuits, etc. The authors introduce the eXtended Positional Grammars (XPG) that enhance the descriptive power of positional grammars. We also present a more powerful LR based methodology for parsing visual languages described by XPGs. The result is the possibility of describing and compiling a much wider class of visual languages, yet keeping most of the LR parsing efficiency.

Book ChapterDOI
11 Sep 2000
TL;DR: This paper describes a method of synthesizing context-free grammars from positive and negative sample strings, implemented in a grammatical inference system called Synapse; the method is based on incremental learning for positive samples and a rule generation method, the "inductive CYK algorithm," which generates minimal production rules required for parsing positive samples.
Abstract: This paper describes a method of synthesizing context-free grammars from positive and negative sample strings, which is implemented in a grammatical inference system called Synapse. The method is based on incremental learning for positive samples and a rule generation method by "inductive CYK algorithm," which generates minimal production rules required for parsing positive samples. Synapse can generate unambiguous grammars as well as ambiguous grammars. Some experiments showed that Synapse can synthesize several simple context-free grammars in a reasonably short time.

Proceedings ArticleDOI
17 Apr 2000
TL;DR: This paper presents an FPGA-based implementation of a co-processing unit able to parse context-free grammars of real-life sizes that can be used for programming language syntactic analysis and natural language applications where parsing speed is an important issue.
Abstract: This paper presents an FPGA-based implementation of a co-processing unit able to parse context-free grammars of real-life sizes. The application fields of such a parser range from programming language syntactic analysis to very demanding natural language applications where parsing speed is an important issue.

Proceedings Article
29 Apr 2000
TL;DR: This paper describes a method for estimating conditional probability distributions over the parses of "unification-based" grammars which can utilize auxiliary distributions that are estimated by other means, and applies this estimator to a Stochastic Lexical-Functional Grammar.
Abstract: This paper describes a method for estimating conditional probability distributions over the parses of "unification-based" grammars which can utilize auxiliary distributions that are estimated by other means. We show how this can be used to incorporate information about lexical selectional preferences gathered from other sources into Stochastic "Unification-based" Grammars (SUBGs). While we apply this estimator to a Stochastic Lexical-Functional Grammar, the method is general, and should be applicable to stochastic versions of HPSGs, categorial grammars and transformational grammars.

Book ChapterDOI
01 Sep 2000
TL;DR: A formal approach for the specification of mobile code systems is introduced, based on graph grammars, a formal description technique that is suitable for describing highly parallel systems and is intuitive even for non-theoreticians.
Abstract: In this paper we introduce a formal approach for the specification of mobile code systems. This approach is based on graph grammars, a formal description technique that is suitable for the description of highly parallel systems and is intuitive even for non-theoreticians. We define a special class of graph grammars using the concepts of object-based systems and include location information explicitly. Aspects of modularity and execution in an open environment are discussed.

Proceedings ArticleDOI
21 Dec 2000
TL;DR: Turbo recognition (TR) is a communication theory approach to the analysis of rectangular layouts, in the spirit of Document Image Decoding, in which two grammars are used simultaneously to describe structure in orthogonal (horizontal and vertical) directions.
Abstract: Turbo recognition (TR) is a communication theory approach to the analysis of rectangular layouts, in the spirit of Document Image Decoding. The TR algorithm, inspired by turbo decoding, is based on a generative model of image production, in which two grammars are used simultaneously to describe structure in orthogonal (horizontal and vertical) directions. This enables TR to strictly embody non-local constraints that cannot be taken into account by local statistical methods. This basis in finite-state grammars also allows TR to be quickly retargetable to new domains. We illustrate some of the capabilities of TR with two examples involving realistic images. While TR, like turbo decoding, is not guaranteed to recover the statistically optimal solution, we present an experiment that demonstrates its ability to produce optimal or near-optimal results on a simple yet nontrivial example, the recovery of a filled rectangle in the midst of noise. Unlike methods such as stochastic context-free grammars and exhaustive search, which are often intractable beyond small images, turbo recognition scales linearly with image size, suggesting TR as an efficient yet near-optimal approach to statistical layout analysis.

Journal ArticleDOI
01 May 2000 - Grammars
TL;DR: A generalization of context-free grammars which nonetheless still has cubic parse-time complexity is presented; the languages it defines belong to an extension of the mildly context-sensitive languages in which the constant growth property is relaxed, and can thus potentially be used in natural language processing.
Abstract: Context-free grammars and cubic parse time are so related in people's minds that they often think that parsing any extension of context-free grammars must need some extra time. Of course, this is not necessarily true and this paper presents a generalization of context-free grammars which nonetheless still has a cubic parse time complexity. This extension, which defines a subclass of context-sensitive languages, has both a theoretical and a practical interest. The class of languages defined by these grammars is closed under both intersection and complement (in fact this class contains both the intersection and the complement of context-free languages). Moreover, these languages belong to an extension of mildly context-sensitive languages in which the constant growth property is relaxed and which can thus potentially be used in natural language processing.

Journal ArticleDOI
TL;DR: Book and Otto (1993) solve a number of word problems for monadic string-rewriting systems using an elegant automata-based technique that provides a uniform solution to several elementary problems on context-free languages.

Journal ArticleDOI
TL;DR: It is proved that scattered context grammars with three nonterminals characterize the family of recursively enumerable languages.

Journal Article
TL;DR: The generating power of fuzzy context-free K-grammars is investigated, and it is shown that, under minor assumptions on the parameter K, the family of languages generated by fuzzy context-free K-grammars possesses closure properties very similar to those of the family of ordinary context-free languages.
Abstract: Motivated by aspects of robustness in parsing a context-free language, we study generalized fuzzy context-free grammars. These so-called fuzzy context-free $K$-grammars provide a very general framework to describe correctly as well as erroneously derived sentences by a single generating mechanism. They model the situation of making a finite choice out of an infinity of possible grammatical errors during each context-free derivation step. Formally, a fuzzy context-free $K$-grammar is a fuzzy context-free grammar with a countable rather than a finite number of rules satisfying the following condition: for each symbol $\alpha$, the set containing all right-hand sides of rules with left-hand side equal to $\alpha$ forms a fuzzy language that belongs to a given family $K$ of fuzzy languages. We investigate the generating power of fuzzy context-free $K$-grammars, and we show that under minor assumptions on the parameter $K$, the family of languages generated by fuzzy context-free $K$-grammars possesses closure properties very similar to those of the family of ordinary context-free languages.