Author

Thomas J. Pennello

Bio: Thomas J. Pennello is an academic researcher from the University of California, Santa Cruz. His research topics include LALR parsers and parsing. He has an h-index of 5 and has co-authored 6 publications receiving 248 citations.

Papers
Journal ArticleDOI
TL;DR: Two relations that capture the essential structure of the problem of computing LALR(1) look-ahead sets are defined, and an efficient algorithm is presented to compute the sets in time linear in the size of the relations.
Abstract: Two relations that capture the essential structure of the problem of computing LALR(1) look-ahead sets are defined, and an efficient algorithm is presented to compute the sets in time linear in the size of the relations. In particular, for a PASCAL grammar, the algorithm performs fewer than 15 percent of the set unions performed by the popular compiler-compiler YACC. When a grammar is not LALR(1), the relations, represented explicitly, provide for printing user-oriented error messages that specifically indicate how the look-ahead problem arose. In addition, certain loops in the digraphs induced by these relations indicate that the grammar is not LR(k) for any k. Finally, an oft-discovered and used but incorrect look-ahead set algorithm is similarly based on two other relations defined for the first time here. The formal presentation of this algorithm should help prevent its rediscovery.
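
The linear-time computation described above is what is now commonly called the digraph algorithm: initial sets are propagated along the relation, and strongly connected components are collapsed so that every node on a cycle receives the same set. A minimal Python sketch, with our own illustrative encoding (the relation as an adjacency dict; the names are not the paper's):

```python
# Digraph algorithm sketch: given a relation R and an initial set-valued
# function F0, compute the smallest F satisfying
#     F(x) = F0(x) union the F(y) for every y with x R y.
# A Tarjan-style traversal gives all nodes in one SCC the same set.

def digraph(nodes, R, F0):
    INFINITY = float('inf')
    stack = []
    N = {x: 0 for x in nodes}          # 0 means unvisited
    F = {x: set() for x in nodes}

    def traverse(x):
        stack.append(x)
        d = len(stack)
        N[x] = d
        F[x] = set(F0.get(x, ()))
        for y in R.get(x, ()):
            if N[y] == 0:
                traverse(y)
            N[x] = min(N[x], N[y])
            F[x] |= F[y]               # one set union per edge
        if N[x] == d:                  # x is the root of its SCC
            while True:
                top = stack.pop()
                N[top] = INFINITY
                F[top] = F[x]          # whole SCC shares one set
                if top == x:
                    break

    for x in nodes:
        if N[x] == 0:
            traverse(x)
    return F
```

A nontrivial SCC found during this traversal is exactly the kind of loop the abstract mentions as evidence that the grammar is not LR(k) for any k.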

98 citations

Proceedings ArticleDOI
01 Jan 1978
TL;DR: A "forward move algorithm", together with some of its formal properties, is presented for use in a practical syntactic error recovery scheme for LR parsers, and an error recovery algorithm that uses the accumulated right context is proposed.
Abstract: A "forward move algorithm", and some of its formal properties, is presented for use in a practical syntactic error recovery scheme for LR parsers. The algorithm finds a "valid fragment" (comparable to a valid prefix) just to the right of a point of error detection. For expositional purposes the algorithm is presented as parsing arbitrarily far beyond the point of error detection in a "parallel" mode, as long as all parses agree on the read or reduce action to be taken at each parse step. In practice the forward move is achieved serially by adding "recovery states" to the LR machine. Based on the formal properties of the forward move we propose an error recovery algorithm that uses the accumulated right context. The performance of the recovery algorithm is illustrated in a specific case and discussed in general.

59 citations

Proceedings ArticleDOI
01 Jul 1986
TL;DR: LR parsers can be made to run 6 to 10 times as fast as the best table-interpretive LR parsers, and a factor of 2 to 4 increase in total table size can be expected, depending upon whether syntactic error recovery is required.
Abstract: LR parsers can be made to run 6 to 10 times as fast as the best table-interpretive LR parsers. The resulting parse time is negligible compared to the time required by the remainder of a typical compiler containing the parser. A parsing speed of 1/2 million lines per minute on a computer similar to a VAX 11/780 was achieved, up from an interpretive speed of 40,000 lines per minute. A speed of 240,000 lines per minute on an Intel 80286 was achieved, up from an interpretive speed of 37,000 lines per minute. The improvement is obtained by translating the parser's finite state control into assembly language. States become code memory addresses. The current input symbol resides in a register and a quick sequence of register-constant comparisons determines the next state, which is merely jumped to. The parser's push-down stack is implemented directly on a hardware stack. The stack contains code memory addresses rather than the traditional state numbers. The strongly-connected components of the directed graph induced by the parser's terminal and nonterminal transitions are examined to determine a typically small subset of the states that require parse-time stack-overflow-check code when hardware does not provide the check automatically. The increase in speed is at the expense of space: a factor of 2 to 4 increase in total table size can be expected, depending upon whether syntactic error recovery is required.
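
The paper compiles states to assembly; the flavor of "states become code addresses" can be sketched in Python via the closely related recursive-ascent formulation, where each state is a function and the call stack plays the role of the hardware stack. The grammar E -> E '+' n | n, the state names, and the encoding are our illustrative choices, not the paper's:

```python
# Recursive-ascent sketch of a directly executed LR parser for
#     E -> E '+' n | n
# Each state is a function; shifting calls the next state's function, and a
# reduction of a rule with k right-hand-side symbols unwinds k call frames
# (each frame decrements the pop count), after which the exposed state
# performs the goto on the reduced nonterminal.

def parse(tokens):
    toks = list(tokens) + ['$']
    pos = 0

    def next_tok():
        nonlocal pos
        t = toks[pos]
        pos += 1
        return t

    def s0():
        # state 0: shift 'n'; goto state 1 on E
        if next_tok() != 'n':
            raise SyntaxError('expected n')
        nt, pops = s2()
        while True:
            assert nt == 'E' and pops == 0
            r = s1()                   # goto on E
            if r == 'accept':
                return True
            nt, pops = r

    def s1():
        # state 1: accept on '$'; shift '+'
        t = next_tok()
        if t == '$':
            return 'accept'
        if t != '+':
            raise SyntaxError('expected + or end of input')
        nt, pops = s3()
        return (nt, pops - 1)          # keep unwinding toward the goto frame

    def s3():
        # state 3: shift 'n'
        if next_tok() != 'n':
            raise SyntaxError('expected n')
        nt, pops = s4()
        return (nt, pops - 1)

    def s2():
        return ('E', 0)                # reduce E -> n (this frame is the one pop)

    def s4():
        return ('E', 2)                # reduce E -> E '+' n (three symbols)

    return s0()
```

In the assembly version described by the abstract, the explicit stack of code addresses replaces this call stack, and the "quick sequence of register-constant comparisons" replaces the if-chains on the current token.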

55 citations

01 Jan 1979
TL;DR: Two relations are defined that capture the essential structure of the problem of computing LALR(1) look-ahead sets, and an efficient algorithm is presented to compute the sets in time linear in the size of the relations.
Abstract: We define two relations that capture the essential structure of the problem of computing LALR(1) look-ahead sets, and present an efficient algorithm to compute the sets in time linear in the size of the relations. In particular, for a PASCAL grammar, our algorithm performs fewer than 20% of the set unions performed by the popular compiler-compiler YACC.

20 citations

Proceedings ArticleDOI
01 Aug 1979
TL;DR: In this paper, the authors define two relations that capture the essential structure of the problem of computing LALR(1) look-ahead sets, and present an efficient algorithm to compute the sets in time linear in the size of the relations.
Abstract: We define two relations that capture the essential structure of the problem of computing LALR(1) look-ahead sets, and present an efficient algorithm to compute the sets in time linear in the size of the relations. In particular, for a PASCAL grammar, our algorithm performs fewer than 20% of the set unions performed by the popular compiler-compiler YACC.

16 citations


Cited by
Book
01 Jan 1993
TL;DR: This book presents the theory and practice of partial evaluation, covering flow chart, functional, and logic languages, a self-applicable Scheme specializer, and a guide to the literature.
Abstract: Functions, types and expressions; programming languages and their operational semantics; compilation; partial evaluation of a flow chart language; partial evaluation of a first-order functional language; the view from Olympus; partial evaluation of the lambda calculus; partial evaluation of Prolog; aspects of Similix, a partial evaluator for a subset of Scheme; partial evaluation of C; applications of partial evaluation; termination of partial evaluation; program analysis; more general program transformation; guide to the literature; the self-applicable Scheme specializer.
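
A toy illustration of the book's subject (the example is ours, not from the book): partial evaluation specializes a program with respect to the part of its input that is known statically. Specializing power(base, n) with the exponent known executes the loop at specialization time, leaving a residual program with no loop at all:

```python
# General program: both inputs dynamic.
def power(base, n):
    result = 1
    for _ in range(n):
        result *= base
    return result

# "Mix"-style specializer sketch: the static input n is consumed at
# specialization time by generating residual code.
def specialize_power(n):
    body = ' * '.join(['base'] * n) if n > 0 else '1'
    src = f'def power_{n}(base):\n    return {body}\n'
    env = {}
    exec(src, env)                 # compile the residual program
    return env[f'power_{n}']

power_3 = specialize_power(3)      # residual: def power_3(base): return base * base * base
```

A self-applicable specializer, as in the book's Scheme specializer, is one that can be applied to its own text, which is what makes automatic compiler generation by the Futamura projections possible.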

1,549 citations

Book ChapterDOI
Alfred V. Aho1
02 Jan 1991
TL;DR: This chapter discusses algorithms for solving string-matching problems that have proven useful in text-editing and text-processing applications. Several innovative, theoretically interesting algorithms have been devised that run significantly faster than the obvious brute-force method.
Abstract: Publisher Summary This chapter discusses the algorithms for solving string-matching problems that have proven useful for text-editing and text-processing applications. String pattern matching is an important problem that occurs in many areas of science and information processing. In computing, it occurs naturally as part of data processing, text editing, term rewriting, lexical analysis, and information retrieval. Many text editors and programming languages have facilities for matching strings. In biology, string-matching problems arise in the analysis of nucleic acids and protein sequences, and in the investigation of molecular phylogeny. String matching is also one of the central and most widely studied problems in theoretical computer science. The simplest form of the problem is to locate an occurrence of a keyword as a substring in a sequence of characters, which is called the input string. For example, the input string queueing contains the keyword ueuei as a substring. Even for this problem, several innovative, theoretically interesting algorithms have been devised that run significantly faster than the obvious brute-force method.
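
The brute-force method the chapter contrasts with faster algorithms tries every alignment of the keyword against the input string; algorithms such as Knuth-Morris-Pratt avoid re-examining characters by precomputing a failure function. A sketch of both (our code, not the chapter's):

```python
# Brute force: O(n * m) in the worst case.
def naive_find(text, keyword):
    n, m = len(text), len(keyword)
    for i in range(n - m + 1):
        if text[i:i + m] == keyword:
            return i
    return -1

# Knuth-Morris-Pratt: O(n + m) via a failure function that records, for each
# prefix of the keyword, the length of its longest proper border.
def kmp_find(text, keyword):
    if not keyword:
        return 0
    fail = [0] * len(keyword)
    k = 0
    for i in range(1, len(keyword)):
        while k and keyword[i] != keyword[k]:
            k = fail[k - 1]
        if keyword[i] == keyword[k]:
            k += 1
        fail[i] = k
    k = 0
    for i, c in enumerate(text):
        while k and c != keyword[k]:
            k = fail[k - 1]
        if c == keyword[k]:
            k += 1
            if k == len(keyword):
                return i - k + 1       # start index of the match
    return -1
```

On the abstract's example, both locate the keyword "ueuei" inside the input string "queueing" at index 1.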

413 citations

Journal Article
TL;DR: The construction of a very wide-coverage probabilistic parsing system for natural language (NL) based on LR parsing techniques is described; the system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis.
Abstract: We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an LR parse table from a unification-based grammar formalism, and consider the suitability of alternative LALR(1) parse table construction methods for large grammars. The parse table is used as the basis for two parsers: a user-driven interactive system that provides a computationally tractable and labor-efficient method of supervised training of the statistical information required to drive the probabilistic parser. The latter is constructed by associating probabilities with the LR parse table directly. This technique is superior to parsers based on probabilistic lexical tagging or probabilistic context-free grammar because it allows for a more context-dependent probabilistic language model, as well as use of a more linguistically adequate grammar formalism. We compare the performance of an optimized variant of Tomita's (1987) generalized LR parsing algorithm to an (efficiently indexed and optimized) chart parser. We report promising results of a pilot study training on 150 noun definitions from the Longman Dictionary of Contemporary English (LDOCE) and retesting on these plus a further 55 definitions. Finally, we discuss limitations of the current system and possible extensions to deal with lexical (syntactic and semantic) frequency of occurrence.

256 citations

Journal ArticleDOI
TL;DR: An algorithm is presented that solves the problem in time O(MN), where M and N are the lengths of A and R; it requires only O(N) space to deliver just the score of the best alignment, and is superior to an earlier algorithm by Wagner and Seiferas.
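
The paper concerns alignment against a regular expression, but the O(N)-space, score-only idea can be illustrated on the simpler case of aligning two plain strings: the dynamic-programming table is filled row by row, and only the previous row is retained, so the score (though not the alignment itself) comes out in linear space. A sketch under that simplification (unit costs are our illustrative choice):

```python
# Unit-cost edit distance between strings a and b in O(len(a) * len(b)) time
# and O(len(b)) space: keep only the previous row of the DP table.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))     # row 0: distance from "" to b[:j]
    for i, ca in enumerate(a, 1):
        curr = [i]                     # column 0: distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete ca
                            curr[j - 1] + 1,          # insert cb
                            prev[j - 1] + (ca != cb)  # match or substitute
                            ))
        prev = curr
    return prev[-1]
```

Recovering the best alignment itself, rather than just its score, is what normally forces the full table (or a divide-and-conquer refinement) to be kept.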

193 citations

Journal ArticleDOI
Gene Myers1
TL;DR: This work places a new worst-case upper bound on regular expression pattern matching using a combination of the node-listing and “Four-Russians” paradigms and provides an implementation that is faster than existing software for small regular expressions.
Abstract: Given a regular expression R of length P and a word A of length N, the membership problem is to determine if A is in the language denoted by R. An O(PN/lg N) time algorithm is presented that is based on a lg N speedup of the standard O(PN) time simulation of R's nondeterministic finite automaton on A, using a combination of the node-listing and "Four-Russians" paradigms. This result places a new worst-case upper bound on regular expression pattern matching. Moreover, in practice the method provides an implementation that is faster than existing software for small regular expressions.

135 citations