
Showing papers on "Rule-based machine translation published in 1975"


Journal ArticleDOI
01 Jan 1975
TL;DR: The problem of inferring a stochastic grammar to model the behavior of an information source is introduced and techniques for carrying out the inference process are presented for a class of Stochastic finite-state and context-free grammars.
Abstract: Inference of high-dimensional grammars is discussed. Specifically, techniques for inferring tree grammars are briefly presented. The problem of inferring a stochastic grammar to model the behavior of an information source is also introduced and techniques for carrying out the inference process are presented for a class of stochastic finite-state and context-free grammars. The possible practical application of these methods is illustrated by examples.
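A minimal sketch of the stochastic side of this inference problem, assuming the derivations used for each sample string are observable (the function name and data layout are illustrative, not the paper's notation): production probabilities fall out as maximum-likelihood ratios of rule counts to left-hand-side counts.

```python
from collections import Counter

def estimate_rule_probs(derivations):
    """Maximum-likelihood estimates for stochastic CFG productions.

    `derivations` is a list of derivations, each a list of
    (lhs, rhs) production applications observed when parsing
    a sample string from the information source.
    """
    rule_counts = Counter()
    lhs_counts = Counter()
    for deriv in derivations:
        for lhs, rhs in deriv:
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    # P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

# Toy sample: S -> a S used twice, S -> a used twice
sample = [
    [("S", ("a", "S")), ("S", ("a",))],
    [("S", ("a", "S")), ("S", ("a",))],
]
probs = estimate_rule_probs(sample)
# Each rule was used half of the times S was rewritten
```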

254 citations


Journal ArticleDOI
TL;DR: It is shown how efficient LR and LL parsers can be constructed directly from certain classes of these specifications.
Abstract: Methods of describing the syntax of programming languages in ways that are more flexible and natural than conventional BNF descriptions are considered. These methods involve the use of ambiguous context-free grammars together with rules to resolve syntactic ambiguities. It is shown how efficient LR and LL parsers can be constructed directly from certain classes of these specifications.
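As a toy illustration of the idea (not the paper's LR/LL construction), precedence climbing resolves the ambiguous grammar E -> E '+' E | E '*' E | NUM with a precedence table, yielding a deterministic single-pass parser; all names here are hypothetical.

```python
PREC = {"+": 1, "*": 2}  # higher binds tighter; both left-associative

def parse(tokens):
    tree, rest = parse_expr(tokens, 0)
    assert not rest, "trailing input"
    return tree

def parse_expr(tokens, min_prec):
    # A NUM always starts an expression in this tiny grammar.
    left, tokens = tokens[0], tokens[1:]
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op, tokens = tokens[0], tokens[1:]
        # Require strictly higher precedence on the right: left associativity.
        right, tokens = parse_expr(tokens, PREC[op] + 1)
        left = (op, left, right)
    return left, tokens

# '1 + 2 * 3' parses as ('+', '1', ('*', '2', '3'))
```

The disambiguating rules live entirely in the `PREC` table; the grammar itself stays ambiguous, which mirrors the paper's point that specifications can be smaller and more natural than an equivalent unambiguous BNF.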

115 citations


Journal ArticleDOI
TL;DR: A high level and nonprocedural translation definition language, CONVERT, which provides very powerful and highly flexible data restructuring capabilities and is based on the simple underlying concept of a form which enables the users to visualize the translation processes, and thus makes data translation a much simpler task.
Abstract: This paper describes a high level and nonprocedural translation definition language, CONVERT, which provides very powerful and highly flexible data restructuring capabilities. Its design is based on the simple underlying concept of a form which enables the users to visualize the translation processes, and thus makes data translation a much simpler task. “CONVERT” has been chosen for conveying the purpose of the language and should not be confused with any other language or program bearing the same name.

108 citations


Journal ArticleDOI
TL;DR: The maximum-likelihood criterion and the minimum-distance criterion are proposed for the classification of noisy strings described by context-free grammars and classification algorithms based on a modified Cocke-Younger-Kasami parsing scheme are presented.
Abstract: A model of noise deformation of the substitution type is adopted for linguistic patterns generated by formal grammars. The maximum-likelihood criterion and the minimum-distance criterion are proposed for the classification of noisy strings described by context-free grammars. Classification algorithms based on a modified Cocke-Younger-Kasami parsing scheme are presented.
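The classical Cocke-Younger-Kasami scheme that the authors modify starts from a recognizer along these lines (a sketch assuming the grammar is in Chomsky normal form; the dictionary encoding is illustrative):

```python
def cyk_recognize(grammar, word):
    """CYK chart recognition for a grammar in Chomsky normal form.

    `grammar` maps an LHS nonterminal to a list of RHSs, each either
    a 1-tuple (terminal,) or a 2-tuple (B, C) of nonterminals.
    """
    n = len(word)
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    # Base case: spans of length 1 from terminal productions.
    for i, a in enumerate(word):
        for lhs, rhss in grammar.items():
            if (a,) in rhss:
                chart[i][i + 1].add(lhs)
    # Combine adjacent spans bottom-up.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2 and rhs[0] in chart[i][k]
                                and rhs[1] in chart[k][j]):
                            chart[i][j].add(lhs)
    return "S" in chart[0][n]

# Tiny grammar: S -> A B, A -> 'a', B -> 'b'
g = {"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}
# cyk_recognize(g, "ab") is True; cyk_recognize(g, "ba") is False
```

The paper's modification would carry probabilities or edit distances in the chart cells instead of bare nonterminal sets, so that the best (maximum-likelihood or minimum-distance) derivation of a noisy string can be read off.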

61 citations


Journal ArticleDOI
TL;DR: The methods developed in this correspondence represent an approach to the problem of handling error-corrupted syntactic pattern strings, an area generally neglected in the numerous techniques for linguistic pattern description and recognition which have been reported.
Abstract: The methods developed in this correspondence represent an approach to the problem of handling error-corrupted syntactic pattern strings, an area generally neglected in the numerous techniques for linguistic pattern description and recognition which have been reported. The basic approach consists of applying error transformations to the productions of context-free grammars in order to generate new grammars (also context-free) capable of describing not only the original error-free patterns, but also patterns containing specific types of errors such as deleted, added, and interchanged symbols which arise often in the pattern-scanning process. Theoretical developments are illustrated in the framework of a syntactic recognition system for chromosome structures.
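A rough sketch of the error-transformation idea, assuming productions are stored as tuples and only single-terminal rules are expanded (a simplification of the paper's construction):

```python
def error_grammar(grammar, alphabet):
    """For every production A -> a (a terminal), add error alternatives:
      substitution  A -> b    for each b != a
      deletion      A -> ()   (the symbol was lost during scanning)
      insertion     A -> b a  (a spurious symbol was scanned first)
    The result is still context-free and derives both the original
    strings and strings containing these error types.
    """
    new = {lhs: list(rhss) for lhs, rhss in grammar.items()}
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) == 1 and rhs[0] in alphabet:
                a = rhs[0]
                new[lhs] += [(b,) for b in alphabet if b != a]  # substitutions
                new[lhs].append(())                             # deletion
                new[lhs] += [(b, a) for b in alphabet]          # insertions
    return new

g = {"A": [("a",)]}
eg = error_grammar(g, {"a", "b"})
# eg["A"] now also derives "b", "", "ab" preceded by noise, etc.
```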

40 citations


Journal ArticleDOI
TL;DR: This paper presents a programming language designed specifically for the compact and perspicuous statement of restrictions of a natural language grammar, which embodies in its syntax and routines the relations which were found to be useful and adequate for computerized natural language analysis.
Abstract: Over the past few years, a number of systems for the computer analysis of natural language sentences have been based on augmented context-free grammars: a context-free grammar which defines a set of parse trees for a sentence, plus a group of restrictions to which a tree must conform in order to be a valid sentence analysis. As the coverage of the grammar is increased, an efficient representation becomes essential for further development. This paper presents a programming language designed specifically for the compact and perspicuous statement of restrictions of a natural language grammar. It is based on ten years' experience parsing text sentences with the comprehensive English grammar of the N.Y.U. Linguistic String Project, and embodies in its syntax and routines the relations which were found to be useful and adequate for computerized natural language analysis. The language is used in the current implementation of the Linguistic String Parser.

38 citations


Journal ArticleDOI
M. Bates1
TL;DR: This paper describes the design of the Bolt Beranek and Newman (BBN) speech parser with emphasis on the reasons for using the formalism of transition network grammars and on the interaction of the syntactic component with other parts of the system.
Abstract: When a person hears an English sentence, he uses many sources of information to assign structure and meaning to the utterance. One of these sources, syntax, is concerned with the goal of producing a consistent, meaningful, grammatical structure for the sentence. The exact type of structure produced is not as crucial as the process of building that structure because the speech environment has inherent problems which make the parsing of speech a much more complex task than the parsing of text. For example, lexical ambiguity, caused by variations in articulation and imperfect or imprecise phoneme recognition, would lead to a combinatorial explosion in conventional parsers. This paper describes the design of the Bolt Beranek and Newman (BBN) speech parser with emphasis on the reasons for using the formalism of transition network grammars and on the interaction of the syntactic component with other parts of the system. A detailed example is given to illustrate the operation of the parser.

34 citations


Journal ArticleDOI
TL;DR: A formal method of correcting errors of changed, deleted, and inserted terminals in the strings of a context-free language is considered.
Abstract: A formal method of correcting errors of changed, deleted, and inserted terminals in the strings of a context-free language is considered. Grammars generating strings containing these errors are first constructed from a known grammar for the language; these new grammars are then used to specify simple syntax-directed translation schemata which can parse both correct strings and strings with errors and simultaneously produce output strings in the original language. Stochastic aspects of productions and of errors are incorporated into the correction model to assign probabilities to translations produced.
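The minimum-distance flavor of such correction can be sketched, for a finite language, with plain edit distance over changed, deleted, and inserted terminals; the grammar-driven schemata in the paper generalize this to infinite context-free languages. The function names here are illustrative.

```python
def levenshtein(s, t):
    """Edit distance counting changed, deleted, and inserted symbols."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,             # delete a
                           cur[j - 1] + 1,          # insert b
                           prev[j - 1] + (a != b))) # change a to b
        prev = cur
    return prev[-1]

def correct(bad, language):
    """Map a corrupted string to the nearest legal string."""
    return min(language, key=lambda w: levenshtein(bad, w))

# correct("begn", ["begin", "end", "while"]) -> "begin"
```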

27 citations


Journal ArticleDOI
TL;DR: The Bounded Context Parsable Grammars are defined, a class of recursive subsets of context-free grammars for which the authors can construct linear-time parsers, and it is shown that the set of languages of the grammars thus defined properly contains the set of deterministic languages without the empty sentence.
Abstract: In this paper we extend Floyd's notion of parsing by bounded context to define the Bounded Context Parsable Grammars, a class of recursive subsets of context free grammars for which we can construct linear time parsers. It is shown that the set of languages of the grammars thus defined properly contains the set of deterministic languages without the empty sentence.

18 citations


Journal ArticleDOI
TL;DR: More errors in the strings coming out of the noisy channel can be corrected by the syntactic decoder, using syntactic analysis, than the lexicographical decoder is capable of correcting or even detecting.
Abstract: A model of a linguistic information source is proposed as a grammar that generates a language over some finite alphabet. It is pointed out that grammatical sentences generated by the source grammar contain intrinsic "redundancy" that can be exploited for error-corrections. Symbols occurring in the sentences are composed according to some syntactic rules determined by the source grammar, and hence are different in nature from the lexicographical source symbols assumed in information theory and algebraic coding theory. Almost all programming languages and some simple natural languages can be described by the linguistic source model proposed in this paper. In order to combat excessive errors for very noisy channels, a conventional encoding-decoding scheme that does not utilize the source structure is introduced into the communication system. Decoded strings coming out of the lexicographical decoder may not be grammatical, which indicates that some uncorrected errors still remain in the individual sentences and will be reprocessed by a syntactic decoder that converts ungrammatical strings into legal sentences of the source language by the maximum-likelihood criterion. Thus more errors in the strings coming out of the noisy channel can be corrected by the syntactic decoder using syntactic analysis than the lexicographical decoder is capable of correcting or even of detecting. To design the syntactic decoder we use parsing techniques from the study of compilers and formal languages.

17 citations



Journal ArticleDOI
Naomi Sager1
TL;DR: The results of an investigation into information structures in natural language science texts show that the literature of a science subfield has characteristic restrictions on language usage which can be used to develop information formats for text sentences in the subfield.
Abstract: This paper presents the results of an investigation into information structures in natural language science texts. A novel hypothesis was tested; namely, that the literature of a science subfield has characteristic restrictions on language usage which can be used to develop information formats for text sentences in the subfield. The formats provide a standard representation of the specific types of information found in sentences of subfield articles, though a priori semantic categories are not used. The method of sublanguage grammars for obtaining information formats is described. Illustrations are drawn from a sublanguage grammar written for a subfield of pharmacology. Parts of the procedure are computerized or are being implemented.

01 Apr 1975
TL;DR: The effort concentrated on integrating machine translation system modules into a completely sequenced system of programs executed during the translation process; interface modules were written for the front end of the system to enable all machine-coded texts to be normalized before input to the actual analysis process.
Abstract: The report documents results of a 12-month R and D effort in Chinese-English machine translation. The effort concentrated on the task of integrating machine translation system modules into a completely sequenced system of programs during execution of the translation process. In particular, interface modules for the front end of the system were written to enable all machine-coded texts to be normalized before input to the actual analysis process. All texts entering the system are designated by a decimal reference so that each sentence or subpart of a sentence can be cross-referenced for retrieval and additional analysis. Programs for interfacing the sentence dictionary with the text normalization programs were completely designed, but implementation had to be deferred due to funding difficulties and the resultant decrease of programming manpower. As preparation for the live test of unedited physics text material, an extensive text was prepared with its associated dictionary, which would reflect the latest updates in both the rules and the codes of the system dictionaries. More detailed analysis of the text resulted in additional rules for the several levels of the grammar. These updates still need to be incorporated into the existing rules and into the dictionaries. A considerable effort was devoted to the organization and detailed internal documentation of all data accumulated during the lifetime of the project. This includes the description of the dictionary grammar codes, the grammar rules, and the partially completed sets of the machine translation system. The full set of grammar codes is included as an appendix in this report.

Proceedings ArticleDOI
10 Jun 1975
TL;DR: A meta-symbolic simulation system that includes a powerful behavioral simulation programming language that models, generates and manipulates events in the notation of a semantic network that changes through time, and a generalized, semantics-to-surface structure generation mechanism that can describe changes in the semantic universe in the syntax of any natural language for which a grammar is supplied.
Abstract: In our efforts to model the totality of synchronic and diachronic language behavior in complex social groups, we developed a meta-symbolic simulation system that includes a powerful behavioral simulation programming language that models, generates and manipulates events in the notation of a semantic network that changes through time, and a generalized, semantics-to-surface structure generation mechanism that can describe changes in the semantic universe in the syntax of any natural language for which a grammar is supplied. Because the system is a meta-theoretical device, it can handle generative semantic grammars formulated within a variety of theoretical frameworks.

01 Jan 1975
TL;DR: The modified Earley's parsing algorithm for transition network grammars is presented and an approach to the inference of transition networks is proposed to solve the problem of noise and distortion in syntactic pattern recognition.
Abstract: Transition networks, along with a discussion of their relationships to the grammars of Chomsky's hierarchy, are studied. The modified Earley's parsing algorithm for transition network grammars is also presented. A transition network is a model for a grammar: it provides perspicuity in expression and allows efficient parsing algorithms. The stochastic and error-correcting versions of transition networks are also presented. The approach of stochastic error-correcting transition network analysis is proposed to solve the problem of noise and distortion in syntactic pattern recognition. The advantages of this approach are discussed in detail and illustrated by an experiment on a voice-chess language. Finally, an approach to the inference of transition networks is proposed. The inference of the probability assignment over the arcs of stochastic transition networks is also discussed. The inference techniques are illustrated by examples.
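A minimal recursive transition network recognizer (plain, without the stochastic or error-correcting machinery, and with an invented encoding of networks) conveys the basic model:

```python
def rtn_match(nets, name, tokens, pos):
    """Yield every input position reachable after traversing network `name`.

    `nets` maps a network name to (start_state, final_states, arcs);
    each arc is (state, label, dest) where label is either
    ('tok', terminal) or ('net', subnetwork_name).
    """
    start, finals, arcs = nets[name]
    def walk(state, pos):
        if state in finals:
            yield pos
        for s, (kind, label), dest in arcs:
            if s != state:
                continue
            if kind == "tok":
                if pos < len(tokens) and tokens[pos] == label:
                    yield from walk(dest, pos + 1)
            else:  # push into a subnetwork, resume on every way it can end
                for p in rtn_match(nets, label, tokens, pos):
                    yield from walk(dest, p)
    yield from walk(start, pos)

def rtn_accepts(nets, name, tokens):
    return any(p == len(tokens) for p in rtn_match(nets, name, tokens, 0))

# S: NP then VP; NP: 'the' 'dog'; VP: 'barks'
nets = {
    "S":  (0, {2}, [(0, ("net", "NP"), 1), (1, ("net", "VP"), 2)]),
    "NP": (0, {2}, [(0, ("tok", "the"), 1), (1, ("tok", "dog"), 2)]),
    "VP": (0, {1}, [(0, ("tok", "barks"), 1)]),
}
# rtn_accepts(nets, "S", ["the", "dog", "barks"]) is True
```

The stochastic version described in the report would attach probabilities to arcs and the error-correcting version would add substitution/deletion/insertion arcs, with the chart-style bookkeeping of Earley's algorithm replacing this naive backtracking.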

Proceedings ArticleDOI
01 Apr 1975
TL;DR: Picture construction and pattern recognition in ESP3 are described, with emphasis on picture construction; tests performed with an experimental version of ESP3 are also discussed.
Abstract: Most graphics languages are composed of a primitive set of commands which allow for the creation and manipulation of graphical objects. These commands are generally at a low level, in that each command causes one operation to be performed. Often the commands are to subroutines embedded in an algorithmic language so that the arithmetic and control features of the higher-level language may be used. ESP3 (Extended SNOBOL Picture Pattern Processor) is a new high-level graphics and pattern recognition language. ESP3 was designed in an effort to provide simple, natural, and efficient manipulation of line drawings. ESP3 differs from present graphics languages in the following ways: 1) It provides a high-level method for picture construction: the evaluation of a picture expression (analogous to the SNOBOL4 string-valued expression) causes the construction of a picture. 2) It provides extensive referencing facilities for naming and accessing points, subpictures, and attributes of pictures. 3) It provides predicates for testing attributes of and relationships among pictures and points. 4) It provides a means for defining picture patterns that describe classes of line drawings in much the same way that SNOBOL4 patterns describe classes of strings; picture pattern matching is a built-in facility. ESP3 is based on the premise that structural descriptions are an essential part of both picture construction and pattern recognition. The concept of a structural description of a picture has its origin in the linguistic approach to pattern recognition, in which formal grammars are used as a mechanism for picture description [Kirsch (1964), Narasimhan (1964, 1966, 1970), Anderson (1968), Evans (1968), Miller and Shaw (1969), Fu and Swain (1971), Shaw (1970, 1972), Chien and Ribak (1972), Thomason and Gonzalez (1975)]. Stanton (1970) described a graphics language based on linguistic pattern recognition.
ESP3 incorporates and extends many ideas from the above work, and includes all of the features of SNOBOL4 to provide a high-level graphics and pattern recognition language. Some suggested applications of ESP3 are the generation of graphical output, AI programs with imaging capabilities, pattern recognition systems, and scene analysis programs. This paper describes picture construction and pattern recognition in ESP3, with emphasis on picture construction. Some tests performed with an experimental version of ESP3 are also discussed. For a more detailed description of ESP3, see Shapiro (1974).


Journal ArticleDOI
TL;DR: A generalization of the finite state acceptors for derivation structures and for phrase structures is defined and it is proved that the set of syntactic structures of a recursively enumerable language is recursive.
Abstract: We define a generalization of the finite state acceptors for derivation structures and for phrase structures. Corresponding to the Chomsky hierarchy of grammars, there is a hierarchy of acceptors, and for both kinds of structures, the type 2 acceptors are tree automata. For i = 0, 1, 2, 3, the sets of structures recognized by the type i acceptors are just the sets of projections of the structures of the type i grammars, and the languages of the type i acceptors are just the type i languages. Finally, we prove that the set of syntactic structures of a recursively enumerable language is recursive.

01 Aug 1975
TL;DR: A system for syntactic analysis in the context of a computer system for the understanding of spontaneously spoken English is presented.
Abstract: This report presents a system for syntactic analysis in the context of a computer system for the understanding of spontaneously spoken English.


01 Feb 1975
TL;DR: Some basic definitions and results concerning array and web languages, including equivalences between generators and acceptors, and between sequential and parallel models are reviewed, and various open questions are mentioned.
Abstract: This paper reviews some basic definitions and results concerning array and web languages, including equivalences between generators and acceptors, and between sequential and parallel models. Generalizations of the basic models are also briefly discussed, and various open questions are mentioned.

01 Jan 1975
TL;DR: The Stored-Data Definition Language (SDDL) specified by the Stored-Data Definition and Translation Task Group (SDDTTG) has three main sections, which describe the logical structure, the physical structure, and the correspondence between the two, based on the DIAM String model.
Abstract: Stored-Data Definition Languages have been developed to describe both the logical and the physical characteristics of stored data. Two efforts, the Stored-Data Definition and Translation Task Group (SDDTTG) of the CODASYL Systems Committee and the Data Translation Project (DTP) at The University of Michigan, have produced Stored-Data Definition Languages. Although the translation methodologies are similar, the languages have different characteristics, influenced by their underlying models of data, reorganization, and translation capabilities. The language specified by SDDTTG (SDDL) has three main sections which describe the logical structure, the physical structure, and the correspondence between the two. The SDDL descriptions are based on the DIAM String model and are therefore very adept at describing access paths. The SDDL also allows specification of validation information -- checks that are performed on data instances during the translation process. Thus the SDDL is reformatting-oriented, low-level, precise, and a rather complete language.

01 Sep 1975
TL;DR: This analysis should ultimately benefit systems attempting to understand English input by providing surface structure to deep case structure maps using the same templates as employed by the generator.
Abstract: Natural language output can be generated from semantic nets by processing templates associated with concepts in the net. A set of verb templates is being derived from a study of the surface syntax of some 3000 English verbs: the active forms of the verbs have been classified according to subject, object(s), and complement(s); these syntactic patterns, augmented with case names, are used as a grammar to control the generation of text. This text in turn is passed through a speech synthesis program and output by a VOTRAX speech synthesizer. This analysis should ultimately benefit systems attempting to understand English input by providing surface structure to deep case structure maps using the same templates as employed by the generator.
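The template-driven generation step can be sketched roughly as follows; the template, slot notation, and case names are invented for illustration and are far simpler than the 3000-verb classification the report describes.

```python
# One hypothetical verb template: active 'give' takes an agent subject,
# a theme object, and a recipient complement.
TEMPLATES = {
    "give": ["<agent>", "gives", "<theme>", "to", "<recipient>"],
}

def generate(verb, cases):
    """Fill a verb template's case slots from a semantic-net fragment."""
    return " ".join(
        cases.get(slot[1:-1], slot) if slot.startswith("<") else slot
        for slot in TEMPLATES[verb]
    )

# generate("give", {"agent": "the parser", "theme": "a tree",
#                   "recipient": "the generator"})
# -> "the parser gives a tree to the generator"
```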

Book ChapterDOI
Roger C. Schank1
01 Jan 1975
TL;DR: This chapter discusses the conceptual approach to language processing, which is to try to figure out how humans communicate with other humans and model these processes.
Abstract: This chapter discusses the conceptual approach to language processing. Computational linguistics is defined as the problem of getting computers to communicate with humans, using natural language. The method used is to try to figure out how humans communicate with other humans and model these processes. The general problems of computational linguistics fall into the traditional domains of parsing and generating. Initial approaches to mechanical translation divided the problem of machine translation into three parts—sentence analysis, the transfer of structure, and sentence synthesis. Sentence analysis was syntactic analysis. One of the basic assumptions of machine translation work and of more current work on language analysis and synthesis routines is that an analysis grammar should be the same as a generation grammar.

Journal ArticleDOI
TL;DR: A suitable translator is described, driven principally by “paired” context-free grammars of the source and target languages but also able to accommodate context-sensitive rules.
Abstract: In certain types of experiment, the subject controls an on-line computer by giving commands in a simple source language—possibly a subset of English or of a high level computer language. The commands must then be decoded before they can be obeyed. One method is to write an ad hoc program for the specific purpose. An alternative is to write a general purpose translator to decode the source language into a more primitive target language. A suitable translator is described, driven principally by “paired” context-free grammars of the source and target languages but also able to accommodate context-sensitive rules. Using the translator has several advantages. It is obviously much easier to write an ad hoc recognizer for a very primitive language than for a subset of English. Also, for small languages it is very easy to write and check grammars; minor modifications are a trivial job, and the finished product is unlikely to contain hidden bugs. An example of the method is given.
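A drastically simplified sketch of the paired-grammar idea, using flat rules instead of full context-free grammar pairs (the rule set, the `<n>` slot notation, and the target mnemonics are invented for illustration):

```python
# Each entry pairs a source-language pattern with a target-language
# template; the slot <n> binds a number and is copied into the output.
PAIRS = [
    (["move", "<n>", "steps"], ["MOV", "<n>"]),
    (["turn", "left"],         ["ROT", "-90"]),
    (["turn", "right"],        ["ROT", "90"]),
]

def bind(pattern, tokens):
    """Return slot bindings if `tokens` matches `pattern`, else None."""
    if len(pattern) != len(tokens):
        return None
    env = {}
    for p, t in zip(pattern, tokens):
        if p == "<n>":
            if not t.isdigit():
                return None
            env[p] = t
        elif p != t:
            return None
    return env

def translate(tokens):
    """Translate one source command into the target language."""
    for src, tgt in PAIRS:
        if (match := bind(src, tokens)) is not None:
            return [match.get(t, t) for t in tgt]
    raise ValueError(f"no rule matches {tokens}")

# translate(["move", "3", "steps"]) -> ["MOV", "3"]
```

The translator in the paper would instead walk a parse tree of the source grammar and emit via the paired target production at each node, which is what makes recursion and context-sensitive rules possible.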

Book ChapterDOI
20 May 1975
TL;DR: It is shown that one can determine whether a given grammar fits another given grammar, and it is established that the containment problem for Szilard languages is decidable.
Abstract: One of the methods used for defining translations is the so-called syntax-directed translation scheme, which can be interpreted as a pair of rather similar grammars with the productions working in parallel. Because of the similarity of the grammars, each of the two grammars "fits" the other in the sense that for each derivation process in one grammar leading to a terminal word the corresponding derivation process in the other grammar also leads to a terminal word. For many practical applications it suffices to consider the case that one of the grammars fits the other, but not necessarily conversely. Investigating this idea, translations are obtained which are more powerful than the syntax-directed ones. It is shown that one can determine whether a given grammar fits another given grammar. As a by-product, it is established that the containment problem for Szilard languages is decidable.

Journal ArticleDOI
TL;DR: Syntactic operations may reveal optional processes in the child's transition from single-word utterances to grammatical usage that may be related to specific linguistic rather than general cognitive abilities.
Abstract: Recent emphasis on underlying semantic relations in the child's acquisition of grammar has left ignored those cases where syntactic operations can be observed relatively independent of semantic relations. Such operations may reveal optional processes in the child's transition from single-word utterances to grammatical usage that may be related to specific linguistic rather than general cognitive abilities.
