Showing papers presented at the "International ACM SIGIR Conference on Research and Development in Information Retrieval" in 1971


Proceedings ArticleDOI
01 Apr 1971
TL;DR: In addition to the storage and retrieval algorithms, the deletion problem is solved, and the programming approaches involved yield a non-trivial case study of list-processing techniques.
Abstract: A storage and retrieval scheme which places items to be stored at the nodes of a binary tree is discussed. The tree is always balanced in a certain sense, thus ensuring that no excessively long search paths can exist. In addition to presenting the storage and retrieval algorithms, the deletion problem is also solved. The programming approaches involved yield a non-trivial case study of list-processing techniques. Finally, a cost analysis is given.
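
The abstract leaves the balancing criterion abstract ("balanced in a certain sense"). As one concrete, hedged illustration, the Python sketch below assumes an AVL-style height balance, the classic scheme that bounds every search path at O(log n); the names and structure are assumptions for illustration, not the paper's own algorithms. Deletion, which the paper emphasizes, applies the same rebalancing on the way back up the tree.

```python
# Hedged sketch of a height-balanced binary search tree (AVL-style balance
# assumed; the paper does not specify its balancing rule).

class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None
        self.height = 1

def height(n):
    return n.height if n else 0

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y); update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x); update(y)
    return y

def rebalance(n):
    update(n)
    bf = height(n.left) - height(n.right)
    if bf > 1:                                   # left-heavy
        if height(n.left.left) < height(n.left.right):
            n.left = rotate_left(n.left)         # left-right case
        return rotate_right(n)
    if bf < -1:                                  # right-heavy
        if height(n.right.right) < height(n.right.left):
            n.right = rotate_right(n.right)      # right-left case
        return rotate_left(n)
    return n

def insert(root, key, value):
    """Insert, then rebalance on the way back up, keeping paths O(log n)."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        root.value = value                       # update in place
    return rebalance(root)
```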

21 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: An overview of research in progress in which a natural-language compiler has been constructed that accepts sentences in a user-extendable English subset, produces surface and deep-structure syntactic analyses, and uses a network of concepts to construct semantic interpretations formalized as computable procedures.
Abstract: This paper presents an overview of research in progress in which the principal aim is the achievement of more natural and expressive modes of on-line communication with complexly structured data bases. A natural-language compiler has been constructed that accepts sentences in a user-extendable English subset, produces surface and deep-structure syntactic analyses, and uses a network of concepts to construct semantic interpretations formalized as computable procedures. The procedures are evaluated by a data management system that updates, modifies, and searches data bases that can be formalized as finite models of states of affairs. The system has been designed and programmed to handle large vocabularies and large collections of facts efficiently. Plans for extending the research vehicle to interface with a deductive inference component and a voice input-output effort are briefly described.

20 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: A storage and retrieval system utilizing the combinatorial file structure is developed and the new file organization is shown to have marked value with respect to minimum storage overhead and high retrieval speed.
Abstract: A file structure designed to provide rapid, random access with minimum storage overhead is presented. Storage and retrieval are achieved by direct attribute combination-to-address transformation, thereby eliminating the need for large file dictionaries or list-pointer structures. The attribute combination-to-address transformation is conceptually similar to key-to-address transformation techniques, but the transformation is not limited to operations on a single key: it operates on the combination of several independent keys (or any subset of the combination) describing an item or request. A storage and retrieval system utilizing the combinatorial file structure is developed. Storage and retrieval results derived from a simulated document library of 4000 items are presented. The new file organization is shown to have marked value with respect to minimum storage overhead and high retrieval speed.
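
One concrete (assumed) way to realize the attribute-combination-to-address transformation is partitioned hashing: each attribute hashes into its own field of the bucket address, so a full attribute combination yields a single address, while a request specifying only a subset of the attributes yields a small, enumerable set of candidate buckets. The field widths and hash function below are illustrative choices, not the paper's parameters.

```python
# Sketch of a partitioned (combinatorial) hash address. Each attribute owns
# a fixed field of address bits; unspecified attributes in a request expand
# into all values of their field. All names/widths here are assumptions.
from itertools import product

FIELD_BITS = [4, 4, 4]            # 12-bit addresses: 4096 buckets

def field_hash(value, bits):
    # Python's hash() is stable within one run, which suffices for a sketch.
    return hash(value) % (1 << bits)

def address(attrs):
    """Full attribute combination -> one bucket address."""
    addr = 0
    for value, bits in zip(attrs, FIELD_BITS):
        addr = (addr << bits) | field_hash(value, bits)
    return addr

def candidate_addresses(partial):
    """Request with unspecified attributes (None) -> all candidate buckets."""
    choices = [range(1 << bits) if v is None else [field_hash(v, bits)]
               for v, bits in zip(partial, FIELD_BITS)]
    for fields in product(*choices):
        addr = 0
        for f, bits in zip(fields, FIELD_BITS):
            addr = (addr << bits) | f
        yield addr

print(address(("chemistry", "1969", "english")))                  # one bucket
print(len(list(candidate_addresses(("chemistry", None, None)))))  # 256 of 4096
```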

17 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: A full text retrieval system was designed for the responsa literature, which is a large corpus of Hebrew legal cases, and several methods were developed, among them "grammatical synthesis", which synthesizes all grammatical variants of a given keyword.
Abstract: A full text retrieval system was designed for the responsa literature, a large corpus of Hebrew legal cases. The unique problems of the data base (a mixture of Hebrew, Aramaic and vernaculars, lack of vowels and punctuation, extreme language inflection, homographs, and the existence of thousands of grammatical variants of any given keyword) dictated the development of new methods. Among them we list "grammatical synthesis", which synthesizes all grammatical variants of a given keyword; "Compact KWIC", which enables the user to have a glimpse of the nature of the search before having performed it; an effective citation index embedded in full-text searches; and, in general, extensive use of both positive and negative feedback within a single search run. A number of searches performed on a relatively small data base gave a recall of 100% in each case. The average precision was 34%. A KWIC of strategic portions of retrieved documents usually enables quick disposal of non-relevant material.
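
To make "grammatical synthesis" concrete, a toy sketch of the expansion step is given below: a keyword is multiplied into affixed variants before the full-text search, so all inflected forms match. The transliterated affix lists are placeholders invented for illustration and bear no resemblance to the system's actual Hebrew morphology tables.

```python
# Toy illustration of query-side grammatical synthesis: expand one keyword
# into affixed variants for exact matching in unvocalized full text.
# PREFIXES/SUFFIXES are invented placeholders, not real morphology.

PREFIXES = ["", "ve", "ha", "le", "she"]    # stand-ins for clitic prefixes
SUFFIXES = ["", "im", "ot", "ei"]           # stand-ins for inflection endings

def synthesize_variants(root):
    """All prefix+root+suffix combinations for search expansion."""
    return {p + root + s for p in PREFIXES for s in SUFFIXES}

# A search for one (hypothetical, transliterated) root becomes an OR over
# every synthesized variant:
query_terms = synthesize_variants("melech")
print(len(query_terms), sorted(query_terms)[:5])
```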

15 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: This paper is a survey of some of the major semantic models that have been developed for automated semantic analysis of natural language, and concludes that the models described are significant contributions to semantics, a field still largely unexplored.
Abstract: This paper is a survey of some of the major semantic models that have been developed for automated semantic analysis of natural language. Current approaches to semantic analysis and logical inference are based mainly on models of human cognitive processes such as Quillian's semantic memory, Simmons' Protosynthex III and others. All existing systems and/or models, more or less experimental, were applied to a small subset of English. They are highly tentative because the definitions of semantic processes and semantically structured lexicons are not formulated rigorously. This is due mainly to the fact that it is unknown whether a unique, consistent hierarchization of the semantic features of language is possible. However, the models described are significant contributions to semantics, a field still largely unexplored. The progressive development of a sophisticated, semantically based system for automated processing of natural language is a realistic goal. It should not be neglected, despite the fact that it is difficult to predict when this goal will be achieved.

14 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: The efficiency of processing natural language in REL English is achieved both by the detailed syntactic aspects which are incorporated into the REL English grammar, and by means of the particular implementation for processing features in the parsing algorithm.
Abstract: Ambiguity is a pervasive and important aspect of natural language. Ambiguities, which are disambiguated by context, contribute powerfully to the expressiveness of natural language as compared to formal languages. In computational systems using natural language, the problems of properly controlling ambiguity are particularly large, partly because of the need to circumvent redundant parsings arising from multiple orderings in the application of rules. Features, that is, subcategorizations of parts of speech, constitute an effective means for controlling syntactic ambiguity through ordering the hierarchical organization of syntactic constituents. This is the solution adopted for controlling ambiguity in REL English, which is part of the REL (Rapidly Extensible Language) System. REL is a total software system for facilitating man/machine communications. The efficiency of processing natural language in REL English is achieved both by the detailed syntactic aspects which are incorporated into the REL English grammar, and by the particular implementation for processing features in the parsing algorithm.
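
A minimal sketch of the mechanism described: a rule application succeeds only when the constituents' features agree, so subcategorization blocks parses that pure category matching would admit. The rule format and feature names below are invented for illustration; REL English's actual grammar and parser are considerably richer.

```python
# Feature-checked rule application (illustrative; not REL's formalism).
# A constituent is a (category, features) pair.

NP_SG = ("NP", {"num": "sg"})
NP_PL = ("NP", {"num": "pl"})
VP_SG = ("VP", {"num": "sg"})

def compatible(acc, new):
    """Shared features must agree; unmentioned features are unconstrained."""
    return all(new.get(k, v) == v for k, v in acc.items())

def apply_rule(lhs, rhs_cats, constituents):
    if [c for c, _ in constituents] != rhs_cats:
        return None                       # category mismatch
    feats = {}
    for _, f in constituents:
        if not compatible(feats, f):
            return None                   # feature clash: parse pruned
        feats.update(f)
    return (lhs, feats)

# "S -> NP VP" goes through only when number features agree:
print(apply_rule("S", ["NP", "VP"], [NP_SG, VP_SG]))  # ('S', {'num': 'sg'})
print(apply_rule("S", ["NP", "VP"], [NP_PL, VP_SG]))  # None (pruned)
```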

13 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: The application of compact coding, differencing and other techniques to indexed sequential files is discussed; the effects on system performance are examined, and reductions of almost 80% in mass storage requirements for a particular file are reported.
Abstract: The application of compact coding, differencing and other techniques to indexed sequential files is discussed. The effects on system performance are examined, and reductions of almost 80% in mass storage requirements for a particular file are reported.
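
One standard instance of the differencing mentioned is front compression of the sorted keys in an index: each key is stored as a shared-prefix count plus its distinct tail. The sketch below shows the general technique under that assumption; the paper's exact coding scheme is not reproduced here.

```python
# Front compression (key differencing) for a sorted index, as one plausible
# instance of the paper's "differencing"; details are assumptions.

def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def compress(sorted_keys):
    out, prev = [], ""
    for k in sorted_keys:
        n = common_prefix_len(prev, k)
        out.append((n, k[n:]))        # (shared-prefix length, distinct tail)
        prev = k
    return out

def decompress(pairs):
    keys, prev = [], ""
    for n, tail in pairs:
        prev = prev[:n] + tail
        keys.append(prev)
    return keys

keys = ["retrieval", "retrieve", "retrieved", "storage"]
packed = compress(keys)
print(packed)          # [(0, 'retrieval'), (7, 'e'), (8, 'd'), (0, 'storage')]
assert decompress(packed) == keys
```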

8 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: Questions which involve 'all', 'every', 'some', or the indefinite article pose peculiar problems when presented to a computerized question-answering system, where ambiguities cannot be tolerated; these problems can be solved by introducing a new kind of quantification that has the meaning of 'all F' together with a secondary meaning that the class F is not empty.
Abstract: Questions which involve 'all', 'every', 'some', or the indefinite article pose some peculiar problems when presented to a computerized question-answering system, where ambiguities cannot be tolerated. These problems range from the nature of the correct answer in special cases to the very admissibility of the question itself. To deal with these problems it is convenient to divide questions into two classes: extensional questions, whose answers name things or truth values, and intensional questions, whose answers give meanings. This paper examines extensional questions. For these, the interpretative problems arising with 'all' and 'every' can be solved by introducing a new kind of quantification, extensional universal quantification, which has the meaning of 'all F' together with a secondary meaning that the class F is not empty. Formal rules for this quantification are given, and it is shown that the so-called definite formulas (which explicate permissible queries) are closed under the new operator.
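
The new quantifier admits a short worked rendering (the notation below is assumed; the paper's own symbolism may differ): extensional universal quantification bundles the usual universal conditional with the non-emptiness of the restricting class.

```latex
% Assumed notation: \forall^{e} marks extensional universal quantification.
\[
  (\forall^{e} x)\,\bigl(F(x) \supset G(x)\bigr)
  \;\equiv\;
  (\exists x)\,F(x) \;\wedge\; (\forall x)\,\bigl(F(x) \supset G(x)\bigr)
\]
```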

7 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: It is shown that the decision problem for several classes of proper formulas is solvable, and that the analogues of Kuhns' conjecture for these classes are false.
Abstract: The Relational Data File (RDF) of The Rand Corporation is among the most developed of question-answering systems. The "information language" of this system is an applied predicate calculus. The atomic units of information are binary relational sentences. The system has an inference-making capacity. As part of the actual construction and implementation of the RDF, a theory was developed by J. L. Kuhns to identify those formulas of the predicate calculus which represent the "reasonable" inquiries to put to this system. Accordingly, the classes of definite and proper formulas were defined, and their properties studied. The definite formulas share a semantic property Kuhns judged as necessarily possessed by a reasonable question to be processed by the RDF. The author has previously shown that the decision problem for the class of definite formulas is recursively unsolvable. The proper formulas are definite, and satisfy additional syntactic conditions intended to make them especially suitable for machine processing. The class of proper formulas depends on which logical primitives are employed. Different primitives give rise to different classes of formulas. A formula which can be effectively transformed into a proper equivalent is admissible. Kuhns conjectures that with respect to one particular class of proper formulas, all definite formulas are admissible. This paper shows that the decision problem for several classes of proper formulas is solvable. The following results are established. Theorem 1: The class of proper formulas in prenex form on any complete set of connectives is recursive. Theorem 2: The class of proper formulas on ¬, ∨, ∃ is recursive. Theorem 3: The class of proper formulas on ¬, ⊃, ∃ is recursive. Theorem 4: The class of proper formulas on ¬, ⊃, ∨, ∃ is recursive. Thus, there is a mechanical decision procedure which determines whether an arbitrary formula is a member of each class. It follows that the analogues of Kuhns' conjecture for these classes are false.

6 citations


Proceedings ArticleDOI
A. M. Katcher
01 Apr 1971
TL;DR: The ideas presented have been implemented on a version of the TSS/360 Time Sharing System and are presently being used in a real environment; the overall compaction rate achieved was 3.16 to 1.
Abstract: The public storage in any time sharing system tends to grow continually. This necessitates measures to maintain public storage. One such measure is the creation of an archival level of storage called "migrated" storage. Data that have not been referenced recently are moved or "migrated" to a less accessible level of external storage. Since these data are not accessed by the users directly, i.e., the data must be restored to public storage before being used, a variable-length coding technique, viz., Huffman coding, is used to compact and store these data. The ideas presented have been implemented on a version of the TSS/360 Time Sharing System and are presently being used in a real environment. The overall compaction rate achieved was 3.16 to 1. Further details on compaction rates and timings are also presented.
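
For reference, here is a compact Python sketch of the Huffman construction the paper applies to migrated data; it is a generic textbook rendering, not the TSS/360 implementation. Frequent byte values receive short codewords, which is what makes compaction rates like the reported 3.16 to 1 possible on skewed data.

```python
# Generic Huffman code construction (illustrative; not the paper's code).
import heapq
from collections import Counter

def huffman_code(data):
    """Build a prefix code: frequent symbols get short codewords."""
    freq = Counter(data)
    # Heap entries carry a unique tiebreaker so the dicts are never compared.
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)     # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "data that have not been referenced recently are migrated"
code = huffman_code(text)
bits = sum(len(code[ch]) for ch in text)
print(f"{len(text) * 8} bits raw -> {bits} bits coded")
```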

Proceedings ArticleDOI
01 Apr 1971
TL;DR: An experimental model for CUE, Proto-RELADES, which can "understand" and execute English sentences about the content of the library at IBM's Boston Programming Center, is described.
Abstract: CUE, an input interface system which permits the computer to utilize natural but restricted English as input, is presented. In addition, an experimental model for CUE, Proto-RELADES, which can "understand" and execute English sentences about the content of the library at IBM's Boston Programming Center, is described. These sentences can be query, command, or conditional sentences. The linguistic component of the system is based on a transformational grammar of English that performs a full syntactic and semantic analysis of each input sentence and translates it into relevant computer operations. The capabilities and limitations of this system are described.

Proceedings ArticleDOI
C. P. Wang, V. Y. Lum
01 Apr 1971
TL;DR: This paper utilizes the FOREM model as the principal tool and presents a hypothetical design example dealing with many essential issues of the design process, so that the possible tradeoffs can be identified.
Abstract: The design of a file system has never been a simple or straightforward task because of its complexity. Heuristics and experience still play a major role in guiding the design process. To organize the entire design process in a more systematic manner, large scale simulation has proved to be an effective technique. The FOREM models developed during the past several years (specifically for the evaluation of file system designs) represent facilities of this type. This paper utilizes the FOREM model as the principal tool and presents a hypothetical design example dealing with many essential issues of the design process. Evaluations of the designs of several other actual file systems are also being carried out and will be reported at a later date. Only through quantitative evaluation can each design decision be arrived at correctly and the possible tradeoffs be identified.

Proceedings ArticleDOI
01 Apr 1971
TL;DR: Early developments and the status of recent efforts in document retrieval, question-answering and data management systems are reviewed briefly.
Abstract: An introduction and some perspectives are provided for the 1971 ACM Information Storage and Retrieval Symposium held at the University of Maryland on April 1 and 2, 1971. The symposium, sponsored by the University of Maryland, the National Aeronautics and Space Administration and the Special Interest Group on Information Retrieval (SIGIR) of the ACM, focuses on advances in techniques in the computer oriented technology of information retrieval. Early developments and the status of recent efforts in document retrieval, question-answering and data management systems are reviewed briefly.

Proceedings ArticleDOI
01 Apr 1971
TL;DR: The design and implementation of a general associative net structure to be used in an interactive information system are described and a scheme designed to manage large quantities of semantic data stored in a data base on disc is presented.
Abstract: This paper describes the design and implementation of a general associative net structure to be used in an interactive information system, and presents a scheme designed to manage large quantities of semantic data stored in a data base on disc. The associative-net-structured data base is functionally divided into two pools: the hierarchy pool and the linguistic pool. The network of items in the hierarchy pool represents the descriptive information about documents and the network of items in the linguistic pool represents the syntactic and semantic properties of the items in the hierarchy pool. Two search functions and a general search algorithm are presented in this paper. In the implementation, the data base is a regional data set on disc. Items and their associated labeled links are stored on disc tracks. The system establishes a directory to keep track of the items which have associated information stored on more than one track. The use of the directory eliminates unnecessary disc accesses and allows the system to move a proper track into core storage for data processing.
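
The directory's role can be sketched schematically: only items whose links spill over several tracks get a directory entry, so each access loads just the tracks that can actually hold the item's links. Everything below (names, layout, the dict standing in for a core-resident track image) is an assumption for illustration.

```python
# Schematic sketch of the track directory (all structures are stand-ins).

TRACKS = {   # track number -> {item: [(link label, target item)]}
    0: {"doc17": [("author", "smith")]},
    1: {"doc17": [("topic", "retrieval"), ("topic", "indexing")]},
    2: {"smith": [("wrote", "doc17")]},
}
DIRECTORY = {"doc17": [0, 1]}   # only multi-track items are listed

core = {}                       # tracks currently held in core storage

def load_track(t):
    if t not in core:
        core[t] = TRACKS[t]     # stands in for one disc access
    return core[t]

def links_of(item, home_track):
    """Consult the directory so no track is read that cannot hold the item."""
    tracks = DIRECTORY.get(item, [home_track])
    links = []
    for t in tracks:
        links.extend(load_track(t).get(item, []))
    return links

print(links_of("doc17", 0))     # gathered from tracks 0 and 1
print(links_of("smith", 2))     # single-track item: one access
```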

Proceedings ArticleDOI
01 Apr 1971
TL;DR: The ISDOS project is outlined as a coordinated approach which may serve as the beginning of a satisfactory structure: a language and terminology for communication, a model of information processing systems, and tools for studying and analyzing problems related to such systems.
Abstract: Research in file organization is a problem area in which the unstructured efforts of many individual researchers have not produced results commensurate with the effort expended. The paper briefly examines some of the reasons for this and suggests a structure consisting of a language and terminology for communication, a model of information processing systems, and tools for studying and analyzing problems related to such systems. The ISDOS project is outlined as a coordinated approach which may serve as the beginning in the development of a satisfactory structure.

Proceedings ArticleDOI
01 Apr 1971
TL;DR: A set of basic operations on types of files is defined, intended to fulfill the same role for information retrieval systems programmers that functions such as LOG(X) fill for mathematical applications programmers: they should make the job very much easier.
Abstract: One of the difficulties faced in implementing information management and retrieval systems is that each case seems to present its own special complexities. As a result, information retrieval systems typically fall behind their programming schedules and have many bugs when delivered. In this paper a set of basic operations on types of files is defined. These operations are intended to fulfill the same role for information retrieval systems programmers that functions such as LOG(X) fill for mathematical applications programmers: they should make the job very much easier. The file operations have been implemented as a run-time package written in FORTRAN IV and Burroughs Extended Algol. The approach has been used to develop three different information management systems: an APL interactive computing system, a generalized information retrieval system, and a specialized information retrieval system for map-oriented data. These systems are described.
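
As a rough picture of what such a package looks like to its callers, here is a hypothetical set of primitive file verbs that a retrieval system would compose, much as numeric programs compose LOG(X). The verbs and their signatures are invented for illustration; the paper's actual operation set is not reproduced.

```python
# Hypothetical primitive file operations (an illustrative stand-in for the
# paper's run-time package, which was written in FORTRAN IV and Algol).

class SequentialFile:
    """Minimal in-core stand-in for a record file."""
    def __init__(self):
        self.records = []

    def insert(self, record):
        self.records.append(record)

    def select(self, predicate):
        return [r for r in self.records if predicate(r)]

    def delete(self, predicate):
        self.records = [r for r in self.records if not predicate(r)]

    def sort(self, key):
        self.records.sort(key=key)

# An application composes the verbs rather than re-coding file handling:
f = SequentialFile()
f.insert({"id": 1, "term": "retrieval"})
f.insert({"id": 2, "term": "indexing"})
f.sort(key=lambda r: r["term"])
print(f.select(lambda r: r["term"].startswith("ind")))
```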

Proceedings ArticleDOI
01 Apr 1971
TL;DR: An approach to determining an appropriate file structure for a given application is presented by outlining a methodology for comparing some important aspects of data management system performance.
Abstract: An approach to determining an appropriate file structure for a given application is presented by outlining a methodology for comparing some important aspects of data management system performance. The aspect chosen for analysis is the processing time required to evaluate Boolean functions defined on data values contained within a file structure and to select the elements of the structure satisfying the expression. Two file structures are studied. The structures are each combinations of hierarchical and inverted file organizations, which differ in the use of the pointers contained in the inverted file. In one case they link a value to the nodes corresponding to its occurrences in the data hierarchy, and in the second they link a value to the entry which contains the node corresponding to an occurrence. Algorithms for processing within each of the structures are discussed. Each algorithm is then modeled, and approximating models are developed for simulating the algorithms.
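
The query-processing task being modeled can be shown in miniature: evaluate a Boolean expression by set operations over inverted lists, where each (attribute, value) pair maps to the entries containing it. Plain entry-level postings are assumed below for simplicity; the paper's two structures differ precisely in what these pointers reference.

```python
# Boolean evaluation over inverted lists (entry-level pointers assumed).

INVERTED = {                        # (attribute, value) -> posting set
    ("color", "red"):   {1, 2, 5},
    ("size", "large"):  {2, 3, 5},
    ("shape", "round"): {1, 2},
}
ALL_ENTRIES = {1, 2, 3, 4, 5}

def evaluate(node):
    """node is ('term', attr, value) or ('and'|'or'|'not', children...)."""
    op = node[0]
    if op == "term":
        return INVERTED.get((node[1], node[2]), set())
    if op == "and":
        return set.intersection(*map(evaluate, node[1:]))
    if op == "or":
        return set.union(*map(evaluate, node[1:]))
    if op == "not":
        return ALL_ENTRIES - evaluate(node[1])
    raise ValueError(f"unknown operator {op!r}")

# color = red AND (size = large OR NOT shape = round)
query = ("and", ("term", "color", "red"),
                ("or", ("term", "size", "large"),
                       ("not", ("term", "shape", "round"))))
print(evaluate(query))              # {2, 5}
```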

Proceedings ArticleDOI
01 Apr 1971
TL;DR: A notion is introduced which indicates the extent to which retrieval performance may be improved by a suitable choice of classification within the model, and a method for determining the optimal performance of the model is outlined.
Abstract: A particular classification and retrieval model is considered. A notion is introduced which indicates the extent to which retrieval performance may be improved by a suitable choice of classification within the model. A method for determining the optimal performance of the model is outlined, together with an algorithm for constructing the classification which allows this limit to be attained. A treatment of the mathematical preliminaries for a particular class of match functions is given. The relevance of the analysis to research on information retrieval systems is discussed.