Showing papers presented at the "International ACM SIGIR Conference on Research and Development in Information Retrieval" in 1971


Proceedings ArticleDOI
01 Apr 1971
TL;DR: In addition to the storage and retrieval algorithms, the deletion problem is solved, and the programming approaches involved yield a non-trivial case study of list-processing techniques.
Abstract: A storage and retrieval scheme which places items to be stored at the nodes of a binary tree is discussed. The tree is always balanced in a certain sense, thus ensuring that no excessively long search paths can exist. In addition to presenting the storage and retrieval algorithms, the deletion problem is also solved. The programming approaches involved yield a non-trivial case study of list-processing techniques. Finally, a cost analysis is given.
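
The abstract leaves the balancing criterion abstract ("balanced in a certain sense"). As one concrete, hedged illustration, the Python sketch below assumes an AVL-style height balance, the classic scheme that bounds every search path at O(log n); the names and structure are assumptions for illustration, not the paper's own algorithms. Deletion, which the paper emphasizes, applies the same rebalancing on the way back up the tree.

```python
# Hedged sketch of a height-balanced binary search tree (AVL-style balance
# assumed; the paper does not specify its balancing rule).

class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None
        self.height = 1

def height(n):
    return n.height if n else 0

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y); update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x); update(y)
    return y

def rebalance(n):
    update(n)
    bf = height(n.left) - height(n.right)
    if bf > 1:                                   # left-heavy
        if height(n.left.left) < height(n.left.right):
            n.left = rotate_left(n.left)         # left-right case
        return rotate_right(n)
    if bf < -1:                                  # right-heavy
        if height(n.right.right) < height(n.right.left):
            n.right = rotate_right(n.right)      # right-left case
        return rotate_left(n)
    return n

def insert(root, key, value):
    """Insert, then rebalance on the way back up, keeping paths O(log n)."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        root.value = value                       # update in place
    return rebalance(root)
```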

21 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: An overview of research in progress in which a natural-language compiler has been constructed that accepts sentences in a user-extendable English subset, produces surface and deep-structure syntactic analyses, and uses a network of concepts to construct semantic interpretations formalized as computable procedures.
Abstract: This paper presents an overview of research in progress in which the principal aim is the achievement of more natural and expressive modes of on-line communication with complexly structured data bases. A natural-language compiler has been constructed that accepts sentences in a user-extendable English subset, produces surface and deep-structure syntactic analyses, and uses a network of concepts to construct semantic interpretations formalized as computable procedures. The procedures are evaluated by a data management system that updates, modifies, and searches data bases that can be formalized as finite models of states of affairs. The system has been designed and programmed to handle large vocabularies and large collections of facts efficiently. Plans for extending the research vehicle to interface with a deductive inference component and a voice input-output effort are briefly described.

20 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: A storage and retrieval system utilizing the combinatorial file structure is developed and the new file organization is shown to have marked value with respect to minimum storage overhead and high retrieval speed.
Abstract: A file structure designed to provide rapid, random access with minimum storage overhead is presented. Storage and retrieval are achieved by direct attribute combination-to-address transformation, thereby eliminating the need for large file dictionaries or list-pointer structures. The attribute combination-to-address transformation is conceptually similar to key-to-address transformation techniques, but the transformation is not limited to operations on a single key: it operates on the combination of several independent keys (or any subset of the combination) describing an item or request. A storage and retrieval system utilizing the combinatorial file structure is developed. Storage and retrieval results derived from a simulated document library of 4000 items are presented. The new file organization is shown to have marked value with respect to minimum storage overhead and high retrieval speed.
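
One concrete (assumed) way to realize the attribute-combination-to-address transformation is partitioned hashing: each attribute hashes into its own field of the bucket address, so a full attribute combination yields a single address, while a request specifying only a subset of the attributes yields a small, enumerable set of candidate buckets. The field widths and hash function below are illustrative choices, not the paper's parameters.

```python
# Sketch of a partitioned (combinatorial) hash address. Each attribute owns
# a fixed field of address bits; unspecified attributes in a request expand
# into all values of their field. All names/widths here are assumptions.
from itertools import product

FIELD_BITS = [4, 4, 4]            # 12-bit addresses: 4096 buckets

def field_hash(value, bits):
    # Python's hash() is stable within one run, which suffices for a sketch.
    return hash(value) % (1 << bits)

def address(attrs):
    """Full attribute combination -> one bucket address."""
    addr = 0
    for value, bits in zip(attrs, FIELD_BITS):
        addr = (addr << bits) | field_hash(value, bits)
    return addr

def candidate_addresses(partial):
    """Request with unspecified attributes (None) -> all candidate buckets."""
    choices = [range(1 << bits) if v is None else [field_hash(v, bits)]
               for v, bits in zip(partial, FIELD_BITS)]
    for fields in product(*choices):
        addr = 0
        for f, bits in zip(fields, FIELD_BITS):
            addr = (addr << bits) | f
        yield addr

print(address(("chemistry", "1969", "english")))                  # one bucket
print(len(list(candidate_addresses(("chemistry", None, None)))))  # 256 of 4096
```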

17 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: A full text retrieval system was designed for the responsa literature, which is a large corpus of Hebrew legal cases, and several methods were developed, among them "grammatical synthesis", which synthesizes all grammatical variants of a given keyword.
Abstract: A full text retrieval system was designed for the responsa literature, a large corpus of Hebrew legal cases. The unique problems of the data base (a mixture of Hebrew, Aramaic and vernaculars, lack of vowels and punctuation, extreme language inflection, homographs, and the existence of thousands of grammatical variants of any given keyword) dictated the development of new methods. Among them we list "grammatical synthesis", which synthesizes all grammatical variants of a given keyword; "Compact KWIC", which enables the user to have a glimpse of the nature of the search before having performed it; an effective citation index embedded in full-text searches; and, in general, extensive use of both positive and negative feedback within a single search run. A number of searches performed on a relatively small data base gave a recall of 100% in each case. The average precision was 34%. A KWIC of strategic portions of retrieved documents usually enables quick disposal of non-relevant material.
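
To make "grammatical synthesis" concrete, a toy sketch of the expansion step is given below: a keyword is multiplied into affixed variants before the full-text search, so all inflected forms match. The transliterated affix lists are placeholders invented for illustration and bear no resemblance to the system's actual Hebrew morphology tables.

```python
# Toy illustration of query-side grammatical synthesis: expand one keyword
# into affixed variants for exact matching in unvocalized full text.
# PREFIXES/SUFFIXES are invented placeholders, not real morphology.

PREFIXES = ["", "ve", "ha", "le", "she"]    # stand-ins for clitic prefixes
SUFFIXES = ["", "im", "ot", "ei"]           # stand-ins for inflection endings

def synthesize_variants(root):
    """All prefix+root+suffix combinations for search expansion."""
    return {p + root + s for p in PREFIXES for s in SUFFIXES}

# A search for one (hypothetical, transliterated) root becomes an OR over
# every synthesized variant:
query_terms = synthesize_variants("melech")
print(len(query_terms), sorted(query_terms)[:5])
```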

15 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: This paper is a survey of some of the major semantic models that have been developed for automated semantic analysis of natural language, and concludes that the models described are significant contributions to semantics, a field still largely unexplored.
Abstract: This paper is a survey of some of the major semantic models that have been developed for automated semantic analysis of natural language. Current approaches to semantic analysis and logical inference are based mainly on models of human cognitive processes such as Quillian's semantic memory, Simmons' Protosynthex III and others. All existing systems and/or models, more or less experimental, were applied to a small subset of English. They are highly tentative because the definitions of semantic processes and semantically structured lexicons are not formulated rigorously. This is due mainly to the fact that it is unknown whether a unique, consistent hierarchization of the semantic features of language is possible. However, the models described are significant contributions to semantics, a field still largely unexplored. The progressive development of a sophisticated, semantically based system for automated processing of natural language is a realistic goal. It should not be neglected, despite the fact that it is difficult to predict when this goal will be achieved.

14 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: The efficiency of processing natural language in REL English is achieved both by the detailed syntactic aspects which are incorporated into the REL English grammar, and by means of the particular implementation for processing features in the parsing algorithm.
Abstract: Ambiguity is a pervasive and important aspect of natural language. Ambiguities, which are disambiguated by context, contribute powerfully to the expressiveness of natural language as compared to formal languages. In computational systems using natural language, the problems of properly controlling ambiguity are particularly large, partly because of the need to circumvent redundant parsings arising from multiple orderings in the application of rules. Features, that is, subcategorizations of parts of speech, constitute an effective means for controlling syntactic ambiguity through ordering the hierarchical organization of syntactic constituents. This is the solution adopted for controlling ambiguity in REL English, which is part of the REL (Rapidly Extensible Language) System. REL is a total software system for facilitating man/machine communications. The efficiency of processing natural language in REL English is achieved both by the detailed syntactic aspects which are incorporated into the REL English grammar, and by the particular implementation for processing features in the parsing algorithm.
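
A minimal sketch of the mechanism described: a rule application succeeds only when the constituents' features agree, so subcategorization blocks parses that pure category matching would admit. The rule format and feature names below are invented for illustration; REL English's actual grammar and parser are considerably richer.

```python
# Feature-checked rule application (illustrative; not REL's formalism).
# A constituent is a (category, features) pair.

NP_SG = ("NP", {"num": "sg"})
NP_PL = ("NP", {"num": "pl"})
VP_SG = ("VP", {"num": "sg"})

def compatible(acc, new):
    """Shared features must agree; unmentioned features are unconstrained."""
    return all(new.get(k, v) == v for k, v in acc.items())

def apply_rule(lhs, rhs_cats, constituents):
    if [c for c, _ in constituents] != rhs_cats:
        return None                       # category mismatch
    feats = {}
    for _, f in constituents:
        if not compatible(feats, f):
            return None                   # feature clash: parse pruned
        feats.update(f)
    return (lhs, feats)

# "S -> NP VP" goes through only when number features agree:
print(apply_rule("S", ["NP", "VP"], [NP_SG, VP_SG]))  # ('S', {'num': 'sg'})
print(apply_rule("S", ["NP", "VP"], [NP_PL, VP_SG]))  # None (pruned)
```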

13 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: The application of compact coding, differencing and other techniques to indexed sequential files is discussed; the effects on system performance are examined, and reductions of almost 80% in mass storage requirements for a particular file are reported.
Abstract: The application of compact coding, differencing and other techniques to indexed sequential files is discussed. The effects on system performance are examined, and reductions of almost 80% in mass storage requirements for a particular file are reported.
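
One standard instance of the differencing mentioned is front compression of the sorted keys in an index: each key is stored as a shared-prefix count plus its distinct tail. The sketch below shows the general technique under that assumption; the paper's exact coding scheme is not reproduced here.

```python
# Front compression (key differencing) for a sorted index, as one plausible
# instance of the paper's "differencing"; details are assumptions.

def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def compress(sorted_keys):
    out, prev = [], ""
    for k in sorted_keys:
        n = common_prefix_len(prev, k)
        out.append((n, k[n:]))        # (shared-prefix length, distinct tail)
        prev = k
    return out

def decompress(pairs):
    keys, prev = [], ""
    for n, tail in pairs:
        prev = prev[:n] + tail
        keys.append(prev)
    return keys

keys = ["retrieval", "retrieve", "retrieved", "storage"]
packed = compress(keys)
print(packed)          # [(0, 'retrieval'), (7, 'e'), (8, 'd'), (0, 'storage')]
assert decompress(packed) == keys
```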

8 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: Questions which involve 'all', 'every', 'some', or the indefinite article pose peculiar problems when presented to a computerized question-answering system, where ambiguities cannot be tolerated; these problems can be solved by introducing a new kind of quantification that has the meaning of 'all F' together with a secondary meaning that the class F is not empty.
Abstract: Questions which involve 'all', 'every', 'some', or the indefinite article pose some peculiar problems when presented to a computerized question-answering system, where ambiguities cannot be tolerated. These problems range from the nature of the correct answer in special cases to the very admissibility of the question itself. To deal with these problems it is convenient to divide questions into two classes: extensional questions, whose answers name things or truth values, and intensional questions, whose answers give meanings. This paper examines extensional questions. For these, the interpretative problems arising with 'all' and 'every' can be solved by introducing a new kind of quantification, extensional universal quantification, which has the meaning of 'all F' together with a secondary meaning that the class F is not empty. Formal rules for this quantification are given, and it is shown that the so-called definite formulas (which explicate permissible queries) are closed under the new operator.
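
The new quantifier admits a short worked rendering (the notation below is assumed; the paper's own symbolism may differ): extensional universal quantification bundles the usual universal conditional with the non-emptiness of the restricting class.

```latex
% Assumed notation: \forall^{e} marks extensional universal quantification.
\[
  (\forall^{e} x)\,\bigl(F(x) \supset G(x)\bigr)
  \;\equiv\;
  (\exists x)\,F(x) \;\wedge\; (\forall x)\,\bigl(F(x) \supset G(x)\bigr)
\]
```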

7 citations


Proceedings ArticleDOI
01 Apr 1971
TL;DR: It is shown that the decision problem for several classes of proper formulas is solvable, and that the analogues of Kuhns' conjecture for these classes are false.
Abstract: The Relational Data File (RDF) of The Rand Corporation is among the most developed of question-answering systems. The "information language" of this system is an applied predicate calculus. The atomic units of information are binary relational sentences. The system has an inference-making capacity. As part of the actual construction and implementation of the RDF, a theory was developed by J. L. Kuhns to identify those formulas of the predicate calculus which represent the "reasonable" inquiries to put to this system. Accordingly, the classes of definite and proper formulas were defined, and their properties studied. The definite formulas share a semantic property Kuhns judged as necessarily possessed by a reasonable question to be processed by the RDF. The author has previously shown that the decision problem for the class of definite formulas is recursively unsolvable. The proper formulas are definite, and satisfy additional syntactic conditions intended to make them especially suitable for machine processing. The class of proper formulas depends on which logical primitives are employed. Different primitives give rise to different classes of formulas. A formula which can be effectively transformed into a proper equivalent is admissible. Kuhns conjectures that with respect to one particular class of proper formulas, all definite formulas are admissible. This paper shows that the decision problem for several classes of proper formulas is solvable. The following results are established. Theorem 1: The class of proper formulas in prenex form on any complete set of connectives is recursive. Theorem 2: The class of proper formulas on ¬, ∨, ∃ is recursive. Theorem 3: The class of proper formulas on ¬, ⊃, ∃ is recursive. Theorem 4: The class of proper formulas on ¬, ⊃, ∨, ∃ is recursive. Thus, there is a mechanical decision procedure which determines whether an arbitrary formula is a member of each class. It follows that the analogues of Kuhns' conjecture for these classes are false.

6 citations


Proceedings ArticleDOI
A. M. Katcher
01 Apr 1971
TL;DR: The ideas presented have been implemented on a version of the TSS/360 Time Sharing System and are presently being used in a real environment; the overall compaction rate achieved was 3.16 to 1.
Abstract: The public storage in any time sharing system tends to grow continually. This necessitates measures to maintain public storage. One such measure is the creation of an archival level of storage called "migrated" storage. Data that have not been referenced recently are moved or "migrated" to a less accessible level of external storage. Since these data are not accessed by the users directly, i.e., the data must be restored to public storage before being used, a variable-length coding technique, viz., Huffman coding, is used to compact and store these data. The ideas presented have been implemented on a version of the TSS/360 Time Sharing System and are presently being used in a real environment. The overall compaction rate achieved was 3.16 to 1. Further details on compaction rates and timings are also presented.
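
For reference, here is a compact Python sketch of the Huffman construction the paper applies to migrated data; it is a generic textbook rendering, not the TSS/360 implementation. Frequent byte values receive short codewords, which is what makes compaction rates like the reported 3.16 to 1 possible on skewed data.

```python
# Generic Huffman code construction (illustrative; not the paper's code).
import heapq
from collections import Counter

def huffman_code(data):
    """Build a prefix code: frequent symbols get short codewords."""
    freq = Counter(data)
    # Heap entries carry a unique tiebreaker so the dicts are never compared.
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)     # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "data that have not been referenced recently are migrated"
code = huffman_code(text)
bits = sum(len(code[ch]) for ch in text)
print(f"{len(text) * 8} bits raw -> {bits} bits coded")
```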

Proceedings ArticleDOI
01 Apr 1971
TL;DR: An experimental model for CUE, Proto-RELADES, which can "understand" and execute English sentences about the content of the library at IBM's Boston Programming Center, is described.
Abstract: CUE, an input interface system which permits the computer to utilize natural but restricted English as input, is presented. In addition, an experimental model for CUE, Proto-RELADES, which can "understand" and execute English sentences about the content of the library at IBM's Boston Programming Center, is described. These sentences can be query, command, or conditional sentences. The linguistic component of the system is based on a transformational grammar of English that performs a full syntactic and semantic analysis of each input sentence and translates it into relevant computer operations. The capabilities and limitations of this system are described.

Proceedings ArticleDOI
C. P. Wang, V. Y. Lum
01 Apr 1971
TL;DR: This paper utilizes the FOREM model as the principal tool and presents a hypothetical design example dealing with many essential issues of the design process, so that the possible tradeoffs can be identified.
Abstract: The design of a file system has never been a simple or straightforward task because of its complexity. Heuristics and experience still play a major role in guiding the design process. To organize the entire design process in a more systematic manner, large scale simulation has proved to be an effective technique. The FOREM models developed during the past several years (specifically for the evaluation of file system designs) represent facilities of this type. This paper utilizes the FOREM model as the principal tool and presents a hypothetical design example dealing with many essential issues of the design process. Evaluations of the designs of several other actual file systems are also being carried out and will be reported at a later date. Only through quantitative evaluation can each design decision be arrived at correctly and the possible tradeoffs be identified.

Proceedings ArticleDOI
01 Apr 1971
TL;DR: Early developments and the status of recent efforts in document retrieval, question-answering and data management systems are reviewed briefly.
Abstract: An introduction and some perspectives are provided for the 1971 ACM Information Storage and Retrieval Symposium held at the University of Maryland on April 1 and 2, 1971. The symposium, sponsored by the University of Maryland, the National Aeronautics and Space Administration and the Special Interest Group on Information Retrieval (SIGIR) of the ACM, focuses on advances in techniques in the computer oriented technology of information retrieval. Early developments and the status of recent efforts in document retrieval, question-answering and data management systems are reviewed briefly.

Proceedings ArticleDOI
01 Apr 1971
TL;DR: The design and implementation of a general associative net structure to be used in an interactive information system are described and a scheme designed to manage large quantities of semantic data stored in a data base on disc is presented.
Abstract: This paper describes the design and implementation of a general associative net structure to be used in an interactive information system, and presents a scheme designed to manage large quantities of semantic data stored in a data base on disc. The associative-net-structured data base is functionally divided into two pools: the hierarchy pool and the linguistic pool. The network of items in the hierarchy pool represents the descriptive information about documents and the network of items in the linguistic pool represents the syntactic and semantic properties of the items in the hierarchy pool. Two search functions and a general search algorithm are presented in this paper. In the implementation, the data base is a regional data set on disc. Items and their associated labeled links are stored on disc tracks. The system establishes a directory to keep track of the items which have associated information stored on more than one track. The use of the directory eliminates unnecessary disc accesses and allows the system to move a proper track into core storage for data processing.
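
The directory's role can be sketched schematically: only items whose links spill over several tracks get a directory entry, so each access loads just the tracks that can actually hold the item's links. Everything below (names, layout, the dict standing in for a core-resident track image) is an assumption for illustration.

```python
# Schematic sketch of the track directory (all structures are stand-ins).

TRACKS = {   # track number -> {item: [(link label, target item)]}
    0: {"doc17": [("author", "smith")]},
    1: {"doc17": [("topic", "retrieval"), ("topic", "indexing")]},
    2: {"smith": [("wrote", "doc17")]},
}
DIRECTORY = {"doc17": [0, 1]}   # only multi-track items are listed

core = {}                       # tracks currently held in core storage

def load_track(t):
    if t not in core:
        core[t] = TRACKS[t]     # stands in for one disc access
    return core[t]

def links_of(item, home_track):
    """Consult the directory so no track is read that cannot hold the item."""
    tracks = DIRECTORY.get(item, [home_track])
    links = []
    for t in tracks:
        links.extend(load_track(t).get(item, []))
    return links

print(links_of("doc17", 0))     # gathered from tracks 0 and 1
print(links_of("smith", 2))     # single-track item: one access
```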

Proceedings ArticleDOI
01 Apr 1971
TL;DR: The ISDOS project is outlined as a coordinated approach which may serve as the beginning of a satisfactory structure: a language and terminology for communication, a model of information processing systems, and tools for studying and analyzing problems related to such systems.
Abstract: Research in file organization is a problem area in which the unstructured efforts of many individual researchers have not produced results commensurate with the effort expended. The paper briefly examines some of the reasons for this and suggests a structure consisting of a language and terminology for communication, a model of information processing systems, and tools for studying and analyzing problems related to such systems. The ISDOS project is outlined as a coordinated approach which may serve as the beginning in the development of a satisfactory structure.

Proceedings ArticleDOI
01 Apr 1971
TL;DR: A set of basic operations on types of files is defined, intended to fulfill the same role for information retrieval systems programmers that functions such as LOG(X) fill for mathematical applications programmers: they should make the job very much easier.
Abstract: One of the difficulties faced in implementing information management and retrieval systems is that each case seems to present its own special complexities. As a result, information retrieval systems typically fall behind their programming schedules and have many bugs when delivered. In this paper a set of basic operations on types of files is defined. These operations are intended to fulfill the same role for information retrieval systems programmers that functions such as LOG(X) fill for mathematical applications programmers: they should make the job very much easier. The file operations have been implemented as a run-time package written in FORTRAN IV and Burroughs Extended Algol. The approach has been used to develop three different information management systems: an APL interactive computing system, a generalized information retrieval system, and a specialized information retrieval system for map-oriented data. These systems are described.
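
As a rough picture of what such a package looks like to its callers, here is a hypothetical set of primitive file verbs that a retrieval system would compose, much as numeric programs compose LOG(X). The verbs and their signatures are invented for illustration; the paper's actual operation set is not reproduced.

```python
# Hypothetical primitive file operations (an illustrative stand-in for the
# paper's run-time package, which was written in FORTRAN IV and Algol).

class SequentialFile:
    """Minimal in-core stand-in for a record file."""
    def __init__(self):
        self.records = []

    def insert(self, record):
        self.records.append(record)

    def select(self, predicate):
        return [r for r in self.records if predicate(r)]

    def delete(self, predicate):
        self.records = [r for r in self.records if not predicate(r)]

    def sort(self, key):
        self.records.sort(key=key)

# An application composes the verbs rather than re-coding file handling:
f = SequentialFile()
f.insert({"id": 1, "term": "retrieval"})
f.insert({"id": 2, "term": "indexing"})
f.sort(key=lambda r: r["term"])
print(f.select(lambda r: r["term"].startswith("ind")))
```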

Proceedings ArticleDOI
01 Apr 1971
TL;DR: An approach to determining an appropriate file structure for a given application is presented by outlining a methodology for comparing some important aspects of data management system performance.
Abstract: An approach to determining an appropriate file structure for a given application is presented by outlining a methodology for comparing some important aspects of data management system performance. The aspect chosen for analysis is the processing time required to evaluate Boolean functions defined on data values contained within a file structure and to select the elements of the structure satisfying the expression. Two file structures are studied. The structures are each combinations of hierarchical and inverted file organizations, which differ in the use of the pointers contained in the inverted file. In one case they link a value to the nodes corresponding to its occurrences in the data hierarchy, and in the second they link a value to the entry which contains the node corresponding to an occurrence. Algorithms for processing within each of the structures are discussed. Each algorithm is then modeled, and approximating models are developed for simulating the algorithms.
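
The query-processing task being modeled can be shown in miniature: evaluate a Boolean expression by set operations over inverted lists, where each (attribute, value) pair maps to the entries containing it. Plain entry-level postings are assumed below for simplicity; the paper's two structures differ precisely in what these pointers reference.

```python
# Boolean evaluation over inverted lists (entry-level pointers assumed).

INVERTED = {                        # (attribute, value) -> posting set
    ("color", "red"):   {1, 2, 5},
    ("size", "large"):  {2, 3, 5},
    ("shape", "round"): {1, 2},
}
ALL_ENTRIES = {1, 2, 3, 4, 5}

def evaluate(node):
    """node is ('term', attr, value) or ('and'|'or'|'not', children...)."""
    op = node[0]
    if op == "term":
        return INVERTED.get((node[1], node[2]), set())
    if op == "and":
        return set.intersection(*map(evaluate, node[1:]))
    if op == "or":
        return set.union(*map(evaluate, node[1:]))
    if op == "not":
        return ALL_ENTRIES - evaluate(node[1])
    raise ValueError(f"unknown operator {op!r}")

# color = red AND (size = large OR NOT shape = round)
query = ("and", ("term", "color", "red"),
                ("or", ("term", "size", "large"),
                       ("not", ("term", "shape", "round"))))
print(evaluate(query))              # {2, 5}
```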

Proceedings ArticleDOI
01 Apr 1971
TL;DR: A notion is introduced which indicates the extent to which retrieval performance may be improved by a suitable choice of classification within the model, and a method for determining the optimal performance of the model is outlined.
Abstract: A particular classification and retrieval model is considered. A notion is introduced which indicates the extent to which retrieval performance may be improved by a suitable choice of classification within the model. A method for determining the optimal performance of the model is outlined, together with an algorithm for constructing the classification which allows this limit to be attained. A treatment of the mathematical preliminaries for a particular class of match functions is given. The relevance of the analysis to research on information retrieval systems is discussed.