Showing papers on "Document retrieval" published in 1969


Journal Article
Michael Lesk
TL;DR: The SMART automatic document retrieval system is used to study association procedures for automatic content analysis, and the effect of word frequency and other parameters on the association process is investigated through examination of related pairs and through retrieval experiments.
Abstract: The SMART automatic document retrieval system is used to study association procedures for automatic content analysis. The effect of word frequency and other parameters on the association process is investigated through examination of related pairs and through retrieval experiments. Associated pairs of words usually reflect localized word meanings, and true synonyms cannot readily be found from first or second order relationships in our document collections. There is little overlap between word relationships found through associations and those used in thesaurus construction, and the effects of word associations and a thesaurus in retrieval are independent. The use of associations in retrieval experiments improves not only recall, by permitting new matches between requests and documents, but also precision, by reinforcing existing matches. In our experiments, the precision effect is responsible for most of the improvement possible with associations. A properly constructed thesaurus, however, offers better performance than statistical association methods.

187 citations
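
The first-order association idea in the abstract above can be illustrated with a small sketch: two words count as associated when they tend to occur in the same documents. The toy documents, the function name `term_associations`, and the choice of cosine similarity over binary occurrence vectors are all assumptions made for illustration; this is not the SMART procedure itself.

```python
from itertools import combinations
from math import sqrt

def term_associations(docs, threshold=0.5):
    """Return {(w1, w2): cosine} for word pairs whose binary
    document-occurrence vectors have cosine >= threshold."""
    # Map each word to the set of document indices that contain it.
    postings = {}
    for i, text in enumerate(docs):
        for word in set(text.lower().split()):
            postings.setdefault(word, set()).add(i)

    pairs = {}
    for w1, w2 in combinations(sorted(postings), 2):
        shared = len(postings[w1] & postings[w2])
        if shared == 0:
            continue  # the words never co-occur, so there is no association
        cosine = shared / sqrt(len(postings[w1]) * len(postings[w2]))
        if cosine >= threshold:
            pairs[(w1, w2)] = cosine
    return pairs

# Hypothetical toy collection, not taken from the paper.
docs = [
    "automatic document retrieval",
    "automatic content analysis",
    "document retrieval experiments",
    "thesaurus construction",
]
assoc = term_associations(docs, threshold=0.9)
```

In this toy collection "document" and "retrieval" always appear together, so they are associated without being synonyms, which mirrors the abstract's remark that associated pairs reflect localized usage rather than true synonymy.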


Proceedings Article
Gerard Salton
01 Sep 1969
TL;DR: The methods are evaluated, and it is shown that the effectiveness of the mixed language processing is approximately equivalent to that of the standard process operating within a single language only.
Abstract: Experiments conducted over the last few years with the SMART document retrieval system have shown that fully automatic text processing methods using relatively simple linguistic tools are as effective for purposes of document indexing, classification, search, and retrieval as the more elaborate manual methods normally used in practice. Up to now, all experiments were carried out entirely with English language queries and documents. The present study describes an extension of the SMART procedures to German language materials. A multi-lingual thesaurus is used for the analysis of documents and search requests, and tools are provided which make it possible to process English language documents against German queries, and vice versa. The methods are evaluated, and it is shown that the effectiveness of the mixed language processing is approximately equivalent to that of the standard process operating within a single language only.

21 citations
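
One way to picture the multi-lingual thesaurus described above is as a table that maps words from each language onto shared concept numbers, so that a German query and an English document can be compared in concept space. The sketch below assumes a tiny hypothetical thesaurus and uses simple set overlap (Jaccard) as the matching score; both are illustrative assumptions and not the method evaluated in the paper.

```python
# Hypothetical two-language thesaurus: each word maps to a concept number.
THESAURUS = {
    "en": {"library": 1, "computer": 2, "search": 3, "retrieval": 3},
    "de": {"bibliothek": 1, "rechner": 2, "suche": 3},
}

def to_concepts(text, lang):
    """Replace each known word by its concept number; drop unknown words."""
    table = THESAURUS[lang]
    return {table[w] for w in text.lower().split() if w in table}

def overlap(query, query_lang, doc, doc_lang):
    """Jaccard overlap of the concept sets (an assumed, simplified score)."""
    q = to_concepts(query, query_lang)
    d = to_concepts(doc, doc_lang)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)

# German query matched against an English document.
score = overlap("Suche Bibliothek", "de", "library search by computer", "en")
```

Because both texts are reduced to concept numbers before matching, the same scoring code serves single-language and mixed-language comparisons.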


Proceedings Article
01 Sep 1969
TL;DR: This paper describes a system for information storage, retrieval, and updating, with special attention to the search algorithm and data structure demanded for maximum program efficiency.
Abstract: This paper describes a system for information storage, retrieval, and updating, with special attention to the search algorithm and data structure demanded for maximum program efficiency. The program efficiency is especially warranted when a natural language or a symbolic language is involved in the searching process. The system is a basic framework for an efficient information system. It can be implemented for text processing and document retrieval; numerical data retrieval; and for handling of large files such as dictionaries, catalogs, and personnel records, as well as graphic information. Currently, eight commands are implemented and operational in batch mode on a CDC 3600: STORE, RETRIEVE, ADD, DELETE, REPLACE, PRINT, COMPRESS and LIST. Further development will be on the use of teletype console, CRT terminal, and plotter under a time-sharing environment for producing immediate responses. The maximum program efficiency is obtained through a unique search algorithm and data structure. Instead of examining the recall ratio and the precision ratio at a higher level, this efficiency is measured in the most basic terms, as the "average number of searches" required for looking up an item. In order to identify an item, at least one search is necessary even if it is found the first time. However, through the use of the hash-address of a key or keyword, in conjunction with an indirect-chaining list-structured table, and a large available space list, the average number of searches required for retrieving a certain item is 1.25 regardless of the size of the file in question. This is to be compared with 15.6 searches for the binary search technique in a 50,000-item file, and 5.8 searches for the letter-table method with no regard to file size.

6 citations
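
The search counts quoted in the abstract above can be explored with a small experiment. The sketch below builds a chained hash table and measures the average number of key comparisons for a successful lookup, then compares it with log2(n) as a rough figure for binary search. The table layout, the modulo hash, the random keys, and the choice of twice as many buckets as items are assumptions for illustration; the abstract does not state the paper's actual load factor or hash function. Under the textbook estimate of 1 + α/2 probes for separate chaining, a load factor α of 0.5 happens to give 1.25.

```python
import math
import random

def average_probes_chained(keys, num_buckets):
    """Average comparisons needed to find each stored key in a
    hash table with separate chaining (new keys appended to the chain)."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[key % num_buckets].append(key)
    total = 0
    for key in keys:
        chain = buckets[key % num_buckets]
        total += chain.index(key) + 1  # position in chain = comparisons made
    return total / len(keys)

rng = random.Random(0)               # fixed seed so the run is repeatable
n = 50_000                           # file size used in the abstract's comparison
keys = rng.sample(range(10**9), n)   # distinct, roughly uniform integer keys

avg_hash = average_probes_chained(keys, num_buckets=2 * n)  # load factor 0.5
binary_estimate = math.log2(n)       # rough comparison count for binary search
```

The hash figure depends only on the load factor, not on n, which is why the abstract can quote it "regardless of the size of the file", while the binary search figure grows with log2 of the file size.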


Journal Article
TL;DR: Many scientists collect and maintain citations, references, and reprints in diverse forms that are subject to problems of care, cataloging, storage, and probably most important, of retrieval.
Abstract: Many scientists collect and maintain citations, references, and reprints in diverse forms. Investigators differ in the use, intensity, and breadth of their collections. The effort may be motivated by potential use to students and colleagues or may exist only long enough to deal with a particular problem or research project. All collections are subject to problems of care, cataloging, storage, and probably most important, of retrieval. Some workers have developed complex indexing procedures and some have given up and just stick them in boxes by author or date or some vague subject categorization. Several characteristics of modern digital computers afford some aid in solving the information retrieval problems. Many universities have time available to faculty and some have terminals available in most departments. Many installations have tape and disc storage and most serious workers have card-reading devices available within a short walk. Yet few investigators have computerized their efforts, preferring to await the future (Brown et al., 1967). Meanwhile, they expend time and effort in frustrating searches or extensive indexing. Computerized information retrieval of his own material is available

2 citations


01 Feb 1969
TL;DR: A computer-aided indexing concept was formulated which is based on analysis of technical text and included a self-contained document analysis, storage and retrieval system with which requestors could interact by means of remote access terminals.
Abstract: Additional part-time indexers were trained according to a previously established training program. Many documents are now being received in microfiche form. Several makes of microfiche readers were evaluated, particularly with regard to their use by indexers. One make offered the most advantages, and a number of these readers were purchased. A method of producing typed hard copy abstracts from microfiche was found. The number of search requests increased, with requests both from outside organizations and from the AFML accounting for the increase. The thesaurus was updated, and a separate section containing only metallic materials terminology made the thesaurus easier to use. A computer-aided indexing concept was formulated which is based on analysis of technical text. Text words are matched against files of nontechnical words, technical words and bound terms. Words not recognized are presented for human intellectual decisions which are subsequently incorporated into the system. The concept included a self-contained document analysis, storage and retrieval system with which requestors could interact by means of remote access terminals. Remote communication capability with the computer was accomplished, some programs were prepared, and file structures based on word length and alphabetization of the first two characters were designed.

1 citation
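
The word-matching step described in the abstract above can be sketched as follows: text words are checked against a nontechnical list, a technical list, and a list of bound (multi-word) terms, and anything unrecognized is set aside for a human decision. The word lists, the two-word limit on bound terms, and the function name `classify_words` are hypothetical and chosen only to make the idea concrete.

```python
# Hypothetical word files; the report's real files are not given in the abstract.
NONTECHNICAL = {"the", "of", "was", "and", "a", "in"}
TECHNICAL = {"alloy", "corrosion", "fatigue"}
BOUND_TERMS = {("stress", "corrosion"), ("titanium", "alloy")}

def classify_words(text):
    """Return (index_terms, unknown_words) for one piece of text."""
    words = text.lower().split()
    index_terms, unknown = [], []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in BOUND_TERMS:
            # A bound term is indexed as one unit and consumes both words.
            index_terms.append(" ".join(pair))
            i += 2
            continue
        word = words[i]
        if word in TECHNICAL:
            index_terms.append(word)
        elif word not in NONTECHNICAL:
            unknown.append(word)  # left for a human indexer to decide
        i += 1
    return index_terms, unknown

terms, review_queue = classify_words("Fatigue of the titanium alloy in seawater")
```

In a fuller system the indexer's decision on each queued word would be written back into one of the three lists, which is how the abstract describes human decisions being "incorporated into the system".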