
Showing papers on "Document retrieval" published in 1968


Journal ArticleDOI
TL;DR: A measure of document retrieval system performance called the “expected search length reduction factor” is defined and compared with indicators, such as precision and recall, that have been suggested by other workers.
Abstract: A measure of document retrieval system performance called the “expected search length reduction factor” is defined and compared with indicators, such as precision and recall, that have been suggested by other workers. The new measure is based on calculations of the expected number of irrelevant documents in the collection which would have to be searched through before the desired number of relevant documents could be found. Its advantages are: (1) it provides a single index for the property it attempts to measure; (2) it allows for gradations of retrieval status, through the mathematical concept of a “weak ordering”; (3) it evaluates retrieval performance relative to random searching; and (4) it takes into account the amount of relevant material desired by the requester.

281 citations
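
The abstract above describes the measure only in words. As a minimal sketch, the calculation can be illustrated in Python under the following assumptions, none of which are stated in the abstract itself: the system output is a weak ordering given as a list of levels, each with a count of relevant and irrelevant documents; documents within a level are examined in random order; and the reduction factor is the relative saving over a random search of the whole collection. The function names and the sample figures are hypothetical.

```python
from fractions import Fraction


def expected_search_length(levels, wanted):
    """Expected number of irrelevant documents examined before `wanted`
    relevant ones are found.

    `levels` is a weak ordering, best level first, given as
    (relevant_count, irrelevant_count) pairs. Within a level the
    documents are assumed to be in random order.
    """
    if wanted < 1:
        raise ValueError("wanted must be at least 1")
    irrelevant_seen = 0
    still_needed = wanted
    for relevant, irrelevant in levels:
        if relevant >= still_needed:
            # Final level: on average s * i / (r + 1) irrelevant documents
            # precede the s-th relevant one among r relevant and i irrelevant.
            return irrelevant_seen + Fraction(still_needed * irrelevant, relevant + 1)
        # The whole level is consumed before moving to the next one.
        still_needed -= relevant
        irrelevant_seen += irrelevant
    raise ValueError("collection does not contain enough relevant documents")


def reduction_factor(levels, wanted):
    """Relative saving over a random search of the same collection."""
    total_relevant = sum(r for r, _ in levels)
    total_irrelevant = sum(i for _, i in levels)
    # A random search treats the whole collection as a single level.
    random_length = expected_search_length([(total_relevant, total_irrelevant)], wanted)
    if random_length == 0:
        raise ValueError("random search length is zero; factor is undefined")
    system_length = expected_search_length(levels, wanted)
    return (random_length - system_length) / random_length


# Hypothetical output: three levels, requester wants 3 relevant documents.
levels = [(2, 1), (1, 3), (2, 4)]
print(expected_search_length(levels, 3))  # 5/2
print(reduction_factor(levels, 3))        # 3/8
```

In this made-up example the ordered output costs 2.5 irrelevant documents on average against 4 for a random search, a reduction of 3/8. The `wanted` parameter corresponds to point (4) of the abstract, and the per-level counts to the weak ordering in point (2).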


Proceedings Article
01 Feb 1968
TL;DR: Evaluation results for the real-time retrieval procedures are used to derive design criteria for future automatic information systems, and various user-controlled search strategies are described.
Abstract: Future operating document retrieval systems may be based on fully-automatic information analysis methods instead of manual indexing, and on real-time search procedures which allow the user to interact with the system during the search process. Performance characteristics are first given for fully-automatic information retrieval systems, and comparisons are made with presently operating partly-manual systems. Thereafter, various user-controlled search strategies are described, and the potential of these strategies in improving systems performance is discussed. The evaluation results for the real-time retrieval procedures are used to derive design criteria for future automatic information systems.

28 citations


Patent
01 May 1968

24 citations


Journal ArticleDOI
TL;DR: This paper considers some of these factors in relation to various parts of the complete retrieval system: the acquisition subsystem, the indexing subsystem, the index language, the searching subsystem, and the equipment subsystem.
Abstract: A retrieval system may be evaluated strictly in terms of user satisfaction (operating efficiency), or it may be evaluated from the point of view of efficient means of satisfying user requirements (economic efficiency). When we consider the relationship between operating efficiency and economic efficiency, we are faced with a whole series of possible trade‐offs. There may be several alternative paths we can follow in order to serve user needs. The problem is to determine the most economical path to follow. Pay‐off factors, break‐even points, and diminishing returns must be taken into consideration. This paper considers some of these factors in relation to various parts of the complete retrieval system: the acquisition subsystem, the indexing subsystem, the index language, the searching subsystem, and the equipment subsystem.

13 citations


Proceedings Article
01 Jan 1968

11 citations


01 Jan 1968
TL;DR: The Normalized Sentence-Index Matrix (N-SIM) system suggested differs from more traditional retrieval systems for legal literature in three respects: (1) the categories used for classification are normalized versions of sentences from statutes, regulations, treaties, constitutions, case opinions, legal treatises, law review articles, and other documents in legal literature; (2) the classification system is hierarchical and open-ended to evolve with the literature through time; and (3) the organization of the file facilitates some analysis of the literature by computer.
Abstract: An information retrieval system (as distinguished from a document retrieval system) is described for handling statute-oriented legal literature. The Normalized Sentence-Index Matrix (N-SIM) system suggested differs from more traditional retrieval systems for legal literature in three respects: (1) the categories used for classification are normalized versions of sentences from statutes, regulations, treaties, constitutions, case opinions, legal treatises, law review articles, and other documents in legal literature, (2) the classification system is hierarchical and open-ended to evolve with the literature through time, and (3) the organization of the file facilitates some analysis of the literature by computer. A sentence is expressed in implicative normal form (INF) when three specified conditions are fulfilled. Statutory norms are converted into INF before being stored in the N-SIM file. Negative implicative normal form (NINF) is also defined, and all assertions in legal literature about aspects of the statutory norms are converted into either INF or NINF for storage in the N-SIM file. The N-SIM file is designed so that it can be used manually as a loose-leaf service or in a system of automatic data processing by machine. It is hypothesized that statutes expressed in this normalized form will be understood by various audiences of readers both more quickly and more accurately than statutes expressed in their current form. A method for empirically testing this hypothesis is suggested.
Introduction (excerpt): This paper is in the nature of a progress report on the development of an information retrieval system for legal literature that was first suggested at the 1965 Congress of the International Federation for Documentation in Washington, D.C. [1] Further elaboration of a "language normalization" approach ...
* Presented to the American Bar Association Special Committee on Electronic Data Retrieval, August 6, 1967, in Honolulu, Hawaii.
[1] Layman E. Allen, Sketch of a Proposed Semi-Automatic, Hierarchical, Open-Ended Storage and Retrieval System for Statute-Oriented Legal Literature, Proceedings of the 1965 Congress of FID (International Federation for Documentation), Washington, D.C., October 10-15, 1965, Area IV, Information Needs of Society, Symposium B, Specific Knowledge Areas.

6 citations


Journal ArticleDOI
TL;DR: Experiments designed to evaluate the capabilities of mechanized information retrieval systems, with emphasis on interactive (man-machine) language and on some of the mechanical and psychological limitations in their design, were conducted at the Moore School Information Systems Laboratory.
Abstract: Experiments designed to evaluate the capabilities of mechanized information retrieval systems, with emphasis on interactive (man-machine) language and on some of the mechanical and psychological limitations in their design, were conducted at the Moore School Information Systems Laboratory. The basic assumption of the research is that an information retrieval system that provides for man-machine dialogue at a remote inquiry terminal should provide a searcher with many of the tools which would be available to him were he actually performing his search at a library or repository of documents. Factors involved in evaluation of such a system include ease of use, learning time, and effectiveness of actual retrieval. Three experiments and the conclusions resulting from them are detailed.

6 citations


Journal ArticleDOI
TL;DR: Easy English is a natural command language designed to simplify communication between man and machine through a remote typewriter console; it has been developed for retrieval of documents from a computerized data base, the Moore School Information Systems Laboratory files.
Abstract: Easy English is a natural command language designed to simplify communication between man and machine through a remote typewriter console. It has been developed for retrieval of documents from a computerized data base, the Moore School Information Systems Laboratory files. Requests are formulated in a standardized syntactical form (examples of which are presented), and this form is then transformed into an equivalent query expressed in the retrieval system's original Symbolic Command Language, which is briefly described. Operation of Easy English is detailed by illustration of the transformations performed upon a sample request up to the point at which the request string is sent to the system. A macro flowchart of Easy English is included, and an Appendix provides the printout of a retrieval demonstration.

5 citations


Journal ArticleDOI
Michael Lesk
TL;DR: Results from three document collections indicate that word normalization is efficiently performed by automatic thesaurus lookup, while phrase matching procedures, statistical association methods, and concept hierarchies are useful for special applications.

3 citations


Journal ArticleDOI
John O'Connor
TL;DR: This paper is concerned with retrieval of documents, in response to a question, from which answers to that question can be inferred, and two sources of systematic knowledge of document-statement inference practices in a scientific field are described.
Abstract: (I) Better understanding of subject document retrieval might result if different functions of subject document retrieval systems are studied separately. This paper is concerned with retrieval of documents, in response to a question, from which answers to that question can be inferred (“answer-providing documents”). “Answer can be inferred from document” has many possible meanings, one of which must be selected (an “inference specification”). Inasmuch as scientists in a field sometimes disagree about the correctness of inferences, have somewhat different background knowledge, etc., any inference specification can only approximate scientific inference practices. Two sources of systematic knowledge of document-statement inference practices in a scientific field are described. (II) If a content word occurs in a question, then it occurs in any answer to that question (with some apparently tractable exceptions). An indexing procedure based on that fact is described which would permit retrieval of all answer-providing documents for a question. However, because the indexing is “nonrelational,” it could cause false retrievals as well. Various ways of dealing with such false retrievals are briefly indicated, and a study is sketched that would provide data for helping selection among them. Two special points concerning indexing for retrieval of answer-providing documents are discussed separately.

3 citations
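
Part (II) of the abstract above can be illustrated with a small sketch. It assumes, for illustration only, that a document is indexed by its own content words (the paper's actual indexing procedure is not given in the abstract), that content words are whatever remains after removing a short stop-word list, and that retrieval returns every document containing all content words of the question. The stop-word list, function names, and sample sentences are hypothetical.

```python
# Hypothetical stop-word list; everything else counts as a content word.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "does", "by", "what", "to"}


def content_words(text):
    """Lower-cased words of `text` with punctuation and stop words removed."""
    words = (w.strip(".,?;:").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def build_index(documents):
    """Map each content word to the set of documents containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in content_words(text):
            index.setdefault(word, set()).add(doc_id)
    return index


def retrieve(index, question):
    """Documents containing every content word of the question."""
    words = content_words(question)
    if not words:
        return set()
    postings = [index.get(w, set()) for w in words]
    return set.intersection(*postings)


# Made-up sample collection.
documents = {
    "d1": "Copper is dissolved by nitric acid.",
    "d2": "Nitric acid is dissolved in water in a copper vessel.",
    "d3": "Zinc is dissolved by sulfuric acid.",
}
index = build_index(documents)
print(sorted(retrieve(index, "Is copper dissolved by nitric acid?")))  # ['d1', 'd2']
```

Document d1 answers the question, while d2 is retrieved only because it contains the same words in a different relationship. This is the kind of false retrieval the abstract attributes to "nonrelational" indexing.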



Journal ArticleDOI
TL;DR: The importance of the ability to formulate reasonable and meaningful retrieval system goals as problems amenable to solution by known quantitative techniques is pointed out, and some discussion of computational feasibility and solution implementation is given.

01 Dec 1968
TL;DR: The REQUEST document retrieval system is described, including a description of the data base, the interactive mode, the modular implementation in ISL (a string-manipulating language), and the multiple-level Boolean hierarchy query format.
Abstract: The REQUEST document retrieval system is described. Included is a description of the data base, the interactive mode, the modular implementation in ISL (a string-manipulating language), and the multiple-level Boolean hierarchy query format. (Author)

01 Aug 1968
TL;DR: Contents: Previous processors and some problems; Processor description; Data and programming formats; Programs and examples.
Abstract: Contents: Previous processors and some problems; Processor description; Data and programming formats; Programs and examples.

Journal ArticleDOI
TL;DR: The system is designed primarily for company reports but may include any documentary input, and computer programs comprise the Combined File Search system (IBM 10.3.047) with several interface programs included.
Abstract: The system is designed primarily for company reports but may include any documentary input. The following steps are involved in input: (1) Documents are scanned, selected, and subdivided (where necessary); (2) on a worksheet are entered a bibliographic description, a store address, and index descriptors and subdescriptors; index entries are chosen with the aid of a GAF Thesaurus structured to control synonymity, generic and other conceptual relationships; (3) document and worksheet are microfilmed for storage; (4) the worksheet contents are keyboarded for machine-readable records, first on tape for IBM 1401, in the future on disc for a third-generation computer. The input is punched cards at present, but more sophisticated input devices are under study. Computer programs comprise the Combined File Search (CFS) system (IBM 10.3.047) with several interface programs included. Document retrieval involves the following steps: (1) An inquirer from any one of the Company's units presents a Search Request Form using both narrative and descriptors chosen with the aid of the GAF Thesaurus; (2) the question is transformed at the center into a set of search descriptors and subdescriptors in appropriate logic statements involving AND, OR, and AND NOT logic, with the option of linking descriptors or truncating them to get word stems; the program has the option of printing out all or part of the master records for documents retrieved; (3) the inquirer reviews the index output and selects relevant addresses; (4) the addressed reports are located in the microfilm store at his location, reviewed, and prints made where needed.
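
The search logic in retrieval step (2) above (AND, OR, and AND NOT over descriptors, with optional truncation to word stems) can be sketched as follows. This is not the Combined File Search program, whose interface the abstract does not describe. The trailing `*` for truncation, the function names, the store addresses, and the descriptors are all assumptions made for illustration, and descriptor linking and subdescriptors are not modeled.

```python
def matches(term, descriptors):
    """True if `term` matches one of a record's descriptors.

    A term ending in '*' is treated as truncated and matches any
    descriptor that starts with the stem (hypothetical notation).
    """
    if term.endswith("*"):
        stem = term[:-1]
        return any(d.startswith(stem) for d in descriptors)
    return term in descriptors


def search(records, all_of=(), any_of=(), none_of=()):
    """Return store addresses of records satisfying the logic statement.

    all_of  -> AND: every term must match
    any_of  -> OR: at least one term must match (skipped if empty)
    none_of -> AND NOT: no term may match
    """
    hits = []
    for address, descriptors in records.items():
        if not all(matches(t, descriptors) for t in all_of):
            continue
        if any_of and not any(matches(t, descriptors) for t in any_of):
            continue
        if any(matches(t, descriptors) for t in none_of):
            continue
        hits.append(address)
    return sorted(hits)


# Made-up master records: store address -> index descriptors.
records = {
    "A-001": {"polymerization", "catalysts", "vinyl ethers"},
    "A-002": {"polymers", "dyes"},
    "A-003": {"catalysts", "dyes", "pigments"},
}

print(search(records, all_of=["polym*"], none_of=["dyes"]))  # ['A-001']
print(search(records, any_of=["dyes", "pigments"]))          # ['A-002', 'A-003']
```

The returned addresses stand in for the index output that the inquirer reviews in step (3) before pulling reports from the microfilm store.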