
Showing papers on "Document retrieval" published in 1982


Book
01 Jan 1982

101 citations


Journal ArticleDOI
TL;DR: A uniform and efficient approach for processing all query terms, based on a “permuted dictionary” and a corresponding set of access routines, requires essentially one disk access to obtain from the dictionary all the strings represented by a truncated term, with negligible computing time.
Abstract: In a typical inverted-file full-text document retrieval system, the user submits queries consisting of strings of characters combined by various operators. The strings are looked up in a text-dictionary which lists, for each string, all the places in the database at which it occurs. It is desirable to allow the user to include in his query truncated terms such as X∗, ∗X, ∗X∗, or X∗Y, where X and Y are specified strings and ∗ is a variable-length don't-care character, that is, ∗ represents an arbitrary, possibly empty, string. Processing these terms involves finding the set of all words in the dictionary that match these patterns. How to do this efficiently is a long-standing open problem in this domain. In this paper we present a uniform and efficient approach for processing all such query terms. The approach, based on a “permuted dictionary” and a corresponding set of access routines, requires essentially one disk access to obtain from the dictionary all the strings represented by a truncated term, with negligible computing time. It is thus well suited for on-line applications. Implementation is simple, and storage overhead is low: it can be made almost negligible by using some specially adapted compression techniques described in the paper. The basic approach is easily adaptable for slight variants, such as fixed (or bounded) length don't-care characters, or more complex pattern matching templates.

29 citations
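
The truncated-term lookup described in the abstract above can be illustrated with a small sketch. It is not the paper's implementation: the `$` end-of-word marker, the function names, and the in-memory sorted list (standing in for the disk-resident dictionary) are all assumptions made for illustration. The idea shown is that every rotation of each word is stored in sorted order, so that each of the patterns X∗, ∗X, ∗X∗, and X∗Y reduces to a single prefix search.

```python
from bisect import bisect_left

END = "$"  # assumed end-of-word marker; must not occur inside any word


def build_permuted_dictionary(words):
    """Return a sorted list of (rotation, word) pairs, one per rotation of word + END."""
    entries = []
    for word in set(words):
        marked = word + END
        for i in range(len(marked)):
            entries.append((marked[i:] + marked[:i], word))
    entries.sort()
    return entries


def rotate_pattern(pattern):
    """Map a pattern containing '*' to the rotation prefix that must be searched."""
    parts = pattern.split("*")
    if len(parts) == 2:
        # Covers X*, *X and X*Y: move the part after '*' in front of the marker.
        head, tail = parts
        return tail + END + head
    if len(parts) == 3 and parts[0] == "" and parts[2] == "":
        # Covers *X*: any rotation that starts with X.
        return parts[1]
    raise ValueError("unsupported pattern: " + pattern)


def lookup(entries, pattern):
    """Return all dictionary words matching the truncated term, in sorted order."""
    prefix = rotate_pattern(pattern)
    # First entry whose rotation is >= prefix; matches form one contiguous run.
    i = bisect_left(entries, (prefix,))
    matches = set()
    while i < len(entries) and entries[i][0].startswith(prefix):
        matches.add(entries[i][1])
        i += 1
    return sorted(matches)


words = ["retrieval", "retrieve", "relevance", "recall", "precision", "indexing"]
entries = build_permuted_dictionary(words)
print(lookup(entries, "retr*"))  # words beginning with "retr"
print(lookup(entries, "*ing"))   # words ending with "ing"
print(lookup(entries, "*ci*"))   # words containing "ci"
print(lookup(entries, "re*e"))   # words beginning with "re" and ending with "e"
```

Because all matches for a pattern are adjacent in the sorted list, a disk-based version would plausibly need to read only one contiguous region, which is consistent with the "essentially one disk access" claim. The sketch makes no attempt at the compression techniques the abstract mentions.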


Journal ArticleDOI
TL;DR: A closer examination of current theoretical developments and the present practice of information retrieval reveals that a combination of the existing theory of probabilistic retrieval into a practical methodology based on Boolean searches would be very promising.
Abstract: Up to the present a number of theoretical approaches have been developed in order to improve the effectiveness and/or efficiency of information retrieval. However, most of the information retrieval methods based on these approaches have never passed the stage of purely theoretical investigation, and experimentation, if carried out at all, has been very limited in scale. A closer examination of current theoretical developments and the present practice of information retrieval reveals that a combination of the existing theory of probabilistic retrieval into a practical methodology based on Boolean searches would be very promising. Such an extension of the probabilistic approach to information retrieval is outlined.

21 citations


Proceedings ArticleDOI
18 May 1982
TL;DR: The CITE (Current Information Transfer in English) prototype system is a large-scale, weighted logic information retrieval system with natural language query input, ranked search output, dynamic user feedback and automatic associative vocabulary mapping capabilities.
Abstract: Large operational information retrieval systems typically employ inverted file structures and Boolean logic operators for efficient text retrieval. These systems require considerable user training for effective use. As a consequence, searching is commonly performed by professional intermediaries on behalf of end users. By contrast, many small-scale experimental retrieval systems incorporate desirable user interface features, such as natural (English) language querying, ranked output and relevance feedback. The author describes the design and implementation of a natural language search interface to MEDLINE, the National Library of Medicine's largest and most heavily used data base. The CITE (Current Information Transfer in English) prototype system is a large-scale, weighted logic information retrieval system with natural language query input, ranked search output, dynamic user feedback and automatic associative vocabulary mapping capabilities.

20 citations


Proceedings ArticleDOI
18 May 1982

17 citations


Proceedings ArticleDOI
18 May 1982
TL;DR: A software implementation of an 'intelligent terminal' that overcomes many of the objections to non-Boolean retrieval and includes such features as the weighting of search terms, the construction and submission of search statements in the Common Command Language of EURONET, and the use of relevance feedback information to improve retrieval.
Abstract: Research has shown that document retrieval based on weighting functions and incorporating relevance feedback may be more effective than retrieval based on Boolean combinations. These novel methods have not been adopted by the large operational systems. In this paper, a software implementation of an 'intelligent terminal' is described. It overcomes many of the objections to non-Boolean retrieval. It is linked to an operational database via EURONET, and includes such features as the weighting of search terms, the construction and submission of search statements in the Common Command Language of EURONET, and the use of relevance feedback information to improve retrieval.

17 citations
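
The combination of term weighting and relevance feedback mentioned in the abstract above can be sketched as follows. The abstract does not say which weighting function the intelligent terminal uses, so this example substitutes the Robertson/Sparck Jones relevance weight (with the usual 0.5 correction) as an assumed stand-in. The document collection, the function names, and the scoring rule (sum of the weights of matching query terms) are likewise hypothetical.

```python
import math


def relevance_weight(term, docs, relevant_ids):
    """Robertson/Sparck Jones weight of a term, given documents judged relevant so far."""
    N = len(docs)                                        # collection size
    R = len(relevant_ids)                                # documents judged relevant
    n = sum(1 for terms in docs.values() if term in terms)  # documents containing the term
    r = sum(1 for d in relevant_ids if term in docs[d])     # relevant documents containing it
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def rank(query_terms, docs, relevant_ids=()):
    """Rank document ids by the summed weights of the query terms they contain."""
    weights = {t: relevance_weight(t, docs, relevant_ids) for t in query_terms}
    scores = {d: sum(w for t, w in weights.items() if t in terms)
              for d, terms in docs.items()}
    return sorted(docs, key=lambda d: (-scores[d], d))


# Hypothetical toy collection: document id -> set of index terms.
docs = {
    "d1": {"boolean", "retrieval"},
    "d2": {"weighting", "retrieval"},
    "d3": {"weighting", "feedback"},
    "d4": {"boolean"},
}
query = ["weighting", "retrieval"]

# After the user marks d3 as relevant, "weighting" gains weight and
# "retrieval" loses weight, so d3 moves to the top of the ranking.
print(rank(query, docs, relevant_ids=["d3"]))
```

In a setting like the one the abstract describes, the reweighted terms would then be turned into search statements for the host system. That step depends on the Common Command Language and is not attempted here.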


Patent
08 Oct 1982
TL;DR: The arrangement and retrieval of many document files is made easy and efficient by designating items through hierarchically organized attribute tables and performing retrieval with a tree-structured retrieval file.
Abstract: PURPOSE: To realize the arrangement and retrieval of many document files with easy operation in an efficient way, by executing the designation with attribute tables formed in a hierarchy and the retrieval processing with a tree-structured retrieval file. CONSTITUTION: Document managing data, comprising an attribute managing table and attribute data stored on a recording medium, is transferred to a system control section 3 so as to organize a tree-structured retrieval data file for each document attribute, and the file is registered in a retrieval data file section 6. A check item A1 is designated by looking at a list-form document attribute display A of the highest rank on an operation display section 4; the item is retrieved by using the tree-structured file of the retrieval data file section 6, and the list-form document attribute B of the next rank is displayed.

10 citations


Journal ArticleDOI
Gerard Salton1
01 Sep 1982
TL;DR: A new Boolean retrieval environment is outlined in which the queries are automatically constructed from the original natural language query formulations provided by the users, and which produce better retrieval output than conventional retrieval operations based on manually prepared query statements.
Abstract: Conventional information retrieval systems use Boolean query formulations and inverted file technologies for search and retrieval purposes. The need to construct complex Boolean queries in order to obtain the benefit of the existing retrieval operations constitutes a substantial burden for the users. In most environments trained search intermediaries are used to facilitate the communication between system and user. In this note a new Boolean retrieval environment is outlined in which the queries are automatically constructed from the original natural language query formulations provided by the users. Any available Boolean query formulations can also be improved automatically by using the natural language text of previously retrieved documents identified as relevant during previous searches. The automatic queries can be formulated in a standard Boolean system, or in an extended system in which the interpretation of the Boolean operators AND and OR is relaxed. In either case the automatic Boolean manipulations produce better retrieval output than conventional retrieval operations based on manually prepared query statements.

10 citations


Journal ArticleDOI
TL;DR: The design and development of on-line documentation for a minicomputer-based management information system is described; design choices are outlined, on-line documents are compared with paper ones, and human engineering and “software psychology” issues are reviewed.
Abstract: We describe the design and development of on-line documentation for a minicomputer-based management information system. We outline the design choices, compare on-line documents with paper ones, and review human engineering and “software psychology” issues. On-line documents are accessed from any dial-up terminal. Document retrieval shares a common user interface with other information activities like report generation, trouble reporting, and interuser communication. Documents are “modular” with properties that make them easier to create, use, and maintain.

8 citations


Journal ArticleDOI
TL;DR: The procedure is based on the modification method of document search patterns oriented on the users' informational needs not only as expressed in queries formulated but also in their opinions on the particular system answers.
Abstract: Methods of weighting of descriptors in document search patterns are discussed. The concept of relative indexing and the conditions which should be satisfied by the descriptor weight in a document search pattern according to the idea of relative indexing are determined. A relative indexing procedure is outlined and some of the consequences of such an approach are examined in examples. The procedure is based on the modification method of document search patterns oriented on the users' informational needs not only as expressed in queries formulated but also in their opinions on the particular system answers.

7 citations




Journal ArticleDOI
01 Jan 1982
TL;DR: This tutorial will review the basic and improved procedures that have been devised to respond to how automatic indexing can represent the subject content of a document and consider the more radical questions of whether the traditional recall and precision ratios are the best criteria for evaluating the effectiveness of online retrieval systems.
Abstract: SIGIR, from its very onset in the early 1960's, has been concerned with the development of automatic information retrieval systems and with improving the effectiveness of automatic indexing. Automatic indexing is a subsystem, or component, of an automatic information retrieval system. The term "automatic" implies that the process is to be accomplished by a set of computer programs rather than by the intellectual effort of skilled people. The essential questions that need to be answered are how automatic indexing can:
• Adequately represent the subject content of a document;
• Improve recall by increasing the number of relevant documents retrieved;
• Improve precision by decreasing the number of non-relevant documents retrieved.
In this tutorial, I will review the basic and improved procedures that have been devised to respond to each of these questions. Finally, after we have reviewed developments in automatic indexing, we will consider the more radical questions of:
• Whether we can rely exclusively on automatic indexing to achieve adequate retrieval effectiveness in online interactive document retrieval systems, and
• Whether the traditional recall and precision ratios are the best criteria for evaluating the effectiveness of online retrieval systems.
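
For readers unfamiliar with the recall and precision ratios named in the abstract above, a minimal sketch of their standard definitions follows. Recall is the fraction of relevant documents that were retrieved; precision is the fraction of retrieved documents that are relevant. The function name, the document identifiers, and the choice to return 0.0 for an empty set are assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four documents retrieved, three relevant in the collection,
# two of which appear among the retrieved documents.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(precision, recall)
```

Retrieving more documents can only keep recall the same or raise it, but it tends to lower precision. This is why the abstract treats the two as separate goals for automatic indexing.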

Dissertation
01 Jan 1982
TL;DR: This thesis contends that instead of models which assume equal levels of similarity between concepts, the links between the concepts should have values assigned to them to indicate the degree of similarity between the concepts, and that the world model of the system should be structured such that concepts which are related to one another are clustered together.
Abstract: Owing to the rise in the volume of literature, problems arise in the retrieval of required information. Various retrieval strategies have been proposed, but most of them are not flexible enough for their users. Specifically, most of these systems assume that users know exactly what they are looking for before approaching the system, and that users are able to precisely express their information needs according to laid-down specifications. There has, however, been described a retrieval program, THOMAS, which aims at satisfying incompletely defined user needs through a man-machine dialogue which does not require any rigid queries. Unlike most systems, Thomas attempts to satisfy the user's needs from a model which it builds of the user's area of interest. This model is a subset of the program's "world model", a database in the form of a network where the nodes represent concepts. Since various concepts have various degrees of similarity and association, this thesis contends that instead of models which assume equal levels of similarity between concepts, the links between the concepts should have values assigned to them to indicate the degree of similarity between the concepts. Furthermore, the world model of the system should be structured such that concepts which are related to one another are clustered together, so that a user interaction would involve only the relevant clusters rather than the entire database, such clusters being determined by the system, not the user. This thesis also attempts to link the design work with the current notion in psychology centred on the use of the computer to simulate human cognitive processes. In this case, an attempt has been made to model a dialogue between two people: the information seeker and the information expert. The system, called Thomas-II, has been implemented and found to require less effort from the user than Thomas.

Proceedings ArticleDOI
18 May 1982
TL;DR: The main point of this implementation was to demonstrate that an efficient, effective and flexible system can be constructed using modern techniques.
Abstract: The significant advances made in theoretical and experimental research in information retrieval have many implications for system design. One possible design for a document retrieval system based on these advances is presented. A major part of this system design has been implemented as a bibliography filing and retrieval system for the Computer Science department at the University of Massachusetts. The implementation issues considered here are functionality, user interface and file organization. The main point of this implementation was to demonstrate that an efficient, effective and flexible system can be constructed using modern techniques.

Proceedings ArticleDOI
18 May 1982
TL;DR: An information retrieval system FAKYR is described which incorporates a variety of methods for organizing and retrieving information and for evaluating retrieval effectiveness and is a comfortable and large method data base.
Abstract: An information retrieval system FAKYR is described which incorporates a variety of methods for organizing and retrieving information and for evaluating retrieval effectiveness. The system has been developed in order to support education and research in the area of information retrieval. With respect to this purpose FAKYR is a comfortable and large method data base.