Showing papers on "Document retrieval" published in 1985


Journal Article
TL;DR: An evaluation of a large, operational full-text document-retrieval system shows the system to be retrieving less than 20 percent of the documents relevant to a particular search.
Abstract: An evaluation of a large, operational full-text document-retrieval system (containing roughly 350,000 pages of text) shows the system to be retrieving less than 20 percent of the documents relevant to a particular search. The findings are discussed in terms of the theory and practice of full-text document retrieval.

871 citations
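
The "less than 20 percent" figure above is a recall measurement: the share of all relevant documents that a search actually returns. A minimal sketch of that calculation follows, with precision alongside for contrast. The document IDs are made up for illustration and are not data from the study.

```python
def recall(retrieved: set, relevant: set) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when no documents are relevant")
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set, relevant: set) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        raise ValueError("precision is undefined when nothing was retrieved")
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical search: 10 documents are relevant, 4 are returned,
# and only 2 of the returned documents are relevant.
relevant_docs = set(range(1, 11))
retrieved_docs = {1, 2, 50, 51}

print(f"recall    = {recall(retrieved_docs, relevant_docs):.2f}")
print(f"precision = {precision(retrieved_docs, relevant_docs):.2f}")
```

A search can look acceptable on precision while still missing most of the relevant material, which is why the two measures are reported separately.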


Journal Article
TL;DR: This paper compares text retrieval methods intended for office systems with methods from database systems and from information retrieval systems, and examines the most interesting representatives of each class.
Abstract: This paper compares text retrieval methods intended for office systems. The operational requirements of the office environment are discussed, and retrieval methods from database systems and from information retrieval systems are examined. We classify these methods and examine the most interesting representatives of each class. Attempts to speed up retrieval with special purpose hardware are also presented, and issues such as approximate string matching and compression are discussed. A qualitative comparison of the examined methods is presented. The signature file method is discussed in more detail.

375 citations
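
The signature file method mentioned in this abstract can be illustrated with superimposed coding: each word sets a few bits in a fixed-width signature, a document's signature is the bitwise OR of its word signatures, and a query is first checked against signatures before the text itself is examined. The sketch below is one possible toy version, not the paper's design; the signature width, the number of bits per word, and the SHA-256-based hash are all assumptions chosen for illustration.

```python
import hashlib

SIGNATURE_BITS = 64   # assumed signature width
BITS_PER_WORD = 3     # assumed number of bit positions each word sets


def word_signature(word: str) -> int:
    """Hash a word to a bit pattern with up to BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % SIGNATURE_BITS
        sig |= 1 << position
    return sig


def text_signature(text: str) -> int:
    """Superimpose (bitwise OR) the signatures of all words in the text."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def build_signature_file(documents):
    """Precompute one signature per document."""
    return [text_signature(doc) for doc in documents]


def search(query, documents, signatures):
    """Return indexes of documents that contain every query word."""
    query_sig = text_signature(query)
    query_words = {w.lower() for w in query.split()}
    hits = []
    for index, doc_sig in enumerate(signatures):
        # Cheap filter: every query bit must be set in the document signature.
        if (doc_sig & query_sig) != query_sig:
            continue
        # The filter can pass non-matching documents (false drops),
        # so confirm against the actual text.
        if query_words <= {w.lower() for w in documents[index].split()}:
            hits.append(index)
    return hits


docs = [
    "signature files for text retrieval",
    "inverted file search",
    "office systems and text retrieval",
]
sigs = build_signature_file(docs)
print(search("text retrieval", docs, sigs))
```

The signature test never rejects a document that really contains the query words, so the only cost of a collision is an extra text check.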


Journal Article
TL;DR: Initial experiments indicate that a RUBRIC rule set better matches human retrieval judgment than a standard Boolean keyword expression, given equal amounts of effort in defining each.
Abstract: A research prototype software system for conceptual information retrieval has been developed. The goal of the system, called RUBRIC, is to provide more automated and relevant access to unformatted textual databases. The approach is to use production rules from artificial intelligence to define a hierarchy of retrieval subtopics, with fuzzy context expressions and specific word phrases at the bottom. RUBRIC allows the definition of detailed queries starting at a conceptual level, partial matching of a query and a document, selection of only the highest ranked documents for presentation to the user, and detailed explanation of how and why a particular document was selected. Initial experiments indicate that a RUBRIC rule set better matches human retrieval judgment than a standard Boolean keyword expression, given equal amounts of effort in defining each. The techniques presented may be useful in stand-alone retrieval systems, front-ends to existing information retrieval systems, or real-time document filtering and routing.

149 citations
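
The idea of a rule hierarchy with word phrases at the bottom and partial matching can be sketched as a small tree evaluator. This is only an illustration under stated assumptions, not RUBRIC's actual rule language: the class names, the use of min for "and" and max for "or", the multiplicative rule weights, and the example topic are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """Leaf node: scores 1.0 if the phrase occurs in the document, else 0.0."""
    text: str

    def score(self, document: str) -> float:
        return 1.0 if self.text.lower() in document.lower() else 0.0


@dataclass
class Rule:
    """Internal node: combines child scores and scales them by a weight."""
    name: str
    children: list
    combine: str = "or"   # "or" takes the max of the children, "and" the min
    weight: float = 1.0   # assumed confidence attached to the rule

    def score(self, document: str) -> float:
        scores = [child.score(document) for child in self.children]
        combined = max(scores) if self.combine == "or" else min(scores)
        return self.weight * combined


# Hypothetical topic: evaluations of full-text retrieval systems.
full_text = Rule("full_text", [Phrase("full-text"), Phrase("full text")], "or", 0.9)
evaluation = Rule("evaluation", [Phrase("recall"), Phrase("precision")], "or", 0.7)
topic = Rule(
    "full_text_evaluation",
    [
        Rule("both_subtopics", [full_text, evaluation], "and", 1.0),
        Rule("full_text_only", [full_text], "or", 0.4),  # weaker partial match
    ],
    "or",
)


def rank(documents, rule, top_k=2):
    """Return the top_k (score, document) pairs with a non-zero score."""
    scored = sorted(((rule.score(d), d) for d in documents), reverse=True)
    return [(s, d) for s, d in scored[:top_k] if s > 0]


docs = [
    "Recall of a full-text system",
    "A full-text database",
    "Office filing practice",
]
for score, doc in rank(docs, topic):
    print(f"{score:.2f}  {doc}")
```

Because every node returns a graded score rather than true or false, a document that satisfies only part of the hierarchy still receives a lower, non-zero rank instead of being dropped.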


Journal Article
TL;DR: Observation and analysis of about ninety searches resulted in a list of eighteen operational moves, or modifications of query formulation, that keep the meaning of query components unchanged, and twelve conceptual moves which change the meaning of query components.
Abstract: Moves, or changes in query formulation, are made to resolve three problem situations: (1) when retrieved sets are too large; (2) when they are too small; or (3) when retrieved sets are off‐target. Observation and analysis of about ninety searches resulted in a list of eighteen operational moves, or modifications of query formulation, that keep the meaning of query components unchanged, and twelve conceptual moves which change the meaning of query components. All these moves are explained and then related to search tactics and strategies.

141 citations


01 Oct 1985
TL;DR: The main goal of this thesis is to compare clustered file searches and inverted file searches in order to determine under what circumstances one search is to be preferred over the other.
Abstract: The major component of a document retrieval system is the component that searches the document collection and selects the documents to be returned in response to a query. Since users wait for the results of the search, the component must be efficient as well as effective. The main goal of this thesis is to compare clustered file searches and inverted file searches in order to determine under what circumstances one search is to be preferred over the other. A preliminary goal is to define a good cluster search. Three types of agglomerative clustering strategies, the single link, the complete link, and the group average link methods, are investigated. Searches of the single link hierarchy, the cluster hierarchy used extensively in previous research, are shown to be inferior to searches of the other hierarchy types. Searches of the group average link and complete link hierarchies perform similarly for small collections; for larger collections, searches of the complete link hierarchy are more effective. A top-down search of the group average link hierarchy is the most time efficient search asymptotically. The experimental evidence suggests that the difference in the efficiency and effectiveness of the complete link and group average link searches is due to the restricted depth of the complete link hierarchy. The depth of the group average link hierarchy increases as the size of the collection increases, but the depth of the complete link hierarchy does not. Thus the largest clusters in the complete link hierarchy are not very large, and the clusters can be accurately represented by centroids. Since the depth of the hierarchy does not increase with collection size, searches of the complete link hierarchy should remain effective for larger collections. The top-down search of the complete link hierarchy is somewhat more effective than the inverted file search. 
The relative efficiency of the two searches depends on the relative efficiency of accessing a page and computing a similarity, since the cluster search accesses many more pages but computes fewer similarities than the inverted file search. For an inexpensive similarity measure, the inverted file search is much more efficient.

131 citations
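
The three agglomerative strategies named in this abstract differ only in how the distance between two clusters is defined. The sketch below is a deliberately naive illustration of that difference and not the thesis's implementation: it works on a precomputed distance matrix (the thesis speaks of similarities), the function names are hypothetical, and the four points on a line are toy data.

```python
from itertools import combinations


def cluster_distance(a, b, dist, method):
    """Distance between clusters a and b (lists of item indexes)."""
    pair_distances = [dist[i][j] for i in a for j in b]
    if method == "single":      # closest cross-cluster pair
        return min(pair_distances)
    if method == "complete":    # farthest cross-cluster pair
        return max(pair_distances)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown method: {method}")


def agglomerate(dist, method):
    """Repeatedly merge the two closest clusters; return each merged cluster."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], dist, method),
        )
        clusters.remove(a)
        clusters.remove(b)
        merged = sorted(a + b)
        clusters.append(merged)
        merges.append(merged)
    return merges


# Toy data: four items placed on a line.
points = [0, 10, 26, 44]
dist = [[abs(p - q) for q in points] for p in points]

for method in ("single", "complete", "average"):
    print(method, agglomerate(dist, method))
```

On this data the single link method chains item 2 onto the first cluster, while the complete link and group average methods pair items 2 and 3 first, which gives a shallower and more balanced hierarchy.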



Journal Article
TL;DR: Minor adjustments have been made for the display of full text databases, allowing words resulting in retrieval to be displayed in context; but changes have not been made in retrieval techniques.
Abstract: Complete texts of many journals are now available for online searching. Most of these full text databases have been made available on the same or similar search systems that provide access to bibliographic information. The systems use inverted files that retain limited context information (e.g., paragraphs and location of words within paragraphs). The retrieval techniques used are simply those that were developed earlier for bibliographic databases. Retrieval relies on Boolean logic, word stem searching with truncation, and word proximity specification. Minor adjustments have been made for the display of full text databases, allowing words resulting in retrieval to be displayed in context; but changes have not been made in retrieval techniques. This is due to the reliance on search systems that provide access to many types of databases, all of which are by‐products of improved techniques for creating printed publications.

67 citations


Journal Article
TL;DR: This article presents the principle, procedures and rules which are utilized in the expert system, which assists users in selecting the right vocabulary terms for a database search.
Abstract: An expert system was developed in the area of information retrieval, with the objective of performing the job of an information specialist, who assists users in selecting the right vocabulary terms for a database search. The system is composed of two components: One is the knowledge base, represented as a semantic network, in which the nodes are words, concepts, phrases, comprising a vocabulary of the application area and the links express semantic relationships between those nodes. The second component is the rules, or procedures, which operate upon the knowledge-base, analogous to the decision rules or work patterns of the information specialist. Two major stages comprise the consulting process of the system: During the “search” stage relevant knowledge in the semantic network is activated, and search and evaluation rules are applied in order to find appropriate vocabulary terms to represent the user's problem. During the “suggest” stage those terms are further evaluated, dynamically rank-ordered according to relevancy, and suggested to the user. Explanations to the findings can be provided by the system and backtracking is possible in order to find alternatives in case some suggested term is rejected by the user. This article presents the principle, procedures and rules which are utilized in the expert system.

62 citations


Proceedings Article
05 Jun 1985
TL;DR: The prototype system, called RUBRIC, is designed to help IR professionals gain easy access to large unformatted full text databases and can give significant improvements over commercially available Boolean keyword systems such as DIALOG, LEXIS, and MEDLARS.
Abstract: This paper describes an ongoing investigation into the application of ideas from Artificial Intelligence (AI) in the development of a computer-based aid for Information Retrieval (IR). The prototype system, called RUBRIC, is designed to help IR professionals gain easy access to large unformatted full text databases. Knowledge about retrieval requests is encoded in RUBRIC as a collection of rules with attached uncertainty values. This representation provides an appropriately expressive query language that can represent partial relevance and which is easily understood and modified. When coupled with an effective user interface, the rule-based approach can, we believe, give significant improvements over commercially available Boolean keyword systems such as DIALOG, LEXIS, and MEDLARS. At the same time, it avoids the theoretical and computational problems associated with full scale natural language processing of documents (e.g., as proposed by Lebowitz [1]), and the difficulties users have in understanding the mechanisms used in statistical approaches (e.g., Salton's SMART system

35 citations


Proceedings Article
21 Aug 1985
TL;DR: A modeling approach based on the type definition and the use of types in query formulation and processing is presented, allowing the definition of types at different levels of detail (type hierarchies).
Abstract: The problem of the retrieval by content of office documents is addressed here. However, the retrieval by content is greatly enhanced if the semantic role of document objects can be described. For this reason we introduce a conceptual level of modeling resulting in the definition of conceptual structures of documents. Type definition is essential for the retrieval, but since office document structures tend to greatly differ from instance to instance, we introduce the concept of weak type, allowing the definition of types at different levels of detail (type hierarchies). In this paper a modeling approach based on these ideas is presented. Particular emphasis is put on the type definition and the use of types in query formulation and processing.

33 citations


Journal Article
15 Nov 1985-JAMA
TL;DR: A pilot test of a full-text, medical literature retrieval service demonstrated its capabilities for on-line search and retrieval of references, abstracts, and/or full-text journal articles for medical practice, medical education, and research.
Abstract: A pilot test of a full-text, medical literature retrieval service demonstrated its capabilities for on-line search and retrieval of references, abstracts, and/or full-text journal articles. During a three-month test period, more than 500 health care professionals conducted 9,377 searches using computer terminals located in seven different health care sites. Searches were initiated for purposes of patient care, medical education, research, or for browsing. The majority of responders to a questionnaire given during the test period said they would continue to use the service during the pilot test, and only about 1% reported the search process difficult to use or not "user-friendly". It is predictable that with a comprehensive data base, full-text medical literature retrieval can be very useful for medical practice, medical education, and research. (JAMA 1985;254:2768-2774)

Journal Article
TL;DR: The comparison shows that certain features of a database system can have a significant effect on the efficiency of the implementation, and it appears that a database implementation of a sophisticated document retrieval system can be competitive with a stand-alone implementation.

Journal Article
TL;DR: The method described solves the problems that Radecki, who uses lambda-level fuzzy sets, encountered when trying to reduce time, and it gives rise to a well-functioning information retrieval system, as the examples show.

Journal Article
Ron Sacks-Davis
TL;DR: It is shown that the method performs well on queries and is efficient in its use of storage, and experimental results based on the use of this method are presented.

Proceedings Article
Edward A. Fox
05 Jun 1985
TL;DR: It is of interest to consider how Boolean logic systems can be extended to give better performance, especially with composite documents, and to integrate those approaches with vector methods.
Abstract: Experimental information retrieval (IR) systems, some dating back to the sixties, have demonstrated the viability of fully automatic document storage and retrieval methodologies with small to medium size bibliographic collections [72]. Many of these experimental systems utilize the vector space model in which each important term (such as a word stem) identifies a different dimension in a space, so that matrix methods and vector operations can be defined on queries and documents. Statistical techniques have been very effective, and probabilistic enhancements have given additional improvements [84]. However, the basic vector space model is oriented towards recording the essential information in the text of a title/abstract combination rather than describing more complex document structures. It is necessary to extend the model in order to handle composite documents. On the other hand, commonly available retrieval systems that employ Boolean logic queries and utilize inverted file storage schemes can without modification accommodate such documents, albeit with somewhat less effectiveness than is possible with more sophisticated systems. Hence, it is also of interest to consider how Boolean logic systems can be extended to give better performance, especially with composite documents, and to integrate those approaches with vector methods.
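
The basic vector space model described in this abstract treats each term as a dimension and compares queries with documents by vector operations. A minimal sketch follows, assuming raw lowercase word counts as term weights and cosine similarity as the comparison; stemming, the term weighting used in the experimental systems, and the extensions for composite documents are not modeled, and the function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Represent text as raw term counts; each distinct term is a dimension."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, documents: list) -> list:
    """Return (score, index) pairs sorted from best to worst match."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(d)), i) for i, d in enumerate(documents)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


docs = [
    "boolean query logic",
    "vector space model",
    "vector methods and boolean logic",
]
for score, index in rank("vector space", docs):
    print(f"{score:.3f}  {docs[index]}")
```

Unlike a Boolean query, this ranking gives partial credit: a document that shares only one of the two query terms still scores above a document that shares none.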

Book Chapter
01 Jan 1985
TL;DR: Various approaches to text retrieval machines for large text databases are surveyed and designs for multiple response resolution, an important but often ignored issue in associative memory and processors, are reviewed.
Abstract: Various approaches to text retrieval machines for large text databases are surveyed. Signature processors for supporting superimposed coding are first described. Text processors for pattern matching are then categorized and discussed. Finally, various designs for multiple response resolution, an important but often ignored issue in associative memory and processors, are reviewed.

Journal Article
TL;DR: The history of separate online system interfaces, leading to efforts to develop expert systems for searching databases, particularly for end users, is reviewed and the research in such expert systems is introduced.
Abstract: This paper reviews the history of separate online system interfaces, leading to efforts to develop expert systems for searching databases, particularly for end users, and introduces the research in such expert systems. Appended is a bibliography of sources on interfaces and expert systems for online retrieval.



Journal Article
TL;DR: Results of a series of five experiments on human information seeking behavior in three different information seeking environments are presented and a conceptual model of how humans value information is presented.
Abstract: The paper presents and synthesizes results of a series of five experiments on human information seeking behavior in three different information seeking environments. The first three experiments utilized a highly-controlled, simulated information seeking task developed to study human search strategies in citation networks. Emphasis in the fourth and fifth experiments was placed on assessing the value of information for humans in realistic search environments. Subjects search on a topic of their own choice in a data base of fiction in Experiment Four and a data base of technical literature in Experiment Five. After summarizing the experimental results, a conceptual model of how humans value information is presented. The model is then used as a basis for a broad interpretation of the empirical results. Implications of both the empirical and modeling results are considered for the areas of information retrieval logic, system flexibility, retrieval methods, types of aiding, online estimation of information value, and computerizing versus computer-aiding.

Patent
23 Aug 1985
TL;DR: The character filing system consists of a control subsystem 100 that controls the whole system and provides a data base function, an input subsystem 200, a document recognizing device 300 for recognizing a document, a text search system 400 for executing high-speed text search, and a terminal subsystem 800 for executing retrieval.
Abstract: PURPOSE: To provide the titled system with a full text searching function that retrieves a document by referring directly to its text, by simultaneously executing the storage of text data and the retrieval of a character string from the read text data. CONSTITUTION: The character filing system consists of a control subsystem 100 that controls the whole system and provides a data base function, an input subsystem 200, a document recognizing device 300 for recognizing a document, a text search system 400 for executing high-speed text search, and a terminal subsystem 800 for executing retrieval. The text data read out from text files 451-453 are applied to the device 300 and the parallel search of character strings is executed. COPYRIGHT: (C)1987,JPO&Japio

Journal Article
TL;DR: PSI currently provides a common command language for access to multiple document retrieval systems and it is shown that PSI could be extended to provide this same command language to access DBMS, whether the DBMS are relational or network.
Abstract: Due to their ready availability, database management systems are being applied to bibliographic databases with increasing frequency. This is being done in spite of the fact that although DBMS query languages tend to be very powerful, they are far too complex for the casual user. It is proposed that PSI, an existing virtual-system intermediary for document retrieval systems, be extended to include access to DBMS containing bibliographic data in order to circumvent the complexity problem for the casual user. PSI currently provides a common command language for access to multiple document retrieval systems. It is shown that PSI could be extended to provide this same command language to access DBMS, whether the DBMS are relational or network.

01 Jan 1985
TL;DR: Two elements make the idea of automatic full-text retrieval even more attractive: digital technology continues to provide computers that are larger, faster, cheaper, more reliable, and easier to use; and, on the other hand, full-text retrieval avoids the risks of human error.
Abstract: Document retrieval is the problem of finding stored documents that contain useful information. There exist a set of documents on a range of topics, written by different authors, at different times, and at varying levels of depth, detail, clarity, and precision, and a set of individuals who, at different times and for different reasons, search for recorded information that may be contained in some of the documents in this set. In each instance in which an individual seeks information, he or she will find some documents of the set useful and other documents not useful; the documents found useful are, we say, relevant; the others, not relevant. How should a collection of documents be organized so that a person can find all and only the relevant items? One answer is automatic full-text retrieval, which on its surface is disarmingly simple: Store the full text of all documents in the collection on a computer so that every character of every word in every sentence of every document can be located by the machine. Then, when a person wants information from that stored collection, the computer is instructed to search for all documents containing certain specified words and word combinations, which the user has specified. Two elements make the idea of automatic full-text retrieval even more attractive. On the one hand, digital technology continues to provide computers that are larger, faster, cheaper, more reliable, and easier to use; and, on the other hand, full-text retrieval avoids the

Journal Article
TL;DR: Proposals are made for practical approaches to the design of electronic office systems to provide for the effective storage and retrieval of the documents which they generate.
Abstract: Electronic office systems involving the creation, transmission and storage of documents are now being installed with fifty or more user stations. Little provision is yet being made for the filing and retrieval of documents held within the systems, and problems common in conventional filing and registry practice will be at least as difficult in the electronic office. Proposals are made for practical approaches to the design of electronic office systems to provide for the effective storage and retrieval of the documents which they generate.

Journal Article
01 Jan 1985-Database
TL;DR: Some databases allow searching by the names of individuals, organizations, systems, etc.; examples of this type of search and of the advantage it offers are given.
Abstract: Some databases allow searching by the names of individuals, organizations, systems, etc. Examples of this type of search, and of the advantage it offers, are given.

Proceedings Article
05 Jun 1985
TL;DR: The background of URSA and its structure is discussed, with particular emphasis on the features that make it a good testbed for information retrieval techniques.
Abstract: The Utah Retrieval System Architecture provides an excellent testbed for the development and testing of new algorithms or techniques for information retrieval. URSA™ is a message-based structure capable of running on a variety of system configurations, ranging from a single mainframe processor to a system distributed across a number of dissimilar processors. It can readily support a variety of specialized backend processors, such as high-speed search engines. The architecture divides the components of a text retrieval system into two classes: servers and clients. A triple of servers (index, search, and document access) for each database provide the capabilities normally associated with a retrieval system. Possible clients for these servers include a window-based user interface, whose query language can be easily modified, a connection to a mainframe host processor, or AI-based query modification programs that wish to use the database. Any module in the system can be replaced by a new module using a different algorithm as long as the new module complies with the message formats for that function. In fact, with some care this module switch can occur while the system is running, without affecting the users. A monitor program collects statistics on all system messages, giving information regarding query complexity, processing time for each module, queueing times, and bandwidths between every module. This paper discusses the background of URSA and its structure, with particular emphasis on the features that make it a good testbed for information retrieval techniques.

Journal Article
TL;DR: The evolution of the Washington University School of Medicine BACS integrated library system toward information management functions is outlined and it is argued that libraries are flexible institutions that are likely to enlarge rather than to diminish.
Abstract: The evolution of the Washington University School of Medicine BACS integrated library system toward information management functions is outlined. The creation of a machine-readable database and its extension through telecommunications have consequences that reach beyond the functions of the library as we have perceived them. It is argued that libraries are flexible institutions that, with automation, are likely to enlarge rather than to diminish.


01 Jan 1985
TL;DR: A driving device for a piezoelectric element for electrically driving a piezoelectric element to obtain a predetermined mechanical displacement is provided.
Abstract: A driving device for a piezoelectric element for electrically driving a piezoelectric element to obtain a predetermined mechanical displacement, the piezoelectric element driving device being provided with a transformer having a primary winding and a secondary winding, the secondary winding being connected to the piezoelectric element; a switching element connected to the primary winding and controlling the amount of energy stored in an air-gap of the core of the transformer; and an energy control means for driving the switching element so that the amount of energy stored in the air-gap of the core of the transformer becomes a set value, the displacement of the piezoelectric element being controlled by the amount of energy stored in the air-gap of the core of the transformer.