
Showing papers on "Document retrieval" published in 1984


Journal ArticleDOI
TL;DR: An analysis of the variables that affect, to a greater or lesser degree, the effectiveness of online searching: indexing policy, search strategy, database size, coverage of specialized fields, etc.
Abstract: An analysis of the variables that affect, to a greater or lesser degree, the effectiveness of online searching: indexing policy, search strategy, database size, coverage of specialized fields, etc.

114 citations


Proceedings ArticleDOI
02 Jul 1984
TL;DR: This paper investigates the use of adaptive mechanisms to control the selection of search strategies and indicates that, although an adaptive mechanism is capable of learning the appropriate response in simple situations, there are serious problems with using them to make complex decisions in a document retrieval system.
Abstract: A document retrieval system can incorporate many types of flexibility. One example of this is the ability to choose a search strategy that is appropriate for a particular user and query. This paper investigates the use of adaptive mechanisms to control the selection of search strategies. The experimental results indicate that, although an adaptive mechanism is capable of learning the appropriate response in simple situations, there are serious problems with using them to make complex decisions in a document retrieval system.

38 citations


Journal ArticleDOI
TL;DR: The logical design and implementation of document retrieval systems have lagged behind other fields of information technology in large part because the fundamental issues or problems are often unclear.
Abstract: The computerized retrieval of documents or texts from large databases is an area of increasing concern for those who design or use information management systems. The explosive growth of word processing and electronic mail systems is creating document databases of substantial size that will require sophisticated retrieval systems if users are to have satisfactory access. Unfortunately, document retrieval design has been the poor stepchild of the computer revolution. Large-scale document retrieval system evaluations are few and often inconclusive, and commercial implementations of any but the most simple logical designs are almost unheard of. Popularized retrieval systems such as LEXIS, DIALOG, ORBIT, STAIRS, and SPIRES are based on logical retrieval designs that date back to the 1950s. The logical design and implementation of document retrieval systems have lagged behind other fields of information technology in large part because the fundamental issues or problems are often unclear. Commercially developed document retrieval systems have frequently treated document retrieval as merely a variant of data retrieval, assuming that advances in data retrieval technology will automatically translate into

36 citations


Journal ArticleDOI
TL;DR: The modern history of the information retrieval thesaurus may be dated from 1947, but at this early stage a confusion of terminology and purpose was evident and the situation was only partly clarified by the emergence of the first fully operational information retrieval thesaurus in 1959.
Abstract: The modern history of the information retrieval thesaurus may be dated from 1947. Even at this early stage a confusion of terminology and purpose was evident. The situation was only partly clarified by the emergence of the first fully operational information retrieval thesaurus in 1959. The intervening period produced a number of theoretical and practical contributions which shaped the thesaurus concept for operational use. The major ideas and influences of this period are examined and related.

30 citations


Journal ArticleDOI
TL;DR: The relationships among the interview techniques used by search analysts, the amount of new information gained by the user as a result of the search, and the user's ultimate satisfaction with the quality of the items retrieved are explored.
Abstract: The purpose of this research study was to undertake a systematic investigation into the relationships among: (1) the techniques used by search analysts during preliminary interviews with users before engaging in online retrieval of bibliographic citations; (2) the amount of new information gained by the user as a result of the search; and (3) the user's ultimate satisfaction with the quality of the items retrieved. A series of controlled experiments were conducted to explore the effects of two interview techniques: the conscious use of “open” and “closed” questions and the use of pauses of different lengths by the search analyst during the online negotiation interview. Data were collected on various aspects of the user's need for information, the value he/she placed upon new knowledge, and the consequences of inadequate information. The analytical technique used was path analysis. While search analysts displayed no difficulty in asking open and closed questions, they found considerable difficulty in controlling the lengths of pauses. Among the findings were the following: the asking of open and closed questions had a modest effect on the amount learned by the users; the type of pause did have a significant effect on the amount clients learned; average user satisfaction was higher when open questions were asked; overall satisfaction was lower when moderate pauses were used; those learning most about their topic were, overall, more satisfied than those who learned less; those placing high importance on the information obtained tended to have lower satisfaction scores.

27 citations


Journal ArticleDOI
TL;DR: ANNOD is a retrieval system which combines use of probabilistic, linguistic, and empirical means to rank individual paragraphs of full text for their similarity to natural language queries proposed by users.
Abstract: “A Navigator of Natural Language Organized Data” (ANNOD) is a retrieval system which combines use of probabilistic, linguistic, and empirical means to rank individual paragraphs of full text for their similarity to natural language queries proposed by users. ANNOD includes common word deletion, word root isolation, query expansion by a thesaurus, and application of a complex empirical matching (ranking) algorithm. The Hepatitis Knowledge Base, the text of a prototype information system, was the file used for testing ANNOD. Responses to a series of users' unrestricted natural language queries were evaluated by three testers. Information needed to answer 85 to 95% of the queries was located and displayed in the first few selected paragraphs. It was successful in locating information in both the classified (listed in Table of Contents) and unclassified portions of text. Development of this retrieval system resulted from the complementarity of and interaction between computer science and medical domain expert knowledge. Extension of these techniques to larger knowledge bases is needed to clarify their proper role.

24 citations
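
The ANNOD abstract above names four processing steps: common word deletion, word root isolation, thesaurus-based query expansion, and paragraph ranking. The following is a minimal Python sketch of such a pipeline, not ANNOD's actual algorithm. The stop-word list, the suffix-stripping rule, the thesaurus entries, and the overlap-count scoring are all illustrative assumptions standing in for the "complex empirical matching" the abstract mentions.

```python
import re
from collections import Counter

# Hypothetical resources: a real system would use far larger lists.
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "to", "and", "what", "are", "for", "by"}
THESAURUS = {"liver": ["hepatic"], "jaundice": ["icterus"]}


def root(word):
    """Crude suffix stripping, standing in for word root isolation."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, tokenize, delete common words, and reduce words to roots."""
    words = re.findall(r"[a-z]+", text.lower())
    return [root(w) for w in words if w not in STOP_WORDS]


def expand(query_terms):
    """Add thesaurus synonyms (also reduced to roots) to the query terms."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded.update(root(s) for s in THESAURUS.get(term, []))
    return expanded


def rank_paragraphs(query, paragraphs):
    """Return indices of matching paragraphs, best match first.

    The score is simply the number of distinct query terms present,
    an assumption made for illustration only.
    """
    query_terms = expand(terms(query))
    scored = []
    for index, paragraph in enumerate(paragraphs):
        counts = Counter(terms(paragraph))
        score = sum(1 for t in query_terms if counts[t] > 0)
        scored.append((score, index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scored if score > 0]


paragraphs = [
    "Hepatic damage is common in viral hepatitis.",
    "Vaccines are given in three doses.",
    "Liver enzymes rise early.",
]
print(rank_paragraphs("What causes liver damage?", paragraphs))
```

Because the query is expanded with the synonym "hepatic", the first paragraph matches even though it never uses the word "liver".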


Journal ArticleDOI
TL;DR: The organization of a natural language interface for data retrieval (a “question-answering system”) and some of the approaches being taken to text structuring are outlined.
Abstract: Natural language processing has two primary roles to play in the storage and retrieval of large bodies of information: providing a friendly, easily-learned interface to information retrieval systems, and automatically structuring texts so that their information can be more easily processed and retrieved. This article outlines the organization of a natural language interface for data retrieval (a “question-answering system”) and some of the approaches being taken to text structuring. It closes by describing a few of the research issues in computational linguistics and a possibility for using interactive natural language processing for information acquisition.

19 citations


Journal ArticleDOI
TL;DR: Best match retrieval experiments with three collections of documents and queries show that the DAP is very much more efficient than a conventional mainframe computer in calculating a measure of similarity between a query and each of the documents in a large collection.
Abstract: The ICL Distributed Array Processor, or DAP, is a single instruction stream, multiple data stream computer in which instructions are broadcast for simultaneous execution in each of 4096 processing elements. Although originally developed for numeric computation, the DAP also provides a means for the rapid matching of the term lists representing documents and queries in information retrieval systems, and this paper presents an investigation of the use of the DAP for the parallel searching of large serial files of documents. Best match retrieval experiments with three collections of documents and queries show that the DAP is very much more efficient than a conventional mainframe computer in calculating a measure of similarity between a query and each of the documents in a large collection. It is suggested that the DAP, or machines with similar architectures, could form the basis for interactive bibliographic searching of serial files.

14 citations


Proceedings ArticleDOI
K. L. Kwok
02 Jul 1984
TL;DR: Extension of the concept of cited title terms to citing title terms shows that these two approaches are compatible with the current two competing models of probability of relevance for document retrieval, if a document can also be regarded as a query.
Abstract: The use of cited title terms of a scientific document for automatic indexing is explored. It offers a means of index term selection as well as term relevance weighting, based on author-provided relevance information and Bayes' theorem as in probabilistic retrieval. The latter quantitative consideration leads to a new measure of document-document similarity which is shown to have importance both for initial search and in relevance feedback retrieval, by offering a choice of iterative strategies. Extension of the concept of cited title terms to citing title terms shows that these two approaches are compatible with the current two competing models of probability of relevance for document retrieval (Robertson et al. 1982), if a document can also be regarded as a query. Their term usage may therefore provide the necessary statistics for parameter estimation to test both theories.

13 citations


Journal Article
01 Jan 1984-Online
TL;DR: A survey of 70 user groups in six libraries to identify the types of user assistance offered in the context of searching an automated catalog.
Abstract: A survey of 70 user groups in six libraries to identify the types of user assistance offered in the context of searching an automated catalog.

12 citations


Journal ArticleDOI
TL;DR: Cluster based retrieval experiments based on the generation of hierarchic document classifications, such as those arising from the use of the single linkage clustering method, give results that are comparable in effectiveness with those obtained using the full similarity matrix.
Abstract: Best match search algorithms provide an efficient means of identifying the sets of nearest neighbors for each of the documents in a collection. These sets contain much of the important similarity data contained in a full interdocument similarity matrix and may be used for the generation of hierarchic document classifications, such as those arising from the use of the single linkage clustering method. Cluster based retrieval experiments based upon such classifications are shown to give results that are comparable in effectiveness with those obtained using the full similarity matrix.
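
The idea in the abstract above, building a single linkage classification from nearest-neighbor sets rather than the full similarity matrix, can be sketched as follows. This is an illustrative Python sketch, not the authors' procedure. It relies on the standard fact that the single linkage clusters at a given similarity level are the connected components of the graph linking documents whose similarity reaches that level, and it only follows edges found in each document's nearest-neighbor list. The function names, the value of `k`, and the toy similarity matrix are assumptions.

```python
def nearest_neighbors(sim, k):
    """For each document, list its k most similar other documents."""
    n = len(sim)
    neighbors = {}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (-sim[i][j], j))
        neighbors[i] = others[:k]
    return neighbors


def single_link_clusters(sim, neighbors, threshold):
    """Clusters at one level of the hierarchy, using neighbor edges only."""
    parent = list(range(len(sim)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, neighbor_list in neighbors.items():
        for j in neighbor_list:
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(sim)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy symmetric similarity matrix for four documents (hypothetical values).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
neighbors = nearest_neighbors(sim, k=1)
print(single_link_clusters(sim, neighbors, threshold=0.5))
```

Lowering the threshold step by step and recording the merges would yield the full hierarchy; with small `k` the result approximates what the complete matrix would give.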

Journal Article
01 Jan 1984-Online
TL;DR: How libraries can help end users carry out their online searches effectively.
Abstract: How libraries can help end users carry out their online searches effectively.

Proceedings ArticleDOI
E. Barbi, F. Calvo, C. Perale, F. Sirovich, F. Turini 
TL;DR: The design and implementation of a document retrieval system based on concept extraction from documents are described; concept extraction uses an internal knowledge representation of the subject the documents deal with, and concepts are retrieved from a text by search/inferential algorithms.
Abstract: A document retrieval system is a quite useful tool in office environments, especially for professionals who are interested in keeping up with the state of the art of their field by quickly consulting documents. The paper describes the design and the implementation of a document retrieval system based on concept extraction from documents. Concept extraction is implemented using an internal knowledge representation (semantic network memory) of the subject the documents deal with. Retrieving concepts from a text is performed by search/inferential algorithms. Advantages of such methods are shown by comparing them to keyword extraction. The prototype system implementation is described. This system is currently in an experimental phase. Several test descriptions provide a sketch of the first results.

Journal ArticleDOI
TL;DR: The procedures of selecting a competent expert and the ways of utilizing him for the ordering of documents in the system response to the user's query are described.
Abstract: Introduced into the model of a document retrieval system were users who are represented by their profiles. A separate group among them are experts. The possibilities of taking advantage of experts for retrieving documents are analysed. The procedures of selecting a competent expert and the ways of utilizing him for the ordering of documents in the system response to the user's query are described.



Book
01 Jan 1984
TL;DR: This study sees that communication (feedback) about the queries of inquirers searching for a given document can be incorporated by a retrieval system in order to redescribe that document so that its description matches better those queries.
Abstract: The central problem in document retrieval is that the subject of a document may be described in many different ways and, similarly, different inquirers may express similar information needs by a variety of different queries. This variance makes it difficult to get the "right" documents into the hands of the "right" inquirers, for retrieving a document by means of its subject description depends on that subject description adequately matching an inquirer's query. Document descriptions comprise only one part of a retrieval system, and a "good" document description is one that describes the subject of a document in a way that will match the queries of inquirers who will find that document relevant to their information need. In this study, we see that communication (feedback) about the queries of inquirers searching for a given document can be incorporated by a retrieval system in order to redescribe that document so that its description matches better those queries. An adaptive (genetic) algorithm, responsible for such redescription, achieves two aims: first, it increases the probability of a document's subject description matching a query to which the document is relevant (equivalently, it increases the degree of association between a document and a relevant query); second, the algorithm decreases the probability of a document's subject description matching a query to which the document is not relevant (equivalently, it decreases the degree of association between a document and a non-relevant query). Simulation experiments demonstrate the success of adaptive subject redescription in achieving these aims. The simulation technique, itself, is novel: By establishing a set of queries, (to some of which a document is relevant, the rest of which it is not), and measuring the association between the document's description and each of these queries, we obtain estimates of system recall and fallout without building an actual document collection. 
The method of obtaining such "simulated queries" is described. The simulation technique may help provide a solution to the problem of predicting the performance of a large-scale retrieval system based on its operation in a smaller-scale experimental setting.
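
The evaluation idea in the abstract above, estimating recall and fallout for one document from a set of simulated queries, can be illustrated with a short Python sketch. It does not implement the adaptive (genetic) redescription algorithm itself. The Jaccard coefficient as the association measure, the matching threshold, and all term sets are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Association between two term sets (an assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recall_and_fallout(description, relevant_queries, nonrelevant_queries, threshold):
    """Estimate recall and fallout for one document description.

    recall  = share of relevant queries the description matches
    fallout = share of non-relevant queries the description matches
    """
    matched_relevant = sum(
        jaccard(description, q) >= threshold for q in relevant_queries)
    matched_nonrelevant = sum(
        jaccard(description, q) >= threshold for q in nonrelevant_queries)
    recall = matched_relevant / len(relevant_queries)
    fallout = matched_nonrelevant / len(nonrelevant_queries)
    return recall, fallout


# Hypothetical simulated queries for a single document.
relevant = [{"genetic", "retrieval"}, {"adaptive", "indexing"}]
nonrelevant = [{"chemistry", "spectra"}, {"retrieval", "hardware"}]

before = {"retrieval", "genetic", "adaptive"}
after = {"retrieval", "genetic", "adaptive", "indexing"}  # a redescription

print(recall_and_fallout(before, relevant, nonrelevant, threshold=0.5))
print(recall_and_fallout(after, relevant, nonrelevant, threshold=0.5))
```

In this toy case the redescribed document matches both relevant queries while still matching neither non-relevant one, which is the direction of change the study aims for.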

Proceedings ArticleDOI
02 Jul 1984
TL;DR: The goals and design decisions for the Utah Retrieval System Architecture (URSA) are described, the prototype system's features and limitations are discussed, and the changes that will be made to produce the production version.
Abstract: The Utah Text Retrieval Project addresses a number of areas in information retrieval, including basic system structure, user interfaces integrating information retrieval with word processing, indexing techniques, and the use of specialized backend processors. Although the work on the development of a high-speed text search engine is generally the best known, probably the most exciting aspect of the project is the message-based architecture, which provides an adaptable testbed for information retrieval techniques. It can support a variety of index and search strategies, while instrumenting their performance so that they can be accurately compared in an identical environment. This paper describes the goals and design decisions for the Utah Retrieval System Architecture (URSA). It discusses the prototype system's features and limitations, and the changes that will be made to produce the production version.

Patent
28 Jun 1984
TL;DR: A method is proposed to shorten document retrieval time by checking whether a specific character string can possibly exist in a document and preselecting documents accordingly.
Abstract: PURPOSE: To shorten document retrieval time by judging whether a specific character string can possibly exist in a document and selecting candidate documents in advance. CONSTITUTION: The characters of a document containing kanji (Chinese characters) are encoded and recorded in a file 1 as binary-coded data. At that time, the kanji codes in the document are processed to generate character existence flags 21-24, which are attached to each document and recorded together with it. When a document in the file is retrieved, character existence flags are generated for the search string in the same manner as at recording time, and the character existence flags of each stored document are loaded from the file into memory and compared with them. If the character existence flags of the document include the character existence flags of the search string, the document may contain the specific character string. Only in this case is the document loaded into memory and checked for the specific character string.
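
The prefiltering scheme in the patent abstract above can be sketched in Python as follows. This is a minimal illustration, not the patented encoding. The flag width `FLAG_BITS` and the mapping from a character to a flag bit (code point modulo the width) are assumptions, since the abstract does not say how the flags are derived from the kanji codes.

```python
FLAG_BITS = 64  # hypothetical flag width; the patent's layout is not specified


def char_flags(text):
    """Set one flag bit per character; different characters may share a bit."""
    flags = 0
    for ch in text:
        flags |= 1 << (ord(ch) % FLAG_BITS)
    return flags


def build_index(documents):
    """Record each document together with its character existence flags."""
    return [(char_flags(doc), doc) for doc in documents]


def search(index, needle):
    """Return (positions of documents containing needle, number fully scanned)."""
    needle_flags = char_flags(needle)
    hits = []
    scanned = 0
    for position, (flags, text) in enumerate(index):
        # Skip documents whose flags cannot cover every character of needle.
        if (flags & needle_flags) != needle_flags:
            continue
        scanned += 1  # only now is the full text examined
        if needle in text:
            hits.append(position)
    return hits, scanned


index = build_index(["情報検索システム", "文書処理", "information retrieval"])
hits, scanned = search(index, "検索")
print(hits, scanned)
```

The flags can produce false positives, because two characters may map to the same bit, but never false negatives, so the final substring check is still required for every document that passes the filter.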

Journal ArticleDOI
TL;DR: The integration of online searching into business school curricula is discussed; a model curriculum is proposed, and equipment and cost aspects are addressed.
Abstract: The integration of online searching into business school curricula. A model curriculum is proposed, and equipment and cost aspects are addressed.


Journal ArticleDOI
TL;DR: Simultaneous Remote Searching is a technique for slaving a remote terminal and performing an online bibliographic search at a remote site that is especially valuable for rural libraries, hospital libraries, or offices, where a trained searcher is not available.
Abstract: Simultaneous Remote Searching (SRS) is a technique for slaving a remote terminal and performing an online bibliographic search at a remote site. SRS is especially valuable for rural libraries, hospital libraries, or offices, where a trained searcher is not available. Through SRS, online searches can be available to any terminal or computer that has a modem.

Journal ArticleDOI
TL;DR: This is the seventh annual update of the bibliography that was published as a supplement to the first issue of Online Review.
Abstract: This is the seventh annual update of the bibliography that was published as a supplement to the first issue of Online Review. It is also the first update to the recently published concatenated bibliography containing the original issue plus the first six updates. This update covers the approximate time period late 1982 through early 1984. It contains 931 references.

Journal ArticleDOI
TL;DR: The objectives of the National Air and Space Museum to develop an information system providing for the capture of photographic, lineal and textual documents in high resolution digital images, to convert the digital images of text to standard ASCII computer code, and to retrieve the appropriate code automatically through keyword searching of full text are described.
Abstract: The objectives of the National Air and Space Museum — to develop an information system providing for the capture of photographic, lineal and textual documents in high resolution digital images, to convert the digital images of text to standard ASCII computer code, and to retrieve the appropriate code automatically through keyword searching of full text — are described. System assembly is examined, as are planned and potential applications.




Journal ArticleDOI
TL;DR: In support of an environmental research program in the field of water quality management at the Delft University of Technology, an exhaustive literature search was carried out, both online and manually.
Abstract: In support of an environmental research program in the field of water quality management at the Delft University of Technology, an exhaustive literature search was carried out, both online and manually. For online searching, information was retrieved from the databases Aqualine, Biosis, CA Search and Pascal, using the ESA/Information Retrieval System. For manual searching, professional journals and current awareness services were carefully studied.

Journal ArticleDOI
TL;DR: The results indicate that approximately half of the second queries have substantially different retrieved document sets when document modification is used, and the general conclusion is that document modification can lead to improved information system performance.
Abstract: A method of using relevance judgements to modify document descriptions has previously been proposed. An experiment is described which investigates the effect that application of this method, based on one query, has on a second query. The results indicate that approximately half of the second queries have substantially different retrieved document sets when document modification is used. Of these, about half behave better, and half worse, than originally. If the queries are repeated a second time, the proportion showing improved performance increases to 90% and only 3% still give inferior retrieved document sets. The general conclusion is that document modification can lead to improved information system performance.