Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have appeared within this topic, receiving 214,383 citations.


Papers
Patent
27 May 1993
TL;DR: This patent describes a method of registering document information in a retrieval system that stores large volumes of character data: words are replaced by assigned compressed codes, and queries are matched against the compressed text.
Abstract: A document information compression and retrieval system which reduces the document data amount and shortens the retrieval time when mass document information is registered and retrieved. A method of registering document information in a document information retrieval system which stores document information consisting of a large number of characters for retrieval of the stored document information. Entered document information is separated into words. Whether or not each of the words is a word to which a compressed code is assigned is determined. If not already assigned, a compressed code is assigned to the word. The words are converted into the assigned compressed codes for storing a compressed text. At output, retrieval information is accepted and converted into compressed code and stored compressed texts are searched for the compressed text matching the compressed code of the retrieval information, then the words corresponding to the compressed codes are used to expand the compressed text into original document information.
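
The registration and retrieval steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `CompressedStore`, whitespace tokenization, and the use of sequential integers as "compressed codes" are all assumptions made for illustration.

```python
from typing import Dict, List


class CompressedStore:
    """Toy word-to-code document store (hypothetical; not the patent's design)."""

    def __init__(self) -> None:
        self.code_of: Dict[str, int] = {}   # word -> assigned code
        self.word_of: List[str] = []        # code -> word, used for expansion
        self.texts: List[List[int]] = []    # stored compressed texts

    def _encode_word(self, word: str) -> int:
        # Assign a new code only if the word does not have one yet.
        if word not in self.code_of:
            self.code_of[word] = len(self.word_of)
            self.word_of.append(word)
        return self.code_of[word]

    def register(self, document: str) -> int:
        # Assumption: words are separated by whitespace.
        codes = [self._encode_word(w) for w in document.split()]
        self.texts.append(codes)
        return len(self.texts) - 1

    def expand(self, codes: List[int]) -> str:
        # Original spacing is not preserved; words are rejoined with one space.
        return " ".join(self.word_of[c] for c in codes)

    def search(self, query: str) -> List[str]:
        query_codes: List[int] = []
        for w in query.split():
            if w not in self.code_of:
                return []  # a word with no code cannot occur in any stored text
            query_codes.append(self.code_of[w])
        if not query_codes:
            return []
        n = len(query_codes)
        hits = []
        for codes in self.texts:
            # Match the query as a contiguous run of codes in the compressed text.
            if any(codes[i:i + n] == query_codes
                   for i in range(len(codes) - n + 1)):
                hits.append(self.expand(codes))
        return hits


store = CompressedStore()
store.register("the quick brown fox")
store.register("a lazy dog")
print(store.search("quick brown"))
```

The search never touches the original characters: the query is converted to codes first, and only matching texts are expanded back into words.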

80 citations

Proceedings Article
07 Jul 2001
TL;DR: The results show that even with apparently incomprehensible system output, humans without any knowledge of Tamil can achieve performance rates as high as 86% accuracy for topic identification, 93% recall for document retrieval, and 64% recall on question answering.
Abstract: We report on our experience with building a statistical MT system from scratch, including the creation of a small parallel Tamil-English corpus, and the results of a task-based pilot evaluation of statistical MT systems trained on sets of ca. 1300 and ca. 5000 parallel sentences of Tamil and English data. Our results show that even with apparently incomprehensible system output, humans without any knowledge of Tamil can achieve performance rates as high as 86% accuracy for topic identification, 93% recall for document retrieval, and 64% recall on question answering (plus an additional 14% partially correct answers).

80 citations

Journal Article
TL;DR: In this paper, a method for measuring the similarity between two texts represented as conceptual graphs is presented, based on well-known strategies of text comparison, such as Dice coefficient, with new elements introduced due to the bipartite nature of the conceptual graphs.
Abstract: The use of conceptual graphs for the representation of text contents in information retrieval is discussed. A method for measuring the similarity between two texts represented as conceptual graphs is presented. The method is based on well-known strategies of text comparison, such as Dice coefficient, with new elements introduced due to the bipartite nature of the conceptual graphs. Examples of the representation and comparison of the phrases are given. The structure of an information retrieval system using two-level document representation, traditional keywords and conceptual graphs, is presented.
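
For orientation, the plain Dice coefficient that the abstract names as its starting point can be sketched as follows. This is the ordinary set-based form, 2|A ∩ B| / (|A| + |B|), applied to word sets; it does not include the additional elements the paper introduces for the bipartite structure of conceptual graphs, and the function name is hypothetical.

```python
from typing import Iterable


def dice_coefficient(a: Iterable[str], b: Iterable[str]) -> float:
    """Set-based Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty inputs
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


text_1 = "the cat sat".split()
text_2 = "the cat ran".split()
print(dice_coefficient(text_1, text_2))
```

The value ranges from 0 (no shared items) to 1 (identical sets).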

79 citations

Proceedings Article
01 Dec 1989
TL;DR: A study evaluating how easily enhanced queries can be acquired from users and how effectively this additional knowledge can be used in retrieval indicates that significant effectiveness benefits can be obtained through the acquisition of domain concepts related to query concepts.
Abstract: In some recent experimental document retrieval systems, emphasis has been placed on the acquisition of a detailed model of the information need through interaction with the user. It has been argued that these “enhanced” queries, in combination with relevance feedback, will improve retrieval performance. In this paper, we describe a study with the aim of evaluating how easily enhanced queries can be acquired from users and how effectively this additional knowledge can be used in retrieval. The results indicate that significant effectiveness benefits can be obtained through the acquisition of domain concepts related to query concepts, together with their level of importance to the information need.

79 citations

Journal Article
TL;DR: The ability to ask the questions one needs to ask as the foundation of performance evaluation, and recall and discrimination as the basic quantitative performance measures for binary noninteractive retrieval systems are established.
Abstract: This article presents a logical analysis of the characteristics of indexing and their effects on retrieval performance. It establishes the ability to ask the questions one needs to ask as the foundation of performance evaluation, and recall and discrimination as the basic quantitative performance measures for binary noninteractive retrieval systems. It then defines the characteristics of indexing that affect retrieval—namely, indexing devices, viewpoint-based and importance-based indexing exhaustivity, indexing specificity, indexing correctness, and indexing consistency—and examines in detail their effects on retrieval. It concludes that retrieval performance depends chiefly on the match between indexing and the requirements of the individual query and on the adaptation of the query formulation to the characteristics of the retrieval system, and that the ensuing complexity must be considered in the design and testing of retrieval systems. © 1994 John Wiley & Sons, Inc.
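
The two measures named in this abstract can be illustrated for a single binary retrieval run. Recall is the standard fraction of relevant documents retrieved. Discrimination is taken here to mean the fraction of non-relevant documents correctly rejected; that reading is an assumption, since the abstract does not define the term, and the function name is hypothetical.

```python
from typing import Set, Tuple


def recall_and_discrimination(
    retrieved: Set[int], relevant: Set[int], collection: Set[int]
) -> Tuple[float, float]:
    """Return (recall, discrimination) for one query over a collection."""
    non_relevant = collection - relevant
    # Recall: share of relevant documents that were retrieved.
    recall = len(retrieved & relevant) / len(relevant) if relevant else 0.0
    # Discrimination (assumed definition): share of non-relevant
    # documents that were NOT retrieved.
    discrimination = (
        len(non_relevant - retrieved) / len(non_relevant) if non_relevant else 0.0
    )
    return recall, discrimination


collection = set(range(10))   # ten documents, ids 0..9
relevant = {0, 1, 2, 3}
retrieved = {0, 1, 2, 7}
print(recall_and_discrimination(retrieved, relevant, collection))
```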

78 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111