Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.


Papers
Journal Article
01 Apr 2003
TL;DR: This work demonstrates how the World Wide Web can be mined in a fully automated manner to discover the semantic similarity relationships among the concepts surfaced during an electronic brainstorming session, thereby improving the accuracy of automated clustering of meeting messages.
Abstract: This work demonstrates how the World Wide Web can be mined in a fully automated manner to discover the semantic similarity relationships among the concepts surfaced during an electronic brainstorming session, thereby improving the accuracy of automated clustering of meeting messages. Our novel Context Sensitive Similarity Discovery (CSSD) method takes advantage of the meeting context when selecting a subset of Web pages for data mining, and then conducts regular concept co-occurrence analysis within that subset. Our results have implications for reducing information overload in applications of text technologies such as email filtering, document retrieval, text summarization, and knowledge management.

103 citations
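
The two-step idea in the CSSD abstract above (first select a context-relevant subset of pages, then run co-occurrence analysis inside that subset) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy pages, the whole-word matching, and the use of Jaccard overlap as the co-occurrence measure are all hypothetical and are not taken from the paper.

```python
from typing import List, Set


def select_context_pages(pages: List[str], context_terms: Set[str]) -> List[str]:
    """Keep only pages that mention at least one context term (assumed filter)."""
    return [p for p in pages if context_terms & set(p.lower().split())]


def cooccurrence_similarity(pages: List[str], a: str, b: str) -> float:
    """Jaccard overlap of the page sets containing concept a and concept b.

    Jaccard is an assumption here; the abstract does not name the measure.
    """
    has_a = {i for i, p in enumerate(pages) if a in p.lower().split()}
    has_b = {i for i, p in enumerate(pages) if b in p.lower().split()}
    union = has_a | has_b
    return len(has_a & has_b) / len(union) if union else 0.0


# Toy data, invented for illustration only.
pages = [
    "budget meeting discussed travel costs and hotel costs",
    "travel blog about beaches",
    "meeting notes on hotel booking",
    "meeting agenda for travel and hotel approval",
]

subset = select_context_pages(pages, {"meeting"})
sim_in_context = cooccurrence_similarity(subset, "travel", "hotel")
sim_global = cooccurrence_similarity(pages, "travel", "hotel")
```

In this toy data the similarity of "travel" and "hotel" differs between the context subset and the full collection, which is the kind of effect a context-sensitive selection step is meant to capture.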

Journal Article
TL;DR: In this article, the authors cover the recent research in extending the document retrieval techniques to a broader class of sequence collections and uncover a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.
Abstract: Document retrieval is one of the best-established information retrieval activities since the ’60s, pervading all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current technology is mostly oriented to “natural language” text collections, where inverted indexes are the preferred solution. As successful as this paradigm has been, it fails to properly handle various East Asian languages and other scenarios where the “natural language” assumptions do not hold. In this survey, we cover the recent research in extending the document retrieval techniques to a broader class of sequence collections, which has applications in bioinformatics, data and web mining, chemoinformatics, software engineering, multimedia information retrieval, and many other fields. We focus on the algorithmic aspects of the techniques, uncovering a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.

103 citations

Patent
Morita Tetsuya
05 Oct 1990
TL;DR: A document retrieval system includes an inputting unit for inputting a retrieval condition including one or a plurality of keywords and a weight value for each keyword, and an operating unit having first factors corresponding to relationship values, each relationship value being defined as a degree of the relationship between two keywords out of the keywords predetermined in the document retrieval system, and second factors corresponding to importance values.
Abstract: A document retrieval system includes an inputting unit for inputting a retrieval condition including one or a plurality of keywords and a weight value for each keyword, an operating unit having first factors corresponding to relationship values, each relationship value being defined as a degree of the relationship between two keywords out of keywords which are predetermined in the document retrieval system and second factors corresponding to importance values, each importance value being defined as a degree of importance of a keyword in each one of a plurality of documents which are predetermined in the document retrieval system, the operation unit generating a relevance value, which represents a degree of relevance in satisfying a user's requirement, for each of the documents on the basis of the retrieval condition input from the inputting unit, the first factors and the second factors, and an outputting unit for outputting the relevance value for each of the documents as a retrieval result.

102 citations
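
The patent abstract above names three inputs to the relevance value (query keyword weights, keyword-to-keyword relationship values, and keyword-in-document importance values) but does not state how they are combined. The sketch below assumes one simple combination, a weighted sum of products; the function name, the dictionaries, and the formula itself are hypothetical and should not be read as the patented method.

```python
from typing import Dict, List


def relevance_scores(
    query: Dict[str, float],
    relationship: Dict[str, Dict[str, float]],
    importance: Dict[str, Dict[str, float]],
    documents: List[str],
) -> Dict[str, float]:
    """Score each document for a weighted keyword query.

    Assumed rule (not from the patent): a query keyword q with weight w
    contributes w * relationship[q][k] * importance[k][doc] for every
    related keyword k, and all contributions are summed.
    """
    scores: Dict[str, float] = {}
    for doc in documents:
        total = 0.0
        for q, weight in query.items():
            for k, rel in relationship.get(q, {}).items():
                total += weight * rel * importance.get(k, {}).get(doc, 0.0)
        scores[doc] = total
    return scores


# Toy values, invented for illustration only.
relationship = {
    "retrieval": {"retrieval": 1.0, "search": 0.5},
    "index": {"index": 1.0},
}
importance = {
    "retrieval": {"doc_a": 0.5, "doc_b": 0.0},
    "search": {"doc_a": 0.0, "doc_b": 1.0},
    "index": {"doc_a": 0.25, "doc_b": 0.5},
}
query = {"retrieval": 2.0, "index": 1.0}

scores = relevance_scores(query, relationship, importance, ["doc_a", "doc_b"])
```

Note that doc_b never contains "retrieval" itself but still scores on that query keyword through the related keyword "search", which is the role the relationship values appear to play in the abstract.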

Journal Article
TL;DR: This work looks at the weights from an entirely different approach involving thresholds, and generates an improved evaluation mechanism which seems to fulfill a larger subset of the desired criteria than previous mechanisms.
Abstract: There has been a good deal of work on information retrieval systems that have continuous weights assigned to the index terms that describe the records in the database, and/or to the query terms that describe the user queries. Recent articles have analyzed retrieval systems with continuous weights of either type and/or with a Boolean structure for the queries. They have also suggested criteria which such systems ought to satisfy and record evaluation mechanisms which partially satisfy these criteria. We offer a more careful analysis, based on a generalization of the discrete weights. We also look at the weights from an entirely different approach involving thresholds, and we generate an improved evaluation mechanism which seems to fulfill a larger subset of the desired criteria than previous mechanisms. This new mechanism allows the user to attach a “threshold” to the query term.

102 citations

Journal Article
TL;DR: This paper compared information retrieval using lattice-based hybrid navigation with conventional Boolean querying, and showed that the performance of lattice retrieval was comparable to or better than Boolean retrieval.
Abstract: In this paper we present a comprehensive approach to automatic organization and hybrid navigation of text databases. An organizing stage first builds a particular lattice representation of the data, through text indexing followed by lattice clustering of the indexed texts. The lattice representation then supports the navigation stage of the system, a visual retrieval interface that combines three main retrieval strategies: browsing, querying, and bounding. Browsing and querying are used to search the retrieval space; bounding is used to restrict it based on the information that users have or obtain during their interaction with the system. We show that such a hybrid paradigm permits high flexibility in trading off information exploration and retrieval and, in addition, has good retrieval performance. We compared information retrieval using lattice-based hybrid navigation with conventional Boolean querying. The results of an experiment conducted on two medium-sized bibliographic databases showed that the performance of lattice retrieval was comparable to or better than Boolean retrieval.

102 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  9
2022  39
2021  107
2020  130
2019  144
2018  111