Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have been published within this topic, receiving 214383 citations.

Papers

Journal Article•DOI•

Automatic discovery of similarity relationships through Web mining

Dmitri Roussinov¹, J. Leon Zhao²•Institutions (2)

Arizona State University¹, University of Arizona²

01 Apr 2003

TL;DR: This work demonstrates how the World Wide Web can be mined in a fully automated manner to discover the semantic similarity relationships among the concepts surfaced during an electronic brainstorming session, thus improving the accuracy of automated clustering of meeting messages.

Abstract: This work demonstrates how the World Wide Web can be mined in a fully automated manner to discover the semantic similarity relationships among the concepts surfaced during an electronic brainstorming session, thus improving the accuracy of automated clustering of meeting messages. Our novel Context Sensitive Similarity Discovery (CSSD) method takes advantage of the meeting context when selecting a subset of Web pages for data mining, and then conducts regular concept co-occurrence analysis within that subset. Our results have implications for reducing information overload in applications of text technologies such as email filtering, document retrieval, text summarization, and knowledge management.

103 citations
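
The two stages named in the abstract above (selecting a context-relevant subset of pages, then running co-occurrence analysis within it) can be illustrated with a minimal sketch. This is not the authors' CSSD implementation: the function name `cooccurrence_similarity`, the substring-based page filter, and the use of Jaccard overlap as the similarity measure are all assumptions made for illustration.

```python
from itertools import combinations


def cooccurrence_similarity(pages, context_terms, concepts):
    """Hypothetical sketch: context-filtered concept co-occurrence.

    pages: list of page texts (strings)
    context_terms: terms describing the meeting context (assumed input)
    concepts: concept strings whose pairwise similarity is wanted
    """
    lowered = [p.lower() for p in pages]

    # Stage 1 (assumption): keep only pages that mention a context term.
    # Simple substring matching is used here for brevity.
    subset = [p for p in lowered
              if any(t.lower() in p for t in context_terms)]

    # Stage 2: for each concept, record which subset pages contain it.
    containing = {
        c: {i for i, p in enumerate(subset) if c.lower() in p}
        for c in concepts
    }

    # Jaccard overlap of page sets stands in for the similarity score;
    # the paper's actual measure is not given in the abstract.
    sims = {}
    for a, b in combinations(concepts, 2):
        union = containing[a] | containing[b]
        shared = containing[a] & containing[b]
        sims[(a, b)] = len(shared) / len(union) if union else 0.0
    return sims
```

Concepts that tend to appear on the same context-relevant pages receive a score near 1, while concepts that never share a page receive 0; pages outside the context do not influence either score.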

Journal Article•DOI•

Spaces, Trees, and Colors: The algorithmic landscape of document retrieval on sequences

Gonzalo Navarro¹•Institutions (1)

University of Chile¹

01 Mar 2014-ACM Computing Surveys

TL;DR: In this article, the authors cover the recent research in extending the document retrieval techniques to a broader class of sequence collections and uncover a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.

Abstract: Document retrieval is one of the best-established information retrieval activities since the ’60s, pervading all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current technology is mostly oriented to “natural language” text collections, where inverted indexes are the preferred solution. As successful as this paradigm has been, it fails to properly handle various East Asian languages and other scenarios where the “natural language” assumptions do not hold. In this survey, we cover the recent research in extending the document retrieval techniques to a broader class of sequence collections, which has applications in bioinformatics, data and web mining, chemoinformatics, software engineering, multimedia information retrieval, and many other fields. We focus on the algorithmic aspects of the techniques, uncovering a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.

103 citations

Patent•

Keyword associative document retrieval system

Morita Tetsuya¹•Institutions (1)

Ricoh¹

05 Oct 1990

TL;DR: A document retrieval system includes an inputting unit for inputting a retrieval condition including one or a plurality of keywords and a weight value for each keyword, and an operating unit having first factors corresponding to relationship values, each relationship value being defined as a degree of the relationship between two keywords out of keywords which are predetermined in the document retrieval system, and second factors corresponding to importance values.

Abstract: A document retrieval system includes an inputting unit for inputting a retrieval condition including one or a plurality of keywords and a weight value for each keyword, an operating unit having first factors corresponding to relationship values, each relationship value being defined as a degree of the relationship between two keywords out of keywords which are predetermined in the document retrieval system, and second factors corresponding to importance values, each importance value being defined as a degree of importance of a keyword in each one of a plurality of documents which are predetermined in the document retrieval system, the operating unit generating a relevance value, which represents a degree of relevance in satisfying a user's requirement, for each of the documents on the basis of the retrieval condition input from the inputting unit, the first factors and the second factors, and an outputting unit for outputting the relevance value for each of the documents as a retrieval result.

102 citations
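
The patent abstract above names three inputs to the relevance value: weighted query keywords, keyword-to-keyword relationship values, and per-document keyword importance values. It does not state how they are combined. The sketch below assumes one plausible combination (query weights are spread to related keywords, then multiplied by each document's importance values and summed). The function name `relevance_scores` and the dictionary layout are hypothetical.

```python
def relevance_scores(query_weights, relationship, importance):
    """Hypothetical sketch of an associative relevance computation.

    query_weights: {keyword: weight} supplied by the user
    relationship:  {keyword: {keyword: relationship value}} (first factors)
    importance:    {document: {keyword: importance value}} (second factors)
    The combination rule is an assumption, not taken from the patent.
    """
    keywords = list(relationship)

    # Spread each query weight to every keyword it is related to.
    expanded = {k: 0.0 for k in keywords}
    for q, w in query_weights.items():
        related = relationship.get(q, {})
        for k in keywords:
            expanded[k] += w * related.get(k, 0.0)

    # Relevance of a document: expanded weights times its importance values.
    return {
        doc: sum(expanded[k] * imp.get(k, 0.0) for k in keywords)
        for doc, imp in importance.items()
    }
```

Under this assumed rule, a document indexed only under a keyword related to the query term still receives a non-zero relevance value, which is the associative behaviour the title suggests.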

Journal Article•DOI•

A model for a weighted retrieval system

Duncan A. Buell¹, Donald H. Kraft¹•Institutions (1)

Louisiana State University¹

01 May 1981-Journal of the Association for Information Science and Technology

TL;DR: This work looks at the weights from an entirely different approach involving thresholds, and generates an improved evaluation mechanism which seems to fulfill a larger subset of the desired criteria than previous mechanisms.

Abstract: There has been a good deal of work on information retrieval systems that have continuous weights assigned to the index terms that describe the records in the database, and/or to the query terms that describe the user queries. Recent articles have analyzed retrieval systems with continuous weights of either type and/or with a Boolean structure for the queries. They have also suggested criteria which such systems ought to satisfy and record evaluation mechanisms which partially satisfy these criteria. We offer a more careful analysis, based on a generalization of the discrete weights. We also look at the weights from an entirely different approach involving thresholds, and we generate an improved evaluation mechanism which seems to fulfill a larger subset of the desired criteria than previous mechanisms. This new mechanism allows the user to attach a “threshold” to the query term.

102 citations

Journal Article•DOI•

Information retrieval through hybrid navigation of lattice representations

Claudio Carpineto¹, Giovanni Romano¹•Institutions (1)

Fondazione Ugo Bordoni¹

01 Nov 1996-International Journal of Human-computer Studies / International Journal of Man-machine Studies

TL;DR: This paper compared information retrieval using lattice-based hybrid navigation with conventional Boolean querying, and showed that the performance of lattice retrieval was comparable to or better than Boolean retrieval.

Abstract: In this paper we present a comprehensive approach to automatic organization and hybrid navigation of text databases. An organizing stage first builds a particular lattice representation of the data, through text indexing followed by lattice clustering of the indexed texts. The lattice representation, then, supports the navigation stage of the system, a visual retrieval interface that combines three main retrieval strategies: browsing, querying, and bounding. Browsing and querying are used to search the retrieval space; bounding is used to restrict it based on the information that users have, or get during their interaction with the system. We show that such a hybrid paradigm permits high flexibility in trading off information exploration and retrieval and, in addition, has good retrieval performance. We compared information retrieval using lattice-based hybrid navigation with conventional Boolean querying. The results of an experiment conducted on two medium-sized bibliographic databases showed that the performance of lattice retrieval was comparable to or better than Boolean retrieval.

102 citations

Metrics

Papers: 6,866
Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
