Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published on this topic, receiving 214,383 citations.

Papers

Journal Article

Applying genetic algorithms to query optimization in document retrieval

Jorng-Tzong Horng¹, Ching-Chang Yeh¹

National Central University¹

01 Sep 2000, Information Processing and Management

TL;DR: The paper proposes an approach that automatically retrieves keywords and then uses genetic algorithms to adapt the keyword weights; it is as accurate as the PAT-tree based approach but faster and uses less memory.

Abstract: This paper proposes a novel approach to automatically retrieve keywords and then uses genetic algorithms to adapt the keyword weights. One of the contributions of the paper is to combine the Bigram model (Chen, A., He, J., Xu, L., Gey, F. C., & Meggs, J. 1997. Chinese text retrieval without using a dictionary, ACM SIGIR’97, Philadelphia, PA, USA, pp. 42–49; Yang, Y.-Y., Chang, J.-S., & Chen, K.-J. 1993. Document automatic classification and ranking, Master thesis, Department of Computer Science, National Tsing Hua University) and the PAT-tree structure (Chien, L.-F., Huang, T.-I., & Chien, M.-C. 1997. Pat-tree-based keyword extraction for Chinese information retrieval, ACM SIGIR’97, Philadelphia, PA, USA, pp. 50–59) to retrieve keywords. The approach extracts bigrams from documents and uses the bigrams to construct a PAT-tree to retrieve keywords. The proposed approach can retrieve any type of keyword, such as technical keywords and personal names. The effectiveness of the proposed approach is demonstrated by comparing the keywords it finds with those found by the PAT-tree based approach. This comparison reveals that our keyword retrieval approach is as accurate as the PAT-tree based approach, yet it is faster and uses less memory. The study then applies genetic algorithms to tune the weights of the retrieved keywords. Moreover, several documents obtained from web sites are tested and the experimental results are compared with those of other approaches, indicating that the proposed approach is highly promising for applications.

132 citations
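
The genetic-algorithm step in the Horng and Yeh abstract above (adapting keyword weights) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: documents are toy keyword-frequency vectors, fitness is the average precision of the ranking against a known relevant set, and the operators are elitist selection, one-point crossover, and uniform mutation. The names `tune_weights`, `average_precision`, and `score` are hypothetical.

```python
import random

def score(weights, doc):
    # Weighted sum of keyword frequencies (a simple dot-product ranking function).
    return sum(w * f for w, f in zip(weights, doc))

def average_precision(weights, docs, relevant):
    # Rank documents by score, highest first, then compute average precision.
    ranking = sorted(range(len(docs)), key=lambda i: score(weights, docs[i]), reverse=True)
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def tune_weights(docs, relevant, n_keywords, pop_size=20, generations=30,
                 mutation_rate=0.1, seed=0):
    # Assumes n_keywords >= 2 so that a crossover point exists.
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_keywords)] for _ in range(pop_size)]

    def fitness(w):
        return average_precision(w, docs, relevant)

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # elitist selection: keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_keywords)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Uniform mutation: occasionally replace a weight with a fresh random value.
            child = [rng.random() if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

# Toy data (hypothetical): rows are documents, columns are keyword frequencies.
docs = [[3, 0, 1], [0, 2, 0], [1, 1, 0], [0, 0, 4]]
relevant = {0, 3}
best = tune_weights(docs, relevant, n_keywords=3)
print(best, average_precision(best, docs, relevant))
```

Because the better half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next; a real system would also need a held-out query set to guard against overfitting the weights.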

The Effectiveness and Efficiency of Agglomerative Hierarchic Clustering in Document Retrieval

Ellen Marie Voorhees

01 Oct 1985

TL;DR: The main goal of this thesis is to compare clustered file searches and inverted file searches in order to determine under what circumstances one search is to be preferred over the other.

Abstract: The major component of a document retrieval system is the component that searches the document collection and selects the documents to be returned in response to a query. Since users wait for the results of the search, the component must be efficient as well as effective. The main goal of this thesis is to compare clustered file searches and inverted file searches in order to determine under what circumstances one search is to be preferred over the other. A preliminary goal is to define a good cluster search. Three types of agglomerative clustering strategies, the single link, the complete link, and the group average link methods, are investigated. Searches of the single link hierarchy, the cluster hierarchy used extensively in previous research, are shown to be inferior to searches of the other hierarchy types. Searches of the group average link and complete link hierarchies perform similarly for small collections; for larger collections, searches of the complete link hierarchy are more effective. A top-down search of the group average link hierarchy is the most time efficient search asymptotically. The experimental evidence suggests that the difference in the efficiency and effectiveness of the complete link and group average link searches is due to the restricted depth of the complete link hierarchy. The depth of the group average link hierarchy increases as the size of the collection increases, but the depth of the complete link hierarchy does not. Thus the largest clusters in the complete link hierarchy are not very large, and the clusters can be accurately represented by centroids. Since the depth of the hierarchy does not increase with collection size, searches of the complete link hierarchy should remain effective for larger collections. The top-down search of the complete link hierarchy is somewhat more effective than the inverted file search. 
The relative efficiency of the two searches depends on the relative efficiency of accessing a page and computing a similarity, since the cluster search accesses many more pages but computes fewer similarities than the inverted file search. For an inexpensive similarity measure, the inverted file search is much more efficient.

131 citations
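
The three agglomerative strategies named in the Voorhees abstract above (single link, complete link, group average link) differ only in how the distance between two clusters is defined. The following is a naive sketch under stated assumptions: points are toy one-dimensional coordinates, distance is Euclidean rather than a document similarity measure, and the functions `linkage_distance` and `agglomerate` are hypothetical names. It also tracks the depth of the resulting hierarchy, the property the abstract discusses.

```python
import math
from itertools import combinations

def linkage_distance(a, b, points, method):
    # All pairwise distances between members of cluster a and cluster b.
    dists = [math.dist(points[i], points[j]) for i in a for j in b]
    if method == "single":      # closest pair
        return min(dists)
    if method == "complete":    # farthest pair
        return max(dists)
    if method == "average":     # group average
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage method: {method}")

def agglomerate(points, method):
    # Each cluster is (member_indices, depth_of_its_subtree); leaves have depth 0.
    clusters = [((i,), 0) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Naive search for the closest pair of clusters under the chosen linkage.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(clusters[p[0]][0], clusters[p[1]][0], points, method),
        )
        (a, depth_a), (b, depth_b) = clusters[x], clusters[y]
        merged = (a + b, 1 + max(depth_a, depth_b))
        merges.append((a, b))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges, clusters[0][1]

# Toy data (hypothetical): points on a line with slowly growing gaps.
points = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (6.0,)]
for method in ("single", "complete", "average"):
    merges, depth = agglomerate(points, method)
    print(method, depth)
```

On this toy input, single link absorbs one point at a time and produces a deeper hierarchy than complete link. That illustrates the mechanics only; it is not evidence for the thesis's findings on real collections, and the repeated closest-pair search is far too slow for anything beyond small examples.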

Book Chapter

The Use of Implicit Evidence for Relevance Feedback in Web Retrieval

Ryen W. White¹, Ian Ruthven², Joemon M. Jose¹

University of Glasgow¹, University of Strathclyde²

25 Mar 2002

TL;DR: The research focuses on the degree to which implicit evidence of document relevance can be substituted for explicit evidence in terms of both user opinion and search effectiveness.

Abstract: In this paper we report on the application of two contrasting types of relevance feedback for web retrieval. We compare two systems: one using explicit relevance feedback (where searchers explicitly have to mark documents relevant) and one using implicit relevance feedback (where the system endeavours to estimate relevance by mining the searcher's interaction). The feedback is used to update the display according to the user's interaction. Our research focuses on the degree to which implicit evidence of document relevance can be substituted for explicit evidence. We examine the two variations in terms of both user opinion and search effectiveness.

131 citations

Proceedings Article

A flexible model for retrieval of SGML documents

Sung-Hyon Myaeng¹, Don-Hyun Jang¹, Mun-Seok Kim, Zong-Cheol Zhoo

Chungnam National University¹

01 Aug 1998

TL;DR: This paper implemented the model and ran a series of experiments to show that, in addition to the added functionality, the use of the structural information embedded in SGML documents can improve the effectiveness of document retrieval, compared to the case where no such information is used.

Abstract: In traditional information retrieval (IR) systems, a document as a whole is the target for a query. With increasing interest in structured documents like SGML documents, there is a growing need to build an IR system that can retrieve parts of documents, which satisfy not only content-based but also structure-based requirements. In this paper, we describe an inference-net-based approach to this problem. The model is capable of retrieving elements at any level in a principled way, satisfying certain containment constraints in a query. Moreover, while the model is general enough to reproduce the ranking strategy adopted by conventional document retrieval systems by making use of document and collection level statistics such as TF and IDF, its flexibility allows for incorporation of a variety of pragmatic and semantic information associated with document structures. We implemented the model and ran a series of experiments to show that, in addition to the added functionality, the use of the structural information embedded in SGML documents can improve the effectiveness of document retrieval, compared to the case where no such information is used. We also show that giving a pragmatic preference to a certain element type of the SGML documents can enhance retrieval effectiveness.

131 citations

Patent

Document retrieval method and system and computer readable storage medium

Yasuhiko Inaba, Katsumi Tada, Natsuko Sugaya, Tadataka Matsubayashi, Akihiko Yamaguchi, Mikihiko Tokunaga

13 Sep 2001

TL;DR: A document retrieval method in which a computer program generates a first query expression, retrieves a first set of documents with it, and changes the query to a second expression based on the user's evaluation of that set.

Abstract: A document retrieval method using a computer program includes retrieving a first set of documents using a first query expression generated by the computer program. The first set of documents is provided to a user. An evaluation of the first set of documents is received from the user. The first query expression is changed to a second query expression generated by the computer program based on the evaluation.

130 citations
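
The patent abstract above describes a loop (retrieve with a first query, collect the user's evaluation, derive a second query) without saying how the second query is computed. As one possible illustration, the sketch below swaps in the classic Rocchio relevance-feedback formula; this is an assumption chosen for familiarity, not the patented method. Queries and documents are toy term-weight dictionaries, and `centroid` and `revise_query` are hypothetical names.

```python
from collections import defaultdict

def centroid(docs):
    # Mean term weight across a list of term->weight dictionaries.
    if not docs:
        return {}
    total = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}

def revise_query(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    # Rocchio update: move the query toward documents the user judged relevant
    # and away from those judged non-relevant.
    pos, neg = centroid(relevant), centroid(nonrelevant)
    revised = {}
    for term in set(query) | set(pos) | set(neg):
        weight = (alpha * query.get(term, 0.0)
                  + beta * pos.get(term, 0.0)
                  - gamma * neg.get(term, 0.0))
        if weight > 0:  # negative weights are clipped, a common convention
            revised[term] = weight
    return revised

# Toy example (hypothetical data): the user marks one document relevant, one not.
first_query = {"retrieval": 1.0}
judged_relevant = [{"retrieval": 1.0, "patent": 2.0}]
judged_nonrelevant = [{"retrieval": 1.0, "recipe": 3.0}]
second_query = revise_query(first_query, judged_relevant, judged_nonrelevant)
print(second_query)
```

The second query gains terms that appear in the documents judged relevant and drops terms that appear only in the rejected ones; the values of `alpha`, `beta`, and `gamma` are conventional defaults, not figures from the patent.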

Performance

Metrics

Papers: 6,866

Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111