Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have been published within this topic, receiving 214383 citations.

Papers published on a yearly basis

Papers

Journal Article•DOI•

Discovering Event Evolution Patterns From Document Sequences

Chih-Ping Wei¹, Yu-Hsiu Chang•Institutions (1)

National Tsing Hua University¹

01 Mar 2007

TL;DR: Measured by miss and false alarm rates, the EP-supported ET (EPET) technique exhibits better tracking effectiveness than a traditional ET technique and suggests that the proposed EP technique could effectively discover event episodes and EPs in sequences of documents.

Abstract: Recent advances in information and networking technologies have contributed significantly to global connectivity and greatly facilitated and fostered information creation, distribution, and access. The resultant ever-increasing volume of online textual documents creates an urgent need for new text mining techniques that can intelligently and automatically extract implicit and potentially useful knowledge from these documents for decision support. This research focuses on identifying and discovering event episodes together with their temporal relationships that occur frequently (referred to as evolution patterns (EPs) in this paper) in sequences of documents. The discovery of such EPs can be applied in domains such as knowledge management and used to facilitate existing document management and retrieval techniques [e.g., event tracking (ET)]. Specifically, we propose and design an EP discovery technique for mining EPs from sequences of documents. We experimentally evaluate our proposed EP technique in the context of facilitating ET. Measured by miss and false alarm rates, the EP-supported ET (EPET) technique exhibits better tracking effectiveness than a traditional ET technique. The encouraging performance of the EPET technique demonstrates the potential usefulness of EPs in supporting ET and suggests that the proposed EP technique could effectively discover event episodes and EPs in sequences of documents.

46 citations
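
The abstract above evaluates tracking by miss and false alarm rates. As a rough illustration only, the sketch below computes both rates under the common definitions (missed on-topic documents over all on-topic documents, and wrongly tracked documents over all off-topic documents). The function name, the document IDs, and the collection size are hypothetical; the paper's exact evaluation protocol is not given in this listing.

```python
def miss_and_false_alarm(relevant, tracked, collection_size):
    """Return (miss rate, false alarm rate) for one tracked event.

    relevant: IDs of documents that truly belong to the event (assumed known).
    tracked: IDs of documents the tracker assigned to the event.
    collection_size: total number of documents considered.
    """
    relevant, tracked = set(relevant), set(tracked)
    misses = len(relevant - tracked)        # on-topic documents the tracker missed
    false_alarms = len(tracked - relevant)  # off-topic documents wrongly tracked
    non_relevant = collection_size - len(relevant)
    p_miss = misses / len(relevant) if relevant else 0.0
    p_fa = false_alarms / non_relevant if non_relevant else 0.0
    return p_miss, p_fa


if __name__ == "__main__":
    # Hypothetical toy data: 10 documents, 4 of them on-topic.
    print(miss_and_false_alarm({1, 2, 3, 4}, {3, 4, 5}, 10))
```

Lower values are better for both rates; a tracker that assigns every document to the event drives the miss rate to zero at the cost of a false alarm rate of one.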

Book Chapter•DOI•

Collective evolutionary concept distance based query expansion for effective web document retrieval

Clement H. C. Leung¹, Yuanxi Li¹, Alfredo Milani², Valentina Franzoni²•Institutions (2)

Hong Kong Baptist University¹, University of Perugia²

24 Jun 2013

TL;DR: The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure that can be computed using statistical results from a web search engine; experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.

Abstract: In this work several semantic approaches to concept-based query expansion and re-ranking schemes are studied and compared with different ontology-based expansion methods in web document search and retrieval. In particular, we focus on concept-based query expansion schemes where, in order to effectively increase the precision of web document retrieval and to decrease the users’ browsing time, the main goal is to quickly provide users with the most suitable query expansion. Two key tasks for query expansion in web document retrieval are to find the expansion candidates, as the closest concepts in web document domain, and to rank the expanded queries properly. The approach we propose aims at improving the expansion phase for better web document retrieval and precision. The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure, i.e. a measure which can be computed using statistical results from a web search engine. Experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.

46 citations
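
The abstract above ranks expansion candidates by a distance computed from web search statistics. The PMING formula itself is not given in this listing, so the sketch below swaps in the Normalized Google Distance (NGD), a different, well-known measure that is also computed from search hit counts, purely to illustrate the idea. All hit counts, the index size `n`, and the function names are hypothetical.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: hits for each term alone; fxy: hits for both terms together;
    n: assumed total number of indexed pages (must exceed fx and fy).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # the terms never co-occur: treat as maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def rank_candidates(query_hits, candidates, n):
    """Order candidate terms from closest to farthest.

    candidates maps term -> (hits for the term, hits for query and term together).
    """
    scored = [(ngd(query_hits, hits, joint, n), term)
              for term, (hits, joint) in candidates.items()]
    return [term for _, term in sorted(scored)]


if __name__ == "__main__":
    # Hypothetical counts, not taken from any real search engine.
    candidates = {"a": (1000, 1000), "b": (1000, 10)}
    print(rank_candidates(1000, candidates, 10**6))
```

Candidates that co-occur with the query almost as often as they occur at all get a distance near zero and would be offered first as expansions.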

Journal Article•DOI•

A new fuzzy logic-based query expansion model for efficient information retrieval using relevance feedback approach

Jagendra Singh¹, Aditi Sharan¹•Institutions (1)

Jawaharlal Nehru University¹

01 Sep 2017-Neural Computing and Applications

TL;DR: This paper presents a new method for QE based on fuzzy logic that treats the top-retrieved documents as relevance feedback documents for mining additional QE terms, and it increases the precision and recall rates of information retrieval systems for document retrieval.

Abstract: Efficient query expansion (QE) term selection methods are important for improving the accuracy and efficiency of the system by removing irrelevant and redundant terms from the corpus of top-retrieved feedback documents with respect to a user query. Each individual QE term selection method has its weaknesses and strengths. To overcome the weaknesses and utilize the strengths of the individual methods, we use multiple term selection methods together. In this paper, we present a new method for QE based on fuzzy logic that considers the top-retrieved documents as relevance feedback documents for mining additional QE terms. Different QE term selection methods calculate the degrees of importance of all unique terms in the top-retrieved document collection for mining additional expansion terms. These methods give different relevance scores for each term. The proposed method combines the different weights of each term by using fuzzy rules to infer the weights of the additional query terms. Then, the weights of the additional query terms and the weights of the original query terms are used to form the new query vector, and we use this new query vector to retrieve documents. All the experiments are performed on the TREC and FIRE benchmark datasets. The proposed QE method increases the precision and recall rates of information retrieval systems for document retrieval. It achieves a significantly higher average recall rate, average precision rate and F-measure on both datasets.

46 citations
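
The pipeline described in the abstract above (score terms with several selection methods, combine the scores, then merge the best terms into the query vector) can be sketched roughly as follows. The paper's fuzzy rules are not given in this listing, so this sketch uses plain averaging of min-max normalized scores as a deliberately simple stand-in for fuzzy inference. The function names, the parameters `k` and `beta`, and all scores are hypothetical.

```python
def min_max(scores):
    """Rescale one method's term scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {term: 0.0 for term in scores}
    return {term: (s - lo) / (hi - lo) for term, s in scores.items()}


def expand_query(original, method_scores, k=2, beta=0.5):
    """Add the k best-scoring feedback terms to the original query vector.

    original: term -> weight for the user's query.
    method_scores: one term -> score dict per term selection method.
    beta: assumed damping factor applied to the weights of added terms.
    """
    normalized = [min_max(scores) for scores in method_scores]
    terms = set().union(*normalized)
    # Stand-in for fuzzy inference: average the normalized scores per term.
    combined = {t: sum(n.get(t, 0.0) for n in normalized) / len(normalized)
                for t in terms}
    best = sorted(combined, key=lambda t: (-combined[t], t))[:k]
    query = dict(original)
    for term in best:
        query[term] = query.get(term, 0.0) + beta * combined[term]
    return query


if __name__ == "__main__":
    # Hypothetical scores from two term selection methods.
    scores = [{"search": 3.0, "index": 1.0, "noise": 0.0},
              {"search": 0.8, "index": 0.6, "noise": 0.2}]
    print(expand_query({"retrieval": 1.0}, scores))
```

Normalizing before combining keeps a method with a large score range from dominating the others, which is the same concern the fuzzy combination in the paper addresses in a more principled way.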

Journal Article•DOI•

A novel document retrieval method using the discrete wavelet transform

Laurence A. F. Park¹, Kotagiri Ramamohanarao¹, Marimuthu Palaniswami¹•Institutions (1)

University of Melbourne¹

01 Jul 2005-ACM Transactions on Information Systems

TL;DR: This work proposes a new spectral-based information retrieval method that is able to utilize many different levels of document resolution by examining the term patterns that occur in the documents, and takes advantage of the multiresolution analysis properties of the wavelet transform.

Abstract: Current information retrieval methods either ignore the term positions or deal with exact term positions; the former can be seen as coarse document resolution, the latter as fine document resolution. We propose a new spectral-based information retrieval method that is able to utilize many different levels of document resolution by examining the term patterns that occur in the documents. To do this, we take advantage of the multiresolution analysis properties of the wavelet transform. We show that we are able to achieve higher precision when compared to vector space and proximity retrieval methods, while producing fast query times and using a compact index.

46 citations
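
To make the multiresolution idea in the abstract above concrete, the sketch below builds a per-term position signal (term counts in equal-width bins of the document) and applies an orthonormal Haar wavelet transform to it. The binning scheme, the choice of the Haar wavelet, and the function names are assumptions made for illustration; the listing does not specify how the paper constructs or compares its signals.

```python
import math


def term_signal(tokens, term, bins=8):
    """Count occurrences of `term` in each of `bins` equal-width document segments."""
    signal = [0.0] * bins
    for i, token in enumerate(tokens):
        if token == term:
            signal[i * bins // len(tokens)] += 1.0
    return signal


def haar_dwt(signal):
    """Full orthonormal Haar decomposition; the length must be a power of two.

    Returns [overall approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    root2 = math.sqrt(2)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Details from coarser levels are placed ahead of finer ones.
        details = [(a - b) / root2 for a, b in pairs] + details
        approx = [(a + b) / root2 for a, b in pairs]
    return approx + details


if __name__ == "__main__":
    tokens = "a b a c d e f a".split()
    signal = term_signal(tokens, "a", bins=4)
    print(signal, haar_dwt(signal))
```

The first coefficient summarizes how often the term occurs in the whole document (coarse resolution), while later coefficients capture where the occurrences cluster at progressively finer scales.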

Journal Article•DOI•

The challenge of commercial document retrieval, part I: major issues, and a framework based on search exhaustivity, determinacy of representation and document collection size

David C. Blair¹•Institutions (1)

University of Michigan¹

01 Mar 2002-Information Processing and Management

TL;DR: It is shown that document retrieval - specifically, access to intellectual content - is a complex process which is most strongly influenced by three factors: the size of the document collection; the type of search (exhaustive, existence or sample); and, the determinacy of document representation.

Abstract: With the growing focus on what is collectively known as "knowledge management", a shift continues to take place in commercial information system development: a shift away from the well-understood data retrieval/database model, to the more complex and challenging development of commercial document/information retrieval models. While document retrieval has had a long and rich legacy of research, its impact on commercial applications has been modest. At the enterprise level most large organizations have little understanding of, or commitment to, high quality document access and management. Part of the reason for this is that we still do not have a good framework for understanding the major factors which affect the performance of large-scale corporate document retrieval systems. The thesis of this discussion is that document retrieval - specifically, access to intellectual content - is a complex process which is most strongly influenced by three factors: the size of the document collection; the type of search (exhaustive, existence or sample); and, the determinacy of document representation. Collectively, these factors can be used to provide a useful framework for, or taxonomy of, document retrieval, and highlight some of the fundamental issues facing the design and development of commercial document retrieval systems. This is the first of a series of three articles. Part II (D.C. Blair, The challenge of commercial document retrieval. Part II. A strategy for document searching based on identifiable document partitions, Information Processing and Management, 2001b, this issue) will discuss the implications of this framework for search strategy, and Part III (D.C. Blair, Some thoughts on the reported results of Text REtrieval Conference (TREC), Information Processing and Management, 2002, forthcoming) will consider the importance of the TREC results for our understanding of operating information retrieval systems.

46 citations

Network Information

Performance

Metrics

Papers: 6,866

Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
