
Document retrieval

About: Document retrieval is a research topic. Over the lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.


Papers
01 Jan 1994
TL;DR: This dissertation examines the use of adaptive methods to automatically improve the performance of ranked text retrieval systems, and proposes and empirically validates general adaptive methods that improve the ability of a large class of retrieval systems to rank documents effectively.
Abstract: This dissertation examines the use of adaptive methods to automatically improve the performance of ranked text retrieval systems. The goal of a ranked retrieval system is to manage a large collection of text documents and to order documents for a user based on the estimated relevance of the documents to the user's information need (or query). The ordering enables the user to quickly find documents of interest. Ranked retrieval is a difficult problem because of the ambiguity of natural language, the large size of the collections, and because of the varying needs of users and varying collection characteristics. We propose and empirically validate general adaptive methods which improve the ability of a large class of retrieval systems to rank documents effectively. Our main adaptive method is to numerically optimize free parameters in a retrieval system by minimizing a non-metric criterion function. The criterion measures how well the system is ranking documents relative to a target ordering, defined by a set of training queries which include the users' desired document orderings. Thus, the system learns parameter settings which better enable it to rank relevant documents before irrelevant ones. The non-metric approach is interesting because it is a general adaptive method, an alternative to supervised methods for training neural networks in domains in which rank order or prioritization is important. A second adaptive method is also examined, which is applicable to a restricted class of retrieval systems but which permits an analytic solution. The adaptive methods are applied to a number of problems in text retrieval to validate their utility and practical efficiency. The applications include: a dimensionality reduction of vector-based document representations to a vector space in which inter-document similarity more accurately predicts semantic association; the estimation of a similarity measure which better predicts the relevance of documents to queries; and the estimation of a high-performance neural network combination of multiple retrieval systems into a single overall system. The applications demonstrate that the approaches improve performance and adapt to varying retrieval environments. We also compare the methods to numerous alternative adaptive methods in the text retrieval literature, with very positive results.
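The core idea, numerically tuning a retrieval system's free parameters against a rank-based criterion, can be sketched in a few lines. The snippet below is a minimal illustration under assumed inputs (a toy weighted dot-product scorer and a pairwise surrogate loss), not the dissertation's actual criterion function or optimizer.

```python
import numpy as np

def score(theta, query_vec, doc_vec):
    # Toy parameterized similarity: per-dimension weighted dot product.
    return float(np.dot(query_vec * theta, doc_vec))

def pairwise_rank_loss(theta, training_queries):
    """Smoothed count of relevant/irrelevant pairs ranked in the wrong order."""
    loss = 0.0
    for query_vec, relevant_docs, irrelevant_docs in training_queries:
        for d_pos in relevant_docs:
            for d_neg in irrelevant_docs:
                margin = score(theta, query_vec, d_neg) - score(theta, query_vec, d_pos)
                loss += 1.0 / (1.0 + np.exp(-margin))  # close to 1 when the pair is swapped
    return loss

def optimize(training_queries, dim, iters=100, lr=0.1, eps=1e-4):
    """Numerically minimize the rank loss over the free parameter vector theta."""
    theta = np.ones(dim)
    for _ in range(iters):
        base = pairwise_rank_loss(theta, training_queries)
        grad = np.zeros(dim)
        for i in range(dim):
            probe = theta.copy()
            probe[i] += eps
            grad[i] = (pairwise_rank_loss(probe, training_queries) - base) / eps
        theta -= lr * grad
    return theta
```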

50 citations

Journal ArticleDOI
TL;DR: Experimental results indicate that SSRank consistently, and almost always significantly, outperforms the baseline methods given the same amount of labeled data, because SSRank can effectively leverage unlabeled data in learning.
Abstract: This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to combine the advantages of both traditional Information Retrieval (IR) methods and the recently proposed supervised learning methods for IR. The advantages include the use of a limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, a Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently, and almost always significantly, outperforms the baseline methods (unsupervised and supervised learning methods) given the same amount of labeled data. This is because SSRank can effectively leverage unlabeled data in learning.
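As a rough illustration of the semi-supervised idea (pseudo-label unlabeled documents with a traditional IR model, then train a neural ranker on the combined data), the sketch below uses BM25 top-k labeling and a small scikit-learn network. The feature extraction, the top-k threshold, and the omission of the paper's stopping criterion are simplifying assumptions, not SSRank's exact procedure.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from rank_bm25 import BM25Okapi  # assumed third-party helper for BM25 scoring

def pseudo_label(tokenized_docs, tokenized_query, top_k=5):
    """Label the top-k BM25 documents for the query as relevant (1.0), the rest as 0.0."""
    bm25 = BM25Okapi(tokenized_docs)
    scores = bm25.get_scores(tokenized_query)
    labels = np.zeros(len(tokenized_docs))
    labels[np.argsort(scores)[::-1][:top_k]] = 1.0
    return labels

def train_ranker(labeled_features, labels, unlabeled_features, pseudo_labels):
    """Fit a small neural ranker on the union of labeled and pseudo-labeled examples."""
    X = np.vstack([labeled_features, unlabeled_features])
    y = np.concatenate([labels, pseudo_labels])
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500)
    model.fit(X, y)
    return model  # rank documents for a query by sorting model.predict(features) descending
```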

50 citations

Proceedings Article
01 Jan 1992
TL;DR: This paper investigates whether a completely automatic, statistical expansion technique that uses a general-purpose thesaurus as a source of related concepts is viable for large collections; the results indicate that the particular expansion technique used here improves the performance of some queries but degrades the performance of others.
Abstract: This paper investigates whether a completely automatic, statistical expansion technique that uses a general-purpose thesaurus as a source of related concepts is viable for large collections. The retrieval results indicate that the particular expansion technique used here improves the performance of some queries but degrades the performance of others. The variability of the method is attributable to two main factors: the choice of concepts that are expanded and the confounding effects expansion has on concept weights. Addressing these problems will require both a better method for determining the important concepts of a text and a better method for determining the correct sense of an ambiguous word.
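A minimal sketch of the kind of thesaurus-based expansion discussed here, with expanded concepts added at reduced weight; the toy thesaurus and the weighting scheme are assumptions for illustration, not the paper's technique.

```python
from collections import Counter

# Assumed toy thesaurus mapping a term to related concepts.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "doctor": ["physician", "clinician"],
}

def expand_query(query_terms, expansion_weight=0.3):
    """Return a term -> weight map: original terms at 1.0, expansions down-weighted."""
    weights = Counter({term: 1.0 for term in query_terms})
    for term in query_terms:
        for related in THESAURUS.get(term, []):
            # Down-weighting is one way to limit the confounding effect that
            # expansion can have on concept weights.
            weights[related] = max(weights[related], expansion_weight)
    return dict(weights)

print(expand_query(["car", "insurance"]))
# {'car': 1.0, 'insurance': 1.0, 'automobile': 0.3, 'vehicle': 0.3}
```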

50 citations

Journal ArticleDOI
TL;DR: This article describes how a document map that is automatically organized for browsing and visualization can also be successfully utilized to speed up document retrieval, and shows significantly improved performance compared to Salton's vector space model.
Abstract: A map of text documents arranged using the Self-Organizing Map (SOM) algorithm (1) is organized in a meaningful manner so that items with similar content appear at nearby locations of the 2-dimensional map display, and (2) clusters the data, resulting in an approximate model of the data distribution in the high-dimensional document space. This article describes how a document map that is automatically organized for browsing and visualization can also be successfully utilized to speed up document retrieval. Furthermore, experiments on the well-known CISI collection show significantly improved performance compared to Salton's vector space model, measured by average precision (AP) when retrieving a small, fixed number of best documents. Regarding the comparison with Latent Semantic Indexing, the results are inconclusive.
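The retrieval speed-up can be illustrated roughly: match the query against the map's node prototypes first, then score only documents assigned to the closest nodes. The sketch below assumes the SOM prototypes and document-to-node assignments already exist; it is not the article's exact procedure.

```python
import numpy as np

def retrieve_via_map(query_vec, node_prototypes, node_to_docs, doc_vecs,
                     n_nodes=3, top_k=10):
    """node_prototypes: (num_nodes, dim) array; node_to_docs: node index -> list of doc ids;
    doc_vecs: doc id -> vector. Only documents under the closest map nodes are scored."""
    # 1. Find the map nodes whose prototype vectors lie closest to the query.
    node_dists = np.linalg.norm(node_prototypes - query_vec, axis=1)
    candidate_nodes = np.argsort(node_dists)[:n_nodes]
    # 2. Score only the documents stored under those nodes; this is the source of the speed-up.
    candidates = [d for n in candidate_nodes for d in node_to_docs.get(int(n), [])]
    scored = [(d, float(np.dot(query_vec, doc_vecs[d]))) for d in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```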

50 citations

Book ChapterDOI
19 Aug 2002
TL;DR: It is shown how a rigorous evaluation of Document Transformation can be carried out using the referer logs kept by web servers, and a new strategy for Document Transformation is described that is suitable for long-term incremental learning.
Abstract: This paper considers how web search engines can learn from the successful searches recorded in their user logs. Document Transformation is a feasible approach that uses these logs to improve document representations. Existing test collections do not allow an adequate investigation of Document Transformation, but we show how a rigorous evaluation of this method can be carried out using the referer logs kept by web servers. We also describe a new strategy for Document Transformation that is suitable for long-term incremental learning. Our experiments show that Document Transformation improves retrieval performance over a medium-sized collection of web pages. Commercial search engines may be able to achieve similar improvements by incorporating this approach.
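A hedged sketch of the general Document Transformation idea: fold query terms from logged successful searches into document representations with a small boost and a gradual decay of older evidence. The specific weights and decay are assumptions for illustration, not the paper's new strategy.

```python
from collections import defaultdict

def transform_documents(doc_terms, click_log, boost=0.2, decay=0.99):
    """doc_terms: doc_id -> {term: weight}; click_log: iterable of (query_terms, doc_id)
    pairs recovered from referer logs. Returns updated term weights per document."""
    docs = defaultdict(dict, {d: dict(t) for d, t in doc_terms.items()})
    for query_terms, doc_id in click_log:
        # Gently decay the existing weights so the representation keeps adapting
        # over time (long-term incremental learning).
        for term in docs[doc_id]:
            docs[doc_id][term] *= decay
        # Fold the successful query's terms into the document at a small boost.
        for term in query_terms:
            docs[doc_id][term] = docs[doc_id].get(term, 0.0) + boost
    return dict(docs)
```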

50 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
81% related
Metadata
43.9K papers, 642.7K citations
79% related
Recommender system
27.2K papers, 598K citations
79% related
Ontology (information science)
57K papers, 869.1K citations
78% related
Natural language
31.1K papers, 806.8K citations
77% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111