Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

Papers

Journal Article

Developments in automatic text retrieval.

Gerard Salton¹

Cornell University¹

30 Aug 1991, Science

TL;DR: The text analysis problem is examined, and modern approaches leading to the identification and retrieval of selected text items in response to search requests are discussed.

Abstract: Recent developments in the storage, retrieval, and manipulation of large text files are described. The text analysis problem is examined, and modern approaches leading to the identification and retrieval of selected text items in response to search requests are discussed.

661 citations

Journal Article

Deep sentence embedding using long short-term memory networks: analysis and application to information retrieval

Hamid Palangi¹, Li Deng², Yelong Shen², Jianfeng Gao², Xiaodong He², Jianshu Chen², Xinying Song², Rabab K. Ward¹

University of British Columbia¹, Microsoft²

01 Apr 2016, IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An LSTM-RNN model is used for sentence embedding and trained on click-through data from a web search engine; the proposed method significantly outperforms the Paragraph Vector method on a web document retrieval task.

Abstract: This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks (RNN) with Long Short-Term Memory (LSTM) cells. The proposed LSTM-RNN model sequentially takes each word in a sentence, extracts its information, and embeds it into a semantic vector. Due to its ability to capture long-term memory, the LSTM-RNN accumulates increasingly rich information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate the unimportant words and detect the salient keywords in the sentence. Furthermore, these detected keywords are found to automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These automatic keyword detection and topic allocation abilities enabled by the LSTM-RNN allow the network to perform document retrieval, a difficult language processing task, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state-of-the-art methods. We emphasize that the proposed model generates sentence embedding vectors that are especially useful for web document retrieval tasks. A comparison with a well-known general sentence embedding method, the Paragraph Vector, is performed. The results show that the proposed method significantly outperforms the Paragraph Vector method on the web document retrieval task.
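
The retrieval step described in this abstract, ranking documents by how close their embedding vectors are to the query's embedding vector, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cosine similarity as the closeness measure, and the function names and the tiny 3-dimensional vectors are hypothetical stand-ins for vectors that an LSTM-RNN encoder would produce.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]

# Hypothetical embeddings; a real system would compute these with a trained encoder.
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [0.9, 0.1, 0.8],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.5],
}
ranking = rank_documents(query, docs)
print(ranking)
```

With these made-up vectors, the document whose direction is closest to the query's is ranked first and the orthogonal one last.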

659 citations

Journal Article

Evaluation of an inference network-based retrieval model

Howard R. Turtle¹, W. Bruce Croft¹

University of Massachusetts Amherst¹

01 Jul 1991

TL;DR: Network representations show promise as mechanisms for inferring probable relationships between documents and queries and have been used in information retrieval since at least the early 1960s.

Abstract: Network representations have been used in information retrieval since at least the early 1960s. Networks have been used to support diverse retrieval functions, including browsing [38], document clustering [7], spreading activation search [4], support for multiple search strategies [11], and representation of user knowledge [27] or document content [40]. Recent work suggests that significant improvements in retrieval performance will require techniques that, in some sense, “understand” the content of documents and queries [9, 43] and can be used to infer probable relationships between documents and queries. In this view, information retrieval is an inference or evidential reasoning process in which we estimate the probability that a user’s information need, expressed as one or more queries, is met given a document as “evidence.” Network representations show promise as mechanisms for inferring these kinds of relationships [4, 12].

653 citations

Journal Article

Information retrieval as statistical translation

Adam L. Berger¹, John Lafferty¹

Carnegie Mellon University¹

01 Aug 1999

TL;DR: A simple, well motivated model of the document-to-query translation process is proposed, and an algorithm for learning the parameters of this model in an unsupervised manner from a collection of documents is described.

Abstract: We propose a new probabilistic approach to information retrieval based upon the ideas and methods of statistical machine translation. The central ingredient in this approach is a statistical model of how a user might distill or "translate" a given document into a query. To assess the relevance of a document to a user's query, we estimate the probability that the query would have been generated as a translation of the document, and factor in the user's general preferences in the form of a prior distribution over documents. We propose a simple, well motivated model of the document-to-query translation process, and describe an algorithm for learning the parameters of this model in an unsupervised manner from a collection of documents. As we show, one can view this approach as a generalization and justification of the "language modeling" strategy recently proposed by Ponte and Croft. In a series of experiments on TREC data, a simple translation-based retrieval system performs well in comparison to conventional retrieval techniques. This prototype system only begins to tap the full potential of translation-based retrieval.
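
The scoring idea in this abstract, estimating the probability that the query was generated as a "translation" of the document and combining it with a document prior, can be illustrated with a small sketch. It assumes a simplified word-by-word mixture form, p(q | d) = ∏ᵢ Σ_w t(qᵢ | w) · l(w | d), which is not stated in the abstract; the translation table, the probability floor, and all function names are hypothetical and chosen only for illustration.

```python
import math

def doc_language_model(doc_tokens):
    """Maximum-likelihood unigram distribution l(w | d) over the document's words."""
    counts = {}
    for w in doc_tokens:
        counts[w] = counts.get(w, 0) + 1
    total = len(doc_tokens)
    return {w: c / total for w, c in counts.items()}

def log_query_likelihood(query_tokens, doc_tokens, translation,
                         log_prior=0.0, floor=1e-6):
    """log p(d) + log p(q | d) under the assumed word-by-word mixture.

    translation[w][q] holds t(q | w), the probability that document word w
    is "translated" into query word q. `floor` avoids log(0) and is an
    arbitrary choice made for this sketch.
    """
    lm = doc_language_model(doc_tokens)
    log_prob = log_prior
    for q in query_tokens:
        p = sum(translation.get(w, {}).get(q, 0.0) * p_w for w, p_w in lm.items())
        log_prob += math.log(max(p, floor))
    return log_prob

# Hypothetical translation probabilities (each row sums to 1).
translation = {
    "car": {"car": 0.7, "automobile": 0.3},
    "engine": {"engine": 0.8, "motor": 0.2},
    "recipe": {"recipe": 1.0},
}
doc_1 = ["car", "engine", "car"]
doc_2 = ["recipe", "recipe"]
query = ["automobile", "motor"]

score_1 = log_query_likelihood(query, doc_1, translation)
score_2 = log_query_likelihood(query, doc_2, translation)
print(score_1, score_2)
```

In this toy setup the first document scores higher even though it shares no literal word with the query, because its words can be translated into the query terms. In the paper the translation parameters are learned from a document collection rather than written by hand.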

651 citations

Proceedings Article

Adapting ranking SVM to document retrieval

Yunbo Cao¹, Jun Xu², Tie-Yan Liu¹, Hang Li¹, Yalou Huang², Hsiao-Wuen Hon¹

Microsoft¹, Nankai University²

06 Aug 2006

TL;DR: The hinge loss of conventional Ranking SVM is modified for document retrieval and optimized with two methods, gradient descent and quadratic programming; experimental results show that the modified method can outperform conventional Ranking SVM and other existing methods on two datasets.

Abstract: The paper is concerned with applying learning to rank to document retrieval. Ranking SVM is a typical method of learning to rank. We point out that there are two factors one must consider when applying Ranking SVM, or in general any "learning to rank" method, to document retrieval. First, correctly ranking documents at the top of the result list is crucial for an Information Retrieval system. One must conduct training in a way that such ranked results are accurate. Second, the number of relevant documents can vary from query to query. One must avoid training a model biased toward queries with a large number of relevant documents. Previously, when existing methods that include Ranking SVM were applied to document retrieval, neither of the two factors was taken into consideration. We show it is possible to make modifications in conventional Ranking SVM, so it can be better used for document retrieval. Specifically, we modify the "Hinge Loss" function in Ranking SVM to deal with the problems described above. We employ two methods to conduct optimization on the loss function: gradient descent and quadratic programming. Experimental results show that our method, referred to as Ranking SVM for IR, can outperform the conventional Ranking SVM and other existing methods for document retrieval on two datasets.
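
One way to picture the modified hinge loss described in this abstract is a pairwise hinge loss in which every document pair carries its own weight, so that pairs involving top-ranked documents count more and queries with many pairs do not dominate training. The sketch below is written under that assumption and is not the paper's exact formulation: the per-pair weights, the regularization constant, the learning rate, and the function names are all hypothetical. It uses plain subgradient descent, corresponding to the first of the two optimization methods the abstract mentions.

```python
def weighted_pairwise_hinge_loss(w, pairs, reg=0.01):
    """Regularized hinge loss over (preferred, other, weight) feature-vector pairs."""
    loss = reg * sum(wi * wi for wi in w)
    for x_pos, x_neg, weight in pairs:
        margin = sum(wi * (a - b) for wi, a, b in zip(w, x_pos, x_neg))
        loss += weight * max(0.0, 1.0 - margin)
    return loss

def subgradient_step(w, pairs, reg=0.01, lr=0.1):
    """One subgradient descent update of the linear ranking model w."""
    grad = [2.0 * reg * wi for wi in w]
    for x_pos, x_neg, weight in pairs:
        diff = [a - b for a, b in zip(x_pos, x_neg)]
        margin = sum(wi * d for wi, d in zip(w, diff))
        if margin < 1.0:  # the pair violates the margin and contributes to the gradient
            for j, d in enumerate(diff):
                grad[j] -= weight * d
    return [wi - lr * g for wi, g in zip(w, grad)]

def score(w, x):
    """Linear ranking score of a document's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical training pairs: (features of preferred doc, features of other doc, weight).
# A larger weight stands in for a pair that matters more, e.g. one involving a top-ranked document.
pairs = [
    ([1.0, 0.2], [0.3, 0.1], 2.0),
    ([0.8, 0.5], [0.4, 0.6], 1.0),
]

w = [0.0, 0.0]
initial_loss = weighted_pairwise_hinge_loss(w, pairs)
for _ in range(200):
    w = subgradient_step(w, pairs)
final_loss = weighted_pairwise_hinge_loss(w, pairs)
print(initial_loss, final_loss, w)
```

Setting every weight to 1.0 reduces this to the conventional pairwise hinge loss, which makes the effect of the weighting easy to compare on the same data.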

648 citations

Performance

Metrics

Papers: 6,866

Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
