Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have been published within this topic, receiving 214383 citations.

Papers

Aspects of the P-Norm Model of Information Retrieval: Syntactic Query Generation, Efficiency, and Theoretical Properties

Maria Elena Smith¹•Institutions (1)

Cornell University¹

01 Jan 1990

TL;DR: The efficiency of a p-norm retrieval is significantly improved with a new p-norm retrieval algorithm which evaluates the entire document collection in one recursive traversal of the query tree, and list pruning methods for further efficiency improvements are introduced.

Abstract: A practical information retrieval system must be easy to use by untrained users, and it must provide prompt responses to a user's search requests. In this thesis, these practical aspects of the p-norm model of information retrieval are explored. In addition, a study of theoretical properties of the p-norm model is presented. A syntactic method for generating p-norm queries from parse trees generated by the PLNLP syntactic analyzer is presented. The effectiveness of the syntactically generated queries is shown to be comparable to the effectiveness of manually constructed queries, and much better than that of statistically generated queries. The efficiency of a p-norm retrieval is significantly improved with a new p-norm retrieval algorithm which evaluates the entire document collection in one recursive traversal of the query tree. This algorithm is compared against the straightforward algorithm, which requires a traversal of the query tree for each document that is evaluated. The new algorithm is shown to be better both asymptotically and experimentally. The infinity-one model is introduced as a means of approximating the p-norm model without requiring exponentiation. Experimental results show that infinity-one retrieval is essentially as effective as p-norm retrieval, but much faster. List pruning methods for further efficiency improvements are also introduced and are shown to reduce retrieval time significantly without affecting the precision of top-ranked documents. The retrieval time of the infinity-one model with list pruning is shown to be comparable to that of pure Boolean retrieval. A theoretical study is also presented in which certain Boolean algebra properties, such as associativity, are shown to be unsatisfiable by any extended Boolean system with weak operators. The p-norm model is shown to satisfy all those properties that can be satisfied. In addition, the p-norm model is evaluated with respect to the Waller-Kraft wish list for extended Boolean systems.
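As a rough illustration of two ideas named in this abstract (p-norm scoring, and evaluating the whole collection in a single recursive pass over the query tree), the sketch below uses the standard p-norm AND/OR operators in their unweighted form, where every query term counts equally. It is not the thesis's algorithm: the names `Term`, `Op`, `evaluate`, and the `weights` table are hypothetical, document weights are assumed to lie in [0, 1], and neither list pruning nor the infinity-one model is shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Term:
    name: str  # index term looked up in the weight table


@dataclass
class Op:
    kind: str                # "and" or "or"
    p: float                 # p >= 1; larger p behaves more like strict Boolean
    children: List["Node"]


Node = Union[Term, Op]


def evaluate(node: Node, weights: Dict[str, List[float]], n_docs: int) -> List[float]:
    """Return one score per document for the query subtree rooted at node.

    The tree is walked once; each node combines whole per-document score
    lists, rather than re-walking the tree for every document.
    """
    if isinstance(node, Term):
        # Terms missing from the table contribute weight 0 for every document.
        return list(weights.get(node.name, [0.0] * n_docs))

    child_scores = [evaluate(child, weights, n_docs) for child in node.children]
    k = len(child_scores)
    scores = []
    for d in range(n_docs):
        column = [s[d] for s in child_scores]
        if node.kind == "or":
            # p-norm OR: normalized distance from the all-zero point.
            total = sum(w ** node.p for w in column)
            scores.append((total / k) ** (1.0 / node.p))
        else:
            # p-norm AND: one minus normalized distance from the all-one point.
            total = sum((1.0 - w) ** node.p for w in column)
            scores.append(1.0 - (total / k) ** (1.0 / node.p))
    return scores


# Hypothetical three-document collection with binary term weights.
weights = {"retrieval": [1.0, 0.0, 1.0], "boolean": [1.0, 1.0, 0.0]}
query = Op("and", 2.0, [Term("retrieval"), Term("boolean")])
print(evaluate(query, weights, 3))
```

With p = 1 the AND and OR operators both reduce to the mean of the child scores, and as p grows they approach strict Boolean behaviour, which is why a document matching only one of two AND-ed terms still receives a nonzero score here.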

40 citations

Proceedings Article

Efficient and secure ranked multi-keyword search on encrypted cloud data

Cengiz Örencik¹, Erkay Savas¹•Institutions (1)

Sabancı University¹

30 Mar 2012

TL;DR: A practical privacy-preserving ranked keyword search scheme based on PIR that allows multi-keyword queries with ranking capability and outperforms the most efficient proposals in the literature in terms of time complexity by several orders of magnitude.

Abstract: Information search and document retrieval from a remote database (e.g. cloud server) requires submitting the search terms to the database holder. However, the search terms may contain sensitive information that must be kept secret from the database holder. Moreover, the privacy concerns apply to the relevant documents retrieved by the user in the later stage, since they may also contain sensitive data and reveal information about sensitive search terms. A related protocol, Private Information Retrieval (PIR), provides useful cryptographic tools to hide the queried search terms and the data retrieved from the database while returning the most relevant documents to the user. In this paper, we propose a practical privacy-preserving ranked keyword search scheme based on PIR that allows multi-keyword queries with ranking capability. The proposed scheme increases the security of the keyword search scheme while still satisfying efficient computation and communication requirements. To the best of our knowledge, the majority of previous works are not efficient for the assumed scenario where documents are large files. Our scheme outperforms the most efficient proposals in the literature in terms of time complexity by several orders of magnitude.

40 citations

Proceedings Article

Recognition and classification of noun phrases in queries for effective retrieval

Wei Zhang¹, Shuang Liu², Clement Yu¹, Chaojing Sun³, Fang Liu⁴, Weiyi Meng⁵•Institutions (5)

University of Illinois at Chicago¹, Ask.com², Broadcom³, Microsoft⁴, Binghamton University⁵

06 Nov 2007

TL;DR: This paper defines four types of noun phrases and presents an algorithm for recognizing these phrases in queries and uses a baseline noun phrase recognition algorithm to recognize phrases from the TREC queries.

Abstract: It has been shown that using phrases properly in document retrieval leads to higher retrieval effectiveness. In this paper, we define four types of noun phrases and present an algorithm for recognizing these phrases in queries. The strengths of several existing tools are combined for phrase recognition. Our algorithm is tested using a set of 500 web queries from a query log, and a set of 238 TREC queries. Experimental results show that our algorithm yields high phrase recognition accuracy. We also use a baseline noun phrase recognition algorithm to recognize phrases from the TREC queries. A document retrieval experiment is conducted using the TREC queries (1) without any phrases, (2) with the phrases recognized by a baseline noun phrase recognition algorithm, and (3) with the phrases recognized by our algorithm, respectively. The retrieval effectiveness of (3) is better than that of (2), which is better than that of (1). This demonstrates that utilizing phrases in queries does improve the retrieval effectiveness, and better noun phrase recognition yields higher retrieval performance.

40 citations

Proceedings Article

Learning Lexicon Models from Search Logs for Query Expansion

Jianfeng Gao¹, Shasha Xie², Xiaodong He¹, Alnur Ali¹•Institutions (2)

Microsoft¹, Princeton University²

12 Jul 2012

TL;DR: Evaluations on a real world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve the document retrieval performance but also outperform two state-of-the-art log-based QE methods.

Abstract: This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve the document retrieval performance but also outperform two state-of-the-art log-based QE methods.

39 citations

Journal Article

An evaluation of interactive query expansion in an online library catalogue with a graphical user interface

Micheline Hancock-Beaulieu¹, Margaret Fieldhouse¹, Thien Do¹•Institutions (1)

Northampton Community College¹

01 Mar 1995, Journal of Documentation

TL;DR: An online library catalogue served as a testbed to evaluate an interactive query expansion facility based on relevance feedback for the Okapi probabilistic term weighting retrieval system, which was implemented in a graphical user interface (GUI) environment using a game‐board metaphor for the search process.

Abstract: An online library catalogue served as a testbed to evaluate an interactive query expansion facility based on relevance feedback for the Okapi probabilistic term weighting retrieval system. The facility was implemented in a graphical user interface (GUI) environment using a game‐board metaphor for the search process, and allowed searchers to select candidate terms extracted from relevant retrieved items to reformulate queries. The take‐up of the interactive query expansion option was found to be lower, and its retrieval performance less effective, compared to previous tests featuring automatic query expansion. Contributory factors including the number, presentation and source of terms are discussed.

39 citations

Performance

Metrics

Papers: 6,866
Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
