scispace - formally typeset
Search or ask a question
Topic

Document retrieval

About: Document retrieval is a research topic. Over the lifetime, 6821 publications have been published within this topic receiving 214383 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: Using naturalistic inquiry methodology, an empirical study of user-based relevance interpretations is reported that reflects the nature of the thought processes of users who are evaluating bibliographic citations produced by a document retrieval system.
Abstract: Experimental research in information retrieval (IR) depends on the idea of relevance. Because of its key role in IR, recent questions about relevance have raised issues of methodological concern and have shaken the philosophical foundations of IR theory development. Despite an existing set of theoretical definitions of this concept, our understanding of relevance from users' perspectives is still limited. Using naturalistic inquiry methodology, this article reports an empirical study of user-based relevance interpretations. A model is presented that reflects the nature of the thought processes of users who are evaluating bibliographic citations produced by a document retrieval system. Three major categories of variables affecting relevance assessments-internal context, external context, and problem context-are identified and described. Users' relevance assessments involve multiple layers of interpretations that are derived from individuals' experiences, perceptions, and private knowledge related to the pa...

201 citations

Proceedings ArticleDOI
24 May 1994
TL;DR: In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index that dynamically separates long and short inverted lists and optimizes retrieval, update, and storage of each type of list.
Abstract: With the proliferation of the world's “information highways” a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering trade-offs which range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.

200 citations

Proceedings Article
23 Sep 2007
TL;DR: This work focuses on the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking.
Abstract: As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. While entities appear in many pages, current engines only find each page individually. Toward searching directly and holistically for finding information of finer granularity, we study the problem of entity search, a significant departure from traditional document retrieval. We focus on the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We evaluate our online prototype over a 2TB Web corpus, and show that EntityRank performs effectively.

200 citations

Journal ArticleDOI
TL;DR: It is found that with the appropriate subword units, it is possible to achieve performance comparable to that of text-based word units if the underlying phonetic units are recognized correctly.

200 citations

Journal ArticleDOI
TL;DR: A synoptic view of the growth of the text processing technology of information extraction whose function is to extract information about a pre‐specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates is given.
Abstract: In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre‐specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.

199 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
81% related
Metadata
43.9K papers, 642.7K citations
79% related
Recommender system
27.2K papers, 598K citations
79% related
Ontology (information science)
57K papers, 869.1K citations
78% related
Natural language
31.1K papers, 806.8K citations
77% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20239
202239
2021107
2020130
2019144
2018111