Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have appeared within this topic, receiving 214,383 citations.


Papers
Journal Article
01 Dec 1994
TL;DR: An information retrieval system that searches text and speech documents simultaneously; the paper shows that retrieval effectiveness based on a small indexing vocabulary is similar to that of a Boolean retrieval system.
Abstract: We present an information retrieval system that allows searching for text and speech documents simultaneously. The retrieval system accepts vague queries and performs a best-match search to find those documents that are relevant to the query. The output of the retrieval system is a ranked list of documents in which the documents at the top of the list best satisfy the user's information need. The relevance of the documents is estimated by means of metadata (document description vectors). The metadata is generated automatically and is organized so that queries can be processed efficiently. We introduce a controlled indexing vocabulary for both speech and text documents. The size of the new indexing vocabulary is small (1,000 features) compared with the sizes of the indexing vocabularies of conventional text retrieval (10,000 to 100,000 features). We show that the retrieval effectiveness based on such a small indexing vocabulary is similar to the retrieval effectiveness of a Boolean retrieval system.

45 citations
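
The best-match ranking described in the abstract above can be illustrated with a minimal sketch. It assumes cosine similarity over raw feature counts; the abstract does not state which weighting or similarity function the authors used. The five-word vocabulary, the sample documents, and the names `describe`, `cosine`, and `rank` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical controlled indexing vocabulary. The abstract mentions about
# 1,000 features; five are enough to illustrate the idea.
VOCABULARY = ["speech", "text", "retrieval", "index", "query"]


def describe(text):
    """Build a document description vector: feature counts over VOCABULARY."""
    counts = Counter(text.lower().split())
    return [counts[feature] for feature in VOCABULARY]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Return document ids ordered by decreasing similarity to the query.

    Ties are broken by document id so the ordering is deterministic.
    """
    q = describe(query)
    scored = [(cosine(q, describe(body)), doc_id)
              for doc_id, body in documents.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical sample collection.
documents = {
    "d1": "speech retrieval with a small index",
    "d2": "text query processing",
    "d3": "cooking recipes",
}
ranking = rank("speech retrieval", documents)
print(ranking)
```

Unlike a Boolean search, which only partitions the collection into matching and non-matching documents, this kind of scoring orders every document, so a vague query still yields a usable list.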

Journal Article
TL;DR: In this article, the results of retrieval tests using a variety of these search methods in the CHESHIRE experimental online catalog system were described and compared with the results obtained using the traditional Boolean search methods of conventional online catalog systems.
Abstract: Research on the use and users of online catalogs conducted in the early 1980s found that subject searches were the most common form of online catalog search. At the same time, many of the problems experienced by online catalog users have been traced to difficulties with the subject access mechanisms of the online catalog. A stream of research has concentrated on applying retrieval techniques derived from information retrieval (IR) research to replace the Boolean search methods of conventional online catalog systems. This study describes the results of retrieval tests using a variety of these search methods in the CHESHIRE experimental online catalog system.

45 citations

Journal Article
TL;DR: The study reveals the extent of subject searching activity, and suggests that this may have been underestimated in previous studies, and proposes that a future online searching environment will encourage a more truly interactive approach to subject searching.
Abstract: Searching behaviour in a university library is studied using a wholistic approach, encompassing the use of bibliographic tools and shelf browsing. The present study is designed as the first half of a ‘before and after’ study to permit the evaluation of the impact of a future online catalogue on users' searching behaviour. A combined methodology was devised: searchers were encouraged to talk aloud during their search, and this information, together with some probing and real time expert interpretation, enabled the experimenter to record the searching activity on a highly structured observation form. The study reveals the extent of subject searching activity, and suggests that this may have been underestimated in previous studies. The analysis of expressed topics, search formulation strategy and documents retrieved reveals the adaptive nature of the subject searching process, whereby the user adapts to the structure of the available tools. The information retrieval task in a traditional library system is tailored by the system to a single, one dimensional, sequential process. It is suggested that a major obstacle to subject searching effectiveness may lie in the lack of interaction between the different possible approaches in the searching process: the indexing language, the classification, and the titles. It is to be hoped that a future online searching environment will encourage a more truly interactive approach to subject searching.

45 citations

Journal Article
TL;DR: A model for teaching the core informational skill of library‐based literature searching (information retrieval) based on a flow chart of the main stages in a systematic search, developed primarily in an academic, health sciences environment but operates at a sufficiently high level of generality to be of wide applicability in information skills programmes.
Abstract: It is expected that instruction in information skills (formerly known as bibliographic instruction) will be an important function of libraries in the “information society”. Describes a model for teaching the core informational skill of library‐based literature searching (information retrieval). It centres on a flow chart of the main stages in a systematic search: create set of search terms; formulate logical search statement; estimate parameters of search; search information sources; and record and evaluate references. The flow chart is flanked by two columns. One contains conceptual frameworks which illuminate aspects of the search process, such as the information chain and QRAQ (quantity, relevance, authority and quality), a simple schema for evaluating bibliographic references. The other column identifies library tools and services which can assist the end‐user at various stages of a search, such as search analysis and bibliographic instruction. The model was developed primarily in an academic, health sciences environment, but operates at a sufficiently high level of generality to be of wide applicability in information skills programmes.

45 citations

Proceedings Article
29 Jun 2009
TL;DR: This paper presents a framework that advocates lazy update propagation with the following key feature: Efficient, incremental updates that immediately reflect the new data in the indexes in a way that gives strict guarantees on the quality of subsequent query answers.
Abstract: Approximate string matching is a problem that has received a lot of attention recently. Existing work on information retrieval has concentrated on a variety of similarity measures (TF/IDF, BM25, HMM, etc.) specifically tailored for document retrieval purposes. As new applications that depend on retrieving short strings are becoming popular (e.g., local search engines like YellowPages.com, Yahoo! Local, and Google Maps), new indexing methods are needed, tailored for short strings. For that purpose, a number of indexing techniques and related algorithms have been proposed based on length-normalized similarity measures. A common denominator of indexes for length-normalized measures is that maintaining the underlying structures in the presence of incremental updates is inefficient, mainly due to data-dependent, precomputed weights associated with each distinct token and string. Incorporating updates is usually accomplished by rebuilding the indexes at regular time intervals. In this paper we present a framework that advocates lazy update propagation with the following key feature: efficient, incremental updates that immediately reflect the new data in the indexes in a way that gives strict guarantees on the quality of subsequent query answers. More specifically, our techniques guarantee against false negatives and limit the number of false positives produced. We implement a fully working prototype and illustrate that the proposed ideas work really well in practice for real datasets.

45 citations
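
The update problem described in the abstract above can be seen in a small sketch of a length-normalized similarity measure. It assumes 3-gram tokens, an IDF weight of log(1 + N/df), and cosine normalization; these are illustrative choices, not the measures or the lazy update framework from the paper. The sample strings and all function names are hypothetical.

```python
import math
from collections import Counter


def qgrams(s, q=3):
    """Split a string into overlapping q-grams, padding both ends with '#'."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def idf_table(strings):
    """Compute a data-dependent weight for every distinct q-gram."""
    n = len(strings)
    df = Counter(g for s in strings for g in set(qgrams(s)))
    return {g: math.log(1 + n / d) for g, d in df.items()}


def normalized_vector(s, idf):
    """Weight each q-gram by IDF, then scale the vector to unit length."""
    weights = {g: idf.get(g, 0.0) for g in set(qgrams(s))}
    length = math.sqrt(sum(w * w for w in weights.values()))
    return {g: w / length for g, w in weights.items()} if length else {}


def similarity(a, b, idf):
    """Cosine similarity of two strings under the given IDF table."""
    va, vb = normalized_vector(a, idf), normalized_vector(b, idf)
    return sum(w * vb.get(g, 0.0) for g, w in va.items())


# Hypothetical collection of short strings.
strings = ["main street", "main st", "maple avenue"]
idf_before = idf_table(strings)
score_before = similarity("main street", "main st", idf_before)

# A single insertion changes N and the document frequencies, so the
# weights of tokens in strings that were already indexed change as well.
strings.append("main square")
idf_after = idf_table(strings)
score_after = similarity("main street", "main st", idf_after)

print(idf_before["mai"], idf_after["mai"])
print(score_before, score_after)
```

Because every token weight depends on the whole collection, one new string can make the precomputed, normalized weights of existing strings stale. That is why a naive index has to be rebuilt periodically, and it is the cost that lazy update propagation aims to avoid.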


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  9
2022  39
2021  107
2020  130
2019  144
2018  111