Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published on this topic, receiving 214,383 citations.


Papers
Journal Article
TL;DR: A prototypical implementation of a software tool for document retrieval that groups and arranges (pre-processed) documents by a similarity measure, helping users navigate through similar documents.
Abstract: In this article we present a prototypical implementation of a software tool for document retrieval that groups and arranges (pre-processed) documents based on a similarity measure. The prototype uses self-organising maps to realise interactive associative search and visual exploration of document databases. This helps users navigate through similar documents. Navigation, especially the search for the first appropriate document, is supported by conventional keyword search methods. The usability of the approach is demonstrated with a sample search.

41 citations
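The self-organising-map grouping described in the abstract above can be illustrated with a minimal pure-Python SOM. This is a sketch under stated assumptions, not the authors' prototype: documents are tokenised on whitespace and represented as length-normalised term-frequency vectors, the map is a small rectangular grid, and both the learning rate and the Gaussian neighbourhood radius decay linearly. The function names (`tf_vector`, `train_som`, `map_documents`) are hypothetical.

```python
# Hypothetical sketch: a tiny self-organising map over term-frequency vectors
# (the vector representation and decay schedules are assumptions).
import math
import random
from collections import Counter


def tf_vector(text, vocab):
    """Length-normalised term-frequency vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(vectors, rows=2, cols=2, epochs=100, lr0=0.5, radius0=1.0, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.1)  # neighbourhood radius shrinks too
        for v in rng.sample(vectors, len(vectors)):  # visit documents in random order
            # best-matching unit: the cell whose weights are closest to this document
            bmu = min(nodes, key=lambda n: sq_dist(nodes[n], v))
            for (r, c), w in nodes.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])  # pull the cell towards the document
    return nodes


def map_documents(docs, nodes, vocab):
    """Assign each named document to its best-matching map cell."""
    cells = {}
    for name, text in docs.items():
        v = tf_vector(text, vocab)
        cells[name] = min(nodes, key=lambda n: sq_dist(nodes[n], v))
    return cells
```

After training, documents mapped to the same or neighbouring cells are the similar documents a user would browse; conventional keyword search, as the abstract notes, would help locate a first relevant cell.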

Journal Article
William S. Cooper
TL;DR: The problem of developing systems of logical inference for natural languages is discussed, and an example of such an analysis of a sublanguage of English is presented.
Abstract: Information Retrieval systems may be classified either as Document Retrieval systems or Fact Retrieval systems. It is contended that at least some of the latter will require the capability for performing logical deductions among natural language sentences. The problem of developing systems of logical inference for natural languages is discussed, and an example of such an analysis of a sublanguage of English is presented. An experimental Fact Retrieval system which incorporates this analysis has been programmed for the IBM 7090 computer, and its main algorithms are stated.

41 citations
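To make the idea of deduction over natural-language sentences concrete, here is a toy fact-retrieval sketch. It assumes a hypothetical two-pattern sublanguage ("Every X is a Y." and "N is a Y.") and answers yes/no questions by forward chaining over class membership. It does not reproduce Cooper's analysis of English or his IBM 7090 algorithms.

```python
# Hypothetical toy sublanguage and forward-chaining deduction (not Cooper's system).
import re

UNIVERSAL = re.compile(r"^every (\w+) is an? (\w+)$")  # "Every dog is a mammal."
SINGULAR = re.compile(r"^(\w+) is an? (\w+)$")        # "Rex is a dog."
QUESTION = re.compile(r"^is (\w+) an? (\w+)\??$")      # "Is Rex an animal?"


def parse(sentence):
    """Translate one sentence of the toy sublanguage into a rule or a fact."""
    s = sentence.strip().rstrip(".").lower()
    m = UNIVERSAL.match(s)
    if m:
        return "rule", m.group(1), m.group(2)
    m = SINGULAR.match(s)
    if m:
        return "fact", m.group(1), m.group(2)
    raise ValueError(f"sentence outside the toy sublanguage: {sentence!r}")


def deduce(sentences):
    """Forward-chain class-membership rules until no new facts appear."""
    rules, facts = set(), set()
    for s in sentences:
        kind, a, b = parse(s)
        (rules if kind == "rule" else facts).add((a, b))
    changed = True
    while changed:
        changed = False
        for individual, cls in list(facts):
            for sub, sup in rules:
                if cls == sub and (individual, sup) not in facts:
                    facts.add((individual, sup))
                    changed = True
    return facts


def ask(sentences, question):
    """Answer a yes/no question by checking the deductive closure of the sentences."""
    m = QUESTION.match(question.strip().lower())
    if not m:
        raise ValueError(f"unsupported question: {question!r}")
    return (m.group(1), m.group(2)) in deduce(sentences)
```

The answer "Rex is an animal" is never stored; it is derived from two universal sentences and one singular fact, which is the kind of capability the abstract argues Fact Retrieval systems need.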

Journal Article
01 Sep 2004
TL;DR: The study indicates that ProfileSkim was at least as effective as FindSkim in identifying relevant pages, as measured by traditional information retrieval measures, and there is some evidence that ProfileSkim is a precision-enhancing tool.
Abstract: We present a user-centred, task-oriented, comparative evaluation of two within-document retrieval tools. ProfileSkim computes a relevance profile for a document with respect to a query, and presents the profile as an interactive bar graph. FindSkim provides similar functionality to the web browser “Find” command. A novel simulated work task was devised, where participants are asked to identify (index) relevant pages of an electronic book, given topics from the existing book index. The original book index provides the ground truth, against which the indexing results of the participants can be compared. We confirmed a major hypothesis, namely that ProfileSkim is significantly more efficient than FindSkim, as measured by time for task. The study indicates that ProfileSkim was at least as effective as FindSkim in identifying relevant pages, as measured by traditional information retrieval measures, and there is some evidence that ProfileSkim is a precision-enhancing tool. Based on qualitative data from questionnaires, we also provide strong evidence to support our conjecture that participants would be more satisfied using ProfileSkim than FindSkim. The experimental study confirmed the potential of relevance profiling for improving within-document retrieval. Relevance profiling should prove highly beneficial for users trying to identify relevant information within long documents.

41 citations
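A relevance profile of the kind ProfileSkim displays can be sketched as a per-page score rendered as a bar graph. The abstract does not specify ProfileSkim's scoring function, so this sketch assumes that fixed-size word windows stand in for pages and that a page's score is simply its count of query-term occurrences. All function names are hypothetical.

```python
# Hypothetical sketch of a within-document relevance profile (scoring is assumed:
# raw query-term counts per fixed-size word window).


def relevance_profile(words, query_terms, window=50):
    """Score consecutive fixed-size windows ("pages") by the number of query-term hits."""
    query = {t.lower() for t in query_terms}
    return [sum(1 for w in words[start:start + window] if w.lower() in query)
            for start in range(0, len(words), window)]


def render_bars(profile, width=20):
    """Render the profile as a text bar graph, scaled so the best page fills `width`."""
    peak = max(profile, default=0) or 1
    lines = []
    for page, score in enumerate(profile, start=1):
        bar = "#" * round(width * score / peak)
        lines.append(f"page {page:>3} | {bar} {score}")
    return "\n".join(lines)


def top_pages(profile, k=3):
    """1-based page numbers of the k highest-scoring pages (ties keep document order)."""
    ranked = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    return [i + 1 for i in ranked[:k]]
```

Scaling every bar to the best page keeps the graph readable for long documents, where raw counts would vary widely; an interactive tool would let the user jump to a page by clicking its bar.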

Proceedings Article
28 Jul 2013
TL;DR: This work presents a novel cluster ranking approach that utilizes Markov Random Fields (MRFs), shows that it significantly outperforms state-of-the-art cluster ranking methods, and demonstrates that it can be used to improve the performance of results-diversification methods.
Abstract: An important challenge in cluster-based document retrieval is ranking document clusters by their relevance to the query. We present a novel cluster ranking approach that utilizes Markov Random Fields (MRFs). MRFs enable the integration of various types of cluster-relevance evidence, e.g., the query-similarity values of the cluster's documents and query-independent measures of the cluster. We use our method to re-rank an initially retrieved document list by ranking clusters that are created from the documents most highly ranked in the list. The resultant retrieval effectiveness is substantially better than that of the initial list for several lists produced by effective retrieval methods. Furthermore, our cluster ranking approach significantly outperforms state-of-the-art cluster ranking methods. We also show that our method can be used to improve the performance of (state-of-the-art) results-diversification methods.

41 citations
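The cluster-based re-ranking described above can be sketched with a log-linear score, the form to which MRF-based rankers typically reduce: the score is a weighted sum of feature functions. The specific features are assumptions, not the paper's: the geometric mean of the cluster documents' query-similarity values, plus one precomputed query-independent value per cluster. The weights are arbitrary, and the clusters are assumed to have already been built from the top of the initial list.

```python
# Hypothetical log-linear (MRF-style) cluster ranking; features and weights are assumed.
import math


def cluster_score(cluster, doc_sim, qi_feature, weights):
    """Weighted sum of feature functions for one cluster.

    cluster: list of doc ids; doc_sim: doc id -> query similarity (must be > 0);
    qi_feature: a precomputed query-independent value for this cluster.
    """
    # geometric mean of the documents' query similarities, taken in log space
    log_geo_mean = sum(math.log(doc_sim[d]) for d in cluster) / len(cluster)
    features = [log_geo_mean, qi_feature]
    return sum(w * f for w, f in zip(weights, features))


def rerank(initial, clusters, qi_features, weights=(1.0, 0.5)):
    """Re-rank an initial list of (doc id, score) pairs by ranking clusters."""
    doc_sim = dict(initial)
    order = sorted(range(len(clusters)),
                   key=lambda i: cluster_score(clusters[i], doc_sim, qi_features[i], weights),
                   reverse=True)
    result, seen = [], set()
    for i in order:
        # within a cluster, documents keep their initial-score order
        for d in sorted(clusters[i], key=lambda d: doc_sim[d], reverse=True):
            if d not in seen:
                seen.add(d)
                result.append(d)
    for d, _ in initial:  # documents in no cluster follow, in their original order
        if d not in seen:
            seen.add(d)
            result.append(d)
    return result
```

Because the exponentiated MRF score is monotone in this weighted sum, ranking by the sum is enough; the query-independent feature lets a cluster with weaker query similarity overtake a stronger one when its prior evidence is high enough.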

Jae Sung Lee
01 Jan 2008
TL;DR: A language-independent Statistical Transliteration Model (STM) is proposed that learns rules automatically from word-aligned pairs to generate transliteration variations, together with a hybrid method that is more effective at generating various transliterations and consequently at retrieving more relevant documents.
Abstract: In Korean technical documents, many English words are transliterated into Korean in various ways. Most of these words are technical terms and proper nouns that are frequently used as query terms in information retrieval systems. As communication with foreigners increases, an automatic transliteration system is needed to find the various transliterations for cross-lingual information systems, especially for proper nouns and technical terms that are not registered in the dictionary. In this paper, we present a language-independent Statistical Transliteration Model (STM) that learns rules automatically from word-aligned pairs in order to generate transliteration variations. For transliteration from English to Korean, we compared two methods based on STM: the pivot method and the direct method. In the pivot method, transliteration is done in two steps: converting English words into pronunciation symbols using the STM, and then converting these symbols into Korean words using the Korean standard conversion rule. In the direct method, English words are converted directly to Korean words using the STM, without intermediate steps. After comparing the performance of the two methods, we propose a hybrid method that is more effective at generating various transliterations and, consequently, at retrieving more relevant documents.

41 citations
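The abstract does not state how the hybrid method combines the pivot and direct methods. One plausible scheme, assumed here, is linear interpolation of the candidate scores the two methods produce. The probability tables are hypothetical toy values rather than output of a trained STM, the pronunciation symbols are ARPAbet-style for illustration, and `PRON_TO_KOR` is a stand-in for the Korean standard conversion rule.

```python
# Hypothetical sketch: combining pivot and direct transliteration candidates by
# linear interpolation (the combination scheme and all probabilities are assumed).


def pivot_candidates(word, eng_to_pron, pron_to_kor):
    """Pivot method: English -> pronunciation (statistical) -> Korean (deterministic rule)."""
    scores = {}
    for pron, p in eng_to_pron.get(word, {}).items():
        kor = pron_to_kor[pron]                 # stand-in for the standard conversion rule
        scores[kor] = scores.get(kor, 0.0) + p  # several pronunciations may give one spelling
    return scores


def hybrid_candidates(word, direct, eng_to_pron, pron_to_kor, alpha=0.5, k=3):
    """Interpolate direct and pivot scores; return the top-k (candidate, score) pairs."""
    pivot = pivot_candidates(word, eng_to_pron, pron_to_kor)
    direct_scores = direct.get(word, {})
    merged = {kor: alpha * direct_scores.get(kor, 0.0) + (1 - alpha) * pivot.get(kor, 0.0)
              for kor in set(direct_scores) | set(pivot)}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical toy probabilities, NOT output of the paper's STM.
DIRECT = {"data": {"데이터": 0.5, "데이타": 0.35, "다타": 0.15}}
ENG_TO_PRON = {"data": {"D EY T AH": 0.6, "D AE T AH": 0.4}}  # ARPAbet-style symbols
PRON_TO_KOR = {"D EY T AH": "데이터", "D AE T AH": "대터"}
```

With `alpha=0.5`, `hybrid_candidates("data", DIRECT, ENG_TO_PRON, PRON_TO_KOR)` ranks 데이터 first (0.5 * 0.5 + 0.5 * 0.6 = 0.55), followed by 대터 (0.2) and 데이타 (0.175). Candidates proposed by only one method still survive, which is how a hybrid can cover more spelling variants; for retrieval, each returned candidate could be added to the query.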


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations (81% related)
Metadata: 43.9K papers, 642.7K citations (79% related)
Recommender system: 27.2K papers, 598K citations (79% related)
Ontology (information science): 57K papers, 869.1K citations (78% related)
Natural language: 31.1K papers, 806.8K citations (77% related)
Metrics
Number of papers on this topic in previous years:
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111