Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

Papers

Journal Article

The Association Factor in Information Retrieval

H. Edmund Stiles

01 Apr 1961, Journal of the ACM

TL;DR: Describes an all-computer document retrieval system that can find documents related to a request even when they are not indexed by the exact terms of the request, and that presents these documents in order of their relevance to the request.

Abstract: This paper describes an all computer document retrieval system which can find documents related to a request even though they may not be indexed by the exact terms of the request, and can present these documents in the order of their relevance to the request. The key to this ability lies in the application of a statistical formula by which the computer calculates the degree of association between pairs of index terms. With proper manipulation of these associations (entirely within the machine) a vocabulary of synonyms, near synonyms and other words closely related to any given term or group of terms is derived. Such a vocabulary related to a group of request terms is believed to be a much more powerful tool for selecting documents from a collection than has been available heretofore. By noting the number of matching terms between this extended list of request terms and the terms used to index a document, and with due regard for their degree of association, documents are selected by the computer and arranged in the order of their relevance to the request.

Like all other documentalists who are operating large coordinate indexes, we are searching for better ways to exploit this type of information system. In our library we have already eliminated the time-consuming job of posting document numbers manually by enlisting the aid of a 705 computer. (The computer periodically prepares revised posting cards to replace the outdated ones.) Now we are searching for better solutions to our retrieval problems. One obvious retrieval problem in any large system is the time required to "coordinate" heavily posted terms. We are convinced we must mechanize if we are to allow our collection to grow indefinitely. A second problem is the retrieval of so many documents related to a single request that the customer finds it difficult to decide which to examine first. Since he has no precise means of determining which document is most closely related to a request, we have been forced into assisting him to use somewhat arbitrary or subjective means. The date of the document is sometimes used as a relevance criterion with the hope that the most recent document will be the most pertinent, or the name of the author is used with the hope that a known author will answer the request better than an unknown one. The pitfalls of such criteria are apparent. The third, and …
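
The pipeline this abstract outlines (measure association between index terms, expand the request with associated terms, rank documents by weighted matches) can be illustrated with a minimal sketch. The abstract does not reproduce the paper's statistical formula, so the phi coefficient of term co-occurrence is swapped in here as a stand-in; the function names, the 0.5 threshold, and the toy index are all hypothetical.

```python
import math

def phi(n_docs, a, b, both):
    """Phi coefficient between two index terms (stand-in association measure).

    a, b: number of documents indexed by each term; both: indexed by both.
    """
    denom = a * b * (n_docs - a) * (n_docs - b)
    if denom == 0:
        return 0.0  # a term on no document, or on every document, carries no signal
    return (both * n_docs - a * b) / math.sqrt(denom)

def build_postings(index):
    """Invert {doc_id: set of terms} into {term: set of doc_ids}."""
    postings = {}
    for doc_id, terms in index.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)
    return postings

def expand_request(request, postings, n_docs, min_assoc=0.5):
    """Give request terms weight 1.0 and add terms associated strongly enough."""
    weights = {term: 1.0 for term in request}
    for term in request:
        docs_t = postings.get(term, set())
        for other, docs_o in postings.items():
            if other in request:
                continue
            w = phi(n_docs, len(docs_t), len(docs_o), len(docs_t & docs_o))
            if w >= min_assoc:
                weights[other] = max(weights.get(other, 0.0), w)
    return weights

def rank(request, index, min_assoc=0.5):
    """Return (doc_id, score) pairs, best first; zero-score documents are dropped."""
    postings = build_postings(index)
    weights = expand_request(request, postings, len(index), min_assoc)
    scored = []
    for doc_id, terms in index.items():
        score = sum(weights.get(t, 0.0) for t in terms)
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical toy index: d4 is indexed by "plane" only, never by "aircraft".
index = {
    "d1": {"aircraft", "plane"},
    "d2": {"aircraft", "plane"},
    "d3": {"aircraft", "plane", "engine"},
    "d4": {"plane"},
    "d5": {"ship"},
    "d6": {"ship", "engine"},
}
for doc_id, score in rank(["aircraft"], index):
    print(doc_id, round(score, 3))
```

In this toy index, d4 is still retrieved for the request "aircraft" because "plane" co-occurs with it often enough to enter the expanded request, and d4 ranks below the documents that carry the request term itself.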

100 citations

Journal Article

New algorithms on wavelet trees and applications to information retrieval

Travis Gagie¹, Gonzalo Navarro², Simon J. Puglisi³

Aalto University¹, University of Chile², RMIT University³

01 Apr 2012, Theoretical Computer Science

TL;DR: In this article, the authors show how to use wavelet trees to solve fundamental algorithmic problems such as range quantile queries, range next value queries, and range intersection queries.
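
To make one of the named query types concrete, here is a minimal sketch of a range quantile query on a pointer-based wavelet tree over integers. It is a textbook-style illustration, not the paper's algorithms: plain prefix-count lists stand in for succinct rank bitvectors, and the class and method names are hypothetical.

```python
class WaveletTree:
    """Pointer-based wavelet tree over a non-empty sequence of integers."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.prefix = [0]
        if lo == hi or not seq:
            return  # leaf, or an empty subtree that is never visited
        mid = (lo + hi) // 2
        # prefix[i] = how many of the first i symbols go to the left child
        for x in seq:
            self.prefix.append(self.prefix[-1] + (1 if x <= mid else 0))
        self.left = WaveletTree([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, hi)

    def quantile(self, l, r, k):
        """Return the k-th smallest value (0-based) in seq[l:r]."""
        if not 0 <= k < r - l:
            raise IndexError("k must satisfy 0 <= k < r - l")
        node = self
        while node.lo != node.hi:
            left_before = node.prefix[l]            # left-going symbols before l
            left_in_range = node.prefix[r] - left_before
            if k < left_in_range:
                # the answer is among the symbols routed to the left child
                l, r = left_before, node.prefix[r]
                node = node.left
            else:
                # skip the left-going symbols and continue in the right child
                k -= left_in_range
                l, r = l - left_before, r - node.prefix[r]
                node = node.right
        return node.lo

seq = [3, 1, 4, 1, 5, 9, 2, 6]
wt = WaveletTree(seq)
print(wt.quantile(2, 7, 2))  # median of seq[2:7] = [4, 1, 5, 9, 2]
```

Each query walks one root-to-leaf path, so its cost depends on the tree height (logarithmic in the value range here) rather than on the length of the queried range.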

100 citations

Patent

Document retrieval system and search method using word set and character look-up tables

Robin Green¹

IBM¹

31 Jul 2001

TL;DR: The character look-up table is a boolean array of one-bit elements that are processed in groups sized to the computer's maximum bit-processing count, so that non-matching words are culled simultaneously.

Abstract: A computer-operated document retrieval system includes a lexicon of words contained in system documents, and a document look-up table that relates words by unique word numbers to the documents. A word look-up table identifies sets of words with common characteristics, specifically prefix value and word length, and a character look-up table identifies whether any word contains a specified character. A target set generator accesses the word look-up table to compose a target word set with characteristics corresponding to the search string. A refining module reduces the target set by selecting a set of characters from the search string, and accessing the character look-up table to identify which target words use the character set. The character look-up table is a boolean array with one bit elements that are processed in groups whose size corresponds to the maximum bit processing count of the computer, effectively culling non-matching words simultaneously. A string comparison module determines whether any word remaining in the target set matches the search string. The system quickly executes various searches, including prefix, exact match, wildcard, and fuzzy searches.
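
The culling step can be sketched roughly as follows, assuming one bit per word number for each character. Python's arbitrary-precision integers stand in for the machine-word groups the abstract mentions, the word look-up table (prefix value and word length) is omitted, and all function names are hypothetical rather than taken from the patent.

```python
def build_char_table(lexicon):
    """Map each character to a bitmask; bit i is set when word i contains it."""
    table = {}
    for word_no, word in enumerate(lexicon):
        for ch in set(word):
            table[ch] = table.get(ch, 0) | (1 << word_no)
    return table

def candidates(search, lexicon, table):
    """Cull words that lack any character of the search string, many words per AND."""
    mask = (1 << len(lexicon)) - 1          # start with every word as a candidate
    for ch in set(search):
        mask &= table.get(ch, 0)            # one AND filters the whole lexicon
    return [w for i, w in enumerate(lexicon) if (mask >> i) & 1]

def prefix_search(prefix, lexicon, table):
    """Final string comparison, run only on the words that survived culling."""
    return [w for w in candidates(prefix, lexicon, table) if w.startswith(prefix)]

lexicon = ["retrieval", "retrieve", "index", "query", "tree"]
table = build_char_table(lexicon)
print(candidates("tre", lexicon, table))
print(prefix_search("tre", lexicon, table))
```

The bitmask pass is only a filter: a surviving word contains every character of the search string but not necessarily in the right order, which is why the string comparison still runs afterwards.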

100 citations

Journal Article

Ontology-driven document enrichment

Enrico Motta¹, Simon Buckingham Shum¹, John Domingue¹

Open University¹

01 Jun 2000, International Journal of Human-Computer Studies / International Journal of Man-Machine Studies

TL;DR: An approach to document enrichment is presented, which consists of developing and integrating formal knowledge models with archives of documents, to provide intelligent knowledge retrieval and (possibly) additional knowledge-intensive services, beyond what is currently available using “standard” information retrieval and search facilities.

Abstract: In this paper, we present an approach to document enrichment, which consists of developing and integrating formal knowledge models with archives of documents, to provide intelligent knowledge retrieval and (possibly) additional knowledge-intensive services, beyond what is currently available using “standard” information retrieval and search facilities. Our approach is ontology-driven, in the sense that the construction of the knowledge model is carried out in a top-down fashion, by populating a given ontology, rather than in a bottom-up fashion, by annotating a particular document. In this paper, we give an overview of the approach and we examine the various types of issues (e.g. modelling, organizational and user interface issues) which need to be tackled to effectively deploy our approach in the workplace. In addition, we also discuss a number of technologies we have developed to support ontology-driven document enrichment and we illustrate our ideas in the domains of electronic news publishing, scholarly discourse and medical guidelines.

100 citations

Patent

Document retrieval using internal dictionary-hierarchies to adjust per-subject match results

Anne E. Gattiker¹, Fadi H. Gebara¹, Anthony N. Hylick¹, Rouwaida Kanj¹

IBM¹

15 Sep 2015

TL;DR: A retrieval request containing search terms descriptive of the desired documents is processed by identifying a set of candidate documents tagged with subjects, then using affinity values to adjust the aggregate score for the terms in the per-subject dictionaries.

Abstract: Techniques for managing big data include retrieval using per-subject dictionaries having multiple levels of sub-classification hierarchy within the subject. Entries may include subject-determining-power (SDP) scores that provide an indication of the descriptive power of the entry term with respect to the subject of the dictionary containing the term. The same term may have entries in multiple dictionaries with different SDP scores in each of the dictionaries. A retrieval request for one or more documents containing search terms descriptive of the one or more documents can be processed by identifying a set of candidate documents tagged with subjects, i.e., identifiers of per-subject dictionaries having entries corresponding to a search term, then using affinity values to adjust the aggregate score for the terms in the dictionaries. Documents are then selected for best match to the subject based on the adjusted scores. Alternatively, the adjustment may be performed after selecting the documents by re-ordering them according to adjusted scores.
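
A rough sketch of the scoring flow may help. The abstract does not define how affinity values enter the adjustment, so this sketch assumes, purely for illustration, that each subject's aggregate SDP score is boosted by the affinity-weighted scores of other subjects; the data layout, the function names, and every number below are hypothetical, and the sub-classification hierarchy is left out.

```python
def subject_scores(query_terms, dictionaries):
    """Aggregate SDP score of the query terms within each per-subject dictionary."""
    return {
        subject: sum(entries.get(term, 0.0) for term in query_terms)
        for subject, entries in dictionaries.items()
    }

def adjust(scores, affinity):
    """ASSUMPTION: add the affinity-weighted scores of the other subjects."""
    adjusted = {}
    for subject, base in scores.items():
        boost = sum(
            affinity.get((subject, other), 0.0) * value
            for other, value in scores.items()
            if other != subject
        )
        adjusted[subject] = base + boost
    return adjusted

def rank_documents(query_terms, dictionaries, affinity, doc_subjects):
    """Order candidate documents by the adjusted score of their subject tag."""
    adjusted = adjust(subject_scores(query_terms, dictionaries), affinity)
    scored = [(doc, adjusted.get(subject, 0.0)) for doc, subject in doc_subjects.items()]
    scored = [(doc, score) for doc, score in scored if score > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical data: the same term carries a different SDP score per subject.
dictionaries = {
    "medicine": {"virus": 0.9, "cell": 0.4},
    "computing": {"virus": 0.6, "cell": 0.1},
    "biology": {"cell": 0.8},
}
affinity = {("medicine", "biology"): 0.5, ("biology", "medicine"): 0.5}
doc_subjects = {"doc_a": "computing", "doc_b": "biology", "doc_c": "medicine"}

for doc, score in rank_documents(["virus", "cell"], dictionaries, affinity, doc_subjects):
    print(doc, round(score, 2))
```

With these made-up numbers, the affinity between "medicine" and "biology" lifts both above "computing", so the re-ordering the abstract mentions as an alternative would yield the same ranking here.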

100 citations

Metrics

Papers: 6,866
Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
