Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have appeared within this topic, receiving 214383 citations.


Papers
Journal Article
TL;DR: It is shown that the most strongly co-occurring word pairs, which are therefore “associated” in a statistical sense, can be represented in the form of an “association map.”
Abstract: This article discusses the possibility of exploiting the statistics of word co-occurrence in text for purposes of document retrieval. Co-occurrence is defined and related to the mental processes of authors and readers; several means of quantitative measurement of word co-occurrence are then scrutinized. It is shown that the most strongly co-occurring word pairs, which are therefore “associated” in a statistical sense, can be represented in the form of an “association map.” The last half of the article presents two modes of use of association maps in literature searching.

89 citations
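
The word co-occurrence idea in the entry above can be illustrated with a small sketch. The abstract does not say which quantitative measures the article examines, so this example assumes document-level co-occurrence scored by pointwise mutual information (PMI); the function name `association_map` and the `min_pair_count` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def association_map(docs, min_pair_count=2):
    """Return {(word_a, word_b): pmi} for word pairs that co-occur
    in at least `min_pair_count` documents.

    Assumption: PMI over document-level co-occurrence is used here
    as one possible association measure, not the article's own.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words
    for text in docs:
        words = sorted(set(text.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))

    edges = {}
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_pair_count:
            continue
        # PMI = log( P(a, b) / (P(a) * P(b)) ) with document frequencies
        edges[(a, b)] = math.log((n_ab * n_docs) / (word_df[a] * word_df[b]))
    return edges


docs = [
    "information retrieval systems",
    "information retrieval models",
    "word association maps",
    "cooking recipes",
]
edges = association_map(docs)
```

Each key of `edges` can be read as a link between two words in an association map, with the PMI score as the link strength. A searcher could follow the strongest links from a query word to find related search terms.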

Journal Article
TL;DR: This paper presents a meta-modelling framework for estimating the relevance of information retrieval in a number of discrete-time models and shows clear patterns in how these models are modified over time.
Abstract: Acknowledgments. Preface. 1. Introduction. 2. Mathematics Handbook. 3. Information Retrieval Models. 4. Mathematical Theory of Information Retrieval. 5. Relevance Effectiveness in Information Retrieval. 6. Further Topics in Information Retrieval. Appendices. References. Index.

89 citations

Journal Article
01 Jun 2008
TL;DR: This study designs a Latent Semantic Indexing (LSI)-based multilingual document clustering (MLDC) technique that generates knowledge maps from multilingual documents and maintains a good balance between monolingual and cross-lingual clustering effectiveness when clustering a multilingual document corpus.
Abstract: The creation and deployment of knowledge repositories for managing, sharing, and reusing tacit knowledge within an organization has emerged as a prevalent approach in current knowledge management practices. A knowledge repository typically contains vast amounts of formal knowledge elements, which generally are available as documents. To facilitate users' navigation of documents within a knowledge repository, knowledge maps, often created by document clustering techniques, represent an appealing and promising approach. Various document clustering techniques have been proposed in the literature, but most deal with monolingual documents (i.e., written in the same language). However, as a result of increased globalization and advances in Internet technology, an organization often maintains documents in different languages in its knowledge repositories, which necessitates multilingual document clustering (MLDC) to create organizational knowledge maps. Motivated by the significance of this demand, this study designs a Latent Semantic Indexing (LSI)-based MLDC technique capable of generating knowledge maps (i.e., document clusters) from multilingual documents. The empirical evaluation results show that the proposed LSI-based MLDC technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision, and is capable of maintaining a good balance between monolingual and cross-lingual clustering effectiveness when clustering a multilingual document corpus.

88 citations
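
The LSI step mentioned in the entry above can be sketched on a toy corpus. This is a minimal monolingual illustration that assumes NumPy is available; it does not reproduce how the study bridges languages or forms its final clusters, and the function name `lsi_doc_similarity` is hypothetical.

```python
import numpy as np


def lsi_doc_similarity(docs, k=2):
    """Project documents into a k-dimensional LSI space and return
    the matrix of pairwise cosine similarities between documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix of raw term counts (terms x documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[row[w], j] += 1.0

    # Truncated SVD: keep the k largest singular values.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    coords = Vt[:k].T * s[:k]  # one row of k coordinates per document

    # Cosine similarity between document vectors in the reduced space.
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    unit = coords / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


docs = [
    "retrieval index query",
    "retrieval index search",
    "cluster map knowledge",
    "cluster map document",
]
sim = lsi_doc_similarity(docs, k=2)
```

In this toy corpus the first two documents end up close together in the reduced space, as do the last two, while the two groups stay apart. A clustering algorithm applied to `sim` or to the reduced coordinates would then yield the document clusters that serve as a knowledge map.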

Patent
28 Feb 1992
TL;DR: In this patent, a component character table is created in which characters occurring in each of the condensed texts are registered without duplication, and a text body search is executed to extract a document that satisfies the query condition imposed on the search term by consulting the texts of the documents extracted through the component character table search and the condensed text search.
Abstract: High-speed full document retrieval method and system capable of providing retrieval results within a practically acceptable search time. Upon registration of documents in a document database, condensed texts are created by decomposing each of the textual character strings of the documents to be registered into fragmental character strings according to character species and by checking mutual inclusion relations existing among the fragmental character strings. A component character table is created in which characters occurring in each of the condensed texts are registered without duplication. The condensed texts and the component character table are registered in the database together with the texts of the documents to be registered. Upon retrieval of a document containing a search term designated by a user, a component character table search is first executed to extract those documents which contain all species of characters constituting the search term by consulting the component character table, and subsequently a condensed text search is executed by consulting the condensed texts of the documents. Finally, a text body search is executed to extract a document that satisfies the query condition imposed on the search term by consulting the texts of the documents extracted through the component character table search and the condensed text search.

88 citations
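
The three-stage search described in the patent entry above can be sketched as a cascade of increasingly expensive filters. This is a simplified illustration under stated assumptions, not the patented method: "character species" is approximated by runs of ASCII letters versus runs of digits, search terms are assumed to fall within a single fragment, and the final stage is a plain substring check standing in for the query-condition check. The names `fragments`, `condense`, and `ThreeStageIndex` are hypothetical.

```python
import re


def fragments(text):
    # Assumption: split into runs of letters or runs of digits.
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


def condense(text):
    """Keep only fragments that are not contained in another kept
    fragment (a simple reading of the mutual inclusion check)."""
    ordered = sorted(set(fragments(text)), key=lambda f: (-len(f), f))
    kept = []
    for frag in ordered:
        if not any(frag in longer for longer in kept):
            kept.append(frag)
    return kept


class ThreeStageIndex:
    def __init__(self):
        # doc_id -> (full text, condensed fragments, component characters)
        self.docs = {}

    def register(self, doc_id, text):
        condensed = condense(text)
        chars = set("".join(condensed))  # each character stored once
        self.docs[doc_id] = (text, condensed, chars)

    def search(self, term):
        term = term.lower()
        # Stage 1: component character table search.
        stage1 = [d for d, (_, _, chars) in self.docs.items()
                  if set(term) <= chars]
        # Stage 2: condensed text search on the survivors.
        stage2 = [d for d in stage1
                  if any(term in frag for frag in self.docs[d][1])]
        # Stage 3: text body search on the remaining candidates.
        return [d for d in stage2 if term in self.docs[d][0].lower()]


index = ThreeStageIndex()
index.register(1, "Document retrieval retrieval system 1992")
index.register(2, "Patent text search")
hits = index.search("retrieval")
```

The point of the ordering is that the cheap character-set test discards most documents before any string matching is done, so the full text is scanned only for a small set of candidates.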


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
81% related
Metadata
43.9K papers, 642.7K citations
79% related
Recommender system
27.2K papers, 598K citations
79% related
Ontology (information science)
57K papers, 869.1K citations
78% related
Natural language
31.1K papers, 806.8K citations
77% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111