SciSpace (formerly Typeset)
Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published on this topic, receiving 214,383 citations.


Papers
Journal Article
TL;DR: A new, information-theoretic interpretation of term strength is given, and some of its uses in focusing the processing of documents for information retrieval, along with new results obtained in document categorization, are described.

114 citations

01 Jan 2003
TL;DR: This paper summarizes research in document layout analysis carried out over the last few years in the author's laboratory, which has developed a number of novel geometric algorithms and statistical methods that are applicable to a wide variety of languages and layouts.
Abstract: In this paper, I summarize research in document layout analysis carried out over the last few years in our laboratory. Correct document layout analysis is a key step in converting captured documents into electronic formats, optical character recognition (OCR), information retrieval from scanned documents, appearance-based document retrieval, and reformatting documents for on-screen display. We have developed a number of novel geometric algorithms and statistical methods. Layout analysis systems built from these algorithms are applicable to a wide variety of languages and layouts, and have proven robust to noise and spurious features in a page image. The system itself consists of reusable, independent software modules that can be reconfigured for different languages and applications. Currently, we are using them for electronic book and document capture applications. If there is commercial or government demand, we are interested in adapting these tools to information retrieval and intelligence applications.

114 citations

Patent
Yasushi Ogawa
25 May 1990
TL;DR: This patent describes a document retrieval system that includes a keyword connection table making section, a document accuracy calculating section, a document sorting section, and a learning control section. However, it is limited to the use of keyword connections.
Abstract: A document retrieval system that includes a keyword connection table making section, a document accuracy calculating section, a document sorting section, and a learning control section. The document accuracy calculating section calculates a document accuracy for each output document in a prescribed manner by reference to a keyword connection table file. The document sorting section sorts the output documents in descending order of document accuracy. After the sorted output documents are returned in response to a user's query, the user evaluates whether the document accuracy of each output document actually conforms to the query, and the learning control section modifies the weight of each keyword connection in a prescribed manner. Instead of offering only two options, the system lets the user express each evaluation as a real number between 0 and 1.
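The abstract says accuracy calculation and weight updates happen "in a prescribed manner" but does not give the rules. The Python sketch below shows one way the sections could fit together. It assumes a table of weighted (query keyword, document keyword) connections, defines accuracy as the average of the strongest connection for each query keyword, and uses a simple update that moves each used connection weight toward the user's 0-to-1 evaluation. The table is passed in directly, so the table-making step is omitted. All function names, the averaging rule, and the learning rate are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical keyword connection table: (query keyword, document keyword) -> weight in [0, 1].
ConnectionTable = Dict[Tuple[str, str], float]


def connection_weight(table: ConnectionTable, q: str, d: str) -> float:
    """Weight of the connection q -> d; an exact keyword match counts as 1.0 (assumption)."""
    if q == d:
        return 1.0
    return table.get((q, d), 0.0)


def best_connection(table: ConnectionTable, q: str, doc_kws: Set[str]) -> Tuple[str, float]:
    """Return the document keyword most strongly connected to q, with its weight."""
    best_kw, best_w = "", 0.0
    for d in doc_kws:
        w = connection_weight(table, q, d)
        if w > best_w:
            best_kw, best_w = d, w
    return best_kw, best_w


def document_accuracy(table: ConnectionTable, query: List[str], doc_kws: Set[str]) -> float:
    """Hypothetical accuracy: mean over query keywords of the strongest connection into the document."""
    if not query:
        return 0.0
    return sum(best_connection(table, q, doc_kws)[1] for q in query) / len(query)


def rank(table: ConnectionTable, query: List[str],
         docs: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Sort documents in descending order of document accuracy."""
    scored = [(doc_id, document_accuracy(table, query, kws)) for doc_id, kws in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def learn(table: ConnectionTable, query: List[str], doc_kws: Set[str],
          evaluation: float, rate: float = 0.5) -> None:
    """Hypothetical update: move each connection used for this document toward the user's evaluation."""
    if not 0.0 <= evaluation <= 1.0:
        raise ValueError("evaluation must be a real number between 0 and 1")
    for q in query:
        d, w = best_connection(table, q, doc_kws)
        if d and d != q:  # exact matches stay fixed at 1.0 in this sketch
            table[(q, d)] = w + rate * (evaluation - w)
```

For example, with `table = {("car", "automobile"): 0.6}` and documents `d1 = {"automobile", "engine"}`, `d2 = {"car", "price"}`, `d3 = {"bicycle"}`, ranking the query `["car"]` orders them d2, d1, d3. If the user then rates d1 as fully relevant (`evaluation=1.0`), the car-automobile weight rises from 0.6 to about 0.8, so d1 scores higher on the next query.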

114 citations

Proceedings Article
18 Aug 1996
TL;DR: This paper describes InRoute, a new statistical document filtering system, along with the problems of filtering effectiveness and efficiency that arise with such a system and experiments with various solutions.
Abstract: Although statistical retrieval models are now widely accepted, there has been little research on how to adapt them to the demands of high-speed document filtering. The problems of document retrieval and document filtering are similar at an abstract level, but the architectures required, the optimizations that are possible, and the quality of the information available are all different. This paper describes a new statistical document filtering system called InRoute, the problems of filtering effectiveness and efficiency that arise with such a system, and experiments with various solutions.

114 citations

Journal Article
TL;DR: Analysis of the variables that affect, to a greater or lesser degree, the effectiveness of online searching: indexing policy, search strategy, database size, coverage rate of specialized fields, etc.
Abstract: Analysis of the variables that affect, to a greater or lesser degree, the effectiveness of online searching: indexing policy, search strategy, database size, coverage rate of specialized fields, etc.

114 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance Metrics
Number of papers on the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111