scispace - formally typeset
Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications on this topic have received 214,383 citations.


Papers
Proceedings Article
12 Sep 1994
TL;DR: The integration of a structured-text retrieval system (TextMachine) into an object-oriented database system (Op) is described, using the external function capability of the database system to encapsulate the text retrieval system as an external information source.
Abstract: We describe the integration of a structured-text retrieval system (TextMachine) into an object-oriented database system (Op…). Our approach is a lightweight one, using the external function capability of the database system to encapsulate the text retrieval system as an external information source. Yet we are able to provide tight integration in the query language and processing: the user can access the text retrieval system using a standard database query language. The efficient and effective retrieval of structured text performed by the text retrieval system is seamlessly combined with the rich modeling and general-purpose querying capabilities of the database system, resulting in an integrated system with querying power beyond that of either underlying system. The integrated system also provides uniform access to textual data in the text retrieval system and structured data in the database system, thereby achieving information fusion. We discuss the design and implementation of our prototype system and address issues such as the proper framework for external integration, the modeling of complex categorization and structure hierarchies of documents (under automatic document schema imp…), and techniques to reduce the performance overhead of accessing an external source.

55 citations
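One way to picture the "external function" approach in the paper above is a toy object store that lets a query call a registered function backed by a separate text engine, then joins the engine's scores with structured predicates. This is a minimal sketch under assumptions: ToyTextEngine, MiniDB, register_external, and the name text_search are hypothetical, scoring is plain relative term frequency, and the real prototype exposed the engine through its database query language rather than through Python calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


class ToyTextEngine:
    """Hypothetical stand-in for an external text retrieval system (not TextMachine's API)."""

    def __init__(self, texts: Dict[int, str]) -> None:
        self.texts = texts

    def search(self, term: str) -> List[Tuple[int, float]]:
        """Return (doc_id, score) pairs, scored by relative term frequency."""
        term = term.lower()
        hits = []
        for doc_id, text in self.texts.items():
            words = text.lower().split()
            if words and term in words:
                hits.append((doc_id, words.count(term) / len(words)))
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


@dataclass
class Report:
    doc_id: int  # shared key linking the database object to the external text
    title: str
    year: int


class MiniDB:
    """Toy object store whose queries may call registered external functions."""

    def __init__(self, reports: List[Report]) -> None:
        self.reports = reports
        self.external: Dict[str, Callable[[str], List[Tuple[int, float]]]] = {}

    def register_external(self, name: str, fn: Callable[[str], List[Tuple[int, float]]]) -> None:
        self.external[name] = fn

    def select(self, where: Callable[[Report], bool], fn_name: str, arg: str):
        """Combine a structured predicate with an external text predicate."""
        scores = dict(self.external[fn_name](arg))
        rows = [(r, scores[r.doc_id]) for r in self.reports
                if r.doc_id in scores and where(r)]
        return sorted(rows, key=lambda row: row[1], reverse=True)


engine = ToyTextEngine({
    1: "object database query language",
    2: "text retrieval with structured text",
    3: "retrieval of structured documents",
})
db = MiniDB([
    Report(1, "OODB query", 1993),
    Report(2, "Structured text IR", 1994),
    Report(3, "Document hierarchies", 1992),
])
db.register_external("text_search", engine.search)

# "Reports from 1993 on that match 'retrieval'": document 3 matches the text
# predicate but fails the structured one, so only document 2 is returned.
for report, score in db.select(lambda r: r.year >= 1993, "text_search", "retrieval"):
    print(report.title, round(score, 2))
```

Keeping the engine behind a single registered function is what makes this style of integration lightweight: the database only consumes (doc_id, score) pairs and never needs the engine's internals.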

Proceedings Article
18 Aug 1997
TL;DR: A technique is developed for performing information retrieval directly on document images with accuracy high enough to be useful, and it yields surprisingly good retrieval results.
Abstract: In conventional information retrieval, finding users' search terms in a document is simple. When the document is not available in machine-readable form, optical character recognition (OCR) can usually be performed. We have developed a technique for performing information retrieval on document images with accuracy high enough to be of real use. The method makes generalisations about the images of characters, classifies them, and agglomerates the resulting character shape codes into word tokens. These tokens represent the underlying words specifically enough to allow reasonable retrieval performance. Using a collection of over 250 Mbytes of document texts and queries with known relevance assessments, we present a series of experiments to determine how various parameters of the retrieval strategy affect retrieval performance, and we obtain surprisingly good results.

55 citations
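The character-shape-coding idea in the document-image paper above can be illustrated by mapping each character to a coarse shape class and matching whole shape tokens instead of words. This is a sketch under loud assumptions: the real method classifies character images, whereas here the classes are simulated from known text, and the class scheme (capitals, digits, and ascenders as A; descenders as g; i and j as their own classes; other letters as x) plus the helpers shape_code, shape_tokens, and rank are hypothetical, not necessarily the paper's.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumed shape classes (a simplification, not necessarily the paper's scheme).
ASCENDERS = set("bdfhklt")  # lowercase letters rising above the x-height
DESCENDERS = set("gpqy")    # lowercase letters dropping below the baseline


def shape_code(word: str) -> str:
    """Map a word to its shape code, one class symbol per character."""
    codes = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            codes.append("A")
        elif ch in DESCENDERS:
            codes.append("g")
        elif ch in "ij":
            codes.append(ch)  # dotted letters kept as their own classes
        elif ch.isalpha():
            codes.append("x")  # remaining x-height letters
    return "".join(codes)


def shape_tokens(text: str) -> List[str]:
    """Turn a text into a list of shape-code word tokens."""
    return [shape_code(w) for w in re.findall(r"[A-Za-z0-9]+", text)]


def rank(query: str, docs: Dict[str, str]) -> List[str]:
    """Rank documents by how many of their tokens match a query shape token."""
    wanted = set(shape_tokens(query))
    scores = {
        doc_id: sum(n for tok, n in Counter(shape_tokens(text)).items() if tok in wanted)
        for doc_id, text in docs.items()
    }
    return sorted(docs, key=lambda d: (-scores[d], d))


docs = {"d1": "retrieval of document images", "d2": "a recipe for bread"}
print(rank("document retrieval", docs))  # → ['d1', 'd2']
```

Shape tokens are coarser than words ("hand" and "band" both become AxxA), which is why the abstract stresses that they are only *sufficiently* specific for retrieval.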

Proceedings Article
M. Peairs
14 Aug 1995
TL;DR: A novel presentation of the document supports both indexing and recognition, thereby allowing the "desktop metaphor" to migrate back to the real desktop.
Abstract: Iconic paper can be used to retrieve documents from an electronic database. It provides, on a physical sheet of paper, a representation that humans can use for recognition and machines can use for indexing. A document in the database can be accessed by a gesture indicating a particular icon on the page. A novel presentation of the document supports both indexing and recognition, thereby allowing the "desktop metaphor" to migrate back to the real desktop.

55 citations
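The iconic-paper abstract above does not say how the page or the gesture is recognized, so the sketch below assumes both are already solved and covers only the final lookup: a page layout recording each icon's bounding box and the document it stands for, and a mapping from a pointing position to that document. Icon, resolve_gesture, fetch, the coordinates, and the document ids are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Icon:
    """An icon printed on the sheet: its bounding box and the document it stands for."""
    doc_id: str  # hypothetical key into the electronic database
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def resolve_gesture(icons: List[Icon], x: float, y: float) -> Optional[str]:
    """Map a pointing gesture at page coordinates (x, y) to a document id, if any."""
    for icon in icons:
        if icon.contains(x, y):
            return icon.doc_id
    return None


def fetch(database: Dict[str, str], icons: List[Icon], x: float, y: float) -> Optional[str]:
    """Retrieve the electronic document behind the indicated icon."""
    doc_id = resolve_gesture(icons, x, y)
    return database.get(doc_id) if doc_id is not None else None


page = [Icon("memo-17", 0, 0, 50, 50), Icon("report-3", 60, 0, 110, 50)]
database = {"memo-17": "Quarterly memo", "report-3": "Project report"}
print(fetch(database, page, 70, 10))  # → Project report
```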

Book Chapter
23 Mar 1998
TL;DR: This paper considers the relevant case where complex similarity queries are defined through a generic language L and their predicates refer to a single feature F, and suggests that the index should process complex queries as a whole, evaluating multiple similarity predicates at a time.
Abstract: Efficient evaluation of similarity queries is one of the basic requirements for advanced multimedia applications. In this paper, we consider the relevant case where complex similarity queries are defined through a generic language L and their predicates refer to a single feature F. Whereas the language level deals only with similarity scores, the proposed evaluation process is based on distances between feature values, since known spatial and metric indexes use distances to evaluate predicates. The proposed solution suggests that the index should process complex queries as a whole, thus evaluating multiple similarity predicates at a time. The flexibility of our approach is demonstrated by considering three different similarity languages and showing how the M-tree access method has been extended to this purpose. Experimental results clearly show that the performance of the extended M-tree is consistently better than that of state-of-the-art search algorithms.

55 citations
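The core idea of the M-tree paper above, evaluating all predicates of a complex query against an index region at once, can be sketched with a single level of metric "balls" (a routing object plus a covering radius). This is not the paper's extended M-tree; it is a one-level illustration under stated assumptions: Euclidean distance, a hypothetical distance-to-similarity map `1 - d / D_MAX`, and conjunction evaluated with `min` as in standard fuzzy logic. Because that map is non-increasing and `min` (or `max`) is monotone, the triangle inequality gives an upper bound on any member's combined score, so whole balls can be pruned.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Combine = Callable[[Iterable[float]], float]

D_MAX = 10.0  # assumed normalising bound on distances (hypothetical)


def similarity(d: float) -> float:
    """Assumed non-increasing mapping from a distance to a score in [0, 1]."""
    return max(0.0, 1.0 - d / D_MAX)


@dataclass
class Ball:
    """An index region: routing object, covering radius, member objects."""
    routing: Point
    members: List[Point]
    radius: float = field(init=False)

    def __post_init__(self) -> None:
        self.radius = max(math.dist(self.routing, p) for p in self.members)


def score(obj: Point, queries: Sequence[Point], combine: Combine) -> float:
    """Score one object against all similarity predicates at once."""
    return combine(similarity(math.dist(obj, q)) for q in queries)


def upper_bound(ball: Ball, queries: Sequence[Point], combine: Combine) -> float:
    """Best score any member could reach: by the triangle inequality,
    d(o, q) >= d(routing, q) - radius for every member o."""
    return combine(
        similarity(max(math.dist(ball.routing, q) - ball.radius, 0.0)) for q in queries
    )


def top_k(balls: List[Ball], queries: Sequence[Point], k: int, combine: Combine = min):
    heap: List[Tuple[float, Point]] = []  # min-heap of the k best (score, object)
    order = sorted(((upper_bound(b, queries, combine), i) for i, b in enumerate(balls)),
                   reverse=True)
    for bound, i in order:
        if len(heap) == k and bound <= heap[0][0]:
            break  # bounds are sorted, so every remaining ball is pruned too
        for obj in balls[i].members:
            s = score(obj, queries, combine)
            if len(heap) < k:
                heapq.heappush(heap, (s, obj))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, obj))
    return sorted(heap, reverse=True)


def brute_force(objs: List[Point], queries: Sequence[Point], k: int, combine: Combine = min):
    """Reference answer: score every object."""
    return sorted(((score(o, queries, combine), o) for o in objs), reverse=True)[:k]


balls = [
    Ball((0, 0), [(0, 0), (1, 0), (0, 1)]),
    Ball((5, 5), [(5, 5), (6, 5), (5, 6)]),
    Ball((9, 9), [(9, 9), (9, 8)]),
]
queries = [(0.5, 0.2), (1.0, 0.8)]  # Q = p1 AND p2, evaluated with min
print(top_k(balls, queries, k=2))
```

With these points only the first ball is scanned: once its members fill the result, the other balls' bounds fall below the current second-best score. Swapping `combine=max` gives a disjunctive query with the same pruning logic.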

Journal Article
TL;DR: The results show that experience with search engines significantly affects users' attitudes toward search engines for information retrieval, that the query-based service is more popular than the directory-based service, and that users are not completely satisfied with the precision of retrieved information or the response time of search engines.

55 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations (81% related)
Metadata: 43.9K papers, 642.7K citations (79% related)
Recommender system: 27.2K papers, 598K citations (79% related)
Ontology (information science): 57K papers, 869.1K citations (78% related)
Natural language: 31.1K papers, 806.8K citations (77% related)
Performance Metrics
No. of papers in the topic in previous years
Year  Papers
2023  9
2022  39
2021  107
2020  130
2019  144
2018  111