Proceedings Article

Ontology guided access to document images

TL;DR: This paper makes use of an extension of OWL (Web Ontology Language) to encode ontologies for document images, supporting conceptual querying and automated hyperlinking of document images.
Abstract: In this paper, we propose a scheme for accessing document images using ontologies. We make use of an extension of OWL (Web Ontology Language) to allow encoding of ontologies for document images. We experimentally demonstrate that reasoning with the concepts defined in the ontology and their observation models provides a mechanism to support conceptual querying and automated hyperlinking of document images.
Citations
Journal Article
TL;DR: A survey of past research on character-based and keyword-based approaches for retrieving information from document images, providing insights into the strengths and weaknesses of current techniques and guidance in choosing the areas that future work on document image retrieval could address.
Abstract: This paper attempts to provide a survey of past research on character-based and keyword-based approaches used for retrieving information from document images. The survey also provides insights into the strengths and weaknesses of current techniques and the relationships among them, as well as guidance in choosing the areas that future work on document image retrieval could address.

39 citations


Cites background or methods from "Ontology guided access to document ..."

  • ...Since coarse features could work well across the scripts, it is adopted for three languages (Hindi, Telugu and Bengali) by Harit et al. (2005b) to index the documents....

  • ...GFG is a graph based representation scheme (Chaudhury et al. 2003) for encoding structural relationship between the shape primitives to characterize the word images....

  • ...Pixel level representation and matching has been performed by Statistical methods, Hausdorff methods and Coarse feature based methods whereas the feature level representation has been addressed by Primitive String and Geometric Feature Graph (GFG) Extraction....

  • ...Later GFG is decoded by traversing the encoded string in Depth First order....

  • ...A detailed summary has been presented in Table 7 for GFG technique....

Patent
25 Oct 2007
TL;DR: In this article, a method and apparatus for searching for documents residing on a network comprises receiving a search request from a user, which comprises one or more search terms of an ontology.
Abstract: A method and apparatus for searching for documents residing on a network comprises receiving a search request from a user. The search request comprises one or more search terms of an ontology. The ontology includes a plurality of terms, one or more of which include a plurality of sub-category terms. One or more documents residing on the network are identified based on the one or more search terms and an ontology index. The ontology index comprises a plurality of relationships between the terms and sub-category terms of the ontology and a plurality of documents residing on the network. One or more search results describing the one or more documents are presented to the user. The one or more documents contain the one or more search terms, or one of the sub-category terms of the one or more search terms.

28 citations
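The search procedure in this abstract can be illustrated with a minimal Python sketch, assuming a toy ontology held as a dictionary from each term to its direct sub-category terms, and an "ontology index" that simply maps terms to the documents containing them. The ontology, the documents, and every function name below are hypothetical; the patent abstract does not specify a data structure.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its direct sub-category terms.
ONTOLOGY = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "hatchback"],
    "bicycle": [],
    "sedan": [],
    "hatchback": [],
}


def expand(term, ontology):
    """Return the term together with all of its transitive sub-category terms."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(ontology.get(current, []))
    return seen


def build_ontology_index(documents, ontology):
    """Relate each ontology term to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        for term in ontology:
            if term in words:
                index[term].add(doc_id)
    return index


def search(query_terms, index, ontology):
    """Documents containing a query term or any of its sub-category terms."""
    hits = set()
    for query in query_terms:
        for term in expand(query, ontology):
            hits |= index.get(term, set())
    return sorted(hits)


# Hypothetical documents, used only to exercise the functions above.
DOCS = {"d1": "a red sedan parked outside", "d2": "a bicycle lane", "d3": "weather report"}
INDEX = build_ontology_index(DOCS, ONTOLOGY)
print(search(["car"], INDEX, ONTOLOGY))      # ['d1']
print(search(["vehicle"], INDEX, ONTOLOGY))  # ['d1', 'd2']
```

A query for "car" finds d1 only through its sub-category term "sedan", which is the behavior the abstract describes.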

Journal Article
S. Abirami, D. Manjula
TL;DR: A simple and effective method to extract the text and perform intelligent IR from Tamil Document Images without Optical Character Recognition (OCR) that could be easily adopted in large digital libraries for IR.
Abstract: Information Retrieval (IR) in document images has become a growing and challenging problem due to its rising popularity. This paper proposes a simple and effective method to extract the text and perform intelligent IR from Tamil document images without Optical Character Recognition (OCR). The methodology generates a feature string for every word image by extracting its features. It relies on the basic characteristics, or shapes, of letters instead of recognising the letters as OCR does. The strength of this technique lies in extracting the text based on basic features, such as lines and black-and-white disposition rates in characters, which are almost the same for the characters across various font sizes and font faces. As an offline process, document images are preprocessed, and the text extraction process extracts features from the word images based on their shapes; these features are stored in temporary files. During online retrieval, a textual keyword is obtained from the user and its primitive string is framed. Based on the primitive string, IR is performed and the resulting images are provided to the user. This technique could easily be adopted in large digital libraries for IR.

10 citations
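A minimal sketch of the offline/online split described above, assuming the feature strings of word images have already been extracted and that each character of a keyword maps to a fixed primitive code. The mapping `CHAR_TO_PRIMITIVE`, the stored strings, and the exact-match rule are hypothetical stand-ins, not the authors' feature set for Tamil script.

```python
# Hypothetical per-character primitive codes. In the described method the
# primitives come from shape features such as lines and black/white
# disposition rates; these symbols are invented purely for illustration.
CHAR_TO_PRIMITIVE = {"a": "LV", "b": "VC", "c": "C", "d": "CV"}


def primitive_string(keyword):
    """Frame the primitive string of a textual keyword, character by character."""
    return "".join(CHAR_TO_PRIMITIVE[ch] for ch in keyword)


# Offline stage (simulated): feature strings already extracted from word
# images, keyed by (document image id, word position).
WORD_FEATURES = {
    ("img1", 0): "LVVC",  # shape features of a word image reading "ab"
    ("img1", 1): "CCV",   # "cd"
    ("img2", 0): "LVVC",  # "ab"
}


def retrieve(keyword, word_features):
    """Online stage: ids of document images containing a word image whose
    stored feature string equals the keyword's primitive string."""
    target = primitive_string(keyword)
    return sorted({img for (img, _pos), feats in word_features.items() if feats == target})


print(retrieve("ab", WORD_FEATURES))  # ['img1', 'img2']
print(retrieve("cd", WORD_FEATURES))  # ['img1']
```

Exact string equality is the simplest possible matching rule; a tolerant variant could compare primitive strings by edit distance instead.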

Proceedings Article
23 Sep 2007
TL;DR: It is shown through extensive experiments on a large database that use of LSA for document images provides improvements in retrieval precision as is the case with electronic text documents.
Abstract: In this paper we present an application of latent semantic analysis (LSA) to the indexing and retrieval of document images with text. The query is specified as a set of word images, and the documents that best match the query representation in the latent semantic space are retrieved. We show through extensive experiments on a large database that the use of LSA for document images improves retrieval precision, as is the case with electronic text documents.

7 citations
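The LSA pipeline can be sketched with NumPy, assuming each query word image has already been mapped to a "term" (for example, a cluster of similar word images), so that each document reduces to a column of term counts. The matrix, the rank k = 2, and the cosine ranking are illustrative assumptions; the folding-in step is the usual LSA projection q^T U_k S_k^{-1}.

```python
import numpy as np


def lsa_fit(term_doc, k):
    """Rank-k truncated SVD of a term-by-document matrix (terms are rows)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk, sk = U[:, :k], s[:k]
    doc_vecs = Vt[:k, :].T  # row j: latent coordinates of document j
    return Uk, sk, doc_vecs


def fold_in(term_counts, Uk, sk):
    """Map a term-count vector into the latent space: q^T U_k S_k^{-1}."""
    return (term_counts @ Uk) / sk


def rank_documents(query_counts, Uk, sk, doc_vecs):
    """Document indices sorted by cosine similarity to the folded-in query."""
    q = fold_in(query_counts, Uk, sk)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [int(i) for i in np.argsort(-sims)]


# Hypothetical counts: rows are word-image clusters ("terms"), columns are documents.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 0, 1],
], dtype=float)

Uk, sk, doc_vecs = lsa_fit(A, k=2)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # query containing only term 0
print(rank_documents(query, Uk, sk, doc_vecs))
```

Folding in a document's own count column reproduces its latent coordinates, which is a quick sanity check on the projection.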


Cites background from "Ontology guided access to document ..."

  • ...Harit et al [7] have made use of document content-domain ontologies for query expansion based on ontological relations between semantic concepts....

Book Chapter
01 Jan 2018
TL;DR: An ontology-driven content-based image retrieval system that follows the bag-of-visual-words model to retrieve near-similar images from the database; the inclusion of an ontology to prune the search space of the CBIR system is observed to provide a considerable improvement in performance.
Abstract: In this paper, we present an approach to retrieve structurally and semantically similar images from a heritage image dataset. It is an ontology-driven content-based image retrieval (CBIR) system that follows the bag-of-visual-words model to retrieve near-similar images from the database. The locality-sensitive hashing (LSH) technique is employed to find approximate nearest neighbors. We have used an ontology developed specifically for Hindu mythology, written in the standard Web Ontology Language (OWL) with the Protégé framework, to narrow the semantic gap in the search space. The inclusion of the ontology to prune the search space of the CBIR system is observed to provide a considerable improvement in performance. The approach is tested against annotated databases of heritage images collected from various heritage sites across India. A web-based system has also been developed to provide a suitable interface and to demonstrate the technique.

4 citations
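LSH is used here to find approximate nearest neighbors among bag-of-visual-words histograms. The chapter does not say which hash family it uses, so this sketch assumes random-hyperplane hashing for cosine similarity: each table hashes a histogram to the sign pattern of its dot products with a few random vectors, and only histograms that share a bucket with the query are re-ranked exactly. The class name, histogram values, and parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class HyperplaneLSH:
    """Random-hyperplane LSH over bag-of-visual-words histograms (assumed hash family)."""

    def __init__(self, dim, n_bits=6, n_tables=4, seed=0):
        rng = random.Random(seed)
        self.tables = []
        for _ in range(n_tables):
            planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            self.tables.append((planes, defaultdict(list)))

    @staticmethod
    def _signature(planes, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in planes)

    def add(self, key, vec):
        for planes, buckets in self.tables:
            buckets[self._signature(planes, vec)].append((key, vec))

    def query(self, vec, top=1):
        # Union of the query's buckets across all tables, then exact re-ranking.
        candidates = {}
        for planes, buckets in self.tables:
            for key, stored in buckets.get(self._signature(planes, vec), []):
                candidates[key] = stored
        ranked = sorted(candidates, key=lambda k: cosine(vec, candidates[k]), reverse=True)
        return ranked[:top]


# Hypothetical visual-word histograms for three images.
lsh = HyperplaneLSH(dim=4)
for name, hist in {"img_a": [5, 0, 1, 0], "img_b": [0, 4, 0, 3], "img_c": [1, 1, 1, 1]}.items():
    lsh.add(name, hist)
print(lsh.query([0, 4, 0, 3]))  # ['img_b']
```

Using several tables trades memory for recall: a near neighbor only needs to collide with the query in one table to be re-ranked.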

References
Proceedings Article
18 Jun 2003
TL;DR: This work presents an algorithm for matching handwritten words in noisy historical documents that performs better and is faster than competing matching techniques and presents experimental results on two different data sets from the George Washington collection.
Abstract: Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on electronic media. Convenient access to a collection requires an index, which is manually created at great labor and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting has been developed: clusters with occurrences of the same word in a collection are established using image matching. By annotating "interesting" clusters, an index can be built automatically. We present an algorithm for matching handwritten words in noisy historical documents. The segmented word images are preprocessed to create sets of 1-dimensional features, which are then compared using dynamic time warping. We present experimental results on two different data sets from the George Washington collection. Our experiments show that this algorithm performs better and is faster than competing matching techniques.

626 citations
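The matching step can be sketched as classic dynamic time warping over a single 1-D feature, here a column projection profile of a binary word image. This is a simplification under stated assumptions: the paper compares sets of 1-D features per word image, and the normalization by the sum of the sequence lengths used below is an assumed choice.

```python
import math


def projection_profile(binary_image):
    """Ink-pixel count in each column of a binary word image (rows of 0/1)."""
    return [sum(column) for column in zip(*binary_image)]


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences,
    normalized by the combined sequence length (an assumed normalization)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of a vertical, horizontal or diagonal step.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


# Two hypothetical word images: the second is a horizontally stretched version.
word1 = [[0, 1, 0],
         [1, 1, 1]]
word2 = [[0, 1, 1, 0],
         [1, 1, 1, 1]]
print(dtw_distance(projection_profile(word1), projection_profile(word2)))  # 0.0
```

The warping absorbs the extra column of the stretched word, which is why DTW suits handwriting whose letter widths vary between occurrences.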

Journal Article
Esko Ukkonen
TL;DR: An algorithm is presented to construct a deterministic finite-state automaton that solves the problem of locating, in any string, a substring whose edit distance from a given pattern p is at most a given constant t.

413 citations
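The problem the automaton solves can be illustrated with the column-by-column dynamic programming recurrence (Sellers' method), whose possible columns correspond to the automaton's states. The sketch below is that plain DP, not Ukkonen's automaton construction: it reports every end position j in the text where some substring ending at j lies within edit distance t of the pattern p. The function name is hypothetical.

```python
def approximate_matches(pattern, text, t):
    """0-based end positions j in text such that some substring ending at j
    has edit distance at most t from pattern (column-wise DP)."""
    m = len(pattern)
    col = list(range(m + 1))  # column before any text: D[i] = i
    ends = []
    for j, c in enumerate(text):
        new = [0] * (m + 1)  # D[0] = 0: a match may start at any text position
        for i in range(1, m + 1):
            new[i] = min(
                col[i] + 1,                          # extra text character
                new[i - 1] + 1,                      # pattern character skipped
                col[i - 1] + (pattern[i - 1] != c),  # match or substitution
            )
        col = new
        if col[m] <= t:
            ends.append(j)
    return ends


print(approximate_matches("abc", "xxabcxx", 0))  # [4]
print(approximate_matches("abc", "xxabcxx", 1))  # [3, 4, 5]
```

Each text character costs O(m) here; the automaton replaces this per-character column update with a single precomputed state transition.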


"Ontology guided access to document ..." refers to methods in this paper

  • ...The document images are indexed with respect to these features using a suffix tree [10] based indexing scheme....
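Substring lookup over indexed feature strings can be sketched with an uncompressed suffix trie, assuming each document image has been reduced to a string of symbolic features. This is a simplified substitute for a real suffix tree (no edge compression and no linear-time construction); the class names and feature strings are hypothetical.

```python
class _Node:
    __slots__ = ("children", "docs")

    def __init__(self):
        self.children = {}
        self.docs = set()


class SuffixTrieIndex:
    """Uncompressed suffix trie over symbolic feature strings.

    Answers the same substring queries as a suffix tree, but construction
    costs O(n^2) per string instead of O(n).
    """

    def __init__(self):
        self.root = _Node()

    def add(self, doc_id, feature_string):
        # Insert every suffix; each node on a path spells a substring of the
        # string, so every such node records the document.
        for start in range(len(feature_string)):
            node = self.root
            for symbol in feature_string[start:]:
                if symbol not in node.children:
                    node.children[symbol] = _Node()
                node = node.children[symbol]
                node.docs.add(doc_id)

    def lookup(self, pattern):
        """Sorted ids of documents whose feature string contains pattern."""
        node = self.root
        for symbol in pattern:
            node = node.children.get(symbol)
            if node is None:
                return []
        return sorted(node.docs)


index = SuffixTrieIndex()
index.add("d1", "abcab")
index.add("d2", "bca")
print(index.lookup("ca"))   # ['d1', 'd2']
print(index.lookup("cab"))  # ['d1']
```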

01 Jan 1997
TL;DR: WBIIS applies a Daubechies wavelet transform to each of the three opponent color components; the wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors.
Abstract: This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases. The algorithm characterizes the color variations over the spatial extent of the image in a manner that provides semantically meaningful image comparisons. The indexing algorithm applies a Daubechies' wavelet transform for each of the three opponent color components. The wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors. To speed up retrieval, a two-step procedure is used that first does a crude selection based on the variances, and then refines the search by performing a feature vector match between the selected images and the query. For better accuracy in searching, two-level multiresolution matching may also be used. Masks are used for partial-sketch queries. This technique performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms. WBIIS is much faster and more accurate than traditional algorithms. When tested on a database of more than 10,000 general-purpose images, the best 100 matches were found in 3.3 seconds.

363 citations
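The two-step retrieval described in the abstract can be sketched as follows, with two stated substitutions: a Haar transform replaces the Daubechies wavelet, and a single grayscale channel replaces the three opponent color components. Only the lowest-frequency band is kept as the feature vector, its variance drives the crude first-pass selection, and a Euclidean feature-vector match refines the result. All names, images, and the tolerance are hypothetical.

```python
import numpy as np


def haar_lowpass(img, levels):
    """Lowest-frequency (LL) band of a multi-level 2-D Haar transform.

    Uses pairwise averages, which equal the orthonormal Haar approximation
    coefficients up to a constant scale factor per level.
    """
    ll = np.asarray(img, dtype=float)
    for _ in range(levels):
        ll = (ll[:, 0::2] + ll[:, 1::2]) / 2.0  # average column pairs
        ll = (ll[0::2, :] + ll[1::2, :]) / 2.0  # average row pairs
    return ll


def features(img, levels=2):
    """Feature vector (flattened LL band) plus its variance."""
    ll = haar_lowpass(img, levels)
    return ll.ravel(), float(ll.var())


def search(query_img, database, tol=0.5, top=3):
    q_vec, q_var = features(query_img)
    # Step 1: crude selection, keeping images whose variance is close to the query's.
    candidates = [(key, vec) for key, (vec, var) in database.items()
                  if abs(var - q_var) <= tol * max(q_var, 1e-12)]
    # Step 2: refine by matching the full feature vectors.
    candidates.sort(key=lambda item: float(np.linalg.norm(item[1] - q_vec)))
    return [key for key, _ in candidates[:top]]


img_a = np.arange(64, dtype=float).reshape(8, 8)
img_b = img_a[::-1]            # vertically flipped: same variance, different layout
img_c = np.full((8, 8), 30.0)  # flat image: zero variance
database = {name: features(im) for name, im in {"a": img_a, "b": img_b, "c": img_c}.items()}
print(search(img_a + 1.0, database))  # ['a', 'b']
```

The flat image is discarded by the cheap variance test before any vector distance is computed, which is the speed-up the two-step design aims for.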

Journal Article
TL;DR: WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases, which performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms.
Abstract: This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases. The algorithm characterizes the color variations over the spatial extent of the image in a manner that provides semantically meaningful image comparisons. The indexing algorithm applies a Daubechies' wavelet transform for each of the three opponent color components. The wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors. To speed up retrieval, a two-step procedure is used that first does a crude selection based on the variances, and then refines the search by performing a feature vector match between the selected images and the query. For better accuracy in searching, two-level multiresolution matching may also be used. Masks are used for partial-sketch queries. This technique performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms. WBIIS is much faster and more accurate than traditional algorithms. When tested on a database of more than 10 000 general-purpose images, the best 100 matches were found in 3.3 seconds.

346 citations


"Ontology guided access to document ..." refers to methods in this paper

  • ...We have used the Color Histogram and Wavelet filter based features [11]....

Journal Article
TL;DR: A survey of methods developed by researchers to access and manipulate document images without the need for complete and accurate conversion is provided.

319 citations


"Ontology guided access to document ..." refers to methods in this paper

  • ...Existing work on document image retrieval has made use of annotations [1], layout similarity [5, 3], word spotting [7] and OCR’ed text [6]. In [8] a statistical classifier is used for retrieval of handwritten historical documents using the joint probability distribution between features computed from word images and their transcription....