Ontology guided access to document images

doi:10.1109/ICDAR.2005.181

Proceedings Article•DOI•

Gaurav Harit¹, Santanu Chaudhury¹, Jagrati Paranjpe¹•Institutions (1)

Indian Institute of Technology Delhi¹

31 Aug 2005-pp 292-296

TL;DR: This paper makes use of an extension of OWL (ontology language for Web) to allow encoding of ontologies for document images to support conceptual querying and automated hyperlinking of document images.

Abstract: In this paper, we propose a scheme for accessing document images using an ontology. We make use of an extension of OWL (ontology language for Web) to allow encoding of ontologies for document images. We experimentally demonstrate that reasoning with the concepts defined in the ontology and their observation models provides a mechanism to support conceptual querying and automated hyperlinking of document images.

Citations

Journal Article•DOI•

A survey of keyword spotting techniques for printed document images

Abirami Murugappan¹, Baskaran Ramachandran¹, P. Dhavachelvan¹•Institutions (1)

Anna University¹

01 Feb 2011-Artificial Intelligence Review

TL;DR: A survey of past research on character-based as well as keyword-based approaches to retrieving information from document images, providing insights into the strengths and weaknesses of current techniques and guidance in choosing areas that future work on document image retrieval could address.

Abstract: This paper surveys past research on character-based as well as keyword-based approaches to retrieving information from document images. The survey also provides insights into the strengths and weaknesses of current techniques, the relationships between the techniques, and guidance in choosing areas that future work on document image retrieval could address.

39 citations

Cites background or methods from "Ontology guided access to document ..."

...Since coarse features could work well across the scripts, it is adopted for three languages (Hindi, Telugu and Bengali) by Harit et al. (2005b) to index the documents....
...GFG is a graph based representation scheme (Chaudhury et al. 2003) for encoding structural relationship between the shape primitives to characterize the word images....
...Pixel level representation and matching has been performed by Statistical methods, Hausdorff methods and Coarse feature based methods whereas the feature level representation has been addressed by Primitive String and Geometric Feature Graph (GFG) Extraction....
...Later GFG is decoded by traversing the encoded string in Depth First order....
...A detailed summary has been presented in Table 7 for GFG technique....

Patent•

Ontology-based network search engine

Thomas J. Eggebraaten¹, Jeffrey W. Tenner¹, Shannon E. Wenzel¹, Eric W. Will¹•Institutions (1)

IBM¹

25 Oct 2007

TL;DR: In this article, a method and apparatus for searching for documents residing on a network comprises receiving a search request from a user, which comprises one or more search terms of an ontology.

Abstract: A method and apparatus for searching for documents residing on a network comprises receiving a search request from a user. The search request comprises one or more search terms of an ontology. The ontology includes a plurality of terms. One or more of the plurality of terms includes a plurality of sub-category terms. One or more documents residing on the network are identified based on the one or more search terms and an ontology index. The ontology index comprises a plurality of relationships between the plurality of terms and sub-category terms of the ontology and a plurality of documents residing on the network. One or more search results that describe the one or more documents are presented to the user. The one or more documents contain the one or more search terms, or one of the plurality of sub-category terms of the one or more search terms.
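
As a rough illustration of the lookup this abstract describes, the following minimal Python sketch expands each search term to its direct sub-category terms and collects the documents an index associates with any of them. The function name `search_documents`, the dictionary-based data structures, and the sample terms and document identifiers are all hypothetical; the patent abstract does not specify an implementation.

```python
def search_documents(search_terms, subcategories, ontology_index):
    """Return documents linked to any search term or its direct sub-category terms.

    subcategories:  dict mapping an ontology term to a list of its sub-category terms
    ontology_index: dict mapping an ontology term to a list of document identifiers
    (Both structures are assumptions made for this sketch.)
    """
    matched = set()
    for term in search_terms:
        # The term itself plus its direct sub-category terms.
        expanded = [term] + list(subcategories.get(term, ()))
        for t in expanded:
            matched.update(ontology_index.get(t, ()))
    # Sort only to make the output deterministic.
    return sorted(matched)


# Hypothetical sample data.
SUBCATEGORIES = {"animal": ["dog", "cat"]}
ONTOLOGY_INDEX = {
    "animal": ["doc1"],
    "dog": ["doc2"],
    "cat": ["doc2", "doc3"],
    "car": ["doc4"],
}

if __name__ == "__main__":
    print(search_documents(["animal"], SUBCATEGORIES, ONTOLOGY_INDEX))
```

A query for a general term thus also retrieves documents indexed only under its more specific sub-category terms, which is the behaviour the final sentence of the abstract describes.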

28 citations

Journal Article•DOI•

Feature string-based intelligent information retrieval from Tamil document images

S. Abirami¹, D. Manjula¹•Institutions (1)

Anna University¹

01 Jun 2009-Journal of Computer Applications in Technology

TL;DR: A simple and effective method to extract the text and perform intelligent IR from Tamil Document Images without Optical Character Recognition (OCR) that could be easily adopted in large digital libraries for IR.

Abstract: Information Retrieval (IR) in document images has become a growing and challenging problem due to its rising popularity. This paper proposes a simple and effective method to extract the text and perform intelligent IR from Tamil Document Images without Optical Character Recognition (OCR). This methodology generates a feature string for every word image by extracting its features. This relies on their basic characteristics or shapes of letters instead of recognising the letters like OCR. The strength of this technique lies in extracting the text based on their basic features such as lines and black and white disposition rates in characters which is almost same for the characters across various font sizes and font faces. As an offline process, document images are preprocessed and text extraction process extracts the features from the word images based on their shapes and they are stored in temporary files. During online retrieval, textual keyword is obtained from the user and its primitive string is framed. Based on the primitive string, IR is performed and the resultant images are provided to the user. This technique could be easily adopted in large digital libraries for IR.

10 citations

Proceedings Article•DOI•

Word image based latent semantic indexing for conceptual querying in document image databases

Subhashis Banerjee¹, Gaurav Harit¹, Santanu Chaudhury¹•Institutions (1)

Indian Institute of Technology Delhi¹

23 Sep 2007

TL;DR: It is shown through extensive experiments on a large database that use of LSA for document images provides improvements in retrieval precision as is the case with electronic text documents.

Abstract: In this paper we present an application of latent semantic analysis (LSA) for indexing and retrieval of document images with text. The query is specified as a set of word images and the documents which best match with the query representation in the latent semantic space are retrieved. We show through extensive experiments on a large database that use of LSA for document images provides improvements in retrieval precision as is the case with electronic text documents.

7 citations

Cites background from "Ontology guided access to document ..."

...Harit et al [7] have made use of document content-domain ontologies for query expansion based on ontological relations between semantic concepts....

Book Chapter•DOI•

Ontology-Driven Content-Based Retrieval of Heritage Images

Dipannita Podder¹, Jit Mukherjee¹, Shashaank M. Aswatha¹, Jayanta Mukherjee¹, Shamik Sural¹•Institutions (1)

Indian Institute of Technology Kharagpur¹

01 Jan 2018

TL;DR: An ontology-driven content-based image retrieval system that follows bag of visual words model to recollect near-similar images from the database and the inclusion of ontology to prune the search space of CBIR system is observed to provide a considerable improvement in the performance.

Abstract: In this paper, we present an approach to retrieve structurally and semantically similar images from heritage image dataset. It is an ontology-driven content-based image retrieval (CBIR) system that follows bag of visual words model to recollect near-similar images from the database. Locality-sensitive hashing (LSH) technique has been employed to determine approximate nearest neighbor. We have used an ontology that is particularly developed for Hindu mythology using standard ontology markup language (OWL) on Protege framework to narrow down the semantic gap in the search space. The inclusion of ontology to prune the search space of CBIR system is observed to provide a considerable improvement in the performance. The approach is tested against annotated databases of heritage images that are collected from various heritage sites across India. A web-based system has also been developed to provide a suitable interface and to demonstrate this technique.

4 citations

References

Proceedings Article•DOI•

Word image matching using dynamic time warping

Toni M. Rath¹, R. Manmatha¹•Institutions (1)

University of Massachusetts Amherst¹

18 Jun 2003

TL;DR: This work presents an algorithm for matching handwritten words in noisy historical documents that performs better and is faster than competing matching techniques and presents experimental results on two different data sets from the George Washington collection.

Abstract: Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on electronic media. Convenient access to a collection requires an index, which is manually created at great labor and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting has been developed: clusters with occurrences of the same word in a collection are established using image matching. By annotating "interesting" clusters, an index can be built automatically. We present an algorithm for matching handwritten words in noisy historical documents. The segmented word images are preprocessed to create sets of 1-dimensional features, which are then compared using dynamic time warping. We present experimental results on two different data sets from the George Washington collection. Our experiments show that this algorithm performs better and is faster than competing matching techniques.
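
The comparison step named in this abstract, dynamic time warping over 1-dimensional feature sequences, can be sketched in Python as follows. This is a minimal textbook formulation, not the authors' implementation: the function name `dtw_distance`, the absolute-difference local cost, and the absence of any warping-path constraint or length normalisation are assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D numeric sequences.

    Assumptions for this sketch: local cost is |a[i] - b[j]|, the warping
    path is unconstrained, and the result is not length-normalised.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


if __name__ == "__main__":
    # A stretched copy of the same profile aligns at zero cost.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the alignment may repeat elements of either sequence, two word images whose feature profiles differ only by local stretching can still receive a low matching cost.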

626 citations

Journal Article•DOI•

Finding approximate patterns in strings

Esko Ukkonen¹•Institutions (1)

University of Helsinki¹

01 Mar 1985-Journal of Algorithms

TL;DR: An algorithm is presented to construct a deterministic finite-state automaton that solves the problem of locating in any string a substring whose edit distance from p is at most a given constant t.
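
The problem stated here, finding substrings of a text whose edit distance from a pattern p is at most t, can be illustrated with a short Python sketch. Note that this is the standard column-by-column dynamic programming approach, swapped in for illustration, and not the finite-state automaton construction the paper presents. The function name `approx_find` and the convention of reporting 1-based end positions are assumptions.

```python
def approx_find(pattern, text, t):
    """Return 1-based end positions in text of substrings within edit distance t of pattern.

    Plain dynamic programming, used here for illustration only; it is not
    the automaton construction described in the paper.
    """
    m = len(pattern)
    # col[i] = minimal edit distance between pattern[:i] and any substring
    # of text that ends at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # previous column's entry at row i - 1
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # previous column's entry at row i
            col[i] = min(
                prev_diag + (pattern[i - 1] != ch),  # match or substitution
                old + 1,                             # extra character in text
                col[i - 1] + 1,                      # pattern character skipped
            )
            prev_diag = old
        if col[m] <= t:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(approx_find("abc", "xxabcxx", 0))
```

Setting the top row to zero in every column is what lets a match begin anywhere in the text, which distinguishes this search from computing the edit distance between two whole strings.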

413 citations

"Ontology guided access to document ..." refers methods in this paper

...The document images are indexed with respect to these features using a suffix tree [10] based indexing scheme....

Content-based image indexing and searching using Daubechies' wavelets

James Z. Wang, Gio Wiederhold, Oscar Firschein, Sha Xin Wei

01 Jan 1997

TL;DR: WBIIS as mentioned in this paper applies a Daubechies' wavelet transform for each of the three opponent color components, and the wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors.

Abstract: This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases. The algorithm characterizes the color variations over the spatial extent of the image in a manner that provides semantically meaningful image comparisons. The indexing algorithm applies a Daubechies' wavelet transform for each of the three opponent color components. The wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors. To speed up retrieval, a two-step procedure is used that first does a crude selection based on the variances, and then refines the search by performing a feature vector match between the selected images and the query. For better accuracy in searching, two-level multiresolution matching may also be used. Masks are used for partial-sketch queries. This technique performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms. WBIIS is much faster and more accurate than traditional algorithms. When tested on a database of more than 10000 general-purpose images, the best 100 matches were found in 3.3 seconds.

363 citations

Journal Article•DOI•

Content-based image indexing and searching using Daubechies' wavelets

James Z. Wang¹, Gio Wiederhold¹, Oscar Firschein¹, Sha Xin Wei¹•Institutions (1)

Stanford University¹

03 Mar 1998-International Journal on Digital Libraries

TL;DR: WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases, which performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms.

Abstract: This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases. The algorithm characterizes the color variations over the spatial extent of the image in a manner that provides semantically meaningful image comparisons. The indexing algorithm applies a Daubechies' wavelet transform for each of the three opponent color components. The wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors. To speed up retrieval, a two-step procedure is used that first does a crude selection based on the variances, and then refines the search by performing a feature vector match between the selected images and the query. For better accuracy in searching, two-level multiresolution matching may also be used. Masks are used for partial-sketch queries. This technique performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms. WBIIS is much faster and more accurate than traditional algorithms. When tested on a database of more than 10 000 general-purpose images, the best 100 matches were found in 3.3 seconds.

346 citations

"Ontology guided access to document ..." refers methods in this paper

...We have used the Color Histogram and Wavelet filter based features [11]....

Journal Article•DOI•

The Indexing and Retrieval of Document Images

David Doermann¹•Institutions (1)

University of Maryland, College Park¹

01 Jun 1998-Computer Vision and Image Understanding

TL;DR: A survey of methods developed by researchers to access and manipulate document images without the need for complete and accurate conversion is provided.

319 citations

"Ontology guided access to document ..." refers methods in this paper

...Existing work on document image retrieval has made use of annotations [1], layout similarity [5, 3], word spotting [7] and OCR’ed text [6]. In [8] a statistical classifier is used for retrieval of hand-written historical documents using joint probability distribution between features computed from word images and their transcription....