
Showing papers by "Robert Sablatnig" published in 2010


Proceedings Article
TL;DR: Ancient Slavonic manuscripts from the 11th century are investigated and a binarization-free approach based on local descriptors is proposed, which can handle highly degraded manuscript images with background clutter and faded out characters.
Abstract: For printed Latin text, the main issues of Optical Character Recognition (OCR) systems are solved. For degraded handwritten document images, however, basic preprocessing steps such as binarization yield poor results even with state-of-the-art methods. In this paper ancient Slavonic manuscripts from the 11th century are investigated. In order to minimize the consequences of false character segmentation, a binarization-free approach based on local descriptors is proposed. Additionally, local information allows the recognition of partially visible or washed out characters. The proposed algorithm consists of two steps: character classification and character localization. Initially, Scale Invariant Feature Transform (SIFT) features are extracted, which are subsequently classified using Support Vector Machines (SVM). Afterwards, the interest points are clustered according to their spatial information. Thereby, characters are localized and finally recognized based on a weighted voting scheme of pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded manuscript images with background clutter (e.g. stains, tears) and faded out characters.
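The final step described above, a weighted vote over pre-classified local descriptors within one spatial cluster, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `weighted_vote`, the use of a classifier confidence as the weight, and the sample values are all assumptions.

```python
from collections import defaultdict


def weighted_vote(predictions):
    """Return the label with the largest summed weight, or None if empty.

    predictions: iterable of (label, weight) pairs, one per local
    descriptor in a spatial cluster. The weight is assumed to be a
    classifier confidence; the abstract does not specify the weighting.
    """
    totals = defaultdict(float)
    for label, weight in predictions:
        totals[label] += weight
    if not totals:
        return None
    return max(totals, key=totals.get)


# Hypothetical cluster: five descriptors with per-descriptor class guesses.
cluster = [("a", 0.9), ("b", 0.6), ("a", 0.4), ("b", 0.5), ("c", 0.2)]
winner = weighted_vote(cluster)
print(winner)
```

Because the vote is taken over many local descriptors, a character can still be recognized when some of its descriptors are missing or misclassified, which is the property the abstract relies on for partially visible characters.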

24 citations


Journal Article
TL;DR: An automatic method for the quantification of the development of cutaneous hemangiomas in digital images aimed at a more accurate and objective evaluation of the course of disease than the current clinical practice of manual measurement.

18 citations


Proceedings Article
09 Jun 2010
TL;DR: The potential of document analysis techniques of snippets to support a reconstruction algorithm by considering additional features is shown and preliminary results show that these features can be determined reliably on a real dataset consisting of 690 snippets.
Abstract: Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. In this paper document analysis is applied to snippets of torn documents to calculate features that can be used for reconstruction. The main intention is to handle snippets of varying size and different contents (e.g. handwritten or printed text). Documents can be destroyed either intentionally, to make the printed content unavailable (e.g. business crime), or through time-induced degeneration of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents rely on the shape of the fragments or on techniques such as inpainting and texture synthesis. In this paper the potential of document analysis techniques of snippets to support a reconstruction algorithm by considering additional features is shown. This comprises a rotational analysis, a color analysis, a line detection, a paper type analysis (checked, lined, blank) and a classification of the text (printed or handwritten). Preliminary results show that these features can be determined reliably on a real dataset consisting of 690 snippets.

16 citations


Proceedings Article
09 Jun 2010
TL;DR: A new message update rule in the well known belief propagation algorithm based on a higher order potential function is introduced to solve higher order energy functions.
Abstract: Multi-spectral imaging for the analysis and preservation of ancient documents has gained high attention in recent years. While readability enhancement is based on the multi-spectral image corpus, foreground-background separation still relies mainly on gray level or color images. In this paper we propose a foreground-background separation algorithm designed for multi-spectral images. The main contribution is the simultaneous utilization of spectral and spatial features. While spectral features incorporate the spectral components of the multi-spectral images, the spatial features are based on stroke properties. Higher order Markov Random Fields enable an efficient way to combine both features. To solve higher order energy functions, we introduce a new message update rule in the well known belief propagation algorithm based on a higher order potential function.
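For orientation, the standard pairwise min-sum form of belief propagation, which a higher order message update would extend, can be sketched on a one-dimensional chain of pixels. This is explicitly not the higher order rule proposed in the paper: it uses only pairwise potentials, and the function name `min_sum_chain`, the cost values, and the Potts smoothness term are illustrative assumptions.

```python
def min_sum_chain(unary, pairwise):
    """MAP labeling of a chain MRF by min-sum message passing.

    unary[i][l]    : cost of giving node i the label l
    pairwise[l][m] : cost of neighboring nodes taking labels l and m
    Pairwise potentials only; a simplification of the higher order case.
    """
    n, k = len(unary), len(unary[0])
    forward = [[0.0] * k for _ in range(n)]  # message arriving at node i
    back = [[0] * k for _ in range(n)]       # best predecessor label
    for i in range(1, n):
        for m in range(k):
            costs = [forward[i - 1][l] + unary[i - 1][l] + pairwise[l][m]
                     for l in range(k)]
            best = min(range(k), key=costs.__getitem__)
            forward[i][m] = costs[best]
            back[i][m] = best
    # Pick the best label at the last node, then trace back.
    last = min(range(k), key=lambda l: forward[n - 1][l] + unary[n - 1][l])
    labels = [last]
    for i in range(n - 1, 0, -1):
        labels.append(back[i][labels[-1]])
    labels.reverse()
    return labels


# Hypothetical costs for 5 pixels; label 0 = background, 1 = foreground.
unary = [[0.2, 0.8], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]
potts = [[0.0, 0.5], [0.5, 0.0]]    # penalize label changes
no_smooth = [[0.0, 0.0], [0.0, 0.0]]

smoothed = min_sum_chain(unary, potts)
independent = min_sum_chain(unary, no_smooth)
print(smoothed, independent)
```

Without the smoothness term the middle pixel is labeled foreground on its own evidence; with it, the isolated label is suppressed. On image grids, which contain loops, the same message update is applied iteratively and is no longer guaranteed to be exact.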

14 citations


Proceedings Article
16 Nov 2010
TL;DR: An approach for the detection of decorative elements – such as initials and headlines – and text regions, focused on ancient manuscripts, is presented and shows that the method is able to locate regular text in ancient manuscripts.
Abstract: An approach for the detection of decorative elements – such as initials and headlines – and text regions, focused on ancient manuscripts, is presented. Due to their age, ancient manuscripts suffer from degradation and staining, and their ink has faded over time. Identifying decorative elements and text regions allows indexing a manuscript and serves as input for Optical Character Recognition (OCR), as it localizes regions of interest within document pages. We propose a robust method inspired by state-of-the-art object recognition methodologies. Scale Invariant Feature Transform (SIFT) descriptors are chosen to detect the regions of interest, and the scale of the interest points is used for localization. The classification is based on the fact that local properties of the decorative elements are different from those of regular text. The results show that the method is able to locate regular text in ancient manuscripts. The detection rate of decorative elements is not as high as for regular text but already yields promising results.

13 citations


Proceedings Article
16 Nov 2010
TL;DR: A character recognition system that handles degraded manuscript documents like the ones discovered at the St. Catherine’s Monastery is presented, based on local descriptors which are clustered in order to localize characters.
Abstract: This paper presents a character recognition system that handles degraded manuscript documents like the ones discovered at the St. Catherine’s Monastery. In contrast to state-of-the-art OCR systems, no early decision (image binarization) needs to be performed. Thus, an object recognition methodology is adapted for the recognition of ancient manuscripts. The proposed system is based on local descriptors which are clustered in order to localize characters. Finally, a class probability histogram is assigned to each character present in an image which allows for the character classification. The system achieves an F0.5 score of 0.77 on real world data that contains 13.5% highly degraded characters.
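The reported F0.5 score is presumably the standard F-beta measure with beta = 0.5, which weights precision more heavily than recall. A minimal sketch of that formula follows; the function name `f_beta` and the sample precision and recall values are assumptions, since the abstract reports only the final score.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 emphasizes precision over recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical values, chosen only to show the asymmetry of F0.5.
high_precision = f_beta(1.0, 0.5)   # precise but incomplete
high_recall = f_beta(0.5, 1.0)      # complete but imprecise
print(high_precision, high_recall)
```

With beta = 0.5 a system that is precise but misses characters scores higher than one that finds every character at the cost of false detections.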

10 citations


Proceedings Article
13 Dec 2010
TL;DR: A texture-based approach is proposed, which exploits the fact that different kinds of textures have distinct orientation distributions, and shows the accuracy of the features chosen even when the method is applied to document pages that are different in writing style and line spacing to those in the training set.
Abstract: Text recognition in ancient documents poses specific challenges such as degradation and staining, fading out of ink, fluctuating text lines, superimposing of text-elements or varying layouts, amongst others. To cope with those challenges, a texture-based approach is proposed, which exploits the fact that different kinds of textures have distinct orientation distributions. The orientation information is extracted using the Auto-Correlation Function (ACF). The approach is applied to three different manuscripts, namely to Glagolitic manuscripts of the 11th century, a Latin and a composite Latin-German manuscript, both originating from the 14th century. The evaluation is based on manually labeled ground truth and shows the accuracy of the features chosen even when the method is applied to document pages that are different in writing style and line spacing to those in the training set.
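How an auto-correlation function exposes the dominant orientation of a texture can be illustrated on a synthetic patch. The sketch below is a direct, unoptimized computation with wrap-around borders on a made-up stripe pattern; the function name `autocorrelation`, the patch, and the normalization are assumptions and do not reproduce the feature extraction used in the paper.

```python
def autocorrelation(image, dy, dx):
    """Normalized, mean-removed autocorrelation of a 2D list at shift (dy, dx).

    Borders wrap around (circular shift), which is a simplification.
    Returns a value in [-1, 1]; 1 means the shifted patch matches itself.
    """
    h, w = len(image), len(image[0])
    mean = sum(map(sum, image)) / (h * w)
    num = den = 0.0
    for y in range(h):
        for x in range(w):
            a = image[y][x] - mean
            b = image[(y + dy) % h][(x + dx) % w] - mean
            num += a * b
            den += a * a
    return num / den if den else 0.0


# Synthetic "text line" texture: horizontal stripes, 2 rows ink, 2 rows gap.
stripes = [[1.0 if y % 4 < 2 else 0.0 for x in range(16)] for y in range(16)]

along = autocorrelation(stripes, 0, 2)    # shift along the stripes
across = autocorrelation(stripes, 2, 0)   # shift across the stripes
print(along, across)
```

The correlation stays high for shifts along the stripes and drops for shifts across them, so the shape of the ACF around the origin encodes the texture's orientation. In practice the ACF is usually computed for all shifts at once via the Fourier transform rather than by this double loop.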

9 citations


Proceedings Article
13 Dec 2010
TL;DR: An overview of currently used 2D features for reconstruction is presented using the example of paper reconstruction, together with how 3D objects like pottery can be reconstructed and how these 2D features can be used for reassembling the Ephesos marble plates.
Abstract: The reconstruction of marble plates of Ephesos is discussed in this paper. Automated reconstruction techniques exist for two-dimensional as well as three-dimensional objects. Current applications that consider only the shape of the fragments are able to reconstruct complete objects consisting of no more than about 10 parts. Therefore, additional features are necessary to reconstruct objects with a large number of fragments, or when pieces of different objects are mixed up. This paper presents an overview of currently used 2D features for reconstruction using the example of paper reconstruction, describes how 3D objects like pottery can be reconstructed, and shows how these 2D features can be used for reassembling the Ephesos marble plates.

8 citations


Proceedings Article
TL;DR: Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets, and the possibility of document analysis techniques of snippets to support the matching algorithm by considering additional features is shown.
Abstract: Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. Skew detection of scanned documents is also performed to support OCR algorithms that are sensitive to skew. In this paper document analysis is applied to snippets of torn documents to calculate features for the reconstruction. Documents can either be destroyed by the intention to make the printed content unavailable (e.g. tax fraud investigation, business crime) or due to time induced degeneration of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents deal with the shape, inpainting and texture synthesis techniques. In this paper the possibility of document analysis techniques of snippets to support the matching algorithm by considering additional features is shown. This implies a rotational analysis, a color analysis and a line detection. As future work it is planned to extend the feature set with the paper type (blank, checked, lined), the type of the writing (handwritten vs. machine printed) and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.

6 citations


Book Chapter
29 Nov 2010
TL;DR: An approach drawing its inspiration from state-of-the-art object recognition methodologies, namely Scale Invariant Feature Transform (SIFT) descriptors, is proposed, and the method is able to locate regular text in ancient manuscripts.
Abstract: This paper presents a technique for layout analysis of historical document images based on local descriptors. The considered layout elements are regions of regular text and elements having a decorative meaning such as headlines and initials. The proposed technique exploits the differences in the local properties of the layout elements. For this purpose, an approach drawing its inspiration from state-of-the-art object recognition methodologies, namely Scale Invariant Feature Transform (SIFT) descriptors, is proposed. The scale of the interest points is used for localization. The results show that the method is able to locate regular text in ancient manuscripts. The detection rate of decorative elements is not as high as for regular text but already yields promising results.

5 citations


Proceedings Article
23 Aug 2010
TL;DR: A Markov Random Field is incorporated to combine spectral and spatial features for foreground-background separation in multispectral images of damaged manuscripts; belief propagation is applied for inference, and higher order potentials are included by upgrading the message update.
Abstract: Foreground-background separation in multispectral images of damaged manuscripts can benefit from both spectral and spatial information. Therefore, we incorporate a Markov Random Field which provides a powerful tool to combine both features simultaneously. Higher order models enable the inclusion of spatial constraints based on stroke characteristics. We apply belief propagation for inference and include the higher order potentials by upgrading the message update. The proposed segmentation method requires no training and is independent of script, size, and style of characters. We demonstrate the robust performance on a set of degraded documents and on synthetic images.