Author

Julinda Gllavata

Other affiliations: University of Marburg
Bio: Julinda Gllavata is an academic researcher from the University of Siegen. The author has contributed to research in the topics of optical character recognition and image segmentation, has an h-index of 9, and has co-authored 13 publications receiving 421 citations. Previous affiliations of Julinda Gllavata include the University of Marburg.

Papers
Proceedings ArticleDOI
23 Aug 2004
TL;DR: A robust text localization approach is presented which can automatically detect horizontally aligned text with different sizes, fonts, colors and languages; its detection performance is demonstrated by experimental results for a set of video frames taken from the MPEG-7 video test set.
Abstract: Text localization and recognition in images is important for searching information in digital photo archives, video databases and Web sites. However, since text is often printed against a complex background, it is often difficult to detect. In this paper, a robust text localization approach is presented, which can automatically detect horizontally aligned text with different sizes, fonts, colors and languages. First, a wavelet transform is applied to the image and the distribution of high-frequency wavelet coefficients is considered to statistically characterize text and non-text areas. Then, the k-means algorithm is used to classify text areas in the image. The detected text areas undergo a projection analysis in order to refine their localization. Finally, a binary segmented text image is generated, to be used as input to an OCR engine. The detection performance of our approach is demonstrated by presenting experimental results for a set of video frames taken from the MPEG-7 video test set.
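To make the described pipeline concrete, the following sketch (not the authors' implementation) computes high-frequency wavelet coefficients with PyWavelets and clusters per-block energy features with k-means; the Haar wavelet, the block size and the energy features are assumptions made here for illustration.

```python
# Sketch (not the paper's code): wavelet detail-band energy features + k-means
# to separate text-like from non-text blocks. Assumes a grayscale image array.
import numpy as np
import pywt
from sklearn.cluster import KMeans

def text_block_mask(gray, block=16):
    # Single-level 2D Haar wavelet transform; keep only the detail sub-bands.
    _, (lh, hl, hh) = pywt.dwt2(gray.astype(float), 'haar')
    h, w = lh.shape
    bh, bw = h // block, w // block
    feats = []
    for i in range(bh):
        for j in range(bw):
            sl = np.s_[i * block:(i + 1) * block, j * block:(j + 1) * block]
            # Mean absolute high-frequency coefficient per block (assumed feature).
            feats.append([np.abs(lh[sl]).mean(),
                          np.abs(hl[sl]).mean(),
                          np.abs(hh[sl]).mean()])
    feats = np.array(feats)
    # Two clusters; the one with higher mean energy is treated as "text".
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
    text_label = np.argmax([feats[labels == k].mean() for k in (0, 1)])
    return (labels == text_label).reshape(bh, bw)
```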

146 citations

Proceedings ArticleDOI
18 Sep 2003
TL;DR: An efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented.
Abstract: Text detection in images or videos is an important step to achieve multimedia content retrieval. In this paper, an efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented. The proposed approach is based on the application of a color reduction technique, a method for edge detection, and the localization of text regions using projection profile analyses and geometrical properties. The output of the algorithm are text boxes with a simplified background, ready to be fed into an OCR engine for subsequent character recognition. Our proposal is robust with respect to different font sizes, font colors, languages and background complexities. The performance of the approach is demonstrated by presenting promising experimental results for a set of images taken from different types of video sequences.
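A minimal sketch of the edge-detection and projection-profile idea, assuming OpenCV's Canny detector and a simple row-wise edge count; the thresholds and the grouping rule are illustrative choices, not the paper's parameters.

```python
# Sketch (not the paper's code): edge map + horizontal projection profile
# to find candidate text rows, as one simplified reading of the pipeline.
import cv2
import numpy as np

def candidate_text_rows(bgr, edge_ratio=0.05):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)              # edge detection step
    profile = (edges > 0).sum(axis=1)              # edge pixels per image row
    dense = profile > edge_ratio * edges.shape[1]  # rows dense in edges
    # Group consecutive dense rows into (top, bottom) candidate bands.
    rows, start = [], None
    for y, flag in enumerate(dense):
        if flag and start is None:
            start = y
        elif not flag and start is not None:
            rows.append((start, y))
            start = None
    if start is not None:
        rows.append((start, len(dense)))
    return rows
```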

92 citations

Proceedings ArticleDOI
14 Dec 2003
TL;DR: The basic idea of the approach is to apply an appropriate local thresholding technique to sequences of line histogram differences in order to increase the robustness of text detection with respect to complex backgrounds.
Abstract: Texts appearing in images or videos are not only important sources of information but also significant entities for indexing and retrieval purposes. However, since text is often printed against a shaded or textured background it is often difficult to recognize. In this paper, an approach, which can automatically detect, localize and extract horizontally aligned text with different sizes, fonts and languages in images, is presented. The basic idea of our approach is to apply an appropriate local thresholding technique to sequences of line histogram differences in order to increase the robustness of text detection with respect to complex backgrounds. The performance of our approach is demonstrated by presenting experimental results for a set of images taken from video sequences.
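One possible reading of "local thresholding of line histogram differences" is sketched below; the bin count, window size and threshold form (mean plus a multiple of the standard deviation) are assumptions, not the paper's settings.

```python
# Sketch (assumptions marked): per-row grey-level histograms, differences
# between adjacent rows, and a local threshold over a sliding window.
import numpy as np

def text_row_flags(gray, bins=32, win=15, k=1.5):
    h, _ = gray.shape
    hists = np.stack([np.histogram(gray[y], bins=bins, range=(0, 255))[0]
                      for y in range(h)])
    # L1 difference between consecutive row histograms.
    diff = np.abs(np.diff(hists.astype(float), axis=0)).sum(axis=1)
    flags = np.zeros(len(diff), dtype=bool)
    for y in range(len(diff)):
        lo, hi = max(0, y - win), min(len(diff), y + win + 1)
        local = diff[lo:hi]
        # Local threshold: neighbourhood mean + k * std (assumed form).
        flags[y] = diff[y] > local.mean() + k * local.std()
    return flags
```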

39 citations

Proceedings ArticleDOI
21 Dec 2005
TL;DR: This paper presents an approach for discriminating between Latin and Ideographic script using a k-nearest neighbour classifier, and initial experimental results for a set of images containing text of different scripts demonstrate the good performance of the proposed solution.
Abstract: The extraction of textual information from images and videos is an important task for automatic content-based indexing and retrieval purposes. To extract text from images or videos coming from unknown international sources, it is necessary to know the script beforehand in order to employ suitable text segmentation and optical character recognition (OCR) methods. In this paper, we present an approach for discriminating between Latin and Ideographic script. The proposed approach proceeds as follows: first, the text present in an image is localized. Then, a set of low-level features is extracted from the localized text image. Finally, based on the extracted features, the decision about the type of the script is made using a k-nearest neighbour classifier. Initial experimental results for a set of images containing text of different scripts demonstrate the good performance of the proposed solution.
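A hedged sketch of the classification step only: a k-nearest-neighbour classifier over a few simple low-level features of a binarized text image; the feature set below is hypothetical and the paper's actual features may differ.

```python
# Sketch (hypothetical features): k-NN over low-level features of a
# binarized text image, to separate Latin from ideographic script.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def script_features(binary_text_img):
    # Hypothetical features: foreground density plus horizontal and
    # vertical black/white transition rates.
    fg = binary_text_img > 0
    density = fg.mean()
    h_trans = np.abs(np.diff(fg.astype(int), axis=1)).mean()
    v_trans = np.abs(np.diff(fg.astype(int), axis=0)).mean()
    return [density, h_trans, v_trans]

# Usage: fit on labelled text images, then predict the script of a new one.
# X = [script_features(img) for img in training_images]; y = ['latin', 'ideographic', ...]
# clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# clf.predict([script_features(new_img)])
```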

33 citations

Proceedings ArticleDOI
11 Dec 2006
TL;DR: A novel approach based on fuzzy cluster ensemble techniques is presented to solve the problem of detection and localization of text in videos, allowing the incremental inclusion of temporal information regarding the appearance of static text.
Abstract: Detection and localization of text in videos is an important task towards enabling automatic content-based retrieval of digital video databases. However, since text is often displayed against a complex background, its detection is a challenging problem. In this paper, a novel approach based on fuzzy cluster ensemble techniques to solve this problem is presented. The advantage of this approach is that the fuzzy clustering ensemble allows the incremental inclusion of temporal information regarding the appearance of static text in videos. Comparative experimental results for a test set of 10.92 minutes of video sequences have shown the very good performance of the proposed approach with an overall recall of 92.04% and a precision of 96.71%.
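A rough illustration, not the authors' algorithm: a minimal fuzzy c-means in NumPy whose soft membership matrices from several frames are averaged, loosely mimicking the idea of folding temporal evidence about static text into a clustering ensemble; cluster correspondence across frames is simply assumed here rather than aligned.

```python
# Sketch (assumptions marked): minimal fuzzy c-means plus averaging of
# per-frame soft memberships; not the paper's ensemble method.
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))
        U /= U.sum(axis=1, keepdims=True)
    return U  # soft memberships, shape (n_samples, c)

def ensemble_memberships(feature_sets):
    # Average the soft memberships computed from several frames showing the
    # same static text region, so frames vote softly instead of hard labels.
    # Assumes cluster indices correspond across frames; a real ensemble
    # method would have to align them.
    return np.mean([fuzzy_cmeans(F) for F in feature_sets], axis=0)
```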

27 citations


Cited by
Proceedings ArticleDOI
13 Jun 2010
TL;DR: A novel image operator is presented that seeks to find the value of stroke width for each image pixel, and its use on the task of text detection in natural images is demonstrated.
Abstract: We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and languages.
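A simplified sketch of a stroke-width-style operator; it omits the gradient-opposition check and the filling of pixels along each ray that the published Stroke Width Transform performs, and parameters such as the Canny thresholds and the maximum ray length are assumptions.

```python
# Simplified sketch (not the published SWT): from each edge pixel, walk along
# the gradient direction until another edge pixel is hit, and record the ray
# length as an approximate stroke width for the two endpoints.
import cv2
import numpy as np

def approx_stroke_width(gray, max_len=50):
    edges = cv2.Canny(gray, 100, 200)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy) + 1e-12
    swt = np.full(gray.shape, np.inf)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        dx, dy = gx[y, x] / mag[y, x], gy[y, x] / mag[y, x]
        for step in range(1, max_len):
            nx, ny = int(round(x + dx * step)), int(round(y + dy * step))
            if not (0 <= ny < gray.shape[0] and 0 <= nx < gray.shape[1]):
                break
            if edges[ny, nx]:
                # The full operator also fills pixels along the ray and
                # checks that the opposite gradient is roughly anti-parallel.
                swt[y, x] = min(swt[y, x], step)
                swt[ny, nx] = min(swt[ny, nx], step)
                break
    return swt
```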

1,531 citations

Proceedings ArticleDOI
16 Jun 2012
TL;DR: A system is proposed which detects texts of arbitrary orientations in natural images using a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts; a new dataset and an evaluation protocol are also introduced for comparison with competing algorithms.
Abstract: With the increasing popularity of practical vision systems and smart phones, text detection in natural scenes becomes a critical yet challenging task. Most existing methods have focused on detecting horizontal or near-horizontal texts. In this paper, we propose a system which detects texts of arbitrary orientations in natural images. Our algorithm is equipped with a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate our algorithm and compare it with other competing algorithms, we generate a new dataset, which includes various texts in diverse real-world scenarios; we also propose a protocol for performance evaluation. Experiments on benchmark datasets and the proposed dataset demonstrate that our algorithm compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on texts of arbitrary orientations in complex natural scenes.
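As a small illustration of the kind of output representation arbitrary-orientation detectors report (not the authors' system), the snippet below fits oriented bounding boxes to connected components of a binary text mask with OpenCV's minAreaRect.

```python
# Illustration only: oriented bounding boxes for components of a binary mask.
import cv2
import numpy as np

def oriented_text_boxes(binary_mask, min_area=50):
    contours, _ = cv2.findContours(binary_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for cnt in contours:
        if cv2.contourArea(cnt) >= min_area:
            rect = cv2.minAreaRect(cnt)        # ((cx, cy), (w, h), angle)
            boxes.append(cv2.boxPoints(rect))  # four corner points
    return boxes
```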

750 citations

Journal ArticleDOI
TL;DR: This review summarizes the fundamental problems of text detection and recognition in color imagery, enumerates factors to consider when addressing them, and provides a comparison and analysis of the remaining open problems in the field.
Abstract: This paper analyzes, compares, and contrasts technical challenges, methods, and the performance of text detection and recognition research in color imagery. It summarizes the fundamental problems and enumerates factors that should be considered when addressing these problems. Existing techniques are categorized as either stepwise or integrated, and sub-problems are highlighted, including text localization, verification, segmentation and recognition. Special issues associated with the enhancement of degraded text and the processing of video text, multi-oriented, perspectively distorted and multilingual text are also addressed. The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared. This review provides a fundamental comparison and analysis of the remaining problems in the field.

709 citations

Journal ArticleDOI
TL;DR: This article presents an overview of existing map processing techniques, bringing together the past and current research efforts in this interdisciplinary field, to characterize the advances that have been made, and to identify future research directions and opportunities.
Abstract: Maps depict natural and human-induced changes on earth at a fine resolution for large areas and over long periods of time. In addition, maps—especially historical maps—are often the only information source about the earth as surveyed using geodetic techniques. In order to preserve these unique documents, increasing numbers of digital map archives have been established, driven by advances in software and hardware technologies. Since the early 1980s, researchers from a variety of disciplines, including computer science and geography, have been working on computational methods for the extraction and recognition of geographic features from archived images of maps (digital map processing). The typical result from map processing is geographic information that can be used in spatial and spatiotemporal analyses in a Geographic Information System environment, which benefits numerous research fields in the spatial, social, environmental, and health sciences. However, map processing literature is spread across a broad range of disciplines in which maps are included as a special type of image. This article presents an overview of existing map processing techniques, with the goal of bringing together the past and current research efforts in this interdisciplinary field, to characterize the advances that have been made, and to identify future research directions and opportunities.

674 citations

Journal ArticleDOI
TL;DR: A hybrid approach to robustly detect and localize texts in natural scene images using a text region detector, a conditional random field model, and a learning-based energy minimization method are presented.
Abstract: Text detection and localization in natural scene images is important for content-based image analysis. This problem is challenging due to the complex background, the non-uniform illumination, the variations of text font, size and line orientation. In this paper, we present a hybrid approach to robustly detect and localize texts in natural scene images. A text region detector is designed to estimate the text existing confidence and scale information in image pyramid, which help segment candidate text components by local binarization. To efficiently filter out the non-text components, a conditional random field (CRF) model considering unary component properties and binary contextual component relationships with supervised parameter learning is proposed. Finally, text components are grouped into text lines/words with a learning-based energy minimization method. Since all the three stages are learning-based, there are very few parameters requiring manual tuning. Experimental results evaluated on the ICDAR 2005 competition dataset show that our approach yields higher precision and recall performance compared with state-of-the-art methods. We also evaluated our approach on a multilingual image dataset with promising results.
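A sketch of the candidate-generation stage only, assuming OpenCV: local binarization via an adaptive threshold followed by connected-component extraction. The CRF filtering and the learning-based grouping described in the abstract are not reproduced here, and the block size, offset and area threshold are illustrative values.

```python
# Sketch of candidate text components via local binarization; the CRF-based
# filtering and text-line grouping stages are intentionally omitted.
import cv2
import numpy as np

def candidate_components(gray, block=31, offset=10, min_area=30):
    # Local (adaptive) binarization; block must be an odd window size.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, block, offset)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    comps = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            comps.append((x, y, w, h))  # candidate text component boxes
    return comps
```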

394 citations