Proceedings ArticleDOI

Text localization, enhancement and binarization in multimedia documents

10 Dec 2002 - Vol. 2, pp 1037-1040
TL;DR: An algorithm is presented that localizes artificial text in images and videos using a measure of accumulated gradients and morphological post-processing; the quality of the localized text is then improved by robust multiple-frame integration.
Abstract: The systems currently available for content-based image and video retrieval work without semantic knowledge, i.e. they use image processing methods to extract low-level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. A way to include more semantic knowledge in the indexing process is to use the text included in the images and video sequences. It is rich in information yet easy to use, e.g. by keyword-based queries. In this paper we present an algorithm to localize artificial text in images and videos using a measure of accumulated gradients and morphological post-processing to detect the text. The quality of the localized text is improved by robust multiple frame integration. A new technique for the binarization of the text boxes is proposed. Finally, detection and OCR results for a commercial OCR system are presented.
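The localization idea summarized above can be illustrated with a short sketch. This is a minimal, hypothetical rendering of the accumulated-gradients approach, not the authors' exact implementation; the window size, threshold rule, and structuring-element sizes are assumed values.

```python
import numpy as np
from scipy import ndimage

def localize_text(gray, win=16, thresh=None):
    """Sketch of gradient-accumulation text localization.

    Text regions contain dense intensity transitions, so accumulating
    gradient magnitudes over a local window highlights them.
    """
    # Horizontal gradient magnitude (vertical character strokes respond strongly).
    gx = np.abs(np.gradient(gray.astype(float), axis=1))
    # Accumulate gradient magnitudes over a sliding window
    # (uniform_filter computes the mean, so scale back to a sum).
    acc = ndimage.uniform_filter(gx, size=win) * (win * win)
    # Hypothetical threshold: mean plus two standard deviations.
    if thresh is None:
        thresh = acc.mean() + 2.0 * acc.std()
    mask = acc > thresh
    # Morphological post-processing: close gaps between characters,
    # then remove small isolated responses.
    mask = ndimage.binary_closing(mask, structure=np.ones((3, 15)))
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    return mask  # connected components of this mask are candidate text boxes
```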
Citations

Proceedings ArticleDOI
Simon M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young
03 Aug 2003
TL;DR: The robust reading problem was broken down into three sub-problems, with a competition run for each stage plus one for the best overall system; the text locating contest was the only one to receive any entries.
Abstract: This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets, and gain a clear understanding of the current state of the art. We use the term robust reading to refer to text images that are beyond the capabilities of current commercial OCR packages. We chose to break down the robust reading problem into three sub-problems, and run competitions for each stage, and also a competition for the best overall system. The sub-problems we chose were text locating, character recognition and word recognition. By breaking down the problem in this way, we hope to gain a better understanding of the state of the art in each of the sub-problems. Furthermore, our methodology involves storing detailed results of applying each algorithm to each image in the data sets, allowing researchers to study in depth the strengths and weaknesses of each algorithm. The text locating contest was the only one to have any entries. We report the results of this contest, and show cases where the leading algorithms succeed and fail.

607 citations


Cites background from "Text localization, enhancement and ..."

  • ...In recent years there has been some significant research into these general reading systems that are able to locate and/or read text in scene images [11, 2, 3, 4, 1, 10]....


Journal ArticleDOI
TL;DR: A survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras is presented, along with sample applications under development and feasible ideas for future development.
Abstract: The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.

493 citations

Journal ArticleDOI
TL;DR: A new framework to detect text strings with arbitrary orientations in complex natural scene images that outperforms state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation.
Abstract: Text information in natural scene images serves as important clues for many image-based applications such as scene understanding, content-based image retrieval, assistive navigation, and automatic geocoding. However, locating text in a complex background with multiple colors is a challenging task. In this paper, we explore a new framework to detect text strings with arbitrary orientations in complex natural scene images. Our proposed framework of text string detection consists of two steps: 1) image partition to find text character candidates based on local gradient features and color uniformity of character components, and 2) character candidate grouping to detect text strings based on joint structural features of text characters in each text string, such as character size differences, distances between neighboring characters, and character alignment. By assuming that a text string has at least three characters, we propose two algorithms of text string detection: 1) an adjacent character grouping method and 2) a text line grouping method. The adjacent character grouping method calculates the sibling groups of each character candidate as string segments and then merges the intersecting sibling groups into a text string. The text line grouping method performs a Hough transform to fit text lines among the centroids of text candidates. Each fitted text line describes the orientation of a potential text string. The detected text string is represented by a rectangular region covering all characters whose centroids are cascaded in its text line. To improve efficiency and accuracy, our algorithms are carried out at multiple scales. The proposed methods outperform the state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation. Furthermore, the effectiveness of our methods in detecting text strings with arbitrary orientations is evaluated on the Oriented Scene Text Dataset collected by ourselves, which contains text strings in non-horizontal orientations.
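To make the grouping step concrete, here is a minimal sketch of adjacent character grouping for roughly horizontal strings, using the paper's three-character assumption. The box format and the gap/size-ratio thresholds are hypothetical, and the full method (sibling groups, Hough-based line fitting, multi-scale processing) is not reproduced.

```python
def group_adjacent(boxes, max_gap_ratio=1.5, max_size_ratio=2.0):
    """Group character candidate boxes (x, y, w, h) into text strings
    of at least three characters, scanning left to right."""
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[0])
    strings, current = [], [boxes[0]]
    for box in boxes[1:]:
        px, py, pw, ph = current[-1]
        x, y, w, h = box
        gap = x - (px + pw)                          # horizontal spacing
        similar = max(h, ph) / min(h, ph) <= max_size_ratio
        aligned = abs(y - py) <= min(h, ph)          # rough vertical alignment
        if gap <= max_gap_ratio * max(w, pw) and similar and aligned:
            current.append(box)
        else:
            if len(current) >= 3:                    # three-character assumption
                strings.append(current)
            current = [box]
    if len(current) >= 3:
        strings.append(current)
    return strings
```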

355 citations


Cites methods from "Text localization, enhancement and ..."

  • ...[37] improved Otsu’s method to binarize text regions from background, followed by a sequence of morphological processing to reduce noise and correct classification errors....

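The pipeline described in the quote above, Otsu thresholding followed by morphological cleanup, can be sketched with OpenCV as follows. This is a generic illustration rather than the cited paper's improved variant; the kernel size is an assumed value.

```python
import cv2
import numpy as np

def binarize_text_region(gray):
    """Otsu binarization of a cropped text region plus morphological cleanup."""
    # Otsu picks the global threshold that minimizes intra-class variance.
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((2, 2), np.uint8)  # assumed structuring element
    # Opening removes small noise specks; closing fills gaps in strokes.
    bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, kernel)
    bw = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)
    return bw
```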

Proceedings ArticleDOI
31 Aug 2005
TL;DR: The main result is that the leading 2005 entry has improved significantly on the leading 2003 entry, with an increase in average f-score from 0.5 to 0.62, where the f-score is the same adapted information retrieval measure used for the 2003 competition.
Abstract: This paper describes the results of the ICDAR 2005 competition for locating text in camera captured scenes. For this we used the same data as the ICDAR 2003 competition, which has been kept private until now. This allows a direct comparison with the 2003 entries. The main result is that the leading 2005 entry has improved significantly on the leading 2003 entry, with an increase in average f-score from 0.5 to 0.62, where the f-score is the same adapted information retrieval measure used for the 2003 competition. The paper also discusses the Web-based deployment and evaluation of text locating systems, and one of the leading entries has now been deployed in this way. This mode of usage could lead to more complete and more immediate knowledge of the strengths and weaknesses of each newly developed system.
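For reference, here is a minimal sketch of the combined f-score measure, assuming precision and recall have already been computed from the competition's box-matching scheme; with the usual α = 0.5 weighting it reduces to the standard harmonic mean.

```python
def f_score(precision, recall, alpha=0.5):
    """Combined measure of the form 1 / (alpha/p + (1 - alpha)/r);
    with alpha = 0.5 this is the harmonic mean 2*p*r / (p + r)."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)

# Example: a system with precision 0.62 and recall 0.62 scores f = 0.62,
# the average achieved by the leading 2005 entry.
```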

304 citations

References
01 Jan 1979
TL;DR: This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis, and addressing interesting real-world computer vision and multimedia applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that we have some classes containing lots of training data and many classes containing only a small amount of training data. Therefore, how to use frequent classes to help learn rare classes, for which it is harder to collect training data, is an open question. Learning with shared information is an emerging topic in machine learning, computer vision, and multimedia analysis. There are different levels of components that can be shared during the concept modeling and machine learning stages, such as generic object parts, attributes, transformations, regularization parameters, and training examples. Regarding specific methods, multi-task learning, transfer learning, and deep learning can be seen as different strategies for sharing information. These learning-with-shared-information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged. Topics of interest include, but are not limited to: • Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis • Deep learning for large-scale computer vision and multimedia analysis • Multi-modal approaches for large-scale computer vision and multimedia analysis • Different sharing strategies, e.g., sharing generic object parts, attributes, transformations, regularization parameters, and training examples • Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing • New datasets and metrics to evaluate the benefit of the proposed sharing ability for specific computer vision or multimedia problems • Survey papers regarding the topic of learning with shared information. Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations


"Text localization, enhancement and ..." refers methods in this paper

  • ...New methods based on edge detection or texture analysis soon followed [2, 3, 7, 9] as well as algorithms working in the compressed domain [11]....


Book
Wayne Niblack
01 Jan 1986
An Introduction to Digital Image Processing

1,745 citations


"Text localization, enhancement and ..." refers methods in this paper

  • ...For our system we chose Niblack’s method [4], which also performed one of the best in a well known survey [6]....


Journal ArticleDOI
TL;DR: This paper presents a methodology for evaluation of low-level image analysis methods, using binarization (two-level thresholding) as an example, and defines the performance of the character recognition module as the objective measure.
Abstract: This paper presents a methodology for evaluation of low-level image analysis methods, using binarization (two-level thresholding) as an example. Binarization of scanned gray scale images is the first step in most document image analysis systems. Selection of an appropriate binarization method for an input image domain is a difficult problem. Typically, a human expert evaluates the binarized images according to his/her visual criteria. However, to conduct an objective evaluation, one needs to investigate how well the subsequent image analysis steps will perform on the binarized image. We call this approach goal-directed evaluation, and it can be used to evaluate other low-level image processing methods as well. Our evaluation of binarization methods is in the context of digit recognition, so we define the performance of the character recognition module as the objective measure. Eleven different locally adaptive binarization methods were evaluated, and Niblack's method gave the best performance.
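Niblack's method, the best performer in this evaluation, derives a local threshold from the mean and standard deviation of each pixel's neighborhood: T(x, y) = m(x, y) + k·s(x, y). A minimal NumPy/SciPy sketch follows; the window size and k = -0.2 are conventional defaults, not values taken from the survey.

```python
import numpy as np
from scipy import ndimage

def niblack_threshold(gray, win=15, k=-0.2):
    """Niblack local thresholding: T(x, y) = mean(x, y) + k * std(x, y),
    computed over a win x win neighborhood of each pixel."""
    img = gray.astype(float)
    mean = ndimage.uniform_filter(img, size=win)
    sq_mean = ndimage.uniform_filter(img * img, size=win)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))  # Var = E[x^2] - E[x]^2
    return img > mean + k * std  # True = background for dark text on light paper
```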

700 citations


"Text localization, enhancement and ..." refers methods in this paper

  • ...For our system we chose Niblack’s method [4], which also performed one of the best in a well known survey [6]....


Proceedings ArticleDOI
16 Aug 1998
TL;DR: Compared with some traditional text location methods, this method has the following advantages: 1) low computational cost; 2) robustness to font size; and 3) high accuracy.
Abstract: Automatic text location (without character recognition capabilities) deals with extracting image regions that contain text only. The images of these regions can then be fed to an optical character recognition module or highlighted for users. This is very useful in a number of applications such as database indexing and converting paper documents to their electronic versions. The performance of our automatic text location algorithm is shown in several applications. Compared with some traditional text location methods, our method has the following advantages: 1) low computational cost; 2) robust to font size; and 3) high accuracy.

560 citations


"Text localization, enhancement and ..." refers methods in this paper

  • ...The first algorithms, introduced by the document processing community for the extraction of text from colored journal images, segmented characters before grouping them to words and lines [1]....
