Journal ArticleDOI

Resolution enhancement of textual images via multiple coupled dictionaries and adaptive sparse representation selection

01 Mar 2015 - International Journal on Document Analysis and Recognition (Springer Berlin Heidelberg) - Vol. 18, Iss. 1, pp. 87-107
TL;DR: Significant improvements in visual quality and character recognition rates are achieved using the proposed approach, confirmed by a detailed comparative study with state-of-the-art upscaling approaches.
Abstract: Resolution enhancement has become a valuable research topic due to the rapidly growing need for high-quality images in various applications. Various resolution enhancement approaches have been successfully applied to natural images. Nevertheless, their direct application to textual images is not effective enough because of the specificities that distinguish these particular images from natural images. Insufficient resolution introduces a substantial loss of detail, which can make a text unreadable by humans and unrecognizable by OCR systems. To address these issues, a sparse coding-based approach is proposed to enhance the resolution of a textual image. Three major contributions are presented in this paper: (1) Multiple coupled dictionaries are learned from a clustered database and selected adaptively for a better reconstruction. (2) An automatic process is developed to collect the training database, which contains writing patterns extracted from high-quality character images. (3) A new local feature descriptor well suited to writing specificities is proposed for the clustering of the training database. The performance of these propositions is evaluated qualitatively and quantitatively on various types of low-resolution textual images. Significant improvements in visual quality and character recognition rates are achieved with the proposed approach, as confirmed by a detailed comparative study against state-of-the-art upscaling approaches.
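To make contribution (1) concrete, here is a minimal Python sketch (not the authors' code; the dictionary pairs, cluster centres, selection rule and patch handling are simplified, hypothetical stand-ins, and orthogonal matching pursuit stands in for whatever sparse coder the paper uses) of how coupled LR/HR dictionaries can be selected adaptively per patch and used to reconstruct a high-resolution patch from its sparse code:

```python
# Hedged sketch of adaptive coupled-dictionary reconstruction for one patch.
import numpy as np
from sklearn.linear_model import orthogonal_mp

def upscale_patch(lr_patch, dict_pairs, centres, sparsity=3):
    """lr_patch: flattened LR patch feature vector.
    dict_pairs: list of coupled (D_lr, D_hr) dictionaries (atoms in columns,
                D_lr columns assumed unit-norm, as OMP expects).
    centres: array of cluster centres, one per dictionary pair."""
    # 1) adaptive selection: pick the dictionary pair whose cluster centre
    #    is closest to the input patch (stand-in for the paper's selection rule)
    k = int(np.argmin(np.linalg.norm(centres - lr_patch, axis=1)))
    D_lr, D_hr = dict_pairs[k]
    # 2) sparse-code the LR patch over the selected LR dictionary
    alpha = orthogonal_mp(D_lr, lr_patch, n_nonzero_coefs=sparsity)
    # 3) reconstruct the HR patch with the coupled HR dictionary and the same code
    return D_hr @ alpha
```

In a full pipeline, overlapping patch reconstructions would then be aggregated back into the output image.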
Citations
Journal ArticleDOI
TL;DR: The proposed mixture of experts (MoE) method jointly learns the feature space partition and local regression models, and uses far fewer local models and less time to achieve results comparable or superior to state-of-the-art SISR methods, providing a highly practical solution for real applications.
Abstract: Using a global regression model for single image super-resolution (SISR) generally fails to produce visually pleasant output. Recently developed local learning methods provide a remedy by partitioning the feature space into a number of clusters and learning a simple local model for each cluster. However, in these methods the space partition is conducted separately from local model learning, so an abundant number of local models is required to achieve satisfying performance. To address this problem, we propose a mixture of experts (MoE) method to jointly learn the feature space partition and the local regression models. Our MoE consists of two components: gating network learning and local regressor learning. An expectation-maximization (EM) algorithm is adopted to train the MoE on a large set of LR/HR patch pairs. Experimental results demonstrate that the proposed method uses far fewer local models and less time to achieve results comparable or superior to state-of-the-art SISR methods, providing a highly practical solution for real applications.
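As a rough illustration of the inference side of such a mixture of experts (a sketch under assumed, already-trained gating parameters and local regressors, not the paper's implementation):

```python
# Minimal MoE inference sketch for patch-based SR: a softmax gating network
# weights the predictions of local linear regressors.
import numpy as np

def moe_upscale_patch(x, W_g, b_g, regressors):
    """x: LR patch feature vector; W_g, b_g: gating parameters;
    regressors: list of (A_j, b_j) local linear mappings to HR patches."""
    logits = W_g @ x + b_g
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                      # softmax gating weights
    # gated combination of the experts' predictions
    return sum(g * (A @ x + b) for g, (A, b) in zip(gate, regressors))
```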

50 citations

Proceedings ArticleDOI
23 Aug 2015
TL;DR: The main conclusion of this competition is that SR systems may improve OCR performance by up to 16.55 points in accuracy compared with bicubic interpolation on the proposed low-resolution images.
Abstract: This paper presents the first international competition on Text Image Super-Resolution (SR) and the ICDAR2015-TextSR dataset. We describe the core of the competition: its interest, dataset generation and evaluation procedure, together with the participating teams and their respective methods. The obtained results, along with baseline image upscaling schemes and state-of-the-art SR approaches, are reported and discussed. The main conclusion of this competition is that SR systems may improve OCR performance by up to 16.55 points in accuracy compared with bicubic interpolation on the proposed low-resolution images.
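A minimal sketch of this kind of comparison (illustrative only, not the competition's official protocol; the file names are hypothetical, Tesseract is used as an example OCR engine, and a character-level similarity stands in for the competition's accuracy metric):

```python
# Compare OCR accuracy on a bicubic-upscaled image against an SR output.
import cv2
import pytesseract
from difflib import SequenceMatcher

def ocr_accuracy(image, ground_truth_text):
    predicted = pytesseract.image_to_string(image)
    return 100.0 * SequenceMatcher(None, predicted, ground_truth_text).ratio()

lr = cv2.imread("lr_text.png")        # hypothetical low-resolution input
sr = cv2.imread("sr_text.png")        # hypothetical SR system output
gt = open("gt.txt").read()            # hypothetical ground-truth transcription
bicubic = cv2.resize(lr, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
gain = ocr_accuracy(sr, gt) - ocr_accuracy(bicubic, gt)
print(f"OCR accuracy gain over bicubic: {gain:.2f} points")
```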

33 citations


Cites background or methods from "Resolution enhancement of textual images via multiple coupled dictionaries and adaptive sparse representation selection"

  • ...More details on this system are described in [5]....


  • ...To improve the unsupervised clustering of this database, an intelligent clustering method is applied and a new local feature descriptor, referred to as Histogram of Structure Tensors (HoST), is introduced making it possible to capture the local information of an image patch [5]....


  • ...The ASRS system [5] (Adaptive Sparse Representation Selection based system) was submitted by Rim Walha, Fadoua Drira, Franck Lebourgeois and Adel M....


Posted Content
TL;DR: It is reported that the winning text image super-resolution framework largely improves OCR performance when low-resolution images are used as input, reaching an OCR accuracy score of 77.19%, which is comparable with that obtained using the original high-resolution images.
Abstract: Text image super-resolution is a challenging yet open research problem in the computer vision community. In particular, low-resolution images hamper the performance of typical optical character recognition (OCR) systems. In this article, we summarize our entry to the ICDAR2015 Competition on Text Image Super-Resolution. Experiments are based on the provided ICDAR2015 TextSR dataset [3] and the released Tesseract-OCR 3.02 system [1]. We report that our winning text image super-resolution framework largely improves OCR performance when low-resolution images are used as input, reaching an OCR accuracy score of 77.19%, which is comparable with that obtained using the original high-resolution images (78.80%). Index Terms: super resolution; optical character recognition.

33 citations


Cites methods from "Resolution enhancement of textual images via multiple coupled dictionaries and adaptive sparse representation selection"

  • ...The Synchromedia Lab [7] and ASRS [10] are methods of the other two competition teams....


Journal ArticleDOI
TL;DR: This study surveys methods that are mainly designed for enhancing low-resolution textual images in the super-resolution (SR) task, criticises these methods, and discusses areas which promise improvements in this task.
Abstract: The super-resolution (SR) task has become an important research area due to the rapidly growing interest in high-quality images in various computer vision and pattern recognition applications. This has led to the emergence of various SR approaches. According to the number of input images, two kinds of approaches can be distinguished: single-input or multi-input based approaches. Certainly, processing multiple inputs can lead to an interesting output, but this is generally not the case for textual image processing. This study focuses on single-image-based approaches. Most of the existing methods have been successfully applied to natural images. Nevertheless, their direct application to textual images is not effective enough because of the specificities that distinguish these particular images from natural images. Therefore, SR approaches especially suited to textual images have been proposed in the literature. Previous overviews of SR methods have concentrated on natural images, with no real treatment of textual ones. Thus, this study aims to fill this gap by surveying methods that are mainly designed for enhancing low-resolution textual images. The authors further criticise these methods and discuss areas which promise improvements in this task. To the best of the authors' knowledge, this survey is the first such investigation in the literature.

20 citations

Proceedings ArticleDOI
01 Dec 2017
TL;DR: A new loss function is proposed when training CNN for text image SR to facilitate OCR, and a simple yet effective image padding method to refine the image boundaries during SR is proposed.
Abstract: Since low-resolution images may hamper the performance of optical character recognition (OCR), text image super-resolution (SR) has become an increasingly important problem in computer vision. Convolutional neural networks (CNN) have been proposed for generic image SR as well as text image SR, but previous works focus more on objective quality (e.g., PSNR) than on OCR performance. In this paper, we propose a new loss function for training a CNN for text image SR so as to facilitate OCR, and conduct model combination to further improve performance. We also propose a simple yet effective image padding method to refine the image boundaries during SR. Experimental results show that we achieve an OCR accuracy of 78.10% on the ICDAR 2015 TextSR dataset, which is comparable with that of using the original high-resolution images (78.80%) and also exceeds the state of the art.
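The boundary-refinement idea can be illustrated with a short sketch (assuming replicate padding, a grayscale image and a generic SR callable; the paper's exact padding scheme and model interface may differ):

```python
# Pad the LR image before the CNN pass so border pixels get full
# receptive-field support, then crop the upscaled margin away.
import numpy as np

def sr_with_padding(lr_image, sr_model, scale=2, pad=4):
    padded = np.pad(lr_image, ((pad, pad), (pad, pad)), mode="edge")
    sr = sr_model(padded)            # hypothetical CNN callable returning HR image
    m = pad * scale                  # padded margin after upscaling
    return sr[m:-m, m:-m]
```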

20 citations

References
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, which can be applied to both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
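For readers working in Python rather than MATLAB, scikit-image ships an implementation of this index; a minimal usage sketch (file names are placeholders):

```python
# Compute SSIM between a reference and a distorted grayscale image.
from skimage import io
from skimage.metrics import structural_similarity as ssim

ref = io.imread("reference.png", as_gray=True)
dist = io.imread("distorted.png", as_gray=True)
score = ssim(ref, dist, data_range=ref.max() - ref.min())
print(f"SSIM: {score:.4f}")
```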

40,609 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
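A minimal usage sketch of HOG extraction with scikit-image, using typical parameter values rather than necessarily those of the original paper (the file name is a placeholder):

```python
# Extract a HOG descriptor from a grayscale image patch.
from skimage import io
from skimage.feature import hog

image = io.imread("patch.png", as_gray=True)
descriptor = hog(image, orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2), block_norm="L2-Hys")
print(descriptor.shape)
```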

31,952 citations


"Resolution enhancement of textual i..." refers methods in this paper

  • ...HoST is designed to be non-oriented in contrast to the histogram of oriented gradients (HOG) descriptor [11]....


  • ...Like [11], votes are weighted by using bilinear interpolation to reduce the aliasing effect....


Journal ArticleDOI
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.
Abstract: A new graphical display is proposed for partitioning techniques. Each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation. This silhouette shows which objects lie well within their cluster, and which ones are merely somewhere in between clusters. The entire clustering is displayed by combining the silhouettes into a single plot, allowing an appreciation of the relative quality of the clusters and an overview of the data configuration. The average silhouette width provides an evaluation of clustering validity, and might be used to select an ‘appropriate’ number of clusters.
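A minimal sketch of using the average silhouette width to select a number of clusters with scikit-learn (random placeholder data):

```python
# Pick the number of k-means clusters that maximizes the average silhouette width.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(500, 16)                      # placeholder feature vectors
scores = {}
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```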

14,144 citations


"Resolution enhancement of textual i..." refers background in this paper

  • ...The silhouette is another popular cluster validity index [44]....


Journal ArticleDOI
TL;DR: A novel algorithm for adapting dictionaries in order to achieve sparse signal representations, the K-SVD algorithm, an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data.
Abstract: In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method, the K-SVD algorithm, which generalizes the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data.
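A compact, unoptimized sketch of the K-SVD alternation described above (illustrative only; OMP from scikit-learn is used for the sparse-coding step):

```python
# Minimal K-SVD: alternate OMP sparse coding with SVD-based atom updates.
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, sparsity, n_iter=10, seed=0):
    """Y: training signals as columns (d x n). Returns dictionary D and codes X."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)                            # unit-norm atoms
    for _ in range(n_iter):
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)     # sparse coding
        for k in range(n_atoms):
            users = np.flatnonzero(X[k])                      # signals using atom k
            if users.size == 0:
                continue
            # residual restricted to those signals, excluding atom k's contribution
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                                 # updated atom
            X[k, users] = s[0] * Vt[0]                        # updated codes
    return D, X
```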

8,905 citations

Journal ArticleDOI
TL;DR: A measure is presented which indicates the similarity of clusters, assuming each cluster has a data density that is a decreasing function of distance from a vector characteristic of the cluster; the measure can be used to infer the appropriateness of data partitions.
Abstract: A measure is presented which indicates the similarity of clusters that are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster. The measure can be used to infer the appropriateness of data partitions and can therefore be used to compare the relative appropriateness of various divisions of the data. The measure depends on neither the number of clusters analyzed nor the method of partitioning of the data, and can be used to guide a cluster-seeking algorithm.

6,757 citations


"Resolution enhancement of textual i..." refers background in this paper


  • ...Davies and Bouldin [13] proposed another clustering validity index, referred to as the Davies–Bouldin (DB) index, which is calculated as follows: $DB = \frac{1}{K} \sum_{c_k \in \xi} \max_{c_l \in \xi \setminus \{c_k\}} \left\{ \frac{S(c_k) + S(c_l)}{d_e(c_k, c_l)} \right\}$ (18), where $S(c_k) = \frac{1}{|c_k|} \sum_{x_i \in c_k} d_e(x_i, c_k)$ (19). In contrast to the CH index, the minimal value of the DB index indicates the best clustering solution....

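Equations (18) and (19) above translate directly into code; the following sketch assumes $d_e$ is the Euclidean distance (scikit-learn's davies_bouldin_score computes the same index):

```python
# Davies-Bouldin index computed directly from equations (18)-(19).
import numpy as np

def davies_bouldin(X, labels):
    ks = np.unique(labels)
    centres = np.array([X[labels == k].mean(axis=0) for k in ks])
    # S(c_k): mean distance of cluster members to their centre, eq. (19)
    S = np.array([np.linalg.norm(X[labels == k] - centres[i], axis=1).mean()
                  for i, k in enumerate(ks)])
    K = len(ks)
    db = 0.0
    for i in range(K):
        ratios = [(S[i] + S[j]) / np.linalg.norm(centres[i] - centres[j])
                  for j in range(K) if j != i]
        db += max(ratios)                         # worst-case overlap, eq. (18)
    return db / K
```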