Proceedings ArticleDOI

A Noise-Resilient Super-Resolution Framework to Boost OCR Performance

TL;DR: This paper proposes a noise-resilient SR framework for text images, recognizes the text using a deep BLSTM network trained on high-resolution images, and shows that the OCR performance on the noise-resilient super-resolved images is on par with that on the original HR images.
Abstract: Recognizing text from noisy low-resolution (LR) images is extremely challenging and remains an open problem for the computer vision community. Super-resolving a noisy LR text image yields a noisy high-resolution (HR) text image, as super-resolution (SR) introduces spatial correlation into the noise, which then cannot be removed successfully. Traditional noise-resilient text image SR methods apply a denoising algorithm before SR, but the denoising step removes some high-frequency details, so the output HR image is missing information (texture details and edges). This paper proposes a noise-resilient SR framework for text images and recognizes the text using a deep BLSTM network trained on high-resolution images. The proposed end-to-end deep-learning framework for noise-resilient text image SR simultaneously performs image denoising and super-resolution while preserving the details that would otherwise be lost. A stacked sparse denoising auto-encoder (SSDA) is learned for LR text image denoising, and our proposed coupled deep convolutional auto-encoder (CDCA) is learned for text image super-resolution. The pretrained weights of both networks serve as initial weights for the end-to-end framework during fine-tuning, and the network is jointly optimized for both tasks. We tested on several Indian-language datasets, and the OCR performance on the noise-resilient super-resolved images is on par with that on the original HR images.
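The abstract's pipeline (a pretrained SSDA denoiser feeding a pretrained CDCA super-resolver, then joint fine-tuning under a single loss) can be sketched as follows; the fully-connected stand-ins and layer shapes are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# Toy stand-ins for the pretrained modules (hypothetical shapes):
# the "denoiser" maps a noisy 64-dim LR patch to a clean 64-dim LR patch,
# the "super-resolver" maps the 64-dim LR patch to a 256-dim HR patch (4x area).
W_dn1, W_dn2 = rng.normal(0, 0.1, (64, 64)), rng.normal(0, 0.1, (64, 64))
W_sr1, W_sr2 = rng.normal(0, 0.1, (128, 64)), rng.normal(0, 0.1, (256, 128))

def denoise(x):          # SSDA-like stage (one hidden layer for brevity)
    return W_dn2 @ relu(W_dn1 @ x)

def super_resolve(x):    # CDCA-like stage
    return W_sr2 @ relu(W_sr1 @ x)

def forward(noisy_lr):   # end-to-end composition: denoise, then upscale
    return super_resolve(denoise(noisy_lr))

# Joint fine-tuning minimises a single loss against the clean HR target,
# so gradients flow through both stages at once (shown as the objective only).
noisy_lr = rng.normal(size=64)
hr_target = rng.normal(size=256)
joint_loss = np.mean((forward(noisy_lr) - hr_target) ** 2)
print(forward(noisy_lr).shape)
```

The key point is that the two pretrained networks become one computation graph, so the denoiser can learn to keep exactly the high-frequency content the super-resolver needs.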
Citations
Book ChapterDOI
26 Jul 2020
TL;DR: A generative adversarial network (GAN) based framework is proposed, in which an SR image generator and a document image quality discriminator are constructed; the obtained SR document images not only preserve texture details but also remove background noise, achieving better OCR performance on public databases.
Abstract: Super-resolving a low-resolution (LR) document image can not only enhance the visual quality and readability of the text but also improve optical character recognition (OCR) accuracy. However, given the ill-posed nature of the image super-resolution (SR) problem, recovering the finer details of text at large upscale factors while simultaneously suppressing noise and artifacts, especially for low-quality document images, remains a challenging task. Thus, in order to boost OCR accuracy, we propose a generative adversarial network (GAN) based framework in this paper, in which an SR image generator and a document image quality discriminator are constructed. To obtain high-quality SR document images, multiple losses are designed to encourage the generator to learn the structural properties of text. Meanwhile, the quality discriminator is trained with a relativistic loss function. With the proposed framework, the obtained SR document images not only preserve texture details but also remove background noise, achieving better OCR performance on the public databases. The source code and pre-trained models are available at https://gitlab.com/xujun.peng/doc-super-resolution.
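The relativistic loss mentioned for the quality discriminator compares, in the relativistic-average formulation, each real sample's score against the average fake score and vice versa. A minimal sketch, assuming raw logit outputs and the relativistic-average form (the paper's exact variant is not given here):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relativistic_d_loss(d_real, d_fake, eps=1e-12):
    """Relativistic-average discriminator loss: score how much MORE realistic
    a real sample looks than the average fake, and vice versa."""
    real_term = np.log(sigmoid(d_real - d_fake.mean()) + eps)
    fake_term = np.log(1.0 - sigmoid(d_fake - d_real.mean()) + eps)
    return -(real_term.mean() + fake_term.mean())

# Hypothetical raw discriminator logits for a batch of HR (real) vs SR (fake) images.
d_real = np.array([2.0, 1.5, 2.2])
d_fake = np.array([-1.0, -0.5, -1.5])
print(relativistic_d_loss(d_real, d_fake))  # small when real >> fake on average
```

The generator would minimise the mirrored objective, pushing its SR outputs to look more realistic than the average real document.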

13 citations

Proceedings ArticleDOI
01 Sep 2019
TL;DR: An end-to-end trainable deep-learning based framework for joint optimization of document enhancement and recognition, which uses a generative adversarial network (GAN) based framework for image denoising followed by a deep back-projection network (DBPN) for super-resolution.
Abstract: Recognizing text from degraded and low-resolution document images is still an open challenge in the vision community. Existing text recognition systems require a certain resolution and fail if the document is low-resolution, heavily degraded, or noisy. This paper presents an end-to-end trainable deep-learning based framework for joint optimization of document enhancement and recognition. We use a generative adversarial network (GAN) based framework to perform image denoising, followed by a deep back-projection network (DBPN) for super-resolution, and use these super-resolved features to train a bidirectional long short-term memory (BLSTM) network with Connectionist Temporal Classification (CTC) for recognition of textual sequences. The entire network is end-to-end trainable, and we obtain results that improve on the state of the art for both the image enhancement and document recognition tasks. We demonstrate results on both printed and handwritten degraded document datasets to show the generalization capability of our proposed robust framework.

12 citations


Cites background or methods from "A Noise-Resilient Super-Resolution ..."

  • ...[Quoted comparison table, garbled in extraction: OCR accuracy and PSNR on handwritten Hindi, Bangla, Oriya, and IAM datasets for HR images, noisy LR images, bicubic upscaling, and the SR, proposed, SSDA-CDCA, and SSDA+CDCA variants at 3x and 4x upscaling under Gaussian and salt-and-pepper noise, evaluating the] proposed framework, cascaded modules of our framework and our enhancement module cascaded with Tesseract [18]....

    [...]

  • ...In [9] the authors proposed joint optimisation of two CNN-based de-noising and SR modules to improve OCR....

    [...]

  • ...There are several works where image de-noising [1]–[3] or superresolution [4]–[7] are done separately, but very few tried to achieve the distortion-free super-resolved text image from low-resolution (LR) noisy and distorted images [8], [9]....

    [...]

  • ...SSDA-CDCA [9] jointly trains denoising and super-resolution and then uses the super-resolved output as input to a BLSTM-based OCR....

    [...]

  • ...In [9] the authors show that joint optimization of text image super-resolution and de-noising improves the quality of noisy LR text images in comparison to other state-of-the-art algorithms such as [5]–[8]....

    [...]

Proceedings ArticleDOI
01 Sep 2019
TL;DR: This paper proposes to combine the OCR performance into the loss function during network training, which results in the generation of high-resolution text images that achieve OCR performance comparable to that of the ground-truth high-resolution text images and surpassing the SOA baseline results.
Abstract: Convolutional neural networks have been shown to achieve breakthrough performance on the task of single image super-resolution (SISR) for natural images. These state-of-the-art (SOA) networks have been adapted to the task of single text image super-resolution and have been shown to boost optical character recognition (OCR) performance. However, these approaches depend on variations of the standard mean squared error (MSE) loss to train the SR network for improving text image quality, which does not guarantee optimal OCR performance. In this paper, we propose to combine the OCR performance into the loss function during network training. This results in the generation of high-resolution text images that achieve OCR performance comparable to that of the ground-truth high-resolution text images and surpassing the SOA baseline results. We define novel, intuitive metrics to capture the improvement in OCR performance and provide extensive experiments to qualitatively and quantitatively assess the improvement of our proposed approach over the SOA baselines on the standard UNLV dataset.
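Folding OCR performance into a loss requires a text-level error measure alongside the pixel loss. A sketch of one plausible ingredient, the character error rate via edit distance, combined with MSE (the weighting `lam` and the exact combination are assumptions, not the paper's formulation):

```python
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (rolling row)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete
                                     dp[j - 1] + 1,    # insert
                                     prev + (ca != cb))  # substitute / match
    return dp[-1]

def cer(predicted, truth):
    """Character error rate of OCR output against the ground-truth transcript."""
    return edit_distance(predicted, truth) / max(len(truth), 1)

def combined_loss(sr_img, hr_img, ocr_pred, ocr_truth, lam=0.5):
    # Pixel MSE plus an (illustrative, non-differentiable) OCR penalty.
    return np.mean((sr_img - hr_img) ** 2) + lam * cer(ocr_pred, ocr_truth)

print(cer("recogn1tion", "recognition"))  # one substitution -> 1/11
```

In practice the OCR term would have to be made differentiable (e.g. via the recognizer's soft outputs); the sketch only shows the metric being folded in.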

4 citations


Cites background from "A Noise-Resilient Super-Resolution ..."

  • ...These networks were replicated to address the problem of text image SR in recent years [7], [19], [22]....

    [...]

Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, a Text-Attention-ed Super Resolution GAN (TASR-GAN) is proposed to address the problem of super-resolution of handwritten documents.
Abstract: Super-resolution aims to increase the resolution and the clarity of details in low-resolution images, and document images are no exception. Although significant improvements have been achieved in super-resolution for different domains, historical document images have not been addressed well. Most current works in the text domain deal with modern fonts and rely on extracting prior semantic information from a recognizer to super-resolve images. The absence of a reliable handwritten recognizer for Arabic documents, whose historical manuscripts have a complex structure and overlapping parts, makes these text-domain works inapplicable. This paper presents a Text-Attention-ed Super Resolution GAN (TASR-GAN) to address this problem. The model deals with historical Arabic documents and does not rely on prior semantic information. Since text edges are essential for the quality and readability of our input-domain documents, we introduce a new loss function called text edge loss. This loss function gives more attention and weight to text edge information and guides optimization to super-resolve images with accurate small-region details and fine edges, improving image quality. Experiments on six Arabic manuscripts show that the proposed TASR achieves state-of-the-art performance in terms of PSNR/SSIM metrics and significantly improves visual image quality, mainly the edges of small-region details, while eliminating artifact noise. A grid search experiment has also been conducted to tune the best hyperparameter values for our text edge loss function. Keywords: Super-resolution, Historical handwritten documents, Generative adversarial networks
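The paper's text edge loss is not specified here; one minimal way to realize "more weight on text edges" is to penalize the difference between Sobel gradient magnitudes of the SR and HR images. A sketch under that assumption:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, k):
    """Plain 'valid' 2D correlation with a 3x3 kernel (no padding)."""
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

def edge_map(img):
    gx, gy = conv2d_valid(img, SOBEL_X), conv2d_valid(img, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def text_edge_loss(sr, hr):
    """Penalise disagreement between the edge maps of the SR and HR images."""
    return np.mean((edge_map(sr) - edge_map(hr)) ** 2)

img = np.zeros((8, 8)); img[:, 4:] = 1.0   # a vertical text-like edge
print(text_edge_loss(img, img))  # identical images -> 0.0
```

Such a term would be added to the usual GAN and pixel losses, steering the generator toward sharp stroke boundaries.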
Proceedings ArticleDOI
23 Nov 2022
TL;DR: In this paper, an autoencoder for denoising text images is proposed and the OCR performance in converting the denoised images into text is evaluated; the results show that dataset size affects both denoising and OCR performance.
Abstract: Document digitization plays an important role in making a company's activities more efficient, for example by detecting text in invoice document images using optical character recognition (OCR). However, text in images suffers from many problems; in particular, poorly preserved documents can contain noise or interference, making the text difficult to recognize. Our research aims to build an autoencoder for denoising text images and to evaluate the OCR performance in converting the denoised images into text. The first step of the research is to test the OCR characteristics on the original text images and on text images corrupted with Gaussian noise. The next step is to build the optimal autoencoder model for denoising by studying the effect of dataset size and optimizer type. The last step is to test the OCR performance on the denoised text images produced by the optimal autoencoder model. The test results show that dataset size affects both denoising performance and OCR performance. Of the several autoencoder models compared, the autoencoder with dataset size = 40 has the optimal performance, where the MSE values of the model for training and validation are 1277 and 1385, respectively. With images denoised by the optimal model, the OCR performance in converting images into text is 100%.
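The training pairs for such a denoising autoencoder are built by corrupting clean images with Gaussian noise and minimizing the MSE to the clean original. A toy sketch (the single hidden layer and 28x28 patch size are illustrative assumptions, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(1)

def add_gaussian_noise(img, sigma=0.1):
    """Corrupt a clean [0,1] image the way the noisy/clean training pairs are built."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

# Hypothetical single-hidden-layer autoencoder on flattened 28x28 patches.
W_enc = rng.normal(0, 0.05, (64, 784))
W_dec = rng.normal(0, 0.05, (784, 64))

def autoencode(x):
    return W_dec @ np.maximum(W_enc @ x, 0.0)

clean = rng.uniform(size=784)
noisy = add_gaussian_noise(clean)
mse = np.mean((autoencode(noisy) - clean) ** 2)   # the training objective
print(mse)
```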
References
Journal ArticleDOI
TL;DR: A novel algorithm for adapting dictionaries in order to achieve sparse signal representations, the K-SVD algorithm, an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data.
Abstract: In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method, the K-SVD algorithm, generalizing the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data.
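The core K-SVD step described in the abstract, refitting one dictionary atom and its coefficients to the residual of the signals that currently use it via a rank-1 SVD, can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(2)

def ksvd_atom_update(D, X, Y, k):
    """One K-SVD dictionary-atom update: refit atom k (and its coefficients)
    to the residual of the signals that use it, via a rank-1 SVD."""
    users = np.nonzero(X[k])[0]
    if users.size == 0:
        return D, X
    # Residual of the using signals with atom k's contribution removed.
    E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]              # best unit-norm atom for this residual
    X[k, users] = s[0] * Vt[0]     # matching coefficient update
    return D, X

# Toy data: 8-dim signals, a 12-atom unit-norm dictionary, sparse codes X.
Y = rng.normal(size=(8, 20))
D = rng.normal(size=(8, 12)); D /= np.linalg.norm(D, axis=0)
X = np.where(rng.uniform(size=(12, 20)) < 0.2, rng.normal(size=(12, 20)), 0.0)

err_before = np.linalg.norm(Y - D @ X)
D, X = ksvd_atom_update(D, X, Y, k=0)
err_after = np.linalg.norm(Y - D @ X)
print(err_before, err_after)  # the rank-1 refit cannot increase the fit error
```

Sweeping this update over all atoms, alternated with a sparse-coding pass, gives the full K-SVD iteration.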

8,905 citations


"A Noise-Resilient Super-Resolution ..." refers background in this paper

  • ...Although much work has been done separately on natural image SR [2]–[5] and de-noising [6]–[8]....

    [...]

Proceedings ArticleDOI
25 Jun 2006
TL;DR: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
Abstract: Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
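Once a network is trained with CTC, the simplest way to read out a label sequence is greedy (best-path) decoding: collapse repeated per-frame labels, then delete blanks. For example:

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse a per-frame best-path labelling into an output sequence:
    merge repeated labels, then drop blanks (the CTC many-to-one mapping)."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)

# e.g. a recurrent network emitting one best label per frame over 10 frames:
print(ctc_greedy_decode(list("--cc-aa-tt")))  # -> "cat"
```

The blank symbol is what lets CTC represent repeated characters: "aab" needs a blank between the two a's in the frame labelling, as in "a-ab".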

5,188 citations


"A Noise-Resilient Super-Resolution ..." refers methods in this paper

  • ...With the use of Connectionist Temporal Classification (CTC) [23] as the output layer, we...

    [...]

  • ...With the use of Connectionist Temporal Classification (CTC) [23] as the output layer, we can learn many-to-many mapping between input and output sequences....

    [...]

  • ...CTC transforms the output of LSTM to conditional probability distribution over all possible label sequence conditioned over input sequence....

    [...]

Journal ArticleDOI
TL;DR: This paper presents a new approach to single-image superresolution, based upon sparse signal representation, which generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods.
Abstract: This paper presents a new approach to single-image superresolution, based upon sparse signal representation. Research on image statistics suggests that image patches can be well-represented as a sparse linear combination of elements from an appropriately chosen over-complete dictionary. Inspired by this observation, we seek a sparse representation for each patch of the low-resolution input, and then use the coefficients of this representation to generate the high-resolution output. Theoretical results from compressed sensing suggest that under mild conditions, the sparse representation can be correctly recovered from the downsampled signals. By jointly training two dictionaries for the low- and high-resolution image patches, we can enforce the similarity of sparse representations between the low-resolution and high-resolution image patch pair with respect to their own dictionaries. Therefore, the sparse representation of a low-resolution image patch can be applied with the high-resolution image patch dictionary to generate a high-resolution image patch. The learned dictionary pair is a more compact representation of the patch pairs, compared to previous approaches, which simply sample a large amount of image patch pairs, reducing the computational cost substantially. The effectiveness of such a sparsity prior is demonstrated for both general image super-resolution (SR) and the special case of face hallucination. In both cases, our algorithm generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods. In addition, the local sparse modeling of our approach is naturally robust to noise, and therefore the proposed algorithm can handle SR with noisy inputs in a more unified framework.
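The coupled-dictionary idea can be sketched as: sparse-code each LR patch against the LR dictionary, then reconstruct the HR patch from the HR dictionary using the same code. The dictionary sizes and the tiny OMP coder below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def omp(D, y, n_nonzero=3):
    """Tiny orthogonal matching pursuit: greedily pick atoms, refit by least squares."""
    residual, support = y.copy(), []
    for _ in range(n_nonzero):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1]); x[support] = coef
    return x

# Coupled dictionaries (hypothetical sizes): D_lr codes flattened 3x3 LR patches,
# D_hr reconstructs the matching 6x6 HR patches from the SAME sparse code.
D_lr = rng.normal(size=(9, 32)); D_lr /= np.linalg.norm(D_lr, axis=0)
D_hr = rng.normal(size=(36, 32))

lr_patch = rng.normal(size=9)
code = omp(D_lr, lr_patch)     # sparse code w.r.t. the LR dictionary
hr_patch = D_hr @ code         # HR patch from the shared representation
print(np.count_nonzero(code), hr_patch.shape)
```

Joint training of the two dictionaries is what makes the shared code transfer; here they are random only to show the data flow.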

4,958 citations


"A Noise-Resilient Super-Resolution ..." refers background in this paper

  • ...Although much work has been done separately on natural image SR [2]–[5] and de-noising [6]–[8]....

    [...]

Book ChapterDOI
06 Sep 2014
TL;DR: This work proposes a deep learning method for single image super-resolution (SR) that directly learns an end-to-end mapping between the low/high-resolution images and shows that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network.
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) [15] that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage.
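The mapping has three stages (patch extraction, non-linear mapping, reconstruction), using 9x9, 1x1, and 5x5 filters in the original SRCNN; the channel counts and input size below are shrunk to illustrative toy values:

```python
import numpy as np

rng = np.random.default_rng(4)

def conv2d(img, kernels, bias):
    """'Same' correlation of a (C_in, H, W) tensor with (C_out, C_in, k, k) kernels."""
    c_out, c_in, k, _ = kernels.shape
    p = k // 2
    padded = np.pad(img, ((0, 0), (p, p), (p, p)))
    H, W = img.shape[1:]
    out = np.empty((c_out, H, W))
    for o in range(c_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernels[o]) + bias[o]
    return out

relu = lambda z: np.maximum(z, 0.0)

# 9-1-5 pipeline with toy channel counts, applied to a pre-upscaled image.
W1, b1 = rng.normal(0, 0.01, (4, 1, 9, 9)), np.zeros(4)   # patch extraction
W2, b2 = rng.normal(0, 0.01, (4, 4, 1, 1)), np.zeros(4)   # non-linear mapping
W3, b3 = rng.normal(0, 0.01, (1, 4, 5, 5)), np.zeros(1)   # reconstruction

def srcnn(y):                      # y: (1, H, W) bicubic-upscaled luminance
    return conv2d(relu(conv2d(relu(conv2d(y, W1, b1)), W2, b2)), W3, b3)

y = rng.uniform(size=(1, 16, 16))
print(srcnn(y).shape)  # same spatial size as the input, refined details
```

Because the input is bicubic-upscaled first, all three layers preserve spatial size; the network only restores detail.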

4,445 citations

Book
01 Jan 2014
Digital Image Processing, 3rd Edition, by R. C. Gonzalez and R. E. Woods.

1,830 citations


"A Noise-Resilient Super-Resolution ..." refers methods in this paper

  • ...Wiener filtering [15] has been employed for de-noising text in document images....

    [...]