Proceedings ArticleDOI

Sparse Document Image Coding for Restoration

TL;DR: This paper leverages the fact that different characters possess similar strokes, curves, and edges, learns a dictionary that gives a sparse decomposition for patches, and concludes that the method is ideally suited for restoring highly degraded images in repositories such as digital libraries.
Abstract: Sparse representation based image restoration techniques have been shown to be successful in solving various inverse problems such as denoising, inpainting, and super-resolution on natural images and videos. In this paper, we explore the use of sparse representation based methods specifically to restore degraded document images. While natural images form a very small subset of all possible images, admitting the possibility of a sparse representation, document images are significantly more restricted and are expected to be ideally suited for such a representation. However, the binary nature of textual document images makes dictionary learning and coding techniques unsuitable to be applied directly. We leverage the fact that different characters possess similar strokes, curves, and edges, and learn a dictionary that gives a sparse decomposition for patches. Experimental results show significant improvement in image quality and OCR performance on documents collected from a variety of sources such as magazines and books. This method is therefore ideally suited for restoring highly degraded images in repositories such as digital libraries.

Summary (2 min read)

I. INTRODUCTION

  • Recent years have seen a surge of interest in digitizing old documents and books, both to preserve them for posterity and because of their potential applications in information extraction, retrieval, etc.
  • Unfortunately, many of these old documents and manuscripts are degraded due to erosion, aging, the printing process, ink blots, and fading.
  • These works make the assumption that the original clean image of a given degraded image admits a sparse representation with respect to some basis.
  • The authors then seek high sparsity for degraded images to reconstruct the text regions while removing noisy artifacts in documents.
  • There have been many attempts at solving the problem.

II. SPARSE CODING FOR IMAGE RESTORATION

  • The problem is tackled with the sparsity prior which assumes that natural image patches can be sparsely represented in an appropriately chosen overcomplete basis and their sparse representation can be recovered from the noisy patches.
  • The above presented sparse coding framework has proved to yield very good results in restoring natural images.
  • The authors demonstrate the above mentioned challenges with a simple experiment.
  • The authors consider a portion of a page from a magazine that contains both text and a photograph.
  • Degradation can be treated as missing pixels for the photograph and as cuts for the text region.

III. RESTORATION OF DOCUMENT IMAGES

  • The most critical challenge in the restoration of document images using sparse coding can be explained with the help of Figure 3.
  • This clearly does not hold in the case of document images.
  • Current dictionary learning techniques are not adequate to learn an appropriate dictionary and parameters under such a non-linear mapping.
  • This ensures that the resulting approximation is close to both a binary image and a valid document patch.
  • The authors' restoration method is as follows: 1) learn a set of representative basis elements that summarizes a given set of clean image patches; 2) find the sparse representation of each degraded patch over the learned basis and binarize the output.

A. Dictionary Learning

  • The dictionary learning starts with a set of clean patches extracted from the segmented words.
  • The single non-zero constraint on the coefficients simplifies the K-SVD algorithm to the K-means algorithm, with the additional constraint that the basis elements are normalized.

B. Sparse coding

  • In order to avoid blocky artifacts in the reconstructed image, the authors use overlapping patches for restoration and the final reconstructed image is obtained by performing averaging at the overlapped regions.
  • The reconstructed image might be grayish with small noisy artifacts.
  • Regions corresponding to text will have large pixel values, as they are efficiently reconstructed, while noisy regions will have small values.
  • The authors thus use a simple post-processing step that binarizes the grayscale image to remove some of the noisy stray pixels.
  • The authors found that the threshold parameter did not significantly affect the quality of the outputs; it is fixed to 0.3 in all their experiments.

IV. EXPERIMENTS AND RESULTS

  • Restoration using the proposed method was carried out on a variety of document images with different levels of degradation.
  • For the learning step, the authors collected clean documents from a high-quality book that has a font similar to that of the degraded images.
  • If the patch size is large, the dictionary elements may overfit the training data, reducing the range of degraded images that can be restored.
  • Table I shows the input and output PSNR for different kinds of degradations with various levels of noise.
  • The authors then examine the effect of their document restoration on OCR recognition, which gives a good measure of the quality of restoration.

V. CONCLUSION

  • The authors present an approach to document restoration that uses the fact that different characters in a document share similar strokes, curves, edges, etc.
  • The authors extend sparse coding based restoration to document images and learn a set of dictionary elements that gives a highly sparse decomposition for image patches.
  • The authors restored severe degradations, including cuts, merges, blobs and erosions in documents, and showed experimental results on both positive and negative cases.
  • The authors also demonstrated the improvement in the recognition performance of an OCR system.
  • Unlike natural images, binary images take only a few intensity values and are structured.




Sparse Document Image Coding for Restoration
Vijay Kumar, Amit Bansal, Goutam Hari Tulsiyan, Anand Mishra, Anoop Namboodiri and C. V. Jawahar
Center for Visual Information Technology, IIIT Hyderabad, India
Abstract—Sparse representation based image restoration techniques have been shown to be successful in solving various inverse problems such as denoising, inpainting, and super-resolution on natural images and videos. In this paper, we explore the use of sparse representation based methods specifically to restore degraded document images. While natural images form a very small subset of all possible images, admitting the possibility of a sparse representation, document images are significantly more restricted and are expected to be ideally suited for such a representation. However, the binary nature of textual document images makes dictionary learning and coding techniques unsuitable to be applied directly. We leverage the fact that different characters possess similar strokes, curves, and edges, and learn a dictionary that gives a sparse decomposition for patches. Experimental results show significant improvement in image quality and OCR performance on documents collected from a variety of sources such as magazines and books. This method is therefore ideally suited for restoring highly degraded images in repositories such as digital libraries.
Keywords—Document restoration, Sparse representation, Dictionary learning
I. INTRODUCTION
Recent years have seen a surge of interest in digitizing old documents and books, both to preserve them for posterity and because of their potential applications in information extraction, retrieval, etc. Unfortunately, many of these old documents and manuscripts are degraded due to erosion, aging, the printing process, ink blots, and fading. One such degraded image is shown in Figure 1(a). Apart from the cuts and bleeds shown in this example, other types of degradation occur frequently in documents. Restoration may be used as a pre-processing step in applications related to recognition and retrieval. Figure 1(c) shows the OCR output of Figure 1(a), which is severely affected by the low quality of the document. Clearly, it is necessary to remove these noisy artifacts and restore the degraded document close to its original form.
Recently, sparse representation has been shown to yield state-of-the-art results in solving inverse problems such as denoising [1][2], inpainting [3] and super-resolution [4], demonstrated on gray and color images, and videos [2]. These works make the assumption that the original clean image of a given degraded image admits a sparse representation with respect to some basis. The sparse codes of the clean image are then recovered from the degraded image. This is possible due to recent results from compressed sensing [5], which show that a sparse signal can be efficiently recovered from incomplete or noisy measurements, provided the basis matrix possesses some special properties.
In the sparse coding framework, a given signal or image patch is represented as a sparse linear combination of an overcomplete basis or dictionary. In this paper, we extend its application
(a) Degraded image (b) Restored image
(c) OCR output of (a):
Bank. Well, the brothers,chatting along, Zmpgened to get to wondering What »e be eke fate of a petfecdy honest and intelligent stranger who should be ‘firmed admfit in Lomdun Without a.friend, wimh me meney" but We mifliow beak-note, and me Way fie wecmmnt few We being in m 0~€it- Bmeher
(d) OCR output of (b):
Bank. Well, the brothers, chatting along, happened to get to wondering what might be the fate of a perfectly honest and intelligent stranger who should be turned adrift in London without a friend, and with no money but that million-pound bank-note, and no way to account for his being in possession of it. Brother
Fig. 1. Part of a scanned page of an old book with severe degradation (a) and the restored image (b). Bold words in (c) and (d) indicate differences in Tesseract OCR results. We achieved a significant improvement in error rate, from 14% to 4.1%.
to document image restoration, where the images are essentially binary in nature. Our experiments suggest that developing sparse representations for binary images needs a slightly different approach than for grayscale and color images. We observe that different characters share similar strokes, curves and edges. This allows us to automatically learn a set of features/dictionary elements that represents them efficiently using the training data. We then seek high sparsity for degraded images to reconstruct the text regions while removing noisy artifacts. Figure 1(b) and (d) show the result of the degraded image restored by our proposed method and its effect on OCR performance, respectively. We show an improvement in OCR error rate from 14% to 4.1%.
Restoration of document images is a well studied topic.
There have been many attempts in solving the problem.
Gupta et al. [6] used a patch based alphabet model to remove
blurring artifacts for license plate images using a camera.
Lelore et al. [7] proposed an approach for the binarization
of seriously degraded manuscripts where the MRF model
parameters are estimated from the training set. A patch based
method is proposed in [8] where each patch is corrected
by a weighted average of similar patches, identified using a
modified genetic algorithm. Huang et al. [9] combined the
degradation model and the document model into an MRF
framework.
Banerjee et al. [10] used an MRF technique that creates
an image with smooth regions in both the foreground and
the background, while allowing sharp discontinuities across
and smoothness along the edges. In their follow-up work
[11], they modeled the contextual relationship using an MRF
to restore documents with a wide variety of noises. Such
methods perform well in restoring many severely degraded
documents. However, they have practical limitations due to their heavy computational requirements, which increase with larger context.
We briefly review the details of natural image restoration

Fig. 2. Restoration of a portion of a magazine (top-left) with text and image, for missing pixels and cuts. The region corresponding to the natural image is restored well while the text region is not. Portions zoomed out with red boxes belong to text and black boxes belong to the image. (Best viewed by zooming on a computer)
using sparse representation, followed by the proposed method for document restoration and experimental results.
II. SPARSE CODING FOR IMAGE RESTORATION
In this framework, the task is to recover an image $X \in \mathbb{R}^{M \times N}$ (clean/high resolution) given a degraded (noisy/low resolution/missing values) image $Y \in \mathbb{R}^{M \times N}$. The problem is tackled with the sparsity prior, which assumes that natural image patches can be sparsely represented in an appropriately chosen overcomplete basis and that their sparse representation can be recovered from the noisy patches. Specifically, one assumes that a clean patch $x \in \mathbb{R}^d$ of a clean image $X$ has a sparse representation with respect to an overcomplete basis $D \in \mathbb{R}^{d \times m}$ ($m \gg d$), i.e.,

$$x \approx D\alpha \quad \text{s.t.} \quad \|\alpha\|_0 \leq L, \tag{1}$$

where $\alpha$ is the sparse representation of the image patch, $\|\cdot\|_0$ is the $\ell_0$ pseudo-norm, which gives a measure of the number of non-zero entries in a vector, and the constant $L$ defines the required sparsity level. Finding the sparse solution $\alpha$ is an NP-hard problem. Techniques such as i) greedy methods (matching pursuit [12]) or ii) convex relaxation ($\ell_1$-norm) can be used to solve the above problem. Note that we know neither the clean image patch $x$ nor its representation $\alpha$. However, we can recover the sparse representation $\alpha$ from incomplete or noisy input image patches $y$ of image $Y$, with respect to an overcomplete dictionary $D$, due to recent results from [5]. Thus, the sparse representation of $x$ is recovered from $y$ as

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|y - D\alpha\|_2 \leq \epsilon, \tag{2}$$

where $\epsilon$ is a constant that can be tuned according to the application at hand. For denoising, $\epsilon$ could be tuned proportionally to the noise variance if it is known. As observed in [1][13], learning a dictionary from the images themselves instead of using a generic basis (DCT or wavelet) could improve the restoration performance.
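To make Equation (2) concrete, the following is a minimal NumPy sketch of the greedy route (orthogonal matching pursuit); the function name, the tolerance eps, and the cap on the number of atoms are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def omp(y, D, eps, max_atoms=10):
    """Greedy sparse coding for Eq. (2): find a sparse alpha with
    ||y - D*alpha||_2 <= eps, assuming D (d x m) has unit-norm columns."""
    m = D.shape[1]
    alpha = np.zeros(m)
    support = []
    residual = y.astype(float)
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:          # no new atom helps; stop
            break
        support.append(j)
        # least-squares re-fit of the coefficients on the chosen support
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        alpha[:] = 0.0
        alpha[support] = coef
        residual = y - D @ alpha
    return alpha
```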
The above presented sparse coding framework has proved
to yield very good results in restoring natural images. However,
the application of sparse coding techniques to document restoration is more challenging for the following reasons: (1) Near pixel-accurate restorations are important in document images; errors are immediately visible in binary images as opposed to natural images. (2) Noise in natural images is often uniform and homogeneous, with a variance that is known or can be estimated, but it is difficult to model the noise in document images. (3) Noise in document images usually contains a mixture of degradations coming from independent processes such as erosion, cuts, and bleeds.
We demonstrate the above-mentioned challenges with a simple experiment. We consider a portion of a page from a magazine that contains both text and a photograph. We synthetically painted the image white at randomly selected blocks, as shown in Figure 2. The degradation can be treated as missing pixels (inpainting) for the photograph and as cuts for the text region. We used the sparse coding technique proposed in [3], treating the missing pixels (cuts) as infinite noise, and restored the image after learning a dictionary using a large number of clean text and natural image patches. It can be seen that the regions corresponding to the photograph are restored properly while the text regions are not.
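The "missing pixels as infinite noise" trick of [3] amounts to fitting the sparse code only on the observed pixels and then reconstructing the full patch from the complete dictionary. A hedged sketch, reusing the omp() function from the sketch above (it ignores that the masked atoms are no longer exactly unit norm):

```python
import numpy as np

def code_with_mask(y, mask, D, eps):
    """Inpainting-style coding: fit the sparse code only on observed
    pixels (mask == True), then reconstruct every pixel from D."""
    alpha = omp(y[mask], D[mask, :], eps)  # omp() from the sketch above
    return D @ alpha                       # full-patch reconstruction
```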
III. RESTORATION OF DOCUMENT IMAGES
The most critical challenge in the restoration of document images using sparse coding can be explained with the help of Figure 3. One of the fundamental assumptions in such a representation is that the elements of the dictionary span the subspace of images of interest and that any linear combination of a sparse subset of dictionary elements is indeed a valid image. This clearly does not hold in the case of document images. Document image patches are binary in nature and so are the dictionary elements ($d_i$ in Figure 3), which is not the case with their linear combination. Ideally, a document image patch ($y$) that we would like to represent using a dictionary $D$ should be computed as

$$y = g(D, \alpha), \tag{3}$$
where α is a set of parameters and g is a non-linear function
that maps from the binary document dictionary elements to
a valid binary document image or patch. Current dictionary
learning techniques are not adequate to learn an appropriate
dictionary and parameters under such a non-linear mapping.

An alternative is to use a non-linear function (thresholding is a not-so-good example) over a learned linear mapping to a point $y_0$ in the subspace:

$$y_0 = D\alpha, \quad \text{and} \quad y = f(y_0). \tag{4}$$
Fig. 3. Space of images and the subspace spanned by the basis vectors in the dictionary (shown in solid black). The document images (white circles) are often outside the subspace spanned by the basis vectors.
Here we approximate the ideal non-linear representation function $g$ as $f(y_0)$, where $y_0$ is a linear combination of dictionary elements weighted by $\alpha$, as shown in Figure 3. As seen in Figure 2, the results of such approximations are often very noisy. We overcome this problem by approximating a given noisy image using a highly sparse representation, where the sparsity is specified to be 1. This ensures that the resulting approximation is close to both a binary image and a valid document patch.
Our restoration method is as follows:
1) Learn a set of representative basis elements that summarizes a given set of clean image patches.
2) Find the sparse representation of each degraded patch over the learned basis and binarize the output.
A. Dictionary Learning
The dictionary learning starts with a set of clean patches extracted from the segmented words. Each word image of size $m \times n$ is split into patches of size $d = p \times q$, resulting in $P$ patches, and each patch is represented as a vector in $\mathbb{R}^d$. For basis learning, we use a method similar to the K-SVD algorithm presented in [1]. We learn the basis $D \in \mathbb{R}^{d \times k}$ such that each patch is represented by a single basis element, as shown in Equation (5). The single non-zero constraint on the coefficients simplifies the K-SVD algorithm to the K-means algorithm, with the additional constraint that the basis elements are normalized.
$$\{\hat{D}, \hat{\alpha}_i\} = \arg\min_{D, \alpha_i} \sum_{i=1}^{P} \|x_i - D\alpha_i\|^2 \tag{5}$$
$$\text{s.t.} \quad \|\alpha_i\|_0 = 1, \; i \in \{1, \ldots, P\}, \quad \text{and} \quad \|D_j\|_2 = 1, \; j \in \{1, \ldots, k\}$$
The above equation is optimized in an iterative fashion, minimizing the objective function over $D$ and $\alpha_i$, similar to the algorithm presented in [1]. When $D$ is fixed, $\alpha_i \in \mathbb{R}^k$ is given by $\alpha_i^j = D_j^T x_i$ for $j = l$, where $l = \arg\max_l D_l^T x_i$, and $\alpha_i^j = 0$ for $j \neq l$.
Fig. 4. (a) Portion of a document page. (b) Characters share common strokes, curves, tips, etc.; boxes shown in blue share common tips and boxes shown in red share a similar curve. (c) The learned dictionary/features capture the characteristic information of the text.
This results in the selection of the basis element with maximum correlation to the given signal as its representation. Then, each column of $D$ is updated using the SVD while fixing the other columns, similar to the K-SVD algorithm.
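A compact sketch of this learning loop under the single-atom constraint of Equation (5): assign each patch to its most correlated atom, then update each atom as the leading singular vector of its assigned patches. The initialisation from random patches is our assumption for illustration; the 200-iteration default follows the empirical setting mentioned in Section IV.

```python
import numpy as np

def learn_dictionary(X, k, iters=200, seed=0):
    """Sparsity-1 dictionary learning (Eq. 5). X is d x P, one clean
    (inverted, text = 1) patch per column; returns D with unit-norm atoms."""
    rng = np.random.default_rng(seed)
    d, P = X.shape
    # initialise atoms from random patches, normalised to unit length
    D = X[:, rng.choice(P, size=k, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(iters):
        # sparse coding step: l = argmax_l D_l^T x_i for every patch
        labels = np.argmax(D.T @ X, axis=0)
        # dictionary update: best rank-1 fit (leading singular vector)
        # of the patches assigned to each atom, as in K-SVD
        for j in range(k):
            members = X[:, labels == j].astype(float)
            if members.shape[1] == 0:
                continue                      # unused atom, leave as-is
            u = np.linalg.svd(members, full_matrices=False)[0][:, 0]
            if u @ members.sum(axis=1) < 0:   # fix SVD sign ambiguity
                u = -u
            D[:, j] = u
    return D
```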
Figure 4 shows a subset of the basis elements learned from a set of patches extracted from word images. Document images are usually binary in nature, with 0's and 1's corresponding to text and background regions respectively. Since the region of interest is text, we operate on the inverted images to allow the regular conventions of natural image representation. Basis elements learned for document images can be easily interpreted, unlike those for natural images. The fundamental elements that constitute documents are strokes, curves, glyphs, etc., and our method automatically learns these elements. It can be seen in Figure 4 that the dictionary elements correspond to character strokes, thick edges, curves, etc. occurring in textual characters (Figure 4(c)), thereby representing them efficiently. Different English characters possess similar kinds of edges, strokes, or curves, and such patches may share the same dictionary element.
B. Sparse coding
Once the basis is learnt from a set of clean patches, any degraded patch $y_i \in \mathbb{R}^d$ of a noisy image $Y \in \mathbb{R}^{M \times N}$ can be decomposed sparsely over the basis and reconstructed as per Equation (5). In order to avoid blocky artifacts in the reconstructed image, we use overlapping patches for restoration, and the final reconstructed image is obtained by averaging at the overlapped regions.
The reconstructed image might be grayish with small noisy artifacts. Regions corresponding to text will have large pixel values, as they are efficiently reconstructed, while noisy regions will have small values. We thus use a simple post-processing step that binarizes the grayscale image to remove some of the noisy stray pixels. We found that the threshold parameter did not significantly affect the quality of the outputs; it is fixed to 0.3 in all our experiments.
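Putting the coding, overlap averaging, and the 0.3 binarisation together, a minimal sketch; the patch size of 15 matches the 15 × 15 patches mentioned in Section IV, while the stride and function names are our assumptions.

```python
import numpy as np

def restore(Y, D, patch=15, step=3, thresh=0.3):
    """Restore an inverted (text = 1) degraded image Y using a learned
    dictionary D (d x k, unit-norm atoms): sparsity-1 coding of
    overlapping patches, averaging over overlaps, then binarisation."""
    H, W = Y.shape
    acc = np.zeros((H, W))        # sum of reconstructed patch values
    cnt = np.zeros((H, W))        # number of patches covering each pixel
    for i in range(0, H - patch + 1, step):
        for j in range(0, W - patch + 1, step):
            y = Y[i:i + patch, j:j + patch].reshape(-1).astype(float)
            corr = D.T @ y
            l = int(np.argmax(corr))            # single best atom
            x_hat = corr[l] * D[:, l]           # alpha_i^l = D_l^T y
            acc[i:i + patch, j:j + patch] += x_hat.reshape(patch, patch)
            cnt[i:i + patch, j:j + patch] += 1.0
    gray = acc / np.maximum(cnt, 1.0)           # average the overlaps
    return (gray > thresh).astype(np.uint8)     # fixed 0.3 threshold
```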
IV. EXPERIMENTS AND RESULTS
Restoration using the proposed method was carried out on a variety of document images with different levels of degradation. We assume that clean document images with a font similar to the one used in the degraded images are available. We also note that our method is robust to slight variations in font between training and testing, as will be demonstrated later. This setting is very well suited to restoring documents from digital libraries, magazines, etc. In such cases, the fonts and texts are constant throughout a book, and any recent issue of a magazine can be used as high-quality training documents. Also, with the advent of the internet, one can easily obtain clean documents in any font; e.g., a simple search for ‘gothic text’ will return plenty of high-quality documents that can be used to restore gothic texts. For all the experiments, we segment the image into degraded words and carry out restoration of the individual words.
For the learning step, we collected clean documents from a high-quality book that has a font similar to that of the degraded images. Figure 4(a) shows a small region of the clean images collected from a high-quality book. The number of sparse coding and basis learning iterations was fixed empirically to 200. In order to maintain overcompleteness and recover sparse representations [5], the size of the dictionary is usually fixed to four times the size of the patch. It is observed in [1], [13], [3] that a very large dictionary leads to overfitting, i.e., learnt atoms may correspond to individual patches instead of generalizing over a large number of patches, while a very small dictionary leads to underfitting. Figure 4(b) shows basis elements learned from clean images for a patch size of 15 × 15.
Figure 5 shows eight different words from the book con-
taining cuts, erosion artifacts, and ink bleed, along with our
restoration results. One kind of degradation that we notice is
smear and ink blobs, as seen in words golf, fascinating, to
catch and laboratory. Our algorithm is able to restore these
words very well, especially the word fascinating which is
heavily degraded with characters almost getting connected.
Another kind of degradation is fading resulting in near cuts
as seen in character a in word sanguinary, which is restored
with high resolution. Our algorithm takes about 12 seconds to restore a document of size 157 × 663 on a system with 2 GB RAM and an Intel(R) Core(TM) i3-2120 processor at 3.30 GHz, with an un-optimized implementation.
The algorithm, however, fails to restore the characters v and e in the word several. The cut in v is very large compared to the size of the patch, and the horizontal region in e has a lot of missing pixels; any patch considered in that region is blank, and hence the algorithm could not estimate the shape in these regions. Similarly, the character y in surely has a large amount of bleed, which the algorithm failed to restore.
We note that the patch size considered for restoration has a clear effect on the quality of restoration. If the patch size is too small and comparable with the size of artifacts such as blobs and cuts, the algorithm will restore the noisy region as well. If the patch size is large, the dictionary elements may overfit the training data, reducing the range of degraded images that can be restored. We fixed the size of the patches to one-third of the character font size.
We evaluate our algorithm both qualitatively and quantita-
tively on various kinds of synthetically generated degradations
such as pixel flipping, blurring, cuts, and texture-blending.
An example of each type of degradation and the corresponding restored
TABLE I. PSNR (dB) RESULTS (INPUT / OUTPUT) OF RESTORATION FOR VARIOUS SYNTHETIC DEGRADATIONS.

Flips        | Blur        | Cuts        | Texture blending
6.61 / 6.7   | 5.1 / 6.9   | 5.56 / 6.69 | 4.06 / 6.73
6.75 / 6.82  | 5.9 / 7.32  | 6.18 / 8.28 | 4.47 / 6.77
6.77 / 6.85  | 6.15 / 7.60 | 6.96 / 9.01 | 4.56 / 6.82
6.78 / 6.91  | 7.05 / 8.40 | 7.82 / 9.32 | 4.66 / 6.85
outputs are shown in Figure 7. Flipping is generated using the method proposed in [14] for various PSNR values, by tuning the parameters $\alpha_0$, $\beta_0$, $\alpha_1$, $\beta_1$. Blurring is produced by convolving the image with a Gaussian kernel of various sizes. Various levels of cuts are produced by randomly selecting windows in the image and randomly flipping a few pixels in each window. Finally, texture-blending simulates effects such as textured or stained paper, and was produced by linearly blending the document with a texture image for various degrees of blending. Table I shows the input and output PSNR for different kinds of degradations with various levels of noise. We can see a clear improvement in the PSNR values for the various degradations.
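For illustration, a hedged sketch of the "cuts" degradation (random windows with a few flipped pixels) and of the PSNR reported in Table I; the window count, window size, and flip fraction are assumed parameters, not values from the paper.

```python
import numpy as np

def add_cuts(img, n_windows=20, win=9, flip_frac=0.5, seed=0):
    """Synthetic cuts: randomly select windows and flip a few pixels in
    each. img is a binary uint8 image with values {0, 1}."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    H, W = img.shape
    for _ in range(n_windows):
        i = int(rng.integers(0, H - win))
        j = int(rng.integers(0, W - win))
        mask = rng.random((win, win)) < flip_frac
        out[i:i + win, j:j + win][mask] ^= 1    # flip selected pixels
    return out

def psnr(clean, noisy):
    """PSNR in dB for images scaled to [0, 1], as used in Table I."""
    mse = np.mean((clean.astype(float) - noisy.astype(float)) ** 2)
    return 10.0 * np.log10(1.0 / max(mse, 1e-12))
```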
We now look at the effect of our document restoration on OCR recognition, which gives a good measure of the quality of restoration. We used ABBYY FineReader [15] and Tesseract-2.01 [16], which are among the most popular and accurate OCRs available. We ran the OCRs on 20 pages of an old English book collected from a digital library. Each page of the book contains an average of 300 words and 2200 characters. The error rate measured on the degraded documents using ABBYY FineReader was 9%, which was already very good. However, after restoration, it was further reduced to 0.7%, a significant improvement. Similarly, the error rate was reduced from 14% to 4.1% using Tesseract.
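The paper does not spell out how the error rates are computed; a common choice, sketched here as an assumption, is the character error rate: the edit distance between the OCR output and the ground truth, normalised by the reference length.

```python
def char_error_rate(truth, pred):
    """Character error rate = Levenshtein edit distance / reference length."""
    n, m = len(truth), len(pred)
    dp = list(range(m + 1))            # dp[j] = distance(truth[:i], pred[:j])
    for i in range(1, n + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, m + 1):
            cur = min(dp[j] + 1,        # delete from truth
                      dp[j - 1] + 1,    # insert into truth
                      prev + (truth[i - 1] != pred[j - 1]))  # substitute
            prev, dp[j] = dp[j], cur
    return dp[m] / max(n, 1)
```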
Figure 1 shows the restoration result for a region of a degraded page collected from a digital library. Figures 1(c) and (d) show the Tesseract OCR output before and after restoration, respectively. The recognition error on the degraded page was due to erosion and low printing quality, which can confuse the OCR when the noise fills up the gap between two characters in a word. However, after restoration (Figure 1(b)), it is recognized with high accuracy.
Figure 6 shows the restored image for the word played using popular methods such as median filtering, Gaussian filtering, non-local means, and ours. Clearly, our result is superior in quality to these methods. Our method does not make any assumption about the script, and thus the same approach can be applied to restore documents in any script. However, this is beyond the scope of this paper.
V. CONCLUSION
We present an approach to document restoration that uses the fact that different characters in a document share similar strokes, curves, edges, etc. We extend sparse coding based restoration to document images and learn a set of dictionary elements that gives a highly sparse decomposition for image patches. We restored severe degradations, including cuts, merges, blobs and erosions in documents, and showed experimental results on both positive and negative cases. We also demonstrated the improvement in the recognition performance of an OCR system. Though we demonstrated the application of

Fig. 5. Restoration of various degraded words. Our algorithm can effectively restore pixel flips, background noise and ink blots (first eight words), while large
blobs and cuts that are similar in size to the dictionary patches are not restored (see the last two words).
Fig. 7. Different kinds of synthetic degradations: (a) pixel flipping, (b) blurring, (c) cuts, (d) texture blending. In each column, the top image shows the degraded image and the bottom one shows the corresponding restored image. (Best viewed by zooming on a computer)
Fig. 6. Comparison with other methods. (a) Cropped word “played” from a degraded document. Output of (b) median filter, (c) Gaussian filter, (d) non-local means, and (e) ours. We observe that our restoration technique produces a cleaner image compared to the traditional filtering techniques as well as non-local means filtering.
sparse coding on challenging document restoration, there is room for improvement. Unlike natural images, binary images take only a few intensity values and are structured. We would like to work on this aspect, along with theoretical guarantees of sparse coding on document images, as part of our future work.
ACKNOWLEDGMENT
This work is partly supported by the MCIT, New Delhi. Vijay Kumar is supported by a TCS Research PhD fellowship. Anand Mishra is supported by a Microsoft Research India PhD fellowship 2012 award.
REFERENCES
[1] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., 2006.
[2] J. Mairal, G. Sapiro, and M. Elad, “Learning multiscale sparse representations for image and video restoration,” Multiscale Modeling and Simulation, 2008.
[3] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Trans. Image Process., 2008.
[4] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., 2010.
[5] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, 2006.
[6] M. D. Gupta, S. Rajaram, N. Petrovic, and T. S. Huang, “Restoration and recognition in a loop,” in CVPR, 2005.
[7] T. Lelore and F. Bouchara, “Document image binarisation using Markov field model,” in ICDAR, 2009.
[8] R. F. Moghaddam and M. Cheriet, “Beyond pixels and regions: A non-local patch means (NLPM) method for content-level restoration, enhancement, and reconstruction of degraded document images,” Pattern Recognition, 2011.
[9] Y. Huang, M. S. Brown, and D. Xu, “A framework for reducing ink-bleed in old documents,” in CVPR, 2008.
[10] J. Banerjee and C. V. Jawahar, “Super-resolution of text images using edge-directed tangent field,” in DAS, 2008.
[11] J. Banerjee, A. M. Namboodiri, and C. V. Jawahar, “Contextual restoration of severely degraded document images,” in CVPR, 2009.
[12] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process., vol. 41, no. 12, 1993.
[13] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., 2006.
[14] T. Kanungo, R. M. Haralick, H. S. Baird, W. Stuetzle, and D. Madigan, “A statistical, nonparametric methodology for document degradation model validation,” IEEE Trans. Pattern Anal. Mach. Intell., 2000.
[15] ABBYY FineReader, http://www.abbyy.com/.
[16] Tesseract OCR, http://code.google.com/p/tesseract-ocr/.
Citations
Book ChapterDOI
10 Jul 2019
TL;DR: A simple, deep learning based model, that uses convolution with transposed convolution and sub-pixel layers in the best possible way to construct the high-resolution image and shows significant improvement in terms of the subjective criterion of human readability and objective criterion of OCR character level accuracy.
Abstract: Given a low-resolution binary document image, we aim to improve its perceptual quality for enhanced readability. We have proposed a simple, deep learning based model, that uses convolution with transposed convolution and sub-pixel layers in the best possible way to construct the high-resolution image. The proposed architecture scales across the three different scripts tested, namely Tamil, Kannada and Roman. To show that the reconstructed output has enhanced readability, we have used the objective criterion of optical character recognizer (OCR) character level accuracy. The reported results by our CTCS architecture shows significant improvement in terms of the subjective criterion of human readability and objective criterion of OCR character level accuracy.

4 citations

Proceedings ArticleDOI
01 Dec 2018
TL;DR: Two denoising algorithms that are able to remove noise from Arabic documents are introduced, based on sparse representations over a learned dictionary and Denoising auto-encoders, which have provided best state of the art results for natural imagesDenoising.
Abstract: Document denoising is one of the most challenging tasks in any optical character recognition system, especially when the noise type is different from white noise. Noise types are wide and, hence, an effective denoising algorithm should be able to deal with different noise types. This paper introduces two denoising algorithms that are able to remove noise from Arabic documents. Our approaches are based on sparse representations over a learned dictionary and denoising auto-encoders, which have provided best state of the art results for natural images denoising. The experiments show that those two algorithms are promising in document denoising, as they provide the ability of learning a prior knowledge of clean character models to use them in the denoising process.

4 citations


Cites methods from "Sparse Document Image Coding for Re..."

  • ...The work in [7] used the idea of sparse redundant representation in document restoration....


Journal ArticleDOI
29 Sep 2016-Symmetry
TL;DR: In this paper, the optimized scale invariant feature transform (SIFT) algorithm was used for the registration of continuous frames, and finally the image was reconstructed under the improved POCS theoretical framework and results showed that the algorithm can significantly smooth the noise and eliminate noise caused by the shadows of the lines.
Abstract: In order to address the problem of the uncertainty of existing noise models and of the complexity and changeability of the edges and textures of low-resolution document images, this paper presents a projection onto convex sets (POCS) algorithm based on text features. The current method preserves the edge details and smooths the noise in text images by adding text features as constraints to original POCS algorithms and converting the fixed threshold to an adaptive one. In this paper, the optimized scale invariant feature transform (SIFT) algorithm was used for the registration of continuous frames, and finally the image was reconstructed under the improved POCS theoretical framework. Experimental results showed that the algorithm can significantly smooth the noise and eliminate noise caused by the shadows of the lines. The lines of the reconstructed text are smoother and the stroke contours of the reconstructed text are clearer, and this largely eliminates the text edge vibration to enhance the resolution of the document image text.

2 citations


Cites result from "Sparse Document Image Coding for Re..."

  • ..., on the basis of sparse representation, pointed out that, although the shapes of the characters were not consistent, their edges and stroke curves were similar [6]....


Book ChapterDOI
19 Mar 2018
TL;DR: The paper presents the results of the tests with ancient manuscripts in Polish, Latin, and English languages to detect the most frequent causes of errors during the digitization of ancient manuscripts and to suggest ways to improve the digitized process ofAncient manuscripts.
Abstract: Ancient manuscripts are extremely important sources of information on our history, past culture, and science. To make these invaluable documents easily accessible in the Web they must be digitized and then stored in electronic collections of archival documents. The automatic indexing and retrieval processes of such ancient digitized documents are more efficient if the contents of manuscripts are converted into editable text forms. Unfortunately, contemporary methods of digitization and character recognition are not sufficient enough. During optical character recognition process, low quality of ancient manuscripts and on the other hand not enough advanced software result in numerous errors in output texts. The paper presents the results of the tests with ancient manuscripts in Polish, Latin, and English languages. These experiments have allowed us to detect the most frequent causes of errors during the digitization of ancient manuscripts and to suggest ways to improve the digitization process of ancient manuscripts.
Journal ArticleDOI
TL;DR: This research focuses on removing a maximum number of degradation factors from a natural scene image containing text such that the detection and recognition of the text present in that image becomes very easy.
Abstract: The task of text detection natural scene images is very challenging due to the complex background and unpredictable text appearances in the image. Apart from the background and the structure of the text, unpredictability also lies in the image capturing quality. These issues include noise, orientation, low exposure, blurring, and other kinds of degradations. It is therefore necessary to first restore the target text in the image in order to ensure robust text detection and recognition. This research focuses on removing a maximum number of degradation factors from a natural scene image containing text such that the detection and recognition of the text present in that image becomes very easy. Text Specific Dictionaries will be used in order to restore the text in the images. The sparse representation method is selected with an aim to apply techniques such as denoising, deblurring, sharpening and implementing other forms of enhancement in a single text image restoration system. General Terms Image restoration, Image enhancement, Text Detection, Text recognition, Sparse representations, Dictionaries.

Cites background from "Sparse Document Image Coding for Re..."

  • ...[26] exploited the property of text that there are similar strokes, curves and edges for different characters....


References
Book
D. L. Donoho
01 Jan 2004
TL;DR: It is possible to design n = O(N log(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients, and a good approximation to those N important coefficients is extracted from the n measurements by solving a linear program (Basis Pursuit in signal processing).
Abstract: Suppose x is an unknown vector in R^m (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only n = O(m^{1/4} log^{5/2}(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an ℓ^p ball for 0 < p ≤ 1.

18,609 citations

Journal ArticleDOI
TL;DR: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions, chosen in order to best match the signal structures.
Abstract: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions. These waveforms are chosen in order to best match the signal structures. Matching pursuits are general procedures to compute adaptive signal representations. With a dictionary of Gabor functions, a matching pursuit defines an adaptive time-frequency transform. They derive a signal energy distribution in the time-frequency plane, which does not include interference terms, unlike Wigner and Cohen class distributions. A matching pursuit isolates the signal structures that are coherent with respect to a given dictionary. An application to pattern extraction from noisy signals is described. They compare a matching pursuit decomposition with a signal expansion over an optimized wavepacket orthonormal basis, selected with the algorithm of Coifman and Wickerhauser (see IEEE Trans. Informat. Theory, vol. 38, Mar. 1992).

9,380 citations


"Sparse Document Image Coding for Re..." refers methods in this paper

  • ...The techniques such as i) greedy methods (matching pursuit [12]) or ii) convex relaxation (l1-norm) can be used to solve the above problem....


Journal ArticleDOI
TL;DR: A novel algorithm for adapting dictionaries in order to achieve sparse signal representations, the K-SVD algorithm, an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data.
Abstract: In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method-the K-SVD algorithm-generalizing the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data

8,905 citations


"Sparse Document Image Coding for Re..." refers background or methods in this paper

  • ...The above equation is optimized in an iterative fashion minimizing the objective function over D and αi, similar to the algorithm presented in [1]....

  • ...Recently, sparse representation has been shown to yield state-of-the-art results in solving inverse problems such as denoising [1][2], inpainting [3] and super-resolution [4], demonstrated on gray and color images, and videos [2]....

  • ...It is observed in [1], [13], [3] that very large dictionary leads to overfitting i....

  • ...For basis learning, we use a method similar to the K-SVD algorithm presented in [1]....

  • ...As observed in [1][13], learning a dictionary from the images itself instead of a generic basis (DCT or wavelet) could improve the restoration performance....

Journal ArticleDOI
TL;DR: This work addresses the image denoising problem, where zero-mean white and homogeneous Gaussian additive noise is to be removed from a given image, and uses the K-SVD algorithm to obtain a dictionary that describes the image content effectively.
Abstract: We address the image denoising problem, where zero-mean white and homogeneous Gaussian additive noise is to be removed from a given image. The approach taken is based on sparse and redundant representations over trained dictionaries. Using the K-SVD algorithm, we obtain a dictionary that describes the image content effectively. Two training options are considered: using the corrupted image itself, or training on a corpus of high-quality image database. Since the K-SVD is limited in handling small image patches, we extend its deployment to arbitrary image sizes by defining a global image prior that forces sparsity over patches in every location in the image. We show how such Bayesian treatment leads to a simple and effective denoising algorithm. This leads to a state-of-the-art denoising performance, equivalent and sometimes surpassing recently published leading alternative denoising methods

5,493 citations


"Sparse Document Image Coding for Re..." refers background in this paper

  • ...It is observed in [1], [13], [3] that very large dictionary leads to overfitting i....

  • ...As observed in [1][13], learning a dictionary from the images itself instead of a generic basis (DCT or wavelet) could improve the restoration performance....

Journal ArticleDOI
TL;DR: This paper presents a new approach to single-image superresolution, based upon sparse signal representation, which generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods.
Abstract: This paper presents a new approach to single-image superresolution, based upon sparse signal representation. Research on image statistics suggests that image patches can be well-represented as a sparse linear combination of elements from an appropriately chosen over-complete dictionary. Inspired by this observation, we seek a sparse representation for each patch of the low-resolution input, and then use the coefficients of this representation to generate the high-resolution output. Theoretical results from compressed sensing suggest that under mild conditions, the sparse representation can be correctly recovered from the downsampled signals. By jointly training two dictionaries for the low- and high-resolution image patches, we can enforce the similarity of sparse representations between the low-resolution and high-resolution image patch pair with respect to their own dictionaries. Therefore, the sparse representation of a low-resolution image patch can be applied with the high-resolution image patch dictionary to generate a high-resolution image patch. The learned dictionary pair is a more compact representation of the patch pairs, compared to previous approaches, which simply sample a large amount of image patch pairs , reducing the computational cost substantially. The effectiveness of such a sparsity prior is demonstrated for both general image super-resolution (SR) and the special case of face hallucination. In both cases, our algorithm generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods. In addition, the local sparse modeling of our approach is naturally robust to noise, and therefore the proposed algorithm can handle SR with noisy inputs in a more unified framework.

4,958 citations


"Sparse Document Image Coding for Re..." refers background in this paper

  • ...Recently, sparse representation has been shown to yield state-of-the-art results in solving inverse problems such as denoising [1][2], inpainting [3] and super-resolution [4], demonstrated on gray and color images, and videos [2]....


Frequently Asked Questions
Q1. What are the future works mentioned in the paper "Sparse document image coding for restoration" ?

The authors would like to work on this aspect along with theoretical guarantees of sparse coding on document images as a part of their future work. 
