
Sparse Document Image Coding for Restoration

TL;DR: This paper leverages the fact that different characters possess similar strokes, curves, and edges, and learns a dictionary that gives a sparse decomposition for patches; the method is well suited for restoring highly degraded images in repositories such as digital libraries.

Summary (2 min read)

I. INTRODUCTION

  • Recent years have seen a surge of interest in digitizing old documents and books, both to preserve them for posterity and for their potential applications in information extraction, retrieval, etc.
  • Unfortunately, many of these old documents and manuscripts are degraded due to erosion, aging, the printing process, ink blots and fading.
  • These works assume that the original clean image underlying a given degraded image admits a sparse representation with respect to some basis.
  • The authors then seek high sparsity for degraded images to reconstruct the text regions, removing noisy artifacts in documents.
  • There have been many attempts at solving the problem.

II. SPARSE CODING FOR IMAGE RESTORATION

  • The problem is tackled with the sparsity prior, which assumes that natural image patches can be sparsely represented in an appropriately chosen overcomplete basis and that their sparse representation can be recovered from the noisy patches.
  • The sparse coding framework presented above has proved to yield very good results in restoring natural images.
  • The authors demonstrate the above-mentioned challenges with a simple experiment.
  • The authors consider a portion of a magazine page that contains both text and a photograph.
  • The degradation can be treated as missing pixels for the photograph and as cuts for the text region.

III. RESTORATION OF DOCUMENT IMAGES

  • The most critical challenge in restoring document images using sparse coding can be explained with the help of Figure 3.
  • This clearly does not hold in the case of document images.
  • Current dictionary learning techniques are not adequate to learn an appropriate dictionary and parameters under such a non-linear mapping.
  • This ensures that the resulting approximation is close to binary and to a valid document patch.
  • The authors' restoration method is as follows: 1) learn a set of representative basis elements that summarizes a given set of clean image patches; 2) find the sparse representation of each degraded patch over the learned basis and binarize the output.

A. Dictionary Learning

  • Dictionary learning starts with a set of clean patches extracted from the segmented words.
  • The single non-zero coefficient constraint simplifies the K-SVD algorithm to the K-means algorithm, albeit with the constraint that basis elements are normalized.

B. Sparse coding

  • In order to avoid blocky artifacts in the reconstructed image, the authors use overlapping patches for restoration, and the final reconstructed image is obtained by averaging at the overlapped regions.
  • The reconstructed image might be slightly grayish with a few noisy artifacts.
  • Regions corresponding to text will have large pixel values, as they are reconstructed efficiently, while noisy regions will have small values.
  • The authors thus use a simple post-processing step that binarizes the grayscale image to remove some of the noisy stray pixels.
  • The authors found that the threshold parameter did not affect the quality of the outputs much, and it is fixed to 0.3 in all their experiments.

IV. EXPERIMENTS AND RESULTS

  • Restoration using the proposed method was carried out on a variety of document images with different levels of degradation.
  • For the learning step, the authors collected clean documents from a high-quality book with a font similar to that of the degraded images.
  • If the patch size is large, the dictionary elements may overfit the training data, reducing the range of degraded images that can be restored.
  • Table I shows the input and output PSNR for different kinds of degradations at various noise levels.
  • The authors then examine the effect of their document restoration on OCR recognition, which gives a good measure of the quality of restoration.

V. CONCLUSION

  • The authors present an approach to document restoration that uses the fact that different characters in a document share similar strokes, curves, edges, etc.
  • The authors extend sparse coding based restoration to document images and learn a set of dictionary elements that gives a highly sparse decomposition for image patches.
  • The authors restored severe degradations, including cuts, merges, blobs and erosion in documents, and showed experimental results on both positive and negative cases.
  • The authors also demonstrated the improvement in the recognition performance of an OCR system.
  • Unlike natural images, binary images take only a few intensity values and are structured.


Sparse Document Image Coding for Restoration
Vijay Kumar, Amit Bansal, Goutam Hari Tulsiyan, Anand Mishra, Anoop Namboodiri and C. V. Jawahar
Center for Visual Information Technology, IIIT Hyderabad, India
Abstract—Sparse representation based image restoration techniques have been shown to be successful in solving various inverse problems, such as denoising, inpainting, and super-resolution, on natural images and videos. In this paper, we explore the use of sparse representation based methods specifically to restore degraded document images. While natural images form a very small subset of all possible images, admitting the possibility of sparse representation, document images are significantly more restricted and are expected to be ideally suited to such a representation. However, the binary nature of textual document images makes dictionary learning and coding techniques unsuitable for direct application. We leverage the fact that different characters possess similar strokes, curves, and edges, and learn a dictionary that gives a sparse decomposition for patches. Experimental results show significant improvement in image quality and OCR performance on documents collected from a variety of sources such as magazines and books. This method is therefore ideally suited for restoring highly degraded images in repositories such as digital libraries.
Keywords—Document restoration, Sparse representation, Dictionary learning
I. INTRODUCTION
Recent years have seen a surge of interest in digitizing old documents and books, both to preserve them for posterity and for their potential applications in information extraction, retrieval, etc. Unfortunately, many of these old documents and manuscripts are degraded due to erosion, aging, the printing process, ink blots and fading. One such degraded image is shown in Figure 1(a). Apart from the cuts and bleeds shown in this example, other types of degradation occur frequently in documents. Restoration may be used as a pre-processing step in applications related to recognition and retrieval. Figure 1(c) shows the OCR output for Figure 1(a), which is severely affected by the low quality of the document. Clearly, it is necessary to remove these noisy artifacts and restore the degraded document close to its original form.
Recently, sparse representation has been shown to yield state-of-the-art results in solving inverse problems such as denoising [1][2], inpainting [3] and super-resolution [4], demonstrated on gray and color images and videos [2]. These works assume that the original clean image underlying a given degraded image admits a sparse representation with respect to some basis. The sparse codes of the clean image are then recovered from the degraded image. This is possible due to recent results from compressed sensing [5] showing that a sparse signal can be efficiently recovered from incomplete or noisy measurements, provided the basis matrix possesses certain special properties.
In the sparse coding framework, a given signal or image patch is represented as a sparse linear combination of atoms from an overcomplete basis or dictionary. In this paper, we extend its application to document image restoration, where the images are essentially binary in nature.
(a) Degraded image (b) Restored image
Bank. Well, the brothers,chatting along,
Zmpgened to get to wondering What
»e be eke fate of a petfecdy honest
and intelligent stranger who should be
‘firmed admfit in Lomdun Without a.friend,
wimh me meney" but We mifliow
beak-note, and me Way fie wecmmnt
few We being in m 0~€it- Bmeher
(c) OCR output of (a)
Bank. Well, the brothers, chatting along,
happened to get to wondering what
might be the fate of a perfectly honest
and intelligent stranger who should be
turned adrift in London without a friend,
and with no money but that million-
pound bank-note, and no way to account
for his being in possession of it. Brother
(d) OCR output of (b)
Fig. 1. Part of a scanned page of an old book with severe degradation (a) and the restored image (b). Bold words in (c) and (d) indicate differences in Tesseract OCR results. We achieve a significant improvement in error rate, from 14% to 4.1%.
Our experiments suggest that developing sparse representations for binary images needs a slightly different approach than for grayscale and color images. We observe that different characters share similar strokes, curves and edges. This allows us to automatically learn, from the training data, a set of features (a dictionary) that represents them efficiently. We then seek high sparsity for degraded images to reconstruct the text regions, removing noisy artifacts in documents. Figures 1(b) and (d) show the result of the degraded image restored by our proposed method and its effect on OCR performance, respectively. We show an improvement in OCR error rate from 14% to 4.1%.
Restoration of document images is a well-studied topic, and there have been many attempts at solving the problem. Gupta et al. [6] used a patch-based alphabet model to remove blurring artifacts from license plate images captured by a camera. Lelore et al. [7] proposed an approach for the binarization of seriously degraded manuscripts in which the MRF model parameters are estimated from a training set. A patch-based method is proposed in [8], where each patch is corrected by a weighted average of similar patches identified using a modified genetic algorithm. Huang et al. [9] combined the degradation model and the document model into an MRF framework.
Banerjee et al. [10] used an MRF technique that creates an image with smooth regions in both the foreground and the background, while allowing sharp discontinuities across edges and smoothness along them. In their follow-up work [11], they modeled the contextual relationship using an MRF to restore documents with a wide variety of noise. Such methods perform well in restoring many severely degraded documents. However, they have practical limitations due to their heavy computational requirements, which increase with larger context.
We briefly review natural image restoration using sparse representation, followed by the proposed method for document restoration and experimental results.

Fig. 2. Restoration of a portion of a magazine (top-left) with text and an image, degraded with missing pixels and cuts. The region corresponding to the natural image is restored well while the text region is not. Portions zoomed out with red boxes belong to text and black boxes to the image. (Best viewed by zooming on a computer)
II. SPARSE CODING FOR IMAGE RESTORATION
In this framework, the task is to recover an image X (clean / high resolution) ∈ R^{M×N} given a degraded (noisy / low resolution / missing values) image Y ∈ R^{M×N}. The problem is tackled with the sparsity prior, which assumes that natural image patches can be sparsely represented in an appropriately chosen overcomplete basis, and that their sparse representations can be recovered from the noisy patches. Specifically, one assumes that a clean patch x ∈ R^d of a clean image X has a sparse representation with respect to an overcomplete basis D ∈ R^{d×m} (m ≫ d), i.e.,

    x ≈ Dα   s.t.   ‖α‖_0 ≤ L,   (1)

where α is the sparse representation of the image patch, ‖·‖_0 is the l_0 pseudo-norm, which counts the number of non-zero entries in a vector, and the constant L defines the required sparsity level. Finding the sparse solution α is an NP-hard problem. Techniques such as (i) greedy methods (matching pursuit [12]) or (ii) convex relaxation (the l_1 norm) can be used to solve the above problem. Note that we know neither the clean image patch x nor its representation α. However, due to recent results from [5], we can recover the sparse representation α from incomplete or noisy input patches y of image Y with respect to an overcomplete dictionary D. Thus, the sparse representation of x is recovered from y as

    α̂ = argmin_α ‖α‖_0   s.t.   ‖y − Dα‖_2 ≤ ε,   (2)

where ε is a constant that can be tuned according to the application at hand. For denoising, ε can be set proportional to the noise variance if it is known. As observed in [1][13], learning a dictionary from the images themselves, instead of using a generic basis (DCT or wavelet), can improve the restoration performance.
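As a concrete illustration of how Equation (2) can be solved greedily, the following is a minimal sketch in the orthogonal matching pursuit style mentioned above. It is not the authors' implementation; the function name, stopping rule, and the assumption of a column-normalized dictionary D of shape (d, m) are ours.

    import numpy as np

    def omp(D, y, L, eps):
        # Greedy sketch of Eq. (2): minimize ||alpha||_0 subject to
        # ||y - D alpha||_2 <= eps, using at most L atoms.
        d, m = D.shape
        alpha = np.zeros(m)
        residual = y.astype(float).copy()
        support = []
        while len(support) < L and np.linalg.norm(residual) > eps:
            # pick the atom most correlated with the current residual
            j = int(np.argmax(np.abs(D.T @ residual)))
            if j in support:
                break  # no new atom reduces the residual further
            support.append(j)
            # least-squares refit of the coefficients on the support
            coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
            alpha[:] = 0.0
            alpha[support] = coef
            residual = y - D[:, support] @ coef
        return alpha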
The sparse coding framework presented above has proved to yield very good results in restoring natural images. However, applying sparse coding techniques to document restoration is more challenging for the following reasons: (1) Near pixel-accurate restorations are important in document images; errors are immediately visible in binary images, as opposed to natural images. (2) Noise in natural images is often uniform and homogeneous, with variance that is known or can be estimated, but it is difficult to model the noise in document images. (3) Noise in document images usually contains a mixture of degradations coming from independent processes such as erosion, cuts, bleeds, etc.
We demonstrate the above-mentioned challenges with a simple experiment. We consider a portion of a magazine page that contains both text and a photograph. We synthetically painted the image white at randomly selected blocks, as shown in Figure 2. The degradation can be treated as missing pixels (inpainting) for the photograph and as cuts for the text region. We used the sparse coding technique proposed in [3], treating the missing pixels (cuts) as infinite noise, and restored the image after learning a dictionary using a large number of clean text and natural image patches. It can be seen that the regions corresponding to the photograph are restored properly while the text regions are not.
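The "missing pixels as infinite noise" idea can be sketched by giving the missing pixels zero weight when fitting the sparse code and then predicting them from the full atoms. The helper below is our own simplification for illustration, not the exact formulation of [3]; it reuses the omp sketch from above.

    import numpy as np

    def inpaint_patch(D, y, mask, L, eps):
        # mask: boolean vector, True where the pixel of patch y is observed.
        Dm = D[mask]                              # keep only observed rows
        norms = np.linalg.norm(Dm, axis=0)
        norms[norms == 0] = 1.0                   # guard against empty atoms
        alpha = omp(Dm / norms, y[mask], L, eps)  # code the visible pixels
        return D @ (alpha / norms)                # full-patch estimate fills the holes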
III. RESTORATION OF DOCUMENT IMAGES
The most critical challenge in restoring document images using sparse coding can be explained with the help of Figure 3. One of the fundamental assumptions in such a representation is that the elements of the dictionary span the subspace of images of interest and that any linear combination of a sparse subset of dictionary elements is indeed a valid image. This clearly does not hold in the case of document images. Document image patches are binary in nature and so are the dictionary elements (d_i in Figure 3), which is not the case for their linear combinations. Ideally, a document image patch y that we would like to represent using a dictionary D should be computed as

    y = g(D, α),   (3)

where α is a set of parameters and g is a non-linear function that maps from the binary document dictionary elements to a valid binary document image or patch. Current dictionary learning techniques are not adequate to learn an appropriate dictionary and parameters under such a non-linear mapping.

An alternative is to use a non-linear function (thresholding is a not-so-good example) over a learned linear mapping to a point y′ in the subspace:

    y′ = Dα,   and   y = f(y′).   (4)
[Figure 3 here: dictionary elements d_1, d_2, d_3, the space they span, a linear combination y′ = Dα, and the document-space point f(y′); legend: dictionary element, linear combination, point in document space.]
Fig. 3. Space of images and the subspace spanned by the basis vectors in the dictionary (shown in solid black). The document images (white circles) are often outside the subspace spanned by the basis vectors.
Here we approximate the ideal non-linear representation function g as f(y′), where y′ is a linear combination of dictionary elements weighted by α, as shown in Figure 3. As seen in Figure 2, the results of such approximations are often very noisy. We get over this problem by approximating a given noisy image using a highly sparse representation, where the sparsity is specified to be 1. This ensures that the resulting approximation is close to binary and to a valid document patch.
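Concretely, with the sparsity fixed to 1, approximating a degraded patch reduces to projecting it onto its single best-matching atom. A minimal sketch (the helper name is ours; binarization is deferred to the post-processing described in Section III-B):

    import numpy as np

    def nearest_atom_projection(D, y):
        # Sparsity-1 approximation y' = alpha_l d_l, where d_l is the atom
        # of the unit-norm dictionary D most correlated with patch y.
        corr = D.T @ y
        l = int(np.argmax(np.abs(corr)))
        return corr[l] * D[:, l]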
Our restoration method is as follows:
1) Learn a set of representative basis elements that summarizes a given set of clean image patches.
2) Find the sparse representation of each degraded patch over the learned basis and binarize the output.
A. Dictionary Learning
Dictionary learning starts with a set of clean patches extracted from the segmented words. Each word image of size m × n is split into patches of size d = p × q, resulting in P patches, and each patch is represented as a vector in R^d. For basis learning, we use a method similar to the K-SVD algorithm presented in [1]. We learn the basis D ∈ R^{d×k} such that each patch is represented by a single basis element, as shown in Equation (5). The single non-zero coefficient constraint simplifies the K-SVD algorithm to the K-means algorithm, albeit with the constraint that basis elements are normalized:

    {D̂, α̂_i} = argmin_{D, α_i} Σ_{i=1}^{P} ‖x_i − Dα_i‖^2   (5)
    s.t. ‖α_i‖_0 = 1, i ∈ {1, …, P}, and ‖D_j‖_2 = 1, j ∈ {1, …, k}.
The above equation is optimized in an iterative fashion, minimizing the objective function over D and α_i, similar to the algorithm presented in [1]. When D is fixed, α_i ∈ R^k is given by α_i^j = D_j^T x_i for j = l, where l = argmax_l D_l^T x_i, and α_i^j = 0 for j ≠ l. This results in the selection of the basis with maximum correlation to the given signal as its representation. Then, each column of D is updated using SVD while fixing the other columns, similar to the K-SVD algorithm.

Fig. 4. (a) Portion of a document page. (b) Characters share common strokes, curves, tips, etc.: boxes shown in blue share common tips, and boxes shown in red share a similar curve. (c) The dictionary/features capture the characteristic information of the text.
Figure 4 shows a subset of basis elements learned from a set of patches extracted from word images. Document images are usually binary in nature, with 0's and 1's corresponding to text and background regions respectively. Since the region of interest is the text, we operate on the inverted images to follow the usual conventions of natural image representation. Unlike for natural images, the basis elements learned for document images can be easily interpreted. The fundamental elements that constitute documents are strokes, curves, glyphs, etc., and our method automatically learns these elements. It can be seen in Figure 4 that the dictionary elements correspond to character strokes, thick edges, curves, etc. occurring in textual characters (Figure 4(c)), thereby representing them efficiently. Different English characters possess similar kinds of edges, strokes, or curves, and such patches may share the same dictionary element.
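A compact sketch of this learning procedure under the constraints of Equation (5): assign each patch to its most correlated atom, then refit each atom as the leading left singular vector of its assigned patches (the rank-1 SVD update). The random-patch initialization is our assumption; the iteration count of 200 follows Section IV.

    import numpy as np

    def learn_dictionary(X, k, iters=200, seed=0):
        # X: (d, P) matrix of vectorized clean patches; returns D of shape
        # (d, k) with unit-norm atoms, learned under the single non-zero
        # coefficient constraint of Eq. (5).
        rng = np.random.default_rng(seed)
        D = X[:, rng.choice(X.shape[1], size=k, replace=False)].astype(float)
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
        for _ in range(iters):
            # assignment: each patch picks its best-matching atom
            labels = np.argmax(np.abs(D.T @ X), axis=0)
            for j in range(k):
                Xj = X[:, labels == j]
                if Xj.size == 0:
                    continue  # leave unused atoms unchanged
                # update: leading left singular vector of the patches
                # assigned to atom j (unit norm by construction)
                U, _, _ = np.linalg.svd(Xj, full_matrices=False)
                D[:, j] = U[:, 0]
        return D

With the settings reported in Section IV (15 × 15 patches, dictionary four times the patch dimension), this would be called as, e.g., D = learn_dictionary(X, k=4 * 15 * 15).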
B. Sparse coding
Once the basis is learnt from a set of clean patches, any degraded patch y_i ∈ R^d of a noisy image Y ∈ R^{M×N} can be decomposed sparsely over the basis and reconstructed as per Equation (5). In order to avoid blocky artifacts in the reconstructed image, we use overlapping patches for restoration, and the final reconstructed image is obtained by averaging at the overlapped regions.
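A sketch of this reconstruction loop, reusing the nearest_atom_projection helper sketched earlier; the patch size and unit stride are illustrative defaults, not values prescribed by the paper.

    import numpy as np

    def restore_image(Y, D, patch=15, stride=1):
        # Reconstruct every overlapping patch by its sparsity-1
        # approximation, accumulate, and average the overlapped pixels.
        H, W = Y.shape
        acc = np.zeros((H, W))
        weight = np.zeros((H, W))
        for i in range(0, H - patch + 1, stride):
            for j in range(0, W - patch + 1, stride):
                y = Y[i:i + patch, j:j + patch].reshape(-1)
                y_hat = nearest_atom_projection(D, y)
                acc[i:i + patch, j:j + patch] += y_hat.reshape(patch, patch)
                weight[i:i + patch, j:j + patch] += 1.0
        return acc / np.maximum(weight, 1.0)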
The reconstructed image might be slightly grayish with a few noisy artifacts. Regions corresponding to text will have large pixel values, as they are reconstructed efficiently, while noisy regions will have small values. We thus use a simple post-processing step that binarizes the grayscale image to remove some of the noisy stray pixels. We found that the threshold parameter did not affect the quality of the outputs much, and it is fixed to 0.3 in all our experiments.
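In sketch form, the post-processing is a plain global threshold:

    import numpy as np

    def binarize(X, t=0.3):
        # Remove stray gray pixels by thresholding the averaged
        # reconstruction; the paper fixes t = 0.3 throughout.
        return (X > t).astype(np.uint8)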
IV. EXPERIMENTS AND RESULTS
Restoration using the proposed method was carried out on a variety of document images with different levels of degradation. We assume that clean document images with a font similar to the one used in the degraded images are available. We also note that our method is robust to slight variations of font between training and testing, as will be demonstrated later. This kind of setting is well suited to restoring documents from digital libraries, magazines, etc. In such cases, fonts and text styles are constant throughout a book, and any recent issue of a magazine can be used as high-quality training documents. Also, with the advent of the Internet, one can easily obtain clean documents in any font; e.g., a simple search for 'gothic text' yields many high-quality documents that can be used to restore gothic texts. For all the experiments, we segment the image into degraded words and carry out restoration on the individual words.
For the learning step, we collected clean documents from a high-quality book with a font similar to that of the degraded images. Figure 4(a) shows a small region of the clean images collected from such a book. The number of sparse coding and basis learning iterations was fixed empirically to 200. In order to maintain overcompleteness and recover sparse representations [5], the size of the dictionary is usually fixed to four times the size of the patch. It is observed in [1], [13], [3] that a very large dictionary leads to overfitting, i.e., learnt atoms may correspond to individual patches instead of generalizing over a large number of patches, while a very small dictionary leads to underfitting. Figure 4(c) shows basis elements learned from clean images for a patch size of 15 × 15.
Figure 5 shows eight different words from the book containing cuts, erosion artifacts, and ink bleed, along with our restoration results. One kind of degradation we notice is smears and ink blobs, as seen in the words golf, fascinating, to catch and laboratory. Our algorithm is able to restore these words very well, especially the word fascinating, which is heavily degraded with characters almost getting connected. Another kind of degradation is fading, resulting in near cuts, as seen in the character a in the word sanguinary, which is restored with high fidelity. Our algorithm takes about 12 seconds to restore a document of size 157 × 663 on a system with 2GB RAM and an Intel(R) Core(TM) i3-2120 processor at 3.30 GHz, with an un-optimized implementation.
The algorithm, however, fails to restore the characters v and e in the word several. The cut in v is very large compared to the size of the patch, and the horizontal region in e has many missing pixels; any patch considered in that region is blank, and hence the algorithm could not estimate the shape in these regions. Similarly, the character y in surely has a large amount of bleed, which the algorithm failed to restore.
We note that the patch size considered for restoration has a clear effect on the quality of restoration. If the patch size is too small and comparable to the size of artifacts such as blobs and cuts, the algorithm will restore the noisy region as well. If the patch size is large, the dictionary elements may overfit the training data, reducing the range of degraded images that can be restored. We fixed the size of patches to one-third of the character font size.
We evaluate our algorithm both qualitatively and quantitatively on various kinds of synthetically generated degradations: pixel flipping, blurring, cuts, and texture-blending. An example of each type of degradation and its restored output is shown in Figure 7. Flipping is generated using the method proposed in [14] for various PSNR values, by tuning the parameters α_0, β_0, α_1, β_1. Blurring is produced by convolving the image with Gaussian kernels of various sizes.
Various levels of cuts are produced by randomly selecting windows in the image and randomly flipping a few pixels in each window. Finally, texture-blending simulates effects such as textured or stained paper, and was produced by linearly blending the document with a texture image at various degrees of blending. Table I shows the input and output PSNR for different kinds of degradations at various noise levels; we can see a clear improvement in the PSNR values across all degradations.

TABLE I. PSNR (dB) OF RESTORATION OUTPUTS FOR VARIOUS SYNTHETIC DEGRADATIONS (input PSNR / output PSNR).

Flips        Blur         Cuts         Texture blending
6.61 / 6.7   5.1 / 6.9    5.56 / 6.69  4.06 / 6.73
6.75 / 6.82  5.9 / 7.32   6.18 / 8.28  4.47 / 6.77
6.77 / 6.85  6.15 / 7.60  6.96 / 9.01  4.56 / 6.82
6.78 / 6.91  7.05 / 8.40  7.82 / 9.32  4.66 / 6.85
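For reference, the PSNR values in Table I can be computed as below; treating images as scaled to [0, 1] (and hence peak = 1.0) is our assumption.

    import numpy as np

    def psnr(reference, estimate, peak=1.0):
        # Peak signal-to-noise ratio in dB between a clean reference
        # image and a degraded or restored estimate of the same shape.
        mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)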
We now look at the effect of our document restoration on OCR recognition, which gives a good measure of the quality of restoration. We used ABBYY FineReader [15] and Tesseract-2.01 [16], which are among the most popular and accurate OCRs available. We ran the OCRs on 20 pages of an old English book collected from a digital library. Each page of the book contains an average of 300 words and 2200 characters. The error rate measured on the degraded documents using ABBYY FineReader was 9%, which was already very good; after restoration, it was further reduced to 0.7%, a significant improvement. Similarly, the error rate was reduced from 14% to 4.1% using Tesseract.
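The paper does not spell out the error metric; one common choice is the character error rate (edit distance divided by reference length), which could be computed from the OCR output and ground-truth text as sketched here.

    def char_error_rate(reference, hypothesis):
        # Levenshtein distance between the two strings, normalized by the
        # length of the ground-truth reference (single-row DP).
        n, m = len(reference), len(hypothesis)
        dp = list(range(m + 1))
        for i in range(1, n + 1):
            prev, dp[0] = dp[0], i
            for j in range(1, m + 1):
                cur = dp[j]
                dp[j] = min(dp[j] + 1,      # deletion
                            dp[j - 1] + 1,  # insertion
                            prev + (reference[i - 1] != hypothesis[j - 1]))
                prev = cur
        return dp[m] / max(n, 1)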
Figure 1 shows the restoration result for a region of a degraded page collected from a digital library. Figures 1(c) and (d) show the Tesseract OCR output before and after restoration, respectively. The recognition error on the degraded page was due to erosion and low printing quality, which can confuse the OCR when noise fills the gap between two characters in a word. After restoration (Figure 1(b)), the region is recognized with high accuracy.
Figure 6 shows the restored image for the word played using popular methods (median filtering, Gaussian filtering, and non-local means) and ours. Clearly, our result is superior in quality to these methods. Our method makes no assumption about the script, and thus the same approach can be applied to restore documents in any script; however, this is beyond the scope of this paper.
V. CONCLUSION
We presented an approach to document restoration that exploits the fact that different characters in a document share similar strokes, curves, edges, etc. We extended sparse coding based restoration to document images and learned a set of dictionary elements that gives a highly sparse decomposition for image patches. We restored severe degradations, including cuts, merges, blobs and erosion in documents, and showed experimental results on both positive and negative cases. We also demonstrated the improvement in the recognition performance of OCR systems. Though we have demonstrated the application of sparse coding to challenging document restoration, there is room for improvement. Unlike natural images, binary images take only a few intensity values and are structured. We would like to work on this aspect, along with theoretical guarantees for sparse coding on document images, as part of our future work.

Fig. 5. Restoration of various degraded words. Our algorithm can effectively restore pixel flips, background noise and ink blots (first eight words), while large blobs and cuts that are similar in size to the dictionary patches are not restored (see the last two words).

Fig. 6. Comparison with other methods. (a) Cropped word "played" from a degraded document. Output of (b) median filter, (c) Gaussian filter, (d) non-local means, (e) ours. Our restoration technique produces a cleaner image than the traditional filtering techniques as well as non-local means filtering.

Fig. 7. Different kinds of synthetic degradations: (a) pixel flipping, (b) blurring, (c) cuts, (d) texture blending. In each column, the top image shows the degraded image and the bottom one the corresponding restored image. (Best viewed by zooming on a computer)
ACKNOWLEDGMENT
This work is partly supported by the MCIT, New Delhi. Vijay Kumar is supported by a TCS Research PhD fellowship. Anand Mishra is supported by the Microsoft Research India PhD fellowship 2012 award.
REFERENCES
[1] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. Signal Process., 2006.
[2] J. Mairal, G. Sapiro, and M. Elad, "Learning multiscale sparse representations for image and video restoration," Multiscale Modeling and Simulation, 2008.
[3] J. Mairal, M. Elad, and G. Sapiro, "Sparse representation for color image restoration," IEEE Trans. Image Process., 2008.
[4] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Trans. Image Process., 2010.
[5] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, 2006.
[6] M. D. Gupta, S. Rajaram, N. Petrovic, and T. S. Huang, "Restoration and recognition in a loop," in CVPR, 2005.
[7] T. Lelore and F. Bouchara, "Document image binarisation using Markov field model," in ICDAR, 2009.
[8] R. F. Moghaddam and M. Cheriet, "Beyond pixels and regions: A non-local patch means (NLPM) method for content-level restoration, enhancement, and reconstruction of degraded document images," Pattern Recognition, 2011.
[9] Y. Huang, M. S. Brown, and D. Xu, "A framework for reducing ink-bleed in old documents," in CVPR, 2008.
[10] J. Banerjee and C. V. Jawahar, "Super-resolution of text images using edge-directed tangent field," in DAS, 2008.
[11] J. Banerjee, A. M. Namboodiri, and C. V. Jawahar, "Contextual restoration of severely degraded document images," in CVPR, 2009.
[12] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, 1993.
[13] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Process., 2006.
[14] T. Kanungo, R. M. Haralick, H. S. Baird, W. Stuetzle, and D. Madigan, "A statistical, nonparametric methodology for document degradation model validation," IEEE Trans. PAMI, 2000.
[15] ABBYY FineReader, http://www.abbyy.com/.
[16] Tesseract OCR, http://code.google.com/p/tesseract-ocr/.