Proceedings ArticleDOI

An MRF Model for Binarization of Natural Scene Text

18 Sep 2011-pp 11-16
TL;DR: This work represents the pixels in a document image as random variables in an MRF, and introduces a new energy function on these variables to find the optimal binarization, using an iterative graph cut scheme.
Abstract: Inspired by the success of MRF models for solving object segmentation problems, we formulate the binarization problem in this framework. We represent the pixels in a document image as random variables in an MRF, and introduce a new energy (or cost) function on these variables. Each variable takes a foreground or background label, and the quality of the binarization (or labelling) is determined by the value of the energy function. We minimize the energy function, i.e. find the optimal binarization, using an iterative graph cut scheme. Our model is robust to variations in foreground and background colours as we use a Gaussian Mixture Model in the energy function. In addition, our algorithm is efficient to compute, and adapts to a variety of document images. We show results on word images from the challenging ICDAR 2003 dataset, and compare our performance with previously reported methods. Our approach shows significant improvement in pixel level accuracy as well as OCR accuracy.



HAL Id: hal-00817972
https://hal.inria.fr/hal-00817972
Submitted on 17 Oct 2013
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
An MRF Model for Binarization of Natural Scene Text
Anand Mishra, Karteek Alahari, C.V. Jawahar
To cite this version:
Anand Mishra, Karteek Alahari, C.V. Jawahar. An MRF Model for Binarization of Natural Scene Text. ICDAR - International Conference on Document Analysis and Recognition, Sep 2011, Beijing, China. 10.1109/ICDAR.2011.12. hal-00817972

An MRF Model for Binarization of Natural Scene Text
Anand Mishra, Karteek Alahari and C.V. Jawahar
International Institute of Information Technology, Hyderabad, India
INRIA - Willow, ENS, Paris, France
Email: anand.mishra@research.iiit.ac.in, karteek.alahari@ens.fr, jawahar@iiit.ac.in
Abstract—Inspired by the success of MRF models for solving object segmentation problems, we formulate the binarization problem in this framework. We represent the pixels in a document image as random variables in an MRF, and introduce a new energy (or cost) function on these variables. Each variable takes a foreground or background label, and the quality of the binarization (or labelling) is determined by the value of the energy function. We minimize the energy function, i.e. find the optimal binarization, using an iterative graph cut scheme. Our model is robust to variations in foreground and background colours as we use a Gaussian Mixture Model in the energy function. In addition, our algorithm is efficient to compute, and adapts to a variety of document images. We show results on word images from the challenging ICDAR 2003 dataset, and compare our performance with previously reported methods. Our approach shows significant improvement in pixel level accuracy as well as OCR accuracy.
Keywords-MRF, GMM, Graph Cut, Binarization
I. INTRODUCTION
Binarization is one of the key preprocessing steps in any document image analysis system. The performance of subsequent steps like character segmentation and recognition is highly dependent on the success of binarization. Document image binarization has been an active area of research for many years. Is binarization a solved problem? Obviously not, especially given the emerging need for recognition of text in video sequences, digital-born (Web and email) images, old historic manuscripts and natural scenes, where the state-of-the-art recognition performance is still poor. In this regard, designing a powerful binarization algorithm can be considered a major step towards robust text understanding. The recent interest of the community, shown by the organisation of binarization contests like DIBCO 2009 [1] at the 10th International Conference on Document Analysis and Recognition (ICDAR 2009), also supports our claim. Note that DIBCO 2009 had 43 submissions, which shows active interest in this research area.
In this work, we focus on binarization of natural scene text. Natural scene texts contain numerous degradations not usually present in machine printed documents, such as uneven lighting, blur, complex backgrounds, perspective distortion and multiple colours. Methods such as the interactive graph cut of Boykov et al. [2] and, thereafter, GrabCut [3] have shown promising performance in foreground/background segmentation of natural scenes in recent years. We formulate the binarization problem in this framework (where text is foreground and anything else is background), and define a novel energy (cost) function such that the quality of the binarization is determined by the energy value. We minimize this energy function to find the optimal binarization using an iterative graph cut scheme. The graph cut method needs to be initialized with foreground/background seeds. To make the binarization fully automatic, we obtain initial seeds for the graph cuts with our auto-seeding algorithm. At each iteration of graph cut, the seeds and the binarization are refined. This makes the method more powerful than the one-shot graph cut algorithm. Moreover, we model foreground and background colours in a GMMRF framework [4] to make the binarization robust to variations in foreground and background colours.

Figure 1. Some sample images we considered in this work
The remainder of the paper is organised as follows. We discuss related work in Section II. In Section III, the binarization problem is formulated as a labelling problem, where we define an energy function such that its minimum corresponds to the target binary image. This section also briefly introduces the graph cut method. Section IV explains the proposed iterative graph cut based binarization scheme, and also elaborates the method of finding auto-seeds for the graph cut. Section V describes experiments and results on the challenging ICDAR 2003 word dataset. Some sample images of this dataset are shown in Figure 1. We finally conclude the work in Section VI.
II. RELATED WORK
Traditional thresholding based binarization can be divided into two categories: methods that use a global threshold for the given document (like Otsu [5] and Kittler et al. [6]) and methods with local thresholds (like Sauvola [7] and Niblack [8]). An exhaustive review of thresholding based binarization is beyond the scope of this paper; the reader is encouraged to see [9]. Although most of these previous algorithms perform satisfactorily in many cases, they suffer from problems such as: (1) manual tuning of parameters, (2) high sensitivity to the choice of parameters, and (3) difficulty handling images with uneven lighting, noisy backgrounds, or similar foreground-background colours.
Recently, Markov Random Field (MRF) based binarization has been applied to degraded documents. In [10], Wolf et al. proposed binarization in an energy minimization framework, but applied the less powerful and computationally expensive simulated annealing (SA) for energy minimization. In [11], the authors classified the document into Text Region (TR), Near Text Region (NTR) and Background Region (BR), and then applied graph cut to produce the final binary image. MRF based binarization for document images captured with hand-held devices was proposed in [12], where the authors first used a thresholding based technique to produce a binary image and then applied graph cuts to remove noise and smooth the binarization output. However, these methods cannot be directly applied to natural scene text images due to additional challenges like blur, hardly distinguishable foreground/background colours, and variable font sizes and styles.

Researchers have also shown interest in colour image binarization in recent years (see [13], [14]). But these methods lack a principled formulation of the binarization problem for complex colour documents, and hence cannot be generalized.
III. THE BINARIZATION PROBLEM
We define the binarization problem in a labelling framework as follows: the binarization of an image can be expressed as a vector of binary random variables X = {X_1, X_2, ..., X_n}, where each random variable X_i takes a label x_i ∈ {0, 1} based on whether it is text (foreground) or non-text (background). Most of the heuristic based algorithms take the decision of assigning label 0 or 1 to x_i based on the pixel value at that position or local statistics. Such algorithms are not effective in our case because of the variations in foreground/background colour distributions.
In this work, we formulate the problem in a more principled framework: we represent image pixels as nodes in a Markov Random Field and associate a unary and pairwise cost with labelling the pixels. We then solve the problem in an energy minimization framework, where a "Gibbs energy" function E of the following form is defined:

E(x, θ, z) = E_i(x, θ, z) + E_ij(x, z),    (1)

such that its minimum corresponds to the target binary image. Here x = {x_1, x_2, ..., x_n} is the set of labels at each pixel, θ is the set of model parameters learnt from the foreground/background colour distributions, and the vector z = {z_1, z_2, ..., z_n} denotes the colour intensities of the pixels.
In Equation (1), E_i(·) and E_ij(·) correspond to the data term and the smoothness term respectively. The data term E_i(·) measures the degree of agreement of the inferred label x_i with the observed image data z_i. The smoothness term measures the cost of assigning labels x_i, x_j to adjacent pixels and is used to impose spatial smoothness. A typical unary term can be expressed as:

E_i(x, θ, z) = −Σ_i log p(x_i | z_i).
Similarly, the smoothness term most commonly used in the literature is the Potts model:

E_ij(x, z) = λ Σ_{(i,j)∈N} exp(−(z_i − z_j)² / 2β²) [x_i ≠ x_j] / dist(i, j),

where λ determines the degree of smoothness, dist(i, j) is the Euclidean distance between neighbouring pixels i and j, and the constant β allows discontinuity preserving smoothing. N denotes the neighbourhood system defined in the MRF. Further, the smoothness term imposes a cost only on those adjacent pixels which have different labels (i.e. where [x_i ≠ x_j] = 1).
The problem of binarization is now to find the global minimum of the Gibbs energy, i.e.,

x* = argmin_x E(x, θ, z).    (2)

The global minimum of this energy function can be computed efficiently by graph cut [15], subject to fulfilling the criterion of submodularity [16]. For this a weighted graph G = (V, E) is formed, where each vertex corresponds to an image pixel and edges link adjacent pixels. Two additional vertices, the source (s) and the sink (t), are added to the graph, and all the other vertices are connected to them with weighted edges. The weights of all the edges are defined in such a way that every cut of the graph is equivalent to some label assignment of the energy function. Note that a cut of the graph G is a partition of the set of vertices V into two disjoint sets S and T, and the cost of the cut is the sum of the weights of the edges going from vertices in S to vertices in T (see [16]). The min cut of such a graph corresponds to the global minimum of the energy function, and efficient implementations are available for finding it [15].
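The min-cut computation itself can be illustrated on a toy graph. The sketch below uses a plain Edmonds-Karp max-flow (equal to the min cut by the max-flow/min-cut theorem), standing in for the optimized implementation of [15]; the two-pixel graph and all edge weights are made-up numbers.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow; cap is a dict-of-dicts of edge capacities.
    The returned value equals the cost of the min s-t cut, i.e. the
    minimum of the corresponding submodular energy."""
    flow = 0
    while True:
        # BFS for an augmenting path with positive residual capacity.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # Recover the path, push the bottleneck flow, update residuals.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap.setdefault(v, {}).setdefault(u, 0)
            cap[v][u] += push
        flow += push

# Two pixels p, q: t-links carry illustrative unary costs, the p-q edges
# carry the pairwise penalty for x_p != x_q.
cap = {
    's': {'p': 1.0, 'q': 4.0},
    'p': {'t': 5.0, 'q': 2.0},
    'q': {'t': 1.0, 'p': 2.0},
}
print(max_flow(dict((u, dict(es)) for u, es in cap.items()), 's', 't'))  # prints 4.0
```

Enumerating the four possible cuts of this graph by hand confirms that 4.0 is indeed the cheapest one, i.e. the lowest-energy labelling of the two pixels.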
In [2], the set of model parameters θ describes the image foreground/background histograms. The histograms are constructed directly from the foreground/background seeds, which are obtained with user interaction. However, the foreground/background distribution in our case (see images in Figure 1) cannot be captured effectively by a naive histogram distribution. Rather, we assume each pixel colour is generated from a Gaussian Mixture Model (GMM). In this regard, we are highly inspired by the success of GrabCut [3] for object segmentation. At the same time, we want to avoid any user interaction, to make the binarization fully automatic. We achieve this with our auto-seeding algorithm, which we describe in Section IV-A. Furthermore, iterative graph cut based binarization is also more suitable for our application, as it refines the seeds and the binarization output at each iteration, and thus produces a clean binarization result even in the case of noisy foreground/background distributions.
IV. ITERATIVE GRAPH CUT BASED BINARIZATION
In the GMMRF framework [4], each pixel colour is generated from one of 2c Gaussian Mixture Model (GMM) components (c components each for the foreground and the background) with mean μ and covariance Σ, i.e. each foreground colour pixel is generated from the following distribution:

p(z_i | x_i, θ, k_i) = N(z, θ; μ(x_i, k_i), Σ(x_i, k_i)),    (3)

where N denotes a Gaussian distribution, x_i ∈ {0, 1} and k_i ∈ {1, ..., c}. To model the foreground colour using the above distribution, an additional vector k = {k_1, k_2, ..., k_n} is introduced, where each k_i takes one of the c GMM components. Similarly, the background colour is modelled by one of its c GMM components. Further, the likelihood probabilities of the observations can be assumed to be independent of the pixel position, and can thus be expressed as:

p(z | x, θ, k) = Π_i p(z_i | x_i, θ, k_i)
             = Π_i [π(x_i, k_i) / √det(Σ(x_i, k_i))] exp(−(1/2)(z_i − μ(x_i, k_i))ᵀ Σ(x_i, k_i)⁻¹ (z_i − μ(x_i, k_i))).

Here π(·) is the Gaussian mixture weighting coefficient.
Due to the introduction of the GMMs, the energy function in Equation (1) now becomes:

E(x, k, θ, z) = E_i(x, k, θ, z) + E_ij(x, z),    (4)

i.e. the data term now depends on the assignment to GMM components. It is given by:

E_i(x, k, θ, z) = −Σ_i log p(z_i | x_i, θ, k_i).    (5)
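A data term of this form can be sketched with off-the-shelf Gaussian mixtures. In the snippet below, scikit-learn's GaussianMixture stands in for the paper's own GMM estimation, the seed colours are synthetic, and c = 2 is an arbitrary choice; the negative log-likelihood of a pixel under each colour model plays the role of E_i.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_colour_gmms(fg_pixels, bg_pixels, c=2, seed=0):
    """Fit c-component GMMs to foreground and background seed colours."""
    fg = GaussianMixture(n_components=c, random_state=seed).fit(fg_pixels)
    bg = GaussianMixture(n_components=c, random_state=seed).fit(bg_pixels)
    return fg, bg

def data_term(pixels, fg, bg):
    """Unary costs -log p(z_i | label) under each colour model.
    Returns an (n, 2) array: column 0 = background cost, 1 = foreground."""
    return -np.stack([bg.score_samples(pixels),
                      fg.score_samples(pixels)], axis=1)

# Illustrative seeds: dark text colours vs. bright background colours.
rng = np.random.default_rng(0)
fg_seeds = rng.normal(30, 5, size=(200, 3))    # dark RGB samples
bg_seeds = rng.normal(220, 5, size=(200, 3))   # bright RGB samples
fg, bg = fit_colour_gmms(fg_seeds, bg_seeds)
costs = data_term(np.array([[25.0, 30.0, 28.0]]), fg, bg)
# A dark pixel should be cheaper to label foreground than background.
print(costs[0, 1] < costs[0, 0])  # prints True
```

In the full algorithm these mixtures are re-fitted at every iteration from the refined seeds, so the data term adapts to the image's actual colour distributions.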
In order to make the energy function robust to low contrast colour images, we modify the smoothness term of the energy function by adding a new term which measures the "edginess" of the pixels, as follows:

E_ij(x, z) = λ_1 Σ_{(i,j)∈N} [x_i ≠ x_j] exp(−β ||z_i − z_j||²)
           + λ_2 Σ_{(i,j)∈N} [x_i ≠ x_j] exp(−β ||w_i − w_j||²).    (6)
Here w_i denotes the magnitude of the gradient (edginess) at pixel i, and N denotes the neighbourhood system defined for the MRF model. Two neighbouring pixels with similar edginess values are more likely to belong to the same class, and the edginess term enforces this constraint. The constants λ_1 and λ_2 determine the relative strength of the colour and edginess differences respectively. The parameters λ_i and β are learnt automatically from the image.
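Per pixel pair, the modified smoothness term of Equation (6) can be sketched as below; λ_1, λ_2 and β are illustrative constants here, whereas the paper learns them automatically from the image.

```python
import numpy as np

def pairwise_weight(z_i, z_j, w_i, w_j, lam1=2.0, lam2=1.0, beta=1e-3):
    """Cost charged when neighbouring pixels i, j take different labels
    (the [x_i != x_j] case of Equation (6))."""
    colour_term = lam1 * np.exp(-beta * np.sum((z_i - z_j) ** 2))
    edge_term = lam2 * np.exp(-beta * (w_i - w_j) ** 2)
    return colour_term + edge_term

# Similar colours and similar edginess -> a high penalty for cutting here;
# a strong colour or edginess change -> cheap to place the boundary.
z = np.array([120.0, 60.0, 30.0])
inside = pairwise_weight(z, z + 2.0, 10.0, 11.0)
across = pairwise_weight(z, z + 150.0, 10.0, 90.0)
print(inside > across)   # prints True
```

The boundary of the binarization is thus pushed towards pixel pairs where either the colour or the gradient magnitude changes sharply, which is what makes the term useful on low contrast images.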
The Gaussian Mixture Models in Equation (5) need to be initialized with foreground/background seeds. Since our objective is to make the binarization fully automatic, we initialize the GMMs with foreground/background seeds obtained from our auto-seeding algorithm. Then, at each iteration, the seeds are refined and new GMMs are learnt from them. This makes the algorithm more powerful and allows it to adapt to variations in the foreground/background.
A. Auto-seeding

To perform automatic binarization we need to compute foreground and background seeds for the graph cut. Given an image, we first convert it to an edge image using the Canny edge operator, and then find the foreground and background seeds as follows:

1) Foreground seeds: Our foreground seeding algorithm is motivated by the fact that for every edge curve (line) in a character there exists a parallel edge curve (line), i.e. if an edge pixel has gradient orientation θ, then in the direction of θ there exists an edge pixel whose gradient orientation is π − θ.

Step 1: Let p be a non-traversed edge pixel with gradient orientation θ. For every such edge pixel p, we traverse the edge image in the direction of θ until we hit an edge pixel q whose gradient orientation is (π − θ) ± π/36 (i.e. approximately the opposite gradient direction). We mark this line segment pq as a foreground seed candidate and store its length. We repeat this process for all the non-traversed edge pixels. After finding all foreground seed candidates, we remove all those line segments whose length is too high or too low with respect to the majority of seed candidates. The remaining line segments are marked as foreground seeds.
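Step 1 can be sketched as follows for the simple case of axis-aligned gradients. The synthetic edge image, the stop-at-first-edge rule, the lack of orientation wrap-around handling, and the omission of the segment-length filtering are all simplifications of the paper's procedure.

```python
import numpy as np

def foreground_seed_segments(edges, orient, tol=np.pi / 36, max_len=50):
    """From each edge pixel, walk along its gradient direction until an
    edge pixel with roughly opposite orientation is found; the segment
    between them is a foreground seed candidate.

    edges  : (H, W) bool array (e.g. Canny output)
    orient : (H, W) gradient orientations in radians
    """
    h, w = edges.shape
    segments = []
    for (r, c) in zip(*np.nonzero(edges)):
        theta = orient[r, c]
        dr, dc = int(round(np.sin(theta))), int(round(np.cos(theta)))
        rr, cc = r + dr, c + dc
        for _ in range(max_len):
            if not (0 <= rr < h and 0 <= cc < w):
                break                      # hit the image boundary
            if edges[rr, cc]:
                # Stop at the first edge pixel met (a simplification) and
                # keep the segment only if its orientation is ~ pi - theta.
                if abs(orient[rr, cc] - (np.pi - theta)) <= tol:
                    segments.append(((int(r), int(c)), (int(rr), int(cc))))
                break
            rr, cc = rr + dr, cc + dc
    return segments

# A synthetic vertical stroke: left boundary points right (theta = 0),
# right boundary points left (theta = pi).
edges = np.zeros((5, 10), dtype=bool)
orient = np.zeros((5, 10))
edges[2, 3] = edges[2, 7] = True
orient[2, 3], orient[2, 7] = 0.0, np.pi
print(foreground_seed_segments(edges, orient))
# prints [((2, 3), (2, 7)), ((2, 7), (2, 3))]
```

Each stroke is found twice (once from each boundary); on real images the length filter described above then rejects spurious pairings.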
Step 2: Handling images with light text on a dark background: For such images we rarely obtain parallel edge curves (lines) with the above traversal; rather, many line segments pq start hitting the image boundary. We automatically detect such situations, subtract π from the original orientation, and then follow the same process as in Step 1.
2) Background seeds: For background seeding we adopt the following scheme. Given an edge image, we find the horizontal/vertical lines containing no edge pixel and mark them as background. When this yields no background seeds, we relax our criterion and mark as background all those regions which are accessible (without hitting an edge pixel) from at least two sides of the image boundary. In practice, for some cases we do not get enough background seeds even after relaxation. For such cases we traverse the edge image from all four sides of the image boundary until we hit an edge, and mark all these regions as background seeds. Figure 2 shows typical initial seeds for the iterative graph cut.
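The first background-seeding criterion (full rows or columns containing no edge pixel) can be sketched directly; the relaxation steps for images where it yields no seeds are omitted, and the toy edge image is illustrative.

```python
import numpy as np

def background_seed_mask(edges):
    """Mark as background every full horizontal or vertical line that
    contains no edge pixel (the first criterion above; the relaxation
    steps of the paper are omitted from this sketch)."""
    mask = np.zeros(edges.shape, dtype=bool)
    mask[~edges.any(axis=1), :] = True   # edge-free rows
    mask[:, ~edges.any(axis=0)] = True   # edge-free columns
    return mask

# Toy edge image: a small box of edges in the middle of a 5x6 image.
edges = np.zeros((5, 6), dtype=bool)
edges[1:4, 2:5] = True
seeds = background_seed_mask(edges)
print(seeds.sum())   # prints 21: rows 0 and 4, columns 0, 1 and 5
```

Because text edges rarely span an entire image row or column, these lines are safe background evidence for initializing the background GMM.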

Figure 2. (a) Input image. (b) Its foreground-background seeds; red and blue show foreground and background seeds respectively (best viewed in colour).
Figure 3. Images where auto-seeding fails
Although the proposed auto-seeding method performs satisfactorily, it tends to fail in cases where the Canny edge operator produces too many noisy or broken edges. In such cases some foreground regions are falsely marked as background, and vice versa, which leads to poor binarization. We show two such examples in Figure 3, where our auto-seeding algorithm fails to mark the foreground-background regions appropriately.
In summary, once we obtain the initial seeds, GMMs for the foreground and background colours are learnt. Then, based on the data and smoothness terms in Equations (5) and (6) respectively, the graph is formed. We use the standard graph cut algorithm [15] to obtain an initial binarization result. We then re-estimate the GMMs using this initial binarization result, and iterate the graph cut over the new data and smoothness terms until convergence. This refines the binary image at each iteration and finally produces a clean binary image.
V. RESULTS AND DISCUSSIONS
We use sample images from the ICDAR 2003 Robust Word Recognition dataset [17] for our experiments. It consists of 171 natural scene text images. These images have several degradations due to uneven lighting, complex backgrounds, blur and similar foreground-background colours. To evaluate the performance of the proposed binarization algorithm, we compare it with well-known thresholding based binarization techniques such as Otsu [5], Sauvola [7], Niblack [8] and Kittler et al. [6]. We also compare our binarization algorithm with the colour thresholding based method proposed in [14]. Note that these classical binarization algorithms produce white text on a black background for images with light text on a dark background. In contrast, our binarization algorithm works in an object segmentation framework and thus always produces black text on a white background. For a fair comparison, we reverse the colours of the binarized output of the classical methods when they produce white text on a black background.
For the proposed binarization algorithm we used 10 GMM components (5 each for foreground and background). We empirically set the number of graph cut iterations to 8, since no significant change in the binarization was observed beyond 8 iterations. We also show our results with and without the edginess difference in the pairwise term. (By the edginess difference term we mean the energy function with the gradient magnitude difference in addition to the difference in RGB colour space.) For parameter sensitive algorithms like [7] and [8] we use the parameters with which we obtain the best OCR accuracy.

The proposed method is implemented using the C++ graph cut code of [15] and Matlab. It takes 32 seconds on average to produce the final binary result for an image on a system with 2 GB RAM and an Intel Core 2 Duo 2.93 GHz processor.
A. Qualitative evaluation

First, we compare the proposed binarization algorithm with thresholding based methods qualitatively in Figure 4. Samples of images with uneven lighting, hardly distinguishable foreground/background colours and noisy foreground colours are shown in this figure. We observe that our approach produces clearly readable binary images. Further, our algorithm produces less noise compared to local thresholding based algorithms like [7], [8], which also helps to improve the OCR accuracy.
B. Quantitative evaluation

Quantitative evaluation of binarization is one of the biggest challenges for the document image community [9]. In this work, we demonstrate the performance of binarization not only in terms of OCR accuracy but also in terms of pixel level accuracy.

1) OCR accuracy: We test OCR accuracy to verify the robustness of our algorithm. For this we feed the binarization results of all the algorithms to the commercial OCR engine ABBYY FineReader 9.0 [18]. The word and character recognition accuracies are summarized in Table I. Since this dataset consists of images with tight word boundaries, global methods (like [5], [6]) perform better than popular local methods. Furthermore, OCR fails to perform well on noisy binarization output (as in the case of Sauvola and Niblack). Otsu followed by the colour thresholding binarization proposed in [14] improves the word recognition accuracy, but not significantly. However, since the proposed algorithm produces clean binary images, it shows significant improvement in OCR accuracy.

2) Pixel level accuracy: For comparing various binarization algorithms based on pixel accuracy, we picked 30 images from the ICDAR 2003 word dataset and produced pixel level binarization ground truth for them. These images

Citations
01 Jan 2011
TL;DR: A new benchmark dataset for research use is introduced containing over 600,000 labeled digits cropped from Street View images, and variants of two recently proposed unsupervised feature learning methods are employed, finding that they are convincingly superior on benchmarks.
Abstract: Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machine learning researchers and are virtually solved for practical applications like reading handwritten digits. Reliably recognizing characters in more complex scenes like photographs, however, is far more difficult: the best existing methods lag well behind human performance on the same tasks. In this paper we attack the problem of recognizing digits in a real application using unsupervised feature learning methods: reading house numbers from street level photos. To this end, we introduce a new benchmark dataset for research use containing over 600,000 labeled digits cropped from Street View images. We then demonstrate the difficulty of recognizing these digits when the problem is approached with hand-designed features. Finally, we employ variants of two recently proposed unsupervised feature learning methods and find that they are convincingly superior on our benchmarks.

5,311 citations


Cites methods from "An MRF Model for Binarization of Na..."

  • ...Finally, we note that in prior work binarization has been an important component in scene text applications, driven partly by efforts to re-use existing OCR machinery in new domains [24, 25]....


Proceedings ArticleDOI
07 Sep 2009
TL;DR: A framework is presented that uses a higher order prior computed from an English dictionary to recognize a word, which may or may not be a part of the dictionary, and achieves significant improvement in word recognition accuracies without using a restricted word list.
Abstract: The problem of recognizing text in images taken in the wild has gained significant attention from the computer vision community in recent years. Contrary to recognition of printed documents, recognizing scene text is a challenging problem. We focus on the problem of recognizing text extracted from natural scene images and the web. Significant attempts have been made to address this problem in the recent past. However, many of these works benefit from the availability of strong context, which naturally limits their applicability. In this work we present a framework that uses a higher order prior computed from an English dictionary to recognize a word, which may or may not be a part of the dictionary. We show experimental results on publicly available datasets. Furthermore, we introduce a large challenging word dataset with five thousand words to evaluate various steps of our method exhaustively. The main contributions of this work are: (1) We present a framework, which incorporates higher order statistical language models to recognize words in an unconstrained manner (i.e. we overcome the need for restricted word lists, and instead use an English dictionary to compute the priors). (2) We achieve significant improvement (more than 20%) in word recognition accuracies without using a restricted word list. (3) We introduce a large word recognition dataset (atleast 5 times larger than other public datasets) with character level annotation and benchmark it.

789 citations

Journal ArticleDOI
TL;DR: This review provides a fundamental comparison and analysis of the remaining problems in the field and summarizes the fundamental problems and enumerates factors that should be considered when addressing these problems.
Abstract: This paper analyzes, compares, and contrasts technical challenges, methods, and the performance of text detection and recognition research in color imagery. It summarizes the fundamental problems and enumerates factors that should be considered when addressing these problems. Existing techniques are categorized as either stepwise or integrated, and sub-problems are highlighted including text localization, verification, segmentation and recognition. Special issues associated with the enhancement of degraded text and the processing of video text, multi-oriented, perspectively distorted and multilingual text are also addressed. The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared. This review provides a fundamental comparison and analysis of the remaining problems in the field.

709 citations


Cites methods from "An MRF Model for Binarization of Na..."


  • ...Mishra et al. [161] presented a framework that utilizes both bottom-up (character) and top-down (language) cues for text recognition....


  • ...Inspired by the success of CRF models for solving image segmentation problems, Mishra [135] and Kim and Lee [185] formulated the text binarization problem in optimal frameworks and used an energy minimization to label text pixels....


Proceedings ArticleDOI
23 Jun 2014
TL;DR: This paper proposes a novel multi-scale representation for scene text recognition that consists of a set of detectable primitives, termed as strokelets, which capture the essential substructures of characters at different granularities.
Abstract: Driven by the wide range of applications, scene text detection and recognition have become active research topics in computer vision. Though extensively studied, localizing and reading text in uncontrolled environments remain extremely challenging, due to various interference factors. In this paper, we propose a novel multi-scale representation for scene text recognition. This representation consists of a set of detectable primitives, termed as strokelets, which capture the essential substructures of characters at different granularities. Strokelets possess four distinctive advantages: (1) Usability: automatically learned from bounding box labels, (2) Robustness: insensitive to interference factors, (3) Generality: applicable to variant languages, and (4) Expressivity: effective at describing characters. Extensive experiments on standard benchmarks verify the advantages of strokelets and demonstrate the effectiveness of the proposed algorithm for text recognition.

303 citations


Cites methods from "An MRF Model for Binarization of Na..."

  • ...We intentionally avoid the term “character detection” as certain algorithms (such as [17, 29]) utilize binarization to seek character candidates....


  • ...However, binarization based methods [17, 29] are sensitive to noise, blur and nonuniform illumination; connected component based methods [21, 23] are unable to handle connected characters and...


  • ...To tackle these issues, several approaches were proposed, which employed adaptive binarization [17, 29], connected component extraction [21, 23] or direct character detection [27, 18, 25]....


Journal ArticleDOI
TL;DR: Jiang et al. as mentioned in this paper summarized and analyzed the major changes and significant progresses of scene text detection and recognition in the deep learning era, highlighting recent techniques and benchmarks, and looking ahead into future trends.
Abstract: With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has been inevitably influenced by this wave of revolution, consequentially entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, methodology and performance. This survey is aimed at summarizing and analyzing the major changes and significant progresses of scene text detection and recognition in the deep learning era. Through this article, we devote to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead into future trends. Specifically, we will emphasize the dramatic differences brought by deep learning and remaining grand challenges. We expect that this review paper would serve as a reference book for researchers in this field. Related resources are also collected in our Github repository ( https://github.com/Jyouhou/SceneTextPapers ).

243 citations

References
Proceedings ArticleDOI
26 Jul 2009
TL;DR: This paper proposes a document image binarization method, which is especially robust to the images degraded by uneven light condition, such as the camera captured document images, and uses a descriptor that captures the regional properties around a given pixel.
Abstract: This paper proposes a document image binarization method, which is especially robust to the images degraded by uneven light condition, such as the camera captured document images. A descriptor that captures the regional properties around a given pixel is first defined for this purpose. For each pixel, the descriptor is defined as a vector composed of filter responses with varying length. This descriptor is shown to give highly discriminating pattern with respect to the background region, text region, and near text region. Of course there are misclassified pixels, which are then relabeled using an energy optimization method, specifically by using the graph cut method. For this, we devise an appropriate energy function that leads to clear and correct binarization. The proposed descriptor is also used for the skew detection, and thus correcting the skewed documents.

33 citations


"An MRF Model for Binarization of Na..." refers methods in this paper

  • ...In [11], the authors classified the document into Text Regions (TR), Near Text Regions (NTR) and Background Regions (BR), and then applied graph cut to produce the final binary image....

Proceedings ArticleDOI
12 Dec 2010
TL;DR: A novel Markov random fields based binarization algorithm is proposed to segment foreground text from document images captured using hand-held devices (such as cell-phone or digital camera) and outperforms other state-of-the-art approaches.
Abstract: In this paper, a novel Markov random fields (MRF) based binarization algorithm is proposed to segment foreground text from document images captured using hand-held devices (such as cell-phone or digital camera). In the MRF based framework, an edge potential feature is extracted to preserve the strokes of foreground text and to remove isolated noise and an intensity feature is used to smooth the entire document image. Prior to binarization, we use a nonlinear function to enhance the quality of document images which suffer from insufficient or uneven illumination. Experimental results show that our method outperforms other state-of-the-art approaches.

19 citations


"An MRF Model for Binarization of Na..." refers methods in this paper

  • ...MRF-based binarization for document images captured with hand-held devices was proposed in [12], where the authors first used a thresholding-based technique to produce a binary image and then applied graph cuts to remove noise and smooth the binarization output....

Frequently Asked Questions (12)
Q1. What are the contributions mentioned in the paper "An mrf model for binarization of natural scene text" ?

Inspired by the success of MRF models for solving object segmentation problems, the authors formulate the binarization problem in this framework. The authors represent the pixels in a document image as random variables in an MRF, and introduce a new energy ( or cost ) function on these variables. The authors show results on word images from the challenging ICDAR 2003 dataset, and compare their performance with previously reported methods. Their approach shows significant improvement in pixel level accuracy as well as OCR accuracy. 

Iterative graph cut based binarization is also more suitable for their application, as it refines the seeds and the binarization output at each iteration, and thus produces a clean binarization result even in the case of noisy foreground/background distributions.

Due to the introduction of GMMs, the energy function in Equation (1) becomes E(x, k, θ, z) = Ei(x, k, θ, z) + Eij(x, z) (Equation 4), i.e. the data term now depends on each pixel's assignment to a GMM component.

The proposed method takes 32 seconds on average to produce the final binary result for an image, on a system with 2 GB RAM and an Intel® Core™ 2 Duo 2.93 GHz processor.

(Note that by the edginess difference term the authors mean an energy function that uses the gradient magnitude difference in addition to the difference in RGB colour space.)

The smoothness term most commonly used in the literature is the contrast-sensitive Potts model: Eij(x, z) = λ Σ_{(i,j)∈N} [xi ≠ xj] exp(−(zi − zj)² / (2β²)) / dist(i, j), where λ determines the degree of smoothness, N is the set of neighbouring pixel pairs, and dist(i, j) is the Euclidean distance between neighbouring pixels i and j.
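
As a rough illustration (not the paper's code), the contrast-sensitive Potts weights over 4-connected neighbours can be sketched in numpy; the names `potts_pairwise_cost`, `lam` and `beta` are assumptions, and `beta` defaults to a common data-driven choice rather than the paper's exact setting:

```python
import numpy as np

def potts_pairwise_cost(z, lam=50.0, beta=None):
    """Contrast-sensitive Potts weights for horizontal/vertical neighbours.

    z: H x W greyscale image (float). Returns, per neighbour pair, the cost
    paid when the two pixels of that pair take different labels.
    """
    z = z.astype(np.float64)
    # Intensity differences between 4-connected neighbours.
    dh = z[:, 1:] - z[:, :-1]          # horizontal neighbour pairs
    dv = z[1:, :] - z[:-1, :]          # vertical neighbour pairs
    if beta is None:
        # A common heuristic: beta from the mean squared neighbour difference.
        beta = np.sqrt(np.mean(np.concatenate([dh.ravel(), dv.ravel()]) ** 2)) + 1e-8
    # dist(i, j) = 1 for 4-connected neighbours, so it is omitted here.
    wh = lam * np.exp(-dh ** 2 / (2.0 * beta ** 2))
    wv = lam * np.exp(-dv ** 2 / (2.0 * beta ** 2))
    return wh, wv
```

In a uniform region the exponential is close to 1, so label changes there cost the full λ; across strong intensity edges the cost shrinks, which is what lets the cut follow object boundaries.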

For every such edge pixel p, the authors traverse the edge image in the direction of θ until they hit an edge pixel q whose gradient orientation is (π − θ) ± π/36 (i.e. approximately the opposite gradient direction).

In order to make the energy function robust to low-contrast colour images, the authors modify the smoothness term by adding a new term that measures the “edginess” of the pixels, as follows: Eij(x, z) = λ1 Σ_{(i,j)∈N} [xi ≠ xj] exp(−β||zi − zj||²) + λ2 Σ_{(i,j)∈N} [xi ≠ xj] exp(−β||wi − wj||²) (Equation 5).
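
As a hedged sketch of this modified term (again, not the authors' code), the per-pixel edginess w can be approximated by the gradient magnitude, and the two exponential terms combined per neighbour pair; `lam1`, `lam2` and `beta` are assumed parameter names with arbitrary defaults:

```python
import numpy as np

def edge_aware_pairwise_cost(z, lam1=25.0, lam2=25.0, beta=0.05):
    """Smoothness weights combining a colour difference with an 'edginess'
    (gradient-magnitude) difference between 4-connected neighbours.

    z: H x W greyscale image; for colour input the squared differences
    would be summed over the channel axis instead.
    """
    z = z.astype(np.float64)
    # Gradient magnitude as a simple proxy for the per-pixel edginess w.
    gy, gx = np.gradient(z)
    w = np.sqrt(gx ** 2 + gy ** 2)

    def sq_diffs(a):
        return (a[:, 1:] - a[:, :-1]) ** 2, (a[1:, :] - a[:-1, :]) ** 2

    zh, zv = sq_diffs(z)   # squared colour differences
    eh, ev = sq_diffs(w)   # squared edginess differences
    # Cost paid only across pairs whose endpoints take different labels.
    cost_h = lam1 * np.exp(-beta * zh) + lam2 * np.exp(-beta * eh)
    cost_v = lam1 * np.exp(-beta * zv) + lam2 * np.exp(-beta * ev)
    return cost_h, cost_v
```

The second term lowers the cost of a label change wherever the gradient magnitude differs sharply between neighbours, so cuts are still drawn along weak (low-contrast) edges that the colour term alone would smooth over.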

ITERATIVE GRAPH CUT BASED BINARIZATION

In the GMMRF framework [4], each pixel colour is generated from one of 2c Gaussian Mixture Model (GMM) components (c components each for foreground and background) with mean μ and covariance Σ, i.e. each pixel colour is generated from the distribution p(zi | xi, θ, ki) = N(zi; μ(xi, ki), Σ(xi, ki)) (Equation 3), where N denotes a Gaussian distribution, xi ∈ {0, 1} and ki ∈ {1, ..., c}.
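
A minimal numpy sketch of the resulting data term and the hard component assignment used in GrabCut-style iterations; the function names and the omission of mixture weights are simplifying assumptions, not details from the paper:

```python
import numpy as np

def gmm_data_term(z, mu, sigma):
    """Negative log-likelihood of pixel colours under one Gaussian component.

    z: N x 3 pixel colours; mu: 3-vector mean; sigma: 3 x 3 covariance.
    The data term Ei for a pixel with label x and component k would be this
    value evaluated at (mu(x, k), Sigma(x, k)).
    """
    d = z - mu
    inv = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)
    maha = np.einsum('ni,ij,nj->n', d, inv, d)  # squared Mahalanobis distances
    # -log N(z; mu, Sigma), ignoring any mixture weight.
    return 0.5 * (maha + logdet + 3 * np.log(2 * np.pi))

def assign_component(z, mus, sigmas):
    """Pick, per pixel, the component k with the lowest data term
    (hard assignment, as in GrabCut-style alternation)."""
    costs = np.stack([gmm_data_term(z, m, s) for m, s in zip(mus, sigmas)])
    return costs.argmin(axis=0)
```

Alternating this assignment with parameter re-estimation and a graph cut over the resulting energy is what the iterative scheme amounts to.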

Although most of these previous algorithms perform satisfactorily in many cases, they suffer from problems such as: (1) manual tuning of parameters, (2) high sensitivity to the choice of parameters, and (3) difficulty handling images with uneven lighting, noisy backgrounds, or similar foreground and background colours.

The authors then re-estimate the GMMs using the initial binarization result and iterate the graph cut with the updated data and smoothness terms until convergence.

But these methods lack a principled formulation of the binarization problem for complex colour documents, and hence cannot be generalized.