
An MRF Model for Binarization of Natural Scene Text

18 Sep 2011, pp. 11-16



HAL Id: hal-00817972
https://hal.inria.fr/hal-00817972
Submitted on 17 Oct 2013
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
To cite this version:
Anand Mishra, Karteek Alahari, C.V. Jawahar. An MRF Model for Binarization of Natural Scene Text. ICDAR - International Conference on Document Analysis and Recognition, Sep 2011, Beijing, China. ⟨10.1109/ICDAR.2011.12⟩. ⟨hal-00817972⟩

An MRF Model for Binarization of Natural Scene Text
Anand Mishra, Karteek Alahari and C.V. Jawahar
International Institute of Information Technology Hyderabad, India
INRIA - Willow, ENS, Paris, France
Email: anand.mishra@research.iiit.ac.in, karteek.alahari@ens.fr, jawahar@iiit.ac.in
Abstract—Inspired by the success of MRF models for solving object segmentation problems, we formulate the binarization problem in this framework. We represent the pixels in a document image as random variables in an MRF, and introduce a new energy (or cost) function on these variables. Each variable takes a foreground or background label, and the quality of the binarization (or labelling) is determined by the value of the energy function. We minimize the energy function, i.e. find the optimal binarization, using an iterative graph cut scheme. Our model is robust to variations in foreground and background colours as we use a Gaussian Mixture Model in the energy function. In addition, our algorithm is efficient to compute, and adapts to a variety of document images. We show results on word images from the challenging ICDAR 2003 dataset, and compare our performance with previously reported methods. Our approach shows significant improvement in pixel-level accuracy as well as OCR accuracy.

Keywords-MRF, GMM, Graph Cut, Binarization
I. INTRODUCTION
Binarization is one of the key preprocessing steps in any document image analysis system. The performance of subsequent steps like character segmentation and recognition is highly dependent on the success of binarization. Document image binarization has been an active area of research for many years. Is binarization a solved problem? Clearly not, especially given the emerging need to recognize text in video sequences, digital-born (Web and email) images, old historic manuscripts and natural scenes, where state-of-the-art recognition performance remains poor. In this regard, designing a powerful binarization algorithm can be considered a major step towards robust text understanding. The community's recent interest, reflected in the organisation of binarization contests such as DIBCO 2009 [1] at the 10th International Conference on Document Analysis and Recognition (ICDAR 2009), also supports this claim. Note that DIBCO 2009 received 43 submissions, which shows active interest in this research area.
In this work, we focus on binarization of natural scene text. Natural scene text contains numerous degradations not usually present in machine-printed text, such as uneven lighting, blur, complex backgrounds, perspective distortion and multiple colours. Methods such as interactive graph cut by Boykov et al. [2] and, thereafter, GrabCut [3] have shown promising performance in foreground/background segmentation of natural scenes in recent years. We formulate

Figure 1. Some sample images we considered in this work

the binarization problem in this framework (where text is foreground and anything else is background), and define a novel energy (cost) function such that the quality of the binarization is determined by the energy value. We minimize this energy function to find the optimal binarization using an iterative graph cut scheme. The graph cut method needs to be initialized with foreground/background seeds. To make the binarization fully automatic, we obtain initial seeds for graph cuts with our auto-seeding algorithm. At each iteration of graph cut, the seeds and the binarization are refined. This makes our method more powerful than a one-shot graph cut algorithm. Moreover, we model foreground and background colours in a GMMRF framework [4] to make the binarization robust to variations in foreground and background colours.
The remainder of the paper is organised as follows. We discuss related work in Section II. In Section III, the binarization problem is formulated as a labelling problem, where we define an energy function such that its minimum corresponds to the target binary image. This section also briefly introduces the graph cut method. Section IV explains the proposed iterative graph cut based binarization scheme. It also elaborates the method of finding auto-seeds for the graph cut. Section V describes experiments and results on the challenging ICDAR 2003 word dataset. Some sample images from this dataset are shown in Figure 1. We finally conclude the work in Section VI.
II. RELATED WORK
Traditional thresholding based binarization can be divided into two categories: methods that use a global threshold for the given document (like Otsu [5] and Kittler et al. [6]) and methods that use local thresholds (like Sauvola [7] and Niblack [8]). An exhaustive review of thresholding based binarization is beyond the scope of this paper; the reader is encouraged to see [9]. Although most of these previous algorithms perform satisfactorily in many cases, they suffer from problems such as: (1) manual tuning of parameters, (2) high sensitivity to the choice of parameters, and (3) difficulty in handling images with uneven lighting, noisy backgrounds, or similar foreground-background colours.
Recently, Markov Random Field (MRF) based binarization has been applied to degraded documents. In [10], Wolf et al. proposed binarization in an energy minimization framework and applied simulated annealing (SA), which is less powerful and computationally expensive, for energy minimization. In [11], the authors classified the document into Text Region (TR), Near Text Region (NTR) and Background Region (BR), and then applied graph cut to produce the final binary image. MRF based binarization for document images captured with hand-held devices was proposed in [12], where the authors first used a thresholding based technique to produce a binary image and then applied graph cuts to remove noise and smooth the binarization output. However, these methods cannot be directly applied to natural scene text images due to additional challenges like blur, hardly distinguishable foreground/background colours, and variable font sizes and styles.

Researchers have also shown interest in colour image binarization in recent years (see [13], [14]). But these methods lack a principled formulation of the binarization problem for complex colour documents, and hence cannot be generalized.
III. THE BINARIZATION PROBLEM
We define the binarization problem in a labelling framework as follows: the binarization of an image can be expressed as a vector of binary random variables X = {X_1, X_2, ..., X_n}, where each random variable X_i takes a label x_i ∈ {0, 1} based on whether it is text (foreground) or non-text (background). Most heuristic based algorithms decide whether to assign label 0 or 1 to x_i based on the pixel value at that position or on local statistics. Such algorithms are not effective in our case because of the variations in foreground/background colour distributions.

In this work, we formulate the problem in a more principled framework where we represent image pixels as nodes in a Markov Random Field and associate a unary and a pairwise cost with labelling pixels. We then solve the problem in an energy minimization framework where a "Gibbs" energy function E of the following form is defined:

E(x, θ, z) = E_i(x, θ, z) + E_ij(x, z),   (1)

such that its minimum corresponds to the target binary image. Here x = {x_1, x_2, ..., x_n} is the set of labels at each pixel, θ is the set of model parameters learnt from the foreground/background colour distributions, and the vector z = {z_1, z_2, ..., z_n} denotes the colour intensities of the pixels.
In Equation (1), E_i(·) and E_ij(·) correspond to the data term and the smoothness term respectively. The data term E_i(·) measures the degree of agreement of the inferred label x_i with the observed image data z_i. The smoothness term measures the cost of assigning labels x_i, x_j to adjacent pixels and is used to impose spatial smoothness. A typical unary term can be expressed as:

E_i(x, θ, z) = −Σ_i log p(x_i | z_i).

Similarly, the smoothness term most commonly used in the literature is the Potts model:

E_ij(x, z) = λ Σ_{(i,j)∈N} exp(−(z_i − z_j)² / 2β²) [x_i ≠ x_j] / dist(i, j),

where λ determines the degree of smoothness, and dist(i, j) is the Euclidean distance between neighbouring pixels i and j. The constant β allows discontinuity preserving smoothing, and N denotes the neighbourhood system defined on the MRF. Further, the smoothness term imposes a cost only for those adjacent pixels which have different labels (i.e. [x_i ≠ x_j]).
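To make Equation (1) concrete, the Gibbs energy of a candidate labelling can be evaluated directly as the sum of unary costs and Potts penalties; the 4-pixel image and all costs below are made up purely for illustration:

```python
def gibbs_energy(labels, unary, pairwise):
    """Evaluate E(x) of Eq. (1): unary costs plus Potts penalties,
    the latter charged only where neighbouring labels disagree."""
    e = sum(unary[i][labels[i]] for i in range(len(labels)))
    e += sum(w for (i, j), w in pairwise.items() if labels[i] != labels[j])
    return e

# Toy 4-pixel image: unary[i] = (cost of background, cost of foreground);
# pairwise maps neighbouring pixel pairs to their Potts weight.
unary = [(5, 1), (5, 1), (1, 5), (1, 5)]
pairwise = {(0, 1): 2, (1, 2): 2, (2, 3): 2}

print(gibbs_energy([1, 1, 0, 0], unary, pairwise))  # 4 + 2 = 6
print(gibbs_energy([1, 1, 1, 1], unary, pairwise))  # 1 + 1 + 5 + 5 = 12
```

The labelling [1, 1, 0, 0] pays four cheap unary costs plus a single Potts penalty at its one label boundary; the binarization task is to find the labelling minimizing this sum.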
The problem of binarization is now to find the global minimum of the Gibbs energy, i.e.,

x* = argmin_x E(x, θ, z).   (2)

The global minimum of this energy function can be computed efficiently by graph cut [15], subject to the energy fulfilling the submodularity criterion [16]. For this, a weighted graph G = (V, E) is formed where each vertex corresponds to an image pixel, and edges link adjacent pixels. Two additional vertices, the source (s) and the sink (t), are added to the graph, and all the other vertices are connected to them with weighted edges. The weights of all the edges are defined in such a way that every cut of the graph is equivalent to some label assignment of the energy function. Note that a cut of the graph G is a partition of the set of vertices V into two disjoint sets S and T, and the cost of the cut is defined as the sum of the weights of the edges going from vertices in S to vertices in T (see [16]). The min cut of such a graph corresponds to the global minimum of the energy function, and efficient implementations are available for finding it [15].
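As an illustrative sketch of this construction (not the efficient implementation of [15]), the following builds the s-t graph for a toy 4-pixel image with made-up unary costs and Potts weights, and reads the labelling off the minimum cut, computed here with a simple Edmonds-Karp max-flow:

```python
from collections import defaultdict, deque

def add_edge(cap, adj, u, v, c):
    """Directed edge u -> v with capacity c (plus a 0-capacity reverse edge)."""
    cap[(u, v)] = cap.get((u, v), 0) + c
    cap[(v, u)] = cap.get((v, u), 0)
    adj[u].add(v)
    adj[v].add(u)

def min_cut(cap, adj, s, t):
    """Edmonds-Karp max-flow; returns (cut value, source-side vertex set)."""
    flow, total = defaultdict(int), 0
    while True:
        parent, q = {s: None}, deque([s])   # BFS for an augmenting path
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] - flow[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:                 # no path left: BFS-reachable = S
            return total, set(parent)
        b, v = float("inf"), t              # bottleneck residual capacity
        while parent[v] is not None:
            b = min(b, cap[(parent[v], v)] - flow[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:        # push flow along the path
            flow[(parent[v], v)] += b
            flow[(v, parent[v])] -= b
            v = parent[v]
        total += b

# Toy 4-pixel image. Unary costs: E_bg[i] is paid if pixel i is labelled
# background (sink side), E_fg[i] if labelled foreground (source side).
E_bg, E_fg = [5, 5, 1, 1], [1, 1, 5, 5]
cap, adj = {}, defaultdict(set)
for i in range(4):
    add_edge(cap, adj, "s", i, E_bg[i])   # t-link, cut when i ends up in T
    add_edge(cap, adj, i, "t", E_fg[i])   # t-link, cut when i ends up in S
for i, j in [(0, 1), (1, 2), (2, 3)]:     # n-links: Potts weight 2
    add_edge(cap, adj, i, j, 2)
    add_edge(cap, adj, j, i, 2)

value, S = min_cut(cap, adj, "s", "t")
labels = [1 if i in S else 0 for i in range(4)]  # source side = foreground
print(value, labels)  # cut value equals the minimum energy
```

By construction, every s-t cut's cost equals the energy of the corresponding labelling, so the minimum cut (value 6 here) yields the optimal labelling [1, 1, 0, 0].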
In [2], the set of model parameters θ describes the image foreground/background histograms. The histograms are constructed directly from foreground/background seeds obtained through user interaction. However, the foreground/background distributions in our case (see the images in Figure 1) cannot be captured effectively by a naive histogram distribution. Rather, we assume each pixel colour is generated from a Gaussian Mixture Model (GMM). In this regard, we are strongly inspired by the success of GrabCut [3] for object segmentation. At the same time, we want to avoid any user interaction to make the binarization fully automatic. We achieve this with our auto-seeding algorithm, which we describe in Section IV-A. Furthermore, iterative graph cut based binarization is also more suitable for our application as it refines the seeds and the binarization output at each iteration, and thus produces a clean binarization result even in the case of noisy foreground/background distributions.
IV. ITERATIVE GRAPH CUT BASED BINARIZATION
In the GMMRF framework [4], each pixel colour is generated from one of 2c Gaussian Mixture Models (GMMs) (c GMMs each for foreground and background) with mean μ and covariance Σ, i.e. each foreground colour pixel is generated from the following distribution:

p(z_i | x_i, θ, k_i) = N(z_i; μ(x_i, k_i), Σ(x_i, k_i)),   (3)

where N denotes a Gaussian distribution, x_i ∈ {0, 1} and k_i ∈ {1, ..., c}. To model the foreground colour using the above distribution, an additional vector k = {k_1, k_2, ..., k_n} is introduced, where each k_i takes one of the c GMM components. Similarly, the background colour is modelled by one of the c GMM components. Further, the likelihoods of the observations can be assumed to be independent across pixel positions, and can thus be expressed as:

p(z | x, θ, k) = Π_i p(z_i | x_i, θ, k_i)
             = Π_i [π(x_i, k_i) / √det(Σ(x_i, k_i))] × exp(−½ (z_i − μ(x_i, k_i))^T Σ(x_i, k_i)^{−1} (z_i − μ(x_i, k_i))).
Here π(·) is the Gaussian mixture weighting coefficient.
Due to the introduction of the GMMs, the energy function in Equation (1) now becomes:

E(x, k, θ, z) = E_i(x, k, θ, z) + E_ij(x, z),   (4)

i.e. the data term depends on the assignment of each pixel to a GMM component. It is given by:

E_i(x, k, θ, z) = −Σ_i log p(z_i | x_i, θ, k_i).   (5)
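To make the data term concrete, here is a minimal sketch under simplifying assumptions: greyscale (1-D) pixels instead of colour vectors, hypothetical GMM parameters standing in for those learnt from seeds, and k_i set to the component that fits best:

```python
import math

def gauss(z, mu, var):
    """1-D Gaussian density: a greyscale stand-in for N(z; mu, Sigma)."""
    return math.exp(-0.5 * (z - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def data_term(z, gmm):
    """-log p(z | x, theta, k), with k chosen as the best-fitting component."""
    return min(-math.log(w * gauss(z, mu, var)) for w, mu, var in gmm)

# Hypothetical GMMs "learnt from seeds": (weight, mean, variance) triples.
fg_gmm = [(0.6, 30.0, 100.0), (0.4, 50.0, 200.0)]    # dark text colours
bg_gmm = [(0.7, 200.0, 150.0), (0.3, 160.0, 300.0)]  # bright background

z = 35.0  # observed grey value of one pixel
cost_fg, cost_bg = data_term(z, fg_gmm), data_term(z, bg_gmm)
print(cost_fg < cost_bg)  # a dark pixel is cheaper to label foreground
```

A multi-modal foreground (e.g. two text colours) simply becomes two components with comparable weights, which is what makes the GMM data term more robust than a single histogram.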
In order to make the energy function robust to low-contrast colour images, we modify the smoothness term of the energy function by adding a new term which measures the "edginess" of the pixels, as follows:

E_ij(x, z) = λ_1 Σ_{(i,j)∈N} [x_i ≠ x_j] exp(−β ||z_i − z_j||²)
           + λ_2 Σ_{(i,j)∈N} [x_i ≠ x_j] exp(−β ||w_i − w_j||²).   (6)

Here w_i denotes the magnitude of the gradient (edginess) at pixel i, and N denotes the neighbourhood system defined for the MRF model. Two neighbouring pixels with similar edginess values are more likely to belong to the same class; the edginess term enforces this constraint. The constants λ_1 and λ_2 determine the relative strength of the colour and edginess differences respectively. The parameters λ_i and β are learnt automatically from the image.
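The modified smoothness term can be sketched per neighbouring pair. The λ_1, λ_2 and β values below are placeholders (the paper learns them from the image), and scalar grey values stand in for the colour vectors of Equation (6):

```python
import math

def pairwise_weight(z_i, z_j, w_i, w_j, lam1=50.0, lam2=25.0, beta=0.005):
    """Penalty charged when neighbours i, j take different labels (Eq. 6)."""
    colour = lam1 * math.exp(-beta * (z_i - z_j) ** 2)    # colour difference
    edginess = lam2 * math.exp(-beta * (w_i - w_j) ** 2)  # gradient difference
    return colour + edginess

# Similar colour and similar edginess => expensive to split; across a strong
# edge both terms decay, so placing the label boundary there is nearly free.
smooth = pairwise_weight(120, 122, 5.0, 6.0)   # interior of a uniform region
edge = pairwise_weight(120, 230, 5.0, 90.0)    # across a character boundary
print(smooth > edge)
```

The second term is what helps in low-contrast images: even when the colour difference is small, a jump in gradient magnitude still lowers the cost of a label boundary.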
The Gaussian Mixture Models in Equation (5) need to be initialized with foreground/background seeds. Since our objective is to make the binarization fully automatic, we initialize the GMMs with foreground/background seeds obtained from our auto-seeding algorithm. Then, at each iteration, the seeds are refined and new GMMs are learnt from them. This makes the algorithm more powerful and allows it to adapt to variations in the foreground/background.
A. Auto-seeding
To perform automatic binarization we need to compute foreground and background seeds for the graph cut. Given an image, we first convert it to an edge image using the Canny edge operator and then find the foreground and background seeds as follows:

1) Foreground seeds: Our foreground seeding algorithm is motivated by the fact that every edge curve (line) in a character has a parallel counterpart, i.e. if an edge pixel has gradient orientation θ, then in the direction of θ there exists an edge pixel whose gradient orientation is π − θ.

Step 1: Let p be a non-traversed edge pixel with gradient orientation θ. For every such edge pixel p, we traverse the edge image in the direction of θ until we hit an edge pixel q whose gradient orientation is (π − θ) ± π/36 (i.e. approximately the opposite gradient direction). We mark this line segment pq as a foreground seed candidate and store its length. We repeat this process for all non-traversed edge pixels. After finding all foreground seed candidates, we remove all those line segments whose length is too large or too small with respect to the majority of seed candidates. The remaining line segments are marked as foreground seeds.

Step 2: Handling images with light text on a dark background: For such images we rarely obtain parallel edge curves (lines) with the above traversal; rather, many line segments pq start hitting the image boundary. We automatically detect such situations, subtract π from the original orientation, and then follow the same process as in Step 1.
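A minimal sketch of Step 1, reduced to a single scanline (so the gradient direction is just left or right) with made-up edge data; the actual algorithm traverses a 2-D edge image along arbitrary orientations:

```python
import math

def foreground_seeds(edges, orient, tol=math.pi / 36):
    """For each untraversed edge pixel, walk along its gradient direction
    until an edge pixel with roughly opposite orientation is found; the
    segment between the pair is a foreground seed candidate."""
    seeds, traversed = [], set()
    for p in range(len(edges)):
        if p in traversed or not edges[p]:
            continue
        step = 1 if math.cos(orient[p]) > 0 else -1  # 1-D: only left/right
        q = p + step
        while 0 <= q < len(edges):
            if edges[q] and abs(abs(orient[p] - orient[q]) - math.pi) <= tol:
                seeds.append((min(p, q), max(p, q)))  # segment pq
                traversed.update((p, q))
                break
            q += step
    return seeds

# One scanline of a dark stroke on a bright background: opposing gradient
# orientations (0 and pi) on the stroke's two sides, columns 2 and 6.
edges = [0, 0, 1, 0, 0, 0, 1, 0]
orient = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, math.pi, 0.0]
print(foreground_seeds(edges, orient))  # one seed segment spanning the stroke
```

The stroke-width filter of Step 1 would then discard segments whose length deviates strongly from the majority.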
2) Background seeds: For background seeding we adopt the following scheme: given the edge image, we find the horizontal/vertical lines containing no edge pixel and mark them as background. When this method yields no background seeds, we relax our criterion and mark as background all those regions which are accessible (without hitting an edge pixel) from at least two sides of the image boundary. In practice, for some cases we do not get enough background seeds even after relaxation. For such cases we traverse the edge image from all four sides of the image boundary until we hit an edge, and mark all these regions as background seeds. Figure 2 shows typical initial seeds for the iterative graph cut.
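The first (strictest) background rule can be sketched as follows, on a made-up edge map; the relaxation steps are omitted:

```python
def background_seeds(edge_map):
    """Mark every full row/column of the edge image that contains no
    edge pixel as a background seed (first rule only)."""
    h, w = len(edge_map), len(edge_map[0])
    rows = [r for r in range(h) if not any(edge_map[r])]
    cols = [c for c in range(w) if not any(edge_map[r][c] for r in range(h))]
    return rows, cols

# Made-up 3x4 edge map: a small blob of edge pixels in the middle.
edge_map = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(background_seeds(edge_map))  # edge-free rows 0 and 2; columns 0 and 3
```

On real word images the text usually touches most rows and columns, which is why the relaxed boundary-accessibility rules described above are needed as fallbacks.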

(a) (b)
Figure 2. (a) Input image (b) Its foreground-background seeds; red and blue indicate foreground and background seeds respectively (best viewed in colour).
Figure 3. Images where auto-seeding fails
Although the proposed auto-seeding method performs satisfactorily, it tends to fail in cases where the Canny edge operator produces too many noisy or broken edges. In such cases some foreground regions are falsely marked as background and vice versa, which leads to poor binarization. We show two such examples in Figure 3, where our auto-seeding algorithm fails to mark the foreground-background regions appropriately.
In summary, once we obtain the initial seeds, GMMs for the foreground and background colours are learnt. Then, based on the data and smoothness terms in Equations (5) and (6) respectively, the graph is formed. We use the standard graph cut algorithm [15] to obtain an initial binarization result. We then re-estimate the GMMs using this initial binarization result and iterate the graph cut over the new data and smoothness terms until convergence. This refines the binary image at each iteration and finally produces a clean binary image.
V. RESULTS AND DISCUSSIONS
We use sample images from the ICDAR 2003 Robust Word Recognition dataset [17] for our experiments. It consists of 171 natural scene text images. These images have several degradations due to uneven lighting, complex backgrounds, blur and similar foreground/background colours. To evaluate the performance of the proposed binarization algorithm, we compare it with well-known thresholding based binarization techniques, namely Otsu [5], Sauvola [7], Niblack [8] and Kittler et al. [6]. We also compare our binarization algorithm with the colour thresholding based method proposed in [14]. Note that these classical binarization algorithms produce white text on a black background for images with light text on a dark background. In contrast, our binarization algorithm works in an object segmentation framework and thus always produces black text on a white background. For a fair comparison, we therefore invert the colours of the binarized output of the classical methods when they produce white text on a black background.
For the proposed binarization algorithm we used 10 GMM components (5 each for foreground and background). We empirically set the number of graph cut iterations to 8, since no significant change in the binarization was observed beyond 8 iterations. We also show our results with and without the edginess difference in the pairwise term. (By the edginess difference term we mean the energy function with the gradient magnitude difference in addition to the difference in RGB colour space.) For parameter-sensitive algorithms like [7] and [8] we use the parameters with which we obtain the best OCR accuracy.

All the implementations of the proposed method are done using the C++ graph cut code [15] and Matlab. The proposed method takes 32 seconds on average to produce the final binary result for an image, on a system with 2 GB RAM and an Intel Core 2 Duo CPU at 2.93 GHz.
A. Qualitative evaluation
First, we compare the proposed binarization algorithm with thresholding based methods qualitatively in Figure 4. Samples of images with uneven lighting, hardly distinguishable foreground/background colours and noisy foreground colours are shown in this figure. We observe that our approach produces clearly readable binary images. Further, our algorithm produces less noise than local thresholding based algorithms like [7], [8], which also helps to improve OCR accuracy.
B. Quantitative evaluation
Quantitative evaluation of binarization is one of the biggest challenges for the document image community [9]. In this work, we demonstrate the performance of binarization not only in terms of OCR accuracy but also in terms of pixel-level accuracy.

1) OCR accuracy: We test OCR accuracy to verify the robustness of our algorithm. For this, we feed the binarization results of all algorithms to the commercial OCR engine ABBYY FineReader 9.0 [18]. The word and character recognition accuracies are summarized in Table I. Since this dataset consists of images with tight word boundaries, global methods (like [5], [6]) perform better than popular local methods. Furthermore, OCR fails to perform well on noisy binarization output (as in the case of Sauvola and Niblack). Otsu followed by the colour thresholding binarization proposed in [14] improves the word recognition accuracy, but not significantly. Since the proposed algorithm produces clean binary images, it shows a significant improvement in OCR accuracy.

2) Pixel-level accuracy: To compare the various binarization algorithms based on pixel accuracy, we picked 30 images from the ICDAR 2003 word dataset and produced pixel-level binarization ground truth for them. These images
Citations
More filters
Book ChapterDOI
01 Nov 2014
TL;DR: This paper presents a novel approach to recognize text in scene images that outperforms the state-of-the-art techniques significantly and is able to recognize the whole word images without character-level segmentation and recognition.
Abstract: Scene text recognition is a useful but very challenging task due to uncontrolled condition of text in natural scenes. This paper presents a novel approach to recognize text in scene images. In the proposed technique, a word image is first converted into a sequential column vectors based on Histogram of Oriented Gradient (HOG). The Recurrent Neural Network (RNN) is then adapted to classify the sequential feature vectors into the corresponding word. Compared with most of the existing methods that follow a bottom-up approach to form words by grouping the recognized characters, our proposed method is able to recognize the whole word images without character-level segmentation and recognition. Experiments on a number of publicly available datasets show that the proposed method outperforms the state-of-the-art techniques significantly. In addition, the recognition results on publicly available datasets provide a good benchmark for the future research in this area.

233 citations


Cites methods from "An MRF Model for Binarization of Na..."

  • ...Datsets ICDAR03 (Full) ICDAR03 (50) ICDAR11 (Full) ICDAR11 (50) SVT MRF [5] 0.67 0.69 - - - IR [7] 0.75 0.77 - - - NESP [6] 0.66 - 0.73 - PLEX [16] 0.62 0.76 - - 0.57 HOG + CRF [10] - 0.82 - - 0.73 PBS [9] 0.79 0.87 0.83 0.87 0.74 WFST [11] 0.83 - 0.56 - 0.73 CNN [14] 0.84 0.90 - - 0.70 Proposed 0.82 0.92 0.83 0.91 0.83 ICDAR03(FULL) and ICDAR11(FULL) in Table 1), as well as with lexicon consisting of 50 random words from the test set (as denoted by ICDAR03(50) and ICDAR11(50) in Table 1)....

    [...]

  • ...Several systems have been reported that exploit Markov Random Field [5], Nonlinear color enhancement [6] and Inverse Rendering [7] to extract the character regions....

    [...]

  • ...The text segmentation methods (MRF, IR, and NESP) produce lower recognition accuracy than other methods because robust and accurate scene text segmentation by itself is an very challenging task....

    [...]

  • ...We compare our proposed method with eight state-of-the-art techniques, including markov random field method (MRF) [5], inverse rendering method (IR) [7], nonlinear color enhancement method (NESP) [6], pictorial structure method (PLEX) [16], HOG based conditional random field method (HOG+CRF) [10], weighted finite-state transducers method (WFST) [11], part based tree structure method (PBS) [9] and convolutional neural network method (CNN) [14]....

    [...]

Journal ArticleDOI
Nicholas R. Howe1
TL;DR: An automatic technique for setting parameters in a manner that tunes them to the individual image, yielding a final binarization algorithm that can cut total error by one-third with respect to the baseline version is described.
Abstract: Document analysis systems often begin with binarization as a first processing stage. Although numerous techniques for binarization have been proposed, the results produced can vary in quality and often prove sensitive to the settings of one or more control parameters. This paper examines a promising approach to binarization based upon simple principles, and shows that its success depends most significantly upon the values of two key parameters. It further describes an automatic technique for setting these parameters in a manner that tunes them to the individual image, yielding a final binarization algorithm that can cut total error by one-third with respect to the baseline version. The results of this method advance the state of the art on recent benchmarks.

185 citations


Cites background from "An MRF Model for Binarization of Na..."

  • ...A number of projects have explored the use of Markov random fields (MRF) for binarization [12, 18, 15, 11]....

    [...]

Posted Content
TL;DR: This survey is aimed at summarizing and analyzing the major changes and significant progresses of scene text detection and recognition in the deep learning era.
Abstract: With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has been inescapably influenced by this wave of revolution, consequentially entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, approach and performance. This survey is aimed at summarizing and analyzing the major changes and significant progresses of scene text detection and recognition in the deep learning era. Through this article, we devote to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead into future trends. Specifically, we will emphasize the dramatic differences brought by deep learning and the grand challenges still remained. We expect that this review paper would serve as a reference book for researchers in this field. Related resources are also collected and compiled in our Github repository: this https URL.

155 citations


Cites background from "An MRF Model for Binarization of Na..."

  • ...Variousmethods have been proposed to tackle these sub-problems, which includes text binariza- tion (Zhiwei et al. 2010; Mishra et al. 2011; Wakahara and Kita 2011; Lee and Kim 2013), text line segmentation (Ye et al. 2003), character segmentation (Nomura et al. 2005; Shivakumara et al. 2011; Roy et…...

    [...]

  • ...s for classification. Another discomposed the recognition process into a series of sub-problems. Various methods have been proposed to tackle these sub-problems, which includes text binarization [78], [105], [153], [182], text line segmentation [169], character segmentation [113], [126], [139], single JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, X X 3 Images Stroke Width Map Width Variance Map MSER Comp...

    [...]

Journal ArticleDOI
TL;DR: The Histogram of Oriented Gradient is extended and two new feature descriptors are proposed: Co-occurrence HOG (Co-HOG) and Convolutional Co-Hog (ConvCo- HOG) for accurate recognition of scene texts of different languages.

130 citations


Cites background from "An MRF Model for Binarization of Na..."

  • ...When applied for texts in scene images, the recognition performance of these existing OCR systems is often not satisfactory because such texts could appear in arbitrary size, color, fonts, orientations, lighting, and background as illustrated in Fig....

    [...]

Journal ArticleDOI
TL;DR: This paper proposes a novel scene text recognition technique that performs word level recognition without character segmentation and adapts the recurrent neural network with Long Short Term Memory, the technique that has been widely used for handwriting recognition in recent years.

129 citations


Cites methods from "An MRF Model for Binarization of Na..."

  • ...Text segmentation methods (MRF [3], IR [5], and NESP [4]) produce lower recognition accuracy than other methods because robust and accurate scene text segmentation is a very challenging task....

    [...]

  • ...The compared techniques can be grouped into three categories including 1) Segmentation based techniques (markov random field method (MRF) [3], inverse rendering method (IR) [5], nonlinear color enhancement method (NESP) [4]) that segment the text regions from the word images, 2) Character level recognition techniques (HMM Maxout model (HMM) [9], HOG based conditional random field method (HOGCRF) [12], CNN model (CNN) [11], Part Based Tree structure method (PBS) [14] 1, Clustering sub-patches of characters method (Strokelets) [15], PhotoOCR [10] and Deep CNN Model (DCNN) [16]) that recognize word images through segmentation and integration of character recognition results and 3) Word level recognition techniques (Embedded attributes (AE) [18], Dynamic time warping (DTW) [17], and Whole Word Deep CNN Model(WWDCNN) [19]) that treat each word images as a whole without character segmentation....

    [...]


References
Journal ArticleDOI

37,017 citations


"An MRF Model for Binarization of Na..." refers methods in this paper

  • ...We also compare our method with Otsu followed by colour thresholding (CT) [14]....

    [...]

  • ...Otsu followed by colour thresholding binarization proposed in [14] improves the word recognition accuracy but not significantly....

    [...]

  • ...Traditional thresholding-based binarization falls into two categories: methods that use a global threshold for the given document (like Otsu [5], Kittler et al. [6]) and methods that use local thresholds (like Sauvola [7], Niblack [8])....

    [...]

  • ...To evaluate the performance of the proposed binarization algorithm, we compare it with well-known thresholding-based binarization techniques such as Otsu [5], Sauvola [7], Niblack [8], and Kittler et al. [6]....

    [...]

  • ...Since this dataset consists of images with tight word boundaries, global methods (like [5], [6]) perform better than popular local methods....

    [...]
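The global Otsu method [5] referenced in these excerpts picks a single threshold by maximizing the between-class variance of the intensity histogram. A minimal NumPy sketch (the helper name and toy image are illustrative, not from the paper):

```python
import numpy as np

def otsu_threshold(gray):
    """Global Otsu threshold: maximize between-class variance
    over all candidate thresholds of a greyscale image."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                  # intensity probabilities
    omega = np.cumsum(p)                   # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))     # cumulative mean intensity
    mu_total = mu[-1]
    # between-class variance; guard against 0/0 at empty classes
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

# Bimodal toy image: dark text pixels on a bright background
img = np.array([[30, 30, 200], [40, 220, 210], [35, 205, 215]], dtype=np.uint8)
t = otsu_threshold(img)
binary = img > t   # True = background, False = dark foreground text
```

Being a single global decision, this is exactly what fails on scene text with uneven lighting, which motivates the MRF formulation discussed here.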

Journal ArticleDOI
01 Aug 2004
TL;DR: A more powerful, iterative version of the optimisation of the graph-cut approach is developed and the power of the iterative algorithm is used to simplify substantially the user interaction needed for a given quality of result.
Abstract: The problem of efficient, interactive foreground/background segmentation in still images is of great practical importance in image editing. Classical image segmentation tools use either texture (colour) information, e.g. Magic Wand, or edge (contrast) information, e.g. Intelligent Scissors. Recently, an approach based on optimization by graph-cut has been developed which successfully combines both types of information. In this paper we extend the graph-cut approach in three respects. First, we have developed a more powerful, iterative version of the optimisation. Secondly, the power of the iterative algorithm is used to simplify substantially the user interaction needed for a given quality of result. Thirdly, a robust algorithm for "border matting" has been developed to estimate simultaneously the alpha-matte around an object boundary and the colours of foreground pixels. We show that for moderately difficult examples the proposed method outperforms competitive tools.

5,670 citations

Journal ArticleDOI
TL;DR: This paper compares the running times of several standard min-cut/max-flow algorithms, as well as a recently developed algorithm that works several times faster than any of the other methods, making near real-time performance possible.
Abstract: Minimum cut/maximum flow algorithms on graphs have emerged as an increasingly useful tool for exactor approximate energy minimization in low-level vision. The combinatorial optimization literature provides many min-cut/max-flow algorithms with different polynomial time complexity. Their practical efficiency, however, has to date been studied mainly outside the scope of computer vision. The goal of this paper is to provide an experimental comparison of the efficiency of min-cut/max flow algorithms for applications in vision. We compare the running times of several standard algorithms, as well as a new algorithm that we have recently developed. The algorithms we study include both Goldberg-Tarjan style "push -relabel" methods and algorithms based on Ford-Fulkerson style "augmenting paths." We benchmark these algorithms on a number of typical graphs in the contexts of image restoration, stereo, and segmentation. In many cases, our new algorithm works several times faster than any of the other methods, making near real-time performance possible. An implementation of our max-flow/min-cut algorithm is available upon request for research purposes.

4,463 citations

Proceedings ArticleDOI
07 Jul 2001
TL;DR: In this paper, the user marks certain pixels as "object" or "background" to provide hard constraints for segmentation, and additional soft constraints incorporate both boundary and region information.
Abstract: In this paper we describe a new technique for general purpose interactive segmentation of N-dimensional images. The user marks certain pixels as "object" or "background" to provide hard constraints for segmentation. Additional soft constraints incorporate both boundary and region information. Graph cuts are used to find the globally optimal segmentation of the N-dimensional image. The obtained solution gives the best balance of boundary and region properties among all segmentations satisfying the constraints. The topology of our segmentation is unrestricted and both "object" and "background" segments may consist of several isolated parts. Some experimental results are presented in the context of photo/video editing and medical image segmentation. We also demonstrate an interesting Gestalt example. A fast implementation of our segmentation method is possible via a new max-flow algorithm.

3,571 citations

Book ChapterDOI
03 Sep 2001
TL;DR: The goal of this paper is to provide an experimental comparison of the efficiency of min-cut/max flow algorithms for applications in vision, comparing the running times of several standard algorithms, as well as a new algorithm that is recently developed.
Abstract: After [10, 15, 12, 2, 4] minimum cut/maximum flow algorithms on graphs emerged as an increasingly useful tool for exact or approximate energy minimization in low-level vision. The combinatorial optimization literature provides many min-cut/max-flow algorithms with different polynomial time complexity. Their practical efficiency, however, has to date been studied mainly outside the scope of computer vision. The goal of this paper is to provide an experimental comparison of the efficiency of min-cut/max flow algorithms for energy minimization in vision. We compare the running times of several standard algorithms, as well as a new algorithm that we have recently developed. The algorithms we study include both Goldberg-style "push-relabel" methods and algorithms based on Ford-Fulkerson style augmenting paths. We benchmark these algorithms on a number of typical graphs in the contexts of image restoration, stereo, and interactive segmentation. In many cases our new algorithm works several times faster than any of the other methods making near real-time performance possible.

3,099 citations


"An MRF Model for Binarization of Na..." refers background or methods in this paper

  • ...But these methods lack a principled formulation of the binarization problem for complex colour documents, and hence cannot be generalized....

    [...]

  • ...Then, based on the data and smoothness terms in Equations (5) and (6) respectively, the graph is formed....

    [...]

  • ...All the implementations of the proposed method are done using C++ graph cut code [15] and Matlab....

    [...]

Frequently Asked Questions (12)
Q1. What are the contributions mentioned in the paper "An mrf model for binarization of natural scene text" ?

Inspired by the success of MRF models for solving object segmentation problems, the authors formulate the binarization problem in this framework. The authors represent the pixels in a document image as random variables in an MRF, and introduce a new energy ( or cost ) function on these variables. The authors show results on word images from the challenging ICDAR 2003 dataset, and compare their performance with previously reported methods. Their approach shows significant improvement in pixel level accuracy as well as OCR accuracy. 

Iterative graph cut based binarization is also more suitable for their application, as it refines the seeds and the binarization output at each iteration, and thus produces a clean binarization result even in the case of noisy foreground/background distributions. 

Due to the introduction of GMMs, the energy function in Equation (1) now becomes E(x, k, θ, z) = E_i(x, k, θ, z) + E_ij(x, z) (4), i.e. the data term now also depends on each pixel's assignment to a GMM component. 

The proposed method takes 32 seconds on average to produce the final binary result for an image, on a system with 2 GB RAM and a 2.93 GHz Intel Core 2 Duo CPU. 

(Note that by the edginess difference term the authors mean an energy function that uses the gradient magnitude difference in addition to the difference in RGB colour space.) 

The smoothness term most commonly used in the literature is the Potts model:

E_ij(x, z) = λ Σ_{(i,j)∈N} exp(−(z_i − z_j)² / 2β²) [x_i ≠ x_j] / dist(i, j),

where λ determines the degree of smoothness, and dist(i, j) is the Euclidean distance between neighbouring pixels i and j. 
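This contrast-sensitive Potts term can be sketched as a per-edge weight; the values of λ, β, and the sample colours below are illustrative choices, not the paper's:

```python
import numpy as np

def potts_weight(zi, zj, beta, lam=50.0, dist=1.0):
    """Contrast-sensitive Potts weight between neighbouring pixels:
    large when colours are similar (cutting such an edge is costly),
    small across strong colour edges.  Charged only when the two
    labels differ; divided by the inter-pixel distance."""
    diff2 = float(np.sum((np.asarray(zi, float) - np.asarray(zj, float)) ** 2))
    return lam * np.exp(-diff2 / (2.0 * beta ** 2)) / dist

# Similar colours -> high penalty for a label change; dissimilar -> low
w_flat = potts_weight([100, 100, 100], [102, 101, 100], beta=30.0)
w_edge = potts_weight([100, 100, 100], [220, 210, 200], beta=30.0)
```

In graph cut terms, these weights become the capacities of the n-links between neighbouring pixel nodes.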

For every such edge pixel p, the authors traverse the edge image in the direction of θ until they hit an edge pixel q whose gradient orientation is (π − θ) ± π/36 (i.e. approximately the opposite gradient direction). 
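This traversal can be sketched as a simple walk along the gradient direction; the helper name, the discrete rounding of the direction, and the toy edge map below are assumptions for illustration only:

```python
import numpy as np

def trace_to_opposite_edge(edges, orient, p, max_steps=50, tol=np.pi / 36):
    """From edge pixel p, walk along its gradient direction theta until
    an edge pixel q with gradient orientation (pi - theta) +/- pi/36 is
    found, as described above.  Returns q's coordinates, or None."""
    y, x = p
    theta = orient[y, x]
    dy, dx = np.sin(theta), np.cos(theta)
    for step in range(1, max_steps):
        qy, qx = int(round(y + step * dy)), int(round(x + step * dx))
        if not (0 <= qy < edges.shape[0] and 0 <= qx < edges.shape[1]):
            return None  # walked out of the image
        if edges[qy, qx]:
            # angular distance to the target orientation, wrapped to [-pi, pi]
            d = abs((orient[qy, qx] - (np.pi - theta) + np.pi) % (2 * np.pi) - np.pi)
            if d <= tol:
                return (qy, qx)
    return None

# Toy vertical stroke: left boundary at column 2 (gradient theta = 0),
# right boundary at column 6 (orientation pi, the opposite direction)
edges = np.zeros((8, 8), dtype=bool)
orient = np.zeros((8, 8))
edges[3, 2] = edges[3, 6] = True
orient[3, 6] = np.pi
q = trace_to_opposite_edge(edges, orient, (3, 2))
```

Pairing each edge pixel with its opposite-gradient partner in this way yields candidate stroke interiors, which is how such traversals typically seed the foreground model.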

In order to make the energy function robust to low-contrast colour images, the authors modify the smoothness term by adding a new term that measures the "edginess" of the pixels, as follows:

E_ij(x, z) = λ₁ Σ_{(i,j)∈N} [x_i ≠ x_j] exp(−β ||z_i − z_j||²) + λ₂ Σ_{(i,j)∈N} [x_i ≠ x_j] exp(−β ||w_i − w_j||²). 
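A rough sketch of this two-part smoothness cost, where w denotes a per-pixel edginess value (e.g. gradient magnitude); the λ₁, λ₂, β values below are illustrative, not the paper's:

```python
import numpy as np

def smoothness(zi, zj, wi, wj, beta, lam1=25.0, lam2=25.0):
    """Pairwise cost charged when neighbouring labels differ: a colour
    difference term plus an 'edginess' (gradient-magnitude) difference
    term, so low-contrast colour edges can still be detected."""
    zi, zj = np.asarray(zi, float), np.asarray(zj, float)
    colour = lam1 * np.exp(-beta * np.sum((zi - zj) ** 2))
    edgy = lam2 * np.exp(-beta * (float(wi) - float(wj)) ** 2)
    return float(colour + edgy)

# Same low-contrast colours; only the edginess w differs across an edge
cost_flat = smoothness([0.4, 0.4, 0.4], [0.41, 0.40, 0.40], 0.10, 0.12, beta=5.0)
cost_edge = smoothness([0.4, 0.4, 0.4], [0.41, 0.40, 0.40], 0.10, 0.90, beta=5.0)
```

Even though the colour term barely distinguishes the two cases, the edginess term makes a label change cheaper across the gradient discontinuity, which is the point of the modification.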

ITERATIVE GRAPH CUT BASED BINARIZATION — In the GMMRF framework [4], each pixel colour is generated from one of 2c Gaussian Mixture Models (GMMs) (c GMMs each for foreground and background) with mean μ and covariance Σ, i.e. each foreground colour pixel is generated from the following distribution:

p(z_i | x_i, θ, k_i) = N(z_i, θ; μ(x_i, k_i), Σ(x_i, k_i)), (3)

where N denotes a Gaussian distribution, x_i ∈ {0, 1} and k_i ∈ {1, ..., c}. 
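The per-pixel data cost implied by this likelihood is the Gaussian negative log-likelihood of the pixel's colour under its assigned component. A sketch with hypothetical foreground/background means (not the paper's fitted values):

```python
import numpy as np

def gaussian_nll(z, mu, cov):
    """Negative log-likelihood of colour z under one Gaussian component
    N(mu, cov): the data cost for the component the pixel is assigned to."""
    z, mu, cov = np.asarray(z, float), np.asarray(mu, float), np.asarray(cov, float)
    d = z - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = float(d @ np.linalg.solve(cov, d))   # squared Mahalanobis distance
    return 0.5 * (logdet + maha + len(z) * np.log(2 * np.pi))

# Hypothetical dark-text foreground and bright background components
mu_fg, mu_bg = [20, 20, 20], [220, 220, 220]
cov = 100 * np.eye(3)
pixel = [25, 18, 22]
cost_fg = gaussian_nll(pixel, mu_fg, cov)
cost_bg = gaussian_nll(pixel, mu_bg, cov)
```

In the graph, these costs become the t-link capacities connecting each pixel node to the foreground and background terminals.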

Although most of these previous algorithms perform satisfactorily in many cases, they suffer from problems such as: (1) manual tuning of parameters, (2) high sensitivity to the choice of parameters, and (3) difficulty handling images with uneven lighting, noisy backgrounds, or similar foreground and background colours. 

The authors then re-estimate the GMMs using the initial binarization result and iterate the graph cut with the new data and smoothness terms until convergence. 
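The iteration can be sketched as follows; note that the min-cut step is replaced here by a per-pixel data-term argmin purely for illustration (the actual method minimizes data plus smoothness energy with graph cuts and fits full GMMs rather than single class means):

```python
import numpy as np

def fit_means(pixels, labels):
    """Re-estimate per-class colour means (stand-in for GMM refitting)."""
    return np.array([pixels[labels == c].mean(axis=0) for c in (0, 1)])

def iterate_binarization(pixels, init_labels, max_iters=10):
    """Skeleton of the iterative scheme: re-estimate colour models from
    the current binarization, relabel, and repeat until convergence."""
    labels = init_labels.copy()
    for _ in range(max_iters):
        means = fit_means(pixels, labels)
        # data term: squared distance of each pixel to each class mean
        cost = np.stack([np.sum((pixels - m) ** 2, axis=1) for m in means])
        new_labels = np.argmin(cost, axis=0)   # <- graph cut in the real method
        if np.array_equal(new_labels, labels):
            break                              # converged: labels stable
        labels = new_labels
    return labels

# Toy data: dark foreground and bright background, one mis-seeded pixel
pix = np.array([[10., 10, 10], [15, 12, 11], [240, 235, 238],
                [230, 228, 231], [12, 14, 13], [245, 240, 242]])
seeds = np.array([0, 0, 1, 1, 1, 1])   # pixel 4 wrongly seeded as background
final = iterate_binarization(pix, seeds)
```

Even with a wrong seed, the re-estimation step pulls the mis-labelled dark pixel back to the foreground class, illustrating why the iterative scheme tolerates noisy initial seeds.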

But these methods lack a principled formulation of the binarization problem for complex colour documents, and hence cannot be generalized.