
HAL Id: hal-00817972
https://hal.inria.fr/hal-00817972
Submitted on 17 Oct 2013
To cite this version:
Anand Mishra, Karteek Alahari, C.V. Jawahar. An MRF Model for Binarization of Natural Scene Text. ICDAR - International Conference on Document Analysis and Recognition, Sep 2011, Beijing, China. DOI: 10.1109/ICDAR.2011.12. hal-00817972.

An MRF Model for Binarization of Natural Scene Text
Anand Mishra, Karteek Alahari and C.V. Jawahar
International Institute of Information Technology Hyderabad, India
INRIA - Willow, ENS, Paris, France
Email: anand.mishra@research.iiit.ac.in, karteek.alahari@ens.fr, jawahar@iiit.ac.in
Abstract—Inspired by the success of MRF models for solving
object segmentation problems, we formulate the binarization
problem in this framework. We represent the pixels in a docu-
ment image as random variables in an MRF, and introduce a
new energy (or cost) function on these variables. Each variable
takes a foreground or background label, and the quality of the
binarization (or labelling) is determined by the value of the
energy function. We minimize the energy function, i.e. find the
optimal binarization, using an iterative graph cut scheme. Our
model is robust to variations in foreground and background
colours as we use a Gaussian Mixture Model in the energy
function. In addition, our algorithm is efficient to compute,
and adapts to a variety of document images. We show results
on word images from the challenging ICDAR 2003 dataset, and
compare our performance with previously reported methods.
Our approach shows significant improvement in pixel level
accuracy as well as OCR accuracy.
Keywords-MRF, GMM, Graph Cut, Binarization
I. INTRODUCTION
Binarization is one of the key preprocessing steps in any document image analysis system. The performance of subsequent steps such as character segmentation and recognition is highly dependent on the success of binarization. Document image binarization has been an active area of research for many years. Is binarization a solved problem? Obviously not, especially given the emerging need to recognize text in video sequences, digital-born (Web and email) images, old historic manuscripts and natural scenes, where state-of-the-art recognition performance is still poor. In this regard, designing a powerful binarization algorithm can be considered a major step towards robust text understanding. The community's recent interest, reflected in the DIBCO 2009 binarization contest [1] held at the 10th International Conference on Document Analysis and Recognition (ICDAR 2009), also supports this claim. Note that DIBCO 2009 received 43 submissions, which shows the active interest in this research area.
In this work, we focus on binarization of natural scene text. Natural scene text contains numerous degradations not usually present in machine-printed documents, such as uneven lighting, blur, complex backgrounds, perspective distortion and multiple colours. Methods such as the interactive graph cut of Boykov et al. [2] and, subsequently, GrabCut [3] have shown promising performance for foreground/background segmentation of natural scenes in recent years. We formulate the binarization problem in this framework (where text is foreground and everything else is background), and define a novel energy (cost) function such that the quality of the binarization is determined by the energy value. We minimize this energy function to find the optimal binarization using an iterative graph cut scheme. The graph cut method needs to be initialized with foreground/background seeds. To make the binarization fully automatic, we obtain the initial seeds for graph cut with our auto-seeding algorithm. At each iteration of graph cut, the seeds and the binarization are refined. This makes it more powerful than a one-shot graph cut algorithm. Moreover, we model the foreground and background colours in a GMMRF framework [4] to make the binarization robust to variations in foreground and background colours.

Figure 1. Some sample images considered in this work.
The remainder of the paper is organised as follows.
We discuss related work in Section II. In Section III, the
binarization problem is formulated as a labelling problem,
where we define an energy function such that its minimum
corresponds to the target binary image. This section also
briefly introduces the graph cut method. Section IV explains the
proposed iterative graph cut based binarization scheme. It
also elaborates on the method of finding auto-seeds for the
graph cut. Section V describes experiments and results based
on the challenging ICDAR 2003 word dataset. Some sample
images of this dataset are shown in Figure 1. We finally
conclude the work in Section VI.
II. RELATED WORK
Traditional thresholding-based binarization methods fall into two categories: those that use a global threshold for the entire document (e.g. Otsu [5], Kittler et al. [6]) and those that use local thresholds (e.g. Sauvola [7], Niblack [8]). An exhaustive review of thresholding-based binarization is beyond the scope of this paper; the reader is referred to [9]. Although most of these algorithms perform satisfactorily in many cases, they suffer from problems such as: (1) manual tuning of parameters, (2) high sensitivity to the choice of parameters, and (3) difficulty handling images with uneven lighting, noisy backgrounds, or similar foreground and background colours.
Recently, Markov Random Field (MRF) based binarization has been applied to degraded documents. In [10], Wolf et al. proposed binarization in an energy minimization framework and applied the less powerful and computationally expensive simulated annealing (SA) for energy minimization. In [11], the authors classified the document into Text Region (TR), Near Text Region (NTR) and Background Region (BR), and then applied graph cut to produce the final binary image. MRF-based binarization for document images captured with hand-held devices was proposed in [12], where the authors first used a thresholding-based technique to produce a binary image and then applied graph cuts to remove noise and smooth the binarization output. However, these methods cannot be directly applied to natural scene text images due to additional challenges such as blur, hardly distinguishable foreground/background colours, and variable font sizes and styles.
Researchers have also shown interest in colour image binarization in recent years (see [13], [14]). But these methods lack a principled formulation of the binarization problem for complex colour documents, and hence cannot be generalized.
III. THE BINARIZATION PROBLEM
We define the binarization problem in a labelling framework as follows: the binarization of an image can be expressed as a vector of binary random variables $X = \{X_1, X_2, \ldots, X_n\}$, where each random variable $X_i$ takes a label $x_i \in \{0, 1\}$ based on whether it is text (foreground) or non-text (background). Most heuristic-based algorithms decide whether to assign label 0 or 1 to $x_i$ based on the pixel value at that position or on local statistics. Such algorithms are not effective in our case because of the variations in the foreground/background colour distributions.
In this work, we formulate the problem in a more principled framework, where we represent image pixels as nodes in a Markov Random Field and associate unary and pairwise costs with labelling the pixels. We then solve the problem in an energy minimization framework, where a Gibbs energy function $E$ of the following form is defined:

$$E(\mathbf{x}, \theta, \mathbf{z}) = E_i(\mathbf{x}, \theta, \mathbf{z}) + E_{ij}(\mathbf{x}, \mathbf{z}), \quad (1)$$

such that its minimum corresponds to the target binary image. Here $\mathbf{x} = \{x_1, x_2, \ldots, x_n\}$ is the set of labels at the pixels, $\theta$ is the set of model parameters learnt from the foreground/background colour distributions, and the vector $\mathbf{z} = \{z_1, z_2, \ldots, z_n\}$ denotes the colour intensities of the pixels.
In Equation (1), $E_i(\cdot)$ and $E_{ij}(\cdot)$ correspond to the data term and the smoothness term respectively. The data term $E_i(\cdot)$ measures the degree of agreement of the inferred label $x_i$ with the observed image data $z_i$. The smoothness term measures the cost of assigning labels $x_i$, $x_j$ to adjacent pixels and is used to impose spatial smoothness. A typical unary term can be expressed as:

$$E_i(\mathbf{x}, \theta, \mathbf{z}) = -\sum_i \log p(x_i \mid z_i).$$
Similarly, the smoothness term most commonly used in the literature is the Potts model:

$$E_{ij}(\mathbf{x}, \mathbf{z}) = \lambda \sum_{(i,j) \in \mathcal{N}} \frac{\exp\!\left(-\frac{(z_i - z_j)^2}{2\beta^2}\right) [x_i \neq x_j]}{\mathrm{dist}(i, j)},$$

where $\lambda$ determines the degree of smoothness, $\mathrm{dist}(i, j)$ is the Euclidean distance between neighbouring pixels $i$ and $j$, and the constant $\beta$ allows discontinuity-preserving smoothing. $\mathcal{N}$ denotes the neighbourhood system defined on the MRF. Further, the smoothness term imposes a cost only on those adjacent pixels which have different labels (i.e. where $[x_i \neq x_j] = 1$).
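To make these terms concrete, the following is a minimal sketch (ours, in Python with NumPy; the names `p_fg` and `z` are hypothetical) of how the unary costs and a Potts-style pairwise weight of the form above could be computed.

```python
import numpy as np

def unary_costs(p_fg, eps=1e-10):
    """Negative log-likelihood costs for labels 0 (background) and 1 (foreground).

    p_fg is a hypothetical H x W array of per-pixel foreground probabilities.
    Returns an H x W x 2 array with cost[..., l] = -log p(x_i = l | z_i).
    """
    p_fg = np.clip(p_fg, eps, 1.0 - eps)
    return np.stack([-np.log(1.0 - p_fg), -np.log(p_fg)], axis=-1)

def potts_weight(z_i, z_j, lam, beta, dist=1.0):
    """Cost paid when two neighbouring pixels take different labels.

    Follows the Potts form above: lam * exp(-(z_i - z_j)^2 / (2 beta^2)) / dist.
    Similar pixels (small intensity difference) get a high weight, so separating
    them is expensive; dissimilar pixels are cheap to separate.
    """
    return lam * np.exp(-((z_i - z_j) ** 2) / (2.0 * beta ** 2)) / dist
```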
The problem of binarization is now to find the global minimum of the Gibbs energy, i.e.,

$$\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x}, \theta, \mathbf{z}). \quad (2)$$

The global minimum of this energy function can be computed efficiently by graph cut [15], subject to the energy satisfying the submodularity criterion [16]. For this, a weighted graph $G = (V, E)$ is formed where each vertex corresponds to an image pixel, and edges link adjacent pixels. Two additional vertices, the source ($s$) and the sink ($t$), are added to the graph, and all the other vertices are connected to them with weighted edges. The weights of all the edges are defined in such a way that every cut of the graph is equivalent to some label assignment of the energy function. Note that a cut of the graph $G$ is a partition of the set of vertices $V$ into two disjoint sets $S$ and $T$, and the cost of the cut is defined as the sum of the weights of the edges going from vertices in $S$ to vertices in $T$ (see [16]). The min cut of such a graph corresponds to the global minimum of the energy function. Efficient implementations are available for finding the min cut of such a graph [15].
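As an illustration of this construction, here is a minimal sketch using the PyMaxflow library (an assumption on our part; the paper's implementation uses the C++ code of [15]). It builds a grid graph from a unary cost map and a constant Potts weight, and reads back the min-cut labelling.

```python
import numpy as np
import maxflow  # PyMaxflow, a Python wrapper around the max-flow code of [15]

def graph_cut_binarize(unary, pairwise_weight):
    """unary: H x W x 2 array of costs for labels 0 and 1 (see the sketch above).
    pairwise_weight: scalar Potts penalty for neighbouring pixels with different
    labels (a simplification; the paper uses per-edge weights)."""
    h, w = unary.shape[:2]
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes((h, w))
    # 4-connected grid edges with a constant Potts weight.
    g.add_grid_edges(nodes, pairwise_weight)
    # Terminal edges: source capacity = cost of label 1, sink capacity = cost of
    # label 0, so that the min cut assigns each pixel its cheaper label.
    g.add_grid_tedges(nodes, unary[..., 1], unary[..., 0])
    g.maxflow()
    # get_grid_segments returns True for nodes on the sink side, i.e. label 1 here.
    return g.get_grid_segments(nodes).astype(np.uint8)
```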
In [2], the set of model parameters θ describes the image foreground/background histograms. The histograms are constructed directly from foreground/background seeds obtained through user interaction. However, the foreground/background distributions in our case (see the images in Figure 1) cannot be captured efficiently by a naive histogram. Rather, we assume that each pixel colour is generated from a Gaussian Mixture Model (GMM). In this regard, we are highly inspired by the success of GrabCut [3] for object segmentation. At the same time, we want to avoid any user interaction to make the binarization fully automatic. We achieve this with our auto-seeding algorithm, which we describe in Section IV-A. Furthermore, iterative graph cut based binarization is also more suitable for our application as it refines the seeds and the binarization output at each iteration, and thus produces a clean binarization result even in the case of noisy foreground/background distributions.
IV. ITERATIVE GRAPH CUT BASED BINARIZATION
In the GMMRF framework [4], each pixel colour is generated from one of $2c$ Gaussian mixture components ($c$ components each for the foreground and background GMMs), with mean $\mu$ and covariance $\Sigma$, i.e. each foreground colour pixel is generated from the following distribution:

$$p(z_i \mid x_i, \theta, k_i) = \mathcal{N}(z_i, \theta;\, \mu(x_i, k_i), \Sigma(x_i, k_i)), \quad (3)$$

where $\mathcal{N}$ denotes a Gaussian distribution, $x_i \in \{0, 1\}$ and $k_i \in \{1, \ldots, c\}$. To model the foreground colour using the above distribution, an additional vector $\mathbf{k} = \{k_1, k_2, \ldots, k_n\}$ is introduced, where each $k_i$ takes one of the $c$ GMM components. Similarly, the background colour is modelled with one of its $c$ GMM components. Further, the likelihood of the observations can be assumed to be independent across pixel positions, and can thus be expressed as:

$$p(\mathbf{z} \mid \mathbf{x}, \theta, \mathbf{k}) = \prod_i p(z_i \mid x_i, \theta, k_i) = \prod_i \frac{\pi(x_i, k_i)}{\sqrt{\det \Sigma(x_i, k_i)}} \exp\!\left(-\tfrac{1}{2}(z_i - \mu(x_i, k_i))^{T}\, \Sigma(x_i, k_i)^{-1}\, (z_i - \mu(x_i, k_i))\right).$$

Here $\pi(\cdot)$ is the Gaussian mixture weighting coefficient.
Due to the introduction of the GMMs, the energy function in Equation (1) now becomes:

$$E(\mathbf{x}, \mathbf{k}, \theta, \mathbf{z}) = E_i(\mathbf{x}, \mathbf{k}, \theta, \mathbf{z}) + E_{ij}(\mathbf{x}, \mathbf{z}), \quad (4)$$

i.e. the data term now depends on the assignment of pixels to GMM components. It is given by:

$$E_i(\mathbf{x}, \mathbf{k}, \theta, \mathbf{z}) = -\sum_i \log p(z_i \mid x_i, \theta, k_i). \quad (5)$$
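As an illustration of the data term in Equation (5), the sketch below fits one colour GMM each for the foreground and background seed pixels using scikit-learn (an assumed library choice; the paper does not name one) and evaluates the per-pixel negative log-likelihoods that act as unary costs.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_colour_gmms(fg_pixels, bg_pixels, c=5, seed=0):
    """fg_pixels, bg_pixels: N x 3 arrays of RGB values taken from the
    foreground/background seeds. c mixture components per model, as in the paper."""
    fg_gmm = GaussianMixture(n_components=c, covariance_type='full',
                             random_state=seed).fit(fg_pixels)
    bg_gmm = GaussianMixture(n_components=c, covariance_type='full',
                             random_state=seed).fit(bg_pixels)
    return fg_gmm, bg_gmm

def gmm_unary_costs(image, fg_gmm, bg_gmm):
    """image: H x W x 3 array. Returns H x W x 2 costs, where
    cost[..., 1] = -log p(z_i | foreground GMM) and cost[..., 0] is the
    background counterpart, matching the data term of Equation (5)."""
    h, w, _ = image.shape
    z = image.reshape(-1, 3).astype(np.float64)
    cost_fg = -fg_gmm.score_samples(z)   # score_samples returns log-likelihoods
    cost_bg = -bg_gmm.score_samples(z)
    return np.stack([cost_bg, cost_fg], axis=-1).reshape(h, w, 2)
```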
In order to make the energy function robust to low-contrast colour images, we modify the smoothness term of the energy function by adding a new term which measures the "edginess" of the pixels, as follows:

$$E_{ij}(\mathbf{x}, \mathbf{z}) = \lambda_1 \sum_{(i,j) \in \mathcal{N}} [x_i \neq x_j] \exp(-\beta \|z_i - z_j\|^2) + \lambda_2 \sum_{(i,j) \in \mathcal{N}} [x_i \neq x_j] \exp(-\beta \|w_i - w_j\|^2). \quad (6)$$

Here $w_i$ denotes the magnitude of the gradient (edginess) at pixel $i$, and $\mathcal{N}$ denotes the neighbourhood system defined for the MRF model. Two neighbouring pixels with similar edginess values are more likely to belong to the same class; the edginess term enforces this constraint. The constants $\lambda_1$ and $\lambda_2$ determine the relative strength of the colour and edginess differences respectively. The parameters $\lambda_i$ and $\beta$ are learnt automatically from the image.
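As a rough illustration (not the authors' code), the per-edge weights of Equation (6) for a 4-connected grid could be computed as below; the gradient magnitude is taken with a simple NumPy gradient, and the values of λ1, λ2 and β are taken as given rather than learnt.

```python
import numpy as np

def edge_weights(image, lam1, lam2, beta):
    """Per-edge weights for a 4-connected grid, following Equation (6).

    image: H x W x 3 float array of colour values z_i.
    Returns (right_w, down_w): weights of the edges to the right and downward
    neighbours. A weight is paid only when the two endpoints of an edge receive
    different labels."""
    grey = image.mean(axis=2)
    gy, gx = np.gradient(grey)
    w = np.sqrt(gx ** 2 + gy ** 2)            # gradient magnitude ("edginess")

    def weight(z_a, z_b, w_a, w_b):
        colour = lam1 * np.exp(-beta * np.sum((z_a - z_b) ** 2, axis=-1))
        edgy = lam2 * np.exp(-beta * (w_a - w_b) ** 2)
        return colour + edgy

    right_w = weight(image[:, :-1], image[:, 1:], w[:, :-1], w[:, 1:])
    down_w = weight(image[:-1, :], image[1:, :], w[:-1, :], w[1:, :])
    return right_w, down_w
```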
The Gaussian Mixture Models in Equation (5) need to be initialized with foreground/background seeds. Since our objective is to make the binarization fully automatic, we initialize the GMMs with foreground and background seeds obtained from our auto-seeding algorithm. Then, at each iteration, the seeds are refined and new GMMs are learnt from them. This makes the algorithm more powerful and allows it to adapt to variations in the foreground and background.
A. Auto-seeding
To perform automatic binarization we need to compute foreground and background seeds for the graph cut. Given an image, we first convert it to an edge image using the Canny edge operator, and then find the foreground and background seeds as follows:
1) Foreground seeds: Our foreground seeding algorithm is motivated by the fact that for every edge curve (line) in a character there exists a parallel edge curve (line), i.e. if an edge pixel has gradient orientation θ, then in the direction of θ there exists an edge pixel whose gradient orientation is approximately π − θ.
Step 1: Let p be a non-traversed edge pixel with gradient orientation θ. For every such edge pixel p we traverse the edge image in the direction of θ until we hit an edge pixel q whose gradient orientation is (π − θ) ± π/36 (i.e. an approximately opposite gradient direction). We mark this line segment pq as a foreground seed candidate and store its length. We repeat this process for all non-traversed edge pixels. After finding all foreground seed candidates, we remove the line segments whose length is too large or too small with respect to the majority of the seed candidates. The remaining line segments are marked as foreground seeds.
Step 2: Handling images with light text on a dark background: for such images we rarely obtain parallel edge curves (lines) with the above-mentioned traversal; rather, many line segments pq hit the image boundary. We automatically detect such situations, subtract π from the original orientation, and then follow the same process as Step 1.
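The following is a rough sketch of Step 1 under our own simplifying assumptions (fixed Canny thresholds, no bookkeeping of already-traversed pixels, no length-based filtering): it walks from each edge pixel along its gradient direction until it meets an edge pixel with approximately opposite orientation and records the segment as a foreground seed candidate.

```python
import numpy as np
import cv2

def foreground_seed_candidates(grey, max_steps=200, tol=np.pi / 36):
    """grey: H x W uint8 image. Returns a list of (p, q, length) candidate segments."""
    edges = cv2.Canny(grey, 50, 150)                       # thresholds are an assumption
    gx = cv2.Sobel(grey, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(grey, cv2.CV_64F, 0, 1, ksize=3)
    theta = np.arctan2(gy, gx)                             # gradient orientation per pixel
    h, w = grey.shape
    candidates = []
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        t = theta[y, x]
        dy, dx = np.sin(t), np.cos(t)
        for step in range(1, max_steps):
            yy, xx = int(round(y + step * dy)), int(round(x + step * dx))
            if not (0 <= yy < h and 0 <= xx < w):
                break                                      # hit the image boundary
            if edges[yy, xx]:
                # Opposite orientation means the angular difference is close to pi.
                diff = np.abs(np.angle(np.exp(1j * (theta[yy, xx] - t + np.pi))))
                if diff <= tol:
                    candidates.append(((y, x), (yy, xx), step))
                break
    return candidates
```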
2) Background seeds: For background seeding we adopt the following scheme: given the edge image, we find the horizontal and vertical lines containing no edge pixel and mark them as background. When this yields no background seeds, we relax the criterion and mark as background all regions that are reachable (without hitting an edge pixel) from at least two sides of the image boundary. In practice, some cases still do not provide enough background seeds even after relaxation. For such cases we traverse the edge image inward from all four sides of the image boundary until we hit an edge, and mark all these regions as background seeds. Figure 2 shows typical initial seeds for the iterative graph cut.
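A minimal sketch of the strictest background criterion (edge-free rows and columns) could look as follows; the relaxed fall-back criteria described above are omitted and the function name is ours.

```python
import numpy as np

def background_seed_lines(edges):
    """edges: H x W binary Canny edge map (non-zero = edge pixel).
    Marks every row and column that contains no edge pixel as background,
    following the strictest criterion above."""
    bg = np.zeros(edges.shape, dtype=bool)
    edge_free_rows = ~edges.any(axis=1)
    edge_free_cols = ~edges.any(axis=0)
    bg[edge_free_rows, :] = True
    bg[:, edge_free_cols] = True
    return bg
```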

Figure 2. (a) Input image. (b) Its foreground-background seeds; red and blue show the foreground and background seeds respectively (best viewed in colour).
Figure 3. Images where auto-seeding fails
Although the proposed auto-seeding method performs satisfactorily, it tends to fail in cases where the Canny edge operator produces too many noisy or broken edges. In such cases some foreground regions are falsely marked as background and vice versa, which leads to poor binarization. We show two such examples in Figure 3, where our auto-seeding algorithm fails to mark the foreground and background regions appropriately.
In summary, once we obtain the initial seeds, GMMs for the foreground and background colours are learnt. Then, based on the data and smoothness terms in Equations (5) and (6) respectively, the graph is formed. We use the standard graph cut algorithm [15] to obtain an initial binarization result. We then re-estimate the GMMs using this initial binarization result and iterate the graph cut with the new data and smoothness terms until convergence. This refines the binary image at each iteration and finally produces a clean binary image.
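Putting the pieces together, one possible shape of the iterative loop is sketched below. It reuses the hypothetical helpers from the earlier sketches; `rasterize_segments` (turning seed segments into a pixel mask) is also hypothetical, and the scalar pairwise weight is a simplification of the per-edge weights of Equation (6).

```python
import numpy as np
import cv2

def iterative_graph_cut_binarize(image, n_iters=8, lam1=25.0, lam2=25.0,
                                 beta=0.005, c=5):
    """Sketch of the overall loop; the numeric parameter values are placeholders,
    whereas the paper learns lambda_i and beta from the image and uses 8
    iterations with c = 5 components per GMM."""
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(grey, 50, 150)

    # 1. Auto-seeding: foreground segments and edge-free rows/columns.
    fg_mask = rasterize_segments(foreground_seed_candidates(grey), grey.shape)  # hypothetical helper
    bg_mask = background_seed_lines(edges)

    labels = None
    for _ in range(n_iters):
        # 2. (Re-)learn the colour GMMs from the current foreground/background sets.
        fg_px = image[fg_mask] if labels is None else image[labels == 1]
        bg_px = image[bg_mask] if labels is None else image[labels == 0]
        fg_gmm, bg_gmm = fit_colour_gmms(fg_px, bg_px, c=c)

        # 3. Build the energy terms and run one graph cut.
        unary = gmm_unary_costs(image, fg_gmm, bg_gmm)
        right_w, down_w = edge_weights(image.astype(np.float64), lam1, lam2, beta)
        # Simplification: a single scalar pairwise weight instead of per-edge weights.
        labels = graph_cut_binarize(unary, pairwise_weight=float(right_w.mean()))
    return labels
```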
V. RESULTS AND DISCUSSIONS
We use sample images from the ICDAR 2003 Robust Word Recognition dataset [17] for our experiments. It consists of 171 natural scene text images. These images have several degradations due to uneven lighting, complex backgrounds, blur and similar foreground and background colours. To evaluate the performance of the proposed binarization algorithm, we compare it with well-known thresholding-based binarization techniques such as Otsu [5], Sauvola [7], Niblack [8] and Kittler et al. [6]. We also compare our binarization algorithm with the colour thresholding based method proposed in [14]. Note that these classical binarization algorithms produce white text on a black background in the case of images with light text on a dark background. In contrast, our binarization algorithm works in an object segmentation framework and thus always produces black text on a white background. For a fair comparison we therefore invert the binarized output of the classical methods if they produce white text on a black background.
For the proposed binarization algorithm we used 10 GMM components (5 each for foreground and background). We empirically set the number of graph cut iterations to 8, since no significant change in the binarization was observed beyond 8 iterations. We also show our results with and without the edginess difference in the pairwise term. (Note that by the edginess difference term we mean the energy function with the gradient magnitude difference in addition to the difference in RGB colour space.) For parameter-sensitive algorithms like [7] and [8] we use the parameters for which we obtain the best OCR accuracy.
The proposed method is implemented using the C++ graph cut code of [15] and Matlab. It takes 32 seconds on average to produce the final binary result for an image on a system with 2 GB RAM and an Intel® Core™ 2 Duo CPU running at 2.93 GHz.
A. Qualitative evaluation
First we compare the proposed binarization algorithm with thresholding-based methods qualitatively in Figure 4. Sample images with uneven lighting, hardly distinguishable foreground/background colours and noisy foreground colours are shown in this figure. We observe that our approach produces clearly readable binary images. Further, our algorithm produces less noise than local thresholding-based algorithms like [7], [8], which also helps to improve the OCR accuracy.
B. Quantitative evaluation
Quantitative evaluation of binarization is one of the biggest challenges for the document image community [9]. In this work, we demonstrate the performance of binarization not only in terms of OCR accuracy but also in terms of pixel-level accuracy.
1) OCR accuracy: We measure OCR accuracy to verify the robustness of our algorithm. For this we fed the binarization results of all the algorithms to the commercial OCR engine ABBYY FineReader 9.0 [18]. The word and character recognition accuracies are summarized in Table I. Since this dataset consists of images with tight word boundaries, global methods (like [5], [6]) perform better than popular local methods. Furthermore, OCR fails to perform well on noisy binarization output (as in the case of Sauvola and Niblack). Otsu followed by the colour thresholding binarization proposed in [14] improves the word recognition accuracy, but not significantly. However, since the proposed algorithm produces clean binary images, it shows a significant improvement in OCR accuracy.
2) Pixel-level accuracy: To compare the various binarization algorithms based on pixel accuracy, we picked 30 images from the ICDAR 2003 word dataset and produced pixel-level binarization ground truth for them. These images

Citations
More filters
Proceedings ArticleDOI

Whole is Greater than Sum of Parts: Recognizing Scene Text Words

TL;DR: This work presents a holistic word recognition framework that represents the scene text image and synthetic images generated from lexicon words using gradient-based features, and recognizes the text in the image by matching the scene and synthetic image features with the novel weighted Dynamic Time Warping (wDTW) approach.
Journal ArticleDOI

Toward Integrated Scene Text Reading

TL;DR: This work describes and evaluates a reading system that combines several pieces, using probabilistic methods for coarsely binarizing a given text region, identifying baselines, and jointly performing word and character segmentation during the recognition process.
Journal ArticleDOI

Strokelets: A Learned Multi-Scale Mid-Level Representation for Scene Text Recognition

TL;DR: This paper proposes a novel multi-scale representation, which leads to accurate, robust character identification and recognition, which consists of a set of mid-level primitives, termed strokelets, which capture the underlying substructures of characters at different granularities.
Journal ArticleDOI

Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks

TL;DR: In this method, a CNN-based text-aware candidate text region (CTR) extraction model is designed and trained using both the edges and the whole regions of text, with which coarse CTRs are detected.
Proceedings ArticleDOI

Image Binarization for End-to-End Text Understanding in Natural Images

TL;DR: The main finding is the fact that image binarization methods combined with additional filtering of generated connected components and off-the-shelf OCR engines can achieve state-of- the-art performance for end-to-end text understanding in natural images.
References
C. Rother, V. Kolmogorov and A. Blake. "GrabCut": interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 2004.
Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
Y. Boykov and M.-P. Jolly. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. ICCV, 2001.
Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. EMMCVPR, 2001.