What is the main advantage of Gabor function-based texture analysis?

Tan [7] developed Gabor function-based texture analysis for machine-printed script identification that discriminates Chinese, Latin, Greek, Russian, Persian, and Malayalam script documents.

Why is the IMF not used as a discriminating feature?

Not all of the intrinsic mode functions (IMFs) obtained contain sufficient energy in a given orientation, so not all of them can be used as discriminating features.

(Open Access) An empirical intrinsic mode based characterization of Indian scripts (2012) | Kavita Bhardwaj

Q: What contributions have the authors mentioned in the paper "An empirical intrinsic mode based characterization of indian scripts" ?

In this paper, the authors describe a novel technique for document script identification (DSI) from printed documents, using Empirical Mode Decomposition (EMD). The authors demonstrate how the proposed method uses these IMFs as feature vectors to distinguish various scripts.

Q: What is the purpose of the identification of the script used in printed documents?

The identification of the script used in printed documents is useful for digitizing conventional paper documents, for sorting document images according to the scripts in which they are written, for selecting appropriate script-specific OCRs for the retrieval of online archives of document images, and for indexing documents in a digital library.

Q: What are the two prerequisites for an IMF?

The number of extrema and the number of zero crossings must either be equal or differ at most by one, and the function should be symmetric with respect to the local zero mean.

An Empirical Intrinsic mode based characterization of Indian Scripts

Kavita Bhardwaj∗

Indian Institute of Technology

New Delhi, India

kavitab788@gmail.com

Santanu Chaudhury

Indian Institute of Technology

New Delhi, India

schaudhury@gmail.com

Sumantra Dutta Roy

Indian Institute of Technology

New Delhi, India

sumantra.dutta.roy@gmail.com

ABSTRACT

In this paper, we describe a novel technique for document script identification (DSI) from printed documents, using Empirical Mode Decomposition (EMD). The intrinsic nature of the decomposition can adaptively decompose script images into a series of modes representing different local features of the script images. In this method, Radon-transformed script images are decomposed into a finite set of IMFs (Intrinsic Mode Functions). The energy concentration in a particular orientation characterises a script texture, as it indicates the dominance of an individual script in that direction. We demonstrate how the proposed method uses these IMFs as feature vectors to distinguish various scripts.

Keywords:

Empirical mode decomposition (EMD), Radon transform, Intrinsic mode function, AdaBoostM1

1. INTRODUCTION

The identification of the script used in printed documents is useful for digitizing conventional paper documents, for sorting document images according to the scripts in which they are written, for selecting appropriate script-specific OCRs for the retrieval of online archives of document images, and for indexing documents in a digital library.

Ghosh et al. [1] propose the categorisation of script recognition methods as structure-based and visual appearance-based. They discuss the methods of both categories at page level, paragraph level, word level and character level, and present an extensive survey for each category. Referring to Wang et al. [9] and Ghosh et al. [1], we find that, according to the feature extraction used, the methods in either category fall into three major groups: statistical information-based methods, structure-based methods and texture-based methods. Statistical information-based algorithms use character density distribution and vertical and horizontal projections for classifying printed documents. Waked et al. [8] used bounding box size distribution, character density distribution, and vertical and horizontal projections for the classification of printed documents. Lam et al. [5] have also used statistical features for script identification in printed documents. These methods are more useful for scripts that differ significantly in style. Structure-based methods focus on the extraction and analysis of connected components and use the identification results of these "signatures" to determine the script(s) used. These methods in general have the advantage of discriminating similar scripts. Hochberg et al. [2] exploited the shape characteristics of "textual symbols" for the identification of script(s). Pal and Chaudhuri [6] presented script characteristics and shape-based features for script identification. Visual appearance-based and texture analysis-based methods are related, because the appearance of a text block determines which texture analysis-based method can be used to extract features. Joshi et al. [4] proposed Gabor function-based texture analysis to extract features and used hierarchical classification to distinguish among the script(s). Tan [7] developed Gabor function-based texture analysis for machine-printed script identification that discriminates Chinese, Latin, Greek, Russian, Persian, and Malayalam script documents.

∗ Corresponding author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DAR '12, December 16, 2012, Mumbai, IN, India

For our problem, we propose an algorithm based on Empirical Mode Decomposition (EMD) for textural analysis of script classes. Directionality and periodicity reflect the effective directions for textural processing of subpatterns. Each script class exhibits a specific periodicity at a particular angular orientation. This is observed in the Radon-transformed images of the script classes considered for the problem, which are decomposed into different mode functions to compute the directional energy carried by each IMF. The scripts involved in this paper are Devnagari, Roman (English), Malayalam, Bangla and Gurumukhi. The cosine similarity measure is used to determine the most similar script class for the discrimination. We use an AdaBoost binary decision tree to improve the classification.

The rest of the paper is organized as follows. In Sec. 2, we summarize the proposed approach; the framework for the problem is described in Subsec. 2.1. The results are shown in Table 3 in Sec. 3. Experimental observations are described in Sec. 4. Conclusions and future work are discussed in Sec. 5.

2. THE PROPOSED APPROACH

The method described in this paper involves four main steps.

• First, preprocessing is performed to remove noise; this includes binarization of the document images.

• Second, the Radon transform is computed on document images of each script at different angles of orientation between 0° and 90°. The unique characteristic of each script is observed at a particular orientation in the Radon-transformed image. The transformed image is decomposed using empirical mode decomposition (EMD) into a finite set of IMFs.

• Third, the energy is computed for each Intrinsic Mode Function (IMF) at different orientations. The energy at a particular orientation characterises a script class texture, as it indicates the dominance of an individual script class in that orientation.

• The angle of orientation and the IMF exhibiting maximum energy help to choose a distinguishing feature vector for each individual script.

• Fourth, the feature vectors for test script images are obtained similarly and then classified using an AdaBoost binary decision tree.

Figure 1: Schematic Block Diagram of the System

Figure 1 shows a schematic diagram of the system. We discuss each of the modules in detail in the following sections of this paper.
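As an illustration of the projection step, the sketch below computes the two axis-aligned projections of a binarized patch. At these two orientations a discrete Radon projection reduces to summing pixels along rows or along columns; which of the two a given library labels 0° or 90° depends on its angle convention, so the hypothetical helper names `row_profile` and `column_profile` deliberately avoid angle labels. Intermediate angles require interpolation or a library routine and are not covered here.

```python
def row_profile(image):
    """Sum of ink pixels in each row (projection taken along the horizontal direction)."""
    return [sum(row) for row in image]


def column_profile(image):
    """Sum of ink pixels in each column (projection taken along the vertical direction)."""
    return [sum(col) for col in zip(*image)]


# Toy binarized patch (1 = ink, 0 = background) with one long horizontal stroke,
# loosely imitating a headline-dominated script. Illustrative data only.
patch = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

print(row_profile(patch))     # [0, 5, 1, 1]
print(column_profile(patch))  # [1, 1, 3, 1, 1]
```

The horizontal stroke produces a sharp peak in the row profile, which is the kind of orientation-specific signature that the later decomposition and energy steps operate on.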

2.1 A Generalised Framework to Script Identification

We propose a general framework to address the problem of script identification. The proposed scheme for document script identification exploits the textural characteristics of each script. The constructing patterns of each script are formed by oriented linear/curvilinear subpatterns. The energy distributed at different orientations characterises each script according to its textural characteristics. For instance, the Devnagari script is characterised by the dominance of horizontal lines, whereas the Malayalam script is characterised by the dominance of vertical lines and curves. The Roman (English) script is a mix of linear and curved subpatterns. On the other hand, energy is distributed more or less evenly amongst different orientations for scripts which have curved patterns, like Malayalam and Gurumukhi. In the proposed framework, features are learned for each script from prior knowledge (from the training data set) of the target script classes. These extracted features contain fine and sufficiently discriminating details of the scripts. A script class is separated from the other script classes by exploiting its unique features.

2.2 Feature Extraction

Energy-based features are used for each script class. We use empirical mode decomposition to capture the oriented local energy. As discussed earlier, the constructing patterns of each script are composed of linear/curvilinear subpatterns. Therefore any script class can be distinguished from the others based on the behavior of its structural subpatterns.

From prior observation of each script (that is, knowledge of its structural subpatterns), we have empirically determined the angular orientation.

We have computed the oriented local energy features for discriminating between script classes. In Table 2, we give the average energy and the variance for each script class. When testing a document, that is, when assigning a script class to it, the energy of the test document is compared with the average energy of each class. The variance for each script class is also computed. The projections are taken using the Radon transform at different orientations between 0° and 90° for each script class. The transformed script images are decomposed using EMD to analyze local characteristics, and a finite set of IMFs is obtained. We consider only the first four IMFs in our experiments, since not all of the IMFs obtained contain sufficient energy in a given orientation to be used as discriminating features. We have empirically determined, for each script class, the IMF in which the energy is maximum and for which the script class is distinguishable from the other script classes. The angular orientation at which the Radon transform is computed and the selected IMF together represent the distinguishing feature vector for each script class.
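The selection of the maximum-energy IMF can be sketched as follows. The text does not give the exact energy formula, so this sketch assumes the mean squared amplitude of an IMF as its energy; `imf_energy` and `dominant_imf` are hypothetical helper names, and the IMF values are made-up illustrative numbers rather than data from the experiments.

```python
def imf_energy(imf):
    """Mean squared amplitude of one IMF (an assumed energy definition)."""
    return sum(v * v for v in imf) / len(imf)


def dominant_imf(imfs):
    """Return (1-based index, energy) of the IMF with the largest energy."""
    energies = [imf_energy(imf) for imf in imfs]
    best = max(range(len(energies)), key=energies.__getitem__)
    return best + 1, energies[best]


# Four hypothetical IMFs obtained from one Radon projection.
imfs = [
    [0.1, -0.1, 0.1, -0.1],
    [0.3, -0.3, 0.3, -0.3],
    [0.2, 0.0, -0.2, 0.0],
    [0.05, 0.05, -0.05, -0.05],
]

index, energy = dominant_imf(imfs)
print(index, round(energy, 4))  # IMF 2 carries the most energy in this toy example
```

Repeating this for each candidate orientation gives an (orientation, IMF) pair per script, which is the form of the feature vectors listed in Table 1.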

Table 1 shows the feature vectors selected for the script classes considered for our problem.

Features extracted for each script class

Feat_vector   Script class   Angle of orientation   IMF
FV1           Devnagari      90°                    IMF1
FV2           Malayalam      90°                    IMF3
FV3           Gurumukhi      0°–10°                 IMF1
FV4           Roman          0°                     IMF4
FV5           Bangla         90°                    IMF2

Table 1: Feature vectors defined for script classes

The average energy and the variance corresponding to each feature vector are shown below in Table 2.

Average energy and variance for each Feat_vector

Feat_vector   Average energy   Variance
FV1           0.0338           6.8798e-04
FV2           0.0243           2.5480e-04
FV3           0.0633           0.0057
FV4           0.0437           0.0028
FV5           0.0479           8.1585e-04

Table 2: Average energy and Variance
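The comparison of a test document's energy with the class averages could look like the sketch below. The means and variances are copied from Table 2 and paired with scripts according to Table 1. The decision rule, a deviation from the class mean scaled by the class standard deviation, is an assumption made for illustration; the text only states that the test energy is compared with the average energy of each class. The test energies are hypothetical.

```python
import math

# (average energy, variance) per script, from Table 2 with the FV-to-script
# pairing of Table 1.
CLASS_STATS = {
    "Devnagari": (0.0338, 6.8798e-04),
    "Malayalam": (0.0243, 2.5480e-04),
    "Gurumukhi": (0.0633, 0.0057),
    "Roman": (0.0437, 0.0028),
    "Bangla": (0.0479, 8.1585e-04),
}


def closest_class(test_energy):
    """Pick the script whose average energy is closest to the test energy.

    test_energy maps each script name to the energy of the test document
    measured with that script's feature vector (orientation and IMF).
    Scaling by the standard deviation is an assumed rule, not the paper's.
    """
    def deviation(script):
        mean, variance = CLASS_STATS[script]
        return abs(test_energy[script] - mean) / math.sqrt(variance)

    return min(test_energy, key=deviation)


# Hypothetical energies measured on one test document.
example = {
    "Devnagari": 0.034,
    "Malayalam": 0.060,
    "Gurumukhi": 0.010,
    "Roman": 0.120,
    "Bangla": 0.090,
}
print(closest_class(example))  # Devnagari
```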

2.3 Feature Selection

Feature selection has many potential benefits, such as facilitating data visualization and data understanding, reducing training and utilization times, and improving accuracy. Selecting only the most relevant variables is usually suboptimal for building a predictor, particularly if the variables are redundant. Here, we use feature selection to choose appropriate and relevant features for the problem at hand. The angle of orientation chosen for computing the Radon transform depends entirely on data visualization and understanding of the data. We have empirically determined the angular orientation during training, based on knowledge and understanding of each script class.

Next, as mentioned above, the EMD method decomposes a signal into a set of components called IMFs. Not all of the IMFs are relevant, as not all of them are informative. We therefore choose only the IMFs that contain useful information (maximum energy) and discard those that share similar amounts of energy. We have not used any standard method for selecting these features, as such methods are not helpful for our problem. Rather, we have empirically found the angular orientation and IMF which can be used as a distinguishing feature for each individual script class.

2.4 Classifier Design

For our problem, we use the cosine similarity measure to determine the most similar script class, both for the training images and for a test document. We use an AdaBoost binary decision tree as a classifier to improve the results. This involves training the classifier and testing a new document against the different script classes.

• Train the classifier: The system extracts the textural features of the different script classes, which depend on the characteristics of the specific scripts. We have performed the training based on the distance of a training sample to each class. During training, we extract the features discussed above for our dataset and use the cosine similarity measure as our measure of similarity for the discrimination. This similarity measure can be computed between arbitrary vectors.

• Classifying a new document: To test a new document, we compute the features in the same way as described above. The cosine similarity measure is used for the comparison with the script classes in the dataset, and the most similar script class is assigned to the test document. To improve the classification performance, we use an AdaBoost binary decision tree.
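The cosine similarity comparison described above can be sketched as follows. The function names and the two-dimensional class vectors are hypothetical and chosen only for illustration; returning 0.0 for a zero vector is a convention adopted here, not something stated in the text. The AdaBoost stage is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar_class(test_vector, class_vectors):
    """Return the class name whose reference vector is most similar to test_vector."""
    return max(
        class_vectors,
        key=lambda name: cosine_similarity(test_vector, class_vectors[name]),
    )


# Hypothetical reference vectors for two classes.
class_vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
print(most_similar_class([1.0, 0.1], class_vectors))  # A
```

Because cosine similarity depends only on direction, two feature vectors that differ by a positive scale factor are treated as identical, which makes the comparison insensitive to overall energy scale.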

2.5 Empirical Mode Decomposition

In this paper, we use the decomposition of a signal into IMFs, developed by Huang et al. [3] for analyzing nonstationary and nonlinear time series data. The intrinsic nature of the decomposition can adaptively decompose images into a series of modes representing different local features of the images. The decomposition is based on the local characteristic time scale of the data.

• Prerequisites

An IMF must satisfy two conditions:

– the number of extrema and the number of zero crossings must either be equal or differ at most by one;

– it should be symmetric with respect to the local zero mean.

With these two prerequisites, an IMF can be represented with a meaningful instantaneous frequency.
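The first prerequisite can be checked on a sampled signal with a few lines of code. This is a minimal sketch with hypothetical helper names: extrema are counted as strict interior local maxima or minima, and zero crossings as sign changes between consecutive non-zero samples. Plateaus and the symmetry condition are not handled.

```python
def count_extrema(signal):
    """Count strict interior local maxima and minima."""
    count = 0
    for i in range(1, len(signal) - 1):
        left, mid, right = signal[i - 1], signal[i], signal[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            count += 1
    return count


def count_zero_crossings(signal):
    """Count sign changes between consecutive non-zero samples."""
    signs = [1 if v > 0 else -1 for v in signal if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def satisfies_first_condition(signal):
    """True if extrema and zero-crossing counts are equal or differ by one."""
    return abs(count_extrema(signal) - count_zero_crossings(signal)) <= 1


oscillation = [1, -1, 1, -1, 1, -1]  # 4 extrema, 5 zero crossings
offset = [3, 1, 3, 1, 3, 1]          # 4 extrema, no zero crossings
print(satisfies_first_condition(oscillation))  # True
print(satisfies_first_condition(offset))       # False
```

The second signal oscillates but rides on a non-zero mean, which is exactly the situation that sifting is meant to remove.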

• Sifting Procedure

From a given signal $x(t)$, we extract IMFs using a sifting process that satisfies the conditions defined above.

1. Identify all extrema of $x(t)$.

2. Interpolate between the minima and between the maxima, ending up with the envelopes $L_{\min}(t)$ and $L_{\max}(t)$, respectively.

3. Compute the mean $m(t) = (L_{\min}(t) + L_{\max}(t))/2$.

4. Extract the detail $d(t) = x(t) - m(t)$.

5. Iterate the same process on the residual $m(t)$.

In practice, steps 1 to 4 are repeated on the detail $d(t)$ until it satisfies the two conditions above. Once this is achieved, the detail is referred to as an Intrinsic Mode Function (IMF), the corresponding residual is computed and step 5 applies.
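Steps 1 to 4 of the sifting procedure can be sketched as a single pass over a sampled signal. This is a simplified illustration under stated assumptions: the envelopes are piecewise linear and held flat beyond the outermost extrema, whereas EMD implementations commonly use cubic splines and more careful boundary handling. The function names are hypothetical.

```python
def local_extrema(x):
    """Indices of strict interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices, flat beyond the end knots."""
    env = [0.0] * len(x)
    for t in range(len(x)):
        if t <= knots[0]:
            env[t] = x[knots[0]]
        elif t >= knots[-1]:
            env[t] = x[knots[-1]]
        else:
            for a, b in zip(knots, knots[1:]):
                if a <= t <= b:
                    w = (t - a) / (b - a)
                    env[t] = (1 - w) * x[a] + w * x[b]
                    break
    return env


def sift_once(x):
    """One sifting pass: returns (detail, mean), or None if x has no oscillation left."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None
    upper = envelope(x, maxima)   # L_max(t)
    lower = envelope(x, minima)   # L_min(t)
    mean = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
    detail = [xi - mi for xi, mi in zip(x, mean)]
    return detail, mean


# Toy signal: a rising trend with a +1/-1 oscillation on top.
signal = [1, 0, 3, 2, 5, 4, 7, 6, 9]
detail, mean = sift_once(signal)
print(detail[2:7])  # [1.0, -1.0, 1.0, -1.0, 1.0]
print(mean[2:7])    # [2.0, 3.0, 4.0, 5.0, 6.0]
```

Away from the boundaries, the pass separates the oscillation (the detail) from the trend (the mean). A full decomposition would repeat the pass on the detail until it qualifies as an IMF and then continue on the residual.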

• Stopping Criterion

If sifting is continued beyond a limit, the resulting IMFs become frequency-modulated signals of constant amplitude. A standard deviation is therefore computed and used as the criterion to stop sifting. Once the decomposition is complete, the signal can be written as:

$$x(t) = \sum_{i=1}^{m} \mathrm{IMF}_i(t) + r_m(t) \qquad (1)$$

where $m$ is the number of IMFs obtained for a given signal and $r_m(t)$ is the final residual.
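The standard-deviation criterion itself is not written out in the text. For reference, in the usual formulation of EMD it is computed between two consecutive sifting results. The symbols $h_{k-1}(t)$ and $h_k(t)$ (the detail before and after the $k$-th sifting pass) and $T$ (the signal length) are notation introduced here, not taken from the paper:

```latex
\mathrm{SD} = \sum_{t=0}^{T} \frac{\left| h_{k-1}(t) - h_{k}(t) \right|^{2}}{h_{k-1}^{2}(t)}
```

Sifting stops once SD falls below a preset threshold; values between 0.2 and 0.3 are commonly quoted. Which threshold was used in these experiments is not stated.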

3. RESULTS AND DISCUSSION

We have used 300 samples of each script class, giving a database of 1500 images in total for our experiments. Classification experiments were run for each pair of script classes. Each experiment was repeated with the training and testing datasets interchanged, so the accuracy reported below is the average over both runs. The feature extracted for each script class under consideration depends on the ratio of linear, curved and curvilinear textural behavior of its subpatterns. For example, the proportion of curved and linear subpatterns is higher in Devnagari and Roman than in Gurumukhi and Bangla. The script classes Gurumukhi and Bangla contain more curvilinear subpatterns than linear ones. Malayalam contains even more curved subpatterns than Gurumukhi and Bangla. The angle of orientation chosen therefore depends strictly on the ratio of the structural subpattern behavior. When selecting the feature vector for any script class, the structural behavior of the script must be analyzed.

Table 3 shows the average classification accuracy for all the script classes.

Classification accuracy achieved

Script       Script classes
             Dev    Mal    Guru   Roman  Bangla
Devnagari    97%    3%     -      -      -
Malayalam    4%     96%    -      -      -

Devnagari    97%    -      -      3%     -
Roman        5%     -      -      95%    -

Malayalam    -      94%    -      6%     -
Roman        -      3%     -      97%    -

Devnagari    92%    -      -      8%     -
Gurumukhi    10%    -      90%    -      -

Bangla       -      -      4%     -      96%
Gurumukhi    -      -      92%    -      8%

4. EXPERIMENTAL OBSERVATION

We have applied the proposed script identification (classification) method to document images from an OCR database. We used 300 documents of each script, so the dataset is composed of 1500 document images in total. For each script, 50% of the images are used for training and the remaining 50% for testing. We have used 512 by 512 document images for extracting features. The average number of characters in a 512 by 512 document image is approximately 1500 to 2000. The proposed method performs better on document images of at least 256 by 256 in size, because it can then capture a sufficient distribution of pixel intensity according to the strokes of each script class. We have also repeated the experiments with the training and testing datasets interchanged. The performance shown above in Sec. 3 is therefore the average performance of both experiments, done on different document images of the dataset.
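The evaluation protocol described above, a 50/50 split with the roles of the two halves interchanged and the two accuracies averaged, can be sketched as follows. The nearest-mean classifier on a single scalar energy is only a hypothetical stand-in for the actual classifier, and the sample values are made up for illustration.

```python
def accuracy(predict, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for s, y in zip(samples, labels) if predict(s) == y)
    return correct / len(samples)


def swapped_two_fold(fit, half_a, half_b):
    """Average accuracy of train-on-A/test-on-B and train-on-B/test-on-A.

    fit(samples, labels) must return a predict(sample) function;
    each half is a (samples, labels) pair.
    """
    predict_a = fit(*half_a)
    predict_b = fit(*half_b)
    acc_ab = accuracy(predict_a, *half_b)
    acc_ba = accuracy(predict_b, *half_a)
    return (acc_ab + acc_ba) / 2


def fit_nearest_mean(samples, labels):
    """Toy stand-in classifier: assign the label with the closest mean energy."""
    sums = {}
    for s, y in zip(samples, labels):
        total, n = sums.get(y, (0.0, 0))
        sums[y] = (total + s, n + 1)
    means = {y: total / n for y, (total, n) in sums.items()}
    return lambda s: min(means, key=lambda y: abs(s - means[y]))


# Hypothetical scalar energies for two script classes.
half_a = ([0.03, 0.04, 0.06, 0.07], ["Dev", "Dev", "Guru", "Guru"])
half_b = ([0.035, 0.052, 0.065, 0.045], ["Dev", "Dev", "Guru", "Guru"])
print(swapped_two_fold(fit_nearest_mean, half_a, half_b))  # 0.75
```

In this toy example the two directions give accuracies of 0.5 and 1.0, which shows why reporting the average of both runs is less sensitive to a particular split than reporting a single run.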

5. CONCLUSION AND FUTURE WORK

This method is developed for the identification (classification) of multiple script classes individually. The strengths of the presented work are its applicability and accuracy. EMD (Empirical Mode Decomposition) based approaches have been applied in various applications, but this is the first of its kind for script identification. The proposed method is computationally less time consuming; hence, for script-specific identification applications, it will be highly effective compared to other, more expensive feature extraction and classification methods. The performance of the method is also reasonably high. In future, we can extend this work to a larger number of scripts and even to different script classes. We can also apply it to text/image separation, since the energy of images is scattered evenly across the different mode functions, whereas in text it is higher and more strongly oriented. The proposed method can also be extended to multiple scales for script identification.

Acknowledgment

The authors are grateful to Prof. S. D. Joshi, Dept. of Electrical Engg., IIT Delhi, for helpful discussion and encouragement during this work.

6. REFERENCES

[1] D. Ghosh, T. Dube, and S. A.P. Script Recognition - A Review. IEEE Trans. Pattern Analysis and Machine Intelligence, 32:2142–2161, 2010.

[2] L. Hochberg, L. Kerns, P. Kelly, and T. Thomas. Automatic Script Identification from images using Cluster-based Templates. IEEE Trans. Pattern Analysis and Machine Intelligence, 19:176–181, 1997.

[3] H. Huang and J. Pan. Speech pitch determination based on Hilbert-Huang transform. Signal Processing.

[4] G. Joshi, S. Garg, and J. Sivaswamy. A generalized framework for script identification. Int. J. on Document Analysis and Recognition, pages 55–68, 2007.

[5] L. Lam, J. Ding, and C. Suen. Differentiating between Oriental and European Scripts by Statistical Features. International Journal of Pattern Recognition and Artificial Intelligence, pages 63–79, 1998.

[6] U. Pal and B. Chaudhuri. Script Line Separation from Indian Multi-Script Documents. In Int. Conf. Document Analysis and Recognition, pages 406–409, 1999.

[7] T. Tan. Rotation invariant texture features and their use in automatic script identification. IEEE Trans. Pattern Analysis and Machine Intelligence, pages 751–756, 1998.

[8] B. Waked, S. Bergler, C. Suen, S. Khoury, and C. Y. S. S. Khoury. Skew Detection, page segmentation, and script classification of printed document images. pages 4470–4475, 1998.

[9] N. Wang, L. Lam, and C. Y. Suen. Noise tolerant script identification of printed oriental and english documents using a downgraded pixel density feature. Pattern Recognition, International Conference on, pages 2037–2040, 2010.
