Author

Shamita Ghosh

Bio: Shamita Ghosh is an academic researcher from the Indian Statistical Institute. The author has contributed to research on the topics of standard test images and optical character recognition. The author has an h-index of 1 and has co-authored 2 publications receiving 25 citations.

Papers

Proceedings Article•DOI•

Composite Script Identification and Orientation Detection for Indian Text Images

Shamita Ghosh¹, Bidyut B. Chaudhuri¹•Institutions (1)

Indian Statistical Institute¹

18 Sep 2011

TL;DR: This paper proposes a script identification method that works for unknown orientation for all 11 official Indian scripts by a multi-stage tree classifier using features invariant to 0°/180° orientation.

Abstract: A major preprocessing step in a multi-script OCR system is to identify the script type of the test document image. Published papers on script identification usually assume that the test image is in the correct (0°) orientation. By mistake, however, a document may be fed to the system in the wrong orientation, say at an angle of nearly 180° or ±90°. In this paper we propose a script identification method that works for unknown orientation for all 11 official Indian scripts. We first find the skew and counter-rotate the document by the skew angle, which leads to either the correct (0°) or the upside-down (180°) orientation. Script identification is then done by a multi-stage tree classifier using features invariant to 0°/180° orientation. Finally, the orientation of the image is found by a two-class classifier for each script. The performance of the proposed method has been tested on a variety of documents, and promising results have been obtained.
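
The abstract relies on features that do not change when a page is turned upside down, followed by a separate 0°/180° decision. The Python sketch below only illustrates that distinction. The list-of-lists binary image and the three features (`ink_density`, `horizontal_crossings`, `top_half_ink`) are hypothetical choices made for this example, not the features or the classifier used in the paper.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed format)


def rotate_180(img: Image) -> Image:
    """Turn the image upside down by reversing the row order and each row."""
    return [row[::-1] for row in img[::-1]]


def ink_density(img: Image) -> float:
    """Fraction of pixels that are ink. Rotation only moves pixels, so this
    value is the same at 0° and 180°."""
    total = sum(len(row) for row in img)
    return sum(sum(row) for row in img) / total


def horizontal_crossings(img: Image) -> int:
    """Number of ink/background transitions along all rows. Reversing a row
    keeps its transition count, so the total is the same at 0° and 180°."""
    return sum(
        sum(1 for a, b in zip(row, row[1:]) if a != b)
        for row in img
    )


def top_half_ink(img: Image) -> int:
    """Ink pixels in the upper half of the image. This value generally
    changes under a 180° rotation, so it can carry orientation information."""
    return sum(sum(row) for row in img[: len(img) // 2])
```

In this sketch, the first two functions stand in for features that could feed a script classifier regardless of whether the page is at 0° or 180°, while the third stands in for a feature that a per-script two-class orientation classifier could use afterwards.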

25 citations

Proceedings Article•DOI•

Orientation detection of major Indian scripts

Bidyut B. Chaudhuri¹, Shamita Ghosh¹•Institutions (1)

Indian Statistical Institute¹

25 Jul 2009

TL;DR: Several simple features, namely gray density, vertical/horizontal crossing counts, border directions and water reservoir area, are tested for orientation detection, and the last one is found to work uniformly well for all 11 classes of printed Indian scripts.

Abstract: The problem of orientation detection for all major printed Indian scripts is considered. For a large skew, the corrected document may be oriented at 0° or 180°. The OCR system will make a fatal error if the document is in 180° orientation. We tested several simple features, namely gray density, vertical/horizontal crossing counts, border directions and water reservoir area, to detect orientation and found that the last one works uniformly well for all 11 classes of printed Indian scripts. Experimental results are presented on moderate-sized data sets of all scripts.
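
The abstract does not say how the water reservoir area is computed. As a rough illustration, the Python sketch below takes the area of a reservoir to be the number of background pixels that would hold water poured from one side onto a component's outer profile. The profile-based simplification, the list-of-lists binary image, and the function names are all assumptions made for this example and may differ from the paper's definition.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed format)


def column_heights(img: Image) -> List[int]:
    """Height of the topmost ink pixel in each column, measured from the
    bottom row; 0 when the column has no ink."""
    rows = len(img)
    heights = []
    for c in range(len(img[0])):
        h = 0
        for r in range(rows):
            if img[r][c]:
                h = rows - r
                break
        heights.append(h)
    return heights


def trapped_area(heights: List[int]) -> int:
    """Water held between the peaks of a height profile."""
    n = len(heights)
    left, right = [0] * n, [0] * n
    running = 0
    for i in range(n):
        running = max(running, heights[i])
        left[i] = running
    running = 0
    for i in range(n - 1, -1, -1):
        running = max(running, heights[i])
        right[i] = running
    return sum(min(left[i], right[i]) - heights[i] for i in range(n))


def reservoir_areas(img: Image) -> Tuple[int, int]:
    """Return (top, bottom) reservoir areas; the bottom value is obtained by
    flipping the image vertically and pouring from the top again."""
    top = trapped_area(column_heights(img))
    bottom = trapped_area(column_heights(img[::-1]))
    return top, bottom
```

Under this simplified definition, a 180° rotation swaps the two values: a U-shaped component has all of its reservoir area on top, and the same component upside down has it all at the bottom. An asymmetry of this kind is what makes such a feature usable for telling 0° from 180°.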

1 citation

Cited by

Journal Article•DOI•

Script Identification of Multi-Script Documents: A Survey

Kurban Ubul¹, Gulzira Tursun¹, Alimjan Aysa¹, Donato Impedovo², Giuseppe Pirlo², Tuergen Yibulayin•Institutions (2)

Xinjiang University¹, University of Bari²

30 Mar 2017-IEEE Access

TL;DR: The most vital processes in script identification are addressed in detail: identification and discriminating methods, feature extraction (local and global), and classification.

Abstract: In recent years, with the spread of the Internet and the digitized processing of multi-script documents worldwide, script identification techniques have become more important in the pattern recognition field. Script identification concerns methods for identifying different scripts in multi-lingual, multi-script documents. This paper presents a comprehensive overview of research activities in the field and focuses on the most valuable results obtained so far. The most vital processes in script identification are addressed in detail: identification and discriminating methods, feature extraction (local and global), and classification. Different kinds of approaches have been developed and promising results have been achieved. This paper reports state-of-the-art (SoA) performance results and covers methods for handwritten, printed, and hybrid document processing. More research is necessary to meet the performance levels essential for everyday applications.

49 citations

Journal Article•DOI•

New Gradient-Spatial-Structural Features for video script identification

Palaiahnakote Shivakumara¹, Zehuan Yuan², Danni Zhao³, Tong Lu², Chew Lim Tan³•Institutions (3)

University of Malaya¹, Nanjing University², National University of Singapore³

01 Jan 2015-Computer Vision and Image Understanding

TL;DR: This paper proposes to integrate the spatial and the structural features based on end points, intersection points, junction points and straightness of the skeleton of text components in a novel way to identify the scripts.

39 citations

Proceedings Article•DOI•

New Spatial-Gradient-Features for Video Script Identification

Danni Zhao¹, Palaiahnakote Shivakumara¹, Shijian Lu², Chew Lim Tan¹•Institutions (2)

National University of Singapore¹, Institute for Infocomm Research Singapore²

27 Mar 2012

TL;DR: New features based on Spatial-Gradient-Features (SGF) at block level for identifying six video scripts namely, Arabic, Chinese, English, Japanese, Korean and Tamil are presented, which helps in enhancing the capability of the current OCR on video text recognition by choosing an appropriate OCR engine when video contains multi-script frames.

Abstract: In this paper, we present new features based on Spatial-Gradient-Features (SGF) at block level for identifying six video scripts, namely Arabic, Chinese, English, Japanese, Korean and Tamil. This work helps in enhancing the capability of the current OCR on video text recognition by choosing an appropriate OCR engine when video contains multi-script frames. The input for script identification is the text blocks obtained by our text frame classification method. For each text block, we obtain horizontal and vertical gradient information to enhance the contrast of the text pixels. We divide the horizontal gradient block into two equal parts, upper and lower, at the centroid in the horizontal direction. A histogram of the horizontal gradient values of the upper and the lower parts is computed to select dominant text pixels. In the same way, the method selects dominant pixels from the right and the left parts obtained by dividing the vertical gradient block vertically. The method combines the horizontal and the vertical dominant pixels to obtain text components. The skeleton concept is used to reduce pixel width to a single pixel to extract spatial features. We extract four features based on proximity between end points, junction points, intersection points and pixels. The method is evaluated on 770 frames of six scripts in terms of classification rate and is compared with an existing method. We have achieved an 82.1% average classification rate.

35 citations

Proceedings Article•DOI•

Automatic Handwritten Indian Scripts Identification

Rajmohan Pardeshi¹, Bidyut B. Chaudhuri², Mallikarjun Hangarge¹, K. C. Santosh•Institutions (2)

Commerce College, Jaipur¹, Indian Statistical Institute²

15 Dec 2014

TL;DR: A word level handwritten Indian script identification technique that employs the Radon transform, discrete wavelet transform, statistical filters and discrete cosine transform to extract the directional multi-resolution spatial features.

Abstract: Since OCR engines are usually script-dependent, automatic text recognition in multi-script documents requires a pre-processor module that identifies the scripts. Based on this motivation, in this paper, we present a word-level handwritten Indian script identification technique. Words are first segmented by morphological dilation followed by connected component labelling. We then employ the Radon transform, discrete wavelet transform, statistical filters and discrete cosine transform to extract the directional multi-resolution spatial features. We tested the features by using linear discriminant analysis, support vector machine and K-nearest neighbour classifiers over 11 different major Indian scripts (including Roman) in bi-script and tri-script scenarios. In our tests, we have achieved maximum accuracies of 98% and 96% for the bi-script and tri-script cases, respectively.

35 citations

Proceedings Article•DOI•

Gradient-Angular-Features for Word-wise Video Script Identification

Palaiahnakote Shivakumara¹, Nabin Sharma², Umapada Pal³, Michael Blumenstein², Chew Lim Tan⁴•Institutions (4)

University of Malaya¹, Griffith University², Indian Statistical Institute³, National University of Singapore⁴

24 Aug 2014

TL;DR: This paper presents new Gradient-Angular-Features (GAF) for video script identification, namely, Arabic, Chinese, English, Japanese, Korean and Tamil, and proposes novel GAF for the PTC to study the structure of the components in the form of cursiveness and softness.

Abstract: Script identification at the word level is challenging because of complex backgrounds and the low resolution of video. The presence of graphics and scene text in video makes the problem more challenging. In this paper, we employ gradient angle segmentation on words from video text lines. This paper presents new Gradient-Angular-Features (GAF) for identifying six video scripts, namely Arabic, Chinese, English, Japanese, Korean and Tamil. This work enables us to select an appropriate OCR engine when the frame has words in multiple scripts. We employ gradient directional features for segmenting words from video text lines. For each segmented word, we study the gradient information in effective ways to identify text candidates. The skeleton of the text candidates is analyzed to identify Potential Text Candidates (PTC) by filtering out unwanted text candidates. We propose novel GAF for the PTC to study the structure of the components in the form of cursiveness and softness. The histogram operation on the GAF is performed in different ways to obtain discriminative features. The method is evaluated on 760 words of six scripts having low contrast, complex background, different font sizes, etc. in terms of the classification rate and is compared with an existing method to show the effectiveness of the method. We achieve an 88.2% average classification rate.

24 citations
