Author

Chakravorty Bhagvati

Bio: Chakravorty Bhagvati is an academic researcher at the University of Hyderabad. The author has contributed to research on integration testing, has an h-index of 1, and has co-authored 1 publication receiving 25 citations.

Topics: Integration testing

Papers

Proceedings Article•DOI•

Experiences of integration and performance testing of multilingual OCR for printed Indian scripts

Deepak Arya¹, C. V. Jawahar², Chakravorty Bhagvati³, Tushar Patnaik¹, Bidyut B. Chaudhuri⁴, Gurpreet Singh Lehal⁵, Santanu Chaudhury⁶, A. G. Ramakrishna⁷•Institutions (7)

Centre for Development of Advanced Computing¹, International Institute of Information Technology, Hyderabad², University of Hyderabad³, Indian Statistical Institute⁴, Punjabi University⁵, Indian Institute of Technology Delhi⁶, Indian Institute of Science⁷

17 Sep 2011

TL;DR: The project is an attempt to implement an integrated platform for OCR of different Indian languages and currently is being enhanced for handling the space and time constraints, achieving higher recognition accuracies and adding new functionalities.

Abstract: This paper presents an integration and testing scheme for managing a large multilingual OCR project. The project is an attempt to implement an integrated platform for OCR of different Indian languages. Software engineering, workflow management and testing processes are discussed in this paper. The OCR has now been experimentally deployed for some specific applications and is currently being enhanced to handle space and time constraints, achieve higher recognition accuracies and add new functionalities.

26 citations

Cited by

Journal Article•DOI•

A benchmark image database of isolated Bangla handwritten compound characters

Nibaran Das¹, Kallol Acharya¹, Ram Sarkar¹, Subhadip Basu¹, Mahantapas Kundu¹, Mita Nasipuri¹•Institutions (1)

Jadavpur University¹

01 Dec 2014-International Journal on Document Analysis and Recognition

TL;DR: A benchmark image database of isolated handwritten Bangla compound characters, used in the standard Bangla literature, is presented, which may facilitate research on handwritten character recognition, especially related to Bangla form document processing systems.

Abstract: In the present work, we present a benchmark image database of isolated handwritten Bangla compound characters, used in the standard Bangla literature. A thorough survey of more than 2 million Bangla words has revealed that there exist around 334 compound characters in Bangla script. Of these, only around 171 character classes form unique pattern shapes, and some of these classes are often written in multiple styles. Altogether, 55,278 isolated character images, belonging to 199 different pattern shapes, are collected using three different data collection modalities. The database is divided into training and test sets in a 4:1 ratio for each pattern class, by considering a balanced distribution of shapes from different modalities. A convex hull and quadtree-based feature set has been designed, and the test set recognition performance is reported with the support vector machine classifier. We have achieved a recognition accuracy of 79.35% on the test database consisting of 171 character classes. The complete compound character image database is freely available as CMATERdb 3.1.3.3 from the website http://code.google.com/p/cmaterdb/ , which may facilitate research on handwritten character recognition, especially related to Bangla form document processing systems.

81 citations

Proceedings Article•

Recognition of printed Devanagari text using BLSTM Neural Network

Naveen Sankaran¹, C. V. Jawahar¹•Institutions (1)

International Institute of Information Technology, Hyderabad¹

01 Nov 2012

TL;DR: This paper proposes a recognition scheme for the Indian script of Devanagari using a Recurrent Neural Network known as Bidirectional Long Short-Term Memory (BLSTM) and reports a reduction of more than 20% in word error rate and over 9% reduction in character error rate in comparison with the best available OCR system.

Abstract: In this paper, we propose a recognition scheme for the Indian script of Devanagari. The recognition accuracy of Devanagari script is not yet comparable to that of its Roman counterparts. This is mainly due to the complexity of the script, writing style, etc. Our solution uses a Recurrent Neural Network known as Bidirectional Long Short-Term Memory (BLSTM). Our approach does not require word-to-character segmentation, which is one of the most common reasons for high word error rate. We report a reduction of more than 20% in word error rate and over 9% reduction in character error rate in comparison with the best available OCR system.

64 citations

Proceedings Article•DOI•

Multilingual OCR for Indic Scripts

Minesh Mathew¹, Ajeet Kumar Singh¹, C. V. Jawahar¹•Institutions (1)

International Institute of Information Technology, Hyderabad¹

11 Apr 2016

TL;DR: An end-to-end RNN based architecture which can detect the script and recognize the text in a segmentation-free manner is proposed for this purpose and demonstrated for 12 Indian languages and English.

Abstract: In the Indian scenario, a document analysis system has to support multiple languages at the same time. With emerging multilingualism in urban India, often two, three or even more languages need to be supported. This demands the development of a multilingual OCR system which can work seamlessly across Indic scripts. In our approach, the script is identified at word level, prior to the recognition of the word. An end-to-end RNN based architecture which can detect the script and recognize the text in a segmentation-free manner is proposed for this purpose. We demonstrate the approach for 12 Indian languages and English. It is observed that, even with a similar architecture, performance on Indian languages is poorer compared to English. We investigate this further. Our approach is evaluated on a large corpus comprising thousands of pages. The Hindi OCR is compared with other popular OCRs for the language, as further testimony to the efficacy of our method.

40 citations

Proceedings Article•DOI•

Robust Recognition of Degraded Documents Using Character N-Grams

Shrey Dutta¹, Naveen Sankaran¹, K. Pramod Sankar², C. V. Jawahar¹•Institutions (2)

International Institute of Information Technology, Hyderabad¹, Xerox²

27 Mar 2012

TL;DR: A novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images by exploiting the additional context present in the character n-gram images, which enables better disambiguation between confusing characters in the recognition phase.

Abstract: In this paper we present a novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images. OCRs perform well on good quality documents, but fail easily in the presence of degradations. Also, classical OCR approaches perform poorly on complex scripts such as those for Indian languages. We address these issues by proposing to recognize character n-gram images, which are groupings of consecutive character/component segments. Our approach is unique, since we use the character n-grams as a primitive for recognition rather than for post-processing. By exploiting the additional context present in the character n-gram images, we enable better disambiguation between confusing characters in the recognition phase. The labels obtained from recognizing the constituent n-grams are then fused to obtain a label for the word that emitted them. Our method is inherently robust to degradations such as cuts and merges, which are common in digital libraries of scanned documents. We also present a reliable and scalable scheme for recognizing character n-gram images. Tests on English and Malayalam document images show considerable improvement in recognition in the case of heavily degraded documents.

40 citations

Proceedings Article•DOI•

Towards a Robust OCR System for Indic Scripts

Praveen Krishnan¹, Naveen Sankaran¹, Ajeet Kumar Singh¹, C. V. Jawahar¹•Institutions (1)

International Institute of Information Technology, Hyderabad¹

07 Apr 2014

TL;DR: A web based OCR system is proposed which follows a unified architecture for seven Indian languages, is robust against popular degradations, follows a segmentation free approach, addresses the UNICODE re-ordering issues, and can enable continuous learning with user inputs and feedback.

Abstract: Current Optical Character Recognition (OCR) systems for Indic scripts are not robust enough to recognize arbitrary collections of printed documents. Reasons for this limitation include the lack of resources (e.g. not enough examples with natural variations, lack of documentation about the possible font/style variations) and an architecture that necessitates hard segmentation of word images followed by isolated symbol recognition. Variations among scripts, latent symbol-to-UNICODE conversion rules, non-standard fonts/styles and large degradations are some of the major reasons for the unavailability of robust solutions. In this paper, we propose a web based OCR system which (i) follows a unified architecture for seven Indian languages, (ii) is robust against popular degradations, (iii) follows a segmentation free approach, (iv) addresses the UNICODE re-ordering issues, and (v) can enable continuous learning with user inputs and feedback. Our system is designed to aid continuous learning while remaining usable, i.e., we capture the user inputs (say, example images) to further improve the OCRs. We use the popular BLSTM based transcription scheme to achieve our target. This also enables incremental training and refinement in a seamless manner. We report superior accuracy rates in comparison with the available OCRs for the seven Indian languages.

24 citations