
Showing papers on "Malayalam published in 2009"


Book
27 Oct 2009
TL;DR: This unique guide/reference is the very first comprehensive book on the subject of OCR (Optical Character Recognition) for Indic scripts and provides a section on the enhancement of text and images obtained from historical Indic palm leaf manuscripts.
Abstract: This unique guide/reference is the very first comprehensive book on the subject of OCR (Optical Character Recognition) for Indic scripts. Features: contains contributions from the leading researchers in the field; discusses data set creation for OCR development; describes OCR systems that cover eight different scripts: Bangla, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Tamil, and Urdu (Perso-Arabic); explores the challenges of Indic script handwriting recognition in the online domain; examines the development of handwriting-based text input systems; describes ongoing work to increase access to Indian cultural heritage materials; provides a section on the enhancement of text and images obtained from historical Indic palm leaf manuscripts; investigates different techniques for word spotting in Indic scripts; reviews mono-lingual and cross-lingual information retrieval in Indic languages. This is an excellent reference for researchers and graduate students studying OCR technology and methodologies.

46 citations


Proceedings ArticleDOI
28 Dec 2009
TL;DR: Using this rule-based method, a given English sentence can be translated to its Malayalam equivalent.
Abstract: Here we propose a method for translating English sentences to Malayalam. The translation is done by a rule-based method. The core process is mediated by bilingual dictionaries and rules for converting source language structures into target language structures. The rules used in this approach are prepared based on the Parts Of Speech (POS) tag and dependency information obtained from the parser. Two main types of rules are used: transfer link rules and morphological rules. The transfer link rules are used for generating the target structure, and the morphological rules are used for assigning morphological features. The bilingual dictionary used here is an English-Malayalam dictionary. By using this approach, a given English sentence can be translated to its Malayalam equivalent.

34 citations


Proceedings ArticleDOI
27 Oct 2009
TL;DR: Although the experiments have been performed on a very small corpus, the results have shown that the statistical approach works well with a highly agglutinative language like Malayalam.
Abstract: A Parts of Speech tagger for Malayalam which uses a stochastic approach has been proposed. The tagger makes use of word frequencies and bigram statistics from a corpus. A morphological analyzer is used to generate a tagged corpus, because no annotated corpus is available for Malayalam. Although the experiments have been performed on a very small corpus, the results have shown that the statistical approach works well with a highly agglutinative language like Malayalam.
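
The combination of word frequencies and bigram statistics described above can be illustrated with a minimal sketch. The abstract does not give the decoding procedure, so the greedy left-to-right tagging, the add-one smoothing, the `default_tag` fallback, and the transliterated toy corpus below are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Count word/tag frequencies and tag-bigram frequencies from a tagged corpus."""
    emit = defaultdict(Counter)   # emit[word][tag] = how often word carried tag
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = how often tag followed prev_tag
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker (hypothetical symbol)
        for word, tag in sentence:
            emit[word][tag] += 1
            trans[prev][tag] += 1
            prev = tag
    return emit, trans

def tag(words, emit, trans, default_tag="NOUN"):
    """Greedy tagging: for each word pick the tag that maximises
    word/tag count x (bigram count + 1). Unknown words get default_tag."""
    tags, prev = [], "<s>"
    for word in words:
        candidates = emit.get(word)
        if not candidates:
            best = default_tag
        else:
            best = max(candidates,
                       key=lambda t: candidates[t] * (trans[prev][t] + 1))
        tags.append(best)
        prev = best
    return tags

# Toy transliterated corpus (assumed data, for illustration only).
corpus = [
    [("avan", "PRON"), ("pustakam", "NOUN"), ("vaayichu", "VERB")],
    [("aval", "PRON"), ("paattu", "NOUN"), ("paadi", "VERB")],
]
emit, trans = train(corpus)
result = tag(["aval", "pustakam", "vaayichu"], emit, trans)
```

A full stochastic tagger would normally search over whole tag sequences (for example with Viterbi decoding) rather than committing to one tag at a time; the greedy version is used here only to keep the sketch short.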

32 citations


BookDOI
12 Nov 2009
TL;DR: This edited volume, with an introduction by Rita Kothari, collects essays on translation, including a chapter by E.V. Ramakrishnan on the role of translation in the making of the Malayalam literary tradition.
Abstract (table of contents):
1. Acknowledgements
2. Foreword (by Devy, Ganesh)
3. Introduction (by Kothari, Rita)
4. Caste in and Recasting language: Tamil in translation (by Prasad, G.J.V.)
5. Translation as resistance: The role of translation in the making of Malayalam literary tradition (by Ramakrishnan, E.V.)
6. Tellings and renderings in medieval Karnataka: The episode of Kirata Shiva and Arjuna (by Satyanath, T.S.)
7. Translating tragedy into Kannada: Politics of genre and the nationalist elite (by Tharakeshwar, V.B.)
8. The afterlives of panditry: Rethinking fidelity in sacred texts with multiple origins (by Merrill, Christi A.)
9. Beyond textual acts of translation: Kitab At-Tawhid and the Politics of Muslim Identity in British India (by Raja, Masood Ashraf)
10. Reading Gandhi in two tongues (by Suhrud, Tridip)
11. Being-in-translation: Sufism in Sindh (by Kothari, Rita)
12. (Mis)Representation of sufism through translation (by Farahzad, Farzaneh)
13. Translating Indian poetry in the Colonial Period in Korea (by Hyun, Theresa)
14. A. K. Ramanujan: What happened in the library (by Simon, Sherry)
15. An etymological exploration of 'translation' in Japan (by Wakabayashi, Judy)
16. Translating against the grain: Negotiation of meaning in the colonial trial of chief Langalibalele and its aftermath (by Ridge, Stanley G.M.)
17. Index

27 citations


Proceedings ArticleDOI
06 Mar 2009
TL;DR: This paper describes an OCR system for printed text documents in Malayalam, a language of the South Indian State, Kerala that uses wavelet multi-resolution analysis for the purpose of extracting features and Feed Forward Back-propagation Neural Network to accomplish the recognition tasks.
Abstract: OCR reading technology is benefited by the evolution of high-powered desktop computing allowing for the development of more powerful recognition software that can read a variety of common printed fonts and handwritten texts. But still it remains a highly challenging task to implement an OCR that works under all possible conditions and gives highly accurate results. This paper describes an OCR system for printed text documents in Malayalam, a language of the South Indian State, Kerala. The input to the system would be the scanned image of a page of text and the output is a machine editable file. Initially, the image is preprocessed to remove noise and skew. Lines, words and characters are segmented from the processed document image. The proposed method uses wavelet multi-resolution analysis for the purpose of extracting features and Feed Forward Back-propagation Neural Network to accomplish the recognition tasks.
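
As a rough illustration of wavelet-based feature extraction for a segmented character image, the sketch below computes the low-frequency sub-band of a Haar decomposition and flattens it into a feature vector. The abstract does not name the wavelet family or the number of levels, so the Haar wavelet, the averaging (unnormalised) form, and the function names are assumptions; the neural network classifier is not shown.

```python
def haar_step(values):
    """One level of a 1-D Haar transform in averaging form.
    Returns (averages, differences); a trailing odd element is dropped."""
    avg = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values) - 1, 2)]
    diff = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values) - 1, 2)]
    return avg, diff

def haar_approximation(image, levels=1):
    """Keep only the low-frequency (approximation) sub-band:
    average along rows, then along columns, once per level."""
    for _ in range(levels):
        rows = [haar_step(row)[0] for row in image]
        cols = [haar_step(list(col))[0] for col in zip(*rows)]
        image = [list(r) for r in zip(*cols)]  # transpose back to row-major
    return image

def wavelet_features(image, levels=1):
    """Flatten the approximation sub-band into a feature vector."""
    return [v for row in haar_approximation(image, levels) for v in row]

# A 4x4 binary glyph (assumed toy input); one level halves each dimension.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
features = wavelet_features(glyph, levels=1)
```

Each extra level halves the image again, so the feature vector shrinks by a factor of four per level; the resulting vector would then be passed to the classifier.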

26 citations


Journal ArticleDOI
TL;DR: This article examines the three-decade-long relationship between the economy influenced by the Gulf and Malayalam cinema, in its industrial and narrative context. It argues that the contestation over regional identity was played out on aesthetic grounds, where the wealth and objects associated with the Gulf economy were deployed at both the formal and thematic levels to produce claims about the legitimacy and desirability of the changes that become visible in the economic and social hierarchies within the region.
Abstract: This article, taking up for analysis the three-decade-long relationship between the economy influenced by the Gulf and Malayalam cinema, in its industrial and narrative context, argues that the Gulf has been a significant point of reference for the imagining of a cultural identity in Kerala. It attempts to weave together three aspects—the development models that are in place, the economic conditions within which the film industry operates and the textual aspects of the films produced—to foreground the links between the economy, aesthetics and the imagining of regional identity. I argue that the contestation over regional identity was played out on aesthetic grounds, where the wealth and objects associated with the Gulf economy were deployed both at the formal and thematic levels, to produce claims about the legitimacy and desirability of the changes that become visible in the economic and social hierarchies within the region. The article examines the representations of the Gulf within the region, using se...

21 citations


Book ChapterDOI
01 Jan 2009
TL;DR: This chapter presents the approach for recognition of Malayalam documents, both printed and handwritten, and classification results as well as ongoing activities are presented.
Abstract: Malayalam is an Indian language spoken by 40 million people with its own script. It has a rich literary tradition. A character recognition system for this language will be of immense help in a spectrum of applications ranging from data entry to reading aids. The Malayalam script has a large number of similar characters making the recognition problem challenging. In this chapter, we present our approach for recognition of Malayalam documents, both printed and handwritten. Classification results as well as ongoing activities are presented.

16 citations


Proceedings ArticleDOI
28 Dec 2009
TL;DR: This work incorporates rule-based reordering and morphological information into English to Malayalam statistical machine translation, reordering the source by applying simple modified transformation rules to the English parse tree given by the Stanford Dependency Parser.
Abstract: In this paper, we describe our work on incorporating rule-based reordering and morphological information for English to Malayalam statistical machine translation. The main ideas which have proven very effective are (i) reordering the English source sentence according to Malayalam syntax, and (ii) using root-suffix separation on both English and Malayalam words. The first is done by applying simple modified transformation rules on the English parse tree, which is given by the Stanford Dependency Parser. The second is done by using a morph analyzer. This approach achieves good performance and better results than the phrase-based system. Our approach avoids the use of parsing for the target language (Malayalam), making it suitable for statistical machine translation from English to Malayalam, since parsing tools for Malayalam are currently not available.

14 citations


Proceedings ArticleDOI
09 Dec 2009
TL;DR: An efficient Online Handwritten character Recognition System for Malayalam Characters (OHR-M) using Kohonen network is presented, which is writer independent with a recognition time of 15–32 milliseconds.
Abstract: This paper presents an efficient Online Handwritten character Recognition System for Malayalam Characters (OHR-M) using a Kohonen network. It would help in recognizing Malayalam text entered using pen-like devices. Entering text with a pen is a more natural and efficient way for users than using a keyboard and mouse. To identify the difference between similar characters in Malayalam, a novel feature extraction method has been adopted: a combination of context bitmap and normalized (x, y) coordinates. The system reported an accuracy of 88.75%, is writer independent, and has a recognition time of 15–32 milliseconds.

11 citations


01 Jan 2009
TL;DR: Various handcrafted rules designed for the suffix separation process in English-Malayalam SMT are presented, and the quick look-up table provided can be used as a guideline in implementing suffix separation in the Malayalam language.
Abstract: Suffix separation plays a vital role in improving the quality of training in the Statistical Machine Translation from English into Malayalam. The morphological richness and the agglutinative nature of Malayalam make it necessary to retrieve the root word from its inflected form in the training process. The suffix separation process accomplishes this task by scrutinizing the Malayalam words and by applying sandhi rules. In this paper, various handcrafted rules designed for the suffix separation process in the English Malayalam SMT are presented. A classification of these rules is done based on the Malayalam syllable preceding the suffix in the inflected form of the word (check_letter). The suffixes beginning with the vowel sounds like ആൽ, ഉടെ, ഇൽ etc. are mainly considered in this process. By examining the check_letter in a word, the suffix separation rules can be directly applied to extract the root words. The quick look up table provided in this paper can be used as a guideline in implementing suffix separation in Malayalam language.

9 citations


Proceedings ArticleDOI
04 Dec 2009
TL;DR: The proposed OCR system for printed Malayalam characters uses wavelet analysis to extract image features and a back-propagation neural network to perform the recognition.
Abstract: This paper specifies an OCR system for printed Malayalam characters. Malayalam is the principal language of the South Indian state Kerala. The input to the system would be the scanned image of a page of text and the output is a machine editable file. Malayalam character recognition is a complex task because of the presence of two scripts, an old script and a new script, and a large number of combinational characters. Initially, the image is preprocessed to remove noise. Then skew correction methods are applied to the document. Lines, words and characters are segmented from the processed document image. The proposed method uses wavelet analysis for extracting features of the image, and a back-propagation neural network is used to accomplish the recognition tasks.

01 Jan 2009
TL;DR: This paper addresses the integration of a complete Malayalam Text Read-out system designed for the visually challenged and finds interesting applications in libraries, offices where instructions and notices are to be read and also in assisted filling of application forms.
Abstract: Inclusion of the specially enabled in the IT revolution is both a social obligation and a computational challenge in the rapidly advancing digital world today. This paper addresses the integration of a complete Malayalam Text Read-out system designed for the visually challenged. The system accepts a page of printed Malayalam text with English numerals and scans it into a digital document, which is then subjected to skew correction and segmentation before feature extraction and classification. Once classified, the Malayalam text is read out by a text-to-speech conversion unit. Alternatively, the user has an option to get Braille prints of selected portions. The system finds interesting applications in libraries, in offices where instructions and notices are to be read, and also in assisted filling of application forms. Results along with analysis are presented.

Proceedings ArticleDOI
25 Jul 2009
TL;DR: An improved XML standard for storing online handwritten data in Indian languages is proposed, which gives quality labels at different levels to the data, and has provision to annotate all the peculiarities of writing the script of the various Indian languages included in the current consortium project.
Abstract: This article proposes an improved XML standard for storing online handwritten data in Indian languages. This standard has evolved over a period of two years, and is currently being used by the Consortium for online handwritten recognition of Indian languages, for annotating about 100,000 handwritten words in each of six Indian languages, namely, Tamil, Kannada, Telugu, Malayalam, Hindi and Bangla. In order that the huge amount of data that is being collected is useable by the future researchers, it is preferable that the data is stored in a format that is unambiguous and easy to read. The uniqueness of this refined standard is that it gives quality labels at different levels to the data, and has provision to annotate all the peculiarities of writing the script of the various Indian languages included in the current consortium project. The current format allows the use of automated and semi-automated annotation tools.

Proceedings ArticleDOI
27 Oct 2009
TL;DR: This paper addresses the segmentation of printed Malayalam characters, a fairly complex task, and their characterization through non-trivial dominant eigenvalues of column-stochastic image matrices; the reported results indicate effective performance of the OCR system.
Abstract: Indian languages, especially South Indian languages, have several distinct characteristics that are exploited for the development of a robust optical character recognition (OCR) system. This paper addresses the problem of segmentation of printed Malayalam characters, a fairly complex task, along with their characterization through non-trivial dominant eigenvalues of column-stochastic image matrices. Rectangular image matrices obtained after digitization, segmentation and normalization are converted to column-stochastic square matrices. Non-trivial dominant eigenvalues of such matrices have proved to be unique for characterization of printed Malayalam characters. Further, a novel segmentation algorithm has been proposed and tested. Results and analysis presented indicate effective performance of the OCR system.





01 Jan 2009
TL;DR: The proposed OCR system for printed Malayalam characters uses wavelet analysis to extract image features and a back-propagation neural network to perform the recognition.
Abstract: This paper specifies an OCR system for printed Malayalam characters. Malayalam is the principal language of the South Indian state Kerala. It belongs to the Dravidian language family. The input to the system would be the scanned image of a page of text and the output is a machine editable file. Malayalam character recognition is a complex task because of the presence of two scripts, an old script and a new script, and a large number of combinational characters. Initially, the image is preprocessed to remove noise. Then skew correction methods are applied to the document. Lines, words and characters are segmented from the processed document image. The proposed method uses wavelet analysis for extracting features of the image, and a back-propagation neural network is used to accomplish the recognition tasks.

01 Jan 2009
TL;DR: The frequencies of word collocations can be used to clearly distinguish an author in a highly inflectional language such as Malayalam, and this work extracts the word-level and character-level features present in the text to characterize the style of an author.
Abstract: Author identification is the problem of identifying the author of an anonymous text, or of a text whose authorship is in doubt, from a given set of authors. The works of different authors are strongly distinguished by quantifiable features of the text. This paper deals with the attempts made at identifying the most likely author of a text in Malayalam from a list of authors. Malayalam is a Dravidian language with an agglutinative nature, and few successful tools have been developed to extract syntactic and semantic features of texts in this language. We have done a detailed study of the various stylometric features that can be used to form an author's profile and have found that the frequencies of word collocations can be used to clearly distinguish an author in a highly inflectional language such as Malayalam. In our work we try to extract the word-level and character-level features present in the text for characterizing the style of an author. Our first step was to create a profile for each of the candidate authors whose texts were available to us, first from word n-gram frequencies and then by using variable-length character n-gram frequencies. The profiles of the set of authors under consideration thus formed were then compared with the features extracted from the anonymous text, to suggest the most likely author.
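
The profile-and-compare procedure in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: it uses fixed-length character n-grams (the paper uses word n-grams and variable-length character n-grams), a simple sum of absolute frequency differences as the distance (the paper's comparison measure is not given), and invented toy texts.

```python
from collections import Counter

def char_ngram_profile(text, n=3, top_k=50):
    """Relative frequencies of the top_k most common character n-grams."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.most_common(top_k)}

def profile_distance(p, q):
    """Sum of absolute frequency differences over the union of n-grams
    (an assumed dissimilarity measure; 0.0 means identical profiles)."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def most_likely_author(author_texts, anonymous_text, n=3, top_k=50):
    """Return the candidate whose profile is closest to the anonymous text."""
    anon = char_ngram_profile(anonymous_text, n, top_k)
    return min(
        author_texts,
        key=lambda a: profile_distance(
            char_ngram_profile(author_texts[a], n, top_k), anon),
    )

# Toy data (assumed): real use would load each candidate's known writings.
known = {
    "author_a": "aaaa bbbb aaaa bbbb",
    "author_b": "xyzxyzxyzxyz",
}
guess = most_likely_author(known, "aaaa bbbb")
```

Because the functions operate on Python strings, they work unchanged on Malayalam text, although combining marks mean a character n-gram is counted in code points rather than in visual syllables.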

Journal ArticleDOI
TL;DR: This article looks at the way in which the Christians of Kerala adopted idioms typical of 'Hindu' Brāhmanical religion for worship and later for purposes of propaganda.
Abstract: This article looks at the way in which the Christians of Kerala adopted idioms typical of ‘Hindu’ Brāhmanical religion for worship and later for purposes of propaganda. Taken up for detailed study are the works by Rev. John Ernest Hanxledon (1681–1732), a German Jesuit who worked in Kerala in the first three decades of the eighteenth century. Items such as the use of popular Malayalam metres, figures of speech, other literary tropes, the employment of the linguistic technique of social distancing, etc. make for a good study. These have been studied in contrast with the experience of Rev. Roberto de Nobili a century ago.

01 Jan 2009
TL;DR: This paper presents a zone and distance metric based feature extraction system for handwritten numeral recognition, uses a support vector machine for classification, and reports a 97.75% recognition rate for Kannada numerals.
Abstract: Character recognition is an important area in the image processing and pattern recognition fields. Handwritten character recognition has received extensive attention in academic and production fields. The recognition system can be either on-line or off-line. Off-line handwriting recognition is a subfield of optical character recognition. India is a multi-lingual and multi-script country, where eighteen official scripts are accepted and which has over a hundred regional languages. In this paper we present a zone and distance metric based feature extraction system. The character centroid is computed and the image is further divided into n equal zones. The average distance from the character centroid to each pixel present in the zone is computed. This procedure is repeated for all the zones present in the numeral image. Finally, n such features are extracted for classification and recognition. A support vector machine is used for subsequent classification and recognition. We obtained a 97.75% recognition rate for Kannada numerals.
Introduction (excerpt): ... of the OCR work done on Indian language is excellently reviewed in (2). In (3) a survey on feature extraction methods for character recognition is reviewed. Feature extraction methods include Template matching, Deformable templates, Unitary Image transforms, Graph description, Projection Histograms, Contour profiles, Zoning, Geometric moment invariants, Zernike Moments, Spline curve approximation, Fourier descriptors, Gradient feature and Gabor feature. India is a multi-lingual and multi-script country comprising eighteen official languages, namely Assamese, Bangla, English, Gujarati, Hindi, Kankanai, Kannada, Kashmiri, Malayalam, Marathi, Nepali, Oriya, Punjabi, Rajasthani, Sanskrit, Tamil, Telugu and Urdu. Recognition of handwritten Indian scripts is difficult because of the presence of numerals, vowels, consonants, vowel modifiers and compound characters.
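
The zone-based distance feature described in this abstract (centroid, n equal zones, average centroid-to-pixel distance per zone) can be sketched as below. Several details are assumptions because the abstract does not fix them: the zones are laid out as a square grid, only foreground (non-zero) pixels are counted, empty zones get a feature of 0.0, and the function name is hypothetical. The support vector machine stage is not shown.

```python
import math

def zone_distance_features(image, zones_per_side=2):
    """image: list of equal-length rows of 0/1 values.
    Returns zones_per_side**2 features: for each grid zone, the mean
    Euclidean distance from the character centroid to the zone's
    foreground pixels (0.0 for a zone with no foreground pixels)."""
    n_zones = zones_per_side ** 2
    pixels = [(r, c) for r, row in enumerate(image)
              for c, v in enumerate(row) if v]
    if not pixels:
        return [0.0] * n_zones
    # Centroid of the foreground pixels.
    cr = sum(r for r, _ in pixels) / len(pixels)
    cc = sum(c for _, c in pixels) / len(pixels)
    rows, cols = len(image), len(image[0])
    sums = [0.0] * n_zones
    counts = [0] * n_zones
    for r, c in pixels:
        # Map the pixel to its zone index in row-major order.
        zr = min(r * zones_per_side // rows, zones_per_side - 1)
        zc = min(c * zones_per_side // cols, zones_per_side - 1)
        z = zr * zones_per_side + zc
        sums[z] += math.hypot(r - cr, c - cc)
        counts[z] += 1
    return [s / k if k else 0.0 for s, k in zip(sums, counts)]

# Toy 4x4 image (assumed input) with one foreground pixel in each corner.
corners = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
features = zone_distance_features(corners, zones_per_side=2)
```

The resulting vector has one entry per zone, so a finer grid gives a longer feature vector at the cost of more empty zones for small glyphs.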