Topic

Malayalam

About: Malayalam is a research topic. Over its lifetime, 783 publications have been published within this topic, receiving 4655 citations. The topic is also known as: ml & Malayalam language.


Papers
Book
27 Oct 2009
TL;DR: This unique guide/reference is the very first comprehensive book on the subject of OCR (Optical Character Recognition) for Indic scripts and provides a section on the enhancement of text and images obtained from historical Indic palm leaf manuscripts.
Abstract: This unique guide/reference is the very first comprehensive book on the subject of OCR (Optical Character Recognition) for Indic scripts. Features: contains contributions from the leading researchers in the field; discusses data set creation for OCR development; describes OCR systems that cover 8 different scripts, namely Bangla, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Tamil, and Urdu (Perso-Arabic); explores the challenges of Indic script handwriting recognition in the online domain; examines the development of handwriting-based text input systems; describes ongoing work to increase access to Indian cultural heritage materials; provides a section on the enhancement of text and images obtained from historical Indic palm leaf manuscripts; investigates different techniques for word spotting in Indic scripts; reviews mono-lingual and cross-lingual information retrieval in Indic languages. This is an excellent reference for researchers and graduate students studying OCR technology and methodologies.

46 citations

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A consortium effort on building text to speech (TTS) systems for 13 Indian languages using the same common framework; the TTS systems are evaluated using degradation Mean Opinion Score (DMOS) and Word Error Rate (WER).
Abstract: In this paper, we discuss a consortium effort on building text to speech (TTS) systems for 13 Indian languages. There are about 1652 Indian languages. A unified framework is therefore required for building TTSes for Indian languages. As Indian languages are syllable-timed, a syllable-based framework is developed. As quality of speech synthesis is of paramount interest, unit-selection synthesizers are built. Building TTS systems for low-resource languages requires that the data be carefully collected and annotated, as the database has to be built from scratch. Various criteria have to be addressed while building the database, namely, speaker selection, pronunciation variation, optimal text selection, handling of out-of-vocabulary words and so on. The various characteristics of the voice that affect speech synthesis quality are first analysed. Next, the design of the corpus of each of the Indian languages is tabulated. The collected data is labeled at the syllable level using a semiautomatic labeling tool. Text to speech synthesizers are built for all the 13 languages, namely, Hindi, Tamil, Marathi, Bengali, Malayalam, Telugu, Kannada, Gujarati, Rajasthani, Assamese, Manipuri, Odia and Bodo, using the same common framework. The TTS systems are evaluated using degradation Mean Opinion Score (DMOS) and Word Error Rate (WER). An average DMOS score of ≈3.0 and an average WER of about 20% is observed across all the languages.

42 citations

Journal ArticleDOI
TL;DR: A Malayalam morphological analyzer and a Tamil morphological generator, built using a bilingual dictionary, are described; a morphological analyzer is a program for analyzing the morphology of an input word.
Abstract: Natural Language Processing (NLP) is both a modern computational technology and a method of investigating and evaluating claims about human language itself. Some prefer the term Computational Linguistics in order to capture this latter function, but NLP is a term that links back into the history of Artificial Intelligence (AI), the general study of cognitive function by computational processes, normally with an emphasis on the role of knowledge representations, that is to say the need for representations of our knowledge of the world in order to understand human language with computers. A morphological analyzer or generator supplies information concerning morphosyntactic properties of the words it analyses or constructs. Morphological Analysis and Generation are important components for building computational grammars as well as Machine Translation. A Morphological Analyzer is a program for analyzing the morphology of an input word; the analyzer reads the inflected surface form of each word in a text and provides its lexical form, while Generation is the inverse process. Both Analysis and Generation make use of a lexicon. Malayalam, like the other languages in the Dravidian family, exhibits the characteristics of an agglutinative language. Here, using a bilingual dictionary, the Malayalam morphological analyzer and the Tamil morphological generator have been described.

38 citations

Book
09 Apr 2012
TL;DR: "More than Real" draws our attention to a period in Indian history that signified major civilizational change and the emergence of a new, proto-modern vision, in which the imagination came to be recognized as the defining feature of human beings.
Abstract: From the fifteenth to the eighteenth centuries, the major cultures of southern India underwent a revolution in sensibility reminiscent of what had occurred in Renaissance Italy. During this time, the imagination came to be recognized as the defining feature of human beings. "More than Real" draws our attention to a period in Indian history that signified major civilizational change and the emergence of a new, proto-modern vision. In general, India conceived of the imagination as a causative agent: things we perceive are real because we imagine them. David Shulman illuminates this distinctiveness and shows how it differed radically from Western notions of reality and models of the mind. Shulman's explication offers insightful points of comparison with ancient Greek, medieval Islamic, and early modern European theories of mind, and returns Indology to its rightful position of intellectual relevance in the humanities. At a time when contemporary ideologies and language wars threaten to segregate the study of pre-modern India into linguistic silos, Shulman demonstrates through his virtuoso readings of important literary works - works translated lyrically by the author from Sanskrit, Tamil, Telugu, and Malayalam - that Sanskrit and the classical languages of southern India have been intimately interwoven for centuries.

37 citations

Journal ArticleDOI
TL;DR: This paper examines two constructions, It-Cleft Sentences (e.g. It is me who/that wrote the book) and Wh-Cleft Sentences (e.g. The one who wrote the book is me), which constitute a problematic area of contemporary research in grammar.
Abstract: The aim of this paper is to examine two constructions, It-Cleft Sentences (e.g. It is me who/that wrote the book) and Wh-Cleft Sentences (e.g. The one who wrote the book is me), which constitute a problematic area of contemporary research in grammar. It-Cleft Sentences and Wh-Cleft Sentences (henceforth ICS and WCS, respectively) appear in a number of languages which are typologically different from each other, and have some, but not all, of their characteristics in common. In Malayalam, for example, in the configuration of the ICS, S¯ is not recognizable: cf. Mohanan, 1978. Both ICS and WCS are present in many European languages (although ICS seem to have a more limited geographic distribution) and in Chinese. In the Semitic languages (Arabic, Hebrew) only the WCS type occurs. The present paper will deal mainly with English constructions and will also present, at the syntactic level, a comparative analysis between the constructions in English and the corresponding constructions in the Romance languages (French, Italian and Spanish). This comparison is useful in that it allows us to study the existence of a field of variability in the syntactic properties characterizing the way these types of sentences are realized in European languages.

37 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
71% related
Sentence
41.2K papers, 929.6K citations
69% related
Language acquisition
33.9K papers, 957.2K citations
65% related
Perception
27.6K papers, 937.2K citations
64% related
Narrative
64.2K papers, 1.1M citations
63% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    76
2022    157
2021    97
2020    68
2019    35
2018    47