Topic

Malayalam

About: Malayalam is a research topic. Over its lifetime, 783 publications have appeared within this topic, receiving 4,655 citations. The topic is also known as: ml & Malayalam language.


Papers
Journal Article
TL;DR: This paper proposes, for the relevant Arabic dialects, a mora-sharing solution that recognises the special status of syllables incorporating long segments (long vowels or geminate consonants).
Abstract: In Classical Arabic and many modern Arabic dialects, syllables ending in VVC or in the left leg of a geminate have a special status. An examination of Kiparsky's (2003) semisyllable account of syllabification types and related phenomena in Arabic against a wider set of data shows that while this account explains much syllable-related variation, certain phenomena cannot be captured, and several dialects appear to exhibit conflicting syllable-related phenomena. Phenomena not readily covered by the semisyllable account commonly involve long segments – long vowels or geminate consonants. In this paper, I propose for relevant dialects a mora-sharing solution that recognises the special status of syllables incorporating long segments. Such a mora-sharing solution is not new, but has been proposed for the analysis of syllables containing long segments in a number of languages, including Arabic (Broselow 1992, Broselow et al. 1995), Malayalam, Hindi (Broselow et al. 1997) and Bantu languages (Maddieson 1993, Hubbard 1995).

36 citations

Journal Article
TL;DR: A deep learning approach is proposed that automatically learns the rules for identifying morphemes and segmenting them from the original word, in order to identify the word's grammatical structure.

36 citations

Proceedings Article
01 May 2020
TL;DR: This paper presents free, high-quality multi-speaker speech corpora for Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu, six of the twenty-two official languages of India, spoken by 374 million native speakers.
Abstract: We present free high quality multi-speaker speech corpora for Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu, which are six of the twenty two official languages of India spoken by 374 million native speakers. The datasets are primarily intended for use in text-to-speech (TTS) applications, such as constructing multilingual voices or being used for speaker or language adaptation. Most of the corpora (apart from Marathi, which is a female-only database) consist of at least 2,000 recorded lines from female and male native speakers of the language. We present the methodological details behind corpora acquisition, which can be scaled to acquiring data for other languages of interest. We describe the experiments in building a multilingual text-to-speech model that is constructed by combining our corpora. Our results indicate that using these corpora results in good quality voices, with Mean Opinion Scores (MOS) > 3.6, for all the languages tested. We believe that these resources, released with an open-source license, and the described methodology will help in the progress of speech applications for the languages described and aid corpora development for other, smaller, languages of India and beyond.

34 citations

Book Chapter
13 Dec 2006
TL;DR: A document segmentation algorithm is proposed that handles the complexity of Indian scripts in large document image collections. Segmentation is posed as a graph cut problem whose objective function incorporates a priori information from script structure.
Abstract: Most of the state-of-the-art segmentation algorithms are designed to handle complex document layouts and backgrounds, while assuming a simple script structure such as in Roman script. They perform poorly when used with Indian languages, where the components are not strictly collinear. In this paper, we propose a document segmentation algorithm that can handle the complexity of Indian scripts in large document image collections. Segmentation is posed as a graph cut problem that incorporates the apriori information from script structure in the objective function of the cut. We show that this information can be learned automatically and be adapted within a collection of documents (a book) and across collections to achieve accurate segmentation. We show the results on Indian language documents in Telugu script. The approach is also applicable to other languages with complex scripts such as Bangla, Kannada, Malayalam, and Urdu.

34 citations

Book
01 Jan 1998
TL;DR: This volume collects twenty-nine published and unpublished papers by the linguist James Gair, considered the foremost western scholar of the Sri Lankan languages Sinhala and Jaffna Tamil, whose work also considers issues in a variety of Indian languages, including Hindi, Marathi, Tamil, Malayalam, and Bengali.
Abstract: This volume collects twenty-nine published and unpublished papers by the linguist James Gair, considered the foremost western scholar of the Sri Lankan languages Sinhala and Jaffna Tamil. Ranging over thirty years, his work also considers issues in a variety of Indian languages, including Hindi, Marathi, Tamil, Malayalam, and Bengali. The collection reflects the wide range of Gair's interests, from morpho-syntactic questions to questions regarding historical and areal linguistics, especially language contact and diglossia, and extending to language acquisition. By collecting these papers and making them newly accessible, this volume will provide an important resource not only for scholars of these languages but for linguists interested in the theoretical issues Gair explores.

34 citations


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 71% related
Sentence: 41.2K papers, 929.6K citations, 69% related
Language acquisition: 33.9K papers, 957.2K citations, 65% related
Perception: 27.6K papers, 937.2K citations, 64% related
Narrative: 64.2K papers, 1.1M citations, 63% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    76
2022    157
2021    97
2020    68
2019    35
2018    47