scispace - formally typeset
Topic

Malayalam

About: Malayalam is a research topic. Over its lifetime, 783 publications have been published within this topic, receiving 4,655 citations. The topic is also known as: ml and Malayalam language.


Papers
Journal Article
TL;DR: It is shown that postulating a Focus Phrase above vP explains such diverse phenomena as the position of the Malayalam question word contiguous to V, the ‘remnant’ in English pseudogapping, the clause-final ‘floated’ focus marker in English, and the position of the ‘cleft focus’ in English and Malayalam clefts.
Abstract: It is shown that postulating a Focus Phrase above vP enables us to explain such diverse phenomena as the Malayalam question word's position contiguous to V, the ‘remnant’ in English pseudogapping, the clause-final ‘floated’ focus marker in English, and the position of the ‘cleft focus’ in English and Malayalam clefts. Assuming a Kaynean view of the underlying structure of SOV languages, we argue that the ‘canonical’ positions to which the verb's internal arguments are moved in these languages are above this Focus Phrase. Postulating an iterable Topic Phrase above the Focus Phrase (and above the ‘canonical’ positions in SOV languages) enables us to account for the definiteness/specificity constraints on clause-internal scrambling in Malayalam, German and Dutch, and on object shift in Scandinavian. Finally, it is shown that all the functions attributed to an ‘outer’ Spec position of vP are better fulfilled by the Topic/Focus positions above vP that we postulated.

166 citations

Patent
12 Apr 2002
TL;DR: A system and method for writing Indian languages using the English writing scheme is provided, in which a script based on the English alphabet represents the characters and character combinations of several Indian languages.
Abstract: A system and method for writing Indian languages using the English writing scheme is provided that includes specifying a script using the English alphabet to represent the characters and character combinations of various Indian languages. The specified script follows the writing conventions of English. This script is based on how the Indian languages are spoken, and rules are specified to facilitate mapping the sounds represented in English characters to the native language in its written form. This common method is intended for writing Hindi and related languages, such as Sanskrit, Marathi, Gujarati, and Bengali, as well as the Dravidian languages, which are somewhat distant from Hindi but closely related to one another, such as Malayalam, Tamil, Kannada, and Telugu.

147 citations
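The patent abstract above describes rules that map sounds spelled with English letters onto a language's native written form. The Python sketch below is only a rough illustration of how such a rule-based mapping could work for Malayalam. It tokenizes the romanized input with a greedy longest match. Each consonant is then written with its inherent vowel, with a dependent vowel sign, or with a virama when no vowel follows. The mapping tables and the `tokenize` and `transliterate` functions are hypothetical and are not the scheme defined in the patent.

```python
# Minimal sketch of rule-based romanized-to-Malayalam transliteration.
# ASSUMPTION: the mapping tables below are a small hypothetical subset chosen
# for illustration; they are not the scheme specified in the patent.

# Romanized vowel -> (independent vowel letter, dependent vowel sign).
# "a" has no sign because it is the inherent vowel of every consonant.
VOWELS = {
    "a": ("\u0d05", ""),         # അ
    "aa": ("\u0d06", "\u0d3e"),  # ആ, ാ
    "i": ("\u0d07", "\u0d3f"),   # ഇ, ി
    "ii": ("\u0d08", "\u0d40"),  # ഈ, ീ
    "u": ("\u0d09", "\u0d41"),   # ഉ, ു
    "uu": ("\u0d0a", "\u0d42"),  # ഊ, ൂ
    "e": ("\u0d0e", "\u0d46"),   # എ, െ
    "ee": ("\u0d0f", "\u0d47"),  # ഏ, േ
    "o": ("\u0d12", "\u0d4a"),   # ഒ, ൊ
    "oo": ("\u0d13", "\u0d4b"),  # ഓ, ോ
}

# Romanized consonant -> Malayalam consonant letter (hypothetical subset).
CONSONANTS = {
    "k": "\u0d15", "g": "\u0d17", "ch": "\u0d1a", "j": "\u0d1c",
    "t": "\u0d24", "d": "\u0d26", "n": "\u0d28", "p": "\u0d2a",
    "b": "\u0d2c", "m": "\u0d2e", "y": "\u0d2f", "r": "\u0d30",
    "l": "\u0d32", "zh": "\u0d34", "v": "\u0d35", "s": "\u0d38",
    "h": "\u0d39",
}

VIRAMA = "\u0d4d"  # ് : suppresses a consonant's inherent vowel

# Try longer romanizations first so that "aa" is read as one long vowel
# rather than two short "a"s.
TOKENS = sorted(list(VOWELS) + list(CONSONANTS), key=len, reverse=True)


def tokenize(text):
    """Split romanized text into known tokens using greedy longest match."""
    i = 0
    out = []
    while i < len(text):
        for tok in TOKENS:
            if text.startswith(tok, i):
                out.append(tok)
                i += len(tok)
                break
        else:
            # Unknown character (space, digit, punctuation): pass it through.
            out.append(text[i])
            i += 1
    return out


def transliterate(text):
    """Convert a romanized word or phrase to Malayalam script."""
    tokens = tokenize(text.lower())
    result = []
    for idx, tok in enumerate(tokens):
        prev = tokens[idx - 1] if idx > 0 else None
        nxt = tokens[idx + 1] if idx + 1 < len(tokens) else None
        if tok in CONSONANTS:
            result.append(CONSONANTS[tok])
            # No vowel follows: cluster or word-final consonant, so add virama.
            if nxt not in VOWELS:
                result.append(VIRAMA)
        elif tok in VOWELS:
            independent, sign = VOWELS[tok]
            # After a consonant the vowel becomes a sign (empty for "a");
            # elsewhere it is written as an independent vowel letter.
            result.append(sign if prev in CONSONANTS else independent)
        else:
            result.append(tok)
    return "".join(result)
```

With these tables, `transliterate("amma")` produces അമ്മ and `transliterate("aana")` produces ആന. The sketch deliberately ignores distinctions that a real scheme must handle, such as ല versus ള, ന versus ണ, chillu letters, and the anusvara. As a result, many words will not come out in standard spelling.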

Proceedings Article
07 Jun 2012
TL;DR: The authors build a collection of parallel corpora between English and six languages from the Indian subcontinent that are low-resource, under-studied, and exhibit linguistic phenomena that are difficult for machine translation.
Abstract: Recent work has established the efficacy of Amazon's Mechanical Turk for constructing parallel corpora for machine translation research. We apply this to building a collection of parallel corpora between English and six languages from the Indian subcontinent: Bengali, Hindi, Malayalam, Tamil, Telugu, and Urdu. These languages are low-resource, under-studied, and exhibit linguistic phenomena that are difficult for machine translation. We conduct a variety of baseline experiments and analysis, and release the data to the community.

134 citations

Journal Article
TL;DR: This survey discusses feature extraction and classification techniques for offline handwriting recognition of Indian regional scripts and is intended as a compendium for researchers, policymakers, and practitioners in India.
Abstract: Offline handwriting recognition in Indian regional scripts is an interesting area of research, as almost 460 million people in India use regional scripts. The nine major Indian regional scripts are Bangla (for the Bengali and Assamese languages), Gujarati, Kannada, Malayalam, Oriya, Gurumukhi (for Punjabi), Tamil, Telugu, and Nastaliq (for Urdu). A state-of-the-art survey of the techniques available for offline handwriting recognition (OHR) in Indian regional scripts will be of great aid to researchers in the subcontinent, so this article makes a sincere attempt to discuss the advancements reported in this area over the last few decades. The survey is organized into several sections. It first briefly introduces automatic handwriting recognition and the official regional scripts of India. The nine regional scripts are then categorized into four subgroups based on their similarity and evolution. The first group contains the Bangla, Oriya, Gujarati, and Gurumukhi scripts; the second contains the Kannada and Telugu scripts; and the third contains the Tamil and Malayalam scripts. The fourth group contains only the Nastaliq script (the Perso-Arabic script used for Urdu), which, unlike the other eight, is not an Indic (Brahmi-derived) script. Various feature extraction and classification techniques for offline handwriting recognition of these regional scripts are then discussed. Because the script must be identified before the recognition step, a section is dedicated to handwritten script identification techniques. Since benchmark databases are essential to any pattern recognition research, the datasets available for the different Indian regional scripts are also described. A separate section covers observations, future scope, and existing difficulties in handwriting recognition for Indian regional scripts.
We hope that this survey will serve as a compendium for researchers as well as policymakers and practitioners in India, and that it will help bring together researchers working on different Indian scripts. Given the recent developments in OHR of Indian regional scripts, this article aims to provide a better platform for future research.

133 citations

Proceedings Article
16 Dec 2020
TL;DR: The HASOC track is dedicated to evaluating technology for finding offensive language and hate speech. It attracted much interest, with over 40 research groups participating and describing their approaches in papers.
Abstract: This paper presents the HASOC track and its two parts. HASOC is dedicated to evaluating technology for finding offensive language and hate speech. HASOC creates test collections for low-resource languages, with English for comparison. The first track within HASOC continued the work from 2019 and provided a testbed of Twitter posts in Hindi, German, and English. The second track created test resources for Tamil and Malayalam in both native and Latin script, with posts extracted mainly from YouTube and Twitter. Both tracks attracted much interest: over 40 research groups participated and described their approaches in papers. In this overview, we present the tasks, the data, and the main results.

127 citations
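The second HASOC track described above distributes Tamil and Malayalam posts in both native and Latin script, so a participating system may first need to tell the two apart. The following is a minimal sketch that assumes counting characters by Unicode block is good enough for this purpose. The names `char_script` and `classify_post`, the label strings, and the 0.8 threshold are all hypothetical choices. None of them is part of the HASOC track's published setup.

```python
# Rough script identification for short social-media posts.
# ASSUMPTION: a post is labelled by whichever script dominates its letters;
# the 0.8 threshold and the label names are illustrative choices only.
from collections import Counter

# Unicode block ranges (inclusive) for the native scripts of interest.
SCRIPT_RANGES = {
    "malayalam": (0x0D00, 0x0D7F),
    "tamil": (0x0B80, 0x0BFF),
}


def char_script(ch):
    """Return the script name for one character, or None if it is not counted."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        # Block membership also covers vowel signs and virama, which
        # str.isalpha() would reject because they are combining marks.
        if lo <= cp <= hi:
            return name
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None  # digits, spaces, punctuation, emoji, other scripts


def classify_post(text, threshold=0.8):
    """Label a post as native-<script>, latin, mixed, or unknown."""
    counts = Counter(s for s in map(char_script, text) if s is not None)
    total = sum(counts.values())
    if total == 0:
        return "unknown"
    script, n = counts.most_common(1)[0]
    if n / total < threshold:
        return "mixed"  # no single script dominates (code-mixed post)
    return "latin" if script == "latin" else "native-" + script
```

This only separates scripts. A Latin-script Malayalam post and an English post both come out as `latin`, so telling them apart would need a language-identification step on top.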


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (71% related)
Sentence: 41.2K papers, 929.6K citations (69% related)
Language acquisition: 33.9K papers, 957.2K citations (65% related)
Perception: 27.6K papers, 937.2K citations (64% related)
Narrative: 64.2K papers, 1.1M citations (63% related)
Performance
Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  76
2022  157
2021  97
2020  68
2019  35
2018  47