Journal Article

Om: one tool for many (Indian) languages

TL;DR: Om is a transliteration scheme that represents Indian language alphabets with ASCII characters, so text can be read directly in English letters by the many users who cannot read the scripts of Indian languages other than their mother tongue.
Abstract: Many different languages are spoken in India, each the mother tongue of tens of millions of people. While the languages and scripts are distinct from each other, the grammar and the alphabet are similar to a large extent. One common feature is that all the Indian languages are phonetic in nature. In this paper we describe the development of Om, a transliteration scheme that exploits this phonetic nature of the alphabet. Om uses ASCII characters to represent Indian language alphabets, and thus can be read directly in English by a large number of users who cannot read the scripts of Indian languages other than their mother tongue. It is also useful in computer applications where local language tools such as email and chat are not yet available. Another significant contribution presented in this paper is a text editor for Indian languages that integrates Om input for many Indian languages into a word processor such as Microsoft Win Word®. The text editor is also developed on the Java® platform so that it can run on Unix machines as well. We propose this transliteration scheme as a possible standard for Indian language transliteration and keyboard entry.
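The idea of a phonetic ASCII scheme can be illustrated with a minimal sketch. The mapping table below is hypothetical and is not the actual Om table, which the abstract does not give; it covers only a handful of Devanagari letters. The function name `transliterate` and the greedy longest-match strategy are likewise assumptions made for illustration. The sketch shows why a phonetic alphabet makes the conversion nearly mechanical: a consonant followed by a vowel takes the dependent vowel sign, and a consonant followed by another consonant takes a virama.

```python
# Hypothetical ASCII-to-Devanagari table (NOT the real Om mapping).
CONSONANTS = {
    "k": "\u0915", "kh": "\u0916", "g": "\u0917", "t": "\u0924",
    "n": "\u0928", "p": "\u092a", "m": "\u092e", "r": "\u0930",
    "l": "\u0932", "s": "\u0938", "h": "\u0939",
}
# Each vowel maps to (independent letter, dependent sign after a consonant).
# The inherent vowel "a" has no dependent sign.
VOWELS = {
    "a": ("\u0905", ""), "aa": ("\u0906", "\u093e"),
    "i": ("\u0907", "\u093f"), "ii": ("\u0908", "\u0940"),
    "u": ("\u0909", "\u0941"), "uu": ("\u090a", "\u0942"),
    "e": ("\u090f", "\u0947"), "o": ("\u0913", "\u094b"),
}
VIRAMA = "\u094d"  # suppresses the inherent vowel of a consonant


def transliterate(text: str) -> str:
    """Greedy longest-match conversion of ASCII input to Devanagari."""
    out = []
    i = 0
    pending = False  # True while the last consonant still awaits a vowel
    while i < len(text):
        for length in (2, 1):  # try two-character tokens first
            tok = text[i:i + length]
            if len(tok) < length:
                continue
            if tok in CONSONANTS:
                if pending:
                    out.append(VIRAMA)  # consonant cluster
                out.append(CONSONANTS[tok])
                pending = True
                break
            if tok in VOWELS:
                independent, sign = VOWELS[tok]
                out.append(sign if pending else independent)
                pending = False
                break
        else:
            # Unknown character (space, digit, ...): pass it through.
            if pending:
                out.append(VIRAMA)
            out.append(text[i])
            pending = False
        i += length  # length is the matched size, or 1 if nothing matched
    if pending:
        out.append(VIRAMA)
    return "".join(out)


print(transliterate("namaste"))
```

Because the table is keyed on sounds rather than on glyphs, the same ASCII input could in principle be routed to a different script by swapping the table, which is the property the abstract attributes to the shared phonetic alphabet.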


Citations
01 Jan 2014
TL;DR: An investigation of the phonological length of utterance in native Kannada-speaking children aged 3 to 7 years found that the PMLU score increased with age, suggesting a developmental trend in PMLU acquisition.
Abstract: Phonological mean length of utterance (PMLU) is a whole-word measure of phonological proficiency. It measures the length of a child's word and the number of correct consonants. The present study investigated the phonological length of utterance in native Kannada-speaking children aged 3 to 7 years. A total of 400 subjects in the age range of 3-7 years participated in the study. Spontaneous speech samples were elicited from each child and analyzed for PMLU as per the rules suggested by Ingram. The Mann-Whitney U test and the Kruskal-Wallis test were employed to compare the differences between the means of PMLU scores across gender and age, respectively. The results revealed an increase in PMLU score with age, suggesting a developmental trend in PMLU acquisition. No statistically significant differences were observed between the means of PMLU scores across gender.

230 citations

01 Jan 2007
TL;DR: This paper discusses efforts to address Font-to-Akshara mapping, pronunciation rules for Aksharas, and text normalization in the context of building text-to-speech systems in Indian languages.
Abstract: To build a natural sounding speech synthesis system, it is essential that the text processing component produce an appropriate sequence of phonemic units corresponding to an arbitrary input text. In this paper we discuss our efforts in addressing the issues of Font-to-Akshara mapping, pronunciation rules for Aksharas, text normalization in the context of building text-to-speech systems in Indian languages.

44 citations

Proceedings Article
01 Jan 2008
TL;DR: This paper presents a more discerning transliteration method that applies different techniques depending on word origin; it requires no training data on the target side and uses more sophisticated techniques on the source side.
Abstract: Transliteration is the process of transcribing words from a source script to a target script. These words can be content words or proper nouns. They may be of local or foreign origin. In this paper we present a more discerning method which applies different techniques based on the word origin. The techniques used also take into account the properties of the scripts. Our approach does not require training data on the target side, while it uses more sophisticated techniques on the source side. Fuzzy string matching is used to compensate for lack of training on the target side. We have evaluated on two Indian languages and have achieved substantially better results (increase of up to 0.44 in MRR) than the baseline and comparable to the state of the art. Our experiments clearly show that word origin is an important factor in achieving higher accuracy in transliteration.

37 citations

Journal Article
TL;DR: This paper demonstrates a Unicode-based approach to building transliteration editors for Indian languages that takes advantage of the Unicode rendering engine, and reports its advantages: little maintenance, few entries in the mapping table, and ease of adding new features, such as new letters, to the transliteration scheme.
Abstract: Transliteration editors are essential for keying Indian language scripts into the computer using a QWERTY keyboard. Applications of transliteration editors in the context of the Universal Digital Library (UDL) include entry of meta-data and dictionaries for Indian languages. In this paper we propose a simple approach for building transliteration editors for Indian languages using Unicode and by taking advantage of its rendering engine. We demonstrate the usefulness of the Unicode-based approach to building transliteration editors for Indian languages, and report its advantages: little maintenance, few entries in the mapping table, and ease of adding new features, such as new letters, to the transliteration scheme. We demonstrate the transliteration editor for 9 Indian languages and also explain how this approach can be adapted for Arabic scripts.

36 citations
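One property of Unicode that can keep a transliteration mapping table small is that the major Indic script blocks, from Devanagari through Malayalam, share a parallel layout: corresponding letters sit at the same offset within each 128-code-point block. The abstract above does not say that the editor relies on this property, so the sketch below is an illustration under that assumption, not a description of the authors' method. The function name `shift_script` and the dictionary `BLOCK_START` are hypothetical names; the block start values are the standard Unicode ones.

```python
import unicodedata

# Start of each Indic script block in Unicode (each block spans 0x80 code points).
BLOCK_START = {
    "devanagari": 0x0900, "bengali": 0x0980, "gurmukhi": 0x0A00,
    "gujarati": 0x0A80, "oriya": 0x0B00, "tamil": 0x0B80,
    "telugu": 0x0C00, "kannada": 0x0C80, "malayalam": 0x0D00,
}


def shift_script(text: str, source: str, target: str) -> str:
    """Move characters from one Indic block to the same offset in another."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + 0x80:
            candidate = chr(cp - src + dst)
            # Not every offset is assigned in every script; keep the
            # original character when the target code point is unassigned.
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)
    return "".join(out)


# Devanagari "kamala" rendered in Telugu letters.
print(shift_script("\u0915\u092e\u0932", "devanagari", "telugu"))
```

Under this assumption a single ASCII-to-Devanagari table plus a block offset would cover several scripts, with per-script exceptions needed only where a script lacks a letter, as Tamil does for several consonants.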

Journal Article
31 Aug 2013
TL;DR: This paper proposes named entity transliteration for the Hindi to English and Marathi to English language pairs using a Support Vector Machine (SVM), and uses phonetic of the source language and nas two features for transliteration.
Abstract: Language transliteration is one of the important areas in NLP. Transliteration is very useful for converting the named entities (NEs) written in one script to another script in NLP applications like Cross Lingual Information Retrieval (CLIR), Multilingual Voice Chat Applications and Real Time Machine Translation (MT). The most important requirement of a transliteration system is to preserve the phonetic properties of the source language after the transliteration in the target language. In this paper, we have proposed named entity transliteration for the Hindi to English and Marathi to English language pairs using a Support Vector Machine (SVM). In the proposed approach, the source named entity is segmented into transliteration units; hence the transliteration problem can be viewed as a sequence labeling problem. The classification of phonetic units is done by using the polynomial kernel function of the Support Vector Machine (SVM). Proposed approach uses phonetic of the source language and nas two features for transliteration.

25 citations

References
01 Jan 2006
TL;DR: The goal of the Universal Digital Library Project (UDL) is described, and the approach taken by, and the technological challenges associated with, the Million Books to the Web Project (MBP) are presented.
Abstract: This paper describes the goal of the Universal Digital Library Project (UDL) and presents the approach taken by, and the technological challenges associated with, the Million Books to the Web Project (MBP). The Digital Library of India (DLI) initiative, which is the Indian part of the UDL and MBP, is discussed. DLI fosters a large number of research activities in areas such as text summarization, information retrieval, machine translation and transliteration, optical character recognition, handwriting recognition, and natural language parsing and morphological analyses. This paper provides an overview of the activities of DLI in these areas and shows how DLI serves as a multilingual resource.

13 citations