Author

Reddy

Bio: Reddy is an academic researcher. The author has contributed to research in the topics First language and Grammar. The author has an h-index of 1 and has co-authored 1 publication, which has received 20 citations.

Papers
Ganapathiraju, Madhavi, Balakrishnan, Reddy, Raj 
01 Jan 2005
TL;DR: Describes the development of Om, a transliteration scheme that uses ASCII characters to represent Indian language alphabets, so that text can be read directly in English letters by the many users who cannot read the scripts of Indian languages other than their mother tongue.
Abstract: Many different languages are spoken in India, each language being the mother tongue of tens of millions of people. While the languages and scripts are distinct from each other, the grammar and the alphabet are similar to a large extent. One common feature is that all the Indian languages are phonetic in nature. In this paper we describe the development of a transliteration scheme, Om, which exploits this phonetic nature of the alphabet. Om uses ASCII characters to represent Indian language alphabets, and thus can be read directly in English by a large number of users who cannot read the scripts of Indian languages other than their mother tongue. It is also useful in computer applications where local language tools such as email and chat are not yet available. Another significant contribution presented in this paper is the development of a text editor for Indian languages that integrates Om input for many Indian languages into a word processor such as Microsoft WinWord(R). The text editor has also been developed on the Java(R) platform, so it can run on Unix machines as well. We propose this transliteration scheme as a possible standard for Indian language transliteration and keyboard entry.

20 citations


Cited by
01 Jan 2007
TL;DR: Discusses efforts to address Font-to-Akshara mapping, pronunciation rules for Aksharas, and text normalization in the context of building text-to-speech systems in Indian languages.
Abstract: To build a natural sounding speech synthesis system, it is essential that the text processing component produce an appropriate sequence of phonemic units corresponding to an arbitrary input text. In this paper we discuss our efforts in addressing the issues of Font-to-Akshara mapping, pronunciation rules for Aksharas, and text normalization in the context of building text-to-speech systems in Indian languages.

44 citations

Proceedings Article
01 Jan 2008
TL;DR: Presents a more discerning transliteration method that applies different techniques depending on word origin; it requires no training data on the target side and uses more sophisticated techniques on the source side.
Abstract: Transliteration is the process of transcribing words from a source script to a target script. These words can be content words or proper nouns. They may be of local or foreign origin. In this paper we present a more discerning method which applies different techniques based on the word origin. The techniques used also take into account the properties of the scripts. Our approach does not require training data on the target side, while it uses more sophisticated techniques on the source side. Fuzzy string matching is used to compensate for lack of training on the target side. We have evaluated on two Indian languages and have achieved substantially better results (increase of up to 0.44 in MRR) than the baseline and comparable to the state of the art. Our experiments clearly show that word origin is an important factor in achieving higher accuracy in transliteration.

37 citations
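
The MRR figure quoted in the abstract above is mean reciprocal rank: for each test word, take the reciprocal of the rank at which the correct transliteration appears in the system's candidate list (0 if it is absent), then average over all words. The following is a minimal sketch in Python, assuming one reference transliteration per word and exact string matching. The function name and the sample data are illustrative only and are not taken from the paper.

```python
def mean_reciprocal_rank(ranked_candidates, references):
    """Average of 1/rank of the correct answer over all test items.

    ranked_candidates: list of candidate lists, best candidate first.
    references: list of correct transliterations, one per test item.
    An item whose reference is missing from its list contributes 0.
    """
    if not references:
        raise ValueError("need at least one test item")
    total = 0.0
    for candidates, reference in zip(ranked_candidates, references):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == reference:
                total += 1.0 / rank
                break
    return total / len(references)


# Hypothetical system output for three test words.
system_output = [
    ["delhi", "dilli"],            # correct answer at rank 1
    ["mumbay", "mumbi", "mumbai"],  # correct answer at rank 3
    ["chenai"],                    # correct answer missing
]
gold = ["delhi", "mumbai", "chennai"]

# (1/1 + 1/3 + 0) / 3 = 4/9
print(round(mean_reciprocal_rank(system_output, gold), 4))
```

On this scale, an increase of 0.44 is large: it is roughly the difference between finding the correct answer at rank 2 on average and finding it at rank 1 almost every time.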

Journal Article
TL;DR: Demonstrates a Unicode-based approach to building transliteration editors for Indian languages that takes advantage of the Unicode rendering engine, and reports its advantages: little maintenance, few entries in the mapping table, and ease of adding new features, such as new letters, to the transliteration scheme.
Abstract: Transliteration editors are essential for keying Indian language scripts into the computer using a QWERTY keyboard. Applications of transliteration editors in the context of the Universal Digital Library (UDL) include entry of meta-data and dictionaries for Indian languages. In this paper we propose a simple approach for building transliteration editors for Indian languages using Unicode and by taking advantage of its rendering engine. We demonstrate the usefulness of the Unicode-based approach to building transliteration editors for Indian languages, and report its advantages: it needs little maintenance and few entries in the mapping table, and new features, such as additional letters, are easy to add to the transliteration scheme. We demonstrate the transliteration editor for 9 Indian languages and also explain how this approach can be adapted for Arabic scripts.

36 citations
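
The mapping-table idea in the abstract above can be illustrated with a small sketch. It assumes a toy table of four consonants and three vowels for Devanagari; the ASCII keys are hypothetical and are not the scheme from the paper. The code emits code points in logical order and leaves conjunct formation and glyph reordering to the Unicode rendering engine, which is why the table can stay small.

```python
# Toy ASCII-to-Devanagari tables (hypothetical keys, for illustration only).
CONSONANTS = {"k": "\u0915", "m": "\u092E", "l": "\u0932", "r": "\u0930"}
INDEPENDENT_VOWELS = {"a": "\u0905", "aa": "\u0906", "i": "\u0907"}
# Dependent vowel signs; "a" is inherent in a consonant, so it adds nothing.
VOWEL_SIGNS = {"a": "", "aa": "\u093E", "i": "\u093F"}
VIRAMA = "\u094D"  # suppresses the inherent vowel of a consonant


def tokenize(text):
    """Split ASCII input into table keys, preferring the longest match."""
    keys = sorted(set(CONSONANTS) | set(INDEPENDENT_VOWELS), key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                tokens.append(key)
                i += len(key)
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r}")
    return tokens


def to_devanagari(text):
    """Convert ASCII input to a Devanagari code point sequence."""
    out = []
    pending_consonant = False  # True while a consonant awaits its vowel
    for token in tokenize(text):
        if token in CONSONANTS:
            if pending_consonant:
                out.append(VIRAMA)  # consonant cluster: join with virama
            out.append(CONSONANTS[token])
            pending_consonant = True
        else:
            table = VOWEL_SIGNS if pending_consonant else INDEPENDENT_VOWELS
            out.append(table[token])
            pending_consonant = False
    if pending_consonant:
        out.append(VIRAMA)  # a trailing bare consonant is written explicitly
    return "".join(out)


for word in ["kamala", "krama", "aama", "kila"]:
    print(word, to_devanagari(word))
```

Adding a letter in this design means adding one table entry. The cluster in "krama" needs no dedicated entry, since the renderer builds the conjunct from consonant, virama, consonant.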

Journal Article
31 Aug 2013
TL;DR: This paper proposes named entity transliteration for the Hindi to English and Marathi to English language pairs using a Support Vector Machine (SVM), with the phonetics of the source language and n-grams as the two features for transliteration.
Abstract: Language transliteration is one of the important areas in NLP. Transliteration is very useful for converting the named entities (NEs) written in one script to another script in NLP applications like Cross Lingual Information Retrieval (CLIR), Multilingual Voice Chat Applications and Real Time Machine Translation (MT). The most important requirement of a transliteration system is to preserve the phonetic properties of the source language after transliteration into the target language. In this paper, we have proposed named entity transliteration for the Hindi to English and Marathi to English language pairs using a Support Vector Machine (SVM). In the proposed approach, the source named entity is segmented into transliteration units; hence the transliteration problem can be viewed as a sequence labeling problem. The classification of phonetic units is done using the polynomial kernel function of the Support Vector Machine (SVM). The proposed approach uses the phonetics of the source language and n-grams as two features for transliteration.

25 citations

Journal Article
TL;DR: This paper focuses on Hindi to English machine transliteration of Indian named entities such as proper nouns, place names and organization names using conditional random fields (CRF).
Abstract: Machine transliteration has received significant research attention in recent years. In most cases, the source language has been English and the target language an Asian language. This paper focuses on Hindi to English machine transliteration of Indian named entities such as proper nouns, place names and organization names using conditional random fields (CRF). Hindi is the national language of India and is spoken by more than 500 million Indians. Hindi is the world's fourth most commonly used language after Chinese, English and Spanish. The system takes an Indian place name as input in Hindi, written in Devanagari script, and transliterates it into English. The input is provided in syllabified form so that n-gram techniques can be applied. As more than 50% of named entities are formed as a combination of two or three syllabic units, an n-gram approach with unigrams, bigrams and trigrams of Hindi is used for training on the corpus. The system performs satisfactorily for trigrams as compared to unigrams and bigrams. General Terms: Machine Transliteration

18 citations
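
The unigram, bigram and trigram features mentioned in the abstract above are contiguous windows over the syllable sequence of a name. The following is a minimal sketch in Python, assuming the input has already been syllabified. The function name and the hand-split example are illustrative and do not reproduce the paper's syllabifier or its CRF feature templates.

```python
def syllable_ngrams(syllables, n):
    """Return all contiguous n-grams over a list of syllabic units."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


# A place name split by hand into three syllabic units (assumed segmentation).
name = ["\u092E\u0941\u0902", "\u092C", "\u0908"]  # the three units of "Mumbai" in Devanagari

for n in (1, 2, 3):
    print(n, syllable_ngrams(name, n))
```

A name of three syllabic units yields three unigrams, two bigrams and one trigram, so the trigram covers the whole name in a single feature.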