Author

Jatinderkumar R. Saini

Bio: Jatinderkumar R. Saini is an academic researcher from Open University. The author has contributed to research on machine translation and Sanskrit grammar. The author has an h-index of 3 and has co-authored 4 publications receiving 11 citations. Previous affiliations of Jatinderkumar R. Saini include Symbiosis International University.

Papers
Proceedings Article
01 Jul 2019
TL;DR: Grammatical divergences between Sanskrit and Gujarati are discussed so that they can be reflected in the implementation of a Machine Translation System (MTS), given that parallel aligned corpora for statistical or example-based methods are scarce or unavailable.
Abstract: Given the vastness, depth, and precision of Sanskrit grammar and the wide geographic spread of the Gujarati language and its native speakers, it is necessary to examine the constituent characteristics and features of Sanskrit and Gujarati. Both languages fall under the Indo-Iranian branch of the language tree, but they show grammatical divergences, which are discussed here so that they can be reflected in the implementation of a Machine Translation System (MTS). The discussion centers on divergence patterns for a rule-based MT system, because parallel aligned corpora for statistical or example-based methods are scarce or unavailable. The Sanskrit grammatical constituents analyzed are indeclinables, pronouns, verbs, and nouns. For each equivalent grammatical constituent, the Sanskrit inflectional affixes are mapped to their Gujarati counterparts.

7 citations
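To make the idea of mapping Sanskrit inflections onto Gujarati equivalents concrete, here is a minimal Python sketch of one rule-based transfer step for nouns. It assumes the morphological analyzer has already produced a stem and a case; the stem lexicon and the case-to-postposition table are hypothetical, simplified samples, not the mappings from the paper.

```python
# Illustrative sketch only: a tiny rule-based transfer step that maps an
# analyzed Sanskrit noun (stem + case) to a Gujarati stem + postposition.
# Both tables are hypothetical samples, not the paper's mappings.

# Hypothetical Sanskrit-to-Gujarati stem lexicon.
STEM_LEXICON = {
    "वृक्ष": "ઝાડ",  # tree
    "जल": "પાણી",   # water
}

# Simplified, hypothetical case-to-postposition table. Real Gujarati
# genitives agree in gender (નો/ની/નું); only one form is used here.
CASE_TO_POSTPOSITION = {
    "nominative": "",
    "accusative": "ને",
    "instrumental": "થી",
    "dative": "ને",
    "ablative": "થી",
    "genitive": "નો",
    "locative": "માં",
}


def transfer_noun(sanskrit_stem: str, case: str) -> str:
    """Return the Gujarati form for an analyzed Sanskrit noun.

    Raises KeyError if the stem or case is not in the sample tables.
    """
    gujarati_stem = STEM_LEXICON[sanskrit_stem]
    # Gujarati postpositions are written attached to the noun.
    return gujarati_stem + CASE_TO_POSTPOSITION[case]


print(transfer_noun("वृक्ष", "locative"))  # ઝાડમાં
```

A table-driven design like this keeps linguistic knowledge in data rather than code, which suits a rule-based system where the mappings are curated by hand.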

Book Chapter
01 Jan 2020
TL;DR: The seventy-five most common Sanskrit stopwords are evaluated using a rule-based morphological analyzer, and most of them are classified as indeclinables or pronouns.
Abstract: Identifying and removing stopwords is a common preprocessing task in many natural language processing (NLP) implementations. The morphologically parsed information of stopwords is also relevant to the analysis of various NLP tasks. A list of the seventy-five most common Sanskrit stopwords is evaluated using a rule-based morphological analyzer. Most stopwords were classified as indeclinables and pronouns. The Gujarati equivalents of the stopwords are retrieved using a bilingual dictionary and cached for faster retrieval during the MT process.

5 citations
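The two uses of the stopword list described above (removal during preprocessing, and caching Gujarati equivalents ahead of translation) can be sketched in Python as follows. The three sample stopwords and their translations are illustrative only; the chapter's 75-word list is not reproduced, and the `lookup` callable is a hypothetical stand-in for whatever dictionary backend is used.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical sample of stopwords with their analyzed category; the chapter
# evaluates 75 stopwords, which are not reproduced here.
STOPWORD_CATEGORY = {
    "च": "indeclinable",    # "and"
    "अपि": "indeclinable",  # "also"
    "सः": "pronoun",        # "he/that"
}

# Hypothetical stand-in for the bilingual dictionary backend.
SAMPLE_DICTIONARY = {"च": "અને", "अपि": "પણ", "सः": "તે"}


def remove_stopwords(tokens: Iterable[str]) -> List[str]:
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORD_CATEGORY]


def build_stopword_cache(
    stopwords: Iterable[str], lookup: Callable[[str], Optional[str]]
) -> Dict[str, str]:
    """Look up each stopword once and keep the Gujarati equivalents in memory,
    so the translation loop does not query the dictionary for them again."""
    cache: Dict[str, str] = {}
    for word in stopwords:
        gujarati = lookup(word)
        if gujarati is not None:
            cache[word] = gujarati
    return cache


cache = build_stopword_cache(STOPWORD_CATEGORY, SAMPLE_DICTIONARY.get)
print(remove_stopwords(["रामः", "च", "सीता"]))  # ['रामः', 'सीता']
print(cache["च"])  # અને
```

Because stopwords are both frequent and few, precomputing their translations once is cheap and avoids repeated lookups for the most common tokens.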

Book Chapter
01 Jan 2020
TL;DR: The design, contents, and applicability of a Sanskrit-Gujarati bilingual dictionary are discussed, with the aim of facilitating use cases such as machine translation, cross-lingual information retrieval, stemming, lemmatization, and other related tasks.
Abstract: In cross-linguistic environments where lexico-semantic features are vital, the use of a digitized bilingual dictionary cannot be overlooked. Here, the design, contents, and applicability of a Sanskrit-Gujarati bilingual dictionary are discussed. Sanskrit lemmas are mapped to corresponding Gujarati lemmas to facilitate use cases such as machine translation, cross-lingual information retrieval, stemming, lemmatization, and other related tasks. The dictionary is designed and implemented using the Comma-Separated Values (CSV) format and a Relational Database Management System (RDBMS), and it can also be converted to a formatted, tag-based form for better portability. For scarce-resource languages, bilingual dictionaries are usually prepared manually, as opposed to the automated, aligned-bilingual-corpora approach used for several Natural Language Processing (NLP) tasks.

4 citations
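The storage arrangement described above (CSV source, relational database, optional tag-based export) can be illustrated with a small Python sketch using only the standard library: `csv`, an in-memory `sqlite3` database standing in for the RDBMS, and `xml.etree.ElementTree` for the tag-based form. The column names and the two sample rows are assumptions for illustration; the abstract does not specify the actual schema.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical CSV layout and sample rows; the chapter's actual columns
# are not specified in the abstract.
SAMPLE_CSV = """sanskrit,gujarati,pos
जल,પાણી,noun
वृक्ष,ઝાડ,noun
"""


def load_dictionary(csv_text: str) -> sqlite3.Connection:
    """Load CSV rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lexicon (sanskrit TEXT PRIMARY KEY, gujarati TEXT NOT NULL, pos TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO lexicon (sanskrit, gujarati, pos) VALUES (:sanskrit, :gujarati, :pos)",
        rows,
    )
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, lemma: str) -> Optional[str]:
    """Return the Gujarati lemma for a Sanskrit lemma, or None if absent."""
    row = conn.execute(
        "SELECT gujarati FROM lexicon WHERE sanskrit = ?", (lemma,)
    ).fetchone()
    return row[0] if row else None


def to_xml(conn: sqlite3.Connection) -> str:
    """Export the table to a simple tag-based (XML) form."""
    root = ET.Element("dictionary")
    for sanskrit, gujarati, pos in conn.execute(
        "SELECT sanskrit, gujarati, pos FROM lexicon ORDER BY rowid"
    ):
        entry = ET.SubElement(root, "entry", pos=pos)
        ET.SubElement(entry, "sanskrit").text = sanskrit
        ET.SubElement(entry, "gujarati").text = gujarati
    return ET.tostring(root, encoding="unicode")


conn = load_dictionary(SAMPLE_CSV)
print(lookup(conn, "जल"))  # પાણી
```

Keeping CSV as the editable source and the database as the query layer lets the same lexicon be regenerated into other formats, such as the XML export above.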

Book Chapter
01 Jan 2020
TL;DR: Here, 328 Sanskrit words are tested with four morphological analyzers: Samsaadhanii; the morphological analyzers by JNU and TDIL, both of which are available online; and the locally developed and installed Sanguj morphological analyzer.
Abstract: In linguistics, morphology is the study of words, word formation, and their analysis and generation. A morphological analyzer is a tool for understanding the grammatical characteristics and part-of-speech information of a word's constituents. A morphological analyzer is useful in many NLP implementations, such as syntactic parsers, spell checkers, information retrieval, and machine translation. Here, 328 Sanskrit words are tested with four morphological analyzers: Samsaadhanii; the morphological analyzers by JNU and TDIL, both of which are available online; and the locally developed and installed Sanguj morphological analyzer. The results show negligible divergence among the analyzers.

2 citations
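One simple way to quantify "divergence" across several analyzers is the share of test words on which all of them return the same analysis. The Python sketch below assumes each analyzer's output has already been normalized to a common tag string; in practice the four tools use different tagsets, so that normalization step (not shown) does most of the work. The two sample words and their analyses are toy values, not results from the chapter.

```python
from typing import Dict, List

# Hypothetical analyzer outputs: word -> {analyzer name -> analysis string}.
# These are toy values, not results from the chapter.
SAMPLE_OUTPUTS: Dict[str, Dict[str, str]] = {
    "रामः": {
        "Samsaadhanii": "noun.masc.nom.sg",
        "JNU": "noun.masc.nom.sg",
        "TDIL": "noun.masc.nom.sg",
        "Sanguj": "noun.masc.nom.sg",
    },
    "गच्छति": {
        "Samsaadhanii": "verb.pres.3.sg",
        "JNU": "verb.pres.3.sg",
        "TDIL": "verb.pres.3.sg",
        "Sanguj": "verb.pres.3",
    },
}


def divergent_words(outputs: Dict[str, Dict[str, str]]) -> List[str]:
    """Words for which at least two analyzers disagree."""
    return [w for w, analyses in outputs.items() if len(set(analyses.values())) > 1]


def agreement_rate(outputs: Dict[str, Dict[str, str]]) -> float:
    """Fraction of words on which all analyzers return the same analysis."""
    if not outputs:
        return 0.0
    return 1 - len(divergent_words(outputs)) / len(outputs)


print(divergent_words(SAMPLE_OUTPUTS))  # ['गच्छति']
print(agreement_rate(SAMPLE_OUTPUTS))   # 0.5
```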


Cited by
Journal Article
TL;DR: A comprehensive survey of MTS in general, and for English, Hindi, and Sanskrit in particular, is presented, and the availability of MT language modeling tools, parsers, data repositories, and evaluation metrics is tabulated.
Abstract: Transforming text from one language to another using computer systems, either automatically or with little human intervention, is known as a Machine Translation System (MTS). Divergence among natural languages in a multilingual environment makes Machine Translation (MT) a difficult and challenging task. The purpose of this paper is to present a comprehensive survey of MTS in general, and for English, Hindi, and Sanskrit in particular. The state-of-the-art MT approach is Neural Machine Translation (NMT), which has been used by Google, Amazon, Facebook, and Microsoft, but it requires a large corpus as well as high-performance computing systems. The availability of MT language modeling tools, parsers, data repositories, and evaluation metrics is tabulated in this article. MTS, evaluation methods, and platforms are classified based on a well-defined set of criteria. The survey also explores new research avenues that will help in developing good-quality MTS. Although several surveys of MTS have been done, none of them has followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach, including tools and evaluation methods, as this survey does specifically for English, Hindi, and Sanskrit.

14 citations

Proceedings Article
01 Jun 2020
TL;DR: The proposed Gujarati-to-English idiom translator accurately translates trigram and bigram idioms.
Abstract: Gujarati is the official language of the state of Gujarat, located in the western region of India. A Machine Translation System (MTS) translates text from one language to another. Based on our review, we found that very few machine translation systems are available that convert Gujarati text into English. This paper focuses on the translation of Gujarati trigram idioms. An idiom is defined as a token sequence whose meaning differs from the literal meanings of its individual tokens. The proposed Gujarati-to-English idiom translator accurately translates trigram and bigram idioms. We created a corpus of nearly 3,000 n-gram idioms, from which we identified nearly 890 trigram idioms and 1,735 bigram idioms. This paper studies the translation of trigram and bigram idioms.

7 citations
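A common way to apply separate trigram and bigram idiom lists is greedy longest-match scanning: at each position, try a trigram first, then a bigram, and otherwise pass the token through. The Python sketch below shows that strategy; it is an assumption about how such lists could be applied, not the paper's described algorithm, and the placeholder tokens (`a`, `b`, ...) and `IDIOM_*` glosses are hypothetical stand-ins for real Gujarati idioms and their English translations.

```python
from typing import Dict, List, Tuple


def translate_idioms(
    tokens: List[str],
    trigram_idioms: Dict[Tuple[str, str, str], str],
    bigram_idioms: Dict[Tuple[str, str], str],
) -> List[str]:
    """Replace idioms with their translations, preferring the longer match.

    Tokens that start no known idiom are passed through unchanged.
    """
    output: List[str] = []
    i = 0
    while i < len(tokens):
        trigram = tuple(tokens[i : i + 3])  # shorter near the end; then no match
        bigram = tuple(tokens[i : i + 2])
        if trigram in trigram_idioms:
            output.append(trigram_idioms[trigram])
            i += 3
        elif bigram in bigram_idioms:
            output.append(bigram_idioms[bigram])
            i += 2
        else:
            output.append(tokens[i])
            i += 1
    return output


# Hypothetical placeholder idioms; ("a", "b") is also a bigram idiom, but the
# trigram ("a", "b", "c") wins because it is checked first.
TRIGRAMS = {("a", "b", "c"): "IDIOM_ABC"}
BIGRAMS = {("d", "e"): "IDIOM_DE", ("a", "b"): "IDIOM_AB"}
print(translate_idioms(["x", "a", "b", "c", "d", "e", "y"], TRIGRAMS, BIGRAMS))
# ['x', 'IDIOM_ABC', 'IDIOM_DE', 'y']
```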

Journal Article
TL;DR: The proposed approach processes Sanskrit, one of the oldest, least explored, and most morphologically complex languages, and builds a document term matrix for Sanskrit (DTMS) and a document synset matrix for Sanskrit (DSMS) to address the problem of polysemy.
Abstract: Identifying the similarity between two documents is a challenging but important task that benefits applications such as recommender systems and plagiarism detection. A popular approach to processing text documents is the document term matrix (DTM). The proposed approach processes Sanskrit, one of the oldest, least explored, and most morphologically complex languages, and builds a document term matrix for Sanskrit (DTMS) and a document synset matrix for Sanskrit (DSMS). DTMS uses term frequencies, whereas DSMS uses synset frequencies instead, which reduces dimensionality. The proposed approach considers the semantics and context of the corpus to address the problem of polysemy. More than 760 documents, including Subhashitas and stories, are processed together. The F1 score, precision, accuracy, and the Matthews correlation coefficient (MCC), described as the most balanced measure, are used to demonstrate the improvement achieved by the proposed approach.

5 citations
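The dimension reduction from a term matrix to a synset matrix can be seen in a small Python sketch: synonyms that share a synset collapse into one column. This assumes every term has already been assigned a single synset (the context-based sense disambiguation the abstract mentions is not shown), and the synset ID `SYN_WATER` is hypothetical. The toy documents use जल, नीर, and वारि, three Sanskrit words for "water".

```python
from collections import Counter
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def frequency_matrix(docs: List[List[str]]) -> Tuple[List[str], Matrix]:
    """Rows are documents, columns are sorted vocabulary items, cells are counts."""
    vocab = sorted({item for doc in docs for item in doc})
    counts = [Counter(doc) for doc in docs]
    return vocab, [[c[item] for item in vocab] for c in counts]


def synset_matrix(
    docs: List[List[str]], term_to_synset: Dict[str, str]
) -> Tuple[List[str], Matrix]:
    """Replace each term by its synset ID (unmapped terms stay as they are)
    and count synsets instead of terms."""
    mapped = [[term_to_synset.get(t, t) for t in doc] for doc in docs]
    return frequency_matrix(mapped)


# Toy lemmatized documents; जल, नीर and वारि all mean "water".
docs = [["जल", "नदी"], ["नीर", "वारि", "नदी"]]
term_to_synset = {"जल": "SYN_WATER", "नीर": "SYN_WATER", "वारि": "SYN_WATER"}  # hypothetical IDs

print(frequency_matrix(docs))                # 4 columns
print(synset_matrix(docs, term_to_synset))   # 2 columns
```

In the term matrix the two documents share only the column for नदी, whereas in the synset matrix they also share the "water" column, which is what makes semantically similar documents look similar.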

Book Chapter
01 Jan 2020
TL;DR: Here, 328 Sanskrit words are tested with four morphological analyzers: Samsaadhanii; the morphological analyzers by JNU and TDIL, both of which are available online; and the locally developed and installed Sanguj morphological analyzer.

2 citations

Journal Article
TL;DR: In this paper, a machine translation framework that uses a grammatical transfer approach to translate written Sanskrit into Gujarati is proposed and implemented; it consists of tokenization, lemmatization, morphological analysis, a bilingual synonym-based dictionary, language synthesis, and transliteration.
Abstract: Sanskrit belongs to the Indo-European language family. Gujarati, which descended from Sanskrit, is widely spoken, particularly in the Indian state of Gujarat. The proposed and implemented Machine Translation framework uses a grammatical transfer approach to translate written Sanskrit into Gujarati. Because both languages are morphologically rich, studying the morphology of each item is difficult but necessary for the implementation. To improve implementation accuracy and translation clarity, an in-depth study of the formation of nouns, verbs, pronouns, and indeclinables, as well as their mappings, has been carried out. The framework's primary components are tokenization, lemmatization, morphological analysis, a Sanskrit-Gujarati bilingual synonym-based dictionary, language synthesis, and transliteration. The implementation was tested on 1,000 phrases using the automated Bilingual Evaluation Understudy (BLEU) metric, which yielded a score of 58.04. It was also evaluated on the ALPAC scale, yielding an intelligibility score of 69.16 and a fidelity score of 68.11. The results are encouraging and indicate that the proposed system is promising and robust for real-world applications.

1 citation
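For the transliteration component listed in the framework above, one widely used technique (not necessarily the one the authors implemented) exploits the fact that the Unicode Gujarati block (U+0A80 to U+0AFF) mirrors the layout of the Devanagari block (U+0900 to U+097F), so most letters map by a fixed code-point offset. The Python sketch below applies that offset and keeps characters whose shifted position is unassigned, such as the danda, which Gujarati text borrows from Devanagari. It is approximate: a few assigned code points in the two blocks do not correspond, so a real system would add an explicit exception table.

```python
import unicodedata

# The Unicode Gujarati block (U+0A80-U+0AFF) follows the same layout as
# the Devanagari block (U+0900-U+097F), so most letters map by a fixed offset.
DEVANAGARI_FIRST, DEVANAGARI_LAST = 0x0900, 0x097F
GUJARATI_OFFSET = 0x0A80 - 0x0900


def devanagari_to_gujarati(text: str) -> str:
    """Transliterate Devanagari characters into Gujarati script by code-point offset.

    Characters whose shifted code point is unassigned in Gujarati (e.g. the
    danda U+0964, which Gujarati text borrows from Devanagari) are kept as-is,
    as is any non-Devanagari character. This is an approximation: a few
    assigned code points in the two blocks do not correspond.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_FIRST <= cp <= DEVANAGARI_LAST:
            shifted = chr(cp + GUJARATI_OFFSET)
            if unicodedata.name(shifted, None) is not None:
                out.append(shifted)
                continue
        out.append(ch)
    return "".join(out)


print(devanagari_to_gujarati("रामः।"))  # રામઃ।
```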